git.ipfire.org Git - thirdparty/qemu.git/log

mtest2make: do not repeat the same speed over and over

There are just two of them (slow and thorough; quick is simply the
default). Avoid repeating them for as many times as there are tests.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

mtest2make: add dependencies to the "speed-qualified" suite

Thorough tests may have more dependencies than faster ones.
Dependencies are now looked up based on the suites being
executed, not on the suites passed as goals to the makefile.
Therefore, it is possible to limit dependencies to the
speeds that need them.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

mtest2make: cleanup mtest-suites variables

Remove the "--suite" argument from the .*.mtest-suites variables, and
add it only when actually computing the arguments to "meson test".
This makes it possible to set ninja-cmd-goals from the set of suites,
instead of doing it via many different .ninja-goals.* variables.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

target/i386: fix stack size when delivering real mode interrupts

The stack can be 32-bit even in real mode, and in this case
the stack pointer must be updated in its entirety rather than
just the bottom 16 bits. The same is true of real mode IRET,
for which there was even a comment suggesting the right thing
to do.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1506
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

target/i386: svm: fix sign extension of exit code

The exit_code parameter of cpu_vmexit is declared as uint32_t, but exit
codes are 64 bits wide according to the AMD SVM specification. And because
uint32_t is unsigned, this causes exit codes to be zero-extended, for example
writing SVM_EXIT_ERR as 0xffff_ffff instead of the expected 0xffff_ffff_ffff_ffff.

Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2977
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

target/i386/tcg: validate segment registers

Correctly reject invalid segment registers, including CS when used as
the destination of a MOV. Ignore the REX prefix as well.

Fixes: 5e9e21bcc4d ("target/i386: move 60-BF opcodes to new decoder", 2024-05-07)
Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3195
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

target/i386: Mark VPERMILPS as not valid with prefix 0

There are a small set of binary SSE insns which have no MMX
equivalent, which we create the gen functions for with the
BINARY_INT_SSE() macro. This forwards to gen_binary_int_sse() with a
NULL pointer for 'mmx'.

For almost all of these insns we correctly mark them in the decode
table as not permitting a zero prefix byte; however we got this wrong
for VPERMILPS, with the result that a bogus instruction would get
through the decode checks and end up in gen_binary_int_sse() trying
to call a NULL pointer.

Correct the decode table entry for VPERMILPS so that we get the
expected #UD exception.

In the x86 SDM, table A-4 "Three-byte Opcode Map: 08H-FFH
(First Two Bytes are 0F 38H)" confirms that there is no pfx 0
version of VPERMILPS.

Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3199
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Link: https://lore.kernel.org/r/20251114175417.2794804-1-peter.maydell@linaro.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

target/i386: emulate: Make sure fetch_instruction exist before calling it

Currently, this function is only available in MSHV. If a different accelerator
is used, and the code jumps to this section, a segfault will occur.
(I ran into this with HVF)

Signed-off-by: Nguyen Dinh Phi <phind.uet@gmail.com>
Link: https://lore.kernel.org/r/20251114082915.71884-2-phind.uet@gmail.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

ioapic: fix typo in irqfd check

Not registering the IEC notifier results in a regression with interrupt remapping
when running a VM configured with an intel-iommu device and an assigned
PCI VF. At boot, Linux complains with :

[ 15.416794] __common_interrupt: 2.37 No irq handler for vector

Reported-by: Cédric Le Goater <clg@redhat.com>
Analyzed-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Merge tag 'pull-target-arm-20251114' of https://gitlab.com/pm215/qemu into staging

target-arm queue:
* MAINTAINERS file update for whpx
* target/arm: Fix accidental write to TCG constant
* target/arm/cpu64: remove duplicate include
* hw/display/xlnx_dp: don't abort() on guest errors
* cxl, vfio, tests: clean up includes
* hw/misc/npcm_clk: Don't divide by zero when calculating frequency
* hw/audio/lm4549: Don't try to open a zero-frequency audio voice

# -----BEGIN PGP SIGNATURE-----
#
# iQJNBAABCAA3FiEE4aXFk81BneKOgxXPPCUl7RQ2DN4FAmkXSF0ZHHBldGVyLm1h
# eWRlbGxAbGluYXJvLm9yZwAKCRA8JSXtFDYM3iLKEACahSPxoRe4+TOgr3F7mJvq
# CDFOOUQSXbBC4WTviyJAh1+MYFhtWrOxUB1EzLb9iw1+sbBcT6/K1CBEFiQ65dpn
# kjtIaJDidz4x52vNc1nz1B9jzRdme4xQ0kg5NeY9PqCGO4nC0iWqzzbBoA1XYHsR
# RXfXr9JNXKqN3cm+x/ZX/o++rz3eG8ba0DxJUIO+OR9rAv3n0No+oTOeAJ4SbDu4
# lcP+MHFA/V//Q4O9QSeZv1tD+brXerpNcMQlsRrffkmT8bvJMPozyvcijtEZQz3+
# 9s8GUeL0b7/GgpdIqWyEAl2sreMtqmWh1GGpCZziFTiEmNWWI9M6fHINyZ2NVnPD
# T5UFOA9JbSG1ybxQHHf4Vj5tUjwWAAnVwRP1wXAb3p35fBYl0Y3JFDX+0HpL9tM/
# vB1BHA+PGRV51vDy7VoUpbbZkpa1/WJCqTm9s1BxzZ2BFu0tpQ2Rqg/V+y004NQY
# Xx1t7ilm18LyQrZpHYqmw3OJ/EVPtATBN2jomK2Z8ZWExLsDQ/Qd8k3cHg6OcN4N
# /ORpbqy29dOL5mQTEuBW8L0tLEN9tBqfadlqvlsbI9S0eDlZdyvPT9utV0aSCfe2
# km/rSjD2IJEmtJA1kcYgq3ipNsPu5eGFfw2OqGe+vowLaU42ki3uteaOqLgN81AX
# sB5cO49w7AtAmaocraAzPA==
# =+I+o
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 14 Nov 2025 04:18:53 PM CET
# gpg:                using RSA key E1A5C593CD419DE28E8315CF3C2525ED14360CDE
# gpg:                issuer "peter.maydell@linaro.org"
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>" [unknown]
# gpg:                 aka "Peter Maydell <pmaydell@gmail.com>" [unknown]
# gpg:                 aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>" [unknown]
# gpg:                 aka "Peter Maydell <peter@archaic.org.uk>" [unknown]
# gpg: WARNING: The key's User ID is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83  15CF 3C25 25ED 1436 0CDE

* tag 'pull-target-arm-20251114' of https://gitlab.com/pm215/qemu:
  hw/audio/lm4549: Don't try to open a zero-frequency audio voice
  hw/misc/npcm_clk: Don't divide by zero when calculating frequency
  tests: Clean up includes
  vfio: Clean up includes
  cxl: Clean up includes
  hw/display/xlnx_dp: Don't abort for unsupported graphics formats
  hw/display/xlnx_dp.c: Don't abort on AUX FIFO overrun/underrun
  target/arm/cpu64: remove duplicate include
  target/arm: Fix accidental write to TCG constant
  MAINTAINERS: update maintainers for WHPX

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Merge tag 'net-pull-request' of https://github.com/jasowang/qemu into staging

# -----BEGIN PGP SIGNATURE-----
#
# iQEzBAABCAAdFiEEIV1G9IJGaJ7HfzVi7wSWWzmNYhEFAmkWo9EACgkQ7wSWWzmN
# YhHargf/Uf801PmKskryVENF9sVe6u5NxJZlT3BUJVsSTGitucBIHWZ5J7MMR1lw
# If4tfMho3BX5Wrtl5GuCEzolk9pCz3wmSN6nyOU25C5tKaoJ/uR135K25D0CwVmD
# eTOyg+gKktVfogXxJ/zwZpRHMq4XXrk/C2ZP41r/CdcLyaeuDS9GIbd/q4N7f3vv
# bEsVqECzjEwWr2JBY9SD0xlIRp3nWwEvRsgRZPzBiQzfjSTlImqGLUsxIpF5V2LV
# 1BU0V/FShWyrwckBXSqCWBUh6uBUGgEl6qKnK4vH7+ed4Kd9giyp1vWAFEjHgIg+
# gZtPaT/MJQOtLyCuzfuSdUpAzz5Sfw==
# =Is8a
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 14 Nov 2025 04:36:49 AM CET
# gpg:                using RSA key 215D46F48246689EC77F3562EF04965B398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F  3562 EF04 965B 398D 6211

* tag 'net-pull-request' of https://github.com/jasowang/qemu:
  net: pad packets to minimum length in qemu_receive_packet()
  hw/net/e1000e_core: Adjust e1000e_write_payload_frag_to_rx_buffers() assert
  hw/net/e1000e_core: Correct rx oversize packet checks
  hw/net/e1000e_core: Don't advance desc_offset for NULL buffer RX descriptors
  net/hub: make net_hub_port_cleanup idempotent

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Merge tag 'pull-nbd-2025-11-13' of https://repo.or.cz/qemu/ericb into staging

NBD patches for 2025-11-13

- Fix NBD client deadlock when connecting to same-process server
- Several iotests improvements

# -----BEGIN PGP SIGNATURE-----
#
# iQEzBAABCAAdFiEEccLMIrHEYCkn0vOqp6FrSiUnQ2oFAmkWYUwACgkQp6FrSiUn
# Q2rYDgf/TQZ1UVkLhUvnH7RhF4y94tXpfVcl3/PObtis5mldZKkGlTEnFSZGJG4Y
# +ra/tdMS8ZBbTgXIAdR7tEp+n9YpWMLvYxcWcLpQQ2H3MXghtBGGjYHwkzppIvG+
# U3F8YdImbuOgR0V9NP0JWlk9DztsoRkiO3zaqLqvtwvzDXKPdjsMsGM13pHJVVru
# LdkM828Mrr8eu+DcAVFd7ZofftEgyd/E7IV1/0YCj3MaWR3BJ45gsfMUHvWwtaBP
# Mn8tQvB6yJEbAZwmepZbxrkFAJQhE916qbQyZscbnEJvDiKwK6PagQ5NAVtBaiz5
# xN3ywPOw4kghRaRLMiOsq1q/9M/p9A==
# =hhAb
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 13 Nov 2025 11:53:00 PM CET
# gpg:                using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A
# gpg: Good signature from "Eric Blake <eblake@redhat.com>" [unknown]
# gpg:                 aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [unknown]
# gpg:                 aka "[jpeg image of size 6874]" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2  F3AA A7A1 6B4A 2527 436A

* tag 'pull-nbd-2025-11-13' of https://repo.or.cz/qemu/ericb:
  tests/qemu-iotest: fix iotest 024 with qed images
  tests/qemu-iotests: Fix broken grep command in iotest 207
  iotests: Add coverage of recent NBD qio deadlock fix
  nbd: Avoid deadlock in client connecting to same-process server
  qio: Add QIONetListener API for using AioContext
  qio: Prepare NetListener to use AioContext
  qio: Provide accessor around QIONetListener->sioc
  chardev: Reuse channel's cached local address
  qio: Factor out helpers qio_net_listener_[un]watch
  qio: Minor optimization when callback function is unchanged
  qio: Protect NetListener callback with mutex
  qio: Remember context of qio_net_listener_set_client_func_full
  qio: Unwatch before notify in QIONetListener
  qio: Add trace points to net_listener
  iotests: Drop execute permissions on vvfat.out

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

hw/audio/lm4549: Don't try to open a zero-frequency audio voice

If the guest incorrectly programs the lm4549 audio chip with a zero
frequency, we will pass this to AUD_open_out(), which will complain:

   A bug was just triggered in AUD_open_out
   Save all your work and restart without audio
   I am sorry
   Context:
   audio: frequency=0 nchannels=2 fmt=S16 endianness=little

The datasheet doesn't say what we should do here, only that the valid
range for the freqency is 4000 to 48000 Hz; we choose to log the
guest error and ignore an attempt to change the DAC rate to something
outside the valid range.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/410
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20251107154116.1396769-1-peter.maydell@linaro.org

hw/misc/npcm_clk: Don't divide by zero when calculating frequency

If the guest misprograms the PLL registers to request a zero
divisor, we currently fall over with a division by zero:

../../hw/misc/npcm_clk.c:221:14: runtime error: division by zero
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../../hw/misc/npcm_clk.c:221:14

Thread 1 "qemu-system-aar" received signal SIGFPE, Arithmetic exception.
0x00005555584d8f6d in npcm7xx_clk_update_pll (opaque=0x7fffed159a20) at ../../hw/misc/npcm_clk.c:221
221 freq /= PLLCON_INDV(con) * PLLCON_OTDV1(con) * PLLCON_OTDV2(con);

Avoid this by treating this invalid setting like a stopped clock
(setting freq to 0).

Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/549
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20251107150137.1353532-1-peter.maydell@linaro.org

tests: Clean up includes

This commit was created with scripts/clean-includes:
./scripts/clean-includes --git tests tests

with one hand-edit to remove a now-empty #ifndef WIN32...#endif
from tests/qtest/dbus-display-test.c .

All .c should include qemu/osdep.h first.  The script performs three
related cleanups:

* Ensure .c files include qemu/osdep.h first.
* Including it in a .h is redundant, since the .c  already includes
  it.  Drop such inclusions.
* Likewise, including headers qemu/osdep.h includes is redundant.
  Drop these, too.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Message-id: 20251104160943.751997-10-peter.maydell@linaro.org

vfio: Clean up includes

This commit was created with scripts/clean-includes:
./scripts/clean-includes --git vfio hw/vfio hw/vfio-user

All .c should include qemu/osdep.h first.  The script performs three
related cleanups:

* Ensure .c files include qemu/osdep.h first.
* Including it in a .h is redundant, since the .c  already includes
  it.  Drop such inclusions.
* Likewise, including headers qemu/osdep.h includes is redundant.
  Drop these, too.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20251104160943.751997-9-peter.maydell@linaro.org

cxl: Clean up includes

This commit was created with scripts/clean-includes:
./scripts/clean-includes --git cxl hw/cxl hw/mem

All .c should include qemu/osdep.h first.  The script performs three
related cleanups:

* Ensure .c files include qemu/osdep.h first.
* Including it in a .h is redundant, since the .c  already includes
  it.  Drop such inclusions.
* Likewise, including headers qemu/osdep.h includes is redundant.
  Drop these, too.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Acked-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Message-id: 20251104160943.751997-8-peter.maydell@linaro.org

hw/display/xlnx_dp: Don't abort for unsupported graphics formats

If the guest writes an invalid or unsupported value to the
AV_BUF_FORMAT register, currently we abort(). Instead, log this as
either a guest error or an unimplemented error and continue.

The existing code treats DP_NL_VID_CB_Y0_CR_Y1 as x8b8g8r8
via a "case 0" that does not use the enum constant name for some
reason; we leave that alone beyond adding a comment about the
weird code.

Documentation of this register seems to be at:
https://docs.amd.com/r/en-US/ug1087-zynq-ultrascale-registers/AV_BUF_FORMAT-DISPLAY_PORT-Register

Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1415
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@amd.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20251106145209.1083998-3-peter.maydell@linaro.org

hw/display/xlnx_dp.c: Don't abort on AUX FIFO overrun/underrun

The documentation of the Xilinx DisplayPort subsystem at
https://www.xilinx.com/support/documents/ip_documentation/v_dp_txss1/v3_1/pg299-v-dp-txss1.pdf
doesn't say what happens if a guest tries to issue an AUX write
command with a length greater than the amount of data in the AUX
write FIFO, or tries to write more data to the write FIFO than it can
hold, or issues multiple commands that put data into the AUX read
FIFO without reading it such that it overflows.

Currently QEMU will abort() in these guest-error situations, either
in xlnx_dp.c itself or in the fifo8 code. Make these cases all be
logged as guest errors instead. We choose to ignore the new data on
overflow, and return 0 on underflow. This is in line with how we handled
the "read from empty RX FIFO" case in commit a09ef5040477.

Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1418
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1419
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1424
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@amd.com>
Message-id: 20251106145209.1083998-2-peter.maydell@linaro.org

target/arm/cpu64: remove duplicate include

cpregs.h is included twice.

Signed-off-by: Osama Abdelkader <osama.abdelkader@gmail.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Message-id: 20251110161552.700333-1-osama.abdelkader@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Fix accidental write to TCG constant

Currently an unpredictable movw such as

  movw pc, 0x123

results in the tinycode

   and_i32 $0x123,$0x123,$0xfffffffc
   mov_i32 pc,$0x123
   exit_tb $0x0

which is clearly a bug: writing to a constant is incorrect and
discards the result of the mask.  Fix this by always doing an and_i32
and trusting the optimizer to turn this into a simple move when the
mask is zero.

Signed-off-by: Anton Johansson <anjo@rev.ng>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Gustavo Romero <gustavo.romero@linaro.org>
Reviewed-by: <gustavo.romero@linaro.org>
Message-id: 20251106144909.533997-1-richard.henderson@linaro.org
[rth: Avoid an extra temp and extra move.]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
[PMM: commit message tweak]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

MAINTAINERS: update maintainers for WHPX

From Pedro Barbuda (on Teams):

> we meant to have that switched a while back. you can add me as the maintainer. Pedro Barbuda (pbarbuda@microsoft.com)

Signed-off-by: Mohamed Mediouni <mohamed@unpredictable.fr>
Message-id: 20251107072337.28932-1-mohamed@unpredictable.fr
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

net: pad packets to minimum length in qemu_receive_packet()

In commits like 969e50b61a28 ("net: Pad short frames to minimum size
before sending from SLiRP/TAP") we switched away from requiring
network devices to handle short frames to instead having the net core
code do the padding of short frames out to the ETH_ZLEN minimum size.
We then dropped the code for handling short frames from the network
devices in a series of commits like 140eae9c8f7 ("hw/net: e1000:
Remove the logic of padding short frames in the receive path").

This missed one route where the device's receive code can still see a
short frame: if the device is in loopback mode and it transmits a
short frame via the qemu_receive_packet() function, this will be fed
back into its own receive code without being padded.

Add the padding logic to qemu_receive_packet().

This fixes a buffer overrun which can be triggered in the
e1000_receive_iov() logic via the loopback code path.

Other devices that use qemu_receive_packet() to implement loopback
are cadence_gem, dp8393x, lan9118, msf2-emac, pcnet, rtl8139
and sungem.

Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3043
Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>

hw/net/e1000e_core: Adjust e1000e_write_payload_frag_to_rx_buffers() assert

An assertion in e1000e_write_payload_frag_to_rx_buffers() attempts to
guard against the calling code accidentally trying to write too much
data to a single RX descriptor, such that the E1000EBAState::cur_idx
indexes off the end of the EB1000BAState::written[] array.

Unfortunately it is overzealous: it asserts that cur_idx is in
range after it has been incremented. This will fire incorrectly
for the case where the guest configures four buffers and exactly
enough bytes are written to fill all four of them.

The only places where we use cur_idx and index in to the written[]
array are the functions e1000e_write_hdr_frag_to_rx_buffers() and
e1000e_write_payload_frag_to_rx_buffers(), so we can rewrite this to
assert before doing the array dereference, rather than asserting
after updating cur_idx.

Cc: qemu-stable@nongnu.org
Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>

hw/net/e1000e_core: Correct rx oversize packet checks

In e1000e_write_packet_to_guest() we attempt to ensure that we don't
write more of a packet to a descriptor than will fit in the guest
configured receive buffers.  However, this code does not allow for
the "packet split" feature.  When packet splitting is enabled, the
first of up to 4 buffers in the descriptor is used for the packet
header only, with the payload going into buffers 2, 3 and 4.  Our
length check only checks against the total sizes of all 4 buffers,
which meant that if an incoming packet was large enough to fit in (1
+ 2 + 3 + 4) but not into (2 + 3 + 4) and packet splitting was
enabled, we would run into the assertion in
e1000e_write_hdr_frag_to_rx_buffers() that we had enough buffers for
the data:

qemu-system-i386: ../../hw/net/e1000e_core.c:1418: void e1000e_write_payload_frag_to_rx_buffers(E1000ECore *, hwaddr *, E1000EBAState *, const char *, dma_addr_t): Assertion `bastate->cur_idx < MAX_PS_BUFFERS' failed.

A malicious guest could provoke this assertion by configuring the
device into loopback mode, and then sending itself a suitably sized
packet into a suitably arrange rx descriptor.

The code also fails to deal with the possibility that the descriptor
buffers are sized such that the trailing checksum word does not fit
into the last descriptor which has actual data, which might also
trigger this assertion.

Rework the length handling to use two variables:
* desc_size is the total amount of data DMA'd to the guest
   for the descriptor being processed in this iteration of the loop
* rx_desc_buf_size is the total amount of space left in it

As we copy data to the guest (packet header, payload, checksum),
update these two variables.  (Previously we attempted to calculate
desc_size once at the top of the loop, but this is too difficult to
do correctly.) Then we can use the variables to ensure that we clamp
the amount of copied payload data to the remaining space in the
descriptor's buffers, even if we've used one of the buffers up in the
packet-split code, and we can tell whether we have enough space for
the full checksum word in this descriptor or whether we're going to
need to split that to the following descriptor.

I have included comments that hopefully help to make the loop
logic a little clearer.

Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/537
Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>

hw/net/e1000e_core: Don't advance desc_offset for NULL buffer RX descriptors

In e1000e_write_packet_to_guest() we don't write data for RX descriptors
where the buffer address is NULL (as required by the i82574 datasheet
section 7.1.7.2). However, when we do this we still update desc_offset
by the amount of data we would have written to the RX descriptor if
it had a valid buffer pointer, resulting in our dropping that data
entirely. The data sheet is not 100% clear on the subject, but this
seems unlikely to be the correct behaviour.

Rearrange the null-descriptor logic so that we don't treat these
do-nothing descriptors as if we'd really written the data.

This both fixes a bug and also is a prerequisite to cleaning up
the size calculation logic in the next patch.

(Cc to stable largely because it will be needed for the next patch,
which fixes a more serious bug.)

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Signed-off-by: Jason Wang <jasowang@redhat.com>

net/hub: make net_hub_port_cleanup idempotent

Makes the net_hub_port_cleanup function idempotent to avoid double
removals by guarding its QLIST_REMOVE with a flag.

When using a Xen networking device with hubport backends, e.g.:

-accel kvm,xen-version=0x40011
-netdev hubport,...
-device xen-net-device,...

the shutdown order starts with net_cleanup, which walks the list and
deletes netdevs (including hubports). Then Xen's xen_device_unrealize is
called, which eventually leads to a second net_hub_port_cleanup call,
resulting in a segfault.

Fixes: e7891c57 ("net: move backend cleanup to NIC cleanup")
Reported-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

tests/qemu-iotest: fix iotest 024 with qed images

Use 'qemu-io -c map' instead of 'qemu-img map' to get an output that
works with both image types.

Cc: qemu-stable <qemu-stable@nongnu.org>
Fixes: 909852ba6b4a ("qemu-img rebase: don't exceed IO_BUF_SIZE in one operation")
Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-ID: <20251112170959.700840-1-berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>

tests/qemu-iotests: Fix broken grep command in iotest 207

Running "./check -ssh 207" fails for me with lots of lines like this
in the output:

+base64: invalid input

While looking closer at it, I noticed that the grep -v "\\^#" command
in this test is not working as expected - it is likely meant to filter
out the comment lines that are starting with a "#", but at least my
version of grep (GNU grep 3.11) does not work with the backslashes here.
There does not seem to be a compelling reason for these backslashes,
so let's simply drop them to fix this issue.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20251113080525.444826-1-thuth@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>

iotests: Add coverage of recent NBD qio deadlock fix

Test that all images in a qcow2 chain using an NBD backing file can be
served by the same process. Prior to the recent QIONetListener fixes,
this test would demonstrate deadlock.

The test borrows heavily from the original formula by "John Doe" in
the gitlab bug, but uses a Unix socket rather than TCP to avoid port
contention, and uses a full-blown QEMU rather than qemu-storage-daemon
since both programs were impacted.

The test starts out with the even simpler task of directly adding an
NBD client without qcow2 chain ('client'), which also provokes the
deadlock; but commenting out the 'Adding explicit NBD client' section
will still show deadlock when reaching the 'Adding wrapper image...'.

Fixes: https://gitlab.com/qemu-project/qemu/-/issues/3169
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-ID: <20251113011625.878876-28-eblake@redhat.com>

nbd: Avoid deadlock in client connecting to same-process server

See the previous patch for a longer description of the deadlock. Now
that QIONetListener supports waiting for clients in the main loop
AioContext, NBD can use that to ensure that the server can make
progress even when a client is intentionally starving the GMainContext
from any activity not tied to an AioContext.

Note that command-line arguments and QMP commands like
nbd-server-start or nbd-server-stop that manipulate whether the NBD
server exists are serviced in the main loop; and therefore, this patch
does not fall foul of the restrictions in the previous patch about the
inherent unsafe race possible if a QIONetListener can have its async
callback modified by a different thread than the one servicing polls.

Fixes: https://gitlab.com/qemu-project/qemu/-/issues/3169
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-ID: <20251113011625.878876-27-eblake@redhat.com>

qio: Add QIONetListener API for using AioContext

The user calling himself "John Doe" reported a deadlock when
attempting to use qemu-storage-daemon to serve both a base file over
NBD, and a qcow2 file with that NBD export as its backing file, from
the same process, even though it worked just fine when there were two
q-s-d processes.  The bulk of the NBD server code properly uses
coroutines to make progress in an event-driven manner, but the code
for spawning a new coroutine at the point when listen(2) detects a new
client was hard-coded to use the global GMainContext; in other words,
the callback that triggers nbd_client_new to let the server start the
negotiation sequence with the client requires the main loop to be
making progress.  However, the code for bdrv_open of a qcow2 image
with an NBD backing file uses an AIO_WAIT_WHILE nested event loop to
ensure that the entire qcow2 backing chain is either fully loaded or
rejected, without any side effects from the main loop causing unwanted
changes to the disk being loaded (in short, an AioContext represents
the set of actions that are known to be safe while handling block
layer I/O, while excluding any other pending actions in the global
main loop with potentially larger risk of unwanted side effects).

This creates a classic case of deadlock: the server can't progress to
the point of accept(2)ing the client to write to the NBD socket
because the main loop is being starved until the AIO_WAIT_WHILE
completes the bdrv_open, but the AIO_WAIT_WHILE can't progress because
it is blocked on the client coroutine stuck in a read() of the
expected magic number from the server side of the socket.

This patch adds a new API to allow clients to opt in to listening via
an AioContext rather than a GMainContext.  This will allow NBD to fix
the deadlock by performing all actions during bdrv_open in the main
loop AioContext.

Technical debt warning: I would have loved to utilize a notify
function with AioContext to guarantee that we don't finalize listener
due to an object_unref if there is any callback still running (the way
GSource does), but wiring up notify functions into AioContext is a
bigger task that will be deferred to a later QEMU release.  But for
solving the NBD deadlock, it is sufficient to note that the QMP
commands for enabling and disabling the NBD server are really the only
points where we want to change the listener's callback.  Furthermore,
those commands are serviced in the main loop, which is the same
AioContext that is also listening for connections.  Since a thread
cannot interrupt itself, we are ensured that at the point where we are
changing the watch, there are no callbacks active.  This is NOT as
powerful as the GSource cross-thread safety, but sufficient for the
needs of today.

An upcoming patch will then add a unit test (kept separate to make it
easier to rearrange the series to demonstrate the deadlock without
this patch).

Fixes: https://gitlab.com/qemu-project/qemu/-/issues/3169
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20251113011625.878876-26-eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

qio: Prepare NetListener to use AioContext

For ease of review, this patch adds an AioContext pointer to the
QIONetListener struct, the code to trace it, and refactors
listener->io_source to instead be an array of utility structs; but the
aio_context pointer is always NULL until the next patch adds an API to
set it. There should be no semantic change in this patch.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-ID: <20251113011625.878876-25-eblake@redhat.com>

qio: Provide accessor around QIONetListener->sioc

An upcoming patch needs to pass more than just sioc as the opaque
pointer to an AioContext; but since our AioContext code in general
(and its QIO Channel wrapper code) lacks a notify callback present
with GSource, we do not have the trivial option of just g_malloc'ing a
small struct to hold all that data coupled with a notify of g_free.
Instead, the data pointer must outlive the registered handler; in
fact, having the data pointer have the same lifetime as QIONetListener
is adequate.

But the cleanest way to stick such a helper struct in QIONetListener
will be to rearrange internal struct members. And that in turn means
that all existing code that currently directly accesses
listener->nsioc and listener->sioc[] should instead go through
accessor functions, to be immune to the upcoming struct layout
changes. So this patch adds accessor methods qio_net_listener_nsioc()
and qio_net_listener_sioc(), and puts them to use.

While at it, notice that the pattern of grabbing an sioc from the
listener only to turn around can call
qio_channel_socket_get_local_address is common enough to also warrant
the helper of qio_net_listener_get_local_address, and fix a copy-paste
error in the corresponding documentation.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20251113011625.878876-24-eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

chardev: Reuse channel's cached local address

Directly accessing the fd member of a QIOChannelSocket is an
undesirable leaky abstraction. What's more, grabbing that fd merely
to force an eventual call to getsockname() can be wasteful, since the
channel is often able to return its cached local name.

Reported-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20251113011625.878876-23-eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

qio: Factor out helpers qio_net_listener_[un]watch

The code had three similar repetitions of an iteration over one or all
of nsiocs to set up a GSource, and likewise for teardown. Since an
upcoming patch wants to tweak whether GSource or AioContext is used,
it's better to consolidate that into one helper function for fewer
places to edit later.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20251113011625.878876-22-eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

qio: Minor optimization when callback function is unchanged

In qemu-nbd and other NBD server setups where parallel clients are
supported, it is common that the caller will re-register the same
callback function as long as it has not reached its limit on
simultaneous clients. In that case, there is no need to tear down and
reinstall GSource watches in the GMainContext.

In practice, all existing callers currently pass NULL for notify, and
no caller ever changes context across calls (for async uses, either
the caller consistently uses qio_net_listener_set_client_func_full
with the same context, or the caller consistently uses only
qio_net_listener_set_client_func which always uses the global
context); but the time spent checking these two fields in addition to
the more important func and data is still less than the savings of not
churning through extra GSource manipulations when the result will be
unchanged.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-ID: <20251113011625.878876-21-eblake@redhat.com>

qio: Protect NetListener callback with mutex

Without a mutex, NetListener can run into this data race between a
thread changing the async callback callback function to use when a
client connects, and the thread servicing polling of the listening
sockets:

  Thread 1:
       qio_net_listener_set_client_func(lstnr, f1, ...);
           => foreach sock: socket
               => object_ref(lstnr)
               => sock_src = qio_channel_socket_add_watch_source(sock, ...., lstnr, object_unref);

  Thread 2:
       poll()
          => event POLLIN on socket
               => ref(GSourceCallback)
               => if (lstnr->io_func) // while lstnr->io_func is f1
                    ...interrupt..

  Thread 1:
       qio_net_listener_set_client_func(lstnr, f2, ...);
          => foreach sock: socket
               => g_source_unref(sock_src)
          => foreach sock: socket
               => object_ref(lstnr)
               => sock_src = qio_channel_socket_add_watch_source(sock, ...., lstnr, object_unref);

  Thread 2:
               => call lstnr->io_func(lstnr->io_data) // now sees f2
               => return dispatch(sock)
               => unref(GSourceCallback)
                  => destroy-notify
                     => object_unref

Found by inspection; I did not spend the time trying to add sleeps or
execute under gdb to try and actually trigger the race in practice.
This is a SEGFAULT waiting to happen if f2 can become NULL because
thread 1 deregisters the user's callback while thread 2 is trying to
service the callback.  Other messes are also theoretically possible,
such as running callback f1 with an opaque pointer that should only be
passed to f2 (if the client code were to use more than just a binary
choice between a single async function or NULL).

Mitigating factor: if the code that modifies the QIONetListener can
only be reached by the same thread that is executing the polling and
async callbacks, then we are not in a two-thread race documented above
(even though poll can see two clients trying to connect in the same
window of time, any changes made to the listener by the first async
callback will be completed before the thread moves on to the second
client).  However, QEMU is complex enough that this is hard to
generically analyze.  If QMP commands (like nbd-server-stop) are run
in the main loop and the listener uses the main loop, things should be
okay.  But when a client uses an alternative GMainContext, or if
servicing a QMP command hands off to a coroutine to avoid blocking, I
am unable to state with certainty whether a given net listener can be
modified by a thread different from the polling thread running
callbacks.

At any rate, it is worth having the API be robust.  To ensure that
modifying a NetListener can be safely done from any thread, add a
mutex that guarantees atomicity to all members of a listener object
related to callbacks.  This problem has been present since
QIONetListener was introduced.

Note that this does NOT prevent the case of a second round of the
user's old async callback being invoked with the old opaque data, even
when the user has already tried to change the async callback during
the first async callback; it is only about ensuring that there is no
sharding (the eventual io_func(io_data) call that does get made will
correspond to a particular combination that the user had requested at
some point in time, and not be sharded to a combination that never
existed in practice).  In other words, this patch maintains the status
quo that a user's async callback function already needs to be robust
to parallel clients landing in the same window of poll servicing, even
when only one client is desired, if that particular listener can be
amended in a thread other than the one doing the polling.

CC: qemu-stable@nongnu.org
Fixes: 53047392 ("io: introduce a network socket listener API", v2.12.0)
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20251113011625.878876-20-eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
[eblake: minor commit message wording improvements]
Signed-off-by: Eric Blake <eblake@redhat.com>

qio: Remember context of qio_net_listener_set_client_func_full

io/net-listener.c has two modes of use: asynchronous (the user calls
qio_net_listener_set_client_func to wake up the callback via the
global GMainContext, or qio_net_listener_set_client_func_full to wake
up the callback via the caller's own alternative GMainContext), and
synchronous (the user calls qio_net_listener_wait_client which creates
its own GMainContext and waits for the first client connection before
returning, with no need for a user's callback).  But commit 938c8b79
has a latent logic flaw: when qio_net_listener_wait_client finishes on
its temporary context, it reverts all of the siocs back to the global
GMainContext rather than the potentially non-NULL context they might
have been originally registered with.  Similarly, if the user creates
a net-listener, adds initial addresses, registers an async callback
with a non-default context (which ties to all siocs for the initial
addresses), then adds more addresses with qio_net_listener_add, the
siocs for later addresses are blindly placed in the global context,
rather than sharing the context of the earlier ones.

In practice, I don't think this has caused issues.  As pointed out by
the original commit, all async callers prior to that commit were
already okay with the NULL default context; and the typical usage
pattern is to first add ALL the addresses the listener will pay
attention to before ever setting the async callback.  Likewise, if a
file uses only qio_net_listener_set_client_func instead of
qio_net_listener_set_client_func_full, then it is never using a custom
context, so later assignments of async callbacks will still be to the
same global context as earlier ones.  Meanwhile, any callers that want
to do the sync operation to grab the first client are unlikely to
register an async callback; altogether bypassing the question of
whether later assignments of a GSource are being tied to a different
context over time.

I do note that chardev/char-socket.c is the only file that calls both
qio_net_listener_wait_client (sync for a single client in
tcp_chr_accept_server_sync), and qio_net_listener_set_client_func_full
(several places, all with chr->gcontext, but sometimes with a NULL
callback function during teardown).  But as far as I can tell, the two
uses are mutually exclusive, based on the is_waitconnect parameter to
qmp_chardev_open_socket_server.

That said, it is more robust to remember when an async callback
function is tied to a non-default context, and have both the sync wait
and any late address additions honor that same context.  That way, the
code will be robust even if a later user performs a sync wait for a
specific client in the middle of servicing a longer-lived
QIONetListener that has an async callback for all other clients.

CC: qemu-stable@nongnu.org
Fixes: 938c8b79 ("qio: store gsources for net listeners", v2.12.0)
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-ID: <20251113011625.878876-19-eblake@redhat.com>

qio: Unwatch before notify in QIONetListener

When changing the callback registered with QIONetListener, the code
was calling notify on the old opaque data prior to actually removing
the old GSource objects still pointing to that data.  Similarly,
during finalize, it called notify before tearing down the various
GSource objects tied to the data.

In practice, a grep of the QEMU code base found that every existing
client of QIONetListener passes in a NULL notifier (the opaque data,
if non-NULL, outlives the NetListener and so does not need cleanup
when the NetListener is torn down), so this patch has no impact.  And
even if a caller had passed in a reference-counted object with a
notifier of object_unref but kept its own reference on the data, then
the early notify would merely reduce a refcount from (say) 2 to 1, but
not free the object.  However, it is a latent bug waiting to bite any
future caller that passes in data where the notifier actually frees
the object, because the GSource could then trigger a use-after-free if
it loses the race on a last-minute client connection resulting in the
data being passed to one final use of the async callback.

Better is to delay the notify call until after all GSource that have
been given a copy of the opaque data are torn down.

CC: qemu-stable@nongnu.org
Fixes: 530473924d "io: introduce a network socket listener API", v2.12.0
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-ID: <20251113011625.878876-18-eblake@redhat.com>

qio: Add trace points to net_listener

Upcoming patches will adjust how net_listener watches for new client
connections; adding trace points now makes it easier to debug that the
changes work as intended. For example, adding
--trace='qio_net_listener*' to the qemu-storage-daemon command line
before --nbd-server will track when the server first starts listening
for clients.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-ID: <20251113011625.878876-17-eblake@redhat.com>

iotests: Drop execute permissions on vvfat.out

Output files are not executables. Noticed while preparing another
iotest addition.

Fixes: c8f60bfb43 ("iotests: Add `vvfat` tests", v9.1.0)
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-ID: <20251113011625.878876-16-eblake@redhat.com>

Merge tag 'for-upstream' of https://repo.or.cz/qemu/kevin into staging

Block layer patches

- stream: Fix potential crash during job completion
- aio: add the aio_add_sqe() io_uring API
- qcow2: put discards in discard queue when discard-no-unref is enabled
- qcow2, vmdk: Restrict creation with secondary file using protocol
- qemu-img rebase: Fix assertion failure due to exceeding IO_BUF_SIZE
- iotests: Run iotests with sanitizers
- iotests: Add more image formats to the thorough testing
- iotests: Improve the dry run list to speed up thorough testing
- Code cleanup

# -----BEGIN PGP SIGNATURE-----
#
# iQJFBAABCgAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmkTqWcRHGt3b2xmQHJl
# ZGhhdC5jb20ACgkQfwmycsiPL9awPg//VqEgqYbEr3dVUvBFk8tlcewoo7KGICVk
# 4kddOwMJIdcsVpiLuNzqQARH2kHV93Hiv+mVt25o00PkJx565eCGTh/bBFas3UXL
# JMBjgHyJutGr4cijkNrnQgqWfeTgc32xdVEWh1nZM2K7LslzC9I1PfUzfxRMYqZA
# Em0KE3vwQDC7xtIyk4t451hkfcQY8fwN9bDMpD+zbzaLsYTEyOJ900En88iW7oHE
# TuJhrviin11jdQCA26QVNXRaw7iIVVo8vJP1VEgbn31iY+Qpcr/HcQRs0x2gex67
# OqIdh4onqkdGCFDxTGUoAH+jORXWUmk/JipIhl9pJP0ZDyAjsm97ThJ6SvctURsK
# UMU0dzXEc1C5spD2CWnN0PujqHYQqYaylx7MdiCJMjaCfDB3ZeIRsTGoiLMB24P+
# WBrcn2P+f03nC/sVvxRZWrpyI2kZwEh1RsO/mnLQ3apVBFeKqaFi8Ouo9oi1ZMd6
# ahUw7sZSoTxmGY1FhOSRCGEh2Wjy0ZIOx9tHT1U9vig5Kf9KeE81yO8yaq2T60mq
# 9eaUL8rcUrKRiJw9NUkcEYmIUJrh0nUe/kK2RWmbEGMYIH7ASrGqiyUP5FxpekD+
# i/uen4BeyRwe6rnPOzGolg+HMysMBr8VD/8PwJ8g88FLH1jIdTYvFUdRbrkciUlo
# okC+y4+kqiU=
# =SI8s
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 11 Nov 2025 10:23:51 PM CET
# gpg:                using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6
# gpg:                issuer "kwolf@redhat.com"
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [unknown]
# gpg: WARNING: The key's User ID is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* tag 'for-upstream' of https://repo.or.cz/qemu/kevin: (28 commits)
  qemu-img rebase: don't exceed IO_BUF_SIZE in one operation
  qcow2, vmdk: Restrict creation with secondary file using protocol
  block: Allow drivers to control protocol prefix at creation
  tests/qemu-iotest: Add more image formats to the thorough testing
  tests/qemu-iotests: Improve the dry run list to speed up thorough testing
  tests/qemu-iotests/184: Fix skip message for qemu-img without throttle
  qcow2: put discards in discard queue when discard-no-unref is enabled
  qcow2: rename update_refcount_discard to queue_discard
  iotests: Run iotests with sanitizers
  qemu-img: Fix amend option parse error handling
  iotests: Test resizing file node under raw with size/offset
  block: Drop detach_subchain for bdrv_replace_node
  block: replace TABs with space
  block/io_uring: use non-vectored read/write when possible
  block/io_uring: use aio_add_sqe()
  aio-posix: add aio_add_sqe() API for user-defined io_uring requests
  aio-posix: add fdmon_ops->dispatch()
  aio-posix: unindent fdmon_io_uring_destroy()
  aio-posix: gracefully handle io_uring_queue_init() failure
  aio: add errp argument to aio_context_setup()
  ...

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

qemu-img rebase: don't exceed IO_BUF_SIZE in one operation

During a rebase operation data is copied from the backing chain into
the target image using a loop, and each iteration looks for a
contiguous region of allocated data of at most IO_BUF_SIZE (2 MB).

Once that region is found, and in order to avoid partial writes, its
boundaries are extended so they are aligned to the (sub)clusters of
the target image (see commit 12df580b).

This operation can however result in a region that exceeds the maximum
allowed IO_BUF_SIZE, crashing qemu-img.

This can be easily reproduced when the source image has a smaller
cluster size than the target image:

base <- int <- active

$ qemu-img create -f qcow2 base.qcow2 4M
$ qemu-img create -f qcow2 -F qcow2 -b base.qcow2 -o cluster_size=1M int.qcow2
$ qemu-img create -f qcow2 -F qcow2 -b int.qcow2 -o cluster_size=2M active.qcow2
$ qemu-io -c "write -P 0xff 1M 2M" int.qcow2
$ qemu-img rebase -F qcow2 -b base.qcow2 active.qcow2
qemu-img: qemu-img.c:4102: img_rebase: Assertion `written + pnum <= IO_BUF_SIZE' failed.
Aborted

Cc: qemu-stable <qemu-stable@nongnu.org>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3174
Fixes: 12df580b3b7f ("qemu-img: rebase: avoid unnecessary COW operations")
Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-ID: <20251107091834.383781-1-berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qcow2, vmdk: Restrict creation with secondary file using protocol

Ever since CVE-2024-4467 (see commit 7ead9469 in qemu v9.1.0), we have
intentionally treated the opening of secondary files whose name is
specified in the contents of the primary file, such as a qcow2
data_file, as something that must be a local file and not a protocol
prefix (it is still possible to open a qcow2 file that wraps an NBD
data image by using QMP commands, but that is from the explicit action
of the QMP overriding any string encoded in the qcow2 file).  At the
time, we did not prevent the use of protocol prefixes on the secondary
image while creating a qcow2 file, but it results in a qcow2 file that
records an empty string for the data_file, rather than the protocol
passed in during creation:

$ qemu-img create -f raw datastore.raw 2G
$ qemu-nbd -e 0 -t -f raw datastore.raw &
$ qemu-img create -f qcow2 -o data_file=nbd://localhost:10809/ \
  datastore_nbd.qcow2 2G
Formatting 'datastore_nbd.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=2147483648 data_file=nbd://localhost:10809/ lazy_refcounts=off refcount_bits=16
$ qemu-img info datastore_nbd.qcow2 | grep data
$ qemu-img info datastore_nbd.qcow2 | grep data
image: datastore_nbd.qcow2
    data file:
    data file raw: false
    filename: datastore_nbd.qcow2

And since an empty string was recorded in the file, attempting to open
the image without using QMP to supply the NBD data store fails, with a
somewhat confusing error message:

$ qemu-io -f qcow2 datastore_nbd.qcow2
qemu-io: can't open device datastore_nbd.qcow2: The 'file' block driver requires a file name

Although the ability to create an image with a convenience reference
to a protocol data file is not a security hole (unlike the case with
open, the image is not untrusted if we are the ones creating it), the
above demo shows that it is still inconsistent.  Thus, it makes more
sense if we also insist that image creation rejects a protocol prefix
when using the same syntax.  Now, the above attempt produces:

$ qemu-img create -f qcow2 -o data_file=nbd://localhost:10809/ \
  datastore_nbd.qcow2 2G
Formatting 'datastore_nbd.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=2147483648 data_file=nbd://localhost:10809/ lazy_refcounts=off refcount_bits=16
qemu-img: datastore_nbd.qcow2: Could not create 'nbd://localhost:10809/': No such file or directory

with datastore_nbd.qcow2 no longer created.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20250915213919.3121401-6-eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Allow drivers to control protocol prefix at creation

This patch is pure refactoring: instead of hard-coding permission to
use a protocol prefix when creating an image, the drivers can now pass
in a parameter, comparable to what they could already do for opening a
pre-existing image. This patch is purely mechanical (all drivers pass
in true for now), but it will enable the next patch to cater to
drivers that want to differ in behavior for the primary image vs. any
secondary images that are opened at the same time as creating the
primary image.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20250915213919.3121401-5-eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

tests/qemu-iotest: Add more image formats to the thorough testing

Now that the "check" script is a little bit smarter with providing
a list of tests that are supported for an image format, we can also
add more image formats that can be used for generic block layer
testing. (Note: qcow1 and luks are not added because some tests
there currently fail, and other formats like bochs, cloop, dmg and
vvfat do not work with the generic tests and thus would only get
skipped if we'd tried to add them here)

Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20251014104142.1281028-4-thuth@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

tests/qemu-iotests: Improve the dry run list to speed up thorough testing

When running the tests in thorough mode, e.g. with:

make -j$(nproc) check SPEED=thorough

we currently always get a huge amount of total tests that the test
runner tries to execute (2457 in my case), but a big bunch of them are
only skipped (1099 in my case, meaning that only 1358 got executed).
This happens because we try to run the whole set of iotests for multiple
image formats while a lot of the tests can only run with one certain
format only and thus are marked as SKIP during execution. This is quite a
waste of time during each test run, and also unnecessarily blows up the
displayed list of executed tests in the console output.

Thus let's try to be a little bit smarter: If the "check" script is run
with "-n" and an image format switch (like "-qed") at the same time (which
is what we do for discovering the tests for the meson test runner already),
only report the tests that likely support the given format instead of
providing the whole list of all tests. We can determine whether a test
supports a format or not by looking at the lines in the file that contain
a "supported_fmt" or "unsupported_fmt" statement. This is only heuristics,
of course, but it is good enough for running the iotests via "make
check-block" - I double-checked that the list of executed tests does not
get changed by this patch, it's only the tests that are skipped anyway that
are now not run anymore.

This way the amount of total tests drops from 2457 to 1432 for me, and
the amount of skipped tests drops from 1099 to just 74 (meaning that we
still properly run 1432 - 74 = 1358 tests as we did before).

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20251014104142.1281028-3-thuth@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

tests/qemu-iotests/184: Fix skip message for qemu-img without throttle

If qemu-img does not support throttling, test 184 currently skips
with the message:

not suitable for this image format: raw

But that's wrong, it's not about the image format, it's about the
throttling not being available in qemu-img. Thus fix this by using
_notrun with a proper message instead.

Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20251014104142.1281028-2-thuth@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qcow2: put discards in discard queue when discard-no-unref is enabled

When discard-no-unref is enabled, discards are not queued like it
should.
This was broken since discard-no-unref was added.

Add a helper function qcow2_discard_cluster which handles some common
checks and calls the queue_discards function if needed to add the
discard request to the queue.

Signed-off-by: Jean-Louis Dupond <jean-louis@dupond.be>
Message-ID: <20250513132628.1055549-3-jean-louis@dupond.be>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qcow2: rename update_refcount_discard to queue_discard

The function just queues discards, and doesn't do any refcount change.
So let's change the function name to align with its function.

Signed-off-by: Jean-Louis Dupond <jean-louis@dupond.be>
Message-ID: <20250513132628.1055549-2-jean-louis@dupond.be>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

iotests: Run iotests with sanitizers

Commit 2cc4d1c5eab1 ("tests/check-block: Skip iotests when sanitizers
are enabled") changed iotests to skip when sanitizers are enabled.
The rationale is that AddressSanitizer emits warnings and reports leaks,
which results in test breakage. Later, sanitizers that are enabled for
production environments (safe-stack and cfi-icall) were exempted.

However, this approach has a few problems.

- It requires rebuild to disable sanitizers if the existing build has
  them enabled.
- It disables other useful non-production sanitizers.
- The exemption of safe-stack and cfi-icall is not correctly
  implemented, so qemu-iotests are incorrectly enabled whenever either
  safe-stack or cfi-icall is enabled *and*, even if there is another
  sanitizer like AddressSanitizer.

To solve these problems, direct AddressSanitizer warnings to separate
files to avoid changing the test results, and selectively disable
leak detection at runtime instead of requiring to disable all
sanitizers at buildtime.

Signed-off-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Message-ID: <20251023-iotests-v1-2-fab143ca4c2f@rsg.ci.i.u-tokyo.ac.jp>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qemu-img: Fix amend option parse error handling

qemu_opts_del(opts) dereferences opts->list, which is the old amend_opts
pointer that can be dangling after executing
qemu_opts_append(amend_opts, bs->drv->create_opts) and cause
use-after-free.

Fix the potential use-after-free by moving the qemu_opts_del() call
before the qemu_opts_append() call.

Signed-off-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Message-ID: <20251023-iotests-v1-1-fab143ca4c2f@rsg.ci.i.u-tokyo.ac.jp>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

iotests: Test resizing file node under raw with size/offset

This adds some more tests for using the 'size' and 'offset' options of
raw to the recently added resize-below-raw test.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20251028094328.17919-1-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Drop detach_subchain for bdrv_replace_node

Detaching filters using detach_subchain=true can cause segfaults as
described in #3149.

More specifically, this was observed when executing concurrent
block-stream and query-named-block-nodes. block-stream adds a
copy-on-read filter as the main BDS for the blockjob; that filter was
dropped with detach_subchain=true but not unref'd until the the blockjob
was free'd. Because query-named-block-nodes assumes that a filter will
always have exactly one child, it caused a segfault when it observed the
detached filter. Stacktrace:

0  bdrv_refresh_filename (bs=0x5efed72f8350)
    at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:8082
1  0x00005efea73cf9dc in bdrv_block_device_info
    (blk=0x0, bs=0x5efed72f8350, flat=true, errp=0x7ffeb829ebd8)
    at block/qapi.c:62
2  0x00005efea7391ed3 in bdrv_named_nodes_list
    (flat=<optimized out>, errp=0x7ffeb829ebd8)
    at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/block.c:6275
3  0x00005efea7471993 in qmp_query_named_block_nodes
    (has_flat=<optimized out>, flat=<optimized out>, errp=0x7ffeb829ebd8)
    at /usr/src/qemu-1:10.1.0+ds-5ubuntu2/b/qemu/blockdev.c:2834
4  qmp_marshal_query_named_block_nodes
    (args=<optimized out>, ret=0x7f2b753beec0, errp=0x7f2b753beec8)
    at qapi/qapi-commands-block-core.c:553
5  0x00005efea74f03a5 in do_qmp_dispatch_bh (opaque=0x7f2b753beed0)
    at qapi/qmp-dispatch.c:128
6  0x00005efea75108e6 in aio_bh_poll (ctx=0x5efed6f3f430)
    at util/async.c:219
7  0x00005efea74ffdb2 in aio_dispatch (ctx=0x5efed6f3f430)
    at util/aio-posix.c:436
8  0x00005efea7512846 in aio_ctx_dispatch (source=<optimized out>,
    callback=<optimized out>,user_data=<optimized out>)
    at util/async.c:361
9  0x00007f2b77809bfb in ?? ()
    from /lib/x86_64-linux-gnu/libglib-2.0.so.0
10 0x00007f2b77809e70 in g_main_context_dispatch ()
    from /lib/x86_64-linux-gnu/libglib-2.0.so.0
11 0x00005efea7517228 in glib_pollfds_poll () at util/main-loop.c:287
12 os_host_main_loop_wait (timeout=0) at util/main-loop.c:310
13 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:589
14 0x00005efea7140482 in qemu_main_loop () at system/runstate.c:905
15 0x00005efea744e4e8 in qemu_default_main (opaque=opaque@entry=0x0)
    at system/main.c:50
16 0x00005efea6e76319 in main
    (argc=<optimized out>, argv=<optimized out>)
    at system/main.c:93

As discussed in 20251024-second-fix-3149-v1-1-d997fa3d5ce2@canonical.com,
a filter should not exist without children in the first place; therefore,
drop the parameter entirely as it is only used for filters.

This is a partial revert of 3108a15cf09865456d499b08fe14e3dbec4ccbb3.

After this change, a blockdev-backup job's copy-before-write filter will
hold references to its children until the filter is unref'd. This causes
an additional flush during bdrv_close, so also update iotest 257.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3149
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Wesley Hershberger <wesley.hershberger@canonical.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-ID: <20251029-third-fix-3149-v2-1-94932bb404f4@canonical.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: replace TABs with space

Bring the block files in line with the QEMU coding style, with spaces
for indentation. This patch partially resolves the issue 371.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/371
Signed-off-by: Yeqi Fu <fufuyqqqqqq@gmail.com>
Message-ID: <20230325085224.23842-1-fufuyqqqqqq@gmail.com>
[thuth: Rebased the patch to the current master branch]
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20251007163511.334178-1-thuth@redhat.com>
[kwolf: Fixed up vertical alignemnt]
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block/io_uring: use non-vectored read/write when possible

The io_uring_prep_readv2/writev2() man pages recommend using the
non-vectored read/write operations when possible for performance
reasons.

I didn't measure a significant difference but it doesn't hurt to have
this optimization in place.

Suggested-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20251104022933.618123-16-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block/io_uring: use aio_add_sqe()

AioContext has its own io_uring instance for file descriptor monitoring.
The disk I/O io_uring code was developed separately. Originally I
thought the characteristics of file descriptor monitoring and disk I/O
were too different, requiring separate io_uring instances.

Now it has become clear to me that it's feasible to share a single
io_uring instance for file descriptor monitoring and disk I/O. We're not
using io_uring's IOPOLL feature or anything else that would require a
separate instance.

Unify block/io_uring.c and util/fdmon-io_uring.c using the new
aio_add_sqe() API that allows user-defined io_uring sqe submission. Now
block/io_uring.c just needs to submit readv/writev/fsync and most of the
io_uring-specific logic is handled by fdmon-io_uring.c.

There are two immediate advantages:
1. Fewer system calls. There is no need to monitor the disk I/O io_uring
   ring fd from the file descriptor monitoring io_uring instance. Disk
   I/O completions are now picked up directly. Also, sqes are
   accumulated in the sq ring until the end of the event loop iteration
   and there are fewer io_uring_enter(2) syscalls.
2. Less code duplication.

Note that error_setg() messages are not supposed to end with
punctuation, so I removed a '.' for the non-io_uring build error
message.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-ID: <20251104022933.618123-15-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

aio-posix: add aio_add_sqe() API for user-defined io_uring requests

Introduce the aio_add_sqe() API for submitting io_uring requests in the
current AioContext. This allows other components in QEMU, like the block
layer, to take advantage of io_uring features without creating their own
io_uring context.

This API supports nested event loops just like file descriptor
monitoring and BHs do. This comes at a complexity cost: CQE callbacks
must be placed on a list so that nested event loops can invoke pending
CQE callbacks from parent event loops. If you're wondering why
CqeHandler exists instead of just a callback function pointer, this is
why.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-ID: <20251104022933.618123-14-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

aio-posix: add fdmon_ops->dispatch()

The ppoll and epoll file descriptor monitoring implementations rely on
the event loop's generic file descriptor, timer, and BH dispatch code to
invoke user callbacks.

The io_uring file descriptor monitoring implementation will need
io_uring-specific dispatch logic for CQE handlers for custom SQEs.

Introduce a new FDMonOps ->dispatch() callback that allows file
descriptor monitoring implementations to invoke user callbacks. The next
patch will use this new callback.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20251104022933.618123-13-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

aio-posix: unindent fdmon_io_uring_destroy()

Reduce the level of indentation to make further code changes easier to
read.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20251104022933.618123-12-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

aio-posix: gracefully handle io_uring_queue_init() failure

io_uring may not be available at runtime due to system policies (e.g.
the io_uring_disabled sysctl) or creation could fail due to file
descriptor resource limits.

Handle failure scenarios as follows:

If another AioContext already has io_uring, then fail AioContext
creation so that the aio_add_sqe() API is available uniformly from all
QEMU threads. Otherwise fall back to epoll(7) if io_uring is
unavailable.

Notes:
- Update the comment about selecting the fastest fdmon implementation.
  At this point it's not about speed anymore, it's about aio_add_sqe()
  API availability.
- Uppercase the error message when converting from error_report() to
  error_setg_errno() for consistency (but there are instances of
  lowercase in the codebase).
- It's easier to move the #ifdefs from aio-posix.h to aio-posix.c.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20251104022933.618123-11-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

aio: add errp argument to aio_context_setup()

When aio_context_new() -> aio_context_setup() fails at startup it
doesn't really matter whether errors are returned to the caller or the
process terminates immediately.

However, it is not acceptable to terminate when hotplugging --object
iothread at runtime. Refactor aio_context_setup() so that errors can be
propagated. The next commit will set errp when fdmon_io_uring_setup()
fails.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20251104022933.618123-10-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

aio: free AioContext when aio_context_new() fails

g_source_destroy() only removes the GSource from the GMainContext it's
attached to, if any. It does not free it.

Use g_source_unref() instead so that the AioContext (which embeds a
GSource) is freed. There is no need to call g_source_destroy() in
aio_context_new() because the GSource isn't attached to a GMainContext
yet.

aio_ctx_finalize() expects everything to be set up already, so introduce
the new ctx->initialized boolean and do nothing when called with
!initialized. This also requires moving aio_context_setup() down after
event_notifier_init() since aio_ctx_finalize() won't release any
resources that aio_context_setup() acquired.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-ID: <20251104022933.618123-9-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

aio: remove aio_context_use_g_source()

There is no need for aio_context_use_g_source() now that epoll(7) and
io_uring(7) file descriptor monitoring works with the glib event loop.
AioContext doesn't need to be notified that GSource is being used.

On hosts with io_uring support this now enables fdmon-io_uring.c by
default, replacing fdmon-poll.c and fdmon-epoll.c. In other words, the
event loop will use io_uring!

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20251104022933.618123-8-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

aio-posix: integrate fdmon into glib event loop

AioContext's glib integration only supports ppoll(2) file descriptor
monitoring. epoll(7) and io_uring(7) disable themselves and switch back
to ppoll(2) when the glib event loop is used. The main loop thread
cannot use epoll(7) or io_uring(7) because it always uses the glib event
loop.

Future QEMU features may require io_uring(7). One example is uring_cmd
support in FUSE exports. Each feature could create its own io_uring(7)
context and integrate it into the event loop, but this is inefficient
due to extra syscalls. It would be more efficient to reuse the
AioContext's existing fdmon-io_uring.c io_uring(7) context because
fdmon-io_uring.c will already be active on systems where Linux io_uring
is available.

In order to keep fdmon-io_uring.c's AioContext operational even when the
glib event loop is used, extend FDMonOps with an API similar to
GSourceFuncs so that file descriptor monitoring can integrate into the
glib event loop.

A quick summary of the GSourceFuncs API:
- prepare() is called each event loop iteration before waiting for file
descriptors and timers.
- check() is called to determine whether events are ready to be
dispatched after waiting.
- dispatch() is called to process events.

More details here: https://docs.gtk.org/glib/struct.SourceFuncs.html

Move the ppoll(2)-specific code from aio-posix.c into fdmon-poll.c and
also implement epoll(7)- and io_uring(7)-specific file descriptor
monitoring code for glib event loops.

Note that it's still faster to use aio_poll() rather than the glib event
loop since glib waits for file descriptor activity with ppoll(2) and
does not support adaptive polling. But at least epoll(7) and io_uring(7)
now work in glib event loops.

Splitting this into multiple commits without temporarily breaking
AioContext proved difficult so this commit makes all the changes. The
next commit will remove the aio_context_use_g_source() API because it is
no longer needed.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-ID: <20251104022933.618123-7-stefanha@redhat.com>
[kwolf: Build fixes; fix AioContext.list_lock use after destroy]
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

tests/unit: skip test-nested-aio-poll with io_uring

test-nested-aio-poll relies on internal details of how fdmon-poll.c
handles AioContext polling. Skip it when other fdmon implementations are
in use.

The reason why fdmon-io_uring.c behaves differently from fdmon-poll.c is
that its fdmon_ops->need_wait() function returns true when
io_uring_enter(2) must be called (e.g. to submit pending SQEs).
AioContext polling is skipped when ->need_wait() returns true, so the
test case will never enter AioContext polling mode with
fdmon-io_uring.c.

Restrict this test to fdmon-poll.c and drop the
aio_context_use_g_source() call since it's no longer necessary.

Note that this test is only built on POSIX systems so it is safe to
include "util/aio-posix.h".

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-ID: <20251104022933.618123-6-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

aio-posix: keep polling enabled with fdmon-io_uring.c

Commit 816a430c517e ("util/aio: Defer disabling poll mode as long as
possible") kept polling enabled when the event loop timeout is 0. Since
there is no timeout the event loop will continue immediately and the
overhead of disabling and re-enabling polling can be avoided.

fdmon-io_uring.c is unable to take advantage of this optimization
because its ->need_wait() function returns true whenever there are new
io_uring SQEs to submit:

if (timeout || ctx->fdmon_ops->need_wait(ctx)) {
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Polling will be disabled even when timeout == 0.

Extend the optimization to handle the case when need_wait() returns true
and timeout == 0.

Cc: Chao Gao <chao.gao@intel.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20251104022933.618123-5-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

aio-posix: fix spurious return from ->wait() due to signals

io_uring_enter(2) only returns -EINTR in some cases when interrupted by
a signal. Therefore the while loop in fdmon_io_uring_wait() is
incomplete and can lead to a spurious early return.

Handle the case when a signal interrupts io_uring_enter(2) but the
syscall returns the number of SQEs submitted (that takes priority over
-EINTR).

This patch probably makes little difference for QEMU, but the test suite
relies on the exact pattern of aio_poll() return values, so it's best to
hide this io_uring syscall interface quirk.

Here is the strace of test-aio receiving 3 SIGCONT signals after this
fix has been applied. Notice how the io_uring_enter(2) return value is 1
the first time because an SQE was submitted, but -EINTR the other times:

  eventfd2(0, EFD_CLOEXEC|EFD_NONBLOCK) = 9
  io_uring_enter(7, 1, 0, 0, NULL, 8) = 1
  clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x7ffe38a46240) = 0
  io_uring_enter(7, 1, 1, IORING_ENTER_GETEVENTS, NULL, 8) = 1
  --- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=596096, si_uid=1000} ---
  io_uring_enter(7, 0, 1, IORING_ENTER_GETEVENTS, NULL, 8) = -1 EINTR (Interrupted system call)
  --- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=596096, si_uid=1000} ---
  io_uring_enter(7, 0, 1, IORING_ENTER_GETEVENTS, NULL, 8 <unfinished ...>
  <... io_uring_enter resumed>) = -1 EINTR (Interrupted system call)
  --- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=596096, si_uid=1000} ---
  io_uring_enter(7, 0, 1, IORING_ENTER_GETEVENTS, NULL, 8 <unfinished ...>
  <... io_uring_enter resumed>) = 0

Reported-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20251104022933.618123-4-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

aio-posix: fix fdmon-io_uring.c timeout stack variable lifetime

io_uring_prep_timeout() stashes a pointer to the timespec struct rather
than copying its fields. That means the struct must live until after the
SQE has been submitted by io_uring_enter(2). add_timeout_sqe() violates
this constraint because the SQE is not submitted within the function.

Inline add_timeout_sqe() into fdmon_io_uring_wait() so that the struct
lives at least as long as io_uring_enter(2).

This fixes random hangs (bogus timeout values) when the kernel loads
undefined timespec struct values from userspace after the original
struct on the stack has been destroyed.

Reported-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20251104022933.618123-3-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

aio-posix: fix race between io_uring CQE and AioHandler deletion

When an AioHandler is enqueued on ctx->submit_list for removal, the
fill_sq_ring() function will submit an io_uring POLL_REMOVE operation to
cancel the in-flight POLL_ADD operation.

There is a race when another thread enqueues an AioHandler for deletion
on ctx->submit_list when the POLL_ADD CQE has already appeared. In that
case POLL_REMOVE is unnecessary. The code already handled this, but
forgot that the AioHandler itself is still on ctx->submit_list when the
POLL_ADD CQE is being processed. It's unsafe to delete the AioHandler at
that point in time (use-after-free).

Solve this problem by keeping the AioHandler alive but setting a flag so
that it will be deleted by fill_sq_ring() when it runs.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20251104022933.618123-2-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Merge tag 'pull-request-2025-11-11' of https://gitlab.com/thuth/qemu into staging

* Fix some issues in the functional tests that pylint complains about

# -----BEGIN PGP SIGNATURE-----
#
# iQJFBAABCgAvFiEEJ7iIR+7gJQEY8+q5LtnXdP5wLbUFAmkTDfQRHHRodXRoQHJl
# ZGhhdC5jb20ACgkQLtnXdP5wLbVj8RAAhOSNyBa81eFJXydkqp0qrQYw6WGT/mAP
# Zn5oTm6NhsgLbUKgbqYQIAivE7VNVWfdhj7aOO9wYM1GfhCk/LOHZWBTNXxFF/uH
# m7ICV5dtSF2zE1AdsWn2rB6vPocc/VMDCHhIzfC7AYlEA7AGuu/O2QALE8H/qOS5
# mQ3+Fuq2EYkOKxKsSnUcj+ZPnUA3NlIF2CTeY0jTQFrwO5RKU3jsScm+uOZZJycn
# DTOzJTymIBGNSlFMNEoj4AhoY43SDdcQcZhwvAPzHZZTVhotJxHf5Fvr7XnDW5VA
# zTA7xZgnY0eAtvzZ4ihyT9BfAHdk62WgBrUeohQ1Ggf/Bo11DVCJtkQ4iY5bY4uI
# yalO7QSMi04PudeIRJmKTAhR6zhDZb/XijtrIcFn6ypTnOEMw8V7MJt9qXB76I/X
# HDZ9859a0//8F70I3mAxDKj8ve/Y6ACuY7pOwKR1Ea0iuM47Dgw9jsuUKRRPUZ+p
# rhJiQ10j8B6mxI0HCqEr8S47zMbW7uJViVYLT7yYKL7vokr96mm08/gEOI07cc88
# CKw3FocW2/suOdFCJVsIrjjq/ySVv0GTAkIeGUaefnY13dmq8ZILmT+GOOf695s9
# PDCoPWzdCY5n0OxToMUosJkQKbFp2F2ls5IGcEHUwxkqPT68/gsqb1VeC8W7x6Gs
# nJGM9ZR7XcM=
# =FhJ1
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 11 Nov 2025 11:20:36 AM CET
# gpg:                using RSA key 27B88847EEE0250118F3EAB92ED9D774FE702DB5
# gpg:                issuer "thuth@redhat.com"
# gpg: Good signature from "Thomas Huth <th.huth@gmx.de>" [unknown]
# gpg:                 aka "Thomas Huth <thuth@redhat.com>" [unknown]
# gpg:                 aka "Thomas Huth <th.huth@posteo.de>" [unknown]
# gpg:                 aka "Thomas Huth <huth@tuxfamily.org>" [unknown]
# gpg: WARNING: The key's User ID is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 27B8 8847 EEE0 2501 18F3  EAB9 2ED9 D774 FE70 2DB5

* tag 'pull-request-2025-11-11' of https://gitlab.com/thuth/qemu:
  tests/functional/m68k/test_nextcube: Fix issues reported by pylint
  tests/functional/mips64el: Silence issues reported by pylint
  tests/functional/aarch64/test_device_passthrough: Fix warnings from pylint
  tests/functional: Fix problems in testcase.py reported by pylint

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tests/functional/m68k/test_nextcube: Fix issues reported by pylint

Fix the indentation in one line, and while we're at it, use an f-string
instead of old-school formatting in another spot.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20251110104837.52077-1-thuth@redhat.com>

tests/functional/mips64el: Silence issues reported by pylint

Drop unused imports, annotate imports that are not at the top, but done
on purpose in other locations, use f-strings where it makes sense, etc.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20251103192430.63278-1-thuth@redhat.com>

tests/functional/aarch64/test_device_passthrough: Fix warnings from pylint

Remove unused imports, write constants with capital letters and make
sure that the code uses the right indentation / formatting.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20251030143203.297692-1-thuth@redhat.com>

tests/functional: Fix problems in testcase.py reported by pylint

- put 3rd party "import pycotap" after the standard imports
- "help" is a built-in function in Python, don't use it as a variable name
- put the doc strings in the right locations (after the "def" line)
- use isinstance() instead of checking via type()

Message-Id: <a3413bbd-e98c-4267-81c7-aa42aeda8a09@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>

Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging

virtio,pci,pc: fixes for 10.2

small fixes all over the place.
UDP tunnel and TSEG tweaks are kind of borderline,
but I feel not making the change now will just add
to compatibility headaches down the road.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# -----BEGIN PGP SIGNATURE-----
#
# iQFDBAABCgAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmkQplIPHG1zdEByZWRo
# YXQuY29tAAoJECgfDbjSjVRpFDsIAMlScYTW0fugUaP4B/a8xjgRFwBSk2CoU7aE
# l0k5ihyadecpnMLswkvoLfH9jl5Mu3MOZ6bpfcIHOWXMusGyiYcds6wupb8qcATP
# Ud4ZjybuNrpoGUul1ECkNTE3xvUtSBOVu8z9ac4ojP+w0LVDiuWyg1bl5QiRuzEg
# K87OjbdTIgCKKJi5QRw/dMJfoOofay98g0kbcuhkBiudvu3FtOpJW0g/aiY1m2sY
# MXYeBZjGbYGkAOXLKRcSr3nYtZbY4sg/onJ3Xb0HPbUZfRMTm7KKApwhH9jsHmlO
# VgaRGcF+dNDC7XIsaZt6k/YTsWCApYvuCcEQbjR1rW1d4ZmZU/Y=
# =ocWR
# -----END PGP SIGNATURE-----
# gpg: Signature made Sun 09 Nov 2025 03:33:54 PM CET
# gpg:                using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg:                issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [unknown]
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>" [unknown]
# gpg: WARNING: The key's User ID is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu:
  vhost-user.rst: clarify when FDs can be sent
  q35: increase default tseg size
  virtio-net: Advertise UDP tunnel GSO support by default
  tests/qtest/bios-tables-test: Update DSDT blobs after GPEX _DSM change
  hw/pci-host/gpex-acpi: Fix _DSM function 0 support return value
  tests/qtest/bios-tables-test: Prepare for _DSM change in the DSDT table
  vhost-user: make vhost_set_vring_file() synchronous
  intel_iommu: Fix DMA failure when guest switches IOMMU domain
  intel_iommu: Reset pasid cache when system level reset
  intel_iommu: Handle PASID cache invalidation
  vhost-user: fix shared object lookup handler logic
  amd_iommu: Support 64-bit address for IOTLB lookup
  amd_iommu: Fix handling of devices on buses != 0
  MAINTAINERS: Update entry for AMD-Vi Emulation

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Merge tag 'pull-ppc-for-10.2-d5-20251110' of https://gitlab.com/harshpb/qemu into staging

PPC Patches for 10.2 Hard Freeze

* Pegasos fixes for mem leak and dtb blob updates

# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEEa4EM1tK+EPOIPSFCRUTplPnWj7sFAmkRm/YACgkQRUTplPnW
# j7tTWA/+PTQfODH0dRpuApQys23okruXRJ0C26e+1Bb/H7IeSerfZ33GgpgW8ldi
# R6amhrJ4GYXFkjK34iFV+daXhtKEA/44fBykr1SCwDixiD7qGGq7a0yOEDERurEq
# eDn4of82O2C2l1jUY+hx0jXgWlEQLAeLH1bVwikJL75jbV7Ob7wt3W3bC7M6iup9
# jaZP6RwcXW9JqFeavS5r3DCbdPf+U/jafmxIP+qpZVS92jwxcOZbmsXgZVPW92xe
# Cwc8AY3FwUIdUfPGKj2uyuJNtLWuev0+o1roZ8mmuiSFoMGQuw+X5bmLt0qBvVyK
# EPc0dxsliyUhPso4vq9SCI9hBid0NQlsqpGpRWpEuP0z8vc4aF41P++VBC4DQ8ls
# Ffc2dz3ncUhII8V+N7jGykWG2ZKOqxgndlq7V/8k2f96kbDWEXNYJomnJd5NN6NK
# uKlKQN9pu2Btp2Lo9bLNVQT3jclByBmNtSyzqQhbLT/JbhTorhs6mYilTM8Wv7da
# 1Dn+PesmxTMtO7wgjy1qu6Ms55zTweKvpW0sNDMOMGOvQ1ssff/3WT8nrk1jXXHw
# UeEidzTZtr375LkCJ7DQnChztr9YjiQLPPAEkpUMz1sV32fGRrOr4kR3zGbjAiBY
# ARZLAErqHBMYO0NYi/+MR266cjZ841d+ImrP329BZqBvGfGBbpE=
# =iAZh
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 10 Nov 2025 09:01:58 AM CET
# gpg:                using RSA key 6B810CD6D2BE10F3883D21424544E994F9D68FBB
# gpg: Good signature from "Harsh Prateek Bora <harsh.prateek.bora@gmail.com>" [undefined]
# gpg:                 aka "Harsh Prateek Bora <harshpb@linux.ibm.com>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 6B81 0CD6 D2BE 10F3 883D  2142 4544 E994 F9D6 8FBB

* tag 'pull-ppc-for-10.2-d5-20251110' of https://gitlab.com/harshpb/qemu:
  pc-bios/dtb/pegasos*.dtb: Fix compiled dtb blobs
  hw/ppc/pegasos: Fix memory leak

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Merge tag 'lasi-fixes-pull-request' of https://github.com/hdeller/qemu-hppa into staging

hppa lasi bugfixes pull request

Please pull a bunch of fixes which repair issues introduced due to the previous
patch series which added LASI SCSI and LASI network card support as  well as
the new 715 machines.
This includes fixes for reported coverty issues, and repairs the B160L machine
emulation.

Thanks!
Helge

# -----BEGIN PGP SIGNATURE-----
#
# iHUEABYKAB0WIQS86RI+GtKfB8BJu973ErUQojoPXwUCaREQRQAKCRD3ErUQojoP
# Xy+DAQDJk9BbaZA4DOIMptbGewQMJLRYESa6XClF3s0IdbORQQD8DB49ipDtQkBz
# 50VfT6IusGBBKMaLr/9XgKqrk2bBqgc=
# =mgEV
# -----END PGP SIGNATURE-----
# gpg: Signature made Sun 09 Nov 2025 11:05:57 PM CET
# gpg:                using EDDSA key BCE9123E1AD29F07C049BBDEF712B510A23A0F5F
# gpg: Good signature from "Helge Deller <deller@gmx.de>" [unknown]
# gpg:                 aka "Helge Deller <deller@kernel.org>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 4544 8228 2CD9 10DB EF3D  25F8 3E5F 3D04 A7A2 4603
#      Subkey fingerprint: BCE9 123E 1AD2 9F07 C049  BBDE F712 B510 A23A 0F5F

* tag 'lasi-fixes-pull-request' of https://github.com/hdeller/qemu-hppa:
  target/hppa: Update SeaBIOS-hppa to version 20
  ncr710: Use address space of device instead of global address space
  ncr710: Add missing vmstate entries
  i82596: Adding proper break-statement functionality in RX functions
  i82596: Remove crc_valid variable
  ncr710: Drop leftover debug code
  ncr710: Fix potential null pointer dereference

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Merge tag 'pull-misc-20251110' of https://gitlab.com/rth7680/qemu into staging

accel/tcg: Trace tb_flush() calls
accel/tcg: Trace tb_gen_code() buffer overflow
qapi/parser: Mollify mypy
tests/functional: Mark another MIPS replay test as flaky
target/x86: Correctly handle invalid 0x0f 0xc7 0xxx insns

# -----BEGIN PGP SIGNATURE-----
#
# iQFRBAABCgA7FiEEekgeeIaLTbaoWgXAZN846K9+IV8FAmkRx8EdHHJpY2hhcmQu
# aGVuZGVyc29uQGxpbmFyby5vcmcACgkQZN846K9+IV9wywf/e1aFOMdj6SFHeum6
# vb7cmWZWDQr5KrV2lnHxkAhoGk4TL6StlWNgSJfUVAzeElbNTqM+W/w0yJrM7W6K
# LEsYCVsvA1juIrfD8aPkzO5+hS0bv+nCS74k7OsYlS4u20A7FBRrR98UI4icgYO0
# ND4hEdGMP+1+Rc+U8+qhP4KiXMW2c3MC7SXwsb8fvdBvbe9Oh7ExpeOJao8mlasg
# hmu4WrjGQwkxLLLkAK7F55IgJx6x8QIWxtjg+q1AxA7AhgnG/kQ8e4RDF8cZyORF
# fsVRgST4o7kCdM9n2eicVLf2P0BLbZgM1bpsoXPadjTUMpioXLujGCIzl5Cnto4k
# AjpTJQ==
# =Tirj
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 10 Nov 2025 12:08:49 PM CET
# gpg:                using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F
# gpg:                issuer "richard.henderson@linaro.org"
# gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [ultimate]

* tag 'pull-misc-20251110' of https://gitlab.com/rth7680/qemu:
  target/x86: Correctly handle invalid 0x0f 0xc7 0xxx insns
  tests/functional: Mark another MIPS replay test as flaky
  qapi/parser: Mollify mypy
  accel/tcg: Trace tb_gen_code() buffer overflow
  accel/tcg: Trace tb_flush() calls

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

target/x86: Correctly handle invalid 0x0f 0xc7 0xxx insns

In the decode_group9() function, if we don't recognise the insn as
one that we should handle, we leave the 'entry' pointer unaltered.
Because the X86OpEntry struct has a union for the gen and decode
pointers, this means that the top level code will call decode.e.gen()
which tries to use the decode function pointer (still set to
decode_group9) as a gen function pointer.

This is undefined behaviour, but seems to be mostly harmless in
practice (we call decode_group9() again with bogus arguments and it
does nothing). If you have CFI enabled then it will trip the CFI
check:

../target/i386/tcg/decode-new.c.inc:2862:9: runtime error: control flow integrity check for type 'void (struct DisasContext *, struct X86DecodedInsn *)' failed during indirect function call

Set *entry to UNKNOWN_OPCODE to provoke the #UD exception, as we do
in decode_group1A() and decode_group11() for similar situations.

Thanks to the bug reporter for the clear description and analysis of
the bug and the simple reproducer.

Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3172
Fixes: fcd16539ebfe2 ("target/i386: convert CMPXCHG8B/CMPXCHG16B to new decoder")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20251021173152.1695997-1-peter.maydell@linaro.org>

tests/functional: Mark another MIPS replay test as flaky

When disabling MIPS tests on commit 1c11aa18071
("tests/functional: Mark the MIPS replay tests as flaky")
we missed the 5KEc test.

Reported-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20251104145955.84091-1-philmd@linaro.org>

qapi/parser: Mollify mypy

re.match(r'^ *', ...) can't fail, but mypy doesn't know that and
complains:

scripts/qapi/parser.py:444: error: Item "None" of "Match[str] | None" has no attribute "end" [union-attr]

Work around by using must_match() instead.

Fixes: 8107ba47fd78 (qapi: Add documentation format validation)
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20251105152219.311154-1-armbru@redhat.com>

accel/tcg: Trace tb_gen_code() buffer overflow

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20250925035610.80605-3-philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

accel/tcg: Trace tb_flush() calls

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20250925035610.80605-2-philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

target/hppa: Update SeaBIOS-hppa to version 20

This is SeaBIOS for the hppa architecture v20
and it contains mostly bugfixes for issues which
were introduced by adding the 715/64 machine.

Fixes include:
- Fix inventory for 715 Snake machine
- Detect if LASI LAN and SCSI exists at startup
- Allow LASI LAN on B160L if created by qemu
- Enhance error messages

Signed-off-by: Helge Deller <deller@gmx.de>

ncr710: Use address space of device instead of global address space

Signed-off-by: Soumyajyotii Ssarkar <soumyajyotisarkar23@gmail.com>
Reviewed-by: Helge Deller <deller@gmx.de>
Signed-off-by: Helge Deller <deller@gmx.de>

ncr710: Add missing vmstate entries

Signed-off-by: Soumyajyotii Ssarkar <soumyajyotisarkar23@gmail.com>
Reviewed-by: Helge Deller <deller@gmx.de>
Signed-off-by: Helge Deller <deller@gmx.de>

vhost-user.rst: clarify when FDs can be sent

Previously the spec did not say where in a message the FDs should be
sent.  As I understand it, FDs transferred in ancillary data will
always be received along with the first byte of the data they were
sent with, so we should define which byte that is.  Going by both
libvhost-user in QEMU and the rust-vmm crate, that byte is the first
byte of the message header.  This is important to specify because it
would make back-end implementation significantly more complicated if
receiving file descriptors in the middle of a message had to be
handled.

Signed-off-by: Alyssa Ross <hi@alyssa.is>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251106192105.3456755-1-hi@alyssa.is>

q35: increase default tseg size

With virtual machines becoming larger (more CPUs, more memory) the
memory needed by the SMM code in OVMF to manage page tables and vcpu
state grows too.

Default SMM memory (aka TSEG) size is 16 MB, and this often is not
enough. Bump it to 64 MB for new machine types.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251106105640.1642109-1-kraxel@redhat.com>

virtio-net: Advertise UDP tunnel GSO support by default

Allow bidirectional aggregated traffic for UDP encapsulated flows.

Add the needed compatibility entries to avoid migration issues
vs older QEMU instances.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <9c500fbcd2cf29afd1826b1ac906f9d5beac3601.1760104079.git.pabeni@redhat.com>

tests/qtest/bios-tables-test: Update DSDT blobs after GPEX _DSM change

Update the reference DSDT blobs after GPEX _DSM change. This affects the
aarch64 'virt', riscv64 "virt", loongarch64 "virt" and the x86 'microvm'
machines.

DSDT diff is the same for all the machines/tests:

/*
  * Intel ACPI Component Architecture
  * AML/ASL+ Disassembler version 20230628 (64-bit version)
  * Copyright (c) 2000 - 2023 Intel Corporation
  *
  * Disassembling to symbolic ASL+ operators
  *
- * Disassembly of tests/data/acpi/aarch64/virt/DSDT, Fri Oct 10 11:18:21 2025
+ * Disassembly of /tmp/aml-E6V9D3, Fri Oct 10 11:18:21 2025
  *
  * Original Table Header:
  *     Signature        "DSDT"
  *     Length           0x000014D9 (5337)
  *     Revision         0x02
- *     Checksum         0xA4
+ *     Checksum         0xA5
  *     OEM ID           "BOCHS "
  *     OEM Table ID     "BXPC    "
  *     OEM Revision     0x00000001 (1)
  *     Compiler ID      "BXPC"
  *     Compiler Version 0x00000001 (1)
  */
DefinitionBlock ("", "DSDT", 2, "BOCHS ", "BXPC    ", 0x00000001)
{
     Scope (\_SB)
     {
         Device (C000)
         {
             Name (_HID, "ACPI0007" /* Processor Device */)  // _HID: Hardware ID
             Name (_UID, Zero)  // _UID: Unique ID
         }

@@ -1822,33 +1822,33 @@
                 Else
                 {
                     CDW1 |= 0x04
                 }

                 Return (Arg3)
             }

             Method (_DSM, 4, NotSerialized)  // _DSM: Device-Specific Method
             {
                 If ((Arg0 == ToUUID ("e5c937d0-3553-4d7a-9117-ea4d19c3434d") /* Device Labeling Interface */))
                 {
                     If ((Arg2 == Zero))
                     {
                         Return (Buffer (One)
                         {
-                             0x01                                             // .
+                             0x00                                             // .
                         })
                     }
                 }

                 Return (Buffer (One)
                 {
                      0x00                                             // .
                 })
             }

             Device (RES0)
             {
                 Name (_HID, "PNP0C02" /* PNP Motherboard Resources */)  // _HID: Hardware ID
                 Name (_CRS, ResourceTemplate ()  // _CRS: Current Resource Settings
                 {
                     QWordMemory (ResourceProducer, PosDecode, MinFixed, MaxFixed, NonCacheable, ReadWrite,

Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251022080639.243965-4-skolothumtho@nvidia.com>

hw/pci-host/gpex-acpi: Fix _DSM function 0 support return value

Currently, only function 0 is supported. According to the ACPI
Specification, Revision 6.6, Section 9.1.1 “_DSM (Device Specific
Method)”, bit 0 should be 0 to indicate that no other functions
are supported beyond function 0.

The resulting AML change looks like this:

Method (_DSM, 4, NotSerialized)  // _DSM: Device-Specific Method
{
    If ((Arg0 == ToUUID ("e5c937d0-3553-4d7a-9117-ea4d19c3434d")
    {
        If ((Arg2 == Zero))
        {
            Return (Buffer (One)
            {
-               0x01                                             // .
+               0x00                                             // .
            })
        }
    }
}

Fixes: 5b85eabe68f9 ("acpi: add acpi_dsdt_add_gpex")
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251022080639.243965-3-skolothumtho@nvidia.com>

tests/qtest/bios-tables-test: Prepare for _DSM change in the DSDT table

Subsequent patch will fix the GPEX _DSM method. Add the affected DSDT blobs
to allowed-diff list for bios-table tests.

Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251022080639.243965-2-skolothumtho@nvidia.com>

vhost-user: make vhost_set_vring_file() synchronous

QEMU sends all of VHOST_USER_SET_VRING_KICK, _CALL, and _ERR without
setting the NEED_REPLY flag, i.e. by the time the respective
vhost_user_set_vring_*() function returns, it is completely up to chance
whether the back-end has already processed the request and switched over
to the new FD for interrupts.

At least for vhost_user_set_vring_call(), that is a problem: It is
called through vhost_virtqueue_mask(), which is generally used in the
VirtioDeviceClass.guest_notifier_mask() implementation, which is in turn
called by virtio_pci_one_vector_unmask(). The fact that we do not wait
for the back-end to install the FD leads to a race there:

Masking interrupts is implemented by redirecting interrupts to an
internal event FD that is not connected to the guest. Unmasking then
re-installs the guest-connected IRQ FD, then checks if there are pending
interrupts left on the masked event FD, and if so, issues an interrupt
to the guest.

Because guest_notifier_mask() (through vhost_user_set_vring_call())
doesn't wait for the back-end to switch over to the actual IRQ FD, it's
possible we check for pending interrupts while the back-end is still
using the masked event FD, and then we will lose interrupts that occur
before the back-end finally does switch over.

Fix this by setting NEED_REPLY on those VHOST_USER_SET_VRING_* messages,
so when we get that reply, we know that the back-end is now using the
new FD.

We have a few reports of a virtiofs mount hanging:
- https://gitlab.com/virtio-fs/virtiofsd/-/issues/101
- https://gitlab.com/virtio-fs/virtiofsd/-/issues/133
- https://gitlab.com/virtio-fs/virtiofsd/-/issues/213

This is quite difficult bug to reproduce, even for the reporters.
It only happens on production, every few weeks, and/or on 1 in 300 VMs.
So, we are not 100% sure this fixes that issue. However, we think this
is still a bug, and at least we have one report that claims this fixed
the issue:

https://gitlab.com/virtio-fs/virtiofsd/-/issues/133#note_2743209419

Fixes: 5f6f6664bf24 ("Add vhost-user as a vhost backend.")
Signed-off-by: German Maglione <gmaglione@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251022162405.318672-1-gmaglione@redhat.com>

intel_iommu: Fix DMA failure when guest switches IOMMU domain

Kernel allows user to switch IOMMU domain, e.g., switch between DMA
and identity domain. When this happen in IOMMU scalable mode, a pasid
cache invalidation request is sent, this request is ignored by vIOMMU
which leads to device binding to wrong address space, then DMA fails.

This issue exists in scalable mode with both first stage and second
stage translations, both emulated and passthrough devices.

Take network device for example, below sequence trigger issue:

1. start a guest with iommu=pt
2. echo 0000:01:00.0 > /sys/bus/pci/drivers/virtio-pci/unbind
3. echo DMA > /sys/kernel/iommu_groups/6/type
4. echo 0000:01:00.0 > /sys/bus/pci/drivers/virtio-pci/bind
5. Ping test

Fix it by switching address space in invalidation handler.

Fixes: 4a4f219e8a10 ("intel_iommu: add scalable-mode option to make scalable mode work")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251017093602.525338-4-zhenzhong.duan@intel.com>

intel_iommu: Reset pasid cache when system level reset

Reset pasid cache when system level reset. Currently we don't have any
device supporting PASID yet. So all are PASID_0, its vtd_as is allocated
by PCI system and never removed, just mark pasid cache invalid.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251017093602.525338-3-zhenzhong.duan@intel.com>

intel_iommu: Handle PASID cache invalidation

Adds a new entry VTDPASIDCacheEntry in VTDAddressSpace to cache the pasid
entry and track PASID usage and future PASID tagged DMA address translation
support in vIOMMU.

When guest triggers pasid cache invalidation, QEMU will capture it and
update or invalidate pasid cache.

vIOMMU emulator could figure out the reason by fetching latest guest pasid
entry in memory and compare it with cached PASID entry if it's valid.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251017093602.525338-2-zhenzhong.duan@intel.com>

pc-bios/dtb/pegasos*.dtb: Fix compiled dtb blobs

When adding these files somehow an incomplete version was committed.
Regenerate and update these dtb files to match the dts which fixes
problems caused by missing nodes in the dtb.

Fixes: 9099b430a4 (hw/ppc/pegasos2: Change device tree generation)
Fixes: 3c21f9dfcf (hw/ppc/pegasos2: Add VOF support for pegasos1)
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reported-by: Yogesh Vyas <yvyas1991@gmail.com>
Tested-by: Yogesh Vyas <yvyas1991@gmail.com>
Message-Id: <20251108193717.DADA9597302@zero.eik.bme.hu>
Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>

hw/ppc/pegasos: Fix memory leak

Commit 9099b430a4 introduced an early return that caused a leak of a
GString. Allocate it later to avoid the leak.

Fixes: 9099b430a4 (hw/ppc/pegasos2: Change device tree generation)
Resolves: Coverity CID 1642027
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Link: https://lore.kernel.org/r/20251101165236.76E8B5972E3@zero.eik.bme.hu
Message-ID: <20251101165236.76E8B5972E3@zero.eik.bme.hu>