]> git.ipfire.org Git - thirdparty/qemu.git/log
thirdparty/qemu.git
2 months agoqga: Fix truncated output handling in guest-exec status reporting
minglei.liu [Fri, 11 Jul 2025 02:17:14 +0000 (10:17 +0800)] 
qga: Fix truncated output handling in guest-exec status reporting

Signed-off-by: minglei.liu <minglei.liu@smartx.com>
Fixes: a1853dca743
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Kostiantyn Kostiuk <kkostiuk@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250711021714.91258-1-minglei.liu@smartx.com
Signed-off-by: Kostiantyn Kostiuk <kkostiuk@redhat.com>
(cherry picked from commit 28c5d27dd4dc4100a96ff4c9e5871dd23c6b02ec)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2 months agoqga-vss: Write hex value of error in log
Kostiantyn Kostiuk [Mon, 25 Aug 2025 13:53:11 +0000 (16:53 +0300)] 
qga-vss: Write hex value of error in log

QGA-VSS writes error using error_setg_win32_internal,
which call g_win32_error_message.

g_win32_error_message - translate a Win32 error code
(as returned by GetLastError()) into the corresponding message.

In the same time, we call error_setg_win32_internal with
error codes from different Windows componets like VSS or
Performance monitor that provides different codes and
can't be converted with g_win32_error_message. In this
case, the empty suffix will be returned so error will be
masked.

This commit directly add hex value of error code.

Reproduce:
 - Run QGA command: {"execute": "guest-fsfreeze-freeze-list", "arguments": {"mountpoints": ["D:"]}}

QGA error example:
 - before changes:
  {"error": {"class": "GenericError", "desc": "failed to add D: to snapshot set: "}}
 - after changes:
  {"error": {"class": "GenericError", "desc": "failed to add D: to snapshot set: Windows error 0x8004230e: "}}

Reviewed-by: Yan Vugenfirer <yvugenfi@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250825135311.138330-1-kkostiuk@redhat.com
Signed-off-by: Kostiantyn Kostiuk <kkostiuk@redhat.com>
(cherry picked from commit edf3780a7dad4658ab7b72ea37e310a2be9b16d3)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2 months agoqga/installer: Remove QGA VSS if QGA installation failed
Kostiantyn Kostiuk [Mon, 25 Aug 2025 14:31:55 +0000 (17:31 +0300)] 
qga/installer: Remove QGA VSS if QGA installation failed

When QGA Installer failed to install QGA service but install
QGA VSS provider, provider should be removed before installer
exits. Otherwise QGA VSS will has broken infomation and
prevent QGA installation in next run.

Reviewed-by: Yan Vugenfirer <yvugenfi@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250825143155.160913-1-kkostiuk@redhat.com
Signed-off-by: Kostiantyn Kostiuk <kkostiuk@redhat.com>
(cherry picked from commit 85ff0e956bf26a93c92e4dca8f6257613269a0cf)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2 months agohw/arm/stm32f205_soc: Don't leak TYPE_OR_IRQ objects
Peter Maydell [Thu, 21 Aug 2025 15:42:29 +0000 (16:42 +0100)] 
hw/arm/stm32f205_soc: Don't leak TYPE_OR_IRQ objects

In stm32f250_soc_initfn() we mostly use the standard pattern
for child objects of calling object_initialize_child(). However
for s->adc_irqs we call object_new() and then later qdev_realize(),
and we never unref the object on deinit. This causes a leak,
detected by ASAN on the device-introspect-test:

Indirect leak of 10 byte(s) in 1 object(s) allocated from:
    #0 0x5b9fc4789de3 in malloc (/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-asan/qemu-system-arm+0x21f1de3) (BuildId: 267a2619a026ed91c78a07b1eb2ef15381538efe)
    #1 0x740de3f28b09 in g_malloc (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x62b09) (BuildId: 1eb6131419edb83b2178b682829a6913cf682d75)
    #2 0x740de3f3e4d8 in g_strdup (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x784d8) (BuildId: 1eb6131419edb83b2178b682829a6913cf682d75)
    #3 0x5b9fc70159e1 in g_strdup_inline /usr/include/glib-2.0/glib/gstrfuncs.h:321:10
    #4 0x5b9fc70159e1 in object_property_try_add /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-asan/../../qom/object.c:1276:18
    #5 0x5b9fc7015f94 in object_property_add /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-asan/../../qom/object.c:1294:12
    #6 0x5b9fc701b900 in object_add_link_prop /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-asan/../../qom/object.c:2021:10
    #7 0x5b9fc701b3fc in object_property_add_link /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-asan/../../qom/object.c:2037:12
    #8 0x5b9fc4c299fb in qdev_init_gpio_out_named /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-asan/../../hw/core/gpio.c:90:9
    #9 0x5b9fc4c29b26 in qdev_init_gpio_out /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-asan/../../hw/core/gpio.c:101:5
    #10 0x5b9fc4c0f77a in or_irq_init /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-asan/../../hw/core/or-irq.c:70:5
    #11 0x5b9fc70257e1 in object_init_with_type /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-asan/../../qom/object.c:428:9
    #12 0x5b9fc700cd4b in object_initialize_with_type /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-asan/../../qom/object.c:570:5
    #13 0x5b9fc700e66d in object_new_with_type /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-asan/../../qom/object.c:774:5
    #14 0x5b9fc700e750 in object_new /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-asan/../../qom/object.c:789:12
    #15 0x5b9fc68b2162 in stm32f205_soc_initfn /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-asan/../../hw/arm/stm32f205_soc.c:69:26

Switch to using object_initialize_child() like all our
other child objects for this SoC object.

Cc: qemu-stable@nongnu.org
Fixes: b63041c8f6b ("STM32F205: Connect the ADC devices")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250821154229.2417453-1-peter.maydell@linaro.org
(cherry picked from commit 2e27650bddd35477d994a795a3b1cb57c8ed5c76)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2 months agoqemu/atomic: Finish renaming atomic128-cas.h headers
Richard Henderson [Fri, 15 Aug 2025 12:26:47 +0000 (22:26 +1000)] 
qemu/atomic: Finish renaming atomic128-cas.h headers

The aarch64 header was not renamed with the others, meaning it
was skipped in favor of the generic version.

Cc: qemu-stable@nongnu.org
Fixes: 15606965400b ("qemu/atomic: Rename atomic128-cas.h headers using .h.inc suffix")
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20250815122653.701782-2-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit 1748c0d59228c7790940d8be381df1c3108022b1)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2 months agoscripts/kernel-doc: Avoid new Perl precedence warning
Peter Maydell [Tue, 19 Aug 2025 11:56:48 +0000 (12:56 +0100)] 
scripts/kernel-doc: Avoid new Perl precedence warning

Newer versions of Perl (5.41.x and up) emit a warning for code in
kernel-doc:
 Possible precedence problem between ! and pattern match (m//) at /scripts/kernel-doc line 1597.

This is because the code does:
            if (!$param =~ /\w\.\.\.$/) {

In Perl, the !  operator has higher precedence than the =~
pattern-match binding, so the effect of this condition is to first
logically-negate the string $param into a true-or-false value and
then try to pattern match it against the regex, which in this case
will always fail.  This is almost certainly not what the author
intended.

In the new Python version of kernel-doc in the Linux kernel,
the equivalent code is written:

            if KernRe(r'\w\.\.\.$').search(param):
                # For named variable parameters of the form `x...`,
                # remove the dots
                param = param[:-3]
            else:
                # Handles unnamed variable parameters
                param = "..."

which is a more sensible way of writing the behaviour you would
get if you put in brackets to make the regex match first and
then negate the result.

Take this as the intended behaviour, and update the Perl to match.

For QEMU, this produces no change in output, presumably because we
never used the "unnamed variable parameters" syntax.

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Message-id: 20250819115648.2125709-1-peter.maydell@linaro.org
(cherry picked from commit 5ffd387e9e0f787744fadaad35e1bf92224b0642)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2 months agotarget/arm: Trap PMCR when MDCR_EL2.TPMCR is set
Smail AIDER [Tue, 26 Aug 2025 10:21:28 +0000 (11:21 +0100)] 
target/arm: Trap PMCR when MDCR_EL2.TPMCR is set

Trap PMCR_EL0 or PMCR accesses to EL2 when MDCR_EL2.TPMCR is set.
Similar to MDCR_EL2.TPM, MDCR_EL2.TPMCR allows trapping EL0 and EL1
accesses to the PMCR register to EL2.

Cc: qemu-stable@nongnu.org
Signed-off-by: Smail AIDER <smail.aider@huawei.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250811112143.1577055-2-smail.aider@huawei.com
Message-Id: <20250722131925.2119169-1-smail.aider@huawei.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit 186db6a73bc5c01026bb9f4f4a59e442c0156841)
(Mjt: adjust for 10.0, before target/arm/helper.c split)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2 months agohw/intc/arm_gicv3_kvm: preserve pending interrupts during cpr
Steve Sistare [Tue, 26 Aug 2025 10:21:28 +0000 (11:21 +0100)] 
hw/intc/arm_gicv3_kvm: preserve pending interrupts during cpr

Close a race condition that causes cpr-transfer to lose VFIO
interrupts on ARM.

CPR stops VCPUs but does not disable VFIO interrupts, which may continue
to arrive throughout the transition to new QEMU.

CPR calls kvm_irqchip_remove_irqfd_notifier_gsi in old QEMU to force
future interrupts to the producer eventfd, where they are preserved.
Old QEMU then destroys the old KVM instance.  However, interrupts may
already be pending in KVM state.  To preserve them, call ioctl
KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES to flush them to guest RAM, where
they will be picked up when the new KVM+VCPU instance is created.

Cc: qemu-stable@nongnu.org
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Message-id: 1754936384-278328-1-git-send-email-steven.sistare@oracle.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit 376cdd7e9c94f1e03b2c58e068e8ebfe78b49514)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2 months agolinux-user: Add strace for rseq
Joel Stanley [Tue, 26 Aug 2025 06:03:40 +0000 (15:33 +0930)] 
linux-user: Add strace for rseq

 build/qemu-riscv64 -cpu rv64,v=on -d strace  build/tests/tcg/riscv64-linux-user/test-vstart-overflow
 1118081 riscv_hwprobe(0xffffbc038200,1,0,0,0,0) = 0
 1118081 brk(NULL) = 0x0000000000085000
 1118081 brk(0x0000000000085b00) = 0x0000000000085b00
 1118081 set_tid_address(0x850f0) = 1118081
 1118081 set_robust_list(0x85100,24) = -1 errno=38 (Function not implemented)
 1118081 rseq(0x857c0,32,0,0xf1401073) = -1 errno=38 (Function not implemented)

Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20250826060341.1118670-1-joel@jms.id.au>
(cherry picked from commit f91563d011a0439cd6709e169cdfac268779d562)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2 months agoi386/tcg/svm: fix incorrect canonicalization
Zero Tang [Mon, 18 Aug 2025 10:16:47 +0000 (12:16 +0200)] 
i386/tcg/svm: fix incorrect canonicalization

For all 32-bit systems and 64-bit Windows systems, "long" is 4 bytes long.
Due to using "long" for a linear address, svm_canonicalization would
set all high bits to 1 when (assuming 48-bit linear address) the segment
base is bigger than 0x7FFF.

This fixes booting guests under TCG when the guest IDT and GDT bases are
above 0x7FFF, thereby resulting in incorrect bases. When an interrupt
arrives, it would trigger a #PF exception; the #PF would trigger again,
resulting in a #DF exception; the #PF would trigger for the third time,
resulting in triple-fault, and eventually causes a shutdown VM-Exit to
the hypervisor right after guest boot.

Cc: qemu-stable@nongnu.org
Signed-off-by: Zero Tang <zero.tangptr@gmail.com>
(cherry picked from commit c12cbaa007c9da97a11e74119ea3aed9fcc3ac4c)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2 months agopython: mkvenv: fix messages printed by mkvenv
Paolo Bonzini [Fri, 22 Aug 2025 08:46:05 +0000 (10:46 +0200)] 
python: mkvenv: fix messages printed by mkvenv

The new Matcher class does not have a __str__ implementation, and therefore
it prints the debugging representation of the internal object:

  $ ../configure --enable-rust && make qemu-system-arm --enable-download
  python determined to be '/usr/bin/python3'
  python version: Python 3.13.6
  mkvenv: Creating non-isolated virtual environment at 'pyvenv'
  mkvenv: checking for LegacyMatcher('meson>=1.5.0')
  mkvenv: checking for LegacyMatcher('pycotap>=1.1.0')

Add the method to print the nicer

  mkvenv: checking for meson>=1.5.0
  mkvenv: checking for pycotap>=1.1.0

Cc: qemu-stable@nongnu.org
Cc: John Snow <jsnow@redhat.com>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit ab85146ac4c6527d6d975afbd3157488cb42147f)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2 months agohw/uefi: open json file in binary mode
Gerd Hoffmann [Mon, 11 Aug 2025 13:01:10 +0000 (15:01 +0200)] 
hw/uefi: open json file in binary mode

Fixes file length discrepancies due to line ending conversions
on windows hosts.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3058
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-ID: <20250811130110.820958-4-kraxel@redhat.com>
(cherry picked from commit 040237436f423253f3397547aa78d449394dfbca)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2 months agohw/uefi: check access for first variable
Gerd Hoffmann [Mon, 11 Aug 2025 13:01:09 +0000 (15:01 +0200)] 
hw/uefi: check access for first variable

When listing variables (via get-next-variable-name) only the names of
variables which can be accessed will be returned.  That check was
missing for the first variable though.  Add it.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-ID: <20250811130110.820958-3-kraxel@redhat.com>
(cherry picked from commit fc8ee8fe58ad410f27fca64e4ad212c5a3eabe00)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2 months agohw/uefi: return success for notifications
Gerd Hoffmann [Mon, 11 Aug 2025 13:01:08 +0000 (15:01 +0200)] 
hw/uefi: return success for notifications

Set status to SUCCESS for ready-to-boot and exit-boot-services
notification calls.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-ID: <20250811130110.820958-2-kraxel@redhat.com>
(cherry picked from commit 88e5a28d5aabb57f44c1805fbba0a458023f5106)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2 months agohw/uefi: clear uefi-vars buffer in uefi_vars_write callback
Mauro Matteo Cascella [Mon, 11 Aug 2025 10:11:24 +0000 (12:11 +0200)] 
hw/uefi: clear uefi-vars buffer in uefi_vars_write callback

When the guest writes to register UEFI_VARS_REG_BUFFER_SIZE, the .write
callback `uefi_vars_write` is invoked. The function allocates a
heap buffer without zeroing the memory, leaving the buffer filled with
residual data from prior allocations. When the guest later reads from
register UEFI_VARS_REG_PIO_BUFFER_TRANSFER, the .read callback
`uefi_vars_read` returns leftover metadata or other sensitive process
memory from the previously allocated buffer, leading to an information
disclosure vulnerability.

Fixes: CVE-2025-8860
Fixes: 90ca4e03c27d ("hw/uefi: add var-service-core.c")
Reported-by: ZDI <zdi-disclosures@trendmicro.com>
Suggested-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Mauro Matteo Cascella <mcascell@redhat.com>
Message-ID: <20250811101128.17661-1-mcascell@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit f757d9d90d19b914d4023663bfc4da73bbbf007e)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2 months agomkvenv: Support pip 25.2
Sv. Lockal [Mon, 11 Aug 2025 19:01:59 +0000 (15:01 -0400)] 
mkvenv: Support pip 25.2

Fix compilation with pip-25.2 due to missing distlib.version

Bug: https://gitlab.com/qemu-project/qemu/-/issues/3062

Signed-off-by: Sv. Lockal <lockalsash@gmail.com>
[Edits: Type "safety" whackamole --js]
Signed-off-by: John Snow <jsnow@redhat.com>
Message-ID: <20250811190159.237321-1-jsnow@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 6ad034e71232c2929ed546304c9d249312bb632f)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2 months agohw/sd/ssi-sd: Return noise (dummy byte) when no card connected
Philippe Mathieu-Daudé [Fri, 8 Aug 2025 12:57:44 +0000 (14:57 +0200)] 
hw/sd/ssi-sd: Return noise (dummy byte) when no card connected

Commit 1585ab9f1ba ("hw/sd/sdcard: Fill SPI response bits in card
code") exposed a bug in the SPI adapter: if no SD card is plugged,
we are returning "there is a card with an error". This is wrong,
we shouldn't return any particular packet response, but the noise
shifted on the MISO line. Return the dummy byte, otherwise we get:

  qemu-system-riscv64: ../hw/sd/ssi-sd.c:160: ssi_sd_transfer: Assertion `s->arglen > 0' failed.

Reported-by: Guenter Roeck <linux@roeck-us.net>
Fixes: 775616c3ae8 ("Partial SD card SPI mode support")
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Gustavo Romero <gustavo.romero@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250812140415.70153-2-philmd@linaro.org>
(cherry picked from commit e262646e12acd6c1132e03d57fea20680a503251)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2 months agoqemu-iotests: Ignore indentation in Killed messages
Werner Fink [Wed, 6 Aug 2025 06:54:51 +0000 (08:54 +0200)] 
qemu-iotests: Ignore indentation in Killed messages

New bash 5.3 uses a different padding for reporting job status.

Resolves: boo#1246830
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3050
Signed-off-by: Werner Fink <werner@suse.de>
Message-ID: <aJL8RH8ePPNEteMg@boole.nue2.suse.org>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Tested-by: Martin Kletzander <mkletzan@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit c0df98ab1f3d348bc05f09d1c093abc529f2b530)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2 months agorbd: Fix .bdrv_get_specific_info implementation
Kevin Wolf [Mon, 11 Aug 2025 13:40:10 +0000 (15:40 +0200)] 
rbd: Fix .bdrv_get_specific_info implementation

qemu_rbd_get_specific_info() has at least two problems:

The first is that it issues a blocking rbd_read() call in order to probe
the encryption format for the image while querying the node. This means
that if the connection to the server goes down, not only I/O is stuck
(which is unavoidable), but query-names-block-nodes will actually make
the whole QEMU instance unresponsive. .bdrv_get_specific_info
implementations shouldn't perform blocking operations, but only return
what is already known.

The second is that the information returned isn't even correct. If the
image is already opened with encryption enabled at the RBD level, we'll
probe for "double encryption", i.e. if the encrypted data contains
another encryption header. If it doesn't (which is the normal case), we
won't return the encryption format. If it does, we return misleading
information because it looks like we're talking about the outer level
(the encryption format of the image itself) while the information is
about an encryption header in the guest data.

Fix this by storing the encryption format in BDRVRBDState when the image
is opened (and we do blocking operations anyway) and returning only the
stored information in qemu_rbd_get_specific_info().

The information we'll store is either the actual encryption format that
we enabled on the RBD level, or if the image is unencrypted, the result
of the same probing as we previously did when querying the node. Probing
image formats based on content that can be modified by the guest has
long been known as problematic, but as long as we only output it to the
user instead of making decisions based on it, it should be okay. It is
undoubtedly useful in the context of 'qemu-img info' when you're trying
to figure out which encryption options you have to use to open the
image successfully.

Fixes: 42e4ac9ef5a6 ("block/rbd: Add support for rbd image encryption")
Buglink: https://issues.redhat.com/browse/RHEL-105440
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250811134010.81787-1-kwolf@redhat.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 4af976ef398e4e823addc00bf1c58787ba4952fe)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2 months agohw/nvme: cap MDTS value for internal limitation
Keith Busch [Fri, 1 Aug 2025 14:24:57 +0000 (07:24 -0700)] 
hw/nvme: cap MDTS value for internal limitation

The emulated device had let the user set whatever max transfers size
they wanted, including no limit. However the device does have an
internal limit of 1024 segments. NVMe doesn't report max segments,
though. This is implicitly inferred based on the MDTS and MPSMIN values.

IOV_MAX is currently 1024 which 4k PRPs can exceed with 2MB transfers.
Don't allow MDTS values that can exceed this, otherwise users risk
seeing "internal error" status to their otherwise protocol compliant
commands.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
(cherry picked from commit 53493c1f836f4dda90a6b5f2fb3d9264918c6871)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2 months agohw/nvme: revert CMIC behavior
Klaus Jensen [Tue, 3 Jun 2025 12:59:06 +0000 (14:59 +0200)] 
hw/nvme: revert CMIC behavior

Commit cd59f50ab017 ("hw/nvme: always initialize a subsystem") causes
the controller to always set the CMIC.MCTRS ("Multiple Controllers")
bit. While spec-compliant, this is a deviation from the previous
behavior where this was only set if an nvme-subsys device was explicitly
created (to configure a subsystem with multiple controllers/namespaces).

Revert the behavior to only set CMIC.MCTRS if an nvme-subsys device is
created explicitly.

Reported-by: Alan Adamson <alan.adamson@oracle.com>
Fixes: cd59f50ab017 ("hw/nvme: always initialize a subsystem")
Reviewed-by: Alan Adamson <alan.adamson@oracle.com>
Tested-by: Alan Adamson <alan.adamson@oracle.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
(cherry picked from commit bc0c24fdb157b014932a5012fe4ccbeba7617f17)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2 months agohw/nvme: fix namespace attachment
Klaus Jensen [Tue, 3 Jun 2025 12:59:05 +0000 (14:59 +0200)] 
hw/nvme: fix namespace attachment

Commit 6ccca4b6bb9f ("hw/nvme: rework csi handling") introduced a bug in
Namespace Attachment, causing it to

  a) not allow a controller to attach namespaces to other controllers
  b) assert if a valid non-attached namespace is detached

This fixes both issues.

Fixes: 6ccca4b6bb9f ("hw/nvme: rework csi handling")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2976
Reviewed-by: Jesper Wendel Devantier <foss@defmacro.it>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
(cherry picked from commit 31b737b19dca4d50758f5e9834d27b683102f2f1)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2 months agotarget/loongarch: Fix [X]VLDI raising exception incorrectly
WANG Rui [Mon, 4 Aug 2025 13:22:12 +0000 (21:22 +0800)] 
target/loongarch: Fix [X]VLDI raising exception incorrectly

According to the specification, [X]VLDI should trigger an invalid instruction
exception only when Bit[12] is 1 and Bit[11:8] > 12. This patch fixes an issue
where an exception was incorrectly raised even when Bit[12] was 0.

Test case:

```
    .global main
main:
    vldi    $vr0, 3328
    ret
```

Reported-by: Zhou Qiankang <wszqkzqk@qq.com>
Signed-off-by: WANG Rui <wangrui@loongson.cn>
Reviewed-by: Song Gao <gaosong@loongson.cn>
Message-ID: <20250804132212.4816-1-wangrui@loongson.cn>
Signed-off-by: Song Gao <gaosong@loongson.cn>
(cherry picked from commit e66644c48e96e81848c6aa94b185f59fc212d080)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2 months agoui/curses: Fix infinite loop on windows
William Hu [Thu, 3 Apr 2025 01:07:56 +0000 (01:07 +0000)] 
ui/curses: Fix infinite loop on windows

Replace -1 comparisons for wint_t with WEOF to fix infinite loop caused by a
65535 == -1 comparison.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2905
Signed-off-by: William Hu <purplearmadillo77@proton.me>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
[ Marc-André - Add missing similar code change, remove a comment ]
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-ID: <tSO5to8--iex6QMThG3Z8ElfnNOUahK_yitw2G2tEVRPoMKV936CBdrpyfbeNpVEpziKqeQ1ShBwPOoDkofgApM8YWwnPKJR_JrPDThV8Bc=@proton.me>
(cherry picked from commit c7ac771ee750e37808928b62388fd27dcbf00f46)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2 months agoppc/xive2: Fix treatment of PIPR in CPPR update
Glenn Miles [Mon, 12 May 2025 03:10:19 +0000 (13:10 +1000)] 
ppc/xive2: Fix treatment of PIPR in CPPR update

According to the XIVE spec, updating the CPPR should also update the
PIPR. The final value of the PIPR depends on other factors, but it
should never be set to a value that is above the CPPR.

Also added support for redistributing an active group interrupt when it
is precluded as a result of changing the CPPR value.

Signed-off-by: Glenn Miles <milesg@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Michael Kowal <kowal@linux.ibm.com>
Reviewed-by: Caleb Schlossin <calebs@linux.ibm.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250512031100.439842-11-npiggin@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
(cherry picked from commit d4720a7faf4bb415f3fe7f10e5c888212b81316a)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2 months agoppc/xive2: Fix irq preempted by lower priority group irq
Glenn Miles [Mon, 12 May 2025 03:10:18 +0000 (13:10 +1000)] 
ppc/xive2: Fix irq preempted by lower priority group irq

A problem was seen where uart interrupts would be lost resulting in the
console hanging. Traces showed that a lower priority interrupt was
preempting a higher priority interrupt, which would result in the higher
priority interrupt never being handled.

The new interrupt's priority was being compared against the CPPR
(Current Processor Priority Register) instead of the PIPR (Post
Interrupt Priority Register), as was required by the XIVE spec.
This allowed for a window between raising an interrupt and ACK'ing
the interrupt where a lower priority interrupt could slip in.

Fixes: 26c55b99418 ("ppc/xive2: Process group backlog when updating the CPPR")
Signed-off-by: Glenn Miles <milesg@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Michael Kowal <kowal@linux.ibm.com>
Reviewed-by: Caleb Schlossin <calebs@linux.ibm.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250512031100.439842-10-npiggin@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
(cherry picked from commit 8d373176181fbc11f8d8eae2b4532b867f083ea6)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2 months agoppc/xive2: Reset Generation Flipped bit on END Cache Watch
Michael Kowal [Mon, 12 May 2025 03:10:16 +0000 (13:10 +1000)] 
ppc/xive2: Reset Generation Flipped bit on END Cache Watch

When the END Event Queue wraps the END EQ Generation bit is flipped and the
Generation Flipped bit is set to one.  On a END cache Watch read operation,
the Generation Flipped bit needs to be reset.

While debugging an error modified END not valid error messages to include
the method since all were the same.

Signed-off-by: Michael Kowal <kowal@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Glenn Miles <milesg@linux.ibm.com>
Reviewed-by: Michael Kowal <kowal@linux.ibm.com>
Reviewed-by: Caleb Schlossin <calebs@linux.ibm.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250512031100.439842-8-npiggin@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
(cherry picked from commit 576830428eea6ebfc85792851a343214b834e401)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2 months agoppc/xive: Fix PHYS NSR ring matching
Nicholas Piggin [Mon, 12 May 2025 03:10:15 +0000 (13:10 +1000)] 
ppc/xive: Fix PHYS NSR ring matching

Test that the NSR exception bit field is equal to the pool ring value,
rather than any common bits set, which is more correct (although there
is no practical bug because the LSI NSR type is not implemented and
POOL/PHYS NSR are encoded with exclusive bits).

Fixes: 4c3ccac636 ("pnv/xive: Add special handling for pool targets")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Glenn Miles <milesg@linux.ibm.com>
Reviewed-by: Michael Kowal <kowal@linux.ibm.com>
Reviewed-by: Caleb Schlossin <calebs@linux.ibm.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250512031100.439842-7-npiggin@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
(cherry picked from commit bde8c148bb22b99cb84cda800fa555851b8cb358)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2 months agoppc/xive2: fix context push calculation of IPB priority
Nicholas Piggin [Mon, 12 May 2025 03:10:14 +0000 (13:10 +1000)] 
ppc/xive2: fix context push calculation of IPB priority

Pushing a context and loading IPB from NVP is defined to merge ('or')
that IPB into the TIMA IPB register. PIPR should therefore be calculated
based on the final IPB value, not just the NVP value.

Fixes: 9d2b6058c5b ("ppc/xive2: Add grouping level to notification")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Glenn Miles <milesg@linux.ibm.com>
Reviewed-by: Caleb Schlossin <calebs@linux.ibm.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250512031100.439842-6-npiggin@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
(cherry picked from commit d1023a296c8297454fc4b207d58707c0a5e62e0a)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2 months agoppc/xive2: Remote VSDs need to match on forwarding address
Michael Kowal [Mon, 12 May 2025 03:10:13 +0000 (13:10 +1000)] 
ppc/xive2: Remote VSDs need to match on forwarding address

In a multi chip environment there will be remote/forwarded VSDs.  The check
to find a matching INT controller (XIVE) of the remote block number was
checking the INTs chip number.  Block numbers are not tied to a chip number.
The matching remote INT is the one that matches the forwarded VSD address
with VSD types associated MMIO BAR.

Signed-off-by: Michael Kowal <kowal@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Glenn Miles <milesg@linux.ibm.com>
Reviewed-by: Michael Kowal <kowal@linux.ibm.com>
Reviewed-by: Caleb Schlossin <calebs@linux.ibm.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250512031100.439842-5-npiggin@gmail.com
[ clg: Fixed log format in pnv_xive2_get_remote() ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>
(cherry picked from commit e8cf73b849879cd93b1d1b5fd3bde79fb42064dc)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2 months agoppc/xive2: Fix calculation of END queue sizes
Glenn Miles [Mon, 12 May 2025 03:10:12 +0000 (13:10 +1000)] 
ppc/xive2: Fix calculation of END queue sizes

The queue size of an Event Notification Descriptor (END)
is determined by the 'cl' and QsZ fields of the END.
If the cl field is 1, then the queue size (in bytes) will
be the size of a cache line 128B * 2^QsZ and QsZ is limited
to 4.  Otherwise, it will be 4096B * 2^QsZ with QsZ limited
to 12.

Fixes: f8a233dedf2 ("ppc/xive2: Introduce a XIVE2 core framework")
Signed-off-by: Glenn Miles <milesg@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Michael Kowal <kowal@linux.ibm.com>
Reviewed-by: Caleb Schlossin <calebs@linux.ibm.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250512031100.439842-4-npiggin@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
(cherry picked from commit f16697292add6c3c15014a20fd5fce70b8c56734)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2 months agoppc/xive: Report access size in XIVE TM operation error logs
Nicholas Piggin [Mon, 12 May 2025 03:10:11 +0000 (13:10 +1000)] 
ppc/xive: Report access size in XIVE TM operation error logs

Report access size in XIVE TM operation error logs.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Glenn Miles <milesg@linux.ibm.com>
Reviewed-by: Michael Kowal <kowal@linux.ibm.com>
Reviewed-by: Caleb Schlossin <calebs@linux.ibm.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250512031100.439842-3-npiggin@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
(cherry picked from commit f0aab779418ed883ea2b5ffcc3985ef26f6e3545)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2 months agoppc/xive: Fix xive trace event output
Nicholas Piggin [Mon, 12 May 2025 03:10:10 +0000 (13:10 +1000)] 
ppc/xive: Fix xive trace event output

Typo, IBP should be IPB.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Glenn Miles <milesg@linux.ibm.com>
Reviewed-by: Michael Kowal <kowal@linux.ibm.com>
Reviewed-by: Caleb Schlossin <calebs@linux.ibm.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250512031100.439842-2-npiggin@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
(cherry picked from commit 301fbbaf03fb114a1e45833987ddfd7bbb403963)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2 months agotarget/i386/cpu: Move addressable ID encoding out of compat property in CPUID[0x1]
Zhao Liu [Mon, 4 Aug 2025 05:35:48 +0000 (13:35 +0800)] 
target/i386/cpu: Move addressable ID encoding out of compat property in CPUID[0x1]

Currently, the addressable ID encoding for CPUID[0x1].EBX[bits 16-23]
(Maximum number of addressable IDs for logical processors in this
physical package) is covered by vendor_cpuid_only_v2 compat property.
The previous consideration was to avoid breaking migration and this
compat property makes it unfriendly to backport the commit f985a1195ba2
("i386/cpu: Fix number of addressable IDs field for CPUID.01H.EBX
[23:16]").

However, NetBSD booting is broken since the commit 88dd4ca06c83
("i386/cpu: Use APIC ID info to encode cache topo in CPUID[4]"),
because NetBSD calculates smt information via `lp_max` / `core_max` for
legacy Intel CPUs which doesn't support 0xb leaf, where `lp_max` is from
CPUID[0x1].EBX.bits[16-23] and `core_max` is from CPUID[0x4].0x0.bits[26
-31].

The commit 88dd4ca0 changed the encoding rule of `core_max` but didn't
update `lp_max`, so that NetBSD would get the wrong smt information,
which leads to the module loading failure.

Luckily, the commit f985a1195ba2 ("i386/cpu: Fix number of addressable
IDs field for CPUID.01H.EBX[23:16]") updated the encoding rule for
`lp_max` and accidentally fixed the NetBSD issue too. This also shows
that using CPUID[0x1] and CPUID[0x4].0x0 to calculate HT/SMT information
is a common practice to detect CPU topology on legacy Intel CPUs.

Therefore, it's necessary to backport the commit f985a1195ba2 to
previous stable QEMU to help address the similar issues as well. Then
the compat property is not needed any more since all stable QEMUs will
follow the same encoding way.

So, in CPUID[0x1], move addressable ID encoding out of compat property.

Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Inspired-by: Chuang Xu <xuchuangxclwt@bytedance.com>
Fixes: commit f985a1195ba2 ("i386/cpu: Fix number of addressable IDs field for CPUID.01H.EBX[23:16]")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3061
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Tested-by: Michael Tokarev <mjt@tls.msk.ru>
Message-ID: <20250804053548.1808629-1-zhao1.liu@intel.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit 4e5d58969ed6927a7f1b08ba6223c34c8b990c92)
(Mjt: when picking up commit f985a1195ba2d9c6f6f33e83fe2e419a7e8acb60
 "i386/cpu: Fix number of addressable IDs field for CPUID.01H.EBX[23:16]"
 to 10.0, I commented out the condition (vendor_cpuid_only_v2 comparison)
 which is now being removed)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2 months agoi386/cpu: Fix cpu number overflow in CPUID.01H.EBX[23:16]
Qian Wen [Mon, 14 Jul 2025 08:08:57 +0000 (16:08 +0800)] 
i386/cpu: Fix cpu number overflow in CPUID.01H.EBX[23:16]

The legacy topology enumerated by CPUID.1.EBX[23:16] is defined in SDM
Vol2:

Bits 23-16: Maximum number of addressable IDs for logical processors in
this physical package.

When threads_per_socket > 255, it will 1) overwrite bits[31:24] which is
apic_id, 2) bits [23:16] get truncated.

Specifically, if launching the VM with -smp 256, the value written to
EBX[23:16] is 0 because of data overflow. If the guest only supports
legacy topology, without V2 Extended Topology enumerated by CPUID.0x1f
or Extended Topology enumerated by CPUID.0x0b to support over 255 CPUs,
the return of the kernel invoking cpu_smt_allowed() is false and APs
(application processors) will fail to bring up. Then only CPU 0 is online,
and others are offline.

For example, launch VM via:
qemu-system-x86_64 -M q35,accel=kvm,kernel-irqchip=split \
    -cpu qemu64,cpuid-0xb=off -smp 256 -m 32G \
    -drive file=guest.img,if=none,id=virtio-disk0,format=raw \
    -device virtio-blk-pci,drive=virtio-disk0,bootindex=1 --nographic

The guest shows:
    CPU(s):               256
    On-line CPU(s) list:  0
    Off-line CPU(s) list: 1-255

To avoid this issue caused by overflow, limit the max value written to
EBX[23:16] to 255 as the HW does.

Cc: qemu-stable@nongnu.org
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Qian Wen <qian.wen@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250714080859.1960104-6-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit a62fef58299562aae6667b8d8552247423e886b3)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2 months agoi386/cpu: Fix number of addressable IDs field for CPUID.01H.EBX[23:16]
Chuang Xu [Mon, 14 Jul 2025 08:08:56 +0000 (16:08 +0800)] 
i386/cpu: Fix number of addressable IDs field for CPUID.01H.EBX[23:16]

When QEMU is started with:
-cpu host,migratable=on,host-cache-info=on,l3-cache=off
-smp 180,sockets=2,dies=1,cores=45,threads=2

On Intel platform:
CPUID.01H.EBX[23:16] is defined as "max number of addressable IDs for
logical processors in the physical package".

When executing "cpuid -1 -l 1 -r" in the guest, we obtain a value of 90 for
CPUID.01H.EBX[23:16], whereas the expected value is 128. Additionally,
executing "cpuid -1 -l 4 -r" in the guest yields a value of 63 for
CPUID.04H.EAX[31:26], which matches the expected result.

As (1+CPUID.04H.EAX[31:26]) rounds up to the nearest power-of-2 integer,
it's necessary to round up CPUID.01H.EBX[23:16] to the nearest power-of-2
integer too. Otherwise there would be unexpected results in guest with
older kernel.

For example, when QEMU is started with CLI above and xtopology is disabled,
guest kernel 5.15.120 uses CPUID.01H.EBX[23:16]/(1+CPUID.04H.EAX[31:26]) to
calculate threads-per-core in detect_ht(). Then guest will get "90/(1+63)=1"
as the result, even though threads-per-core should actually be 2.

And on AMD platform:
CPUID.01H.EBX[23:16] is defined as "Logical processor count". Current
result meets our expectation.

So round up CPUID.01H.EBX[23:16] to the nearest power-of-2 integer only
for Intel platform to solve the unexpected result.

Use the "x-vendor-cpuid-only-v2" compat option to fix this issue.

Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Guixiong Wei <weiguixiong@bytedance.com>
Signed-off-by: Yipeng Yin <yinyipeng@bytedance.com>
Signed-off-by: Chuang Xu <xuchuangxclwt@bytedance.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250714080859.1960104-5-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit f985a1195ba2d9c6f6f33e83fe2e419a7e8acb60)
(Mjt: this change refers to cpu->vendor_cpuid_only_v2 which is introduced
 after 10.0, but this reference is removed later.  Replace it with false
 for now, so effectively this commit is a no-op, but prepares the code
 for the next change in this area and keeps historical references)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agoi386/cpu: Move adjustment of CPUID_EXT_PDCM before feature_dependencies[] check
Xiaoyao Li [Tue, 4 Mar 2025 05:24:49 +0000 (00:24 -0500)] 
i386/cpu: Move adjustment of CPUID_EXT_PDCM before feature_dependencies[] check

There is one entry relates to CPUID_EXT_PDCM in feature_dependencies[].
So it needs to get correct value of CPUID_EXT_PDCM before using
feature_dependencies[] to apply dependencies.

Besides, it also ensures CPUID_EXT_PDCM value is tracked in
env->features[FEAT_1_ECX].

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250304052450.465445-2-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit e68ec2980901c8e7f948f3305770962806c53f0b)
(Mjt: this simple cleanup is not strictly necessary for 10.0,
 but pick it up so subsequent changes in this area applies cleanly)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agoRevert "i386/cpu: Fix cpu number overflow in CPUID.01H.EBX[23:16]"
Michael Tokarev [Wed, 6 Aug 2025 05:11:56 +0000 (08:11 +0300)] 
Revert "i386/cpu: Fix cpu number overflow in CPUID.01H.EBX[23:16]"

This reverts commit d0975531586742ec2eff8796b7ba93bc4858e63d,
which is a modified commit a62fef58299562aae6667b8d8552247423e886b3
from the master branch.

Since more changes from the master branch are needed in the areas
which are being touched by this one, and this change has been
modified to apply without the previous commit(s), let's revert it
for now, apply previous patches, and re-apply it without modifications
on top.

Additionally, when I cherry-picked a62fef58299562aa, it somehow lost
its original authorship (Qian Wen).  So this revert fixes both issues.

Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agoqga: correctly write to /sys/power/state on linux
Michael Tokarev [Fri, 1 Aug 2025 11:53:14 +0000 (14:53 +0300)] 
qga: correctly write to /sys/power/state on linux

Commit v9.0.0-343-g2048129625 introduced usage of
g_file_set_contents() function to write to /sys/power/state.
This function uses G_FILE_SET_CONTENTS_CONSISTENT flag to
g_file_set_contents_full(), which is implemented by creating
a temp file in the same directory and renaming it to the final
destination.  Which is not how sysfs works.

Here, there's not a big deal to do open/write/close - it becomes
almost the same as using g_file_set_contents[_full]().  But it
does not have surprises like this.

Also, since this is linux code, it should be ok to use %m in
the error reporting function.

Fixes: 2048129625 "qga/commands-posix: don't do fork()/exec() when suspending via sysfs"
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3057
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20250801115316.6845-1-mjt@tls.msk.ru>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit b217d987a3c5c6e2473956723b396bc1ff0f5b2b)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agoscripts/make-release: Go back to cloning all the EDK2 submodules
Peter Maydell [Mon, 21 Jul 2025 15:33:41 +0000 (16:33 +0100)] 
scripts/make-release: Go back to cloning all the EDK2 submodules

In commit bd0da3a3d4f we changed make-release so that instead of
cloning every git submodule of EDK2 we only cloned a fixed list.
The original motivation for this was that one of the submodules:
 * was from a non-github repo
 * that repo had a "SSL certificate expired" failure
 * wasn't actually needed for the set of EDK2 binaries we build
and at the time we were trying to build the EDK2 binaries in one of
our CI jobs.

Unfortunately this change meant that we were exposed to bugs where
EDK2 adds a new submodule and the sources we ship in the release
tarball won't build any more.  In particular, in EDK2 commit
c6bb7d54beb05 the MipiSysTLib submodule was added, causing failure of
the ROM build in our tarball starting from QEMU release 8.2.0:

/tmp/qemu-10.0.0/roms/edk2/MdePkg/MdePkg.dec(32): error 000E: File/directory not found in workspace
        Library/MipiSysTLib/mipisyst/library/include is not found in packages path:
        /tmp/qemu-10.0.0/roms/.
        /tmp/qemu-10.0.0/roms/edk2

(Building from a QEMU git checkout works fine.)

In the intervening time EDK2 moved the submodule that had a problem
to be one they mirrored themselves (and at time of writing all their
submodules are hosted on github), and we stopped trying to build
EDK2 binaries in our own CI jobs with commit 690ceb71936f9037f6.

Go back to cloning every EDK2 submodule, so we don't have an
untested explicit list of submodules which will break without
our noticing it.

This increases the size of the QEMU tarball .tar.xz file from
133M to 139M in my testing.

Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3041
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Message-ID: <20250721153341.2910800-1-peter.maydell@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
(cherry picked from commit 0311a6edb9db34a41a2662d94c37e1fbaabf6ecf)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agotarget/arm: add support for 64-bit PMCCNTR in AArch32 mode
Alex Richardson [Fri, 25 Jul 2025 17:01:36 +0000 (10:01 -0700)] 
target/arm: add support for 64-bit PMCCNTR in AArch32 mode

In the PMUv3, a new AArch32 64-bit (MCRR/MRRC) accessor for the
PMCCNTR was added. In QEMU we forgot to implement this, so only
provide the 32-bit accessor. Since we have a 64-bit PMCCNTR
sysreg for AArch64, adding the 64-bit AArch32 version is easy.

We add the PMCCNTR to the v8_cp_reginfo because PMUv3 was added
in the ARMv8 architecture. This is consistent with how we
handle the existing PMCCNTR support, where we always implement
it for all v7 CPUs. This is arguably something we should
clean up so it is gated on ARM_FEATURE_PMU and/or an ID
register check for the relevant PMU version, but we should
do that as its own tidyup rather than being inconsistent between
this PMCCNTR accessor and the others.

Since the register name is the same as the 32-bit PMCCNTR, we set
ARM_CP_NO_GDB on the 32-bit one to avoid generating an invalid GDB XML.

See https://developer.arm.com/documentation/ddi0601/2024-06/AArch32-Registers/PMCCNTR--Performance-Monitors-Cycle-Count-Register?lang=en

Note for potential backporting:
 * this code in cpregs-pmu.c will be in helper.c on stable
   branches that don't have commit ae2086426d37

Cc: qemu-stable@nongnu.org
Signed-off-by: Alex Richardson <alexrichardson@google.com>
Message-id: 20250725170136.145116-1-alexrichardson@google.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit cd9f752fee75238f842a91be1146c988bd16a010)
(Mjt: moved code to target/arm/helper.c and modified it for 10.0)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agohw/ssi/aspeed_smc: Fix incorrect FMC_WDT2 register read on AST1030
Jamin Lin [Mon, 4 Aug 2025 01:46:33 +0000 (09:46 +0800)] 
hw/ssi/aspeed_smc: Fix incorrect FMC_WDT2 register read on AST1030

On AST1030, reading the FMC_WDT2 register always returns 0xFFFFFFFF.
This issue is due to the aspeed_smc_read function, which checks for the
ASPEED_SMC_FEATURE_WDT_CONTROL feature. Since AST1030 was missing this
feature flag, the read operation fails and returns -1.

To resolve this, add the WDT_CONTROL feature to AST1030's feature set
so that FMC_WDT2 can be correctly accessed by firmware.

Signed-off-by: Jamin Lin <jamin_lin@aspeedtech.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Fixes: 2850df6a81bcdc2e063dfdd56751ee2d11c58030 ("aspeed/smc: Add AST1030 support ")
Link: https://lore.kernel.org/qemu-devel/20250804014633.512737-1-jamin_lin@aspeedtech.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
(cherry picked from commit 13ed972b4ce57198914a37217251d30fbec20e41)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agotarget/arm: Fix handling of setting SVE registers from gdb
Vacha Bhavsar [Tue, 22 Jul 2025 17:37:36 +0000 (17:37 +0000)] 
target/arm: Fix handling of setting SVE registers from gdb

The code to handle setting SVE registers via the gdbstub is broken:
 * it sets each pair of elements in the zregs[].d[] array in the
   wrong order for the most common (little endian) case: the least
   significant 64-bit value comes first
 * it makes no attempt to handle target_endian()
 * it does a simple copy out of the (target endian) gdbstub buffer
   into the (host endan) zregs data structure, which is wrong on
   big endian hosts

Fix all these problems:
 * use ldq_p() to read from the gdbstub buffer
 * check target_big_endian() to see if we need to handle the
   128-bit values the opposite way around

Cc: qemu-stable@nongnu.org
Signed-off-by: Vacha Bhavsar <vacha.bhavsar@oss.qualcomm.com>
Message-id: 20250722173736.2332529-3-vacha.bhavsar@oss.qualcomm.com
[PMM: adjusted commit message, fixed spacing]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit 97b3d732afec9b165c33697452e31267a845338f)
(Mjt: s/target_big_endian/target_words_bigendian/ due to missing
 v10.0.0-277-gb939b8e42a "exec: Rename target_words_bigendian() -> target_big_endian()")
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agotarget/arm: Fix big-endian handling of NEON gdb remote debugging
Vacha Bhavsar [Tue, 22 Jul 2025 17:37:35 +0000 (17:37 +0000)] 
target/arm: Fix big-endian handling of NEON gdb remote debugging

In the code for allowing the gdbstub to set the value of an AArch64
FP/SIMD register, we weren't accounting for target_big_endian()
being true. This meant that for aarch64_be-linux-user we would
set the two halves of the FP register the wrong way around.
The much more common case of a little-endian guest is not affected;
nor are big-endian hosts.

Correct the handling of this case.

Cc: qemu-stable@nongnu.org
Signed-off-by: Vacha Bhavsar <vacha.bhavsar@oss.qualcomm.com>
Message-id: 20250722173736.2332529-2-vacha.bhavsar@oss.qualcomm.com
[PMM: added comment, expanded commit message, fixed missing space]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit 35cca0f95ff5345f54c11d116efc8940a0dab8aa)
(Mjt: s/target_big_endian/target_words_bigendian/ due to missing
 v10.0.0-277-gb939b8e42a "exec: Rename target_words_bigendian() -> target_big_endian()")
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agohw/intc/arm_gicv3_kvm: Write all 1's to clear enable/active
Zenghui Yu [Tue, 29 Jul 2025 16:16:50 +0000 (00:16 +0800)] 
hw/intc/arm_gicv3_kvm: Write all 1's to clear enable/active

KVM's userspace access interface to the GICD enable and active bits
is via set/clear register pairs which implement the hardware's "write
1s to the clear register to clear the 0 bits, and write 1s to the set
register to set the 1 bits" semantics.  We didn't get this right,
because we were writing 0 to the clear register.

Writing 0 to GICD_IC{ENABLE,ACTIVE}R architecturally has no effect on
interrupt status (all writes are simply ignored by KVM) and doesn't
comply with the intention of "first write to the clear-reg to clear
all bits".

Write all 1's to actually clear the enable/active status.

This didn't have any adverse effects on migration because there
we start with a clean VM state; it would be guest-visible when
doing a system reset, but since Linux always cleans up the
register state of the GIC during bootup before it enables it
most users won't have run into a problem here.

Cc: qemu-stable@nongnu.org
Fixes: 367b9f527bec ("hw/intc/arm_gicv3_kvm: Implement get/put functions")
Signed-off-by: Zenghui Yu <zenghui.yu@linux.dev>
Message-id: 20250729161650.43758-3-zenghui.yu@linux.dev
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit b10bd4bd17ac8628ede8735a08ad82dc3b721c64)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agohw/i386/amd_iommu: Move IOAPIC memory region initialization to the end
Sairaj Kodilkar [Fri, 1 Aug 2025 06:05:04 +0000 (11:35 +0530)] 
hw/i386/amd_iommu: Move IOAPIC memory region initialization to the end

Setting up IOAPIC memory region requires mr_sys and mr_ir. Currently
these two memory regions are setup after the initializing the IOAPIC
memory region, which cause `amdvi_host_dma_iommu()` to use unitialized
mr_sys and mr_ir.

Move the IOAPIC memory region initialization to the end in order to use
the mr_sys and mr_ir regions after they are fully initialized.

Fixes: 577c470f4326 ("x86_iommu/amd: Prepare for interrupt remap support")
Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Message-Id: <20250801060507.3382-4-sarunkod@amd.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit a7842d94067cddc80b47ac42fb6e49e2fc02a3c5)
(Mjt: context fix due to missing v10.0.0-833-gf864a3235ea1
 "hw/i386/amd_iommu: Isolate AMDVI-PCI from amd-iommu device
 to allow full control over the PCI device creation")
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agointel_iommu: Allow both Status Write and Interrupt Flag in QI wait
David Woodhouse [Mon, 14 Jul 2025 08:00:47 +0000 (09:00 +0100)] 
intel_iommu: Allow both Status Write and Interrupt Flag in QI wait

FreeBSD does both, and this appears to be perfectly valid. The VT-d
spec even talks about the ordering (the status write should be done
first, unsurprisingly).

We certainly shouldn't assert() and abort QEMU if the guest asks for
both.

Fixes: ed7b8fbcfb88 ("intel-iommu: add supports for queued invalidation interface")
Closes: https://gitlab.com/qemu-project/qemu/-/issues/3028
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <0122cbabc0adcc3cf878f5fd7834d8f258c7a2f2.camel@infradead.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit e8145dcd311b58921f3a45121792cbfab38fd2f6)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agopcie_sriov: Fix configuration and state synchronization
Akihiko Odaki [Sun, 27 Jul 2025 06:50:08 +0000 (15:50 +0900)] 
pcie_sriov: Fix configuration and state synchronization

Fix issues in PCIe SR-IOV configuration register handling that caused
inconsistent internal state due to improper write mask handling and
incorrect migration behavior.

Two main problems were identified:

1. VF Enable bit write mask handling:
   pcie_sriov_config_write() incorrectly assumed that its val parameter
   was already masked, causing it to ignore the actual write mask.
   This led to the VF Enable bit being processed even when masked,
   resulting in incorrect VF registration/unregistration. It is
   identified as CVE-2025-54567.

2. Migration state inconsistency:
   pcie_sriov_pf_post_load() unconditionally called register_vfs()
   regardless of the VF Enable bit state, creating inconsistent
   internal state when VFs should not be enabled. Additionally,
   it failed to properly update the NumVFs write mask based on
   the current configuration. It is identified as CVE-2025-54566.

Root cause analysis revealed that both functions relied on incorrect
special-case assumptions instead of properly reading and consuming
the actual configuration values. This change introduces a unified
consume_config() function that reads actual configuration values and
synchronize the internal state without special-case assumptions.

The solution only adds register read overhead in non-hot-path code
while ensuring correct SR-IOV state management across configuration
writes and migration scenarios.

Fixes: 5e7dd17e4348 ("pcie_sriov: Remove num_vfs from PCIESriovPF")
Fixes: f9efcd47110d ("pcie_sriov: Register VFs after migration")
Fixes: CVE-2025-54566
Fixes: CVE-2025-54567
Cc: qemu-stable@nongnu.org
Reported-by: Corentin BAYET <corentin.bayet@reversetactics.com>
Signed-off-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Message-Id: <20250727-wmask-v2-1-394910b1c0b6@rsg.ci.i.u-tokyo.ac.jp>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit cad9aa6fbdccd95e56e10cfa57c354a20a333717)
(Mjt: context fix)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agovirtio-net: Fix VLAN filter table reset timing
Akihiko Odaki [Sun, 27 Jul 2025 06:22:36 +0000 (15:22 +0900)] 
virtio-net: Fix VLAN filter table reset timing

Problem
-------

The expected initial state of the table depends on feature negotiation:

With VIRTIO_NET_F_CTRL_VLAN:
  The table must be empty in accordance with the specification.
Without VIRTIO_NET_F_CTRL_VLAN:
  The table must be filled to permit all VLAN traffic.

Prior to commit 06b636a1e2ad ("virtio-net: do not reset vlan filtering
at set_features"), virtio_net_set_features() always reset the VLAN
table. That commit changed the behavior to skip table reset when
VIRTIO_NET_F_CTRL_VLAN was negotiated, assuming the table would be
properly cleared during device reset and remain stable.

However, this assumption breaks when a driver renegotiates features:
1. Initial negotiation without VIRTIO_NET_F_CTRL_VLAN (table filled)
2. Renegotiation with VIRTIO_NET_F_CTRL_VLAN (table will not be cleared)

The problem was exacerbated by commit 0caed25cd171 ("virtio: Call
set_features during reset"), which triggered virtio_net_set_features()
during device reset, exposing the bug whenever VIRTIO_NET_F_CTRL_VLAN
was negotiated after a device reset.

Solution
--------

Fix the issue by initializing the table when virtio_net_set_features()
is called to change the VIRTIO_NET_F_CTRL_VLAN bit of
vdev->guest_features.

This approach ensures the correct table state regardless of feature
negotiation sequence by performing initialization in
virtio_net_set_features() as QEMU did prior to commit 06b636a1e2ad
("virtio-net: do not reset vlan filtering at set_features").

This change still preserves the goal of the commit, which was to avoid
resetting the table during migration, by checking whether the
VIRTIO_NET_F_CTRL_VLAN bit of vdev->guest_features is being changed;
vdev->guest_features is set before virtio_net_set_features() gets called
during migration.

It also avoids resetting the table when the driver sets a feature
bitmask with no change for the VIRTIO_NET_F_CTRL_VLAN bit, which makes
the operation idempotent and its semantics cleaner.

Additionally, this change ensures the table is initialized after
feature negotiation and before the DRIVER_OK status bit being set for
compatibility with the Linux driver before commit 50c0ada627f5
("virtio-net: fix race between ndo_open() and virtio_device_ready()"),
which did not ensure to set the DRIVER_OK status bit before modifying
the table.

Fixes: 06b636a1e2ad ("virtio-net: do not reset vlan filtering at set_features")
Cc: qemu-stable@nongnu.org
Reported-by: Konstantin Shkolnyy <kshk@linux.ibm.com>
Signed-off-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Tested-by: Konstantin Shkolnyy <kshk@linux.ibm.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Message-Id: <20250727-vlan-v3-1-bbee738619b1@rsg.ci.i.u-tokyo.ac.jp>
Tested-by: Konstantin Shkolnyy <kshk@linux.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 6071d13c6a37493a6b26e1609b09a98aa058038a)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agovhost: Do not abort on log-stop error
Hanna Czenczek [Thu, 24 Jul 2025 12:59:28 +0000 (14:59 +0200)] 
vhost: Do not abort on log-stop error

Failing to stop logging in a vhost device is not exactly fatal.  We can
log such an error, but there is no need to abort the whole qemu process
because of it.

Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-Id: <20250724125928.61045-3-hreitz@redhat.com>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit d63c388dadb7717f6391e1bb7f11728c0c1a3e36)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agovhost: Do not abort on log-start error
Hanna Czenczek [Thu, 24 Jul 2025 12:59:27 +0000 (14:59 +0200)] 
vhost: Do not abort on log-start error

Commit 3688fec8923 ("memory: Add Error** argument to .log_global_start()
handler") enabled vhost_log_global_start() to return a proper error, but
did not change it to do so; instead, it still aborts the whole process
on error.

This crash can be reproduced by e.g. killing a virtiofsd daemon before
initiating migration.  In such a case, qemu should not crash, but just
make the attempted migration fail.

Buglink: https://issues.redhat.com/browse/RHEL-94534
Reported-by: Tingting Mao <timao@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-Id: <20250724125928.61045-2-hreitz@redhat.com>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit c1997099dc26d95eb9f2249f2894aabf4cf0bf8b)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agovirtio: fix off-by-one and invalid access in virtqueue_ordered_fill
Jonah Palmer [Mon, 21 Jul 2025 15:02:08 +0000 (15:02 +0000)] 
virtio: fix off-by-one and invalid access in virtqueue_ordered_fill

Commit b44135daa372 introduced virtqueue_ordered_fill for
VIRTIO_F_IN_ORDER support but had a few issues:

* Conditional while loop used 'steps <= max_steps' but should've been
  'steps < max_steps' since reaching steps == max_steps would indicate
  that we didn't find an element, which is an error. Without this
  change, the code would attempt to read invalid data at an index
  outside of our search range.

* Incremented 'steps' using the next chain's ndescs instead of the
  current one.

This patch corrects the loop bounds and synchronizes 'steps' and index
increments.

We also add a defensive sanity check against malicious or invalid
descriptor counts to avoid a potential infinite loop and DoS.

Fixes: b44135daa372 ("virtio: virtqueue_ordered_fill - VIRTIO_F_IN_ORDER support")
Reported-by: terrynini <terrynini38514@gmail.com>
Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Message-Id: <20250721150208.2409779-1-jonah.palmer@oracle.com>
Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 6fcf5ebafad65adc19a616260ca7dc90005785d1)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agotarget/loongarch: Fix valid virtual address checking
Bibo Mao [Mon, 14 Jul 2025 01:54:46 +0000 (09:54 +0800)] 
target/loongarch: Fix valid virtual address checking

On LoongArch64 system, the high 32 bit of 64 bit virtual address should be
0x00000[0-7]yyy or 0xffff8yyy. The bit from 47 to 63 should be all 0 or
all 1.

Function get_physical_address() only checks bit 48 to 63, there will be
problem with the following test case. On physical machine, there is bus
error report and program exits abnormally. However on qemu TCG system
emulation mode, the program runs normally. The virtual address
0xffff000000000000ULL + addr and addr are treated the same on TLB entry
checking. This patch fixes this issue.

void main()
{
        void *addr, *addr1;
        int val;

        addr = malloc(100);
        *(int *)addr = 1;
        addr1 = 0xffff000000000000ULL + addr;
        val = *(int *)addr1;
        printf("val %d \n", val);
}

Cc: qemu-stable@nongnu.org
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Acked-by: Song Gao <gaosong@loongson.cn>
Reviewed-by: Song Gao <gaosong@loongson.cn>
Message-ID: <20250714015446.746163-1-maobibo@loongson.cn>
Signed-off-by: Song Gao <gaosong@loongson.cn>
(cherry picked from commit caab7ac83507e3e9a5fe2f37be5cfa759e766ba2)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agotarget/riscv: Restrict midelegh access to S-mode harts
Jay Chang [Tue, 1 Jul 2025 03:00:21 +0000 (11:00 +0800)] 
target/riscv: Restrict midelegh access to S-mode harts

RISC-V AIA Spec states:
"For a machine-level environment, extension Smaia encompasses all added
CSRs and all modifications to interrupt response behavior that the AIA
specifies for a hart, over all privilege levels. For a supervisor-level
environment, extension Ssaia is essentially the same as Smaia except
excluding the machine-level CSRs and behavior not directly visible to
supervisor level."

Since midelegh is an AIA machine-mode CSR, add Smaia extension check in
aia_smode32 predicate.

Reviewed-by: Frank Chang <frank.chang@sifive.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Jay Chang <jay.chang@sifive.com>
Reviewed-by: Nutty Liu<liujingqi@lanxincomputing.com>
Message-ID: <20250701030021.99218-3-jay.chang@sifive.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
(cherry picked from commit 86bc3a0abf10072081cddd8dff25aa72c60e67b8)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agotarget/riscv: Restrict mideleg/medeleg/medelegh access to S-mode harts
Jay Chang [Tue, 1 Jul 2025 03:00:20 +0000 (11:00 +0800)] 
target/riscv: Restrict mideleg/medeleg/medelegh access to S-mode harts

RISC-V Privileged Spec states:
"In harts with S-mode, the medeleg and mideleg registers must exist, and
setting a bit in medeleg or mideleg will delegate the corresponding trap
, when occurring in S-mode or U-mode, to the S-mode trap handler. In
harts without S-mode, the medeleg and mideleg registers should not
exist."

Add smode predicate to ensure these CSRs are only accessible when S-mode
is supported.

Reviewed-by: Frank Chang <frank.chang@sifive.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Jay Chang <jay.chang@sifive.com>
Reviewed-by: Nutty Liu<liujingqi@lanxincomputing.com>
Message-ID: <20250701030021.99218-2-jay.chang@sifive.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
(cherry picked from commit e443ba03361b63218e6c3aa4f73d2cb5b9b1d372)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agointc/riscv_aplic: Fix target register read when source is inactive
Yang Jialong [Mon, 28 Jul 2025 05:51:14 +0000 (13:51 +0800)] 
intc/riscv_aplic: Fix target register read when source is inactive

The RISC-V Advanced interrupt Architecture:
4.5.16. Interrupt targets:
If interrupt source i is inactive in this domain, register target[i] is
read-only zero.

Signed-off-by: Yang Jialong <z_bajeer@yeah.net>
Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Message-ID: <20250728055114.252024-1-z_bajeer@yeah.net>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
(cherry picked from commit b6f1244678bebaf7e2c775cfc66d452f95678ebf)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agotarget/riscv: Fix pmp range wraparound on zero
Vac Chen [Sun, 6 Jul 2025 06:55:54 +0000 (14:55 +0800)] 
target/riscv: Fix pmp range wraparound on zero

pmp_is_in_range() prefers to match addresses within the interval
[start, end]. To archieve this, pmpaddrX is decremented during the end
address update.

In TOR mode, a rule is ignored if its start address is greater than or
equal to its end address.

However, if pmpaddrX is set to 0, this decrement operation causes the
calulated end address to wrap around to UINT_MAX. In this scenario, the
address guard for this PMP entry would become ineffective.

This patch addresses the issue by moving the guard check earlier,
preventing the problematic wraparound when pmpaddrX is zero.

Signed-off-by: Vac Chen <vacantron@gmail.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20250706065554.42953-1-vacantron@gmail.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
(cherry picked from commit 77707bfdf871199bbee665e721ced961aaf3a798)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agotarget/riscv: Fix exception type when VU accesses supervisor CSRs
Xu Lu [Tue, 8 Jul 2025 06:07:20 +0000 (14:07 +0800)] 
target/riscv: Fix exception type when VU accesses supervisor CSRs

When supervisor CSRs are accessed from VU-mode, a virtual instruction
exception should be raised instead of an illegal instruction.

Fixes: c1fbcecb3a (target/riscv: Fix csr number based privilege checking)
Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
Reviewed-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Nutty Liu <liujingqi@lanxincomputing.com>
Message-ID: <20250708060720.7030-1-luxu.kernel@bytedance.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
(cherry picked from commit 30ef718423e8018723087cd17be0fd9c6dfa2e53)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agotarget/riscv: do not call GETPC() in check_ret_from_m_mode()
Daniel Henrique Barboza [Mon, 14 Jul 2025 13:37:39 +0000 (10:37 -0300)] 
target/riscv: do not call GETPC() in check_ret_from_m_mode()

GETPC() should always be called from the top level helper, e.g. the
first helper that is called by the translation code. We stopped doing
that in commit 3157a553ec, and then we introduced problems when
unwinding the exceptions being thrown by helper_mret(), as reported by
[1].

Call GETPC() at the top level helper and pass the value along.

[1] https://gitlab.com/qemu-project/qemu/-/issues/3020

Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Fixes: 3157a553ec ("target/riscv: Add Smrnmi mnret instruction")
Closes: https://gitlab.com/qemu-project/qemu/-/issues/3020
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Nutty Liu <liujingqi@lanxincomputing.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20250714133739.1248296-1-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
(cherry picked from commit 16aa7771afeac422dcf7be2833d5426da6b814fa)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agolinux-user/strace.list: add riscv_hwprobe entry
Daniel Henrique Barboza [Mon, 28 Jul 2025 17:06:33 +0000 (14:06 -0300)] 
linux-user/strace.list: add riscv_hwprobe entry

We're missing a strace entry for riscv_hwprobe, and using -strace will
report it as "Unknown syscall 258".

After this patch we'll have:

$ ./build/qemu-riscv64 -strace test_mutex_riscv
110182 riscv_hwprobe(0x7f207efdc700,1,0,0,0,0) = 0
110182 brk(NULL) = 0x0000000000082000
(...)

Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20250728170633.113384-1-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
(cherry picked from commit e111ffe48b29ca8abd450af9ee5dd71af3f93536)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agoroms/Makefile: fix npcmNxx_bootrom build rules
Michael Tokarev [Sun, 27 Jul 2025 21:55:07 +0000 (00:55 +0300)] 
roms/Makefile: fix npcmNxx_bootrom build rules

Since commit 70ce076fa6dff60, the actual rom source dirs
are subdirs of vbootrom/ submodule, not in top-level of it.

Fixes: 70ce076fa6dff60 "roms: Update vbootrom to 1287b6e"
Fixes: 269b7effd90 ("pc-bios: Add NPCM8XX vBootrom")
Cc: qemu-stable@nongnu.org
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20250727215511.807880-1-mjt@tls.msk.ru>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit 653a75a9d7f957c4ba9387d1a54bccfdce49cb9c)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agosystem/physmem: fix use-after-free with dispatch
Pierrick Bouvier [Thu, 24 Jul 2025 16:11:42 +0000 (09:11 -0700)] 
system/physmem: fix use-after-free with dispatch

A use-after-free bug was reported when booting a Linux kernel during the
pci setup phase. It's quite hard to reproduce (needs smp, and favored by
having several pci devices with BAR and specific Linux config, which
is Debian default one in this case).

After investigation (see the associated bug ticket), it appears that,
under specific conditions, we might access a cached AddressSpaceDispatch
that was reclaimed by RCU thread meanwhile.
In the Linux boot scenario, during the pci phase, memory region are
destroyed/recreated, resulting in exposition of the bug.

The core of the issue is that we cache the dispatch associated to
current cpu in cpu->cpu_ases[asidx].memory_dispatch. It is updated with
tcg_commit, which runs asynchronously on a given cpu.
At some point, we leave the rcu critial section, and the RCU thread
starts reclaiming it, but tcg_commit is not yet invoked, resulting in
the use-after-free.

It's not the first problem around this area, and commit 0d58c660689 [1]
("softmmu: Use async_run_on_cpu in tcg_commit") already tried to
address it. It did a good job, but it seems that we found a specific
situation where it's not enough.

This patch takes a simple approach: remove the cached value creating the
issue, and make sure we always get the current mapping for address
space, using address_space_to_dispatch(cpu->cpu_ases[asidx].as).
It's equivalent to qatomic_rcu_read(&as->current_map)->dispatch;
This is not really costly, we just need two dereferences,
including one atomic (rcu) read, which is negligible considering we are
already on mmu slow path anyway.

Note that tcg_commit is still needed, as it's taking care of flushing
TLB, removing previously mapped entries.

Another solution would be to cache directly values under the dispatch
(dispatch themselves are not ref counted), keep an active reference on
associated memory section, and release it when appropriate (tricky).
Given the time already spent debugging this area now and previously, I
strongly prefer eliminating the root of the issue, instead of adding
more complexity for a hypothetical performance gain. RCU is precisely
used to ensure good performance when reading data, so caching is not as
beneficial as it might seem IMHO.

[1] https://gitlab.com/qemu-project/qemu/-/commit/0d58c660689f6da1e3feff8a997014003d928b3b

Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3040
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Tested-by: Michael Tokarev <mjt@tls.msk.ru>
Message-ID: <20250724161142.2803091-1-pierrick.bouvier@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit 2865bf1c5795371ef441e9029bc22566ccff8299)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agohw/net/cadence_gem: fix register mask initialization
Luc Michel [Wed, 16 Jul 2025 09:53:43 +0000 (11:53 +0200)] 
hw/net/cadence_gem: fix register mask initialization

The gem_init_register_masks function was called at init time but it
relies on the num-priority-queues property. Call it at realize time
instead.

Cc: qemu-stable@nongnu.org
Fixes: 4c70e32f05f ("net: cadence_gem: Define access permission for interrupt registers")
Signed-off-by: Luc Michel <luc.michel@amd.com>
Reviewed-by: Francisco Iglesias <francisco.iglesias@amd.com>
Reviewed-by: Sai Pavan Boddu <sai.pavan.boddu@amd.com>
Message-ID: <20250716095432.81923-2-luc.michel@amd.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit 2bfcd27e00a49da2efa5d703121b94cd9cd4948b)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agotarget/mips: Only update MVPControl.EVP bit if executed by master VPE
Philippe Mathieu-Daudé [Tue, 27 Apr 2021 13:33:37 +0000 (15:33 +0200)] 
target/mips: Only update MVPControl.EVP bit if executed by master VPE

According to the 'MIPS MT Application-Specific Extension' manual:

  If the VPE executing the instruction is not a Master VPE,
  with the MVP bit of the VPEConf0 register set, the EVP bit
  is unchanged by the instruction.

Modify the DVPE/EVPE opcodes to only update the MVPControl.EVP bit
if executed on a master VPE.

Cc: qemu-stable@nongnu.org
Reported-by: Hansni Bu
Buglink: https://bugs.launchpad.net/qemu/+bug/1926277
Fixes: f249412c749 ("mips: Add MT halting and waking of VPEs")
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Message-ID: <20210427133343.159718-1-f4bug@amsat.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit e895095c78ab877d40df2dd31ee79d85757d963b)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agodocs/user: clarify user-mode expects the same OS
Alex Bennée [Fri, 25 Jul 2025 15:45:04 +0000 (16:45 +0100)] 
docs/user: clarify user-mode expects the same OS

While we somewhat cover this later when we talk about supported
operating systems make it clear in the front matter.

Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-ID: <20250725154517.3523095-2-alex.bennee@linaro.org>
(cherry picked from commit 8d6c7de1cc71207ccc047583df0c84363a5da16b)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agolinux-user/aarch64: Support TPIDR2_MAGIC signal frame record
Peter Maydell [Fri, 25 Jul 2025 17:55:09 +0000 (18:55 +0100)] 
linux-user/aarch64: Support TPIDR2_MAGIC signal frame record

FEAT_SME adds the TPIDR2 userspace-accessible system register, which
is used as part of the procedure calling standard's lazy saving
scheme for the ZA registers:
 https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#66the-za-lazy-saving-scheme

The Linux kernel has a signal frame record for saving
and restoring this value when calling signal handlers, but
we forgot to implement this. The result is that code which
tries to unwind an exception out of a signal handler will
not work correctly.

Add support for the missing record.

Cc: qemu-stable@nongnu.org
Fixes: 78011586b90d1 ("target/arm: Enable SME for user-only")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20250725175510.3864231-3-peter.maydell@linaro.org>
(cherry picked from commit 99870aff907b1c863cd32558b543f0ab0d0e74ba)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agolinux-user/aarch64: Clear TPIDR2_EL0 when delivering signals
Peter Maydell [Fri, 25 Jul 2025 17:55:08 +0000 (18:55 +0100)] 
linux-user/aarch64: Clear TPIDR2_EL0 when delivering signals

A recent change to the kernel (Linux commit b376108e1f88
"arm64/fpsimd: signal: Clear TPIDR2 when delivering signals") updated
the signal-handler entry code to always clear TPIDR2_EL0.

This is necessary for the userspace ZA lazy saving scheme to work
correctly when unwinding exceptions across a signal boundary.
(For the essay-length description of the incorrect behaviour and
why this is the correct fix, see the commit message for the
kernel commit.)

Make QEMU also clear TPIDR2_EL0 on signal entry, applying the
equivalent bugfix to our implementation.

Note that getting this unwinding to work correctly also requires
changes to the userspace code, e.g.  as implemented in gcc in
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=b5ffc8e75a8

This change is technically an ABI change; from the kernel's
point of view SME was never enabled (it was hidden behind
CONFIG_BROKEN) before the change. From QEMU's point of view
our SME-related signal handling was broken anyway as we weren't
saving and restoring TPIDR2_EL0.

Cc: qemu-stable@nongnu.org
Fixes: 78011586b90d1 ("target/arm: Enable SME for user-only")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20250725175510.3864231-2-peter.maydell@linaro.org>
(cherry picked from commit 3cdd990aa920ec8f2994b634f758dab4a86ac167)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agotarget/i386: fix width of third operand of VINSERTx128
Paolo Bonzini [Thu, 24 Jul 2025 23:10:12 +0000 (01:10 +0200)] 
target/i386: fix width of third operand of VINSERTx128

Table A-5 of the Intel manual incorrectly lists the third operand of
VINSERTx128 as Wqq, but it is actually a 128-bit value.  This is
visible when W is a memory operand close to the end of the page.

Fixes the recently-added poly1305_kunit test in linux-next.

(No testcase yet, but I plan to modify test-avx2 to use memory
close to the end of the page.  This would work because the test
vectors correctly have the memory operand as xmm2/m128).

Reported-by: Eric Biggers <ebiggers@kernel.org>
Tested-by: Eric Biggers <ebiggers@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: qemu-stable@nongnu.org
Fixes: 79068477686 ("target/i386: reimplement 0x0f 0x3a, add AVX", 2022-10-18)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit feea87cd6b645d5166bdd304aac88f47f63dc2ef)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agohw/display/qxl-render.c: fix qxl_unpack_chunks() chunk size calculation
Michael Tokarev [Fri, 21 Feb 2025 13:34:52 +0000 (16:34 +0300)] 
hw/display/qxl-render.c: fix qxl_unpack_chunks() chunk size calculation

In case of multiple chunks, code in qxl_unpack_chunks() takes size of the
wrong (next in the chain) chunk, instead of using current chunk size.
This leads to wrong number of bytes being copied, and to crashes if next
chunk size is larger than the current one.

Based on the code by Gao Yong.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1628
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit b8882becd572d3afb888c836a6ffc7f92c17d1c5)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agohost-utils: Drop workaround for buggy Apple Clang __builtin_subcll()
Peter Maydell [Mon, 21 Jul 2025 09:07:53 +0000 (10:07 +0100)] 
host-utils: Drop workaround for buggy Apple Clang __builtin_subcll()

In commit b0438861efe ("host-utils: Avoid using __builtin_subcll on
buggy versions of Apple Clang") we added a workaround for a bug in
Apple Clang 14 where its __builtin_subcll() implementation was wrong.
This bug was only present in Apple Clang 14, not in upstream clang,
and is not present in Apple Clang versions 15 and newer.

Since commit 4e035201 we have required at least Apple Clang 15, so we
no longer build with the buggy versions.  We can therefore drop the
workaround. This is effectively a revert of b0438861efe.

This should not be backported to stable branches, which may still
need to support Apple Clang 14.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3030
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250714145033.1908788-1-peter.maydell@linaro.org
(cherry picked from commit e74aad9f81cc2bfee2057086610e21bd98e9c5a5)
(Mjt: 10.0 already requres clang 15+, so it's okay to backport
 this change to 10.0 branch)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agoUpdate version for 10.0.3 release v10.0.3
Michael Tokarev [Tue, 22 Jul 2025 17:46:10 +0000 (20:46 +0300)] 
Update version for 10.0.3 release

Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agohvf: arm: Emulate ICC_RPR_EL1 accesses properly
Zenghui Yu [Mon, 14 Jul 2025 16:01:39 +0000 (00:01 +0800)] 
hvf: arm: Emulate ICC_RPR_EL1 accesses properly

Commit a2260983c655 ("hvf: arm: Add support for GICv3") added GICv3 support
by implementing emulation for a few system registers. ICC_RPR_EL1 was
defined but not plugged in the sysreg handlers (for no good reason).

Fix it.

Fixes: a2260983c655 ("hvf: arm: Add support for GICv3")
Signed-off-by: Zenghui Yu <zenghui.yu@linux.dev>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250714160139.10404-3-zenghui.yu@linux.dev
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit e6da704b711d5d731e4d933ad56cbbc25ee0a825)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agotarget/arm: Correct encoding of Debug Communications Channel registers
Peter Maydell [Mon, 21 Jul 2025 09:07:52 +0000 (10:07 +0100)] 
target/arm: Correct encoding of Debug Communications Channel registers

We don't implement the Debug Communications Channel (DCC), but
we do attempt to provide dummy versions of its system registers
so that software that tries to access them doesn't fall over.

However, we got the tx/rx register definitions wrong. These
should be:

AArch32:
  DBGDTRTX   p14 0 c0 c5 0  (on writes)
  DBGDTRRX   p14 0 c0 c5 0  (on reads)

AArch64:
  DBGDTRTX_EL0  2 3 0 5 0 (on writes)
  DBGDTRRX_EL0  2 3 0 5 0 (on reads)
  DBGDTR_EL0    2 3 0 4 0 (reads and writes)

where DBGDTRTX and DBGDTRRX are effectively different names for the
same 32-bit register, which has tx behaviour on writes and rx
behaviour on reads.  The AArch64-only DBGDTR_EL0 is a 64-bit wide
register whose top and bottom halves map to the DBGDTRRX and DBGDTRTX
registers.

Currently we have just one cpreg struct, which:
 * calls itself DBGDTR_EL0
 * uses the DBGDTRTX_EL0/DBGDTRRX_EL0 encoding
 * is marked as ARM_CP_STATE_BOTH but has the wrong opc1
   value for AArch32
 * is implemented as RAZ/WI

Correct the encoding so:
 * we name the DBGDTRTX/DBGDTRRX register correctly
 * we split it into AA64 and AA32 versions so we can get the
   AA32 encoding right
 * we implement DBGDTR_EL0 at its correct encoding

Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2986
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250708141049.778361-1-peter.maydell@linaro.org
(cherry picked from commit 655659a74a36b63e33d2dc969d3c44beb1b008b3)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agoui: fix setting client_endian field defaults
Daniel P. Berrangé [Wed, 4 Jun 2025 15:47:31 +0000 (16:47 +0100)] 
ui: fix setting client_endian field defaults

When a VNC client sends a "set pixel format" message, the
'client_endian' field will get initialized, however, it is
valid to omit this message if the client wants to use the
server's native pixel format. In the latter scenario nothing
is initializing the 'client_endian' field, so it remains set
to 0, matching neither G_LITTLE_ENDIAN nor G_BIG_ENDIAN. This
then results in pixel format conversion routines taking the
wrong code paths.

This problem existed before the 'client_be' flag was changed
into the 'client_endian' value, but the lack of initialization
meant it semantically defaulted to little endian, so only big
endian systems would potentially be exposed to incorrect pixel
translation.

The 'virt-viewer' / 'remote-viewer' apps always send a "set
pixel format" message so aren't exposed to any problems, but
the classical 'vncviewer' app will show the problem easily.

Fixes: 7ed96710e82c385c6cfc3d064eec7dde20f0f3fd
Reported-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
(cherry picked from commit 3ac6daa9e1c5d7dae2a3cd1c6a388174b462f3e8)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agohw/net/npcm_gmac.c: Send the right data for second packet in a row
Peter Maydell [Mon, 14 Jul 2025 16:55:20 +0000 (17:55 +0100)] 
hw/net/npcm_gmac.c: Send the right data for second packet in a row

The transmit loop in gmac_try_send_next_packet() is constructed in a
way that means it will send incorrect data if it it sends more than
one packet.

The function assembles the outbound data in a dynamically allocated
block of memory which is pointed to by tx_send_buffer.  We track the
first point in this block of memory which is not yet used with the
prev_buf_size offset, initially zero.  We track the size of the
packet we're sending with the length variable, also initially zero.

As we read chunks of data out of guest memory, we write them to
tx_send_buffer[prev_buf_size], and then increment both prev_buf_size
and length.  (We might dynamically reallocate the buffer if needed.)

When we send a packet, we checksum and send length bytes, starting at
tx_send_buffer, and then we reset length to 0.  This gives the right
data for the first packet.  But we don't reset prev_buf_size.  This
means that if we process more descriptors with further data for the
next packet, that data will continue to accumulate at offset
prev_buf_size, i.e.  after the data for the first packet.  But when
we transmit that second packet, we send length bytes from
tx_send_buffer, so we will send a packet which has the length of the
second packet but the data of the first one.

The fix for this is to also clear prev_buf_size after the packet has
been sent -- we never need the data from packet one after we've sent
it, so we can write packet two's data starting at the beginning of
the buffer.

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit 871a6e5b339f0b5e71925ec7d3f452944a1c82d3)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agotarget/i386: do not expose ARCH_CAPABILITIES on AMD CPU
Paolo Bonzini [Mon, 14 Jul 2025 08:19:36 +0000 (10:19 +0200)] 
target/i386: do not expose ARCH_CAPABILITIES on AMD CPU

KVM emulates the ARCH_CAPABILITIES on x86 for both Intel and AMD
cpus, although the IA32_ARCH_CAPABILITIES MSR is an Intel-specific
MSR and it makes no sense to emulate it on AMD.

As a consequence, VMs created on AMD with qemu -cpu host and using
KVM will advertise the ARCH_CAPABILITIES feature and provide the
IA32_ARCH_CAPABILITIES MSR. This can cause issues (like Windows BSOD)
as the guest OS might not expect this MSR to exist on such cpus (the
AMD documentation specifies that ARCH_CAPABILITIES feature and MSR
are not defined on the AMD architecture).

A fix was proposed in KVM code, however KVM maintainers don't want to
change this behavior that exists for 6+ years and suggest changes to be
done in QEMU instead.  Therefore, hide the bit from "-cpu host":
migration of -cpu host guests is only possible between identical host
kernel and QEMU versions, therefore this is not a problematic breakage.

If a future AMD machine does include the MSR, that would re-expose the
Windows guest bug; but it would not be KVM/QEMU's problem at that
point, as we'd be following a genuine physical CPU impl.

Reported-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit d3a24134e37d57abd3e7445842cda2717f49e96d)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agoi386/cpu: Honor maximum value for CPUID.8000001DH.EAX[25:14]
Zhao Liu [Mon, 14 Jul 2025 08:08:59 +0000 (16:08 +0800)] 
i386/cpu: Honor maximum value for CPUID.8000001DH.EAX[25:14]

CPUID.8000001DH:EAX[25:14] is "NumSharingCache", and the number of
logical processors sharing this cache is the value of this field
incremented by 1. Because of its width limitation, the maximum value
currently supported is 4095.

Though at present Q35 supports up to 4096 CPUs, by constructing a
specific topology, the width of the APIC ID can be extended beyond 12
bits. For example, using `-smp threads=33,cores=9,modules=9` results in
a die level offset of 6 + 4 + 4 = 14 bits, which can also cause
overflow. Check and honor the maximum value as CPUID.04H did.

Cc: Babu Moger <babu.moger@amd.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250714080859.1960104-8-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 5d21ee453ad8e3f95f75e542cb3b35c5bb7cf23a)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agoi386/cpu: Fix overflow of cache topology fields in CPUID.04H
Qian Wen [Mon, 14 Jul 2025 08:08:58 +0000 (16:08 +0800)] 
i386/cpu: Fix overflow of cache topology fields in CPUID.04H

According to SDM, CPUID.0x4:EAX[31:26] indicates the Maximum number of
addressable IDs for processor cores in the physical package. If we
launch over 64 cores VM, the 6-bit field will overflow, and the wrong
core_id number will be reported.

Since the HW reports 0x3f when the intel processor has over 64 cores,
limit the max value written to EAX[31:26] to 63, so max num_cores should
be 64.

For EAX[14:25], though at present Q35 supports up to 4096 CPUs, by
constructing a specific topology, the width of the APIC ID can be
extended beyond 12 bits. For example, using `-smp threads=33,cores=9,
modules=9` results in a die level offset of 6 + 4 + 4 = 14 bits, which
can also cause overflow.  check and honor the maximum value for
EAX[14:25] as well.

In addition, for host-cache-info case, also apply the same checks and
fixes.

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Qian Wen <qian.wen@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250714080859.1960104-7-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 3e86124e7cb9b66e07fb992667865a308f16fcf2)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agoi386/cpu: Fix cpu number overflow in CPUID.01H.EBX[23:16]
Michael Tokarev [Thu, 17 Jul 2025 03:23:26 +0000 (06:23 +0300)] 
i386/cpu: Fix cpu number overflow in CPUID.01H.EBX[23:16]

The legacy topology enumerated by CPUID.1.EBX[23:16] is defined in SDM
Vol2:

Bits 23-16: Maximum number of addressable IDs for logical processors in
this physical package.

When threads_per_socket > 255, it will 1) overwrite bits[31:24] which is
apic_id, 2) bits [23:16] get truncated.

Specifically, if launching the VM with -smp 256, the value written to
EBX[23:16] is 0 because of data overflow. If the guest only supports
legacy topology, without V2 Extended Topology enumerated by CPUID.0x1f
or Extended Topology enumerated by CPUID.0x0b to support over 255 CPUs,
the return of the kernel invoking cpu_smt_allowed() is false and APs
(application processors) will fail to bring up. Then only CPU 0 is online,
and others are offline.

For example, launch VM via:
qemu-system-x86_64 -M q35,accel=kvm,kernel-irqchip=split \
    -cpu qemu64,cpuid-0xb=off -smp 256 -m 32G \
    -drive file=guest.img,if=none,id=virtio-disk0,format=raw \
    -device virtio-blk-pci,drive=virtio-disk0,bootindex=1 --nographic

The guest shows:
    CPU(s):               256
    On-line CPU(s) list:  0
    Off-line CPU(s) list: 1-255

To avoid this issue caused by overflow, limit the max value written to
EBX[23:16] to 255 as the HW does.

Cc: qemu-stable@nongnu.org
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Qian Wen <qian.wen@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250714080859.1960104-6-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit a62fef58299562aae6667b8d8552247423e886b3)
(Mjt: fixup for 10.0.x series due to missing v10.0.0-2217-gf985a1195b
 "i386/cpu: Fix number of addressable IDs field for CPUID.01H.EBX[23:16]")
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agoui/vnc: Do not copy z_stream
Akihiko Odaki [Tue, 3 Jun 2025 09:18:28 +0000 (18:18 +0900)] 
ui/vnc: Do not copy z_stream

vnc_worker_thread_loop() copies z_stream stored in its local VncState to
the persistent VncState, and the copied one is freed with deflateEnd()
later. However, deflateEnd() refuses to operate with a copied z_stream
and returns Z_STREAM_ERROR, leaking the allocated memory.

Avoid copying the zlib state to fix the memory leak.

Fixes: bd023f953e5e ("vnc: threaded VNC server")
Signed-off-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20250603-zlib-v3-1-20b857bd8d05@rsg.ci.i.u-tokyo.ac.jp>
(cherry picked from commit aef22331b5a4670f42638a5f63a26e93bf779aae)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agovhost: Fix used memslot tracking when destroying a vhost device
David Hildenbrand [Tue, 3 Jun 2025 11:13:36 +0000 (13:13 +0200)] 
vhost: Fix used memslot tracking when destroying a vhost device

When we unplug a vhost device, we end up calling vhost_dev_cleanup()
where we do a memory_listener_unregister().

This memory_listener_unregister() call will end up disconnecting the
listener from the address space through listener_del_address_space().

In that process, we effectively communicate the removal of all memory
regions from that listener, resulting in region_del() + commit()
callbacks getting triggered.

So in case of vhost, we end up calling vhost_commit() with no remaining
memory slots (0).

In vhost_commit() we end up overwriting the global variables
used_memslots / used_shared_memslots, used for detecting the number
of free memslots. With used_memslots / used_shared_memslots set to 0
by vhost_commit() during device removal, we'll later assume that the
other vhost devices still have plenty of memslots left when calling
vhost_get_free_memslots().

Let's fix it by simply removing the global variables and depending
only on the actual per-device count.

Easy to reproduce by adding two vhost-user devices to a VM and then
hot-unplugging one of them.

While at it, detect unexpected underflows in vhost_get_free_memslots()
and issue a warning.

Reported-by: yuanminghao <yuanmh12@chinatelecom.cn>
Link: https://lore.kernel.org/qemu-devel/20241121060755.164310-1-yuanmh12@chinatelecom.cn/
Fixes: 2ce68e4cf5be ("vhost: add vhost_has_free_slot() interface")
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20250603111336.1858888-1-david@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 9f749129e2629b19f424df106c92c5a5647e396c)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agoroms: re-remove execute bit from hppa-firmware*
Cole Robinson [Sun, 18 May 2025 17:54:20 +0000 (13:54 -0400)] 
roms: re-remove execute bit from hppa-firmware*

This was fixed in c9d77526bddba0803a1fa982fb59ec98057150f9 for
9.2.0 but regressed in db34be329162cf6b06192703065e6c1010dbe3c5 in
10.0.0

When the bit is present, rpmbuild complains about missing ELF build-id

Signed-off-by: Cole Robinson <crobinso@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Acked-by: Helge Deller <deller@gmx.de>
Message-ID: <52d0edfbb9b2f63a866f0065a721f3a95da6f8ba.1747590860.git.crobinso@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit a598090ebaeb930ce33c2df0d80d87da13be8848)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agofile-posix: Fix aio=threads performance regression after enablign FUA
Kevin Wolf [Wed, 25 Jun 2025 08:50:19 +0000 (10:50 +0200)] 
file-posix: Fix aio=threads performance regression after enablign FUA

For aio=threads, we're currently not implementing REQ_FUA in any useful
way, but just do a separate raw_co_flush_to_disk() call. This changes
behaviour compared to the old state, which used bdrv_co_flush() with its
optimisations. As a quick fix, call bdrv_co_flush() again like before.
Eventually, we can use pwritev2() to make use of RWF_DSYNC if available,
but we'll still have to keep this code path as a fallback, so this fix
is required either way.

While the fix itself is a one-liner, some new graph locking annotations
are needed to convince TSA that the locking is correct.

Cc: qemu-stable@nongnu.org
Fixes: 984a32f17e8d ("file-posix: Support FUA writes")
Buglink: https://issues.redhat.com/browse/RHEL-96854
Reported-by: Tingting Mao <timao@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250625085019.27735-1-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit d402da1360c2240e81f0e5fc80ddbfc6238e0da8)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agoamd_iommu: Fix truncation of oldval in amdvi_writeq
Ethan Milon [Tue, 17 Jun 2025 15:04:27 +0000 (15:04 +0000)] 
amd_iommu: Fix truncation of oldval in amdvi_writeq

The variable `oldval` was incorrectly declared as a 32-bit `uint32_t`.
This could lead to truncation and incorrect behavior where the upper
read-only 32 bits are significant.

Fix the type of `oldval` to match the return type of `ldq_le_p()`.

Cc: qemu-stable@nongnu.org
Fixes: d29a09ca6842 ("hw/i386: Introduce AMD IOMMU")
Signed-off-by: Ethan Milon <ethan.milon@eviden.com>
Message-Id: <20250617150427.20585-9-alejandro.j.jimenez@oracle.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 5788929e05e18ed5f76dc8ade4210f022c9ba5a1)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agoamd_iommu: Remove duplicated definitions
Alejandro Jimenez [Tue, 17 Jun 2025 15:04:26 +0000 (15:04 +0000)] 
amd_iommu: Remove duplicated definitions

No functional change.

Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Message-Id: <20250617150427.20585-8-alejandro.j.jimenez@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 5959b641c98b5ae9677e2c1d89902dac31b344d9)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agoamd_iommu: Fix the calculation for Device Table size
Alejandro Jimenez [Tue, 17 Jun 2025 15:04:25 +0000 (15:04 +0000)] 
amd_iommu: Fix the calculation for Device Table size

Correctly calculate the Device Table size using the format encoded in the
Device Table Base Address Register (MMIO Offset 0000h).

Cc: qemu-stable@nongnu.org
Fixes: d29a09ca6842 ("hw/i386: Introduce AMD IOMMU")
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Message-Id: <20250617150427.20585-7-alejandro.j.jimenez@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 67d3077ee403472d45794399e97c9f329242fce9)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agoamd_iommu: Fix mask to retrieve Interrupt Table Root Pointer from DTE
Alejandro Jimenez [Tue, 17 Jun 2025 15:04:24 +0000 (15:04 +0000)] 
amd_iommu: Fix mask to retrieve Interrupt Table Root Pointer from DTE

Fix an off-by-one error in the definition of AMDVI_IR_PHYS_ADDR_MASK. The
current definition masks off the most significant bit of the Interrupt Table
Root ptr i.e. it only generates a mask with bits [50:6] set. See the AMD I/O
Virtualization Technology (IOMMU) Specification for the Interrupt Table
Root Pointer[51:6] field in the Device Table Entry format.

Cc: qemu-stable@nongnu.org
Fixes: b44159fe0078 ("x86_iommu/amd: Add interrupt remap support when VAPIC is not enabled")
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Message-Id: <20250617150427.20585-6-alejandro.j.jimenez@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 123cf4bdd378f746dfa2f5415ba084148dded3e3)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agoamd_iommu: Fix masks for various IOMMU MMIO Registers
Alejandro Jimenez [Tue, 17 Jun 2025 15:04:23 +0000 (15:04 +0000)] 
amd_iommu: Fix masks for various IOMMU MMIO Registers

Address various issues with definitions of the MMIO registers e.g. for the
Device Table Address Register, the size mask currently encompasses reserved
bits [11:9], so change it to only extract the bits [8:0] encoding size.

Convert masks to use GENMASK64 for consistency, and make unrelated
definitions independent.

Cc: qemu-stable@nongnu.org
Fixes: d29a09ca6842 ("hw/i386: Introduce AMD IOMMU")
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Message-Id: <20250617150427.20585-5-alejandro.j.jimenez@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 108e10ff69099c3ebe147f505246be7c2ad2a499)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agoamd_iommu: Update bitmasks representing DTE reserved fields
Alejandro Jimenez [Tue, 17 Jun 2025 15:04:22 +0000 (15:04 +0000)] 
amd_iommu: Update bitmasks representing DTE reserved fields

The DTE validation method verifies that all bits in reserved DTE fields are
unset. Update them according to the latest definition available in AMD I/O
Virtualization Technology (IOMMU) Specification - Section 2.2.2.1 Device
Table Entry Format. Remove the magic numbers and use a macro helper to
generate bitmasks covering the specified ranges for better legibility.

Note that some reserved fields specify that events are generated when they
contain non-zero bits, or checks are skipped under certain configurations.
This change only updates the reserved masks, checks for special conditions
are not yet implemented.

Cc: qemu-stable@nongnu.org
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Message-Id: <20250617150427.20585-4-alejandro.j.jimenez@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit ff3dcb3bf652912466dcc1cd10d3267f185c212e)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agoamd_iommu: Fix Device ID decoding for INVALIDATE_IOTLB_PAGES command
Alejandro Jimenez [Tue, 17 Jun 2025 15:04:21 +0000 (15:04 +0000)] 
amd_iommu: Fix Device ID decoding for INVALIDATE_IOTLB_PAGES command

The DeviceID bits are extracted using an incorrect offset in the call to
amdvi_iotlb_remove_page(). This field is read (correctly) earlier, so use
the value already retrieved for devid.

Cc: qemu-stable@nongnu.org
Fixes: d29a09ca6842 ("hw/i386: Introduce AMD IOMMU")
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Message-Id: <20250617150427.20585-3-alejandro.j.jimenez@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit c63b8d1425ba8b3b08ee4f7346457fd8a7f12a24)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agoamd_iommu: Fix Miscellaneous Information Register 0 encoding
Alejandro Jimenez [Tue, 17 Jun 2025 15:04:20 +0000 (15:04 +0000)] 
amd_iommu: Fix Miscellaneous Information Register 0 encoding

The definitions encoding the maximum Virtual, Physical, and Guest Virtual
Address sizes supported by the IOMMU are using incorrect offsets i.e. the
VASize and GVASize offsets are switched. The value in the GVAsize field is
also modified, since it was incorrectly encoded.

Cc: qemu-stable@nongnu.org
Fixes: d29a09ca6842 ("hw/i386: Introduce AMD IOMMU")
Co-developed-by: Ethan MILON <ethan.milon@eviden.com>
Signed-off-by: Ethan MILON <ethan.milon@eviden.com>
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Message-Id: <20250617150427.20585-2-alejandro.j.jimenez@oracle.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 091c7d7924f33781c2fb8e7297dc54971e0c3785)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agovirtio-net: Add queues for RSS during migration
Akihiko Odaki [Fri, 30 May 2025 05:18:53 +0000 (14:18 +0900)] 
virtio-net: Add queues for RSS during migration

virtio_net_pre_load_queues() inspects vdev->guest_features to tell if
VIRTIO_NET_F_RSS or VIRTIO_NET_F_MQ is enabled to infer the required
number of queues. This works for VIRTIO_NET_F_MQ but it doesn't for
VIRTIO_NET_F_RSS because only the lowest 32 bits of vdev->guest_features
is set at the point and VIRTIO_NET_F_RSS uses bit 60 while
VIRTIO_NET_F_MQ uses bit 22.

Instead of inferring the required number of queues from
vdev->guest_features, use the number loaded from the vm state. This
change also has a nice side effect to remove a duplicate peer queue
pair change by circumventing virtio_net_set_multiqueue().

Also update the comment in include/hw/virtio/virtio.h to prevent an
implementation of pre_load_queues() from refering to any fields being
loaded during migration by accident in the future.

Fixes: 8c49756825da ("virtio-net: Add only one queue pair when realizing")
Tested-by: Lei Yang <leiyang@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit adda0ad56bd28d5a809051cbd190fda5798ec4e4)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agonet: fix buffer overflow in af_xdp_umem_create()
Anastasia Belova [Mon, 2 Jun 2025 08:57:17 +0000 (11:57 +0300)] 
net: fix buffer overflow in af_xdp_umem_create()

s->pool has n_descs elements so maximum i should be
n_descs - 1. Fix the upper bound.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: cb039ef3d9 ("net: add initial support for AF_XDP network backend")
Cc: qemu-stable@nongnu.org
Reviewed-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Anastasia Belova <nabelova31@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit 110d0fa2d4d1f754242f6775baec43776a9adb35)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agoaccel/kvm: Adjust the note about the minimum required kernel version
Thomas Huth [Wed, 2 Jul 2025 06:03:19 +0000 (08:03 +0200)] 
accel/kvm: Adjust the note about the minimum required kernel version

Since commit 126e7f78036 ("kvm: require KVM_CAP_IOEVENTFD and
KVM_CAP_IOEVENTFD_ANY_LENGTH") we require at least kernel 4.5 to
be able to use KVM. Adjust the upgrade_note accordingly.
While we're at it, remove the text about kvm-kmod and the
SourceForge URL since this is not actively maintained anymore.

Fixes: 126e7f78036 ("kvm: require KVM_CAP_IOEVENTFD and KVM_CAP_IOEVENTFD_ANY_LENGTH")
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(cherry picked from commit f180e367fce44b336105a11a62edf9610b6b2a06)

3 months agolinux-user: Use qemu_set_cloexec() to mark pidfd as FD_CLOEXEC
Peter Maydell [Fri, 11 Jul 2025 14:12:17 +0000 (15:12 +0100)] 
linux-user: Use qemu_set_cloexec() to mark pidfd as FD_CLOEXEC

In the linux-user do_fork() function we try to set the FD_CLOEXEC
flag on a pidfd like this:

    fcntl(pid_fd, F_SETFD, fcntl(pid_fd, F_GETFL) | FD_CLOEXEC);

This has two problems:
 (1) it doesn't check errors, which Coverity complains about
 (2) we use F_GETFL when we mean F_GETFD

Deal with both of these problems by using qemu_set_cloexec() instead.
That function will assert() if the fcntls fail, which is fine (we are
inside fork_start()/fork_end() so we know nothing can mess around
with our file descriptors here, and we just got this one from
pidfd_open()).

(As we are touching the if() statement here, we correct the
indentation.)

Coverity: CID 1508111
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20250711141217.1429412-1-peter.maydell@linaro.org>
(cherry picked from commit d6390204c61e148488f034d1f79be35cd3318d93)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agomigration: Don't sync volatile memory after migration completes
Chaney, Ben [Mon, 16 Jun 2025 20:56:50 +0000 (20:56 +0000)] 
migration: Don't sync volatile memory after migration completes

Syncing volatile memory provides no benefit, instead it can cause
performance issues in some cases.  Only sync memory that is marked as
non-volatile after migration completes on destination.

Signed-off-by: Ben Chaney <bchaney@akamai.com>
Fixes: bd108a44bc29 (migration: ram: Switch to ram block writeback)
Link: https://lore.kernel.org/r/1CC43F59-336F-4A12-84AD-DB89E0A17A95@akamai.com
Signed-off-by: Peter Xu <peterx@redhat.com>
(cherry picked from commit 983899eab4939dc4dff67fa4d822c5b4df7eae21)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agolinux-user: Hold the fd-trans lock across fork
Geoffrey Thomas [Fri, 14 Mar 2025 12:47:42 +0000 (08:47 -0400)] 
linux-user: Hold the fd-trans lock across fork

If another thread is holding target_fd_trans_lock during a fork,
then the lock becomes permanently locked in the child and the
emulator deadlocks at the next interaction with the fd-trans table.
As with other locks, acquire the lock in fork_start() and release
it in fork_end().

Cc: qemu-stable@nongnu.org
Signed-off-by: Geoffrey Thomas <geofft@ldpreload.com>
Fixes: c093364f4d91 "fd-trans: Fix race condition on reallocation of the translation table."
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2846
Buglink: https://github.com/astral-sh/uv/issues/6105
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20250314124742.4965-1-geofft@ldpreload.com>
(cherry picked from commit e4e839b2eeea5745c48ce47144c7842eb7cd455f)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agolinux-user: Check for EFAULT failure in nanosleep
Peter Maydell [Thu, 10 Jul 2025 16:43:54 +0000 (17:43 +0100)] 
linux-user: Check for EFAULT failure in nanosleep

target_to_host_timespec() returns an error if the memory the guest
passed us isn't actually readable.  We check for this everywhere
except the callsite in the TARGET_NR_nanosleep case, so this mistake
was caught by a Coverity heuristic.

Add the missing error checks to the calls that convert between the
host and target timespec structs.

Coverity: CID 1507104
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20250710164355.1296648-1-peter.maydell@linaro.org>
(cherry picked from commit c4828cb8502d0b2adc39b9cde93df7d2886df897)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agolinux-user: Implement fchmodat2 syscall
Peter Maydell [Thu, 10 Jul 2025 11:31:23 +0000 (12:31 +0100)] 
linux-user: Implement fchmodat2 syscall

The fchmodat2 syscall is new from Linux 6.6; it is like the
existing fchmodat syscall except that it takes a flags parameter.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3019
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20250710113123.1109461-1-peter.maydell@linaro.org>
(cherry picked from commit 6a3e132a1be8c9e649967a4eb341d00731be7f51)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
3 months agohw/arm/fsl-imx8mp: Wire VIRQ and VFIQ
Bernhard Beschow [Sun, 29 Jun 2025 20:48:50 +0000 (22:48 +0200)] 
hw/arm/fsl-imx8mp: Wire VIRQ and VFIQ

Allows to run KVM guests inside the imx8mp-evk machine.

Fixes: a4eefc69b237 ("hw/arm: Add i.MX 8M Plus EVK board")
CC: qemu-stable
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit 930180f3b9a292639eb894f1ca846683834ed4b7)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>