git.ipfire.org Git - thirdparty/qemu.git/log

target/arm: Make DisasContext.{fp, sve}_access_checked tristate

The check for fp_excp_el in assert_fp_access_checked is
incorrect. For SME, with StreamingMode enabled, the access
is really against the streaming mode vectors, and access
to the normal fp registers is allowed to be disabled.
C.f. sme_enabled_check.

Convert sve_access_checked to match, even though we don't
currently check the exception state.

Cc: qemu-stable@nongnu.org
Fixes: 3d74825f4d6 ("target/arm: Add SME enablement checks")
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250307190415.982049-2-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

util/cacheflush: Make first DSB unconditional on aarch64

On ARM hosts with CTR_EL0.DIC and CTR_EL0.IDC set, this would only cause
an ISB to be executed during cache maintenance, which could lead to QEMU
executing TBs containing garbage instructions.

This seems to be because the ISB finishes executing instructions and
flushes the pipeline, but the ISB doesn't guarantee that writes from the
executed instructions are committed. If a small enough TB is created, it's
possible that the writes setting up the TB aren't committed by the time the
TB is executed.

This function is intended to be a port of the gcc implementation
(https://github.com/gcc-mirror/gcc/blob/85b46d0795ac76bc192cb8f88b646a647acf98c1/libgcc/config/aarch64/sync-cache.c#L67)
which makes the first DSB unconditional, so we can fix the synchronization
issue by doing that as well.

Cc: qemu-stable@nongnu.org
Fixes: 664a79735e4deb1 ("util: Specialize flush_idcache_range for aarch64")
Signed-off-by: Joe Komlodi <komlodi@google.com>
Message-id: 20250310203622.1827940-2-komlodi@google.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Revert "hw/char/pl011: Warn when using disabled receiver"

The guest does not control whether characters are sent on the UART.
Sending them before the guest happens to boot will now result in a
"guest error" log entry that is only because of timing, even if the
guest _would_ later setup the receiver correctly.

This reverts the bulk of commit abf2b6a028670bd2890bb3aee7e103fe53e4b0df,
and instead adds a comment about why we don't check the enable bits.

Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
Cc: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20250311153717.206129-1-pbonzini@redhat.com
[PMM: expanded comment]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

tests/functional: Bump up arm_replay timeout

On my machine the arm_replay test takes over 2 minutes to run
in a config with Rust enabled and debug enabled:

$ time (cd build/rust ; PYTHONPATH=../../python:../../tests/functional
QEMU_TEST_QEMU_BINARY=./qemu-system-arm ./pyvenv/bin/python3
../../tests/functional/test_arm_replay.py)
TAP version 13
ok 1 test_arm_replay.ArmReplay.test_cubieboard
ok 2 test_arm_replay.ArmReplay.test_vexpressa9
ok 3 test_arm_replay.ArmReplay.test_virt
1..3

real    2m16.564s
user    2m13.461s
sys     0m3.523s

Bump up the timeout to 4 minutes.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-id: 20250310102830.3752440-1-peter.maydell@linaro.org

MAINTAINERS: Fix status for Arm boards I "maintain"

I'm down as the only listed maintainer for quite a lot of Arm SoC and
board types. In some cases this is only as the "maintainer of last
resort" and I'm not in practice doing anything beyond patch review
and the odd bit of tidyup.

Move these entries in MAINTAINERS from "Maintained" to "Odd Fixes",
to better represent reality. Entries for other boards and SoCs where
I do more actively care (or where there is a listed co-maintainer)
remain as they are.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250307152838.3226398-1-peter.maydell@linaro.org

target/arm: Forbid return to AArch32 when CPU is AArch64-only

In the Arm ARM, rule R_TYTWB states that returning to AArch32
is an illegal exception return if:
* AArch32 is not supported at any exception level
* the target EL is configured for AArch64 via SCR_EL3.RW
or HCR_EL2.RW or via CPU state at reset

We check the second of these, but not the first (which can only be
relevant for the case of a return to EL0, because if AArch32 is not
supported at one of the higher ELs then the RW bits will have an
effective value of 1 and the the "configured for AArch64" condition
will hold also).

Add the missing condition. Although this is technically a bug
(because we have one AArch64-only CPU: a64fx) it isn't worth
backporting to stable because no sensible guest code will
deliberately try to return to a nonexistent execution state
to check that it gets an illegal exception return.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

target/arm: Add cpu local variable to exception_return helper

We already call env_archcpu() multiple times within the
exception_return helper function, and we're about to want to
add another use of the ARMCPU pointer. Add a local variable
cpu so we can call env_archcpu() just once.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

target/arm: HCR_EL2.RW should be RAO/WI if EL1 doesn't support AArch32

When EL1 doesn't support AArch32, the HCR_EL2.RW bit is supposed to
be RAO/WI. Enforce the RAO/WI behaviour.

Note that we handle "reset value should honour RES1 bits" in the same
way that SCR_EL3 does, via a reset function.

We do already have some CPU types which don't implement AArch32
above EL0, so this is technically a bug; it doesn't seem worth
backporting to stable because no sensible guest code will be
deliberately attempting to set the RW bit to a value corresponding
to an unimplemented execution state and then checking that we
did the right thing.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

target/arm: SCR_EL3.RW should be treated as 1 if EL2 doesn't support AArch32

The definition of SCR_EL3.RW says that its effective value is 1 if:
- EL2 is implemented and does not support AArch32, and SCR_EL3.NS is 1
- the effective value of SCR_EL3.{EEL2,NS} is {1,0} (i.e. we are
Secure and Secure EL2 is disabled)

We implement the second of these in arm_el_is_aa64(), but forgot the
first.

Provide a new function arm_scr_rw_eff() to return the effective
value of SCR_EL3.RW, and use it in arm_el_is_aa64() and the other
places that currently look directly at the bit value.

(scr_write() enforces that the RW bit is RAO/WI if neither EL1 nor
EL2 have AArch32 support, but if EL1 does but EL2 does not then the
bit must still be writeable.)

This will mean that if code at EL3 attempts to perform an exception
return to AArch32 EL2 when EL2 is AArch64-only we will correctly
handle this as an illegal exception return: it will be caught by the
"return to an EL which is configured for a different register width"
check in HELPER(exception_return).

We do already have some CPU types which don't implement AArch32
above EL0, so this is technically a bug; it doesn't seem worth
backporting to stable because no sensible guest code will be
deliberately attempting to set the RW bit to a value corresponding
to an unimplemented execution state and then checking that we
did the right thing.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

target/arm: Move arm_current_el() and arm_el_is_aa64() to internals.h

The functions arm_current_el() and arm_el_is_aa64() are used only in
target/arm and in hw/intc/arm_gicv3_cpuif.c. They're functions that
query internal state of the CPU. Move them out of cpu.h and into
internals.h.

This means we need to include internals.h in arm_gicv3_cpuif.c, but
this is justifiable because that file is implementing the GICv3 CPU
interface, which really is part of the CPU proper; we just ended up
implementing it in code in hw/intc/ for historical reasons.

The motivation for this move is that we'd like to change
arm_el_is_aa64() to add a condition that uses cpu_isar_feature();
but we don't want to include cpu-features.h in cpu.h.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

target/arm: Move arm_cpu_data_is_big_endian() etc to internals.h

The arm_cpu_data_is_big_endian() and related functions are now used
only in target/arm; they can be moved to internals.h.

The motivation here is that we would like to move arm_current_el()
to internals.h.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

linux-user/arm: Remove unused get_put_user macros

In linux-user/arm/cpu_loop.c we define a full set of get/put
macros for both code and data (since the endianness handling
is different between the two). However the only one we actually
use is get_user_code_u32(). Remove the rest.

We leave a comment noting how data-side accesses should be handled
for big-endian, because that's a subtle point and we just removed the
macros that were effectively documenting it.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

linux-user/aarch64: Remove unused get/put_user macros

At the top of linux-user/aarch64/cpu_loop.c we define a set of
macros for reading and writing data and code words, but we never
use these macros. Delete them.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

target/arm: Un-inline access_secure_reg()

We would like to move arm_el_is_aa64() to internals.h; however, it is
used by access_secure_reg(). Make that function not be inline, so
that it can stay in cpu.h.

access_secure_reg() is used only in two places:
* in hflags.c
* in the user-mode arm emulators, to decide whether to store
the TLS value in the secure or non-secure banked field

The second of these is not on a super-hot path that would care about
the inlining (and incidentally will always use the NS banked field
because our user-mode CPUs never set ARM_FEATURE_EL3); put the
definition of access_secure_reg() in hflags.c, near its only use
inside target/arm.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

target/arm: Move A32_BANKED_REG_{GET,SET} macros to cpregs.h

The A32_BANKED_REG_{GET,SET} macros are only used inside target/arm;
move their definitions to cpregs.h. There's no need to have them
defined in all the code that includes cpu.h.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

Merge tag 'for-upstream' of https://repo.or.cz/qemu/kevin into staging

Block layer patches

- virtio-scsi: add iothread-vq-mapping parameter
- Improve writethrough performance
- Fix missing zero init in bdrv_snapshot_goto()
- Added scripts/qcow2-to-stdout.py
- Code cleanup and iotests fixes

# -----BEGIN PGP SIGNATURE-----
#
# iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmfTDysRHGt3b2xmQHJl
# ZGhhdC5jb20ACgkQfwmycsiPL9Yz6A//asOl37zjbtf9pYjY/gliH859TQOppPGD
# LB9IIr+nTDME0wfUkCOlag+CeEYZwkeo2PF+XeopsyzlJeBOk4tL7AkY57XYe3lZ
# M5hlnNrn6l3gb6iioMg60pEKSMrpKprB16vT3nAtyN6aEXsm9TvtPkWPFTCFGVeK
# W74VCr7wuXbfdEJcOGd8WhB9ZHIgwoWYnoL41tvCoefW2yNaMA6X0TLn98toXzOi
# il50ZnnchTQngns5R+n+1R1Ma995t393D+CArQcYVRzxKGOs5p0y4otz4gCkMhdp
# GVL09R7Ge4TteSJ2myxlN/EjYOxmdoMrVDajr4xPdHBw12MKzgk8i82h4/Es/Q5o
# 3Npgx74+jDyqlICb/czTVM5KJINpyO80vO3N3WpYUOQGyTCcYgv7pIpy8pB2o6Te
# RPlv0W9bHVSSgThFFLQ0Ud8WRGJe1K/ar8bdmiWN08Wez1avENWaYmsv5zGnFL24
# vD6cNXMR4mF7mzyeWda/5hGKv75djVgX+ZfzvWNT3qgizD56JBOA3RdCRwBZJOJb
# TvJkfi5RGyaji9BfKVCYBL3/iDELJEVDW8jxvIIUrS0aPcTHpAQ5gTO7VAokreqZ
# 5Smll11eeoEgPPvNLw8ikmOGTWOMkJGrmExP2K1ApANq3kSbBSU4jroEr0BG9PZT
# 6Y0hUdtFSdU=
# =w2Ri
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 14 Mar 2025 01:00:27 HKT
# gpg:                using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6
# gpg:                issuer "kwolf@redhat.com"
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* tag 'for-upstream' of https://repo.or.cz/qemu/kevin: (23 commits)
  scripts/qcow2-to-stdout.py: Add script to write qcow2 images to stdout
  virtio-scsi: only expose cmd vqs via iothread-vq-mapping
  virtio-scsi: handle ctrl virtqueue in main loop
  virtio-scsi: add iothread-vq-mapping parameter
  virtio: extract iothread-vq-mapping.h API
  virtio-blk: tidy up iothread_vq_mapping functions
  virtio-blk: extract cleanup_iothread_vq_mapping() function
  virtio-scsi: perform TMFs in appropriate AioContexts
  virtio-scsi: protect events_dropped field
  virtio-scsi: introduce event and ctrl virtqueue locks
  scsi: introduce requests_lock
  scsi: track per-SCSIRequest AioContext
  dma: use current AioContext for dma_blk_io()
  scsi-disk: drop unused SCSIDiskState->bh field
  iotests: Limit qsd-migrate to working formats
  aio-posix: Adjust polling time also for new handlers
  aio-posix: Separate AioPolledEvent per AioHandler
  aio-posix: Factor out adjust_polling_time()
  aio: Create AioPolledEvent
  block/io: Ignore FUA with cache.no-flush=on
  ...

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

Merge tag 'pull-request-2025-03-13' of https://gitlab.com/thuth/qemu into staging

* Various fixes for functional tests
* Fix the name of the "configs" directory in the documentation

# -----BEGIN PGP SIGNATURE-----
#
# iQJFBAABCAAvFiEEJ7iIR+7gJQEY8+q5LtnXdP5wLbUFAmfSjagRHHRodXRoQHJl
# ZGhhdC5jb20ACgkQLtnXdP5wLbWBmA//RhAHuF/fTmQagBsZPETXjU1g8ifw9aqm
# WPZcQEXyQFlqYYQZmtV7dk3aTGEw4kBDmm+SKTSQz1yUcBGptMl8xuWaxgdpcOw0
# Bqt+lYNgwGL9/OocCdNolU3+aVbETljr5l+rzbnwsTVIqGk63Qhmtwdupb8h1nfY
# 4vCXU+sY3BkvBF8HbV6Wb1aPtqC+iH/Ln8+yoKkC8UePD623dK58SsOVrhUQDfFr
# U/HUy4BZlHFCfGGmDVGBjHdEbOzQkLQ9N3ilsNSWcF87RPkWPft+qLs4RjDFW+oT
# oksXEFHcr8XQO03fwHBNTyv+NUfnrvDY8V+gl6C9ItQr58SZzse57caZKWrYppZ3
# l5iHoaLMV3juZFDNXNHkWHuveXi05+0V0UbZihzBeC4+zjNRyh3e1GuDoh5VoG8o
# XIb55RxU8eBG2/ulHZ71eAYrGpxO+tDdsdnak1coPFsU8HrC9QzRfywiAZe1Wwmx
# 5t5AHbZ7RdnxgStU1lWTUT2IDVSini4DKevt/FzhKkv1aD8NbhI/ooGDC0zbS6SU
# XK6PP2G5a5OnjQ904oRCQbnhrxFa5qNfryylvvreT2bVgX0BiE4pJ9JXdgQOMYlP
# kZERZZQcv3y6VVavAT67yeNKQpyb4HSHdTDQ2irgXP1UwHRpwLpKdqB1UhzNJ8m8
# k0faA8RXir4=
# =VtGZ
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 13 Mar 2025 15:47:52 HKT
# gpg:                using RSA key 27B88847EEE0250118F3EAB92ED9D774FE702DB5
# gpg:                issuer "thuth@redhat.com"
# gpg: Good signature from "Thomas Huth <th.huth@gmx.de>" [full]
# gpg:                 aka "Thomas Huth <thuth@redhat.com>" [full]
# gpg:                 aka "Thomas Huth <huth@tuxfamily.org>" [full]
# gpg:                 aka "Thomas Huth <th.huth@posteo.de>" [unknown]
# Primary key fingerprint: 27B8 8847 EEE0 2501 18F3  EAB9 2ED9 D774 FE70 2DB5

* tag 'pull-request-2025-03-13' of https://gitlab.com/thuth/qemu:
  tests/functional: skip vulkan test if missing vulkaninfo
  tests/functional/asset: Add AssetError exception class
  tests/functional/asset: Verify downloaded size
  tests/functional/asset: Fail assert fetch when retries are exceeded
  docs/system: Fix the information on how to run certain functional tests
  tests/functional: Bump up arm_replay timeout
  tests/functional: Require 'user' netdev for ppc64 e500 test
  docs: Rename default-configs to configs

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

scripts/qcow2-to-stdout.py: Add script to write qcow2 images to stdout

This tool converts a disk image to qcow2, writing the result directly
to stdout. This can be used for example to send the generated file
over the network.

This is equivalent to using qemu-img to convert a file to qcow2 and
then writing the result to stdout, with the difference that this tool
does not need to create this temporary qcow2 file and therefore does
not need any additional disk space.

Implementing this directly in qemu-img is not really an option because
it expects the output file to be seekable and it is also meant to be a
generic tool that supports all combinations of file formats and image
options. Instead, this tool can only produce qcow2 files with the
basic options, without compression, encryption or other features.

The input file is read twice. The first pass is used to determine
which clusters contain non-zero data and that information is used to
create the qcow2 header, refcount table and blocks, and L1 and L2
tables. After all that metadata is created then the second pass is
used to write the guest data.

By default qcow2-to-stdout.py expects the input to be a raw file, but
if qemu-storage-daemon is available then it can also be used to read
images in other formats. Alternatively the user can also run qemu-nbd
or qemu-storage-daemon manually instead.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Madeeha Javed <javed@igalia.com>
Message-ID: <20240730141552.60404-1-berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

virtio-scsi: only expose cmd vqs via iothread-vq-mapping

Peter Krempa and Kevin Wolf observed that iothread-vq-mapping is
confusing to use because the control and event virtqueues have a fixed
location before the command virtqueues but need to be treated
differently.

Only expose the command virtqueues via iothread-vq-mapping so that the
command-line parameter is intuitive: it controls where SCSI requests are
processed.

The control virtqueue needs to be hardcoded to the main loop thread for
technical reasons anyway. Kevin also pointed out that it's better to
place the event virtqueue in the main loop thread since its no poll
behavior would prevent polling if assigned to an IOThread.

This change is its own commit to avoid squashing the previous commit.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Suggested-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20250311132616.1049687-14-stefanha@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

virtio-scsi: handle ctrl virtqueue in main loop

Previously the ctrl virtqueue was handled in the AioContext where SCSI
requests are processed. When IOThread Virtqueue Mapping was added things
become more complicated because SCSI requests could run in other
AioContexts.

Simplify by handling the ctrl virtqueue in the main loop where reset
operations can be performed. Note that BHs are still used canceling SCSI
requests in their AioContexts but at least the mean loop activity
doesn't need BHs anymore.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20250311132616.1049687-13-stefanha@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

virtio-scsi: add iothread-vq-mapping parameter

Allow virtio-scsi virtqueues to be assigned to different IOThreads. This
makes it possible to take advantage of host multi-queue block layer
scalability by assigning virtqueues that have affinity with vCPUs to
different IOThreads that have affinity with host CPUs. The same feature
was introduced for virtio-blk in the past:
https://developers.redhat.com/articles/2024/09/05/scaling-virtio-blk-disk-io-iothread-virtqueue-mapping

Here are fio randread 4k iodepth=64 results from a 4 vCPU guest with an
Intel P4800X SSD:
iothreads IOPS
------------------------------
1         189576
2         312698
4         346744

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20250311132616.1049687-12-stefanha@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
[kwolf: Updated 051 output, virtio-scsi can now use any iothread]
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

virtio: extract iothread-vq-mapping.h API

The code that builds an array of AioContext pointers indexed by the
virtqueue is not specific to virtio-blk. virtio-scsi will need to do the
same thing, so extract the functions.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250311132616.1049687-11-stefanha@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

virtio-blk: tidy up iothread_vq_mapping functions

Use noun_verb() function naming instead of verb_noun() because the
former is the most common naming style for APIs. The next commit will
move these functions into a header file so that virtio-scsi can call
them.

Shorten iothread_vq_mapping_apply()'s iothread_vq_mapping_list argument
to just "list" like in the other functions.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250311132616.1049687-10-stefanha@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

virtio-blk: extract cleanup_iothread_vq_mapping() function

This is the cleanup function that must be called after
apply_iothread_vq_mapping() succeeds. virtio-scsi will need this
function too, so extract it.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250311132616.1049687-9-stefanha@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

virtio-scsi: perform TMFs in appropriate AioContexts

With IOThread Virtqueue Mapping there will be multiple AioContexts
processing SCSI requests. scsi_req_cancel() and other SCSI request
operations must be performed from the AioContext where the request is
running.

Introduce a virtio_scsi_defer_tmf_to_aio_context() function and the
necessary VirtIOSCSIReq->remaining refcount infrastructure to move the
TMF code into the AioContext where the request is running.

For the time being there is still just one AioContext: the main loop or
the IOThread. When the iothread-vq-mapping parameter is added in a later
patch this will be changed to per-virtqueue AioContexts.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250311132616.1049687-8-stefanha@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

virtio-scsi: protect events_dropped field

The block layer can invoke the resize callback from any AioContext that
is processing requests. The virtqueue is already protected but the
events_dropped field also needs to be protected against races. Cover it
using the event virtqueue lock because it is closely associated with
accesses to the virtqueue.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250311132616.1049687-7-stefanha@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

virtio-scsi: introduce event and ctrl virtqueue locks

Virtqueues are not thread-safe. Until now this was not a major issue
since all virtqueue processing happened in the same thread. The ctrl
queue's Task Management Function (TMF) requests sometimes need the main
loop, so a BH was used to schedule the virtqueue completion back in the
thread that has virtqueue access.

When IOThread Virtqueue Mapping is introduced in later commits, event
and ctrl virtqueue accesses from other threads will become necessary.
Introduce an optional per-virtqueue lock so the event and ctrl
virtqueues can be protected in the commits that follow.

The addition of the ctrl virtqueue lock makes
virtio_scsi_complete_req_from_main_loop() and its BH unnecessary.
Instead, take the ctrl virtqueue lock from the main loop thread.

The cmd virtqueue does not have a lock because the entirety of SCSI
command processing happens in one thread. Only one thread accesses the
cmd virtqueue and a lock is unnecessary.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250311132616.1049687-6-stefanha@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

scsi: introduce requests_lock

SCSIDevice keeps track of in-flight requests for device reset and Task
Management Functions (TMFs). The request list requires protection so
that multi-threaded SCSI emulation can be implemented in commits that
follow.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250311132616.1049687-5-stefanha@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

scsi: track per-SCSIRequest AioContext

Until now, a SCSIDevice's I/O requests have run in a single AioContext.
In order to support multiple IOThreads it will be necessary to move to
the concept of a per-SCSIRequest AioContext.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250311132616.1049687-4-stefanha@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

dma: use current AioContext for dma_blk_io()

In the past a single AioContext was used for block I/O and it was
fetched using blk_get_aio_context(). Nowadays the block layer supports
running I/O from any AioContext and multiple AioContexts at the same
time. Remove the dma_blk_io() AioContext argument and use the current
AioContext instead.

This makes calling the function easier and enables multiple IOThreads to
use dma_blk_io() concurrently for the same block device.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250311132616.1049687-3-stefanha@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

scsi-disk: drop unused SCSIDiskState->bh field

Commit 71544d30a6f8 ("scsi: push request restart to SCSIDevice") removed
the only user of SCSIDiskState->bh.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250311132616.1049687-2-stefanha@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

iotests: Limit qsd-migrate to working formats

qsd-migrate is currently only working for raw, qcow2 and qed.
Other formats are failing, e.g. because they don't support migration.
Thus let's limit this test to the three usable formats now.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20250224214058.205889-1-thuth@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

aio-posix: Adjust polling time also for new handlers

aio_dispatch_handler() adds handlers to ctx->poll_aio_handlers if
polling should be enabled. If we call adjust_polling_time() for all
polling handlers before this, new polling handlers are still left at
poll->ns = 0 and polling is only actually enabled after the next event.
Move the adjust_polling_time() call after aio_dispatch_handler().

This fixes test-nested-aio-poll, which expects that polling becomes
effective the first time around.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250311141912.135657-1-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

aio-posix: Separate AioPolledEvent per AioHandler

Adaptive polling has a big problem: It doesn't consider that an event
loop can wait for many different events that may have very different
typical latencies.

For example, think of a guest that tends to send a new I/O request soon
after the previous I/O request completes, but the storage on the host is
rather slow. In this case, getting the new request from guest quickly
means that polling is enabled, but the next thing is performing the I/O
request on the backend, which is slow and disables polling again for the
next guest request. This means that in such a scenario, polling could
help for every other event, but is only ever enabled when it can't
succeed.

In order to fix this, keep a separate AioPolledEvent for each
AioHandler. We will then know that the backend file descriptor always
has a high latency and isn't worth polling for, but we also know that
the guest is always fast and we should poll for it. This solves at least
half of the problem, we can now keep polling for those cases where it
makes sense and get the improved performance from it.

Since the event loop doesn't know which event will be next, we still do
some unnecessary polling while we're waiting for the slow disk. I made
some attempts to be more clever than just randomly growing and shrinking
the polling time, and even to let callers be explicit about when they
expect a new event, but so far this hasn't resulted in improved
performance or even caused performance regressions. For now, let's just
fix the part that is easy enough to fix, we can revisit the rest later.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250307221634.71951-6-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

aio-posix: Factor out adjust_polling_time()

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250307221634.71951-5-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

aio: Create AioPolledEvent

As a preparation for having multiple adaptive polling states per
AioContext, move the 'ns' field into a separate struct.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250307221634.71951-4-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block/io: Ignore FUA with cache.no-flush=on

For block drivers that don't advertise FUA support, we already call
bdrv_co_flush(), which considers BDRV_O_NO_FLUSH. However, drivers that
do support FUA still see the FUA flag with BDRV_O_NO_FLUSH and get the
associated performance penalty that cache.no-flush=on was supposed to
avoid.

Clear FUA for write requests if BDRV_O_NO_FLUSH is set.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250307221634.71951-3-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

file-posix: Support FUA writes

Until now, FUA was always emulated with a separate flush after the write
for file-posix. The overhead of processing a second request can reduce
performance significantly for a guest disk that has disabled the write
cache, especially if the host disk is already write through, too, and
the flush isn't actually doing anything.

Advertise support for REQ_FUA in write requests and implement it for
Linux AIO and io_uring using the RWF_DSYNC flag for write requests. The
thread pool still performs a separate fdatasync() call. This can be
improved later by using the pwritev2() syscall if available.

As an example, this is how fio numbers can be improved in some scenarios
with this patch (all using virtio-blk with cache=directsync on an nvme
block device for the VM, fio with ioengine=libaio,direct=1,sync=1):

                              | old           | with FUA support
------------------------------+---------------+-------------------
bs=4k, iodepth=1, numjobs=1   |  45.6k iops   |  56.1k iops
bs=4k, iodepth=1, numjobs=16  | 183.3k iops   | 236.0k iops
bs=4k, iodepth=16, numjobs=1  | 258.4k iops   | 311.1k iops

However, not all scenarios are clear wins. On another slower disk I saw
little to no improvment. In fact, in two corner case scenarios, I even
observed a regression, which I however consider acceptable:

1. On slow host disks in a write through cache mode, when the guest is
   using virtio-blk in a separate iothread so that polling can be
   enabled, and each completion is quickly followed up with a new
   request (so that polling gets it), it can happen that enabling FUA
   makes things slower - the additional very fast no-op flush we used to
   have gave the adaptive polling algorithm a success so that it kept
   polling. Without it, we only have the slow write request, which
   disables polling. This is a problem in the polling algorithm that
   will be fixed later in this series.

2. With a high queue depth, it can be beneficial to have flush requests
   for another reason: The optimisation in bdrv_co_flush() that flushes
   only once per write generation acts as a synchronisation mechanism
   that lets all requests complete at the same time. This can result in
   better batching and if the disk is very fast (I only saw this with a
   null_blk backend), this can make up for the overhead of the flush and
   improve throughput. In theory, we could optionally introduce a
   similar artificial latency in the normal completion path to achieve
   the same kind of completion batching. This is not implemented in this
   series.

Compatibility is not a concern for the kernel side of io_uring, it has
supported RWF_DSYNC from the start. However, io_uring_prep_writev2() is
not available before liburing 2.2.

Linux AIO started supporting it in Linux 4.13 and libaio 0.3.111. The
kernel is not a problem for any supported build platform, so it's not
necessary to add runtime checks. However, openSUSE is still stuck with
an older libaio version that would break the build.

We must detect the presence of the writev2 functions in the user space
libraries at build time to avoid build failures.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250307221634.71951-2-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

tests/functional: skip vulkan test if missing vulkaninfo

I could have sworn I had this is a previous iteration of the patches
but I guess it got lost in a re-base. As we are going to call
vulkaninfo to probe for "bad" drivers we need to skip if the binary
isn't available.

Fixes: 9f7e493d11 (tests/functional: skip vulkan tests with nVidia)
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-ID: <20250312190314.1632357-1-alex.bennee@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>

Merge tag 'hw-misc-20250312' of https://github.com/philmd/qemu into staging

Misc HW patches

- Set correct values for MPC8569E's eSDHC (Zoltan)
- Emulate Ricoh RS5C372 RTC device (Bernhard)
- Array overflow fixes in SMSC91C111 netdev (Peter)
- Fix typo in Xen HVM (Philippe)
- Move graphic height/width/depth globals to their own file (Philippe)
- Introduce qemu_arch_available() helper (Philippe)
- Check fw_cfg's ACPI availability at runtime (Philippe)
- Remove virtio-mem dependency on CONFIG_DEVICES (Philippe)
- Sort HyperV SYNDBG API definitions (Pierrick)
- Remove need for SDHCI_VENDOR_FSL definition (Philippe)

# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmfRXiMACgkQ4+MsLN6t
# wN5zFhAAzSW/hZneD8hycKtr9nBlvZSD72cEt+b656OCbTyyucUi1sG4rMPMvHeW
# h6HP6xt2SfQxXbec6Y0pWxWUkBOQzk72s0zpttOED3oEspkrId2D+VSsSH1E+QLh
# WoG7/hVgz0bDHexWYIDdGufO4no/icwewAKmC5Kp2HbaNxIIHyWlK1+RO69/lCLN
# s3qkNesMsQyEWN28ogEMRqyCIG3oJVP76U4TVcdxIiE51WI8sP8/7V2um0AXN68m
# IV3INrfVJjGDp501elrUbD3qsYopRdxoMAvwiVojrLXin6xtS+SQjEe/hcNxzM70
# 0IQPp9WWwLjNkeFlAJF4wpwGJttFNHj+5gtH7/YRrP75jt9kAxPXkFw/OFfpVd30
# NYbeFlWDhRL1QPBs+WPBZTrfD7fRmpfMJRLF3/w61+WvnVrshlyDaoCWbR+L329F
# uOQFsBdAD7m/lkZ0mHtskS2vkZx7Itn1av4gql7T7/6cE1R7ItKy1HY9UUCtY6Gp
# 7V6XrsAE3khg2HY8IcJ73+sPLQn/GxqZFE7PqmAhgcl6RZEFQv8PNrEgFxCEYyuK
# KJjx0hRMLoigp0CEclLfOqz2d3knsI8SJbgD4iTYQc02E69lx8a4XS4N8JXoLEdh
# 3i/ndwKEFmzwNuqbU0nYsSJDiAO9ejra8O2BXZS/a4pkxC2jtdw=
# =VVr6
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 12 Mar 2025 18:12:51 HKT
# gpg:                using RSA key FAABE75E12917221DCFD6BB2E3E32C2CDEADC0DE
# gpg: Good signature from "Philippe Mathieu-Daudé (F4BUG) <f4bug@amsat.org>" [full]
# Primary key fingerprint: FAAB E75E 1291 7221 DCFD  6BB2 E3E3 2C2C DEAD C0DE

* tag 'hw-misc-20250312' of https://github.com/philmd/qemu:
  hw/sd/sdhci: Remove need for SDHCI_VENDOR_IMX definition
  hw/hyperv/hyperv-proto: Move SYNDBG definitions from target/i386
  hw/virtio/virtio-mem: Remove CONFIG_DEVICES include
  hw/i386/fw_cfg: Check ACPI availability with acpi_builtin()
  hw/acpi: Introduce acpi_builtin() helper
  system: Replace arch_type global by qemu_arch_available() helper
  system: Extract target-specific globals to their own compilation unit
  hw/xen/hvm: Fix Aarch64 typo
  hw/net/smc91c111: Don't allow data register access to overrun buffer
  hw/net/smc91c111: Use MAX_PACKET_SIZE instead of magic numbers
  hw/net/smc91c111: Sanitize packet length on tx
  hw/net/smc91c111: Sanitize packet numbers
  hw/rtc: Add Ricoh RS5C372 RTC emulation
  hw/sd/sdhci: Set reset value of interrupt registers

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

Merge tag 'pull-vfio-20250311' of https://github.com/legoater/qemu into staging

vfio queue:

* Fixed endianness of VFIO device state packets
* Improved IGD passthrough support with legacy mode
* Improved build
* Added support for old AMD GPUs (x550)
* Updated property documentation

# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEEoPZlSPBIlev+awtgUaNDx8/77KEFAmfQfQcACgkQUaNDx8/7
# 7KEUNw/+PjFpHrz5muQ8itkbyd36eJJdcxCl+9IPIWfnUfB582epkLcgvWyswGUo
# krFTregoRG0PKtgZDtv95owGtVJOgK6XYFadGHiYkvvsb41twOYsP7/SuI+KMiEv
# IDFLMvCTyorSIIoEF8i2EexfGPRV1VoWwvBoHgRRmYlzwzXnufjABpoZ0a25DTye
# DQ4yhSfqoIh1gOcdL9tPictnZg9OxKr2ePXNdrtymtEIhg3ZobD3Jd8J4WCcsfKT
# fxxBO5NsGgA8oM7i02fYN9kgMwqTnVhSAu1wq9PXsbrnNXam+trywAWSO6CjL+rV
# ++STWNSrRoHzuotRBr7BzrTpTFyQyfwBWqUT5L4NlhgXB3Xybk+M6Zj08Yva8pjE
# w78JQKvKp54gU34AWBW0/J6+u3v+iE8l1Eywx6xueF9Q+YSUDeW9B1LDdjFJryhF
# d8j3J+vuglbdsp05D+tVErf5cqFvFDfrjTkXkZNtmx7wky45XS9ZvNazYW1KI3f9
# bg8Wjb7ZujuvxpSjycPRZzdKa8kqSgSZg7fg91Wimiy1Iqe3SZVVWNchLYiPp8Dm
# nXMfOEpVHQZ1vzeo7dVWyxu9Y1ujgvUQy8kMa9q2W2S7HQ5Sna79n7eMVJxqZQ4G
# m0ETFToOcPPOnZBWgqNOSUlSQncFuIVgNTDvycQ9dMhGorYcBDI=
# =Vh0m
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 12 Mar 2025 02:12:23 HKT
# gpg:                using RSA key A0F66548F04895EBFE6B0B6051A343C7CFFBECA1
# gpg: Good signature from "Cédric Le Goater <clg@redhat.com>" [full]
# gpg:                 aka "Cédric Le Goater <clg@kaod.org>" [full]
# Primary key fingerprint: A0F6 6548 F048 95EB FE6B  0B60 51A3 43C7 CFFB ECA1

* tag 'pull-vfio-20250311' of https://github.com/legoater/qemu: (21 commits)
  vfio/pci: Drop debug commentary from x-device-dirty-page-tracking
  vfio/pci-quirks: Exclude non-ioport BAR from ATI quirk
  hw/vfio: Compile display.c once
  hw/vfio: Compile iommufd.c once
  hw/vfio: Compile more objects once
  hw/vfio: Compile some common objects once
  hw/vfio/common: Get target page size using runtime helpers
  hw/vfio/common: Include missing 'system/tcg.h' header
  hw/vfio/spapr: Do not include <linux/kvm.h>
  system: Declare qemu_[min/max]rampagesize() in 'system/hostmem.h'
  vfio/migration: Use BE byte order for device state wire packets
  vfio/igd: Fix broken KVMGT OpRegion support
  vfio/igd: Introduce x-igd-lpc option for LPC bridge ID quirk
  vfio/igd: Handle x-igd-opregion option in config quirk
  vfio/igd: Decouple common quirks from legacy mode
  vfio/igd: Refactor vfio_probe_igd_bar4_quirk into pci config quirk
  vfio/pci: Add placeholder for device-specific config space quirks
  vfio/igd: Move LPC bridge initialization to a separate function
  vfio/igd: Consolidate OpRegion initialization into a single function
  vfio/igd: Do not include GTT stolen size in etc/igd-bdsm-size
  ...

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

Merge tag 'pull-ppc-for-10.0-1-20250311' of https://gitlab.com/npiggin/qemu into staging

* Next round of XIVE patches...

* tag 'pull-ppc-for-10.0-1-20250311' of https://gitlab.com/npiggin/qemu: (72 commits)
  docs/system/ppc/amigang.rst: Update for NVRAM emulation
  ppc/amigaone: Add #defines for memory map constants
  ppc/amigaone: Add kernel and initrd support
  ppc/amigaone: Add default environment
  ppc/amigaone: Implement NVRAM emulation
  ppc/amigaone: Simplify replacement dummy_fw
  spapr: Generate random HASHPKEYR for spapr machines
  target/ppc: Avoid warning message for zero process table entries
  target/ppc: Wire up BookE ATB registers for e500 family
  target/ppc: fix timebase register reset state
  spapr: nested: Add support for reporting Hostwide state counter
  ppc: spapr: Enable 2nd DAWR on Power10 pSeries machine
  ppc: Enable 2nd DAWR support on Power10 PowerNV machine
  hw/ppc/epapr: Do not swap ePAPR magic value
  hw/ppc/spapr: Convert DIRTY_HPTE() macro as hpte_set_dirty() method
  hw/ppc/spapr: Convert CLEAN_HPTE() macro as hpte_set_clean() method
  hw/ppc/spapr: Convert HPTE_DIRTY() macro as hpte_is_dirty() method
  hw/ppc/spapr: Convert HPTE_VALID() macro as hpte_is_valid() method
  hw/ppc/spapr: Convert HPTE() macro as hpte_get_ptr() method
  target/ppc: Restrict ATTN / SCV / PMINSN helpers to TCG
  ...

[Fix __packed macro redefinition on FreeBSD 14 hosts:
../hw/ppc/pnv_occ.c:397:9: error: '__packed' macro redefined [-Werror,-Wmacro-redefined]
  397 | #define __packed QEMU_PACKED
      |         ^
/usr/include/sys/cdefs.h:217:9: note: previous definition is here
  217 | #define __packed        __attribute__((__packed__))
      |         ^
--Stefan]

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

tests/functional/asset: Add AssetError exception class

Assets are uniquely identified by human-readable-ish url, so make an
AssetError exception class that prints url with error message.

A property 'transient' is used to capture whether the client may retry
or try again later, or if it is a serious and likely permanent error.
This is used to retain the existing behaviour of treating HTTP errors
other than 404 as 'transient' and not causing precache step to fail.
Additionally, partial-downloads and stale asset caches that fail to
resolve after the retry limit are now treated as transient and do not
cause precache step to fail.

For background: The NetBSD archive is, at the time of writing, failing
with short transfer. Retrying the fetch at that position (as wget does)
results in a "503 backend unavailable" error. We would like to get that
error code directly, but I have not found a way to do that with urllib,
so treating the short-copy as a transient failure covers that case (and
seems like a reasonable way to handle it in general).

Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-ID: <20250312130002.945508-4-npiggin@gmail.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

tests/functional/asset: Verify downloaded size

If the server provides a Content-Length header, use that to verify the
size of the downloaded file. This catches cases where the connection
terminates early, and gives the opportunity to retry. Without this, the
checksum will likely mismatch and fail without retry.

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-ID: <20250312130002.945508-3-npiggin@gmail.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

tests/functional/asset: Fail assert fetch when retries are exceeded

Currently the fetch code does not fail gracefully when retry limit is
exceeded, it just falls through the loop with no file, which ends up
hitting other errors.

Add a check for non-existing file, which indicates the retry limit was
exceeded.

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-ID: <20250312130002.945508-2-npiggin@gmail.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

docs/system: Fix the information on how to run certain functional tests

The tests have been converted to the functional framework, so
we should not talk about Avocado here anymore.

Fixes: f7d6b772200 ("tests/functional: Convert BananaPi tests to the functional framework")
Fixes: 380f7268b7b ("tests/functional: Convert the OrangePi tests to the functional framework")
Fixes: 4c0a2df81c9 ("tests/functional: Convert some tests that download files via fetch_asset()")
Message-ID: <20250311160847.388670-1-thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

tests/functional: Bump up arm_replay timeout

On my machine the arm_replay test takes over 2 minutes to run
in a config with Rust enabled and debug enabled:

$ time (cd build/rust ; PYTHONPATH=../../python:../../tests/functional
QEMU_TEST_QEMU_BINARY=./qemu-system-arm ./pyvenv/bin/python3
../../tests/functional/test_arm_replay.py)
TAP version 13
ok 1 test_arm_replay.ArmReplay.test_cubieboard
ok 2 test_arm_replay.ArmReplay.test_vexpressa9
ok 3 test_arm_replay.ArmReplay.test_virt
1..3

real    2m16.564s
user    2m13.461s
sys     0m3.523s

Bump up the timeout to 4 minutes.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20250310102830.3752440-1-peter.maydell@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>

tests/functional: Require 'user' netdev for ppc64 e500 test

When commit 72cdd672e18c extended the ppc64 e500 test to add network
support, it forgot to require the 'user' netdev backend. Fix that.

Fixes: 72cdd672e18c ("tests/functional: Replace the ppc64 e500 advent calendar test")
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Acked-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20250308071328.193694-1-clg@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

docs: Rename default-configs to configs

This was missed at the time.

Fixes: 812b31d3f91 ("configs: rename default-configs to configs and reorganise")
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20250306174113.427116-1-groug@kaod.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>

hw/sd/sdhci: Remove need for SDHCI_VENDOR_IMX definition

All instances of TYPE_IMX_USDHC set vendor=SDHCI_VENDOR_IMX.
No need to special-case it.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Bernhard Beschow <shentey@gmail.com>
Message-Id: <20250308213640.13138-3-philmd@linaro.org>

Merge tag 'pull-qapi-2025-03-11' of https://repo.or.cz/qemu/armbru into staging

QAPI patches patches for 2025-03-11

# -----BEGIN PGP SIGNATURE-----
#
# iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAmfQCnkSHGFybWJydUBy
# ZWRoYXQuY29tAAoJEDhwtADrkYZTsJ0P/jcXiyFxjcbXN/3a6+iuPPqlviiWPAKG
# db2aHn2divceFEf7hUrwqjiJIPLDxaq6iJy71bjPUDkE8wAEdsf2zD7ryHo+sGcO
# rWaSaHmonn0QHvqcvkGGrbmTH+Ezl1RpP8XVGfG2lmHbjPQ3+EYnRwML6jC8dnvR
# C7qkyQ+qxmdV2lWb4MalgABKZToZ2aqnI9lr9KzHmN+55i2OxJrhECUKDHcgtG2i
# Pqc1GLGmmQ4Wj+4z0PyvKYZS4LP/90eH8bNyeA6TVsPHxgG79pencct7DOHxhc8q
# hHQ1TaqcBeWFQ7tndLMNDnHjm9XpAzMuew87xMTo6R450JxiSn+AkioTE0L563hy
# SjeXmIQ8COZbHsuSKlFJcV1OS1c/mJbwpkxptyaMLjTt2Lp9geFs39WKWHcs8pCN
# EmWSdvoqmP7D4bp1hXAVSPIIvJ7L2NwnM8ONH0KmRD5uMQrjiHsfvyWHAVnT10yu
# 8822hjlJp7l3B1QCi19mTlkiztCFScjb3Se8A+jScP5iX0q9C4H4t+tAw2m4UY1V
# pvn4xFxV82CvR3uQI0OMTKhp0/eEfvBioA1PEXOegPH5cS/L7YFF59mta1dCnaL7
# 0JRRCsTAnwAAAXoEteGqF1/6tXBdOnroL0OvHXJQVb2HH5c5YTnuxMiQywcP6Jty
# wt1vl42jfTj1
# =Gt4B
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 11 Mar 2025 18:03:37 HKT
# gpg:                using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653
# gpg:                issuer "armbru@redhat.com"
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full]
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>" [full]
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867  4E5F 3870 B400 EB91 8653

* tag 'pull-qapi-2025-03-11' of https://repo.or.cz/qemu/armbru: (61 commits)
  scripts/qapi/backend: Clean up create_backend()'s failure mode
  MAINTAINERS: Add jsnow as maintainer for Sphinx documentation
  docs: add qapi-domain syntax documentation
  docs: enable qapidoc transmogrifier for QEMU QMP Reference
  docs: disambiguate cross-references
  qapi/parser: add undocumented stub members to all_sections
  docs/qapidoc: generate entries for undocumented members
  docs/qapidoc: Add "the members of" pointers
  docs/qapidoc: add intermediate output debugger
  docs/qapidoc: process @foo into ``foo``
  docs/qapidoc: implement transmogrify() method
  docs/qapidoc: add visit_entity()
  docs/qapidoc: add visit_sections() method
  docs/qapidoc: add visit_member() method
  docs/qapidoc: add visit_returns() method
  docs/qapidoc: prepare to record entity being transmogrified
  docs/qapidoc: add visit_feature() method
  docs/qapidoc: add add_field() and generate_field() helper methods
  docs/qapidoc: add format_type() method
  docs/qapidoc: add visit_errors() method
  ...

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

Merge tag 'tracing-pull-request' of https://gitlab.com/stefanha/qemu into staging

Pull request

A tracing cleanup.

# -----BEGIN PGP SIGNATURE-----
#
# iQEzBAABCAAdFiEEhpWov9P5fNqsNXdanKSrs4Grc8gFAmfPpaMACgkQnKSrs4Gr
# c8iC2wf/WuKijQF2eQ6R5kVY/z3H+8eg1oR3MaeRgnzFDf5Dp9H4JxNEPXssdC7p
# Dg0mXL2FhdaaQcZ9VAuyEJGtGkcbNzpXixLto3+d1SNK4fWv1VlPASp8GiDkKxpt
# nGhChUUVXLIv/wRX/eOVEuBFrUdDl/2Ri/3dMij0cZsa361KiSIygHQqF3QyspIr
# crU9B1+7ti38x/Zem+J+Wrb4VHRgJk29QUqLnH4w9j4p3LtE5cfUndlTnx28Xwkl
# bZ45XCnEu2GabaSrOmGGiAyC89w6iuxxwsnlVqg0g8fyxpUbzfhsh70FCalKfgWo
# TetDo7penusK2CBlWbrCA5BKMF29Tg==
# =2HWS
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 11 Mar 2025 10:53:23 HKT
# gpg:                using RSA key 8695A8BFD3F97CDAAC35775A9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [ultimate]
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>" [ultimate]
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8

* tag 'tracing-pull-request' of https://gitlab.com/stefanha/qemu:
  trace/control-target: cleanup headers and make compilation unit common

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

hw/hyperv/hyperv-proto: Move SYNDBG definitions from target/i386

Allows SYNDBG definitions to be available for common compilation units.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-ID: <20250307215623.524987-5-pierrick.bouvier@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>

hw/virtio/virtio-mem: Remove CONFIG_DEVICES include

Rather than checking ACPI availability at compile time by
checking the CONFIG_ACPI definition from CONFIG_DEVICES,
check at runtime via acpi_builtin().

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-Id: <20250307223949.54040-5-philmd@linaro.org>

hw/i386/fw_cfg: Check ACPI availability with acpi_builtin()

Define acpi_tables / acpi_tables_len stubs, then replace the
compile-time CONFIG_ACPI check in fw_cfg.c by a runtime one.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Ani Sinha <anisinha@redhat.com>
Message-Id: <20250307223949.54040-4-philmd@linaro.org>

hw/acpi: Introduce acpi_builtin() helper

acpi_builtin() can be used to check at runtime whether
the ACPI subsystem is built in a qemu-system binary.

Reviewed-by: Ani Sinha <anisinha@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20250307223949.54040-3-philmd@linaro.org>

system: Replace arch_type global by qemu_arch_available() helper

qemu_arch_available() is a bit simpler to understand while
reviewing than the undocumented arch_type variable.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20250305005225.95051-5-philmd@linaro.org>

system: Extract target-specific globals to their own compilation unit

We shouldn't use target specific globals for machine properties.
These ones could be desugarized, as explained in [*]. While
certainly doable, not trivial nor my priority for now. Just move
them to a different file to clarify they are *globals*, like the
generic globals residing in system/globals.c.

Since arch_init.c was introduced using the MIT license (see commit
ad96090a01d), retain the same license for the new globals-target.c
file.

[*] https://lore.kernel.org/qemu-devel/e514d6db-781d-4afe-b057-9046c70044dc@redhat.com/

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250305005225.95051-2-philmd@linaro.org>

hw/xen/hvm: Fix Aarch64 typo

There is no TARGET_ARM_64 definition. Luckily enough,
when TARGET_AARCH64 is defined, TARGET_ARM also is.

Fixes: 733766cd373 ("hw/arm: introduce xenpvh machine")
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20250305153929.43687-2-philmd@linaro.org>

hw/net/smc91c111: Don't allow data register access to overrun buffer

For accesses to the 91c111 data register, the address within the
packet's data frame is determined by a combination of the pointer
register and the offset used to access the data register, so that you
can access data at effectively wider than byte width.  The pointer
register's pointer field is 11 bits wide, which is exactly the size
to index a 2048-byte data frame.

We weren't quite getting the logic right for ensuring that we end up
with a pointer value to use in the s->data[][] array that isn't out
of bounds:

* we correctly mask when getting the initial pointer value
* for the "autoincrement the pointer register" case, we
   correctly mask after adding 1 so that the pointer register
   wraps back around at the 2048 byte mark
* but for the non-autoincrement case where we have to add the
   low 2 bits of the data register offset, we don't account
   for the possibility that the pointer register is 0x7ff
   and the addition should wrap

Fix this bug by factoring out the "get the p value to use as an array
index" into a function, making it use FIELD macro names rather than
hard-coded constants, and having a utility function that does "add a
value and wrap it" that we can use both for the "autoincrement" and
"add the offset bits" codepaths.

Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2758
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20250228191652.1957208-1-peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>

hw/net/smc91c111: Use MAX_PACKET_SIZE instead of magic numbers

Now we have a constant for the maximum packet size, we can use it
to replace various hardcoded 2048 values.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20250228174802.1945417-4-peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>

hw/net/smc91c111: Sanitize packet length on tx

When the smc91c111 transmits a packet, it must read a control byte
which is at the end of the data area and CRC.  However, we don't
sanitize the length field in the packet buffer, so if the guest sets
the length field to something large we will try to read past the end
of the packet data buffer when we access the control byte.

As usual, the datasheet says nothing about the behaviour of the
hardware if the guest misprograms it in this way.  It says only that
the maximum valid length is 2048 bytes.  We choose to log the guest
error and silently drop the packet.

This requires us to factor out the "mark the tx packet as complete"
logic, so we can call it for this "drop packet" case as well as at
the end of the loop when we send a valid packet.

Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2742
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20250228174802.1945417-3-peter.maydell@linaro.org>
[PMD: Update smc91c111_do_tx() as len > MAX_PACKET_SIZE]
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>

hw/net/smc91c111: Sanitize packet numbers

The smc91c111 uses packet numbers as an index into its internal
s->data[][] array. Valid packet numbers are between 0 and 3, but
the code does not generally check this, and there are various
places where the guest can hand us an arbitrary packet number
and cause an out-of-bounds access to the data array.

Add validation of packet numbers. The datasheet is not very
helpful about how guest errors like this should be handled:
it says nothing on the subject, and none of the documented
error conditions are relevant. We choose to log the situation
with LOG_GUEST_ERROR and silently ignore the attempted operation.

In the places where we are about to access the data[][] array
using a packet number and we know the number is valid because
we got it from somewhere that has already validated, we add
an assert() to document that belief.

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20250228174802.1945417-2-peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>

hw/rtc: Add Ricoh RS5C372 RTC emulation

The implementation just allows Linux to determine date and time.

Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Acked-by: Fabiano Rosas <farosas@suse.de>
Message-ID: <20250223114708.1780-19-shentey@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>

hw/sd/sdhci: Set reset value of interrupt registers

The interrupt enable registers are not reset to 0 on Freescale eSDHC
but some bits are enabled on reset. At least some U-Boot versions seem
to expect this and not initialise these registers before expecting
interrupts. Use existing vendor property for Freescale eSDHC and set
the reset value of the interrupt registers to match Freescale
documentation.

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Message-ID: <20250210160329.DDA7F4E600E@zero.eik.bme.hu>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>

vfio/pci: Drop debug commentary from x-device-dirty-page-tracking

The intent behind the x-device-dirty-page-tracking option is twofold:

1) development/testing in the presence of VFs with VF dirty page tracking

2) deliberately choosing platform dirty tracker over the VF one.

Item 2) scenario is useful when VF dirty tracker is not as fast as
IOMMU, or there's some limitations around it (e.g. number of them is
limited; aggregated address space under tracking is limited),
efficiency/scalability (e.g. 1 pagetable in IOMMU dirty tracker to scan
vs N VFs) or just troubleshooting. Given item 2 it is not restricted to
debugging, hence drop the debug parenthesis from the option description.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250311174807.79825-1-joao.m.martins@oracle.com
[ clg: Fixed subject spelling ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/pci-quirks: Exclude non-ioport BAR from ATI quirk

The ATI BAR4 quirk is targeting an ioport BAR. Older devices may
have a BAR4 which is not an ioport, causing a segfault here. Test
the BAR type to skip these devices.

Similar to
"8f419c5b: vfio/pci-quirks: Exclude non-ioport BAR from NVIDIA quirk"

Untested, as I don't have the card to test.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2856
Signed-off-by: Vasilis Liaskovitis <vliaskovitis@suse.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250310235833.41026-1-vliaskovitis@suse.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

hw/vfio: Compile display.c once

display.c doesn't rely on target specific definitions,
move it to system_ss[] to build it once.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20250308230917.18907-8-philmd@linaro.org>
Link: https://lore.kernel.org/qemu-devel/20250311085743.21724-9-philmd@linaro.org
Signed-off-by: Cédric Le Goater <clg@redhat.com>

hw/vfio: Compile iommufd.c once

Removing unused "exec/ram_addr.h" header allow to compile
iommufd.c once for all targets.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20250308230917.18907-6-philmd@linaro.org>
Link: https://lore.kernel.org/qemu-devel/20250311085743.21724-8-philmd@linaro.org
Signed-off-by: Cédric Le Goater <clg@redhat.com>

hw/vfio: Compile more objects once

These files depend on the VFIO symbol in their Kconfig
definition. They don't rely on target specific definitions,
move them to system_ss[] to build them once.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20250308230917.18907-5-philmd@linaro.org>
Link: https://lore.kernel.org/qemu-devel/20250311085743.21724-7-philmd@linaro.org
Signed-off-by: Cédric Le Goater <clg@redhat.com>

hw/vfio: Compile some common objects once

Some files don't rely on any target-specific knowledge
and can be compiled once:

- helpers.c
- container-base.c
- migration.c (removing unnecessary "exec/ram_addr.h")
- migration-multifd.c
- cpr.c

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20250308230917.18907-4-philmd@linaro.org>
Link: https://lore.kernel.org/qemu-devel/20250311085743.21724-6-philmd@linaro.org
Signed-off-by: Cédric Le Goater <clg@redhat.com>

hw/vfio/common: Get target page size using runtime helpers

Prefer runtime helpers to get target page size.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20250305153929.43687-3-philmd@linaro.org>
Link: https://lore.kernel.org/qemu-devel/20250311085743.21724-5-philmd@linaro.org
Signed-off-by: Cédric Le Goater <clg@redhat.com>

hw/vfio/common: Include missing 'system/tcg.h' header

Always include necessary headers explicitly, to avoid
when refactoring unrelated ones:

  hw/vfio/common.c:1176:45: error: implicit declaration of function ‘tcg_enabled’;
   1176 |                                             tcg_enabled() ? DIRTY_CLIENTS_ALL :
        |                                             ^~~~~~~~~~~

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20250307180337.14811-2-philmd@linaro.org>
Link: https://lore.kernel.org/qemu-devel/20250311085743.21724-4-philmd@linaro.org
Signed-off-by: Cédric Le Goater <clg@redhat.com>

hw/vfio/spapr: Do not include <linux/kvm.h>

<linux/kvm.h> is already included by "system/kvm.h" in the next line.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20250307180337.14811-3-philmd@linaro.org>
Link: https://lore.kernel.org/qemu-devel/20250311085743.21724-3-philmd@linaro.org
Signed-off-by: Cédric Le Goater <clg@redhat.com>

system: Declare qemu_[min/max]rampagesize() in 'system/hostmem.h'

Both qemu_minrampagesize() and qemu_maxrampagesize() are
related to host memory backends, having the following call
stack:

  qemu_minrampagesize()
     -> find_min_backend_pagesize()
         -> object_dynamic_cast(obj, TYPE_MEMORY_BACKEND)

  qemu_maxrampagesize()
     -> find_max_backend_pagesize()
        -> object_dynamic_cast(obj, TYPE_MEMORY_BACKEND)

Having TYPE_MEMORY_BACKEND defined in "system/hostmem.h":

  include/system/hostmem.h:23:#define TYPE_MEMORY_BACKEND "memory-backend"

Move their prototype declaration to "system/hostmem.h".

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20250308230917.18907-7-philmd@linaro.org>
Acked-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20250311085743.21724-2-philmd@linaro.org
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/migration: Use BE byte order for device state wire packets

Wire data commonly use BE byte order (including in the existing migration
protocol), use it also for for VFIO device state packets.

This will allow VFIO multifd device state transfer between hosts with
different endianness.
Although currently there is no such use case, it's good to have it now
for completeness.

Reviewed-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Link: https://lore.kernel.org/qemu-devel/dcfc04cc1a50655650dbac8398e2742ada84ee39.1741611079.git.maciej.szmigiero@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/igd: Fix broken KVMGT OpRegion support

The KVMGT/GVT-g vGPU also exposes OpRegion. But unlike IGD passthrough,
it only needs the OpRegion quirk. A previous change moved x-igd-opregion
handling to config quirk breaks KVMGT functionality as it brings extra
checks and applied other quirks. Here we check if the device is mdev
(KVMGT) or not (passthrough), and then applies corresponding quirks.

As before, users must manually specify x-igd-opregion=on to enable it
on KVMGT devices. In the future, we may check the VID/DID and enable
OpRegion automatically.

Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Corvin Köhne <c.koehne@beckhoff.com>
Link: https://lore.kernel.org/qemu-devel/20250306180131.32970-11-tomitamoeko@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/igd: Introduce x-igd-lpc option for LPC bridge ID quirk

The LPC bridge/Host bridge IDs quirk is also not dependent on legacy
mode. Recent Windows driver no longer depends on these IDs, as well as
Linux i915 driver, while UEFI GOP seems still needs them. Make it an
option to allow users enabling and disabling it as needed.

Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Corvin Köhne <c.koehne@beckhoff.com>
Link: https://lore.kernel.org/qemu-devel/20250306180131.32970-10-tomitamoeko@gmail.com
[ clg: - Fixed spelling in vfio_probe_igd_config_quirk() ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/igd: Handle x-igd-opregion option in config quirk

Both enable OpRegion option (x-igd-opregion) and legacy mode require
setting up OpRegion copy for IGD devices. As the config quirk no longer
depends on legacy mode, we can now handle x-igd-opregion option there
instead of in vfio_realize.

Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Corvin Köhne <c.koehne@beckhoff.com>
Link: https://lore.kernel.org/qemu-devel/20250306180131.32970-9-tomitamoeko@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/igd: Decouple common quirks from legacy mode

So far, IGD-specific quirks all require enabling legacy mode, which is
toggled by assigning IGD to 00:02.0. However, some quirks, like the BDSM
and GGC register quirks, should be applied to all supported IGD devices.
A new config option, x-igd-legacy-mode=[on|off|auto], is introduced to
control the legacy mode only quirks. The default value is "auto", which
keeps current behavior that enables legacy mode implicitly and continues
on error when all following conditions are met.
* Machine type is i440fx
* IGD device is at guest BDF 00:02.0

If any one of the conditions above is not met, the default behavior is
equivalent to "off", QEMU will fail immediately if any error occurs.

Users can also use "on" to force enabling legacy mode. It checks if all
the conditions above are met and set up legacy mode. QEMU will also fail
immediately on error in this case.

Additionally, the hotplug check in legacy mode is removed as hotplugging
IGD device is never supported, and it will be checked when enabling the
OpRegion quirk.

Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Corvin Köhne <c.koehne@beckhoff.com>
Link: https://lore.kernel.org/qemu-devel/20250306180131.32970-8-tomitamoeko@gmail.com
[ clg: - Changed warn_report() by info_report() in
vfio_probe_igd_config_quirk() as suggested by Alex W.
- Fixed spelling in vfio_probe_igd_config_quirk () ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/igd: Refactor vfio_probe_igd_bar4_quirk into pci config quirk

The actual IO BAR4 write quirk in vfio_probe_igd_bar4_quirk was removed
in previous change, leaving the function not matching its name, so move
it into the newly introduced vfio_config_quirk_setup. There is no
functional change in this commit.

For now, to align with current legacy mode behavior, it returns and
proceeds on error. Later it will fail on error after decoupling the
quirks from legacy mode.

Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Corvin Köhne <c.koehne@beckhoff.com>
Link: https://lore.kernel.org/qemu-devel/20250306180131.32970-7-tomitamoeko@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/pci: Add placeholder for device-specific config space quirks

IGD devices require device-specific quirk to be applied to their PCI
config space. Currently, it is put in the BAR4 quirk that does nothing
to BAR4 itself. Add a placeholder for PCI config space quirks to hold
that quirk later.

Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Corvin Köhne <c.koehne@beckhoff.com>
Link: https://lore.kernel.org/qemu-devel/20250306180131.32970-6-tomitamoeko@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/igd: Move LPC bridge initialization to a separate function

A new option will soon be introduced to decouple the LPC bridge/Host
bridge ID quirk from legacy mode. To prepare for this, move the LPC
bridge initialization into a separate function.

Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Corvin Köhne <c.koehne@beckhoff.com>
Link: https://lore.kernel.org/qemu-devel/20250306180131.32970-5-tomitamoeko@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/igd: Consolidate OpRegion initialization into a single function

Both x-igd-opregion option and legacy mode require identical steps to
set up OpRegion for IGD devices. Consolidate these steps into a single
vfio_pci_igd_setup_opregion function.

The function call in pci.c is wrapped with ifdef temporarily to prevent
build error for non-x86 archs, it will be removed after we decouple it
from legacy mode.

Additionally, move vfio_pci_igd_opregion_init to igd.c to prevent it
from being compiled in non-x86 builds.

Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Corvin Köhne <c.koehne@beckhoff.com>
Link: https://lore.kernel.org/qemu-devel/20250306180131.32970-4-tomitamoeko@gmail.com
[ clg: Fixed spelling in vfio_pci_igd_setup_opregion() ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/igd: Do not include GTT stolen size in etc/igd-bdsm-size

Though GTT Stolen Memory (GSM) is right below Data Stolen Memory (DSM)
in host address space, direct access to GSM is prohibited, and it is
not mapped to guest address space. Both host and guest accesses GSM
indirectly through the second half of MMIO BAR0 (GTTMMADR).

Guest firmware only need to reserve a memory region for DSM and program
the BDSM register with the base address of that region, that's actually
what both SeaBIOS[1] and IgdAssignmentDxe does now.

[1] https://gitlab.com/qemu-project/seabios/-/blob/1.12-stable/src/fw/pciinit.c#L319-332

Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Corvin Köhne <c.koehne@beckhoff.com>
Link: https://lore.kernel.org/qemu-devel/20250306180131.32970-3-tomitamoeko@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

vfio/igd: Remove GTT write quirk in IO BAR 4

The IO BAR4 of IGD devices contains a pair of 32-bit address/data
registers, MMIO_Index (0x0) and MMIO_Data (0x4), which provide access
to the MMIO BAR0 (GTTMMADR) from IO space. These registers are probably
only used by the VBIOS, and are not documented by intel. The observed
layout of MMIO_Index register is:
31                                                   2   1      0
+-------------------------------------------------------------------+
|                        Offset                        | Rsvd | Sel |
+-------------------------------------------------------------------+
- Offset: Byte offset in specified region, 4-byte aligned.
- Sel: Region selector
       0: MMIO register region (first half of MMIO BAR0)
       1: GTT region (second half of MMIO BAR0). Pre Gen11 only.

Currently, QEMU implements a quirk that adjusts the guest Data Stolen
Memory (DSM) region address to be (addr - host BDSM + guest BDSM) when
programming GTT entries via IO BAR4, assuming guest still programs GTT
with host DSM address, which is not the case. Guest's BDSM register is
emulated and initialized to 0 at startup by QEMU, then SeaBIOS programs
its value[1]. As result, the address programmed to GTT entries by VBIOS
running in guest are valid GPA, and this unnecessary adjustment brings
inconsistency.

[1] https://gitlab.com/qemu-project/seabios/-/blob/1.12-stable/src/fw/pciinit.c#L319-332

Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Corvin Köhne <c.koehne@beckhoff.com>
Link: https://lore.kernel.org/qemu-devel/20250306180131.32970-2-tomitamoeko@gmail.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>

block: Zero block driver state before reopening

Block drivers assume in their .bdrv_open() implementation that their
state in bs->opaque has been zeroed; it is initially allocated with
g_malloc0() in bdrv_open_driver().

bdrv_snapshot_goto() needs to make sure that it is zeroed again before
calling drv->bdrv_open() to avoid that block drivers use stale values.

One symptom of this bug is VMDK running into a double free when the user
tries to apply an internal snapshot like 'qemu-img snapshot -a test
test.vmdk'. This should be a graceful error because VMDK doesn't support
internal snapshots.

==25507== Invalid free() / delete / delete[] / realloc()
==25507==    at 0x484B347: realloc (vg_replace_malloc.c:1801)
==25507==    by 0x54B592A: g_realloc (gmem.c:171)
==25507==    by 0x1B221D: vmdk_add_extent (../block/vmdk.c:570)
==25507==    by 0x1B1084: vmdk_open_sparse (../block/vmdk.c:1059)
==25507==    by 0x1AF3D8: vmdk_open (../block/vmdk.c:1371)
==25507==    by 0x1A2AE0: bdrv_snapshot_goto (../block/snapshot.c:299)
==25507==    by 0x205C77: img_snapshot (../qemu-img.c:3500)
==25507==    by 0x58FA087: (below main) (libc_start_call_main.h:58)
==25507==  Address 0x832f3e0 is 0 bytes inside a block of size 272 free'd
==25507==    at 0x4846B83: free (vg_replace_malloc.c:989)
==25507==    by 0x54AEAC4: g_free (gmem.c:208)
==25507==    by 0x1AF629: vmdk_close (../block/vmdk.c:2889)
==25507==    by 0x1A2A9C: bdrv_snapshot_goto (../block/snapshot.c:290)
==25507==    by 0x205C77: img_snapshot (../qemu-img.c:3500)
==25507==    by 0x58FA087: (below main) (libc_start_call_main.h:58)

This error was discovered by fuzzing qemu-img.

Cc: qemu-stable@nongnu.org
Closes: https://gitlab.com/qemu-project/qemu/-/issues/2853
Closes: https://gitlab.com/qemu-project/qemu/-/issues/2851
Reported-by: Denis Rastyogin <gerben@altlinux.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250310104858.28221-1-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Remove unused blk_op_is_blocked()

Commit fc4e394b28 removed the last caller of blk_op_is_blocked(). Remove
the now unused function.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250206165331.379033-1-kwolf@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

docs/system/ppc/amigang.rst: Update for NVRAM emulation

Add NVRAM and hint on how to make it persistent. Also update Linux
boot section which should now boot automatically with the new NVRAM
defaults so manual settings in menu may not be needed normally.

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Message-ID: <20250304205926.87E364E6010@zero.eik.bme.hu>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

ppc/amigaone: Add #defines for memory map constants

Suggested-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Message-ID: <3b8e54ad9220d57e7b0a33f3570e880f26677ce8.1740673173.git.balaton@eik.bme.hu>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

ppc/amigaone: Add kernel and initrd support

Add support for -kernel, -initrd and -append command line options.

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Message-ID: <489b1be5d95d5153e924c95b0691b8b53f9ffb9e.1740673173.git.balaton@eik.bme.hu>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

ppc/amigaone: Add default environment

Initialise empty NVRAM with default values. This also enables IDE UDMA
mode in AmigaOS that is faster but has to be enabled in environment
due to problems with real hardware but that does not affect emulation
so we can use faster defaults here.

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Message-ID: <4d63f88191612329e0ca8102c7c0d4fc626dc372.1740673173.git.balaton@eik.bme.hu>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

ppc/amigaone: Implement NVRAM emulation

The board has a battery backed NVRAM where U-Boot environment is
stored which is also accessed by AmigaOS and e.g. C:NVGetVar command
crashes without it having at least a valid checksum.

[npiggin: 32-bit compile fix]
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Message-ID: <7e4c0107ef6bdc2b20fb1e780a188275c7dc1e49.1740673173.git.balaton@eik.bme.hu>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

ppc/amigaone: Simplify replacement dummy_fw

There's no need to do shift in a loop, doing it in one instruction
works just as well, only the result is used.

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Message-ID: <446bf740cbb99422be2cc5a31e51a1034eddded7.1740673173.git.balaton@eik.bme.hu>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

spapr: Generate random HASHPKEYR for spapr machines

The hypervisor is expected to create a value for the HASHPKEY SPR for
each partition. Currently it uses zero for all partitions, use a
random number instead, which in theory might make kernel ROP protection
more secure.

Signed-of-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20241219034035.1826173-4-npiggin@gmail.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

target/ppc: Avoid warning message for zero process table entries

A translation that encounters a process table entry that is zero is
something that Linux does to cause certain kernel NULL pointer
dereferences to fault. It is not itself a programming error, so avoid
the guest error log.

Message-ID: <20241219034035.1826173-5-npiggin@gmail.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

target/ppc: Wire up BookE ATB registers for e500 family

From the Freescale PowerPC Architecture Primer:

  Alternate time base APU. This APU, implemented on the e500v2, defines
  a 64-bit time base counter that differs from the PowerPC defined time
  base in that it is not writable and counts at a different, and
  typically much higher, frequency. The alternate time base always
  counts up, wrapping when the 64-bit count overflows.

This implementation of ATB uses the same frequency as the TB. The
existing spr_read_atbu/l functions are unused without this patch
to wire them into the SPR.

RTEMS uses this SPR on the e6500, though this hasn't been tested.

Message-ID: <20241219034035.1826173-6-npiggin@gmail.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

target/ppc: fix timebase register reset state

(H)DEC and PURR get reset before icount does, which causes them to be
skewed and not match the init state. This can cause replay to not
match the recorded trace exactly. For DEC and HDEC this is usually not
noticable since they tend to get programmed before affecting the
target machine. PURR has been observed to cause replay bugs when
running Linux.

Fix this by resetting using a time of 0.

Message-ID: <20241219034035.1826173-2-npiggin@gmail.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

spapr: nested: Add support for reporting Hostwide state counter

Add support for reporting Hostwide state counters for nested KVM pseries
guests running with 'cap-nested-papr' on Qemu-TCG acting as
L0-hypervisor. The Hostwide state counters are statistics about state that
L0-hypervisor maintains for the L2-guests and represent the state of all
L2-guests, not just a specific one.

These stats counters are exposed to L1-Hypervisor by the L0-Hypervisor via a
new bit-flag named 'getHostWideState' for the H_GUEST_GET_STATE hcall which
is documented at [1]. Once this flag is set the hcall should populate the
Guest-State-Elements in the requested GSB with the stat counter
values. Currently following five counters are supported:

* l0_guest_heap_size_inuse
* l0_guest_heap_size_max
* l0_guest_pagetable_size_inuse
* l0_guest_pagetable_size_max
* l0_guest_pagetable_reclaimed

At the moment '0' is being reported for all these counters as these
counters doesn't align with how L0-Qemu manages Guest memory.

The patch implements support for these counters by adding new members to
the 'struct SpaprMachineStateNested'. These new members are then plugged
into the existing 'guest_state_element_types[]' with the help of a new
macro 'GSBE_NESTED_MACHINE_DW' together with a new helper
'get_machine_ptr()'. guest_state_request_check() is updated to ensure
correctness of the requested GSB and finally h_guest_getset_state() is
updated to handle the newly introduced flag
'GUEST_STATE_REQUEST_HOST_WIDE'.

This patch is tested with the proposed linux-kernel implementation to
expose these stat-counter as perf-events at [2].

[1]
https://lore.kernel.org/all/20241222140247.174998-2-vaibhav@linux.ibm.com

[2]
https://lore.kernel.org/all/20241222140247.174998-1-vaibhav@linux.ibm.com

Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Message-ID: <20250221155449.530645-1-vaibhav@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

ppc: spapr: Enable 2nd DAWR on Power10 pSeries machine

As per the PAPR, bit 0 of byte 64 in pa-features property
indicates availability of 2nd DAWR registers. i.e. If this bit is set, 2nd
DAWR is present, otherwise not. Use KVM_CAP_PPC_DAWR1 capability to find
whether kvm supports 2nd DAWR or not. If it's supported, allow user to set
the pa-feature bit in guest DT using cap-dawr1 machine capability.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Message-ID: <173708681866.1678.11128625982438367069.stgit@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>