target/riscv/kvm: turn u32/u64 reg functions into macros
This change is motivated by a future change w.r.t CSRs management. We
want to handle them the same way as KVM extensions, i.e. a static array
with KVMCPUConfig objs that will be read/write during init and so on.
But to do that properly we must be able to declare a static array that
hold KVM regs.
C does not allow to init static arrays and use functions as
initializers, e.g. we can't do:
.kvm_reg_id = kvm_riscv_reg_id_ulong(...)
When instantiating the array. We can do that with macros though, so our
goal is turn kvm_riscv_reg_ulong() in a macro. It is cleaner to turn
every other reg_id_*() function in macros, and ulong will end up using
the macros for u32 and u64, so we'll start with them.
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20250429124421.223883-4-dbarboza@ventanamicro.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Cc: qemu-stable@nongnu.org
target/riscv/kvm: fix leak in kvm_riscv_init_multiext_cfg()
'reglist' is being g-malloc'ed but never freed.
Reported-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20250429124421.223883-3-dbarboza@ventanamicro.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Cc: qemu-stable@nongnu.org
Remove an unused 'KVMScratchCPU' pointer argument in
kvm_riscv_check_sbi_dbcn_support().
Put kvm_riscv_reset_regs_csr() after kvm_riscv_put_regs_csr(). This will
make a future patch diff easier to read, when changes in
kvm_riscv_reset_regs_csr() and kvm_riscv_get_regs_csr() will be made.
Fixes: a6b53378f5 ("target/riscv/kvm: implement SBI debug console (DBCN) calls") Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20250429124421.223883-2-dbarboza@ventanamicro.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Cc: qemu-stable@nongnu.org
Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20250425152311.804338-7-richard.henderson@linaro.org> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20250425152311.804338-6-richard.henderson@linaro.org> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20250425152311.804338-5-richard.henderson@linaro.org> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20250425152311.804338-4-richard.henderson@linaro.org> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20250425152311.804338-3-richard.henderson@linaro.org> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20250425152311.804338-2-richard.henderson@linaro.org> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Alistair Francis [Tue, 22 Apr 2025 02:47:52 +0000 (12:47 +1000)]
MAINTAINERS: Add common-user/host/riscv to RISC-V section
Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20250422024752.2060289-1-alistair.francis@wdc.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Max Chou [Tue, 8 Apr 2025 10:39:31 +0000 (18:39 +0800)]
target/riscv: rvv: Apply vext_check_input_eew to vrgather instructions to check mismatched input EEWs encoding constraint
According to the v spec, a vector register cannot be used to provide source
operands with more than one EEW for a single instruction.
The vs1 EEW of vrgatherei16.vv is 16.
Co-authored-by: Anton Blanchard <antonb@tenstorrent.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Signed-off-by: Max Chou <max.chou@sifive.com>
Message-ID: <20250408103938.3623486-4-max.chou@sifive.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Cc: qemu-stable@nongnu.org
Anton Blanchard [Tue, 8 Apr 2025 10:39:30 +0000 (18:39 +0800)]
target/riscv: rvv: Add CHECK arg to GEN_OPFVF_WIDEN_TRANS
Signed-off-by: Anton Blanchard <antonb@tenstorrent.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Max Chou <max.chou@sifive.com> Signed-off-by: Max Chou <max.chou@sifive.com>
Message-ID: <20250408103938.3623486-3-max.chou@sifive.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Cc: qemu-stable@nongnu.org
Add the relevant ISA paragraphs explaining why source (and destination)
registers cannot overlap the mask register.
Signed-off-by: Anton Blanchard <antonb@tenstorrent.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Max Chou <max.chou@sifive.com> Signed-off-by: Max Chou <max.chou@sifive.com>
Message-ID: <20250408103938.3623486-2-max.chou@sifive.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Cc: qemu-stable@nongnu.org
common-user/host/riscv: use tail pseudoinstruction for calling tail
The j pseudoinstruction maps to a JAL instruction, which can only handle
a jump to somewhere with a signed 20-bit destination. In case of static
linking and LTO'ing this easily leads to "relocation truncated to fit"
error.
Switch to use tail pseudoinstruction, which is the standard way to
tail-call a function in medium code model (emits AUIPC+JALR).
Signed-off-by: Icenowy Zheng <uwu@icenowy.me> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20250417072206.364008-1-uwu@icenowy.me> Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Cc: qemu-stable@nongnu.org
Ziqiao Kong [Tue, 15 Apr 2025 08:02:54 +0000 (16:02 +0800)]
target/riscv: fix endless translation loop on big endian systems
On big endian systems, pte and updated_pte hold big endian host data
while pte_pa points to little endian target data. This means the branch
at cpu_helper.c:1669 will be always satisfied and restart translation,
causing an endless translation loop.
The correctness of this patch can be deduced by:
old_pte will hold value either from cpu_to_le32/64(pte) or
cpu_to_le32/64(updated_pte), both of wich is litte endian. After that,
an in-place conversion by le32/64_to_cpu(old_pte) ensures that old_pte
now is in native endian, same with pte. Therefore, the endianness of the
both side of if (old_pte != pte) is correct.
Signed-off-by: Ziqiao Kong <ziqiaokong@gmail.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20250415080254.3667878-2-ziqiaokong@gmail.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Cc: qemu-stable@nongnu.org
Paolo Bonzini [Thu, 10 Apr 2025 16:17:22 +0000 (18:17 +0200)]
hw/riscv: Fix type conflict of GLib function pointers
qtest_set_command_cb passed to g_once should match GThreadFunc,
which it does not. But using g_once is actually unnecessary,
because the function is called by riscv_harts_realize() under
the Big QEMU Lock.
Reported-by: Kohei Tokunaga <ktokunaga.mail@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Kohei Tokunaga <ktokunaga.mail@gmail.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20250410161722.595634-1-pbonzini@redhat.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Cc: qemu-stable@nongnu.org
Paolo Savini [Thu, 13 Mar 2025 12:39:26 +0000 (12:39 +0000)]
Expand the probe_pages helper function to handle probe flags.
This commit expands the probe_pages helper function in
target/riscv/vector_helper.c to handle also the cases in which we need access to
the flags raised while probing the memory and the host address.
This is done in order to provide a unified interface to probe_access and
probe_access_flags.
The new version of probe_pages can now act as a regular call to probe_access as
before and as a call to probe_access_flags. In the latter case the user need to
pass pointers to flags and host address and a boolean value for nonfault.
The flags and host address will be set and made available as for a direct call
to probe_access_flags.
Signed-off-by: Paolo Savini <paolo.savini@embecosm.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Message-ID: <20250313123926.374878-2-paolo.savini@embecosm.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Paolo Savini [Thu, 13 Mar 2025 15:23:30 +0000 (15:23 +0000)]
target/riscv: use tcg ops generation to emulate whole reg rvv loads/stores.
This patch replaces the use of a helper function with direct tcg ops generation
in order to emulate whole register loads and stores. This is done in order to
improve the performance of QEMU.
We still use the helper function when vstart is not 0 at the beginning of the
emulation of the whole register load or store or when we would end up generating
partial loads or stores of vector elements (e.g. emulating 64 bits element loads
with pairs of 32 bits loads on hosts with 32 bits registers).
The latter condition ensures that we are not surprised by a trap in mid-element
and consecutively that we can update vstart correctly.
We also use the helper function when it performs better than tcg for specific
combinations of vector length, number of fields and element size.
Signed-off-by: Paolo Savini <paolo.savini@embecosm.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Richard Handerson <richard.henderson@linaro.org> Reviewed-by: Max Chou <max.chou@sifive.com> Reviewed-by: "Alex Bennée" <alex.bennee@linaro.org>
Message-ID: <20250313152330.398396-2-paolo.savini@embecosm.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Sebastian Huber [Wed, 19 Mar 2025 06:13:38 +0000 (07:13 +0100)]
hw/riscv: microchip_pfsoc: Rework documentation
Mention that running the HSS no longer works. Document the changed boot
options. Reorder documentation blocks. Update URLs.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de> Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20250319061342.26435-7-sebastian.huber@embedded-brains.de> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20250319061342.26435-6-sebastian.huber@embedded-brains.de> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Sebastian Huber [Wed, 19 Mar 2025 06:13:36 +0000 (07:13 +0100)]
hw/riscv: Allow direct start of kernel for MPFS
Further customize the -bios and -kernel options behaviour for the
microchip-icicle-kit machine. If "-bios none -kernel filename" is
specified, then do not load a firmware and instead only load and start
the kernel image.
For test runs, use an approach similar to
riscv_find_and_load_firmware().
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de> Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20250319061342.26435-5-sebastian.huber@embedded-brains.de> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Sebastian Huber [Wed, 19 Mar 2025 06:13:35 +0000 (07:13 +0100)]
hw/riscv: Make FDT optional for MPFS
Real-time kernels such as RTEMS or Zephyr may use a static device tree
built into the kernel image. Do not require to use the -dtb option if
-kernel is used for the microchip-icicle-kit machine. Issue a warning
if no device tree is provided by the user since the machine does not
generate one.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de> Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20250319061342.26435-4-sebastian.huber@embedded-brains.de> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Sebastian Huber [Wed, 19 Mar 2025 06:13:34 +0000 (07:13 +0100)]
hw/riscv: More flexible FDT placement for MPFS
If the kernel entry is in the high DRAM area, place the FDT into this
area.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de> Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20250319061342.26435-3-sebastian.huber@embedded-brains.de> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Sebastian Huber [Wed, 19 Mar 2025 06:13:33 +0000 (07:13 +0100)]
hw/misc: Add MPFS system reset support
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de> Acked-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20250319061342.26435-2-sebastian.huber@embedded-brains.de> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Paolo Savini [Wed, 12 Mar 2025 15:55:47 +0000 (15:55 +0000)]
Generate strided vector loads/stores with tcg nodes.
This commit improves the performance of QEMU when emulating strided vector
loads and stores by substituting the call for the helper function with the
generation of equivalent TCG operations.
Signed-off-by: Paolo Savini <paolo.savini@embecosm.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Message-ID: <20250312155547.289642-2-paolo.savini@embecosm.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Loïc Lefort [Thu, 13 Mar 2025 19:30:09 +0000 (20:30 +0100)]
target/riscv: pmp: fix checks on writes to pmpcfg in Smepmp MML mode
With Machine Mode Lockdown (mseccfg.MML) set and RLB not set, checks on pmpcfg
writes would match the wrong cases of Smepmp truth table.
The existing code allows writes for the following cases:
- L=1, X=0: cases 8, 10, 12, 14
- L=0, RWX!=WX: cases 0-2, 4-6
This leaves cases 3, 7, 9, 11, 13, 15 for which writes are ignored.
From the Smepmp specification: "Adding a rule with executable privileges that
either is M-mode-only or a locked Shared-Region is not possible (...)" This
description matches cases 9-11, 13 of the truth table.
This commit implements an explicit check for these cases by using
pmp_get_epmp_operation to convert between PMP configuration and Smepmp truth
table cases.
Signed-off-by: Loïc Lefort <loic@rivosinc.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com>
Message-ID: <20250313193011.720075-4-loic@rivosinc.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Cc: qemu-stable@nongnu.org
Loïc Lefort [Thu, 13 Mar 2025 19:30:07 +0000 (20:30 +0100)]
target/riscv: pmp: don't allow RLB to bypass rule privileges
When Smepmp is supported, mseccfg.RLB allows bypassing locks when writing CSRs
but should not affect interpretation of actual PMP rules.
This is not the case with the current implementation where pmp_hart_has_privs
calls pmp_is_locked which implements mseccfg.RLB bypass.
This commit implements the correct behavior by removing mseccfg.RLB bypass from
pmp_is_locked.
RLB bypass when writing CSRs is implemented by adding a new pmp_is_readonly
function that calls pmp_is_locked and check mseccfg.RLB. pmp_write_cfg and
pmpaddr_csr_write are changed to use this new function.
Signed-off-by: Loïc Lefort <loic@rivosinc.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com>
Message-ID: <20250313193011.720075-2-loic@rivosinc.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Cc: qemu-stable@nongnu.org
Sunil V L [Sat, 22 Mar 2025 04:31:38 +0000 (10:01 +0530)]
hw/riscv/virt-acpi-build: Add support for RIMT
RISC-V IO Mapping Table (RIMT) is a new static ACPI table used to
communicate IOMMU information to the OS. Add support for creating this
table when the IOMMU is present. The specification is frozen and
available at [1].
Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Message-ID: <20250322043139.2003479-3-sunilvl@ventanamicro.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Sunil V L [Sat, 22 Mar 2025 04:31:37 +0000 (10:01 +0530)]
hw/riscv/virt: Add the BDF of IOMMU to RISCVVirtState structure
When the IOMMU is implemented as a PCI device, its BDF is created
locally in virt.c. However, the same BDF is also required in
virt-acpi-build.c to support ACPI. Therefore, make this information part
of the global RISCVVirtState structure so that it can be accessed
outside of virt.c as well.
Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Message-ID: <20250322043139.2003479-2-sunilvl@ventanamicro.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Stefan Hajnoczi [Thu, 15 May 2025 17:42:20 +0000 (13:42 -0400)]
Merge tag 'pull-target-arm-20250515' of https://git.linaro.org/people/pmaydell/qemu-arm into staging
target-arm queue:
* target/arm: refactoring for compile-twice changes
* MAINTAINERS: Add an entry for the Bananapi machine
* arm/omap: remove hard coded tabs
* rust: pl011: Cut down amount of text quoted from PL011 TRM
* target/arm: refactor Arm CPU class hierarchy
* tag 'pull-target-arm-20250515' of https://git.linaro.org/people/pmaydell/qemu-arm: (58 commits)
target/arm/tcg/vfp_helper: compile file twice (system, user)
target/arm/tcg/arith_helper: compile file once
target/arm/tcg/tlb-insns: compile file once (system)
target/arm/helper: restrict define_tlb_insn_regs to system target
target/arm/tcg/tlb_helper: compile file twice (system, user)
target/arm/tcg/neon_helper: compile file twice (system, user)
target/arm/tcg/iwmmxt_helper: compile file twice (system, user)
target/arm/tcg/hflags: compile file twice (system, user)
target/arm/tcg/crypto_helper: compile file once
target/arm/tcg/vec_internal: use forward declaration for CPUARMState
target/arm/machine: compile file once (system)
target/arm/kvm-stub: add missing stubs
target/arm/machine: move cpu_post_load kvm bits to kvm_arm_cpu_post_load function
target/arm/machine: remove TARGET_AARCH64 from migration state
target/arm/machine: reduce migration include to avoid target specific definitions
target/arm/kvm-stub: compile file once (system)
target/arm/meson: accelerator files are not needed in user mode
target/arm/ptw: compile file once (system)
target/arm/ptw: replace TARGET_AARCH64 by CONFIG_ATOMIC64 from arm_casq_ptw
target/arm/ptw: replace target_ulong with int64_t
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* tag 'pull-nbd-2025-05-14' of https://repo.or.cz/qemu/ericb:
mirror: Reduce I/O when destination is detect-zeroes:unmap
tests: Add iotest mirror-sparse for recent patches
iotests/common.rc: add disk_usage function
mirror: Skip writing zeroes when target is already zero
mirror: Skip pre-zeroing destination if it is already zero
mirror: Drop redundant zero_target parameter
mirror: Allow QMP override to declare target already zero
mirror: Pass full sync mode rather than bool to internals
mirror: Minor refactoring
iotests: Improve iotest 194 to mirror data
block: Add new bdrv_co_is_all_zeroes() function
block: Let bdrv_co_is_zero_fast consolidate adjacent extents
file-posix, gluster: Handle zero block status hint better
block: Expand block status mode from bool to flags
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Stefan Hajnoczi [Thu, 15 May 2025 17:41:56 +0000 (13:41 -0400)]
Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging
virtio,pci,pc: fixes, features
vhost-scsi now supports scsi hotplug
cxl gained a bag of new operations, motably media operations
virtio-net now supports SR-IOV emulation
pci-testdev now supports backing memory bar with host memory
amd iommu now supports migration
* tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (27 commits)
hw/i386/amd_iommu: Allow migration when explicitly create the AMDVI-PCI device
hw/i386/amd_iommu: Isolate AMDVI-PCI from amd-iommu device to allow full control over the PCI device creation
intel_iommu: Take locks when looking for and creating address spaces
intel_iommu: Use BQL_LOCK_GUARD to manage cleanup automatically
virtio: Move virtio_reset()
virtio: Call set_features during reset
vhost-scsi: support VIRTIO_SCSI_F_HOTPLUG
vhost-user: return failure if backend crash when live migration
vhost: return failure if stop virtqueue failed in vhost_dev_stop
system/runstate: add VM state change cb with return value
pci-testdev.c: Add membar-backed option for backing membar
pcie_sriov: Make a PCI device with user-created VF ARI-capable
docs: Document composable SR-IOV device
virtio-net: Implement SR-IOV VF
virtio-pci: Implement SR-IOV PF
pcie_sriov: Allow user to create SR-IOV device
pcie_sriov: Check PCI Express for SR-IOV PF
pcie_sriov: Ensure PF and VF are mutually exclusive
hw/pci: Fix SR-IOV VF number calculation
hw/pci: Do not add ROM BAR for SR-IOV VF
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* tag 'pull-request-2025-05-14' of https://gitlab.com/thuth/qemu:
tests/functional: Skip the screendump tests if the command is not available
tests/functional/test_s390x_tuxrun: Check whether the machine is available
include/hw/dma/xlnx_dpdma: Remove dependency on console.h
s390x: Fix leak in machine_set_loadparm
hw/s390x/s390-virtio-ccw: Remove the deprecated 4.0 machine type
hw/s390x/s390-virtio-ccw: Remove the deprecated 3.1 machine type
hw/s390x: Remove the obsolete hpage_1m_allowed switch
hw/s390x/s390-virtio-ccw: Remove the deprecated 3.0 machine type
hw/s390x/s390-virtio-ccw: Remove the deprecated 2.12 machine type
target/s390x: Rename the qemu_V2_11 feature set to qemu_MIN
hw/s390x/event-facility: Remove the obsolete "allow_all_mask_sizes" code
hw/s390x/s390-virtio-ccw: Remove the deprecated 2.11 machine type
hw/s390x/s390-virtio-ccw: Remove the deprecated 2.10 machine type
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Eric Blake [Tue, 13 May 2025 22:00:45 +0000 (17:00 -0500)]
mirror: Reduce I/O when destination is detect-zeroes:unmap
If we are going to punch holes in the mirror destination even for the
portions where the source image is unallocated, it is nicer to treat
the entire image as dirty and punch as we go, rather than pre-zeroing
the entire image just to re-do I/O to the allocated portions of the
image.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20250513220142.535200-2-eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Eric Blake [Fri, 9 May 2025 20:40:30 +0000 (15:40 -0500)]
tests: Add iotest mirror-sparse for recent patches
Prove that blockdev-mirror can now result in sparse raw destination
files, regardless of whether the source is raw or qcow2. By making
this a separate test, it was possible to test effects of individual
patches for the various pieces that all have to work together for a
sparse mirror to be successful.
Note that ./check -file produces different job lengths than ./check
-qcow2 (the test uses a filter to normalize); that's because when
deciding how much of the image to be mirrored, the code looks at how
much of the source image was allocated (for qcow2, this is only the
written clusters; for raw, it is the entire file). But the important
part is that the destination file ends up smaller than 3M, rather than
the 20M it used to be before this patch series.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20250509204341.3553601-28-eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Move the definition from iotests/250 to common.rc. This is used to
detect real disk usage of sparse files. In particular, we want to use
it for checking subclusters-based discards.
Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com> Reviewed-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-ID: <20240913163942.423050-6-andrey.drobyshev@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20250509204341.3553601-27-eblake@redhat.com>
Eric Blake [Fri, 9 May 2025 20:40:28 +0000 (15:40 -0500)]
mirror: Skip writing zeroes when target is already zero
When mirroring, the goal is to ensure that the destination reads the
same as the source; this goal is met whether the destination is sparse
or fully-allocated (except when explicitly punching holes, then merely
reading zero is not enough to know if it is sparse, so we still want
to punch the hole). Avoiding a redundant write to zero (whether in
the background because the zero cluster was marked in the dirty
bitmap, or in the foreground because the guest is writing zeroes) when
the destination already reads as zero makes mirroring faster, and
avoids allocating the destination merely because the source reports as
allocated.
The effect is especially pronounced when the source is a raw file.
That's because when the source is a qcow2 file, the dirty bitmap only
visits the portions of the source that are allocated, which tend to be
non-zero. But when the source is a raw file,
bdrv_co_is_allocated_above() reports the entire file as allocated so
mirror_dirty_init sets the entire dirty bitmap, and it is only later
during mirror_iteration that we change to consulting the more precise
bdrv_co_block_status_above() to learn where the source reads as zero.
Remember that since a mirror operation can write a cluster more than
once (every time the guest changes the source, the destination is also
changed to keep up), and the guest can change whether a given cluster
reads as zero, is discarded, or has non-zero data over the course of
the mirror operation, we can't take the shortcut of relying on
s->target_is_zero (which is static for the life of the job) in
mirror_co_zero() to see if the destination is already zero, because
that information may be stale. Any solution we use must be dynamic in
the face of the guest writing or discarding a cluster while the mirror
has been ongoing.
We could just teach mirror_co_zero() to do a block_status() probe of
the destination, and skip the zeroes if the destination already reads
as zero, but we know from past experience that extra block_status()
calls are not always cheap (tmpfs, anyone?), especially when they are
random access rather than linear. Use of block_status() of the source
by the background task in a linear fashion is not our bottleneck (it's
a background task, after all); but since mirroring can be done while
the source is actively being changed, we don't want a slow
block_status() of the destination to occur on the hot path of the
guest trying to do random-access writes to the source.
So this patch takes a slightly different approach: any time we have to
track dirty clusters, we can also track which clusters are known to
read as zero. For sync=TOP or when we are punching holes from
"detect-zeroes":"unmap", the zero bitmap starts out empty, but
prevents a second write zero to a cluster that was already zero by an
earlier pass; for sync=FULL when we are not punching holes, the zero
bitmap starts out full if the destination reads as zero during
initialization. Either way, I/O to the destination can now avoid
redundant write zero to a cluster that already reads as zero, all
without having to do a block_status() per write on the destination.
With this patch, if I create a raw sparse destination file, connect it
with QMP 'blockdev-add' while leaving it at the default "discard":
"ignore", then run QMP 'blockdev-mirror' with "sync": "full", the
destination remains sparse rather than fully allocated. Meanwhile, a
destination image that is already fully allocated remains so unless it
was opened with "detect-zeroes": "unmap". And any time writing zeroes
is skipped, the job counters are not incremented.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20250509204341.3553601-26-eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Eric Blake [Fri, 9 May 2025 20:40:27 +0000 (15:40 -0500)]
mirror: Skip pre-zeroing destination if it is already zero
When doing a sync=full mirroring, we can skip pre-zeroing the
destination if it already reads as zeroes and we are not also trying
to punch holes due to detect-zeroes. With this patch, there are fewer
scenarios that have to pass in an explicit target-is-zero, while still
resulting in a sparse destination remaining sparse.
A later patch will then further improve things to skip writing to the
destination for parts of the image where the source is zero; but even
with just this patch, it is possible to see a difference for any
source that does not report itself as fully allocated, coupled with a
destination BDS that can quickly report that it already reads as zero.
(For a source that reports as fully allocated, such as a file, the
rest of mirror_dirty_init() still sets the entire dirty bitmap to
true, so even though we avoided the pre-zeroing, we are not yet
avoiding all redundant I/O).
Iotest 194 detects the difference made by this patch: for a file
source (where block status reports the entire image as allocated, and
therefore we end up writing zeroes everywhere in the destination
anyways), the job length remains the same. But for a qcow2 source and
a destination that reads as all zeroes, the dirty bitmap changes to
just tracking the allocated portions of the source, which results in
faster completion and smaller job statistics. For the test to pass
with both ./check -file and -qcow2, a new python filter is needed to
mask out the now-varying job amounts (this matches the shell filters
_filter_block_job_{offset,len} in common.filter). A later test will
also be added which further validates expected sparseness, so it does
not matter that 194 is no longer explicitly looking at how many bytes
were copied.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20250509204341.3553601-25-eblake@redhat.com> Reviewed-by: Sunny Zhu <sunnyzhyy@qq.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Eric Blake [Fri, 9 May 2025 20:40:26 +0000 (15:40 -0500)]
mirror: Drop redundant zero_target parameter
The two callers to a mirror job (drive-mirror and blockdev-mirror) set
zero_target precisely when sync mode == FULL, with the one exception
that drive-mirror skips zeroing the target if it was newly created and
reads as zero. But given the previous patch, that exception is
equally captured by target_is_zero.
Meanwhile, there is another slight wrinkle, fortunately caught by
iotest 185: if the caller uses "sync":"top" but the source has no
backing file, the code in blockdev.c was changing sync to be FULL, but
only after it had set zero_target=false. In mirror.c, prior to recent
patches, this didn't matter: the only places that inspected sync were
setting is_none_mode (both TOP and FULL had set that to false), and
mirror_start() setting base = mode == MIRROR_SYNC_MODE_TOP ?
bdrv_backing_chain_next(bs) : NULL. But now that we are passing sync
around, the slammed sync mode would result in a new pre-zeroing pass
even when the user had passed "sync":"top" in an effort to skip
pre-zeroing. Fortunately, the assignment of base when bs has no
backing chain still works out to NULL if we don't slam things. So
with the forced change of sync ripped out of blockdev.c, the sync mode
is passed through the full callstack unmolested, and we can now
reliably reconstruct the same settings as what used to be passed in by
zero_target=false, without the redundant parameter.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20250509204341.3553601-24-eblake@redhat.com> Reviewed-by: Sunny Zhu <sunnyzhyy@qq.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
[eblake: Fix regression in iotest 185] Signed-off-by: Eric Blake <eblake@redhat.com>
Eric Blake [Fri, 9 May 2025 20:40:25 +0000 (15:40 -0500)]
mirror: Allow QMP override to declare target already zero
QEMU has an optimization for a just-created drive-mirror destination
that is not possible for blockdev-mirror (which can't create the
destination) - any time we know the destination starts life as all
zeroes, we can skip a pre-zeroing pass on the destination. Recent
patches have added an improved heuristic for detecting if a file
contains all zeroes, and we plan to use that heuristic in upcoming
patches. But since a heuristic cannot quickly detect all scenarios,
and there may be cases where the caller is aware of information that
QEMU cannot learn quickly, it makes sense to have a way to tell QEMU
to assume facts about the destination that can make the mirror
operation faster. Given our existing example of "qemu-img convert
--target-is-zero", it is time to expose this override in QMP for
blockdev-mirror as well.
This patch results in some slight redundancy between the older
s->zero_target (set any time mode==FULL and the destination image was
not just created - ie. clear if drive-mirror is asking to skip the
pre-zero pass) and the newly-introduced s->target_is_zero (in addition
to the QMP override, it is set when drive-mirror creates the
destination image); this will be cleaned up in the next patch.
There is also a subtlety that we must consider. When drive-mirror is
passing target_is_zero on behalf of a just-created image, we know the
image is sparse (skipping the pre-zeroing keeps it that way), so it
doesn't matter whether the destination also has "discard":"unmap" and
"detect-zeroes":"unmap". But now that we are letting the user set the
knob for target-is-zero, if the user passes a pre-existing file that
is fully allocated, it is fine to leave the file fully allocated under
"detect-zeroes":"on", but if the file is open with
"detect-zeroes":"unmap", we should really be trying harder to punch
holes in the destination for every region of zeroes copied from the
source. The easiest way to do this is to still run the pre-zeroing
pass (turning the entire destination file sparse before populating
just the allocated portions of the source), even though that currently
results in double I/O to the portions of the file that are allocated.
A later patch will add further optimizations to reduce redundant
zeroing I/O during the mirror operation.
Since "target-is-zero":true is designed for optimizations, it is okay
to silently ignore the parameter rather than erroring if the user ever
sets the parameter in a scenario where the mirror job can't exploit it
(for example, when doing "sync":"top" instead of "sync":"full", we
can't pre-zero, so setting the parameter won't make a speed
difference).
Signed-off-by: Eric Blake <eblake@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20250509204341.3553601-23-eblake@redhat.com> Reviewed-by: Sunny Zhu <sunnyzhyy@qq.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Eric Blake [Fri, 9 May 2025 20:40:24 +0000 (15:40 -0500)]
mirror: Pass full sync mode rather than bool to internals
Out of the five possible values for MirrorSyncMode, INCREMENTAL and
BITMAP are already rejected up front in mirror_start, leaving NONE,
TOP, and FULL as the remaining values that the code was collapsing
into a single bool is_none_mode. Furthermore, mirror_dirty_init() is
only reachable for modes TOP and FULL, as further guided by
s->zero_target. However, upcoming patches want to further optimize
the pre-zeroing pass of a sync=full mirror in mirror_dirty_init(),
while avoiding that pass on a sync=top action. Instead of throwing
away context by collapsing these two values into
s->is_none_mode=false, it is better to pass s->sync_mode throughout
the entire operation. For active commit, the desired semantics match
sync mode TOP.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20250509204341.3553601-22-eblake@redhat.com> Reviewed-by: Sunny Zhu <sunnyzhyy@qq.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Eric Blake [Fri, 9 May 2025 20:40:23 +0000 (15:40 -0500)]
mirror: Minor refactoring
Commit 5791ba52 (v9.2) pre-initialized ret in mirror_dirty_init to
silence a false positive compiler warning, even though in all code
paths where ret is used, it was guaranteed to be reassigned
beforehand. But since the function returns -errno, and -1 is not
always the right errno, it's better to initialize to -EIO.
An upcoming patch wants to track two bitmaps in
do_sync_target_write(); this will be easier if the current variables
related to the dirty bitmap are renamed.
Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20250509204341.3553601-21-eblake@redhat.com>
Eric Blake [Fri, 9 May 2025 20:40:22 +0000 (15:40 -0500)]
iotests: Improve iotest 194 to mirror data
Mirroring a completely sparse image to a sparse destination should be
practically instantaneous. It isn't yet, but the test will be more
realistic if it has some non-zero to mirror as well as the holes.
Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20250509204341.3553601-20-eblake@redhat.com>
Eric Blake [Fri, 9 May 2025 20:40:21 +0000 (15:40 -0500)]
block: Add new bdrv_co_is_all_zeroes() function
There are some optimizations that require knowing if an image starts
out as reading all zeroes, such as making blockdev-mirror faster by
skipping the copying of source zeroes to the destination. The
existing bdrv_co_is_zero_fast() is a good building block for answering
this question, but it tends to give an answer of 0 for a file we just
created via QMP 'blockdev-create' or similar (such as 'qemu-img create
-f raw'). Why? Because file-posix.c insists on allocating a tiny
header to any file rather than leaving it 100% sparse, due to some
filesystems that are unable to answer alignment probes on a hole. But
teaching file-posix.c to read the tiny header doesn't scale - the
problem of a small header is also visible when libvirt sets up an NBD
client to a just-created file on a migration destination host.
So, we need a wrapper function that handles a bit more complexity in a
common manner for all block devices - when the BDS is mostly a hole,
but has a small non-hole header, it is still worth the time to read
that header and check if it reads as all zeroes before giving up and
returning a pessimistic answer.
Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20250509204341.3553601-19-eblake@redhat.com>
Eric Blake [Fri, 9 May 2025 20:40:20 +0000 (15:40 -0500)]
block: Let bdrv_co_is_zero_fast consolidate adjacent extents
Some BDS drivers have a cap on how much block status they can supply
in one query (for example, NBD talking to an older server cannot
inspect more than 4G per query; and qcow2 tends to cap its answers
rather than cross a cluster boundary of an L1 table). Although the
existing callers of bdrv_co_is_zero_fast are not passing in that large
of a 'bytes' parameter, an upcoming caller wants to query the entire
image at once, and will thus benefit from being able to treat adjacent
zero regions in a coalesced manner, rather than claiming the region is
non-zero merely because pnum was truncated and didn't match the
incoming bytes.
While refactoring this into a loop, note that there is no need to
assign pnum prior to calling bdrv_co_common_block_status_above() (it
is guaranteed to be assigned deeper in the callstack).
Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20250509204341.3553601-18-eblake@redhat.com>
Eric Blake [Fri, 9 May 2025 20:40:19 +0000 (15:40 -0500)]
file-posix, gluster: Handle zero block status hint better
Although the previous patch to change 'bool want_zero' into a bitmask
made no semantic change, it is now time to differentiate. When the
caller specifically wants to know what parts of the file read as zero,
we need to use lseek and actually reporting holes, rather than
short-circuiting and advertising full allocation.
This change will be utilized in later patches to let mirroring
optimize for the case when the destination already reads as zeroes.
Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20250509204341.3553601-17-eblake@redhat.com>
Eric Blake [Fri, 9 May 2025 20:40:18 +0000 (15:40 -0500)]
block: Expand block status mode from bool to flags
This patch is purely mechanical, changing bool want_zero into an
unsigned int for bitwise-or of flags. As of this patch, all
implementations are unchanged (the old want_zero==true is now
mode==BDRV_WANT_PRECISE which is a superset of BDRV_WANT_ZERO); but
the callers in io.c that used to pass want_zero==false are now
prepared for future driver changes that can now distinguish bewteen
BDRV_WANT_ZERO vs. BDRV_WANT_ALLOCATED. The next patch will actually
change the file-posix driver along those lines, now that we have
more-specific hints.
As for the background why this patch is useful: right now, the
file-posix driver recognizes that if allocation is being queried, the
entire image can be reported as allocated (there is no backing file to
refer to) - but this throws away information on whether the entire
image reads as zero (trivially true if lseek(SEEK_HOLE) at offset 0
returns -ENXIO, a bit more complicated to prove if the raw file was
created with 'qemu-img create' since we intentionally allocate a small
chunk of all-zero data to help with alignment probing). Later patches
will add a generic algorithm for seeing if an entire file reads as
zeroes.
Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20250509204341.3553601-16-eblake@redhat.com>
Pierrick Bouvier [Mon, 12 May 2025 18:04:49 +0000 (11:04 -0700)]
target/arm/machine: remove TARGET_AARCH64 from migration state
This exposes two new subsections for arm: vmstate_sve and vmstate_za.
Those sections have a ".needed" callback, which already allow to skip
them when not needed.
vmstate_sve .needed is checking cpu_isar_feature(aa64_sve, cpu).
vmstate_za .needed is checking ZA flag in cpu->env.svcr.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250512180502.2395029-36-pierrick.bouvier@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Pierrick Bouvier [Mon, 12 May 2025 18:04:44 +0000 (11:04 -0700)]
target/arm/ptw: replace TARGET_AARCH64 by CONFIG_ATOMIC64 from arm_casq_ptw
This function needs 64 bit compare exchange, so we hide implementation
for hosts not supporting it (some 32 bit target, which don't run 64 bit
guests anyway).
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-id: 20250512180502.2395029-31-pierrick.bouvier@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Pierrick Bouvier [Mon, 12 May 2025 18:04:29 +0000 (11:04 -0700)]
target/arm/helper: extract common helpers
Allow later commits to include only the "new" tcg/helper.h, thus
preventing to pull aarch64 helpers (+ target/arm/helper.h contains a
ifdef TARGET_AARCH64).
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-id: 20250512180502.2395029-16-pierrick.bouvier@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>