Merge tag 'staging-pull-request' of https://gitlab.com/peterx/qemu into staging
Migration/Memory Pull for 10.2
- PeterX's fix on tls warning for preempt channel when migratino completes
- Arun's series to enhance error reporting for vTPM and migration framework
- PeterX's patch to cleanup multifd send TLS BYE messages
- Juraj's fix on postcopy start state transition when switchover failed
- Yanfei's fix to migrate APIC before VFIO-PCI to avoid irq fallbacks
- Dan's cleanup to simplify error reporting in qemu_fill_buffer()
- PeterM's fix on address space leak when cpu hot plug / unplug
- Steve's cpr-exec wholeset
# -----BEGIN PGP SIGNATURE-----
#
# iIgEABYKADAWIQS5GE3CDMRX2s990ak7X8zN86vXBgUCaN/uIhIccGV0ZXJ4QHJl
# ZGhhdC5jb20ACgkQO1/MzfOr1wZ+mAEA1l2RS9sZS1W3vXQMCNb+Nu8Uo2p+e5Qj
# Uu6J0WVV+XsBANtzGZk2UM/frqlABywW3/ozJ4qBvIPKo758Mr6/lqUH
# =asUv
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 03 Oct 2025 08:39:14 AM PDT
# gpg: using EDDSA key B9184DC20CC457DACF7DD1A93B5FCCCDF3ABD706
# gpg: issuer "peterx@redhat.com"
# gpg: Good signature from "Peter Xu <xzpeter@gmail.com>" [unknown]
# gpg: aka "Peter Xu <peterx@redhat.com>" [unknown]
# gpg: WARNING: The key's User ID is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: B918 4DC2 0CC4 57DA CF7D D1A9 3B5F CCCD F3AB D706
* tag 'staging-pull-request' of https://gitlab.com/peterx/qemu: (45 commits)
migration-test: test cpr-exec
vfio: cpr-exec mode
migration: cpr-exec docs
migration: cpr-exec mode
migration: cpr-exec save and load
migration: cpr-exec-command parameter
oslib: qemu_clear_cloexec
migration: add cpr_walk_fd
migration: multi-mode notifier
migration: simplify error reporting after channel read
physmem: Destroy all CPU AddressSpaces on unrealize
memory: New AS helper to serialize destroy+free
include/system/memory.h: Clarify address_space_destroy() behaviour
migration: ensure APIC is loaded prior to VFIO PCI devices
migration: Fix state transition in postcopy_start() error handling
migration/multifd/tls: Cleanup BYE message processing on sender side
migration: HMP: Adjust the order of output fields
migration: Make migration_has_failed() work even for CANCELLING
io/crypto: Move tls premature termination handling into QIO layer
backends/tpm: Propagate vTPM error on migration failure
...
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The migrate command stops the VM, saves state to uri-1,
directly exec's a new version of QEMU on the same host,
replacing the original process while retaining its PID, and
loads state from uri-1. Guest RAM is preserved in place,
albeit with new virtual addresses.
The new QEMU process is started by exec'ing the command
specified by the @cpr-exec-command parameter. The first word of
the command is the binary, and the remaining words are its
arguments. The command may be a direct invocation of new QEMU,
or may be a non-QEMU command that exec's the new QEMU binary.
This mode creates a second migration channel that is not visible
to the user. At the start of migration, old QEMU saves CPR state
to the second channel, and at the end of migration, it tells the
main loop to call cpr_exec. New QEMU loads CPR state early, before
objects are created.
Because old QEMU terminates when new QEMU starts, one cannot
stream data between the two, so uri-1 must be a type,
such as a file, that accepts all data before old QEMU exits.
Otherwise, old QEMU may quietly block writing to the channel.
Memory-backend objects must have the share=on attribute, but
memory-backend-epc is not supported. The VM must be started with
the '-machine aux-ram-share=on' option, which allows anonymous
memory to be transferred in place to the new process. The memfds
are kept open across exec by clearing the close-on-exec flag, their
values are saved in CPR state, and they are mmap'd in new QEMU.
Steve Sistare [Wed, 1 Oct 2025 15:33:57 +0000 (08:33 -0700)]
migration: cpr-exec save and load
To preserve CPR state across exec, create a QEMUFile based on a memfd, and
keep the memfd open across exec. Save the value of the memfd in an
environment variable so post-exec QEMU can find it.
These new functions are called in a subsequent patch.
Steve Sistare [Wed, 1 Oct 2025 15:33:56 +0000 (08:33 -0700)]
migration: cpr-exec-command parameter
Create the cpr-exec-command migration parameter, defined as a list of
strings. It will be used for cpr-exec migration mode in a subsequent
patch, and contains forward references to cpr-exec mode in the qapi
doc.
No functional change, except that cpr-exec-command is shown by the
'info migrate' command.
Steve Sistare [Wed, 1 Oct 2025 15:33:55 +0000 (08:33 -0700)]
oslib: qemu_clear_cloexec
Define qemu_clear_cloexec, analogous to qemu_set_cloexec.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/1759332851-370353-4-git-send-email-steven.sistare@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>
Steve Sistare [Wed, 1 Oct 2025 15:33:53 +0000 (08:33 -0700)]
migration: multi-mode notifier
Allow a notifier to be added for multiple migration modes.
To allow a notifier to appear on multiple per-node lists, use
a generic list type. We can no longer use NotifierWithReturnList,
because it shoe horns the notifier onto a single list.
migration: simplify error reporting after channel read
The code handling the return value of qio_channel_read proceses
len == 0 (EOF) separately from len < 1 (error), but in both
cases ends up calling qemu_file_set_error_obj() with -EIO as the
errno. This logic can be merged into one codepath to simplify it.
Peter Maydell [Mon, 29 Sep 2025 14:42:28 +0000 (15:42 +0100)]
physmem: Destroy all CPU AddressSpaces on unrealize
When we unrealize a CPU object (which happens on vCPU hot-unplug), we
should destroy all the AddressSpace objects we created via calls to
cpu_address_space_init() when the CPU was realized.
Commit 24bec42f3d6eae added a function to do this for a specific
AddressSpace, but did not add any places where the function was
called.
Since we always want to destroy all the AddressSpaces on unrealize,
regardless of the target architecture, we don't need to try to keep
track of how many are still undestroyed, or make the target
architecture code manually call a destroy function for each AS it
created. Instead we can adjust the function to always completely
destroy the whole cpu->ases array, and arrange for it to be called
during CPU unrealize as part of the common code.
Without this fix, AddressSanitizer will report a leak like this
from a run where we hot-plugged and then hot-unplugged an x86 KVM
vCPU:
Direct leak of 416 byte(s) in 1 object(s) allocated from:
#0 0x5b638565053d in calloc (/data_nvme1n1/linaro/qemu-from-laptop/qemu/build/x86-tgts-asan/qemu-system-x86_64+0x1ee153d) (BuildId: c1cd6022b195142106e1bffeca23498c2b752bca)
#1 0x7c28083f77b1 in g_malloc0 (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x637b1) (BuildId: 1eb6131419edb83b2178b682829a6913cf682d75)
#2 0x5b6386999c7c in cpu_address_space_init /data_nvme1n1/linaro/qemu-from-laptop/qemu/build/x86-tgts-asan/../../system/physmem.c:797:25
#3 0x5b638727f049 in kvm_cpu_realizefn /data_nvme1n1/linaro/qemu-from-laptop/qemu/build/x86-tgts-asan/../../target/i386/kvm/kvm-cpu.c:102:5
#4 0x5b6385745f40 in accel_cpu_common_realize /data_nvme1n1/linaro/qemu-from-laptop/qemu/build/x86-tgts-asan/../../accel/accel-common.c:101:13
#5 0x5b638568fe3c in cpu_exec_realizefn /data_nvme1n1/linaro/qemu-from-laptop/qemu/build/x86-tgts-asan/../../hw/core/cpu-common.c:232:10
#6 0x5b63874a2cd5 in x86_cpu_realizefn /data_nvme1n1/linaro/qemu-from-laptop/qemu/build/x86-tgts-asan/../../target/i386/cpu.c:9321:5
#7 0x5b6387a0469a in device_set_realized /data_nvme1n1/linaro/qemu-from-laptop/qemu/build/x86-tgts-asan/../../hw/core/qdev.c:494:13
#8 0x5b6387a27d9e in property_set_bool /data_nvme1n1/linaro/qemu-from-laptop/qemu/build/x86-tgts-asan/../../qom/object.c:2375:5
#9 0x5b6387a2090b in object_property_set /data_nvme1n1/linaro/qemu-from-laptop/qemu/build/x86-tgts-asan/../../qom/object.c:1450:5
#10 0x5b6387a35b05 in object_property_set_qobject /data_nvme1n1/linaro/qemu-from-laptop/qemu/build/x86-tgts-asan/../../qom/qom-qobject.c:28:10
#11 0x5b6387a21739 in object_property_set_bool /data_nvme1n1/linaro/qemu-from-laptop/qemu/build/x86-tgts-asan/../../qom/object.c:1520:15
#12 0x5b63879fe510 in qdev_realize /data_nvme1n1/linaro/qemu-from-laptop/qemu/build/x86-tgts-asan/../../hw/core/qdev.c:276:12
Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2517 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20250929144228.1994037-4-peter.maydell@linaro.org Signed-off-by: Peter Xu <peterx@redhat.com>
Peter Xu [Mon, 29 Sep 2025 14:42:27 +0000 (15:42 +0100)]
memory: New AS helper to serialize destroy+free
If an AddressSpace has been created in its own allocated
memory, cleaning it up requires first destroying the AS
and then freeing the memory. Doing this doesn't work:
address_space_destroy(as);
g_free_rcu(as, rcu);
because both address_space_destroy() and g_free_rcu()
try to use the same 'rcu' node in the AddressSpace struct
and the address_space_destroy hook gets overwritten.
Provide a new address_space_destroy_free() function which
will destroy the AS and then free the memory it uses, all
in one RCU callback.
(CC to stable because the next commit needs this function.)
Cc: qemu-stable@nongnu.org Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20250929144228.1994037-3-peter.maydell@linaro.org Signed-off-by: Peter Xu <peterx@redhat.com>
address_space_destroy() doesn't actually immediately destroy the AS;
it queues it to be destroyed via RCU. This means you can't g_free()
the memory the AS struct is in until that has happened.
Yanfei Xu [Mon, 18 Aug 2025 13:11:27 +0000 (21:11 +0800)]
migration: ensure APIC is loaded prior to VFIO PCI devices
The load procedure of VFIO PCI devices involves setting up IRT
for each VFIO PCI devices. This requires determining whether an
interrupt is single-destination interrupt to decide between
Posted Interrupt(PI) or remapping mode for the IRTE. However,
determining this may require accessing the VM's APIC registers.
For example:
ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, irqs)
...
kvm_arch_irq_bypass_add_producer
kvm_x86_call(pi_update_irte)
vmx_pi_update_irte
kvm_intr_is_single_vcpu
If the LAPIC has not been loaded yet, interrupts will use remapping
mode. To prevent the fallback of interrupt mode, keep APIC is always
loaded prior to VFIO PCI devices.
Juraj Marcin [Tue, 26 Aug 2025 11:51:40 +0000 (13:51 +0200)]
migration: Fix state transition in postcopy_start() error handling
Commit 48814111366b ("migration: Always set DEVICE state") introduced
DEVICE state to postcopy, which moved the actual state transition that
leads to POSTCOPY_ACTIVE.
However, the error handling part of the postcopy_start() function still
expects the state POSTCOPY_ACTIVE, but depending on where an error
happens, now the state can be either ACTIVE, DEVICE or CANCELLING, but
never POSTCOPY_ACTIVE, as this transition now happens just before a
successful return from the function.
Instead, accept any state except CANCELLING when transitioning to FAILED
state.
Cc: qemu-stable@nongnu.org Fixes: 48814111366b ("migration: Always set DEVICE state") Signed-off-by: Juraj Marcin <jmarcin@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250826115145.871272-1-jmarcin@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
Peter Xu [Thu, 25 Sep 2025 20:16:01 +0000 (16:16 -0400)]
migration/multifd/tls: Cleanup BYE message processing on sender side
This patch is a trivial cleanup to the BYE messages on the multifd sender
side. It could also be a fix, but since we do not have a solid clue,
taking this as a cleanup only.
One trivial concern is, migration_tls_channel_end() might be unsafe to be
invoked in the migration thread if migration is not successful, because
when failed / cancelled we do not know whether the multifd sender threads
can be writting to the channels, while GnuTLS library (when it's a TLS
channel) logically doesn't support concurrent writes.
When at it, cleanup on a few things. What changed:
- Introduce a helper to do graceful shutdowns with rich comment, hiding
the details
- Only send bye() iff migration succeeded, skip if it failed / cancelled
- Detect TLS channel using channel type rather than thread created flags
- Move the loop into the existing one that will close the channels, but
do graceful shutdowns before channel shutdowns
- local_err seems to have been leaked if set, fix it along the way
Bin Guo [Mon, 29 Sep 2025 02:12:13 +0000 (10:12 +0800)]
migration: HMP: Adjust the order of output fields
Adjust the positions of 'tls-authz' and 'max-postcopy-bandwidth' in
the fields output by the 'info migrate_parameters' command so that
related fields are next to each other.
For clarity only, no functional changes.
Sample output after this commit:
(qemu) info migrate_parameters
...
max-cpu-throttle: 99
tls-creds: ''
tls-hostname: ''
tls-authz: ''
max-bandwidth: 134217728 bytes/second
avail-switchover-bandwidth: 0 bytes/second
max-postcopy-bandwidth: 0 bytes/second
downtime-limit: 300 ms
...
Cc: Dr. David Alan Gilbert <dave@treblig.org> Signed-off-by: Bin Guo <guobin@linux.alibaba.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20250929021213.28369-1-guobin@linux.alibaba.com
[peterx: move postcopy-bw before avail-switchover-bw] Signed-off-by: Peter Xu <peterx@redhat.com>
Peter Xu [Thu, 18 Sep 2025 20:39:36 +0000 (16:39 -0400)]
io/crypto: Move tls premature termination handling into QIO layer
QCryptoTLSSession allows TLS premature termination in two cases, one of the
case is when the channel shutdown() is invoked on READ side.
It's possible the shutdown() happened after the read thread blocked at
gnutls_record_recv(). In this case, we should allow the premature
termination to happen.
The problem is by the time qcrypto_tls_session_read() was invoked,
tioc->shutdown may not have been set, so this may instead be treated as an
error if there is concurrent shutdown() calls.
To allow the flag to reflect the latest status of tioc->shutdown, move the
check upper into the QIOChannel level, so as to read the flag only after
QEMU gets an GNUTLS_E_PREMATURE_TERMINATION.
When at it, introduce qio_channel_tls_allow_premature_termination() helper
to make the condition checks easier to read. When doing so, change the
qatomic_load_acquire() to qatomic_read(): here we don't need any ordering
of memory accesses, but reading a flag. qatomic_read() would suffice
because it guarantees fetching from memory. Nothing else we should need to
order on memory access.
This patch will fix a qemu qtest warning when running the preempt tls test,
reporting premature termination:
QTEST_QEMU_BINARY=./qemu-system-x86_64 ./tests/qtest/migration-test --full -r /x86_64/migration/postcopy/preempt/tls/psk
...
qemu-kvm: Cannot read from TLS channel: The TLS connection was non-properly terminated.
...
In this specific case, the error was set by postcopy_preempt_thread, which
normally will be concurrently shutdown()ed by the main thread.
backends/tpm: Propagate vTPM error on migration failure
- When migration of a VM with encrypted vTPM fails on the
destination host, (e.g., due to a mismatch in secret values),
the error message displayed on the source host is generic and unhelpful.
- For example, a typical error looks like this:
"operation failed: job 'migration out' failed: Sibling indicated error 1.
operation failed: job 'migration in' failed: load of migration failed:
Input/output error"
- Such generic errors are logged using error_report(), which prints to
the console/monitor but does not make the detailed error accessible via
the QMP query-migrate command.
- This change, along with the set of changes of passing errp Error object
to the VM state loading functions, help in addressing the issue.
We use the post_load_errp hook of VMStateDescription to propagate errors
by setting Error **errp objects in case of failure in the TPM backend.
- It can then be retrieved using QMP command:
{"execute" : "query-migrate"}
migration: Add error-parameterized function variants in VMSD struct
- We need to have good error reporting in the callbacks in
VMStateDescription struct. Specifically pre_save, pre_load
and post_load callbacks.
- It is not possible to change these functions everywhere in one
patch, therefore, we introduce a duplicate set of callbacks
with Error object passed to them.
- So, in this commit, we implement 'errp' variants of these callbacks,
introducing an explicit Error object parameter.
- This is a functional step towards transitioning the entire codebase
to the new error-parameterized functions.
- Deliberately called in mutual exclusion from their counterparts,
to prevent conflicts during the transition.
- New impls should preferentally use 'errp' variants of
these methods, and existing impls incrementally converted.
The variants without 'errp' are intended to be removed
once all usage is converted.
migration: Remove error variant of vmstate_save_state() function
This commit removes the redundant vmstate_save_state_with_err()
function.
Previously, commit 969298f9d7 introduced vmstate_save_state_with_err()
to handle error propagation, while vmstate_save_state() existed for
non-error scenarios.
This is because there were code paths where vmstate_save_state_v()
(called internally by vmstate_save_state) did not explicitly set
errors on failure.
This change unifies error handling by
- updating vmstate_save_state() to accept an Error **errp argument.
- vmstate_save_state_v() ensures errors are set directly within the errp
object, eliminating the need for two separate functions.
All calls to vmstate_save_state_with_err() are replaced with
vmstate_save_state(). This simplifies the API and improves code
maintainability.
vmstate_save_state() that only calls vmstate_save_state_v(),
by inference, also has errors set in errp in case of failure.
The errors are reported using error_report_err().
If we want the function to exit on error, then &error_fatal is
passed.
migration: Capture error in postcopy_ram_listen_thread()
This is an incremental step in converting vmstate loading
code to report error via Error objects instead of directly
printing it to console/monitor.
postcopy_ram_listen_thread() calls qemu_loadvm_state_main()
to load the vm, and in case of a failure, it should set the error
in the migration object.
migration: push Error **errp into loadvm_postcopy_handle_switchover_start()
This is an incremental step in converting vmstate loading code to report
error via Error objects instead of directly printing it to console/monitor.
It is ensured that loadvm_postcopy_handle_switchover_start() must report
an error in errp, in case of failure.
Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-22-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
migration: push Error **errp into loadvm_process_enable_colo()
This is an incremental step in converting vmstate loading
code to report error via Error objects instead of directly
printing it to console/monitor.
It is ensured that loadvm_process_enable_colo() must report an error
in errp, in case of failure.
migration: Return -1 on memory allocation failure in ram.c
The function colo_init_ram_cache() currently returns -errno if
qemu_anon_ram_alloc() fails. However, the subsequent cleanup loop that
calls qemu_anon_ram_free() could potentially alter the value of errno.
This would cause the function to return a value that does not accurately
represent the original allocation failure.
This commit changes the return value to -1 on memory allocation failure.
This ensures that the return value is consistent and is not affected by
any errno changes that may occur during the free process.
migration: push Error **errp into loadvm_handle_recv_bitmap()
This is an incremental step in converting vmstate loading
code to report error via Error objects instead of directly
printing it to console/monitor.
It is ensured that loadvm_handle_recv_bitmap() must report an error
in errp, in case of failure.
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-19-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
migration: push Error **errp into loadvm_postcopy_ram_handle_discard()
This is an incremental step in converting vmstate loading
code to report error via Error objects instead of directly
printing it to console/monitor.
It is ensured that loadvm_postcopy_ram_handle_discard() must report an error
in errp, in case of failure.
migration: push Error **errp into loadvm_postcopy_handle_run()
This is an incremental step in converting vmstate loading
code to report error via Error objects instead of directly
printing it to console/monitor.
It is ensured that loadvm_postcopy_handle_run() must report an error
in errp, in case of failure.
Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-17-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
migration: push Error **errp into loadvm_postcopy_handle_listen()
This is an incremental step in converting vmstate loading
code to report error via Error objects instead of directly
printing it to console/monitor.
It is ensured that loadvm_postcopy_handle_listen() must report an error
in errp, in case of failure.
Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-16-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
migration: push Error **errp into loadvm_postcopy_handle_advise()
This is an incremental step in converting vmstate loading
code to report error via Error objects instead of directly
printing it to console/monitor.
It is ensured that loadvm_postcopy_handle_advise() must report an error
in errp, in case of failure.
migration: push Error **errp into ram_postcopy_incoming_init()
This is an incremental step in converting vmstate loading
code to report error via Error objects instead of directly
printing it to console/monitor.
It is ensured that ram_postcopy_incoming_init() must report an error
in errp, in case of failure.
migration: make loadvm_postcopy_handle_resume() void
This is an incremental step in converting vmstate loading
code to report error via Error objects instead of directly
printing it to console/monitor.
Use warn_report() instead of error_report(); it ensures that
a resume command received while the migration is not
in postcopy recover state is not fatal. It only informs that
the command received is unusual, and therefore we should not set
errp with the error string.
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-13-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
migration: Update qemu_file_get_return_path() docs and remove dead checks
The documentation of qemu_file_get_return_path() states that it can
return NULL on failure. However, a review of the current implementation
reveals that it is guaranteed that it will always succeed and will never
return NULL.
As a result, the NULL checks post calling the function become redundant.
This commit updates the documentation for the function and removes all
NULL checks throughout the migration code.
migration: push Error **errp into qemu_loadvm_section_part_end()
This is an incremental step in converting vmstate loading
code to report error via Error objects instead of directly
printing it to console/monitor.
It is ensured that qemu_loadvm_section_part_end() must report an error
in errp, in case of failure.
This patch also removes the setting of errp when errp is NULL in the
out section as it is no longer required in the series.
migration: push Error **errp into qemu_loadvm_section_start_full()
This is an incremental step in converting vmstate loading
code to report error via Error objects instead of directly
printing it to console/monitor.
It is ensured that qemu_loadvm_section_start_full() must report an error
in errp, in case of failure.
migration: push Error **errp into qemu_loadvm_state_main()
This is an incremental step in converting vmstate loading
code to report error via Error objects instead of directly
printing it to console/monitor.
It is ensured that qemu_loadvm_state_main() must report an error
in errp, in case of failure.
Set errp explicitly if it is NULL in case of failure in the out
section. This will be removed in the subsequent patch when all of
the calls are converted to passing errp.
The error message in the default case of qemu_loadvm_state_main()
has the word "savevm". This is removed because it can confuse the
user while reading destination side error logs.
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-9-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
migration: push Error **errp into qemu_load_device_state()
This is an incremental step in converting vmstate loading
code to report error via Error objects instead of directly
printing it to console/monitor.
It is ensured that qemu_load_device_state() must report an error
in errp, in case of failure.
migration: push Error **errp into qemu_loadvm_state()
This is an incremental step in converting vmstate loading
code to report error via Error objects instead of directly
printing it to console/monitor.
It is ensured that qemu_loadvm_state() must report an error
in errp, in case of failure.
When postcopy live migration runs, the device states are loaded by
both the qemu coroutine process_incoming_migration_co() and the
postcopy_ram_listen_thread(). Therefore, it is important that the
coroutine also reports the error in case of failure, with
error_report_err(). Otherwise, the source qemu will not display
any errors before going into the postcopy pause state.
migration: push Error **errp into loadvm_handle_cmd_packaged()
This is an incremental step in converting vmstate loading
code to report error via Error objects instead of directly
printing it to console/monitor.
It is ensured that loadvm_handle_cmd_packaged() must report an error
in errp, in case of failure.
Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-6-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
migration: push Error **errp into loadvm_process_command()
This is an incremental step in converting vmstate loading
code to report error via Error objects instead of directly
printing it to console/monitor.
It is ensured that loadvm_process_command() must report an error
in errp, in case of failure.
The errors are temporarily reported using error_report_err().
This is removed in the subsequent patches in this series
when we are actually able to propagate the error to the calling
function.
This is an incremental step in converting vmstate loading
code to report error via Error objects instead of directly
printing it to console/monitor.
It is ensured that vmstate_load() must report an error
in errp, in case of failure.
The errors are temporarily reported using error_report_err().
This is removed in the subsequent patches in this series
when we are actually able to propagate the error to the calling
function.
migration: push Error **errp into qemu_loadvm_state_header()
This is an incremental step in converting vmstate loading
code to report error via Error objects instead of directly
printing it to console/monitor.
It is ensured that qemu_loadvm_state_header() must report an error
in errp, in case of failure.
migration: push Error **errp into vmstate_load_state()
This is an incremental step in converting vmstate loading
code to report error via Error objects instead of directly
printing it to console/monitor.
It is ensured that vmstate_load_state() must report an error
in errp, in case of failure.
The errors are temporarily reported using error_report_err().
This is removed in the subsequent patches in this series,
when we are actually able to propagate the error to the calling
function using errp. Whereas, if we want the function to exit on
error, then error_fatal is passed.
migration: push Error **errp into vmstate_subsection_load()
This is an incremental step in converting vmstate loading
code to report error via Error objects instead of directly
printing it to console/monitor.
It is ensured that vmstate_subsection_load() must report an error
in errp, in case of failure.
The errors are temporarily reported using error_report_err().
This is removed in the subsequent patches in this series,
when we are actually able to propagate the error to the calling
function using errp.
Merge tag 'pull-vfio-20251003' of https://github.com/legoater/qemu into staging
vfio queue:
* Remove workaround for kernel DMA unmap overflow
* Remove invalid uses of ram_addr_t type
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEEoPZlSPBIlev+awtgUaNDx8/77KEFAmjfpl4ACgkQUaNDx8/7
# 7KFAHQ//R0WtsAsEYE8Diczscl9++gqORrrLYN2ffTKrhUBrBskPptWZ+4Rh4R2e
# OSxdcf1cl0sFNkzCqnbWE3sbAG1Yq6mvCXTGTx3Y+2wi0KNwZXSxYGMWApOydp5K
# McQv1Uyd48TKCEwjumu6jmoPUSi89kvA58BLjBtw2bwJQzdlMZpIHX0XlSjlBHTz
# wHPqqW5+WCWq52pTp2vNkRrcqTl/HuoaijHPEJMzd/GIl1x2tBruuXuwzkY33ZKy
# EyDNq/stK12Pa1Va1ey8QOMQUJJ1jb3feVognDDVRMUGbBPljMawi8vtXW6LW28P
# 0micGzDk1A3yi8X+tIHjQE/rcL86mIKyzCmrSB7WM+t3r79/hWZQruUu2e1eUGCE
# Mw5K0UoxBvp4LxeB2wKSIFUL1VgcB0azgsq6nOwRgMyzcqjniBu7M7gctIQQdypZ
# wSdUo8cViagUXS+YDVLsMreq4FShFWx6JLOGlxvN/eTaicUTjiOccriGmu1huhW/
# VzcfkgZWL1lSKoDeOAOafNjUP557hv0YbiAGa8ywglrukFLdFKIFJOvNdnzmmkiG
# 5YJt2RH/rx+etF0hBI4uZLCnumpiKVM27/9MuMRiF7jZSXx0rz8tFVcscxQY10GP
# pSPL3SZAeLD4HMhndrlLSPAJyboQ4TGPA26yn5nahUGmOhoP91o=
# =kCV9
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 03 Oct 2025 03:33:02 AM PDT
# gpg: using RSA key A0F66548F04895EBFE6B0B6051A343C7CFFBECA1
# gpg: Good signature from "Cédric Le Goater <clg@redhat.com>" [full]
# gpg: aka "Cédric Le Goater <clg@kaod.org>" [full]
* tag 'pull-vfio-20251003' of https://github.com/legoater/qemu:
hw/vfio: Use uint64_t for IOVA mapping size in vfio_container_dma_*map
hw/vfio: Avoid ram_addr_t in vfio_container_query_dirty_bitmap()
hw/vfio: Reorder vfio_container_query_dirty_bitmap() trace format
system/iommufd: Use uint64_t type for IOVA mapping size
vfio: Remove workaround for kernel DMA unmap overflow bug
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Merge tag 'pull-riscv-to-apply-20251003-3' of https://github.com/alistair23/qemu into staging
First RISC-V PR for 10.2
* Fix MSI table size limit
* Add riscv64 to FirmwareArchitecture
* Sync RISC-V hwprobe with Linux
* Implement MonitorDef HMP API
* Update OpenSBI to v1.7
* Fix SiFive UART character drop issue and minor refactors
* Fix RISC-V timer migration issues
* Use riscv_cpu_is_32bit() when handling SBI_DBCN reg
* Use riscv_csrr in riscv_csr_read
* Align memory allocations to 2M on RISC-V
* Do not use translator_ldl in opcode_at
* Minor fixes of RISC-V CFI
* Modify minimum VLEN rule
* Fix vslide1[up|down].vx unexpected result when XLEN=32 and SEW=64
* Fixup IOMMU PDT Nested Walk
* Fix endianness swap on compressed instructions
* Update status of IOMMU kernel support
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCgAdFiEEaukCtqfKh31tZZKWr3yVEwxTgBMFAmjfQhoACgkQr3yVEwxT
# gBPnTg//eQ9GMFTLcW4kFMsVYeY8TbkmQN9Wnk+XubG92siGkzuNmfy36yo7oeib
# dB6/h5JLjycjttOfgyx73/TKUucyZs+ZYkVVWWQCSU+sqPTA370MmGNM8CSmPms/
# lFuNIixd+sSUDIOod9zQHzxv+f3ZN2bjEAyzJAEhSXgTO+1xnOeJHHjxB5O2Z/a1
# ccd3Po1wR6nm2T4x88LcHDHj8svLsfG0G1RRkU+yeLu7J6Qpp0d/lOZI7if+AQqb
# Nmz65n2uSuUEuNNQIxYaQp/nbkF3DSxi3mg3+hCQjF+hMjXL4hAhSEPril3MQjGi
# 802nEaqG8Qdzec+bZiKt0c3e0f4SrnpDXDnz7NrtfSO6vXAvqqZuC8kTdZy8dsPU
# 1D809ksZoNDIB87z89MQPsQ7k1Bs2Iq9pNpB9huD3mzY4DHqYhkzysAwc8Qhvimv
# pBaeSDV66OrI/al5c0FqSN0LiLHvlRcwqiATiQwIdCV+PUe+cVPwIKq6ABQiYpVu
# mvnzgEJ4r7iO92hOoAGM+eRC7krafF1/gbe3SDI3RLUTDPM6hcTRcluvBlpBdNDj
# lIYXs89f0jBh0I4IRGm8ftqD9xPDP56mZVEIIjSWDRTT6mfZLxWWMmXC/OK63U7/
# bpJKohFOKy8P6SSvTACcLSOQlP3r+FRrmBOXs7S24U+Hr9xUep0=
# =DGkt
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 02 Oct 2025 08:25:14 PM PDT
# gpg: using RSA key 6AE902B6A7CA877D6D659296AF7C95130C538013
# gpg: Good signature from "Alistair Francis <alistair@alistair23.me>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 6AE9 02B6 A7CA 877D 6D65 9296 AF7C 9513 0C53 8013
* tag 'pull-riscv-to-apply-20251003-3' of https://github.com/alistair23/qemu: (26 commits)
docs: riscv-iommu: Update status of kernel support
target/riscv: Fix endianness swap on compressed instructions
hw/riscv/riscv-iommu: Fixup PDT Nested Walk
target/riscv: rvv: Fix vslide1[up|down].vx unexpected result when XLEN=32 and SEW=64
target/riscv: rvv: Modify minimum VLEN according to enabled vector extensions
target/riscv: rvv: Replace checking V by checking Zve32x
target/riscv: Fix ssamoswap error handling
target/riscv: Fix SSP CSR error handling in VU/VS mode
target/riscv: Fix the mepc when sspopchk triggers the exception
target/riscv: do not use translator_ldl in opcode_at
qemu/osdep: align memory allocations to 2M on RISC-V
target/riscv: use riscv_csrr in riscv_csr_read
target/riscv/kvm: Use riscv_cpu_is_32bit() when handling SBI_DBCN reg
target/riscv: Save stimer and vstimer in CPU vmstate
hw/intc: Save timers array in RISC-V mtimer VMState
migration: Add support for a variable-length array of UINT32 pointers
hw/intc: Save time_delta in RISC-V mtimer VMState
hw/char: sifive_uart: Add newline to error message
hw/char: sifive_uart: Remove outdated comment about Tx FIFO
hw/char: sifive_uart: Avoid pushing Tx FIFO when size is zero
...
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Joel Stanley [Thu, 14 Aug 2025 00:14:50 +0000 (09:44 +0930)]
docs: riscv-iommu: Update status of kernel support
The iommu Linux kernel support is now upstream. VFIO is still
downstream at this stage.
Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Signed-off-by: Joel Stanley <joel@jms.id.au>
Message-ID: <20250814001452.504510-1-joel@jms.id.au> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Current implementation is wrong when iohgatp != bare. The RISC-V
IOMMU specification has defined that the PDT is based on GPA, not
SPA. So this patch fixes the problem, making PDT walk correctly
when the G-stage table walk is enabled.
Fixes: 0c54acb8243d ("hw/riscv: add RISC-V IOMMU base emulation") Cc: qemu-stable@nongnu.org Cc: Sebastien Boeuf <seb@rivosinc.com> Cc: Tomasz Jeznach <tjeznach@rivosinc.com> Reviewed-by: Weiwei Li <liwei1518@gmail.com> Reviewed-by: Nutty Liu <liujingqi@lanxincomputing.com> Signed-off-by: Guo Ren (Alibaba DAMO Academy) <guoren@kernel.org> Tested-by: Chen Pei <cp0613@linux.alibaba.com> Tested-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
Message-ID: <20250913041233.972870-1-guoren@kernel.org>
[ Changes by AF:
- Add braces to if statements
] Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Max Chou [Fri, 24 Jan 2025 07:33:22 +0000 (15:33 +0800)]
target/riscv: rvv: Fix vslide1[up|down].vx unexpected result when XLEN=32 and SEW=64
When XLEN is 32 and SEW is 64, the original implementation of
vslide1up.vx and vslide1down.vx helper functions fills the 32-bit value
of rs1 into the first element of the destination vector register (rd),
which is a 64-bit element.
This commit attempted to resolve the issue by extending the rs1 value
to 64 bits during the TCG translation phase to ensure that the helper
functions won't lost the higer 32 bits.
Signed-off-by: Max Chou <max.chou@sifive.com> Acked-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20250124073325.2467664-1-max.chou@sifive.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Max Chou [Tue, 23 Sep 2025 09:07:29 +0000 (17:07 +0800)]
target/riscv: rvv: Modify minimum VLEN according to enabled vector extensions
According to the RISC-V unprivileged specification, the VLEN should be greater
or equal to the ELEN. This commit modifies the minimum VLEN based on the vector
extensions and introduces a check rule for VLEN and ELEN.
Signed-off-by: Max Chou <max.chou@sifive.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20250923090729.1887406-3-max.chou@sifive.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Max Chou [Tue, 23 Sep 2025 09:07:28 +0000 (17:07 +0800)]
target/riscv: rvv: Replace checking V by checking Zve32x
The Zve32x extension will be applied by the V and Zve* extensions.
Therefore we can replace the original V checking with Zve32x checking for both
the V and Zve* extensions.
Signed-off-by: Max Chou <max.chou@sifive.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20250923090729.1887406-2-max.chou@sifive.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Jim Shu [Wed, 24 Sep 2025 07:48:18 +0000 (15:48 +0800)]
target/riscv: Fix ssamoswap error handling
Follow the RISC-V CFI v1.0 spec [1] to fix the exception type
when ssamoswap is disabled by xSSE.
[1] RISC-V CFI spec v1.0, ch2.7 Atomic Swap from a Shadow Stack Location
Signed-off-by: Jim Shu <jim.shu@sifive.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20250924074818.230010-4-jim.shu@sifive.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Jim Shu [Wed, 24 Sep 2025 07:48:17 +0000 (15:48 +0800)]
target/riscv: Fix SSP CSR error handling in VU/VS mode
In VU/VS mode, accessing $ssp CSR will trigger the virtual instruction
exception instead of illegal instruction exception if SSE is disabled
via xenvcfg CSRs.
This is from RISC-V CFI v1.0 spec ch2.2.4. Shadow Stack Pointer
Signed-off-by: Jim Shu <jim.shu@sifive.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20250924074818.230010-3-jim.shu@sifive.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Jim Shu [Wed, 24 Sep 2025 07:48:16 +0000 (15:48 +0800)]
target/riscv: Fix the mepc when sspopchk triggers the exception
When sspopchk is in the middle of TB and triggers the SW check
exception, it should update PC from gen_update_pc(). If not, RISC-V mepc
CSR will get wrong PC address which is still at the start of TB.
Signed-off-by: Jim Shu <jim.shu@sifive.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20250924074818.230010-2-jim.shu@sifive.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Vladimir Isaev [Fri, 15 Aug 2025 14:06:33 +0000 (17:06 +0300)]
target/riscv: do not use translator_ldl in opcode_at
opcode_at is used only in semihosting checks to match opcodes with expected
pattern.
This is not a translator and if we got following assert if page is not in TLB:
qemu-system-riscv64: ../accel/tcg/translator.c:363: record_save: Assertion
`offset == db->record_start + db->record_len' failed.
Fixes: 1f9c4462334f ("target/riscv: Use translator_ld* for everything") Signed-off-by: Vladimir Isaev <vladimir.isaev@syntacore.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20250815140633.86920-1-vladimir.isaev@syntacore.com>
[ Changes by AF:
- Fixup header includes after rebase
] Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Xuemei Liu [Wed, 24 Sep 2025 05:18:03 +0000 (13:18 +0800)]
qemu/osdep: align memory allocations to 2M on RISC-V
Similar to other architectures (e.g., x86_64, aarch64), utilizing THP on RISC-V
KVM requires 2MiB-aligned memory blocks.
Signed-off-by: Xuemei Liu <liu.xuemei1@zte.com.cn> Reviewed-by: David Hildenbrand <david@redhat.com>
Message-ID: <20250924131803656Yqt9ZJKfevWkInaGppFdE@zte.com.cn> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
stove [Wed, 27 Aug 2025 20:36:17 +0000 (13:36 -0700)]
target/riscv: use riscv_csrr in riscv_csr_read
Commit 38c83e8d3a33 ("target/riscv: raise an exception when CSRRS/CSRRC
writes a read-only CSR") changed the behavior of riscv_csrrw, which
would formerly be treated as read-only if the write mask were set to 0.
Fixes an exception being raised when accessing read-only vector CSRs
like vtype.
Fixes: 38c83e8d3a33 ("target/riscv: raise an exception when CSRRS/CSRRC writes a read-only CSR") Signed-off-by: stove <stove@rivosinc.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Message-ID: <20250827203617.79947-1-stove@rivosinc.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
target/riscv/kvm: Use riscv_cpu_is_32bit() when handling SBI_DBCN reg
Use the existing riscv_cpu_is_32bit() helper to check for 32-bit CPU.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Message-ID: <20250924164515.51782-1-philmd@linaro.org> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
target/riscv: Save stimer and vstimer in CPU vmstate
vmstate_riscv_cpu was missing env.stimer and env.vstimer.
Without migrating these QEMUTimer fields, active S/VS-mode
timer events are lost after snapshot or migration.
Add VMSTATE_TIMER_PTR() entries to save and restore them.
Reviewed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Signed-off-by: TANG Tiancheng <lyndra@linux.alibaba.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20250911-timers-v3-4-60508f640050@linux.alibaba.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
hw/intc: Save timers array in RISC-V mtimer VMState
The current 'timecmp' field in vmstate_riscv_mtimer is insufficient to keep
timers functional after migration.
If an mtimer's entry in 'mtimer->timers' is active at the time the snapshot
is taken, it means riscv_aclint_mtimer_write_timecmp() has written to
'mtimecmp' and scheduled a timer into QEMU's main loop 'timer_list'.
During snapshot save, these active timers must also be migrated; otherwise,
after snapshot load there is no mechanism to restore 'mtimer->timers' back
into the 'timer_list', and any pending timer events would be lost.
QEMU's migration framework commonly uses VMSTATE_TIMER_xxx macros to save
and restore 'QEMUTimer' variables. However, 'timers' is a pointer array
with variable length, and vmstate.h did not previously provide a helper
macro for such type.
This commit adds a new macro, 'VMSTATE_TIMER_PTR_VARRAY', to handle saving
and restoring a variable-length array of 'QEMUTimer *'. We then use this
macro to migrate the 'mtimer->timers' array, ensuring that timer events
remain scheduled correctly after snapshot load.
Reviewed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com> Signed-off-by: TANG Tiancheng <lyndra@linux.alibaba.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20250911-timers-v3-3-60508f640050@linux.alibaba.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
migration: Add support for a variable-length array of UINT32 pointers
Add support for defining a vmstate field which is a variable-length array
of pointers, and use this to define a VMSTATE_TIMER_PTR_VARRAY() which allows
a variable-length array of QEMUTimer* to be used by devices.
Message-id: 20250909-timers-v1-0-7ee18a9d8f4b@linux.alibaba.com Reviewed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: TANG Tiancheng <lyndra@linux.alibaba.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20250911-timers-v3-2-60508f640050@linux.alibaba.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
hw/vfio: Avoid ram_addr_t in vfio_container_query_dirty_bitmap()
The 'ram_addr_t' type is described as:
a QEMU internal address space that maps guest RAM physical
addresses into an intermediate address space that can map
to host virtual address spaces.
vfio_container_query_dirty_bitmap() doesn't expect such QEMU
intermediate address, but a guest physical addresses. Use the
appropriate 'hwaddr' type, rename as @translated_addr for
clarity.
hw/vfio: Reorder vfio_container_query_dirty_bitmap() trace format
Update the trace-events comments after the changes from
commit dcce51b1938 ("hw/vfio/container-base.c: rename file
to container.c") and commit a3bcae62b6a ("hw/vfio/container.c:
rename file to container-legacy.c").
vfio: Remove workaround for kernel DMA unmap overflow bug
A kernel bug was introduced in Linux v4.15 via commit 71a7d3d78e3c
("vfio/type1: Check for address space wrap-around on unmap"), which
added a test for address space wrap-around in the vfio DMA unmap path.
Unfortunately, due to an integer overflow, the kernel would
incorrectly detect an unmap of the last page in the 64-bit address
space as a wrap-around, causing the unmap to fail with -EINVAL.
A QEMU workaround was introduced in commit 567d7d3e6be5 ("vfio/common:
Work around kernel overflow bug in DMA unmap") to retry the unmap,
excluding the final page of the range.
The kernel bug was then fixed in Linux v5.0 via commit 58fec830fc19
("vfio/type1: Fix dma_unmap wrap-around check"). Since the oldest
supported LTS kernel is now v5.4, kernels affected by this bug are
considered deprecated, and the workaround is no longer necessary.
This change reverts 567d7d3e6be5, removing the workaround.
In QEMU's RISC-V ACLINT timer model, 'mtime' is not stored directly as a
state variable. It is computed on demand as:
mtime = rtc_r + time_delta
where:
- 'rtc_r' is the current VM virtual time (in ticks) obtained via
cpu_riscv_read_rtc_raw() from QEMU_CLOCK_VIRTUAL.
- 'time_delta' is an offset applied when the guest writes a new 'mtime'
value via riscv_aclint_mtimer_write():
time_delta = value - rtc_r
Under this design, 'rtc_r' is assumed to be monotonically increasing
during VM execution. Even if the guest writes an 'mtime' value smaller
than the current one (making 'time_delta' negative in signed arithmetic,
or underflow in unsigned arithmetic), the computed 'mtime' remains
correct because 'rtc_r_new > rtc_r_old':
mtime_new = rtc_r_new + (value - rtc_r_old)
However, this monotonicity assumption breaks on snapshot load.
Before restoring a snapshot, QEMU resets the guest, which calls
riscv_aclint_mtimer_reset_enter() to set 'mtime' to 0 and recompute
'time_delta' as:
time_delta = 0 - rtc_r_reset
Here, the time_delta differs from the value that was present when the
snapshot was saved. As a result, subsequent reads produce a fixed offset
from the true mtime.
This can be observed with the 'date' command inside the guest: after loading
a snapshot, the reported time appears "frozen" at the save point, and only
resumes correctly after the guest has run long enough to compensate for the
erroneous offset.
The fix is to treat 'time_delta' as part of the device's migratable
state and save/restore it via vmstate. This preserves the correct
relation between 'rtc_r' and 'mtime' across snapshot save/load, ensuring
'mtime' continues incrementing from the precise saved value after
restore.
Reviewed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Signed-off-by: TANG Tiancheng <lyndra@linux.alibaba.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20250911-timers-v3-1-60508f640050@linux.alibaba.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Frank Chang [Thu, 11 Sep 2025 16:06:46 +0000 (00:06 +0800)]
hw/char: sifive_uart: Add newline to error message
Adds a missing newline character to the error message.
Signed-off-by: Frank Chang <frank.chang@sifive.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20250911160647.5710-5-frank.chang@sifive.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Frank Chang [Thu, 11 Sep 2025 16:06:45 +0000 (00:06 +0800)]
hw/char: sifive_uart: Remove outdated comment about Tx FIFO
Since Tx FIFO is now implemented using "qemu/fifo8.h", remove the comment
that no longer reflects the current implementation.
Signed-off-by: Frank Chang <frank.chang@sifive.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20250911160647.5710-4-frank.chang@sifive.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Frank Chang [Thu, 11 Sep 2025 16:06:44 +0000 (00:06 +0800)]
hw/char: sifive_uart: Avoid pushing Tx FIFO when size is zero
There's no need to call fifo8_push_all() when size is zero.
Signed-off-by: Frank Chang <frank.chang@sifive.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20250911160647.5710-3-frank.chang@sifive.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Frank Chang [Thu, 11 Sep 2025 16:06:43 +0000 (00:06 +0800)]
hw/char: sifive_uart: Raise IRQ according to the Tx/Rx watermark thresholds
Currently, the SiFive UART raises an IRQ whenever:
1. ie.txwm is enabled.
2. ie.rxwm is enabled and the Rx FIFO is not empty.
It does not check the watermark thresholds set by software. However,
since commit [1] changed the SiFive UART character printing from
synchronous to asynchronous, Tx overflows may occur, causing characters
to be dropped when running Linux because:
1. The Linux SiFive UART driver sets the transmit watermark level to 1
[2], meaning a transmit watermark interrupt is raised whenever a
character is enqueued into the Tx FIFO.
2. Upon receiving a transmit watermark interrupt, the Linux driver
transfers up to a full Tx FIFO's worth of characters from the Linux
serial transmit buffer [3], without checking the txdata.full flag
before transferring multiple characters [4].
To fix this issue, we must honor the Tx/Rx watermark thresholds and
raise interrupts only when the Tx threshold is exceeded or the Rx
threshold is undercut.
Signed-off-by: Frank Chang <frank.chang@sifive.com> Signed-off-by: Emmanuel Blot <emmanuel.blot@sifive.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20250911160647.5710-2-frank.chang@sifive.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Update OpenSBI and the pre-built opensbi32 and opensbi64 images to
version 1.7.
It has been almost an year since we last updated OpenSBI (at the time,
up to v1.5.1) and we're missing a lot of good stuff from both v1.6 and
v1.7, including SBI 3.0 and RPMI 1.0.
The changelog is too large and tedious to post in the commit msg so I
encourage refering to [1] and [2] to see the new features we're adding
into the QEMU roms.
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
The MonitorDef API is related to two HMP monitor commands: 'p' and 'x':
(qemu) help p
print|p /fmt expr -- print expression value (use $reg for CPU register access)
(qemu) help x
x /fmt addr -- virtual memory dump starting at 'addr'
For x86, one of the few targets that implements it, it is possible to
print the PC register value with $pc and use the PC value in the 'x'
command as well.
Those 2 commands are hooked into get_monitor_def(), called by
exp_unary() in hmp.c. The function tries to fetch a reg value in two
ways: by reading them directly via a target_monitor_defs array or using
a target_get_monitor_def() helper. In RISC-V we have *A LOT* of
registers and this number will keep getting bigger, so we're opting out
of an array declaration.
We're able to retrieve all regs but vregs because the API only fits an
uint64_t and vregs have 'vlen' size that are bigger than that.
With this patch we can do things such as:
- print CSRs and use their val in expressions:
(qemu) p $mstatus
0xa000000a0
(qemu) p $mstatus & 0xFF
0xa0
- dump the next 10 insn from virtual memory starting at x1 (ra):
Suggested-by: Dr. David Alan Gilbert <dave@treblig.org> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20250703130815.1592493-1-dbarboza@ventanamicro.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
linux-user/syscall.c: sync RISC-V hwprobe with Linux
It has been awhile since the last sync. Let's bring QEMU hwprobe support
on par with Linux 6.17-rc4.
A lot of new RISCV_HWPROBE_KEY_* entities are added but this patch is
only adding support for ZICBOM_BLOCK_SIZE.
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20250903164043.2828336-1-dbarboza@ventanamicro.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Andrew Jones [Thu, 4 Sep 2025 13:27:24 +0000 (08:27 -0500)]
hw/riscv/riscv-iommu: Fix MSI table size limit
The MSI table is not limited to 4k. The only constraint the table has
is that its base address must be aligned to its size, ensuring no
offsets of the table size will overrun when added to the base address
(see "8.5. MSI page tables" of the AIA spec).
Fixes: 0c54acb8243d ("hw/riscv: add RISC-V IOMMU base emulation") Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Message-ID: <20250904132723.614507-2-ajones@ventanamicro.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Merge tag 'qtest-20251001-pull-request' of https://gitlab.com/farosas/qemu into staging
Qtest pull request
- Fix for qtest_get_machines QEMU var caching
- Fixes for migration-test in --without-default-devices build
- Preparation patches for cpr-exec test
# -----BEGIN PGP SIGNATURE-----
#
# iQJEBAABCAAuFiEEqhtIsKIjJqWkw2TPx5jcdBvsMZ0FAmjdoeoQHGZhcm9zYXNA
# c3VzZS5kZQAKCRDHmNx0G+wxnUGMEACQuy8eGVnh7ni9rDpJxyyUoKAKlNI9+7c+
# 2bi/e+pT26Od5/ExznVOoDlEFFoogQZiVDqxZ3wBB0SziEn41+Wm8SSV6Tto7eNy
# qqVZuYymWUY+MmAeL7RKY+EuVV3Y1a/2lS+w04dCSQYrWf9rr9AX8xdDDln/ebqm
# F1sUhVKO7PA05O3Sw6M1G32l27r6WCsVRliz46gl5MHmmWe4YRR72Eoi3gTsXIkT
# CiEmT9EjOkHykSkgekiN+jgLiAO1pwcaU7Cf4ENhYouBjW9kL46LCmMS+7pcQ1LG
# 2UuXhR3+Ws0ukAUmcJthMRMssjy0OGr845DjiTmOFnbiiUKX9CysuTIIlM+xbDN4
# m/IomtxXAnEncQcUO4vgv81eyGFRtn/Mx9Zdo/x8dtNc6Lh62zqsMj7lp8sjatR/
# DScPTRCRxAUQiY6YMIrJH38m7XLyIaG+oCzg1+EBllmHLoQTtPE4hQHz4MDRj0RA
# aKAvQsucQ/8LxHV9va4W5epVrdP54Yw040QbTnN5XdH4U06yv4pfpBnCw4RYFPj2
# l5TZSNE2WmjeOfT/FERjJWXEXbvcEWoPZMWOc0r1qYGqEAhIui69n//H28EZ2DAE
# QEzNJLBqu4t7U+pw5h1QW5YFdkVZIfmDOx3+SlA/tNcLTQ/a2gzAWkr0AKIzddH9
# ZOGP6b4wFw==
# =OSa5
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 01 Oct 2025 02:49:30 PM PDT
# gpg: using RSA key AA1B48B0A22326A5A4C364CFC798DC741BEC319D
# gpg: issuer "farosas@suse.de"
# gpg: Good signature from "Fabiano Rosas <farosas@suse.de>" [unknown]
# gpg: aka "Fabiano Almeida Rosas <fabiano.rosas@suse.com>" [unknown]
# gpg: WARNING: The key's User ID is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: AA1B 48B0 A223 26A5 A4C3 64CF C798 DC74 1BEC 319D
* tag 'qtest-20251001-pull-request' of https://gitlab.com/farosas/qemu:
migration-test: strv parameter
migration-test: migrate_args
migration-test: misc exports
migration-test: shm path accessor
migration-test: only_source option
tests/qtest: qtest_init_after_exec
tests/qtest: qtest_qemu_spawn_func
tests/qtest: qtest_create_test_state
tests/qtest: qtest_qemu_args
tests/qtest: export qtest_qemu_binary
tests/qtest: optimize qtest_get_machines
tests/qtest/migration: Fix cpr-tests in case the machine is not available
tests/qtest: Add missing checks for the availability of machines
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Merge tag 'pull-error-2025-09-30-v2' of https://repo.or.cz/qemu/armbru into staging
Error reporting patches for 2025-09-30
# -----BEGIN PGP SIGNATURE-----
#
# iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAmjczNQSHGFybWJydUBy
# ZWRoYXQuY29tAAoJEDhwtADrkYZTX3kP/1doayteIqVfNLYJn8EDIU6ccZgAsdVw
# GLHkxSikaBBzjJoG2ebadGusmX8F5H16/KG4vpilP1WHuIw73QRiCFJduFmfFjU/
# SCagaj58PPZaiNJeydN8dSHIDyLLAbIpI1xqdFObBgVKl37E7nZ2uatjKwopmK69
# iV7y39Xcs6wu4gVsz5IH3FC+CdzctWfjjkZbkk3PeNj+Nt7q22RvbB0Rf30P9SBo
# FWnh3UEDz2VIlnuIFSAAXQfJ0+h2l9L0yZ05RnVyMM8rZ72v393X8h/jgEo0ETHI
# eNnJHh/pKL6I+vq10aM/mMgj5fRsly+CsAmjC+11ULg7ybDUMbEU32Ftqeylo2HS
# ZkGw20egEgzMldC5yELTgTjMPCGF9VWWwNNH9OWM58w9ZCyjDb9wDw1uaHU3Tc15
# TZaBwcCGEc/atRFHfWD66oK/KcDrFnWETr6qi9fPJ2SJxiHjHbJe/eNQaxxrEZCu
# 1OntcQdL46Ef1LeQGzhgLNlKyAxq9V9ybh8gPD4yhCK5NCNub2NvWj/CLlnxGJwH
# JHZRRXvVoBPlIMSMydGPV8RHkfUr4NMgHql5Y+VykheEBcg+ThZ2JSjS7avwzCHM
# 5dSUeV+YcvhQN2sojH4xdnUUJWxAAEM1SirkaHTHWZoDKagfjHu3SEYwNyIIchhi
# BAfRdd94Lxpg
# =tlEf
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 30 Sep 2025 11:40:20 PM PDT
# gpg: using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653
# gpg: issuer "armbru@redhat.com"
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [unknown]
# gpg: aka "Markus Armbruster <armbru@pond.sub.org>" [unknown]
# gpg: WARNING: The key's User ID is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653
* tag 'pull-error-2025-09-30-v2' of https://repo.or.cz/qemu/armbru:
error: Kill @error_warn
ivshmem-flat: Mark an instance of missing error handling FIXME
ui/dbus: Consistent handling of texture mutex failure
ui/dbus: Clean up dbus_update_gl_cb() error checking
ui/pixman: Consistent error handling in qemu_pixman_shareable_free()
util/oslib-win32: Do not treat null @errp as &error_warn
ui/spice-core: Clean up error reporting
net/slirp: Clean up error reporting
hw/remote/vfio-user: Clean up error reporting
migration/cpr: Clean up error reporting in cpr_resave_fd()
hw/cxl: Convert cxl_fmws_link() to Error
tcg: Fix error reporting on mprotect() failure in tcg_region_init()
monitor: Clean up HMP gdbserver error reporting
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Steve Sistare [Wed, 1 Oct 2025 15:34:05 +0000 (08:34 -0700)]
tests/qtest: qtest_init_after_exec
Define a function to create a QTestState object representing the state
of QEMU after old QEMU exec's new QEMU. This is needed for testing
the cpr-exec migration mode.
Steve Sistare [Wed, 1 Oct 2025 15:34:04 +0000 (08:34 -0700)]
tests/qtest: qtest_qemu_spawn_func
Allow the qtest_qemu_spawn caller to pass the function to be called
to perform the spawn. The opaque argument is needed by a new spawn
function in a subsequent patch.
Steve Sistare [Wed, 1 Oct 2025 15:34:02 +0000 (08:34 -0700)]
tests/qtest: qtest_qemu_args
Define an accessor that returns all the arguments used to exec QEMU.
Collect the arguments that were passed to qtest_spawn_qemu, plus the trace
arguments that were composed inside qtest_spawn_qemu, and move them to a
new function qtest_qemu_args.
This will be needed to test the cpr-exec migration mode.
Steve Sistare [Fri, 19 Sep 2025 13:58:30 +0000 (06:58 -0700)]
tests/qtest: optimize qtest_get_machines
qtest_get_machines returns the machines supported by the QEMU binary
described by an environment variable and caches the result. If the
next call to qtest_get_machines passes the same variable name, the cached
result is returned, but if the name changes, the caching is defeated.
To make caching more effective, remember the path of the QEMU binary
instead. Different env vars, eg QTEST_QEMU_BINARY_SRC and
QTEST_QEMU_BINARY_DST, usually resolve to the same path.
Before the optimization, the test /x86_64/migration/precopy/unix/plain
exec's QEMU and calls query-machines 3 times. After optimization, that
only happens once. This does not significantly speed up the tests, but
it reduces QTEST_LOG output, and launches fewer QEMU instances, making
it easier to debug problems.
Thomas Huth [Tue, 30 Sep 2025 09:09:32 +0000 (11:09 +0200)]
tests/qtest/migration: Fix cpr-tests in case the machine is not available
When QEMU has been compiled with "--without-default-devices", the
migration cpr-tests are currently failing since the first test leaves
a socket file behind that avoids that the second test can be initialized
correctly. Make sure that we delete the socket file in case that the
migrate_start() failed due to the missing machine.
Thomas Huth [Tue, 30 Sep 2025 09:04:44 +0000 (11:04 +0200)]
tests/qtest: Add missing checks for the availability of machines
When QEMU has been compiled with "--without-default-devices", the
machines might not be available in the binary. Let's properly check
for the machines before running the tests to avoid that they are
failing in this case.
The syslog backend needs the syslog function from libc and the LOG_INFO enum
value; they are re-exported as "::trace::syslog" and "::trace::LOG_INFO"
so that device crates do not all have to add the libc dependency, but
otherwise there is nothing special.
Signed-off-by: Tanish Desai <tanishdesai37@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20250929154938.594389-17-pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>