Stefan Hajnoczi [Sun, 13 Jul 2025 05:46:04 +0000 (01:46 -0400)]
Merge tag 'pull-tcg-20250711' of https://gitlab.com/rth7680/qemu into staging
fpu: Process float_muladd_negate_result after rounding
tcg: Use uintptr_t in tcg_malloc implementation
linux-user: Hold the fd-trans lock across fork
linux-user: Implement fchmodat2 syscall
linux-user: Check for EFAULT failure in nanosleep
linux-user: Use qemu_set_cloexec() to mark pidfd as FD_CLOEXEC
linux-user/gen-vdso: Handle fseek() failure
linux-user/gen-vdso: Don't read off the end of buf[]
* tag 'pull-tcg-20250711' of https://gitlab.com/rth7680/qemu:
linux-user: Use qemu_set_cloexec() to mark pidfd as FD_CLOEXEC
tcg: Use uintptr_t in tcg_malloc implementation
linux-user: Hold the fd-trans lock across fork
linux-user/mips/o32: Drop sa_restorer functionality
linux-user/gen-vdso: Don't read off the end of buf[]
linux-user/gen-vdso: Handle fseek() failure
linux-user: Check for EFAULT failure in nanosleep
linux-user: Implement fchmodat2 syscall
fpu: Process float_muladd_negate_result after rounding
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Stefan Hajnoczi [Sun, 13 Jul 2025 05:45:30 +0000 (01:45 -0400)]
Merge tag 'migration-20250711-pull-request' of https://gitlab.com/farosas/qemu into staging
Migration pull request
- General cleanups around: postcopy, bg-snapshot, migration hooks,
migration completion and formatting of 'info migrate'.
- Overhaul of postcopy blocktime tracking.
# -----BEGIN PGP SIGNATURE-----
#
# iQJEBAABCAAuFiEEqhtIsKIjJqWkw2TPx5jcdBvsMZ0FAmhxGdgQHGZhcm9zYXNA
# c3VzZS5kZQAKCRDHmNx0G+wxnahoD/9uNXirlmRk3tDnhiJsiYx+HnXYPFEORSZq
# zlpUyqvhQ1POp3Fa5pRf+bJ5mmPw8h8PdOR2StMpnW2Xa1OatAZj5m1uityAVWOl
# EkVfZLl0j6j9HCCmE3c4dztOGIBsd9YY0GWizL05XHYZPrdX4zOpolMN4m53RwQY
# HUVD6T2y9eFDnCO6MsoA9EfmkFYCRvqlS0VzTcYzQFN4H+QHlcpDfweqJpTLPa+1
# trahAN9PBuMjoewjDqwkNkf0CLaCXHszAfj6yv62Vi8Cbp9DDPywIYJKFnxspElW
# Fjg1b4MdsbYZNmeKgIawzgTOL1RrojvKkoi7KWp3D7M+/ZZl9kBwQuUcBXKI7N0R
# Y0GNfkkTycn18nM0JU/6QWSuVeiPbLArxQUGP1cLgvcHSSNgD9JxWbNBu5+1fFOG
# Gg3qnyYatJ6xJDiCrdKqV8fwozNlm/G6b9BiCDeVq+4nA2OKQ0shiNA1GZHvVSQL
# X4uAPexETdHfA/LeA2w5sgVBEw7BewBdjLntZDIFsyBnLrvqrDcU5Aav0wiHoI8U
# QBC2aIpJfMLHiIQ93mVX96NltXC7KvJTIZVl3iwfiYEYCvQtTYgdJ09ELXFJYxFX
# XpTTazqpmPSfuZpPRgx9YbDP/kS8Fg/PTOlPeD0T/frFgd1S6Thh6OW455PavMp8
# ht2lE4sxjA==
# =vtRD
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 11 Jul 2025 10:04:08 EDT
# gpg: using RSA key AA1B48B0A22326A5A4C364CFC798DC741BEC319D
# gpg: issuer "farosas@suse.de"
# gpg: Good signature from "Fabiano Rosas <farosas@suse.de>" [unknown]
# gpg: aka "Fabiano Almeida Rosas <fabiano.rosas@suse.com>" [unknown]
# gpg: WARNING: The key's User ID is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: AA1B 48B0 A223 26A5 A4C3 64CF C798 DC74 1BEC 319D
* tag 'migration-20250711-pull-request' of https://gitlab.com/farosas/qemu: (26 commits)
migration: Rename save_live_complete_precopy_thread to save_complete_precopy_thread
migration/postcopy: Add latency distribution report for blocktime
migration/postcopy: blocktime allows track / report non-vCPU faults
migration/postcopy: Optimize blocktime fault tracking with hashtable
migration/postcopy: Cleanup the total blocktime accounting
migration/postcopy: Cache the tid->vcpu mapping for blocktime
migration/postcopy: Initialize blocktime context only until listen
migration/postcopy: Report fault latencies in blocktime
migration/postcopy: Add blocktime fault counts per-vcpu
migration/postcopy: Bring blocktime layer to ns level
migration/postcopy: Drop PostcopyBlocktimeContext.start_time
migration/postcopy: Make all blocktime vars 64bits
migration/postcopy: Drop all atomic ops in blocktime feature
migration/postcopy: Push blocktime start/end into page req mutex
migration: Add option to set postcopy-blocktime
migration/postcopy: Avoid clearing dirty bitmap for postcopy too
migration: Rewrite the migration complete detect logic
migration/ram: Add tracepoints for ram_save_complete()
migration/ram: One less indent for ram_find_and_save_block()
migration: qemu_savevm_complete*() helpers
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Stefan Hajnoczi [Sun, 13 Jul 2025 05:45:17 +0000 (01:45 -0400)]
Merge tag 'pull-target-arm-20250711' of https://gitlab.com/pm215/qemu into staging
target-arm queue:
* New board type max78000fthr
* Enable use of CXL on Arm 'virt' board
* Some more tidyup of ID register handling
* Refactor AT insns and PMU regs into separate source files
* Don't enforce NSE,NS check for EL3->EL3 returns
* hw/arm/fsl-imx8mp: Wire VIRQ and VFIQ
* Allow nested-virtualization with KVM on the 'virt' board
* system/qdev: Remove pointless NULL check in qdev_device_add_from_qdict
* hw/arm/virt-acpi-build: Don't create ITS id mappings by default
* target/arm: Remove unused helper_sme2_luti4_4b
* tag 'pull-target-arm-20250711' of https://gitlab.com/pm215/qemu: (36 commits)
tests/functional: Add a test for the MAX78000 arm machine
docs/system: arm: Add max78000 board description
target/arm: Remove helper_sme2_luti4_4b
hw/arm/virt-acpi-build: Don't create ITS id mappings by default
system/qdev: Remove pointless NULL check in qdev_device_add_from_qdict
hw/arm/virt: Allow virt extensions with KVM
hw/arm/arm_gicv3_kvm: Add a migration blocker with kvm nested virt
target/arm: Enable feature ARM_FEATURE_EL2 if EL2 is supported
target/arm/kvm: Add helper to detect EL2 when using KVM
hw/arm: Allow setting KVM vGIC maintenance IRQ
hw/arm/fsl-imx8mp: Wire VIRQ and VFIQ
target/arm: Don't enforce NSE,NS check for EL3->EL3 returns
target/arm: Split out performance monitor regs to cpregs-pmu.c
target/arm: Split out AT insns to tcg/cpregs-at.c
target/arm: Drop stub for define_tlb_insn_regs
arm/kvm: shorten one overly long line
arm/cpu: store clidr into the idregs array
arm/cpu: fix trailing ',' for SET_IDREG
arm/cpu: store id_aa64afr{0,1} into the idregs array
arm/cpu: store id_afr0 into the idregs array
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Stefan Hajnoczi [Sun, 13 Jul 2025 05:44:51 +0000 (01:44 -0400)]
Merge tag 'pull-request-2025-07-11' of https://gitlab.com/thuth/qemu into staging
* s390x: Allow to select different entries when booting via pxelinux.cfg
* Link s390-ccw.img statically
* Fix broken bamboo functional test
* s390x code cleanups and refactorings
* tag 'pull-request-2025-07-11' of https://gitlab.com/thuth/qemu:
target/s390x: Have s390_cpu_halt() not return anything
target/s390x: Expose s390_count_running_cpus() method
target/s390x: Remove unused s390_cpu_[un]halt() user stubs
tests/functional/test_ppc_bamboo: Replace broken link with working assets
tests/functional: Add dependency to the keymap_targets
pc-bios: Update the s390 bios images with the pxelinux.cfg loadparm changes
pc-bios/s390-ccw: link statically
tests/functional: Add a test for s390x pxelinux.cfg network booting
pc-bios/s390-ccw: Add a boot menu for booting via pxelinux.cfg
pc-bios/s390-ccw: Make get_boot_index() from menu.c global
pc-bios/s390-ccw: Allow up to 31 entries for pxelinux.cfg
pc-bios/s390-ccw: Allow to select a different pxelinux.cfg entry via loadparm
hw/s390x/s390-pci-bus.c: Use g_assert_not_reached() in functions taking an ett
target/s390x/tcg: Use vaddr in s390_probe_access()
target/s390x/kvm: Use vaddr in find/insert_hw_breakpoint()
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This has two problems:
(1) it doesn't check errors, which Coverity complains about
(2) we use F_GETFL when we mean F_GETFD
Deal with both of these problems by using qemu_set_cloexec() instead.
That function will assert() if the fcntls fail, which is fine (we are
inside fork_start()/fork_end() so we know nothing can mess around
with our file descriptors here, and we just got this one from
pidfd_open()).
(As we are touching the if() statement here, we correct the
indentation.)
Coverity: CID 1508111 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20250711141217.1429412-1-peter.maydell@linaro.org>
Juraj Marcin [Thu, 26 Jun 2025 08:52:32 +0000 (10:52 +0200)]
migration: Rename save_live_complete_precopy_thread to save_complete_precopy_thread
Recent patch [1] renames the save_live_complete_precopy handler to
save_complete, as the machine is not live in most cases when this
handler is executed. The same is true also for
save_live_complete_precopy_thread, therefore this patch removes the
"live" keyword from the handler itself and related types to keep the
naming unified.
In contrast to save_complete, this handler is only executed at the end
of precopy, therefore the "precopy" keyword is retained.
Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Cédric Le Goater <clg@redhat.com> Signed-off-by: Juraj Marcin <jmarcin@redhat.com> Link: https://lore.kernel.org/r/20250626085235.294690-1-jmarcin@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
Peter Xu [Fri, 13 Jun 2025 14:12:17 +0000 (10:12 -0400)]
migration/postcopy: Add latency distribution report for blocktime
Add the latency distribution too for blocktime, using order-of-two buckets.
It accounts for all the faults, from either vCPU or non-vCPU threads. With
prior rework, it's very easy to achieve by adding an array to account for
faults in each buckets.
Sample output for HMP (while for QMP it's simply an array):
Postcopy Latency Distribution:
[ 1 us - 2 us ]: 0
[ 2 us - 4 us ]: 0
[ 4 us - 8 us ]: 1
[ 8 us - 16 us ]: 2
[ 16 us - 32 us ]: 2
[ 32 us - 64 us ]: 3
[ 64 us - 128 us ]: 10169
[ 128 us - 256 us ]: 50151
[ 256 us - 512 us ]: 12876
[ 512 us - 1 ms ]: 97
[ 1 ms - 2 ms ]: 42
[ 2 ms - 4 ms ]: 44
[ 4 ms - 8 ms ]: 93
[ 8 ms - 16 ms ]: 138
[ 16 ms - 32 ms ]: 0
[ 32 ms - 65 ms ]: 0
[ 65 ms - 131 ms ]: 0
[ 131 ms - 262 ms ]: 0
[ 262 ms - 524 ms ]: 0
[ 524 ms - 1 sec ]: 0
[ 1 sec - 2 sec ]: 0
[ 2 sec - 4 sec ]: 0
[ 4 sec - 8 sec ]: 0
[ 8 sec - 16 sec ]: 0
Cc: Markus Armbruster <armbru@redhat.com> Acked-by: Dr. David Alan Gilbert <dave@treblig.org> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613141217.474825-15-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
When used to report page fault latencies, the blocktime feature can be
almost useless when KVM async page fault is enabled, because in most cases
such remote fault will kickoff async page faults, then it's not trackable
from blocktime layer.
After all these recent rewrites to blocktime layer, it's finally so easy to
also support tracking non-vCPU faults. It'll be even faster if we could
always index fault records with TIDs, unfortunately we need to maintain the
blocktime API which report things in vCPU indexes.
Of course this can work not only for kworkers, but also any guest accesses
that may reach a missing page, for example, very likely when in the QEMU
main thread too (and all other threads whenever applicable).
In this case, we don't care about "how long the threads are blocked", but
we only care about "how long the fault will be resolved".
Cc: Markus Armbruster <armbru@redhat.com> Cc: Dr. David Alan Gilbert <dave@treblig.org> Reviewed-by: Fabiano Rosas <farosas@suse.de> Tested-by: Mario Casquero <mcasquer@redhat.com> Link: https://lore.kernel.org/r/20250613141217.474825-14-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
The old algorithm was almost OK and fast on inserts, except that the lookup
is slow and won't scale if there are a lot of vCPUs: when a page is copied
during postcopy, mark_postcopy_blocktime_end() will walk the whole array
trying to find which vCPUs are blocked by the address. So it needs
constant O(N) walk for each page resolution.
Alexey (the author of postcopy blocktime) mentioned the perf issue and how
to optimize it in a piece of comment in the page resolution path. The
comment was (interestingly..) not complete, but it's relatively clear what
he wanted to say about this perf issue.
Issue 2: Wrong Accounting on re-entrancies
==========================================
People might think that each vCPU should only and always get one fault at a
time, so that when the blocktime layer captured one fault on one vCPU, we
should never see another fault message on this vCPU.
It's almost correct, except in some extreme rare cases.
Case 1: it's possible the fault thread processes the userfaultfd messages
too fast so it can see >1 messages on one vCPU before the previous one was
resolved.
Case 2: it's theoretically also possible one vCPU can get even more than
one message on the same fault address if a fault is retried by the
kernel (e.g., handle_userfault() got interrupted before page resolution).
As this info might be important, instead of using commit message, I put
more details into the code as comment, when introducing an array
maintaining concurrent faults on one vCPU. Please refer to the comments
for details on both cases, especially case 1 which can be tricky.
Case 1 sounds rare, but it can be easily reproduced locally for me when we
run blocktime together with the migration-test on the vanilla postcopy.
New Design
==========
This patch should do almost what Alexey mentioned, but slightly
differently: instead of having an array to maintain vCPU fault addresses,
for each of the fault message we push a message into a hash, indexed by the
fault address.
With the hash, it can replace the old two structs: both the vcpu_addr[]
array, and also the array to store the start time of the fault. However
due to above we need one more counter array to account concurrent faults on
the same vCPU - that should even be needed in the old code, it's just that
the old code was buggy and it will blindly overwrite an existing
entry.. now we'll start to really track everything.
The hash structure might be more efficient than tree to maintain such
addr->(cpu, fault_time) information, so that the insert() and lookup()
paths should ideally both be ~O(1). After all, we do not need to sort.
Here we need to do one remove() though after the lookup(). It could be
slow but only if many vCPUs faulted exactly on the same address (so when
the list of cpu entries is long), which should be unlikely. Even with that,
it's still a worst case O(N) (consider 400 vCPUs faulted on the same
address and how likely is it..) rather than a constant O(N) complexity.
When at it, touch up the tracepoints to make them slightly more useful.
One tracepoint is added when walking all the fault entries.
Peter Xu [Fri, 13 Jun 2025 14:12:14 +0000 (10:12 -0400)]
migration/postcopy: Cleanup the total blocktime accounting
The variable vcpu_total_blocktime isn't easy to follow. In reality, it
wants to capture the case where all vCPUs are stopped, and now there will
be some vCPUs starts running.
The name now starts to conflict with vcpu_blocktime_total[], meanwhile it's
actually not necessary to have the variable at all: since nobody is
touching smp_cpus_down except ourselves, we can safely do the calculation
at the end before decrementing smp_cpus_down.
Hopefully this makes the logic easier to read, side benefit is we drop one
temp var.
Peter Xu [Fri, 13 Jun 2025 14:12:13 +0000 (10:12 -0400)]
migration/postcopy: Cache the tid->vcpu mapping for blocktime
Looking up the vCPU index for each fault can be expensive when there're
hundreds of vCPUs. Provide a cache for tid->vcpu instead with a hash
table, then lookup from there.
When at it, add another counter to record how many non-vCPU faults it gets.
For example, the main thread can also access a guest page that was missing.
These kind of faults are not accounted by blocktime so far.
Peter Xu [Fri, 13 Jun 2025 14:12:12 +0000 (10:12 -0400)]
migration/postcopy: Initialize blocktime context only until listen
Before this patch, the blocktime context can be created very early, because
postcopy_ram_supported_by_host() <- migrate_caps_check() can happen during
migration object init.
The trick here is the blocktime context needs system vCPU information,
which seems to be possible to change after that point. I didn't verify it,
but it doesn't sound right.
Now move it out and initialize the context only when postcopy listen
starts. That is already during a migration so it should be guaranteed the
vCPU topology can never change on both sides.
While at it, assert that the ctx isn't created instead this time; the old
"if" trick isn't needed when we're sure it will only happen once now.
Peter Xu [Fri, 13 Jun 2025 14:12:11 +0000 (10:12 -0400)]
migration/postcopy: Report fault latencies in blocktime
Blocktime so far only cares about the time one vcpu (or the whole system)
got blocked. It would be also be helpful if it can also report the latency
of page requests, which could be very sensitive during postcopy.
Blocktime itself is sometimes not very important, especially when one
thinks about KVM async PF support, which means vCPUs are literally almost
not blocked at all because the guest OS is smart enough to switch to
another task when a remote fault is needed.
However, latency is still sensitive and important because even if the guest
vCPU is running on threads that do not need a remote fault, the workload
that accesses some missing page is still affected.
Add two entries to the report, showing how long it takes to resolve a
remote fault. Mention in the QAPI doc that this is not the real average
fault latency, but only the ones that was requested for a remote fault.
Unwrap get_vcpu_blocktime_list() so we don't need to walk the list twice,
meanwhile add the entry checks in qtests for all postcopy tests.
Cc: Markus Armbruster <armbru@redhat.com> Cc: Dr. David Alan Gilbert <dave@treblig.org> Reviewed-by: Fabiano Rosas <farosas@suse.de> Tested-by: Mario Casquero <mcasquer@redhat.com> Link: https://lore.kernel.org/r/20250613141217.474825-9-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
Peter Xu [Fri, 13 Jun 2025 14:12:09 +0000 (10:12 -0400)]
migration/postcopy: Bring blocktime layer to ns level
With 64-bit fields, it is trivial. The caution is when exposing any values
in QMP, it was still declared with milliseconds (ms). Hence it's needed to
do the convertion when exporting the values to existing QMP queries.
Peter Xu [Fri, 13 Jun 2025 14:12:07 +0000 (10:12 -0400)]
migration/postcopy: Make all blocktime vars 64bits
I am guessing it was used to be 32bits because of the atomic ops. Now all
the atomic ops are gone and we're protected by a mutex instead, it's ok we
can switch to 64 bits.
Reasons to move over:
- Allow further patches to change the unit from ms to us: with postcopy
preempt mode, we're really into hundreds of microseconds level on
blocktime. We'd better be able to trap those.
- This also paves way for some other tricks that the original version
used to avoid overflows, e.g., start_time was almost only useful before
to make sure the sampled timestamp won't overflow a 32-bit field.
- This prepares further reports on top of existing data collected,
e.g. average page fault latencies. When average operation is taken into
account, milliseconds are simply too coarse grained.
When at it:
- Rename page_fault_vcpu_time to vcpu_blocktime_start.
- Rename vcpu_blocktime to vcpu_blocktime_total.
- Touch up the trace-events to not dump blocktime ctx pointer
Peter Xu [Fri, 13 Jun 2025 14:12:05 +0000 (10:12 -0400)]
migration/postcopy: Push blocktime start/end into page req mutex
The postcopy blocktime feature was tricky that it used quite some atomic
operations over quite a few arrays and vars, without explaining how that
would be thread safe. The thread safety here is about concurrency between
the fault thread and the fault resolution threads, possible to access the
same chunk of data. All these atomic ops can be expensive too before
knowing clearly how it works.
OTOH, postcopy has one page_request_mutex used to serialize the received
bitmap updates. So far it's ok - we don't yet have a lot of threads
contending the lock. It might change after multifd will be supported, but
that's a separate story. What is important is, with that mutex, it's
pretty lightweight to move all the blocktime maintenance into the mutex
critical section. It's because the blocktime layer is lightweighted:
almost "remember which vcpu faulted on which address", and "ok we get some
fault resolved, calculate how long it takes". It's also an optional
feature for now (but I have thought of changing that, maybe in the future).
Let's push the blocktime layer into the mutex, so that it's always
thread-safe even without any atomic ops.
To achieve that, I'll need to add a tid parameter on fault path so that
it'll start to pass the faulted thread ID into deeper the stack, but not
too deep. When at it, add a comment for the shared fault handler (for
example, vhost-user devices running with postcopy), to mention a TODO. One
reason it might not be trivial is that vhost-user's userfaultfds should be
opened by vhost-user process, so it's pretty hard to control making sure
the TID feature will be around. It wasn't supported before, so keep it
like that for now.
Now we should be as ease when everything is protected by a mutex that we
always take anyway.
One side effect: we can finally remove one ramblock_recv_bitmap_test() in
mark_postcopy_blocktime_begin(), which was pretty weird and which also
includes a weird (but maybe necessary.. but maybe not?) operation to inject
a blocktime entry then quickly erase it.. When we're with the mutex, and
when we make sure it's invoked after checking the receive bitmap, it's not
needed anymore. Instead, we assert.
As another side effect, this paves way for removing all atomic ops in all
the mem accesses in blocktime layer.
Note that we need a stub for mark_postcopy_blocktime_begin() for Windows
builds.
Peter Xu [Fri, 13 Jun 2025 14:08:00 +0000 (10:08 -0400)]
migration: Rewrite the migration complete detect logic
There're a few things off here in that logic, rewrite it. When at it, add
rich comment to explain each of the decisions.
Since this is very sensitive path for migration, below are the list of
things changed with their reasonings.
(1) Exact pending size is only needed for precopy not postcopy
Fundamentally it's because "exact" version only does one more deep
sync to fetch the pending results, while in postcopy's case it's
never going to sync anything more than estimate as the VM on source
is stopped.
(2) Do _not_ rely on threshold_size anymore to decide whether postcopy
should complete
threshold_size was calculated from the expected downtime and
bandwidth only during precopy as an efficient way to decide when to
switchover. It's not sensible to rely on threshold_size in postcopy.
For precopy, if switchover is decided, the migration will complete
soon. It's not true for postcopy. Logically speaking, postcopy
should only complete the migration if all pending data is flushed.
Here it used to work because save_complete() used to implicitly
contain save_live_iterate() when there's pending size.
Even if that looks benign, having RAMs to be migrated in postcopy's
save_complete() has other bad side effects:
(a) Since save_complete() needs to be run once at a time, it means
when moving RAM there's no way moving other things (rather than
round-robin iterating the vmstate handlers like what we do with
ITERABLE phase). Not an immediate concern, but it may stop working
in the future when there're more than one iterables (e.g. vfio
postcopy).
(b) postcopy recovery, unfortunately, only works during ITERABLE
phase. IOW, if src QEMU moves RAM during postcopy's save_complete()
and network failed, then it'll crash both QEMUs... OTOH if it failed
during iteration it'll still be recoverable. IOW, this change should
further reduce the window QEMU split brain and crash in extreme cases.
If we enable the ram_save_complete() tracepoints, we'll see this
before this patch:
This shouldn't be super important, the movement makes sure there's
only one in_postcopy check, then we are clear on what we do with the
two completely differnt use cases (precopy v.s. postcopy).
(4) Trivial touch up on threshold_size comparision
Peter Xu [Fri, 13 Jun 2025 14:07:56 +0000 (10:07 -0400)]
migration: Rename save_live_complete_precopy to save_complete
Now after merging the precopy and postcopy version of complete() hook,
rename the precopy version from save_live_complete_precopy() to
save_complete().
Dropping the "live" when at it, because it's in most cases not live when
happening (in precopy).
No functional change intended.
Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613140801.474264-7-peterx@redhat.com
[peterx: squash the fixup that covers a few more doc spots, per Juraj] Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
Peter Xu [Fri, 13 Jun 2025 14:07:55 +0000 (10:07 -0400)]
migration: Drop save_live_complete_postcopy hook
The hook is only defined in two vmstate users ("ram" and "block dirty
bitmap"), meanwhile both of them define the hook exactly the same as the
precopy version. Hence, this postcopy version isn't needed.
Peter Xu [Fri, 13 Jun 2025 14:07:53 +0000 (10:07 -0400)]
migration/docs: Move docs for postcopy blocktime feature
Move it out of vanilla postcopy session, but instead a standalone feature.
When at it, removing the NOTE because it's incorrect now after introduction
of max-postcopy-bandwidth, which can control the throughput even for
postcopy phase.
This is only found when I started to look into making the blocktime feature
more useful (so as to avoid using bpftrace, even though I'm not sure which
one will be harder to use..).
So the old dump would look like this:
Postcopy vCPU Blocktime: 0-1,4,10,21,33,46,48,59
Even though there're actually 40 vcpus, and the string will merge same
elements and also sort them.
To fix it, simply loop over the uint32List manually. Now it looks like:
Cc: Dr. David Alan Gilbert <dave@treblig.org> Cc: Alexey Perevalov <a.perevalov@samsung.com> Cc: Markus Armbruster <armbru@redhat.com> Tested-by: Mario Casquero <mcasquer@redhat.com> Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613140801.474264-3-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
I followed Dave's suggestion, and some more modifications on top:
- Added all elements into the picture
- Use size_to_str() and drop most of the units: benefit is more friendly
to most human eyes, bad side effect is lose of details, but that should
be corner case per my uses, and one can still leverage the QMP interface
when necessary.
- Sub-grouping for "Transfers" ("Channels" and "Page Types").
Suggested-by: Dr. David Alan Gilbert <dave@treblig.org> Tested-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Acked-by: Dr. David Alan Gilbert <dave@treblig.org> Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Tested-by: Mario Casquero <mcasquer@redhat.com> Link: https://lore.kernel.org/r/20250613140801.474264-2-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
tests/functional: Add a test for the MAX78000 arm machine
Runs a binary from the max78000test repo used in
developing the qemu implementation of the max78000
to verify that the machine and implemented devices
generally still work.
Signed-off-by: Jackson Donaldson <jcksn@duck.com> Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-id: 20250711110626.624534-3-jcksn@duck.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jackson Donaldson <jcksn@duck.com>
Message-id: 20250711110626.624534-2-jcksn@duck.com
[PMM: Moved doc to correct place in index; made underlines correct
length; added missing trailing newline; added SPDX] Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
target/s390x: Have s390_cpu_halt() not return anything
Since halting a vCPU and how many left running do not need
to be tied together, split the s390_count_running_cpus()
call out of s390_cpu_halt() to the single caller using it:
s390_handle_wait().
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20250708095746.12697-4-philmd@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
In order to simplify the next commit where s390_count_running_cpus()
is split out of s390_cpu_halt(), make its prototype public as a
preliminary step.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20250708095746.12697-3-philmd@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
target/s390x: Remove unused s390_cpu_[un]halt() user stubs
Since commit da944885469 ("target/s390x: make helper.c
sysemu-only") target/s390x/helper.c is only built for
system mode, so s390_cpu_halt() and s390_cpu_unhalt()
are never called from user mode.
Fixes: da944885469 ("target/s390x: make helper.c sysemu-only") Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20250708095746.12697-2-philmd@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
Thomas Huth [Mon, 7 Jul 2025 18:47:36 +0000 (20:47 +0200)]
tests/functional/test_ppc_bamboo: Replace broken link with working assets
The old image that we used for testing the bamboo machine has disappeared
from the internet. Fortunately there is another kernel + initrd provided
by Cédric that can be used for testing this machine, too.
Reported-by: Stefan Hajnoczi <stefanha@gmail.com> Suggested-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20250707184736.88660-1-thuth@redhat.com>
Thomas Huth [Tue, 1 Jul 2025 10:48:27 +0000 (12:48 +0200)]
tests/functional: Add dependency to the keymap_targets
When doing a "configure" in a an empty build directory, followed by
a "make check" without a normal build in between, the vnc functional
test currently fails since the keymaps have not been built yet.
Thus add a dependency to the keymap_targets here to make sure that
the keymaps are built before running the functional tests.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20250701104827.363904-1-thuth@redhat.com>
Thomas Huth [Wed, 9 Jul 2025 08:34:43 +0000 (10:34 +0200)]
tests/functional: Add a test for s390x pxelinux.cfg network booting
Check the various ways of booting a kernel via pxelinux.cfg file,
e.g. by specifying the config file name via the MAC address or the
UUID of the guest. Also check whether we can successfully load an
alternate kernel via the "loadparm" parameter here and whether the
boot menu shows up with "-boot menu=on".
Reviewed-by: Jared Rossi <jrossi@linux.ibm.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20250709083443.41574-6-thuth@redhat.com>
Thomas Huth [Wed, 9 Jul 2025 08:34:41 +0000 (10:34 +0200)]
pc-bios/s390-ccw: Make get_boot_index() from menu.c global
We are going to reuse this function for selecting an entry from
the pxelinux.cfg menu, so rename this function with a "menu_"
prefix and make it available globally.
Reviewed-by: Jared Rossi <jrossi@linux.ibm.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20250709083443.41574-4-thuth@redhat.com>
Thomas Huth [Wed, 9 Jul 2025 08:34:40 +0000 (10:34 +0200)]
pc-bios/s390-ccw: Allow up to 31 entries for pxelinux.cfg
We're going to support a menu for the pxelinux.cfg code, and to be able
to reuse some functionality from menu.c, we should align the maximum
amount of possible entries with the MAX_BOOT_ENTRIES constant that is
used there. Thus replace MAX_PXELINUX_ENTRIES with MAX_BOOT_ENTRIES.
Reviewed-by: Jared Rossi <jrossi@linux.ibm.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20250709083443.41574-3-thuth@redhat.com>
Thomas Huth [Wed, 9 Jul 2025 08:34:39 +0000 (10:34 +0200)]
pc-bios/s390-ccw: Allow to select a different pxelinux.cfg entry via loadparm
Since we're linking the network booting code into the main firmware
binary nowadays, we can support the "loadparm" parameter now quite
easily for pxelinux.cfg config files that contain multiple entries.
Reviewed-by: Jared Rossi <jrossi@linux.ibm.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20250709083443.41574-2-thuth@redhat.com>
Peter Maydell [Thu, 10 Jul 2025 16:15:52 +0000 (17:15 +0100)]
hw/s390x/s390-pci-bus.c: Use g_assert_not_reached() in functions taking an ett
The s390-pci-bus.c code, Coverity complains about a possible overflow
because get_table_index() can return -1 if the ett value passed in is
not one of the three permitted ZPCI_ETT_PT, ZPCI_ETT_ST, ZPCI_ETT_RT,
but the caller in table_translate() doesn't check this and instead
uses the return value directly in a calculation of the guest address
to read from.
In fact this case cannot happen, because:
* get_table_index() is called only from table_translate()
* the only caller of table_translate() loops through the ett values
in the order RT, ST, PT until table_translate() returns 0
* table_translate() will return 0 for the error cases and when
translate_iscomplete() returns true
* translate_iscomplete() is always true for ZPCI_ETT_PT
So table_translate() is always called with a valid ett value.
Instead of having the various functions called from table_translate()
return a default or dummy value when the ett argument is out of range,
use g_assert_not_reached() to indicate that this is impossible.
Coverity: CID 1547609 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Message-ID: <20250710161552.1287399-1-peter.maydell@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
target/s390x/tcg: Use vaddr in s390_probe_access()
Commit 70ebd9ce1cb ("s390x/tcg: Fault-safe memset") passed
vaddr type to access_prepare(), and commit b6c636f2cd6
("s390x/tcg: Fault-safe memmove") to do_access_get_byte(),
but declared S390Access::vaddr[1,2] as target_ulong.
Directly declare these as vaddr type, and have
s390_probe_access() use that type as argument.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20250707171059.3064-3-philmd@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
target/s390x/kvm: Use vaddr in find/insert_hw_breakpoint()
Since commit b8a6eb1862a both kvm_arch_insert_hw_breakpoint()
and kvm_arch_remove_hw_breakpoint() use a vaddr type. Use the
same type for the callees.
Fixes: b8a6eb1862a ("sysemu/kvm: Use vaddr for kvm_arch_[insert|remove]_hw_breakpoint") Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20250707171059.3064-2-philmd@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
Page size of TLB entry comes from CSR STLBPS and pwcl register. With
huge page, it is dir_base + dir_width from pwcl register. With normal
page, it is field of PTBASE from pwcl register.
So it is ok to check validity in function helper_ldpte() and function
helper_csrwr_stlbps(). And it is unnecessary in tlb entry fill path.
Signed-off-by: Bibo Mao <maobibo@loongson.cn> Reviewed-by: Song Gao <gaosong@loongson.cn>
Function helper_csrwr_stlbps() is emulation with CSR STLBPS register
write operation. However there is only parameter checking action, and
no register updating action. Here update value of CSR_STLBPS when
parameter passes to check.
Signed-off-by: Bibo Mao <maobibo@loongson.cn> Reviewed-by: Song Gao <gaosong@loongson.cn>
Geoffrey Thomas [Fri, 14 Mar 2025 12:47:42 +0000 (08:47 -0400)]
linux-user: Hold the fd-trans lock across fork
If another thread is holding target_fd_trans_lock during a fork,
then the lock becomes permanently locked in the child and the
emulator deadlocks at the next interaction with the fd-trans table.
As with other locks, acquire the lock in fork_start() and release
it in fork_end().
Cc: qemu-stable@nongnu.org Signed-off-by: Geoffrey Thomas <geofft@ldpreload.com> Fixes: c093364f4d91 "fd-trans: Fix race condition on reallocation of the translation table."
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2846 Buglink: https://github.com/astral-sh/uv/issues/6105 Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20250314124742.4965-1-geofft@ldpreload.com>
linux-user/mips/o32: Drop sa_restorer functionality
The Linux kernel dropped support for sa_restorer on O32 MIPS in the
release 2.5.48 because it was unused. See the comment in
arch/mips/include/uapi/asm/signal.h.
Applications using the kernels UAPI headers will not reserve enough
space for qemu-user to copy the sigaction.sa_restorer field to.
Unrelated data may be overwritten.
Align qemu-user with the kernel by also dropping sa_restorer support.
Signed-off-by: Thomas Weißschuh <thomas@t-8ch.de> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20250709-mips-sa-restorer-v1-1-fc17120e4afe@t-8ch.de>
Peter Maydell [Thu, 10 Jul 2025 17:07:07 +0000 (18:07 +0100)]
linux-user/gen-vdso: Don't read off the end of buf[]
In gen-vdso we load in a file and assume it's a valid ELF file. In
particular we assume it's big enough to be able to read the ELF
information in e_ident in the ELF header.
Add a check that the total file length is at least big enough for all
the e_ident bytes, which is good enough for the code in gen-vdso.c.
This will catch the most obvious possible bad input file (truncated)
and allow us to run the sanity checks like "not actually an ELF file"
without potentially crashing.
The code in elf32_process() and elf64_process() still makes
assumptions about the file being well-formed, but this is OK because
we only run it on the vdso binaries that we create ourselves in the
build process by running the compiler.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20250710170707.1299926-3-peter.maydell@linaro.org>
Peter Maydell [Thu, 10 Jul 2025 16:43:54 +0000 (17:43 +0100)]
linux-user: Check for EFAULT failure in nanosleep
target_to_host_timespec() returns an error if the memory the guest
passed us isn't actually readable. We check for this everywhere
except the callsite in the TARGET_NR_nanosleep case, so this mistake
was caught by a Coverity heuristic.
Add the missing error checks to the calls that convert between the
host and target timespec structs.
Coverity: CID 1507104 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20250710164355.1296648-1-peter.maydell@linaro.org>
Peter Maydell [Thu, 10 Jul 2025 11:31:23 +0000 (12:31 +0100)]
linux-user: Implement fchmodat2 syscall
The fchmodat2 syscall is new from Linux 6.6; it is like the
existing fchmodat syscall except that it takes a flags parameter.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3019 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20250710113123.1109461-1-peter.maydell@linaro.org>
fpu: Process float_muladd_negate_result after rounding
Changing the sign before rounding affects the correctness of
the asymmetric rouding modes: float_round_up and float_round_down.
Reported-by: WANG Rui <wangrui@loongson.cn> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
hw/arm/virt-acpi-build: Don't create ITS id mappings by default
Commit d6afe18b7242 ("hw/arm/virt-acpi-build: Fix ACPI IORT and MADT tables
when its=off") moved ITS group node generation under the its=on condition.
However, it still creates rc_its_idmaps unconditionally, which results in
duplicate ID mappings in the IORT table.
Fixes:d6afe18b7242 ("hw/arm/virt-acpi-build: Fix ACPI IORT and MADT tables when its=off") Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Donald Dutile <ddutile@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
system/qdev: Remove pointless NULL check in qdev_device_add_from_qdict
Coverity reported a unnecessary NULL check:
qemu/system/qdev-monitor.c: 720 in qdev_device_add_from_qdict()
683 /* create device */
684 dev = qdev_new(driver);
...
719 err_del_dev:
>>> CID 1590192: Null pointer dereferences (REVERSE_INULL)
>>> Null-checking "dev" suggests that it may be null, but it has already been dereferenced on all paths leading to the check.
720 if (dev) {
721 object_unparent(OBJECT(dev));
722 object_unref(OBJECT(dev));
723 }
724 return NULL;
725 }
Indeed, unlike qdev_try_new() which can return NULL,
qdev_new() always returns a heap pointer (or aborts).
Remove the unnecessary assignment and check.
Fixes: f3a85056569 ("qdev/qbus: add hidden device support")
Resolves: Coverity CID 1590192 (Null pointer dereferences) Suggested-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Thu, 10 Jul 2025 10:39:33 +0000 (11:39 +0100)]
hw/arm/virt: Allow virt extensions with KVM
Up to now virt support on guest has been only supported with TCG.
Now it becomes feasible to use it with KVM acceleration.
Check neither in-kernel GICv3 nor aarch64=off is used along with KVM
EL2.
Signed-off-by: Haibo Xu <haibo.xu@linaro.org> Signed-off-by: Miguel Luis <miguel.luis@oracle.com> Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250707164129.1167837-6-eric.auger@redhat.com
[PMM: make "kernel doesn't have EL2 support" error message
distinct from the old "QEMU doesn't have KVM EL2 support" one] Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Eric Auger [Mon, 7 Jul 2025 16:40:30 +0000 (18:40 +0200)]
hw/arm/arm_gicv3_kvm: Add a migration blocker with kvm nested virt
We may be miss some NV related GIC register save/restore. Until
we complete the study, let's add a migration blocker when the
maintenance IRQ is set.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Message-id: 20250707164129.1167837-5-eric.auger@redhat.com Suggested-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
target/arm: Enable feature ARM_FEATURE_EL2 if EL2 is supported
KVM_CAP_ARM_EL2 must be supported by the cpu to enable ARM_FEATURE_EL2.
In case the host does support NV, expose the feature.
Signed-off-by: Haibo Xu <haibo.xu@linaro.org> Signed-off-by: Miguel Luis <miguel.luis@oracle.com> Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250707164129.1167837-4-eric.auger@redhat.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
target/arm/kvm: Add helper to detect EL2 when using KVM
Introduce query support for KVM_CAP_ARM_EL2.
Signed-off-by: Haibo Xu <haibo.xu@linaro.org> Signed-off-by: Miguel Luis <miguel.luis@oracle.com> Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250707164129.1167837-3-eric.auger@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Allow virt arm machine to set the interrupt ID for the KVM
GIC maintenance interrupt.
This setting must be done before the KVM_DEV_ARM_VGIC_CTRL_INIT
hence the choice to perform the setting in the GICv3 realize
instead of proceeding the same way as kvm_arm_pmu_set_irq().
Signed-off-by: Haibo Xu <haibo.xu@linaro.org> Signed-off-by: Miguel Luis <miguel.luis@oracle.com> Signed-off-by: Eric Auger <eric.auger@redhat.com>
Message-id: 20250707164129.1167837-2-eric.auger@redhat.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Fri, 4 Jul 2025 16:56:36 +0000 (17:56 +0100)]
target/arm: Don't enforce NSE,NS check for EL3->EL3 returns
In the Arm ARM, rule R_TYTWB that defines illegal exception return cases
includes the case:
If FEAT_RME is implemented, then if SCR_EL3.{NSE, NS} is {1, 0}, an
exception return from EL3 to a lower Exception level
Our implementation of this check fails to check that the
return is to a lower exception level, so it will incorrectly fire on
EL3->EL3 exception returns.
Fix the check condition. This requires us to move it further
down in the function to a point where we know the new_el value.
Fixes: 35aa6715ddcd9 ("target/arm: Catch illegal-exception-return from EL3 with bad NSE/NS") Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3016 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250704165636.261888-1-peter.maydell@linaro.org
target/arm: Split out performance monitor regs to cpregs-pmu.c
Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250707151547.196393-4-richard.henderson@linaro.org Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Split out all "system instructions for address translation".
While mapped into "cpregs", these are instructions, and thus
are handled in hardware by virtualization. They are all
priviledged, and thus not reachable for user-only.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250707151547.196393-3-richard.henderson@linaro.org Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Allow the call to be compiled out by protecting it
with tcg_enabled.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250707151547.196393-2-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Fixes: 804cfc7eedb7 ("arm/cpu: Store aa64isar0/aa64zfr0 into the idregs arrays") Signed-off-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250704141927.38963-6-cohuck@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
While a trailing comma is not broken for SET_IDREG invocations, it
does look odd; use a semicolon instead.
Fixes: f1fd81291c91 ("arm/cpu: Store aa64mmfr0-3 into the idregs array") Fixes: def3f1c1026a ("arm/cpu: Store aa64dfr0/1 into the idregs array") Signed-off-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-id: 20250704141927.38963-4-cohuck@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Add a single complex case for aarch64 virt machine.
Given existing much more comprehensive tests for x86 cover the common
functionality, a single test should be enough to verify that the aarch64
part continues to work.
Tested-by: Itaru Kitayama <itaru.kitayama@fujitsu.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Li Zhijian <lizhijian@fujitsu.com>
Message-id: 20250703104110.992379-6-Jonathan.Cameron@huawei.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Only add one very simple example as all the i386/pc examples will work
for arm/virt with a change to appropriate executable and appropriate
standard launch line for arm/virt. Note that max cpu is used to
ensure we have plenty of physical address space.
Suggested-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Itaru Kitayama <itaru.kitayama@fujitsu.com> Tested-by: Li Zhijian <lizhijian@fujitsu.com>
Message-id: 20250703104110.992379-5-Jonathan.Cameron@huawei.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
hw/arm/virt: Basic CXL enablement on pci_expander_bridge instances pxb-cxl
Code based on i386/pc enablement.
The memory layout places space for 16 host bridge register regions after
the GIC_REDIST2 in the extended memmap. This is a hole in the current
map so adding them here has no impact on placement of other memory regions
(tested with enough CPUs for GIC_REDIST2 to be in use.)
The high memory map is GiB aligned so the hole is there whatever the
size of memory or device_memory below this point.
The CFMWs are placed above the extended memmap. Note the existing
variable highest_gpa is the highest GPA that has been allocated at
a particular point in setting up the memory map. Whilst this caused
some confusion in review there are existing comments explaining this
so nothing is added.
The cxl_devices_state.host_mr provides a small space in which to place
the individual host bridge register regions for whatever host bridges are
allocated via -device pxb-cxl on the command line. The existing dynamic
sysbus infrastructure is not reused because pxb-cxl is a PCI device not
a sysbus one but these registers are directly in the main memory map,
not the PCI address space.
Only create the CEDT table if cxl=on set for the machine. Default to off.
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Itaru Kitayama <itaru.kitayama@fujitsu.com> Tested-by: Li Zhijian <lizhijian@fujitsu.com>
Message-id: 20250703104110.992379-4-Jonathan.Cameron@huawei.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
hw/cxl: Make the CXL fixed memory windows devices.
Previously these somewhat device like structures were tracked using a list
in the CXLState in each machine. This is proving restrictive in a few
cases where we need to iterate through these without being aware of the
machine type. Just make them sysbus devices.
Restrict them to not user created as they need to be visible to early
stages of machine init given effects on the memory map.
This change both simplifies state tracking and enables features needed
for performance optimization and hotness tracking by making it possible
to retrieve the fixed memory window on actions elsewhere in the topology.
In some cases the ordering of the Fixed Memory Windows matters.
For those utility functions provide a GSList sorted by the window index.
This ensures that we get consistency across:
- ordering in the command line
- ordering of the host PA ranges
- ordering of ACPI CEDT structures describing the CFMWS.
Other aspects don't have this constraint. For those direct iteration
of the underlying hash structures is fine.
In the setup path for the memory map in pc_memory_init() split the
operations into two calls. The first, cxl_fmws_set_mmemap(), loops over
fixed memory windows in order and assigns their addresses. The second,
cxl_fmws_update_mmio() actually sets up the mmio for each window.
This is obviously less efficient than a single loop but this split design
is needed to put the logic in two different places in the arm64 support
and it is not a hot enough path to justify an x86 only implementation.
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Tested-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Itaru Kitayama <itaru.kitayama@fujitsu.com>
Message-id: 20250703104110.992379-3-Jonathan.Cameron@huawei.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
hw/cxl-host: Add an index field to CXLFixedMemoryWindow
To enable these to be found in a fixed order, that order needs to be known.
This will later be used to sort a list of these structures so that address
map and ACPI table entries are predictable.
Tested-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Itaru Kitayama <itaru.kitayama@fujitsu.com>
Message-id: 20250703104110.992379-2-Jonathan.Cameron@huawei.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jackson Donaldson <jcksn@duck.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20250704223239.248781-12-jcksn@duck.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jackson Donaldson <jcksn@duck.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20250704223239.248781-11-jcksn@duck.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jackson Donaldson
Message-id: 20250704223239.248781-10-jcksn@duck.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This commit implements the True Random Number
Generator for the MAX78000
Signed-off-by: Jackson Donaldson <jcksn@duck.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20250704223239.248781-9-jcksn@duck.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This commit adds the Global Control Register to
max78000_soc
Signed-off-by: Jackson Donaldson <jcksn@duck.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20250704223239.248781-8-jcksn@duck.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This commit implements the Global Control Register
for the MAX78000
Signed-off-by: Jackson Donaldson <jcksn@duck.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20250704223239.248781-7-jcksn@duck.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jackson Donaldson <jcksn@duck.com> Reviewed-by: Peter Maydell <petermaydell@linaro.org>
Message-id: 20250704223239.248781-6-jcksn@duck.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This commit implements UART support for the MAX78000
Signed-off-by: Jackson Donaldson <jcksn@duck.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20250704223239.248781-5-jcksn@duck.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This commit adds the instruction cache controller
to max78000_soc
Signed-off-by: Jackson Donaldson <jcksn@duck.com> Reviewed-by: Peter Maydell <petermaydell@linaro.org>
Message-id: 20250704223239.248781-4-jcksn@duck.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This commit implements the Instruction Cache Controller
for the MAX78000
Signed-off-by: Jackson Donaldson <jcksn@duck.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20250704223239.248781-3-jcksn@duck.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This patch adds support for the MAX78000FTHR machine.
The MAX78000FTHR contains a MAX78000 and a RISC-V core. This patch
implements only the MAX78000, which is Cortex-M4 based.
Details can be found at:
https://www.analog.com/media/en/technical-documentation/user-guides/max78000-user-guide.pdf
Signed-off-by: Jackson Donaldson <jcksn@duck.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20250704223239.248781-2-jcksn@duck.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Stefan Hajnoczi [Mon, 7 Jul 2025 13:22:41 +0000 (09:22 -0400)]
Merge tag 'pull-target-arm-20250704' of https://gitlab.com/pm215/qemu into staging
target-arm queue:
* Implement emulation of SME2p1 and SVE2p1
* Correctly enforce alignment checks for v8M loads and
stores done via helper functions
* Mark the "highbank" and the "midway" machine as deprecated
* tag 'pull-target-arm-20250704' of https://gitlab.com/pm215/qemu: (119 commits)
linux-user/aarch64: Set hwcap bits for SME2p1/SVE2p1
target/arm: Enable FEAT_SME2p1 on -cpu max
target/arm: Implement SME2 BFMOPA (non-widening)
target/arm: Implement FMOPA (non-widening) for fp16
target/arm: Support FPCR.AH in SME FMOPS, BFMOPS
target/arm: Rename BFMOPA to BFMOPA_w
target/arm: Rename FMOPA_h to FMOPA_w_h
target/arm: Implement LUTI2, LUTI4 for SME2/SME2p1
target/arm: Implement MOVAZ for SME2p1
target/arm: Implement LD1Q, ST1Q for SVE2p1
target/arm: Implement {LD, ST}[234]Q for SME2p1/SVE2p1
target/arm: Move ld1qq and st1qq primitives to sve_ldst_internal.h
target/arm: Implement {LD1, ST1}{W, D} (128-bit element) for SVE2p1
target/arm: Split the ST_zpri and ST_zprr patterns
target/arm: Implement SME2 counted predicate register load/store
target/arm: Implement TBLQ, TBXQ for SME2p1/SVE2p1
target/arm: Implement ZIPQ, UZPQ for SME2p1/SVE2p1
target/arm: Implement PMOV for SME2p1/SVE2p1
target/arm: Implement EXTQ for SME2p1/SVE2p1
target/arm: Implement DUPQ for SME2p1/SVE2p1
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* tag 'accel-20250704' of https://github.com/philmd/qemu: (35 commits)
MAINTAINERS: Add me as reviewer of overall accelerators section
monitor/hmp-cmds-target: add CPU_DUMP_VPU in hmp_info_registers()
accel/system: Convert pre_resume() from AccelOpsClass to AccelClass
accel: Pass AccelState argument to gdbstub_supported_sstep_flags()
accel: Remove unused MachineState argument of AccelClass::setup_post()
accel: Directly pass AccelState argument to AccelClass::has_memory()
accel/kvm: Directly pass KVMState argument to do_kvm_create_vm()
accel/kvm: Prefer local AccelState over global MachineState::accel
accel/tcg: Prefer local AccelState over global current_accel()
accel/hvf: Re-use QOM allocated state
accel: Propagate AccelState to AccelClass::init_machine()
accel: Keep reference to AccelOpsClass in AccelClass
accel: Expose and register generic_handle_interrupt()
accel/dummy: Extract 'dummy-cpus.h' header from 'system/cpus.h'
accel/whpx: Expose whpx_enabled() to common code
accel/nvmm: Expose nvmm_enabled() to common code
accel/system: Document cpu_synchronize_state_post_init/reset()
accel/system: Document cpu_synchronize_state()
accel/kvm: Remove kvm_cpu_synchronize_state() stub
accel/whpx: Replace @dirty field by generic CPUState::vcpu_dirty field
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Conflicts:
accel/accel-system.c
accel/hvf/hvf-all.c
include/qemu/accel.h
linux-user/aarch64: Set hwcap bits for SME2p1/SVE2p1
Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250704142112.1018902-108-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250704142112.1018902-107-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Fri, 4 Jul 2025 14:21:08 +0000 (08:21 -0600)]
target/arm: Implement SME2 BFMOPA (non-widening)
Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250704142112.1018902-106-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>