When we remove dest from orig's links, we lose the link
that we rely on later to reset links. This can lead to
failure to release from spinlock with self-modifying code.
Cc: qemu-stable@nongnu.org Reported-by: 李威威 <liweiwei@kubuds.cn> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Anton Johansson <anjo@rev.ng> Tested-by: Anton Johansson <anjo@rev.ng>
This is a logical reversion of 2ad04500543, though
there have been additions to the set of mmu indexes
since then. The impetus to that original patch,
"9-15 will use shorter assembler instructions when
run on a x86-64 host" is now handled generically.
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
include/hw/core/cpu: Invert the indexing into CPUTLBDescFast
This array is within CPUNegativeOffsetState, which means the
last element of the array has an offset from env with the
smallest magnitude. This can be encoded into fewer bits
when generating TCG fast path memory references.
When we changed the NB_MMU_MODES to be a global constant,
rather than a per-target value, we pessimized the code
generated for targets which use only a few mmu indexes.
By inverting the array index, we counteract that.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Peter Maydell [Thu, 18 Sep 2025 11:42:59 +0000 (12:42 +0100)]
hw/pci-host/astro: Don't call pci_regsiter_root_bus() in init
In the astro PCI host bridge device, we call pci_register_root_bus()
in the device's instance_init. This is a problem for two reasons
* the PCI bridge is then available to the rest of the simulation
(e.g. via pci_qdev_find_device()), even though it hasn't
yet been realized
* we do not attempt to unregister in an instance_deinit,
which means that if you go through an instance_init -> deinit
lifecycle the freed memory for the host-bridge device is
left on the pci_host_bridges list
ASAN reports the resulting use-after-free:
==1776584==ERROR: AddressSanitizer: heap-use-after-free on address 0x51f00000cb00 at pc 0x5b2d460a89b5 bp 0x7ffef7617f50 sp 0x7ffef7617f48
WRITE of size 8 at 0x51f00000cb00 thread T0
#0 0x5b2d460a89b4 in pci_host_bus_register /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/hppa-asan/../../hw/pci/pci.c:608:5
#1 0x5b2d46093566 in pci_root_bus_internal_init /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/hppa-asan/../../hw/pci/pci.c:677:5
#2 0x5b2d460935e0 in pci_root_bus_new /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/hppa-asan/../../hw/pci/pci.c:706:5
#3 0x5b2d46093fe5 in pci_register_root_bus /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/hppa-asan/../../hw/pci/pci.c:751:11
#4 0x5b2d46fe2335 in elroy_pcihost_init /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/hppa-asan/../../hw/pci-host/astro.c:455:16
0x51f00000cb00 is located 1664 bytes inside of 3456-byte region [0x51f00000c480,0x51f00000d200)
freed by thread T0 here:
#0 0x5b2d4582385a in free (/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/hppa-asan/qemu-system-hppa+0x17ad85a) (BuildId: 692b49eedc6fb0ef618bbb6784a09311b3b7f1e8)
#1 0x5b2d47160723 in object_finalize /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/hppa-asan/../../qom/object.c:734:9
#2 0x5b2d471589db in object_unref /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/hppa-asan/../../qom/object.c:1232:9
#3 0x5b2d477d373c in qmp_device_list_properties /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/hppa-asan/../../qom/qom-qmp-cmds.c:237:5
previously allocated by thread T0 here:
#0 0x5b2d45823af3 in malloc (/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/hppa-asan/qemu-system-hppa+0x17adaf3) (BuildId: 692b49eedc6fb0ef618bbb6784a09311b3b7f1e8)
#1 0x79728fa08b09 in g_malloc (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x62b09) (BuildId: 1eb6131419edb83b2178b682829a6913cf682d75)
#2 0x5b2d471595fc in object_new_with_type /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/hppa-asan/../../qom/object.c:767:15
#3 0x5b2d47159409 in object_new_with_class /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/hppa-asan/../../qom/object.c:782:12
#4 0x5b2d477d29a5 in qmp_device_list_properties /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/hppa-asan/../../qom/qom-qmp-cmds.c:206:11
Cc: qemu-stable@nongnu.org Fixes: e029bb00a79be ("hw/pci-host: Add Astro system bus adapter found on PA-RISC machines")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3118 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Tested-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20250918114259.1802337-3-peter.maydell@linaro.org>
Peter Maydell [Thu, 18 Sep 2025 11:42:58 +0000 (12:42 +0100)]
hw/pci-host/dino: Don't call pci_register_root_bus() in init
In the dino PCI host bridge device, we call pci_register_root_bus()
in the device's instance_init. This is a problem for two reasons
* the PCI bridge is then available to the rest of the simulation
(e.g. via pci_qdev_find_device()), even though it hasn't
yet been realized
* we do not attempt to unregister in an instance_deinit,
which means that if you go through an instance_init -> deinit
lifecycle the freed memory for the host-bridge device is
left on the pci_host_bridges list
ASAN reports the resulting use-after-free:
==1771223==ERROR: AddressSanitizer: heap-use-after-free on address 0x527000018f80 at pc 0x5b4b9d3369b5 bp 0x7ffd01929980 sp 0x7ffd01929978
WRITE of size 8 at 0x527000018f80 thread T0
#0 0x5b4b9d3369b4 in pci_host_bus_register /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/hppa-asan/../../hw/pci/pci.c:608:5
#1 0x5b4b9d321566 in pci_root_bus_internal_init /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/hppa-asan/../../hw/pci/pci.c:677:5
#2 0x5b4b9d3215e0 in pci_root_bus_new /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/hppa-asan/../../hw/pci/pci.c:706:5
#3 0x5b4b9d321fe5 in pci_register_root_bus /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/hppa-asan/../../hw/pci/pci.c:751:11
#4 0x5b4b9d390521 in dino_pcihost_init /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/hppa-asan/../../hw/pci-host/dino.c:473:16
0x527000018f80 is located 1664 bytes inside of 12384-byte region [0x527000018900,0x52700001b960)
freed by thread T0 here:
#0 0x5b4b9cab185a in free (/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/hppa-asan/qemu-system-hppa+0x17ad85a) (BuildId: ca496bb2e4fc750ebd289b448bad8d99c0ecd140)
#1 0x5b4b9e3ee723 in object_finalize /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/hppa-asan/../../qom/object.c:734:9
#2 0x5b4b9e3e69db in object_unref /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/hppa-asan/../../qom/object.c:1232:9
#3 0x5b4b9ea6173c in qmp_device_list_properties /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/hppa-asan/../../qom/qom-qmp-cmds.c:237:5
#4 0x5b4b9ec4e0f3 in qmp_marshal_device_list_properties /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/hppa-asan/qapi/qapi-commands-qdev.c:65:14
previously allocated by thread T0 here:
#0 0x5b4b9cab1af3 in malloc (/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/hppa-asan/qemu-system-hppa+0x17adaf3) (BuildId: ca496bb2e4fc750ebd289b448bad8d99c0ecd140)
#1 0x799d8270eb09 in g_malloc (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x62b09) (BuildId: 1eb6131419edb83b2178b682829a6913cf682d75)
#2 0x5b4b9e3e75fc in object_new_with_type /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/hppa-asan/../../qom/object.c:767:15
#3 0x5b4b9e3e7409 in object_new_with_class /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/hppa-asan/../../qom/object.c:782:12
#4 0x5b4b9ea609a5 in qmp_device_list_properties /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/hppa-asan/../../qom/qom-qmp-cmds.c:206:11
where we allocated one instance of the dino device, put it on the
list, freed it, and then trying to allocate a second instance touches
the freed memory on the pci_host_bridges list.
Fix this by deferring all the setup of memory regions and registering
the PCI bridge to the device's realize method. This brings it into
line with almost all other PCI host bridges, which call
pci_register_root_bus() in realize.
Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3118 Fixes: 63901b6cc4d8b4 ("dino: move PCI bus initialisation to dino_pcihost_init()") Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Tested-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20250918114259.1802337-2-peter.maydell@linaro.org>
Solaris 8 appears to have a bug whereby it executes v9 MEMBAR
instructions when booting a freshly installed image. According
to the SPARC v8 architecture manual, whilst bits 13 and bits 12-0
of the "Read State Register Instructions" are notionally zero,
they are marked as unused (i.e. ignored).
Fixes: af25071c1d ("target/sparc: Move RDASR, STBAR, MEMBAR to decodetree")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3097 Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
This commit adds support for the `prctl(PR_SET_SYSCALL_USER_DISPATCH)`
function in the Linux userspace emulator.
It is implemented as a fully host-independent function, by forcing
a SIGSYS early during syscall handling, if the PC is outside the
allowed range.
Since disabled SUD is indistinguishable from enabled SUD with
always-allowed region length == ~0, this encoding is used
instead of introducing a new flag.
Tested on [uglendix][1], will probably also apply to software like
tiny-wine, rpcsx, limbo, lazypoline, vicar, sysfail and endokernel,
to name a few.
[1]: https://sr.ht/~arusekk/uglendix
Signed-off-by: Arusekk <floss@arusekk.pl>
Message-ID: <20250711225226.14652-1-floss@arusekk.pl>
[rth: Split out is_vdso_sigreturn region matching and other minor tweaks.] Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
linux-user: Populate vdso_sigreturn_region_{start,end} from sigtramp page
When a target does not support a vdso, we generate a sigtramp page.
The only thing on this page is a (set of) signal return syscalls.
We do not need to narrowly restrict the vdso_sigreturn_region;
simply record the entire page for all such targets.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Merge tag 'misc-fixes-pull-request' of https://gitlab.com/berrange/qemu into staging
* Update security triage contact address
* Check and honour failures to the blocking flag on FDs
* Don't touch blocking flags on FDs received during migration
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEE2vOm/bJrYpEtDo4/vobrtBUQT98FAmjNQuAACgkQvobrtBUQ
# T99xaBAAr6zQPii1tjzuzLovF6MIqtldXnmVO/yjcl5NgLWonIRDt2JsxnRxi3es
# 9uNDed5+ePNXmUAYd46k81gBEjBWbv465kt5FHAZZV6BRw/PPzkoh+jzGc8NVir8
# 3GZJ2kPr51PxGEl8md2vRthg4bMuhlS5ogCEqAMDYT4f6AVemfnNQ5NttGX353T2
# etxoMhEeMtTBKjMoTBv+SVhhO4nKwZ+6CFhvuGON423EfrGlkNTXyprKTdzpr4i0
# 4KDQLxxoANlmg/1W0PxfrLiBCmGpHweMR44Piv715VYa2YNPRq0G6EC6AFGbHZ51
# N+mKmWNE0CS5rP1TEacSCX4q6If5VxjSLLj+og8LmpIlJ6tiqdrisSqA6bzCJ1f/
# lMsfUsKoMqPhqat9ZGUkYu8REgKP+O+CSGJNftYTsEEY0oKZrAW4fsoN3E9qpfcG
# Xy6eSu0TTGDWE6CEe0vkHiQwlVHMtRcWMSPwlsvrgt2TO6k97reT3AoIBK2VfygC
# WzMv0P0nBvHFKeIbqmFOk3BEI5+JECgxVRc1WXWbSFLW0PBY/xd7g6ow8uaQsd9e
# pzMA1Pwh2EuM4DTlOy+m9zBOhm9YP9An188NLldOne3TFKFYe5QO1DQpvvEGvIGB
# +4XpmyOj3g2ycelZZ5XsDJk0LumCCOcbSPSiAvHZyWwLo24EABE=
# =rrMd
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 19 Sep 2025 04:47:44 AM PDT
# gpg: using RSA key DAF3A6FDB26B62912D0E8E3FBE86EBB415104FDF
# gpg: Good signature from "Daniel P. Berrange <dan@berrange.com>" [unknown]
# gpg: aka "Daniel P. Berrange <berrange@redhat.com>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: DAF3 A6FD B26B 6291 2D0E 8E3F BE86 EBB4 1510 4FDF
* tag 'misc-fixes-pull-request' of https://gitlab.com/berrange/qemu:
util/vhost-user-server: vu_message_read(): improve error handling
chardev: close an fd on failure path
chardev: qemu_chr_open_fd(): add errp
treewide: use qemu_set_blocking instead of g_unix_set_fd_nonblocking
util: drop qemu_socket_set_block()
io/channel-socket: rework qio_channel_socket_copy_fds()
util: drop qemu_socket_try_set_nonblock()
util: drop qemu_socket_set_nonblock()
migration: qemu_file_set_blocking(): add errp parameter
treewide: handle result of qio_channel_set_blocking()
util: add qemu_set_blocking() function
char-socket: tcp_chr_recv(): add comment
char-socket: tcp_chr_recv(): drop extra _set_(block,cloexec)
io/channel: document how qio_channel_readv_full() handles fds
migration/qemu-file: don't make incoming fds blocking again
MAINTAINERS: list qemu-security@nongnu.org as security contact
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
1. Drop extra error_report_err(NULL), it will just crash, if we get
here.
2. Get and report error of qemu_set_blocking(), instead of aborting.
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
There are at least two failure paths, where we forget
to close an fd.
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Every caller already support errp, let's go further.
Suggested-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
treewide: use qemu_set_blocking instead of g_unix_set_fd_nonblocking
Instead of open-coded g_unix_set_fd_nonblocking() calls, use
QEMU wrapper qemu_set_blocking().
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
[DB: fix missing closing ) in tap-bsd.c, remove now unused GError var] Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
We want to switch from qemu_socket_set_block() to newer
qemu_set_blocking(), which provides return status of operation,
to handle errors.
Still, we want to keep qio_channel_socket_readv() interface clean,
as currently it allocate @fds only on success.
So, in case of error, we should close all incoming fds and keep
user's @fds untouched or zero.
Let's make separate functions qio_channel_handle_fds() and
qio_channel_cleanup_fds(), to achieve what we want.
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Now we can use qemu_set_blocking() in these cases.
Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Note that pre-patch the behavior of Win32 and Linux realizations
are inconsistent: we ignore failure for Win32, and assert success
for Linux.
How do we convert the callers?
1. Most of callers call qemu_socket_set_nonblock() on a
freshly created socket fd, in conditions when we may simply
report an error. Seems correct switching to error handling
both for Windows (pre-patch error is ignored) and Linux
(pre-patch we assert success). Anyway, we normally don't
expect errors in these cases.
Still in tests let's use &error_abort for simplicity.
What are exclusions?
2. hw/virtio/vhost-user.c - we are inside #ifdef CONFIG_LINUX,
so no damage in switching to error handling from assertion.
3. io/channel-socket.c: here we convert both old calls to
qemu_socket_set_nonblock() and qemu_socket_set_block() to
one new call. Pre-patch we assert success for Linux in
qemu_socket_set_nonblock(), and ignore all other errors here.
So, for Windows switch is a bit dangerous: we may get
new errors or crashes(when error_abort is passed) in
cases where we have silently ignored the error before
(was it correct in all such cases, if they were?) Still,
there is no other way to stricter API than take
this risk.
4. util/vhost-user-server - compiled only for Linux (see
util/meson.build), so we are safe, switching from assertion to
&error_abort.
Note: In qga/channel-posix.c we use g_warning(), where g_printerr()
would actually be a better choice. Still let's for now follow
common style of qga, where g_warning() is commonly used to print
such messages, and no call to g_printerr(). Converting everything
to use g_printerr() should better be another series.
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
qemu_file_set_blocking() is a wrapper on qio_channel_set_blocking(),
so let's passthrough the errp.
Note the migration should not be using &error_abort in these calls,
however, this is done to expedite the API conversion.
The original code would have eventually ended up calling either
qemu_socket_set_nonblock which would asset on Linux, or
g_unix_set_fd_nonblocking which would propagate errors. We never
saw asserts in practice, and conceptually they should not happen,
but ideally this code will be later adapted to remove use of
&error_abort.
Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
treewide: handle result of qio_channel_set_blocking()
Currently, we just always pass NULL as errp argument. That doesn't
look good.
Some realizations of interface may actually report errors.
Channel-socket realization actually either ignore or crash on
errors, but we are going to straighten it out to always reporting
an errp in further commits.
So, convert all callers to either handle the error (where environment
allows) or explicitly use &error_abort.
Take also a chance to change the return value to more convenient
bool (keeping also in mind, that underlying realizations may
return -1 on failure, not -errno).
Suggested-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
[DB: fix return type mismatch in TLS/websocket channel
impls for qio_channel_set_blocking] Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
In generic code we have qio_channel_set_blocking(), which takes
bool parameter, and qemu_file_set_blocking(), which as well takes
bool parameter.
At lower fd-layer we have a mess of functions:
- enough direct calls to Unix-specific g_unix_set_fd_nonblocking()
(of course, all calls are out of Windows-compatible code), which
is glib specific with GError, which we can't use, and have to
handle error-reporting by hand after the call.
and several platform-agnostic qemu_* helpers:
- qemu_socket_set_nonblock(), which asserts success for posix (still,
in most cases we can handle the error in better way) and ignores
error for win32 realization
- qemu_socket_try_set_nonblock(), providing and error, but not errp,
so we have to handle it after the call
- qemu_socket_set_block(), which simply ignores an error
Note, that *_socket_* word in original API, which we are going
to substitute was intended, because Windows support these operations
only for sockets. What leads to solution of dropping it again?
1. Having a QEMU-native wrapper with errp parameter
for g_unix_set_fd_nonblocking() for non-socket fds worth doing,
at least to unify error handling.
2. So, if try to keep _socket_ vs _file_ words, we'll have two
actually duplicated functions for Linux, which actually will
be executed successfully on any (good enough) fds, and nothing
prevent using them improperly except for the name. That doesn't
look good.
3. Naming helped us in the world where we crash on errors or
ignore them. Now, with errp parameter, callers are intended to
proper error checking. And for places where we really OK with
crash-on-error semantics (like tests), we have an explicit
&error_abort.
So, this commit starts a series, which will effectively revert
commit ff5927baa7ffb9 "util: rename qemu_*block() socket functions"
(which in turn was reverting f9e8cacc5557e43
"oslib-posix: rename socket_set_nonblock() to qemu_set_nonblock()",
so that's a long story).
Now we don't simply rename, instead we provide the new API and
update all the callers.
This commit only introduces a new fd-layer wrapper. Next commits
will replace old API calls with it, and finally remove old API.
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Add comment, to stress that the order of operation (first drop old fds,
second check read status) is intended.
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
char-socket: tcp_chr_recv(): drop extra _set_(block,cloexec)
qio_channel_readv_full() guarantees BLOCKING and CLOEXEC states for
incoming descriptors, no reason to call extra ioctls.
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
io/channel: document how qio_channel_readv_full() handles fds
The only realization, which may have incoming fds is
qio_channel_socket_readv() (in io/channel-socket.c).
qio_channel_socket_readv() do call (through
qio_channel_socket_copy_fds()) qemu_socket_set_block() and
qemu_set_cloexec() for each fd.
Also, qio_channel_socket_copy_fds() is called at the end of
qio_channel_socket_readv(), on success path.
Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
migration/qemu-file: don't make incoming fds blocking again
In migration we want to pass fd "as is", not changing its
blocking status.
The only current user of these fds is CPR state (through VMSTATE_FD),
which of-course doesn't want to modify fds on target when source is
still running and use these fds.
Suggested-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
MAINTAINERS: list qemu-security@nongnu.org as security contact
The qemu-security@nongnu.org list is considered the authoritative
contact for reporting QEMU security issues. Remove the Red Hat
security team address in favour of QEMU's list, to ensure that
upstream gets first contact. There is a representative of the
Red Hat security team as a member of qemu-security@nongnu.org
whom requests CVE assignments on behalf of QEMU when needed.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Mauro Matteo Cascella <mcascell@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Mark Johnston [Wed, 6 Aug 2025 17:53:08 +0000 (13:53 -0400)]
9pfs: Add FreeBSD support
This is largely derived from existing Darwin support. FreeBSD
apparently has better support for *at() system calls so doesn't require
workarounds for a missing mknodat(). The implementation has a couple of
warts however:
- The extattr(2) system calls don't support anything akin to
XATTR_CREATE or XATTR_REPLACE, so a racy workaround is implemented.
- Attribute names cannot begin with "user." or "system." on ZFS.
However FreeBSD's extattr(2) system calls support two dedicated
namespaces for these two. So "user." or "system." prefixes are
trimmed off from attribute names and instead EXTATTR_NAMESPACE_USER or
EXTATTR_NAMESPACE_SYSTEM are picked and passed to extattr system calls
accordingly.
The 9pfs tests were verified to pass on the UFS, ZFS and tmpfs
filesystems.
# -----BEGIN PGP SIGNATURE-----
#
# iLMEAAEIAB0WIQTKRzxE1qCcGJoZP81FK5aFKyaCFgUCaMvTpQAKCRBFK5aFKyaC
# Fkk0BACDkaQa6jDON8aLcTFcwpIlrnblqlYo6EK7TaGqpI866EhTX09BscRF5bvp
# 3JtGARKy5a6s5GJ64KItIl4n5Z6xvt4ME1KjyqeUTpD99c7J1krgxl6+W/NthK/K
# cLbSnlfvcw/L6KfIsGP6i2F6Y+riyZf6OYMc9IF/xFEAIMKJyA==
# =EgXn
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 18 Sep 2025 02:40:53 AM PDT
# gpg: using RSA key CA473C44D6A09C189A193FCD452B96852B268216
# gpg: Good signature from "Song Gao <gaosong@loongson.cn>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: CA47 3C44 D6A0 9C18 9A19 3FCD 452B 9685 2B26 8216
* tag 'pull-loongarch-20250918' of https://github.com/gaosong715/qemu:
hw/loongarch/virt: Register reset interface with cpu plug callback
hw/loongarch/virt: Remove unnecessay pre-boot setting with BSP
hw/loongarch/virt: Add BSP support with aux boot code
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging
* cpu-exec: more cleanups to CPU loop exits
* python: bump bundled Meson to 1.9.0
* rust: require Rust 1.83.0
* rust: temporarily remove from Ubuntu CI
* rust: vmstate: convert to use builder pattern
* rust: split "qemu-api" crate
* rust: rename qemu_api_macros -> qemu_macros
* rust: re-export qemu macros from other crates
* x86: fix functional test failure for Xen emulation
* x86: cleanups
# -----BEGIN PGP SIGNATURE-----
#
# iQFIBAABCgAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmjK6ZsUHHBib256aW5p
# QHJlZGhhdC5jb20ACgkQv/vSX3jHroNBKwf/aadInCT4vASOfpxbwZgYfYgR2m2m
# BJE9oYKxZJ6MlEOU/1Wfywf9fg4leMSh3XxkDKkEIL19yS6emwin8n3SNYrdAFn3
# 6u4IIWO4NI1Ht3NKytrqFk9wtbH9pAs/gVHLlnmpMxIqtOtZLumPAKNz8rlantmK
# UVDYL3Y0L4pD9i5FK1ObMNpk5AsWNr8Tr64fmb+nTkHutld3sBrEMCLI0+EByGyN
# lQ16sLn9PGqHOr210zuQP7wP2T3NCI3YokFSPQrUUL8LZGxRdXoNF4hI4uZDKGdn
# UbtRu9EkM052qzfsFMrEw5JSbdxEfIjKlPoFKseMv+aWvNAuximAraD3Vg==
# =Lr+x
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 17 Sep 2025 10:02:19 AM PDT
# gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg: issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [unknown]
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [unknown]
# gpg: WARNING: The key's User ID is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83
* tag 'for-upstream' of https://gitlab.com/bonzini/qemu: (60 commits)
accel/kvm: Set guest_memfd_offset to non-zero value only when guest_memfd is valid
accel/kvm: Zero out mem explicitly in kvm_set_user_memory_region()
accel/kvm: Switch to check KVM_CAP_GUEST_MEMFD and KVM_CAP_USER_MEMORY2 on VM
i386/kvm: Drop KVM_CAP_X86_SMM check in kvm_arch_init()
multiboot: Fix the split lock
target/i386: Define enum X86ASIdx for x86's address spaces
i386/cpu: Enable SMM cpu address space under KVM
hpet: guard IRQ handling with BQL
rust: do not inline do_init_io
rust: meson: remove unnecessary complication in device crates
docs: update rust.rst
rust: re-export qemu macros from common/qom/hwcore
rust: re-export qemu_macros internal helper in "bits"
rust: repurpose qemu_api -> tests
rust/pl011: drop dependency on qemu_api
rust/hpet: drop now unneeded qemu_api dep
rust: rename qemu_api_macros -> qemu_macros
rust: split "hwcore" crate
rust: split "system" crate
rust: split "chardev" crate
...
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Bibo Mao [Sat, 6 Sep 2025 07:02:00 +0000 (15:02 +0800)]
hw/loongarch/virt: Register reset interface with cpu plug callback
With cpu hotplug is implemented on LoongArch virt machine, reset
interface with hot-added CPU should be registered. Otherwise there
will be problem if system reboots after cpu is hot-added.
Now register reset interface with CPU plug callback, so that all
cold/hot added CPUs let their reset interface registered. And remove
reset interface with CPU unplug callback.
Signed-off-by: Bibo Mao <maobibo@loongson.cn> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Tested-by: Song Gao <gaosong@loongson.cn>
Message-ID: <20250906070200.3749326-4-maobibo@loongson.cn> Signed-off-by: Song Gao <gaosong@loongson.cn>
Bibo Mao [Sat, 6 Sep 2025 07:01:59 +0000 (15:01 +0800)]
hw/loongarch/virt: Remove unnecessay pre-boot setting with BSP
With BSP core, it boots from aux boot code and loads data into register
A0-A2 and PC. Pre-boot setting is not unnecessary and can be removed.
Signed-off-by: Bibo Mao <maobibo@loongson.cn> Reviewed-by: Song Gao <gaosong@loongson.cn>
Message-ID: <20250906070200.3749326-3-maobibo@loongson.cn> Signed-off-by: Song Gao <gaosong@loongson.cn>
Bibo Mao [Sat, 6 Sep 2025 07:01:58 +0000 (15:01 +0800)]
hw/loongarch/virt: Add BSP support with aux boot code
If system boots directly from Linux kernel, BSP core jumps to kernel
entry of Linux kernel image and other APs jump to aux boot code. Instead
BSP and APs can all jump to aux boot code like UEFI bios.
With aux boot code, BSP core is judged from physical cpu id, whose
cpu id is 0. With BSP core, load data to register A0-A2 and PC.
Signed-off-by: Bibo Mao <maobibo@loongson.cn> Reviewed-by: Song Gao <gaosong@loongson.cn>
Message-ID: <20250906070200.3749326-2-maobibo@loongson.cn> Signed-off-by: Song Gao <gaosong@loongson.cn>
Merge tag 'pull-target-arm-20250916' of https://gitlab.com/pm215/qemu into staging
target-arm queue:
* tests, scripts: Don't import print_function from __future__
* Implement FEAT_ATS1A
* Remove deprecated pxa CPU family
* arm/kvm: report registers we failed to set
* Expose SME registers to GDB via gdbstub
* linux-user/aarch64: Generate ESR signal records
* hw/arm/raspi4b: remove redundant check in raspi_add_memory_node
* hw/arm/virt: Allow user-creatable SMMUv3 dev instantiation
* system: drop the -old-param option
# -----BEGIN PGP SIGNATURE-----
#
# iQJNBAABCAA3FiEE4aXFk81BneKOgxXPPCUl7RQ2DN4FAmjJpt8ZHHBldGVyLm1h
# eWRlbGxAbGluYXJvLm9yZwAKCRA8JSXtFDYM3vRGEACO3VrePiMIA9N7egqlUiGn
# aRQVqIKeuPVj6TRVG7BSNWlAX8qvnOWOKg1yGVHDZv/nLvRje9UyfUAw7pf6jXod
# bzxWBCPJ0J0eOB64Tz87WRCLltKB5pEN+uIG00PtpBcXT1ixYCDgBZXyD3mwuJ4Q
# 5Yc5hEwQzpmh+EycLtfCHbmjKDw3x1ncpVlGceOG4h5fvzIvIhcNcZJXfAHhbhyO
# Y4c5PELrCkCLZaTtSSxd6VJ+vXQ9bNWyKaSZu2KRRnLcMeAqw2Ic7dLPlkzCVyxM
# PTOHy4TuDu+kqCbkxdnhpI6fvq5kcHyfTL6qX6tth8ZZS+qKGtvMEIXnYoy6q1kh
# 4jV5vizK8avx31fSiuTKVpttRv4dC+Aq5QrcgYtIVMeOwtkWHv610D8gcFPmXoG+
# uHX9WdzOjrYOzXVKzJaCZF6b7L31ptSEfOrx7asBC9k2wPRwonFXg4JGNq16Yann
# aAO5TM7NAUvM2IPgqS+Tf1Bk0iQqORxGfqzCyL76OO/QMMgfBy9elKH0UR0G+ePJ
# yjpub1oWIELSXsQGMrdFo1W4/NIpFMTu3DP9W+6XRPu1AvrAx/AsrTuvSvXoeFY9
# d/U3yWAXm5XxRzbCIUg7ke8I8zLwRz924M5PA8vophvSnfDLS3V8CJHLwbz/PqYc
# 0P2KCeI6d2NIhVik4mgEoQ==
# =5tK3
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 16 Sep 2025 11:05:19 AM PDT
# gpg: using RSA key E1A5C593CD419DE28E8315CF3C2525ED14360CDE
# gpg: issuer "peter.maydell@linaro.org"
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>" [unknown]
# gpg: aka "Peter Maydell <pmaydell@gmail.com>" [unknown]
# gpg: aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>" [unknown]
# gpg: aka "Peter Maydell <peter@archaic.org.uk>" [unknown]
# gpg: WARNING: The key's User ID is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83 15CF 3C25 25ED 1436 0CDE
* tag 'pull-target-arm-20250916' of https://gitlab.com/pm215/qemu: (36 commits)
hw/usb/network: Remove hardcoded 0x40 prefix in STRING_ETHADDR response
qtest/bios-tables-test: Update tables for smmuv3 tests
qtest/bios-tables-test: Add tests for legacy smmuv3 and smmuv3 device
bios-tables-test: Allow for smmuv3 test data.
qemu-options.hx: Document the arm-smmuv3 device
hw/arm/virt: Allow user-creatable SMMUv3 dev instantiation
hw/pci: Introduce pci_setup_iommu_per_bus() for per-bus IOMMU ops retrieval
hw/arm/virt: Add an SMMU_IO_LEN macro
hw/arm/virt: Factor out common SMMUV3 dt bindings code
hw/arm/virt-acpi-build: Update IORT for multiple smmuv3 devices
hw/arm/virt-acpi-build: Re-arrange SMMUv3 IORT build
hw/arm/smmu-common: Check SMMU has PCIe Root Complex association
target/arm: Added test case for SME register exposure to GDB
target/arm: Added support for SME register exposure to GDB
target/arm: Increase MAX_PACKET_LENGTH for SME ZA remote gdb debugging
arm/kvm: report registers we failed to set
system: drop the -old-param option
target/arm: Drop ARM_FEATURE_IWMMXT handling
target/arm: Drop ARM_FEATURE_XSCALE handling
target/arm: Remove iwmmxt helper functions
...
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Xiaoyao Li [Mon, 28 Jul 2025 11:57:06 +0000 (19:57 +0800)]
accel/kvm: Zero out mem explicitly in kvm_set_user_memory_region()
Zero out the entire mem explicitly before it's used, to ensure the unused
feilds (pad1, pad2) are all zeros. Otherwise, it might cause problem when
the pad fields are extended by future KVM.
Fixes: ce5a983233b4 ("kvm: Enable KVM_SET_USER_MEMORY_REGION2 for memslot") Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Link: https://lore.kernel.org/r/20250728115707.1374614-3-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We can see that the instruction at 0x1e3 is a far jmp through the GDT.
However, the GDT is not 8 byte aligned, the base is 0xc02b4.
Intel processors follow the LOCK semantics to set the accessed flag of the
segment descriptor when loading a segment descriptor. If the the segment
descriptor crosses two cache line, it causes split lock.
Fix it by aligning the GDT on 8 bytes, so that segment descriptor cannot
span two cache lines.
Xiaoyao Li [Wed, 30 Jul 2025 09:52:52 +0000 (17:52 +0800)]
i386/cpu: Enable SMM cpu address space under KVM
Kirill Martynov reported assertation in cpu_asidx_from_attrs() being hit
when x86_cpu_dump_state() is called to dump the CPU state[*]. It happens
when the CPU is in SMM and KVM emulation failure due to misbehaving
guest.
The root cause is that QEMU i386 never enables the SMM address space for
cpu since KVM SMM support has been added.
Enable the SMM cpu address space under KVM when the SMM is enabled for
the x86machine.
Igor Mammedov [Wed, 10 Sep 2025 14:25:06 +0000 (16:25 +0200)]
hpet: guard IRQ handling with BQL
Commit [1] made qemu fail with abort:
xen_evtchn_set_gsi: Assertion `bql_locked()' failed.
when running ./tests/functional/x86_64/test_kvm_xen.py tests.
To fix it make sure that BQL is held when manipulating IRQs.
Fixes: 7defb58baf (hpet: switch to fine-grained device locking) Reported-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Igor Mammedov <imammedo@redhat.com> Link: https://lore.kernel.org/r/20250910142506.86274-1-imammedo@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Unfortunately, an example had to be compile-time disabled, since it
relies on higher level crates (qdev, irq etc). The alternative is
probably to move that code to an example in qemu-api or elsewere and
make a link to it, or include_str.
rust: make build.rs generic over various ./rust/projects
Guess the name of the subdir from the manifest directory, instead of
hard-coding it. In the following commits, other crates can then link to
this file, instead of maintaining their own copy.
The global allocator has always been disabled. There is no clear reason
Rust and C should use the same allocator. Allocations made from Rust
must be freed by Rust, and same for C, otherwise we head into troubles.
Paolo Bonzini [Mon, 8 Sep 2025 10:49:41 +0000 (12:49 +0200)]
rust: qdev: const_refs_to_static
Now that const_refs_static can be assumed, convert the members of
the DeviceImpl trait from functions to constants. This lets the
compiler know that they have a 'static lifetime, and removes the
need for the weird "Box::leak()".
Paolo Bonzini [Mon, 8 Sep 2025 10:49:40 +0000 (12:49 +0200)]
rust: vmstate: use const_refs_to_static
The VMStateDescriptionBuilder already needs const_refs_static, so
use it to remove the need for vmstate_clock! and vmstate_struct!,
as well as to simplify the implementation for scalars.
If the consts in the VMState trait can reference to static
VMStateDescription, scalars do not need the info_enum_to_ref!
indirection and structs can implement the VMState trait themselves.
Zhao Liu [Mon, 8 Sep 2025 10:49:39 +0000 (12:49 +0200)]
rust: vmstate: convert to use builder pattern
Similar to MemoryRegionOps, the builder pattern has two advantages:
1) it makes it possible to build a VMStateDescription that knows which
types it will be invoked on; 2) it provides a way to wrap the callbacks
and let devices avoid "unsafe".
Unfortunately, building a static VMStateDescription requires the
builder methods to be "const", and because the VMStateFields are
*also* static, this requires const_refs_static. So this requires
Rust 1.83.0.
Add derive macro for declaring qdev properties directly above the field
definitions. To do this, we split DeviceImpl::properties method on a
separate trait so we can implement only that part in the derive macro
expansion (we cannot partially implement the DeviceImpl trait).
Adding a `property` attribute above the field declaration will generate
a `qemu_api::bindings::Property` array member in the device's property
list.
Paolo Bonzini [Mon, 8 Sep 2025 10:49:34 +0000 (12:49 +0200)]
configure: bump Meson to 1.9.0 for use with Rust
Meson 1.9.0 provides mixed linking of Rust and C objects. As a side effect,
this also allows adding dependencies with "sources: ..." files to Rust crates
that use structured_sources().
It can also clean up up the meson.build files for Rust noticeably, but due
to an issue with doctests (see https://github.com/mesonbuild/meson/pull/14973)
that will have to wait for 1.9.1.
Paolo Bonzini [Mon, 8 Sep 2025 10:49:33 +0000 (12:49 +0200)]
ci: temporarily remove rust from Ubuntu
This is for the purpose of getting an easy-to-use base for future
development. The plan is:
- that Debian will require trixie to enable Rust usage
- that Ubuntu will backport 1.83 to its 22.04 and 24.04 versions
(https://bugs.launchpad.net/ubuntu/+source/rustc-1.83/+bug/2120318)
Marc-André is working on adding Rust to other CI jobs.
Paolo Bonzini [Mon, 11 Aug 2025 07:52:46 +0000 (09:52 +0200)]
accel: make all calls to qemu_process_cpu_events look the same
There is no reason for some accelerators to use qemu_process_cpu_events_common
(which is separated from qemu_process_cpu_events() specifically for round
robin TCG). They can also check for events directly on the first pass through
the loop, instead of setting cpu->exit_request to true.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Thu, 21 Aug 2025 16:56:55 +0000 (18:56 +0200)]
cpus: clear exit_request in qemu_process_cpu_events
Make the code common to all accelerators: after seeing cpu->exit_request
set to true, accelerator code needs to reach qemu_process_cpu_events_common().
So for the common cases where they use qemu_process_cpu_events(), go ahead and
clear it in there. Note that the cheap qatomic_set() is enough because
at this point the thread has taken the BQL; qatomic_set_mb() is not needed.
In particular, this is the ordering of the communication between
I/O and vCPU threads is always the same.
In the I/O thread:
(a) store other memory locations that will be checked if cpu->exit_request
or cpu->interrupt_request is 1 (for example cpu->stop or cpu->work_list
for cpu->exit_request)
(b) cpu_exit(): store-release cpu->exit_request, or
(b) cpu_interrupt(): store-release cpu->interrupt_request
>>> at this point, cpu->halt_cond is broadcast and the BQL released
(c) do the accelerator-specific kick (e.g. write icount_decr for TCG,
pthread_kill for KVM, etc.)
In the vCPU thread instead the opposite order is respected:
(c) the accelerator's execution loop exits thanks to the kick
(b) then the inner execution loop checks cpu->interrupt_request
and cpu->exit_request. If needed cpu->interrupt_request is
converted into cpu->exit_request when work is needed outside
the execution loop.
(a) then the other memory locations are checked. Some may need to
be read under the BQL, but the vCPU thread may also take other
locks (e.g. for queued work items) or none at all.
qatomic_set_mb() would only be needed if the halt sleep was done
outside the BQL (though in that case, cpu->exit_request probably
would be replaced by a QemuEvent or something like that).
Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a user-mode emulation version of the function. More will be
added later, for now it is just process_queued_cpu_work.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Fri, 1 Aug 2025 11:24:48 +0000 (13:24 +0200)]
cpus: remove TCG-ism from cpu_exit()
Now that TCG has its own kick function, make cpu_exit() do the right kick
for all accelerators.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Mon, 11 Aug 2025 06:33:40 +0000 (08:33 +0200)]
accel/tcg: inline cpu_exit()
Right now, cpu_exit() is not usable from all accelerators because it
includes a TCG-specific thread kick. In fact, cpu_exit() doubles as
the TCG thread-kick via tcg_kick_vcpu_thread().
In preparation for changing that, inline cpu_exit() into
tcg_kick_vcpu_thread(). The direction of the calls can then be
reversed, with an accelerator-independent cpu_exit() calling into
qemu_vcpu_kick() rather than the opposite.
Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Mon, 11 Aug 2025 06:28:31 +0000 (08:28 +0200)]
accel/tcg: create a thread-kick function for TCG
Round-robin TCG is calling into cpu_exit() directly. In preparation
for making cpu_exit() usable from all accelerators, define a generic
thread-kick function for TCG which is used directly in the multi-threaded
case, and through CPU_FOREACH in the round-robin case.
Use it also for user-mode emulation, and take the occasion to move
the implementation to accel/tcg/user-exec.c.
Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Fri, 8 Aug 2025 16:55:48 +0000 (18:55 +0200)]
accel: use atomic accesses for exit_request
CPU threads write exit_request as a "note to self" that they need to
go out to a slow path. This write happens out of the BQL and can be
a data race with another threads' cpu_exit(); use atomic accesses
consistently.
While at it, change the source argument from int ("1") to bool ("true").
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Fri, 1 Aug 2025 12:57:51 +0000 (14:57 +0200)]
accel: use store_release/load_acquire for cross-thread exit_request
Reads and writes cpu->exit_request do not use a load-acquire/store-release
pair right now, but this means that cpu_exit() may not write cpu->exit_request
after any flags that are read by the vCPU thread.
Probably everything is protected one way or the other by the BQL, because
cpu->exit_request leads to the slow path, where the CPU thread often takes
the BQL (for example, to go to sleep by waiting on the BQL-protected
cpu->halt_cond); but it's not clear, so use load-acquire/store-release
consistently.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>