git.ipfire.org Git - thirdparty/qemu.git/log

target/riscv/kvm: support KVM_GET_REG_LIST

KVM for RISC-V started supporting KVM_GET_REG_LIST in Linux 6.6. It
consists of a KVM ioctl() that retrieves a list of all available regs
for get_one_reg/set_one_reg. Regs that aren't present in the list aren't
supported in the host.

This simplifies our lives when initing the KVM regs since we don't have
to always attempt a KVM_GET_ONE_REG for all regs QEMU knows. We'll only
attempt a get_one_reg() if we're sure the reg is supported, i.e. it was
retrieved by KVM_GET_REG_LIST. Any error in get_one_reg() will then
always considered fatal, instead of having to handle special error codes
that might indicate a non-fatal failure.

Start by moving the current kvm_riscv_init_multiext_cfg() logic into a
new kvm_riscv_read_multiext_legacy() helper. We'll prioritize using
KVM_GET_REG_LIST, so check if we have it available and, in case we
don't, use the legacy() logic.

Otherwise, retrieve the available reg list and use it to check if the
host supports our known KVM regs, doing the usual get_one_reg() for
the supported regs and setting cpu->cfg accordingly.

Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Message-ID: <20231003132148.797921-3-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
(cherry picked from commit 608bdebb6075b757e5505f6bbc60c45a54a1390b)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(Mjt: trivial context tweak in target/riscv/kvm.c)

target/riscv/kvm: improve 'init_multiext_cfg' error msg

Our error message is returning the value of 'ret', which will be always
-1 in case of error, and will not be that useful:

qemu-system-riscv64: Unable to read ISA_EXT KVM register ssaia, error -1

Improve the error message by outputting 'errno' instead of 'ret'. Use
strerrorname_np() to output the error name instead of the error code.
This will give us what we need to know right away:

qemu-system-riscv64: Unable to read ISA_EXT KVM register ssaia, error code: ENOENT

Given that we're going to exit(1) in this condition instead of
attempting to recover, remove the 'kvm_riscv_destroy_scratch_vcpu()'
call.

Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20231003132148.797921-2-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
(cherry picked from commit 082e9e4a58ba80ec056220a2f762a1c6b9a3a96c)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

tracetool: avoid invalid escape in Python string

This is an error in Python 3.12; fix it by using a raw string literal.

Cc: <qemu-stable@nongnu.org>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20231108105649.60453-1-marcandre.lureau@redhat.com>
(cherry picked from commit 4d96307c5b4fac40c6ca25f38318b4b65d315de0)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

tests/tcg/s390x: Test LAALG with negative cc_src

Add a small test to prevent regressions.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20231106093605.1349201-5-iii@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit ebc14107f1f3ac1db13132cd28cf94adcd38e5d7)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

target/s390x: Fix LAALG not updating cc_src

LAALG uses op_laa() and wout_addu64(). The latter expects cc_src to be
set, but the former does not do it. This can lead to assertion failures
if something sets cc_src to neither 0 nor 1 before.

Fix by introducing op_laa_addu64(), which sets cc_src, and using it for
LAALG.

Fixes: 4dba4d6fef61 ("target/s390x: Use atomic operations for LOAD AND OP")
Cc: qemu-stable@nongnu.org
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20231106093605.1349201-4-iii@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit bea402482a8c94389638cbd3d7fe3963fb317f4c)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

tests/tcg/s390x: Test CLC with inaccessible second operand

Add a small test to prevent regressions.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20231106093605.1349201-3-iii@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 43fecbe7a53fe8e5a6aff0d6471b1cc624e26b51)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

target/s390x: Fix CLC corrupting cc_src

CLC updates cc_src before accessing the second operand; if the latter
is inaccessible, the former ends up containing a bogus value.

Fix by reading cc_src into a temporary first.

Fixes: 4f7403d52b1c ("target-s390: Convert CLC")
Closes: https://gitlab.com/qemu-project/qemu/-/issues/1865
Cc: qemu-stable@nongnu.org
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Message-ID: <20231106093605.1349201-2-iii@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit aba2ec341c6d20c8dc3e6ecf87fa7c1a71e30c1e)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

tests/qtest: ahci-test: add test exposing reset issue with pending callback

Before commit "hw/ide: reset: cancel async DMA operation before
resetting state", this test would fail, because a reset with a
pending write operation would lead to an unsolicited write to the
first sector of the disk.

The test writes a pattern to the beginning of the disk and verifies
that it is still intact after a reset with a pending operation. It
also checks that the pending operation actually completes correctly.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20230906130922.142845-2-f.ebner@proxmox.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit cc610857bbd3551f4b86ae2299336b5d9aa0db2b)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

hw/ide: reset: cancel async DMA operation before resetting state

If there is a pending DMA operation during ide_bus_reset(), the fact
that the IDEState is already reset before the operation is canceled
can be problematic. In particular, ide_dma_cb() might be called and
then use the reset IDEState which contains the signature after the
reset. When used to construct the IO operation this leads to
ide_get_sector() returning 0 and nsector being 1. This is particularly
bad, because a write command will thus destroy the first sector which
often contains a partition table or similar.

Traces showing the unsolicited write happening with IDEState
0x5595af6949d0 being used after reset:

> ahci_port_write ahci(0x5595af6923f0)[0]: port write [reg:PxSCTL] @ 0x2c: 0x00000300
> ahci_reset_port ahci(0x5595af6923f0)[0]: reset port
> ide_reset IDEstate 0x5595af6949d0
> ide_reset IDEstate 0x5595af694da8
> ide_bus_reset_aio aio_cancel
> dma_aio_cancel dbs=0x7f64600089a0
> dma_blk_cb dbs=0x7f64600089a0 ret=0
> dma_complete dbs=0x7f64600089a0 ret=0 cb=0x5595acd40b30
> ahci_populate_sglist ahci(0x5595af6923f0)[0]
> ahci_dma_prepare_buf ahci(0x5595af6923f0)[0]: prepare buf limit=512 prepared=512
> ide_dma_cb IDEState 0x5595af6949d0; sector_num=0 n=1 cmd=DMA WRITE
> dma_blk_io dbs=0x7f6420802010 bs=0x5595ae2c6c30 offset=0 to_dev=1
> dma_blk_cb dbs=0x7f6420802010 ret=0

> (gdb) p *qiov
> $11 = {iov = 0x7f647c76d840, niov = 1, {{nalloc = 1, local_iov = {iov_base = 0x0,
>       iov_len = 512}}, {__pad = "\001\000\000\000\000\000\000\000\000\000\000",
>       size = 512}}}
> (gdb) bt
> #0  blk_aio_pwritev (blk=0x5595ae2c6c30, offset=0, qiov=0x7f6420802070, flags=0,
>     cb=0x5595ace6f0b0 <dma_blk_cb>, opaque=0x7f6420802010)
>     at ../block/block-backend.c:1682
> #1  0x00005595ace6f185 in dma_blk_cb (opaque=0x7f6420802010, ret=<optimized out>)
>     at ../softmmu/dma-helpers.c:179
> #2  0x00005595ace6f778 in dma_blk_io (ctx=0x5595ae0609f0,
>     sg=sg@entry=0x5595af694d00, offset=offset@entry=0, align=align@entry=512,
>     io_func=io_func@entry=0x5595ace6ee30 <dma_blk_write_io_func>,
>     io_func_opaque=io_func_opaque@entry=0x5595ae2c6c30,
>     cb=0x5595acd40b30 <ide_dma_cb>, opaque=0x5595af6949d0,
>     dir=DMA_DIRECTION_TO_DEVICE) at ../softmmu/dma-helpers.c:244
> #3  0x00005595ace6f90a in dma_blk_write (blk=0x5595ae2c6c30,
>     sg=sg@entry=0x5595af694d00, offset=offset@entry=0, align=align@entry=512,
>     cb=cb@entry=0x5595acd40b30 <ide_dma_cb>, opaque=opaque@entry=0x5595af6949d0)
>     at ../softmmu/dma-helpers.c:280
> #4  0x00005595acd40e18 in ide_dma_cb (opaque=0x5595af6949d0, ret=<optimized out>)
>     at ../hw/ide/core.c:953
> #5  0x00005595ace6f319 in dma_complete (ret=0, dbs=0x7f64600089a0)
>     at ../softmmu/dma-helpers.c:107
> #6  dma_blk_cb (opaque=0x7f64600089a0, ret=0) at ../softmmu/dma-helpers.c:127
> #7  0x00005595ad12227d in blk_aio_complete (acb=0x7f6460005b10)
>     at ../block/block-backend.c:1527
> #8  blk_aio_complete (acb=0x7f6460005b10) at ../block/block-backend.c:1524
> #9  blk_aio_write_entry (opaque=0x7f6460005b10) at ../block/block-backend.c:1594
> #10 0x00005595ad258cfb in coroutine_trampoline (i0=<optimized out>,
>     i1=<optimized out>) at ../util/coroutine-ucontext.c:177

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: simon.rowe@nutanix.com
Message-ID: <20230906130922.142845-1-f.ebner@proxmox.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit 7d7512019fc40c577e2bdd61f114f31a9eb84a8e)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

target/mips: Fix TX79 LQ/SQ opcodes

The base register address offset is *signed*.

Cc: qemu-stable@nongnu.org
Fixes: aaaa82a9f9 ("target/mips/tx79: Introduce LQ opcode (Load Quadword)")
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230914090447.12557-1-philmd@linaro.org>
(cherry picked from commit 18f86aecd6a1bea0f78af14587a684ad966d8d3a)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

target/mips: Fix MSA BZ/BNZ opcodes displacement

The PC offset is *signed*.

Cc: qemu-stable@nongnu.org
Reported-by: Sergey Evlashev <vectorchiefrocks@gmail.com>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1624
Fixes: c7a9ef7517 ("target/mips: Introduce decode tree bindings for MSA ASE")
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230914085807.12241-1-philmd@linaro.org>
(cherry picked from commit 04591b3ddd9a96b9298a1dd437a6464ab55e62ee)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

ui/gtk-egl: apply scale factor when calculating window's dimension

Scale factor needs to be applied when calculating width/height of the
GTK windows.

Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20231012222643.13996-1-dongwon.kim@intel.com>
(cherry picked from commit 47fd6ab1e334962890bc3e8d2e32857f6594e1c1)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

ui/gtk: force realization of drawing area

Fixes the GL context creation from a widget that isn't yet realized (in
a hidden tab for example).

Resolves:
https://gitlab.com/qemu-project/qemu/-/issues/1727

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Antonio Caggiano <quic_acaggian@quicinc.com>
Message-Id: <20231017111642.1155545-1-marcandre.lureau@redhat.com>
(cherry picked from commit 565f85a9c293818a91a3d3414311303de7e00cec)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

ati-vga: Implement fallback for pixman routines

Pixman routines can fail if no implementation is available and it will
become optional soon so add fallbacks when pixman does not work.

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Acked-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-ID: <ed0fba3f74e48143f02228b83bf8796ca49f3e7d.1698871239.git.balaton@eik.bme.hu>
(cherry picked from commit 08730ee0cc01c3fceb907a93436d15170a7556c4)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

file-posix: fix over-writing of returning zone_append offset

raw_co_zone_append() sets "s->offset" where "BDRVRawState *s". This pointer
is used later at raw_co_prw() to save the block address where the data is
written.

When multiple IOs are on-going at the same time, a later IO's
raw_co_zone_append() call over-writes a former IO's offset address before
raw_co_prw() completes. As a result, the former zone append IO returns the
initial value (= the start address of the writing zone), instead of the
proper address.

Fix the issue by passing the offset pointer to raw_co_prw() instead of
passing it through s->offset. Also, remove "offset" from BDRVRawState as
there is no usage anymore.

Fixes: 4751d09adcc3 ("block: introduce zone append write for zoned devices")
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Message-Id: <20231030073853.2601162-1-naohiro.aota@wdc.com>
Reviewed-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
(cherry picked from commit ad4feaca61d76fecad784e6d5e7bae40d0411c46)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

block/file-posix: fix update_zones_wp() caller

When the zoned request fail, it needs to update only the wp of
the target zones for not disrupting the in-flight writes on
these other zones. The wp is updated successfully after the
request completes.

Fixed the callers with right offset and nr_zones.

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Message-Id: <20230825040556.4217-1-faithilikerun@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
[hreitz: Rebased and fixed comment spelling]
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
(cherry picked from commit 10b9e0802a074c991e1ce485631d75641d0b0f9e)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

qcow2: keep reference on zeroize with discard-no-unref enabled

When the discard-no-unref flag is enabled, we keep the reference for
normal discard requests.
But when a discard is executed on a snapshot/qcow2 image with backing,
the discards are saved as zero clusters in the snapshot image.

When committing the snapshot to the backing file, not
discard_in_l2_slice is called but zero_in_l2_slice. Which did not had
any logic to keep the reference when discard-no-unref is enabled.

Therefor we add logic in the zero_in_l2_slice call to keep the reference
on commit.

Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1621
Signed-off-by: Jean-Louis Dupond <jean-louis@dupond.be>
Message-Id: <20231003125236.216473-2-jean-louis@dupond.be>
[hreitz: Made the documentation change more verbose, as discussed
on-list]
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
(cherry picked from commit b2b109041ecd1095384f5be5bb9badd13c1cf286)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

target/arm: Fix A64 LDRA immediate decode

In commit be23a049 in the conversion to decodetree we broke the
decoding of the immediate value in the LDRA instruction. This should
be a 10 bit signed value that is scaled by 8, but in the conversion
we incorrectly ended up scaling it only by 2. Fix the scaling
factor.

Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1970
Fixes: be23a049 ("target/arm: Convert load (pointer auth) insns to decodetree")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20231106113445.1163063-1-peter.maydell@linaro.org
(cherry picked from commit 5722fc471296d5f042df4b005a851cc8008df0c9)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

block/nvme: nvme_process_completion() fix bound for cid

NVMeQueuePair::reqs has length NVME_NUM_REQS, which less than
NVME_QUEUE_SIZE by 1.

Fixes: 1086e95da17050 ("block/nvme: switch to a NVMeRequest freelist")
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
Message-id: 20231017125941.810461-5-vsementsov@yandex-team.ru
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit cc8fb0c3ae3c950eb40e969607e17ff16a7519ac)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

virtio-gpu: block migration of VMs with blob=true

"blob" resources don't have an associated pixman image:

#0 pixman_image_get_stride (image=0x0) at ../pixman/pixman-image.c:921
#1 0x0000562327c25236 in virtio_gpu_save (f=0x56232bb13b00, opaque=0x56232b555a60, size=0, field=0x5623289ab6c8 <__compound_literal.3+104>, vmdesc=0x56232ab59fe0) at ../hw/display/virtio-gpu.c:1225

Related to:
https://bugzilla.redhat.com/show_bug.cgi?id=2236353

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
(cherry picked from commit 9c549ab6895a43ad0cb33e684e11cdb0b5400897)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

hw/xen: use correct default protocol for xen-block on x86

Even on x86_64 the default protocol is the x86-32 one if the guest doesn't
specifically ask for x86-64.

Cc: qemu-stable@nongnu.org
Fixes: b6af8926fb85 ("xen: add implementations of xen-block connect and disconnect functions...")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
(cherry picked from commit a1c1082908dde4867b1ac55f546bea0c17d52318)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

hw/xen: take iothread mutex in xen_evtchn_reset_op()

The xen_evtchn_soft_reset() function requires the iothread mutex, but is
also called for the EVTCHNOP_reset hypercall. Ensure the mutex is taken
in that case.

Cc: qemu-stable@nongnu.org
Fixes: a15b10978fe6 ("hw/xen: Implement EVTCHNOP_reset")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
(cherry picked from commit debc995e883b05c2fd02fb797a61ab1328e5bae2)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

hw/xen: fix XenStore watch delivery to guest

When fire_watch_cb() found the response buffer empty, it would call
deliver_watch() to generate the XS_WATCH_EVENT message in the response
buffer and send an event channel notification to the guest… without
actually *copying* the response buffer into the ring. So there was
nothing for the guest to see. The pending response didn't actually get
processed into the ring until the guest next triggered some activity
from its side.

Add the missing call to put_rsp().

It might have been slightly nicer to call xen_xenstore_event() here,
which would *almost* have worked. Except for the fact that it calls
xen_be_evtchn_pending() to check that it really does have an event
pending (and clear the eventfd for next time). And under Xen it's
defined that setting that fd to O_NONBLOCK isn't guaranteed to work,
so the emu implementation follows suit.

This fixes Xen device hot-unplug.

Cc: qemu-stable@nongnu.org
Fixes: 0254c4d19df ("hw/xen: Add xenstore wire implementation and implementation stubs")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
(cherry picked from commit 4a5780f52095f1daf23618dc6198a2a1665ea505)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

hw/xen: don't clear map_track[] in xen_gnttab_reset()

The refcounts actually correspond to 'active_ref' structures stored in a
GHashTable per "user" on the backend side (mostly, per XenDevice).

If we zero map_track[] on reset, then when the backend drivers get torn
down and release their mapping we hit the assert(s->map_track[ref] != 0)
in gnt_unref().

So leave them in place. Each backend driver will disconnect and reconnect
as the guest comes back up again and reconnects, and it all works out OK
in the end as the old refs get dropped.

Cc: qemu-stable@nongnu.org
Fixes: de26b2619789 ("hw/xen: Implement soft reset for emulated gnttab")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
(cherry picked from commit 3de75ed352411899dbc9222e82fe164890c77e78)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

hw/xen: select kernel mode for per-vCPU event channel upcall vector

A guest which has configured the per-vCPU upcall vector may set the
HVM_PARAM_CALLBACK_IRQ param to fairly much anything other than zero.

For example, Linux v6.0+ after commit b1c3497e604 ("x86/xen: Add support
for HVMOP_set_evtchn_upcall_vector") will just do this after setting the
vector:

       /* Trick toolstack to think we are enlightened. */
       if (!cpu)
               rc = xen_set_callback_via(1);

That's explicitly setting the delivery to GSI#1, but it's supposed to be
overridden by the per-vCPU vector setting. This mostly works in Qemu
*except* for the logic to enable the in-kernel handling of event channels,
which falsely determines that the kernel cannot accelerate GSI delivery
in this case.

Add a kvm_xen_has_vcpu_callback_vector() to report whether vCPU#0 has
the vector set, and use that in xen_evtchn_set_callback_param() to
enable the kernel acceleration features even when the param *appears*
to be set to target a GSI.

Preserve the Xen behaviour that when HVM_PARAM_CALLBACK_IRQ is set to
*zero* the event channel delivery is disabled completely. (Which is
what that bizarre guest behaviour is working round in the first place.)

Cc: qemu-stable@nongnu.org
Fixes: 91cce756179 ("hw/xen: Add xen_evtchn device for event channel emulation")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
(cherry picked from commit 18e83f28bf39ffd2784aeb2e4e229096a86d349b)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

i386/xen: fix per-vCPU upcall vector for Xen emulation

The per-vCPU upcall vector support had three problems. Firstly it was
using the wrong hypercall argument and would always return -EFAULT when
the guest tried to set it up. Secondly it was using the wrong ioctl() to
pass the vector to the kernel and thus the *kernel* would always return
-EINVAL. Finally, even when delivering the event directly from userspace
with an MSI, it put the destination CPU ID into the wrong bits of the
MSI address.

Linux doesn't (yet) use this mode so it went without decent testing
for a while.

Cc: qemu-stable@nongnu.org
Fixes: 105b47fdf2d0 ("i386/xen: implement HVMOP_set_evtchn_upcall_vector")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
(cherry picked from commit e7dbb62ff19ce55548c785d76e814e7b144e6217)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

i386/xen: Don't advertise XENFEAT_supervisor_mode_kernel

This confuses lscpu into thinking it's running in PVH mode.

Cc: qemu-stable@nongnu.org
Fixes: bedcc139248 ("i386/xen: implement HYPERVISOR_xen_version")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
(cherry picked from commit e969f992c6562222e245dd8557f5b132a11ec29c)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

util/uuid: Remove UUID_FMT_LEN

Dangerous and now unused.

Cc: Fam Zheng <fam@euphon.net>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: "Denis V. Lunev" <den@openvz.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
(cherry picked from commit 4ef9d97b1a37b8cfd152cc3ac5f9576e406868b1)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

vfio/pci: Fix buffer overrun when writing the VF token

qemu_uuid_unparse() includes a trailing NUL when writing the uuid
string and the buffer size should be UUID_FMT_LEN + 1 bytes. Use the
recently added UUID_STR_LEN which defines the correct size.

Fixes: CID 1522913
Fixes: 2dca1b37a760 ("vfio/pci: add support for VF token")
Cc: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: "Denis V. Lunev" <den@openvz.org>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
(cherry picked from commit f8d6f3b16c37bd516a026e92a31dade5d761d3a6)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

util/uuid: Add UUID_STR_LEN definition

qemu_uuid_unparse() includes a trailing NUL when writing the uuid
string and the buffer size should be UUID_FMT_LEN + 1 bytes. Add a
define for this size and use it where required.

Cc: Fam Zheng <fam@euphon.net>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: "Denis V. Lunev" <den@openvz.org>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
(cherry picked from commit 721da0396cfa0a4859cefb57e32cc79d19d80f54)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

target/arm: Correctly propagate stage 1 BTI guarded bit in a two-stage walk

In a two-stage translation, the result of the BTI guarded bit should
be the guarded bit from the first stage of translation, as there is
no BTI guard information in stage two. Our code tried to do this,
but got it wrong, because we currently have two fields where the GP
bit information might live (ARMCacheAttrs::guarded and
CPUTLBEntryFull::extra::arm::guarded), and we were storing the GP bit
in the latter during the stage 1 walk but trying to copy the former
in combine_cacheattrs().

Remove the duplicated storage, and always use the field in
CPUTLBEntryFull; correctly propagate the stage 1 value to the output
in get_phys_addr_twostage().

Note for stable backports: in v8.0 and earlier the field is named
result->f.guarded, not result->f.extra.arm.guarded.

Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1950
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20231031173723.26582-1-peter.maydell@linaro.org
(cherry picked from commit 4c09abeae8704970ff03bf2196973f6bf08ab6f9)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(Mjt: replace f.extra.arm.guarded -> f.guarded due to v8.1.0-1179-ga81fef4b64)

target/arm: Fix SVE STR increment

The previous change missed updating one of the increments and
one of the MemOps. Add a test case for all vector lengths.

Cc: qemu-stable@nongnu.org
Fixes: e6dd5e782be ("target/arm: Use tcg_gen_qemu_{ld, st}_i128 in gen_sve_{ld, st}r")
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20231031143215.29764-1-richard.henderson@linaro.org
[PMM: fixed checkpatch nit]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit b11293c212c2927fcea1befc50dabec9baba4fcc)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(Mjt: context fix in tests/tcg/aarch64/Makefile.target)
Tested-by: Alex Bennée <alex.bennee@linaro.org>

qemu-iotests: 024: add rebasing test case for overlay_size > backing_size

Before previous commit, rebase was getting infitely stuck in case of
rebasing within the same backing chain and when overlay_size > backing_size.
Let's add this case to the rebasing test 024 to make sure it doesn't
break again.

Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Message-ID: <20230919165804.439110-3-andrey.drobyshev@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 827171c3180533f4ad0bc338ea166f401bb5d348)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

qemu-img: rebase: stop when reaching EOF of old backing file

In case when we're rebasing within one backing chain, and when target image
is larger than old backing file, bdrv_is_allocated_above() ends up setting
*pnum = 0. As a result, target offset isn't getting incremented, and we
get stuck in an infinite for loop. Let's detect this case and proceed
further down the loop body, as the offsets beyond the old backing size need
to be explicitly zeroed.

Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Message-ID: <20230919165804.439110-2-andrey.drobyshev@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 8b097fd6b06ec295faefd4f30f96f8709abc9605)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

tests/tcg: Add -fno-stack-protector

A build of GCC 13.2 will have stack protector enabled by default if it
was configured with --enable-default-ssp option. For such a compiler,
it is necessary to explicitly disable stack protector when linking
without standard libraries.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20230731091042.139159-3-akihiko.odaki@daynix.com>
[AJB: fix comment string typo]
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20231029145033.592566-3-alex.bennee@linaro.org>
(cherry picked from commit 580731dcc87eb27a2b0dc20ec331f1ce51864c97)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

block: Fix locking in media change monitor commands

blk_insert_bs() requires that the caller holds the AioContext lock for
the node to be inserted. Since commit c066e808e11, neglecting to do so
causes a crash when the child has to be moved to a different AioContext
to attach it to the BlockBackend.

This fixes qmp_blockdev_insert_anon_medium(), which is called for the
QMP commands 'blockdev-insert-medium' and 'blockdev-change-medium', to
correctly take the lock.

Cc: qemu-stable@nongnu.org
Fixes: https://issues.redhat.com/browse/RHEL-3922
Fixes: c066e808e11a5c181b625537b6c78e0de27a4801
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20231013153302.39234-2-kwolf@redhat.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit fed824501501518b1ad3dc08a39f8f855508190d)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

misc/led: LED state is set opposite of what is expected

Testing of the LED state showed that when the LED polarity was
set to GPIO_POLARITY_ACTIVE_LOW and a low logic value was set on
the input GPIO of the LED, the LED was being turn off when it was
expected to be turned on.

Fixes: ddb67f6402 ("hw/misc/led: Allow connecting from GPIO output")
Signed-off-by: Glenn Miles <milesg@linux.vnet.ibm.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Andrew Jeffery <andrew@codeconstruct.com.au>
Message-id: 20231024191945.4135036-1-milesg@linux.vnet.ibm.com
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit 6f83dc67168d17856744275e2a0d7a6addf6cfb9)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

target/arm: Fix syndrome for FGT traps on ERET

In commit 442c9d682c94fc2 when we converted the ERET, ERETAA, ERETAB
instructions to decodetree, the conversion accidentally lost the
correct setting of the syndrome register when taking a trap because
of the FEAT_FGT HFGITR_EL1.ERET bit. Instead of reporting a correct
full syndrome value with the EC and IL bits, we only reported the low
two bits of the syndrome, because the call to syn_erettrap() got
dropped.

Fix the syndrome values for these traps by reinstating the
syn_erettrap() calls.

Fixes: 442c9d682c94fc2 ("target/arm: Convert ERET, ERETAA, ERETAB to decodetree")
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20231024172438.2990945-1-peter.maydell@linaro.org
(cherry picked from commit 307521d6e29e559c89afa9dbd337ae75fe3c436d)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

target/sparc: Clear may_lookup for npc == DYNAMIC_PC

With pairs of jmp+rett, pc == DYNAMIC_PC_LOOKUP and
npc == DYNAMIC_PC. Make sure that we exit for interrupts.

Cc: qemu-stable@nongnu.org
Fixes: 633c42834c7 ("target/sparc: Introduce DYNAMIC_PC_LOOKUP")
Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Acked-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit 930f1865cc654b637ffe1207fa5b44bf0a156279)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

hw/rdma/vmw/pvrdma_cmd: Use correct struct in query_port()

In query_port() we pass the address of a local pvrdma_port_attr
struct to the rdma_query_backend_port() function. Unfortunately,
rdma_backend_query_port() wants a pointer to a struct ibv_port_attr,
and the two are not the same length.

Coverity spotted this (CID 1507146): pvrdma_port_attr is 48 bytes
long, and ibv_port_attr is 52 bytes, because it has a few extra
fields at the end.

Fortunately, all we do with the attrs struct after the call is to
read a few specific fields out of it which are all at the same
offsets in both structs, so we can simply make the local variable the
correct type. This also lets us drop the cast (which should have
been a bit of a warning flag that we were doing something wrong
here).

We do however need to add extra casts for the fields of the
struct that are enums: clang will complain about the implicit
cast to a different enum type otherwise.

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(cherry picked from commit 4ab9a7429bf7507fba4b96b97d4147628c91ba14)

hw/sd/sdhci: Block Size Register bits [14:12] is lost

Block Size Register bits [14:12] is SDMA Buffer Boundary, it is missed
in register write, but it is needed in SDMA transfer. e.g. it will be
used in sdhci_sdma_transfer_multi_blocks to calculate boundary_ variables.

Missing this field will cause wrong operation for different SDMA Buffer
Boundary settings.

Fixes: d7dfca0807 ("hw/sdhci: introduce standard SD host controller")
Fixes: dfba99f17f ("hw/sdhci: Fix DMA Transfer Block Size field")
Signed-off-by: Lu Gao <lu.gao@verisilicon.com>
Signed-off-by: Jianxian Wen <jianxian.wen@verisilicon.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-ID: <20220321055618.4026-1-lu.gao@verisilicon.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit ae5f70baf549925080fcdbc6c1939c98a4a39246)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

target/arm: Fix CNTPCT_EL0 trapping from EL0 when HCR_EL2.E2H is 0

On an attempt to access CNTPCT_EL0 from EL0 using a guest running on top
of Xen, a trap from EL2 was observed which is something not reproducible
on HW (also, Xen does not trap accesses to physical counter).

This is because gt_counter_access() checks for an incorrect bit (1
instead of 0) of CNTHCTL_EL2 if HCR_EL2.E2H is 0 and access is made to
physical counter. Refer ARM ARM DDI 0487J.a, D19.12.2:
When HCR_EL2.E2H is 0:
- EL1PCTEN, bit [0]: refers to physical counter
- EL1PCEN, bit [1]: refers to physical timer registers

Drop entire block "if (hcr & HCR_E2H) {...} else {...}" from EL0 case
and fall through to EL1 case, given that after fixing checking for the
correct bit, the handling is the same.

Fixes: 5bc8437136fb ("target/arm: Update timer access for VHE")
Signed-off-by: Michal Orzel <michal.orzel@amd.com>
Tested-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Message-id: 20230928094404.20802-1-michal.orzel@amd.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit d01448c79d89cfdc86228081b1dd1dfaf85fb4c3)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

lasips2: LASI PS/2 devices are not user-createable

Those PS/2 ports are created with the LASI controller when
a 32-bit PA-RISC machine is created.

Mark them not user-createable to avoid showing them in
the qemu device list.

Signed-off-by: Helge Deller <deller@gmx.de>
Cc: qemu-stable@nongnu.org
(cherry picked from commit a1e6a5c46219bada2c7b932748527553b36559ae)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

linux-user/sh4: Fix crashes on signal delivery

sh4 uses gUSA (general UserSpace Atomicity) to provide atomicity on CPUs
that don't have atomic instructions. A gUSA region that adds 1 to an
atomic variable stored in @R2 looks like this:

  4004b6:       03 c7           mova    4004c4 <gusa+0x10>,r0
  4004b8:       f3 61           mov     r15,r1
  4004ba:       09 00           nop
  4004bc:       fa ef           mov     #-6,r15
  4004be:       22 63           mov.l   @r2,r3
  4004c0:       01 73           add     #1,r3
  4004c2:       32 22           mov.l   r3,@r2
  4004c4:       13 6f           mov     r1,r15

R0 contains a pointer to the end of the gUSA region
R1 contains the saved stack pointer
R15 contains negative length of the gUSA region

When this region is interrupted by a signal, the kernel detects if
R15 >= -128U. If yes, the kernel rolls back PC to the beginning of the
region and restores SP by copying R1 to R15.

The problem happens if we are interrupted by a signal at address 4004c4.
R15 still holds the value -6, but the atomic value was already written by
an instruction at address 4004c2. In this situation we can't undo the
gUSA. The function unwind_gusa does nothing, the signal handler attempts
to push a signal frame to the address -6 and crashes.

This patch fixes it, so that if we are interrupted at the last instruction
in a gUSA region, we copy R1 to R15 to restore the correct stack pointer
and avoid crashing.

There's another bug: if we are interrupted in a delay slot, we save the
address of the instruction in the delay slot. We must save the address of
the previous instruction.

Cc: qemu-stable@nongnu.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Yoshinori Sato <ysato@users.sourcefoege.jp>
Message-Id: <b16389f7-6c62-70b7-59b3-87533c0bcc@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit 3b894b699c9a9c064466e128c18be80a3f2113bc)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

linux-user/mips: fix abort on integer overflow

QEMU mips userspace emulation crashes with "qemu: unhandled CPU exception
0x15 - aborting" when one of the integer arithmetic instructions detects
an overflow.

This patch fixes it so that it delivers SIGFPE with FPE_INTOVF instead.

Cc: qemu-stable@nongnu.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Message-Id: <3ef979a8-3ee1-eb2d-71f7-d788ff88dd11@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit 6fad9b4bb91dcc824f9c00a36ee843883b58313b)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

linux-user: Fixes for zero_bss

The previous change, 2d385be6152, assumed !PAGE_VALID meant that
the page would be unmapped by the elf image. However, since we
reserved the entire image space via mmap, PAGE_VALID will always
be set. Instead, assume PROT_NONE for the same condition.

Furthermore, assume bss is only ever present for writable segments,
and that there is no page overlap between PT_LOAD segments.
Instead of an assert, return false to indicate failure.

Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1854
Fixes: 2d385be6152 ("linux-user: Do not adjust zero_bss for host page size")
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit e6e66b03287331abc6f184456dbc6d25505590ec)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

tracetool: avoid invalid escape in Python string

This is an error in Python 3.12; fix it by using a raw string literal.

Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit e6d8e5e6e366ab4c9ed7d8ed1572f98c6ad6a38e)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

tests/vm: avoid invalid escape in Python string

This is an error in Python 3.12; fix it by using a raw string literal
or by double-escaping the backslash.

Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 86a8989d4557a09b68f8b78b6c3fb6ad3f23ca6f)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

tests/avocado: avoid invalid escape in Python string

This is an error in Python 3.12; fix it by using a raw string literal.

Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 1b5f3f65cc71341a4f9fc9e89bb6985fde703758)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

target/hexagon: avoid invalid escape in Python string

This is an error in Python 3.12; fix it by using a raw string literal.

Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit e41c40d101fce79af4d679955eb6e0d31e02c47c)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

docs/sphinx: avoid invalid escape in Python string

This is an error in Python 3.12; fix it by using a raw string literal.

Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit e4b6532cc0a5518c0dea15eeca573829df9ec1df)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

tests/docker: avoid invalid escape in Python string

This is an error in Python 3.12; fix it by using a raw string literal.

Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit a5e3cb3b90a62a42cd19ad9a20ca25c7df1dc3da)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

python/qmp: remove Server.wait_closed() call for Python 3.12

This patch is a backport from
https://gitlab.com/qemu-project/python-qemu-qmp/-/commit/e03a3334b6a477beb09b293708632f2c06fe9f61

According to Guido in https://github.com/python/cpython/issues/104344 ,
this call was never meant to wait for the server to shut down - that is
handled synchronously - but instead, this waits for all connections to
close. Or, it would have, if it wasn't broken since it was introduced.

3.12 fixes the bug, which now causes a hang in our code. The fix is just
to remove the wait.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-id: 20231006195243.3131140-3-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
(cherry picked from commit acf873873ae38e68371b0c53c42d3530636ff94e)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

migration: Non multifd migration don't care about multifd flushes

RDMA was having trouble because
migrate_multifd_flush_after_each_section() can only be true or false,
but we don't want to send any flush when we are not in multifd
migration.

CC: Fabiano Rosas <farosas@suse.de
Fixes: 294e5a4034e81 ("multifd: Only flush once each full round of memory")
Reported-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231011205548.10571-2-quintela@redhat.com>
(cherry picked from commit d4f34485ca8a077c98fc2303451e9bece9200dd7)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

migration: Fix analyze-migration read operation signedness

The migration code uses unsigned values for 16, 32 and 64-bit
operations. Fix the script to do the same.

This was causing an issue when parsing the migration stream generated
on the ppc64 target because one of instance_ids was larger than the
32bit signed maximum:

Traceback (most recent call last):
  File "/home/fabiano/kvm/qemu/build/scripts/analyze-migration.py", line 658, in <module>
    dump.read(dump_memory = args.memory)
  File "/home/fabiano/kvm/qemu/build/scripts/analyze-migration.py", line 592, in read
    classdesc = self.section_classes[section_key]
KeyError: ('spapr_iommu', -2147483648)

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231009184326.15777-6-farosas@suse.de>
(cherry picked from commit caea03279e11dfcb0e5a567b81fe7f02ee80af02)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

hw/pvrdma: Protect against buggy or malicious guest driver

Guest driver allocates and initialize page tables to be used as a ring
of descriptors for CQ and async events.
The page table that represents the ring, along with the number of pages
in the page table is passed to the device.
Currently our device supports only one page table for a ring.

Let's make sure that the number of page table entries the driver
reports, do not exceeds the one page table size.

Reported-by: Soul Chen <soulchen8650@gmail.com>
Signed-off-by: Yuval Shaia <yuval.shaia.ml@gmail.com>
Fixes: CVE-2023-1544
Message-ID: <20230301142926.18686-1-yuval.shaia.ml@gmail.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 85fc35afa93c7320d1641d344d0c5dfbe341d087)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

Update version for 8.1.2 release

Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

target/riscv: Fix vfwmaccbf16.vf

The operator (fwmacc16) of vfwmaccbf16.vf helper function should be
replaced by fwmaccbf16.

Fixes: adf772b0f7 ("target/riscv: Add support for Zvfbfwma extension")
Signed-off-by: Max Chou <max.chou@sifive.com>
Reviewed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20231005095734.567575-1-max.chou@sifive.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
(cherry picked from commit 837570cef237b634eb4c245363470deebea7089d)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

disas/riscv: Fix the typo of inverted order of pmpaddr13 and pmpaddr14

Fix the inverted order of pmpaddr13 and pmpaddr14 in csr_name().

Signed-off-by: Alvin Chang <alvinga@andestech.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20230907084500.328-1-alvinga@andestech.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
(cherry picked from commit cffa9954908830276c93b430681f66cc0e599aef)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

roms: use PYTHON to invoke python

python3 may not be the expected python version.
Use PYTHON to invoke python.

Fixes: 22e11539e1 ("edk2: replace build scripts")
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(cherry picked from commit 17b8d8ac3309e2cfed0d8cb3861afdcc23f66ce0)

hw/audio/es1370: reset current sample counter

Reset the current sample counter when writing the Channel Sample
Count Register. The Linux ens1370 driver and the AROS sb128
driver expect the current sample counter counts down from sample
count to 0 after a write to the Channel Sample Count Register.
Currently the current sample counter starts from 0 after a reset
or the last count when the counter was stopped.

The current sample counter is used to raise an interrupt whenever
a complete buffer was transferred. When the counter starts with a
value lower than the reload value, the interrupt triggeres before
the buffer was completly transferred. This may lead to corrupted
audio streams.

Tested-by: Rene Engel <ReneEngel80@emailn.de>
Signed-off-by: Volker Rümelin <vr_qemu@t-online.de>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Tested-by: BALATON Zoltan <balaton@eik.bme.hu>
Message-Id: <20230917065813.6692-1-vr_qemu@t-online.de>
(cherry picked from commit 00e3b29d065f3b88bb3726afbd5c73f8b2bff1b4)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

migration/qmp: Fix crash on setting tls-authz with null

QEMU will crash if anyone tries to set tls-authz (which is a type
StrOrNull) with 'null' value. Fix it in the easy way by converting it to
qstring just like the other two tls parameters.

Cc: qemu-stable@nongnu.org # v4.0+
Fixes: d2f1d29b95 ("migration: add support for a "tls-authz" migration parameter")
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230905162335.235619-2-peterx@redhat.com>
(cherry picked from commit 86dec715a7339fc61c3bdb9715993b277b2089db)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

util/log: re-allow switching away from stderr log file

Commit 59bde21374 ("util/log: do not close and reopen log files when
flags are turned off") prevented switching away from stderr on a
subsequent invocation of qemu_set_log_internal(). This prevented
switching away from stderr with the 'logfile' monitor command as well
as an invocation like
> ./qemu-system-x86_64 -trace 'qemu_mutex_lock,file=log'
from opening the specified log file.

Fixes: 59bde21374 ("util/log: do not close and reopen log files when flags are turned off")
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20231004124446.491481-1-f.ebner@proxmox.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit f05142d511e86d8e97967d21f205d990dfc634de)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

vfio/display: Fix missing update to set backing fields

The below referenced commit renames scanout_width/height to
backing_width/height, but also promotes these fields in various portions
of the egl interface. Meanwhile vfio dmabuf support has never used the
previous scanout fields and is therefore missed in the update. This
results in a black screen when transitioning from ramfb to dmabuf display
when using Intel vGPU with these features.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1891
Link: https://lists.gnu.org/archive/html/qemu-devel/2023-08/msg02726.html
Fixes: 9ac06df8b684 ("virtio-gpu-udmabuf: correct naming of QemuDmaBuf size properties")
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
(cherry picked from commit 931150e56b056b120c868f94751722710df0b6a7)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

amd_iommu: Fix APIC address check

An MSI from I/O APIC may not exactly equal to APIC_DEFAULT_ADDRESS. In
fact, Windows 17763.3650 configures I/O APIC to set the dest_mode bit.
Cover the range assigned to APIC.

Fixes: 577c470f43 ("x86_iommu/amd: Prepare for interrupt remap support")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20230921114612.40671-1-akihiko.odaki@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 0114c4513095598cdf1cd8d7dacdfff757628121)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

vdpa net: follow VirtIO initialization properly at cvq isolation probing

This patch solves a few issues. The most obvious is that the feature
set was done previous to ACKNOWLEDGE | DRIVER status bit set. Current
vdpa devices are permissive with this, but it is better to follow the
standard.

Fixes: 152128d646 ("vdpa: move CVQ isolation check to net_init_vhost_vdpa")
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20230915170836.3078172-4-eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 845ec38ae1578dd2d42ff15c9979f1bf44b23418)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

vdpa net: stop probing if cannot set features

Otherwise it continues the CVQ isolation probing.

Fixes: 152128d646 ("vdpa: move CVQ isolation check to net_init_vhost_vdpa")
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20230915170836.3078172-3-eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit f1085882d028e5a1b227443cd6e96bbb63d66f43)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

vdpa net: fix error message setting virtio status

It incorrectly prints "error setting features", probably because a copy
paste miss.

Fixes: 152128d646 ("vdpa: move CVQ isolation check to net_init_vhost_vdpa")
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20230915170836.3078172-2-eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit cbc9ae87b5f6f81c52a249e0b64100d5011fca53)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

vdpa net: zero vhost_vdpa iova_tree pointer at cleanup

Not zeroing it causes a SIGSEGV if the live migration is cancelled, at
net device restart.

This is caused because CVQ tries to reuse the iova_tree that is present
in the first vhost_vdpa device at the end of vhost_vdpa_net_cvq_start.
As a consequence, it tries to access an iova_tree that has been already
free.

Fixes: 00ef422e9fbf ("vdpa net: move iova tree creation from init to start")
Reported-by: Yanhui Ma <yama@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20230913123408.2819185-1-eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 0a7a164bc37b4ecbf74466e1e5243d72a768ad06)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

linux-user/hppa: Fix struct target_sigcontext layout

Use abi_ullong not uint64_t so that the alignment of the field
and therefore the layout of the struct is correct.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit 33bc4fa78b06fc4e5fe22e5576811a97707e0cc6)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

chardev/char-pty: Avoid losing bytes when the other side just (re-)connected

When starting a guest via libvirt with "virsh start --console ...",
the first second of the console output is missing. This is especially
annoying on s390x that only has a text console by default and no graphical
output - if the bios fails to boot here, the information about what went
wrong is completely lost.

One part of the problem (there is also some things to be done on the
libvirt side) is that QEMU only checks with a 1 second timer whether
the other side of the pty is already connected, so the first second of
the console output is always lost.

This likely used to work better in the past, since the code once checked
for a re-connection during write, but this has been removed in commit
f8278c7d74 ("char-pty: remove the check for connection on write") to avoid
some locking.

To ease the situation here at least a little bit, let's check with g_poll()
whether we could send out the data anyway, even if the connection has not
been marked as "connected" yet. The file descriptor is marked as non-blocking
anyway since commit fac6688a18 ("Do not hang on full PTY"), so this should
not cause any trouble if the other side is not ready for receiving yet.

With this patch applied, I can now successfully see the bios output of
a s390x guest when running it with "virsh start --console" (with a patched
version of virsh that fixes the remaining issues there, too).

Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230816210743.1319018-1-thuth@redhat.com>
(cherry picked from commit 4f7689f0817a717d18cc8aca298990760f27a89b)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

hw/display/ramfb: plug slight guest-triggerable leak on mode setting

The fw_cfg DMA write callback in ramfb prepares a new display surface in
QEMU; this new surface is put to use ("swapped in") upon the next display
update. At that time, the old surface (if any) is released.

If the guest triggers the fw_cfg DMA write callback at least twice between
two adjacent display updates, then the second callback (and further such
callbacks) will leak the previously prepared (but not yet swapped in)
display surface.

The issue can be shown by:

(1) starting QEMU with "-trace displaysurface_free", and

(2) running the following program in the guest UEFI shell:

> #include <Library/ShellCEntryLib.h>           // ShellAppMain()
> #include <Library/UefiBootServicesTableLib.h> // gBS
> #include <Protocol/GraphicsOutput.h>          // EFI_GRAPHICS_OUTPUT_PROTOCOL
>
> INTN
> EFIAPI
> ShellAppMain (
>   IN UINTN   Argc,
>   IN CHAR16  **Argv
>   )
> {
>   EFI_STATUS                    Status;
>   VOID                          *Interface;
>   EFI_GRAPHICS_OUTPUT_PROTOCOL  *Gop;
>   UINT32                        Mode;
>
>   Status = gBS->LocateProtocol (
>                   &gEfiGraphicsOutputProtocolGuid,
>                   NULL,
>                   &Interface
>                   );
>   if (EFI_ERROR (Status)) {
>     return 1;
>   }
>
>   Gop = Interface;
>
>   Mode = 1;
>   for ( ; ;) {
>     Status = Gop->SetMode (Gop, Mode);
>     if (EFI_ERROR (Status)) {
>       break;
>     }
>
>     Mode = 1 - Mode;
>   }
>
>   return 1;
> }

The symptom is then that:

- only one trace message appears periodically,

- the time between adjacent messages keeps increasing -- implying that
  some list structure (containing the leaked resources) keeps growing,

- the "surface" pointer is ever different.

> 18566@1695127471.449586:displaysurface_free surface=0x7f2fcc09a7c0
> 18566@1695127471.529559:displaysurface_free surface=0x7f2fcc9dac10
> 18566@1695127471.659812:displaysurface_free surface=0x7f2fcc441dd0
> 18566@1695127471.839669:displaysurface_free surface=0x7f2fcc0363d0
> 18566@1695127472.069674:displaysurface_free surface=0x7f2fcc413a80
> 18566@1695127472.349580:displaysurface_free surface=0x7f2fcc09cd00
> 18566@1695127472.679783:displaysurface_free surface=0x7f2fcc1395f0
> 18566@1695127473.059848:displaysurface_free surface=0x7f2fcc1cae50
> 18566@1695127473.489724:displaysurface_free surface=0x7f2fcc42fc50
> 18566@1695127473.969791:displaysurface_free surface=0x7f2fcc45dcc0
> 18566@1695127474.499708:displaysurface_free surface=0x7f2fcc70b9d0
> 18566@1695127475.079769:displaysurface_free surface=0x7f2fcc82acc0
> 18566@1695127475.709941:displaysurface_free surface=0x7f2fcc369c00
> 18566@1695127476.389619:displaysurface_free surface=0x7f2fcc32b910
> 18566@1695127477.119772:displaysurface_free surface=0x7f2fcc0d5a20
> 18566@1695127477.899517:displaysurface_free surface=0x7f2fcc086c40
> 18566@1695127478.729962:displaysurface_free surface=0x7f2fccc72020
> 18566@1695127479.609839:displaysurface_free surface=0x7f2fcc185160
> 18566@1695127480.539688:displaysurface_free surface=0x7f2fcc23a7e0
> 18566@1695127481.519759:displaysurface_free surface=0x7f2fcc3ec870
> 18566@1695127482.549930:displaysurface_free surface=0x7f2fcc634960
> 18566@1695127483.629661:displaysurface_free surface=0x7f2fcc26b140
> 18566@1695127484.759987:displaysurface_free surface=0x7f2fcc321700
> 18566@1695127485.940289:displaysurface_free surface=0x7f2fccaad100

We figured this wasn't a CVE-worthy problem, as only small amounts of
memory were leaked (the framebuffer itself is mapped from guest RAM, QEMU
only allocates administrative structures), plus libvirt restricts QEMU
memory footprint anyway, thus the guest can only DoS itself.

Plug the leak, by releasing the last prepared (not yet swapped in) display
surface, if any, in the fw_cfg DMA write callback.

Regarding the "reproducer", with the fix in place, the log is flooded with
trace messages (one per fw_cfg write), *and* the trace message alternates
between just two "surface" pointer values (i.e., nothing is leaked, the
allocator flip-flops between two objects in effect).

This issue appears to date back to the introducion of ramfb (995b30179bdc,
"hw/display: add ramfb, a simple boot framebuffer living in guest ram",
2018-06-18).

Cc: Gerd Hoffmann <kraxel@redhat.com> (maintainer:ramfb)
Cc: qemu-stable@nongnu.org
Fixes: 995b30179bdc
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-ID: <20230919131955.27223-1-lersek@redhat.com>
(cherry picked from commit e0288a778473ebd35eac6cc1924faca7d477d241)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

win32: avoid discarding the exception handler

In all likelihood, the compiler with lto doesn't see the function being
used, from assembly macro __try1. Help it by marking the function has
being used.

Resolves:
https://gitlab.com/qemu-project/qemu/-/issues/1904

Fixes: commit d89f30b4df ("win32: wrap socket close() with an exception handler")
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 75b773d84c89220463a14a6883d2b2a8e49e5b68)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(mjt: trivial context fixup in include/qemu/compiler.h)

target/i386: fix memory operand size for CVTPS2PD

CVTPS2PD only loads a half-register for memory, unlike the other
operations under 0x0F 0x5A. "Unpack" the group into separate
emission functions instead of using gen_unary_fp_sse.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit abd41884c530aa025ada253bf1a5bd0c2b808219)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

target/i386: generalize operand size "ph" for use in CVTPS2PD

CVTPS2PD only loads a half-register for memory, like CVTPH2PS. It can
reuse the "ph" packed half-precision size to load a half-register,
but rename it to "xh" because it is now a variation of "x" (it is not
used only for half-precision values).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit a48b26978a090fe1f3f3e54319902d4ab56a6b3a)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

subprojects/berkeley-testfloat-3: Update to fix a problem with compiler warnings

Update the berkeley-testfloat-3 wrap to include a patch provided by
Olaf Hering. This fixes a problem with "control reaches end of non-void
function [-Werror=return-type]" compiler warning/errors that are now
enabled by default in certain versions of GCC.

Reported-by: Olaf Hering <olaf@aepfle.de>
Message-Id: <20230816091522.1292029-1-thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit c01196bdddc280ae3710912e98e78f3103155eaf)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

scsi-disk: ensure that FORMAT UNIT commands are terminated

Otherwise when a FORMAT UNIT command is issued, the SCSI layer can become
confused because it can find itself in the situation where it thinks there
is still data to be transferred which can cause the next emulated SCSI
command to fail.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Fixes: 6ab71761 ("scsi-disk: add FORMAT UNIT command")
Tested-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20230913204410.65650-4-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit be2b619a17345d007bcf9987a3e4afd1edea3e4f)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

esp: restrict non-DMA transfer length to that of available data

In the case where a SCSI layer transfer is incorrectly terminated, it is
possible for a TI command to cause a SCSI buffer overflow due to the
expected transfer data length being less than the available data in the
FIFO. When this occurs the unsigned async_len variable underflows and
becomes a large offset which writes past the end of the allocated SCSI
buffer.

Restrict the non-DMA transfer length to be the smallest of the expected
transfer length and the available FIFO data to ensure that it is no longer
possible for the SCSI buffer overflow to occur.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1810
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20230913204410.65650-3-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 77668e4b9bca03a856c27ba899a2513ddf52bb52)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

esp: use correct type for esp_dma_enable() in sysbus_esp_gpio_demux()

The call to esp_dma_enable() was being made with the SYSBUS_ESP type instead of
the ESP type. This meant that when GPIO 1 was being used to trigger a DMA
request from an external DMA controller, the setting of ESPState's dma_enabled
field would clobber unknown memory whilst the dma_cb callback pointer would
typically return NULL so the DMA request would never start.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20230913204410.65650-2-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit b86dc5cb0b4105fa8ad29e822ab5d21c589c5ec5)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

optionrom: Remove build-id section

Our linker script for optionroms specifies only the placement of the
.text section, leaving the linker free to place the remaining sections
at arbitrary places in the file.

Since at least binutils 2.39, the .note.gnu.build-id section is now
being placed at the start of the file, which causes label addresses to
be shifted. For linuxboot_dma.bin that means that the PnP header
(among others) will not be found when determining the type of ROM at
optionrom_setup():

(0x1c is the label _pnph, where the magic "PnP" is)

$ xxd /usr/share/qemu/linuxboot_dma.bin | grep "PnP"
00000010: 0000 0000 0000 0000 0000 1c00 2450 6e50  ............$PnP

$ xxd pc-bios/optionrom/linuxboot_dma.bin | grep "PnP"
00000010: 0000 0000 0000 0000 0000 4c00 2450 6e50  ............$PnP
                                   ^bad

Using a freshly built linuxboot_dma.bin ROM results in a broken boot:

  SeaBIOS (version rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org)
  Booting from Hard Disk...
  Boot failed: could not read the boot disk

  Booting from Floppy...
  Boot failed: could not read the boot disk

  No bootable device.

We're not using the build-id section, so pass the --build-id=none
option to the linker to remove it entirely.

Note: In theory, this same issue could happen with any other
section. The ideal solution would be to have all unused sections
discarded in the linker script. However that would be a larger change,
specially for the pvh rom which uses the .bss and COMMON sections so
I'm addressing only the immediate issue here.

Reported-by: Vasiliy Ulyanov <vulyanov@suse.de>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20230926192502.15986-1-farosas@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 35ed01ba5448208695ada5fa20a13c0a4689a1c1)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(mjt: remove unrelated stable@vger)

target/tricore: Fix RCPW/RRPW_INSERT insns for width = 0

we would crash if width was 0 for these insns, as tcg_gen_deposit() is
undefined for that case. For TriCore, width = 0 is a mov from the src reg
to the dst reg, so we special case this here.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Message-ID: <20230828112651.522058-9-kbastian@mail.uni-paderborn.de>
(cherry picked from commit 23fa6f56b33f8fddf86ba4d027fb7d3081440cd9)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

accel/tcg: Always require can_do_io

Require i/o as the last insn of a TranslationBlock always,
not only with icount. This is required for i/o that alters
the address space, such as a pci config space write.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1866
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit 18a536f1f8d6222e562f59179e837fdfd8b92718)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

accel/tcg: Always set CF_LAST_IO with CF_NOIRQ

Without this we can get see loops through cpu_io_recompile,
in which the cpu makes no progress.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit 200c1f904f46c209cb022e711a48b89e46512902)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

accel/tcg: Improve setting of can_do_io at start of TB

Initialize can_do_io to true if this the TB has CF_LAST_IO
and will consist of a single instruction. This avoids a
set to 0 followed immediately by a set to 1.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit a2f99d484c54adda13e62bf75ba512618a3fe470)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

accel/tcg: Track current value of can_do_io in the TB

Simplify translator_io_start by recording the current
known value of can_do_io within DisasContextBase.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit 0ca41ccf1c555f97873b8e02a47390fd6af4b18f)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

accel/tcg: Hoist CF_MEMI_ONLY check outside translation loop

The condition checked is loop invariant; check it only once.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit 5d97e94638100fd3e5b8d76ab30e1066cd4b1823)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

accel/tcg: Avoid load of icount_decr if unused

With CF_NOIRQ and without !CF_USE_ICOUNT, the load isn't used.
Avoid emitting it.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit f47a90dacca8f74210a2675bdde7ab3856872b94)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

softmmu: Use async_run_on_cpu in tcg_commit

After system startup, run the update to memory_dispatch
and the tlb_flush on the cpu. This eliminates a race,
wherein a running cpu sees the memory_dispatch change
but has not yet seen the tlb_flush.

Since the update now happens on the cpu, we need not use
qatomic_rcu_read to protect the read of memory_dispatch.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1826
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1834
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1846
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit 0d58c660689f6da1e3feff8a997014003d928b3b)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

migration: Move return path cleanup to main migration thread

Now that the return path thread is allowed to finish during a paused
migration, we can move the cleanup of the QEMUFiles to the main
migration thread.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230918172822.19052-9-farosas@suse.de>
(cherry picked from commit 36e9aab3c569d4c9ad780473596e18479838d1aa)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

migration: Replace the return path retry logic

Replace the return path retry logic with finishing and restarting the
thread. This fixes a race when resuming the migration that leads to a
segfault.

Currently when doing postcopy we consider that an IO error on the
return path file could be due to a network intermittency. We then keep
the thread alive but have it do cleanup of the 'from_dst_file' and
wait on the 'postcopy_pause_rp' semaphore. When the user issues a
migrate resume, a new return path is opened and the thread is allowed
to continue.

There's a race condition in the above mechanism. It is possible for
the new return path file to be setup *before* the cleanup code in the
return path thread has had a chance to run, leading to the *new* file
being closed and the pointer set to NULL. When the thread is released
after the resume, it tries to dereference 'from_dst_file' and crashes:

Thread 7 "return path" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffd1dbf700 (LWP 9611)]
0x00005555560e4893 in qemu_file_get_error_obj (f=0x0, errp=0x0) at ../migration/qemu-file.c:154
154         return f->last_error;

(gdb) bt
#0  0x00005555560e4893 in qemu_file_get_error_obj (f=0x0, errp=0x0) at ../migration/qemu-file.c:154
#1  0x00005555560e4983 in qemu_file_get_error (f=0x0) at ../migration/qemu-file.c:206
#2  0x0000555555b9a1df in source_return_path_thread (opaque=0x555556e06000) at ../migration/migration.c:1876
#3  0x000055555602e14f in qemu_thread_start (args=0x55555782e780) at ../util/qemu-thread-posix.c:541
#4  0x00007ffff38d76ea in start_thread (arg=0x7fffd1dbf700) at pthread_create.c:477
#5  0x00007ffff35efa6f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Here's the race (important bit is open_return_path happening before
migration_release_dst_files):

migration                 | qmp                         | return path
--------------------------+-----------------------------+---------------------------------
    qmp_migrate_pause()
     shutdown(ms->to_dst_file)
      f->last_error = -EIO
migrate_detect_error()
postcopy_pause()
  set_state(PAUSED)
  wait(postcopy_pause_sem)
    qmp_migrate(resume)
    migrate_fd_connect()
     resume = state == PAUSED
     open_return_path <-- TOO SOON!
     set_state(RECOVER)
     post(postcopy_pause_sem)
(incoming closes to_src_file)
res = qemu_file_get_error(rp)
migration_release_dst_files()
ms->rp_state.from_dst_file = NULL
  post(postcopy_pause_rp_sem)
postcopy_pause_return_path_thread()
  wait(postcopy_pause_rp_sem)
rp = ms->rp_state.from_dst_file
goto retry
qemu_file_get_error(rp)
SIGSEGV
-------------------------------------------------------------------------------------------

We can keep the retry logic without having the thread alive and
waiting. The only piece of data used by it is the 'from_dst_file' and
it is only allowed to proceed after a migrate resume is issued and the
semaphore released at migrate_fd_connect().

Move the retry logic to outside the thread by waiting for the thread
to finish before pausing the migration.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230918172822.19052-8-farosas@suse.de>
(cherry picked from commit ef796ee93b313ed2f0b427ef30320417387d2ad5)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

migration: Consolidate return path closing code

We'll start calling the await_return_path_close_on_source() function
from other parts of the code, so move all of the related checks and
tracepoints into it.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230918172822.19052-7-farosas@suse.de>
(cherry picked from commit d50f5dc075cbb891bfe4a9378600a4871264468a)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

migration: Remove redundant cleanup of postcopy_qemufile_src

This file is owned by the return path thread which is already doing
cleanup.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230918172822.19052-6-farosas@suse.de>
(cherry picked from commit b3b101157d4651f12e6b3361af2de6bace7f9b4a)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

migration: Fix possible race when shutting down to_dst_file

It's not safe to call qemu_file_shutdown() on the to_dst_file without
first checking for the file's presence under the lock. The cleanup of
this file happens at postcopy_pause() and migrate_fd_cleanup() which
are not necessarily running in the same thread as migrate_fd_cancel().

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230918172822.19052-5-farosas@suse.de>
(cherry picked from commit 7478fb0df914f0a5ab551ff74b1df62dd250500e)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

migration: Fix possible races when shutting down the return path

We cannot call qemu_file_shutdown() on the return path file without
taking the file lock. The return path thread could be running it's
cleanup code and have just cleared the from_dst_file pointer.

Checking ms->to_dst_file for errors could also race with
migrate_fd_cleanup() which clears the to_dst_file pointer.

Protect both accesses by taking the file lock.

This was caught by inspection, it should be rare, but the next patches
will start calling this code from other places, so let's do the
correct thing.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230918172822.19052-4-farosas@suse.de>
(cherry picked from commit 639decf529793fc544c8055b82be8abe77fa48fa)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

migration: Fix possible race when setting rp_state.error

We don't need to set the rp_state.error right after a shutdown because
qemu_file_shutdown() always sets the QEMUFile error, so the return
path thread would have seen it and set the rp error itself.

Setting the error outside of the thread is also racy because the
thread could clear it after we set it.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230918172822.19052-3-farosas@suse.de>
(cherry picked from commit 28a8347281e24c2e7bba6d3301472eda41d4c096)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

migration: Fix race that dest preempt thread close too early

We hit intermit CI issue on failing at migration-test over the unit test
preempt/plain:

qemu-system-x86_64: Unable to read from socket: Connection reset by peer
Memory content inconsistency at 5b43000 first_byte = bd last_byte = bc current = 4f hit_edge = 1
**
ERROR:../tests/qtest/migration-test.c:300:check_guests_ram: assertion failed: (bad == 0)
(test program exited with status code -6)

Fabiano debugged into it and found that the preempt thread can quit even
without receiving all the pages, which can cause guest not receiving all
the pages and corrupt the guest memory.

To make sure preempt thread finished receiving all the pages, we can rely
on the page_requested_count being zero because preempt channel will only
receive requested page faults. Note, not all the faulted pages are required
to be sent via the preempt channel/thread; imagine the case when a
requested page is just queued into the background main channel for
migration, the src qemu will just still send it via the background channel.

Here instead of spinning over reading the count, we add a condvar so the
main thread can wait on it if that unusual case happened, without burning
the cpu for no good reason, even if the duration is short; so even if we
spin in this rare case is probably fine. It's just better to not do so.

The condvar is only used when that special case is triggered. Some memory
ordering trick is needed to guarantee it from happening (against the
preempt thread status field), so the main thread will always get a kick
when that triggers correctly.

Closes: https://gitlab.com/qemu-project/qemu/-/issues/1886
Debugged-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230918172822.19052-2-farosas@suse.de>
(cherry picked from commit cf02f29e1e3843784630d04783e372fa541a77e5)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

ui/vnc: fix handling of VNC_FEATURE_XVP

VNC_FEATURE_XVP was not shifted left before adding it to vs->features,
so it was never enabled; but it was also checked the wrong way with
a logical AND instead of vnc_has_feature. Fix both places.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 477b301000d665313217f65e3a368d2cb7769c42)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

ui/vnc: fix debug output for invalid audio message

The debug message was cut and pasted from the invalid audio format
case, but the audio message is at bytes 2-3.

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 0cb9c5880e6b8dedc4e20026ce859dd1ea9aac84)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

hw/scsi/scsi-disk: Disallow block sizes smaller than 512 [CVE-2023-42467]

We are doing things like

nb_sectors /= (s->qdev.blocksize / BDRV_SECTOR_SIZE);

in the code here (e.g. in scsi_disk_emulate_mode_sense()), so if
the blocksize is smaller than BDRV_SECTOR_SIZE (=512), this crashes
with a division by 0 exception. Thus disallow block sizes of 256
bytes to avoid this situation.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1813
CVE: 2023-42467
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20230925091854.49198-1-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 7cfcc79b0ab800959716738aff9419f53fc68c9c)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

accel/tcg: mttcg remove false-negative halted assertion

mttcg asserts that an execution ending with EXCP_HALTED must have
cpu->halted. However between the event or instruction that sets
cpu->halted and requests exit and the assertion here, an
asynchronous event could clear cpu->halted.

This leads to crashes running AIX on ppc/pseries because it uses
H_CEDE/H_PROD hcalls, where H_CEDE sets self->halted = 1 and
H_PROD sets other cpu->halted = 0 and kicks it.

H_PROD could be turned into an interrupt to wake, but several other
places in ppc, sparc, and semihosting follow what looks like a similar
pattern setting halted = 0 directly. So remove this assertion.

Reported-by: Ivan Warren <ivan@vmfacility.fr>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20230829010658.8252-1-npiggin@gmail.com>
[rth: Keep the case label and adjust the comment.]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit 0e5903436de712844b0e6cdd862b499c767e09e9)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>