git.ipfire.org Git - thirdparty/qemu.git/log

Update version for 9.1.3 release

Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

gdbstub/user-target: fix gdbserver int format (%d -> %x)

This commit fixes an incorrect format string for formatting integers
provided to GDB when debugging a target run in QEMU user mode.

The correct format is hexadecimal for both success and errno values,
some of which can be seen here [0].

[0] https://github.com/bminor/binutils-gdb/blob/e65a355022d0dc6b5707310876a72b5693ec0aa5/gdbserver/hostio.cc#L196-L213

Signed-off-by: Dominik 'Disconnect3d' Czarnota <dominik.b.czarnota@gmail.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Fixes: e282010b2e1e ("gdbstub: Add support for info proc mappings")
Cc: qemu-stable@nongnu.org
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(cherry picked from commit 8b647bd352505234cab2acd2422aba183a1aa1fd)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

net/dump: Correctly compute Ethernet packet offset

When a packet is sent with QEMU_NET_PACKET_FLAG_RAW by QEMU it
never includes virtio-net header even if qemu_get_vnet_hdr_len()
is not 0, and filter-dump is not managing this case.

The only user of QEMU_NET_PACKET_FLAG_RAW is announce_self,
we can show the problem using it and tcpddump:

- QEMU parameters:

  .. -monitor stdio \
     -netdev bridge,id=netdev0,br=virbr0 \
     -device virtio-net,mac=9a:2b:2c:2d:2e:2f,netdev=netdev0 \
     -object filter-dump,netdev=netdev0,file=log.pcap,id=pcap0

- HMP command:

  (qemu) announce_self

- TCP dump:

  $ tcpdump -nxr log.pcap

  without the fix:

    08:00:06:04:00:03 > 2e:2f:80:35:00:01, ethertype Unknown (0x9a2b), length 50:
         0x0000:  2c2d 2e2f 0000 0000 9a2b 2c2d 2e2f 0000
         0x0010:  0000 0000 0000 0000 0000 0000 0000 0000
         0x0020:  0000 0000

  with the fix:

    ARP, Reverse Request who-is 9a:2b:2c:2d:2e:2f tell 9a:2b:2c:2d:2e:2f, length 46
         0x0000:  0001 0800 0604 0003 9a2b 2c2d 2e2f 0000
         0x0010:  0000 9a2b 2c2d 2e2f 0000 0000 0000 0000
         0x0020:  0000 0000 0000 0000 0000 0000 0000

Fixes: 481c52320a26 ("net: Strip virtio-net header when dumping")
Cc: akihiko.odaki@daynix.com
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(cherry picked from commit c6a1b591a68b4d7230d6c3f56965e18080d737e5)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

net: Fix announce_self

b9ad513e1876 ("net: Remove receive_raw()") adds an iovec entry
in qemu_deliver_packet_iov() to add the virtio-net header
in the data when QEMU_NET_PACKET_FLAG_RAW is set but forgets
to increase the number of iovec entries in the array, so
receive_iov() will only send the first entry (the virtio-net
entry, full of 0) and no data. The packet will be discarded.

The only user of QEMU_NET_PACKET_FLAG_RAW is announce_self.

We can see the problem with tcpdump:

- QEMU parameters:

  .. -monitor stdio \
     -netdev bridge,id=netdev0,br=virbr0 \
     -device virtio-net,mac=9a:2b:2c:2d:2e:2f,netdev=netdev0 \

- HMP command:

  (qemu) announce_self

- TCP dump:

  $ sudo tcpdump -nxi virbr0

  without the fix:

    <nothing>

  with the fix:

   ARP, Reverse Request who-is 9a:2b:2c:2d:2e:2f tell 9a:2b:2c:2d:2e:2f, length 46
        0x0000:  0001 0800 0604 0003 9a2b 2c2d 2e2f 0000
        0x0010:  0000 9a2b 2c2d 2e2f 0000 0000 0000 0000
        0x0020:  0000 0000 0000 0000 0000 0000 0000

Reported-by: Xiaohui Li <xiaohli@redhat.com>
Bug: https://issues.redhat.com/browse/RHEL-73891
Fixes: b9ad513e1876 ("net: Remove receive_raw()")
Cc: akihiko.odaki@daynix.com
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(cherry picked from commit 84dfdcbff33fff185528501be408c25c44499f32)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

target/s390x: Fix MVC not always invalidating translation blocks

Node.js crashes in qemu-system-s390x with random SIGSEGVs / SIGILLs.

The v8 JIT used by Node.js can garbage collect and overwrite unused
code. Overwriting is performed by WritableJitAllocation::CopyCode(),
which ultimately calls memcpy(). For certain sizes, memcpy() uses the
MVC instruction.

QEMU implements MVC and other similar instructions using helpers. While
TCG store ops invalidate affected translation blocks automatically,
helpers must do this manually by calling probe_access_flags(). The MVC
helper does this using the access_prepare() -> access_prepare_nf() ->
s390_probe_access() -> probe_access_flags() call chain.

At the last step of this chain, the store size is replaced with 0. This
causes the probe_access_flags() -> notdirty_write() ->
tb_invalidate_phys_range_fast() chain to miss some translation blocks.

When this happens, QEMU executes a mix of old and new code. This
quickly leads to either a SIGSEGV or a SIGILL in case the old code
ends in the middle of a new instruction.

Fix by passing the true size.

Reported-by: Berthold Gunreben <azouhr@opensuse.org>
Cc: Sarah Kriesch <ada.lovelace@gmx.de>
Cc: qemu-stable@nongnu.org
Closes: https://bugzilla.opensuse.org/show_bug.cgi?id=1235709
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Fixes: e2faabee78ff ("accel/tcg: Forward probe size on to notdirty_write")
Message-ID: <20250128001338.11474-1-iii@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit e43ced8be18dda77c229ab09f85136a4d600d40d)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

physmem: fix qemu_ram_alloc_from_fd size calculation

qemu_ram_alloc_from_fd allocates space if file_size == 0.  If non-zero,
it uses the existing space and verifies it is large enough, but the
verification was broken when the offset parameter was introduced.  As
a result, a file smaller than offset passes the verification and causes
errors later.  Fix that, and update the error message to include offset.

Peter provides this concise reproducer:

  $ touch ramfile
  $ truncate -s 64M ramfile
  $ ./qemu-system-x86_64 -object memory-backend-file,mem-path=./ramfile,offset=128M,size=128M,id=mem1,prealloc=on
  qemu-system-x86_64: qemu_prealloc_mem: preallocating memory failed: Bad address

With the fix, the error message is:
  qemu-system-x86_64: mem1 backing store size 0x4000000 is too small for 'size' option 0x8000000 plus 'offset' option 0x8000000

Cc: qemu-stable@nongnu.org
Fixes: 4b870dc4d0c0 ("hostmem-file: add offset option")
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/1736967650-129648-3-git-send-email-steven.sistare@oracle.com
Signed-off-by: Fabiano Rosas <farosas@suse.de>
(cherry picked from commit 719168fba7c3215cc996dcfd32a6e5e9c7b8eee0)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

hw/usb/canokey: Fix buffer overflow for OUT packet

When USBPacket in OUT direction has larger payload
than the ep_out_buffer (of size 512), a buffer overflow
would occur.

It could be fixed by limiting the size of usb_packet_copy
to be at most buffer size. Further optimization gets rid
of the ep_out_buffer and directly uses ep_out as the target
buffer.

This is reported by a security researcher who artificially
constructed an OUT packet of size 2047. The report has gone
through the QEMU security process, and as this device is for
testing purpose and no deployment of it in virtualization
environment is observed, it is triaged not to be a security bug.

Cc: qemu-stable@nongnu.org
Fixes: d7d34918551dc48 ("hw/usb: Add CanoKey Implementation")
Reported-by: Juan Jose Lopez Jaimez <thatjiaozi@gmail.com>
Signed-off-by: Hongren Zheng <i@zenithal.me>
Message-id: Z4TfMOrZz6IQYl_h@Sun
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit 664280abddcb3cacc9c6204706bb739fcc1316f7)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

target/arm: arm_reset_sve_state() should set FPSR, not FPCR

The pseudocode ResetSVEState() does:
FPSR = ZeroExtend(0x0800009f<31:0>, 64);
but QEMU's arm_reset_sve_state() called vfp_set_fpcr() by accident.

Before the advent of FEAT_AFP, this was only setting a collection of
RES0 bits, which vfp_set_fpsr() would then ignore, so the only effect
was that we didn't actually set the FPSR the way we are supposed to
do. Once FEAT_AFP is implemented, setting the bottom bits of FPSR
will change the floating point behaviour.

Call vfp_set_fpsr(), as we ought to.

(Note for stable backports: commit 7f2a01e7368f9 moved this function
from sme_helper.c to helper.c, but it had the same bug before the
move too.)

Cc: qemu-stable@nongnu.org
Fixes: f84734b87461 ("target/arm: Implement SMSTART, SMSTOP")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250124162836.2332150-4-peter.maydell@linaro.org
(cherry picked from commit 1edc3d43f20df0d04f8d00b906ba19fed37512a5)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

hw/cxl: Fix msix_notify: Assertion `vector < dev->msix_entries_nr`

This assertion always happens when we sanitize the CXL memory device.
$ echo 1 > /sys/bus/cxl/devices/mem0/security/sanitize

It is incorrect to register an MSIX number beyond the device's capability.

Increase the device's MSIX number to cover the mailbox msix number(9).

Fixes: 43efb0bfad2b ("hw/cxl/mbox: Wire up interrupts for background completion")
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Message-Id: <20250115075834.167504-1-lizhijian@fujitsu.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 1ce979e7269a34d19ea1a65808df014d8b2acbf6)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

tests: acpi: update expected blobs

_DSM function 7 AML should have followig change:

               If ((Arg2 == 0x07))
               {
  -                Local0 = Package (0x02)
  -                    {
  -                        Zero,
  -                        ""
  -                    }
                   Local2 = AIDX (DerefOf (Arg4 [Zero]), DerefOf (Arg4 [One]
                       ))
  -                Local0 [Zero] = Local2
  +                Local0 = Package (0x02) {}
  +                If (!((Local2 == Zero) || (Local2 == 0xFFFFFFFF)))
  +                {
  +                    Local0 [Zero] = Local2
  +                    Local0 [One] = ""
  +                }
  +
                   Return (Local0)
               }
           }

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20250115125342.3883374-4-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 9fb1c9a1bb26e111ee5fa5538070cd684de14c08)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(Mjt: rebuild acpi tables for 9.1)

pci: acpi: Windows 'PCI Label Id' bug workaround

Current versions of Windows call _DSM(func=7) regardless
of whether it is supported or not. It leads to NICs having bogus
'PCI Label Id = 0', where none should be set at all.

Also presence of 'PCI Label Id' triggers another Windows bug
on localized versions that leads to hangs. The later bug is fixed
in latest updates for 'Windows Server' but not in consumer
versions of Windows (and there is no plans to fix it
as far as I'm aware).

Given it's easy, implement Microsoft suggested workaround
(return invalid Package) so that affected Windows versions
could boot on QEMU.
This would effectvely remove bogus 'PCI Label Id's on NICs,
but MS teem confirmed that flipping 'PCI Label Id' should not
change 'Network Connection' ennumeration, so it should be safe
for QEMU to change _DSM without any compat code.

Smoke tested with WinXP and WS2022
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/774
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20250115125342.3883374-3-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 0b053391985abcc40b16ac8fc4a7f6588d1d95c1)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

tests: acpi: whitelist expected blobs

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20250115125342.3883374-2-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 1ad32644fe4c9fb25086be15a66dde1d55d3410f)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(Mjt: remove q35/DSDT.acpihmat-generic-x which is not present in 9.1)

pci/msix: Fix msix pba read vector poll end calculation

The end vector calculation has a bug that results in polling fewer
than required vectors when reading at a non-zero offset in PBA memory.

Fixes: bbef882cc193 ("msi: add API to get notified about pending bit poll")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20241212120402.1475053-1-npiggin@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 42e2a7a0ab23784e44fcb18369e06067abc89305)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

pci: ensure valid link status bits for downstream ports

PCI hotplug for downstream endpoints on arm fails because Linux'
PCIe hotplug driver doesn't like the QEMU provided LNKSTA:

  pcieport 0000:08:01.0: pciehp: Slot(2): Card present
  pcieport 0000:08:01.0: pciehp: Slot(2): Link Up
  pcieport 0000:08:01.0: pciehp: Slot(2): Cannot train link: status 0x2000

There's 2 cases where LNKSTA isn't setup properly:
* the downstream device has no express capability
* max link width of the bridge is 0

Move the sanity checks added via 88c869198aa63
("pci: Sanity test minimum downstream LNKSTA") outside of the
branch to make sure downstream ports always have a valid LNKSTA.

Signed-off-by: Sebastian Ott <sebott@redhat.com>
Tested-by: Zhenyu Zhang <zhenyzha@redhat.com>
Message-Id: <20241203121928.14861-1-sebott@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 694632fd44987cc4618612a38ad151047524a590)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

hw/usb/hcd-xhci-pci: Use modulo to select MSI vector as per spec

QEMU would crash with a failed assertion if the XHCI controller
attempted to raise the interrupt on an interrupter corresponding
to a MSI vector with a higher index than the highest configured
for the device by the guest driver.

This behaviour is correct on the MSI/PCI side: per PCI 3.0 spec,
devices must ensure they do not send MSI notifications for
vectors beyond the range of those allocated by the system/driver
software. Unlike MSI-X, there is no generic way for handling
aliasing in the case of fewer allocated vectors than requested,
so the specifics are up to device implementors. (Section
6.8.3.4. "Sending Messages")

It turns out the XHCI spec (Implementation Note in section 4.17,
"Interrupters") requires that the host controller signal the MSI
vector with the number computed by taking the interrupter number
modulo the number of enabled MSI vectors.

This change introduces that modulo calculation, fixing the
failed assertion. This makes the device work correctly in MSI mode
with macOS's XHCI driver, which only allocates a single vector.

Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20250112210056.16658-2-phil@philjordan.eu>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit bb5b7fced6b5d3334ab20702fc846e47bb1fb731)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

backends/cryptodev-vhost-user: Fix local_error leaks

Do not propagate error to the upper, directly output the error
to avoid leaks.

Fixes: 2fda101de07 ("virtio-crypto: Support asynchronous mode")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2714
Signed-off-by: Gabriel Barrantes <gabriel.barrantes.dev@outlook.com>
Reviewed-by: zhenwei pi <pizhenwei@bytedance.com>
Message-Id: <DM8PR13MB50781054A4FDACE6F4FB6469B30F2@DM8PR13MB5078.namprd13.prod.outlook.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit 78b0c15a563ac4be5afb0375602ca0a3adc6c442)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

tests/qtest/boot-serial-test: Correct HPPA machine name

Commit 7df6f751176 ("hw/hppa: Split out machine creation")
renamed the 'hppa' machine as 'B160L', but forgot to update
the boot serial test, which ended being skipped.

Cc: qemu-stable@nongnu.org
Fixes: 7df6f751176 ("hw/hppa: Split out machine creation")
Reported-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20250102100340.43014-2-philmd@linaro.org>
(cherry picked from commit a87077316ed2f1c1c8ba8faf05feed9dbf0f2fee)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

multifd: bugfix for incorrect migration data with QPL compression

When QPL compression is enabled on the migration channel and the same
dirty page changes from a normal page to a zero page in the iterative
memory copy, the dirty page will not be updated to a zero page again
on the target side, resulting in incorrect memory data on the source
and target sides.

The root cause is that the target side does not record the normal pages
to the receivedmap.

The solution is to add ramblock_recv_bitmap_set_offset in target side
to record the normal pages.

Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Reviewed-by: Jason Zeng <jason.zeng@intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20241218091413.140396-3-yuan1.liu@intel.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
(cherry picked from commit 2588a5f99b0c3493b4690e3ff01ed36f80e830cc)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

multifd: bugfix for migration using compression methods

When compression is enabled on the migration channel and
the pages processed are all zero pages, these pages will
not be sent and updated on the target side, resulting in
incorrect memory data on the source and target sides.

The root cause is that all compression methods call
multifd_send_prepare_common to determine whether to compress
dirty pages, but multifd_send_prepare_common does not update
the IOV of MultiFDPacket_t when all dirty pages are zero pages.

The solution is to always update the IOV of MultiFDPacket_t
regardless of whether the dirty pages are all zero pages.

Fixes: 303e6f54f9 ("migration/multifd: Implement zero page transmission on the multifd thread.")
Cc: qemu-stable@nongnu.org #9.0+
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Reviewed-by: Jason Zeng <jason.zeng@intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20241218091413.140396-2-yuan1.liu@intel.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
(cherry picked from commit cdc3970f8597ebdc1a4c2090cfb4d11e297329ed)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(Mjt: in 9.1 this code is in migration/multifd.c, not in migration/multifd-nocomp.c)

migration: Fix arrays of pointers in JSON writer

Currently, if an array of pointers contains a NULL pointer, that
pointer will be encoded as '0' in the stream. Since the JSON writer
doesn't define a "pointer" type, that '0' will now be an uint8, which
is different from the original type being pointed to, e.g. struct.

(we're further calling uint8 "nullptr", but that's irrelevant to the
issue)

That mixed-type array shouldn't be compressed, otherwise data is lost
as the code currently makes the whole array have the type of the first
element:

css = {NULL, NULL, ..., 0x5555568a7940, NULL};

{"name": "s390_css", "instance_id": 0, "vmsd_name": "s390_css",
"version": 1, "fields": [
    ...,
    {"name": "css", "array_len": 256, "type": "nullptr", "size": 1},
    ...,
]}

In the above, the valid pointer at position 254 got lost among the
compressed array of nullptr.

While we could disable the array compression when a NULL pointer is
found, the JSON part of the stream still makes part of downtime, so we
should avoid writing unecessary bytes to it.

Keep the array compression in place, but if NULL and non-NULL pointers
are mixed break the array into several type-contiguous pieces :

css = {NULL, NULL, ..., 0x5555568a7940, NULL};

{"name": "s390_css", "instance_id": 0, "vmsd_name": "s390_css",
"version": 1, "fields": [
     ...,
     {"name": "css", "array_len": 254, "type": "nullptr", "size": 1},
     {"name": "css", "type": "struct", "struct": {"vmsd_name": "s390_css_img", ... }, "size": 768},
     {"name": "css", "type": "nullptr", "size": 1},
     ...,
]}

Now each type-discontiguous region will become a new JSON entry. The
reader should interpret this as a concatenation of values, all part of
the same field.

Parsing the JSON with analyze-script.py now shows the proper data
being pointed to at the places where the pointer is valid and
"nullptr" where there's NULL:

"s390_css (14)": {
    ...
    "css": [
        "nullptr",
        "nullptr",
        ...
        "nullptr",
        {
            "chpids": [
            {
                "in_use": "0x00",
                "type": "0x00",
                "is_virtual": "0x00"
            },
            ...
            ]
        },
        "nullptr",
    }

Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20250109185249.23952-7-farosas@suse.de>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
(cherry picked from commit 35049eb0d2fc72bb8c563196ec75b4d6c13fce02)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

migration: Dump correct JSON format for nullptr replacement

QEMU plays a trick with null pointers inside an array of pointers in a VMSD
field.  See 07d4e69147 ("migration/vmstate: fix array of ptr with
nullptrs") for more details on why.  The idea makes sense in general, but
it may overlooked the JSON writer where it could write nothing in a
"struct" in the JSON hints section.

We hit some analyze-migration.py issues on s390 recently, showing that some
of the struct field contains nothing, like:

{"name": "css", "array_len": 256, "type": "struct", "struct": {}, "size": 1}

As described in details by Fabiano:

https://lore.kernel.org/r/87pll37cin.fsf@suse.de

It could be that we hit some null pointers there, and JSON was gone when
they're null pointers.

To fix it, instead of hacking around only at VMStateInfo level, do that
from VMStateField level, so that JSON writer can also be involved.  In this
case, JSON writer will replace the pointer array (which used to be a
"struct") to be the real representation of the nullptr field.

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20250109185249.23952-6-farosas@suse.de>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
(cherry picked from commit 9867c3a7ced12dd7519155c047eb2c0098a11c5f)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

migration: Rename vmstate_info_nullptr

Rename vmstate_info_nullptr from "uint64_t" to "nullptr". This vmstate
actually reads and writes just a byte, so the proper name would be
uint8. However, since this is a marker for a NULL pointer, it's
convenient to have a more explicit name that can be identified by the
consumers of the JSON part of the stream.

Change the name to "nullptr" and add support for it in the
analyze-migration.py script. Arbitrarily use the name of the type as
the value of the field to avoid the script showing 0x30 or '0', which
could be confusing for readers.

Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20250109185249.23952-5-farosas@suse.de>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
(cherry picked from commit f52965bf0eeee28e89933264f1a9dbdcdaa76a7e)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

s390x: Fix CSS migration

Commit a55ae46683 ("s390: move css_migration_enabled from machine to
css.c") disabled CSS migration globally instead of doing it
per-instance.

CC: Paolo Bonzini <pbonzini@redhat.com>
CC: qemu-stable@nongnu.org #9.1
Fixes: a55ae46683 ("s390: move css_migration_enabled from machine to css.c")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2704
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20250109185249.23952-8-farosas@suse.de>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
(cherry picked from commit c76ee1f6255c3988a9447d363bb17072f1ec84e1)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

migration: Fix parsing of s390 stream

The parsing for the S390StorageAttributes section is currently leaving
an unconsumed token that is later interpreted by the generic code as
QEMU_VM_EOF, cutting the parsing short.

The migration will issue a STATTR_FLAG_DONE between iterations, which
the script consumes correctly, but there's a final STATTR_FLAG_EOS at
.save_complete that the script is ignoring. Since the EOS flag is a
u64 0x1ULL and the stream is big endian, on little endian hosts a byte
read from it will be 0x0, the same as QEMU_VM_EOF.

Fixes: 81c2c9dd5d ("tests/qtest/migration-test: Fix analyze-migration.py for s390x")
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20250109185249.23952-4-farosas@suse.de>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
(cherry picked from commit 69d1f784569fdb950f2923c3b6d00d7c1b71acc1)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

migration: Remove unused argument in vmsd_desc_field_end

Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20250109185249.23952-3-farosas@suse.de>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
(cherry picked from commit 2aead53d39b828f8d9d0769ffa3579dadd64d846)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

migration: Add more error handling to analyze-migration.py

The analyze-migration script was seen failing in s390x in misterious
ways. It seems we're reaching the VMSDFieldStruct constructor without
any fields, which would indicate an empty .subsection entry, a
VMSTATE_STRUCT with no fields or a vmsd with no fields. We don't have
any of those, at least not without the unmigratable flag set, so this
should never happen.

Add some debug statements so that we can see what's going on the next
time the issue happens.

Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20250109185249.23952-2-farosas@suse.de>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
(cherry picked from commit 86bee9e0c761a3d0e67c43b44001fd752f894cb0)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

target/i386/cpu: Fix notes for CPU models

Fixes: 644e3c5d812 ("missing vmx features for Skylake-Server and Cascadelake-Server")
Signed-off-by: Han Han <hhan@redhat.com>
Reviewed-by: Chenyi Qiang <chenyi.qiang@intel.com>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(cherry picked from commit 93dcc9390e5ad0696ae7e9b7b3a5b08c2d1b6de6)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

docs: Correct release of TCG trace-events removal

TCG trace-events were deprecated before the v6.2 release,
and removed for v7.0.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(cherry picked from commit b4859e8f33a7d9c793a60395f792c10190cb4f78)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

target/loongarch: Use actual operand size with vbsrl check

Hardcoded 32 bytes is used for vbsrl emulation check, there is
problem when options lsx=on,lasx=off is used for vbsrl.v instruction
in TCG mode. It injects LASX exception rather LSX exception.

Here actual operand size is used.

Cc: qemu-stable@nongnu.org
Fixes: df97f338076 ("target/loongarch: Implement xvreplve xvinsve0 xvpickve")
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit d41989e7548397b469ec9c7be4cee699321a317e)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

config/targets: update aarch64_be-linux-user gdb XML list

Attempting to run the binary asserts when it can't find the XML entry.
We can fix it so we don't although I suspect other stuff is broken.

Fixes: https://gitlab.com/qemu-project/qemu/-/issues/2580
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20241023113406.1284676-13-alex.bennee@linaro.org>
(cherry picked from commit 591e848aca7af3b4d25af03ed5bd266c479054bf)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

s390x/s390-virtio-ccw: don't crash on weird RAM sizes

KVM is not happy when starting a VM with weird RAM sizes:

  # qemu-system-s390x --enable-kvm --nographic -m 1234K
  qemu-system-s390x: kvm_set_user_memory_region: KVM_SET_USER_MEMORY_REGION
    failed, slot=0, start=0x0, size=0x244000: Invalid argument
  kvm_set_phys_mem: error registering slot: Invalid argument
  Aborted (core dumped)

Let's handle that in a better way by rejecting such weird RAM sizes
right from the start:

  # qemu-system-s390x --enable-kvm --nographic -m 1234K
  qemu-system-s390x: ram size must be multiples of 1 MiB

Message-ID: <20241219144115.2820241-2-david@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Acked-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
(cherry picked from commit 14e568ab4836347481af2e334009c385f456a734)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

hw/intc/riscv_aplic: Fix APLIC in_clrip and clripnum write emulation

In the section "4.7 Precise effects on interrupt-pending bits"
of the RISC-V AIA specification defines that:

"If the source mode is Level1 or Level0 and the interrupt domain
is configured in MSI delivery mode (domaincfg.DM = 1):
The pending bit is cleared whenever the rectified input value is
low, when the interrupt is forwarded by MSI, or by a relevant
write to an in_clrip register or to clripnum."

Update the riscv_aplic_set_pending() to match the spec.

Fixes: bf31cf06eb ("hw/intc/riscv_aplic: Fix setipnum_le write emulation for APLIC MSI-mode")
Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20241029085349.30412-1-yongxuan.wang@sifive.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
(cherry picked from commit 0d0141fadc9063e527865ee420b2baf34e306093)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

target/i386: Reset TSCs of parked vCPUs too on VM reset

Since commit 5286c3662294 ("target/i386: properly reset TSC on reset")
QEMU writes the special value of "1" to each online vCPU TSC on VM reset
to reset it.

However parked vCPUs don't get that handling and due to that their TSCs
get desynchronized when the VM gets reset.
This in turn causes KVM to turn off PVCLOCK_TSC_STABLE_BIT in its exported
PV clock.
Note that KVM has no understanding of vCPU being currently parked.

Without PVCLOCK_TSC_STABLE_BIT the sched clock is marked unstable in
the guest's kvm_sched_clock_init().
This causes a performance regressions to show in some tests.

Fix this issue by writing the special value of "1" also to TSCs of parked
vCPUs on VM reset.

Reproducing the issue:
1) Boot a VM with "-smp 2,maxcpus=3" or similar

2) device_add host-x86_64-cpu,id=vcpu,node-id=0,socket-id=0,core-id=2,thread-id=0

3) Wait a few seconds

4) device_del vcpu

5) Inside the VM run:
# echo "t" >/proc/sysrq-trigger; dmesg | grep sched_clock_stable
Observe the sched_clock_stable() value is 1.

6) Reboot the VM

7) Once the VM boots once again run inside it:
# echo "t" >/proc/sysrq-trigger; dmesg | grep sched_clock_stable
Observe the sched_clock_stable() value is now 0.

Fixes: 5286c3662294 ("target/i386: properly reset TSC on reset")
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Link: https://lore.kernel.org/r/5a605a88e9a231386dc803c60f5fed9b48108139.1734014926.git.maciej.szmigiero@oracle.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 3f2a05b31ee9ce2ddb6c75a9bc3f5e7f7af9a76f)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

vhost-user: fix shared object return values

VHOST_USER_BACKEND_SHARED_OBJECT_ADD and
VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE state
in the spec that they return 0 for successful
operations, non-zero otherwise. However,
implementation relies on the return types
of the virtio-dmabuf library, with opposite
semantics (true if everything is correct,
false otherwise). Therefore, current
implementation violates the specification.

Revert the logic so that the implementation
of the vhost-user handling methods matches
the specification.

Fixes: 043e127a126bb3ceb5fc753deee27d261fd0c5ce
Fixes: 160947666276c5b7f6bca4d746bcac2966635d79
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Albert Esteve <aesteve@redhat.com>
Message-Id: <20241022124615.585596-1-aesteve@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit eea5aeef84e1b74f515b474d3a86377701f93750)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

meson.build: Disallow libnfs v6 to fix the broken macOS build

The macOS builds in our CI (and possibly other very recent distros)
are currently broken since the update to libnfs version 6 there.
That version apparently comes with a big API breakage. v5.0.3 was
the final release of the old API (see the libnfs commit here:
https://github.com/sahlberg/libnfs/commit/4379837 ).

Disallow version 6.x for now to get the broken CI job working
again. Once somebody had enough time to adapt our code in
block/nfs.c, we can revert this change again.

Message-ID: <20241218065157.209020-1-thuth@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit e2d98f257138b83b6a492d1da5847a7fe0930d10)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

hw/intc/arm_gicv3_its: Zero initialize local DTEntry etc structs

In the GICv3 ITS model, we have a common coding pattern which has a
local C struct like "DTEntry dte", which is a C representation of an
in-guest-memory data structure, and we call a function such as
get_dte() to read guest memory and fill in the C struct.  These
functions to read in the struct sometimes have cases where they will
leave early and not fill in the whole struct (for instance get_dte()
will set "dte->valid = false" and nothing else for the case where it
is passed an entry_addr implying that there is no L2 table entry for
the DTE).  This then causes potential use of uninitialized memory
later, for instance when we call a trace event which prints all the
fields of the struct.  Sufficiently advanced compilers may produce
-Wmaybe-uninitialized warnings about this, especially if LTO is
enabled.

Rather than trying to carefully separate out these trace events into
"only the 'valid' field is initialized" and "all fields can be
printed", zero-init all the structs when we define them. None of
these structs are large (the biggest is 24 bytes) and having
consistent behaviour is less likely to be buggy.

Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2718
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20241213182337.3343068-1-peter.maydell@linaro.org
(cherry picked from commit 9678b9c505725732353baefedb88b53c2eb8a184)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

pc-bios: add missing riscv64 descriptor

Without descriptor libvirt cannot discover the EDK II binaries via
the qemu:///system connection.

Signed-off-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com>
Message-ID: <20241212090059.94167-1-heinrich.schuchardt@canonical.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 74dc38d0c6c15fd57a5dee94125d13ac5b00491d)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(Mjt: context fix in pc-bios/descriptors/meson.build due to missing
v9.1.0-337-gb883fb93db "roms: Support compile the efi bios for loongarch")

roms: re-add edk2-basetools target

Needed to build ipxe nic roms.

Reported-by: Liu Jaloo <liu.jaloo@gmail.com>
Fixes: 22e11539e167 ("edk2: replace build scripts")
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-ID: <20241212084408.1390728-1-kraxel@redhat.com>
(cherry picked from commit 0f5715e4b5706b31b3550d8e6b88871e029c7823)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

x86/loader: only patch linux kernels

If the binary loaded via -kernel is *not* a linux kernel (in which
case protocol == 0), do not patch the linux kernel header fields.

It's (a) pointless and (b) might break binaries by random patching
and (c) changes the binary hash which in turn breaks secure boot
verification.

Background: OVMF happily loads and runs not only linux kernels but
any efi binary via direct kernel boot.

Note: Breaking the secure boot verification is a problem for linux
kernels too, but fixed that is left for another day ...

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-ID: <20240905141211.1253307-3-kraxel@redhat.com>
(cherry picked from commit 57e2cc9abf5da38f600354fe920ff20e719607b4)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

tcg/riscv: Fix StoreStore barrier generation

On RISC-V to StoreStore barrier corresponds
`fence w, w` not `fence r, r`

Cc: qemu-stable@nongnu.org
Fixes: efbea94c76b ("tcg/riscv: Add slowpath load and store instructions")
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Denis Tomashev <denis.tomashev@syntacore.com>
Signed-off-by: Roman Artemev <roman.artemev@syntacore.com>
Message-ID: <e2f2131e294a49e79959d4fa9ec02cf4@syntacore.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit b438362a142527b97b638b7f0f35ebe11911a8d5)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

tcg: Reset free_temps before tcg_optimize

When allocating new temps during tcg_optmize, do not re-use
any EBB temps that were used within the TB. We do not have
any idea what span of the TB in which the temp was live.

Introduce tcg_temp_ebb_reset_freed and use before tcg_optimize,
as well as replacing the equivalent in plugin_gen_inject and
tcg_func_start.

Cc: qemu-stable@nongnu.org
Fixes: fb04ab7ddd8 ("tcg/optimize: Lower TCG_COND_TST{EQ,NE} if unsupported")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2711
Reported-by: wannacu <wannacu2049@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit 04e006ab36a8565b92d4e21dd346367fbade7d74)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

9pfs: fix regression regarding CVE-2023-2861

The released fix for this CVE:

  f6b0de53fb8 ("9pfs: prevent opening special files (CVE-2023-2861)")

caused a regression with security_model=passthrough. When handling a
'Tmknod' request there was a side effect that 'Tmknod' request could fail
as 9p server was trying to adjust permissions:

  #6  close_if_special_file (fd=30) at ../hw/9pfs/9p-util.h:140
  #7  openat_file (mode=<optimized out>, flags=2228224,
      name=<optimized out>, dirfd=<optimized out>) at
      ../hw/9pfs/9p-util.h:181
  #8  fchmodat_nofollow (dirfd=dirfd@entry=31,
      name=name@entry=0x5555577ea6e0 "mysocket", mode=493) at
      ../hw/9pfs/9p-local.c:360
  #9  local_set_cred_passthrough (credp=0x7ffbbc4ace10, name=0x5555577ea6e0
      "mysocket", dirfd=31, fs_ctx=0x55555811f528) at
      ../hw/9pfs/9p-local.c:457
  #10 local_mknod (fs_ctx=0x55555811f528, dir_path=<optimized out>,
      name=0x5555577ea6e0 "mysocket", credp=0x7ffbbc4ace10) at
      ../hw/9pfs/9p-local.c:702
  #11 v9fs_co_mknod (pdu=pdu@entry=0x555558121140,
      fidp=fidp@entry=0x5555574c46c0, name=name@entry=0x7ffbbc4aced0,
      uid=1000, gid=1000, dev=<optimized out>, mode=49645,
      stbuf=0x7ffbbc4acef0) at ../hw/9pfs/cofs.c:205
  #12 v9fs_mknod (opaque=0x555558121140) at ../hw/9pfs/9p.c:3711

That's because server was opening the special file to adjust permissions,
however it was using O_PATH and it would have not returned the file
descriptor to guest. So the call to close_if_special_file() on that branch
was incorrect.

Let's lift the restriction introduced by f6b0de53fb8 such that it would
allow to open special files on host if O_PATH flag is supplied, not only
for 9p server's own operations as described above, but also for any client
'Topen' request.

It is safe to allow opening special files with O_PATH on host, because
O_PATH only allows path based operations on the resulting file descriptor
and prevents I/O such as read() and write() on that file descriptor.

Fixes: f6b0de53fb8 ("9pfs: prevent opening special files (CVE-2023-2861)")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2337
Reported-by: Dirk Herrendorfer <d.herrendoerfer@de.ibm.com>
Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Dirk Herrendorfer <d.herrendoerfer@de.ibm.com>
Message-Id: <E1tJWbk-007BH4-OB@kylie.crudebyte.com>
(cherry picked from commit d06a9d843fb65351e0e4dc42ba0c404f01ea92b3)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

target/riscv: Avoid bad shift in riscv_cpu_do_interrupt()

In riscv_cpu_do_interrupt() we use the 'cause' value we got out of
cs->exception as a shift value.  However this value can be larger
than 31, which means that "1 << cause" is undefined behaviour,
because we do the shift on an 'int' type.

This causes the undefined behaviour sanitizer to complain
on one of the check-tcg tests:

$ UBSAN_OPTIONS=print_stacktrace=1:abort_on_error=1:halt_on_error=1 ./build/clang/qemu-system-riscv64 -M virt -semihosting -display none -device loader,file=build/clang/tests/tcg/riscv64-softmmu/issue1060
../../target/riscv/cpu_helper.c:1805:38: runtime error: shift exponent 63 is too large for 32-bit type 'int'
    #0 0x55f2dc026703 in riscv_cpu_do_interrupt /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/clang/../../target/riscv/cpu_helper.c:1805:38
    #1 0x55f2dc3d170e in cpu_handle_exception /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/clang/../../accel/tcg/cpu-exec.c:752:9

In this case cause is RISCV_EXCP_SEMIHOST, which is 0x3f.

Use 1ULL instead to ensure that the shift is in range.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Fixes: 1697837ed9 ("target/riscv: Add M-mode virtual interrupt and IRQ filtering support.")
Fixes: 40336d5b1d ("target/riscv: Add HS-mode virtual interrupt and IRQ filtering support.")
Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20241128103831.3452572-1-peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit 5311599cdc48337f2f27b1b51a80d46d75b05ed0)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

hw/openrisc/openrisc_sim: keep serial@90000000 as default

We used to only have a single UART on the platform and it was located at
address 0x90000000. When the number of UARTs was increased to 4, the
first UART remained at it's location, but instead of being the first one
to be registered, it became the last.

This caused QEMU to pick 0x90000300 as the default UART, which broke
software that hardcoded the address of 0x90000000 and expected it's
output to be visible when the user configured only a single console.

This caused regressions[1] in the barebox test suite when updating to a
newer QEMU. As there seems to be no good reason to register the UARTs in
inverse order, let's register them by ascending address, so existing
software can remain oblivious to the additional UART ports.

Changing the order of uart registration alone breaks Linux which
was choosing the UART at 0x90000300 as the default for ttyS0.  To fix
Linux we fix three things in the device tree:

1. Define stdout-path only one time for the first registered UART
    instead of incorrectly defining for each UART.
2. Change the UART alias name from 'uart0' to 'serial0' as almost all
    Linux tty drivers look for an alias starting with "serial".
3. Add the UART nodes so they appear in the final DTB in the
    order starting with the lowest address and working upwards.

In summary these changes mean that the QEMU default UART (serial_hd(0))
is now setup where:

* serial_hd(0) is the lowest-address UART
* serial_hd(0) is listed first in the DTB
* serial_hd(0) is the /chosen/stdout-path one
* the /aliases/serial0 alias points at serial_hd(0)

[1]: https://lore.barebox.org/barebox/707e7c50-aad1-4459-8796-0cc54bab32e2@pengutronix.de/T/#m5da26e8a799033301489a938b5d5667b81cef6ad

Fixes: 777784bda468 ("hw/openrisc: support 4 serial ports in or1ksim")
Cc: qemu-stable@nongnu.org
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
[stafford: Change to serial0 alias and update change message, reverse
uart registration order]
Signed-off-by: Stafford Horne <shorne@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit 26dcf2be7e153defa289d20317707af034aca692)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

hw/nvme: take a reference on the subsystem on vf realization

Make sure we grab a reference on the subsystem when a VF is realized.
Otherwise, the subsytem will be unrealized automatically when the VFs
are unregistered and unreffed.

This fixes a latent bug but was not exposed until commit 08f632848008
("pcie: Release references of virtual functions"). This was then fixed
(or rather, hidden) by commit c613ad25125b ("pcie_sriov: Do not manually
unrealize"), but that was then reverted (due to other issues) in commit
b0fdaee5d1ed, exposing the bug yet again.

Cc: qemu-stable@nongnu.org
Fixes: 08f632848008 ("pcie: Release references of virtual functions")
Reviewed-by: Jesper Wendel Devantier <foss@defmacro.it>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
(cherry picked from commit 6651f8f2e5051f6750c2534ab3151339b3c476a2)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

hw/nvme: fix msix_uninit with exclusive bar

Commit fa905f65c554 introduced a machine compatibility parameter to
enable an exclusive bar for msix. It failed to account for this when
cleaning up. Make sure that if an exclusive bar is enabled, we use the
proper cleanup routine.

Cc: qemu-stable@nongnu.org
Fixes: fa905f65c554 ("hw/nvme: add machine compatibility parameter to enable msix exclusive bar")
Reviewed-by: Jesper Wendel Devantier <foss@defmacro.it>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
(cherry picked from commit 9162f101257639cc4c7e20f72f77268b1256dd79)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

target/ppc: Fix THREAD_SIBLING_FOREACH for multi-socket

The THREAD_SIBLING_FOREACH macro wasn't excluding threads from other
chips. Add chip_index field to the thread state and add a check for the
new field in the macro.

Fixes: b769d4c8f4c6 ("target/ppc: Add initial flags and helpers for SMT support")
Signed-off-by: Glenn Miles <milesg@linux.ibm.com>
[npiggin: set chip_index for spapr too]
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
(cherry picked from commit 2fc0a78a57731fda50d5b01e16fd68681900f709)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

target/ppc: Fix non-maskable interrupt while halted

The ppc (pnv and spapr) NMI injection code does not go through the
asynchronous interrupt path and set a bit in env->pending_interrupts
and raise an interrupt request that the cpu_exec() loop can see.
Instead it injects the exception directly into registers.

This can lead to cpu_exec() missing that the thread has work to do,
if a NMI is injected while it was idle.

Fix this by clearing halted when injecting the interrupt. Probably
NMI injection should be reworked to use the interrupt request interface,
but this seems to work as a minimal fix.

Fixes: 3431648272d3 ("spapr: Add support for new NMI interface")
Reviewed-by: Glenn Miles <milesg@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
(cherry picked from commit fa416ae6157a933ad3f7106090684759baaaf3c9)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

tests/9p: also check 'Tgetattr' in 'use-after-unlink' test

This verifies expected behaviour of previous bug fix patch.

Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <7017658155c517b9665b75333a97c79aa2d4f3df.1732465720.git.qemu_oss@crudebyte.com>
(cherry picked from commit eaab44ccc59b83d8dff60fca3361a9b98ec7fee6)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

9pfs: fix 'Tgetattr' after unlink

With a valid file ID (FID) of an open file, it should be possible to send
a 'Tgettattr' 9p request and successfully receive a 'Rgetattr' response,
even if the file has been removed in the meantime. Currently this would
fail with ENOENT.

I.e. this fixes the following misbehaviour with a 9p Linux client:

  open("/home/tst/filename", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
  unlink("/home/tst/filename") = 0
  fstat(3, 0x23aa1a8) = -1 ENOENT (No such file or directory)

Expected results:

  open("/home/tst/filename", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
  unlink("/home/tst/filename") = 0
  fstat(3, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0

This is because 9p server is always using a path name based lstat() call
which fails as soon as the file got removed. So to fix this, use fstat()
whenever we have an open file descriptor already.

Fixes: 00ede4c2529b ("virtio-9p: getattr server implementation...")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/103
Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <4c41ad47f449a5cc8bfa9285743e029080d5f324.1732465720.git.qemu_oss@crudebyte.com>
(cherry picked from commit c81e7219e0736f80bfd3553676a19e2992cff41d)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

9pfs: remove obsolete comment in v9fs_getattr()

The comment claims that we'd only support basic Tgetattr fields. This is
no longer true, so remove this comment.

Fixes: e06a765efbe3 ("hw/9pfs: Add st_gen support in getattr reply")
Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <fb364d12045217a4c6ccd0dd6368103ddb80698b.1732465720.git.qemu_oss@crudebyte.com>
(cherry picked from commit 3bc4db44430f53387d17145bb52b330a830a03fe)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(Mjt: pick it for stable so the next commit applies cleanly)

tests/9p: add 'use-after-unlink' test

After removing a file from the file system, we should still be able to
work with the file if we already had it open before removal.

As a first step we verify that it is possible to write to an unlinked
file, as this is what already works. This test is extended later on
after having fixed other use cases after unlink that are not working
yet.

Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <3d6449d4df25bcdd3e807eff169f46f1385e5257.1732465720.git.qemu_oss@crudebyte.com>
(cherry picked from commit 462db8fb1d405391b83a0d3099fdb9bfb85c2d92)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(Mjt: pick it to stable so the next commit in this place applies too)

tests/9p: add missing Rgetattr response name

'Tgetattr' 9p request and its 'Rgetattr' response types are already used
by test client, however this response type is yet missing in function
rmessage_name(), so add it.

Fixes: a6821b828404 ("tests/9pfs: compare QIDs in fs_walk_none() test")
Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <e183da80d390cfd7d55bdbce92f0ff6e3e5cdced.1732465720.git.qemu_oss@crudebyte.com>
(cherry picked from commit 4ec984965079b51a9afce339af75edea6de973a2)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

tests/9p: fix Rreaddir response name

All 9p response types are prefixed with an "R", therefore fix
"READDIR" -> "RREADDIR" in function rmessage_name().

Fixes: 4829469fd9ff ("tests/virtio-9p: added readdir test")
Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <daad7af58b403aaa2487c566032beca36664b30e.1732465720.git.qemu_oss@crudebyte.com>
(cherry picked from commit abf0f092c1dd33b9ffa986c6924addc0a9c1d0b8)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

scsi: megasas: Internal cdbs have 16-byte length

Host drivers do not necessarily set cdb_len in megasas io commands.
With commits 6d1511cea0 ("scsi: Reject commands if the CDB length
exceeds buf_len") and fe9d8927e2 ("scsi: Add buf_len parameter to
scsi_req_new()"), this results in failures to boot Linux from affected
SCSI drives because cdb_len is set to 0 by the host driver.
Set the cdb length to its actual size to solve the problem.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Fiona Ebner <f.ebner@proxmox.com>
Link: https://lore.kernel.org/r/20230228171129.4094709-1-linux@roeck-us.net
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 3abb67323aeecf06a27191076ab50424ec21f334)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

ppc/spapr: fix drc index mismatch for partially enabled vcpus

In case when vcpus are explicitly enabled/disabled in a non-consecutive
order within a libvirt xml, it results in a drc index mismatch during
vcpu hotplug later because the existing logic uses vcpu id to derive the
corresponding drc index which is not correct. Use env->core_index to
derive a vcpu's drc index as appropriate to fix this issue.

For ex, for the given libvirt xml config:
  <vcpus>
    <vcpu id='0' enabled='yes' hotpluggable='no'/>
    <vcpu id='1' enabled='yes' hotpluggable='yes'/>
    <vcpu id='2' enabled='no' hotpluggable='yes'/>
    <vcpu id='3' enabled='yes' hotpluggable='yes'/>
    <vcpu id='4' enabled='no' hotpluggable='yes'/>
    <vcpu id='5' enabled='yes' hotpluggable='yes'/>
    <vcpu id='6' enabled='no' hotpluggable='yes'/>
    <vcpu id='7' enabled='no' hotpluggable='yes'/>
  </vcpus>

We see below error on guest console with "virsh setvcpus <domain> 5" :

pseries-hotplug-cpu: CPU with drc index 10000002 already exists

This patch fixes the issue by using correct drc index for explicitly
enabled vcpus during init.

Reported-by: Anushree Mathur <anushree.mathur@linux.vnet.ibm.com>
Tested-by: Anushree Mathur <anushree.mathur@linux.vnet.ibm.com>
Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
(cherry picked from commit e8185fdc63e1db1efba695aae568fae8a075a815)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

virtio-net: Add queues before loading them

Call virtio_net_set_multiqueue() to add queues before loading their
states. Otherwise the loaded queues will not have handlers and elements
in them will not be processed.

Cc: qemu-stable@nongnu.org
Fixes: 8c49756825da ("virtio-net: Add only one queue pair when realizing")
Reported-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit 9379ea9db3c0064fa2787db0794a23a30f7b2d2d)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

migration: Allow pipes to keep working for fd migrations

Libvirt may still use pipes for old file migrations in fd: URI form,
especially when loading old images dumped from Libvirt's compression
algorithms.

In that case, Libvirt needs to compress / uncompress the images on its own
over the migration binary stream, and pipes are passed over to QEMU for
outgoing / incoming migrations in "fd:" URIs.

For future such use case, it should be suggested to use mapped-ram when
saving such VM image. However there can still be old images that was
compressed in such way, so libvirt needs to be able to load those images,
uncompress them and use the same pipe mechanism to pass that over to QEMU.

It means, even if new file migrations can be gradually moved over to
mapped-ram (after Libvirt start supporting it), Libvirt still needs the
uncompressor for the old images to be able to load like before.

Meanwhile since Libvirt currently exposes the compression capability to
guest images, it may needs its own lifecycle management to move that over
to mapped-ram, maybe can be done after mapped-ram saved the image, however
Dan and PeterK raised concern on temporary double disk space consumption.
I suppose for now the easiest is to enable pipes for both sides of "fd:"
migrations, until all things figured out from Libvirt side on how to move
on.

And for "channels" QMP interface support on "migrate" / "migrate-incoming"
commands, we'll also need to move away from pipe. But let's leave that for
later too.

So far, still allow pipes to happen like before on both save/load sides,
just like we would allow sockets to pass.

Cc: qemu-stable <qemu-stable@nongnu.org>
Cc: Fabiano Rosas <farosas@suse.de>
Cc: Peter Krempa <pkrempa@redhat.com>
Cc: Daniel P. Berrangé <berrange@redhat.com>
Fixes: c55deb860c ("migration: Deprecate fd: for file migration")
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20241120160132.3659735-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
(cherry picked from commit 87ae45e602e2943d58509e470e3a1d4ba084ab2f)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

plugins: add missing export for qemu_plugin_num_vcpus

Fixes: 4a448b148ca ("plugins: add qemu_plugin_num_vcpus function")
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-Id: <20241112212622.3590693-2-pierrick.bouvier@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20241121165806.476008-36-alex.bennee@linaro.org>
(cherry picked from commit cfa3a6c54511374e9ccee26d9c38ac1698fc7af2)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

ssh: Do not switch session to non-blocking mode

The libssh does not handle non-blocking mode in SFTP correctly. The
driver code already changes the mode to blocking for the SFTP
initialization, but for some reason changes to non-blocking mode.
This used to work accidentally until libssh in 0.11 branch merged
the patch to avoid infinite looping in case of network errors:

https://gitlab.com/libssh/libssh-mirror/-/merge_requests/498

Since then, the ssh driver in qemu fails to read files over SFTP
as the first SFTP messages exchanged after switching the session
to non-blocking mode return SSH_AGAIN, but that message is lost
int the SFTP internals and interpretted as SSH_ERROR, which is
returned to the caller:

https://gitlab.com/libssh/libssh-mirror/-/issues/280

This is indeed an issue in libssh that we should address in the
long term, but it will require more work on the internals. For
now, the SFTP is not supported in non-blocking mode.

Fixes: https://gitlab.com/libssh/libssh-mirror/-/issues/280
Signed-off-by: Jakub Jelen <jjelen@redhat.com>
Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Message-ID: <20241113125526.2495731-1-rjones@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit fbdea3d6c13d5a75895c287a004c6f1a6bf6c164)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

qdev: Fix set_pci_devfn() to visit option only once

pci_devfn properties accept either a string or an integer as input. To
implement this, set_pci_devfn() first tries to visit the option as a
string, and if that fails, it visits it as an integer instead. While the
QemuOpts visitor happens to accept this, it is invalid according to the
visitor interface. QObject input visitors run into an assertion failure
when this is done.

QObject input visitors are used with the JSON syntax version of -device
on the command line:

$ ./qemu-system-x86_64 -enable-kvm -M q35 -device pcie-pci-bridge,id=pci.1,bus=pcie.0 -blockdev null-co,node-name=disk -device '{ "driver": "virtio-blk-pci", "drive": "disk", "id": "virtio-disk0", "bus": "pci.1", "addr": 1 }'
qemu-system-x86_64: ../qapi/qobject-input-visitor.c:143: QObject *qobject_input_try_get_object(QObjectInputVisitor *, const char *, _Bool): Assertion `removed' failed.

The proper way to accept both strings and integers is using the
alternate mechanism, which tells us the type of the input before it's
visited. With this information, we can directly visit it as the right
type.

This fixes set_pci_devfn() by using the alternate mechanism.

Cc: qemu-stable@nongnu.org
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20241119120353.57812-1-kwolf@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 5102f9df4a9a7adfbd902f9515c3f8f53dba288e)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

virtio-net: Fix size check in dhclient workaround

work_around_broken_dhclient() accesses IP and UDP headers to detect
relevant packets and to calculate checksums, but it didn't check if
the packet has size sufficient to accommodate them, causing out-of-bound
access hazards. Fix this by correcting the size requirement.

Fixes: 1d41b0c1ec66 ("Work around dhclient brokenness")
Cc: qemu-stable@nongnu.org
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit a8575f7fb2f213e6690b23160b04271d47fdfaa8)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

linux-user: Fix strace output for s390x mmap()

print_mmap() assumes that mmap() receives arguments via memory if
mmap2() is present. s390x (as opposed to s390) does not fit this
pattern: it does not have mmap2(), but mmap() still receives arguments
via memory.

Fix by sharing the detection logic between syscall.c and strace.c.

Cc: qemu-stable@nongnu.org
Fixes: d971040c2d16 ("linux-user: Fix strace output for old_mmap")
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Message-ID: <20241120212717.246186-1-iii@linux.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit d95fd9838b540e69da9b07538ec8ad6ab9eab260)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(Mjt: compensate for chris architecture removal by v9.1.0-282-gbff4b02ca1f4
"linux-user: Remove support for CRIS target")

hw/intc/loongarch_extioi: Use set_bit32() and clear_bit32() for s->isr

In extioi_setirq() we try to operate on a bit array stored as an
array of uint32_t using the set_bit() and clear_bit() functions
by casting the pointer to 'unsigned long *'.
This has two problems:
* the alignment of 'uint32_t' is less than that of 'unsigned long'
   so we pass an insufficiently aligned pointer, which is
   undefined behaviour
* on big-endian hosts the 64-bit 'unsigned long' will have
   its two halves the wrong way around, and we will produce
   incorrect results

The undefined behaviour is shown by the clang undefined-behaviour
sanitizer when running the loongarch64-virt functional test:

/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/include/qemu/bitops.h:41:5: runtime error: store to misaligned address 0x555559745d9c for type 'unsigned long', which requires 8 byte alignment
0x555559745d9c: note: pointer points here
  ff ff ff ff 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
              ^
    #0 0x555556fb81c4 in set_bit /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/include/qemu/bitops.h:41:9
    #1 0x555556fb81c4 in extioi_setirq /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/clang/../../hw/intc/loongarch_extioi.c:65:9
    #2 0x555556fb6e90 in pch_pic_irq_handler /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/clang/../../hw/intc/loongarch_pch_pic.c:75:5
    #3 0x555556710265 in serial_ioport_write /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/clang/../../hw/char/serial.c

Fix these problems by using set_bit32() and clear_bit32(),
which work with bit arrays stored as an array of uint32_t.

Cc: qemu-stable@nongnu.org
Fixes: cbff2db1e92f8759 ("hw/intc: Add LoongArch extioi interrupt controller(EIOINTC)")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Message-id: 20241108135514.4006953-4-peter.maydell@linaro.org
(cherry picked from commit 335be5bc44aa6800a9e3ba5859ea3833cfe5a7bc)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

bitops.h: Define bit operations on 'uint32_t' arrays

Currently bitops.h defines a set of operations that work on
arbitrary-length bit arrays.  However (largely because they
originally came from the Linux kernel) the bit array storage is an
array of 'unsigned long'.  This is OK for the kernel and even for
parts of QEMU where we don't really care about the underlying storage
format, but it is not good for devices, where we often want to expose
the storage to the guest and so need a type that is not
variably-sized between host OSes.

We already have a workaround for this in the GICv3 model:
arm_gicv3_common.h defines equivalents of the bit operations that
work on uint32_t.  It turns out that we should also be using
something similar in hw/intc/loongarch_extioi.c, which currently
casts a pointer to a uint32_t array to 'unsigned long *' in
extio_setirq(), which is both undefined behaviour and not correct on
a big-endian host.

Define equivalents of the set_bit() function family which work
with a uint32_t array.

(Cc stable because we're about to provide a bugfix to
loongarch_extioi which will depend on this commit.)

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20241108135514.4006953-2-peter.maydell@linaro.org
(cherry picked from commit 3d7680fb18c7b17701730589d241a32e85f763a3)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

hw/intc/openpic: Avoid taking address of out-of-bounds array index

The clang sanitizer complains about the code in the EOI handling
of openpic_cpu_write_internal():

UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1 ./build/clang/qemu-system-ppc -M mac99,graphics=off -display none -kernel day15/invaders.elf
../../hw/intc/openpic.c:1034:16: runtime error: index -1 out of bounds for type 'IRQSource[264]' (aka 'struct IRQSource[264]')
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../../hw/intc/openpic.c:1034:16 in

This is because we do
src = &opp->src[n_IRQ];
when n_IRQ may be -1. This is in practice harmless because if n_IRQ
is -1 then we don't do anything with the src pointer, but it is
undefined behaviour. (This has been present since this device
was first added to QEMU.)

Rearrange the code so we only do the array index when n_IRQ is not -1.

Cc: qemu-stable@nongnu.org
Fixes: e9df014c0b ("Implement embedded IRQ controller for PowerPC 6xx/740 & 75")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-id: 20241105180205.3074071-1-peter.maydell@linaro.org
(cherry picked from commit 3bf7dcd47a3da0e86a9347ce5b2b5d5a1dcb5857)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

Update version for 9.1.2 release

Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

usb-hub: Fix handling port power control messages

The ClearPortFeature control message fails for PORT_POWER because there
is no break; at the end of the case statement, causing it to fall through
to the failure handler. Add the missing break; to solve the problem.

Fixes: 1cc403eb21 ("usb-hub: emulate per port power switching")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20241112170152.217664-11-linux@roeck-us.net>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit b2cc69997924b651c0c6f4037782e25f2e438715)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

hw/audio/hda: fix memory leak on audio setup

When SET_STREAM_FORMAT is called, the st->buft timer is overwritten, thus
causing a memory leak.  This was originally fixed in commit 816139ae6a5
("hw/audio/hda: fix memory leak on audio setup", 2024-11-14) but that
caused the audio to break in SPICE.

Fortunately, a simpler fix is possible.  The timer only needs to be
reset, because the callback is always the same (st->output is set at
realize time in hda_audio_init); call to timer_new_ns overkill.  Replace
it with timer_del and only initialize the timer once; for simplicity,
do it even if use_timer is false.

An even simpler fix would be to free the old time in hda_audio_setup().
However, it seems better to place the initialization of the timer close
to that of st->ouput.

Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Message-ID: <20241114125318.1707590-3-pbonzini@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit 626b39006d2f9b1378a04cb88a2187bb852cb055)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

Revert "hw/audio/hda: fix memory leak on audio setup"

This reverts commit 6d03242a7e47815ed56687ecd13f683d8da3f2fe,
which causes SPICE audio to break. While arguably this is a SPICE bug,
it is possible to fix the leak in a less heavy-handed way.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2639
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Message-ID: <20241114125318.1707590-2-pbonzini@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit e125d9835b89545b09c0367404dcf69f18ae6de1)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

hw/misc/mos6522: Fix bad class definition of the MOS6522 device

When compiling QEMU with --enable-cfi, the "q800" m68k machine
currently crashes very early, when the q800_machine_init() function
tries to wire the interrupts of the "via1" device.
This happens because TYPE_MOS6522_Q800_VIA1 is supposed to be a
proper SysBus device, but its parent (TYPE_MOS6522) has a mistake
in its class definition where it is only derived from DeviceClass,
and not from SysBusDeviceClass, so we end up in funny memory access
issues here. Using the right class hierarchy for the MOS6522 device
fixes the problem.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2675
Signed-off-by: Thomas Huth <thuth@redhat.com>
Fixes: 51f233ec92 ("misc: introduce new mos6522 VIA device")
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-ID: <20241114104653.963812-1-thuth@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit c3d7c18b0d616cf7fb3c1f325503e1462307209d)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

vfio/container: Fix container object destruction

When commit 96b7af4388b3 intoduced a .instance_finalize() handler,
it did not take into account that the container was not necessarily
inserted into the container list of the address space. Hence, if
the container object is destroyed, by calling object_unref() for
example, before vfio_address_space_insert() is called, QEMU may
crash when removing the container from the list as done in
vfio_container_instance_finalize(). This was seen with an SEV-SNP
guest for which discarding of RAM fails.

To resolve this issue, use the safe version of QLIST_REMOVE().

Cc: Zhenzhong Duan <zhenzhong.duan@intel.com>
Cc: Eric Auger <eric.auger@redhat.com>
Fixes: 96b7af4388b3 ("vfio/container: Move vfio_container_destroy() to an instance_finalize() handler")
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
(cherry picked from commit ebbf7c60bbd1ceedf9faf962e428ceda2388c248)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

target/i386: fix hang when using slow path for ptw_setl

When instrumenting memory accesses for plugin, we force memory accesses
to use the slow path for mmu [1]. This create a situation where we end
up calling ptw_setl_slow. This was fixed recently in [2] but the issue
still could appear out of plugins use case.

Since this function gets called during a cpu_exec, start_exclusive then
hangs. This exclusive section was introduced initially for security
reasons [3].

I suspect this code path was never triggered, because ptw_setl_slow
would always be called transitively from cpu_exec, resulting in a hang.

[1] https://gitlab.com/qemu-project/qemu/-/commit/6d03226b42247b68ab2f0b3663e0f624335a4055
[2] https://gitlab.com/qemu-project/qemu/-/commit/115ade42d50144c15b74368d32dc734ea277d853
[2] https://gitlab.com/qemu-project/qemu/-/commit/3a41aa8226bdaa709121515faea6e0e5ad1efa39 in 9.1.x series
[3] https://gitlab.com/qemu-project/qemu/-/issues/279

Fixes: https://gitlab.com/qemu-project/qemu/-/issues/2566
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20241025175857.2554252-2-pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit 7ba055b49b74c4d2f4a338c5198485bdff373fb1)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(Mjt: mention [2] in 9.1.x series)

tcg: Allow top bit of SIMD_DATA_BITS to be set in simd_desc()

In simd_desc() we create a SIMD descriptor from various pieces
including an arbitrary data value from the caller.  We try to
sanitize these to make sure everything will fit: the 'data' value
needs to fit in the SIMD_DATA_BITS (== 22) sized field.  However we
do that sanitizing with:
   tcg_debug_assert(data == sextract32(data, 0, SIMD_DATA_BITS));

This works for the case where the data is supposed to be considered
as a signed integer (which can then be returned via simd_data()).
However, some callers want to treat the data value as unsigned.

Specifically, for the Arm SVE operations, make_svemte_desc()
assembles a data value as a collection of fields, and it needs to use
all 22 bits.  Currently if MTE is enabled then its MTEDESC SIZEM1
field may have the most significant bit set, and then it will trip
this assertion.

Loosen the assertion so that we only check that the data value will
fit into the field in some way, either as a signed or as an unsigned
value.  This means we will fail to detect some kinds of bug in the
callers, but we won't spuriously assert for intentional use of the
data field as unsigned.

Cc: qemu-stable@nongnu.org
Fixes: db432672dc50e ("tcg: Add generic vector expanders")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2601
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-ID: <20241115172515.1229393-1-peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit 8377e3fb854d126ba10e61cb6b60885af8443ad4)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

linux-user/arm: Select vdso for be8 and be32 modes

In be8 mode, instructions are little-endian.
In be32 mode, instructions are big-endian.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2333
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit 95c9e2209cc09453cfd49e91321df254ccbf466f)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

linux-user/arm: Reduce vdso alignment to 4k

Reduce vdso alignment to minimum page size.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit f7150b2151398c9274686d06c2c1e24618aa4cd6)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

linux-user: Tolerate CONFIG_LSM_MMAP_MIN_ADDR

Running qemu-i386 on a system running with SELinux in enforcing mode
(more precisely: s390x trixie container on Fedora 40) fails with:

qemu-i386: tests/tcg/i386-linux-user/sigreturn-sigmask: Unable to find a guest_base to satisfy all guest address mapping requirements
00000000-ffffffff

The reason is that main() determines mmap_min_addr from
/proc/sys/vm/mmap_min_addr, but SELinux additionally defines
CONFIG_LSM_MMAP_MIN_ADDR, which is normally larger: 32K or 64K, but,
in general, can be anything. There is no portable way to query its
value: /boot/config, /proc/config and /proc/config.gz are distro- and
environment-specific.

Once the identity map fails, the magnitude of guest_base does not
matter, so fix by starting the search from 1M or 1G.

Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2598
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Message-ID: <20241023002558.34589-1-iii@linux.ibm.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit fb7f3572b111ffb6c2dd2c7f6c5b4dc57dd8a3f5)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

accel/tcg: Fix user-only probe_access_internal plugin check

The acc_flag check for write should have been against PAGE_WRITE_ORG,
not PAGE_WRITE. But it is better to combine two acc_flag checks
to a single check against access_type. This matches the system code
in cputlb.c.

Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2647
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: 20241111145002.144995-1-richard.henderson@linaro.org
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
(cherry picked from commit 2a339fee450638b512c5122281cb5ab49331cfb8)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

target/arm: Drop user-only special case in sve_stN_r

This path is reachable with plugins enabled, and provoked
with run-plugin-catch-syscalls-with-libinline.so.

Cc: qemu-stable@nongnu.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20241112141232.321354-1-richard.henderson@linaro.org>
(cherry picked from commit f27550804688da43c6e0d87b2f9e143adbf76271)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

linux-user: Fix setreuid and setregid to use direct syscalls

The commit fd6f7798ac30 ("linux-user: Use direct syscalls for setuid(),
etc") added direct syscall wrappers for setuid(), setgid(), etc since the
system calls have different semantics than the libc functions.

Add and use the corresponding wrappers for setreuid and setregid which
were missed in that commit.

This fixes the build of the debian package of the uid_wrapper library
(https://cwrap.org/uid_wrapper.html) when running linux-user.

Cc: qemu-stable@nongnu.org
Signed-off-by: Helge Deller <deller@gmx.de>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Message-ID: <Zyo2jMKqq8hG8Pkz@p100>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit 8491026a08b417b2d4070f7c373dcb43134c5312)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

hw/i386/pc: Don't try to init PCI NICs if there is no PCI bus

The 'isapc' machine type has no PCI bus, but pc_nic_init() still
calls pci_init_nic_devices() passing it a NULL bus pointer. This
causes the clang sanitizer to complain:

$ ./build/clang/qemu-system-i386 -M isapc
../../hw/pci/pci.c:1866:39: runtime error: member access within null pointer of type 'PCIBus' (aka 'struct PCIBus')
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../../hw/pci/pci.c:1866:39 in

This is because pci_init_nic_devices() does
&bus->qbus
which is undefined behaviour on a NULL pointer even though we're not
actually dereferencing the pointer. (We don't actually crash as
a result, so if you aren't running a sanitizer build then there
are no user-visible effects.)

Make pc_nic_init() avoid trying to initialize PCI NICs on a non-PCI
system.

Cc: qemu-stable@nongnu.org
Fixes: 8d39f9ba14d64 ("hw/i386/pc: use qemu_get_nic_info() and pci_init_nic_devices()")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Link: https://lore.kernel.org/r/20241105171813.3031969-1-peter.maydell@linaro.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit bd0e501e1a4813fa36a4cf9842aaf430323a03c3)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

target/i386: Fix legacy page table walk

Commit b56617bbcb4 ("target/i386: Walk NPT in guest real mode") added
logic to run the page table walker even in real mode if we are in NPT
mode.  That function then determined whether real mode or paging is
active based on whether the pg_mode variable was 0.

Unfortunately pg_mode is 0 in two situations:

  1) Paging is disabled (real mode)
  2) Paging is in 2-level paging mode (32bit without PAE)

That means the walker now assumed that 2-level paging mode was real
mode, breaking NetBSD as well as Windows XP.

To fix that, this patch adds a new PG flag to pg_mode which indicates
whether paging is active at all and uses that to determine whether we
are in real mode or not.

Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2654
Fixes: b56617bbcb4 ("target/i386: Walk NPT in guest real mode")
Fixes: 01bfc2e2959 (commit b56617bbcb4 in stable-9.1.x series)
Signed-off-by: Alexander Graf <graf@amazon.com>
Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Link: https://lore.kernel.org/r/20241106154329.67218-1-graf@amazon.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 8fa11a4df344f58375eb26b3b65004345f21ef37)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

9pfs: fix crash on 'Treaddir' request

A bad (broken or malicious) 9p client (guest) could cause QEMU host to
crash by sending a 9p 'Treaddir' request with a numeric file ID (FID) that
was previously opened for a file instead of an expected directory:

  #0  0x0000762aff8f4919 in __GI___rewinddir (dirp=0xf) at
    ../sysdeps/unix/sysv/linux/rewinddir.c:29
  #1  0x0000557b7625fb40 in do_readdir_many (pdu=0x557bb67d2eb0,
    fidp=0x557bb67955b0, entries=0x762afe9fff58, offset=0, maxsize=131072,
    dostat=<optimized out>) at ../hw/9pfs/codir.c:101
  #2  v9fs_co_readdir_many (pdu=pdu@entry=0x557bb67d2eb0,
    fidp=fidp@entry=0x557bb67955b0, entries=entries@entry=0x762afe9fff58,
    offset=0, maxsize=131072, dostat=false) at ../hw/9pfs/codir.c:226
  #3  0x0000557b7625c1f9 in v9fs_do_readdir (pdu=0x557bb67d2eb0,
    fidp=0x557bb67955b0, offset=<optimized out>,
    max_count=<optimized out>) at ../hw/9pfs/9p.c:2488
  #4  v9fs_readdir (opaque=0x557bb67d2eb0) at ../hw/9pfs/9p.c:2602

That's because V9fsFidOpenState was declared as union type. So the
same memory region is used for either an open POSIX file handle (int),
or a POSIX DIR* pointer, etc., so 9p server incorrectly used the
previously opened (valid) POSIX file handle (0xf) as DIR* pointer,
eventually causing a crash in glibc's rewinddir() function.

Root cause was therefore a missing check in 9p server's 'Treaddir'
request handler, which must ensure that the client supplied FID was
really opened as directory stream before trying to access the
aforementioned union and its DIR* member.

Cc: qemu-stable@nongnu.org
Fixes: d62dbb51f7 ("virtio-9p: Add fidtype so that we can do type ...")
Reported-by: Akihiro Suda <suda.kyoto@gmail.com>
Tested-by: Akihiro Suda <suda.kyoto@gmail.com>
Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <E1t8GnN-002RS8-E2@kylie.crudebyte.com>
(cherry picked from commit 042b4ebfd2298ae01553844124f27d651cdb1071)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

hw/nvme: fix handling of over-committed queues

If a host chooses to use the SQHD "hint" in the CQE to know if there is
room in the submission queue for additional commands, it may result in a
situation where there are not enough internal resources (struct
NvmeRequest) available to process the command. For a lack of a better
term, the host may "over-commit" the device (i.e., it may have more
inflight commands than the queue size).

For example, assume a queue with N entries. The host submits N commands
and all are picked up for processing, advancing the head and emptying
the queue. Regardless of which of these N commands complete first, the
SQHD field of that CQE will indicate to the host that the queue is
empty, which allows the host to issue N commands again. However, if the
device has not posted CQEs for all the previous commands yet, the device
will have less than N resources available to process the commands, so
queue processing is suspended.

And here lies an 11 year latent bug. In the absense of any additional
tail updates on the submission queue, we never schedule the processing
bottom-half again unless we observe a head update on an associated full
completion queue. This has been sufficient to handle N-to-1 SQ/CQ setups
(in the absense of over-commit of course). Incidentially, that "kick all
associated SQs" mechanism can now be killed since we now just schedule
queue processing when we return a processing resource to a non-empty
submission queue, which happens to cover both edge cases. However, we
must retain kicking the CQ if it was previously full.

So, apparently, no previous driver tested with hw/nvme has ever used
SQHD (e.g., neither the Linux NVMe driver or SPDK uses it). But then OSv
shows up with the driver that actually does. I salute you.

Fixes: f3c507adcd7b ("NVMe: Initial commit for new storage interface")
Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2388
Reported-by: Waldemar Kozaczuk <jwkozaczuk@gmail.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
(cherry picked from commit 9529aa6bb4d18763f5b4704cb4198bd25cbbee31)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

migration: Ensure vmstate_save() sets errp

migration/savevm.c contains some calls to vmstate_save() that are
followed by migrate_set_error() if the integer return value indicates an
error.  migrate_set_error() requires that the `Error *` object passed to
it is set.  Therefore, vmstate_save() is assumed to always set *errp on
error.

Right now, that assumption is not met: vmstate_save_state_v() (called
internally by vmstate_save()) will not set *errp if
vmstate_subsection_save() or vmsd->post_save() fail.  Fix that by adding
an *errp parameter to vmstate_subsection_save(), and by generating a
generic error in case post_save() fails (as is already done for
pre_save()).

Without this patch, qemu will crash after vmstate_subsection_save() or
post_save() have failed inside of a vmstate_save() call (unless
migrate_set_error() then happen to discard the new error because
s->error is already set).  This happens e.g. when receiving the state
from a virtio-fs back-end (virtiofsd) fails.

Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Link: https://lore.kernel.org/r/20241015170437.310358-1-hreitz@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
(cherry picked from commit 37dfcba1a04989830c706f9cbc00450e5d3a7447)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

target/arm: Fix SVE SDOT/UDOT/USDOT (4-way, indexed)

Our implementation of the indexed version of SVE SDOT/UDOT/USDOT got
the calculation of the inner loop terminator wrong.  Although we
correctly account for the element size when we calculate the
terminator for the first iteration:
   intptr_t segend = MIN(16 / sizeof(TYPED), opr_sz_n);
we don't do that when we move it forward after the first inner loop
completes.  The intention is that we process the vector in 128-bit
segments, which for a 64-bit element size should mean (1, 2), (3, 4),
(5, 6), etc.  This bug meant that we would iterate (1, 2), (3, 4, 5,
6), (7, 8, 9, 10) etc and apply the wrong indexed element to some of
the operations, and also index off the end of the vector.

You don't see this bug if the vector length is small enough that we
don't need to iterate the outer loop, i.e.  if it is only 128 bits,
or if it is the 64-bit special case from AA32/AA64 AdvSIMD.  If the
vector length is 256 bits then we calculate the right results for the
elements in the vector but do index off the end of the vector. Vector
lengths greater than 256 bits see wrong answers. The instructions
that produce 32-bit results behave correctly.

Fix the recalculation of 'segend' for subsequent iterations, and
restore a version of the comment that was lost in the refactor of
commit 7020ffd656a5 that explains why we only need to clamp segend to
opr_sz_n for the first iteration, not the later ones.

Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2595
Fixes: 7020ffd656a5 ("target/arm: Macroize helper_gvec_{s,u}dot_idx_{b,h}")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20241101185544.2130972-1-peter.maydell@linaro.org
(cherry picked from commit e6b2fa1b81ac6b05c4397237c846a295a9857920)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

target/arm: Add new MMU indexes for AArch32 Secure PL1&0

Our current usage of MMU indexes when EL3 is AArch32 is confused.
Architecturally, when EL3 is AArch32, all Secure code runs under the
Secure PL1&0 translation regime:
* code at EL3, which might be Mon, or SVC, or any of the
   other privileged modes (PL1)
* code at EL0 (Secure PL0)

This is different from when EL3 is AArch64, in which case EL3 is its
own translation regime, and EL1 and EL0 (whether AArch32 or AArch64)
have their own regime.

We claimed to be mapping Secure PL1 to our ARMMMUIdx_EL3, but didn't
do anything special about Secure PL0, which meant it used the same
ARMMMUIdx_EL10_0 that NonSecure PL0 does.  This resulted in a bug
where arm_sctlr() incorrectly picked the NonSecure SCTLR as the
controlling register when in Secure PL0, which meant we were
spuriously generating alignment faults because we were looking at the
wrong SCTLR control bits.

The use of ARMMMUIdx_EL3 for Secure PL1 also resulted in the bug that
we wouldn't honour the PAN bit for Secure PL1, because there's no
equivalent _PAN mmu index for it.

Fix this by adding two new MMU indexes:
* ARMMMUIdx_E30_0 is for Secure PL0
* ARMMMUIdx_E30_3_PAN is for Secure PL1 when PAN is enabled
The existing ARMMMUIdx_E3 is used to mean "Secure PL1 without PAN"
(and would be named ARMMMUIdx_E30_3 in an AArch32-centric scheme).

These extra two indexes bring us up to the maximum of 16 that the
core code can currently support.

This commit:
* adds the new MMU index handling to the various places
   where we deal in MMU index values
* adds assertions that we aren't AArch32 EL3 in a couple of
   places that currently use the E10 indexes, to document why
   they don't also need to handle the E30 indexes
* documents in a comment why regime_has_2_ranges() doesn't need
   updating

Notes for backporting: this commit depends on the preceding revert of
4c2c04746932; that revert and this commit should probably be
backported to everywhere that we originally backported 4c2c04746932.

Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2326
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2588
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Tested-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20241101142845.1712482-3-peter.maydell@linaro.org
(cherry picked from commit efbe180ad2ed75d4cc64dfc6fb46a015eef713d1)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

Revert "target/arm: Fix usage of MMU indexes when EL3 is AArch32"

This reverts commit 4c2c0474693229c1f533239bb983495c5427784d.

This commit tried to fix a problem with our usage of MMU indexes when
EL3 is AArch32, using what it described as a "more complicated
approach" where we share the same MMU index values for Secure PL1&0
and NonSecure PL1&0. In theory this should work, but the change
didn't account for (at least) two things:

(1) The design change means we need to flush the TLBs at any point
where the CPU state flips from one to the other.  We already flush
the TLB when SCR.NS is changed, but we don't flush the TLB when we
take an exception from NS PL1&0 into Mon or when we return from Mon
to NS PL1&0, and the commit didn't add any code to do that.

(2) The ATS12NS* address translate instructions allow Mon code (which
is Secure) to do a stage 1+2 page table walk for NS.  I thought this
was OK because do_ats_write() does a page table walk which doesn't
use the TLBs, so because it can pass both the MMU index and also an
ARMSecuritySpace argument we can tell the table walk that we want NS
stage1+2, not S.  But that means that all the code within the ptw
that needs to find e.g.  the regime EL cannot do so only with an
mmu_idx -- all these functions like regime_sctlr(), regime_el(), etc
would need to pass both an mmu_idx and the security_space, so they
can tell whether this is a translation regime controlled by EL1 or
EL3 (and so whether to look at SCTLR.S or SCTLR.NS, etc).

In particular, because regime_el() wasn't updated to look at the
ARMSecuritySpace it would return 1 even when the CPU was in Monitor
mode (and the controlling EL is 3).  This meant that page table walks
in Monitor mode would look at the wrong SCTLR, TCR, etc and would
generally fault when they should not.

Rather than trying to make the complicated changes needed to rescue
the design of 4c2c04746932, we revert it in order to instead take the
route that that commit describes as "the most straightforward" fix,
where we add new MMU indexes EL30_0, EL30_3, EL30_3_PAN to correspond
to "Secure PL1&0 at PL0", "Secure PL1&0 at PL1", and "Secure PL1&0 at
PL1 with PAN".

This revert will re-expose the "spurious alignment faults in
Secure PL0" issue #2326; we'll fix it again in the next commit.

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Tested-by: Thomas Huth <thuth@redhat.com>
Message-id: 20241101142845.1712482-2-peter.maydell@linaro.org
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit 056c5c90c171c4895b407af0cf3d198e1d44b40f)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

acpi/disassemle-aml.sh: fix up after dir reorg

We moved expected files around, fix up the disassembler script.

Fixes: 7c08eefcaf ("tests/data/acpi: Move x86 ACPI tables under x86/${machine} path")
Fixes: 7434f90467 ("tests/data/acpi/virt: Move ARM64 ACPI tables under aarch64/${machine} path")
Cc: "Sunil V L" <sunilvl@ventanamicro.com>
Message-ID: <ce456091058734b7f765617ac5dfeebcb366d4a9.1730729695.git.mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
(cherry picked from commit feb58e3b261db503ade94c5f43ccedeee4eac41f)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

hw/acpi: Fix ordering of BDF in Generic Initiator PCI Device Handle.

The ordering in ACPI specification [1] has bus number in the lowest byte.
As ACPI tables are little endian this is the reverse of the ordering
used by PCI_BUILD_BDF(). As a minimal fix split the QEMU BDF up
into bus and devfn and write them as single bytes in the correct
order.

[1] ACPI Spec 6.3, Table 5.80

Fixes: 0a5b5acdf2d8 ("hw/acpi: Implement the SRAT GI affinity structure")
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Tested-by: "Huang, Ying" <ying.huang@intel.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20240916171017.1841767-2-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 16c687d84574a1139a6475c33e3b9191f7932ac0)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

qemu-ga: Fix a SIGSEGV in ga_run_command() helper

qemu-ga on a NetBSD -current VM terminates with a SIGSEGV upon receiving
'guest-set-time' command...

Core was generated by `qemu-ga'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000000000cd37a40 in ga_pipe_read_str (fd=fd@entry=0xffffff922a20, str=str@entry=0xffffff922a18)
    at ../qga/commands-posix.c:88
88         *str[len] = '\0';
[Current thread is 1 (process 1112)]
(gdb) bt
#0  0x000000000cd37a40 in ga_pipe_read_str (fd=fd@entry=0xffffff922a20, str=str@entry=0xffffff922a18)
    at ../qga/commands-posix.c:88
#1  0x000000000cd37b60 in ga_run_command (argv=argv@entry=0xffffff922a90,
    action=action@entry=0xcda34b8 "set hardware clock to system time", errp=errp@entry=0xffffff922a70, in_str=0x0)
    at ../qga/commands-posix.c:164
#2  0x000000000cd380c4 in qmp_guest_set_time (has_time=<optimized out>, time_ns=<optimized out>,
    errp=errp@entry=0xffffff922ad0) at ../qga/commands-posix.c:304
#3  0x000000000cd253d8 in qmp_marshal_guest_set_time (args=<optimized out>, ret=<optimized out>, errp=0xffffff922b48)
    at qga/qga-qapi-commands.c:193
#4  0x000000000cd4e71c in qmp_dispatch (cmds=cmds@entry=0xcdf5b18 <ga_commands>, request=request@entry=0xf3c711a4b000,
    allow_oob=allow_oob@entry=false, cur_mon=cur_mon@entry=0x0) at ../qapi/qmp-dispatch.c:220
#5  0x000000000cd36524 in process_event (opaque=0xf3c711a79000, obj=0xf3c711a4b000, err=0x0) at ../qga/main.c:677
#6  0x000000000cd526f0 in json_message_process_token (lexer=lexer@entry=0xf3c711a79018, input=0xf3c712072480,
    type=type@entry=JSON_RCURLY, x=28, y=1) at ../qobject/json-streamer.c:99
#7  0x000000000cd93860 in json_lexer_feed_char (lexer=lexer@entry=0xf3c711a79018, ch=125 '}', flush=flush@entry=false)
    at ../qobject/json-lexer.c:313
#8  0x000000000cd93a00 in json_lexer_feed (lexer=lexer@entry=0xf3c711a79018,
    buffer=buffer@entry=0xffffff922d10 "{\"execute\":\"guest-set-time\"}\n", size=<optimized out>)
    at ../qobject/json-lexer.c:350
#9  0x000000000cd5290c in json_message_parser_feed (parser=parser@entry=0xf3c711a79000,
    buffer=buffer@entry=0xffffff922d10 "{\"execute\":\"guest-set-time\"}\n", size=<optimized out>)
    at ../qobject/json-streamer.c:121
#10 0x000000000cd361fc in channel_event_cb (condition=<optimized out>, data=0xf3c711a79000) at ../qga/main.c:703
#11 0x000000000cd3710c in ga_channel_client_event (channel=<optimized out>, condition=<optimized out>, data=0xf3c711b2d300)
    at ../qga/channel-posix.c:94
#12 0x0000f3c7120d9bec in g_main_dispatch () from /usr/pkg/lib/libglib-2.0.so.0
#13 0x0000f3c7120dd25c in g_main_context_iterate_unlocked.constprop () from /usr/pkg/lib/libglib-2.0.so.0
#14 0x0000f3c7120ddbf0 in g_main_loop_run () from /usr/pkg/lib/libglib-2.0.so.0
#15 0x000000000cda00d8 in run_agent_once (s=0xf3c711a79000) at ../qga/main.c:1522
#16 run_agent (s=0xf3c711a79000) at ../qga/main.c:1559
#17 main (argc=<optimized out>, argv=<optimized out>) at ../qga/main.c:1671
(gdb)

The commandline options used on the host machine...
qemu-system-aarch64 \
   -machine type=virt,pflash0=rom \
   -m 8G \
   -cpu host \
   -smp 8 \
   -accel hvf \
   -device virtio-net-pci,netdev=unet \
   -device virtio-blk-pci,drive=hd \
   -drive file=netbsd.qcow2,if=none,id=hd \
   -netdev user,id=unet,hostfwd=tcp::2223-:22 \
   -object rng-random,filename=/dev/urandom,id=viornd0 \
   -device virtio-rng-pci,rng=viornd0 \
   -serial mon:stdio \
   -display none \
   -blockdev node-name=rom,driver=file,filename=/opt/homebrew/Cellar/qemu/9.0.2/share/qemu/edk2-aarch64-code.fd,read-only=true \
   -chardev socket,path=/tmp/qga_netbsd.sock,server=on,wait=off,id=qga0 \
   -device virtio-serial \
   -device virtconsole,chardev=qga0,name=org.qemu.guest_agent.0

This patch rectifies the operator precedence while assigning the NUL
terminator.

Fixes: c3f32c13a325f1ca9a0b08c19fefe9e5cc04289d
Signed-off-by: Sunil Nimmagadda <sunil@nimmagadda.net>
Reviewed-by: Konstantin Kostiuk <kkostiuk@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Link: https://lore.kernel.org/r/m15xppk9qg.fsf@nimmagadda.net
Signed-off-by: Konstantin Kostiuk <kkostiuk@redhat.com>
(cherry picked from commit 9cfe110d9fc0be88178770a85dc6170eecdf6be1)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

hw/sd/sdcard: Fix calculation of size when using eMMC boot partitions

The sd_bootpart_offset() function calculates the *runtime* offset which
changes as the guest switches between accessing the main user data area
and the boot partitions by writing to the EXT_CSD_PART_CONFIG_ACC_MASK
bits, so it shouldn't be used to calculate the main user data area size.

Instead, subtract the boot_part_size directly (twice, as there are two
identical boot partitions defined by the eMMC spec).

Suggested-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Jan Luebbe <jlu@pengutronix.de>
Fixes: c8cb19876d3e ("hw/sd/sdcard: Support boot area in emmc image")
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
(cherry picked from commit c078298301a8c72fe12da85d94372689196628bc)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

tests/tcg: Replace -mpower8-vector with -mcpu=power8

[1] deprecated -mpower8-vector, resulting in:

    powerpc64-linux-gnu-gcc: warning: switch '-mpower8-vector' is no longer supported
    qemu/tests/tcg/ppc64/vsx_f2i_nan.c:4:15: error: expected ';' before 'float'
        4 | typedef vector float vsx_float32_vec_t;
          |               ^~~~~~

Use -mcpu=power8 instead. In order to properly verify that this works,
one needs a big-endian (the minimum supported CPU for 64-bit
little-endian is power8 anyway) GCC configured with --enable-checking
(see GCC commit e154242724b0 ("[RS6000] Don't pass -many to the
assembler").

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109987

Cc: qemu-stable@nongnu.org
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
(cherry picked from commit ddf4dd46e5c31bd223f2e867f2cae43bfd41dfb9)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

hw/ssi/pnv_spi: Fixes Coverity CID 1558831

In this commit the following coverity scan defect has been fixed
CID 1558831:  Resource leaks  (RESOURCE_LEAK)
  Variable "rsp_payload" going out of scope leaks the storage it
  points to.

Cc: qemu-stable@nongnu.org
Fixes: Coverity CID 1558831
Signed-off-by: Chalapathi V <chalapathi.v@linux.ibm.com>
Fixes: b4cb930e40 ("hw/ssi: Extend SPI model")
[PMD: Rebased on previous commit (returning earlier)]
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
(cherry picked from commit 031324472eee57bce9bd4a0231aa9b137494d8a1)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

hw/ssi/pnv_spi: Return early in transfer()

Return early to simplify next commit.
No logical change intended.

Cc: qemu-stable@nongnu.org
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
(cherry picked from commit 3feabc18ad4d4bdc178a205b986353a54dbfcf20)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

hw/ssi/pnv_spi: Match _xfer_buffer_free() with _xfer_buffer_new()

pnv_spi_xfer_buffer_new() allocates %payload using g_malloc0(),
and pnv_spi_xfer_buffer_write_ptr() allocates %payload->data
using g_realloc(). Use the API equivalent g_free() to release
the buffers.

Cc: qemu-stable@nongnu.org
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
(cherry picked from commit 65f53702d2e4bd045ce16ca874469cdd1e1ef4e4)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

ppc/pnv: ADU fix possible buffer overrun with invalid size

The ADU LPC transfer-size field is 7 bits, but the supported sizes for
LPC access via ADU appear to be 1, 2, 4, 8. The data buffer could
overrun if firmware set an invalid size field, so add checks to reject
them with a message.

Cc: qemu-stable@nongnu.org
Reported-by: Cédric Le Goater <clg@redhat.com>
Resolves: Coverity CID 1558830
Fixes: 24bd283bccb33 ("ppc/pnv: Implement ADU access to LPC space")
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
(cherry picked from commit ddd2a060a0da41000ddca31e329ab1d54e37fedb)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

target/ppc: Fix HFSCR facility checks

The HFSCR defines were being encoded as bit masks, but the users
expect (and analogous FSCR defines are) bit numbers.

Cc: qemu-stable@nongnu.org
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
(cherry picked from commit 87de77f6aeba4e38d123f7541cfdae7b124f6a02)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

target/ppc: Fix mtDPDES targeting SMT siblings

A typo in the loop over SMT threads to set irq level for doorbells
when storing to DPDES meant everything was aimed at the CPU executing
the instruction.

Cc: qemu-stable@nongnu.org
Fixes: d24e80b2ae ("target/ppc: Add msgsnd/p and DPDES SMT support")
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
(cherry picked from commit 0324d236d2918c18a9ad4a1081b1083965a1433b)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

ppc/pnv: Fix LPC POWER8 register sanity check

POWER8 does not have the ISA IRQ -> SERIRQ routing system of later
CPUs, instead all ISA IRQs are sent to the CPU via a single PSI
interrupt. There is a sanity check in the POWER8 case to ensure the
routing bits have not been set, because that would indicate a
programming error.

Those bits were incorrectly specified because of ppc bit numbering
fun. Coverity detected this as an always-zero expression.

Cc: qemu-stable@nongnu.org
Reported-by: Cédric Le Goater <clg@redhat.com>
Resolves: Coverity CID 1558829 (partially)
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
(cherry picked from commit 84416e262ea1218026a8567ed9ea31c16d77edea)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>