git.ipfire.org Git - thirdparty/qemu.git/log

target/ppc: fix timebase register reset state

(H)DEC and PURR get reset before icount does, which causes them to be
skewed and not match the init state. This can cause replay to not
match the recorded trace exactly. For DEC and HDEC this is usually not
noticable since they tend to get programmed before affecting the
target machine. PURR has been observed to cause replay bugs when
running Linux.

Fix this by resetting using a time of 0.

Message-ID: <20241219034035.1826173-2-npiggin@gmail.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

spapr: nested: Add support for reporting Hostwide state counter

Add support for reporting Hostwide state counters for nested KVM pseries
guests running with 'cap-nested-papr' on Qemu-TCG acting as
L0-hypervisor. The Hostwide state counters are statistics about state that
L0-hypervisor maintains for the L2-guests and represent the state of all
L2-guests, not just a specific one.

These stats counters are exposed to L1-Hypervisor by the L0-Hypervisor via a
new bit-flag named 'getHostWideState' for the H_GUEST_GET_STATE hcall which
is documented at [1]. Once this flag is set the hcall should populate the
Guest-State-Elements in the requested GSB with the stat counter
values. Currently following five counters are supported:

* l0_guest_heap_size_inuse
* l0_guest_heap_size_max
* l0_guest_pagetable_size_inuse
* l0_guest_pagetable_size_max
* l0_guest_pagetable_reclaimed

At the moment '0' is being reported for all these counters as these
counters doesn't align with how L0-Qemu manages Guest memory.

The patch implements support for these counters by adding new members to
the 'struct SpaprMachineStateNested'. These new members are then plugged
into the existing 'guest_state_element_types[]' with the help of a new
macro 'GSBE_NESTED_MACHINE_DW' together with a new helper
'get_machine_ptr()'. guest_state_request_check() is updated to ensure
correctness of the requested GSB and finally h_guest_getset_state() is
updated to handle the newly introduced flag
'GUEST_STATE_REQUEST_HOST_WIDE'.

This patch is tested with the proposed linux-kernel implementation to
expose these stat-counter as perf-events at [2].

[1]
https://lore.kernel.org/all/20241222140247.174998-2-vaibhav@linux.ibm.com

[2]
https://lore.kernel.org/all/20241222140247.174998-1-vaibhav@linux.ibm.com

Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Message-ID: <20250221155449.530645-1-vaibhav@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

ppc: spapr: Enable 2nd DAWR on Power10 pSeries machine

As per the PAPR, bit 0 of byte 64 in pa-features property
indicates availability of 2nd DAWR registers. i.e. If this bit is set, 2nd
DAWR is present, otherwise not. Use KVM_CAP_PPC_DAWR1 capability to find
whether kvm supports 2nd DAWR or not. If it's supported, allow user to set
the pa-feature bit in guest DT using cap-dawr1 machine capability.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Message-ID: <173708681866.1678.11128625982438367069.stgit@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

ppc: Enable 2nd DAWR support on Power10 PowerNV machine

Extend the existing watchpoint facility from TCG DAWR0 emulation
to DAWR1 on POWER10.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Message-ID: <173708680684.1678.13237334676438770057.stgit@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

hw/ppc/epapr: Do not swap ePAPR magic value

The ePAPR magic value in $r6 doesn't need to be byte swapped.

See ePAPR-v1.1.pdf chapter 5.4.1 "Boot CPU Initial Register State"
and the following mailing-list threads:
https://lore.kernel.org/qemu-devel/CAFEAcA_NR4XW5DNL4nq7vnH4XRH5UWbhQCxuLyKqYk6_FCBrAA@mail.gmail.com/
https://lore.kernel.org/qemu-devel/D6F93NM6OW2L.2FDO88L38PABR@gmail.com/

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Message-ID: <20241220213103.6314-7-philmd@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

hw/ppc/spapr: Convert DIRTY_HPTE() macro as hpte_set_dirty() method

Convert DIRTY_HPTE() macro as hpte_set_dirty() method.

sPAPR data structures including the hash page table are big-endian
regardless of current CPU endian mode, so use the big-endian LD/ST
API to access the hash PTEs.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Message-ID: <20241220213103.6314-6-philmd@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

hw/ppc/spapr: Convert CLEAN_HPTE() macro as hpte_set_clean() method

Convert CLEAN_HPTE() macro as hpte_set_clean() method.

sPAPR data structures including the hash page table are big-endian
regardless of current CPU endian mode, so use the big-endian LD/ST
API to access the hash PTEs.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Message-ID: <20241220213103.6314-5-philmd@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

hw/ppc/spapr: Convert HPTE_DIRTY() macro as hpte_is_dirty() method

Convert HPTE_DIRTY() macro as hpte_is_dirty() method.

sPAPR data structures including the hash page table are big-endian
regardless of current CPU endian mode, so use the big-endian LD/ST
API to access the hash PTEs.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Message-ID: <20241220213103.6314-4-philmd@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

hw/ppc/spapr: Convert HPTE_VALID() macro as hpte_is_valid() method

Convert HPTE_VALID() macro as hpte_is_valid() method.

sPAPR data structures including the hash page table are big-endian
regardless of current CPU endian mode, so use the big-endian LD/ST
API to access the hash PTEs.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Message-ID: <20241220213103.6314-3-philmd@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

hw/ppc/spapr: Convert HPTE() macro as hpte_get_ptr() method

Convert HPTE() macro as hpte_get_ptr() method.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Message-ID: <20241220213103.6314-2-philmd@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

target/ppc: Restrict ATTN / SCV / PMINSN helpers to TCG

Move helper_attn(), helper_scv() and helper_pminsn() to
tcg-excp_helper.c.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Message-ID: <20250127102620.39159-15-philmd@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

target/ppc: Make powerpc_excp() prototype public

In order to move TCG specific code dependent on powerpc_excp()
in the next commit, expose its prototype in "internal.h".

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Message-ID: <20250127102620.39159-14-philmd@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

target/ppc: Fix style in excp_helper.c

Fix style in do_rfi() before moving the code around.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Message-ID: <20250127102620.39159-13-philmd@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

target/ppc: Restrict various common helpers to TCG

Move helpers common to system/user emulation to tcg-excp_helper.c.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Message-ID: <20250127102620.39159-12-philmd@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

target/ppc: Restrict exception helpers to TCG

Move exception helpers to tcg-excp_helper.c so they are
only built when TCG is selected. Preprocessor guards
are added for some helpers unused when CONFIG_USER_ONLY.

[npiggin: mention USER_ONLY change]
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20250127102620.39159-10-philmd@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

target/ppc: Remove raise_exception_ra()

Introduced in commit db789c6cd33 ("ppc: Provide basic
raise_exception_* functions"), raise_exception_ra() has
never been used. Remove as dead code.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Message-ID: <20250127102620.39159-9-philmd@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

target/ppc: Restrict powerpc_checkstop() to TCG

Expose powerpc_checkstop() prototype, and move it to
tcg-excp_helper.c, only built when TCG is available.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Message-ID: <20250127102620.39159-8-philmd@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

target/ppc: Ensure powerpc_mcheck_checkstop() is only called under TCG

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20250127102620.39159-7-philmd@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

target/ppc: Move ppc_ldl_code() to tcg-excp_helper.c

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Message-ID: <20250127102620.39159-6-philmd@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

target/ppc: Move TCG specific exception handlers to tcg-excp_helper.c

Move the TCGCPUOps handlers to a new unit: tcg-excp_helper.c,
only built when TCG is selected.

See in target/ppc/cpu_init.c:

    #ifdef CONFIG_TCG
    static const TCGCPUOps ppc_tcg_ops = {
      ...
      .do_unaligned_access = ppc_cpu_do_unaligned_access,
      .do_transaction_failed = ppc_cpu_do_transaction_failed,
      .debug_excp_handler = ppc_cpu_debug_excp_handler,
      .debug_check_breakpoint = ppc_cpu_debug_check_breakpoint,
      .debug_check_watchpoint = ppc_cpu_debug_check_watchpoint,
    };
    #endif /* CONFIG_TCG */

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Message-ID: <20250127102620.39159-5-philmd@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

target/ppc: Make ppc_ldl_code() declaration public

We are going to move code calling ppc_ldl_code() out of
excp_helper.c where it is defined. Expose its declaration
for few commits, until eventually making it static again
once everything is moved.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Message-ID: <20250127102620.39159-4-philmd@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

ppc/pnv: Add new PowerPC Special Purpose Registers (RWMR)

Register RWMR - Region Weighted Mode Register
for privileged access in Power9 and Power10

It controls what the SPURR register produces.

Specs:
- Power10: https://files.openpower.foundation/s/EgCy7C43p2NSRfR

TCG does not model SMT priority, timing, resource controls
and status so this register has no effect for now.

[npiggin: adjust changelog]
Signed-off-by: dan tan <dantan@linux.ibm.com>
Message-ID: <20250116154226.13376-1-dantan@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

hw/ppc/spapr: Restrict CONFER hypercall to TCG

KVM handles H_CONFER and does not pass it along to QEMU, so
only vhyp (as used by TCG spapr) needs to handle it.

[npiggin: Add changelog]
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20250127102620.39159-2-philmd@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

hw/ssi/pnv_spi: Put a limit to RDR match failures

There is a possibility that SPI controller can get into loop due to indefinite
RDR match failures. Hence put a limit to failures and stop the sequencer.

Signed-off-by: Chalapathi V <chalapathi.v@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Message-ID: <20250303141328.23991-5-chalapathi.v@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

hw/ssi/pnv_spi: Make bus names distinct for each controllers of a socket

Create a spi buses with distinct names on each socket so that responders
are attached to correct SPI controllers.

Change the bus name to chipX.spi.<busnum> where X = 0..<num_sockets>

QOM tree on a 2 socket machine:
(qemu) info qom-tree
/machine (powernv10-machine)
  /chip[0] (power10_v2.0-pnv-chip)
    /pib_spic[0] (pnv-spi)
      /chip0.spi.0 (SSI)
      /xscom-spi[0] (memory-region)
  /chip[1] (power10_v2.0-pnv-chip)
    /pib_spic[0] (pnv-spi)
      /chip1.spi.0 (SSI)
      /xscom-spi[0] (memory-region)

Signed-off-by: Chalapathi V <chalapathi.v@linux.ibm.com>
Message-ID: <20250303141328.23991-4-chalapathi.v@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

hw/ssi/pnv_spi: Use local var seq_index instead of get_seq_index().

Use a local variable seq_index instead of repeatedly calling
get_seq_index() method and open-code next_sequencer_fsm().

Signed-off-by: Chalapathi V <chalapathi.v@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Message-ID: <20250303141328.23991-3-chalapathi.v@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

hw/ssi/pnv_spi: Replace PnvXferBuffer with Fifo8 structure

In PnvXferBuffer dynamically allocating and freeing is a
process overhead. Hence used an existing Fifo8 buffer with
capacity of 16 bytes.

Signed-off-by: Chalapathi V <chalapathi.v@linux.ibm.com>
Message-ID: <20250303141328.23991-2-chalapathi.v@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

qtest/xive: Add test of pool interrupts

Added new test for pool interrupts. Removed all printfs from pnv-xive2-* qtests.

Signed-off-by: Glenn Miles <milesg@linux.vnet.ibm.com>
Signed-off-by: Michael Kowal <kowal@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

qtest/xive: Change printf to g_test_message

Change all printf() in pnv-xive2-* qtests to g_test_message()

[npiggin: split from pool qtest]
Signed-off-by: Glenn Miles <milesg@linux.vnet.ibm.com>
Signed-off-by: Michael Kowal <kowal@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

ppc/xive2: Check crowd backlog when scanning group backlog

When processing a backlog scan for group interrupts, also take
into account crowd interrupts.

Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Michael Kowal <kowal@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

pnv/xive2: Rename nvp_ to nvx_ if they can refer to NVP or NVGC

The blk/index in some paths may refer to an NVP or an NVGC. When it
is not known ahead of time, use the nvx_ prefix to prevent confusion.

[npiggin: split out of larger fix patch and reworded]
Signed-off-by: Glenn Miles <milesg@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

ppc/xive2: Support crowd-matching when looking for target

XIVE crowd sizes are encoded into a 2-bit field as follows:
  0: 0b00
  2: 0b01
  4: 0b10
16: 0b11

A crowd size of 8 is not supported.

If an END is defined with the 'crowd' bit set, then a target can be
running on different blocks. It means that some bits from the block
VP are masked when looking for a match. It is similar to groups, but
on the block instead of the VP index.

Most of the changes are due to passing the extra argument 'crowd' all
the way to the function checking for matches.

Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Glenn Miles <milesg@linux.vnet.ibm.com>
Signed-off-by: Michael Kowal <kowal@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

ppc/xive2: Add support for MMIO operations on the NVPG/NVC BAR

Add support for the NVPG and NVC BARs. Access to the BAR pages will
cause backlog counter operations to either increment or decriment
the counter.

Also added qtests for the same.

Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Michael Kowal <kowal@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

qtest/xive: Add group-interrupt test

Add XIVE2 tests for group interrupts and group interrupts that have
been backlogged.

Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Michael Kowal <kowal@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

ppc/xive2: Process group backlog when updating the CPPR

When the hypervisor or OS pushes a new value to the CPPR, if the LSMFB
value is lower than the new CPPR value, there could be a pending group
interrupt in the backlog, so it needs to be scanned.

Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Michael Kowal <kowal@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

ppc/xive2: Process group backlog when pushing an OS context

When pushing an OS context, we were already checking if there was a
pending interrupt in the IPB and sending a notification if needed. We
also need to check if there is a pending group interrupt stored in the
NVG table. To avoid useless backlog scans, we only scan if the NVP
belongs to a group.

Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Michael Kowal <kowal@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

ppc/xive2: Add undelivered group interrupt to backlog

When a group interrupt cannot be delivered, we need to:
- increment the backlog counter for the group in the NVG table
  (if the END is configured to keep a backlog).
- start a broadcast operation to set the LSMFB field on matching CPUs
  which can't take the interrupt now because they're running at too
  high a priority.

[npiggin: squash in fixes from milesg]
[milesg: only load the NVP if the END is !ignore]
[milesg: always broadcast backlog, not only when there are precluded VPs]

Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Michael Kowal <kowal@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

ppc/xive2: Support group-matching when looking for target

If an END has the 'i' bit set (ignore), then it targets a group of
VPs. The size of the group depends on the VP index of the target
(first 0 found when looking at the least significant bits of the
index) so a mask is applied on the VP index of a running thread to
know if we have a match.

Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Michael Kowal <kowal@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

ppc/xive2: Add grouping level to notification

The NSR has a (so far unused) grouping level field. When a interrupt
is presented, that field tells the hypervisor or OS if the interrupt
is for an individual VP or for a VP-group/crowd. This patch reworks
the presentation API to allow to set/unset the level when
raising/accepting an interrupt.

It also renames xive_tctx_ipb_update() to xive_tctx_pipr_update() as
the IPB is only used for VP-specific target, whereas the PIPR always
needs to be updated.

Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Michael Kowal <kowal@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

ppc/xive: Rename ipb_to_pipr() to xive_ipb_to_pipr()

Rename to follow the convention of the other function names.

Signed-off-by: Michael Kowal <kowal@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

ppc/xive2: Update NVP save/restore for group attributes

If the 'H' attribute is set on the NVP structure, the hardware
automatically saves and restores some attributes from the TIMA in the
NVP structure.

The group-specific attributes LSMFB, LGS and T have an extra flag to
individually control what is saved/restored.

Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Michael Kowal <kowal@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

ppc/pnv: Add a default formatted PNOR image

The default PNOR image is erased and not recognised by skiboot, so NVRAM
gets disabled. This change adds a tiny pnor file that is a proper FFS
image with a formatted NVRAM partition. This is recognised by skiboot and
will persist across machine reboots.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

ppc/pnv: Add a PNOR address and size sanity checks

The BMC HIOMAP PNOR access protocol has certain limits on PNOR addresses
and sizes. Add some sanity checks for these so we don't get strange
behaviour.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

ppc/pnv: Move PNOR to offset 0 in the ISA FW space

skiboot has a bug that does not handle ISA FW access correctly for IDSEL
devices > 0, and the current PNOR default address and size puts 64MB in
device 0 and 64MB in device 1, which causes skiboot to hit this bug and
breaks PNOR accesses.

Move the PNOR address down to 0 for now, so a 256MB PNOR can be accessed
via device 0.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

ppc/pnv: Implement LPC FW address space IDSEL

LPC FW address space is a 256MB (28-bit) region to one of 16-devices
that are selected with the IDSEL register. Implement this by making
the ISA FW address space 4GB, and move the 256MB OPB alias within
that space according to IDSEL.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

ppc/pnv: raise no-response errors if an LPC transaction fails

If nothing responds to an LPC access, the LPC host controller should
set an IRQSTAT error. Model this behaviour.

skiboot uses this error to "probe" LPC accesses, among other things to
determine if a SuperIO chip is present. After this change it recognizes
there is no SuperIO present and does not keep trying to access it.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

ppc/pnv: Support LPC host controller irqs other than serirqs

The LPC model has only supported serirqs (ISA device IRQs), however
there are internal sources that can raise other interrupts. Update the
device to handle these interrupt sources.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

target/ppc: Add Power9/10 power management SPRs

Linux power management code accesses these registers for pstate
management. Wire up a very simple implementation.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
After OCC fixes in QEMU pnv model and skiboot (since they have suffered
some bitrot), Linux will start performing PM SPR accesses. This is a
very simple implementation that makes it a bit happier.

Thanks,
Nick

ppc/pnv/occ: Implement a basic dynamic OCC model

The OCC is an On Chip Controller that handles various thermal and power
management. It is a PPC405 microcontroller that runs its own firmware
which is out of scope of the powernv machine model. Some dynamic
behaviour and interfaces that are important for host CPU testing can be
implemented with a much simpler state machine.

This change adds a 100ms timer that ticks through a simple state machine
that looks for "OCC command requests" coming from host firmware, and
responds to them.

For now the powercap command is implemented because that is used by
OPAL and exported to Linux and is easy to test.

  $ F=/sys/firmware/opal/powercap/system-powercap/powercap-current
  $ cat $F
  100
  $ echo 50 | sudo tee $F
  50
  $ cat $F
  50

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

ppc/pnv/occ: Add POWER10 OCC-OPAL data format

Add POWER10 OCC-OPAL data format. POWER10 changes major version and
adds a few fields.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

ppc/pnv/occ: Update pstate frequency tables

OCC pstate frequencies are in kHz, so the OCC data was 3-4MHz. Upgrade
to GHz. Make each pstate have a different frequency.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

ppc/pnv: Make HOMER memory a RAM region

The HOMER is a region of memory used by host and firmware and
microconrollers. It has very little logic by itself, just some BAR
registers. Users of this memory should operate on it rather than
have HOMER implement them with MMIO registers, which is not the
right model.

This change switches the implementation of HOMER from MMIO to RAM,
and moves the OCC register implementation to in-memory structure
accesses performed by the OCC model.

This has the downside that access to unimplemented regions of HOMER
are no longer flagged. Perhaps that could be done by adding a memory
region for HOMER, and ram subregions under that for each implemented
part. But for now this takes the simpler approach.

Note: This brings some data structure definitions from skiboot, which
does not match QEMU coding style but is not changed to make comparisons
and updates simpler.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

ppc/pnv/occ: Better document OCCMISC bits

Use defines for the OCCMISC register bits, and add a comment about the
IRQ request bit, which QEMU may not model quite correctly.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

ppc/pnv/homer: class-based base and size

Put HOMER memory region base and size into the class, to allow more
code-reuse between different machines in later changes.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

ppc/pnv/occ: Fix common area sensor offsets

The commit to fix the OCC common area sensor mappings didn't update the
register offsets to match.

Before this change, skiboot reports:

[ 0.347100086,3] OCC: Chip 0 sensor data invalid

Afterward, there is no error and the sensor_groups directory appears
under /sys/firmware/opal/.

The SLW_IMAGE_BASE address looks like a workaround to intercept firmware
memory accesses, but that does not seem to be required now (and would
have been broken by the OCC common area region mapping change anyway).
So it can be removed.

Fixes: 3a1b70b66b5cb4 ("ppc/pnv: Fix OCC common area region mapping")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

ppc/pnv/homer: Make dummy reads return 0

HOMER memory implements some dummy registers that return a nonsense
value to satisfy skiboot accesses caused by "SLW" init and register
save/restore programming that has never worked under QEMU:

[ 0.265000943,3] SLW: Failed to set HRMOR for CPU 0,RC=0x1
[ 0.265356988,3] Disabling deep stop states

To simplify a later change to implement HOMER as a RAM area, make
these return zero, which has the same result.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

ppc/pnv/homer: Fix OCC registers

The HOMER OCC registers seem to have bitrotted and fail for various
reasons on powernv8, 9, and 10.

The major problems are that POWER8 has the wrong version value and its
pstate ordering is incorrect. POWER9/10 have not set the OCC state to
active. Non-zero chips are also set to OCC slaves for POWER9/10.

Unfortunately skiboot has also bitrotted and requires fixes that are
not yet in the bios files to run. With a patched skiboot, before this
change, powernv9/10 report:

[    0.262050394,3] OCC: Chip: 0: OCC not active
[    0.262128603,3] OCC: Initialization on all chips did not complete(timed out)

powernv8 reports:

[    0.173572100,3] OCC: Unknown OCC-OPAL interface version.
[    0.173812059,3] OCC: Initialization on all chips did not complete(timed out)

After this patch, all report:

[    0.176815668,5] OCC: All Chip Rdy after 0 ms

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

ppc/pnv/phb4: Add pervasive chiplet support to PHB4/5

Each non-core chiplet on a chip has a "pervasive chiplet" unit and its
xscom register set. This adds support for PHB4/5.

skiboot reads the CPLT_CONF1 register in __phb4/5_get_max_link_width(),
which shows up as unimplemented xscom reads. Set a value in PCI CONF1
register's link-width field to demonstrate skiboot doing something
interesting with it.

In the bigger picture, it might be better to model the pervasive
chiplet type as parent that each non-core chiplet model derives from.
For now this is enough to get the PHB registers implemented and working
for skiboot, and provides a second example (after the N1 chiplet) that
will help if the design is reworked as such.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

pseries: Update SLOF firmware image

This adds TPM pass through API.
Also, moves SLOF from github to gitlab.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

ppc/pnv: Update skiboot to 7.1-106

This skiboot firmware importantly contains updates for HOMER/OCC bugs.
These subsystems have bitrotted in QEMU and skiboot and this update
allows new QEMU models to be exercised.

Power11 support is also added. This model is not yet merged in QEMU,
but firmware support will make development and testing simpler.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

hw/ppc: Deprecate 405 CPUs

The ref405ep machine is scheduled for removal in QEMU 10.0. Keep the
405 CPU implementation for a while because it is theoretically
possible to model the power management (OCC) co-processor found on the
IBM POWER [8-11] processors.

Signed-off-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Message-ID: <20250204080649.836155-4-clg@redhat.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

ppc/ppc405: Remove boards

The ref405ep machine is the only PPC 405 machine. Drop all support by
removing the SoC and associated devices as-well as the machine.

Link: https://lore.kernel.org/qemu-devel/20250110141800.1587589-3-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Message-ID: <20250204080649.836155-3-clg@redhat.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

ppc/ppc405: Remove tests

Since we are about to remove all support for PPC 405, start by
removing the tests referring to the ref405ep machine.

Link: https://lore.kernel.org/qemu-devel/20250110141800.1587589-2-clg@redhat.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Message-ID: <20250204080649.836155-2-clg@redhat.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

Merge tag 'migration-20250310-pull-request' of https://gitlab.com/farosas/qemu into staging

Migration pull request

- Fix use-after-free in incoming migration
- Improve cpr migration blocker for volatile ram
- Fix RDMA migration

# -----BEGIN PGP SIGNATURE-----
#
# iQJEBAABCAAuFiEEqhtIsKIjJqWkw2TPx5jcdBvsMZ0FAmfPaCAQHGZhcm9zYXNA
# c3VzZS5kZQAKCRDHmNx0G+wxnQy9EADRp/6GaSzoqWgafU8DGM5Q69HyKiZ888DZ
# 7qXqJeH3c95nvOnIw2BMhUYX4t8kkAbUcWlr7L8KCjZT/6N/d1/Z5fimqymRkw4x
# +8kDyADv5FY0339aMLf3qBbIAQj/gvPvg8H+e+hXfokZqoYgLXZ0eqNAz8MjIcyN
# +A+waEBMLNvTgZyTQl2TbCvb+mbRial8u8C9BIoILhn/gNuoMX7lbt0tq41HZwe0
# l3v16jnXlsDvQUXp99bGySomRgkcYqdAt+HWHLje3frT/Ap8dGaUJKlpgJ8DXJiA
# fV1reKihJdj37q9GSG8cR02W+ATBesiecufV4TUPNQYQzTdxn3fOMwdc3Pck074D
# YAQxFT20OPou+NRxjYoHT/GqFUY36/2qBJpt7TY3ramdklHJhXpRyedK4rppTZNn
# pC3lnbpA/LHRmfD1Nh0CRmqZpbV+qW1BWEgMwk4qui46BxYWHxKHFpxAuwlJQmcw
# RxY8qPhIXQM03tiTgIddBNDZLoVqRoUP7YpzR7MMa1rz0T5inNFMcNGm72WpKODE
# rzpw4ezXO7+D4/QmMq3PoPfhFv3QFnH6jaGj8JkJM378KLvh4fQ0woXtDKFl4Tbq
# 1oBZ17WUv6aHr75b+KMyKJNLinvMu5WF5WoRYIt1lNXaqk7I494yvIjtRrimWZIS
# Z5Q0tpUmpw==
# =yEH0
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 11 Mar 2025 06:30:56 HKT
# gpg:                using RSA key AA1B48B0A22326A5A4C364CFC798DC741BEC319D
# gpg:                issuer "farosas@suse.de"
# gpg: Good signature from "Fabiano Rosas <farosas@suse.de>" [unknown]
# gpg:                 aka "Fabiano Almeida Rosas <fabiano.rosas@suse.com>" [unknown]
# gpg: WARNING: The key's User ID is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: AA1B 48B0 A223 26A5 A4C3  64CF C798 DC74 1BEC 319D

* tag 'migration-20250310-pull-request' of https://gitlab.com/farosas/qemu:
  migration: Prioritize RDMA in ram_save_target_page()
  migration: ram block cpr blockers
  migration: Fix UAF for incoming migration on MigrationState

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

Merge tag 'pull-xen-20250310' of https://xenbits.xen.org/git-http/people/aperard/qemu-dm into staging

Xen queue:

* xen/passthrough: use gsi to map pirq when dom0 is PVH
* Fix missing xenstore node from xen-block backend
* Fix xen mapcache extraneous invalidate

# -----BEGIN PGP SIGNATURE-----
#
# iQEzBAABCgAdFiEE+AwAYwjiLP2KkueYDPVXL9f7Va8FAmfO+nEACgkQDPVXL9f7
# Va+QYggA9dmxMGDO05UEd2ZPv/Goub37Le44qBN4oeXizVRZgGUs2w9ETBXhPZus
# 34aI8CTID4fcH4rgF4LgJ4XuyOxYwP1ot8EpDHQg+ji2nyHeMpAyePTfubprq17U
# APN6Qqefd9X+TX+W9zUS5jV/AXO+apGX+tmVkVexFuy4gSRGSVCPoibHePtoLH9G
# 3rSREjdEx7ByY6ieCV5x3zHPp5tmnLWeHpNCVc5x6NplBslQduBz6vOqLNWB1LKO
# 3a/lYcvTn9PIla1zpvGNbeTsPv2lcdx3SccThcZmyTv2PDm1kzyUOIo1lSIP6bb3
# LjCl3dm1mfxAGEaZ+//rsRhTH8d5ew==
# =K79y
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 10 Mar 2025 22:42:57 HKT
# gpg:                using RSA key F80C006308E22CFD8A92E7980CF5572FD7FB55AF
# gpg: Good signature from "Anthony PERARD <anthony.perard@gmail.com>" [unknown]
# gpg:                 aka "Anthony PERARD <anthony.perard@vates.tech>" [unknown]
# gpg:                 aka "Anthony PERARD <anthony@xenproject.org>" [unknown]
# gpg:                 aka "Anthony PERARD <anthony.perard@citrix.com>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 5379 2F71 024C 600F 778A  7161 D8D5 7199 DF83 42C8
#      Subkey fingerprint: F80C 0063 08E2 2CFD 8A92  E798 0CF5 572F D7FB 55AF

* tag 'pull-xen-20250310' of https://xenbits.xen.org/git-http/people/aperard/qemu-dm:
  xen: No need to flush the mapcache for grants
  hw/xen: Add "mode" parameter to xen-block devices
  xen/passthrough: use gsi to map pirq when dom0 is PVH

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

Merge tag 'net-pull-request' of https://github.com/jasowang/qemu into staging

# -----BEGIN PGP SIGNATURE-----
#
# iQEzBAABCAAdFiEEIV1G9IJGaJ7HfzVi7wSWWzmNYhEFAmfO1zkACgkQ7wSWWzmN
# YhET+wf+PkaGeFTNUrOtWpl35fSMKlmOVbb1fkPfuhVBmeY2Vh1EIN3OjqnzdV0F
# wxpuk+wwmFiuV1n6RNuMHQ0nz1mhgsSlZh93N5rArC/PUr3iViaT0cb82RjwxhaI
# RODBhhy7V9WxEhT9hR8sCP2ky2mrKgcYbjiIEw+IvFZOVQa58rMr2h/cbAb/iH4l
# 7T9Wba03JBqOS6qgzSFZOMxvqnYdVjhqXN8M6W9ngRJOjPEAkTB6Evwep6anRjcM
# mCUOgkf2sgQwKve8pYAeTMkzXFctvTc/qCU4ZbN8XcoKVVxe2jllGQqdOpMskPEf
# slOuINeW5M0K5gyjsb/huqcOTfDI2A==
# =/Y0+
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 10 Mar 2025 20:12:41 HKT
# gpg:                using RSA key 215D46F48246689EC77F3562EF04965B398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>" [full]
# Primary key fingerprint: 215D 46F4 8246 689E C77F  3562 EF04 965B 398D 6211

* tag 'net-pull-request' of https://github.com/jasowang/qemu:
  tap-linux: Open ipvtap and macvtap
  Revert "hw/net/net_tx_pkt: Fix overrun in update_sctp_checksum()"
  util/iov: Do not assert offset is in iov
  net: move backend cleanup to NIC cleanup
  net: parameterize the removing client from nc list

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

Merge tag 'pull-10.0-for-softfreeze-100325-3' of https://gitlab.com/stsquad/qemu into staging

functional and tcg tests, plugins and MAINTAINERS

  - update and expand aarch64 GPU tests
  - fix build dependence for plugins
  - update libvirt-ci to vulkan-tools
  - allow plugin tests to run on non-POSIX systems
  - tweak test/vm times
  - mark test-vma as linux only
  - various compiler fixes for tcg tests
  - add gitlab build unit tracker
  - error out early on stalled RME tests
  - compile core plugin code once
  - update MAINTAINERS

# -----BEGIN PGP SIGNATURE-----
#
# iQEzBAABCgAdFiEEZoWumedRZ7yvyN81+9DbCVqeKkQFAmfOwjcACgkQ+9DbCVqe
# KkRkcwgAlRTflCqYlZxdlo4CiOSXaHAFr8yfwWq138LJQRQH530JZnz1lZtxTbEM
# pXT7ixnuJQDMybCQJmvUlK5UTUkZhGS3VuAR1VeM2J8/3VXYzf5sFjZ7yko9mA8S
# 2FX8vdfbko8/J31+lKccA0tpbHyi2AbMR+mO8xj6KZQoePwmHoRmhgH7p7LE35YO
# 8ytaOjMwTKF5fReVK+tlcrVJHFMdGsGNwtsnB2FhhVjI56fQqyM5hGXfOING2Fx3
# iZH3rjzfDB4SWbBqaRsMgH9RXjuB9Eo4v0Qs5ve5SjDyzRJk+/CbbBJ4oRt9hurJ
# bA+CYZuNLGBf8Z/mUeYMavA7rxT5rw==
# =qobU
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 10 Mar 2025 18:43:03 HKT
# gpg:                using RSA key 6685AE99E75167BCAFC8DF35FBD0DB095A9E2A44
# gpg: Good signature from "Alex Bennée (Master Work Key) <alex.bennee@linaro.org>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 6685 AE99 E751 67BC AFC8  DF35 FBD0 DB09 5A9E 2A44

* tag 'pull-10.0-for-softfreeze-100325-3' of https://gitlab.com/stsquad/qemu: (31 commits)
  MAINTAINERS: remove widely sanctioned entities
  plugins/core: make a single build unit
  plugins/api: build only once
  plugins/api: split out time control helpers
  plugins/api: split out the vaddr/hwaddr helpers
  plugins/api: split out binary path/start/end/entry code
  plugins/loader: compile loader only once
  plugins/plugin.h: include queue.h
  plugins/api: clean-up the includes
  include/qemu: plugin-memory.h doesn't need cpu-defs.h
  plugins/loader: populate target_name with target_name()
  plugins/api: use qemu_target_page_mask() to get value
  tests/functional: add boot error detection for RME tests
  gitlab: add a new build_unit job to track build size
  tests/tcg: Suppress compiler false-positive warning on sha1.c
  tests/tcg: enable -fwrapv for test-i386-bmi
  tests/tcg: fix constraints in test-i386-adcox
  tests/tcg: add message to _Static_assert in test-avx
  tests/tcg: mark test-vma as a linux-only test
  tests/vm: bump timeout for shutdown
  ...

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

migration: Prioritize RDMA in ram_save_target_page()

Address an error in RDMA-based migration by ensuring RDMA is prioritized
when saving pages in `ram_save_target_page()`.

Previously, the RDMA protocol's page-saving step was placed after other
protocols due to a refactoring in commit bc38dc2f5f3. This led to migration
failures characterized by unknown control messages and state loading errors
destination:
(qemu) qemu-system-x86_64: Unknown control message QEMU FILE
qemu-system-x86_64: error while loading state section id 1(ram)
qemu-system-x86_64: load of migration failed: Operation not permitted
source:
(qemu) qemu-system-x86_64: RDMA is in an error state waiting migration to abort!
qemu-system-x86_64: failed to save SaveStateEntry with id(name): 1(ram): -1
qemu-system-x86_64: rdma migration: recv polling control error!
qemu-system-x86_64: warning: Early error. Sending error.
qemu-system-x86_64: warning: rdma migration: send polling control error

RDMA migration implemented its own protocol/method to send pages to
destination side, hand over to RDMA first to prevent pages being saved by
other protocol.

Fixes: bc38dc2f5f3 ("migration: refactor ram_save_target_page functions")
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Message-ID: <20250305062825.772629-2-lizhijian@fujitsu.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>

migration: ram block cpr blockers

Unlike cpr-reboot mode, cpr-transfer mode cannot save volatile ram blocks
in the migration stream file and recreate them later, because the physical
memory for the blocks is pinned and registered for vfio. Add a blocker
for volatile ram blocks.

Also add a blocker for RAM_GUEST_MEMFD. Preserving guest_memfd may be
sufficient for CPR, but it has not been tested yet.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Message-ID: <1740667681-257312-1-git-send-email-steven.sistare@oracle.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>

migration: Fix UAF for incoming migration on MigrationState

On the incoming migration side, QEMU uses a coroutine to load all the VM
states.  Inside, it may reference MigrationState on global states like
migration capabilities, parameters, error state, shared mutexes and more.

However there's nothing yet to make sure MigrationState won't get
destroyed (e.g. after migration_shutdown()).  Meanwhile there's also no API
available to remove the incoming coroutine in migration_shutdown(),
avoiding it to access the freed elements.

There's a bug report showing this can happen and crash dest QEMU when
migration is cancelled on source.

When it happens, the dest main thread is trying to cleanup everything:

  #0  qemu_aio_coroutine_enter
  #1  aio_dispatch_handler
  #2  aio_poll
  #3  monitor_cleanup
  #4  qemu_cleanup
  #5  qemu_default_main

Then it found the migration incoming coroutine, schedule it (even after
migration_shutdown()), causing crash:

  #0  __pthread_kill_implementation
  #1  __pthread_kill_internal
  #2  __GI_raise
  #3  __GI_abort
  #4  __assert_fail_base
  #5  __assert_fail
  #6  qemu_mutex_lock_impl
  #7  qemu_lockable_mutex_lock
  #8  qemu_lockable_lock
  #9  qemu_lockable_auto_lock
  #10 migrate_set_error
  #11 process_incoming_migration_co
  #12 coroutine_trampoline

To fix it, take a refcount after an incoming setup is properly done when
qmp_migrate_incoming() succeeded the 1st time.  As it's during a QMP
handler which needs BQL, it means the main loop is still alive (without
going into cleanups, which also needs BQL).

Releasing the refcount now only until the incoming migration coroutine
finished or failed.  Hence the refcount is valid for both (1) setup phase
of incoming ports, mostly IO watches (e.g. qio_channel_add_watch_full()),
and (2) the incoming coroutine itself (process_incoming_migration_co()).

Note that we can't unref in migration_incoming_state_destroy(), because
both qmp_xen_load_devices_state() and load_snapshot() will use it without
an incoming migration.  Those hold BQL so they're not prone to this issue.

PS: I suspect nobody uses Xen's command at all, as it didn't register yank,
hence AFAIU the command should crash on master when trying to unregister
yank in migration_incoming_state_destroy()..  but that's another story.

Also note that in some incoming failure cases we may not always unref the
MigrationState refcount, which is a trade-off to keep things simple.  We
could make it accurate, but it can be an overkill.  Some examples:

  - Unlike most of the rest protocols, socket_start_incoming_migration()
    may create net listener after incoming port setup sucessfully.
    It means we can't unref in migration_channel_process_incoming() as a
    generic path because socket protocol might keep using MigrationState.

  - For either socket or file, multiple IO watches might be created, it
    means logically each IO watch needs to take one refcount for
    MigrationState so as to be 100% accurate on ownership of refcount taken.

In general, we at least need per-protocol handling to make it accurate,
which can be an overkill if we know incoming failed after all.  Add a short
comment to explain that when taking the refcount in qmp_migrate_incoming().

Bugzilla: https://issues.redhat.com/browse/RHEL-69775
Tested-by: Yan Fu <yafu@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Message-ID: <20250220132459.512610-1-peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>

xen: No need to flush the mapcache for grants

On IOREQ_TYPE_INVALIDATE we need to invalidate the mapcache for regular
mappings. Since recently we started reusing the mapcache also to keep
track of grants mappings. However, there is no need to remove grant
mappings on IOREQ_TYPE_INVALIDATE requests, we shouldn't do that. So
remove the function call.

Fixes: 9ecdd4bf08 (xen: mapcache: Add support for grant mappings)
Cc: qemu-stable@nongnu.org
Reported-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@amd.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@amd.com>
Reviewed-by: Anthony PERARD <anthony.perard@vates.tech>
Message-Id: <20250206194915.3357743-2-edgar.iglesias@gmail.com>
Signed-off-by: Anthony PERARD <anthony.perard@vates.tech>

hw/xen: Add "mode" parameter to xen-block devices

Block devices don't work in PV Grub (0.9x) if there is no mode specified. It
complains: "Error ENOENT when reading the mode"

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20250207143724.30792-2-dwmw2@infradead.org>
Signed-off-by: Anthony PERARD <anthony.perard@vates.tech>

xen/passthrough: use gsi to map pirq when dom0 is PVH

In PVH dom0, when passthrough a device to domU, QEMU code
xen_pt_realize->xc_physdev_map_pirq wants to use gsi, but in current codes
the gsi number is got from file /sys/bus/pci/devices/<sbdf>/irq, that is
wrong, because irq is not equal with gsi, they are in different spaces, so
pirq mapping fails.

To solve above problem, use new interface of Xen, xc_pcidev_get_gsi to get
gsi and use xc_physdev_map_pirq_gsi to map pirq when dom0 is PVH.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Acked-by: Anthony PERARD <anthony@xenproject.org>
Reviewed-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
Message-Id: <20241106061418.3655304-1-Jiqian.Chen@amd.com>
Signed-off-by: Anthony PERARD <anthony.perard@vates.tech>

MAINTAINERS: remove widely sanctioned entities

The following organisations appear on the US sanctions list:

Yadro: https://sanctionssearch.ofac.treas.gov/Details.aspx?id=41125
ISPRAS: https://sanctionssearch.ofac.treas.gov/Details.aspx?id=50890

As a result maintainers interacting with such entities would face
legal risk in a number of jurisdictions. To reduce the risk of
inadvertent non-compliance remove entries from these organisations
from the MAINTAINERS file.

Mark the pcf8574 system as orphaned until someone volunteers to step
up as a maintainer. Add myself as a second reviewer to record/replay
so I can help with what odd fixes I can.

Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250304222439.2035603-32-alex.bennee@linaro.org>

plugins/core: make a single build unit

Trim through the includes and remove everything not needed for the
core. Only include tcg-op-common.h to remove the need to
TARGET_LONG_BITS and move the build unit into the common set.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250304222439.2035603-31-alex.bennee@linaro.org>

plugins/api: build only once

Now all the softmmu/user-mode stuff has been split out we can build
this compilation unit only once.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250304222439.2035603-30-alex.bennee@linaro.org>

plugins/api: split out time control helpers

These are only usable in system mode where we control the timer. For
user-mode make them NOPs.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250304222439.2035603-29-alex.bennee@linaro.org>

plugins/api: split out the vaddr/hwaddr helpers

These only work for system-mode and are NOPs for user-mode.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250304222439.2035603-28-alex.bennee@linaro.org>

plugins/api: split out binary path/start/end/entry code

To move the main api.c to a single build compilation object we need to
start splitting out user and system specific code. As we need to grob
around host headers we move these particular helpers into the *-user
mode directories.

The binary/start/end/entry helpers are all NOPs for system mode.

While using the plugin-api.c.inc trick means we build for both
linux-user and bsd-user the BSD user-mode command line is still
missing -plugin. This can be enabled once we have reliable check-tcg
tests working for the BSDs.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Warner Losh <imp@bsdimp.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250304222439.2035603-27-alex.bennee@linaro.org>

plugins/loader: compile loader only once

There is very little in loader that is different between builds save
for a tiny user/system mode difference in the plugin_info structure.
Create two new files, user and system to hold mode specific helpers
and move loader into common_ss.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250304222439.2035603-26-alex.bennee@linaro.org>

plugins/plugin.h: include queue.h

Headers should bring in what they need so don't rely on getting
queue.h by side effects. This will help with clean-ups in the
following patches.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250304222439.2035603-25-alex.bennee@linaro.org>

plugins/api: clean-up the includes

Thanks to re-factoring and clean-up work (especially to exec-all) we
no longer need such broad headers for the api.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250304222439.2035603-24-alex.bennee@linaro.org>

include/qemu: plugin-memory.h doesn't need cpu-defs.h

hwaddr is a fixed size on all builds.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250304222439.2035603-23-alex.bennee@linaro.org>

plugins/loader: populate target_name with target_name()

We have a function we can call for this, lets not rely on macros that
stop us building once.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250304222439.2035603-22-alex.bennee@linaro.org>

plugins/api: use qemu_target_page_mask() to get value

Requiring TARGET_PAGE_MASK to be defined gets in the way of building
this unit once. qemu_target_page_mask() will tell us what it is.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250304222439.2035603-21-alex.bennee@linaro.org>

tests/functional: add boot error detection for RME tests

It was identified that those tests randomly fail with a synchronous
exception at boot (reported by EDK2).
While we solve this problem, report failure immediately so tests don't
timeout in CI.

Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20250303185745.2504842-1-pierrick.bouvier@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250304222439.2035603-20-alex.bennee@linaro.org>

gitlab: add a new build_unit job to track build size

We want to reduce the total number of build units in the system to get
on our way to a single binary. It will help to have some numbers so
lets add a job to gitlab to track our progress.

Cc: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
Cc: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250304222439.2035603-19-alex.bennee@linaro.org>

tests/tcg: Suppress compiler false-positive warning on sha1.c

GCC versions at least 12 through 15 incorrectly report a warning
about code in sha1.c:

tests/tcg/multiarch/sha1.c:161:13: warning: ‘SHA1Transform’ reading 64 bytes from a region of size 0 [-Wstringop-overread]
161 | SHA1Transform(context->state, &data[i]);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This is a piece of stock library code for doing SHA1 which we've
simply copied, rather than writing ourselves. The bug has been
reported to upstream GCC (about a different library's use of this
code):
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106709

For our test case, since this isn't our original code and there isn't
actually a bug in it, suppress the incorrect warning rather than
trying to modify the code to work around the compiler issue.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2328
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20250227141343.1675415-1-peter.maydell@linaro.org>
[AJB: -Wno-unknown-warning-option for clang's sake]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250304222439.2035603-18-alex.bennee@linaro.org>

tests/tcg: enable -fwrapv for test-i386-bmi

We allow things like:

tests/tcg/i386/test-i386-bmi2.c:124:35: warning: shifting a negative signed value is undefined [-Wshift-negative-value]
assert(result == (mask & ~(-1 << 30)));

in the main code, so allow it for the test.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250304222439.2035603-17-alex.bennee@linaro.org>

tests/tcg: fix constraints in test-i386-adcox

Clang complains:

  clang -O2 -m64 -mcx16 /home/alex/lsrc/qemu.git/tests/tcg/i386/test-i386-adcox.c -o test-i386-adcox -static
  /home/alex/lsrc/qemu.git/tests/tcg/i386/test-i386-adcox.c:32:26: error: invalid input constraint '0' in asm
          : "r" ((REG)-1), "0" (flags), "1" (out_adcx), "2" (out_adox));
                           ^
  /home/alex/lsrc/qemu.git/tests/tcg/i386/test-i386-adcox.c:57:26: error: invalid input constraint '0' in asm
          : "r" ((REG)-1), "0" (flags), "1" (out_adcx), "2" (out_adox));
                           ^
  2 errors generated.

Pointing out a numbered input constraint can't point to a read/write
output [1]. Convert to a read-only input constraint to allow this.

[1] https://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20101101/036036.html

Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250304222439.2035603-16-alex.bennee@linaro.org>

tests/tcg: add message to _Static_assert in test-avx

In preparation for enabling clang and avoiding:

error: '_Static_assert' with no message is a C2x extension [-Werror,-Wc2x-extensions]

let us just add the message to silence the warning.

Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250304222439.2035603-15-alex.bennee@linaro.org>

tests/tcg: mark test-vma as a linux-only test

The main multiarch tests should compile for any POSIX system, however
test-vma's usage of MAP_NORESERVE makes it a linux-only test. Simply
moving the source file is enough for the build logic to skip on BSD's.

Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250304222439.2035603-14-alex.bennee@linaro.org>

tests/vm: bump timeout for shutdown

On my fairly beefy machine the timeout was triggering leaving a
corrupted disk image due to power being pulled before the disk had
synced. Triple the timeout to avoid this.

Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250304222439.2035603-13-alex.bennee@linaro.org>

libvirt-ci: bump to latest for vulkan-tools

The alpine baseline has also been updated in the meantime so we need
to address that while we are at it.

Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250304222439.2035603-12-alex.bennee@linaro.org>

tests/functional: Allow running TCG plugins tests on non-Linux/BSD hosts

Not all platforms use the '.so' suffix for shared libraries,
which is how plugins are built. Use the recently introduced
dso_suffix() helper to get the proper host suffix.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2804
Suggested-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20250220080215.49165-4-philmd@linaro.org>
[AJB: moved plugin_file into testcase.py]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250304222439.2035603-11-alex.bennee@linaro.org>

tests/functional: Introduce the dso_suffix() helper

Introduce a helper to get the default shared library
suffix used on the host.

Suggested-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20250220080215.49165-3-philmd@linaro.org>
[AJB: dropped whitespace cmd.py damage]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250304222439.2035603-10-alex.bennee@linaro.org>

plugins: add explicit dependency in functional tests

./tests/functional/test_aarch64_tcg_plugins.py needs to have plugin
libinsn built. However, it's not listed as a dependency, so meson can't
know it needs to be built.

Thus, we keep track of all plugins, and add them as an explicit
dependency.

Fixes: 4c134d07b9e ("tests: add a new set of tests to exercise plugins")
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250304222439.2035603-9-alex.bennee@linaro.org>

tests/functional: update the aarch64_virg_gpu images

Update to the most recent aarch64_virt_gpu image. The principle
differences are:

  - target a v8.0 baseline CPU
  - latest vkmark (2025.1)
  - actually uses the rootfs (previously was initrd)
  - rootfs includes more testing tools for interactive use

See README.md in https://fileserver.linaro.org/s/ce5jXBFinPxtEdx for
details about the image creation and the buildroot config.

Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250304222439.2035603-8-alex.bennee@linaro.org>

tests/functional: expand tests to cover virgl

Add two more test modes using glmark2-wayland to exercise the OpenGL
pass-through modes with virgl. Virgl can run with or without the
hostmem blob support. To avoid repeating ourselves too much we make
the initial pass a simple --validate pass.

We might want to eventually add more directed tests and individual
features later on but the glmark/vkmark tests are a good general
smoke test for accelerated 3D.

Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250304222439.2035603-7-alex.bennee@linaro.org>

tests/functional: skip vulkan tests with nVidia

While running the new GPU tests it was noted that the proprietary
nVidia driver barfed when run under the sanitiser:

  2025-02-20 11:13:08,226: [11:13:07.782] Output 'headless' attempts
  EOTF mode SDR and colorimetry mode default.
  2025-02-20 11:13:08,227: [11:13:07.784] Output 'headless' using color
  profile: stock sRGB color profile

  and that's the last thing it outputs.

  The sanitizer reports that when the framework sends the SIGTERM
  because of the timeout we get a write to a NULL pointer (but
  interesting not this time in an atexit callback):

  UndefinedBehaviorSanitizer:DEADLYSIGNAL
  ==471863==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address
  0x000000000000 (pc 0x7a18ceaafe80 bp 0x000000000000 sp 0x7ffe8e3ff6d0
  T471863)
  ==471863==The signal is caused by a WRITE memory access.
  ==471863==Hint: address points to the zero page.
      #0 0x7a18ceaafe80
  (/lib/x86_64-linux-gnu/libnvidia-eglcore.so.535.183.01+0x16afe80)
  (BuildId: 24b0d0b90369112e3de888a93eb8d7e00304a6db)
      #1 0x7a18ce9e72c0
  (/lib/x86_64-linux-gnu/libnvidia-eglcore.so.535.183.01+0x15e72c0)
  (BuildId: 24b0d0b90369112e3de888a93eb8d7e00304a6db)
      #2 0x7a18ce9f11bb
  (/lib/x86_64-linux-gnu/libnvidia-eglcore.so.535.183.01+0x15f11bb)
  (BuildId: 24b0d0b90369112e3de888a93eb8d7e00304a6db)
      #3 0x7a18ce6dc9d1
  (/lib/x86_64-linux-gnu/libnvidia-eglcore.so.535.183.01+0x12dc9d1)
  (BuildId: 24b0d0b90369112e3de888a93eb8d7e00304a6db)
      #4 0x7a18e7d15326 in vrend_renderer_create_fence
  /usr/src/virglrenderer-1.0.0-1ubuntu2/obj-x86_64-linux-gnu/../src/vrend_renderer.c:10883:26
      #5 0x55bfb6621871 in virtio_gpu_virgl_process_cmd

The #dri-devel channel confirmed:

  <digetx> stsquad: nv driver is known to not work with venus, don't use
      it for testing

So lets skip running the test to avoid known failures. As we now use
vulkaninfo to probe we also need to handle the case where there is no
Vulkan driver configured for the hardware.

Reviewed-by: Thomas Huth <thuth@redhat.com>
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Cc: Dmitry Osipenko <dmitry.osipenko@collabora.com>
[AJB: also skip if vulkaninfo can't find environment]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250304222439.2035603-6-alex.bennee@linaro.org>