git.ipfire.org Git - thirdparty/qemu.git/log

Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20210921' into staging

target-arm queue:
* Optimize codegen for MVE when predication not active
* hvf: Add Apple Silicon support
* hw/intc: Set GIC maintenance interrupt level to only 0 or 1
* Fix mishandling of MVE FPSCR.LTPSIZE reset for usermode emulator
* elf2dmp: Fix coverity nits

# gpg: Signature made Tue 21 Sep 2021 16:31:17 BST
# gpg:                using RSA key E1A5C593CD419DE28E8315CF3C2525ED14360CDE
# gpg:                issuer "peter.maydell@linaro.org"
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>" [ultimate]
# gpg:                 aka "Peter Maydell <pmaydell@gmail.com>" [ultimate]
# gpg:                 aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>" [ultimate]
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83  15CF 3C25 25ED 1436 0CDE

* remotes/pmaydell/tags/pull-target-arm-20210921: (27 commits)
  target/arm: Optimize MVE 1op-immediate insns
  target/arm: Optimize MVE VSLI and VSRI
  target/arm: Optimize MVE VSHLL and VMOVL
  target/arm: Optimize MVE VSHL, VSHR immediate forms
  target/arm: Optimize MVE VMVN
  target/arm: Optimize MVE VDUP
  target/arm: Optimize MVE VNEG, VABS
  target/arm: Optimize MVE arithmetic ops
  target/arm: Optimize MVE logic ops
  target/arm: Add TB flag for "MVE insns not predicated"
  target/arm: Enforce that FPDSCR.LTPSIZE is 4 on inbound migration
  target/arm: Avoid goto_tb if we're trying to exit to the main loop
  hvf: arm: Add rudimentary PMC support
  arm: Add Hypervisor.framework build target
  hvf: arm: Implement PSCI handling
  hvf: arm: Implement -cpu host
  arm/hvf: Add a WFI handler
  hvf: Add Apple Silicon support
  hvf: Introduce hvf_arch_init() callback
  hvf: Add execute to dirty log permission bitmap
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Optimize MVE 1op-immediate insns

Optimize the MVE 1op-immediate insns (VORR, VBIC, VMOV) to
use TCG vector ops when possible.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210913095440.13462-13-peter.maydell@linaro.org

target/arm: Optimize MVE VSLI and VSRI

Optimize the MVE shift-and-insert insns by using TCG
vector ops when possible.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210913095440.13462-12-peter.maydell@linaro.org

target/arm: Optimize MVE VSHLL and VMOVL

Optimize the MVE VSHLL insns by using TCG vector ops when possible.
This includes the VMOVL insn, which we handle in mve.decode as "VSHLL
with zero shift count".

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210913095440.13462-11-peter.maydell@linaro.org

target/arm: Optimize MVE VSHL, VSHR immediate forms

Optimize the MVE VSHL and VSHR immediate forms by using TCG vector
ops when possible.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210913095440.13462-10-peter.maydell@linaro.org

target/arm: Optimize MVE VMVN

Optimize the MVE VMVN insn by using TCG vector ops when possible.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210913095440.13462-9-peter.maydell@linaro.org

target/arm: Optimize MVE VDUP

Optimize the MVE VDUP insns by using TCG vector ops when possible.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210913095440.13462-8-peter.maydell@linaro.org

target/arm: Optimize MVE VNEG, VABS

Optimize the MVE VNEG and VABS insns by using TCG
vector ops when possible.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210913095440.13462-7-peter.maydell@linaro.org

target/arm: Optimize MVE arithmetic ops

Optimize MVE arithmetic ops when we have a TCG
vector operation we can use.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210913095440.13462-6-peter.maydell@linaro.org

target/arm: Optimize MVE logic ops

When not predicating, implement the MVE bitwise logical insns
directly using TCG vector operations.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210913095440.13462-5-peter.maydell@linaro.org

target/arm: Add TB flag for "MVE insns not predicated"

Our current codegen for MVE always calls out to helper functions,
because some byte lanes might be predicated.  The common case is that
in fact there is no predication active and all lanes should be
updated together, so we can produce better code by detecting that and
using the TCG generic vector infrastructure.

Add a TB flag that is set when we can guarantee that there is no
active MVE predication, and a bool in the DisasContext.  Subsequent
patches will use this flag to generate improved code for some
instructions.

In most cases when the predication state changes we simply end the TB
after that instruction.  For the code called from vfp_access_check()
that handles lazy state preservation and creating a new FP context,
we can usually avoid having to try to end the TB because luckily the
new value of the flag following the register changes in those
sequences doesn't depend on any runtime decisions.  We do have to end
the TB if the guest has enabled lazy FP state preservation but not
automatic state preservation, but this is an odd corner case that is
not going to be common in real-world code.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210913095440.13462-4-peter.maydell@linaro.org

target/arm: Enforce that FPDSCR.LTPSIZE is 4 on inbound migration

Architecturally, for an M-profile CPU with the LOB feature the
LTPSIZE field in FPDSCR is always constant 4.  QEMU's implementation
enforces this everywhere, except that we don't check that it is true
in incoming migration data.

We're going to add come in gen_update_fp_context() which relies on
the "always 4" property.  Since this is TCG-only, we don't actually
need to be robust to bogus incoming migration data, and the effect of
it being wrong would be wrong code generation rather than a QEMU
crash; but if it did ever happen somehow it would be very difficult
to track down the cause.  Add a check so that we fail the inbound
migration if the FPDSCR.LTPSIZE value is incorrect.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210913095440.13462-3-peter.maydell@linaro.org

target/arm: Avoid goto_tb if we're trying to exit to the main loop

Currently gen_jmp_tb() assumes that if it is called then the jump it
is handling is the only reason that we might be trying to end the TB,
so it will use goto_tb if it can.  This is usually the case: mostly
"we did something that means we must end the TB" happens on a
non-branch instruction.  However, there are cases where we decide
early in handling an instruction that we need to end the TB and
return to the main loop, and then the insn is a complex one that
involves gen_jmp_tb().  For instance, for M-profile FP instructions,
in gen_preserve_fp_state() which is called from vfp_access_check() we
want to force an exit to the main loop if lazy state preservation is
active and we are in icount mode.

Make gen_jmp_tb() look at the current value of is_jmp, and only use
goto_tb if the previous is_jmp was DISAS_NEXT or DISAS_TOO_MANY.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210913095440.13462-2-peter.maydell@linaro.org

hvf: arm: Add rudimentary PMC support

We can expose cycle counters on the PMU easily. To be as compatible as
possible, let's do so, but make sure we don't expose any other architectural
counters that we can not model yet.

This allows OSs to work that require PMU support.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20210916155404.86958-10-agraf@csgraf.de
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

arm: Add Hypervisor.framework build target

Now that we have all logic in place that we need to handle Hypervisor.framework
on Apple Silicon systems, let's add CONFIG_HVF for aarch64 as well so that we
can build it.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
Tested-by: Roman Bolshakov <r.bolshakov@yadro.com> (x86 only)
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20210916155404.86958-9-agraf@csgraf.de
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

hvf: arm: Implement PSCI handling

We need to handle PSCI calls. Most of the TCG code works for us,
but we can simplify it to only handle aa64 mode and we need to
handle SUSPEND differently.

This patch takes the TCG code as template and duplicates it in HVF.

To tell the guest that we support PSCI 0.2 now, update the check in
arm_cpu_initfn() as well.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20210916155404.86958-8-agraf@csgraf.de
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

hvf: arm: Implement -cpu host

Now that we have working system register sync, we push more target CPU
properties into the virtual machine. That might be useful in some
situations, but is not the typical case that users want.

So let's add a -cpu host option that allows them to explicitly pass all
CPU capabilities of their host CPU into the guest.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Acked-by: Roman Bolshakov <r.bolshakov@yadro.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20210916155404.86958-7-agraf@csgraf.de
[PMM: drop unnecessary #include line from .h file]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

arm/hvf: Add a WFI handler

Sleep on WFI until the VTIMER is due but allow ourselves to be woken
up on IPI.

In this implementation IPI is blocked on the CPU thread at startup and
pselect() is used to atomically unblock the signal and begin sleeping.
The signal is sent unconditionally so there's no need to worry about
races between actually sleeping and the "we think we're sleeping"
state. It may lead to an extra wakeup but that's better than missing
it entirely.

Signed-off-by: Peter Collingbourne <pcc@google.com>
Signed-off-by: Alexander Graf <agraf@csgraf.de>
Acked-by: Roman Bolshakov <r.bolshakov@yadro.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20210916155404.86958-6-agraf@csgraf.de
[agraf: Remove unused 'set' variable, always advance PC on WFX trap,
support vm stop / continue operations and cntv offsets]
Signed-off-by: Alexander Graf <agraf@csgraf.de>
Acked-by: Roman Bolshakov <r.bolshakov@yadro.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/legoater/tags/pull-aspeed-20210920' into staging

Aspeed patches :

* MAC enablement fixes (Guenter)
* Watchdog  and pca9552 fixes (Andrew)
* GPIO fixes (Joel)
* AST2600A3 SoC and DPS310 models (Joel)
* New Fuji BMC machine (Peter)

# gpg: Signature made Mon 20 Sep 2021 07:51:23 BST
# gpg:                using RSA key A0F66548F04895EBFE6B0B6051A343C7CFFBECA1
# gpg: Good signature from "Cédric Le Goater <clg@kaod.org>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: A0F6 6548 F048 95EB FE6B  0B60 51A3 43C7 CFFB ECA1

* remotes/legoater/tags/pull-aspeed-20210920:
  hw/arm/aspeed: Add Fuji machine type
  hw/arm/aspeed: Allow machine to set UART default
  hw/arm/aspeed: Initialize AST2600 UART clock selection registers
  arm/aspeed: Add DPS310 to Witherspoon and Rainier
  hw/misc: Add Infineon DPS310 sensor model
  aspeed: Emulate the AST2600A3
  arm/aspeed: rainier: Add i2c eeproms and muxes
  misc/pca9552: Fix LED status register indexing in pca955x_get_led()
  hw: aspeed_gpio: Clarify GPIO controller name
  hw: aspeed_gpio: Simplify 1.8V defines
  watchdog: aspeed: Fix sequential control writes
  watchdog: aspeed: Sanitize control register values
  hw: arm: aspeed: Enable mac0/1 instead of mac1/2 for g220a
  hw: arm: aspeed: Enable eth0 interface for aspeed-ast2600-evb

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging

# gpg: Signature made Fri 17 Sep 2021 09:17:32 BST
# gpg:                using RSA key EF04965B398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>" [marginal]
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F  3562 EF04 965B 398D 6211

* remotes/jasowang/tags/net-pull-request:
  virtio-net: fix use after unmap/free for sg
  ebpf: only include in system emulators

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/jsnow-gitlab/tags/python-pull-request' into staging

Python Pull request

This fixes the check-python-tox job.

CI including optional jobs is all green:
https://gitlab.com/jsnow/qemu/-/pipelines/372151147

# gpg: Signature made Thu 16 Sep 2021 23:05:35 BST
# gpg:                using RSA key F9B7ABDBBCACDF95BE76CBD07DEF8106AAFC390E
# gpg: Good signature from "John Snow (John Huston) <jsnow@redhat.com>" [full]
# Primary key fingerprint: FAEB 9711 A12C F475 812F  18F2 88A9 064D 1835 61EB
#      Subkey fingerprint: F9B7 ABDB BCAC DF95 BE76  CBD0 7DEF 8106 AAFC 390E

* remotes/jsnow-gitlab/tags/python-pull-request:
  python: pylint 2.11 support
  python: Update for pylint 2.10

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

hvf: Add Apple Silicon support

With Apple Silicon available to the masses, it's a good time to add support
for driving its virtualization extensions from QEMU.

This patch adds all necessary architecture specific code to get basic VMs
working, including save/restore.

Known limitations:

- WFI handling is missing (follows in later patch)
- No watchpoint/breakpoint support

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20210916155404.86958-5-agraf@csgraf.de
[PMM: added missing #include]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

hvf: Introduce hvf_arch_init() callback

We will need to install a migration helper for the ARM hvf backend.
Let's introduce an arch callback for the overall hvf init chain to
do so.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20210916155404.86958-4-agraf@csgraf.de
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

hvf: Add execute to dirty log permission bitmap

Hvf's permission bitmap during and after dirty logging does not include
the HV_MEMORY_EXEC permission. At least on Apple Silicon, this leads to
instruction faults once dirty logging was enabled.

Add the bit to make it work properly.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20210916155404.86958-3-agraf@csgraf.de
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

arm: Move PMC register definitions to internals.h

We will need PMC register definitions in accel specific code later.
Move all constant definitions to common arm headers so we can reuse
them.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20210916155404.86958-2-agraf@csgraf.de
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

hw/intc: Set GIC maintenance interrupt level to only 0 or 1

During sbsa acs level 3 testing, it is seen that the GIC maintenance
interrupts are not triggered and the related test cases fail. This
is because we were incorrectly passing the value of the MISR register
(from maintenance_interrupt_state()) to qemu_set_irq() as the level
argument, whereas the device on the other end of this irq line
expects a 0/1 value.

Fix the logic to pass a 0/1 level indication, rather than a
0/not-0 value.

Fixes: c5fc89b36c0 ("hw/intc/arm_gicv3: Implement gicv3_cpuif_virt_update()")
Signed-off-by: Shashi Mallela <shashi.mallela@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20210915205809.59068-1-shashi.mallela@linaro.org
[PMM: tweaked commit message; collapsed nested if()s into one]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Consolidate ifdef blocks in reset

Move an ifndef CONFIG_USER_ONLY code block up in arm_cpu_reset() so
it can be merged with another earlier one.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210914120725.24992-4-peter.maydell@linaro.org

target/arm: Always clear exclusive monitor on reset

There's no particular reason why the exclusive monitor should
be only cleared on reset in system emulation mode. It doesn't
hurt if it isn't cleared in user mode, but we might as well
reduce the amount of code we have that's inside an ifdef.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210914120725.24992-3-peter.maydell@linaro.org

target/arm: Don't skip M-profile reset entirely in user mode

Currently all of the M-profile specific code in arm_cpu_reset() is
inside a !defined(CONFIG_USER_ONLY) ifdef block.  This is
unintentional: it happened because originally the only
M-profile-specific handling was the setup of the initial SP and PC
from the vector table, which is system-emulation only.  But then we
added a lot of other M-profile setup to the same "if (ARM_FEATURE_M)"
code block without noticing that it was all inside a not-user-mode
ifdef.  This has generally been harmless, but with the addition of
v8.1M low-overhead-loop support we ran into a problem: the reset of
FPSCR.LTPSIZE to 4 was only being done for system emulation mode, so
if a user-mode guest tried to execute the LE instruction it would
incorrectly take a UsageFault.

Adjust the ifdefs so only the really system-emulation specific parts
are covered.  Because this means we now run some reset code that sets
up initial values in the FPCCR and similar FPU related registers,
explicitly set up the registers controlling FPU context handling in
user-emulation mode so that the FPU works by design and not by
chance.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/613
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210914120725.24992-2-peter.maydell@linaro.org

elf2dmp: Fail cleanly if PDB file specifies zero block_size

Coverity points out that if the PDB file we're trying to read
has a header specifying a block_size of zero then we will
end up trying to divide by zero in pdb_ds_read_file().
Check for this and fail cleanly instead.

Fixes: Coverity CID 1458869
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Viktor Prutyanov <viktor.prutyanov@phystech.edu>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Viktor Prutyanov <viktor.prutyanov@phystech.edu>
Message-id: 20210910170656.366592-3-philmd@redhat.com
Message-Id: <20210901143910.17112-3-peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>

elf2dmp: Check curl_easy_setopt() return value

Coverity points out that we aren't checking the return value
from curl_easy_setopt().

Fixes: Coverity CID 1458895
Inspired-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Viktor Prutyanov <viktor.prutyanov@phystech.edu>
Tested-by: Viktor Prutyanov <viktor.prutyanov@phystech.edu>
Message-id: 20210910170656.366592-2-philmd@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

hw/arm/aspeed: Add Fuji machine type

This adds a new machine type "fuji-bmc" based on the following device tree:

https://github.com/torvalds/linux/blob/40cb6373b46/arch/arm/boot/dts/aspeed-bmc-facebook-fuji.dts

Most of the i2c devices are not there, they're added here:

https://github.com/facebook/openbmc/blob/fb2ed12002fb/meta-facebook/meta-fuji/recipes-utils/openbmc-utils/files/setup_i2c.sh

I tested this by building a Fuji image from Facebook's OpenBMC repo,
booting, and ssh'ing from host-to-guest.

Signed-off-by: Peter Delevoryas <pdel@fb.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
[ clg: On 32-bit hosts, lower RAM to 1G because of 2047 MB limit ]
Message-Id: <20210906133124.3674661-1-pdel@fb.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

hw/arm/aspeed: Allow machine to set UART default

When you run QEMU with an Aspeed machine and a single serial device
using stdio like this:

qemu -machine ast2600-evb -drive ... -serial stdio

The guest OS can read and write to the UART5 registers at 0x1E784000 and
it will receive from stdin and write to stdout. The Aspeed SoC's have a
lot more UART's though (AST2500 has 5, AST2600 has 13) and depending on
the board design, may be using any of them as the serial console. (See
"stdout-path" in a DTS to check which one is chosen).

Most boards, including all of those currently defined in
hw/arm/aspeed.c, just use UART5, but some use UART1. This change adds
some flexibility for different boards without requiring users to change
their command-line invocation of QEMU.

I tested this doesn't break existing code by booting an AST2500 OpenBMC
image and an AST2600 OpenBMC image, each using UART5 as the console.

Then I tested switching the default to UART1 and booting an AST2600
OpenBMC image that uses UART1, and that worked too.

Signed-off-by: Peter Delevoryas <pdel@fb.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20210901153615.2746885-2-pdel@fb.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

hw/arm/aspeed: Initialize AST2600 UART clock selection registers

UART5 is typically used as the default debug UART on the AST2600, but
UART1 is also designed to be a debug UART. All the AST2600 UART's have
semi-configurable clock rates through registers in the System Control
Unit (SCU), but only UART5 works out of the box with zero-initialized
values. The rest of the UART's expect a few of the registers to be
initialized to non-zero values, or else the clock rate calculation will
yield zero or undefined (due to a divide-by-zero).

For reference, the U-Boot clock rate driver here shows the calculation:

    https://github.com/facebook/openbmc-uboot/blob/15f7e0dc01d8/drivers/clk/aspeed/clk_ast2600.c#L357

To summarize, UART5 allows selection from 4 rates: 24 MHz, 192 MHz, 24 /
13 MHz, and 192 / 13 MHz. The other UART's allow selecting either the
"low" rate (UARTCLK) or the "high" rate (HUARTCLK). UARTCLK and HUARTCLK
are configurable themselves:

    UARTCLK = UXCLK * R / (N * 2)
    HUARTCLK = HUXCLK * HR / (HN * 2)

UXCLK and HUXCLK are also configurable, and depend on the APLL and/or
HPLL clock rates, which also derive from complicated calculations. Long
story short, there's lots of multiplication and division from
configurable registers, and most of these registers are zero-initialized
in QEMU, which at best is unexpected and at worst causes this clock rate
driver to hang from divide-by-zero's. This can also be difficult to
diagnose, because it may cause U-Boot to hang before serial console
initialization completes, requiring intervention from gdb.

This change just initializes all of these registers with default values
from the datasheet.

To test this, I used Facebook's AST2600 OpenBMC image for "fuji", with
the following diff applied (because fuji uses UART1 for console output,
not UART5).

  @@ -323,8 +323,8 @@ static void aspeed_soc_ast2600_realize(DeviceState *dev, Error **errp)
       }

      /* UART - attach an 8250 to the IO space as our UART5 */
  -    serial_mm_init(get_system_memory(), sc->memmap[ASPEED_DEV_UART5], 2,
  -                   aspeed_soc_get_irq(s, ASPEED_DEV_UART5),
  +    serial_mm_init(get_system_memory(), sc->memmap[ASPEED_DEV_UART1], 2,
  +                   aspeed_soc_get_irq(s, ASPEED_DEV_UART1),
                    38400, serial_hd(0), DEVICE_LITTLE_ENDIAN);

       /* I2C */

Without these clock rate registers being initialized, U-Boot hangs in
the clock rate driver from a divide-by-zero, because the UART1 clock
rate register reads return zero, and there's no console output. After
initializing them with default values, fuji boots successfully.

Signed-off-by: Peter Delevoryas <pdel@fb.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
[ clg: Removed _PARAM suffix ]
Message-Id: <20210906134023.3711031-2-pdel@fb.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

arm/aspeed: Add DPS310 to Witherspoon and Rainier

Witherspoon uses the DPS310 as a temperature sensor. Rainier uses it as
a temperature and humidity sensor.

Signed-off-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20210629142336.750058-5-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

hw/misc: Add Infineon DPS310 sensor model

This contains some hardcoded register values that were obtained from the
hardware after reading the temperature.

It does enough to test the Linux kernel driver. The FIFO mode, IRQs and
operation modes other than the default as used by Linux are not modelled.

Signed-off-by: Joel Stanley <joel@jms.id.au>
Message-Id: <20210616073358.750472-2-joel@jms.id.au>
[ clg: - Fixed sequential reading
- Reworked regs_reset_state array
- Moved model under hw/sensor/ ]
Message-Id: <20210629142336.750058-4-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

aspeed: Emulate the AST2600A3

This is the latest revision of the ASPEED 2600 SoC. As there is no
need to model multiple revisions of the same SoC for the moment,
update the SCU AST2600 to model the A3 revision instead of the A1 and
adapt the AST2600 SoC and machines.

Reset values are taken from v8 of the datasheet.

Signed-off-by: Joel Stanley <joel@jms.id.au>
[ clg: - Introduced an Aspeed "ast2600-a3" SoC class
- Commit log update ]
Message-Id: <20210629142336.750058-3-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

arm/aspeed: rainier: Add i2c eeproms and muxes

These are the devices documented by the Rainier device tree. With this
we can see the guest discovering the multiplexers and probing the eeprom
devices:

i2c i2c-2: Added multiplexed i2c bus 16
i2c i2c-2: Added multiplexed i2c bus 17
i2c i2c-2: Added multiplexed i2c bus 18
i2c i2c-2: Added multiplexed i2c bus 19
i2c-mux-gpio i2cmux: 4 port mux on 1e78a180.i2c-bus adapter
at24 20-0050: 8192 byte 24c64 EEPROM, writable, 1 bytes/write
i2c i2c-4: Added multiplexed i2c bus 20
at24 21-0051: 8192 byte 24c64 EEPROM, writable, 1 bytes/write
i2c i2c-4: Added multiplexed i2c bus 21
at24 22-0052: 8192 byte 24c64 EEPROM, writable, 1 bytes/write

Signed-off-by: Joel Stanley <joel@jms.id.au>
[ clg: Introduced aspeed_eeprom_init ]
Message-Id: <20210629142336.750058-2-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

misc/pca9552: Fix LED status register indexing in pca955x_get_led()

There was a bit of a thinko in the state calculation where every odd pin
in was reported in e.g. "pwm0" mode rather than "off". This was the
result of an incorrect bit shift for the 2-bit field representing each
LED state.

Fixes: a90d8f84674d ("misc/pca9552: Add qom set and get")
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20210723043624.348158-1-andrew@aj.id.au>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

hw: aspeed_gpio: Clarify GPIO controller name

There are two GPIO controllers in the ast2600; one is 3.3V and the other
is 1.8V.

Signed-off-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20210713065854.134634-4-joel@jms.id.au>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

hw: aspeed_gpio: Simplify 1.8V defines

There's no need to define the registers relative to the 0x800 offset
where the controller is mapped, as the device is instantiated as it's
own model at the correct memory address.

Simplify the defines and remove the offset to save future confusion.

Signed-off-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210713065854.134634-3-joel@jms.id.au>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

watchdog: aspeed: Fix sequential control writes

The logic in the handling for the control register required toggling the
enable state for writes to stick. Rework the condition chain to allow
sequential writes that do not update the enable state.

Fixes: 854123bf8d4b ("wdt: Add Aspeed watchdog device model")
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20210709053107.1829304-3-andrew@aj.id.au>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

watchdog: aspeed: Sanitize control register values

While some of the critical fields remain the same, there is variation in
the definition of the control register across the SoC generations.
Reserved regions are adjusted, while in other cases the mutability or
behaviour of fields change.

Introduce a callback to sanitize the value on writes to ensure model
behaviour reflects the hardware.

Fixes: 854123bf8d4b ("wdt: Add Aspeed watchdog device model")
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20210709053107.1829304-2-andrew@aj.id.au>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

hw: arm: aspeed: Enable mac0/1 instead of mac1/2 for g220a

According to its dts file in the Linux kernel, we need mac0 and mac1 enabled
instead of mac1 and mac2. Also, g220a is based on aspeed-g5 (ast2500) which
doesn't even have the third interface.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20210810035742.550391-1-linux@roeck-us.net>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

hw: arm: aspeed: Enable eth0 interface for aspeed-ast2600-evb

Commit 7582591ae7 ("aspeed: Support AST2600A1 silicon revision") switched
the silicon revision for AST2600 to revision A1. On revision A1, the first
Ethernet interface is operational. Enable it.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20210808200457.889955-1-linux@roeck-us.net>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

Merge remote-tracking branch 'remotes/dgilbert-gitlab/tags/pull-virtiofs-20210916' into staging

virtiofsd pull 2021-08-16

Two minor fixes; one for performance, the other seccomp
on s390x.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
# gpg: Signature made Thu 16 Sep 2021 14:51:38 BST
# gpg:                using RSA key 45F5C71B4A0CB7FB977A9FA90516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>" [full]
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A  9FA9 0516 331E BC5B FDE7

* remotes/dgilbert-gitlab/tags/pull-virtiofs-20210916:
  virtiofsd: Reverse req_list before processing it
  tools/virtiofsd: Add fstatfs64 syscall to the seccomp allowlist

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

virtio-net: fix use after unmap/free for sg

When mergeable buffer is enabled, we try to set the num_buffers after
the virtqueue elem has been unmapped. This will lead several issues,
E.g a use after free when the descriptor has an address which belongs
to the non direct access region. In this case we use bounce buffer
that is allocated during address_space_map() and freed during
address_space_unmap().

Fixing this by storing the elems temporarily in an array and delay the
unmap after we set the the num_buffers.

This addresses CVE-2021-3748.

Reported-by: Alexander Bulekov <alxndr@bu.edu>
Fixes: fbe78f4f55c6 ("virtio-net support")
Cc: qemu-stable@nongnu.org
Signed-off-by: Jason Wang <jasowang@redhat.com>

ebpf: only include in system emulators

eBPF files are being included in user emulators, which is useless and
also breaks compilation because ebpf/trace-events is only processed
if a system emulator is included in the build.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/566
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

Merge remote-tracking branch 'remotes/vivier2/tags/linux-user-for-6.2-pull-request' into staging

Pull request linux-user 20210916

Code cleanup

# gpg: Signature made Thu 16 Sep 2021 16:11:58 BST
# gpg:                using RSA key CD2F75DDC8E3A4DC2E4F5173F30C38BD3F2FBE3C
# gpg:                issuer "laurent@vivier.eu"
# gpg: Good signature from "Laurent Vivier <lvivier@redhat.com>" [full]
# gpg:                 aka "Laurent Vivier <laurent@vivier.eu>" [full]
# gpg:                 aka "Laurent Vivier (Red Hat) <lvivier@redhat.com>" [full]
# Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F  5173 F30C 38BD 3F2F BE3C

* remotes/vivier2/tags/linux-user-for-6.2-pull-request:
  linux-user: Check lock_user result for ip_mreq_source sockopts
  linux-user: Drop unneeded includes from qemu.h
  linux-user: Don't include gdbstub.h in qemu.h
  linux-user: Split linux-user internals out of qemu.h
  linux-user: Split safe-syscall macro into its own header
  linux-user: Split mmap prototypes into user-mmap.h
  linux-user: Split loader-related prototypes into loader.h
  linux-user: Split signal-related prototypes into signal-common.h
  linux-user: Split strace prototypes into strace.h
  linux-user: Fix coding style nits in qemu.h

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

python: pylint 2.11 support

We're not ready to enforce f-strings everywhere, so just silence this
new warning.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Willian Rampazzo <willianr@redhat.com>
Message-id: 20210916182248.721529-3-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>

python: Update for pylint 2.10

A few new annoyances. Of note is the new warning for an unspecified
encoding when opening a text file, which actually does indicate a
potentially real problem; see
https://www.python.org/dev/peps/pep-0597/#motivation

Use LC_CTYPE to determine an encoding to use for interpreting QEMU's
terminal output. Note that Python states: "language code and encoding
may be None if their values cannot be determined" -- use a platform
default as a backup.

Notes: Passing encoding=None will generate a suppressed warning on
Python 3.10+ that 'None' should not be passed as the encoding
argument. This behavior may be deprecated in the future and the default
switched to be a ubiquitous UTF-8. Opting in to the locale default will
be done by passing the encoding 'locale', but that isn't available in
3.6 through 3.9. Presumably this warning will be unsuppressed some time
prior to the actual switch and we can re-investigate these issues at
that time if necessary.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Willian Rampazzo <willianr@redhat.com>
Message-id: 20210916182248.721529-2-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>

linux-user: Check lock_user result for ip_mreq_source sockopts

In do_setsockopt(), the code path for the options which take a struct
ip_mreq_source (IP_BLOCK_SOURCE, IP_UNBLOCK_SOURCE,
IP_ADD_SOURCE_MEMBERSHIP and IP_DROP_SOURCE_MEMBERSHIP) fails to
check the return value from lock_user(). Handle this in the usual
way by returning -TARGET_EFAULT.

(In practice this was probably harmless because we'd pass a NULL
pointer to setsockopt() and the kernel would then return EFAULT.)

Fixes: Coverity CID 1459987
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20210809155424.30968-1-peter.maydell@linaro.org>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>

Merge remote-tracking branch 'remotes/vivier2/tags/trivial-branch-for-6.2-pull-request' into staging

Trivial patches pull request 20210916

# gpg: Signature made Thu 16 Sep 2021 15:09:39 BST
# gpg:                using RSA key CD2F75DDC8E3A4DC2E4F5173F30C38BD3F2FBE3C
# gpg:                issuer "laurent@vivier.eu"
# gpg: Good signature from "Laurent Vivier <lvivier@redhat.com>" [full]
# gpg:                 aka "Laurent Vivier <laurent@vivier.eu>" [full]
# gpg:                 aka "Laurent Vivier (Red Hat) <lvivier@redhat.com>" [full]
# Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F  5173 F30C 38BD 3F2F BE3C

* remotes/vivier2/tags/trivial-branch-for-6.2-pull-request:
  target/sparc: Make sparc_cpu_dump_state() static
  target/avr: Fix compiler errors (-Werror=enum-conversion)
  hw/vfio: Fix typo in comments
  intel_iommu: Fix typo in comments
  target/i386: spelling: occured=>occurred, mininum=>minimum
  configure: add missing pc-bios/qemu_vga.ndrv symlink in build tree
  spelling: sytem => system
  qdev: Complete qdev_init_gpio_out() documentation
  hw/i386/acpi-build: Fix a typo
  util: Remove redundant checks in the openpty()

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

virtiofsd: Reverse req_list before processing it

With the thread pool disabled, we add the requests in the queue to a
GList, processing by iterating over there afterwards.

For adding them, we're using "g_list_prepend()", which is more
efficient but causes the requests to be processed in reverse order,
breaking the read-ahead and request-merging optimizations in the host
for sequential operations.

According to the documentation, if you need to process the request
in-order, using "g_list_prepend()" and then reversing the list with
"g_list_reverse()" is more efficient than using "g_list_append()", so
let's do it that way.

Testing on a spinning disk (to boost the increase of read-ahead and
request-merging) shows a 4x improvement on sequential write fio test:

Test:
fio --directory=/mnt/virtio-fs --filename=fio-file1 --runtime=20
--iodepth=16 --size=4G --direct=1 --blocksize=4K --ioengine libaio
--rw write --name seqwrite-libaio

Without "g_list_reverse()":
...
Jobs: 1 (f=1): [W(1)][100.0%][w=22.4MiB/s][w=5735 IOPS][eta 00m:00s]
seqwrite-libaio: (groupid=0, jobs=1): err= 0: pid=710: Tue Aug 24 12:58:16 2021
write: IOPS=5709, BW=22.3MiB/s (23.4MB/s)(446MiB/20002msec); 0 zone resets
...

With "g_list_reverse()":
...
Jobs: 1 (f=1): [W(1)][100.0%][w=84.0MiB/s][w=21.5k IOPS][eta 00m:00s]
seqwrite-libaio: (groupid=0, jobs=1): err= 0: pid=716: Tue Aug 24 13:00:15 2021
write: IOPS=21.3k, BW=83.1MiB/s (87.2MB/s)(1663MiB/20001msec); 0 zone resets
...

Signed-off-by: Sergio Lopez <slp@redhat.com>
Message-Id: <20210824131158.39970-1-slp@redhat.com>
Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

tools/virtiofsd: Add fstatfs64 syscall to the seccomp allowlist

The virtiofsd currently crashes on s390x when doing something like
this in the guest:

mkdir -p /mnt/myfs
mount -t virtiofs myfs /mnt/myfs
touch /mnt/myfs/foo.txt
stat -f /mnt/myfs/foo.txt

The problem is that the fstatfs64 syscall is called in this case
from the virtiofsd. We have to put it on the seccomp allowlist to
avoid that the daemon gets killed in this case.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2001728
Suggested-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210914123214.181885-1-thuth@redhat.com>
Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

target/sparc: Make sparc_cpu_dump_state() static

The sparc_cpu_dump_state() function is only called within
the same file. Make it static.

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20210916084002.1918445-1-f4bug@amsat.org>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>

Merge remote-tracking branch 'remotes/kraxel/tags/vga-20210916-pull-request' into staging

virtio-gpu + ui: fence syncronization.
qxl: unbreak live migration.

# gpg: Signature made Thu 16 Sep 2021 06:56:03 BST
# gpg:                using RSA key A0328CFFB93A17A79901FE7D4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>" [full]
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>" [full]
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>" [full]
# Primary key fingerprint: A032 8CFF B93A 17A7 9901  FE7D 4CB6 D8EE D3E8 7138

* remotes/kraxel/tags/vga-20210916-pull-request:
  virtio-gpu: Add gl_flushed callback
  ui/gtk-egl: Wait for the draw signal for dmabuf blobs
  ui: Create sync objects and fences only for blobs
  ui/egl: Add egl helpers to help with synchronization
  ui/gtk: Create a common release_dmabuf helper
  qxl: fix pre-save logic

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/avr: Fix compiler errors (-Werror=enum-conversion)

../target/avr/translate.c: In function ‘gen_jmp_ez’:
../target/avr/translate.c:1012:22: error: implicit conversion from ‘enum <anonymous>’ to ‘DisasJumpType’ [-Werror=enum-conversion]
1012 | ctx->base.is_jmp = DISAS_LOOKUP;
| ^

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Michael Rolnik <mrolnik@gmail.com>
Message-Id: <20210706180936.249912-1-sw@weilnetz.de>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>

hw/vfio: Fix typo in comments

Fix typo in comments:
*programatically  ==> programmatically
*disconecting  ==> disconnecting
*mulitple  ==> multiple
*timout  ==> timeout
*regsiter  ==> register
*forumula  ==> formula

Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210730012613.2198-1-caihuoqing@baidu.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>

intel_iommu: Fix typo in comments

Fix typo:
*Unknwon  ==> Unknown
*futher  ==> further
*configed  ==> configured

Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Message-Id: <20210730014942.2311-1-caihuoqing@baidu.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>

target/i386: spelling: occured=>occurred, mininum=>minimum

Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Message-Id: <20210818141352.417716-1-mjt@msgid.tls.msk.ru>
[lv: add mininum=>minimum in subject]
Signed-off-by: Laurent Vivier <laurent@vivier.eu>

Merge remote-tracking branch 'remotes/hreitz/tags/pull-block-2021-09-15' into staging

Block patches:
- Block-status cache for data regions
- qcow2 optimization (when using subclusters)
- iotests delinting, and let 297 (lint checker) cover named iotests
- qcow2 check improvements
- Added -F (target backing file format) option to qemu-img convert
- Mirror job fix
- Fix for when a migration is initiated while a backup job runs
- Fix for uncached qemu-img convert to a volume with 4k sectors (for an
  unaligned image)
- Minor gluster driver fix

# gpg: Signature made Wed 15 Sep 2021 18:39:11 BST
# gpg:                using RSA key CB62D7A0EE3829E45F004D34A1FA40D098019CDF
# gpg:                issuer "hreitz@redhat.com"
# gpg: Good signature from "Hanna Reitz <hreitz@redhat.com>" [marginal]
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: CB62 D7A0 EE38 29E4 5F00  4D34 A1FA 40D0 9801 9CDF

* remotes/hreitz/tags/pull-block-2021-09-15: (32 commits)
  qemu-img: Add -F shorthand to convert
  qcow2-refcount: check_refblocks(): add separate message for reserved
  qcow2-refcount: check_refcounts_l1(): check reserved bits
  qcow2-refcount: improve style of check_refcounts_l1()
  qcow2-refcount: check_refcounts_l2(): check reserved bits
  qcow2-refcount: check_refcounts_l2(): check l2_bitmap
  qcow2-refcount: fix_l2_entry_by_zero(): also zero L2 entry bitmap
  qcow2-refcount: introduce fix_l2_entry_by_zero()
  qcow2: introduce qcow2_parse_compressed_l2_entry() helper
  qcow2: compressed read: simplify cluster descriptor passing
  qcow2-refcount: improve style of check_refcounts_l2()
  qemu-img: Allow target be aligned to sector size
  qcow2: handle_dependencies(): relax conflict detection
  qcow2: refactor handle_dependencies() loop body
  simplebench: add img_bench_templater.py
  block: bdrv_inactivate_recurse(): check for permissions and fix crash
  tests: add migrate-during-backup
  block/mirror: fix NULL pointer dereference in mirror_wait_on_conflicts()
  iotests/297: Cover tests/
  mirror-top-perms: Fix AbnormalShutdown path
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

qemu-img: Add -F shorthand to convert

Although we have long supported 'qemu-img convert -o
backing_file=foo,backing_fmt=bar', the fact that we have a shortcut -B
for backing_file but none for backing_fmt has made it more likely that
users accidentally run into:

qemu-img: warning: Deprecated use of backing file without explicit backing format

when using -B instead of -o. For similarity with other qemu-img
commands, such as create and compare, add '-F $fmt' as the shorthand
for '-o backing_fmt=$fmt'. Update iotest 122 for coverage of both
spellings.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210913131735.1948339-1-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>

qcow2-refcount: check_refblocks(): add separate message for reserved

Split checking for reserved bits out of aligned offset check.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Tested-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210914122454.141075-11-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>

qcow2-refcount: check_refcounts_l1(): check reserved bits

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Tested-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210914122454.141075-10-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>

qcow2-refcount: improve style of check_refcounts_l1()

- use g_autofree for l1_table
- better name for size in bytes variable
- reduce code blocks nesting
- whitespaces, braces, newlines

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210914122454.141075-9-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>

qcow2-refcount: check_refcounts_l2(): check reserved bits

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Tested-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210914122454.141075-8-vsementsov@virtuozzo.com>
[hreitz: Separated `type` declaration from statements]
Signed-off-by: Hanna Reitz <hreitz@redhat.com>

qcow2-refcount: check_refcounts_l2(): check l2_bitmap

Check subcluster bitmap of the l2 entry for different types of
clusters:

- for compressed it must be zero
- for allocated check consistency of two parts of the bitmap
- for unallocated all subclusters should be unallocated
(or zero-plain)

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Tested-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Message-Id: <20210914122454.141075-7-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>

qcow2-refcount: fix_l2_entry_by_zero(): also zero L2 entry bitmap

We'll reuse the function to fix wrong L2 entry bitmap. Support it now.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210914122454.141075-6-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>

qcow2-refcount: introduce fix_l2_entry_by_zero()

Split fix_l2_entry_by_zero() out of check_refcounts_l2() to be
reused in further patch.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210914122454.141075-5-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>

qcow2: introduce qcow2_parse_compressed_l2_entry() helper

Add helper to parse compressed l2_entry and use it everywhere instead
of open-coding.

Note, that in most places we move to precise coffset/csize instead of
sector-aligned. Still it should work good enough for updating
refcounts.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210914122454.141075-4-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>

qcow2: compressed read: simplify cluster descriptor passing

Let's pass the whole L2 entry and not bother with
L2E_COMPRESSED_OFFSET_SIZE_MASK.

It also helps further refactoring that adds generic
qcow2_parse_compressed_l2_entry() helper.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210914122454.141075-3-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>

qcow2-refcount: improve style of check_refcounts_l2()

- don't use same name for size in bytes and in entries
- use g_autofree for l2_table
- add whitespace
- fix block comment style

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210914122454.141075-2-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>

gitlab-ci: Mark manual-only jobs as allow_failure

If a gitlab CI job is marked as manual-only but is not marked
as allow_failure, then gitlab considers that the pipeline is
"blocked" until the job has been manually triggered. We need
to mark these manual-only jobs as also allow_failure: true
so that gitlab doesn't insist that they have run before it
will consider the pipeline to be complete.

Fixes: 4c9af1ea1457782cf0adb29
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-id: 20210915123412.8232-1-peter.maydell@linaro.org
Acked-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Willian Rampazzo <willianr@redhat.com>

qemu-img: Allow target be aligned to sector size

We cannot write to images opened with O_DIRECT unless we allow them to
be resized so they are aligned to the sector size: Since 9c60a5d1978,
bdrv_node_refresh_perm() ensures that for nodes whose length is not
aligned to the request alignment and where someone has taken a WRITE
permission, the RESIZE permission is taken, too).

Let qemu-img convert pass the BDRV_O_RESIZE flag (which causes
blk_new_open() to take the RESIZE permission) when using cache=none for
the target, so that when writing to it, it can be aligned to the target
sector size.

Without this patch, an error is returned:

$ qemu-img convert -f raw -O raw -t none foo.img /mnt/tmp/foo.img
qemu-img: Could not open '/mnt/tmp/foo.img': Cannot get 'write'
permission without 'resize': Image size is not a multiple of request
alignment

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1994266
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210819101200.64235-1-hreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

qcow2: handle_dependencies(): relax conflict detection

There is no conflict and no dependency if we have parallel writes to
different subclusters of one cluster when the cluster itself is already
allocated. So, relax extra dependency.

Measure performance:
First, prepare build/qemu-img-old and build/qemu-img-new images.

cd scripts/simplebench
./img_bench_templater.py

Paste the following to stdin of running script:

qemu_img=../../build/qemu-img-{old|new}
$qemu_img create -f qcow2 -o extended_l2=on /ssd/x.qcow2 1G
$qemu_img bench -c 100000 -d 8 [-s 2K|-s 2K -o 512|-s $((1024*2+512))] \
        -w -t none -n /ssd/x.qcow2

The result:

All results are in seconds

------------------  ---------  ---------
                    old        new
-s 2K               6.7 ± 15%  6.2 ± 12%
                                 -7%
-s 2K -o 512        13 ± 3%    11 ± 5%
                                 -16%
-s $((1024*2+512))  9.5 ± 4%   8.4
                                 -12%
------------------  ---------  ---------

So small writes are more independent now and that helps to keep deeper
io queue which improves performance.

271 iotest output becomes racy for three allocation in one cluster.
Second and third writes may finish in different order. Second and
third requests don't depend on each other any more. Still they both
depend on first request anyway. Filter out second and third write
offsets to cover both possible outputs.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210824101517.59802-4-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
[hreitz: s/ an / and /]
Signed-off-by: Hanna Reitz <hreitz@redhat.com>

qcow2: refactor handle_dependencies() loop body

No logic change, just prepare for the following commit. While being
here do also small grammar fix in a comment.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210824101517.59802-3-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>

simplebench: add img_bench_templater.py

Add simple grammar-parsing template benchmark. New tool consume test
template written in bash with some special grammar injections and
produces multiple tests, run them and finally print a performance
comparison table of different tests produced from one template.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210824101517.59802-2-vsementsov@virtuozzo.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>

block: bdrv_inactivate_recurse(): check for permissions and fix crash

We must not inactivate child when parent has write permissions on
it.

Calling .bdrv_inactivate() doesn't help: actually only qcow2 has this
handler and it is used to flush caches, not for permission
manipulations.

So, let's simply check cumulative parent permissions before
inactivating the node.

This commit fixes a crash when we do migration during backup: prior to
the commit nothing prevents all nodes inactivation at migration finish
and following backup write to the target crashes on assertion
"assert(!(bs->open_flags & BDRV_O_INACTIVE));" in
bdrv_co_write_req_prepare().

After the commit, we rely on the fact that copy-before-write filter
keeps write permission on target node to be able to write to it. So
inactivation fails and migration fails as expected.

Corresponding test now passes, so, enable it.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210911120027.8063-3-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>

tests: add migrate-during-backup

Add a simple test which tries to run migration during backup.
bdrv_inactivate_all() should fail. But due to bug (see next commit with
fix) it doesn't, nodes are inactivated and continued backup crashes
on assertion "assert(!(bs->open_flags & BDRV_O_INACTIVE));" in
bdrv_co_write_req_prepare().

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210911120027.8063-2-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>

block/mirror: fix NULL pointer dereference in mirror_wait_on_conflicts()

In mirror_iteration() we call mirror_wait_on_conflicts() with
`self` parameter set to NULL.

Starting from commit d44dae1a7c we dereference `self` pointer in
mirror_wait_on_conflicts() without checks if it is not NULL.

Backtrace:
  Program terminated with signal SIGSEGV, Segmentation fault.
  #0  mirror_wait_on_conflicts (self=0x0, s=<optimized out>, offset=<optimized out>, bytes=<optimized out>)
      at ../block/mirror.c:172
  172                 self->waiting_for_op = op;
  [Current thread is 1 (Thread 0x7f0908931ec0 (LWP 380249))]
  (gdb) bt
  #0  mirror_wait_on_conflicts (self=0x0, s=<optimized out>, offset=<optimized out>, bytes=<optimized out>)
      at ../block/mirror.c:172
  #1  0x00005610c5d9d631 in mirror_run (job=0x5610c76a2c00, errp=<optimized out>) at ../block/mirror.c:491
  #2  0x00005610c5d58726 in job_co_entry (opaque=0x5610c76a2c00) at ../job.c:917
  #3  0x00005610c5f046c6 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>)
      at ../util/coroutine-ucontext.c:173
  #4  0x00007f0909975820 in ?? () at ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91
      from /usr/lib64/libc.so.6

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2001404
Fixes: d44dae1a7c ("block/mirror: fix active mirror dead-lock in mirror_wait_on_conflicts")
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20210910124533.288318-1-sgarzare@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>

iotests/297: Cover tests/

297 so far does not check the named tests, which reside in the tests/
directory (i.e. full path tests/qemu-iotests/tests). Fix it.

Thanks to the previous two commits, all named tests pass its scrutiny,
so we do not have to add anything to SKIP_FILES.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Willian Rampazzo <willianr@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210902094017.32902-6-hreitz@redhat.com>

mirror-top-perms: Fix AbnormalShutdown path

The AbnormalShutdown exception class is not in qemu.machine, but in
qemu.machine.machine. (qemu.machine.AbnormalShutdown was enough for
Python to find it in order to run this test, but pylint complains about
it.)

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210902094017.32902-5-hreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

migrate-bitmaps-test: Fix pylint warnings

There are a couple of things pylint takes issue with:
- The "time" import is unused
- The import order (iotests should come last)
- get_bitmap_hash() doesn't use @self and so should be a function
- Semicolons at the end of some lines
- Parentheses after "if"
- Some lines are too long (80 characters instead of 79)
- inject_test_case()'s @name parameter shadows a top-level @name
  variable
- "lambda self: mc(self)" were equivalent to just "mc", but in
  inject_test_case(), it is not equivalent, so add a comment and disable
  the warning locally
- Always put two empty lines after a function
- f'exec: cat > /dev/null' does not need to be an f-string

Fix them.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210902094017.32902-4-hreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

migrate-bitmaps-postcopy-test: Fix pylint warnings

pylint complains that discards1_sha256 and all_discards_sha256 are first
set in non-__init__ methods.

These variables are not really class-variables anyway, so let them
instead be returned by start_postcopy(), thus silencing pylint.

Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210902094017.32902-3-hreitz@redhat.com>

iotests/297: Drop 169 and 199 from the skip list

169 and 199 have been renamed and moved to tests/ (commit a44be0334be:
"iotests: rename and move 169 and 199 tests"), so we can drop them from
the skip list.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Willian Rampazzo <willianr@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210902094017.32902-2-hreitz@redhat.com>

iotests: Fix use-{list,dict}-literal warnings

pylint proposes using `[]` instead of `list()` and `{}` instead of
`dict()`, because it is faster. That seems simple enough, so heed its
advice.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210824153540.177128-3-hreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

iotests: Fix unspecified-encoding pylint warnings

As of recently, pylint complains when `open()` calls are missing an
`encoding=` specified. Everything we have should be UTF-8 (and in fact,
everything should be UTF-8, period (exceptions apply)), so use that.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210824153540.177128-2-hreitz@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>

block/iscsi: Do not force-cap *pnum

bdrv_co_block_status() does it for us, we do not need to do it here.

The advantage of not capping *pnum is that bdrv_co_block_status() can
cache larger data regions than requested by its caller.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210812084148.14458-7-hreitz@redhat.com>

block/gluster: Do not force-cap *pnum

bdrv_co_block_status() does it for us, we do not need to do it here.

The advantage of not capping *pnum is that bdrv_co_block_status() can
cache larger data regions than requested by its caller.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210812084148.14458-6-hreitz@redhat.com>

block/file-posix: Do not force-cap *pnum

bdrv_co_block_status() does it for us, we do not need to do it here.

The advantage of not capping *pnum is that bdrv_co_block_status() can
cache larger data regions than requested by its caller.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210812084148.14458-5-hreitz@redhat.com>

block: Clarify that @bytes is no limit on *pnum

.bdrv_co_block_status() implementations are free to return a *pnum that
exceeds @bytes, because bdrv_co_block_status() in block/io.c will clamp
*pnum as necessary.

On the other hand, if drivers' implementations return values for *pnum
that are as large as possible, our recently introduced block-status
cache will become more effective.

So, make a note in block_int.h that @bytes is no upper limit for *pnum.

Suggested-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210812084148.14458-4-hreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>

block: block-status cache for data regions

As we have attempted before
(https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg06451.html,
"file-posix: Cache lseek result for data regions";
https://lists.nongnu.org/archive/html/qemu-block/2021-02/msg00934.html,
"file-posix: Cache next hole"), this patch seeks to reduce the number of
SEEK_DATA/HOLE operations the file-posix driver has to perform.  The
main difference is that this time it is implemented as part of the
general block layer code.

The problem we face is that on some filesystems or in some
circumstances, SEEK_DATA/HOLE is unreasonably slow.  Given the
implementation is outside of qemu, there is little we can do about its
performance.

We have already introduced the want_zero parameter to
bdrv_co_block_status() to reduce the number of SEEK_DATA/HOLE calls
unless we really want zero information; but sometimes we do want that
information, because for files that consist largely of zero areas,
special-casing those areas can give large performance boosts.  So the
real problem is with files that consist largely of data, so that
inquiring the block status does not gain us much performance, but where
such an inquiry itself takes a lot of time.

To address this, we want to cache data regions.  Most of the time, when
bad performance is reported, it is in places where the image is iterated
over from start to end (qemu-img convert or the mirror job), so a simple
yet effective solution is to cache only the current data region.

(Note that only caching data regions but not zero regions means that
returning false information from the cache is not catastrophic: Treating
zeroes as data is fine.  While we try to invalidate the cache on zero
writes and discards, such incongruences may still occur when there are
other processes writing to the image.)

We only use the cache for nodes without children (i.e. protocol nodes),
because that is where the problem is: Drivers that rely on block-status
implementations outside of qemu (e.g. SEEK_DATA/HOLE).

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/307
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210812084148.14458-3-hreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[hreitz: Added `local_file == bs` assertion, as suggested by Vladimir]
Signed-off-by: Hanna Reitz <hreitz@redhat.com>

block: Drop BDS comment regarding bdrv_append()

There is a comment above the BDS definition stating care must be taken
to consider handling newly added fields in bdrv_append().

Actually, this comment should have said "bdrv_swap()" as of 4ddc07cac
(nine years ago), and in any case, bdrv_swap() was dropped in
8e419aefa (six years ago). So no such care is necessary anymore.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210812084148.14458-2-hreitz@redhat.com>

gluster: Align block-status tail

gluster's block-status implementation is basically a copy of that in
block/file-posix.c, there is only one thing missing, and that is
aligning trailing data extents to the request alignment (as added by
commit 9c3db310ff0).

Note that 9c3db310ff0 mentions that "there seems to be no other block
driver that sets request_alignment and [...]", but while block/gluster.c
does indeed not set request_alignment, block/io.c's
bdrv_refresh_limits() will still default to an alignment of 512 because
block/gluster.c does not provide a byte-aligned read function.
Therefore, unaligned tails can conceivably occur, and so we should apply
the change from 9c3db310ff0 to gluster's block-status implementation.

Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210805143603.59503-1-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>

configure: add missing pc-bios/qemu_vga.ndrv symlink in build tree

Ensure that a link to pc-bios/qemu_vga.ndrv is added to the build tree,
otherwise the optional MacOS client driver will not be loaded by OpenBIOS
when launching QEMU directly from the build directory.

Signed-off-by: John Arbuckle <programmingkidx@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20210831165020.84855-1-programmingkidx@gmail.com>
[lv: commit message rewording as suggested by Mark]
Signed-off-by: Laurent Vivier <laurent@vivier.eu>

spelling: sytem => system

Signed-off-By: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <fefb5f5c-82bc-05e2-b4c1-665e9d6896ff@msgid.tls.msk.ru>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>

qdev: Complete qdev_init_gpio_out() documentation

qdev_init_gpio_out() states it "creates an array of anonymous
output GPIO lines" but doesn't document how this array is
released. Add a note that it is automatically free'd in qdev
instance_finalize().

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20210819142731.2827912-1-philmd@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>

hw/i386/acpi-build: Fix a typo

Fix 'hotplugabble' -> 'hotpluggable' typo.

Reviewed-by: Ani Sinha <ani@anisinha.ca>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210911082036.436139-1-philmd@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>

util: Remove redundant checks in the openpty()

As we can see from the following function call stack, amaster and aslave
can not be NULL: char_pty_open() -> qemu_openpty_raw() -> openpty().
In addition, according to the API specification for openpty():
https://www.gnu.org/software/libc/manual/html_node/Pseudo_002dTerminal-Pairs.html,
the arguments name, termp and winp can all be NULL, but arguments amaster or aslave
can not be NULL.
Finally, amaster and aslave has been dereferenced at the beginning of the openpty().
So the checks on amaster and aslave in the openpty() are redundant. Remove them.

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Alex Chen <alex.chen@huawei.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <5F9FE5B8.1030803@huawei.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>