Anthony Liguori [Mon, 28 Jan 2013 20:46:45 +0000 (14:46 -0600)]
Merge remote-tracking branch 'kwolf/for-anthony' into staging
# By Paolo Bonzini (14) and others
# Via Kevin Wolf
* kwolf/for-anthony: (24 commits)
ide: Add fall through annotations
block: Create proper size file for disk mirror
ahci: Add migration support
ahci: Change data types in preparation for migration
ahci: Remove unused AHCIDevice fields
hbitmap: add assertion on hbitmap_iter_init
mirror: do nothing on zero-sized disk
block/vdi: Check for bad signature
block/vdi: Improved return values from vdi_open
block/vdi: Improve debug output for signature
block: Use error code EMEDIUMTYPE for wrong format in some block drivers
block: Add special error code for wrong format
mirror: support arbitrarily-sized iterations
mirror: support more than one in-flight AIO operation
mirror: add buf-size argument to drive-mirror
mirror: switch mirror_iteration to AIO
mirror: allow customizing the granularity
block: allow customizing the granularity of the dirty bitmap
block: return count of dirty sectors, not chunks
mirror: perform COW if the cluster size is bigger than the granularity
...
Anthony Liguori [Mon, 28 Jan 2013 20:41:25 +0000 (14:41 -0600)]
Merge remote-tracking branch 'luiz/queue/qmp' into staging
# By Lei Li (3) and others
# Via Luiz Capitulino
* luiz/queue/qmp:
QAPI: Introduce memchar-read QMP command
QAPI: Introduce memchar-write QMP command
qemu-char: Add new char backend CirMemCharDriver
docs: document virtio-balloon stats
balloon: re-enable balloon stats
balloon: drop old stats code & API
block: Monitor command commit neglects to report some errors
Default to moving back to the IDLE state after the COLLECTING_DATA
state. For a well behaved guest this patch has no consequence, but
A bad guest could crash QEMU by using one of the erase commands
followed by a longer than 5 byte argument (undefined behaviour).
Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com> Signed-off-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>
xilinx_ethlite: Flush queued packets on SW service
Software services a received packet by clearing the CTRL_S bit in the RX_CTRLn
register. If this bit is cleared, flush any packets queued for the device.
Reported-by: John Williams <john.williams@xilinx.com> Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com> Signed-off-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>
The eth_can_rx() function only checks the first buffers status ("ping"). The
controller should be able to receive into "pong" when ping-pong is enabled.
Checks the active buffer (either "ping" or "pong") when determining can_rx()
rather than just testing "ping".
Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com> Signed-off-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>
Blue Swirl [Sat, 26 Jan 2013 14:18:28 +0000 (14:18 +0000)]
Merge branch 'ppc-for-upstream' of git://repo.or.cz/qemu/agraf
* 'ppc-for-upstream' of git://repo.or.cz/qemu/agraf:
PPC: e500: Select MPIC v4.2 on ppce500 platform
PPC: e500: fix mpic_iack address
openpic: add basic support for MPIC v4.2
openpic: fix timer address decoding
openpic: fix remaining issues from idr-to-destmask conversion
pseries: Adjust default VIO address allocations to play better with libvirt
pseries: Improve handling of multiple PCI host bridges
target-ppc: Give a meaningful error if too many threads are specified
cuda: Move ADB bus into CUDA state
adb: QOM'ify ADB devices
adb: QOM'ify Apple Desktop Bus
cuda: QOM'ify CUDA
ide/macio: QOM'ify MacIO IDE
mac_nvram: QOM'ify MacIO NVRAM
mac_nvram: Mark as Big Endian
mac_nvram: Clean up public API
macio: Split MacIO in two
macio: Delay qdev init until all fields are initialized
macio: QOM'ify some more
ppc: Move Mac machines to hw/ppc/
Andreas Färber [Sat, 26 Jan 2013 11:45:14 +0000 (12:45 +0100)]
tests: Add gcov support for x86_64 qtest
Since x86_64 is a superset of i386 and reuses all its test cases, adopt
all the i386 gcov source files as well, substituting their paths
appropriately.
Signed-off-by: Andreas Färber <afaerber@suse.de> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Michael Tokarev [Fri, 25 Jan 2013 17:23:24 +0000 (21:23 +0400)]
vmware_vga: fix out of bounds and invalid rects updating
This is a follow up for several attempts to fix this issue.
Previous incarnations:
1. http://thread.gmane.org/gmane.linux.ubuntu.bugs.general/3156089
https://bugs.launchpad.net/bugs/918791
"qemu-kvm dies when using vmvga driver and unity in the guest" bug.
Fix by Serge Hallyn:
https://launchpadlibrarian.net/94916786/qemu-vmware.debdiff
This fix is incomplete, since it does not check width and height
for being negative. Serge weren't sure if that's the right place
to fix it, maybe the fix should be up the stack somewhere.
2. http://thread.gmane.org/gmane.comp.emulators.qemu/166064
by Marek Vasut: "vmware_vga: Redraw only visible area"
This one adds the (incomplete) check to vmsvga_update_rect_delayed(),
the routine just queues the rect updating but does no interesting
stuff. It is also incomplete in the same way as patch by Serge,
but also does not touch width&height at all after adjusting x&y,
which is wrong.
As far as I can see, when processing guest requests, the device
places them into a queue (vmsvga_update_rect_delayed()) and
processes this queue in different place/time, namely, in
vmsvga_update_rect(). Sometimes, vmsvga_update_rect() is
called directly, without placing the request to the gueue.
This is the place this patch changes, which is the last
(deepest) in the stack. I'm not sure if this is the right
place still, since it is possible we have some queue optimization
(or may have in the future) which will be upset by negative/wrong
values here, so maybe we should check for validity of input
right when receiving request from the guest (and maybe even
use unsigned types there). But I don't know the protocol
and implementation enough to have a definitive answer.
But since vmsvga_update_rect() has other sanity checks already,
I'm adding the missing ones there as well.
Cc'ing BALATON Zoltan and Andrzej Zaborowski who shows in `git blame'
output and may know something in this area.
If this patch is accepted, it should be applied to all active
stable branches (at least since 1.1, maybe even before), with
minor context change (ds_get_*(s->vga.ds) => s->*). I'm not
Cc'ing -stable yet, will do it explicitly once the patch is
accepted.
BTW, these checks use fprintf(stderr) -- it should be converted
to something more appropriate, since stderr will most likely
disappear somewhere.
Cc: Marek Vasut <marex@denx.de> CC: Serge Hallyn <serge.hallyn@ubuntu.com> Cc: BALATON Zoltan <balaton@eik.bme.hu> Cc: Andrzej Zaborowski <balrogg@gmail.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> Reviewed-by: Marek Vasut <marex@denx.de> Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Paolo Bonzini [Tue, 15 Jan 2013 08:49:37 +0000 (09:49 +0100)]
build: remove *.lo, *.a, *.la files from all subdirectories on make clean
.lo files in stubs/, util/ and libcacard/ were not cleaned.
Fix this.
Cc: Blue Swirl <blauwirbel@gmail.com> Reported-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Peter Maydell [Thu, 24 Jan 2013 19:02:28 +0000 (19:02 +0000)]
hw/arm_boot: Align device tree to 4KB boundary, not page
Align the device tree blob to a 4KB boundary, not to QEMU's
idea of a page boundary -- the latter is the smallest possible
page size for the architecture, which on ARM is 1KB.
The documentation for Linux does not impose separation
or alignment requirements on the device tree blob, but
in practice some kernels will happily trash the entire
page the initrd ends in after they have finished uncompressing
the initrd. So 4KB-align the DTB to ensure it does not get
trampled by these kernels.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Andreas Färber [Thu, 24 Jan 2013 15:47:55 +0000 (16:47 +0100)]
make_device_config.sh: Fix target path in generated dependency file
config-devices.mak.d is included from Makefile.target, i.e. from inside
the *-softmmu/ directory. It included the directory path, so never
applied to the actual ./config-devices.mak. Symptoms were spurious
build failures due to missing dependency on default-configs/pci.mak.
Fix this by using `basename` to strip the directory path.
Reported-by: Gerhard Wiesinger <lists@wiesinger.com> Cc: qemu-stable@nongnu.org Signed-off-by: Andreas Färber <afaerber@suse.de> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
fw_cfg: Splash image loader can overrun a stack variable, fix
read_splashfile() passes the address of an int variable as size_t *
parameter to g_file_get_contents(), with a cast to gag the compiler.
No problem on machines where sizeof(size_t) == sizeof(int).
Happens to work on my x86_64 box (64 bit little endian): the least
significant 32 bits of the file size end up in the right place
(caller's variable file_size), and the most significant 32 bits
clobber a place that gets assigned to before its next use (caller's
variable file_type).
I'd expect it to break on a 64 bit big-endian box.
Fix up the variable types and drop the problematic cast.
Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
softfloat: Handle float_muladd_negate_c when product is zero
Honour float_muladd_negate_c in the case where the product is zero and
c is nonzero. Previously we would fail to negate c.
Seen in (and tested against) the gfortran testsuite on MIPS.
Signed-off-by: Richard Sandiford <rdsandiford@googlemail.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Peter Maydell [Mon, 21 Jan 2013 12:50:56 +0000 (12:50 +0000)]
hw/pxa2xx_timer: Explicitly mark fallthroughs
Explicitly mark the fallthroughs as intentional in the code
pattern where we gradually increment an index before falling
into the code to read/write that array entry:
case THINGY_3: idx++;
case THINGY_2: idx++;
case THINGY_1: idx++;
case THINGY_0: return s->thingy[idx];
This makes static analysers happy.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Peter Maydell [Mon, 21 Jan 2013 12:50:55 +0000 (12:50 +0000)]
hw/smc91c111: Add explicit 'return' rather than relying on fallthrough
Add an explicit 'return' statement to a case in smc91c111_readb
rather than relying on fallthrough to the following case's
return statement, for code clarity and to placate static analysers.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Peter Maydell [Mon, 21 Jan 2013 12:50:53 +0000 (12:50 +0000)]
hw/omap_dma, hw/omap_spi: Explicitly mark fallthroughs
Explicitly mark the fallthroughs as intentional in the code
pattern where we gradually increment an index before falling
into the code to read/write that array entry:
case THINGY_3: idx++;
case THINGY_2: idx++;
case THINGY_1: idx++;
case THINGY_0: return s->thingy[idx];
This makes static analysers happy.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Peter Maydell [Mon, 21 Jan 2013 12:50:52 +0000 (12:50 +0000)]
hw/omap1.c: Add fallthrough markers and breaks
Explicitly mark cases where we are deliberately falling
through to the following code. In one case we insert a
'break' instead of falling through to a 'break', as this
seems slightly clearer.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Peter Maydell [Mon, 21 Jan 2013 12:50:51 +0000 (12:50 +0000)]
hw/arm_sysctl.c: Add missing 'break' statements
Add some break statements that were accidentally omitted
from some cases of arm_sysctl_write(). The omission was
harmless because in both cases the following case did
an immediate break, but adding the breaks explicitly
placates static analysers and avoids weird behaviour if
the following register is ever implemented as something
other than a no-op.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Michael Tokarev [Sat, 19 Jan 2013 14:58:09 +0000 (18:58 +0400)]
link seccomp only with softmmu targets
Now, if seccomp is detected, it is linked into every executable,
but is used only by softmmu targets (from vl.c). So link it
only where it is actually needed.
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Paolo Bonzini [Sat, 19 Jan 2013 10:06:48 +0000 (11:06 +0100)]
build: remove extra-obj-y
extra-obj-y is somewhat complicated to understand. Replace it with a
special CONFIG_ALL symbol that is defined only at toplevel.
This limits the case of directories defining more than one
*-obj-y target.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Paolo Bonzini [Sat, 19 Jan 2013 10:06:47 +0000 (11:06 +0100)]
build: remove universal-obj-y
All of universal-obj-y, user-obj-y (right now unused) and common-obj-y can
be unified into common-obj-y if we take care of defining CONFIG_SOFTMMU
and CONFIG_USER_ONLY in the toplevel makefile. This is similar to how
we define symbols for hardware components.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Paolo Bonzini [Sat, 19 Jan 2013 10:06:45 +0000 (11:06 +0100)]
build: move around libcacard-y definition
It is also needed if !CONFIG_SOFTMMU, unlike everything that surrounds it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Scott Wood [Mon, 21 Jan 2013 15:53:55 +0000 (15:53 +0000)]
PPC: e500: Select MPIC v4.2 on ppce500 platform
The compatible string is changed to fsl,mpic on all e500 platforms, to
advertise the existence of BRR1. This matches what the device tree will
have on real hardware.
With MPIC v4.2 max_cpu can be increased from 15 to 32.
Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
Scott Wood [Mon, 21 Jan 2013 15:53:53 +0000 (15:53 +0000)]
openpic: add basic support for MPIC v4.2
Besides the new value in the version register, this provides:
- ILR support, which includes:
- IDR becoming a pure CPU bitmap, allowing 32 CPUs
- machine check output support (though other parts of QEMU need to
be fixed for it to do something other than immediately reboot the
guest)
- dummy error interrupt support (EISR0/EIMR0 read as zero)
- actually all FSL MPICs get all summary registers returning zero for now,
which includes EISR0/EIMR0
Various refactoring is done to support these changes and to ease
new functionality (e.g. a more flexible way of declaring regions).
Just as the code was already not a full implementation of MPIC v2.0,
this is not a full implementation of MPIC v4.2 -- e.g. it still has only
one bank of MSIs.
Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
Scott Wood [Mon, 21 Jan 2013 15:53:52 +0000 (15:53 +0000)]
openpic: fix timer address decoding
The timer memory range begins at 0x10f0, so that address 0x1120 shows
up as 0x30, 0x1130 shows up as 0x40, etc. However, the address
decoding (other than TFRR) is not adjusted for this, causing the
wrong registers to be accessed.
Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
Scott Wood [Mon, 21 Jan 2013 15:53:51 +0000 (15:53 +0000)]
openpic: fix remaining issues from idr-to-destmask conversion
openpic_update_irq() was checking idr rather than destmask, treating
it as if it were a simple bitmap of cpus. Changed to use destmask.
IPI delivery was removing bits directly from .idr, without calling
write_IRQreg_idr so that the change could be conveyed to destmask.
Changed to use destmask directly.
Save/restore destmask when serializing, as due to the IPI change it
cannot be reproduced from idr.
Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
David Gibson [Wed, 23 Jan 2013 17:20:43 +0000 (17:20 +0000)]
pseries: Adjust default VIO address allocations to play better with libvirt
Currently, if VIO devices for pseries don't have addresses explicitly
allocated, they get automatically numbered from 0x1000. This is in the
same general range that libvirt will typically assign VIO device addresses.
That means that if there is a device libvirt doesn't know about, and it
gets an address assigned before the libvirt assigned devices are processed,
we can end up with an address conflict (qemu will abort with an error).
While the real solution is to teach libvirt about the other devices, so it
can correctly manage the whole allocation, this patch reduces the interim
inconvenience by moving qemu allocations to a range that libvirt is less
likely to conflict with.
Because the guest gets the device addresses through the device tree, these
addresses are truly arbitrary and can be changed without breaking guests.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
David Gibson [Wed, 23 Jan 2013 17:20:39 +0000 (17:20 +0000)]
pseries: Improve handling of multiple PCI host bridges
Multiple - even many - PCI host bridges (i.e. PCI domains) are very
common on real PAPR compliant hardware. For reasons related to the
PAPR specified IOMMU interfaces, PCI device assignment with VFIO will
generally require at least two (virtual) PHBs and possibly more
depending on which devices are assigned.
At the moment the qemu PAPR PCI code will not deal with this well,
leaving several crucial parameters of PHBs other than the default one
uninitialized. This patch reworks the code to allow this.
Every PHB needs a unique BUID (Bus Unit Identifier, the id used for
the PAPR PCI related interfaces) and a unique LIOBN (Logical IO Bus
Number, the id used for the PAPR IOMMU related interfaces). In
addition they need windows in CPU real address space to access PCI
memory space, PCI IO space and MSIs. Properties are added to the PCI
host bridge qdevice to allow configuration of all these.
To simplify configuration of multiple PHBs for common cases, a
convenience "index" property is also added. This can be set instead
of the low-level properties, and will generate suitable values for the
other parameters, different for each index value.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
Mike Qiu [Wed, 23 Jan 2013 17:20:38 +0000 (17:20 +0000)]
target-ppc: Give a meaningful error if too many threads are specified
Currently the target-ppc tcg code only supports a single thread. You can
specify more, but they're treated identically to multiple cores. On KVM
we obviously can't support more threads than the hardware; if more are
specified it will cause strange and cryptic errors.
This patch clarifies the situation by giving a simple meaningful error if
more threads are specified than we can support.
Signed-off-by: Mike Qiu <qiudayu@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
Andreas Färber [Wed, 23 Jan 2013 23:04:04 +0000 (23:04 +0000)]
adb: QOM'ify ADB devices
They were not qdev'ified before. Derive ADBDevice from DeviceState and
convert reset callbacks to DeviceClass::reset, ADBDevice::opaque pointer
to ADBDevice subtypes for mouse and keyboard and adb_{kbd,mouse}_init()
to regular qdev functions.
Fixing Coding Style issues and splitting keyboard and mouse off into
their own files is left for a later point in time.
Signed-off-by: Andreas Färber <afaerber@suse.de> Signed-off-by: Alexander Graf <agraf@suse.de>
Andreas Färber [Wed, 23 Jan 2013 23:04:03 +0000 (23:04 +0000)]
adb: QOM'ify Apple Desktop Bus
It was not a qbus before, turn it into a first-class bus and initialize
it properly from CUDA. Leave it a global variable as long as devices are
not QOM'ified yet.
Signed-off-by: Andreas Färber <afaerber@suse.de> Signed-off-by: Alexander Graf <agraf@suse.de>
The qmp monitor command to mirror a disk was passing -1 for size
along with the disk's backing file. This size of the resulting disk
is the size of the backing file, which is incorrect if the disk
has been resized. Therefore we should always pass in the size of
the current disk.
Signed-off-by: Vishvananda Ishaya <vishvananda@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Jason Baron [Fri, 4 Jan 2013 19:44:42 +0000 (14:44 -0500)]
ahci: Add migration support
Jason tested these patches by migrating Windows 7 and Fedora 17 guests
(while under I/O) on both piix with ahci attached and on q35 (which has
a built-in AHCI controller).
Signed-off-by: Andreas Färber <afaerber@suse.de> Signed-off-by: Jason Baron <jbaron@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Paolo Bonzini [Tue, 22 Jan 2013 14:01:12 +0000 (15:01 +0100)]
hbitmap: add assertion on hbitmap_iter_init
hbitmap_iter_init causes an out-of-bounds access when the "first"
argument is or greater than or equal to the size of the bitmap.
Forbid this with an assertion, and remove the failing testcase.
Reported-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Paolo Bonzini [Tue, 22 Jan 2013 14:01:11 +0000 (15:01 +0100)]
mirror: do nothing on zero-sized disk
On a zero-sized disk we need to break out of the job successfully
before bdrv_dirty_iter_init is called, otherwise you will get an
assertion failure with the next patch.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Stefan Weil [Thu, 17 Jan 2013 20:45:24 +0000 (21:45 +0100)]
block: Add special error code for wrong format
The block drivers need a special error code for "wrong format".
From the available error codes EMEDIUMTYPE fits best.
It is not available on all platforms, so a definition in
qemu-common.h and a specific error report are needed.
Signed-off-by: Stefan Weil <sw@weilnetz.de> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Paolo Bonzini [Tue, 22 Jan 2013 08:03:15 +0000 (09:03 +0100)]
mirror: support arbitrarily-sized iterations
Yet another optimization is to extend the mirroring iteration to include more
adjacent dirty blocks. This limits the number of I/O operations and makes
mirroring efficient even with a small granularity. Most of the infrastructure
is already in place; we only need to put a loop around the computation of
the origin and sector count of the iteration.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Paolo Bonzini [Tue, 22 Jan 2013 08:03:14 +0000 (09:03 +0100)]
mirror: support more than one in-flight AIO operation
With AIO support in place, we can start copying more than one chunk
in parallel. This patch introduces the required infrastructure for
this: the buffer is split into multiple granularity-sized chunks,
and there is a free list to access them.
Because of copy-on-write, a single operation may already require
multiple chunks to be available on the free list.
In addition, two different iterations on the HBitmap may want to
copy the same cluster. We avoid this by keeping a bitmap of in-flight
I/O operations, and blocking until the previous iteration completes.
This should be a pretty rare occurrence, though; as long as there is
no overlap the next iteration can start before the previous one finishes.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Paolo Bonzini [Tue, 22 Jan 2013 08:03:12 +0000 (09:03 +0100)]
mirror: switch mirror_iteration to AIO
There is really no change in the behavior of the job here, since
there is still a maximum of one in-flight I/O operation between
the source and the target. However, this patch already introduces
the AIO callbacks (which are unmodified in the next patch)
and some of the logic to count in-flight operations and only
complete the job when there is none.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Paolo Bonzini [Mon, 21 Jan 2013 16:09:46 +0000 (17:09 +0100)]
mirror: allow customizing the granularity
The desired granularity may be very different depending on the kind of
operation (e.g. continuous replication vs. collapse-to-raw) and whether
the VM is expected to perform lots of I/O while mirroring is in progress.
Allow the user to customize it, while providing a sane default so that
in general there will be no extra allocated space in the target compared
to the source.
Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Paolo Bonzini [Mon, 21 Jan 2013 16:09:44 +0000 (17:09 +0100)]
block: return count of dirty sectors, not chunks
Reviewed-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Paolo Bonzini [Mon, 21 Jan 2013 16:09:43 +0000 (17:09 +0100)]
mirror: perform COW if the cluster size is bigger than the granularity
When mirroring runs, the backing files for the target may not yet be
ready. However, this means that a copy-on-write operation on the target
would fill the missing sectors with zeros. Copy-on-write only happens
if the granularity of the dirty bitmap is smaller than the cluster size
(and only for clusters that are allocated in the source after the job
has started copying). So far, the granularity was fixed to 1MB; to avoid
the problem we detected the situation and required the backing files to
be available in that case only.
However, we want to lower the granularity for efficiency, so we need
a better solution. The solution is to always copy a whole cluster the
first time it is touched. The code keeps a bitmap of clusters that
have already been allocated by the mirroring job, and only does "manual"
copy-on-write if the chunk being copied is zero in the bitmap.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Paolo Bonzini [Mon, 21 Jan 2013 16:09:42 +0000 (17:09 +0100)]
block: make round_to_clusters public
This is needed in the following patch.
Reviewed-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Paolo Bonzini [Mon, 21 Jan 2013 16:09:41 +0000 (17:09 +0100)]
block: implement dirty bitmap using HBitmap
This actually uses the dirty bitmap in the block layer, and converts
mirroring to use an HBitmapIter.
Reviewed-by: Laszlo Ersek <lersek@redhat.com> (except block/mirror.c parts) Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Paolo Bonzini [Mon, 21 Jan 2013 16:09:40 +0000 (17:09 +0100)]
add hierarchical bitmap data type and test cases
HBitmaps provides an array of bits. The bits are stored as usual in an
array of unsigned longs, but HBitmap is also optimized to provide fast
iteration over set bits; going from one bit to the next is O(logB n)
worst case, with B = sizeof(long) * CHAR_BIT: the result is low enough
that the number of levels is in fact fixed.
In order to do this, it stacks multiple bitmaps with progressively coarser
granularity; in all levels except the last, bit N is set iff the N-th
unsigned long is nonzero in the immediately next level. When iteration
completes on the last level it can examine the 2nd-last level to quickly
skip entire words, and even do so recursively to skip blocks of 64 words or
powers thereof (32 on 32-bit machines).
Given an index in the bitmap, it can be split in group of bits like
this (for the 64-bit case):
bits 0-57 => word in the last bitmap | bits 58-63 => bit in the word
bits 0-51 => word in the 2nd-last bitmap | bits 52-57 => bit in the word
bits 0-45 => word in the 3rd-last bitmap | bits 46-51 => bit in the word
So it is easy to move up simply by shifting the index right by
log2(BITS_PER_LONG) bits. To move down, you shift the index left
similarly, and add the word index within the group. Iteration uses
ffs (find first set bit) to find the next word to examine; this
operation can be done in constant time in most current architectures.
Setting or clearing a range of m bits on all levels, the work to perform
is O(m + m/W + m/W^2 + ...), which is O(m) like on a regular bitmap.
When iterating on a bitmap, each bit (on any level) is only visited
once. Hence, The total cost of visiting a bitmap with m bits in it is
the number of bits that are set in all bitmaps. Unless the bitmap is
extremely sparse, this is also O(m + m/W + m/W^2 + ...), so the amortized
cost of advancing from one bit to the next is usually constant.
Reviewed-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Luiz Capitulino [Sat, 1 Dec 2012 02:14:57 +0000 (00:14 -0200)]
balloon: re-enable balloon stats
The statistics are now available through device properties via a
polling mechanism. First a client has to enable polling, then it
can query available stats.
Polling is enabled by setting an update interval (in seconds)
to a property named guest-stats-polling-interval, like this:
Jeff Cody [Fri, 18 Jan 2013 17:45:35 +0000 (12:45 -0500)]
block: Monitor command commit neglects to report some errors
The non-live bdrv_commit() function may return one of the following
errors: -ENOTSUP, -EBUSY, -EACCES, -EIO. The only error that is
checked in the HMP handler is -EBUSY, so the monitor command 'commit'
silently fails for all error cases other than 'Device is in use'.
Report error using monitor_printf() and strerror(), and convert existing
qerror_report() calls in do_commit() to monitor_printf().
Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Anthony Liguori [Thu, 24 Jan 2013 18:56:02 +0000 (12:56 -0600)]
Merge remote-tracking branch 'bonzini/scsi-next' into staging
# By Paolo Bonzini (1) and Peter Lieven (1)
# Via Paolo Bonzini
* bonzini/scsi-next:
iscsi: add support for iovectors
iscsi: do not leak acb->buf when commands are aborted
I'm not sure if the retry logic has ever worked when not using FIFO mode. I
found this while writing a test case although code inspection confirms it is
definitely broken.
The TSR retry logic will never actually happen because it is guarded by an
'if (s->tsr_rety > 0)' but this is the only place that can ever make the
variable greater than zero. That effectively makes the retry logic an 'if (0)
I believe this is a typo and the intention was >= 0. Once this is fixed thoug
I see double transmits with my test case. This is because in the non FIFO
case, serial_xmit may get invoked while LSR.THRE is still high because the
character was processed but the retransmit timer was still active.
We can handle this by simply checking for LSR.THRE and returning early. It's
possible that the FIFO paths also need some attention.
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Even if the previous logic was never worked, new logic breaks stuff -
namely,
Peter Lieven [Mon, 3 Dec 2012 19:35:15 +0000 (20:35 +0100)]
iscsi: add support for iovectors
This patch adds support for directly passing the iovec
array from QEMUIOVector if libiscsi supports it (1.8.0
or newer).
Signed-off-by: Peter Lieven <pl@kamp.de>
[Preserve the improvements from commit 4cc841b, iscsi: partly
avoid iovec linearization in iscsi_aio_writev, 2012-11-19 - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 22 Jan 2013 16:34:29 +0000 (17:34 +0100)]
iscsi: do not leak acb->buf when commands are aborted
acb->buf is freed in the WRITE(16) callback, but this may not
get called at all when commands are aborted. Add another
free in the ABORT TASK callback, which requires setting acb->buf
to NULL everywhere.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Grant Likely [Wed, 23 Jan 2013 16:15:25 +0000 (16:15 +0000)]
trivial: etraxfs_eth: Eliminate checkpatch errors
This is a trivial patch to harmonize the coding style on
hw/etraxfs_eth.c. This is in preparation to split off the bitbang mdio
code into a separate file.
Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Paul Brook <paul@codesourcery.com> Cc: Edgar E. Iglesias <edgar.iglesias@gmail.com> Cc: Anthony Liguori <aliguori@us.ibm.com> Cc: Andreas Färber <afaerber@suse.de> Signed-off-by: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>
Anthony Liguori [Wed, 23 Jan 2013 15:08:54 +0000 (09:08 -0600)]
Merge remote-tracking branch 'bonzini/scsi-next' into staging
# By Peter Lieven (3) and others
# Via Paolo Bonzini
* bonzini/scsi-next:
scsi: Drop useless null test in scsi_unit_attention()
lsi: use qbus_reset_all to reset SCSI bus
scsi: fix segfault with 0-byte disk
iscsi: add support for iSCSI NOPs [v2]
iscsi: partly avoid iovec linearization in iscsi_aio_writev
iscsi: add iscsi_create support
scsi: Drop useless null test in scsi_unit_attention()
req was created by scsi_req_alloc(), which initializes req->dev to a
value it dereferences. req->dev isn't changed anywhere else.
Therefore, req->dev can't be null.
Drop the useless null test; it spooks Coverity.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Paolo Bonzini [Thu, 10 Jan 2013 14:08:05 +0000 (15:08 +0100)]
scsi: fix segfault with 0-byte disk
When a 0-sized disk is found, READ CAPACITY will return a
LUN NOT READY error. However, because it returns -1 instead
of zero, the HBA will call scsi_req_continue. This will
typically cause a segmentation fault or an assertion failure.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Peter Lieven [Thu, 6 Dec 2012 09:46:47 +0000 (10:46 +0100)]
iscsi: add support for iSCSI NOPs [v2]
This patch will send NOP-Out PDUs every 5 seconds to the iSCSI target.
If a consecutive number of NOP-In replies fail a reconnect is initiated.
iSCSI NOPs help to ensure that the connection to the target is still operational.
This should not, but in reality may be the case even if the TCP connection is still
alive if there are bugs in either the target or the initiator implementation.
v2:
- track the NOPs inside libiscsi so libiscsi can reset the counter
in case it initiates a reconnect.
Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Peter Lieven [Mon, 19 Nov 2012 14:58:31 +0000 (15:58 +0100)]
iscsi: partly avoid iovec linearization in iscsi_aio_writev
libiscsi expects all write16 data in a linear buffer. If the
iovec only contains one buffer we can skip the linearization
step as well as the additional malloc/free and pass the
buffer directly.
Reported-by: Ronnie Sahlberg <ronniesahlberg@gmail.com> Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Peter Lieven [Sat, 17 Nov 2012 15:13:24 +0000 (16:13 +0100)]
iscsi: add iscsi_create support
This patch adds support for bdrv_create. This allows e.g.
to use qemu-img to convert from any supported device to
an iscsi backed storage as destination.
Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>