Michael Roth [Tue, 24 Sep 2019 17:18:07 +0000 (12:18 -0500)]
slrip: ip_reass: Fix use after free
Using ip_deq after m_free might read pointers from an allocation reuse.
This would be difficult to exploit, but that is still related with
CVE-2019-14378 which generates fragmented IP packets that would trigger this
issue and at least produce a DoS.
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
(from libslirp.git commit c59279437eda91841b9d26079c70b8a540d41204) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
qemu-bridge-helper: restrict interface name to IFNAMSIZ
The network interface name in Linux is defined to be of size
IFNAMSIZ(=16), including the terminating null('\0') byte.
The same is applied to interface names read from 'bridge.conf'
file to form ACL rules. If user supplied '--br=bridge' name
is not restricted to the same length, it could lead to ACL bypass
issue. Restrict interface name to IFNAMSIZ, including null byte.
Reported-by: Riccardo Schirone <rschiron@redhat.com> Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Li Qiang <liq3ea@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit 6f5d8671225dc77190647f18a27a0d156d4ca97a) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Kevin Wolf [Mon, 7 Jan 2019 12:02:48 +0000 (13:02 +0100)]
block: Fix hangs in synchronous APIs with iothreads
In the block layer, synchronous APIs are often implemented by creating a
coroutine that calls the asynchronous coroutine-based implementation and
then waiting for completion with BDRV_POLL_WHILE().
For this to work with iothreads (more specifically, when the synchronous
API is called in a thread that is not the home thread of the block
device, so that the coroutine will run in a different thread), we must
make sure to call aio_wait_kick() at the end of the operation. Many
places are missing this, so that BDRV_POLL_WHILE() keeps hanging even if
the condition has long become false.
Note that bdrv_dec_in_flight() involves an aio_wait_kick() call. This
corresponds to the BDRV_POLL_WHILE() in the drain functions, but it is
generally not enough for most other operations because they haven't set
the return value in the coroutine entry stub yet. To avoid race
conditions there, we need to kick after setting the return value.
The race window is small enough that the problem doesn't usually surface
in the common path. However, it does surface and causes easily
reproducible hangs if the operation can return early before even calling
bdrv_inc/dec_in_flight, which many of them do (trivial error or no-op
success paths).
The bug in bdrv_truncate(), bdrv_check() and bdrv_invalidate_cache() is
slightly different: These functions even neglected to schedule the
coroutine in the home thread of the node. This avoids the hang, but is
obviously wrong, too. Fix those to schedule the coroutine in the right
AioContext in addition to adding aio_wait_kick() calls.
Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 4720cbeea1f42fd905fc69338fd42b191e58b412) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
pvrdma: check return value from pvrdma_idx_ring_has_ routines
pvrdma_idx_ring_has_[data/space] routines also return invalid
index PVRDMA_INVALID_IDX[=-1], if ring has no data/space. Check
return value from these routines to avoid plausible infinite loops.
Reported-by: Li Qiang <liq3ea@163.com> Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
(cherry picked from commit f1e2e38ee0136b7710a2caa347049818afd57a1b) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
When creating CQ/QP rings, an object can have up to
PVRDMA_MAX_FAST_REG_PAGES 8 pages. Check 'npages' parameter
to avoid excessive memory allocation or a null dereference.
Reported-by: Li Qiang <liq3ea@163.com> Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
(cherry picked from commit 2c858ce5da8ae6689c75182b73bc455a291cad41) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
device_tree: Fix integer overflowing in load_device_tree()
If the value of get_image_size() exceeds INT_MAX / 2 - 10000, the
computation of @dt_size overflows to a negative number, which then
gets converted to a very large size_t for g_malloc0() and
load_image_size(). In the (fortunately improbable) case g_malloc0()
succeeds and load_image_size() survives, we'd assign the negative
number to *sizep. What that would do to the callers I can't say, but
it's unlikely to be good.
Fix by rejecting images whose size would overflow.
Reported-by: Kurtis Miller <kurtis.miller@nccgroup.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20190409174018.25798-1-armbru@redhat.com>
(cherry picked from commit 065e6298a75164b4347682b63381dbe752c2b156) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Peter Maydell [Fri, 14 Dec 2018 13:30:52 +0000 (13:30 +0000)]
device_tree.c: Don't use load_image()
The load_image() function is deprecated, as it does not let the
caller specify how large the buffer to read the file into is.
Instead use load_image_size().
Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20181130151712.2312-9-peter.maydell@linaro.org
(cherry picked from commit da885fe1ee8b4589047484bd7fa05a4905b52b17) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
When releasing spice resources in release_resource() routine,
if release info object 'ext.info' is null, it leads to null
pointer dereference. Add check to avoid it.
seccomp: don't kill process for resource control syscalls
The Mesa library tries to set process affinity on some of its threads in
order to optimize its performance. Currently this results in QEMU being
immediately terminated when seccomp is enabled.
Mesa doesn't consider failure of the process affinity settings to be
fatal to its operation, but our seccomp policy gives it no choice in
gracefully handling this denial.
It is reasonable to consider that malicious code using the resource
control syscalls to be a less serious attack than if they were trying
to spawn processes or change UIDs and other such things. Generally
speaking changing the resource control setting will "merely" affect
quality of service of processes on the host. With this in mind, rather
than kill the process, we can relax the policy for these syscalls to
return the EPERM errno value. This allows callers to detect that QEMU
does not want them to change resource allocations, and apply some
reasonable fallback logic.
The main downside to this is for code which uses these syscalls but does
not check the return value, blindly assuming they will always
succeeed. Returning an errno could result in sub-optimal behaviour.
Arguably though such code is already broken & needs fixing regardless.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Eduardo Otubo <otubo@redhat.com>
(cherry picked from commit 9a1565a03b79d80b236bc7cc2dbce52a2ef3a1b8) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
slirp: check data length while emulating ident function
While emulating identification protocol, tcp_emu() does not check
available space in the 'sc_rcv->sb_data' buffer. It could lead to
heap buffer overflow issue. Add check to avoid it.
Reported-by: Kira <864786842@qq.com> Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org> Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
(cherry picked from commit a7104eda7dab99d0cdbd3595c211864cba415905) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Paolo Bonzini [Fri, 11 Jan 2019 16:27:31 +0000 (17:27 +0100)]
scsi-generic: avoid possible out-of-bounds access to r->buf
Whenever the allocation length of a SCSI request is shorter than the size of the
VPD page list, page_idx is used blindly to index into r->buf. Even though
the stores in the insertion sort are protected against overflows, the same is not
true of the reads and the final store of 0xb0.
This basically does the same thing as commit 57dbb58d80 ("scsi-generic: avoid
out-of-bounds access to VPD page list", 2018-11-06), except that here the
allocation length can be chosen by the guest. Note that according to the SCSI
standard, the contents of the PAGE LENGTH field are not altered based
on the allocation length.
The code was introduced by commit 6c219fc8a1 ("scsi-generic: keep VPD
page list sorted", 2018-11-06) but the overflow was already possible before.
Reported-by: Kevin Wolf <kwolf@redhat.com> Fixes: a71c775b24ebc664129eb1d9b4c360590353efd5 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit e909ff93698851777faac3c45d03c1b73f311ea6) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Niels de Vos [Tue, 5 Mar 2019 15:46:34 +0000 (16:46 +0100)]
gluster: the glfs_io_cbk callback function pointer adds pre/post stat args
The glfs_*_async() functions do a callback once finished. This callback
has changed its arguments, pre- and post-stat structures have been
added. This makes it possible to improve caching, which is useful for
Samba and NFS-Ganesha, but not so much for QEMU. Gluster 6 is the first
release that includes these new arguments.
With an additional detection in ./configure, the new arguments can
conditionally get included in the glfs_io_cbk handler.
Signed-off-by: Niels de Vos <ndevos@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 0e3b891fefacc0e49f3c8ffa3a753b69eb7214d2) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
New versions of Glusters libgfapi.so have an updated glfs_ftruncate()
function that returns additional 'struct stat' structures to enable
advanced caching of attributes. This is useful for file servers, not so
much for QEMU. Nevertheless, the API has changed and needs to be
adopted.
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com> Signed-off-by: Niels de Vos <ndevos@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit e014dbe74e0484188164c61ff6843f8a04a8cb9d) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
hw: Use PFLASH_CFI0{1,2} and TYPE_PFLASH_CFI0{1,2}
We have two open-coded copies of macro PFLASH_CFI01(). Move the macro
to the header, so we can ditch the copies. Move PFLASH_CFI02() to the
header for symmetry.
We define macros TYPE_PFLASH_CFI01 and TYPE_PFLASH_CFI02 for type name
strings, then mostly use the strings. If the macros are worth
defining, they are worth using. Replace the strings by the macros.
Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20190308094610.21210-6-armbru@redhat.com>
(cherry picked from commit 81c7db723ebd0c677784a728020c7e8845868daf)
*prereq for 3a283507 Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
pflash_cfi01.c and pflash_cfi02.c start their identifiers with
pflash_cfi01_ and pflash_cfi02_ respectively, except for
CFI_PFLASH01(), TYPE_CFI_PFLASH01, CFI_PFLASH02(), TYPE_CFI_PFLASH02.
Rename for consistency.
Suggested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20190308094610.21210-5-armbru@redhat.com>
(cherry picked from commit e7b6274197c5f096860014ca750544d6aca0b9b9)
*prereq for 3a283507 Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Our implementation of "write to buffer" (command 0xE8) is flawed.
LOG_UNIMP its use, and add some FIXME comments.
Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20190308094610.21210-4-armbru@redhat.com>
(cherry picked from commit 4dbda935e054600667f9e57095fa97e2ce5936f9)
*prereq for 3a283507 Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
pflash_cfi01: Do not exit() on guest aborting "write to buffer"
When a guest tries to abort "write to buffer" (command 0xE8), we print
"PFLASH: Possible BUG - Write block confirm", then exit(1). Letting
the guest terminate QEMU is not a good idea. Instead, LOG_UNIMP we
screwed up, then reset the device.
Macro PFLASH_BUG() is now unused; delete it.
Suggested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20190308094610.21210-3-armbru@redhat.com>
(cherry picked from commit 2d93bebf81520ccebeb9a3ea9bd051ce088854ea)
*prereq for 3a283507 Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
pflash: Rename pflash_t to PFlashCFI01, PFlashCFI02
flash.h's incomplete struct pflash_t is completed both in
pflash_cfi01.c and in pflash_cfi02.c. The complete types are
incompatible. This can hide type errors, such as passing a pflash_t
created with pflash_cfi02_register() to pflash_cfi01_get_memory().
Furthermore, POSIX reserves typedef names ending with _t.
Rename the two structs to PFlashCFI01 and PFlashCFI02.
Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190308094610.21210-2-armbru@redhat.com>
(cherry picked from commit 1643406520f8ff6f4cc11062950f5f898b03b573)
*prereq for 3a283507 Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
block/pflash_cfi02: Fix memory leak and potential use-after-free
Don't dynamically allocate the pflash's timer. But do use timer_del in
an unrealize function to make sure that the timer can't fire after the
pflash_t has been freed.
Signed-off-by: Stephen Checkoway <stephen.checkoway@oberlin.edu> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Wei Yang <richardw.yang@linux.intel.com>
Message-Id: <20190219153727.62279-1-stephen.checkoway@oberlin.edu> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
(cherry picked from commit d80cf1eb2e87df3a9bfb226bcc7fb3a1aa858817)
*prereq for 16434065/3a283507 Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
hw/display/xlnx_dp: Avoid crash when reading empty RX FIFO
In the previous commit we fixed a crash when the guest read a
register that pop from an empty FIFO.
By auditing the repository, we found another similar use with
an easy way to reproduce:
$ qemu-system-aarch64 -M xlnx-zcu102 -monitor stdio -S
QEMU 4.0.50 monitor - type 'help' for more information
(qemu) xp/b 0xfd4a0134
Aborted (core dumped)
(gdb) bt
#0 0x00007f6936dea57f in raise () at /lib64/libc.so.6
#1 0x00007f6936dd4895 in abort () at /lib64/libc.so.6
#2 0x0000561ad32975ec in xlnx_dp_aux_pop_rx_fifo (s=0x7f692babee70) at hw/display/xlnx_dp.c:431
#3 0x0000561ad3297dc0 in xlnx_dp_read (opaque=0x7f692babee70, offset=77, size=4) at hw/display/xlnx_dp.c:667
#4 0x0000561ad321b896 in memory_region_read_accessor (mr=0x7f692babf620, addr=308, value=0x7ffe05c1db88, size=4, shift=0, mask=4294967295, attrs=...) at memory.c:439
#5 0x0000561ad321bd70 in access_with_adjusted_size (addr=308, value=0x7ffe05c1db88, size=1, access_size_min=4, access_size_max=4, access_fn=0x561ad321b858 <memory_region_read_accessor>, mr=0x7f692babf620, attrs=...) at memory.c:569
#6 0x0000561ad321e9d5 in memory_region_dispatch_read1 (mr=0x7f692babf620, addr=308, pval=0x7ffe05c1db88, size=1, attrs=...) at memory.c:1420
#7 0x0000561ad321ea9d in memory_region_dispatch_read (mr=0x7f692babf620, addr=308, pval=0x7ffe05c1db88, size=1, attrs=...) at memory.c:1447
#8 0x0000561ad31bd742 in flatview_read_continue (fv=0x561ad69c04f0, addr=4249485620, attrs=..., buf=0x7ffe05c1dcf0 "\020\335\301\005\376\177", len=1, addr1=308, l=1, mr=0x7f692babf620) at exec.c:3385
#9 0x0000561ad31bd895 in flatview_read (fv=0x561ad69c04f0, addr=4249485620, attrs=..., buf=0x7ffe05c1dcf0 "\020\335\301\005\376\177", len=1) at exec.c:3423
#10 0x0000561ad31bd90b in address_space_read_full (as=0x561ad5bb3020, addr=4249485620, attrs=..., buf=0x7ffe05c1dcf0 "\020\335\301\005\376\177", len=1) at exec.c:3436
#11 0x0000561ad33b1c42 in address_space_read (len=1, buf=0x7ffe05c1dcf0 "\020\335\301\005\376\177", attrs=..., addr=4249485620, as=0x561ad5bb3020) at include/exec/memory.h:2131
#12 0x0000561ad33b1c42 in memory_dump (mon=0x561ad59c4530, count=1, format=120, wsize=1, addr=4249485620, is_physical=1) at monitor/misc.c:723
#13 0x0000561ad33b1fc1 in hmp_physical_memory_dump (mon=0x561ad59c4530, qdict=0x561ad6c6fd00) at monitor/misc.c:795
#14 0x0000561ad37b4a9f in handle_hmp_command (mon=0x561ad59c4530, cmdline=0x561ad59d0f22 "/b 0x00000000fd4a0134") at monitor/hmp.c:1082
Fix by checking the FIFO is not empty before popping from it.
The datasheet is not clear about the reset value of this register,
we choose to return '0'.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-id: 20190709113715.7761-4-philmd@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit a09ef5040477643a7026703199d8781fe048d3a8) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
hw/ssi/mss-spi: Avoid crash when reading empty RX FIFO
Reading the RX_DATA register when the RX_FIFO is empty triggers
an abort. This can be easily reproduced:
$ qemu-system-arm -M emcraft-sf2 -monitor stdio -S
QEMU 4.0.50 monitor - type 'help' for more information
(qemu) x 0x40001010
Aborted (core dumped)
(gdb) bt
#1 0x00007f035874f895 in abort () at /lib64/libc.so.6
#2 0x00005628686591ff in fifo8_pop (fifo=0x56286a9a4c68) at util/fifo8.c:66
#3 0x00005628683e0b8e in fifo32_pop (fifo=0x56286a9a4c68) at include/qemu/fifo32.h:137
#4 0x00005628683e0efb in spi_read (opaque=0x56286a9a4850, addr=4, size=4) at hw/ssi/mss-spi.c:168
#5 0x0000562867f96801 in memory_region_read_accessor (mr=0x56286a9a4b60, addr=16, value=0x7ffeecb0c5c8, size=4, shift=0, mask=4294967295, attrs=...) at memory.c:439
#6 0x0000562867f96cdb in access_with_adjusted_size (addr=16, value=0x7ffeecb0c5c8, size=4, access_size_min=1, access_size_max=4, access_fn=0x562867f967c3 <memory_region_read_accessor>, mr=0x56286a9a4b60, attrs=...) at memory.c:569
#7 0x0000562867f99940 in memory_region_dispatch_read1 (mr=0x56286a9a4b60, addr=16, pval=0x7ffeecb0c5c8, size=4, attrs=...) at memory.c:1420
#8 0x0000562867f99a08 in memory_region_dispatch_read (mr=0x56286a9a4b60, addr=16, pval=0x7ffeecb0c5c8, size=4, attrs=...) at memory.c:1447
#9 0x0000562867f38721 in flatview_read_continue (fv=0x56286aec6360, addr=1073745936, attrs=..., buf=0x7ffeecb0c7c0 "\340ǰ\354\376\177", len=4, addr1=16, l=4, mr=0x56286a9a4b60) at exec.c:3385
#10 0x0000562867f38874 in flatview_read (fv=0x56286aec6360, addr=1073745936, attrs=..., buf=0x7ffeecb0c7c0 "\340ǰ\354\376\177", len=4) at exec.c:3423
#11 0x0000562867f388ea in address_space_read_full (as=0x56286aa3e890, addr=1073745936, attrs=..., buf=0x7ffeecb0c7c0 "\340ǰ\354\376\177", len=4) at exec.c:3436
#12 0x0000562867f389c5 in address_space_rw (as=0x56286aa3e890, addr=1073745936, attrs=..., buf=0x7ffeecb0c7c0 "\340ǰ\354\376\177", len=4, is_write=false) at exec.c:3466
#13 0x0000562867f3bdd7 in cpu_memory_rw_debug (cpu=0x56286aa19d00, addr=1073745936, buf=0x7ffeecb0c7c0 "\340ǰ\354\376\177", len=4, is_write=0) at exec.c:3976
#14 0x000056286811ed51 in memory_dump (mon=0x56286a8c32d0, count=1, format=120, wsize=4, addr=1073745936, is_physical=0) at monitor/misc.c:730
#15 0x000056286811eff1 in hmp_memory_dump (mon=0x56286a8c32d0, qdict=0x56286b15c400) at monitor/misc.c:785
#16 0x00005628684740ee in handle_hmp_command (mon=0x56286a8c32d0, cmdline=0x56286a8caeb2 "0x40001010") at monitor/hmp.c:1082
From the datasheet "Actel SmartFusion Microcontroller Subsystem
User's Guide" Rev.1, Table 13-3 "SPI Register Summary", this
register has a reset value of 0.
Check the FIFO is not empty before accessing it, else log an
error message.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-id: 20190709113715.7761-3-philmd@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit c0bccee9b40ec58c9d165b406ae3d4f63652ce53) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: Francisco Iglesias <frasse.iglesias@gmail.com> Tested-by: Francisco Iglesias <frasse.iglesias@gmail.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit 526668c734e6a07f2fedfd378840a61b70c1cbab) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Previous patches switched to a temporary pbp but that does not go far
enough: after device uses a buffer, guest is free to reuse it, so
tracking the page and freeing it later is wrong.
Free and reset the pbp after we push each element.
Fixes: ed48c59875b6 ("virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size") Cc: qemu-stable@nongnu.org #v4.0.0 Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 1b47b37c33ec01ae1efc527f4c97f97f93723bc4) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
As ramblocks cannot get removed/readded while we are processing a bulk
of inflation requests, there is no more need to track the page size
in form of the number of subpages.
Suggested-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20190725113638.4702-8-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 9a7ca8a7c920360db9dcaf616ca6f1440c025043) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
We still have multiple issues in the current code
- The PBP is not freed during unrealize()
- The PBP is not reset on device resets: After a reset, the PBP is stale.
- We are not indicating VIRTIO_BALLOON_F_MUST_TELL_HOST, therefore
guests (esp. legacy guests) will reuse pages without deflating,
turning the PBP stale. Adding that would require compat handling.
Instead, let's use the PBP only temporarily, when processing one bulk of
inflation requests. This will keep guest_page_size > 4k working (with
Linux guests). There is nothing to do for deflation requests anymore.
The pbp is only used for a limited amount of time.
Fixes: ed48c59875b6 ("virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size") Cc: qemu-stable@nongnu.org #v4.0.0 Suggested-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20190722134108.22151-7-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit a8cd64d488325f3be5c4ddec4bf07efb3b8c7330)
*drop context dependency on qemu_4_0_config_size changes Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Using the address of a RAMBlock to test for a matching pbp is not really
safe. Instead, let's use the guest physical address of the base page
along with the page size (via the number of subpages).
Also, let's allocate the bitmap separately. This makes the code
easier to read and maintain - we can reuse bitmap_new().
Prepare the code to move the PBP out of the device.
Fixes: ed48c59875b6 ("virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size") Fixes: b27b32391404 ("virtio-balloon: Fix possible guest memory corruption with inflates & deflates") Cc: qemu-stable@nongnu.org #v4.0.0 Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20190722134108.22151-6-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 1c5cfc2b7153dd72bf4b8ddc456408eb2b9b66d8) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
While at it, use QEMU_ALIGN_DOWN() instead of a handcrafted computation
and move the computation to the place where it is needed.
Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20190722134108.22151-5-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit e6129b271b9dccca22c84870e313c315f2c70063) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Let's simplify this - the case we are optimizing for is very hard to
trigger and not worth the effort. If we're switching from inflation to
deflation, let's reset the pbp.
Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20190722134108.22151-4-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 2ffc49eea1bbd454913a88a0ad872c2649b36950) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
virtio-balloon: Fix QEMU crashes on pagesize > BALLOON_PAGE_SIZE
We are using the wrong functions to set/clear bits, effectively touching
multiple bits, writing out of range of the bitmap, resulting in memory
corruptions. We have to use set_bit()/clear_bit() instead.
Can easily be reproduced by starting a qemu guest on hugetlbfs memory,
inflating the balloon. QEMU crashes. This never could have worked
properly - especially, also pages would have been discarded when the
first sub-page would be inflated (the whole bitmap would be set).
While testing I realized, that on hugetlbfs it is pretty much impossible
to discard a page - the guest just frees the 4k sub-pages in random order
most of the time. I was only able to discard a hugepage a handful of
times - so I hope that now works correctly.
Fixes: ed48c59875b6 ("virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size") Fixes: b27b32391404 ("virtio-balloon: Fix possible guest memory corruption with inflates & deflates") Cc: qemu-stable@nongnu.org #v4.0.0 Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20190722134108.22151-3-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 483f13524bb2a08b7ff6a7560b846564ed3b0c33) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
If we directly cast from int to uint64_t, we will first sign-extend to
an int64_t, which is wrong. We actually want to treat the PFNs like
unsigned values.
As far as I can see, this dates back to the initial virtio-balloon
commit, but wasn't triggered as fairly big guests would be required.
Cc: qemu-stable@nongnu.org Reported-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20190722134108.22151-2-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit ffa207d08253ffffb3993a1dbe09e40af4fc91f1) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
David Gibson [Wed, 6 Mar 2019 03:06:01 +0000 (14:06 +1100)]
virtio-balloon: Restore MADV_WILLNEED hint on balloon deflate
Prior to f6deb6d9 "virtio-balloon: Remove unnecessary MADV_WILLNEED on
deflate", the balloon device issued an madvise() MADV_WILLNEED on
pages removed from the balloon. That would hint to the host kernel
that the pages were likely to be needed by the guest in the near
future.
It's unclear if this is actually valuable or not, and so f6deb6d9
removed this, essentially ignoring balloon deflate requests. However,
concerns have been raised that this might cause a performance
regression by causing extra latency for the guest in certain
configurations.
So, until we can get actual benchmark data to see if that's the case,
this restores the old behaviour, issuing a MADV_WILLNEED when a page is
removed from the balloon.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20190306030601.21986-4-david@gibson.dropbear.id.au> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 596546fe9e4d1d1fa6423c300e2a73b6f90baeb0) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
David Gibson [Wed, 6 Mar 2019 03:06:00 +0000 (14:06 +1100)]
virtio-balloon: Fix possible guest memory corruption with inflates & deflates
This fixes a balloon bug with a nasty consequence - potentially
corrupting guest memory - but which is extremely unlikely to be
triggered in practice.
The balloon always works in 4kiB units, but the host could have a
larger page size on certain platforms. Since ed48c59 "virtio-balloon:
Safely handle BALLOON_PAGE_SIZE < host page size" we've handled this
by accumulating requests to balloon 4kiB subpages until they formed a
full host page. Since f6deb6d "virtio-balloon: Remove unnecessary
MADV_WILLNEED on deflate" we essentially ignore deflate requests.
Suppose we have a host with 8kiB pages, and one host page has subpages
A & B. If we get this sequence of events -
inflate A
deflate A
inflate B
- the current logic will discard the whole host page. That's
incorrect because the guest has deflated subpage A, and could have
written important data to it.
This patch fixes the problem by adjusting our state information about
partially ballooned host pages when deflate requests are received.
Fixes: ed48c59 "virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size" Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20190306030601.21986-3-david@gibson.dropbear.id.au> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: David Hildenbrand <david@redhat.com>
(cherry picked from commit b27b3239140470b7d593e3b0b09687bcc6fbf274) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
ed48c59875b6 "virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host
page size" introduced a new temporary data structure which tracks 4kiB
chunks which have been inserted into the balloon by the guest but
don't yet form a full host page which we can discard.
Unfortunately, I had a thinko and allocated that structure with
g_malloc0() but freed it with a plain free() rather than g_free().
This corrects the problem.
Fixes: ed48c59875b6 Reported-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20190306030601.21986-2-david@gibson.dropbear.id.au> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com>
(cherry picked from commit 301cf2a8dd5024aa5bbdc6bd3e121174bbfc2957) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
The virtio-balloon always works in units of 4kiB (BALLOON_PAGE_SIZE), but
we can only actually discard memory in units of the host page size.
Now, we handle this very badly: we silently ignore balloon requests that
aren't host page aligned, and for requests that are host page aligned we
discard the entire host page. The latter can corrupt guest memory if its
page size is smaller than the host's.
The obvious choice would be to disable the balloon if the host page size is
not 4kiB. However, that would break the special case where host and guest
have the same page size, but that's larger than 4kiB. That case currently
works by accident[1] - and is used in practice on many production POWER
systems where 64kiB has long been the Linux default page size on both host
and guest.
To make the balloon safe, without breaking that useful special case, we
need to accumulate 4kiB balloon requests until we have a whole contiguous
host page to discard.
We could in principle do that across all guest memory, but it would require
a large bitmap to track. This patch represents a compromise: we track
ballooned subpages for a single contiguous host page at a time. This means
that if the guest discards all 4kiB chunks of a host page in succession,
we will discard it. This is the expected behaviour in the (host page) ==
(guest page) != 4kiB case we want to support.
If the guest scatters 4kiB requests across different host pages, we don't
discard anything, and issue a warning. Not ideal, but at least we don't
corrupt guest memory as the previous version could.
Warning reporting is kind of a compromise here. Determining whether we're
in a problematic state at realize() time is tricky, because we'd have to
look at the host pagesizes of all memory backends, but we can't really know
if some of those backends could be for special purpose memory that's not
subject to ballooning.
Reporting only when the guest tries to balloon a partial page also isn't
great because if the guest page size happens to line up it won't indicate
that we're in a non ideal situation. It could also cause alarming repeated
warnings whenever a migration is attempted.
So, what we do is warn the first time the guest attempts balloon a partial
host page, whether or not it will end up ballooning the rest of the page
immediately afterwards.
[1] Because when the guest attempts to balloon a page, it will submit
requests for each 4kiB subpage. Most will be ignored, but the one
which happens to be host page aligned will discard the whole lot.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20190214043916.22128-6-david@gibson.dropbear.id.au> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit ed48c59875b603058366490f472490f0fb9c30f3) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
David Gibson [Thu, 14 Feb 2019 04:39:15 +0000 (15:39 +1100)]
virtio-balloon: Use ram_block_discard_range() instead of raw madvise()
Currently, virtio-balloon uses madvise() with MADV_DONTNEED to actually
discard RAM pages inserted into the balloon. This is basically a Linux
only interface (MADV_DONTNEED exists on some other platforms, but doesn't
always have the same semantics). It also doesn't work on hugepages and has
some other limitations.
It turns out that postcopy also needs to discard chunks of memory, and uses
a better interface for it: ram_block_discard_range(). It doesn't cover
every case, but it covers more than going direct to madvise() and this
gives us a single place to update for more possibilities in future.
There are some subtleties here to maintain the current balloon behaviour:
* For now, we just ignore requests to balloon in a hugepage backed region.
That matches current behaviour, because MADV_DONTNEED on a hugepage would
simply fail, and we ignore the error.
* If host page size is > BALLOON_PAGE_SIZE we can frequently call this on
non-host-page-aligned addresses. These would also fail in madvise(),
which we then ignored. ram_block_discard_range() error_report()s calls
on unaligned addresses, so we explicitly check that case to avoid
spamming the logs.
* We now call ram_block_discard_range() with the *host* page size, whereas
we previously called madvise() with BALLOON_PAGE_SIZE. Surprisingly,
this also matches existing behaviour. Although the kernel fails madvise
on unaligned addresses, it will round unaligned sizes *up* to the host
page size. Yes, this means that if BALLOON_PAGE_SIZE < guest page size
we can incorrectly discard more memory than the guest asked us to. I'm
planning to address that soon.
Errors other than the ones discussed above, will now be reported by
ram_block_discard_range(), rather than silently ignored, which means we
have a much better chance of seeing when something is going wrong.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20190214043916.22128-5-david@gibson.dropbear.id.au> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit dbe1a2774521d838c34b831d89a4bb646a8e9d7c) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
David Gibson [Thu, 14 Feb 2019 04:39:14 +0000 (15:39 +1100)]
virtio-balloon: Rework ballon_page() interface
This replaces the balloon_page() internal interface with
ballon_inflate_page(), with a slightly different interface. The new
interface will make future alterations simpler.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20190214043916.22128-4-david@gibson.dropbear.id.au> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit e9550234d79ddb69b01721d8cb197edc0a14a245) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
David Gibson [Thu, 14 Feb 2019 04:39:13 +0000 (15:39 +1100)]
virtio-balloon: Corrections to address verification
The virtio-balloon device's verification of the address given to it by the
guest has a number of faults:
* The addresses here are guest physical addresses, which should be
'hwaddr' rather than 'ram_addr_t' (the distinction is admittedly
pretty subtle and confusing)
* We don't check for section.mr being NULL, which is the main way that
memory_region_find() reports basic failures. We really need to check
that before looking at any other section fields, because
memory_region_find() doesn't initialize them on the failure path
* We're passing a length of '1' to memory_region_find(), but really the
guest is requesting that we put the entire page into the balloon,
so it makes more sense to call it with BALLOON_PAGE_SIZE
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20190214043916.22128-3-david@gibson.dropbear.id.au> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit b218a70e6ae882f52cc339ae965f515a36a9139f) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
David Gibson [Thu, 14 Feb 2019 04:39:12 +0000 (15:39 +1100)]
virtio-balloon: Remove unnecessary MADV_WILLNEED on deflate
When the balloon is inflated, we discard memory place in it using madvise()
with MADV_DONTNEED. And when we deflate it we use MADV_WILLNEED, which
sounds like it makes sense but is actually unnecessary.
The misleadingly named MADV_DONTNEED just discards the memory in question,
it doesn't set any persistent state on it in-kernel; all that's necessary
to bring the memory back is to touch it. MADV_WILLNEED in contrast
specifically says that the memory will be used soon and faults it in.
This patch simplify's the balloon operation by dropping the madvise()
on deflate. This might have an impact on performance - it will move a
delay at deflate time until that memory is actually touched, which
might be more latency sensitive. However:
* Memory that's being given back to the guest by deflating the
balloon *might* be used soon, but it equally could just sit around
in the guest's pools until needed (or even be faulted out again if
the host is under memory pressure).
* Usually, the timescale over which you'll be adjusting the balloon
is long enough that a few extra faults after deflation aren't
going to make a difference.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20190214043916.22128-2-david@gibson.dropbear.id.au> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit f6deb6d95aa7c29fa0047057512060ca720cad22) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Peter Maydell [Fri, 18 Jan 2019 18:36:03 +0000 (18:36 +0000)]
hw/virtio/virtio-balloon: zero-initialize the virtio_balloon_config struct
In virtio_balloon_get_config() we initialize a struct virtio_balloon_config
which we then copy to guest memory. However, the local variable is not
zero initialized. This works OK at the moment because we initialize
all the fields in it; however an upcoming kernel header change will
add some new fields. If we don't zero out the whole struct then we
will start leaking a small amount of the contents of QEMU's stack
to the guest as soon as we update linux-headers/ to a set of headers
that includes the new fields.
Cc: qemu-stable@nongnu.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20190118183603.24757-1-peter.maydell@linaro.org
(cherry picked from commit 5385a5988c8a55bebdc878c05b96648579b5d6e0) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
i386/acpi: show PCI Express bus on pxb-pcie expanders
Show PCIe host bridge PNP id with PCI host bridge as a compatible id
when expanding a pcie bus.
Cc: qemu-stable@nongnu.org Signed-off-by: Evgeny Yakovlev <wrfsh@yandex-team.ru>
Message-Id: <1563526469-15588-1-git-send-email-wrfsh@yandex-team.ru> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit ee4b0c8686f781987879508d7c6dd605b5435bac) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
When very large regions (32GB sized in our case, PCI pass-through of GPUs)
are compared substraction result does not fit into gint.
As a result crs_replace_with_free_ranges does not get sorted ranges and
incorrectly computes PCI64 free space regions. Which then makes linux
guest complain about device and PCI64 hole intersection and device
becomes unusable.
Fix that by returning exactly fitting ranges.
Also fix indentation of an entire crs_replace_with_free_ranges to make
checkpatch happy.
Jan Kiszka [Sun, 2 Jun 2019 11:42:13 +0000 (13:42 +0200)]
ioapic: kvm: Skip route updates for masked pins
Masked entries will not generate interrupt messages, thus do no need to
be routed by KVM. This is a cosmetic cleanup, just avoiding warnings of
the kind
if the masked entry happens to reference a non-present IRTE.
Cc: qemu-stable@nongnu.org Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Message-Id: <a84b7e03-f9a8-b577-be27-4d93d1caa1c9@siemens.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>
(cherry picked from commit be1927c97e564346cbd409cb17fe611df74b84e5) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Li Hangjing [Mon, 3 Jun 2019 06:15:24 +0000 (14:15 +0800)]
vhost: fix vhost_log size overflow during migration
When a guest which doesn't support multiqueue is migrated with a multi queues
vhost-user-blk deivce, a crash will occur like:
0 qemu_memfd_alloc (name=<value optimized out>, size=562949953421312, seals=<value optimized out>, fd=0x7f87171fe8b4, errp=0x7f87171fe8a8) at util/memfd.c:153
1 0x00007f883559d7cf in vhost_log_alloc (size=70368744177664, share=true) at hw/virtio/vhost.c:186
2 0x00007f88355a0758 in vhost_log_get (listener=0x7f8838bd7940, enable=1) at qemu-2-12/hw/virtio/vhost.c:211
3 vhost_dev_log_resize (listener=0x7f8838bd7940, enable=1) at hw/virtio/vhost.c:263
4 vhost_migration_log (listener=0x7f8838bd7940, enable=1) at hw/virtio/vhost.c:787
5 0x00007f88355463d6 in memory_global_dirty_log_start () at memory.c:2503
6 0x00007f8835550577 in ram_init_bitmaps (f=0x7f88384ce600, opaque=0x7f8836024098) at migration/ram.c:2173
7 ram_init_all (f=0x7f88384ce600, opaque=0x7f8836024098) at migration/ram.c:2192
8 ram_save_setup (f=0x7f88384ce600, opaque=0x7f8836024098) at migration/ram.c:2219
9 0x00007f88357a419d in qemu_savevm_state_setup (f=0x7f88384ce600) at migration/savevm.c:1002
10 0x00007f883579fc3e in migration_thread (opaque=0x7f8837530400) at migration/migration.c:2382
11 0x00007f8832447893 in start_thread () from /lib64/libpthread.so.0
12 0x00007f8832178bfd in clone () from /lib64/libc.so.6
This is because vhost_get_log_size() returns a overflowed vhost-log size.
In this function, it uses the uninitialized variable vqs->used_phys and
vqs->used_size to get the vhost-log size.
Signed-off-by: Li Hangjing <lihangjing@baidu.com> Reviewed-by: Xie Yongji <xieyongji@baidu.com> Reviewed-by: Chai Wen <chaiwen@baidu.com>
Message-Id: <20190603061524.24076-1-lihangjing@baidu.com> Cc: qemu-stable@nongnu.org Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 240e647a14df9677b3a501f7b8b870e40aac3fd5) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Max Reitz [Wed, 15 May 2019 04:15:41 +0000 (06:15 +0200)]
iotests: Test unaligned raw images with O_DIRECT
We already have 221 for accesses through the page cache, but it is
better to create a new file for O_DIRECT instead of integrating those
test cases into 221. This way, we can make use of
_supported_cache_modes (and _default_cache_mode) so the test is
automatically skipped on filesystems that do not support O_DIRECT.
As part of the split, add _supported_cache_modes to 221. With that, it
no longer fails when run with -c none or -c directsync.
Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 2fab30c80b33cdc6157c7efe6207e54b6835cf92)
*remove context dependencies on iotests not in 3.1 Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Max Reitz [Wed, 15 May 2019 04:15:40 +0000 (06:15 +0200)]
block/file-posix: Unaligned O_DIRECT block-status
Currently, qemu crashes whenever someone queries the block status of an
unaligned image tail of an O_DIRECT image:
$ echo > foo
$ qemu-img map --image-opts driver=file,filename=foo,cache.direct=on
Offset Length Mapped to File
qemu-img: block/io.c:2093: bdrv_co_block_status: Assertion `*pnum &&
QEMU_IS_ALIGNED(*pnum, align) && align > offset - aligned_offset'
failed.
This is because bdrv_co_block_status() checks that the result returned
by the driver's implementation is aligned to the request_alignment, but
file-posix can fail to do so, which is actually mentioned in a comment
there: "[...] possibly including a partial sector at EOF".
Fix this by rounding up those partial sectors.
There are two possible alternative fixes:
(1) We could refuse to open unaligned image files with O_DIRECT
altogether. That sounds reasonable until you realize that qcow2
does necessarily not fill up its metadata clusters, and that nobody
runs qemu-img create with O_DIRECT. Therefore, unpreallocated qcow2
files usually have an unaligned image tail.
(2) bdrv_co_block_status() could ignore unaligned tails. It actually
throws away everything past the EOF already, so that sounds
reasonable.
Unfortunately, the block layer knows file lengths only with a
granularity of BDRV_SECTOR_SIZE, so bdrv_co_block_status() usually
would have to guess whether its file length information is inexact
or whether the driver is broken.
Fixing what raw_co_block_status() returns is the safest thing to do.
There seems to be no other block driver that sets request_alignment and
does not make sure that it always returns aligned values.
Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 9c3db310ff0b7473272ae8dce5e04e2f8a825390) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Max Reitz [Wed, 30 Jan 2019 23:52:51 +0000 (00:52 +0100)]
iotests: Filter second BLOCK_JOB_ERROR from 229
Without this filter, this test sometimes fails.
Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit fff2388d5d9caecca6200455d0ab6d5e13f4e9bd) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Peter Lieven [Thu, 4 Apr 2019 12:10:15 +0000 (14:10 +0200)]
megasas: fix mapped frame size
the current value of 1024 bytes (16 * MFI_FRAME_SIZE) we map is not enough to hold
the maximum number of scatter gather elements we advertise. We actually need a
maximum of 2048 bytes. This is 128 max sg elements * 16 bytes (sizeof (union mfi_sgl)).
Cc: qemu-stable@nongnu.org Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <20190404121015.28634-1-pl@kamp.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 2e56fbc87f6ec3cd56c37b01d313abd502b80d61) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Dan Streetman [Tue, 16 Apr 2019 18:46:24 +0000 (14:46 -0400)]
do not call vhost_net_cleanup() on running net from char user event
Buglink: https://launchpad.net/bugs/1823458
Currently, a user CHR_EVENT_CLOSED event will cause net_vhost_user_event()
to call vhost_user_cleanup(), which calls vhost_net_cleanup() for all
its queues. However, vhost_net_cleanup() must never be called like
this for fully-initialized nets; when other code later calls
vhost_net_stop() - such as from virtio_net_vhost_status() - it will try
to access the already-cleaned-up fields and fail with assertion errors
or segfaults.
The vhost_net_cleanup() will eventually be called from
qemu_cleanup_net_client().
Signed-off-by: Dan Streetman <ddstreet@canonical.com>
Message-Id: <20190416184624.15397-3-dan.streetman@canonical.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 6ab79a20af3a7b3bf610ba9aebb446a9f0b05930) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Kevin Wolf [Wed, 17 Apr 2019 15:15:25 +0000 (17:15 +0200)]
block: Fix AioContext switch for bs->drv == NULL
Even for block nodes with bs->drv == NULL, we can't just ignore a
bdrv_set_aio_context() call. Leaving the node in its old context can
mean that it's still in an iothread context in bdrv_close_all() during
shutdown, resulting in an attempted unlock of the AioContext lock which
we don't hold.
This is an example stack trace of a related crash:
#0 0x00007ffff59da57f in raise () at /lib64/libc.so.6
#1 0x00007ffff59c4895 in abort () at /lib64/libc.so.6
#2 0x0000555555b97b1e in error_exit (err=<optimized out>, msg=msg@entry=0x555555d386d0 <__func__.19059> "qemu_mutex_unlock_impl") at util/qemu-thread-posix.c:36
#3 0x0000555555b97f7f in qemu_mutex_unlock_impl (mutex=mutex@entry=0x5555568002f0, file=file@entry=0x555555d378df "util/async.c", line=line@entry=507) at util/qemu-thread-posix.c:97
#4 0x0000555555b92f55 in aio_context_release (ctx=ctx@entry=0x555556800290) at util/async.c:507
#5 0x0000555555b05cf8 in bdrv_prwv_co (child=child@entry=0x7fffc80012f0, offset=offset@entry=131072, qiov=qiov@entry=0x7fffffffd4f0, is_write=is_write@entry=true, flags=flags@entry=0)
at block/io.c:833
#6 0x0000555555b060a9 in bdrv_pwritev (qiov=0x7fffffffd4f0, offset=131072, child=0x7fffc80012f0) at block/io.c:990
#7 0x0000555555b060a9 in bdrv_pwrite (child=0x7fffc80012f0, offset=131072, buf=<optimized out>, bytes=<optimized out>) at block/io.c:990
#8 0x0000555555ae172b in qcow2_cache_entry_flush (bs=bs@entry=0x555556810680, c=c@entry=0x5555568cc740, i=i@entry=0) at block/qcow2-cache.c:51
#9 0x0000555555ae18dd in qcow2_cache_write (bs=bs@entry=0x555556810680, c=0x5555568cc740) at block/qcow2-cache.c:248
#10 0x0000555555ae15de in qcow2_cache_flush (bs=0x555556810680, c=<optimized out>) at block/qcow2-cache.c:259
#11 0x0000555555ae16b1 in qcow2_cache_flush_dependency (c=0x5555568a1700, c=0x5555568a1700, bs=0x555556810680) at block/qcow2-cache.c:194
#12 0x0000555555ae16b1 in qcow2_cache_entry_flush (bs=bs@entry=0x555556810680, c=c@entry=0x5555568a1700, i=i@entry=0) at block/qcow2-cache.c:194
#13 0x0000555555ae18dd in qcow2_cache_write (bs=bs@entry=0x555556810680, c=0x5555568a1700) at block/qcow2-cache.c:248
#14 0x0000555555ae15de in qcow2_cache_flush (bs=bs@entry=0x555556810680, c=<optimized out>) at block/qcow2-cache.c:259
#15 0x0000555555ad242c in qcow2_inactivate (bs=bs@entry=0x555556810680) at block/qcow2.c:2124
#16 0x0000555555ad2590 in qcow2_close (bs=0x555556810680) at block/qcow2.c:2153
#17 0x0000555555ab0c62 in bdrv_close (bs=0x555556810680) at block.c:3358
#18 0x0000555555ab0c62 in bdrv_delete (bs=0x555556810680) at block.c:3542
#19 0x0000555555ab0c62 in bdrv_unref (bs=0x555556810680) at block.c:4598
#20 0x0000555555af4d72 in blk_remove_bs (blk=blk@entry=0x5555568103d0) at block/block-backend.c:785
#21 0x0000555555af4dbb in blk_remove_all_bs () at block/block-backend.c:483
#22 0x0000555555aae02f in bdrv_close_all () at block.c:3412
#23 0x00005555557f9796 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4776
The reproducer I used is a qcow2 image on gluster volume, where the
virtual disk size (4 GB) is larger than the gluster volume size (64M),
so we can easily trigger an ENOSPC. This backend is assigned to a
virtio-blk device using an iothread, and then from the guest a
'dd if=/dev/zero of=/dev/vda bs=1G count=1' causes the VM to stop
because of an I/O error. qemu_gluster_co_flush_to_disk() sets
bs->drv = NULL on error, so when virtio-blk stops the dataplane, the
block nodes stay in the iothread AioContext. A 'quit' monitor command
issued from this paused state crashes the process.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1631227 Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
(cherry picked from commit 1bffe1ae7a7b707c3a14ea2ccd00d3609d3ce4d8)
*drop context dependency on e64f25f30b8 Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Eric Blake [Wed, 17 Apr 2019 17:11:00 +0000 (12:11 -0500)]
cutils: Fix size_to_str() on 32-bit platforms
When extracting a human-readable size formatter, we changed 'uint64_t
div' pre-patch to 'unsigned long div' post-patch. Which breaks on
32-bit platforms, resulting in 'inf' instead of intended values larger
than 999GB.
Fixes: 22951aaa CC: qemu-stable@nongnu.org Reported-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 754da86714d550c3f995f11a2587395081362e0a) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Kevin Wolf [Mon, 15 Apr 2019 14:25:01 +0000 (16:25 +0200)]
qcow2: Avoid COW during metadata preallocation
Limiting the allocation to INT_MAX bytes isn't particularly clever
because it means that the final cluster will be a partial cluster which
will be completed through a COW operation. This results in unnecessary
data read and write requests which lead to an unwanted non-sparse
filesystem block for metadata preallocation.
Align the maximum allocation size down to the cluster size to avoid this
situation.
Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
(cherry picked from commit f29fbf7c6b1c9a84f6931c1c222716fbe073e6e4)
*modified to avoid functional dependency on 93e32b3e Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
qom: Clean up error reporting in user_creatable_add_opts_foreach()
Some callers were updated to pass in "&error_fatal" but all the ones in
qemu-img were left passing NULL. As a result all errors went to
/dev/null instead of being reported to the user.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 334c43e2c342e878311c66b4e62343f0a7c2c6be) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Gerd Hoffmann [Thu, 13 Dec 2018 12:25:11 +0000 (13:25 +0100)]
usb-mtp: use O_NOFOLLOW and O_CLOEXEC.
Open files and directories with O_NOFOLLOW to avoid symlinks attacks.
While being at it also add O_CLOEXEC.
usb-mtp only handles regular files and directories and ignores
everything else, so users should not see a difference.
Because qemu ignores symlinks, carrying out a successful symlink attack
requires swapping an existing file or directory below rootdir for a
symlink and winning the race against the inotify notification to qemu.
Fixes: CVE-2018-16872 Cc: Prasad J Pandit <ppandit@redhat.com> Cc: Bandan Das <bsd@redhat.com> Reported-by: Michael Hanselmann <public@hansmi.ch> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Michael Hanselmann <public@hansmi.ch>
Message-id: 20181213122511.13853-1-kraxel@redhat.com
(cherry picked from commit bab9df35ce73d1c8e19a37e2737717ea1c984dc1) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
qga: update docs with systemd suspend support info
Commit 067927d62e ("qga: systemd hibernate/suspend/hybrid-sleep
support") failed to update qapi-schema.json after adding systemd
hibernate/suspend/hybrid-sleep capabilities to guest-suspend-* QGA
commands.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
(cherry picked from commit bb6c8d407e49d7b805ac52fe46abf4d8d5262046) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
mac_newworld: use node name instead of alias name for hd device in FWPathProvider
When using -drive to configure the hd drive for the New World machine, the node
name "disk" should be used instead of the "hd" alias.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20190307212058.4890-3-mark.cave-ayland@ilande.co.uk> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 31bc6fa7fa8124ff8fb08373f9402985c806919f) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
mac_oldworld: use node name instead of alias name for hd device in FWPathProvider
When using -drive to configure the hd drive for the Old World machine, the node
name "disk" should be used instead of the "hd" alias.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20190307212058.4890-2-mark.cave-ayland@ilande.co.uk> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 484d366e02732c8de6f92e53e2ee9bb93dd4ca23) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Thomas Petazzoni [Wed, 13 Feb 2019 21:18:27 +0000 (22:18 +0100)]
configure: improve usbfs check
The current check to test if usbfs support should be compiled or not
solely relies on the presence of <linux/usbdevice_fs.h>, without
actually checking that all definition used by Qemu are provided by
this header file.
With sufficiently old kernel headers, <linux/usbdevice_fs.h> may be
present, but some of the definitions needed by Qemu may not be
available.
This commit improves the check by building a small program that
actually tests whether the necessary definitions are available.
In addition, it fixes a bug where have_usbfs was set to "yes"
regardless of the result of the test.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com> Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20190213211827.20300-1-thomas.petazzoni@bootlin.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
(cherry picked from commit 96566d09aa105ee04cbc1c9539cf8a9a40e8e422) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Michael Roth [Tue, 12 Feb 2019 21:29:16 +0000 (15:29 -0600)]
qga-win: include glib when building VSS DLL
Commit 3ebee3b191e defined assert() as g_assert(), but when we build
the VSS DLL component of QGA (to handle fsfreeze) we do not include
glib, which results in breakage when building with VSS support enabled.
Fix this by including glib (along with the -lintl and -lws2_32
dependencies it brings).
Since the VSS DLL is built statically, this introduces an additional
dependency on static glib and supporting libs for the mingw environment
(possibly why we didn't include glib originally), but VSS support
already has very specific prerequisites so it shouldn't affect too many
build environments.
Since the VSS DLL code does use qemu/osdep.h, this should also help
avoid future breakages and possibly allow for some clean ups in current
VSS code.
Suggested-by: Daniel P. Berrangé <berrange@redhat.com> Cc: Daniel P. Berrangé <berrange@redhat.com> Cc: qemu-stable@nongnu.org Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
(cherry picked from commit 82a58d270c6fbbe2f2381224946340fd3814a273) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Commit 8bca4613 added support for %% in json strings when interpolating,
but in doing so broke handling of % when not interpolating.
When parse_string() is fed a string token containing '%', it skips the
'%' regardless of ctxt->ap, i.e. even it's not interpolating. If the
'%' is the string's last character, it fails an assertion. Else, it
"merely" swallows the '%'.
Fix parse_string() to handle '%' specially only when interpolating.
To gauge the bug's impact, let's review non-interpolating users of this
parser, i.e. code passing NULL context to json_message_parser_init():
Filenames are trusted when they come from command line, QMP or HMP.
They are untrusted when they come from from image file headers.
Example: QCOW2 backing file name. Note that this is *not* the security
boundary between host and guest. It's the boundary between host and an
image file from an untrusted source.
Neither failing an assertion nor skipping a character in a filename of
your choice looks exploitable. Note that we don't support compiling
with NDEBUG.
Fixes: 8bca4613e6cddd948895b8db3def05950463495b Cc: qemu-stable@nongnu.org Signed-off-by: Christophe Fergeau <cfergeau@redhat.com>
Message-Id: <20190102140535.11512-1-cfergeau@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Tested-by: Richard W.M. Jones <rjones@redhat.com>
[Commit message extended to discuss impact] Signed-off-by: Markus Armbruster <armbru@redhat.com>
(cherry picked from commit bbc0586ced6e9ffdfd29d89fcc917b3d90ac3938) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Paolo Bonzini [Fri, 21 Dec 2018 11:35:56 +0000 (12:35 +0100)]
i386: remove the 'INTEL_PT' CPUID bit from named CPU models
Processor tracing is not yet implemented for KVM and it will be an
opt in feature requiring a special module parameter.
Disable it, because it is wrong to enable it by default and
it is impossible that no one has ever used it.
Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 4c257911dcc7c4189768e9651755c849ce9db4e8)
*drop context dependency on ecb85fe48 Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Robert Hoo [Wed, 19 Dec 2018 13:44:40 +0000 (21:44 +0800)]
i386: remove the new CPUID 'PCONFIG' from Icelake-Server CPU model
PCONFIG is not available to guests; it must be specifically enabled
using the PCONFIG_ENABLE execution control. Disable it, because
no one can ever use it.
Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Message-Id: <1545227081-213696-2-git-send-email-robert.hu@linux.intel.com> Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 76e5a4d58357b9d077afccf7f7c82e17f733b722) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Liam Merwick [Fri, 15 Feb 2019 13:35:17 +0000 (13:35 +0000)]
tpm_tis: fix loop that cancels any seizure by a lower locality
In tpm_tis_mmio_write() if the requesting locality is seizing
access, any seizure by a lower locality is cancelled. However the
loop doing the seizure had an off-by-one error and the locality
immediately preceding the requesting locality was not being cleared.
This is fixed by adjusting the test in the for loop to check the
localities up to the requesting locality.
Signed-off-by: Liam Merwick <Liam.Merwick@oracle.com> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
(cherry picked from commit 37b55d67c0f001b20b7831db3f9f24f1d453e1de) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
William Bowling [Fri, 1 Mar 2019 21:45:56 +0000 (21:45 +0000)]
slirp: check sscanf result when emulating ident
When emulating ident in tcp_emu, if the strchr checks passed but the
sscanf check failed, two uninitialized variables would be copied and
sent in the reply, so move this code inside the if(sscanf()) clause.
Signed-off-by: William Bowling <will@wbowling.info> Cc: qemu-stable@nongnu.org Cc: secalert@redhat.com
Message-Id: <1551476756-25749-1-git-send-email-will@wbowling.info> Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
(cherry picked from commit d3222975c7d6cda9e25809dea05241188457b113) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Kevin Wolf [Thu, 31 Jan 2019 14:16:10 +0000 (15:16 +0100)]
block: Fix invalidate_cache error path for parent activation
bdrv_co_invalidate_cache() clears the BDRV_O_INACTIVE flag before
actually activating a node so that the correct permissions etc. are
taken. In case of errors, the flag must be restored so that the next
call to bdrv_co_invalidate_cache() retries activation.
Restoring the flag was missing in the error path for a failed
parent->role->activate() call. The consequence is that this attempt to
activate all images correctly fails because we still set errp, however
on the next attempt BDRV_O_INACTIVE is already clear, so we return
success without actually retrying the failed action.
An example where this is observable in practice is migration to a QEMU
instance that has a raw format block node attached to a guest device
with share-rw=off (the default) while another process holds
BLK_PERM_WRITE for the same image. In this case, all activation steps
before parent->role->activate() succeed because raw can tolerate other
writers to the image. Only the parent callback (in particular
blk_root_activate()) tries to implement the share-rw=on property and
requests exclusive write permissions. This fails when the migration
completes and correctly displays an error. However, a manual 'cont' will
incorrectly resume the VM without calling blk_root_activate() again.
This case is described in more detail in the following bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=1531888
Fix this by correctly restoring the BDRV_O_INACTIVE flag in the error
path.
Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Tested-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 78fc3b3a26c145eebcdee992988644974b243a74) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Peter Maydell [Fri, 1 Feb 2019 14:55:45 +0000 (14:55 +0000)]
exec.c: Don't reallocate IOMMUNotifiers that are in use
The tcg_register_iommu_notifier() code has a GArray of
TCGIOMMUNotifier structs which it has registered by passing
memory_region_register_iommu_notifier() a pointer to the embedded
IOMMUNotifier field. Unfortunately, if we need to enlarge the
array via g_array_set_size() this can cause a realloc(), which
invalidates the pointer that memory_region_register_iommu_notifier()
put into the MemoryRegion's iommu_notify list. This can result
in segfaults.
Switch the GArray to holding pointers to the TCGIOMMUNotifier
structs, so that we can individually allocate and free them.
Cc: qemu-stable@nongnu.org Fixes: 1f871c5e6b0f30644a60a ("exec.c: Handle IOMMUs in address_space_translate_for_iotlb()") Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190128174241.5860-1-peter.maydell@linaro.org
(cherry picked from commit 5601be3b01d73e21c09331599e2ce62df016ff94) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Janosch Frank [Fri, 11 Jan 2019 11:36:57 +0000 (12:36 +0100)]
s390x: Return specification exception for unimplemented diag 308 subcodes
The architecture specifies specification exceptions for all
unavailable subcodes.
The presence of subcodes is indicated by checking some query subcode.
For example 6 will indicate that 3-6 are available. So future systems
might call new subcodes to check for new features. This should not
trigger a hw error, instead we return the architectured specification
exception.
Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Cc: qemu-stable@nongnu.org
Message-Id: <20190111113657.66195-3-frankja@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
(cherry picked from commit 37dbd1f4d4805edcd18d94eb202bb3461b3cd52d) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Peter Maydell [Tue, 8 Jan 2019 18:49:00 +0000 (18:49 +0000)]
linux-user: make pwrite64/pread64(fd, NULL, 0, offset) return 0
Linux returns success if pwrite64() or pread64() are called with a
zero length NULL buffer, but QEMU was returning -TARGET_EFAULT.
This is the same bug that we fixed in commit 58cfa6c2e6eb51b23cc9
for the write syscall, and long before that in 38d840e6790c29f59
for the read syscall.
Fixes: https://bugs.launchpad.net/qemu/+bug/1810433 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190108184900.9654-1-peter.maydell@linaro.org> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
(cherry picked from commit 2bd3f8998e1e7dcd9afc29fab252fb9936f9e956) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Thomas Huth [Fri, 14 Dec 2018 13:08:07 +0000 (14:08 +0100)]
hw/s390x: Fix bad mask in time2tod()
Since "s390x/tcg: avoid overflows in time2tod/tod2time", the
time2tod() function tries to deal with the 9 uppermost bits in the
time value, but uses the wrong mask for this: 0xff80000000000000 should
be used instead of 0xff10000000000000 here.
Fixes: 14055ce53c2d901d826ffad7fb7d6bb8ab46bdfd Cc: qemu-stable@nongnu.org Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1544792887-14575-1-git-send-email-thuth@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com>
[CH: tweaked commit message] Signed-off-by: Cornelia Huck <cohuck@redhat.com>
(cherry picked from commit aba7a5a2de3dba5917024df25441f715b9249e31) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Corey Minyard [Mon, 26 Nov 2018 18:28:44 +0000 (12:28 -0600)]
pc:piix4: Update smbus I/O space after a migration
Otherwise it won't be set up correctly and won't work after
miigration.
Signed-off-by: Corey Minyard <cminyard@mvista.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: qemu-stable@nongnu.org Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 2b4e573c7c7b9a698ba6931ba456bbd8d3d8c84c) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Zheng Xiang [Mon, 3 Dec 2018 07:05:17 +0000 (15:05 +0800)]
pcie: set link state inactive/active after hot unplug/plug
When VM boots from the latest version of linux kernel, after
hot-unpluging virtio-blk disks which are hotplugged into
pcie-root-port, the VM's dmesg log shows:
[ 151.046242] pciehp 0000:00:05.0:pcie004: pending interrupts 0x0001 from Slot Status
[ 151.046365] pciehp 0000:00:05.0:pcie004: Slot(0-3): Attention button pressed
[ 151.046369] pciehp 0000:00:05.0:pcie004: Slot(0-3): Powering off due to button press
[ 151.046420] pciehp 0000:00:05.0:pcie004: pending interrupts 0x0010 from Slot Status
[ 151.046425] pciehp 0000:00:05.0:pcie004: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200
[ 151.046464] pciehp 0000:00:05.0:pcie004: pending interrupts 0x0010 from Slot Status
[ 151.046468] pciehp 0000:00:05.0:pcie004: pciehp_set_attention_status: SLOTCTRL a8 write cmd c0
[ 156.163421] pciehp 0000:00:05.0:pcie004: pciehp_get_power_status: SLOTCTRL a8 value read 2f1
[ 156.163427] pciehp 0000:00:05.0:pcie004: pciehp_unconfigure_device: domain:bus:dev = 0000:06:00
[ 156.198736] pciehp 0000:00:05.0:pcie004: pending interrupts 0x0010 from Slot Status
[ 156.198772] pciehp 0000:00:05.0:pcie004: pciehp_power_off_slot: SLOTCTRL a8 write cmd 400
[ 157.224124] pciehp 0000:00:05.0:pcie004: pending interrupts 0x0018 from Slot Status
[ 157.224194] pciehp 0000:00:05.0:pcie004: pciehp_green_led_off: SLOTCTRL a8 write cmd 300
[ 157.224220] pciehp 0000:00:05.0:pcie004: pciehp_check_link_active: lnk_status = 2011
[ 157.224223] pciehp 0000:00:05.0:pcie004: Slot(0-3): Link Up
[ 157.224233] pciehp 0000:00:05.0:pcie004: pciehp_get_power_status: SLOTCTRL a8 value read 7f1
[ 157.224281] pciehp 0000:00:05.0:pcie004: pending interrupts 0x0010 from Slot Status
[ 157.224285] pciehp 0000:00:05.0:pcie004: pciehp_power_on_slot: SLOTCTRL a8 write cmd 0
[ 157.224300] pciehp 0000:00:05.0:pcie004: __pciehp_link_set: lnk_ctrl = 0
[ 157.224336] pciehp 0000:00:05.0:pcie004: pending interrupts 0x0010 from Slot Status
[ 157.224339] pciehp 0000:00:05.0:pcie004: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200
[ 159.739294] pci 0000:06:00.0 id reading try 50 times with interval 20 ms to get ffffffff
[ 159.739315] pciehp 0000:00:05.0:pcie004: pciehp_check_link_status: lnk_status = 2011
[ 159.739318] pciehp 0000:00:05.0:pcie004: Failed to check link status
[ 159.739371] pciehp 0000:00:05.0:pcie004: pending interrupts 0x0010 from Slot Status
[ 159.739394] pciehp 0000:00:05.0:pcie004: pciehp_power_off_slot: SLOTCTRL a8 write cmd 400
[ 160.771426] pciehp 0000:00:05.0:pcie004: pending interrupts 0x0010 from Slot Status
[ 160.771452] pciehp 0000:00:05.0:pcie004: pciehp_green_led_off: SLOTCTRL a8 write cmd 300
[ 160.771495] pciehp 0000:00:05.0:pcie004: pending interrupts 0x0010 from Slot Status
[ 160.771499] pciehp 0000:00:05.0:pcie004: pciehp_set_attention_status: SLOTCTRL a8 write cmd 40
[ 160.771535] pciehp 0000:00:05.0:pcie004: pending interrupts 0x0010 from Slot Status
[ 160.771539] pciehp 0000:00:05.0:pcie004: pciehp_green_led_off: SLOTCTRL a8 write cmd 300
After analyzing the log information, it seems that qemu doesn't
change the Link Status from active to inactive after hot-unplug.
This results in the abnormal log after the linux kernel commit d331710ea78fea merged.
Furthermore, If I hotplug the same virtio-blk disk after hot-unplug,
the virtio-blk would turn on and then back off.
So this patch set the Link Status inactive after hot-unplug and
active after hot-plug.
Signed-off-by: Zheng Xiang <zhengxiang9@huawei.com> Signed-off-by: Zheng Xiang <xiang.zheng@linaro.org> Cc: Wang Haibin <wanghaibin.wang@huawei.com> Cc: qemu-stable@nongnu.org Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 2f2b18f60bf17453b4c01197a9316615a3c1f1de) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Paul A. Clarke [Fri, 7 Dec 2018 17:13:14 +0000 (15:13 -0200)]
Changes requirement for "vsubsbs" instruction
Changes requirement for "vsubsbs" instruction, which has been supported
since ISA 2.03. (Please see section 5.9.1.2 of ISA 2.03)
Reported-by: Paul A. Clarke <pc@us.ibm.com> Signed-off-by: Paul A. Clarke <pc@us.ibm.com> Signed-off-by: Leonardo Bras <leonardo@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit fcfbc18d00b10335310c9665edd6e04f2d152be8) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
"-machine pc" will not work all architectures. Lets fall back to the
default machine by not specifying it.
In addition we also need to specify -no-shutdown on s390 as qemu will
exit otherwise.
Cc: qemu-stable@nongnu.org Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 2c26e648e4350079b0c86a6627b2d3566c3709c0) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
BALATON Zoltan [Wed, 28 Nov 2018 19:27:06 +0000 (20:27 +0100)]
i2c: Move typedef of bitbang_i2c_interface to i2c.h
Clang 3.4 considers duplicate typedef in ppc4xx_i2c.h and
bitbang_i2c.h an error even if they are identical. Move it to a common
place to allow building with this clang version.
Reported-by: Thomas Huth <thuth@redhat.com> Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu> Acked-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 2b4c1125ac3db2734222ff43c25388a16aca4a99) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
i2c: pm_smbus: check smb_index before block transfer write
While performing block transfer write in smb_ioport_writeb(),
'smb_index' is incremented and used to index smb_data[] array.
Check 'smb_index' value to avoid OOB access.
Note that this bug is exploitable by a guest to escape
from the virtual machine. However the commit which
introduced the bug was only made after the 3.0 release,
and so it is not present in any released QEMU versions.
Fixes: 38ad4fae43 i2c: pm_smbus: Add block transfer capability Reported-by: Michael Hanselmann <public@hansmi.ch> Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Li Qiang <liq3ea@gmail.com> Reviewed-by: Michael Hanselmann <public@hansmi.ch> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-id: 20181206121830.6177-1-ppandit@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Jason Wang [Tue, 4 Dec 2018 03:53:47 +0000 (11:53 +0800)]
virtio-net-test: add large tx buffer test
This test tries to build a packet whose size is greater than INT_MAX
which tries to trigger integer overflow in qemu_net_queue_append_iov()
which may result OOB.
Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-id: 20181204035347.6148-6-jasowang@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Jason Wang [Tue, 4 Dec 2018 03:53:46 +0000 (11:53 +0800)]
virtio-net-test: remove unused macro
Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-id: 20181204035347.6148-5-jasowang@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Jason Wang [Tue, 4 Dec 2018 03:53:45 +0000 (11:53 +0800)]
virtio-net-test: accept variable length argument in pci_test_start()
This allows flexibility to be reused for all kinds of command line
used by other tests.
Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-id: 20181204035347.6148-4-jasowang@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Jason Wang [Tue, 4 Dec 2018 03:53:44 +0000 (11:53 +0800)]
net: hub: suppress warnings of no host network for qtest
If we want to qtest through hub, it would be much more simpler and
safer to configure the hub without host network. So silent this
warnings for qtest.
Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-id: 20181204035347.6148-3-jasowang@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Jason Wang [Tue, 4 Dec 2018 03:53:43 +0000 (11:53 +0800)]
net: drop too large packet early
We try to detect and drop too large packet (>INT_MAX) in 1592a9947036
("net: ignore packet size greater than INT_MAX") during packet
delivering. Unfortunately, this is not sufficient as we may hit
another integer overflow when trying to queue such large packet in
qemu_net_queue_append_iov():
- size of the allocation may overflow on 32bit
- packet->size is integer which may overflow even on 64bit
Fixing this by moving the check to qemu_sendv_packet_async() which is
the entrance of all networking codes and reduce the limit to
NET_BUFSIZE to be more conservative. This works since:
- For the callers that call qemu_sendv_packet_async() directly, they
only care about if zero is returned to determine whether to prevent
the source from producing more packets. A callback will be triggered
if peer can accept more then source could be enabled. This is
usually used by high speed networking implementation like virtio-net
or netmap.
- For the callers that call qemu_sendv_packet() that calls
qemu_sendv_packet_async() indirectly, they often ignore the return
value. In this case qemu will just the drop packets if peer can't
receive.
Qemu will copy the packet if it was queued. So it was safe for both
kinds of the callers to assume the packet was sent.
Since we move the check from qemu_deliver_packet_iov() to
qemu_sendv_packet_async(), it would be safer to make
qemu_deliver_packet_iov() static to prevent any external user in the
future.
This is a revised patch of CVE-2018-17963.
Cc: qemu-stable@nongnu.org Cc: Li Qiang <liq3ea@163.com> Fixes: 1592a9947036 ("net: ignore packet size greater than INT_MAX") Reported-by: Li Qiang <liq3ea@gmail.com> Reviewed-by: Li Qiang <liq3ea@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-id: 20181204035347.6148-2-jasowang@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>