Laine Stump [Tue, 25 Apr 2017 16:26:43 +0000 (12:26 -0400)]
network: better log message when network is inactive during reconnect
If the network isn't active during networkNotifyActualDevice(), we
would log an error message stating that the bridge device didn't
exist. This patch adds a check to see if the network is active, making
the logs more useful in the case that it isn't.
Laine Stump [Tue, 25 Apr 2017 16:20:30 +0000 (12:20 -0400)]
qemu: don't kill qemu process on restart if networkNotify fails
Nothing that could happen during networkNotifyActualDevice() could
justify unceremoniously killing the qemu process, but that's what we
were doing.
In particular, new code added in commit 85bcc022 (first appearred in
libvirt-3.2.0) attempts to reattach tap devices to their assigned
bridge devices when libvirtd restarts (to make it easier to recover
from a restart of a libvirt network). But if the network has been
stopped and *not* restarted, the bridge device won't exist and
networkNotifyActualDevice() will fail.
This patch changes networkNotifyActualDevice() and
qemuProcessNotifyNets() to return void, so that qemuProcessReconnect()
will soldier on regardless of what happens (any errors will still be
logged though).
Peter Krempa [Wed, 26 Apr 2017 07:57:39 +0000 (09:57 +0200)]
qemu: process: Don't leak priv->usbaddrs after VM restart
Since the private data structure is not freed upon stopping a VM, the
usbaddrs pointer would be leaked:
==15388== 136 (16 direct, 120 indirect) bytes in 1 blocks are definitely lost in loss record 893 of 1,019
==15388== at 0x4C2CF55: calloc (vg_replace_malloc.c:711)
==15388== by 0x54BF64A: virAlloc (viralloc.c:144)
==15388== by 0x5547588: virDomainUSBAddressSetCreate (domain_addr.c:1608)
==15388== by 0x144D38A2: qemuDomainAssignUSBAddresses (qemu_domain_address.c:2458)
==15388== by 0x144D38A2: qemuDomainAssignAddresses (qemu_domain_address.c:2515)
==15388== by 0x144ED1E3: qemuProcessPrepareDomain (qemu_process.c:5398)
==15388== by 0x144F51FF: qemuProcessStart (qemu_process.c:5979)
[...]
Peter Krempa [Tue, 25 Apr 2017 13:17:34 +0000 (15:17 +0200)]
qemu: process: Clean automatic NUMA/cpu pinning information on shutdown
Clean the stale data after shutting down the VM. Otherwise the data
would be leaked on next VM start. This happens due to the fact that the
private data object is not freed on destroy of the VM.
Eric Farman [Wed, 26 Apr 2017 21:10:01 +0000 (17:10 -0400)]
qemu: Remove extra messages for vhost-scsi hotplug
As with virtio-scsi, the "internal error" messages after
preparing a vhost-scsi hostdev overwrites more meaningful
error messages deeper in the callchain. Remove it too.
Eric Farman [Wed, 26 Apr 2017 21:10:00 +0000 (17:10 -0400)]
qemu: Remove extra messages from virtio-scsi hotplug
I tried to attach a SCSI LUN to two different guests, and forgot
to specify "shareable" in the hostdev XML. Attaching the device
to the second guest failed, but the message was not helpful in
telling me what I was doing wrong:
$ virsh attach-device dasd_fedora_0e1e scsi_scratch_disk.xml
error: Failed to attach device from scsi_scratch_disk.xml
error: internal error: Unable to prepare scsi hostdev: scsi_host3:0:15:1074151456
I eventually discovered my error, but thought it was weird that
Libvirt doesn't provide something more helpful in this case.
Looking over the code we had just gone through, I commented out
the "internal error" message, and got something more useful:
$ virsh attach-device dasd_fedora_0e1e scsi_scratch_disk.xml
error: Failed to attach device from scsi_scratch_disk.xml
error: Requested operation is not valid: SCSI device 3:0:15:1074151456 is already in use by other domain(s) as 'non-shareable'
Looking over the error paths here, we seem to issue better
messages deeper in the callchain so these "internal error"
messages overwrite any of them. Remove them, so that the
more detailed errors are seen.
Peter Krempa [Wed, 26 Apr 2017 07:01:30 +0000 (09:01 +0200)]
qemu: numa: Don't return automatic nodeset for inactive domain
qemuDomainGetNumaParameters would return the automatic nodeset even for
the persistent config if the domain was running. This is incorrect since
the automatic nodeset will be re-queried upon starting the vm.
Migration with old QEMU which does not support query-migrate-parameters
would fail because the QMP command is called unconditionally since the
introduction of TLS migration. Previously it was only called if the user
explicitly requested a feature which uses QEMU migration parameters. And
even then the situation was not ideal, instead of reporting an
unsupported feature we'd just complain about missing QMP command.
Trivially no migration parameters are supported when
query-migrate-parameters QMP command is missing. There's no need to
report an error if it is missing, the callers will report better error
if needed.
Yi Wang [Tue, 18 Apr 2017 01:55:29 +0000 (09:55 +0800)]
rpc: fix keep alive timer segfault
ka maybe have been freeed in virObjectUnref, application using
virKeepAliveTimer will segfault when unlock ka. We should keep
ka's refs positive before using it.
#0 0x00007fd8f79970e8 in virClassIsDerivedFrom (klass=0xdeadbeef, parent=0x7fd8e8001b80) at util/virobject.c:169
#1 0x00007fd8f799742e in virObjectIsClass (anyobj=anyobj entry=0x7fd8e800b9c0, klass=<optimized out>) at util/virobject.c:365
#2 0x00007fd8f79974e4 in virObjectUnlock (anyobj=0x7fd8e800b9c0) at util/virobject.c:338
#3 0x00007fd8f7ac477e in virKeepAliveTimer (timer=<optimized out>, opaque=0x7fd8e800b9c0) at rpc/virkeepalive.c:177
#4 0x00007fd8f7e5c9cf in libvirt_virEventInvokeTimeoutCallback () from /usr/lib64/python2.7/site-packages/libvirtmod.so
#5 0x00007fd8ff64db94 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#6 0x00007fd8ff64f1ad in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#7 0x00007fd8ff64d85f in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#8 0x00007fd8ff64d950 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#9 0x00007fd8ff64d950 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#10 0x00007fd8ff64f1ad in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#11 0x00007fd8ff5dc098 in function_call () from /lib64/libpython2.7.so.1.0
#12 0x00007fd8ff5b7073 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#13 0x00007fd8ff5c6085 in instancemethod_call () from /lib64/libpython2.7.so.1.0
#14 0x00007fd8ff5b7073 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#15 0x00007fd8ff648ff7 in PyEval_CallObjectWithKeywords () from /lib64/libpython2.7.so.1.0
#16 0x00007fd8ff67d7e2 in t_bootstrap () from /lib64/libpython2.7.so.1.0
#17 0x00007fd8ff358df3 in start_thread () from /lib64/libpthread.so.0
#18 0x00007fd8fe97d3ed in clone () from /lib64/libc.so.6
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
(cherry picked from commit ab5bb6f3465aa8608cdf8e7b7bbeda8500bada17)
Laine Stump [Mon, 17 Apr 2017 14:50:48 +0000 (10:50 -0400)]
util: allow ignoring SIOCSIFHWADDR when errno is EPERM
Commit f4ef3a71 made a variation of virNetDevSetMAC that would return
without logging an error message if errno was set to
EADDRNOTAVAIL. This errno is set by some SRIOV VF drivers (in
particular igbvf) when they fail to set the device's MAC address due
to the PF driver refusing the request. This is useful if we want to
try a different method of setting the VF MAC address before giving up
(Commit 86556e16 actually does this, setting the desired MAC address
to the "admin MAC in the PF, then detaching and reattaching the VF
netdev driver to force a reinit of the MAC address).
During testing of Bug 1442040 t was discovered that the ixgbe driver
returns EPERM in this situation, so this patch changes the exception
case for silent+non-terminal failure to account for this difference.
Completes resolution to: https://bugzilla.redhat.com/1415609 (RHEL 7.4)
https://bugzilla.redhat.com/1442040 (RHEL 7.3.z)
Pavel Hrdina [Fri, 21 Apr 2017 08:50:12 +0000 (10:50 +0200)]
util: check ifa_addr pointer before accessing its elements
Reported by Rafał Wojciechowski <it@rafalwojciechowski.pl>.
Thread 1 (Thread 0x7f194b99d700 (LWP 5631)):
0 virNetDevGetifaddrsAddress (addr=0x7f194b99c7c0, ifname=0x7f193400e2b0 "ovirtmgmt") at util/virnetdevip.c:738
1 virNetDevIPAddrGet (ifname=0x7f193400e2b0 "ovirtmgmt", addr=addr@entry=0x7f194b99c7c0) at util/virnetdevip.c:795
2 0x00007f19467800d6 in networkGetNetworkAddress (netname=<optimized out>, netaddr=netaddr@entry=0x7f1924013f18) at network/bridge_driver.c:4780
3 0x00007f193e43a33c in qemuProcessGraphicsSetupNetworkAddress (listenAddr=0x7f19340f7650 "127.0.0.1", glisten=0x7f1924013f10) at qemu/qemu_process.c:4062
4 qemuProcessGraphicsSetupListen (vm=<optimized out>, graphics=0x7f1924014f10, cfg=0x7f1934119f00) at qemu/qemu_process.c:4133
5 qemuProcessSetupGraphics (flags=17, vm=0x7f19240155d0, driver=0x7f193411f1d0) at qemu/qemu_process.c:4196
6 qemuProcessPrepareDomain (conn=conn@entry=0x7f192c00ab50, driver=driver@entry=0x7f193411f1d0, vm=vm@entry=0x7f19240155d0, flags=flags@entry=17) at qemu/qemu_process.c:4969
7 0x00007f193e4417c0 in qemuProcessStart (conn=conn@entry=0x7f192c00ab50, driver=driver@entry=0x7f193411f1d0, vm=0x7f19240155d0,asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START, migrateFrom=migrateFrom@entry=0x0, migrateFd=migrateFd@entry=-1, migratePath=migratePath@entry=0x0,snapshot=snapshot@entry=0x0, vmop=vmop@entry=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=17, flags@entry=1) at qemu/qemu_process.c:5553
Man page for getifaddrs also states that the "ifa_addr" may contain
a null pointer which happens if there is an existing network interface
on the host without IP address.
Jim Fehlig [Wed, 19 Apr 2017 18:54:33 +0000 (12:54 -0600)]
Increase default task limit for libvirtd
libvirtd can spawn threads/tasks when creating new domains for
some hypervisors such as Xen's libxl driver, quickly reaching
the cgroups pids controller default TasksMax setting of 512. When
the limit is reached, attempting to create additional domains
results in an error from the cgroups pids controller, e.g.
kernel: [71282.213347] cgroup: fork rejected by pids controller in
/system.slice/libvirtd.service
Depending on domain type and configuration, anywhere from 4-7
threads/tasks may be created by libxl when starting a domain.
In order to support 4096 domains, similar to commit 27cd763500,
increase the TasksMax setting in libvirtd.service to
4096 * 8 = 32768 tasks.
Fix error reporting when poll returns POLLHUP/POLLERR
In the RPC client event loop code, if poll() returns only a POLLHUP
or POLLERR status, then we end up reporting a bogus error message:
error: failed to connect to the hypervisor
error: An error occurred, but the cause is unknown
We do actually report an error, but we virNetClientMarkClose method
has already captured the error status before we report it, so the
real error gets thrown away. The key fix is to report the error
before calling virNetClientMarkClose(). In changing this, we also
split out reporting of POLLHUP vs POLLERR to make any future bugs
easier to diagnose.
spec: Avoid RPM verification errors on nwfilter XMLs
/etc/libvirt/nwfilter/*.xml files are installed with no UUID, which
means libvirtd will automatically alter all of them once it starts. Thus
RPM verification will always fail on them. Let's use a trick similar to
the default network XML and store nwfilter XMLs in /usr/share. They will
be copied into /etc in %post. Additionally the /etc files are marked as
%ghost so that they are uninstalled if the RPM package is removed.
Note that the %post script overwrites existing files with new ones on
upgrade, which is what has always been happening.
Jim Fehlig [Thu, 13 Apr 2017 16:39:52 +0000 (10:39 -0600)]
xenconfig: avoid double free on OOM testing
Fix xlconfig channel tests when OOM testing is enabled.
TEST: xlconfigtest
32) Xen XL-2-XML Format channel-unix ... OK
Test OOM for nalloc=55 ................................................*** Error in `/home/jfehlig/virt/upstream/libvirt/tests/.libs/xlconfigtest': double free or corruption (fasttop): 0x0000000000679550 ***
...
(gdb) bt
#0 0x00007ffff36875af in raise () from /lib64/libc.so.6
#1 0x00007ffff36889aa in abort () from /lib64/libc.so.6
#2 0x00007ffff36c5150 in __libc_message () from /lib64/libc.so.6
#3 0x00007ffff36cb4f6 in malloc_printerr () from /lib64/libc.so.6
#4 0x00007ffff36cbcee in _int_free () from /lib64/libc.so.6
#5 0x00007ffff782babf in virFree (ptrptr=0x7fffffffdca8) at util/viralloc.c:582
#6 0x000000000042f2f3 in xenParseXLChannel (conf=0x677350, def=0x6815b0) at xenconfig/xen_xl.c:788
#7 0x000000000042f44e in xenParseXL (conf=0x677350, caps=0x6832b0, xmlopt=0x67f6e0) at xenconfig/xen_xl.c:828
#8 0x00000000004105a3 in testCompareFormatXML (
xlcfg=0x6811e0 "/home/jfehlig/virt/upstream/libvirt/tests/xlconfigdata/test-channel-unix.cfg",
xml=0x681110 "/home/jfehlig/virt/upstream/libvirt/tests/xlconfigdata/test-channel-unix.xml", replaceVars=false)
at xlconfigtest.c:152
When a channel is successfully parsed and its path and name fields
assigned from local variables, set the local variables to NULL to
prevent a double free on error.
Wim ten Have [Mon, 27 Mar 2017 20:20:19 +0000 (22:20 +0200)]
xenFormatXLDomainDisks: avoid double free on OOM testing
Fix xlconfigtest runs build for --enable-test-oom on
Xen XL-2-XML Parse new-disk
#0 0x00007ffff3bd791f in raise () from /lib64/libc.so.6
#1 0x00007ffff3bd951a in abort () from /lib64/libc.so.6
#2 0x00007ffff3c1b200 in __libc_message () from /lib64/libc.so.6
#3 0x00007ffff3c2488a in _int_free () from /lib64/libc.so.6
#4 0x00007ffff3c282bc in free () from /lib64/libc.so.6
#5 0x00007ffff7864fcb in virFree (ptrptr=ptrptr@entry=0x7fffffffd868) at util/viralloc.c:582
#6 0x00007ffff78776e5 in virConfFreeValue (val=<optimized out>) at util/virconf.c:178
==> #7 0x0000000000425759 in xenFormatXLDomainDisks (def=0x7fffffffd8c0, def=0x7fffffffd8c0, conf=0x658220)
at xenconfig/xen_xl.c:1159
#8 xenFormatXL (def=def@entry=0x66ec20, conn=conn@entry=0x668cf0) at xenconfig/xen_xl.c:1558
#9 0x000000000040ea1d in testCompareParseXML (replaceVars=<optimized out>,
xml=0x65f5e0 "/home/wtenhave/WORK/libvirt/history/libvirt/tests/xlconfigdata/test-fullvirt-ovmf.xml",
xlcfg=0x65f6b0 "/home/wtenhave/WORK/libvirt/history/libvirt/tests/xlconfigdata/test-fullvirt-ovmf.cfg")
at xlconfigtest.c:105
#10 testCompareHelper (data=<optimized out>) at xlconfigtest.c:205
#11 0x000000000041079a in virTestRun (title=title@entry=0x431cf0 "Xen XL-2-XML Parse fullvirt-ovmf",
body=body@entry=0x40e720 <testCompareHelper>, data=data@entry=0x7fffffffda50) at testutils.c:247
#12 0x000000000040ebc2 in mymain () at xlconfigtest.c:256
#13 0x0000000000411070 in virTestMain (argc=1, argv=0x7fffffffdc08, func=0x40f2c0 <mymain>) at testutils.c:992
#14 0x00007ffff3bc2401 in __libc_start_main () from /lib64/libc.so.6
#15 0x000000000040e5da in _start ()
symmetry seems missing its sibbling coded functionality
demonstrated under functions;
xenFormatXLUSBController()
xenFormatXLUSB()
xenFormatXLDomainChannels()
xenFormatXMDisks
Wim ten Have [Mon, 27 Mar 2017 20:20:43 +0000 (22:20 +0200)]
virConfSaveValue: protect against a NULL pointer reference
Fix xlconfigtest runs build for --enable-test-oom on
Xen XL-2-XML Parse channel-pty
Program received signal SIGSEGV, Segmentation fault.
#0 0x00007ffff3c2b373 in __strchr_sse2 () from /lib64/libc.so.6
==> #1 0x00007ffff7875701 in virConfSaveValue (buf=buf@entry=0x7fffffffd8a0, val=val@entry=0x674750) at util/virconf.c:290
#2 0x00007ffff7875668 in virConfSaveValue (buf=buf@entry=0x7fffffffd8a0, val=<optimized out>) at util/virconf.c:306
#3 0x00007ffff78757ef in virConfSaveEntry (buf=buf@entry=0x7fffffffd8a0, cur=cur@entry=0x674780) at util/virconf.c:338
#4 0x00007ffff78783eb in virConfWriteMem (memory=0x665570 "", len=len@entry=0x7fffffffd910, conf=conf@entry=0x65b940)
at util/virconf.c:1543
#5 0x000000000040eccb in testCompareParseXML (replaceVars=<optimized out>, xml=<optimized out>,
xlcfg=0x662c00 "/home/wtenhave/WORK/libvirt/OOMtesting/libvirt-devel/tests/xlconfigdata/test-channel-pty.cfg")
at xlconfigtest.c:108
#6 testCompareHelper (data=<optimized out>) at xlconfigtest.c:205
#7 0x0000000000410b3a in virTestRun (title=title@entry=0x432cc0 "Xen XL-2-XML Parse channel-pty",
body=body@entry=0x40e9b0 <testCompareHelper>, data=data@entry=0x7fffffffd9f0) at testutils.c:247
#8 0x000000000040f322 in mymain () at xlconfigtest.c:278
#9 0x0000000000411410 in virTestMain (argc=1, argv=0x7fffffffdba8, func=0x40f660 <mymain>) at testutils.c:992
#10 0x00007ffff3bc0401 in __libc_start_main () from /lib64/libc.so.6
#11 0x000000000040e86a in _start ()
(gdb) frame 1
#1 0x00007ffff7875701 in virConfSaveValue (buf=buf@entry=0x7fffffffd8a0, val=val@entry=0x674750) at util/virconf.c:290
290 if (strchr(val->str, '\n') != NULL) {
(gdb) print *val
$1 = {type = VIR_CONF_STRING, next = 0x0, l = 0, str = 0x0, list = 0x0}
If the parent is not a scsi_host, then we can just happily return since
we won't be removing a vport.
Fixes a bug with the following output:
$ virsh pool-destroy host4_hba_pool
error: Failed to destroy pool host4_hba_pool
error: internal error: Invalid adapter name 'pci_0000_10_00_1' for SCSI pool
Peter Krempa [Fri, 7 Apr 2017 14:56:49 +0000 (16:56 +0200)]
qemu: snapshot: Skip empty drives with internal snapshots
The code that validates whether an internal snapshot is possible would
reject an empty but not-readonly drive. Since floppies can have this
property, add a check for emptiness.
Peter Krempa [Wed, 12 Apr 2017 12:54:04 +0000 (14:54 +0200)]
qemu: conf: Don't leak snapshot image format conf variable
==20406== 4 bytes in 1 blocks are definitely lost in loss record 6 of 1,059
==20406== at 0x4C2AF3F: malloc (vg_replace_malloc.c:299)
==20406== by 0x8F17D39: strdup (in /lib64/libc-2.24.so)
==20406== by 0x552C0E0: virStrdup (virstring.c:784)
==20406== by 0x54D3622: virConfGetValueString (virconf.c:945)
==20406== by 0x144E4692: virQEMUDriverConfigLoadFile (qemu_conf.c:687)
==20406== by 0x1452A744: qemuStateInitialize (qemu_driver.c:664)
==20406== by 0x55DB585: virStateInitialize (libvirt.c:770)
==20406== by 0x124570: daemonRunStateInit (libvirtd.c:881)
==20406== by 0x5532990: virThreadHelper (virthread.c:206)
==20406== by 0x8C82493: start_thread (in /lib64/libpthread-2.24.so)
==20406== by 0x8F7FA1E: clone (in /lib64/libc-2.24.so)
Erik Skultety [Wed, 12 Apr 2017 08:46:35 +0000 (10:46 +0200)]
qemu: Fix mdev checking for VFIO support
Commit a4a39d90 added a check that checks for VFIO support with mediated
devices. The problem is that the hostdev preparing functions behave like
a fallthrough if device of that specific type doesn't exist. However,
the check for VFIO support was independent of the existence of a mdev
device which caused the guest to fail to start with any device to be
directly assigned if VFIO was disabled/unavailable in the kernel.
The proposed change first ensures that it makes sense to check for VFIO
support in the first place, and only then performs the VFIO support check
itself.
Wang King [Wed, 12 Apr 2017 07:36:09 +0000 (15:36 +0800)]
virsh: don't leak @cpumap in virshVcpuPinQuery
==18591== 16 bytes in 1 blocks are definitely lost in loss record 41 of 183
==18591== at 0x4C2B934: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==18591== by 0x54EBB1C: virAllocN (viralloc.c:191)
==18591== by 0x1628CA: _vshMalloc (vsh.c:136)
==18591== by 0x1344C4: virshVcpuPinQuery (virsh-domain.c:6603)
==18591== by 0x1344C4: cmdVcpuPin (virsh-domain.c:6707)
==18591== by 0x1631BF: vshCommandRun (vsh.c:1312)
==18591== by 0x12DBB1: main (virsh.c:961)
Pavel Hrdina [Sun, 9 Apr 2017 10:55:09 +0000 (12:55 +0200)]
rpc: fix resource leak
Commit 252610f7dd1 switched to use hash to store servers.
Function virHashGetItems returns allocated array which needs
to be freed also for successful path, not only if there is
an error.
Pavel Hrdina [Sun, 9 Apr 2017 10:43:45 +0000 (12:43 +0200)]
conf/domain_capabilities: fix resource leak
Commit 14319c81a0 introduced CPU host model in domain capabilities
and the *hostmodel* variable is always filled by virCPUDefCopy()
and needs to be freed.
Marc Hartmayer [Mon, 3 Apr 2017 08:24:35 +0000 (10:24 +0200)]
qemu: Fix two use-after-free situations
There were multiple race conditions that could lead to segmentation
faults. The first precondition for this is qemuProcessLaunch must fail
sometime shortly after starting the new QEMU process. The second
precondition for the segmentation faults is that the new QEMU process
dies - or to be more precise the QEMU monitor has to be closed
irregularly. If both happens during qemuProcessStart (starting a
domain) there are race windows between the thread with the event
loop (T1) and the thread that is starting the domain (T2).
First segmentation fault scenario:
If qemuProcessLaunch fails during qemuProcessStart the code branches
to the 'stop' path where 'qemuMonitorSetDomainLog(priv->mon, NULL,
NULL, NULL)' will set the log function of the monitor to NULL (done in
T2). In the meantime the event loop of T1 will wake up with an EOF
event for the QEMU monitor because the QEMU process has died. The
crash occurs if T1 has checked 'mon->logFunc != NULL' in qemuMonitorIO
just before the logFunc was set to NULL by T2. If this situation
occurs T1 will try to call mon->logFunc which leads to the
segmentation fault.
Solution:
Require the monitor lock for setting the log function.
Backtrace:
0 0x0000000000000000 in ?? ()
1 0x000003ffe9e45316 in qemuMonitorIO (watch=<optimized out>,
fd=<optimized out>, events=<optimized out>, opaque=0x3ffe08aa860) at
../../src/qemu/qemu_monitor.c:727
2 0x000003fffda2e1a4 in virEventPollDispatchHandles (nfds=<optimized
out>, fds=0x2aa000fd980) at ../../src/util/vireventpoll.c:508
3 0x000003fffda2e398 in virEventPollRunOnce () at
../../src/util/vireventpoll.c:657
4 0x000003fffda2ca10 in virEventRunDefaultImpl () at
../../src/util/virevent.c:314
5 0x000003fffdba9366 in virNetDaemonRun (dmn=0x2aa000cc550) at
../../src/rpc/virnetdaemon.c:818
6 0x000002aa00024668 in main (argc=<optimized out>, argv=<optimized
out>) at ../../daemon/libvirtd.c:1541
Second segmentation fault scenario:
If qemuProcessLaunch fails it will unref the log context and with
invoking qemuMonitorSetDomainLog(priv->mon, NULL, NULL, NULL)
qemuDomainLogContextFree() will be invoked. qemuDomainLogContextFree()
invokes virNetClientClose() to close the client and cleans everything
up (including unref of _virLogManager.client) when virNetClientClose()
returns. When T1 is now trying to report 'qemu unexpectedly closed the
monitor' libvirtd will crash because the client has already been
freed.
Solution:
As the critical section in qemuMonitorIO is protected with the monitor
lock we can use the same solution as proposed for the first
segmentation fault.
Backtrace:
0 virClassIsDerivedFrom (klass=0x3100979797979797,
parent=0x2aa000d92f0) at ../../src/util/virobject.c:169
1 0x000003fffda659e6 in virObjectIsClass (anyobj=<optimized out>,
klass=<optimized out>) at ../../src/util/virobject.c:365
2 0x000003fffda65a24 in virObjectLock (anyobj=0x3ffe08c1db0) at
../../src/util/virobject.c:317
3 0x000003fffdba4688 in
virNetClientIOEventLoop (client=client@entry=0x3ffe08c1db0,
thiscall=thiscall@entry=0x2aa000fbfa0) at
../../src/rpc/virnetclient.c:1668
4 0x000003fffdba4b4c in
virNetClientIO (client=client@entry=0x3ffe08c1db0,
thiscall=0x2aa000fbfa0) at ../../src/rpc/virnetclient.c:1944
5 0x000003fffdba4d42 in
virNetClientSendInternal (client=client@entry=0x3ffe08c1db0,
msg=msg@entry=0x2aa000cc710, expectReply=expectReply@entry=true,
nonBlock=nonBlock@entry=false) at ../../src/rpc/virnetclient.c:2116
6 0x000003fffdba6268 in
virNetClientSendWithReply (client=0x3ffe08c1db0, msg=0x2aa000cc710) at
../../src/rpc/virnetclient.c:2144
7 0x000003fffdba6e8e in virNetClientProgramCall (prog=0x3ffe08c1120,
client=<optimized out>, serial=<optimized out>, proc=<optimized out>,
noutfds=<optimized out>, outfds=0x0, ninfds=0x0, infds=0x0,
args_filter=0x3fffdb64440
<xdr_virLogManagerProtocolDomainReadLogFileArgs>, args=0x3ffffffe010,
ret_filter=0x3fffdb644c0
<xdr_virLogManagerProtocolDomainReadLogFileRet>, ret=0x3ffffffe008) at
../../src/rpc/virnetclientprogram.c:329
8 0x000003fffdb64042 in
virLogManagerDomainReadLogFile (mgr=<optimized out>, path=<optimized
out>, inode=<optimized out>, offset=<optimized out>, maxlen=<optimized
out>, flags=0) at ../../src/logging/log_manager.c:272
9 0x000003ffe9e0315c in qemuDomainLogContextRead (ctxt=0x3ffe08c2980,
msg=0x3ffffffe1c0) at ../../src/qemu/qemu_domain.c:4422
10 0x000003ffe9e280a8 in qemuProcessReadLog (logCtxt=<optimized out>,
msg=msg@entry=0x3ffffffe288) at ../../src/qemu/qemu_process.c:1800
11 0x000003ffe9e28206 in qemuProcessReportLogError (logCtxt=<optimized
out>, msgprefix=0x3ffe9ec276a "qemu unexpectedly closed the monitor")
at ../../src/qemu/qemu_process.c:1836
12 0x000003ffe9e28306 in
qemuProcessMonitorReportLogError (mon=mon@entry=0x3ffe085cf10,
msg=<optimized out>, opaque=<optimized out>) at
../../src/qemu/qemu_process.c:1856
13 0x000003ffe9e452b6 in qemuMonitorIO (watch=<optimized out>,
fd=<optimized out>, events=<optimized out>, opaque=0x3ffe085cf10) at
../../src/qemu/qemu_monitor.c:726
14 0x000003fffda2e1a4 in virEventPollDispatchHandles (nfds=<optimized
out>, fds=0x2aa000fd980) at ../../src/util/vireventpoll.c:508
15 0x000003fffda2e398 in virEventPollRunOnce () at
../../src/util/vireventpoll.c:657
16 0x000003fffda2ca10 in virEventRunDefaultImpl () at
../../src/util/virevent.c:314
17 0x000003fffdba9366 in virNetDaemonRun (dmn=0x2aa000cc550) at
../../src/rpc/virnetdaemon.c:818
18 0x000002aa00024668 in main (argc=<optimized out>, argv=<optimized
out>) at ../../daemon/libvirtd.c:1541
Other code parts where the same problem was possible to occur are
fixed as well (qemuMigrationFinish, qemuProcessStart, and
qemuDomainSaveImageStartVM).
Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com> Reported-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
(cherry picked from commit 20e95cb7c8653b02016ab9a7118c6de8c9866ea9)
Add "bsd" to the list of format types to not checked during blkid
processing even though it supposedly knows the format - for some
(now unknown) reason it's returning partition table not found. So
let's just let PARTED handle "bsd" too.
Commit id 'a48c674fb' added a check for format types "dvh" and "pc98"
to use the parted print processing instead of using blkid processing
in order to validate the label on the disk was what is expected for
disk pool startup. However, commit id 'a4cb4a74f' really messed things
up by missing an else condition causing PARTEDFindLabel to always
return DIFFERENT.
qemu: Properly reset TLS in qemuProcessRecoverMigrationIn
There is no async job running when a freshly started libvirtd is trying
to recover from an interrupted incoming migration. While at it, let's
call qemuMigrationResetTLS every time we don't kill the domain. This is
not strictly necessary since TLS is not supported when v2 migration
protocol is used, but doing so makes more sense.
We want to ignore all files except *.pl in build-aux directory, however
the unignore pattern "!/build-aux/*.pl" doesn't have any effect because
a previous "/build-aux/" pattern ignores the directory itself rather
than individual files in it.
If formatting NUMA topology fails, the function returns immediatelly,
but the buffer structure allocated on the stack references lot of
heap-allocated memory and that would get lost in such case.
This function runs an iscsi command and parses its output.
However, due to the nature of things, virISCSIExtractSession()
callback can be called multiple times. In each run it would
allocate new memory and overwrite the variable where we keep
pointer to it and thus leaking old allocations.
Imagine that this function is called twice over the same disk
source. While in the first run all allocated memory is freed, not
all pointers are set to NULL (e.g. def->srcpool). So when called
again, these poitners are freed again resulting in double free.
Peter Krempa [Thu, 30 Mar 2017 11:18:43 +0000 (13:18 +0200)]
storage: gluster: Implement 'checkPool' method so that state is restored
After restart of libvirtd the 'checkPool' method is supposed to validate
that the pool is online. Since libvirt then refreshes the pool contents
anyways just return whether the pool was supposed to be online so that
the code can be reached. This is necessary since if a pool does not
implement the method it's automatically considered as inactive.
Peter Krempa [Thu, 30 Mar 2017 13:08:06 +0000 (15:08 +0200)]
storage: util: Pass pool type to virStorageBackendFindGlusterPoolSources
The native gluster pool source list data differs from the data used for
attaching gluster volumes as netfs pools. Currently the only difference
was the format. Since native pools don't use it and later there will be
more differences add a more deterministic way to switch between the
types instead.
Ján Tomko [Tue, 4 Apr 2017 10:51:47 +0000 (12:51 +0200)]
util: ignore -Wcast-align in virNetlinkDumpCommand
Similar to commit b202c39 ignore the warning that breaks the build
with clang:
util/virnetlink.c:365:52: error: cast from 'char *' to 'struct nlmsghdr *'
increases required alignment from 1 to 4 [-Werror,-Wcast-align]
for (msg = resp; NLMSG_OK(msg, len); msg = NLMSG_NEXT(msg, len)) {
^~~~~~~~~~~~~~~~~~~~
/usr/include/linux/netlink.h:87:7: note: expanded from macro 'NLMSG_NEXT'
(struct nlmsghdr*)(((char*)(nlh)) + NLMSG_ALIGN((nlh)->nlmsg_len)))
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Peter Krempa [Fri, 31 Mar 2017 11:02:14 +0000 (13:02 +0200)]
qemu: hotplug: Iterate over vcpu 0 in individual vcpu hotplug code
Buggy condition meant that vcpu0 would not be iterated in the checks.
Since it's not hotpluggable anyways we would not be able to break the
configuration of a live VM.
The 'capacity' value (e.g. guest logical size) for a LUKS volume is
smaller than the 'physical' value of the file in the file system, so
we need to account for that.
When peeking at the encryption information about the volume add a fetch
of the payload_offset which is described as the offset to the start of
the volume data (in 512 byte sectors) in QEMU's QCryptoBlockLUKSHeader.
Then adjust the ->capacity appropriately when we determine that the
volume target encryption has a payload_offset value.
Cédric Bosdonnat [Tue, 28 Mar 2017 14:00:24 +0000 (16:00 +0200)]
virNetDevIPCheckIPv6ForwardingCallback fixes
Add check for more than one RTA_OIF, even though this is rather
unlikely.
Get rid of the buggy switch / break as this code won't need to
handle more attributes.
Use VIR_WARNINGS_NO_CAST_ALIGN to fix impossible to fix
util/virnetdevip.c:560:17: error: cast increases required alignment of target type [-Werror=cast-align]
Peter Krempa [Thu, 30 Mar 2017 11:47:45 +0000 (13:47 +0200)]
storage: driver: Remove unavailable transient pools after restart
If a transient storage pool is deemed inactive after libvirtd restart it
would not be deleted from the list. Reuse virStoragePoolUpdateInactive
along with a refactor necessary to properly update the state.
Peter Krempa [Thu, 30 Mar 2017 11:45:45 +0000 (13:45 +0200)]
storage: driver: Split out code fixing pool state after deactivation
After a pool is made inactive the definition objects need to be updated
(if a new definition is prepared) and transient pools need to be
completely removed. Split out the code doing these steps into a separate
function for later reuse.
Peter Krempa [Thu, 30 Mar 2017 08:13:36 +0000 (10:13 +0200)]
storage: backend: Use correct stringifier for pool type
When registering a storage poll backend, the code would use
virStorageTypeToString instead of virStoragePoolTypeToString. The
following message would be logged:
virDriverLoadModuleFunc:71 : Lookup function 'virStorageBackendSCSIRegister'
virStorageBackendRegister:174 : Registering storage backend '(null)'
(cherry picked from commit 894133a3bd88fadb950042aec1e9edda0a640f83)
Erik Skultety [Fri, 28 Apr 2017 07:24:31 +0000 (09:24 +0200)]
mdev: Fix daemon crash on domain shutdown after reconnect
The problem resides in virHostdevUpdateActiveMediatedDevices which gets
called during qemuProcessReconnect. The issue here is that
virMediatedDeviceListAdd takes a pointer to the item to be added to the
list to which VIR_APPEND_ELEMENT is used, which also clears the pointer.
However, in this case only the local copy of the pointer got cleared,
leaving the original pointing to valid memory. To sum it up, during
cleanup phase, the original pointer is freed and the daemon crashes
basically any time it would access it.
Backtrace:
0x00007ffff3ccdeba in __strcmp_sse2_unaligned
0x00007ffff72a444a in virMediatedDeviceListFindIndex
0x00007ffff7241446 in virHostdevReAttachMediatedDevices
0x00007fffc60215d9 in qemuHostdevReAttachMediatedDevices
0x00007fffc60216dc in qemuHostdevReAttachDomainDevices
0x00007fffc6046e6f in qemuProcessStop
0x00007fffc6091596 in processMonitorEOFEvent
0x00007fffc6091793 in qemuProcessEventHandler
0x00007ffff7294bf5 in virThreadPoolWorker
0x00007ffff7294184 in virThreadHelper
0x00007ffff3fdc3c4 in start_thread () from /lib64/libpthread.so.0
0x00007ffff3d269cf in clone () from /lib64/libc.so.6
Erik Skultety [Fri, 28 Apr 2017 05:52:52 +0000 (07:52 +0200)]
util: mdev: Use a local variable instead of a direct pointer access
Use a local variable to hold data, rather than accessing the pointer
after calling virMediatedDeviceListAdd (therefore VIR_APPEND_ELEMENT).
Although not causing an issue at the moment, this change is a necessary
prerequisite for tweaking virMediatedDeviceListAdd in a separate patch,
which will take a reference for the source pointer (instead of pointer
value) and will clear it along the way.
qemu: Fix regression when hyperv/vendor_id feature is used
qemuProcessVerifyHypervFeatures is supposed to check whether all
requested hyperv features were actually honored by QEMU/KVM. This is
done by checking the corresponding CPUID bits reported by the virtual
CPU. In other words, it doesn't work for string properties, such as
VIR_DOMAIN_HYPERV_VENDOR_ID (there is no CPUID bit we could check). We
could theoretically check all 96 bits corresponding to the vendor
string, but luckily we don't have to check the feature at all. If QEMU
is too old to support hyperv features, the domain won't even start.
Otherwise, it is always supported.
Without this patch, libvirt refuses to start a domain which contains
reporting internal error: "unknown CPU feature __kvm_hv_vendor_id.
This regression was introduced by commit v3.1.0-186-ge9dbe7011, which
(by fixing the virCPUDataCheckFeature condition in
qemuProcessVerifyHypervFeatures) revealed an old bug in the feature
verification code. It's been there ever since the verification was
implemented by commit v1.3.3-rc1-5-g95bbe4bf5, which effectively did not
check VIR_DOMAIN_HYPERV_VENDOR_ID at all.
Erik Skultety [Fri, 31 Mar 2017 08:05:08 +0000 (10:05 +0200)]
admin: Throw a system error when 'open' fails on user-provided output
There was an unhandled 'open' call which resulted in:
"error: Library function returned error but did not set virError"
Even if this happens during the daemon's start when we still don't have
any set of outputs defined yet, we can safely report an error, since we
automatically fallback to stderr which is fine even for both
running as a daemonized process, since this happens before the daemon
forks into the background, and running as a systemd service, since
systemd re-directs std outputs to journald by default.
Peter Krempa [Fri, 31 Mar 2017 07:48:42 +0000 (09:48 +0200)]
news: Add template for a <release> section
After the release it's necessary to add a new <release> section for the
upcoming release. Add a template so that it does not have to be
compiled over and over again.
In 9e2465834 a check that denies internal snapshots when pflash
based loader is configured for the domain. However, if there's
none and an user tries to do an internal snapshot they will
witness daemon crash as in that case vm->def->os.loader is NULL
and we dereference it unconditionally.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Jiri Denemark [Wed, 29 Mar 2017 09:00:32 +0000 (11:00 +0200)]
qemu: Check non-migratable host CPU features
CPU features which change their value from disabled to enabled between
two calls to query-cpu-model-expansion (the first with no extra
properties set and the second with 'migratable' property set to false)
can be marked as enabled and non-migratable in qemuMonitorCPUModelInfo.
Since the code consuming qemuMonitorCPUModelInfo currently ignores the
migratable flag, this change is effectively changing the CPU model
advertised in domain capabilities to contain all features (even those
which block migration). And this matches what we do for QEMU older than
2.9.0, when we detect all CPUID bits ourselves without asking QEMU.
Jiri Denemark [Wed, 29 Mar 2017 08:58:41 +0000 (10:58 +0200)]
qemu: Check migratable host CPU features
If calling query-cpu-model-expansion on the 'host'/'max' CPU model with
'migratable' property set to false succeeds, we know QEMU is able to
tell us which features would disable migration. Thus we can mark all
enabled features as migratable.
Jiri Denemark [Wed, 29 Mar 2017 08:33:08 +0000 (10:33 +0200)]
qemuMonitorCPUModelInfo: Add support for non-migratable features
QEMU is able to tell us whether a CPU feature would block migration or
not. This patch adds support for storing such features in
qemuMonitorCPUModelInfo.
Peter Krempa [Wed, 29 Mar 2017 14:56:05 +0000 (16:56 +0200)]
qemu: domain: Properly lookup top of chain in qemuDomainGetStorageSourceByDevstr
When idx is 0 virStorageFileChainLookup returns the base (bottom) of the
backing chain rather than the top. This is expected by the callers of
qemuDomainGetStorageSourceByDevstr.
Ján Tomko [Tue, 28 Mar 2017 13:07:50 +0000 (15:07 +0200)]
schema: do not require name for certain pool types
Pool types that have the VIR_STORAGE_POOL_SOURCE_NAME flag set
allow omitting the <name> element and instead fill out the pool name
from the <source><name> element.
Relax the schema to make <name> optional for these pools.
Expressing that at least one of these is required is out of scope
of the schema.
Michal Privoznik [Tue, 28 Mar 2017 13:47:42 +0000 (15:47 +0200)]
qemuDomainGetStats: Copy domain ID too
One of the problems with our virGetDomain function is that it
copies just domain name and domain UUID. Therefore it's very
easy to forget aboud domain ID. This can cause some bugs, like
virConnectGetAllDomainStats not reporting proper domain IDs.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
(assuming myFavouriteDomain has an interface from default
network)
Regardless of how unlikely this scenario looks like, we should
not crash. The problem is, on net-destroy in
networkShutdownNetworkVirtual() the virMacMap module is unrefed,
but the stale pointer is kept around. Thus when the domain
destroy procedure comes in, networkReleaseActualDevice() and
subsequently networkMacMgrDel() is called. This function sees the
stale pointer and starts calling the virMacMap module APIs which
work over freed memory.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Clean up the virsh man page description for --pool-create-as in order
to better describe how the various arguments are used when creating
(or defining) a logical pool.
Ján Tomko [Tue, 28 Mar 2017 07:45:10 +0000 (09:45 +0200)]
Revert "storage: Better describe logical pool creation/definition parameters"
This reverts commit ca4515d2639057020c749470f390fe1f5981e91e
which also included a functional change that broke logical storage pools
not named after their volume groups.
Andrea Bolognani [Wed, 22 Mar 2017 12:44:45 +0000 (13:44 +0100)]
process: Translate "unlimited" correctly
The value we use internally to represent the lack of a memory
locking limit, VIR_DOMAIN_MEMORY_PARAM_UNLIMITED, doesn't
match the value setrlimit() and prlimit() use for the same
purpose, RLIM_INFINITY, so we have to handle the translation
ourselves.
Andrea Bolognani [Tue, 21 Mar 2017 18:52:50 +0000 (19:52 +0100)]
qemu: Remove qemuDomainRequiresMemLock()
Instead of having a separate function, we can simply return
zero from the existing qemuDomainGetMemLockLimitBytes() to
signal the caller that the memory locking limit doesn't need
to be set for the guest.
Having a single function instead of two makes it less likely
that we will use the wrong value, which is exactly what
happened when we started applying the limit that was meant
for VFIO-using guests to <memoryBacking><locked>-using
guests.
Turns out this check is excessively strict: there are ways
other than <memtune><hard_limit> to raise the memory locking
limit for QEMU processes, one prominent example being
tweaking /etc/security/limits.conf.