Ján Tomko [Fri, 26 Jul 2013 10:04:32 +0000 (12:04 +0200)]
Set the number of elements 0 in virNetwork*Clear
Decrementing it when it was already 0 causes an invalid free
in virNetworkDefUpdateDNSHost if virNetworkDNSHostDefParseXML
fails and virNetworkDNSHostDefClear gets called twice.
Ján Tomko [Wed, 17 Jul 2013 08:56:05 +0000 (10:56 +0200)]
cgroup: reuse buffer for getline
Reuse the buffer for getline and track buffer allocation
separately from the string length to prevent unlikely
out-of-bounds memory access.
This fixes the following leak that happened when zero bytes were read:
==404== 120 bytes in 1 blocks are definitely lost in loss record 1,344 of 1,671
==404== at 0x4C2C71B: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==404== by 0x906F862: getdelim (iogetdelim.c:68)
==404== by 0x52A48FB: virCgroupPartitionNeedsEscaping (vircgroup.c:1136)
==404== by 0x52A0FB4: virCgroupPartitionEscape (vircgroup.c:1171)
==404== by 0x52A0EA4: virCgroupNewDomainPartition (vircgroup.c:1450)
(cherry picked from commit cc7329317fee6088055d7b09594c19f1b8fec5e3)
qemu: https://bugzilla.redhat.com/show_bug.cgi?id=981094
The commit 0ad9025ef introduce qemu flag QEMU_CAPS_DEVICE_VIDEO_PRIMARY
for using -device VGA, -device cirrus-vga, -device vmware-svga and
-device qxl-vga. In use, for -device qxl-vga, mouse doesn't display
in guest window like the desciption in above bug.
This patch try to use -device for primary video when qemu >=1.6 which
contains the bug fix patch
Eric Blake [Mon, 29 Jul 2013 18:53:36 +0000 (12:53 -0600)]
examples: fix mingw build vs. printf
Mingw *printf is a moving target; newer mingw now provides a version
of asprintf() that fails to understand %lld:
CC event_test-event-test.o
../../../../examples/domain-events/events-c/event-test.c: In function 'myDomainEventRTCChangeCallback':
../../../../examples/domain-events/events-c/event-test.c:270:18: error: unknown conversion type character 'l' in format [-Werror=format=]
virDomainGetID(dom), offset) < 0)
^
But since our examples already admitted that they were hacking around
a mingw deficiency, it is easier to just use printf() directly, coupled
with <inttypes.h> macros, for a more portable work-around.
* examples/domain-events/events-c/event-test.c
(myDomainEventRTCChangeCallback): Use PRIdMAX instead of asprintf.
On platforms without decent group support, the build failed:
Cannot export virGetGroupList: symbol not defined
./.libs/libvirt_security_manager.a(libvirt_security_manager_la-security_dac.o): In function `virSecurityDACPreFork':
/home/eblake/libvirt-tmp/build/src/../../src/security/security_dac.c:248: undefined reference to `virGetGroupList'
collect2: error: ld returned 1 exit status
* src/util/virutil.c (virGetGroupList): Provide dummy implementation.
Eric Blake [Tue, 2 Jul 2013 12:09:30 +0000 (06:09 -0600)]
build: work around mingw header pollution
On Fedora 18, when cross-compiling to mingw with the mingw*-dbus
packages installed, compilation fails with:
CC libvirt_net_rpc_server_la-virnetserver.lo
In file included from /usr/i686-w64-mingw32/sys-root/mingw/include/dbus-1.0/dbus/dbus-connection.h:32:0,
from /usr/i686-w64-mingw32/sys-root/mingw/include/dbus-1.0/dbus/dbus-bus.h:30,
from /usr/i686-w64-mingw32/sys-root/mingw/include/dbus-1.0/dbus/dbus.h:31,
from ../../src/util/virdbus.h:26,
from ../../src/rpc/virnetserver.c:39:
/usr/i686-w64-mingw32/sys-root/mingw/include/dbus-1.0/dbus/dbus-message.h:74:58: error: expected ';', ',' or ')' before 'struct'
I have reported this as a bug against two packages:
- mingw-headers, for polluting the namespace
https://bugzilla.redhat.com/show_bug.cgi?id=980270
- dbus, for not dealing with the pollution
https://bugzilla.redhat.com/show_bug.cgi?id=980278
At least dbus has agreed that a future version of dbus headers will
do s/interface/iface/, regardless of what happens in mingw. But it
is also easy to workaround in libvirt in the meantime, without having
to wait for either mingw or dbus to upgrade.
* src/util/virdbus.h (includes): Undo mingw's pollution so that
dbus doesn't fail.
Eric Blake [Mon, 1 Jul 2013 22:48:11 +0000 (16:48 -0600)]
build: configure must not affect tarball contents
On mingw, configure sets the name of the lxc symfile to
libvirt_lxc.defs rather than libvirt_lxc.syms. But tarballs
must be arch-independent, regardless of the configure options
used for the tree where we ran 'make dist'. This led to the
following failure in autobuild.sh:
CCLD libvirt-lxc.la
CCLD libvirt-qemu.la
/usr/lib64/gcc/i686-w64-mingw32/4.7.2/../../../../i686-w64-mingw32/bin/ld: cannot find libvirt_lxc.def: No such file or directory
collect2: error: ld returned 1 exit status
make[3]: *** [libvirt-lxc.la] Error 1
make[3]: *** Waiting for unfinished jobs....
We were already doing the right thing with libvirt_qemu.syms.
* src/Makefile.am (EXTRA_DIST): Don't ship a built file which
depends on configure for its final name.
Eric Blake [Mon, 1 Jul 2013 19:21:57 +0000 (13:21 -0600)]
build: avoid build failure without gnutls
Found while trying to cross-compile to mingw:
CC libvirt_driver_remote_la-remote_driver.lo
../../src/remote/remote_driver.c: In function 'doRemoteOpen':
../../src/remote/remote_driver.c:487:23: error: variable 'verify' set but not used [-Werror=unused-but-set-variable]
* src/remote/remote_driver.c (doRemoteOpen): Also ignore 'verify'.
Ján Tomko [Thu, 4 Jul 2013 09:35:59 +0000 (11:35 +0200)]
Fix build with clang
Partially revert cdd703f's revert of c163410, as linking with clang
with --param=ssp-buffer-size=4 still fails with:
"argument unused during compilation".
Eric Blake [Thu, 18 Jul 2013 21:47:41 +0000 (15:47 -0600)]
maint: update to latest gnulib
Upstream gnulib recently patched a bug in bootstrap, for projects
that use a different name than build-aux for a subdirectory. We
don't, but it doesn't hurt to update.
* .gnulib: Update, for bootstrap fix.
* bootstrap: Sync to upstream.
* bootstrap.conf: Match upstream bug fix.
Eric Blake [Thu, 11 Jul 2013 11:07:16 +0000 (05:07 -0600)]
maint: update to latest gnulib
Future patches need LGPLv2+ versions of some modules that had
recent license changes; but separating the gnulib update from
the actual use of the modules makes it easier to backport to
an older version while avoiding a submodule update (assuming,
of course, that the backport is to a system where glibc provides
adequate functionaliy without needing the gnulib module).
* .gnulib: Update to latest, for modules needed in later patches.
Eric Blake [Wed, 3 Jul 2013 20:43:11 +0000 (14:43 -0600)]
build: honor autogen.sh --no-git
Based on a report by Chandrashekar Shastri, at
https://bugzilla.redhat.com/show_bug.cgi?id=979360
On systems where git cannot access the outside world, a developer
can instead arrange to get a copy of gnulib at the right commit
via side channels (such as NFS share drives), set GNULIB_SRCDIR,
then use ./autogen.sh --no-git. In this setup, we will now
avoid direct use of git. Of course, this means no automatic
gnulib updates when libvirt.git updates its submodule, but it
is expected that any developer in such a situation is already
prepared to deal with the fallout.
* .gnulib: Update to latest, for bootstrap.
* bootstrap: Synchronize from gnulib.
* autogen.sh (no_git): Avoid git when requested.
* cfg.mk (_update_required): Skip automatic rerun of bootstrap if
we can't use git.
* docs/compiling.html.in: Document this setup.
* docs/hacking.html.in: Mention this.
* HACKING: Regenerate.
Eric Blake [Tue, 2 Jul 2013 23:26:42 +0000 (17:26 -0600)]
maint: update to latest gnulib
The latest mingw headers on Fedora 19 fail to build with gnulib
without an update.
Meanwhile, now that upstream gnulib has better handling of -W
probing for clang, we can drop some of our own solutions in
favor of upstream; thus this reverts commit c1634100, "Correctly
detect warning flags with clang".
FreeBSD ships an old gcc 4.2.1 which generates
bogus code, e.g. getsockopt() call returns
struct xucred with bogus values, which doesn't even
allow to connect to libvirtd:
error: Failed to find group record for gid '1284660778': No error: 0
So roll back to just -fstack-protector on FreeBSD.
Commit 17cdc298 tried to backport upstream 90a0c6d, but in
resolving conflicts, failed to account that upstream commit e1d32bb refactored code to leave off a leading /dev.
* src/lxc/lxc_container.c (lxcContainerPopulateDevices): Use
correct device name.
Attempts to start a domain with both SELinux and DAC security
modules loaded will deadlock; latent problem introduced in commit fdb3bde and exposed in commit 29fe5d7. Basically, when recursing
into the security manager for other driver's prefork, we have to
undo the asymmetric lock taken at the manager level.
Reported by Jiri Denemark, with diagnosis help from Dan Berrange.
* src/security/security_stack.c (virSecurityStackPreFork): Undo
extra lock grabbed during recursion.
Commit 75c1256 states that virGetGroupList must not be called
between fork and exec, then commit ee777e99 promptly violated
that for lxc's use of virSecurityManagerSetProcessLabel. Hoist
the supplemental group detection to the time that the security
manager needs to fork. Qemu is safe, as it uses
virSecurityManagerSetChildProcessLabel which in turn uses
virCommand to determine supplemental groups.
This does not fix the fact that virSecurityManagerSetProcessLabel
calls virSecurityDACParseIds calls parseIds which eventually
calls getpwnam_r, which also violates fork/exec async-signal-safe
safety rules, but so far no one has complained of hitting
deadlock in that case.
* src/security/security_dac.c (_virSecurityDACData): Track groups
in private data.
(virSecurityDACPreFork): New function, to set them.
(virSecurityDACClose): Clean up new fields.
(virSecurityDACGetIds): Alter signature.
(virSecurityDACSetSecurityHostdevLabelHelper)
(virSecurityDACSetChardevLabel, virSecurityDACSetProcessLabel)
(virSecurityDACSetChildProcessLabel): Update callers.
A future patch wants the DAC security manager to be able to safely
get the supplemental group list for a given uid, but at the time
of a fork rather than during initialization so as to pick up on
live changes to the system's group database. This patch adds the
framework, including the possibility of a pre-fork callback
failing.
For now, any driver that implements a prefork callback must be
robust against the possibility of being part of a security stack
where a later element in the chain fails prefork. This means
that drivers cannot do any action that requires a call to postfork
for proper cleanup (no grabbing a mutex, for example). If this
is too prohibitive in the future, we would have to switch to a
transactioning sequence, where each driver has (up to) 3 callbacks:
PreForkPrepare, PreForkCommit, and PreForkAbort, to either clean
up or commit changes made during prepare.
* src/security/security_driver.h (virSecurityDriverPreFork): New
callback.
* src/security/security_manager.h (virSecurityManagerPreFork):
Change signature.
* src/security/security_manager.c (virSecurityManagerPreFork):
Optionally call into driver, and allow returning failure.
* src/security/security_stack.c (virSecurityDriverStack):
Wrap the handler for the stack driver.
* src/qemu/qemu_process.c (qemuProcessStart): Adjust caller.
POSIX states that multi-threaded apps should not use functions
that are not async-signal-safe between fork and exec, yet we
were using getpwuid_r and initgroups. Although rare, it is
possible to hit deadlock in the child, when it tries to grab
a mutex that was already held by another thread in the parent.
I actually hit this deadlock when testing multiple domains
being started in parallel with a command hook, with the following
backtrace in the child:
Thread 1 (Thread 0x7fd56bbf2700 (LWP 3212)):
#0 __lll_lock_wait ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1 0x00007fd5761e7388 in _L_lock_854 () from /lib64/libpthread.so.0
#2 0x00007fd5761e7257 in __pthread_mutex_lock (mutex=0x7fd56be00360)
at pthread_mutex_lock.c:61
#3 0x00007fd56bbf9fc5 in _nss_files_getpwuid_r (uid=0, result=0x7fd56bbf0c70,
buffer=0x7fd55c2a65f0 "", buflen=1024, errnop=0x7fd56bbf25b8)
at nss_files/files-pwd.c:40
#4 0x00007fd575aeff1d in __getpwuid_r (uid=0, resbuf=0x7fd56bbf0c70,
buffer=0x7fd55c2a65f0 "", buflen=1024, result=0x7fd56bbf0cb0)
at ../nss/getXXbyYY_r.c:253
#5 0x00007fd578aebafc in virSetUIDGID (uid=0, gid=0) at util/virutil.c:1031
#6 0x00007fd578aebf43 in virSetUIDGIDWithCaps (uid=0, gid=0, capBits=0,
clearExistingCaps=true) at util/virutil.c:1388
#7 0x00007fd578a9a20b in virExec (cmd=0x7fd55c231f10) at util/vircommand.c:654
#8 0x00007fd578a9dfa2 in virCommandRunAsync (cmd=0x7fd55c231f10, pid=0x0)
at util/vircommand.c:2247
#9 0x00007fd578a9d74e in virCommandRun (cmd=0x7fd55c231f10, exitstatus=0x0)
at util/vircommand.c:2100
#10 0x00007fd56326fde5 in qemuProcessStart (conn=0x7fd53c000df0,
driver=0x7fd55c0dc4f0, vm=0x7fd54800b100, migrateFrom=0x0, stdin_fd=-1,
stdin_path=0x0, snapshot=0x0, vmop=VIR_NETDEV_VPORT_PROFILE_OP_CREATE,
flags=1) at qemu/qemu_process.c:3694
...
The solution is to split the work of getpwuid_r/initgroups into the
unsafe portions (getgrouplist, called pre-fork) and safe portions
(setgroups, called post-fork).
Conflicts:
src/lxc/lxc_container.c - did not use setUIDGID before 1.1.0
src/util/virutil.c - oom handling changes not backported
src/util/virfile.c - functions still lived in virutil.c this far back
configure.ac - context with previous commit
Since neither getpwuid_r() nor initgroups() are safe to call in
between fork and exec (they obtain a mutex, but if some other
thread in the parent also held the mutex at the time of the fork,
the child will deadlock), we have to split out the functionality
that is unsafe. At least glibc's initgroups() uses getgrouplist
under the hood, so the ideal split is to expose getgrouplist for
use before a fork. Gnulib already gives us a nice wrapper via
mgetgroups; we wrap it once more to look up by uid instead of name.
* bootstrap.conf (gnulib_modules): Add mgetgroups.
* src/util/virutil.h (virGetGroupList): New declaration.
* src/util/virutil.c (virGetGroupList): New function.
* src/libvirt_private.syms (virutil.h): Export it.
Conflicts:
bootstrap.conf - not updating gnulib submodule...
configure.ac - ...so checking for getgrouplist by hand...
src/util/virutil.c - ...and copying only the getgrouplist implementation rather than calling the gnulib function
A future patch needs to look up pw_gid; but it is wasteful
to crawl through getpwuid_r twice for two separate pieces
of information, and annoying to copy that much boilerplate
code for doing the crawl. The current internal-only
virGetUserEnt is also a rather awkward interface; it's easier
to just design it to let callers request multiple pieces of
data as needed from one traversal.
And while at it, I noticed that virGetXDGDirectory could deref
NULL if the getpwuid_r lookup fails.
* src/util/virutil.c (virGetUserEnt): Alter signature.
(virGetUserDirectory, virGetXDGDirectory, virGetUserName): Adjust
callers.
Laine Stump [Mon, 1 Jul 2013 03:52:43 +0000 (23:52 -0400)]
pci: initialize virtual_functions array pointer to avoid segfault
This fixes https://bugzilla.redhat.com/show_bug.cgi?id=971325
The problem was that if virPCIGetVirtualFunctions was given the name
of a non-existent interface, it would return to its caller without
initializing the pointer to the array of virtual functions to NULL,
and the caller (virNetDevGetVirtualFunctions) would try to VIR_FREE()
the invalid pointer.
The final error message before the crash would be:
virPCIGetVirtualFunctions:2088 :
Failed to open dir '/sys/class/net/eth2/device':
No such file or directory
In this patch I move the initialization in virPCIGetVirtualFunctions()
to the begining of the function, and also do an explicit
initialization in virNetDevGetVirtualFunctions, just in case someone
in the future adds code into that function prior to the call to
virPCIGetVirtualFunctions.
When checking for a collision of a new libvirt network's subnet with
any existing routes, we read all of /proc/net/route into memory, then
parse all the entries. The function that we use to read this file
requires a "maximum length" parameter, which had previously been set
to 64*1024. As each line in /proc/net/route is 128 bytes, this would
allow for a maximum of 512 entries in the routing table.
This patch increases that number to 128 * 100000, which allows for
100,000 routing table entries. This means that it's possible that 12MB
would be allocated, but that would only happen if there really were
100,000 route table entries on the system, it's only held for a very
short time.
Since there is no method of specifying and unlimited max (and that
would create a potential denial of service anyway) hopefully this
limit is large enough to accomodate everyone.
Ján Tomko [Tue, 11 Jun 2013 13:03:17 +0000 (15:03 +0200)]
qemu: allow restore with non-migratable XML input
Convert input XML to migratable before using it in
qemuDomainSaveImageOpen.
XML in the save image is migratable, i.e. doesn't contain implicit
controllers. If these controllers were in a non-default order in the
input XML, the ABI check would fail. Removing and re-adding these
controllers fixes it.
As of d7f9d827531bc843b7c5aa9d3e8c08738a1de248 we copy the listen
address from the qemu.conf config file in case none has been provided
via XML. But later, when migrating, we should not include such listen
address in the migratable XML as it is something autogenerated, not
requested by user. Moreover, the binding to the listen address will
likely fail, unless the address is '0.0.0.0' or its IPv6 equivalent.
This patch introduces a new boolean attribute to virDomainGraphicsListenDef
to distinguish autofilled listen addresses. However, we must keep the
attribute over libvirtd restarts, so it must be kept within status XML.
Though both libvirt and QEMU's document say RTC_CHANGE returns
the offset from the host UTC, qemu actually returns the offset
from the specified date instead when specific date is provided
(-rtc base=$date).
It's not safe for qemu to fix it in code, it worked like that
for 3 years, changing it now may break other QEMU use cases.
What qemu tries to do is to fix the document:
And in libvirt side, instead of replying on the value from qemu,
this converts the offset returned from qemu to the offset from
host UTC, by:
/*
* a: the offset from qemu RTC_CHANGE event
* b: The specified date (-rtc base=$date)
* c: the host date when libvirt gets the RTC_CHANGE event
* offset: What libvirt will report
*/
offset = a + (b - c);
The specified date (-rtc base=$date) is recorded in clock's def as
an internal only member (may be useful to exposed outside?).
Internal only XML tag "basetime" is introduced to not lose the
guest's basetime after libvirt restarting/reloading:
During domain destruction it's possible that the learnIPAddressThread has
already removed the interface prior to the teardown filter path being run.
The teardown code would only be telling the thread to terminate.
Peter Krempa [Fri, 31 May 2013 13:38:46 +0000 (15:38 +0200)]
qemu: snapshot: Don't kill access to disk if snapshot creation fails
If snapshot creation failed for example due to invalid use of the
"REUSE_EXTERNAL" flag, libvirt killed access to the original image file
instead of the new image file. On machines with selinux this kills the
whole VM as the selinux context is enforced immediately.
* qemu_driver.c:qemuDomainSnapshotUndoSingleDiskActive():
- Kill access to the new image file instead of the old one.
Function qemuDomainSetBlockIoTune() was checking QEMU capabilities
even when !(flags & VIR_DOMAIN_AFFECT_LIVE) and the domain was
shutoff, resulting in the following problem:
Ensure non-root can read /proc/meminfo file in LXC containers
By default files in a FUSE mount can only be accessed by the
user which created them, even if the file permissions would
otherwise allow it. To allow other users to access the FUSE
mount the 'allow_other' mount option must be used. This bug
prevented non-root users in an LXC container from reading
the /proc/meminfo file.
Many applications use /dev/tty to read from stdin.
e.g. zypper on openSUSE.
Let's create this device node to unbreak those applications.
As /dev/tty is a synonym for the current controlling terminal
it cannot harm the host or any other containers.
Peter Krempa [Mon, 27 May 2013 14:08:30 +0000 (16:08 +0200)]
qemu: Implement new QMP command for cpu hotplug
This patch implements support for the "cpu-add" QMP command that plugs
CPUs into a live guest. The "cpu-add" command was introduced in QEMU
1.5. For the hotplug to work machine type "pc-i440fx-1.5" is required.
Don't mount selinux fs in LXC if selinux is disabled
Before trying to mount the selinux filesystem in a container
use is_selinux_enabled() to check if the machine actually
has selinux support (eg not booted with selinux=0)
Change bbe97ae968eba60b71e0066d49f9fc909966d9d6 caused the
QEMU driver to ignore ENOENT errors from cgroups, in order
to cope with missing /proc/cgroups. This is not good though
because many other things can cause ENOENT and should not
be ignored. The callers expect to see ENXIO when cgroups
are not present, so adjust the code to report that errno
when /proc/cgroups is missing
Jim Fehlig [Fri, 10 May 2013 18:05:00 +0000 (12:05 -0600)]
Fix starting domains when kernel has no cgroups support
Found that I was unable to start existing domains after updating
to a kernel with no cgroups support
# zgrep CGROUP /proc/config.gz
# CONFIG_CGROUPS is not set
# virsh start test
error: Failed to start domain test
error: Unable to initialize /machine cgroup: Cannot allocate memory
virCgroupPartitionNeedsEscaping() correctly returns errno (ENOENT) when
attempting to open /proc/cgroups on such a system, but it was being
dropped in virCgroupSetPartitionSuffix().
Change virCgroupSetPartitionSuffix() to propagate errors returned by
its callees. Also check for ENOENT in qemuInitCgroup() when determining
if cgroups support is available.
(cherry picked from commit bbe97ae968eba60b71e0066d49f9fc909966d9d6)
It is possible to build a kernel without swap cgroup controls
present. This causes a fatal error when querying memory
parameters. Treat missing swap controls as meaning "unlimited".
The fatal error remains if the user tries to actually change
the limit.
Jim Fehlig [Thu, 16 May 2013 14:42:21 +0000 (08:42 -0600)]
libxl: fix build with Xen4.3
Xen 4.3 fixes a mistake in the libxl event handler signature where the
event owned by the application was defined as const. Detect this and
define the libvirt libxl event handler signature appropriately.
(cherry picked from commit 43b0ff5b1eb7fbcdc348b2a53088a7db939d5e48)
Dennis Chen [Fri, 28 Jun 2013 09:59:51 +0000 (11:59 +0200)]
Fix vPort management: FC vHBA creation
When creating a virtual FC HBA with virsh/libvirt API, an error message
will be returned: "error: Node device not found",
also the 'nodedev-dumpxml' shows wrong information of wwpn & wwnn
for the new created device.
Signed-off-by: xschen@tnsoft.com.cn
This reverts f90af69 which switched wwpn & wwwn in the wrong place.
Ján Tomko [Wed, 26 Jun 2013 11:07:24 +0000 (13:07 +0200)]
Fix invalid read in virCgroupGetValueStr
Don't check for '\n' at the end of file if zero bytes were read.
Found by valgrind:
==404== Invalid read of size 1
==404== at 0x529B09F: virCgroupGetValueStr (vircgroup.c:540)
==404== by 0x529AF64: virCgroupMoveTask (vircgroup.c:1079)
==404== by 0x1EB475: qemuSetupCgroupForEmulator (qemu_cgroup.c:1061)
==404== by 0x1D9489: qemuProcessStart (qemu_process.c:3801)
==404== by 0x18557E: qemuDomainObjStart (qemu_driver.c:5787)
==404== by 0x190FA4: qemuDomainCreateWithFlags (qemu_driver.c:5839)
Ján Tomko [Tue, 25 Jun 2013 13:10:27 +0000 (15:10 +0200)]
virsh: edit: don't leak XML string on reedit or redefine
Free the old XML strings before overwriting them if the user
has chosen to reedit the file or force the redefinition.
Found by Alex Jia trying to reproduce another bug:
https://bugzilla.redhat.com/show_bug.cgi?id=977430#c3
(cherry picked from commit 1e3a252974c8e5c650f1d84dc2b167f0ae8cee3c)
As a consequence of the cgroup layout changes from commit 'cfed9ad4', the
lxcDomainGetSchedulerParameters[Flags]()' and lxcGetSchedulerType() APIs
failed to return data for a non running domain. This can be seen through
a 'virsh schedinfo <domain>' command which returns:
Scheduler : Unknown
error: Requested operation is not valid: cgroup CPU controller is not mounted
Prior to that change a non running domain would return:
As a consequence of the cgroup layout changes from commit '632f78ca', the
qemuDomainGetSchedulerParameters[Flags]()' and qemuGetSchedulerType() APIs
failed to return data for a non running domain. This can be seen through
a 'virsh schedinfo <domain>' command which returns:
Scheduler : Unknown
error: Requested operation is not valid: cgroup CPU controller is not mounted
Prior to that change a non running domain would return:
Cole Robinson [Tue, 28 May 2013 19:37:53 +0000 (15:37 -0400)]
virsh: migrate: Don't disallow --p2p and --migrateuri
Because it's a valid combination. p2p still uses a separate channel
for qemu migration, so there's value in letting the user specify a manual
migrate URI for overriding auto-port, or libvirt's FQDN lookup.
What _isn't_ allowed is --migrateuri and TUNNELLED, since there is
no separate migration channel. Disallow that instead
(cherry picked from commit 5e1de4fcdd86aceba610e9b77f6e62fc6525883b)
Cole Robinson [Tue, 28 May 2013 14:23:00 +0000 (10:23 -0400)]
qemu: Don't report error on successful media eject
If we are just ejecting media, ret == -1 even after the retry loop
determines that the tray is open, as requested. This means media
disconnect always report's error.
Fix it, and fix some other mini issues:
- Don't overwrite the 'eject' error message if the retry loop fails
- Move the retries decrement inside the loop, otherwise the final loop
might succeed, yet retries == 0 and we will raise error
- Setting ret = -1 in the disk->src check is unneeded
- Fix comment typos
Michal Privoznik [Mon, 20 May 2013 17:26:14 +0000 (19:26 +0200)]
qemuDomainChangeEjectableMedia: Unlock domain while waiting for event
In 84c59ffa I've tried to fix changing ejectable media process. The
process should go like this:
1) we need to call 'eject' on the monitor
2) we should wait for 'DEVICE_TRAY_MOVED' event
3) now we can issue 'change' command
However, while waiting in step 2) the domain monitor was locked. So
even if qemu reported the desired event, the proper callback was not
called immediately. The monitor handling code needs to lock the
monitor in order to read the event. So that's the first lock we must
not hold while waiting. The second one is the domain lock. When
monitor handling code reads an event, the appropriate callback is
called then. The first thing that each callback does is locking the
corresponding domain as a domain or its device is about to change
state. So we need to unlock both monitor and VM lock. Well, holding
any lock while sleep()-ing is not the best thing to do anyway.
(cherry picked from commit 543af79a14f06cd16844c28887210bbb93a455fa)
Thread 1 must be calling nwfilterDriverRemoveDBusMatches(). It does so with
nwfilterDriverLock held. In the patch below I am now moving the
nwfilterDriverLock(driverState) further up so that the initialization, which
seems to either take a long time or is entirely stuck, occurs with the lock
held and the shutdown cannot occur at the same time.
Remove the lock in virNWFilterDriverIsWatchingFirewallD to avoid
double-locking.
Doug Goldstein [Sun, 9 Jun 2013 22:23:35 +0000 (17:23 -0500)]
Fix use of VIR_STRDUP vs strdup
Commit 894f784948a93760629de3cb195c69ef4f4b831f broke the v1.0.5-maint
branch because VIR_STRDUP() didn't exist in the v1.0.5 release so the
resulting build is missing that symbol.
qemu: Fix crash in migration of graphics-less guests.
Commit 7f15ebc7a2b599ab10dbc15bca6f823591213e67 introduced a bug
happening when guests without a <graphics> element are migrated.
The initialization of listenAddress happens unconditionally
from the cookie even if the cookie->graphics pointer was NULL.
Moved the initialization to where it is safe.
The problem was that qemuUpdateActivePciHostdevs was returning 0
(success) when no hostdevs were present, but would otherwise return -1
(failure) even when it completed successfully. It is only called from
qemuProcessReconnect(), and when qemuProcessReconnect got back an
error, it would not only stop reconnecting, but would terminate the
guest qemu process "to remove danger of it ending up running twice if
user tries to start it again later".
(This bug was introduced in commit 011cf7ad, which was pushed between
v1.0.2 and v1.0.3, so all maintenance branches from v1.0.3 up to 1.0.5
will need this one line patch applied.)
(cherry picked from commit 2ea45647bcde23cff5da48f725561ff5ba3fba39)
Ján Tomko [Fri, 31 May 2013 11:24:06 +0000 (13:24 +0200)]
qemu: escape literal IPv6 address in NBD migration
A literal IPv6 must be escaped, otherwise migration fails with:
unable to execute QEMU command 'drive-mirror': address resolution failed
for f0::0d:5901: Servname not supported for ai_socktype
since QEMU treats everything after the first ':' as the port.
(cherry picked from commit 2136327e23a7c87e7a75e4fd69faa5b2e8771d38)
Ján Tomko [Thu, 23 May 2013 13:51:05 +0000 (15:51 +0200)]
qemu: fix NBD migration to hosts with IPv6 enabled
Since f03dcc5 we use [::] as the listening address both on qemu
command line in -incoming and in nbd-server-start QMP command.
However the latter requires just :: without the braces.
(cherry picked from commit 2326006410a921bba38c0ce67a367cd1ea88cc33)
Eric Blake [Tue, 21 May 2013 02:30:30 +0000 (20:30 -0600)]
cgroup: be robust against cgroup movement races
https://bugzilla.redhat.com/show_bug.cgi?id=965169 documents a
problem starting domains when cgroups are enabled; I was able
to reliably reproduce the race about 5% of the time when I added
hooks to domain startup by 3 seconds (as that seemed to be about
the length of time that qemu created and then closed a temporary
thread, probably related to aio handling of initially opening
a disk image). The problem has existed since we introduced
virCgroupMoveTask in commit 9102829 (v0.10.0).
There are some inherent TOCTTOU races when moving tasks between
kernel cgroups, precisely because threads can be created or
completed in the window between when we read a thread id from the
source and when we write to the destination. As the goal of
virCgroupMoveTask is merely to move ALL tasks into the new
cgroup, it is sufficient to iterate until no more threads are
being created in the old group, and ignoring any threads that
die before we can move them.
It would be nicer to start the threads in the right cgroup to
begin with, but by default, all child threads are created in
the same cgroup as their parent, and we don't want vcpu child
threads in the emulator cgroup, so I don't see any good way
of avoiding the move. It would also be nice if the kernel were
to implement something like rename() as a way to atomically move
a group of threads from one cgroup to another, instead of forcing
a window where we have to read and parse the source, then format
and write back into the destination.
* src/util/vircgroup.c (virCgroupAddTaskStrController): Ignore
ESRCH, because a thread ended between read and write attempts.
(virCgroupMoveTask): Loop until all threads have moved.
Commit 632f78c introduced a regression which causes schedinfo being
unable to set some parameters. When migrating to priv->cgroup there
was missing variable left out and due to passed NULL to underlying
function, the setting failed.
Ján Tomko [Fri, 12 Apr 2013 15:30:56 +0000 (17:30 +0200)]
daemon: fix leak after listing all volumes
CVE-2013-1962
remoteDispatchStoragePoolListAllVolumes wasn't freeing the pool.
The pool also held a reference to the connection, preventing it from
getting freed and closing the netcf interface driver, which held two
sockets open.
(cherry picked from commit ca697e90d5bd6a6dfb94bfb6d4438bdf9a44b739)
In b2878ed860ceceec3cd6481424fed0b543b687cd we added the O_NOCTTY
flag when opening files in the stream code. Unfortunately a later
piece of code was comparing the flags == O_RDONLY, without masking
out the non-access mode flags. This broke the iohelper when used
with streams for read, since it caused us to attach the stream
output pipe to the stream input FD instead of output FD :-(
The first problem was that virFileOpenAs was returning fd (-1) in one
of the error cases rather than ret (-errno), so the caller thought
that the error was EPERM rather than ENOENT.
The second problem was that some log messages in the general purpose
qemuOpenFile() function would always say "Failed to create" even if
the caller hadn't included O_CREAT (i.e. they were trying to open an
existing file).
This fixes virFileOpenAs to jump down to the error return (which
returns ret instead of fd) in the previously mentioned incorrect
failure case of virFileOpenAs(), removes all error logging from
virFileOpenAs() (since the callers report it), and modifies
qemuOpenFile to appropriately use "open" or "create" in its log
messages.
NB: I seriously considered removing logging from all callers of
virFileOpenAs(), but there is at least one case where the caller
doesn't want virFileOpenAs() to log any errors, because it's just
going to try again (qemuOpenFile()). We can't simply make a silent
variation of virFileOpenAs() though, because qemuOpenFile() can't make
the decision about whether or not it wants to retry until after
virFileOpenAs() has already returned an error code.
Likewise, I also considered changing virFileOpenAs() to return -1 with
errno set on return, and may still do that, but only as a separate
patch, as it obscures the intent of this patch too much.
(cherry picked from commit a2c1bedbd8fa977dc733266e88a1b57e28b50dd3)
Laine Stump [Mon, 6 May 2013 19:43:56 +0000 (15:43 -0400)]
qemu: allocate network connections sooner during domain startup
VFIO device assignment requires a cgroup ACL to be setup for access to
the /dev/vfio/nn "group" device for any devices that will be assigned
to a guest. In the case of a host device that is allocated from a
pool, it was being allocated during qemuBuildCommandLine(), which is
called by qemuProcessStart() *after* the all-encompassing
qemuSetupCgroup() was called, meaning that the standard Cgroup ACL
setup wasn't creating ACLs for these devices allocated from pools.
One possible solution was to manually add a single ACL down inside
qemuBuildCommandLine() when networkAllocateActualDevice() is called,
but that has two problems: 1) the function that adds the cgroup ACL
requires a virDomainObjPtr, which isn't available in
qemuBuildCommandLine(), and 2) we really shouldn't be doing network
device setup inside qemuBuildCommandLine() anyway.
Instead, I've created a new function called
qemuNetworkPrepareDevices() which is called just before
qemuPrepareHostDevices() during qemuProcessStart() (explanation of
ordering in the comments), i.e. well before the call to
qemuSetupCgroup(). To minimize code churn in a patch that will be
backported to 1.0.5-maint, qemuNetworkPrepareDevices only does
networkAllocateActualDevice() and the bare amount of setup required
for type='hostdev network devices, but it eventually should do *all*
device setup for guest network devices.
Note that some of the code that was previously needed in
qemuBuildCommandLine() is no longer required when
networkAllocateActualDevice() is called earlier:
* qemuAssignDeviceHostdevAlias() is already done further down in
qemuProcessStart().
* qemuPrepareHostdevPCIDevices() is called by
qemuPrepareHostDevices() which is called after
qemuNetworkPrepareDevices() in qemuProcessStart().
As hinted above, this new function should be moved into a separate
qemu_network.c (or similarly named) file along with
qemuPhysIfaceConnect(), qemuNetworkIfaceConnect(), and
qemuOpenVhostNet(), and expanded to call those functions as well, then
the nnets loop in qemuBuildCommandLine() should be reduced to only
build the commandline string (which itself can be in a separate
qemuInterfaceBuilldCommandLine() function as suggested by
Michal). However, this will require storing away an array of tapfd and
vhostfd that are needed for the commandline, so I would rather do that
in a separate patch and leave this patch at the minimum to fix the
bug.
(cherry picked from commit 8cd40e7e0d92a0edbe08941fdf728a81c2e6cf15)
Guido Günther [Fri, 3 May 2013 06:03:26 +0000 (08:03 +0200)]
Make detect_scsi_host_caps a function on all architectures
In the non linux case some callers like gather_scsi_host_caps needed the
return code of -1 while others like update_caps needed an empty
statement (to avoid a "statement without effect" warning). This is much
simpler solved by using a function instead of a define.
(cherry picked from commit 58662f44165ccbffee95b4911a7eec49975bde0b)
The lockd plugin for the lock manager was not correctly
handling the release of resource locks. This meant that
during migration, or when pausing a VM, the locks would
not get released. This in turn made it impossible to
resume the domain, or finish migration
(cherry picked from commit 8dc93ffadcca0cc9813ba04036b7139922c55400)
Eric Blake [Thu, 2 May 2013 20:23:02 +0000 (14:23 -0600)]
build: avoid non-portable cast of pthread_t
POSIX says pthread_t is opaque. We can't guarantee if it is scaler
or a pointer, nor what size it is; and BSD differs from Linux.
We've also had reports of gcc complaining on attempts to cast it,
if we use a cast to the wrong type (for example, pointers have to be
cast to void* or intptr_t before being narrowed; while casting a
function return of scalar pthread_t to void* triggers a different
warning).
Give up on casts, and use unions to get at decent bits instead. And
rather than futz around with figuring which 32 bits of a potentially
64-bit pointer are most likely to be unique, convert the rest of
the code base to use 64-bit values when using a debug id.
Based on a report by Guido Günther against kFreeBSD, but with a
fix that doesn't regress commit 4d970fd29 for FreeBSD.
* src/util/virthreadpthread.c (virThreadSelfID, virThreadID): Use
union to get at a decent bit representation of thread_t bits.
* src/util/virthread.h (virThreadSelfID, virThreadID): Alter
signature.
* src/util/virthreadwin32.c (virThreadSelfID, virThreadID):
Likewise.
* src/qemu/qemu_domain.h (qemuDomainJobObj): Alter type of owner.
* src/qemu/qemu_domain.c (qemuDomainObjTransferJob)
(qemuDomainObjSetJobPhase, qemuDomainObjReleaseAsyncJob)
(qemuDomainObjBeginNestedJob, qemuDomainObjBeginJobInternal): Fix
clients.
* src/util/virlog.c (virLogFormatString): Likewise.
* src/util/vireventpoll.c (virEventPollInterruptLocked):
Likewise.
Fix potential use of undefined variable in remote dispatch code
If an early dispatch check caused a jump to the 'cleanup' branch
then virTypeParamsFree() would be called with an uninitialized
'nparams' variable. Fortunately 'params' is initialized to NULL,
so the uninitialized 'nparams' variable would not be used.
Eric Blake [Thu, 2 May 2013 21:46:19 +0000 (15:46 -0600)]
build: fix mingw build of virprocess.c
Commit 776d49f4 added a static function that is only called
conditionally; leading to this compile error on mingw:
CC libvirt_util_la-virprocess.lo
../../src/util/virprocess.c:624:26: error: 'struct rlimit' declared inside parameter list [-Werror]
../../src/util/virprocess.c:624:26: error: its scope is only this definition or declaration, which is probably not what you want [-Werror]
../../src/util/virprocess.c:622:1: error: 'virProcessPrLimit' defined but not used [-Werror=unused-function]
* src/util/virprocess.c (virProcessPrLimit): Only declare
virProcessPrLimit when used.
The F_DUPFD_CLOEXEC operation with fcntl() expects a single
int argument, specifying the minimum FD number for the newly
dup'd file descriptor. We were not specifying that causing
random stack data to be accessed as the FD number. Sometimes
that worked, sometimes it didn't.
Eric Blake [Wed, 1 May 2013 20:28:43 +0000 (14:28 -0600)]
spec: proper soft static allocation of qemu uid
https://bugzilla.redhat.com/show_bug.cgi?id=924501 tracks a
problem that occurs if uid 107 is already in use at the time
libvirt is first installed. In response that problem, Fedora
packaging guidelines were recently updated. This fixes the
spec file to comply with the new guidelines:
https://fedoraproject.org/wiki/Packaging:UsersAndGroups