John Ferlan [Fri, 18 Jan 2013 14:34:13 +0000 (09:34 -0500)]
selinux: Resolve resource leak using the default disk label
Commit id a994ef2d1 changed the mechanism to store/update the default
security label from using disk->seclabels[0] to allocating one on the
fly. That change allocated the label, but never saved it. This patch
will save the label. The new virDomainDiskDefAddSecurityLabelDef() is
a copy of the virDomainDefAddSecurityLabelDef().
(cherry picked from commit 05cc03518987fa0f8399930d14c1d635591ca49b)
Peter Krempa [Fri, 4 Jan 2013 15:15:04 +0000 (16:15 +0100)]
rpc: Fix crash on error paths of message dispatching
This patch resolves CVE-2013-0170:
https://bugzilla.redhat.com/show_bug.cgi?id=893450
When reading and dispatching of a message failed the message was freed
but wasn't removed from the message queue.
After that when the connection was about to be closed the pointer for
the message was still present in the queue and it was passed to
virNetMessageFree which tried to call the callback function from an
uninitialized pointer.
This patch removes the message from the queue before it's freed.
* rpc/virnetserverclient.c: virNetServerClientDispatchRead:
- avoid use after free of RPC messages
(cherry picked from commit 46532e3e8ed5f5a736a02f67d6c805492f9ca720)
Although the nwfilter driver skips startup when running in a
session libvirtd, it did not skip reload or shutdown. This
caused errors to be reported when sending SIGHUP to libvirtd,
and caused an abort() in libdbus on shutdown due to trying
to remove a dbus filter that was never added
(cherry picked from commit abbec81bd0c9bf917f2c63045222734d7e4411fb)
Conflicts:
src/nwfilter/nwfilter_driver.c - earlier changes f4ea67f and 79b8a56 related to using bool and auto-shutdown of drivers are not backported
When running virDomainDestroy, we need to make sure that no other
background thread cleans up the domain while we're doing our work.
This can happen if we release the domain object while in the
middle of work, because the monitor might detect EOF in this window.
For this reason we have a 'beingDestroyed' flag to stop the monitor
from doing its normal cleanup. Unfortunately this flag was only
being used to protect qemuDomainBeginJob, and not qemuProcessKill
This left open a race condition where either libvirtd could crash,
or alternatively report bogus error messages about the domain already
having been destroyed to the caller
Yufang Zhang [Wed, 9 Jan 2013 12:18:35 +0000 (20:18 +0800)]
build: move file deleting action from %files list to %install
When building libvirt rpms on rhel5, I got the following error:
File must begin with "/": rm
File must begin with "/": -f
File must begin with "/": $RPM_BUILD_ROOT/etc/sysctl.d/libvirtd
Installed (but unpackaged) file(s) found:
/etc/sysctl.d/libvirtd
It is triggerd by the %files list of libvirt daemon:
In a non-systemd environment the post and preun scripts of libvirt-client
fail, since the required files are in libvirt-daemon. Moved them to client.
Doing that I noticed %{_unitdir}/libvirt-guests.service was contained in
both libvirt-client and libvirt-daemon, which I don't think was intended.
Removed the extra copy from daemon.
Currently, if there's no hard memory limit defined for a domain,
libvirt tries to calculate one, based on domain definition and magic
equation and set it upon the domain startup. The rationale behind was,
if there's a memory leak or exploit in qemu, we should prevent the
host system trashing. However, the equation was too tightening, as it
didn't reflect what the kernel counts into the memory used by a
process. Since many hosts do have a swap, nobody hasn't noticed
anything, because if hard memory limit is reached, process can
continue allocating memory on a swap. However, if there is no swap on
the host, the process gets killed by OOM killer. In our case, the qemu
process it is.
To prevent this, we need to relax the hard RSS limit. Moreover, we
should reflect more precisely the kernel way of accounting the memory
for process. That is, even the kernel caches are counted within the
memory used by a process (within cgroups at least). Hence the magic
equation has to be changed:
limit = 1.5 * (domain memory + total video memory) + (32MB for cache
per each disk) + 200MB
(cherry picked from commit 3c83df679e8feab939c08b1f97c48f9290a5b8cd)
commit ac2797cf2af2fd0e64c58a48409a8175d24d6f86 attempted to fix a
problem with netlink in newer kernels requiring an extra attribute
with a filter flag set in order to receive an IFLA_VFINFO_LIST from
netlink. Unfortunately, the #ifdef that protected against compiling it
in on systems without the new flag went a bit too far, assuring that
the new code would *never* be compiled, and even if it had, the code
was incorrect.
The first problem was that, while some IFLA_* enum values are also
their existence at compile time, IFLA_EXT_MASK *isn't* #defined, so
checking to see if it's #defined is not a valid method of determining
whether or not to add the attribute. Fortunately, the flag that is
being set (RTEXT_FILTER_VF) *is* #defined, and it is never present if
IFLA_EXT_MASK isn't, so it's sufficient to just check for that flag.
And to top it off, due to the code not actually compiling when I
thought it did, I didn't realize that I'd been given the wrong arglist
to nla_put() - you can't just send a const value to nla_put, you have
to send it a pointer to memory containing what you want to add to the
message, along with the length of that memory.
This time I've actually sent the patch over to the other machine
that's experiencing the problem, applied it to the branch being used
(0.10.2) and verified that it works properly, i.e. it does fix the
problem it's supposed to fix. :-/
(cherry picked from commit 7c36650699f33e54361720f824efdf164bc6e65d)
Laine Stump [Thu, 20 Dec 2012 19:52:41 +0000 (14:52 -0500)]
util: add missing error log messages when failing to get netlink VFINFO
This patch fixes the lack of error messages when libvirt fails to find
VFINFO in a returned netlinke response message.
https://bugzilla.redhat.com/show_bug.cgi?id=827519#c10 is an example
of the error message that was previously logged when the
IFLA_VFINFO_LIST object was missing from the netlink response. The
reason for this failure is detailed in
Even though that root problem has been fixed, the experience of
finding the root cause shows us how important it is to properly log an
error message in these cases. This patch *seems* to replace the entire
function, but really most of the changes are due to moving code that
was previously inside an if() statement out to the top level of the
function (the original if() was reversed and made to log an error and
return).
(cherry picked from commit 846770e5ff959f7819e2c32857598cb88e2e2f0e)
When assigning an SRIOV virtual function to a guest using "intelligent
PCI passthrough" (<interface type='hostdev'>, which sets the MAC
address and vlan tag of the VF before passing its info to qemu),
libvirt first learns the current MAC address and vlan tag by sending
an NLM_F_REQUEST message for the VF's PF (physical function) to the
kernel via a NETLINK_ROUTE socket (see virNetDevLinkDump()); the
response message's IFLA_VFINFO_LIST section is examined to extract the
info for the particular VF being assigned.
This worked fine with kernels up until kernel commit 115c9b81928360d769a76c632bae62d15206a94a (first appearing in upstream
kernel 3.3) which changed the ABI to not return IFLA_VFINFO_LIST in
the response until a newly introduced IFLA_EXT_MASK field was included
in the request, with the (newly introduced, of course) RTEXT_FILTER_VF
flag set.
The justification for this ABI change was that new fields had been
added to the VFINFO, causing NLM_F_REQUEST messages to fail on systems
with large numbers of VFs if the requesting application didn't have a
large enough buffer for all the info. The idea is that most
applications doing an NLM_F_REQUEST don't care about VFINFO anyway, so
eliminating it from the response would lower the requirements on
buffer size. Apparently, the people who pushed this patch made the
mistaken assumption that iproute2 (the "ip" command) was the only
package that used IFLA_VFINFO_LIST, so it wouldn't break anything else
(and they made sure that iproute2 was fixed.
The logic of this "fix" is debatable at best (one could claim that the
proper fix would be for the applications in question to be fixed so
that they properly sized the buffer, which is what libvirt does
(purely by virtue of using libnl), but it is what it is and we have to
deal with it.
In order for <interface type='hostdev'> to work properly on systems
with a kernel 3.3 or later, libvirt needs to add the afore-mentioned
IFLA_EXT_MASK field with RTEXT_FILTER_VF set.
Of course we also need to continue working on systems with older
kernels, so that one bit of code is compiled conditionally. The one
time this could cause problems is if the libvirt binary was built on a
system without IFLA_EXT_MASK which was subsequently updated to a
kernel that *did* have it. That could be solved by manually providing
the values of IFLA_EXT_MASK and RTEXT_FILTER_VF and adding it to the
message anyway, but I'm uncertain what that might actually do on a
system that didn't support the message, so for the time being we'll
just fail in that case (which will very likely never happen anyway).
(cherry picked from commit ac2797cf2af2fd0e64c58a48409a8175d24d6f86)
Jiri Denemark [Fri, 26 Oct 2012 10:25:14 +0000 (12:25 +0200)]
virsh: Fix POD syntax
The first two hunks fix "Unterminated I<...> sequence" error and the
last one fixes "’=item’ outside of any ’=over’" error.
(cherry picked from commit 61299a1c983a64c7e0337b94232fdd2d42c1f4f2)
Eric Blake [Fri, 4 Jan 2013 21:21:59 +0000 (14:21 -0700)]
build: install libvirt sysctl file correctly
https://bugzilla.redhat.com/show_bug.cgi?id=887017 reports that
even though libvirt attempts to set fs.aio-max-nr via sysctl,
the file was installed with the wrong name and gets ignored by
sysctl. Furthermore, 'man systcl.d' recommends that packages
install into hard-coded /usr/lib/sysctl.d (even when libdir is
/usr/lib64), so that sysadmins can use /etc/sysctl.d for overrides.
* daemon/Makefile.am (install-sysctl, uninstall-sysctl): Use
correct location.
* libvirt.spec.in (network_files): Reflect this.
(cherry picked from commit a1fd56cb3057c45cffbf5d41eaf70a26d2116b20)
Eric Blake [Fri, 4 Jan 2013 20:35:04 +0000 (13:35 -0700)]
build: use common .in replacement mechanism
We had several different styles of .in conversion in our Makefiles:
ALLCAPS, @ALLCAPS@, @lower@, ::lower::
Canonicalize on one form, to make it easier to copy and paste
between .in files.
Also, we were using some non-portable sed constructs: \@ is an
undefined escape sequence (it happens to be @ itself in GNU sed,
but POSIX allows it to mean something else), as well as risky
behavior (failure to consistently quote things means a space
in $(sysconfdir) could throw things off; also, Autoconf recommends
using | rather than , or ! in the s||| operator, because | has to
be quoted in shell and is therefore less likely to appear in file
names than , or !).
Cole Robinson [Sun, 21 Oct 2012 02:29:51 +0000 (22:29 -0400)]
tools: Only install guests init script if --with-init=script=redhat
Most of this deals with moving the libvirt-guests.sh script which
does all the work to /usr/libexec, so it can be shared by both
systemd and traditional init. Previously systemd depended on
the script being in /etc/init.d
Eric Blake [Fri, 26 Oct 2012 15:06:54 +0000 (09:06 -0600)]
build: check for pod errors
Patch 61299a1c fixed a long-standing pod error in the man page.
But we should be preventing these up front.
See also https://bugzilla.redhat.com/show_bug.cgi?id=870273
* tools/Makefile.am (virt-xml-validate.1, virt-pki-validate.1)
(virt-host-validate.1, virt-sanlock-cleanup.8, virsh.1): Reject
pod conversion errors.
* daemon/Makefile.am ($(srcdir)/libvirtd.8.in): Likewise.
(cherry picked from commit 2639949abe732cf683bec48f87ff6c243b608b76)
Jim Fehlig [Mon, 7 Jan 2013 17:15:56 +0000 (10:15 -0700)]
build: Add libxenctrl to LIBXL_LIBS
Commit dfa1e1dd removed libxenctrl from LIBXL_LIBS, but the libxl
driver uses a symbol from this library. Explicitly link with
libxenctrl instead of relying on the build system to support
implicit DSO linking.
(cherry picked from commit 68e7bc4561783d742d1e266b7f1f0e3516d5117e)
This patch converts the Xen libxl driver to support only Xen >= 4.2.
Support for Xen 4.1 libxl is dropped since that version of libxl is
designated 'technology preview' only and is incompatible with Xen 4.2
libxl. Additionally, the default toolstack in Xen 4.1 is still xend,
for which libvirt has a stable, functional driver.
(cherry picked from commit dfa1e1dd5325f3a6bc8865312cde557044bf0b93)
Conflicts:
src/libxl/libxl_conf.c - commit e5e8d5 not backported
src/libxl/libxl_driver.c - commit 1c04f99 not backported
This introduces a few new APIs for dealing with strings.
One to split a char * into a char **, another to join a
char ** into a char *, and finally one to free a char **
There is a simple test suite to validate the edge cases
too. No more need to use the horrible strtok_r() API,
or hand-written code for splitting strings.
which was originally committed upstream in commit 753ff83a50263d6975f88d6605d4b5ddfcc97560. That commit improperly
removed the "--except-interface lo" from dnsmasq commandlines when
--bind-dynamic was used (based on comments in the latter bug).
It turns out that the problem reported in the CVE could be eliminated
without removing "--except-interface lo", and removing it actually
caused each instance of dnsmasq to listen on localhost on port 53,
which created a new problem:
If another instance of dnsmasq using "bind-interfaces" (instead of
"bind-dynamic") had already been started (or if another instance
started later used "bind-dynamic"), this wouldn't have any immediately
visible ill effects, but if you tried to start another dnsmasq
instance using "bind-interfaces" *after* starting any libvirt
networks, the new dnsmasq would fail to start, because there was
already another process listening on port 53.
This patch changes the network driver to *always* add
"except-interface=lo" to dnsmasq conf files, regardless of whether we use
bind-dynamic or bind-interfaces. This way no libvirt dnsmasq instances
are listening on localhost (and the CVE is still fixed).
The actual code change is miniscule, but must be propogated through all
of the test files as well.
(This is *not* a cherry-pick of the upstream commit that fixes the bug
(commit d66eb7866757dd371560c288dc6201fb9348792a), because subsequent
to the CVE fix, another patch changed the network driver to put
dnsmasq options in a conf file rather than directly on the dnsmasq
commandline preserving the same options), so a cherry-pick is just one
very large conflict.)
If debugging is enabled, the debug messages are sent to stderr.
Moreover, if a command has catching of stderr set, the messages
gets mixed with stdout output (assuming both outputs are stored
in the same variable). The resulting string then doesn't
necessarily have to start with desired prefix then. This bug
exposes itself when parsing dnsmasq output:
2012-12-06 11:18:11.445+0000: 18491: error :
dnsmasqCapsSetFromBuffer:664 : internal error cannot parse
/usr/sbin/dnsmasq version number in '2012-12-06
11:11:02.232+0000: 18492: debug : virFileClose:72 : Closed fd 22'
We can clearly see that the output of dnsmasq --version doesn't
start with expected "Dnsmasq version " string but a libvirt debug
output.
(cherry picked from commit ff33f807739dc2950e4df8c1d4007ce9f8b290c0)
If the debugging is enabled, the virCommand subsystem catches debug
messages in the command output as well. In that case, we can't assume
the string corresponding to command's stdout will start with specific
prefix. But the prefix can be moved deeper in the string. This bug
shows itself when parsing dnsmasq output:
2012-12-06 11:18:11.445+0000: 18491: error :
dnsmasqCapsSetFromBuffer:664 : internal error cannot parse
/usr/sbin/dnsmasq version number in '2012-12-06 11:11:02.232+0000:
18492: debug : virFileClose:72 : Closed fd 22'
We can clearly see that the output of dnsmasq --version
doesn't start with expected "Dnsmasq version " string but a libvirt
debug output.
(cherry picked from commit 5114431396fd125b6ebe4d1a20a981111f948ee7)
It's odd to fall through to buildVol, and the existed file is
removed when buildVol fails. This checks if the volume target
path already exists in createVol. The reason for not using
error like "Volume already exists" is that there isn't volume
maintained by libvirt for the path until a operation like
pool-refresh, using error like that will just cause confusion.
(cherry picked from commit d1f3d14974b4213193130d4b01d4449ad1533cbb)
Since the virConnect object is not locked wholely when doing
virConenctDispose, a thread can get the lock and thus might
cause the race.
Detected by valgrind:
==23687== Invalid read of size 4
==23687== at 0x38BAA091EC: pthread_mutex_lock (pthread_mutex_lock.c:61)
==23687== by 0x3FBA919E36: remoteClientCloseFunc (remote_driver.c:337)
==23687== by 0x3FBA936BF2: virNetClientCloseLocked (virnetclient.c:688)
==23687== by 0x3FBA9390D8: virNetClientIncomingEvent (virnetclient.c:1859)
==23687== by 0x3FBA851AAE: virEventPollRunOnce (event_poll.c:485)
==23687== by 0x3FBA850846: virEventRunDefaultImpl (event.c:247)
==23687== by 0x40CD61: vshEventLoop (virsh.c:2128)
==23687== by 0x3FBA8626F8: virThreadHelper (threads-pthread.c:161)
==23687== by 0x38BAA077F0: start_thread (pthread_create.c:301)
==23687== by 0x33F68E570C: clone (clone.S:115)
==23687== Address 0x4ca94e0 is 144 bytes inside a block of size 312 free'd
==23687== at 0x4A0595D: free (vg_replace_malloc.c:366)
==23687== by 0x3FBA8588B8: virFree (memory.c:309)
==23687== by 0x3FBA86AAFC: virObjectUnref (virobject.c:145)
==23687== by 0x3FBA8EA767: virConnectClose (libvirt.c:1458)
==23687== by 0x40C8B8: vshDeinit (virsh.c:2584)
==23687== by 0x41071E: main (virsh.c:3022)
The above race is caused by the eventLoop thread tries to handle
the net client event by calling the callback set by:
virNetClientSetCloseCallback(priv->client,
remoteClientCloseFunc,
conn, NULL);
I.E. remoteClientCloseFunc, which lock/unlock the virConnect object.
Jiri Denemark [Fri, 30 Nov 2012 15:52:03 +0000 (16:52 +0100)]
qemu: Don't free PCI device if adding it to activePciHostdevs fails
The device is still referenced from pcidevs and freeing it would leave
an invalid pointer there.
(cherry picked from commit ea1a9b5fddd1562abf31eb5c2502879bfd25903e)
Eric Blake [Mon, 3 Dec 2012 20:12:09 +0000 (13:12 -0700)]
build: fix incremental autogen.sh when no AUTHORS is present
Commit 71d1256 tried to fix a problem where rebasing an old
branch on top of newer libvirt.git resulted in automake failing
because of a missing AUTHORS file. However, while the fix
worked for an incremental 'make', it did not work for someone
that directly reran './autogen.sh'. Reported by Laine Stump.
* autogen.sh (autoreconf): Check for same conditions as cfg.mk.
* cfg.mk (_update_required): Add comments.
(cherry picked from commit 55dc872bd85e0756a13fc24693c4eed31d0c202f)
Ján Tomko [Mon, 3 Dec 2012 12:35:05 +0000 (13:35 +0100)]
conf: prevent crash with no uuid in cephx auth secret
Fix the null pointer access when UUID is not specified.
Introduce a bool 'uuidUsable' to virStoragePoolAuthCephx that indicates
if uuid was specified or not and use it instead of the pointless
comparison of the static UUID array to NULL.
Add an error message if both uuid and usage are specified.
Fixes:
Error: FORWARD_NULL (CWE-476):
libvirt-0.10.2/src/conf/storage_conf.c:461: var_deref_model: Passing
null pointer "uuid" to function "virUUIDParse(char const *, unsigned
char *)", which dereferences it. (The dereference is assumed on the
basis of the 'nonnull' parameter attribute.)
Error: NO_EFFECT (CWE-398):
libvirt-0.10.2/src/conf/storage_conf.c:979: array_null: Comparing an
array to null is not useful: "src->auth.cephx.secret.uuid != NULL".
(cherry picked from commit bc680e1381be7eeb5ae7d898ebab598df819b672)
Quote client identity in SASL whitelist log message
When seeing a message
virNetSASLContextCheckIdentity:146 : SASL client admin not allowed in whitelist
it isn't immediately obvious that 'admin' is the identity
being checked. Quote the string to make it more obvious
(cherry picked from commit 07da0a6b54057f48238e35f1756d137c10e0bef5)
Ján Tomko [Fri, 30 Nov 2012 14:07:50 +0000 (15:07 +0100)]
nwfilter: report an error on OOM
Also removed some unreachable code found by coverity:
libvirt-0.10.2/src/nwfilter/nwfilter_driver.c:259: unreachable: This
code cannot be reached: "nwfilterDriverUnlock(driver...".
(cherry picked from commit 4f9af0857c1547d19610e5c59efe45a8d847b67f)
Ján Tomko [Fri, 30 Nov 2012 12:09:21 +0000 (13:09 +0100)]
virsh: check the return value of virStoragePoolGetAutostart
On error, virStoragePoolGetAutostart would return -1 leaving autostart
untouched.
Removed the misleading debug message as well.
Error: CHECKED_RETURN (CWE-252):
libvirt-0.10.2/tools/virsh-pool.c:1386: unchecked_value: No check of the
return value of "virStoragePoolGetAutostart(pool, &autostart)".
(cherry picked from commit e9d74a7a8238f082cf0f0285ce4d2547a72eaa01)
Ján Tomko [Thu, 29 Nov 2012 10:23:06 +0000 (11:23 +0100)]
virsh: use correct sizeof when allocating cpumap
Found by coverity:
Error: SIZEOF_MISMATCH (CWE-569):
libvirt-0.10.2/tools/virsh-domain.c:4754: suspicious_sizeof: Passing
argument "8UL /* sizeof (cpumap) */" to function
"_vshCalloc(vshControl *, size_t, size_t, char const *, int)" and
then casting the return value to "unsigned char *" is suspicious.
Error: SIZEOF_MISMATCH (CWE-569):
libvirt-0.10.2/tools/virsh-domain.c:4942: suspicious_sizeof: Passing
argument "8UL /* sizeof (cpumap) */" to function
"_vshCalloc(vshControl *, size_t, size_t, char const *, int)" and
then casting the return value to "unsigned char *" is suspicious.
(cherry picked from commit dc04b2a737c8c4240d0d412080ce798b977068a1)
Ján Tomko [Thu, 29 Nov 2012 09:56:56 +0000 (10:56 +0100)]
util: fix virBitmap allocation in virProcessInfoGetAffinity
Found by coverity:
Error: REVERSE_INULL (CWE-476):
libvirt-0.10.2/src/util/processinfo.c:141: deref_ptr: Directly
dereferencing pointer "map".
libvirt-0.10.2/src/util/processinfo.c:142: check_after_deref:
Null-checking "map" suggests that it may be null, but it has already
been dereferenced on all paths leading to the check.
(cherry picked from commit 7730257db30972a6f4db8d582b232b240b2a06b8)
Laine Stump [Wed, 28 Nov 2012 04:59:17 +0000 (23:59 -0500)]
network: fix crash when portgroup has no name
This resolves: https://bugzilla.redhat.com/show_bug.cgi?id=879473
The name attribute is required for portgroup elements (yes, the RNG
specifies that), and there is code in libvirt that assumes it is
non-null. Unfortunately, the portgroup parsing function wasn't
checking for lack of portgroup. One adverse result of this was that
attempts to update a network by adding a portgroup with no name would
cause libvirtd to segfault. For example:
This patch causes virNetworkPortGroupParseXML to fail if no name is
specified, thus avoiding any later problems.
(cherry picked from commit 012d69dff1e031f8079a9952e886a31795e589b2)
Ensure transient def is removed if LXC start fails
When starting a container, newDef is initialized to a
copy of 'def', but when startup fails newDef is never
removed. This cause later attempts to use 'virDomainDefine'
to lose the new data being defined.
Ensure failure to create macvtap device aborts LXC start
A mistaken initialization of 'ret' caused failure to create
macvtap devices to be ignored. The libvirt_lxc process
would later fail to start due to missing devices
Also make sure code checks '< 0' and not '!= 0' since only
-1 is considered an error condition
Treat missing driver cgroup as fatal in LXC driver
The LXC driver relies on use of cgroups to kill off LXC processes
in shutdown. If cgroups aren't available, we're unable to kill
off processes, so we must treat lack of cgroups as a fatal startup
error.
The code setting up LXC cgroups used an 'rc' variable both
for capturing the return value of methods it calls, and
its own return status. The result was that several failures
in setting up cgroups would actually result in success being
returned.
Use a separate 'ret' for tracking return value as per normal
code design in other parts of libvirt
Peter Krempa [Mon, 26 Nov 2012 11:13:56 +0000 (12:13 +0100)]
lxc: Avoid segfault of libvirt_lxc helper on early cleanup paths
Early jumps to the cleanup label caused a crash of the libvirt_lxc
container helper as the cleanup section called
virLXCControllerDeleteInterfaces(ctrl) without checking the ctrl argument
for NULL. The argument was de-referenced soon after.
$ /usr/libexec/libvirt_lxc
/usr/libexec/libvirt_lxc: missing --name argument for configuration
Segmentation fault
(cherry picked from commit 81efb13b4a33f58c28e0e65dcc9521b983592683)
Ján Tomko [Sun, 25 Nov 2012 01:59:33 +0000 (02:59 +0100)]
storage: fix logical volume cloning
Commit 258e06c removed setting of the volume type to
VIR_STORAGE_VOL_BLOCK, which leads to failures in
storageVolumeCreateXMLFrom.
The type (and target.format) of the volume was set to zero. In
virStorageBackendGetBuildVolFromFunction, this gets interpreted as
VIR_STORAGE_FILE_NONE and the qemu-img tool is called with unknown
"none" format.
Warn if requesting update to non-existent timer/handle watch
The event code is a no-op if requested to update a non-existent
timer/handle watch. This makes it hard to detect bugs in the
caller who have passed bogus data. Add a VIR_WARN output in
such cases, since the API does not allow for return errors.
Fix virDiskNameToIndex to actually ignore partition numbers
The docs for virDiskNameToIndex claim it ignores partition
numbers. In actual fact though, a code ordering bug means
that a partition number will cause the code to accidentally
multiply the result by 26.
Notice unsupported disk type (for the driver), but also no address
specified. The first part is not a problem and we should not abort
immediately because of that, but the combination with the address
unknown was causing an unspecified error.
While fixing this, I added an error to one place where this return
value was not managed properly.
(cherry picked from commit 03cd6e4ae8d86682986249f05f7de8eb405a12da)
The LXC controller code currently directly invokes the
libvirt main loop code. The problem is that this misses
the cleanup of virNetServerClient connections that
virNetServerRun takes care of.
The result is that when libvirtd is stopped, the
libvirt_lxc controller process gets stuck in a I/O loop.
When libvirtd is then started again, it fails to connect
to the controller and thus kills off the entire domain.
Osier Yang [Wed, 21 Nov 2012 03:22:39 +0000 (11:22 +0800)]
storage: Fix bug of fs pool destroying
Regression introduced by commit 258e06c85b7, "ret" could be set to 1
or 0 by virStorageBackendFileSystemIsMounted before goto cleanup.
This could mislead the callers (up to the public API
virStoragePoolDestroy) to return success even the underlying umount
command fails.
(cherry picked from commit f4ac06569a8ffce24fb8c07a0fc01574e38de6e4)
Ján Tomko [Tue, 20 Nov 2012 18:47:07 +0000 (19:47 +0100)]
conf: add support for booting from redirected USB devices
Commit a4c19459aa8634c43b51e8138fb1d7eec4c17824 only added the
QEMU capability flag, command line option and added the boot element
for redirdev's in the XML schema.
This patch adds support for parsing and writing the XML with redirdevs
with the boot flag. It also ignores unknown XML elements in redirdev
instead of failing with:
"error: An error occurred, but the cause is unknown"
The reported problem is that an attempt to restore a saved domain that
was configured with <currentMemory> and <memory> set to some (same for
both) number that's not a multiple of 4096KiB results in an error like
this:
error: Failed to start domain libvirt_test_api
error: XML error: current memory '4001792k' exceeds maximum '4000768k'
(in this case, currentMemory was set to 4000000KiB).
The reason for this failure is:
1) a saved image contains the "live xml" of the domain at the time of
the save.
2) the live xml of a running domain gets its currentMemory
(a.k.a. cur_balloon) directly from the qemu monitor rather than from
the configuration of the domain.
3) the value reported by qemu is (sometimes) not exactly what was
originally given to qemu when the domain was started, but is rounded
up to [some indeterminate granularity] - in some versions of qemu that
granularity is apparently 1MiB, and in others it is 4MiB.
4) When the XML is parsed to setup the state of the restored domain,
the XML parser for <currentMemory> compares it to <memory> (which is
the maximum allowed memory size for the domain) and if <currentMemory>
is greater than the next 1024KiB boundary above <memory>, it spits out
an error and fails.
For example (from the BZ) if you start qemu on RHEL6 with both
<currentMemory> and <memory> of 4000000 (this number is in KiB),
libvirt's dominfo or dumpxml will report "4001792" back (rounded up to
next 4MiB) for 10-20 seconds after the start, then revert to reporting
"4000000". On Fedora 16 (which uses qemu-1.0), it will instead report
"4000768" (rounded up to next 1MiB). On Fedora 17 (qemu-1.2), it seems
to always report "4000000". ("4000000" is of course okay, and
"4000768" is also okay since that's the next 1024KiB boundary above
"4000000" and the parser was already allowing for that. But "4001792
is *not* okay and produces the error message.)
This patch solves the problem by changing the allowed "fudge factor"
when parsing from 1024KiB to 4096KiB to match the maximum up-rounding
that could be done in qemu.
(I had earlier thought to fix this by up-rounding <memory> in the
dumpxml that's put into the saved image, but that wouldn't have fixed
the case where the save image was produced by an "unfixed"
libvirtd.)
(cherry picked from commit 89204fca7f193c7cf48f941bf2917c1a0e71096c)
Eric Blake [Thu, 1 Nov 2012 22:20:09 +0000 (16:20 -0600)]
nodeinfo: support kernels that lack socket information
On RHEL 5, I was getting a segfault trying to start libvirtd,
because we were failing virNodeParseSocket but not checking
for errors, and then calling CPU_SET(-1, &sock_map) as a result.
But if you don't have a topology/physical_package_id file,
then you can just assume that the cpu belongs to socket 0.
* src/nodeinfo.c (virNodeGetCpuValue): Change bool into
default_value.
(virNodeParseSocket): Allow for default value when file is missing,
different from fatal error on reading file.
(virNodeParseNode): Update call sites to fail on error.
(cherry picked from commit 47976b484cfae6ff0dd9e6fcbd45d377aaeaa8f4)
Eric Blake [Wed, 14 Nov 2012 20:20:55 +0000 (13:20 -0700)]
build: rerun bootstrap if AUTHORS is missing
Ever since commit 7b21981c started generating AUTHORS, we now have
the situation that if you flip between two branches in the same
git repository that cross that commit boundary, then 'make' will
fail due to automake complaining about AUTHORS not existing. The
simplest solution is to realize that if AUTHORS does not exist,
then we flipped branches so we will need to rerun bootstrap
anyways; and rerunning bootstrap ensures AUTHORS will exist in time.
Fix uninitialized variable in virLXCControllerSetupDevPTS
The lack of initialization of 'opts' caused a SEGV in the
cleanup: path if the root->src directory did not exist
(cherry picked from commit 3782814d4ad787d815e56382b6f809fe9020f14b)
After the connection to ESX 5.1 being broken since g1e7cd39, the fix
in bab7752c helped a bit, but still missed a spot, so the connection
is now successful, but some APIs (for example defineXML) don't work.
Two cases missing are added in this patch to avoid that.
(cherry picked from commit 9c294e6f9a2ae23f4fbfc98d0dc2aec1ce9eba0b)
qemu is sensitive to the order of arguments passed. Hence, if a
device requires a controller, the controller cmd string must
precede device cmd string. The same apply for controllers, when
for instance ccid controller requires usb controller. So
controllers create partial ordering in which they should be added
to qemu cmd line.
(cherry picked from commit 0f720ab35a1a836aa23b66ef4bafbc5e57290357)
Václav Pavlín [Thu, 25 Oct 2012 10:10:55 +0000 (12:10 +0200)]
spec: replace scriptlets with new systemd macros
https://bugzilla.redhat.com/850186
I added %with_systemd_macros so it should now work in F17 with old
scriptlets and in F18+/RHEL7+ with systemd macros
(see https://fedoraproject.org/wiki/Packaging:ScriptletSnippets#Systemd)
I missed libvirt-guests.service because there is no systemctl call for
it. So I only added systemd macros calls.
(cherry picked from commit ec02d49dfd98313d1c8c4179e88754f10a82ca46)
Some FDs may not implement fdatasync() functionality,
e.g. pipes. In that case EINVAL or EROFS is returned.
We don't want to fail then nor report any error.
Peter Krempa [Thu, 1 Nov 2012 14:45:47 +0000 (15:45 +0100)]
qemu: Fix possible race when pausing guest
When pausing the guest while migration is running (to speed up
convergence) the virDomainSuspend API checks if the migration job is
active before entering the job. This could cause a possible race if the
virDomainSuspend is called while the job is active but ends before the
Suspend API enters the job (this would require that the migration is
aborted). This would cause a incorrect event to be emitted.
(cherry picked from commit d0fc6dc8315b3172501e6fe09c8aed12598de47e)
Peter Krempa [Thu, 25 Oct 2012 14:13:57 +0000 (16:13 +0200)]
net: Remove dnsmasq and radvd files also when destroying transient nets
The network driver didn't care about config files when a network was
destroyed, just when it was undefined leaving behind files for transient
networks.
This patch splits out the cleanup code to a helper function that handles
the cleanup if the inactive network object is being removed and re-uses
this code when getting rid of inactive networks.
(cherry picked from commit e87af617fc3e5a69fb19319d43f58262e5e624ea)
Peter Krempa [Thu, 25 Oct 2012 12:41:28 +0000 (14:41 +0200)]
net: Move creation of dnsmasq hosts file to function starting dnsmasq
The hosts file was created in the network definition function. This
patch moves the place the file is being created to the point where
dnsmasq is being started.
(cherry picked from commit 23ae3fe4256ba634babc6818b8cb7bbd3664a95a)
Peter Krempa [Fri, 26 Oct 2012 12:33:13 +0000 (14:33 +0200)]
conf: net: Fix deadlock if assignment of network def fails
When the assignment fails, the network object is not unlocked and next
call that would use it deadlocks.
(cherry picked from commit f8230891243f86e920d04a0751512cc31055ff8c)
Dan Walsh [Thu, 1 Nov 2012 18:54:39 +0000 (14:54 -0400)]
Linux Containers are not allowed to create device nodes.
This needs to be done before the container starts. Turning
off the mknod capability is noticed by systemd, which will
no longer attempt to create device nodes.
Michal Privoznik [Tue, 30 Oct 2012 18:15:48 +0000 (19:15 +0100)]
iohelper: fdatasync() at the end
Currently, when we are doing (managed) save, we insert the
iohelper between the qemu and OS. The pipe is created, the
writing end is passed to qemu and the reading end to the
iohelper. It reads data and write them into given file. However,
with write() being asynchronous data may still be in OS
caches and hence in some (corner) cases, all migration data
may have been read and written (not physically though). So
qemu will report success, as well as iohelper. However, with
some non local filesystems, where ENOSPACE is polled every X
time units, we may get into situation where all operations
succeeded but data hasn't reached the disk. And in fact will
never do. Therefore we ought sync caches to make sure data
has reached the block device on remote host.
(cherry picked from commit f32e3a2dd686f3692cd2bd3147c03e90f82df987)
Recent fixes made almost all the right steps to make emulator pinned
to the cpuset of the whole domain in case <emulatorpin> isn't
specified, but qemudDomainGetEmulatorPinInfo still reports all the
CPUs even when cpuset is specified. This patch fixes that.
(cherry picked from commit 10c5212b108e8395a6cab2dd449f2b1c0f1442d0)
Gene Czarcinski [Tue, 30 Oct 2012 21:18:34 +0000 (17:18 -0400)]
bugfix: ip6tables rule removal
Three FORWARD chain rules are added and two INPUT chain rules
are added when a network is started but only the FORWARD chain
rules are removed when the network is destroyed.
(cherry picked from commit adaa7ab653b0f841aa549e9f47f9e63ee1d15b37)
Laine Stump [Mon, 29 Oct 2012 22:05:41 +0000 (18:05 -0400)]
util: do a better job of matching up pids with their binaries
This patch resolves: https://bugzilla.redhat.com/show_bug.cgi?id=871201
If libvirt is restarted after updating the dnsmasq or radvd packages,
a subsequent "virsh net-destroy" will fail to kill the dnsmasq/radvd
process.
The problem is that when libvirtd restarts, it re-reads the dnsmasq
and radvd pidfiles, then does a sanity check on each pid it finds,
including checking that the symbolic link in /proc/$pid/exe actually
points to the same file as the path used by libvirt to execute the
binary in the first place. If this fails, libvirt assumes that the
process is no longer alive.
But if the original binary has been replaced, the link in /proc is set
to "$binarypath (deleted)" (it literally has the string " (deleted)"
appended to the link text stored in the filesystem), so even if a new
binary exists in the same location, attempts to resolve the link will
fail.
In the end, not only is the old dnsmasq/radvd not terminated when the
network is stopped, but a new dnsmasq can't be started when the
network is later restarted (because the original process is still
listening on the ports that the new process wants).
The solution is, when the initial "use stat to check for identical
inodes" check for identity between /proc/$pid/exe and $binpath fails,
to check /proc/$pid/exe for a link ending with " (deleted)" and if so,
truncate that part of the link and compare what's left with the
original binarypath.
A twist to this problem is that on systems with "merged" /sbin and
/usr/sbin (i.e. /sbin is really just a symlink to /usr/sbin; Fedora
17+ is an example of this), libvirt may have started the process using
one path, but /proc/$pid/exe lists a different path (indeed, on F17
this is the case - libvirtd uses /sbin/dnsmasq, but /proc/$pid/exe
shows "/usr/sbin/dnsmasq"). The further bit of code to resolve this is
to call virFileResolveAllLinks() on both the original binarypath and
on the truncated link we read from /proc/$pid/exe, and compare the
results.
The resulting code still succeeds in all the same cases it did before,
but also succeeds if the binary was deleted or replaced after it was
started.
(cherry picked from commit 7bafe009d93f8b26330d52dc3289643699cf74f0)
qemu: pass -usb and usb hubs earlier, so USB disks with static address are handled properly
(cherry picked from commit 81af5336acf4c765ef1201e7762d003ae0b0011e)
return attributes that are in a union whose contents are interpreted
differently depending on the actual->type and so they should only
return non-0 when actual->type is 'bridge' (in the first case) or
'direct' (in the other two cases, but I had neglected to do that, so
...DirectDev() was returning bridge.brname (which happens to share the
same spot in the union with direct.linkdev) if actual->type was
'bridge', and ...BridgeName was returning direct.linkdev when
actual->type was 'direct'.
How does this involve Bug 881480 (which was about the inability to
switch between two networks that both have "<forward mode='bridge'/>
<bridge name='xxx'/>"? Whenever the return value of
virDomainNetGetActualDirectDev() for the new and old network
definitions doesn't match, qemuDomainChangeNet() requires a "complete
reconnect" of the device, which qemu currently doesn't
support. ...DirectDev() *should* have been returning NULL for old and
new, but was instead returning the old and new bridge names, which
differ.
(The other two functions weren't causing any behavioral problems in
virDomainChangeNet(), but their problem and fix was identical, so I
included them in this same patch).
(cherry picked from commit 3738cf41f1012eca419e8fa0fa0575cb1e0552e3)
In short, a dnsmasq instance run with the intention of listening for
DHCP/DNS requests only on a libvirt virtual network (which is
constructed using a Linux host bridge) would also answer queries sent
from outside the virtualization host.
This patch takes advantage of a new dnsmasq option "--bind-dynamic",
which will cause the listening socket to be setup such that it will
only receive those requests that actually come in via the bridge
interface. In order for this behavior to actually occur, not only must
"--bind-interfaces" be replaced with "--bind-dynamic", but also all
"--listen-address" options must be replaced with a single
"--interface" option. Fully:
--bind-interfaces --except-interface lo --listen-address x.x.x.x ...
(with --listen-address possibly repeated) is replaced with:
--bind-dynamic --interface virbrX
Of course libvirt can't use this new option if the host's dnsmasq
doesn't have it, but we still want libvirt to function (because the
great majority of libvirt installations, which only have mode='nat'
networks using RFC1918 private address ranges (e.g. 192.168.122.0/24),
are immune to this vulnerability from anywhere beyond the local subnet
of the host), so we use the new dnsmasqCaps API to check if dnsmasq
supports the new option and, if not, we use the "old" option style
instead. In order to assure that this permissiveness doesn't lead to a
vulnerable system, we do check for non-private addresses in this case,
and refuse to start the network if both a) we are using the old-style
options, and b) the network has a publicly routable IP
address. Hopefully this will provide the proper balance of not being
disruptive to those not practically affected, and making sure that
those who *are* affected get their dnsmasq upgraded.
Laine Stump [Thu, 22 Nov 2012 02:17:30 +0000 (21:17 -0500)]
util: new virSocketAddrIsPrivate function
This new function returns true if the given address is in the range of
any "private" or "local" networks as defined in RFC1918 (IPv4) or
RFC3484/RFC4193 (IPv6), otherwise they return false.
Laine Stump [Tue, 20 Nov 2012 17:22:15 +0000 (12:22 -0500)]
util: capabilities detection for dnsmasq
In order to optionally take advantage of new features in dnsmasq when
the host's version of dnsmasq supports them, but still be able to run
on hosts that don't support the new features, we need to be able to
detect the version of dnsmasq running on the host, and possibly
determine from the help output what options are in this dnsmasq.
This patch implements a greatly simplified version of the capabilities
code we already have for qemu. A dnsmasqCaps device can be created and
populated either from running a program on disk, reading a file with
the concatenated output of "dnsmasq --version; dnsmasq --help", or
examining a buffer in memory that contains the concatenated output of
those two commands. Simple functions to retrieve capabilities flags,
the version number, and the path of the binary are also included.
bridge_driver.c creates a single dnsmasqCaps object at driver startup,
and disposes of it at driver shutdown. Any time it must be used, the
dnsmasqCapsRefresh method is called - it checks the mtime of the
binary, and re-runs the checks if the binary has changed.
networkxml2argvtest.c creates 2 "artificial" dnsmasqCaps objects at
startup - one "restricted" (doesn't support --bind-dynamic) and one
"full" (does support --bind-dynamic). Some of the test cases use one
and some the other, to make sure both code pathes are tested.
Jiri Denemark [Tue, 16 Oct 2012 19:11:29 +0000 (21:11 +0200)]
qemu: Always format CPU topology
When libvirt cannot find a suitable CPU model for host CPU (easily
reproducible by running libvirt in a guest), it would not provide CPU
topology in capabilities XML either. Even though CPU topology is known
and can be queried by virNodeGetInfo. With this patch, CPU topology will
always be provided in capabilities XML regardless on the presence of CPU
model.
(cherry picked from commit f1c70100409562c3f402392aa667732e5f89a2c4)
Eric Blake [Mon, 5 Nov 2012 16:48:28 +0000 (09:48 -0700)]
spec: don't enable cgconfig under systemd
In Fedora 16, we quit enabling cgconfig because systemd set up
default cgroups that were good enough for our use. But in F17,
when we switched to systemd, we reverted and started up cgconfig
again. See also the tail of this thread:
https://www.redhat.com/archives/libvir-list/2012-October/msg01657.html
Stefan Hajnoczi [Thu, 1 Nov 2012 17:20:55 +0000 (18:20 +0100)]
qemu: Keep QEMU host drive prefix in BlkIoTune
The QEMU -drive id= begins with libvirt's QEMU host drive prefix
("drive-"), which is stripped off in several places two convert between
host ("-drive") and guest ("-device") device names.
In the case of BlkIoTune it is unnecessary to strip the QEMU host drive
prefix because we operate on "info block"/"query-block" output that uses
host drive names.
Stripping the prefix incorrectly caused string comparisons to fail since
we were comparing the guest device name against the host device name.