CVE-2013-6456: Avoid unsafe use of /proc/$PID/root in LXC chardev hostdev hotplug
Rewrite lxcDomainAttachDeviceHostdevMiscLive function
to use the virProcessRunInMountNamespace helper. This avoids
risk of a malicious guest replacing /dev with a absolute
symlink, tricking the driver into changing the host OS
filesystem.
CVE-2013-6456: Avoid unsafe use of /proc/$PID/root in LXC block hostdev hotplug
Rewrite lxcDomainAttachDeviceHostdevStorageLive function
to use the virProcessRunInMountNamespace helper. This avoids
risk of a malicious guest replacing /dev with a absolute
symlink, tricking the driver into changing the host OS
filesystem.
CVE-2013-6456: Avoid unsafe use of /proc/$PID/root in LXC USB hotplug
Rewrite lxcDomainAttachDeviceHostdevSubsysUSBLive function
to use the virProcessRunInMountNamespace helper. This avoids
risk of a malicious guest replacing /dev with a absolute
symlink, tricking the driver into changing the host OS
filesystem.
CVE-2013-6456: Avoid unsafe use of /proc/$PID/root in LXC disk hotplug
Rewrite lxcDomainAttachDeviceDiskLive function to use the
virProcessRunInMountNamespace helper. This avoids risk of
a malicious guest replacing /dev with a absolute symlink,
tricking the driver into changing the host OS filesystem.
Eric Blake [Tue, 24 Dec 2013 05:55:51 +0000 (22:55 -0700)]
CVE-2013-6456: Avoid unsafe use of /proc/$PID/root in LXC shutdown/reboot code
Use helper virProcessRunInMountNamespace in lxcDomainShutdownFlags and
lxcDomainReboot. Otherwise, a malicious guest could use symlinks
to force the host to manipulate the wrong file in the host's namespace.
Idea by Dan Berrange, based on an initial report by Reco
<recoverym4n@gmail.com> at
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=732394
Add helper for running code in separate namespaces
Implement virProcessRunInMountNamespace, which runs callback of type
virProcessNamespaceCallback in a container namespace. This uses a
child process to run the callback, since you can't change the mount
namespace of a thread. This implies that callbacks have to be careful
about what code they run due to async safety rules.
Idea by Dan Berrange, based on an initial report by Reco
<recoverym4n@gmail.com> at
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=732394
Signed-off-by: Daniel Berrange <berrange@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
(cherry picked from commit 7c72ef6f555f1f9844d51be2f38f078bc908652c)
Add a helper function which takes a file path and ensures
that all directory components leading up to the file exist.
IOW, it strips the filename part of the path and passes
the result to virFileMakePath.
Move check for cgroup devices ACL upfront in LXC hotplug
The check for whether the cgroup devices ACL is available is
done quite late during LXC hotplug - in fact after the device
node is already created in the container in some cases. Better
to do it upfront so we fail immediately.
Fix reset of cgroup when detaching USB device from LXC guests
When detaching a USB device from an LXC guest we must remove
the device from the cgroup ACL. Unfortunately we were telling
the cgroup code to use the guest /dev path, not the host /dev
path, and the guest device node had already been unlinked.
This was, however, fortunate since the code passed &priv->cgroup
instead of priv->cgroup, so would have crash if the device node
were accessible.
The LXC code missed the 'usb' component out of the path
/dev/bus/usb/$BUSNUM/$DEVNUM, so it failed to actually
setup cgroups for the device. This was in fact lucky
because the call to virLXCSetupHostUsbDeviceCgroup
was also mistakenly passing '&priv->cgroup' instead of
just 'priv->cgroup'. So once the path is fixed, libvirtd
would then crash trying to access the bogus virCgroupPtr
pointer. This would have been a security issue, were it
not for the bogus path preventing the pointer reference
being reached.
virDomainDefCompatibleDevice blocks use of USB if no USB
controller is present. This is not correct for containers
since devices can be assigned directly regardless of any
controllers.
Eric Blake [Tue, 5 Nov 2013 17:30:56 +0000 (10:30 -0700)]
storage: avoid short reads while chasing backing chain
Our backing file chain code was not very robust to an ill-timed
EINTR, which could lead to a short read causing us to randomly
treat metadata differently than usual. But the existing
virFileReadLimFD forces an error if we don't read the entire
file, even though we only care about the header of the file.
So add a new virFile function that does what we want.
* src/util/virfile.h (virFileReadHeaderFD): New prototype.
* src/util/virfile.c (virFileReadHeaderFD): New function.
* src/libvirt_private.syms (virfile.h): Export it.
* src/util/virstoragefile.c (virStorageFileGetMetadataInternal)
(virStorageFileProbeFormatFromFD): Use it.
Commit f9f56340 for CVE-2014-0028 almost had the right idea - we
need to check the ACL rules to filter which events to send. But
it overlooked one thing: the event dispatch queue is running in
the main loop thread, and therefore does not normally have a
current virIdentityPtr. But filter checks can be based on current
identity, so when libvirtd.conf contains access_drivers=["polkit"],
we ended up rejecting access for EVERY event due to failure to
look up the current identity, even if it should have been allowed.
Furthermore, even for events that are triggered by API calls, it
is important to remember that the point of events is that they can
be copied across multiple connections, which may have separate
identities and permissions. So even if events were dispatched
from a context where we have an identity, we must change to the
correct identity of the connection that will be receiving the
event, rather than basing a decision on the context that triggered
the event, when deciding whether to filter an event to a
particular connection.
If there were an easy way to get from virConnectPtr to the
appropriate virIdentityPtr, then object_event.c could adjust the
identity prior to checking whether to dispatch an event. But
setting up that back-reference is a bit invasive. Instead, it
is easier to delay the filtering check until lower down the
stack, at the point where we have direct access to the RPC
client object that owns an identity. As such, this patch ends
up reverting a large portion of the framework of commit f9f56340.
We also have to teach 'make check' to special-case the fact that
the event registration filtering is done at the point of dispatch,
rather than the point of registration. Note that even though we
don't actually use virConnectDomainEventRegisterCheckACL (because
the RegisterAny variant is sufficient), we still generate the
function for the purposes of documenting that the filtering
takes place.
Also note that I did not entirely delete the notion of a filter
from object_event.c; I still plan on using that for my upcoming
patch series for qemu monitor events in libvirt-qemu.so. In
other words, while this patch changes ACL filtering to live in
remote.c and therefore we have no current client of the filtering
in object_event.c, the notion of filtering in object_event.c is
still useful down the road.
* src/check-aclrules.pl: Exempt event registration from having to
pass checkACL filter down call stack.
* daemon/remote.c (remoteRelayDomainEventCheckACL)
(remoteRelayNetworkEventCheckACL): New functions.
(remoteRelay*Event*): Use new functions.
* src/conf/domain_event.h (virDomainEventStateRegister)
(virDomainEventStateRegisterID): Drop unused parameter.
* src/conf/network_event.h (virNetworkEventStateRegisterID):
Likewise.
* src/conf/domain_event.c (virDomainEventFilter): Delete unused
function.
* src/conf/network_event.c (virNetworkEventFilter): Likewise.
* src/libxl/libxl_driver.c: Adjust caller.
* src/lxc/lxc_driver.c: Likewise.
* src/network/bridge_driver.c: Likewise.
* src/qemu/qemu_driver.c: Likewise.
* src/remote/remote_driver.c: Likewise.
* src/test/test_driver.c: Likewise.
* src/uml/uml_driver.c: Likewise.
* src/vbox/vbox_tmpl.c: Likewise.
* src/xen/xen_driver.c: Likewise.
The NWFilter code has as a deadlock race condition between
the virNWFilter{Define,Undefine} APIs and starting of guest
VMs due to mis-matched lock ordering.
In the virNWFilter{Define,Undefine} codepaths the lock ordering
is
This has the effect of serializing VM startup once again, even if
no nwfilters are applied to the guest. There is also the possibility
of deadlock due to a call graph loop via virNWFilterInstantiate
and virNWFilterInstantiateFilterLate.
These two problems mean the lock must be turned into a read/write
lock instead of a plain mutex at the same time. The lock is used to
serialize changes to the "driver->nwfilters" hash, so the write lock
only needs to be held by the define/undefine methods. All other
methods can rely on a read lock which allows good concurrency.
Eric Blake [Tue, 14 Jan 2014 17:29:34 +0000 (10:29 -0700)]
event: filter global events by domain:getattr ACL [CVE-2014-0028]
Ever since ACL filtering was added in commit 7639736 (v1.1.1), a
user could still use event registration to obtain access to a
domain that they could not normally access via virDomainLookup*
or virConnectListAllDomains and friends. We already have the
framework in the RPC generator for creating the filter, and
previous cleanup patches got us to the point that we can now
wire the filter through the entire object event stack.
Furthermore, whether or not domain:getattr is honored, use of
global events is a form of obtaining a list of networks, which
is covered by connect:search_domains added in a93cd08 (v1.1.0).
Ideally, we'd have a way to enforce connect:search_domains when
doing global registrations while omitting that check on a
per-domain registration. But this patch just unconditionally
requires connect:search_domains, even when no list could be
obtained, based on the following observations:
1. Administrators are unlikely to grant domain:getattr for one
or all domains while still denying connect:search_domains - a
user that is able to manage domains will want to be able to
manage them efficiently, but efficient management includes being
able to list the domains they can access. The idea of denying
connect:search_domains while still granting access to individual
domains is therefore not adding any real security, but just
serves as a layer of obscurity to annoy the end user.
2. In the current implementation, domain events are filtered
on the client; the server has no idea if a domain filter was
requested, and must therefore assume that all domain event
requests are global. Even if we fix the RPC protocol to
allow for server-side filtering for newer client/server combos,
making the connect:serach_domains ACL check conditional on
whether the domain argument was NULL won't benefit older clients.
Therefore, we choose to document that connect:search_domains
is a pre-requisite to any domain event management.
Network events need the same treatment, with the obvious
change of using connect:search_networks and network:getattr.
Eric Blake [Tue, 14 Jan 2014 00:30:59 +0000 (17:30 -0700)]
Fix memory leak in virObjectEventCallbackListRemoveID()
While running objecteventtest, it was found that valgrind pointed out the
following memory leak:
==13464== 5 bytes in 1 blocks are definitely lost in loss record 7 of 134
==13464== at 0x4A0887C: malloc (vg_replace_malloc.c:270)
==13464== by 0x341F485E21: strdup (strdup.c:42)
==13464== by 0x4CAE28F: virStrdup (virstring.c:554)
==13464== by 0x4CF3CBE: virObjectEventCallbackListAddID (object_event.c:286)
==13464== by 0x4CF49CA: virObjectEventStateRegisterID (object_event.c:729)
==13464== by 0x4CF73FE: virDomainEventStateRegisterID (domain_event.c:1424)
==13464== by 0x4D7358F: testConnectDomainEventRegisterAny (test_driver.c:6032)
==13464== by 0x4D600C8: virConnectDomainEventRegisterAny (libvirt.c:19128)
==13464== by 0x402409: testDomainStartStopEvent (objecteventtest.c:232)
==13464== by 0x403451: virtTestRun (testutils.c:138)
==13464== by 0x402012: mymain (objecteventtest.c:395)
==13464== by 0x403AF2: virtTestMain (testutils.c:593)
==13464==
The @list->callbacks is an array that is inflated whenever a new event
is added, e.g. via virDomainEventCallbackListAddID(). However, when we
are freeing the array, we free the items within it but forgot to
actually free it.
When writing commit 173c291, I missed the fact virNetServerClientClose
unlocks the client object before actually clearing client->sock and thus
it is possible to hit a window when client->keepalive is NULL while
client->sock is not NULL. I was thinking client->sock == NULL was a
better check for a closed connection but apparently we have to go with
client->keepalive == NULL to actually fix the crash.
When a client closes its connection to libvirtd early during
virConnectOpen, more specifically just after making
REMOTE_PROC_CONNECT_SUPPORTS_FEATURE call to check if
VIR_DRV_FEATURE_PROGRAM_KEEPALIVE is supported without even waiting for
the result, libvirtd may crash due to a race in keep-alive
initialization. Once receiving the REMOTE_PROC_CONNECT_SUPPORTS_FEATURE
call, the daemon's event loop delegates it to a worker thread. In case
the event loop detects EOF on the connection and calls
virNetServerClientClose before the worker thread starts to handle
REMOTE_PROC_CONNECT_SUPPORTS_FEATURE call, client->keepalive will be
disposed by the time virNetServerClientStartKeepAlive gets called from
remoteDispatchConnectSupportsFeature. Because the flow is common for
both authenticated and read-only connections, even unprivileged clients
may cause the daemon to crash.
To avoid the crash, virNetServerClientStartKeepAlive needs to check if
the connection is still open before starting keep-alive protocol.
Every libvirt release since 0.9.8 is affected by this bug.
Jiri Denemark [Fri, 20 Dec 2013 13:50:02 +0000 (14:50 +0100)]
qemu: Avoid using stale data in virDomainGetBlockInfo
CVE-2013-6458
Generally, every API that is going to begin a job should do that before
fetching data from vm->def. However, qemuDomainGetBlockInfo does not
know whether it will have to start a job or not before checking vm->def.
To avoid using disk alias that might have been freed while we were
waiting for a job, we use its copy. In case the disk was removed in the
meantime, we will fail with "cannot find statistics for device '...'"
error message.
When virDomainDetachDeviceFlags is called concurrently to
virDomainBlockStats: libvirtd may crash because qemuDomainBlockStats
finds a disk in vm->def before getting a job on a domain and uses the
disk pointer after getting the job. However, the domain in unlocked
while waiting on a job condition and thus data behind the disk pointer
may disappear. This happens when thread 1 runs
virDomainDetachDeviceFlags and enters monitor to actually remove the
disk. Then another thread starts running virDomainBlockStats, finds the
disk in vm->def, and while it's waiting on the job condition (owned by
the first thread), the first thread finishes the disk removal. When the
second thread gets the job, the memory pointed to be the disk pointer is
already gone.
That said, every API that is going to begin a job should do that before
fetching data from vm->def.
Dario Faggioli [Fri, 20 Dec 2013 15:29:47 +0000 (16:29 +0100)]
libxl: avoid crashing if calling `virsh numatune' on inactive domain
by, in libxlDomainGetNumaParameters(), calling libxl_bitmap_init() as soon as
possible, which avoids getting to 'cleanup:', where libxl_bitmap_dispose()
happens, without having initialized the nodemap, and hence crashing after some
invalid free()-s:
Signed-off-by: Dario Faggili <dario.faggioli@citrix.com> Cc: Jim Fehlig <jfehlig@suse.com> Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
(cherry picked from commit f9ee91d35510ccbc6fc42cef8864b291b2d220f4)
The function doesn't check whether the request is made for active or
inactive domain. Thus when the domain is not running it still tries
accessing non-existing cgroups (priv->cgroup, which is NULL).
I re-made the function in order for it to work the same way it's qemu
counterpart does.
Reproducer:
1) Define an LXC domain
2) Do 'virsh memtune <domain> --hard-limit 133T'
Backtrace:
Thread 6 (Thread 0x7fffec8c0700 (LWP 26826)):
#0 0x00007ffff70edcc4 in virCgroupPathOfController (group=0x0, controller=3,
key=0x7ffff75734bd "memory.limit_in_bytes", path=0x7fffec8bf718) at util/vircgroup.c:1764
#1 0x00007ffff70e9206 in virCgroupSetValueStr (group=0x0, controller=3,
key=0x7ffff75734bd "memory.limit_in_bytes", value=0x7fffe409f360 "1073741824")
at util/vircgroup.c:669
#2 0x00007ffff70e98b4 in virCgroupSetValueU64 (group=0x0, controller=3,
key=0x7ffff75734bd "memory.limit_in_bytes", value=1073741824) at util/vircgroup.c:740
#3 0x00007ffff70ee518 in virCgroupSetMemory (group=0x0, kb=1048576) at util/vircgroup.c:1904
#4 0x00007ffff70ee675 in virCgroupSetMemoryHardLimit (group=0x0, kb=1048576)
at util/vircgroup.c:1944
#5 0x00005555557d54c8 in lxcDomainSetMemoryParameters (dom=0x7fffe40cc420,
params=0x7fffe409f100, nparams=1, flags=0) at lxc/lxc_driver.c:774
#6 0x00007ffff72c20f9 in virDomainSetMemoryParameters (domain=0x7fffe40cc420,
params=0x7fffe409f100, nparams=1, flags=0) at libvirt.c:4051
#7 0x000055555561365f in remoteDispatchDomainSetMemoryParameters (server=0x555555eb7e00,
client=0x555555ec4b10, msg=0x555555eb94e0, rerr=0x7fffec8bfb70, args=0x7fffe40b8510)
at remote_dispatch.h:7621
#8 0x00005555556133fd in remoteDispatchDomainSetMemoryParametersHelper (server=0x555555eb7e00,
client=0x555555ec4b10, msg=0x555555eb94e0, rerr=0x7fffec8bfb70, args=0x7fffe40b8510,
ret=0x7fffe40b84f0) at remote_dispatch.h:7591
#9 0x00007ffff73b293f in virNetServerProgramDispatchCall (prog=0x555555ec3ae0,
server=0x555555eb7e00, client=0x555555ec4b10, msg=0x555555eb94e0)
at rpc/virnetserverprogram.c:435
#10 0x00007ffff73b207f in virNetServerProgramDispatch (prog=0x555555ec3ae0,
server=0x555555eb7e00, client=0x555555ec4b10, msg=0x555555eb94e0)
at rpc/virnetserverprogram.c:305
#11 0x00007ffff73a4d2c in virNetServerProcessMsg (srv=0x555555eb7e00, client=0x555555ec4b10,
prog=0x555555ec3ae0, msg=0x555555eb94e0) at rpc/virnetserver.c:165
#12 0x00007ffff73a4e8d in virNetServerHandleJob (jobOpaque=0x555555ec3e30, opaque=0x555555eb7e00)
at rpc/virnetserver.c:186
#13 0x00007ffff7187f3f in virThreadPoolWorker (opaque=0x555555eb7ac0) at util/virthreadpool.c:144
#14 0x00007ffff718733a in virThreadHelper (data=0x555555eb7890) at util/virthreadpthread.c:161
#15 0x00007ffff468ed89 in start_thread (arg=0x7fffec8c0700) at pthread_create.c:308
#16 0x00007ffff3da26bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
CVE-2013-6436: fix crash in lxcDomainGetMemoryParameters
The function doesn't check whether the request is made for active or
inactive domain. Thus when the domain is not running it still tries
accessing non-existing cgroups (priv->cgroup, which is NULL).
I re-made the function in order for it to work the same way it's qemu
counterpart does.
Reproducer:
1) Define an LXC domain
2) Do 'virsh memtune <domain>'
Backtrace:
Thread 6 (Thread 0x7fffec8c0700 (LWP 13387)):
#0 0x00007ffff70edcc4 in virCgroupPathOfController (group=0x0, controller=3,
key=0x7ffff75734bd "memory.limit_in_bytes", path=0x7fffec8bf750) at util/vircgroup.c:1764
#1 0x00007ffff70e958c in virCgroupGetValueStr (group=0x0, controller=3,
key=0x7ffff75734bd "memory.limit_in_bytes", value=0x7fffec8bf7c0) at util/vircgroup.c:705
#2 0x00007ffff70e9d29 in virCgroupGetValueU64 (group=0x0, controller=3,
key=0x7ffff75734bd "memory.limit_in_bytes", value=0x7fffec8bf810) at util/vircgroup.c:804
#3 0x00007ffff70ee706 in virCgroupGetMemoryHardLimit (group=0x0, kb=0x7fffec8bf8a8)
at util/vircgroup.c:1962
#4 0x00005555557d590f in lxcDomainGetMemoryParameters (dom=0x7fffd40024a0,
params=0x7fffd40027a0, nparams=0x7fffec8bfa24, flags=0) at lxc/lxc_driver.c:826
#5 0x00007ffff72c28d3 in virDomainGetMemoryParameters (domain=0x7fffd40024a0,
params=0x7fffd40027a0, nparams=0x7fffec8bfa24, flags=0) at libvirt.c:4137
#6 0x000055555563714d in remoteDispatchDomainGetMemoryParameters (server=0x555555eb7e00,
client=0x555555ebaef0, msg=0x555555ebb3e0, rerr=0x7fffec8bfb70, args=0x7fffd40024e0,
ret=0x7fffd4002420) at remote.c:1895
#7 0x00005555556052c4 in remoteDispatchDomainGetMemoryParametersHelper (server=0x555555eb7e00,
client=0x555555ebaef0, msg=0x555555ebb3e0, rerr=0x7fffec8bfb70, args=0x7fffd40024e0,
ret=0x7fffd4002420) at remote_dispatch.h:4050
#8 0x00007ffff73b293f in virNetServerProgramDispatchCall (prog=0x555555ec3ae0,
server=0x555555eb7e00, client=0x555555ebaef0, msg=0x555555ebb3e0)
at rpc/virnetserverprogram.c:435
#9 0x00007ffff73b207f in virNetServerProgramDispatch (prog=0x555555ec3ae0,
server=0x555555eb7e00, client=0x555555ebaef0, msg=0x555555ebb3e0)
at rpc/virnetserverprogram.c:305
#10 0x00007ffff73a4d2c in virNetServerProcessMsg (srv=0x555555eb7e00, client=0x555555ebaef0,
prog=0x555555ec3ae0, msg=0x555555ebb3e0) at rpc/virnetserver.c:165
#11 0x00007ffff73a4e8d in virNetServerHandleJob (jobOpaque=0x555555ebc7e0, opaque=0x555555eb7e00)
at rpc/virnetserver.c:186
#12 0x00007ffff7187f3f in virThreadPoolWorker (opaque=0x555555eb7ac0) at util/virthreadpool.c:144
#13 0x00007ffff718733a in virThreadHelper (data=0x555555eb7890) at util/virthreadpthread.c:161
#14 0x00007ffff468ed89 in start_thread (arg=0x7fffec8c0700) at pthread_create.c:308
#15 0x00007ffff3da26bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
Ryota Ozaki [Thu, 24 Oct 2013 15:48:25 +0000 (00:48 +0900)]
virnetsocket: fix getsockopt on FreeBSD
aa0f099 introduced a strict error checking for getsockopt and it
revealed that getting a peer credential of a socket on FreeBSD
didn't work. Libvirtd hits the error:
error : virNetSocketGetUNIXIdentity:1198 : Failed to get valid
client socket identity groups
SOL_SOCKET (0xffff) was used as a level of getsockopt for
LOCAL_PEERCRED, however, it was wrong. 0 is correct as well as
Mac OS X.
So for LOCAL_PEERCRED our options are SOL_LOCAL (if defined) or
0 on Mac OS X and FreeBSD. According to the fact, the patch
simplifies the code by removing ifdef __APPLE__.
I tested the patch on FreeBSD 8.4, 9.2 and 10.0-BETA1.
For reference, Linux systems typically define it as:
typedef bool_t (*xdrproc_t)(XDR *, void *, ...);
The rationale explained in the header is that using a vararg is
incorrect and has a potential to change the ABI slightly do to compiler
optimizations taken and the undefined behavior. They decided
to specify the exact number of parameters and for compatibility with old
code decided to make the signature require 3 arguments. The third
argument is ignored for cases that its not used and its recommended to
supply a 0.
libxl: fix dubious cpumask handling in libxlDomainSetVcpuAffinities
Rather than casting the virBitmap pointer to uint8_t* and then using
the structure contents as a byte array, use the virBitmap API to determine
the bitmap size and test each bit.
Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Jim Fehlig [Fri, 1 Nov 2013 14:22:18 +0000 (08:22 -0600)]
Revert "libxl: Fix possible invalid read"
This reverts commit 394d6e0a95ac91facba06ab43d67ae8600b14b0e.
The real problem is accessing the virtBitmap structure as a byte
array, which was correctly identified and fixed by Jeremy Fitzhardinge
in recently xen commit: 7051d5c8, there is a api changes in
libxl_domain_create_restore.
Author: Andrew Cooper <andrew.cooper3@citrix.com>
Date: Thu Oct 10 12:23:10 2013 +0100
tools/migrate: Fix regression when migrating from older version of Xen
use the macro LIBXL_HAVE_DOMAIN_CREATE_RESTORE_PARAMS in libxl.h
in order to make libvirt could compile with old and new xen.
the params checkpointed_stream is useful if libvirt libxl driver
support migration. for new, set it as zero.
When starting a transient VM the first thing done is to check
for duplicates. The check looks if there are any running VMs
with the matching name/uuid. It explicitly allows there to
be inactive VMs, so that a persistent VM can be temporarily
booted with a different config.
There is a race condition, however, where 2 or more clients
try to create the same transient VM. The first client will
cause a virDomainObjPtr to be added to the domain list, and
it is inactive at this stage. The second client may then
come along and see this inactive VM, and mistake it for a
persistent VM.
If the first VM fails to start its transient guest for any
reason, then it'll remove the virDomainObjPtr from the list.
The second client now has a virDomainObjPtr that it can try
to boot, which libvirt no longer has a record of. The result
can be a running QEMU process that is orphaned.
It was also, however, possible for the virDomainObjPtr to be
completely free'd which will cause libvirtd to crash in some
scenarios.
The fix is to only allow an existing inactive VM if it is
marked as persistent.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Ryota Ozaki [Thu, 31 Oct 2013 15:45:12 +0000 (00:45 +0900)]
nodedev_hal: fix segfault when virDBusGetSystemBus fails
Thie patch fixes the segfault:
error : nodeStateInitialize:658 : DBus not available,
disabling HAL driver: internal error: Unable to get DBus
system bus connection: Failed to connect to socket
/var/run/dbus/system_bus_socket: No such file or directory
error : nodeStateInitialize:719 : ?:
Caught Segmentation violation dumping internal log buffer:
This segfault occurs at the below VIR_ERROR:
failure:
if (dbus_error_is_set(&err)) {
VIR_ERROR(_("%s: %s"), err.name, err.message);
When virDBusGetSystemBus fails, the code jumps to the above failure
path. However, the err variable is not correctly initialized
before calling virDBusGetSystemBus. As a result, dbus_error_is_set
may pass over the uninitialized err variable whose name or
message may point to somewhere unknown memory region, which
causes a segfault on VIR_ERROR.
The new code initializes the err variable before calling
virDBusGetSystemBus.
Include reference of the VM object pointer and name in debug
logs for QEMU start/stop functions. Also make sure we log the
PID that we started, since it isn't available elsewhere in the
logs.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
In debugging a recent oVirt/libvirt race condition, I was very
frustrated by lack of logging in the job enter/exit code. This
patch adds some key data which would have been useful in by
debugging attempts.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Eric Blake [Wed, 30 Oct 2013 21:42:31 +0000 (15:42 -0600)]
storage: use correct type for array count
Using size_t counts will let us use VIR_APPEND_ELEMENT and friends.
* src/conf/storage_conf.h (_virStoragePoolObjList)
(_virStorageVolDefList): Track list sizes with size_t.
* src/storage/storage_backend_rbd.c
(virStorageBackendRBDRefreshPool): Fix type fallout.
Eric Blake [Tue, 29 Oct 2013 19:57:19 +0000 (13:57 -0600)]
maint: avoid further typedef accidents
To make it easier to forbid future attempts at a confusing typedef
name ending in Ptr that isn't actually a pointer, insist that we
follow our preferred style of 'typedef foo *fooPtr'.
Fix race condition reconnecting to vms & loading configs
The following sequence
1. Define a persistent QMEU guest
2. Start the QEMU guest
3. Stop libvirtd
4. Kill the QEMU process
5. Start libvirtd
6. List persistent guests
At the last step, the previously running persistent guest
will be missing. This is because of a race condition in the
QEMU driver startup code. It does
1. Load all VM state files
2. Spawn thread to reconnect to each VM
3. Load all VM config files
Only at the end of step 3, does the 'virDomainObjPtr' get
marked as "persistent". There is therefore a window where
the thread reconnecting to the VM will remove the persistent
VM from the list.
The easy fix is to simply switch the order of steps 2 & 3.
In addition to this though, we must only attempt to reconnect
to a VM which had a non-zero PID loaded from its state file.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Fix leak of objects when reconnecting to QEMU instances
The 'error' cleanup block in qemuProcessReconnect() had a
'return' statement in the middle of it. This caused a leak
of virConnectPtr & virQEMUDriverConfigPtr instances. This
was identified because netcf recently started checking its
refcount in libvirtd shutdown:
netcfStateCleanup:109 : internal error: Attempt to close netcf state driver with open connections
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Doug Goldstein [Mon, 28 Oct 2013 18:20:35 +0000 (13:20 -0500)]
MacOS: Re-add support for QEMU backend
The QEMU backend was disabled on Mac OS X without a reason in the code
and due to refactors its difficult to understand when/why it was
disabled. With QEMU being supported on Mac OS X there is no reason to
disable QEMU on this platform.
John Ferlan [Tue, 22 Oct 2013 07:50:08 +0000 (08:50 +0100)]
Add '+' to uid/gid printing for label processing
To ensure proper processing by virGetUserID() and virGetGroupID()
of a uid/gid add a "+" prior to the uid/gid to denote it's really
a uid/gid for the label.
Eric Blake [Tue, 29 Oct 2013 15:56:48 +0000 (09:56 -0600)]
storage: fix incorrect typedef
The rbd code had a confusing typedef ending in Ptr that was not
actually a pointer, which made the rest of the code harder to
read. This fixes things to actually pass by pointer rather than
by copy.
Push RPM deps down into libvirt-daemon-driver-XXXX sub-RPMs
For inexplicable reasons, many of the 3rd party package deps
were left against the 'libvirt-daemon' RPM when the drivers
were split out. This makes a minimal install heavier that
it should be. Push them all down into libvirt-daemon-driver-XXX
so they're only pulled in when truly needed
With this change applied, a minimal install of just the
libvirt-daemon-driver-lxc RPM is reduced by 41 MB on a
Fedora 19 host.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
capabilities: add baselabel per sec driver/virt type to secmodel
Expand the "secmodel" XML fragment of "host" with a sequence of
baselabel's which describe the default security context used by
libvirt with a specific security model and virtualization type:
Chen Hanxiao [Mon, 28 Oct 2013 11:18:26 +0000 (11:18 +0000)]
Skip debug message in lxcContainerSetID if no map is set.
The lxcContainerSetID() method prints a misleading log
message about setting the uid/gid when no ID map is
present in the XML config. Skip the debug message in
this case.
John Ferlan [Fri, 18 Oct 2013 10:43:47 +0000 (06:43 -0400)]
Avoid Coverity DEADCODE warning
Commit '922b7fda' resulted in two DEADCODE warnings from Coverity in
remoteDispatchAuthPolkit and virAccessDriverPolkitFormatProcess.
Commit '604ae657' modified the daemon.c code to remove the deadcode
issue, but did not do so for viracessdriverpolkit.c. This just mimics
the same changes
Commit e962a57 added 'attach-disk --shareable', even though we
already had 'attach-disk --mode=shareable'. Worse, if the user
types 'attach-disk --mode=readonly --shareable', we create
non-sensical XML. The best solution is just to undocument the
duplicate spelling, by having it fall back to the preferred
spelling.
* tools/virsh-domain.c (cmdAttachDisk): Let alias handling fix our
mistake in exposing a second spelling for an existing option.
* tools/virsh.pod: Fix documentation.
Eric Blake [Thu, 24 Oct 2013 07:06:29 +0000 (08:06 +0100)]
virsh: allow alias to expand to opt=value pair
We want to treat 'attach-disk --shareable' as an undocumented
alias for 'attach-disk --mode=shareable'. By improving our
alias handling, we can allow all such --bool -> --opt=value
replacements, and guarantee up front that the alias is not
mixed with its replacement.
* tools/virsh.c (vshCmddefOptParse, vshCmddefGetOption): Add
support for expanding bool alias to --opt=value.
(opts_echo): Add another alias to test it.
* tests/virshtest.c (mymain): Test it.
According to the following valgrind output, there seems to be a
invalid limit for the iterator (captured on Fedora 19):
==3945== Invalid read of size 1
==3945== at 0x1E1FA410: libxlVmStart (libxl_driver.c:475)
==3945== by 0x1E1FAD9A: libxlDomainCreateWithFlags (libxl_driver.c:2633)
==3945== by 0x5187D46: virDomainCreate (libvirt.c:9439)
==3945== by 0x13BAA6: remoteDispatchDomainCreateHelper (remote_dispatch.h:2910)
==3945== by 0x51DE5B9: virNetServerProgramDispatch (virnetserverprogram.c:435)
==3945== by 0x51D93E7: virNetServerHandleJob (virnetserver.c:165)
==3945== by 0x50F5BF4: virThreadPoolWorker (virthreadpool.c:144)
==3945== by 0x50F5670: virThreadHelper (virthreadpthread.c:161)
==3945== by 0x8046C52: start_thread (pthread_create.c:308)
==3945== by 0x8758E1C: clone (clone.S:113)
==3945== Address 0x23424d81 is 0 bytes after a block of size 1 alloc'd
==3945== at 0x4A08121: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==3945== by 0x50B1F8C: virAllocN (viralloc.c:189)
==3945== by 0x1E1FA3CA: libxlVmStart (libxl_driver.c:468)
==3945== by 0x1E1FAD9A: libxlDomainCreateWithFlags (libxl_driver.c:2633)
==3945== by 0x5187D46: virDomainCreate (libvirt.c:9439)
==3945== by 0x13BAA6: remoteDispatchDomainCreateHelper (remote_dispatch.h:2910)
==3945== by 0x51DE5B9: virNetServerProgramDispatch (virnetserverprogram.c:435)
==3945== by 0x51D93E7: virNetServerHandleJob (virnetserver.c:165)
==3945== by 0x50F5BF4: virThreadPoolWorker (virthreadpool.c:144)
==3945== by 0x50F5670: virThreadHelper (virthreadpthread.c:161)
==3945== by 0x8046C52: start_thread (pthread_create.c:308)
==3945== by 0x8758E1C: clone (clone.S:113)
==3945==
Related: https://bugzilla.redhat.com/show_bug.cgi?id=1013045 Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
build: Fix prohibit_int_ijk (and iijjkk) on RHEL 5
On RHEL 5, make syntax-check was failing because even strings like
'int isTempChain' matched the 'int i' rule. To be honest, I haven't
found the root cause, but the change added makes it work as expected
and keeps the proper behavior on newer systems as well.
Marian Neagul [Tue, 22 Oct 2013 15:03:39 +0000 (16:03 +0100)]
python: Fix Create*WithFiles filefd passing
Commit d76227be added functions virDomainCreateWithFiles and
virDomainCreateXMLWithFiles, but there was a little piece missing in
python bindings. This patch fixes proper passing of file descriptors
in the overwrites of these functions.
Hongwei Bi [Tue, 22 Oct 2013 13:38:01 +0000 (21:38 +0800)]
networkStartDhcpDaemon: Check for dnsmasqCapsRefresh failure
Currently, we ignore whether dnsmasqCapsRefresh succeeds or fails. We
shouldn't do that as we may generate wrong dnsmasq command line (what
is done just a few lines below).
Signed-off-by: Hongwei Bi <hwbi2008@gmail.com> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Doug Goldstein [Sat, 12 Oct 2013 17:06:27 +0000 (12:06 -0500)]
rpc: Retrieve peer PID via new getsockopt() for Mac
While LOCAL_PEERCRED on the BSDs does not return the pid information of
the peer, Mac OS X 10.8 added LOCAL_PEERPID to retrieve the pid so we
should use that when its available to get that information.
Jim Fehlig [Tue, 22 Oct 2013 05:12:22 +0000 (23:12 -0600)]
build: fix build of virt-login-shell on systems with older gnutls
On systems where gnutls uses libgcrypt, I'm seeing the following
build failure
libvirt.c:314: error: variable 'virTLSThreadImpl' has initializer but incomplete type
libvirt.c:319: error: 'GCRY_THREAD_OPTION_PTHREAD' undeclared here (not in a function)
...
Fix by undefining WITH_GNUTLS_GCRYPT in config-post.h
Michal Privoznik [Tue, 22 Oct 2013 09:33:06 +0000 (10:33 +0100)]
Get rid of shadowed booleans
There are still two places where we are using 1bit width unsigned
integer to store a boolean. There's no real need for this and these
occurrences can be replaced with 'bool'.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Jim Fehlig [Mon, 21 Oct 2013 21:36:11 +0000 (15:36 -0600)]
build: fix linking virt-login-shell
After commit 3e2f27e1, I've noticed build failures of virt-login-shell
when libapparmor-devel is installed on the build host
CCLD virt-login-shell
../src/.libs/libvirt-setuid-rpc-client.a(libvirt_setuid_rpc_client_la-vircommand.o):
In function `virExec':
/home/jfehlig/virt/upstream/libvirt/src/util/vircommand.c:653: undefined
reference to `aa_change_profile'
collect2: error: ld returned 1 exit status
I was about to commit an easy fix under the build-breaker rule
(build-fix-1.patch), but thought to extend the notion of SECDRIVER_LIBS
to SECDRIVER_CFLAGS, and use both throughout src/Makefile.am where it
makes sense (build-fix-2.patch).
Should I just stick with the simple fix, or is something along the lines
of patch 2 preferred?
Regards,
Jim
>From a0f35945f3127ab70d051101037e821b1759b4bb Mon Sep 17 00:00:00 2001
From: Jim Fehlig <jfehlig@suse.com>
Date: Mon, 21 Oct 2013 15:30:02 -0600
Subject: [PATCH] build: fix virt-login-shell build with apparmor
With libapparmor-devel installed, virt-login-shell fails to link
CCLD virt-login-shell
../src/.libs/libvirt-setuid-rpc-client.a(libvirt_setuid_rpc_client_la-vircommand.o): In function `virExec':
/home/jfehlig/virt/upstream/libvirt/src/util/vircommand.c:653: undefined reference to `aa_change_profile'
collect2: error: ld returned 1 exit status
Fix by linking libvirt_setuid_rpc_client with previously determined
SECDRIVER_LIBS in src/Makefile.am. While at it, introduce SECDRIVER_CFLAGS
and use both throughout src/Makefile.am where it makes sense.
Michal Privoznik [Tue, 22 Oct 2013 12:15:21 +0000 (13:15 +0100)]
vircgroupmock: Mock access() to some more files
Currently, if access(path, mode) is invoked, we check if @path has this
special prefix SYSFS_PREFIX. If it does, we modify the path a bit and
call realaccess. If it doesn't we act just like a wrapper and call
realaccess directly. However, we are mocking fopen() as well. And as one
can clearly see there, fopen("/proc/cgroups") will succeed. Hence, we
have an error in our mocked access(): We need to check whether @path is
not equal to /proc/cgroups as it may not exists on real system we're
running however we definitely know how to fopen() it.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Michal Privoznik [Tue, 22 Oct 2013 11:29:13 +0000 (12:29 +0100)]
tests: Use lv_abs_top_builddir instead of bare abs_top_builddir
As stated in the comment above introduction of the lv_abs_top_builddir
variable, older automake doesn't provide abs_top_builddir variable.
Hence, we are creating our own one with lv_ prefix. However, when
exporting env variables to the tests, the variables are not evaluated
but only substituted. Hence:
Peter Krempa [Tue, 22 Oct 2013 14:01:26 +0000 (15:01 +0100)]
virsh: Fix job watching when STDIN is not a tty
In commit b46c4787dde79b015dad67dedda4ccf6ff1a3082 I changed the code to
watch long running jobs in virsh. Unfortunately I didn't take into
account that poll may get a hangup if the terminal is not a TTY and will
be closed.
This patch avoids polling the STDIN fd when there's no TTY.
Ryota Ozaki [Sun, 20 Oct 2013 15:14:52 +0000 (00:14 +0900)]
nodeinfo: fix physical memory size on Mac OS X
HW_PHYSMEM is available on Mac OS X as well as FreeBSD, however,
its resulting value for Mac OS X is 32 bits. Mac OS X provides
HW_MEMSIZE that is 64 bits version of HW_PHYSMEM. We have to use it.
I tested the patch on Mac OS X 10.6.8, 10.7.4, 10.8.5 and FreeBSD 9.2.
When libvirt was changed to delay the final cleanup of device removal
until the qemu process had signaled it with a DEVICE_DELETED event for
that device, the hostdev removal function
(qemuDomainRemoveHostDevice()) was written to properly handle the
removal of a hostdev that was actually an SRIOV virtual function
(defined with <interface type='hostdev'>). However, the function used
to search for a device matching the alias name provided in the
DEVICE_DELETED message (virDomainDefFindDevice()) would search through
the list of netdevs before hostdevs, so qemuDomainRemoveHostDevice()
was never called; instead the netdev function,
qemuDomainRemoveNetDevice() (which *doesn't* properly cleanup after
removal of <interface type='hostdev'>), was called.
(As a reminder - each <interface type='hostdev'> results in a
virDomainNetDef which contains a virDomainHostdevDef having a parent
type of VIR_DOMAIN_DEVICE_NET, and parent.data.net pointing back to
the virDomainNetDef; both Defs point to the same device info object
(and the info contains the device's "alias", which is used by qemu to
identify the device). The virDomainHostdevDef is added to the domain's
hostdevs list *and* the virDomainNetDef is added to the domain's nets
list, so searching either list for a particular alias will yield a
positive result.)
This function modifies the qemuDomainRemoveNetDevice() to short
circuit itself and call qemu DomainRemoveHostDevice() instead when the
actual device is a VIR_DOMAIN_NET_TYPE_HOSTDEV (similar logic to what
is done in the higher level qemuDomainDetachNetDevice())
Note that even if virDomainDefFindDevice() changes in the future so
that it finds the hostdev entry first, the current code will continue
to work properly.
This function was called in three places, and in each the call was
qualified by a slightly different conditional. In reality, this
function should only be called for a hostdev if all of the following
are true:
1) mode='subsystem'
2) type='pci'
3) there is a parent device definition which is an <interface>
(VIR_DOMAIN_DEVICE_NET)
We can simplify the callers and make them more consistent by checking
these conditions at the top ov qemuDomainHostdevNetConfigRestore and
returning 0 if one of them isn't satisfied.
The location of the call to qemuDomainHostdevNetConfigRestore() has
also been changed in the hot-plug case - it is moved into the caller
of its previous location (i.e. from qemuDomainRemovePCIHostDevice() to
qemuDomainRemoveHostDevice()). This was done to be more consistent
about which functions pay attention to whether or not this is one of
the special <interface> hostdevs or just a normal hostdev -
qemuDomainRemoveHostDevice() already contained a call to
networkReleaseActualDevice() and virDomainNetDefFree(), so it makes
sense for it to also handle the resetting of the device's MAC address
and vlan tag (which is what's done by
qemuDomainHostdevNetConfigRestore()).
Move virt-login-shell into libvirt-login-shell sub-RPM
Many people will not want the setuid virt-login-shell binary
installed by default, so move it into a separate sub-RPM
named libvirt-login-shell. This RPM is only generated if
LXC is enabled
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Most of the usage of getuid()/getgid() is in cases where we are
considering what privileges we have. As such the code should be
using the effective IDs, not real IDs.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
We already have stubs for getuid, geteuid, getgid but
not for getegid. Something in gnulib already does a
check for it during configure, so we already have the
HAVE_GETEGID macro defined.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Only allow the UNIX transport in remote driver when setuid
We don't know enough about quality of external libraries used
for non-UNIX transports, nor do we want to spawn external
commands when setuid. Restrict to the bare minimum which is
UNIX transport for local usage. Users shouldn't need to be
running setuid if connecting to remote hypervisors in any
case.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Unconditional use of getenv is not secure in setuid env.
While not all libvirt code runs in a setuid env (since
much of it only exists inside libvirtd) this is not always
clear to developers. So make all the code paranoid, even
if it only ever runs inside libvirtd.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When running setuid, we must be careful about what env vars
we allow commands to inherit from us. Replace the
virCommandAddEnvPass function with two new ones which do
filtering