CVE-2013-6456: Avoid unsafe use of /proc/$PID/root in LXC hotunplug code
Rewrite multiple hotunplug functions to to use the
virProcessRunInMountNamespace helper. This avoids
risk of a malicious guest replacing /dev with an absolute
symlink, tricking the driver into changing the host OS
filesystem.
CVE-2013-6456: Avoid unsafe use of /proc/$PID/root in LXC chardev hostdev hotplug
Rewrite lxcDomainAttachDeviceHostdevMiscLive function
to use the virProcessRunInMountNamespace helper. This avoids
risk of a malicious guest replacing /dev with a absolute
symlink, tricking the driver into changing the host OS
filesystem.
CVE-2013-6456: Avoid unsafe use of /proc/$PID/root in LXC block hostdev hotplug
Rewrite lxcDomainAttachDeviceHostdevStorageLive function
to use the virProcessRunInMountNamespace helper. This avoids
risk of a malicious guest replacing /dev with a absolute
symlink, tricking the driver into changing the host OS
filesystem.
CVE-2013-6456: Avoid unsafe use of /proc/$PID/root in LXC USB hotplug
Rewrite lxcDomainAttachDeviceHostdevSubsysUSBLive function
to use the virProcessRunInMountNamespace helper. This avoids
risk of a malicious guest replacing /dev with a absolute
symlink, tricking the driver into changing the host OS
filesystem.
CVE-2013-6456: Avoid unsafe use of /proc/$PID/root in LXC disk hotplug
Rewrite lxcDomainAttachDeviceDiskLive function to use the
virProcessRunInMountNamespace helper. This avoids risk of
a malicious guest replacing /dev with a absolute symlink,
tricking the driver into changing the host OS filesystem.
Eric Blake [Tue, 24 Dec 2013 05:55:51 +0000 (22:55 -0700)]
CVE-2013-6456: Avoid unsafe use of /proc/$PID/root in LXC shutdown/reboot code
Use helper virProcessRunInMountNamespace in lxcDomainShutdownFlags and
lxcDomainReboot. Otherwise, a malicious guest could use symlinks
to force the host to manipulate the wrong file in the host's namespace.
Idea by Dan Berrange, based on an initial report by Reco
<recoverym4n@gmail.com> at
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=732394
Add helper for running code in separate namespaces
Implement virProcessRunInMountNamespace, which runs callback of type
virProcessNamespaceCallback in a container namespace. This uses a
child process to run the callback, since you can't change the mount
namespace of a thread. This implies that callbacks have to be careful
about what code they run due to async safety rules.
Idea by Dan Berrange, based on an initial report by Reco
<recoverym4n@gmail.com> at
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=732394
Signed-off-by: Daniel Berrange <berrange@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
(cherry picked from commit 7c72ef6f555f1f9844d51be2f38f078bc908652c)
Add a helper function which takes a file path and ensures
that all directory components leading up to the file exist.
IOW, it strips the filename part of the path and passes
the result to virFileMakePath.
Move check for cgroup devices ACL upfront in LXC hotplug
The check for whether the cgroup devices ACL is available is
done quite late during LXC hotplug - in fact after the device
node is already created in the container in some cases. Better
to do it upfront so we fail immediately.
Fix reset of cgroup when detaching USB device from LXC guests
When detaching a USB device from an LXC guest we must remove
the device from the cgroup ACL. Unfortunately we were telling
the cgroup code to use the guest /dev path, not the host /dev
path, and the guest device node had already been unlinked.
This was, however, fortunate since the code passed &priv->cgroup
instead of priv->cgroup, so would have crash if the device node
were accessible.
The LXC code missed the 'usb' component out of the path
/dev/bus/usb/$BUSNUM/$DEVNUM, so it failed to actually
setup cgroups for the device. This was in fact lucky
because the call to virLXCSetupHostUsbDeviceCgroup
was also mistakenly passing '&priv->cgroup' instead of
just 'priv->cgroup'. So once the path is fixed, libvirtd
would then crash trying to access the bogus virCgroupPtr
pointer. This would have been a security issue, were it
not for the bogus path preventing the pointer reference
being reached.
virDomainDefCompatibleDevice blocks use of USB if no USB
controller is present. This is not correct for containers
since devices can be assigned directly regardless of any
controllers.
Eric Blake [Tue, 5 Nov 2013 17:30:56 +0000 (10:30 -0700)]
storage: avoid short reads while chasing backing chain
Our backing file chain code was not very robust to an ill-timed
EINTR, which could lead to a short read causing us to randomly
treat metadata differently than usual. But the existing
virFileReadLimFD forces an error if we don't read the entire
file, even though we only care about the header of the file.
So add a new virFile function that does what we want.
* src/util/virfile.h (virFileReadHeaderFD): New prototype.
* src/util/virfile.c (virFileReadHeaderFD): New function.
* src/libvirt_private.syms (virfile.h): Export it.
* src/util/virstoragefile.c (virStorageFileGetMetadataInternal)
(virStorageFileProbeFormatFromFD): Use it.
Many methods accept a string parameter specifying the
old root directory prefix. Since removal of the non-pivot
root container setup codepaths, this parameter is obsolete
in many methods where the callers always pass "/.oldroot".
This helper function is used to create parent directory for
the hostdev which will be added to the container. If the
parent directory of this hostdev doesn't exist, the mknod of
the hostdev will fail. eg with /dev/net/tun
The NWFilter code has as a deadlock race condition between
the virNWFilter{Define,Undefine} APIs and starting of guest
VMs due to mis-matched lock ordering.
In the virNWFilter{Define,Undefine} codepaths the lock ordering
is
This has the effect of serializing VM startup once again, even if
no nwfilters are applied to the guest. There is also the possibility
of deadlock due to a call graph loop via virNWFilterInstantiate
and virNWFilterInstantiateFilterLate.
These two problems mean the lock must be turned into a read/write
lock instead of a plain mutex at the same time. The lock is used to
serialize changes to the "driver->nwfilters" hash, so the write lock
only needs to be held by the define/undefine methods. All other
methods can rely on a read lock which allows good concurrency.
Conflicts:
src/conf/nwfilter_conf.c
- virReportOOMError() in context of one hunk.
src/lxc/lxc_driver.c
- functions renamed, and lxc object locking changed, creating
a conflict in the context.
For inexplicable reasons, the nwfilter XML parser is intentionally
ignoring errors that arise during parsing. As well as meaning that
users don't get any feedback on their XML mistakes, this will lead
it to silently drop data in OOM conditions.
Unfortunately this is racy - since the event loop is in a
different thread, the virDBusWatchCallback method may be
run before we get to calling dbus_watch_set_data. We must
reverse the order of these calls
See https://bugzilla.redhat.com/show_bug.cgi?id=885445
When writing commit 173c291, I missed the fact virNetServerClientClose
unlocks the client object before actually clearing client->sock and thus
it is possible to hit a window when client->keepalive is NULL while
client->sock is not NULL. I was thinking client->sock == NULL was a
better check for a closed connection but apparently we have to go with
client->keepalive == NULL to actually fix the crash.
When a client closes its connection to libvirtd early during
virConnectOpen, more specifically just after making
REMOTE_PROC_CONNECT_SUPPORTS_FEATURE call to check if
VIR_DRV_FEATURE_PROGRAM_KEEPALIVE is supported without even waiting for
the result, libvirtd may crash due to a race in keep-alive
initialization. Once receiving the REMOTE_PROC_CONNECT_SUPPORTS_FEATURE
call, the daemon's event loop delegates it to a worker thread. In case
the event loop detects EOF on the connection and calls
virNetServerClientClose before the worker thread starts to handle
REMOTE_PROC_CONNECT_SUPPORTS_FEATURE call, client->keepalive will be
disposed by the time virNetServerClientStartKeepAlive gets called from
remoteDispatchConnectSupportsFeature. Because the flow is common for
both authenticated and read-only connections, even unprivileged clients
may cause the daemon to crash.
To avoid the crash, virNetServerClientStartKeepAlive needs to check if
the connection is still open before starting keep-alive protocol.
Every libvirt release since 0.9.8 is affected by this bug.
Jiri Denemark [Fri, 20 Dec 2013 13:50:02 +0000 (14:50 +0100)]
qemu: Avoid using stale data in virDomainGetBlockInfo
CVE-2013-6458
Generally, every API that is going to begin a job should do that before
fetching data from vm->def. However, qemuDomainGetBlockInfo does not
know whether it will have to start a job or not before checking vm->def.
To avoid using disk alias that might have been freed while we were
waiting for a job, we use its copy. In case the disk was removed in the
meantime, we will fail with "cannot find statistics for device '...'"
error message.
When virDomainDetachDeviceFlags is called concurrently to
virDomainBlockStats: libvirtd may crash because qemuDomainBlockStats
finds a disk in vm->def before getting a job on a domain and uses the
disk pointer after getting the job. However, the domain in unlocked
while waiting on a job condition and thus data behind the disk pointer
may disappear. This happens when thread 1 runs
virDomainDetachDeviceFlags and enters monitor to actually remove the
disk. Then another thread starts running virDomainBlockStats, finds the
disk in vm->def, and while it's waiting on the job condition (owned by
the first thread), the first thread finishes the disk removal. When the
second thread gets the job, the memory pointed to be the disk pointer is
already gone.
That said, every API that is going to begin a job should do that before
fetching data from vm->def.
Eric Blake [Tue, 17 Dec 2013 23:28:20 +0000 (16:28 -0700)]
tests: be more explicit on qcow2 versions in virstoragetest
While working on v1.0.5-maint (the branch in use on Fedora 19)
with the host at Fedora 20, I got a failure in virstoragetest.
I traced it to the fact that we were using qemu-img to create a
qcow2 file, but qemu-img changed from creating v2 files by
default in F19 to creating v3 files in F20. Rather than leaving
it up to qemu-img, it is better to write the test to force
testing of BOTH file formats (better code coverage and all).
This patch alone does not fix all the failures in v1.0.5-maint;
for that, we must decide to either teach the older branch to
understand v3 files, or to reject them outright as unsupported.
But for upstream, making the test less dependent on changing
qemu-img defaults is always a good thing.
* tests/virstoragetest.c (testPrepImages): Simplify creation of
raw file; check if qemu supports compat and if so use it.
Michal Privoznik [Fri, 18 Oct 2013 16:28:14 +0000 (18:28 +0200)]
qemu: Fix augeas support for migration ports
Commit e3ef20d7 allows user to configure migration ports range via
qemu.conf. However, it forgot to update augeas definition file and
even the test data was malicious.
When we migrate vms concurrently, there's a chance that libvirtd on
destination assigns the same port for different migrations, which will
lead to migration failure during prepare phase on destination. So we use
virPortAllocator here to solve the problem.
Signed-off-by: Wang Yufei <james.wangyufei@huawei.com> Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
(cherry picked from commit 0196845d3abd0d914cf11f7ad6c19df8b47c32ed)
Conflicts:
missing support for WebSockets and listen address
virAsprintf doesn't report OOM
Don't spam logs with "port 0 must be in range" errors
Whenever virPortAllocatorRelease is called with port == 0, it complains
that the port is not in an allowed range, which is expectable as the
port was never allocated. Let's make virPortAllocatorRelease ignore 0
ports in a similar way free() ignores NULL pointers.
The function doesn't check whether the request is made for active or
inactive domain. Thus when the domain is not running it still tries
accessing non-existing cgroups (priv->cgroup, which is NULL).
I re-made the function in order for it to work the same way it's qemu
counterpart does.
Reproducer:
1) Define an LXC domain
2) Do 'virsh memtune <domain> --hard-limit 133T'
Backtrace:
Thread 6 (Thread 0x7fffec8c0700 (LWP 26826)):
#0 0x00007ffff70edcc4 in virCgroupPathOfController (group=0x0, controller=3,
key=0x7ffff75734bd "memory.limit_in_bytes", path=0x7fffec8bf718) at util/vircgroup.c:1764
#1 0x00007ffff70e9206 in virCgroupSetValueStr (group=0x0, controller=3,
key=0x7ffff75734bd "memory.limit_in_bytes", value=0x7fffe409f360 "1073741824")
at util/vircgroup.c:669
#2 0x00007ffff70e98b4 in virCgroupSetValueU64 (group=0x0, controller=3,
key=0x7ffff75734bd "memory.limit_in_bytes", value=1073741824) at util/vircgroup.c:740
#3 0x00007ffff70ee518 in virCgroupSetMemory (group=0x0, kb=1048576) at util/vircgroup.c:1904
#4 0x00007ffff70ee675 in virCgroupSetMemoryHardLimit (group=0x0, kb=1048576)
at util/vircgroup.c:1944
#5 0x00005555557d54c8 in lxcDomainSetMemoryParameters (dom=0x7fffe40cc420,
params=0x7fffe409f100, nparams=1, flags=0) at lxc/lxc_driver.c:774
#6 0x00007ffff72c20f9 in virDomainSetMemoryParameters (domain=0x7fffe40cc420,
params=0x7fffe409f100, nparams=1, flags=0) at libvirt.c:4051
#7 0x000055555561365f in remoteDispatchDomainSetMemoryParameters (server=0x555555eb7e00,
client=0x555555ec4b10, msg=0x555555eb94e0, rerr=0x7fffec8bfb70, args=0x7fffe40b8510)
at remote_dispatch.h:7621
#8 0x00005555556133fd in remoteDispatchDomainSetMemoryParametersHelper (server=0x555555eb7e00,
client=0x555555ec4b10, msg=0x555555eb94e0, rerr=0x7fffec8bfb70, args=0x7fffe40b8510,
ret=0x7fffe40b84f0) at remote_dispatch.h:7591
#9 0x00007ffff73b293f in virNetServerProgramDispatchCall (prog=0x555555ec3ae0,
server=0x555555eb7e00, client=0x555555ec4b10, msg=0x555555eb94e0)
at rpc/virnetserverprogram.c:435
#10 0x00007ffff73b207f in virNetServerProgramDispatch (prog=0x555555ec3ae0,
server=0x555555eb7e00, client=0x555555ec4b10, msg=0x555555eb94e0)
at rpc/virnetserverprogram.c:305
#11 0x00007ffff73a4d2c in virNetServerProcessMsg (srv=0x555555eb7e00, client=0x555555ec4b10,
prog=0x555555ec3ae0, msg=0x555555eb94e0) at rpc/virnetserver.c:165
#12 0x00007ffff73a4e8d in virNetServerHandleJob (jobOpaque=0x555555ec3e30, opaque=0x555555eb7e00)
at rpc/virnetserver.c:186
#13 0x00007ffff7187f3f in virThreadPoolWorker (opaque=0x555555eb7ac0) at util/virthreadpool.c:144
#14 0x00007ffff718733a in virThreadHelper (data=0x555555eb7890) at util/virthreadpthread.c:161
#15 0x00007ffff468ed89 in start_thread (arg=0x7fffec8c0700) at pthread_create.c:308
#16 0x00007ffff3da26bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
CVE-2013-6436: fix crash in lxcDomainGetMemoryParameters
The function doesn't check whether the request is made for active or
inactive domain. Thus when the domain is not running it still tries
accessing non-existing cgroups (priv->cgroup, which is NULL).
I re-made the function in order for it to work the same way it's qemu
counterpart does.
Reproducer:
1) Define an LXC domain
2) Do 'virsh memtune <domain>'
Backtrace:
Thread 6 (Thread 0x7fffec8c0700 (LWP 13387)):
#0 0x00007ffff70edcc4 in virCgroupPathOfController (group=0x0, controller=3,
key=0x7ffff75734bd "memory.limit_in_bytes", path=0x7fffec8bf750) at util/vircgroup.c:1764
#1 0x00007ffff70e958c in virCgroupGetValueStr (group=0x0, controller=3,
key=0x7ffff75734bd "memory.limit_in_bytes", value=0x7fffec8bf7c0) at util/vircgroup.c:705
#2 0x00007ffff70e9d29 in virCgroupGetValueU64 (group=0x0, controller=3,
key=0x7ffff75734bd "memory.limit_in_bytes", value=0x7fffec8bf810) at util/vircgroup.c:804
#3 0x00007ffff70ee706 in virCgroupGetMemoryHardLimit (group=0x0, kb=0x7fffec8bf8a8)
at util/vircgroup.c:1962
#4 0x00005555557d590f in lxcDomainGetMemoryParameters (dom=0x7fffd40024a0,
params=0x7fffd40027a0, nparams=0x7fffec8bfa24, flags=0) at lxc/lxc_driver.c:826
#5 0x00007ffff72c28d3 in virDomainGetMemoryParameters (domain=0x7fffd40024a0,
params=0x7fffd40027a0, nparams=0x7fffec8bfa24, flags=0) at libvirt.c:4137
#6 0x000055555563714d in remoteDispatchDomainGetMemoryParameters (server=0x555555eb7e00,
client=0x555555ebaef0, msg=0x555555ebb3e0, rerr=0x7fffec8bfb70, args=0x7fffd40024e0,
ret=0x7fffd4002420) at remote.c:1895
#7 0x00005555556052c4 in remoteDispatchDomainGetMemoryParametersHelper (server=0x555555eb7e00,
client=0x555555ebaef0, msg=0x555555ebb3e0, rerr=0x7fffec8bfb70, args=0x7fffd40024e0,
ret=0x7fffd4002420) at remote_dispatch.h:4050
#8 0x00007ffff73b293f in virNetServerProgramDispatchCall (prog=0x555555ec3ae0,
server=0x555555eb7e00, client=0x555555ebaef0, msg=0x555555ebb3e0)
at rpc/virnetserverprogram.c:435
#9 0x00007ffff73b207f in virNetServerProgramDispatch (prog=0x555555ec3ae0,
server=0x555555eb7e00, client=0x555555ebaef0, msg=0x555555ebb3e0)
at rpc/virnetserverprogram.c:305
#10 0x00007ffff73a4d2c in virNetServerProcessMsg (srv=0x555555eb7e00, client=0x555555ebaef0,
prog=0x555555ec3ae0, msg=0x555555ebb3e0) at rpc/virnetserver.c:165
#11 0x00007ffff73a4e8d in virNetServerHandleJob (jobOpaque=0x555555ebc7e0, opaque=0x555555eb7e00)
at rpc/virnetserver.c:186
#12 0x00007ffff7187f3f in virThreadPoolWorker (opaque=0x555555eb7ac0) at util/virthreadpool.c:144
#13 0x00007ffff718733a in virThreadHelper (data=0x555555eb7890) at util/virthreadpthread.c:161
#14 0x00007ffff468ed89 in start_thread (arg=0x7fffec8c0700) at pthread_create.c:308
#15 0x00007ffff3da26bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
Tie SASL callbacks lifecycle to virNetSessionSASLContext
The array of sasl_callback_t callbacks which is passed to sasl_client_new()
must be kept alive as long as the created sasl_conn_t object is alive as
cyrus-sasl uses this structure internally for things like logging, so
the memory used for callbacks must only be freed after sasl_dispose() has
been called.
During testing of successful SASL logins with
virsh -c qemu+tls:///system list --all
I've been getting invalid read reports from valgrind
==9237== Invalid read of size 8
==9237== at 0x6E93B6F: _sasl_getcallback (common.c:1745)
==9237== by 0x6E95430: _sasl_log (common.c:1850)
==9237== by 0x16593D87: digestmd5_client_mech_dispose (digestmd5.c:4580)
==9237== by 0x6E91653: client_dispose (client.c:332)
==9237== by 0x6E9476A: sasl_dispose (common.c:851)
==9237== by 0x4E225A1: virNetSASLSessionDispose (virnetsaslcontext.c:678)
==9237== by 0x4CBC551: virObjectUnref (virobject.c:262)
==9237== by 0x4E254D1: virNetSocketDispose (virnetsocket.c:1042)
==9237== by 0x4CBC551: virObjectUnref (virobject.c:262)
==9237== by 0x4E2701C: virNetSocketEventFree (virnetsocket.c:1794)
==9237== by 0x4C965D3: virEventPollCleanupHandles (vireventpoll.c:583)
==9237== by 0x4C96987: virEventPollRunOnce (vireventpoll.c:652)
==9237== by 0x4C94730: virEventRunDefaultImpl (virevent.c:274)
==9237== by 0x12C7BA: vshEventLoop (virsh.c:2407)
==9237== by 0x4CD3D04: virThreadHelper (virthreadpthread.c:161)
==9237== by 0x7DAEF32: start_thread (pthread_create.c:309)
==9237== by 0x8C86EAC: clone (clone.S:111)
==9237== Address 0xe2d61b0 is 0 bytes inside a block of size 168 free'd
==9237== at 0x4A07577: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==9237== by 0x4C73827: virFree (viralloc.c:580)
==9237== by 0x4DE4BC7: remoteAuthSASL (remote_driver.c:4219)
==9237== by 0x4DE33D0: remoteAuthenticate (remote_driver.c:3639)
==9237== by 0x4DDBFAA: doRemoteOpen (remote_driver.c:832)
==9237== by 0x4DDC8DC: remoteConnectOpen (remote_driver.c:1031)
==9237== by 0x4D8595F: do_open (libvirt.c:1239)
==9237== by 0x4D863F3: virConnectOpenAuth (libvirt.c:1481)
==9237== by 0x12762B: vshReconnect (virsh.c:337)
==9237== by 0x12C9B0: vshInit (virsh.c:2470)
==9237== by 0x12E9A5: main (virsh.c:3338)
This commit changes virNetSASLSessionNewClient() to take ownership of the SASL
callbacks. Then we can free them in virNetSASLSessionDispose() after the corresponding
sasl_conn_t has been freed.
Jiri Denemark [Mon, 25 Nov 2013 15:37:32 +0000 (16:37 +0100)]
spec: Don't save/restore running VMs on libvirt-client update
The previous attempt (commit d65e0e1) removed just one of two
libvirt-guests restarts that happened on libvirt-client update. Let's
remove the last one too :-)
Don Dugger [Sat, 23 Nov 2013 21:15:38 +0000 (14:15 -0700)]
Return right error code for baselineCPU
This Python interface code is returning a -1 on errors for the
`baselineCPU' API. Since this API is supposed to return a pointer
the error return value should really be VIR_PY_NONE.
NB: I've checked all the other APIs in this file and this is the
only pointer API that is returning -1.
Signed-off-by: Don Dugger <donald.d.dugger@intel.com>
(crobinso: Upstream in libvirt-python.git)
Which in a default configuration will managedsave every running VM,
and then restore them. Certainly not something we should do every
time the libvirt-client RPM is updated.
Just drop the try-restart attempt, I don't know what purpose it
serves anyways.
If the host side of an LXC container console disconnected
and the guest side continued to write data, until the PTY
buffer filled up, the LXC controller would busy wait. It
would repeatedly see POLLHUP from poll() and not disable
the watch.
This was due to some bogus logic detecting blocking
conditions. Upon seeing a POLLHUP we must disable all
reading & writing from the PTY, and setup the epoll to
wake us up again when the connection comes back.
libxl: fix dubious cpumask handling in libxlDomainSetVcpuAffinities
Rather than casting the virBitmap pointer to uint8_t* and then using
the structure contents as a byte array, use the virBitmap API to determine
the bitmap size and test each bit.
libvirt previously recognized NFS, GFS2, OCFS2, and AFS filesystems as
"shared", and thus eligible for exceptions to certain rules/actions
about chowning image files before handing them off to a guest. This
patch widens the definition of "shared filesystem" to include SMB and
CIFS filesystems (aka "Windows file sharing"); both of these use the
same protocol, but different drivers so there are different magic
numbers for each.
Ján Tomko [Tue, 12 Nov 2013 12:18:54 +0000 (13:18 +0100)]
Disable nwfilter driver when running unprivileged
When opening a new connection to the driver, nwfilterOpen
only succeeds if the driverState has been allocated.
Move the privilege check in driver initialization before
the state allocation to disable the driver.
This changes the nwfilter-define error from:
error: cannot create config directory (null): Bad address
To:
this function is not supported by the connection driver:
virNWFilterDefineXML
Michal Privoznik [Tue, 20 Aug 2013 09:04:18 +0000 (11:04 +0200)]
qemuSetupMemoryCgroup: Handle hard_limit properly
Since 16bcb3 we have a regression. The hard_limit is set
unconditionally. By default the limit is zero. Hence, if user hasn't
configured any, we set the zero in cgroup subsystem making the kernel
kill the corresponding qemu process immediately. The proper fix is to
set hard_limit iff user has configured any.
This function is to guess the correct limit for maximal memory
usage by qemu for given domain. This can never be guessed
correctly, not to mention all the pains and sleepless nights this
code has caused. Once somebody discovers algorithm to solve the
Halting Problem, we can compute the limit algorithmically. But
till then, this code should never see the light of the release
again.
Zhou Yimin [Thu, 17 Oct 2013 07:59:21 +0000 (15:59 +0800)]
remote: fix regression in event deregistration
Introduced by 7b87a3
When I quit the process which only register VIR_DOMAIN_EVENT_ID_REBOOT,
I got error like:
"libvirt: XML-RPC error : internal error: domain event 0 not registered".
Then I add the following code, it fixed.
Signed-off-by: Zhou Yimin <zhouyimin@huawei.com> Signed-off-by: Eric Blake <eblake@redhat.com>
(cherry picked from commit 9712c2510ec87a87578576a407768380e250a6a4)
Commit a0b6a36f "fixed" what abfff210 broke (URI precedence), but
there was still one more thing missing to fix. When using virsh
parameters to setup debugging, those weren't honored, because at the
time debugging was initializing, arguments weren't parsed yet. To
make ewerything work as expected, we need to initialize the debugging
twice, once before debugging (so we can debug option parsing properly)
and then again after these options are parsed.
As a side effect, this patch also fixes a leak when virsh is ran with
multiple '-l' parameters.
Commit abfff210 changed the order of vshParseArgv() and vshInit() in
order to make fix debugging of parameter parsing. However, vshInit()
did a vshReconnect() even though ctl->name wasn't set according to the
'-c' parameter yet. In order to keep both issues fixed, I've split
the vshInit() into vshInitDebug() and vshInit().
One simple memleak of ctl->name is fixed as a part of this patch,
since it is related to the issue it's fixing.
Ján Tomko [Wed, 9 Oct 2013 12:17:13 +0000 (14:17 +0200)]
LXC: Fix handling of RAM filesystem size units
Since 76b644c when the support for RAM filesystems was introduced,
libvirt accepted the following XML:
<source usage='1024' unit='KiB'/>
This was parsed correctly and internally stored in bytes, but it
was formatted as (with an extra 's'):
<source usage='1024' units='KiB'/>
When read again, this was treated as if the units were missing,
meaning libvirt was unable to parse its own XML correctly.
The usage attribute was documented as being in KiB, but it was not
scaled if the unit was missing. Transient domains still worked,
because this was balanced by an extra 'k' in the mount options.
This patch:
Changes the parser to use 'units' instead of 'unit', as the latter
was never documented (fixing persistent domains) and some programs
(libvirt-glib, libvirt-sandbox) already parse the 'units' attribute.
Removes the extra 'k' from the tmpfs mount options, which is needed
because now we parse our own XML correctly.
Changes the default input unit to KiB to match documentation, fixing:
https://bugzilla.redhat.com/show_bug.cgi?id=1015689
(cherry picked from commit 3f029fb5319b9dc9cc2fbf8d1ba4505ee9e4b1e3)
Conflicts:
src/conf/domain_conf.c
src/conf/domain_conf.h - missing format
src/lxc/lxc_container.c - virAsprintf doesn't report OOM errors
tests/lxcxml2xmltest.c - missing format test
After successful @cmd construction the memory where @keys points to is
part of @cmd. Avoid double freeing it.
(cherry picked from commit 3e8343e1510741623aa5bc1dfb74ec39fde868dd)
Commit 38ab1225 changed the default value of ret from true to false but
forgot to set ret = true when job is NONE. Thus, virsh domjobinfo
returned 1 when there was no job running for a domain but it used to
(and should) return 0 in this case.
(cherry picked from commit f084caae7c5db8ae03e7fafce164c73f65681843)
Remove use of virConnectPtr from all remaining nwfilter code
The virConnectPtr is passed around loads of nwfilter code in
order to provide it as a parameter to the callback registered
by the virt drivers. None of the virt drivers use this param
though, so it serves no purpose.
Avoiding the need to pass a virConnectPtr means that the
nwfilterStateReload method no longer needs to open a bogus
QEMU driver connection. This addresses a race condition that
can lead to a crash on startup.
The nwfilter driver starts before the QEMU driver and registers
some callbacks with DBus to detect firewalld reload. If the
firewalld reload happens while the QEMU driver is still starting
up though, the nwfilterStateReload method will open a connection
to the partially initialized QEMU driver and cause a crash.
Don't pass virConnectPtr in nwfilter 'struct domUpdateCBStruct'
The nwfilter driver only needs a reference to its private
state object, not a full virConnectPtr. Update the domUpdateCBStruct
struct to have a 'void *opaque' field instead of a virConnectPtr.
So far the virNetDevBandwidthEqual() expected both ->in and ->out items
to be allocated for both @a and @b compared. This is not necessary true
for all our code. For instance, running 'update-device' twice over a NIC
with the very same XML results in SIGSEGV-ing in this function.
qemu_hotplug: Allow QoS update in qemuDomainChangeNet
The qemuDomainChangeNet() is called when 'virsh update-device' is
invoked on a NIC. Currently, we fail to update the QoS even though
we have routines for that.
Peter Krempa [Mon, 16 Sep 2013 11:40:42 +0000 (13:40 +0200)]
qemu: Use "migratable" XML definition when doing external checkpoints
In the original implementation of external checkpoints I've mistakenly
used the live definition to be stored in the save image. The normal
approach is to use the "migratable" definition. This was discovered when
commit 07966f6a8b5ccb5bb4c716b25deb8ba2e572cc67 changed the behavior to
use a converted XML from the user to do the compatibility check to fix
problem when using the regular machine saving.
As the previous patch added a compatibility layer, we can now change the
type of the XML in the image.
Peter Krempa [Mon, 16 Sep 2013 11:37:34 +0000 (13:37 +0200)]
qemu: Fix checking of ABI stability when restoring external checkpoints
External checkpoints have a bug in the implementation where they use the
normal definition instead of the "migratable" one. This causes errors
when the snapshot is being reverted using the workaround method via
qemuDomainRestoreFlags() with a custom XML. This issue was introduced
when commit 07966f6a8b5ccb5bb4c716b25deb8ba2e572cc67 changed the code to
compare "migratable" XMLs from the user as we should have used
migratable in the image too.
This patch adds a compatibility layer, so that fixing the snapshot code
won't make existing snapshots fail to load.
Osier Yang [Fri, 24 May 2013 03:59:14 +0000 (11:59 +0800)]
virsh: Fix regression of vol-resize
Introduced by commit 1daa4ba33acf. vshCommandOptStringReq returns
0 on *success* or the option is not required && not present, both
are right result. Error out when returning 0 is not correct.
the caller, it doesn't have to check wether it
(cherry picked from commit 2a3a725c33aba2046443d33eb473eb54517f61c8)
Resolves:https://bugzilla.redhat.com/show_bug.cgi?id=923053
When cdrom is block type, the virsh change-media failed to insert
source info because virsh uses "<source block='/dev/sdb'/>" while
the correct name of the attribute for block disks is "dev".
Fix crash in remoteDispatchDomainMemoryStats (CVE-2013-4296)
The 'stats' variable was not initialized to NULL, so if some
early validation of the RPC call fails, it is possible to jump
to the 'cleanup' label and VIR_FREE an uninitialized pointer.
This is a security flaw, since the API can be called from a
readonly connection which can trigger the validation checks.
Add support for using 3-arg pkcheck syntax for process (CVE-2013-4311)
With the existing pkcheck (pid, start time) tuple for identifying
the process, there is a race condition, where a process can make
a libvirt RPC call and in another thread exec a setuid application,
causing it to change to effective UID 0. This in turn causes polkit
to do its permission check based on the wrong UID.
To address this, libvirt must get the UID the caller had at time
of connect() (from SO_PEERCRED) and pass a (pid, start time, uid)
triple to the pkcheck program.
Signed-off-by: Colin Walters <walters@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
(cherry picked from commit 922b7fda77b094dbf022d625238262ea05335666)
Conflicts:
src/access/viraccessdriverpolkit.c
Resolution:
Dropped file that does not exist in this branch.
Include process start time when doing polkit checks
Since PIDs can be reused, polkit prefers to be given
a (PID,start time) pair. If given a PID on its own,
it will attempt to lookup the start time in /proc/pid/stat,
though this is subject to races.
It is safer if the client app resolves the PID start
time itself, because as long as the app has the client
socket open, the client PID won't be reused.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
(cherry picked from commit 979e9c56a7aadf2dcfbddd1abfbad594b78b4468) Signed-off-by: Eric Blake <eblake@redhat.com>
Conflicts:
src/rpc/virnetsocket.h - context
src/util/virprocess.c - needed #include "virstring.h"
src/util/virstring.c - context
src/util/virstring.h - context
Currently, we have a bug when updating a graphics device. A graphics device can
have a listen address set. This address is either defined by user (in which case
it's type is VIR_DOMAIN_GRAPHICS_LISTEN_TYPE_ADDRESS) or it can be inherited
from a network (in which case it's type is
VIR_DOMAIN_GRAPHICS_LISTEN_TYPE_NETWORK). However, in both cases we have a
listen address to process (e.g. during migration, as I've tried to fix in 7f15ebc7).
Later, when a user tries to update the graphics device (e.g. set a password),
we check if listen addresses match the original as qemu doesn't know how to
change listen address yet. Hence, users are required to not change the listen
address. The implementation then just dumps listen addresses and compare them.
Previously, while dumping the listen addresses, NULL was returned for NETWORK.
After my patch, this is no longer true, and we get a listen address for olddev
even if it is a type of NETWORK. So we have a real string on one side, the NULL
from user's XML on the other side and hence we think user wants to change the
listen address and we refuse it.
Therefore, we must take the type of listen address into account as well.
Eric Blake [Fri, 23 Aug 2013 15:30:42 +0000 (09:30 -0600)]
security: provide supplemental groups even when parsing label (CVE-2013-4291)
Commit 29fe5d7 (released in 1.1.1) introduced a latent problem
for any caller of virSecurityManagerSetProcessLabel and where
the domain already had a uid:gid label to be parsed. Such a
setup would collect the list of supplementary groups during
virSecurityManagerPreFork, but then ignores that information,
and thus fails to call setgroups() to adjust the supplementary
groups of the process.
Upstream does not use virSecurityManagerSetProcessLabel for
qemu (it uses virSecurityManagerSetChildProcessLabel instead),
so this problem remained latent until backporting the initial
commit into v0.10.2-maint (commit c061ff5, released in 0.10.2.7),
where virSecurityManagerSetChildProcessLabel has not been
backported. As a result of using a different code path in the
backport, attempts to start a qemu domain that runs as qemu:qemu
will end up with supplementary groups unchanged from the libvirtd
parent process, rather than the desired supplementary groups of
the qemu user. This can lead to failure to start a domain
(typical Fedora setup assigns user 107 'qemu' to both group 107
'qemu' and group 36 'kvm', so a disk image that is only readable
under kvm group rights is locked out). Worse, it is a security
hole (the qemu process will inherit supplemental group rights
from the parent libvirtd process, which means it has access
rights to files owned by group 0 even when such files should
not normally be visible to user qemu).
LXC does not use the DAC security driver, so it is not vulnerable
at this time. Still, it is better to plug the latent hole on
the master branch first, before cherry-picking it to the only
vulnerable branch v0.10.2-maint.
* src/security/security_dac.c (virSecurityDACGetIds): Always populate
groups and ngroups, rather than only when no label is parsed.
Currently, when there is no blockjob, dom.blockJobInfo('vda')
still reports error because it doesn't distinguish return value 0 from -1.
libvirt.libvirtError: virDomainGetBlockJobInfo() failed
virDomainGetBlockJobInfo() API return value:
-1 in case of failure, 0 when nothing found, 1 found.
And use PyDict_SetItemString instead of PyDict_SetItem when key is
of string type. PyDict_SetItemString increments key/value reference
count, so call Py_DECREF() for value. For key, we don't need to
do this, because PyDict_SetItemString will handle it internally.
Peter Krempa [Fri, 16 Aug 2013 10:22:32 +0000 (12:22 +0200)]
virbitmap: Refactor virBitmapParse to avoid access beyond bounds of array
The virBitmapParse function was calling virBitmapIsSet() function that
requires the caller to check the bounds of the bitmap without checking
them. This resulted into crashes when parsing a bitmap string that was
exceeding the bounds used as argument.
This patch refactors the function to use virBitmapSetBit without
checking if the bit is set (this function does the checks internally)
and then counts the bits in the bitmap afterwards (instead of keeping
track while parsing the string).
This patch also changes the "parse_error" label to a more common
"error".
The refactor should also get rid of the need to call sa_assert on the
returned variable as the callpath should allow coverity to infer the
possible return values.
Ján Tomko [Fri, 26 Jul 2013 10:04:32 +0000 (12:04 +0200)]
Set the number of elements 0 in virNetwork*Clear
Decrementing it when it was already 0 causes an invalid free
in virNetworkDefUpdateDNSHost if virNetworkDNSHostDefParseXML
fails and virNetworkDNSHostDefClear gets called twice.
Ján Tomko [Wed, 17 Jul 2013 08:56:05 +0000 (10:56 +0200)]
cgroup: reuse buffer for getline
Reuse the buffer for getline and track buffer allocation
separately from the string length to prevent unlikely
out-of-bounds memory access.
This fixes the following leak that happened when zero bytes were read:
==404== 120 bytes in 1 blocks are definitely lost in loss record 1,344 of 1,671
==404== at 0x4C2C71B: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==404== by 0x906F862: getdelim (iogetdelim.c:68)
==404== by 0x52A48FB: virCgroupPartitionNeedsEscaping (vircgroup.c:1136)
==404== by 0x52A0FB4: virCgroupPartitionEscape (vircgroup.c:1171)
==404== by 0x52A0EA4: virCgroupNewDomainPartition (vircgroup.c:1450)
(cherry picked from commit cc7329317fee6088055d7b09594c19f1b8fec5e3)
qemu: https://bugzilla.redhat.com/show_bug.cgi?id=981094
The commit 0ad9025ef introduce qemu flag QEMU_CAPS_DEVICE_VIDEO_PRIMARY
for using -device VGA, -device cirrus-vga, -device vmware-svga and
-device qxl-vga. In use, for -device qxl-vga, mouse doesn't display
in guest window like the desciption in above bug.
This patch try to use -device for primary video when qemu >=1.6 which
contains the bug fix patch
Eric Blake [Mon, 29 Jul 2013 18:53:36 +0000 (12:53 -0600)]
examples: fix mingw build vs. printf
Mingw *printf is a moving target; newer mingw now provides a version
of asprintf() that fails to understand %lld:
CC event_test-event-test.o
../../../../examples/domain-events/events-c/event-test.c: In function 'myDomainEventRTCChangeCallback':
../../../../examples/domain-events/events-c/event-test.c:270:18: error: unknown conversion type character 'l' in format [-Werror=format=]
virDomainGetID(dom), offset) < 0)
^
But since our examples already admitted that they were hacking around
a mingw deficiency, it is easier to just use printf() directly, coupled
with <inttypes.h> macros, for a more portable work-around.
* examples/domain-events/events-c/event-test.c
(myDomainEventRTCChangeCallback): Use PRIdMAX instead of asprintf.
On platforms without decent group support, the build failed:
Cannot export virGetGroupList: symbol not defined
./.libs/libvirt_security_manager.a(libvirt_security_manager_la-security_dac.o): In function `virSecurityDACPreFork':
/home/eblake/libvirt-tmp/build/src/../../src/security/security_dac.c:248: undefined reference to `virGetGroupList'
collect2: error: ld returned 1 exit status
* src/util/virutil.c (virGetGroupList): Provide dummy implementation.
Eric Blake [Tue, 2 Jul 2013 12:09:30 +0000 (06:09 -0600)]
build: work around mingw header pollution
On Fedora 18, when cross-compiling to mingw with the mingw*-dbus
packages installed, compilation fails with:
CC libvirt_net_rpc_server_la-virnetserver.lo
In file included from /usr/i686-w64-mingw32/sys-root/mingw/include/dbus-1.0/dbus/dbus-connection.h:32:0,
from /usr/i686-w64-mingw32/sys-root/mingw/include/dbus-1.0/dbus/dbus-bus.h:30,
from /usr/i686-w64-mingw32/sys-root/mingw/include/dbus-1.0/dbus/dbus.h:31,
from ../../src/util/virdbus.h:26,
from ../../src/rpc/virnetserver.c:39:
/usr/i686-w64-mingw32/sys-root/mingw/include/dbus-1.0/dbus/dbus-message.h:74:58: error: expected ';', ',' or ')' before 'struct'
I have reported this as a bug against two packages:
- mingw-headers, for polluting the namespace
https://bugzilla.redhat.com/show_bug.cgi?id=980270
- dbus, for not dealing with the pollution
https://bugzilla.redhat.com/show_bug.cgi?id=980278
At least dbus has agreed that a future version of dbus headers will
do s/interface/iface/, regardless of what happens in mingw. But it
is also easy to workaround in libvirt in the meantime, without having
to wait for either mingw or dbus to upgrade.
* src/util/virdbus.h (includes): Undo mingw's pollution so that
dbus doesn't fail.
Eric Blake [Mon, 1 Jul 2013 22:48:11 +0000 (16:48 -0600)]
build: configure must not affect tarball contents
On mingw, configure sets the name of the lxc symfile to
libvirt_lxc.defs rather than libvirt_lxc.syms. But tarballs
must be arch-independent, regardless of the configure options
used for the tree where we ran 'make dist'. This led to the
following failure in autobuild.sh:
CCLD libvirt-lxc.la
CCLD libvirt-qemu.la
/usr/lib64/gcc/i686-w64-mingw32/4.7.2/../../../../i686-w64-mingw32/bin/ld: cannot find libvirt_lxc.def: No such file or directory
collect2: error: ld returned 1 exit status
make[3]: *** [libvirt-lxc.la] Error 1
make[3]: *** Waiting for unfinished jobs....
We were already doing the right thing with libvirt_qemu.syms.
* src/Makefile.am (EXTRA_DIST): Don't ship a built file which
depends on configure for its final name.
Eric Blake [Mon, 1 Jul 2013 19:21:57 +0000 (13:21 -0600)]
build: avoid build failure without gnutls
Found while trying to cross-compile to mingw:
CC libvirt_driver_remote_la-remote_driver.lo
../../src/remote/remote_driver.c: In function 'doRemoteOpen':
../../src/remote/remote_driver.c:487:23: error: variable 'verify' set but not used [-Werror=unused-but-set-variable]
* src/remote/remote_driver.c (doRemoteOpen): Also ignore 'verify'.
Ján Tomko [Thu, 4 Jul 2013 09:35:59 +0000 (11:35 +0200)]
Fix build with clang
Partially revert cdd703f's revert of c163410, as linking with clang
with --param=ssp-buffer-size=4 still fails with:
"argument unused during compilation".
Eric Blake [Thu, 18 Jul 2013 21:47:41 +0000 (15:47 -0600)]
maint: update to latest gnulib
Upstream gnulib recently patched a bug in bootstrap, for projects
that use a different name than build-aux for a subdirectory. We
don't, but it doesn't hurt to update.
* .gnulib: Update, for bootstrap fix.
* bootstrap: Sync to upstream.
* bootstrap.conf: Match upstream bug fix.
Eric Blake [Thu, 11 Jul 2013 11:07:16 +0000 (05:07 -0600)]
maint: update to latest gnulib
Future patches need LGPLv2+ versions of some modules that had
recent license changes; but separating the gnulib update from
the actual use of the modules makes it easier to backport to
an older version while avoiding a submodule update (assuming,
of course, that the backport is to a system where glibc provides
adequate functionaliy without needing the gnulib module).
* .gnulib: Update to latest, for modules needed in later patches.
Eric Blake [Wed, 3 Jul 2013 20:43:11 +0000 (14:43 -0600)]
build: honor autogen.sh --no-git
Based on a report by Chandrashekar Shastri, at
https://bugzilla.redhat.com/show_bug.cgi?id=979360
On systems where git cannot access the outside world, a developer
can instead arrange to get a copy of gnulib at the right commit
via side channels (such as NFS share drives), set GNULIB_SRCDIR,
then use ./autogen.sh --no-git. In this setup, we will now
avoid direct use of git. Of course, this means no automatic
gnulib updates when libvirt.git updates its submodule, but it
is expected that any developer in such a situation is already
prepared to deal with the fallout.
* .gnulib: Update to latest, for bootstrap.
* bootstrap: Synchronize from gnulib.
* autogen.sh (no_git): Avoid git when requested.
* cfg.mk (_update_required): Skip automatic rerun of bootstrap if
we can't use git.
* docs/compiling.html.in: Document this setup.
* docs/hacking.html.in: Mention this.
* HACKING: Regenerate.
Eric Blake [Tue, 2 Jul 2013 23:26:42 +0000 (17:26 -0600)]
maint: update to latest gnulib
The latest mingw headers on Fedora 19 fail to build with gnulib
without an update.
Meanwhile, now that upstream gnulib has better handling of -W
probing for clang, we can drop some of our own solutions in
favor of upstream; thus this reverts commit c1634100, "Correctly
detect warning flags with clang".
FreeBSD ships an old gcc 4.2.1 which generates
bogus code, e.g. getsockopt() call returns
struct xucred with bogus values, which doesn't even
allow to connect to libvirtd:
error: Failed to find group record for gid '1284660778': No error: 0
So roll back to just -fstack-protector on FreeBSD.