Jiri Denemark [Tue, 28 Jun 2016 12:39:58 +0000 (14:39 +0200)]
qemu: Let empty default VNC password work as documented
CVE-2016-5008
Setting an empty graphics password is documented as a way to disable
VNC/SPICE access, but QEMU does not always behaves like that. VNC would
happily accept the empty password. Let's enforce the behavior by setting
password expiration to "now".
Eric Blake [Wed, 9 Dec 2015 00:46:31 +0000 (17:46 -0700)]
CVE-2015-5313: storage: don't allow '/' in filesystem volume names
The libvirt file system storage driver determines what file to
act on by concatenating the pool location with the volume name.
If a user is able to pick names like "../../../etc/passwd", then
they can escape the bounds of the pool. For that matter,
virStoragePoolListVolumes() doesn't descend into subdirectories,
so a user really shouldn't use a name with a slash.
Normally, only privileged users can coerce libvirt into creating
or opening existing files using the virStorageVol APIs; and such
users already have full privilege to create any domain XML (so it
is not an escalation of privilege). But in the case of
fine-grained ACLs, it is feasible that a user can be granted
storage_vol:create but not domain:write, and it violates
assumptions if such a user can abuse libvirt to access files
outside of the storage pool.
Therefore, prevent all use of volume names that contain "/",
whether or not such a name is actually attempting to escape the
pool.
Since commit 8eb55d782a2b9afacc7938694891cc6fad7b42a5 libxml2 removes
two slashes from the URI when there is no server part. This is fixed
with beb7281055dbf0ed4d041022a67c6c5cfd126f25, but only if the calling
application calls xmlSaveUri() on URI that xmlURIParse() parsed. And
that is not the case in virURIFormat(). virURIFormat() accepts
virURIPtr that can be created without parsing it and we do that when we
format network storage paths for gluster for example. Even though
virStorageSourceParseBackingURI() uses virURIParse(), it throws that data
structure right away.
Since we want to format URIs as URIs and not absolute URIs or opaque
URIs (see RFC 3986), we can specify that with a special hack thanks to
commit beb7281055dbf0ed4d041022a67c6c5cfd126f25, by setting port to -1.
This fixes qemuxml2argvtest test where the disk-drive-network-gluster
case was failing.
In systemd >= 218, the udev_set_log_fn method has been marked
deprecated and turned into a no-op. Nothing in the udev client
library will print to stderr by default anymore, so we can
just stop installing a logging hook for new enough udev.
Dario Faggioli [Mon, 30 Jun 2014 17:19:01 +0000 (19:19 +0200)]
libxl: don't break the build on Xen>=4.5 because of libxl_vcpu_setaffinity()
libxl interface for vcpu pinning is changing in Xen 4.5. Basically,
libxl_set_vcpuaffinity() now wants one more parameter. That is
representative of 'VCPU soft affinity', which libvirt does not use.
To mark such change, the macro LIBXL_HAVE_VCPUINFO_SOFT_AFFINITY is
defined. Use it as a gate and, if present, re-#define the calls from
the old to the new interface, to avoid breaking the build.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Cc: Jim Fehlig <jfehlig@suse.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
(cherry picked from commit bfc72e99920215c9b004a5380ca61fe6ff81ea6b)
Peter Krempa [Tue, 20 Jan 2015 16:01:01 +0000 (17:01 +0100)]
CVE-2015-0236: qemu: Check ACLs when dumping security info from snapshots
The ACL check didn't check the VIR_DOMAIN_XML_SECURE flag and the
appropriate permission for it. Found via code inspection while fixing
permissions for save images.
Cédric Bosdonnat [Wed, 18 Dec 2013 17:33:44 +0000 (17:33 +0000)]
Fix crash in virsystemdtest with dbus 1.7.6
D-bus introduced some changes in its locking code. Overriding the init
function skips the new locking init and thus crashes later in libvirt
test. Removing the function makes the test pass again.
gnutls-3.3.0 and newer leaves 2 FDs open in order to be backwards
compatible when it comes to chrooted binaries [1]. Linking
commandhelper with gnutls then leaves these two FDs open and
commandtest fails thanks to that. This patch does not link
commandhelper with libvirt.la, but rather only the utilities making
the test pass.
This patch changes the macro introduced in 292d3f2d to either be
empty in the case of newer libselinux, or contain 'const' in the
case of older libselinux. The macro is then used directly in
tests/securityselinuxhelper.c.
Cédric Bosdonnat [Wed, 28 May 2014 12:44:08 +0000 (14:44 +0200)]
build: fix build with libselinux 2.3
Several function signatures changed in libselinux 2.3, now taking
a 'const char *' instead of 'security_context_t'. The latter is
defined in selinux/selinux.h as
Laine Stump [Wed, 15 Oct 2014 22:49:01 +0000 (00:49 +0200)]
util: eliminate "use after free" in callers of virNetDevLinkDump
virNetDevLinkDump() gets a message from netlink into "resp", then
calls nlmsg_parse() to fill the table "tb" with pointers into resp. It
then returns tb to its caller, but not before freeing the buffer at
resp. That means that all the callers of virNetDevLinkDump() are
examining memory that has already been freed. This can be verified by
filling the buffer at resp with garbage prior to freeing it (or, I
suppose, just running libvirtd under valgrind) then performing some
operation that calls virNetDevLinkDump().
The upstream commit log incorrectly states that the code has been like
this ever since virNetDevLinkDump() was written. In reality, the
problem was introduced with commit e95de74d, first in libvirt-1.0.5,
which was attempting to eliminate a typecast that caused compiler
warnings. It has only been pure luck (or maybe a lack of heavy load,
and/or maybe an allocation algorithm in malloc() that delays re-use of
just-freed memory) that has kept this from causing errors, for example
when configuring a PCI passthrough or macvtap passthrough network
interface.
The solution taken in this patch is the simplest - just return resp to
the caller along with tb, then have the caller free it after they are
finished using the data (pointers) in tb. I alternately could have
made a cleaner interface by creating a new struct that put tb and resp
together along with a vir*Free() function for it, but this function is
only used in a couple places, and I'm not sure there will be
additional new uses of virNetDevLinkDump(), so the value of adding a
new type, extra APIs, etc. is dubious.
Eric Blake [Sat, 1 Nov 2014 04:14:07 +0000 (22:14 -0600)]
CVE-2014-7823: dumpxml: security hole with migratable flag
Commit 28f8dfd (v1.0.0) introduced a security hole: in at least
the qemu implementation of virDomainGetXMLDesc, the use of the
flag VIR_DOMAIN_XML_MIGRATABLE (which is usable from a read-only
connection) triggers the implicit use of VIR_DOMAIN_XML_SECURE
prior to calling qemuDomainFormatXML. However, the use of
VIR_DOMAIN_XML_SECURE is supposed to be restricted to read-write
clients only. This patch treats the migratable flag as requiring
the same permissions, rather than analyzing what might break if
migratable xml no longer includes secret information.
Fortunately, the information leak is low-risk: all that is gated
by the VIR_DOMAIN_XML_SECURE flag is the VNC connection password;
but VNC passwords are already weak (FIPS forbids their use, and
on a non-FIPS machine, anyone stupid enough to trust a max-8-byte
password sent in plaintext over the network deserves what they
get). SPICE offers better security than VNC, and all other
secrets are properly protected by use of virSecret associations
rather than direct output in domain XML.
* src/remote/remote_protocol.x (REMOTE_PROC_DOMAIN_GET_XML_DESC):
Tighten rules on use of migratable flag.
* src/libvirt-domain.c (virDomainGetXMLDesc): Likewise.
Pavel Hrdina [Mon, 22 Sep 2014 16:19:07 +0000 (18:19 +0200)]
domain_conf: fix domain deadlock
If you use public api virConnectListAllDomains() with second parameter
set to NULL to get only the number of domains you will lock out all
other operations with domains.
Peter Krempa [Thu, 11 Sep 2014 14:35:53 +0000 (16:35 +0200)]
CVE-2014-3633: qemu: blkiotune: Use correct definition when looking up disk
Live definition was used to look up the disk index while persistent one
was indexed leading to a crash in qemuDomainGetBlockIoTune. Use the
correct def and report a nice error.
Unfortunately it's accessible via read-only connection, though it can
only crash libvirtd in the cases where the guest is hot-plugging disks
without reflecting those changes to the persistent definition. So
avoiding hotplug, or doing hotplug where persistent is always modified
alongside live definition, will avoid the out-of-bounds access.
Eric Blake [Mon, 6 Jan 2014 21:33:08 +0000 (14:33 -0700)]
build: fix 'make check' with newer git
Newer git doesn't like the maint.mk rule 'public-submodule-commit'
run during 'make check', as inherited from our checkout of gnulib.
I tracked down that libvirt commit 8531301 picked up a gnulib fix
that makes git happy. Rather than try and do a full .gnulib
submodule update to gnulib.git d18d1b802 (as used in that libvirt
commit), it was easier to just backport the fixed maint.mk from
gnulib on top of our existing submodule level. I did it as follows,
where these steps will have to be repeated when cherry-picking this
commit to any other maintenance branch:
mkdir -p gnulib/local/top
cd .gnulib
git checkout d18d1b802 top/maint.mk
git diff HEAD > ../gnulib/local/top/maint.mk.diff
git reset --hard
cd ..
git add gnulib/local/top
Peter Krempa [Tue, 1 Jul 2014 11:52:51 +0000 (13:52 +0200)]
qemu: copy: Accept 'format' parameter when copying to a non-existing img
We have the following matrix of possible arguments handled by the logic
statement touched by this patch:
| flags & _REUSE_EXT | !(flags & _REUSE_EXT)
-------+--------------------+----------------------
format| (1) | (2)
-------+--------------------+----------------------
!format| (3) | (4)
-------+--------------------+----------------------
In cases 1 and 2 the user provided a format, in cases 3 and 4 not. The
user requests to use a pre-existing image in 1 and 3 and libvirt will
create a new image in 2 and 4.
The difference between cases 3 and 4 is that for 3 the format is probed
from the user-provided image, whereas in 4 we just use the existing disk
format.
The current code would treat cases 1,3 and 4 correctly but in case 2 the
format provided by the user would be ignored.
The particular piece of code was broken in commit 35c7701c64508f975dfeb8
but since it was introduced a few commits before that it was never
released as working.
Eric Blake [Wed, 25 Jun 2014 20:54:36 +0000 (14:54 -0600)]
docs: publish correct enum values
We publish libvirt-api.xml for others to use, and in fact, the
libvirt-python bindings use it to generate python constants that
correspond to our enum values. However, we had an off-by-one bug
that any enum that relied on C's rules for implicit initialization
of the first enum member to 0 got listed in the xml as having a
value of 1 (and all later members of the enum were equally
botched).
The fix is simple - since we add one to the previous value when
encountering an enum without an initializer, the previous value
must start at -1 so that the first enum member is assigned 0.
The python generator code has had the off-by-one ever since DV
first wrote it years ago, but most of our public enums were immune
because they had an explicit = 0 initializer. The only affected
enums are:
- virDomainEventGraphicsAddressType (such as
VIR_DOMAIN_EVENT_GRAPHICS_ADDRESS_IPV4), since commit 987e31e
(libvirt v0.8.0)
- virDomainCoreDumpFormat (such as VIR_DOMAIN_CORE_DUMP_FORMAT_RAW),
since commit 9fbaff0 (libvirt v1.2.3)
- virIPAddrType (such as VIR_IP_ADDR_TYPE_IPV4), since commit 03e0e79 (not yet released)
Thanks to Nehal J Wani for reporting the problem on IRC, and
for helping me zero in on the culprit function.
Peter Krempa [Wed, 25 Jun 2014 16:11:17 +0000 (18:11 +0200)]
qemu: blockcopy: Don't remove existing disk mirror info
When creating a new disk mirror the new struct is stored in a separate
variable until everything went well. The removed hunk would actually
remove existing mirror information for example when the api would be run
if a mirror still exists.
LSN-2014-0003: Don't expand entities when parsing XML
If the XML_PARSE_NOENT flag is passed to libxml2, then any
entities in the input document will be fully expanded. This
allows the user to read arbitrary files on the host machine
by creating an entity pointing to a local file. Removing
the XML_PARSE_NOENT flag means that any entities are left
unchanged by the parser, or expanded to "" by the XPath
APIs.
Laine Stump [Thu, 1 May 2014 08:40:41 +0000 (11:40 +0300)]
qemu: fix crash when removing <filterref> from interface with update-device
If a domain network interface that contains a <filterref> is modified
"live" using "virsh update-device --live", libvirtd would crash. This
was because the code supporting live update of an interface's
filterref was assuming that a filterref might be added or modified,
but didn't account for removing the filterref, resulting in a null
dereference of the filter name.
Introduced with commit 258fb278, which was first in libvirt v1.0.1.
This addresses https://bugzilla.redhat.com/show_bug.cgi?id=1093301
qemu: make sure agent returns error when required data are missing
Commit 5b3492fa aimed to fix this and caught one error but exposed
another one. When agent command is being executed and the thread
waiting for the reply is woken up by an event (e.g. EOF in case of
shutdown), the command finishes with no data (rxObject == NULL), but
no error is reported, since this might be desired by the caller
(e.g. suspend through agent). However, in other situations, when the
data are required (e.g. getting vCPUs), we proceed to getting desired
data out of the reply, but none of the virJSON*() functions works well
with NULLs. I chose the way of a new parameter for qemuAgentCommand()
function that specifies whether reply is required and behaves
according to that.
On all the places where qemuAgentComand() was called, we did a check
for errors in the reply. Unfortunately, some of the places called
qemuAgentCheckError() without checking for non-null reply which might
have resulted in a crash.
So this patch makes the error-checking part of qemuAgentCommand()
itself, which:
a) makes it look better,
b) makes the check mandatory and, most importantly,
c) checks for the errors if and only if it is appropriate.
This actually fixes a potential crashers when qemuAgentComand()
returned 0, but reply was NULL. Having said that, it *should* fix the
following bug:
Michal Privoznik [Wed, 19 Mar 2014 17:10:34 +0000 (18:10 +0100)]
virNetClientSetTLSSession: Restore original signal mask
Currently, we use pthread_sigmask(SIG_BLOCK, ...) prior to calling
poll(). This is okay, as we don't want poll() to be interrupted.
However, then - immediately as we fall out from the poll() - we try to
restore the original sigmask - again using SIG_BLOCK. But as the man
page says, SIG_BLOCK adds signals to the signal mask:
SIG_BLOCK
The set of blocked signals is the union of the current set and the set argument.
Therefore, when restoring the original mask, we need to completely
overwrite the one we set earlier and hence we should be using:
SIG_SETMASK
The set of blocked signals is set to the argument set.
If a VM dies very early during an attempted connect to the guest agent
while the locks are down the domain monitor object will be freed. The
object is then accessed later as any failure during guest agent startup
isn't considered fatal.
In the current upstream version this doesn't lead to a crash as
virObjectLock called when entering the monitor in
qemuProcessDetectVcpuPIDs checks the pointer before attempting to
dereference (lock) it. The NULL pointer is then caught in the monitor
helper code.
Before the introduction of virObjectLockable - observed on 0.10.2 - the
pointer is locked directly via virMutexLock leading to a crash.
To avoid this problem we need to differentiate between the guest agent
not being present and the VM quitting when the locks were down. The fix
reorganizes the code in qemuConnectAgent to add the check and then adds
special handling to the callers.
The nwfilter conf update mutex previously serialized
updates to the internal data structures for firewall
rules, and updates to the firewall itself. The latter
was recently turned into a read/write lock, and filter
instantiation allowed to proceed in parallel. It was
believed that this was ok, since each filter is created
on a separate iptables/ebtables chain.
It turns out that there is a subtle lock ordering problem
on virNWFilterObjPtr instances. __virNWFilterInstantiateFilter
will hold a lock on the virNWFilterObjPtr it is instantiating.
This in turn invokes virNWFilterInstantiate which then invokes
virNWFilterDetermineMissingVarsRec which then invokes
virNWFilterObjFindByName. This iterates over every single
virNWFilterObjPtr in the list, locking them and checking their
name. So if 2 or more threads try to instantiate a filter in
parallel, they'll all hold 1 lock at the top level in the
__virNWFilterInstantiateFilter method which will cause the
other thread to deadlock in virNWFilterObjFindByName.
The fix is to add an exclusive mutex to serialize the
execution of __virNWFilterInstantiateFilter.
CVE-2013-6456: Avoid unsafe use of /proc/$PID/root in LXC hotunplug code
Rewrite multiple hotunplug functions to to use the
virProcessRunInMountNamespace helper. This avoids
risk of a malicious guest replacing /dev with an absolute
symlink, tricking the driver into changing the host OS
filesystem.
CVE-2013-6456: Avoid unsafe use of /proc/$PID/root in LXC chardev hostdev hotplug
Rewrite lxcDomainAttachDeviceHostdevMiscLive function
to use the virProcessRunInMountNamespace helper. This avoids
risk of a malicious guest replacing /dev with a absolute
symlink, tricking the driver into changing the host OS
filesystem.
CVE-2013-6456: Avoid unsafe use of /proc/$PID/root in LXC block hostdev hotplug
Rewrite lxcDomainAttachDeviceHostdevStorageLive function
to use the virProcessRunInMountNamespace helper. This avoids
risk of a malicious guest replacing /dev with a absolute
symlink, tricking the driver into changing the host OS
filesystem.
CVE-2013-6456: Avoid unsafe use of /proc/$PID/root in LXC USB hotplug
Rewrite lxcDomainAttachDeviceHostdevSubsysUSBLive function
to use the virProcessRunInMountNamespace helper. This avoids
risk of a malicious guest replacing /dev with a absolute
symlink, tricking the driver into changing the host OS
filesystem.
CVE-2013-6456: Avoid unsafe use of /proc/$PID/root in LXC disk hotplug
Rewrite lxcDomainAttachDeviceDiskLive function to use the
virProcessRunInMountNamespace helper. This avoids risk of
a malicious guest replacing /dev with a absolute symlink,
tricking the driver into changing the host OS filesystem.
Eric Blake [Tue, 24 Dec 2013 05:55:51 +0000 (22:55 -0700)]
CVE-2013-6456: Avoid unsafe use of /proc/$PID/root in LXC shutdown/reboot code
Use helper virProcessRunInMountNamespace in lxcDomainShutdownFlags and
lxcDomainReboot. Otherwise, a malicious guest could use symlinks
to force the host to manipulate the wrong file in the host's namespace.
Idea by Dan Berrange, based on an initial report by Reco
<recoverym4n@gmail.com> at
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=732394
Add helper for running code in separate namespaces
Implement virProcessRunInMountNamespace, which runs callback of type
virProcessNamespaceCallback in a container namespace. This uses a
child process to run the callback, since you can't change the mount
namespace of a thread. This implies that callbacks have to be careful
about what code they run due to async safety rules.
Idea by Dan Berrange, based on an initial report by Reco
<recoverym4n@gmail.com> at
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=732394
Signed-off-by: Daniel Berrange <berrange@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
(cherry picked from commit 7c72ef6f555f1f9844d51be2f38f078bc908652c)
Add a helper function which takes a file path and ensures
that all directory components leading up to the file exist.
IOW, it strips the filename part of the path and passes
the result to virFileMakePath.
Move check for cgroup devices ACL upfront in LXC hotplug
The check for whether the cgroup devices ACL is available is
done quite late during LXC hotplug - in fact after the device
node is already created in the container in some cases. Better
to do it upfront so we fail immediately.
Fix reset of cgroup when detaching USB device from LXC guests
When detaching a USB device from an LXC guest we must remove
the device from the cgroup ACL. Unfortunately we were telling
the cgroup code to use the guest /dev path, not the host /dev
path, and the guest device node had already been unlinked.
This was, however, fortunate since the code passed &priv->cgroup
instead of priv->cgroup, so would have crash if the device node
were accessible.
The LXC code missed the 'usb' component out of the path
/dev/bus/usb/$BUSNUM/$DEVNUM, so it failed to actually
setup cgroups for the device. This was in fact lucky
because the call to virLXCSetupHostUsbDeviceCgroup
was also mistakenly passing '&priv->cgroup' instead of
just 'priv->cgroup'. So once the path is fixed, libvirtd
would then crash trying to access the bogus virCgroupPtr
pointer. This would have been a security issue, were it
not for the bogus path preventing the pointer reference
being reached.
virDomainDefCompatibleDevice blocks use of USB if no USB
controller is present. This is not correct for containers
since devices can be assigned directly regardless of any
controllers.
Eric Blake [Tue, 5 Nov 2013 17:30:56 +0000 (10:30 -0700)]
storage: avoid short reads while chasing backing chain
Our backing file chain code was not very robust to an ill-timed
EINTR, which could lead to a short read causing us to randomly
treat metadata differently than usual. But the existing
virFileReadLimFD forces an error if we don't read the entire
file, even though we only care about the header of the file.
So add a new virFile function that does what we want.
* src/util/virfile.h (virFileReadHeaderFD): New prototype.
* src/util/virfile.c (virFileReadHeaderFD): New function.
* src/libvirt_private.syms (virfile.h): Export it.
* src/util/virstoragefile.c (virStorageFileGetMetadataInternal)
(virStorageFileProbeFormatFromFD): Use it.
Commit f9f56340 for CVE-2014-0028 almost had the right idea - we
need to check the ACL rules to filter which events to send. But
it overlooked one thing: the event dispatch queue is running in
the main loop thread, and therefore does not normally have a
current virIdentityPtr. But filter checks can be based on current
identity, so when libvirtd.conf contains access_drivers=["polkit"],
we ended up rejecting access for EVERY event due to failure to
look up the current identity, even if it should have been allowed.
Furthermore, even for events that are triggered by API calls, it
is important to remember that the point of events is that they can
be copied across multiple connections, which may have separate
identities and permissions. So even if events were dispatched
from a context where we have an identity, we must change to the
correct identity of the connection that will be receiving the
event, rather than basing a decision on the context that triggered
the event, when deciding whether to filter an event to a
particular connection.
If there were an easy way to get from virConnectPtr to the
appropriate virIdentityPtr, then object_event.c could adjust the
identity prior to checking whether to dispatch an event. But
setting up that back-reference is a bit invasive. Instead, it
is easier to delay the filtering check until lower down the
stack, at the point where we have direct access to the RPC
client object that owns an identity. As such, this patch ends
up reverting a large portion of the framework of commit f9f56340.
We also have to teach 'make check' to special-case the fact that
the event registration filtering is done at the point of dispatch,
rather than the point of registration. Note that even though we
don't actually use virConnectDomainEventRegisterCheckACL (because
the RegisterAny variant is sufficient), we still generate the
function for the purposes of documenting that the filtering
takes place.
Also note that I did not entirely delete the notion of a filter
from object_event.c; I still plan on using that for my upcoming
patch series for qemu monitor events in libvirt-qemu.so. In
other words, while this patch changes ACL filtering to live in
remote.c and therefore we have no current client of the filtering
in object_event.c, the notion of filtering in object_event.c is
still useful down the road.
* src/check-aclrules.pl: Exempt event registration from having to
pass checkACL filter down call stack.
* daemon/remote.c (remoteRelayDomainEventCheckACL)
(remoteRelayNetworkEventCheckACL): New functions.
(remoteRelay*Event*): Use new functions.
* src/conf/domain_event.h (virDomainEventStateRegister)
(virDomainEventStateRegisterID): Drop unused parameter.
* src/conf/network_event.h (virNetworkEventStateRegisterID):
Likewise.
* src/conf/domain_event.c (virDomainEventFilter): Delete unused
function.
* src/conf/network_event.c (virNetworkEventFilter): Likewise.
* src/libxl/libxl_driver.c: Adjust caller.
* src/lxc/lxc_driver.c: Likewise.
* src/network/bridge_driver.c: Likewise.
* src/qemu/qemu_driver.c: Likewise.
* src/remote/remote_driver.c: Likewise.
* src/test/test_driver.c: Likewise.
* src/uml/uml_driver.c: Likewise.
* src/vbox/vbox_tmpl.c: Likewise.
* src/xen/xen_driver.c: Likewise.
The NWFilter code has as a deadlock race condition between
the virNWFilter{Define,Undefine} APIs and starting of guest
VMs due to mis-matched lock ordering.
In the virNWFilter{Define,Undefine} codepaths the lock ordering
is
This has the effect of serializing VM startup once again, even if
no nwfilters are applied to the guest. There is also the possibility
of deadlock due to a call graph loop via virNWFilterInstantiate
and virNWFilterInstantiateFilterLate.
These two problems mean the lock must be turned into a read/write
lock instead of a plain mutex at the same time. The lock is used to
serialize changes to the "driver->nwfilters" hash, so the write lock
only needs to be held by the define/undefine methods. All other
methods can rely on a read lock which allows good concurrency.
Eric Blake [Tue, 14 Jan 2014 17:29:34 +0000 (10:29 -0700)]
event: filter global events by domain:getattr ACL [CVE-2014-0028]
Ever since ACL filtering was added in commit 7639736 (v1.1.1), a
user could still use event registration to obtain access to a
domain that they could not normally access via virDomainLookup*
or virConnectListAllDomains and friends. We already have the
framework in the RPC generator for creating the filter, and
previous cleanup patches got us to the point that we can now
wire the filter through the entire object event stack.
Furthermore, whether or not domain:getattr is honored, use of
global events is a form of obtaining a list of networks, which
is covered by connect:search_domains added in a93cd08 (v1.1.0).
Ideally, we'd have a way to enforce connect:search_domains when
doing global registrations while omitting that check on a
per-domain registration. But this patch just unconditionally
requires connect:search_domains, even when no list could be
obtained, based on the following observations:
1. Administrators are unlikely to grant domain:getattr for one
or all domains while still denying connect:search_domains - a
user that is able to manage domains will want to be able to
manage them efficiently, but efficient management includes being
able to list the domains they can access. The idea of denying
connect:search_domains while still granting access to individual
domains is therefore not adding any real security, but just
serves as a layer of obscurity to annoy the end user.
2. In the current implementation, domain events are filtered
on the client; the server has no idea if a domain filter was
requested, and must therefore assume that all domain event
requests are global. Even if we fix the RPC protocol to
allow for server-side filtering for newer client/server combos,
making the connect:serach_domains ACL check conditional on
whether the domain argument was NULL won't benefit older clients.
Therefore, we choose to document that connect:search_domains
is a pre-requisite to any domain event management.
Network events need the same treatment, with the obvious
change of using connect:search_networks and network:getattr.
Eric Blake [Tue, 14 Jan 2014 00:30:59 +0000 (17:30 -0700)]
Fix memory leak in virObjectEventCallbackListRemoveID()
While running objecteventtest, it was found that valgrind pointed out the
following memory leak:
==13464== 5 bytes in 1 blocks are definitely lost in loss record 7 of 134
==13464== at 0x4A0887C: malloc (vg_replace_malloc.c:270)
==13464== by 0x341F485E21: strdup (strdup.c:42)
==13464== by 0x4CAE28F: virStrdup (virstring.c:554)
==13464== by 0x4CF3CBE: virObjectEventCallbackListAddID (object_event.c:286)
==13464== by 0x4CF49CA: virObjectEventStateRegisterID (object_event.c:729)
==13464== by 0x4CF73FE: virDomainEventStateRegisterID (domain_event.c:1424)
==13464== by 0x4D7358F: testConnectDomainEventRegisterAny (test_driver.c:6032)
==13464== by 0x4D600C8: virConnectDomainEventRegisterAny (libvirt.c:19128)
==13464== by 0x402409: testDomainStartStopEvent (objecteventtest.c:232)
==13464== by 0x403451: virtTestRun (testutils.c:138)
==13464== by 0x402012: mymain (objecteventtest.c:395)
==13464== by 0x403AF2: virtTestMain (testutils.c:593)
==13464==
The @list->callbacks is an array that is inflated whenever a new event
is added, e.g. via virDomainEventCallbackListAddID(). However, when we
are freeing the array, we free the items within it but forgot to
actually free it.
When writing commit 173c291, I missed the fact virNetServerClientClose
unlocks the client object before actually clearing client->sock and thus
it is possible to hit a window when client->keepalive is NULL while
client->sock is not NULL. I was thinking client->sock == NULL was a
better check for a closed connection but apparently we have to go with
client->keepalive == NULL to actually fix the crash.
When a client closes its connection to libvirtd early during
virConnectOpen, more specifically just after making
REMOTE_PROC_CONNECT_SUPPORTS_FEATURE call to check if
VIR_DRV_FEATURE_PROGRAM_KEEPALIVE is supported without even waiting for
the result, libvirtd may crash due to a race in keep-alive
initialization. Once receiving the REMOTE_PROC_CONNECT_SUPPORTS_FEATURE
call, the daemon's event loop delegates it to a worker thread. In case
the event loop detects EOF on the connection and calls
virNetServerClientClose before the worker thread starts to handle
REMOTE_PROC_CONNECT_SUPPORTS_FEATURE call, client->keepalive will be
disposed by the time virNetServerClientStartKeepAlive gets called from
remoteDispatchConnectSupportsFeature. Because the flow is common for
both authenticated and read-only connections, even unprivileged clients
may cause the daemon to crash.
To avoid the crash, virNetServerClientStartKeepAlive needs to check if
the connection is still open before starting keep-alive protocol.
Every libvirt release since 0.9.8 is affected by this bug.
Jiri Denemark [Fri, 20 Dec 2013 13:50:02 +0000 (14:50 +0100)]
qemu: Avoid using stale data in virDomainGetBlockInfo
CVE-2013-6458
Generally, every API that is going to begin a job should do that before
fetching data from vm->def. However, qemuDomainGetBlockInfo does not
know whether it will have to start a job or not before checking vm->def.
To avoid using disk alias that might have been freed while we were
waiting for a job, we use its copy. In case the disk was removed in the
meantime, we will fail with "cannot find statistics for device '...'"
error message.
When virDomainDetachDeviceFlags is called concurrently to
virDomainBlockStats: libvirtd may crash because qemuDomainBlockStats
finds a disk in vm->def before getting a job on a domain and uses the
disk pointer after getting the job. However, the domain in unlocked
while waiting on a job condition and thus data behind the disk pointer
may disappear. This happens when thread 1 runs
virDomainDetachDeviceFlags and enters monitor to actually remove the
disk. Then another thread starts running virDomainBlockStats, finds the
disk in vm->def, and while it's waiting on the job condition (owned by
the first thread), the first thread finishes the disk removal. When the
second thread gets the job, the memory pointed to be the disk pointer is
already gone.
That said, every API that is going to begin a job should do that before
fetching data from vm->def.
Dario Faggioli [Fri, 20 Dec 2013 15:29:47 +0000 (16:29 +0100)]
libxl: avoid crashing if calling `virsh numatune' on inactive domain
by, in libxlDomainGetNumaParameters(), calling libxl_bitmap_init() as soon as
possible, which avoids getting to 'cleanup:', where libxl_bitmap_dispose()
happens, without having initialized the nodemap, and hence crashing after some
invalid free()-s:
Signed-off-by: Dario Faggili <dario.faggioli@citrix.com> Cc: Jim Fehlig <jfehlig@suse.com> Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
(cherry picked from commit f9ee91d35510ccbc6fc42cef8864b291b2d220f4)
The function doesn't check whether the request is made for active or
inactive domain. Thus when the domain is not running it still tries
accessing non-existing cgroups (priv->cgroup, which is NULL).
I re-made the function in order for it to work the same way it's qemu
counterpart does.
Reproducer:
1) Define an LXC domain
2) Do 'virsh memtune <domain> --hard-limit 133T'
Backtrace:
Thread 6 (Thread 0x7fffec8c0700 (LWP 26826)):
#0 0x00007ffff70edcc4 in virCgroupPathOfController (group=0x0, controller=3,
key=0x7ffff75734bd "memory.limit_in_bytes", path=0x7fffec8bf718) at util/vircgroup.c:1764
#1 0x00007ffff70e9206 in virCgroupSetValueStr (group=0x0, controller=3,
key=0x7ffff75734bd "memory.limit_in_bytes", value=0x7fffe409f360 "1073741824")
at util/vircgroup.c:669
#2 0x00007ffff70e98b4 in virCgroupSetValueU64 (group=0x0, controller=3,
key=0x7ffff75734bd "memory.limit_in_bytes", value=1073741824) at util/vircgroup.c:740
#3 0x00007ffff70ee518 in virCgroupSetMemory (group=0x0, kb=1048576) at util/vircgroup.c:1904
#4 0x00007ffff70ee675 in virCgroupSetMemoryHardLimit (group=0x0, kb=1048576)
at util/vircgroup.c:1944
#5 0x00005555557d54c8 in lxcDomainSetMemoryParameters (dom=0x7fffe40cc420,
params=0x7fffe409f100, nparams=1, flags=0) at lxc/lxc_driver.c:774
#6 0x00007ffff72c20f9 in virDomainSetMemoryParameters (domain=0x7fffe40cc420,
params=0x7fffe409f100, nparams=1, flags=0) at libvirt.c:4051
#7 0x000055555561365f in remoteDispatchDomainSetMemoryParameters (server=0x555555eb7e00,
client=0x555555ec4b10, msg=0x555555eb94e0, rerr=0x7fffec8bfb70, args=0x7fffe40b8510)
at remote_dispatch.h:7621
#8 0x00005555556133fd in remoteDispatchDomainSetMemoryParametersHelper (server=0x555555eb7e00,
client=0x555555ec4b10, msg=0x555555eb94e0, rerr=0x7fffec8bfb70, args=0x7fffe40b8510,
ret=0x7fffe40b84f0) at remote_dispatch.h:7591
#9 0x00007ffff73b293f in virNetServerProgramDispatchCall (prog=0x555555ec3ae0,
server=0x555555eb7e00, client=0x555555ec4b10, msg=0x555555eb94e0)
at rpc/virnetserverprogram.c:435
#10 0x00007ffff73b207f in virNetServerProgramDispatch (prog=0x555555ec3ae0,
server=0x555555eb7e00, client=0x555555ec4b10, msg=0x555555eb94e0)
at rpc/virnetserverprogram.c:305
#11 0x00007ffff73a4d2c in virNetServerProcessMsg (srv=0x555555eb7e00, client=0x555555ec4b10,
prog=0x555555ec3ae0, msg=0x555555eb94e0) at rpc/virnetserver.c:165
#12 0x00007ffff73a4e8d in virNetServerHandleJob (jobOpaque=0x555555ec3e30, opaque=0x555555eb7e00)
at rpc/virnetserver.c:186
#13 0x00007ffff7187f3f in virThreadPoolWorker (opaque=0x555555eb7ac0) at util/virthreadpool.c:144
#14 0x00007ffff718733a in virThreadHelper (data=0x555555eb7890) at util/virthreadpthread.c:161
#15 0x00007ffff468ed89 in start_thread (arg=0x7fffec8c0700) at pthread_create.c:308
#16 0x00007ffff3da26bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
CVE-2013-6436: fix crash in lxcDomainGetMemoryParameters
The function doesn't check whether the request is made for active or
inactive domain. Thus when the domain is not running it still tries
accessing non-existing cgroups (priv->cgroup, which is NULL).
I re-made the function in order for it to work the same way it's qemu
counterpart does.
Reproducer:
1) Define an LXC domain
2) Do 'virsh memtune <domain>'
Backtrace:
Thread 6 (Thread 0x7fffec8c0700 (LWP 13387)):
#0 0x00007ffff70edcc4 in virCgroupPathOfController (group=0x0, controller=3,
key=0x7ffff75734bd "memory.limit_in_bytes", path=0x7fffec8bf750) at util/vircgroup.c:1764
#1 0x00007ffff70e958c in virCgroupGetValueStr (group=0x0, controller=3,
key=0x7ffff75734bd "memory.limit_in_bytes", value=0x7fffec8bf7c0) at util/vircgroup.c:705
#2 0x00007ffff70e9d29 in virCgroupGetValueU64 (group=0x0, controller=3,
key=0x7ffff75734bd "memory.limit_in_bytes", value=0x7fffec8bf810) at util/vircgroup.c:804
#3 0x00007ffff70ee706 in virCgroupGetMemoryHardLimit (group=0x0, kb=0x7fffec8bf8a8)
at util/vircgroup.c:1962
#4 0x00005555557d590f in lxcDomainGetMemoryParameters (dom=0x7fffd40024a0,
params=0x7fffd40027a0, nparams=0x7fffec8bfa24, flags=0) at lxc/lxc_driver.c:826
#5 0x00007ffff72c28d3 in virDomainGetMemoryParameters (domain=0x7fffd40024a0,
params=0x7fffd40027a0, nparams=0x7fffec8bfa24, flags=0) at libvirt.c:4137
#6 0x000055555563714d in remoteDispatchDomainGetMemoryParameters (server=0x555555eb7e00,
client=0x555555ebaef0, msg=0x555555ebb3e0, rerr=0x7fffec8bfb70, args=0x7fffd40024e0,
ret=0x7fffd4002420) at remote.c:1895
#7 0x00005555556052c4 in remoteDispatchDomainGetMemoryParametersHelper (server=0x555555eb7e00,
client=0x555555ebaef0, msg=0x555555ebb3e0, rerr=0x7fffec8bfb70, args=0x7fffd40024e0,
ret=0x7fffd4002420) at remote_dispatch.h:4050
#8 0x00007ffff73b293f in virNetServerProgramDispatchCall (prog=0x555555ec3ae0,
server=0x555555eb7e00, client=0x555555ebaef0, msg=0x555555ebb3e0)
at rpc/virnetserverprogram.c:435
#9 0x00007ffff73b207f in virNetServerProgramDispatch (prog=0x555555ec3ae0,
server=0x555555eb7e00, client=0x555555ebaef0, msg=0x555555ebb3e0)
at rpc/virnetserverprogram.c:305
#10 0x00007ffff73a4d2c in virNetServerProcessMsg (srv=0x555555eb7e00, client=0x555555ebaef0,
prog=0x555555ec3ae0, msg=0x555555ebb3e0) at rpc/virnetserver.c:165
#11 0x00007ffff73a4e8d in virNetServerHandleJob (jobOpaque=0x555555ebc7e0, opaque=0x555555eb7e00)
at rpc/virnetserver.c:186
#12 0x00007ffff7187f3f in virThreadPoolWorker (opaque=0x555555eb7ac0) at util/virthreadpool.c:144
#13 0x00007ffff718733a in virThreadHelper (data=0x555555eb7890) at util/virthreadpthread.c:161
#14 0x00007ffff468ed89 in start_thread (arg=0x7fffec8c0700) at pthread_create.c:308
#15 0x00007ffff3da26bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
Ryota Ozaki [Thu, 24 Oct 2013 15:48:25 +0000 (00:48 +0900)]
virnetsocket: fix getsockopt on FreeBSD
aa0f099 introduced a strict error checking for getsockopt and it
revealed that getting a peer credential of a socket on FreeBSD
didn't work. Libvirtd hits the error:
error : virNetSocketGetUNIXIdentity:1198 : Failed to get valid
client socket identity groups
SOL_SOCKET (0xffff) was used as a level of getsockopt for
LOCAL_PEERCRED, however, it was wrong. 0 is correct as well as
Mac OS X.
So for LOCAL_PEERCRED our options are SOL_LOCAL (if defined) or
0 on Mac OS X and FreeBSD. According to the fact, the patch
simplifies the code by removing ifdef __APPLE__.
I tested the patch on FreeBSD 8.4, 9.2 and 10.0-BETA1.
For reference, Linux systems typically define it as:
typedef bool_t (*xdrproc_t)(XDR *, void *, ...);
The rationale explained in the header is that using a vararg is
incorrect and has a potential to change the ABI slightly do to compiler
optimizations taken and the undefined behavior. They decided
to specify the exact number of parameters and for compatibility with old
code decided to make the signature require 3 arguments. The third
argument is ignored for cases that its not used and its recommended to
supply a 0.
libxl: fix dubious cpumask handling in libxlDomainSetVcpuAffinities
Rather than casting the virBitmap pointer to uint8_t* and then using
the structure contents as a byte array, use the virBitmap API to determine
the bitmap size and test each bit.
Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Jim Fehlig [Fri, 1 Nov 2013 14:22:18 +0000 (08:22 -0600)]
Revert "libxl: Fix possible invalid read"
This reverts commit 394d6e0a95ac91facba06ab43d67ae8600b14b0e.
The real problem is accessing the virtBitmap structure as a byte
array, which was correctly identified and fixed by Jeremy Fitzhardinge
in recently xen commit: 7051d5c8, there is a api changes in
libxl_domain_create_restore.
Author: Andrew Cooper <andrew.cooper3@citrix.com>
Date: Thu Oct 10 12:23:10 2013 +0100
tools/migrate: Fix regression when migrating from older version of Xen
use the macro LIBXL_HAVE_DOMAIN_CREATE_RESTORE_PARAMS in libxl.h
in order to make libvirt could compile with old and new xen.
the params checkpointed_stream is useful if libvirt libxl driver
support migration. for new, set it as zero.
When starting a transient VM the first thing done is to check
for duplicates. The check looks if there are any running VMs
with the matching name/uuid. It explicitly allows there to
be inactive VMs, so that a persistent VM can be temporarily
booted with a different config.
There is a race condition, however, where 2 or more clients
try to create the same transient VM. The first client will
cause a virDomainObjPtr to be added to the domain list, and
it is inactive at this stage. The second client may then
come along and see this inactive VM, and mistake it for a
persistent VM.
If the first VM fails to start its transient guest for any
reason, then it'll remove the virDomainObjPtr from the list.
The second client now has a virDomainObjPtr that it can try
to boot, which libvirt no longer has a record of. The result
can be a running QEMU process that is orphaned.
It was also, however, possible for the virDomainObjPtr to be
completely free'd which will cause libvirtd to crash in some
scenarios.
The fix is to only allow an existing inactive VM if it is
marked as persistent.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Ryota Ozaki [Thu, 31 Oct 2013 15:45:12 +0000 (00:45 +0900)]
nodedev_hal: fix segfault when virDBusGetSystemBus fails
Thie patch fixes the segfault:
error : nodeStateInitialize:658 : DBus not available,
disabling HAL driver: internal error: Unable to get DBus
system bus connection: Failed to connect to socket
/var/run/dbus/system_bus_socket: No such file or directory
error : nodeStateInitialize:719 : ?:
Caught Segmentation violation dumping internal log buffer:
This segfault occurs at the below VIR_ERROR:
failure:
if (dbus_error_is_set(&err)) {
VIR_ERROR(_("%s: %s"), err.name, err.message);
When virDBusGetSystemBus fails, the code jumps to the above failure
path. However, the err variable is not correctly initialized
before calling virDBusGetSystemBus. As a result, dbus_error_is_set
may pass over the uninitialized err variable whose name or
message may point to somewhere unknown memory region, which
causes a segfault on VIR_ERROR.
The new code initializes the err variable before calling
virDBusGetSystemBus.
Include reference of the VM object pointer and name in debug
logs for QEMU start/stop functions. Also make sure we log the
PID that we started, since it isn't available elsewhere in the
logs.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
In debugging a recent oVirt/libvirt race condition, I was very
frustrated by lack of logging in the job enter/exit code. This
patch adds some key data which would have been useful in by
debugging attempts.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Eric Blake [Wed, 30 Oct 2013 21:42:31 +0000 (15:42 -0600)]
storage: use correct type for array count
Using size_t counts will let us use VIR_APPEND_ELEMENT and friends.
* src/conf/storage_conf.h (_virStoragePoolObjList)
(_virStorageVolDefList): Track list sizes with size_t.
* src/storage/storage_backend_rbd.c
(virStorageBackendRBDRefreshPool): Fix type fallout.
Eric Blake [Tue, 29 Oct 2013 19:57:19 +0000 (13:57 -0600)]
maint: avoid further typedef accidents
To make it easier to forbid future attempts at a confusing typedef
name ending in Ptr that isn't actually a pointer, insist that we
follow our preferred style of 'typedef foo *fooPtr'.
Fix race condition reconnecting to vms & loading configs
The following sequence
1. Define a persistent QMEU guest
2. Start the QEMU guest
3. Stop libvirtd
4. Kill the QEMU process
5. Start libvirtd
6. List persistent guests
At the last step, the previously running persistent guest
will be missing. This is because of a race condition in the
QEMU driver startup code. It does
1. Load all VM state files
2. Spawn thread to reconnect to each VM
3. Load all VM config files
Only at the end of step 3, does the 'virDomainObjPtr' get
marked as "persistent". There is therefore a window where
the thread reconnecting to the VM will remove the persistent
VM from the list.
The easy fix is to simply switch the order of steps 2 & 3.
In addition to this though, we must only attempt to reconnect
to a VM which had a non-zero PID loaded from its state file.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Fix leak of objects when reconnecting to QEMU instances
The 'error' cleanup block in qemuProcessReconnect() had a
'return' statement in the middle of it. This caused a leak
of virConnectPtr & virQEMUDriverConfigPtr instances. This
was identified because netcf recently started checking its
refcount in libvirtd shutdown:
netcfStateCleanup:109 : internal error: Attempt to close netcf state driver with open connections
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Doug Goldstein [Mon, 28 Oct 2013 18:20:35 +0000 (13:20 -0500)]
MacOS: Re-add support for QEMU backend
The QEMU backend was disabled on Mac OS X without a reason in the code
and due to refactors its difficult to understand when/why it was
disabled. With QEMU being supported on Mac OS X there is no reason to
disable QEMU on this platform.
John Ferlan [Tue, 22 Oct 2013 07:50:08 +0000 (08:50 +0100)]
Add '+' to uid/gid printing for label processing
To ensure proper processing by virGetUserID() and virGetGroupID()
of a uid/gid add a "+" prior to the uid/gid to denote it's really
a uid/gid for the label.
Eric Blake [Tue, 29 Oct 2013 15:56:48 +0000 (09:56 -0600)]
storage: fix incorrect typedef
The rbd code had a confusing typedef ending in Ptr that was not
actually a pointer, which made the rest of the code harder to
read. This fixes things to actually pass by pointer rather than
by copy.
Push RPM deps down into libvirt-daemon-driver-XXXX sub-RPMs
For inexplicable reasons, many of the 3rd party package deps
were left against the 'libvirt-daemon' RPM when the drivers
were split out. This makes a minimal install heavier that
it should be. Push them all down into libvirt-daemon-driver-XXX
so they're only pulled in when truly needed
With this change applied, a minimal install of just the
libvirt-daemon-driver-lxc RPM is reduced by 41 MB on a
Fedora 19 host.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
capabilities: add baselabel per sec driver/virt type to secmodel
Expand the "secmodel" XML fragment of "host" with a sequence of
baselabel's which describe the default security context used by
libvirt with a specific security model and virtualization type:
Chen Hanxiao [Mon, 28 Oct 2013 11:18:26 +0000 (11:18 +0000)]
Skip debug message in lxcContainerSetID if no map is set.
The lxcContainerSetID() method prints a misleading log
message about setting the uid/gid when no ID map is
present in the XML config. Skip the debug message in
this case.
John Ferlan [Fri, 18 Oct 2013 10:43:47 +0000 (06:43 -0400)]
Avoid Coverity DEADCODE warning
Commit '922b7fda' resulted in two DEADCODE warnings from Coverity in
remoteDispatchAuthPolkit and virAccessDriverPolkitFormatProcess.
Commit '604ae657' modified the daemon.c code to remove the deadcode
issue, but did not do so for viracessdriverpolkit.c. This just mimics
the same changes