vircommand.c: write child pidfile before process tuning in virExec()
When VIR_EXEC_DAEMON is true and cmd->pidfile exists, the parent
will expect the pidfile to be written before exiting, sitting
tight in a saferead() call waiting.
The child then does process tuning (via virProcessSet* functions)
before writing the pidfile. Problem is that these tunings can
fail, and trigger a 'fork_error' jump, before cmd->pidfile is
written. The result is that the process was aborted in the
child, but the parent is still hang in the saferead() call.
This behavior can be reproduced by trying to create and execute
a QEMU guest in user mode (e.g. using qemu:///session as non-root).
virProcessSetMaxMemLock() will fail if the spawned libvirtd user
process does not have CAP_SYS_RESOURCE capability. setrlimit() will
fail, and a 'fork_error' jump is triggered before cmd->pidfile
is written. The parent will hung in saferead() indefinitely. From
the user perspective, 'virsh start <guest>' will hang up
indefinitely. CTRL+C can be used to retrieve the terminal, but
any subsequent 'virsh' call will also hang because the previous
libvirtd user process is still there.
We can fix this by moving all virProcessSet*() tuning functions
to be executed after cmd->pidfile is taken care of. In the case
mentioned above, this would be the result of 'virsh start'
after this patch:
error: Failed to start domain vm1
error: internal error: Process exited prior to exec: libvirt: error :
cannot limit locked memory to 79691776: Operation not permitted
Suggested-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Andrea Bolognani <abologna@redhat.com>
virfirewalltest: Don't duplicate string when adding it onto stringlist
In our wrapper of g_dbus_connection_call_sync() in
virfirewalltest a string is duplicated and added onto a
virStringList. This leads to a memory leak because
virStringListAdd() duplicates the string itself.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
With us switching to glib more and more it is easy to get things
wrong (as can be seen in the previous commit). Set G_DEBUG
variable to "fatal-warnings" which causes GLib to abort the
program at the first call to g_warning() or g_critical().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Pavel Hrdina [Fri, 2 Oct 2020 10:11:45 +0000 (12:11 +0200)]
tests: fix incorrect free of GVariant in our GLib mock functions
GLib implementation of g_dbus_connection_call_sync() calls
g_variant_ref_sink() on the passed @parameters to make sure they have
proper reference. If the original reference is floating the
g_dbus_connection_call_sync() consumes it, but if it's normal reference
it will just add another one.
Our mock functions were only freeing the @parameters which is incorrect
and doesn't reflect how the real implementation works.
Reported-by: Cole Robinson <crobinso@redhat.com> Signed-off-by: Pavel Hrdina <phrdina@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
If writing side writes enough bytes to the pipe and closes writing
end then we got both VIR_EVENT_HANDLE_HANGUP and VIR_EVENT_HANDLE_READ
in handler. Currently in this situation handler reads 1024 bytes
and finish reading leaving unread data in pipe.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Laine Stump [Wed, 30 Sep 2020 22:51:45 +0000 (18:51 -0400)]
build: remove duplicate check for GET_VLAN_VID_CMD
Somehow this check was duplicated just below the original.
(I was at first skeptical that it's needed at all, since
GET_VLAN_VID_CMD was already present in kernel 2.6.32, but then I
realized that there is no higher level check for __linux__ around the
code that is conditional on WITH_DECL_GET_VLAN_VID_CMD; it only checks
for SIOCGIFVLAN and WITH_STRUCT_IFREQ - the latter is also present on
*BSD platforms, the former doesn't seem to be anywhere but Linux, but
I didn't want to change the effect of the conditional, so I left it in
(we could have also replaced WITH_DECL_GET_VLAN_VID_CMD, but possibly
there is a non-Linux platform that *does* have it...)
Signed-off-by: Laine Stump <laine@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Laine Stump [Wed, 30 Sep 2020 21:56:00 +0000 (17:56 -0400)]
util: provide non-netlink/libnl alternative for virNetDevGetMaster()
Lack of this one function (which is called for each active tap device
every time libvirtd is started) is the one thing preventing a
"WITHOUT_LIBNL" build of libvirt from being useful. With this
alternate implementation, guests using standard tap devices will work
properly even when libvirt is built without libnl support.
Signed-off-by: Laine Stump <laine@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Laine Stump [Wed, 30 Sep 2020 20:36:52 +0000 (16:36 -0400)]
util: fix Linux build when libnl-devel isn't available
There was one stray bit of code in virnetdev.c that required libnl to
build, but wasn't qualified by defined(WITH_LIBNL). Adding that, plus
putting a similar check around a static function only used by that
aforementioned code, makes libvirt build properly without libnl3-devel
installed.
How useful it is in that state is a separate issue :-)
Signed-off-by: Laine Stump <laine@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Laine Stump [Tue, 29 Sep 2020 14:54:38 +0000 (10:54 -0400)]
build: eliminate WITH_MACVTAP flag entirely
This flag was originally created to indicate that either 1) the build
platform wasn't linux, 2) the build platform was linux, but the kernel
was too old to have macvtap support. Since there was already a switch
there, the ability to also disable it when 3) the kernel supports
macvtap but the user doesn't want it, was added in. I don't think that
(3) was ever an intentional goal, just something that grew naturally
out of having the flag there in the first place (unless possibly the
original author wanted a way to quickly disable their new code in case
it caused regressions elsewhere).
Now that the check for (2) has been removed, WITH_MACVTAP is just
checking (1) and (3), but (3) is pointless (because the extra code in
libvirt itself is miniscule, and the only external library needed for
it is libnl, which is also required for other unrelated features (and
itself has no subordinate dependencies and takes up < 1MB on
disk)). We can therfore eliminate the WITH_MACVTAP flag, as it is
functionally equivalent to WITH_LIBNL (which implies __linux__).
Signed-off-by: Laine Stump <laine@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Laine Stump [Tue, 29 Sep 2020 13:56:18 +0000 (09:56 -0400)]
build: simplify check for WITH_MACVTAP
macvtap support was added to the Linux kernel in 2.6.33. libvirt
checked for this by looking for MACVLAN_MODE_BRIDGE and IFLA_VF_MAX in
linux/if_link.h. This hasn't been necessary for a very long time, so
just gate on platform == 'linux' (and be sure to complain if someone
tries to enable it on a non-Linux platform).
Signed-off-by: Laine Stump <laine@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Laine Stump [Tue, 29 Sep 2020 13:43:54 +0000 (09:43 -0400)]
build: remove check for MACVLAN_MODE_PASSTHRU
macvlan support was added to the Linux kernel in 2.6.33, but
MACVLAN_MODE_PASSTHRU wasn't added until 2.6.38, so a workaround had
been put in place to define that constant on those few systems where
it was missing. It's useful like was probably 6 months at most, but
it's been there for over 10 years.
Signed-off-by: Laine Stump <laine@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Laine Stump [Mon, 28 Sep 2020 21:39:46 +0000 (17:39 -0400)]
build: eliminate useless WITH_VIRTUALPORT check
WITH_VIRTUALPORT just checks that we are building on Linux and that
IFLA_PORT_MAX is defined in linux/if_link.h. Back when 802.11Qb[gh]
support was added, the IFLA_* stuff was new (introduced in kernel
2.6.35, backported to RHEL6 2.6.32 kernel at some point), and so this
extra check was necessary, because libvirt was being built on Linux
distros that didn't yet have IFLA_* (e.g. older RHEL6, all
RHEL5). It's been in the kernel for a *very* long time now, so all
supported versions of all Linux platforms libvirt builds on have it.
Note that the above paragraph implies that the conditional compilation
should be changed to #if defined(__linux__). However, the astute
reader will notice that the code in question is sending and receiving
netlink messages, so it really should be conditional on WITH_LIBNL
(which implies __linux__) instead, so that's what this patch does.
Signed-off-by: Laine Stump <laine@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Laine Stump [Mon, 28 Sep 2020 21:29:41 +0000 (17:29 -0400)]
util: remove extraneous defined(__linux__) when checking for WITH_LIBNL
WITH_LIBNL will only be defined on Linux platforms (because libnl is a
library written to encapsulate parts of netlink, which is a Linux-only
API), so it's redundant to write:
#if defined(__linux__) && defined(WITH_LIBNL)
We can just check for WITH_LIBNL.
Signed-off-by: Laine Stump <laine@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Laine Stump [Mon, 28 Sep 2020 19:53:48 +0000 (15:53 -0400)]
util: remove useless checks for IFLA_VF_MAX
IFLA_VF_MAX was introduced to the Linux kernel in 2.6.35, and was even
backported to the RHEL*6* 2.6.32 kernel downstream, so it is present
in all supported versions of all Linux distros that libvirt builds
on. Additionally, it can't be conditionally compiled out of a
kernel. There is no reason to conditionalize any piece of code on
presence of IFLA_VF_MAX - if the platform is Linux, it is supported.
Signed-off-by: Laine Stump <laine@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Laine Stump [Sat, 26 Sep 2020 16:59:11 +0000 (12:59 -0400)]
conf: use g_free() instead of VIR_FREE in virDomainNetDefFree()
All these lines were moved over from the now-defunct
virDomainNetDefClear(), which required all pointers to be cleared
after free, but virDomainNetDefFree() doesn't have that restriction -
after free'ing the pointers are never again referenced, so g_free() is
safe.
Signed-off-by: Laine Stump <laine@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Laine Stump [Wed, 23 Sep 2020 01:19:38 +0000 (21:19 -0400)]
qemu: eliminate use of virDomainNetDefClear() in qemuConnectDomainXMLToNative()
Instead of saving the interesting pieces of each existing NetDef,
clearing it, and then copying back the saved pieces after setting the
type to ethernet, just create a new NetDef, copy in the interesting
bits, and replace the old one. (The end game is to eliminate
virDomainNetDefClear() completely, since this is the only real use)
Signed-off-by: Laine Stump <laine@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
these are old enough that they can be assumed present in all Linux
platforms we support. The tap device creation code changed is specific
to Linux, with a separate impl for non-Linux platforms.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Michal Privoznik [Mon, 25 May 2020 11:56:45 +0000 (13:56 +0200)]
qemu: Use memory-backend-* for regular guest memory
So far, Libvirt configures memory-backend-* for memory hotplug,
possibly NUMA nodes and in a few other cases. This patch
switches to constructing the memory-backend-* command line for
all cases. To keep ability to migrate guests a little hack is
used: the ID of the object is set to the one that QEMU uses
internally anyways. These IDs are stable (first started to appear
somewhere around v0.13.0-rc0~96) and can't change.
In fact, this patch does exactly what QEMU does internally. The
reason for moving the logic into Libvirt is that QEMU wants to
deprecate the old style of specifying memory.
So far, only x84_64 test cases are changed, because tests for
other architectures use older capabilities, which still lack the
QEMU_CAPS_MACHINE_MEMORY_BACKEND capability and they don't report
the RAM ID.
Michal Privoznik [Mon, 25 May 2020 17:13:43 +0000 (19:13 +0200)]
qemu: Track default-ram-id machine attribute
The machine structure has another (optional) attribute:
default-ram-id, which specifies the alias of the default RAM
object. While the alias is private, it can never change in order
to not break migration. QEMU uses the alias when allocating
regular, not NUMA memory. In order to switch to new command line
and maintain migration, save this ID.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
The objects at @def and @mem pointers are only read and not
written. Make the arguments const to make that explicit.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Michal Privoznik [Mon, 25 May 2020 12:35:51 +0000 (14:35 +0200)]
qemuBuildMemoryBackendProps: Prealloc mem for memfd backend
If a domain was using hugepages through memory-backend-file or
via -mem-path, we would turn prealloc on. But we are not doing
that for memory-backend-memfd. Fix this, because we need QEMU to
fully allocate hugepages.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
If user specifies immediate memory allocation in the domain XML,
they want QEMU to fully allocate its memory. And if the memory
was allocated using plain '-m' then we would honour it. But, if a
memory backend is used, then we don't set the prealloc attribute
of the backend.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Michal Privoznik [Mon, 25 May 2020 12:35:25 +0000 (14:35 +0200)]
qemuBuildMemoryBackendProps: Move @prealloc setting to backend agnostic part
All three memory backends (-file, -ram and -memfd) have .prealloc
attribute. Since we are setting it only for -file, the
corresponding code lives only under if() that handles that
specific backend. But in near future we will want to set the
attribute for other backends too. Therefore, move the
corresponding code outside of the if().
This causes some .argv files to be changed, but the only change
happening there is move of the attribute (best viewed with:
'git show --color-words=.').
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
This originally started as bug 1595525 in which namespaces and
libusb used in QEMU were not playing nicely with each other. The
problem was that libusb built a cache of USB devices it saw
(which was a very limited set because of the namespace) and then
expected to receive udev events to keep the cache in sync. But
those udev events didn't come because on hotplug when we mknod()
devices in the namespace no udev event is generated. And what is
worse - libusb failed to open a device that wasn't in the cache.
Without going further into what the problem was, libusb added a
new API for opening USB devices that avoids using cache which
QEMU incorporated and exposes under "hostdevice" attribute.
What is even nicer is that QEMU uses qemu_open() for path
provided in the attribute and thus FD passing could be used.
Except qemu_open() expects so called FD sets instead of `getfd'
and these are not implemented in libvirt, yet.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1877218 Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
This capability tracks whether "usb-host" device has "hostdevice"
attribute. This attribute allows us to specify full path to the
USB device ("/dev/bus/usb/$bus/$dev") but more importantly, since
QEMU uses qemu_open() for this attribute it allows us to pass
pre-opened FD and have QEMU not bother with opening the file at
all.
The attribute was added in v5.1.0-rc0~71^2~1 QEMU commit.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Michal Privoznik [Wed, 30 Sep 2020 08:37:29 +0000 (10:37 +0200)]
virDomainNumaFillCPUsInNode: Skip over NUMA nodes without vCPUs
After v6.5.0-rc1~148 we started to rectify vCPU to guest NUMA
assignment - if there is a vCPU not assigned to any guest NUMA
node it is automatically assigned to node #0.
A month later I've made it possible to define guest NUMA nodes
without vCPUs (v6.6.0-rc1~250) - this is needed because of HMAT.
As a part of that I fixed all callers of
virDomainNumaGetNodeCpumask() (which returns a bitmap of vCPUs for
given node) to handle case when NULL is returned (i.e. no vCPUs
assigned to given node). But of course my patch was written
before aforementioned vCPU rectify patch but merged afterwards
and hence I missed the virDomainNumaFillCPUsInNode() caller.
And because we are dealing with a NULL pointer, of course this
leads to a crash. Just try to define a domain with at least two
NUMA nodes and no vCPU assignment to any of the nodes.
Fixes: a26f61ee0cffa421b87ef568002b684dd8025432
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1880289 Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Peter Krempa [Mon, 14 Sep 2020 10:26:05 +0000 (12:26 +0200)]
tests: qemucapabilities: Update capabilities for qemu-5.2 dev cycle
Mid-cycle caps resync. Notable change is that virtio-blk enables
multiqueue by default and the addition of
'calc-dirty-rate'/'query-dirty-rate' QMP commands.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Peter Krempa [Mon, 28 Sep 2020 15:12:40 +0000 (17:12 +0200)]
qemuMigrationCookieXMLFormatStr: Remove
There is just one caller, inline the code. This also optimizes the code
as we no longer have to calculate length of the output XML as it's
actually stored in the buffer struct.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Peter Krempa [Tue, 22 Sep 2020 12:53:57 +0000 (14:53 +0200)]
qemu: hotplug: Remove overlay of <transient> disk on disk unplug
Remove the overlay if the disk was <transient/>. Note that even if we'd
forbid unplug of such a disk through the API, the disk can still be
ejected from the guest.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Reviewed-by: Ján Tomko <jtomko@redhat.com> Tested-by: Ján Tomko <jtomko@redhat.com>
Peter Krempa [Mon, 21 Sep 2020 17:39:02 +0000 (19:39 +0200)]
qemu: snapshot: Introduce helpers for creating overlays on <transient/> disks
To implement <transient/> disks we'll need to install an overlay on top
of the original disk image which will be discarded after the VM is
turned off. This was initially implemented by qemu but libvirt never
picked up this option as the overlays were created by qemu without
libvirt involvment which didn't work with SELinux.
With blockdev the qemu feature became unsupported so we need to do this
via the snapshot code anyways.
The helpers introduced in this patch prepare a fake snapshot disk
definition for a disk which is configured as <transient/> and use it to
create a snapshot (without actually modifying metadata or persistent
def).
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Reviewed-by: Ján Tomko <jtomko@redhat.com> Tested-by: Ján Tomko <jtomko@redhat.com>
Peter Krempa [Tue, 22 Sep 2020 12:39:27 +0000 (14:39 +0200)]
qemu: prepare cleanup for <transient/> disk overlays
Later patches will implement support for <transient/> disks in libvirt
by installing an overlay on top of the configured image. This will
require cleanup after the VM will be stopped so that the state is
correctly discarded.
Since the overlay will be installed only during the startup phase of the
VM we need to ensure that qemuProcessStop doesn't delete the original
file on some previous failure. This is solved by adding
'inhibitDiskTransientDelete' VM private data member which is set prior
to any startup step and will be cleared once transient disk overlays are
established.
Based on that we can then delete the overlays for any <transient/> disk.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Reviewed-by: Ján Tomko <jtomko@redhat.com> Tested-by: Ján Tomko <jtomko@redhat.com>
Ján Tomko [Fri, 18 Sep 2020 15:44:56 +0000 (17:44 +0200)]
rpc: gendispatch: handle empty flags
CVE-2020-25637
Prepare for omission of the <flagname> in remote_protocol.x
@acl annotations:
@acl: <object>:<permission>:<flagname>
so that we can add more fields after, e.g.:
@acl: <object>:<permission>::<field>
Signed-off-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Jiri Denemark <jdenemar@redhat.com>