Michal Privoznik [Tue, 28 Jul 2020 08:51:32 +0000 (10:51 +0200)]
qemu: Validate memory hotplug in domainValidateCallback instead of cmd line generator
When editing a domain with hotplug enabled, I removed the only
NUMA node it had and got no error. I got the error later though,
when starting the domain. This is not as user friendly as it can
be. Move the validation call out from command line generator and
into domain validator (which is called prior to starting cmd line
generation anyway).
When doing this, I had to remove memory-hotplug-nonuma xml2xml
test case because there is no way the test case can succeed,
obviously.
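
A minimal sketch of the idea, not the actual libvirt code: the check
runs when the domain is defined or edited instead of when the command
line is built. The DomainDef fields, the function name and the error
text are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DomainDef:
    # Hypothetical, heavily reduced stand-in for a domain definition.
    max_memory_kib: int = 0  # assumption: > 0 means memory hotplug is enabled
    numa_nodes: int = 0


def memory_hotplug_error(domain: DomainDef) -> Optional[str]:
    """Return an error message if hotplug is enabled without a NUMA node."""
    if domain.max_memory_kib > 0 and domain.numa_nodes == 0:
        return "memory hotplug requires at least one NUMA node"
    return None
```

Calling this from the validation step means the user sees the error
right after removing the last NUMA node, not at the next domain start.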
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Michal Privoznik [Mon, 31 Aug 2020 09:35:47 +0000 (11:35 +0200)]
RNG: Allow interleaving of /domain/cpu/numa/cell children
So far, the <cell/> element can have two types of children
elements: <distances/> and <cache/> (which can be repeated multiple
times). However, there is no reason to require a specific order in
the input XML. Allow the elements to be interleaved.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Fangge Jin [Thu, 20 Aug 2020 10:09:24 +0000 (18:09 +0800)]
docs: add kbase entry for migrationinternals
Commit c051e56d27 added migrationinternals.rst in kbase, but the
entry was missing.
Signed-off-by: Fangge Jin <fjin@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com> Signed-off-by: Ján Tomko <jtomko@redhat.com>
Scott Shambarger [Tue, 25 Aug 2020 23:47:07 +0000 (16:47 -0700)]
util: use host module suffix when loading drivers
Driver module loaders currently hardcode ".so" as the file
extension. On macOS, meson uses ".dylib" as the module file extension.
This patch adds VIR_FILE_MODULE_EXT to virfile.h, defined as the
host's module extension, and updates driver module loaders to make
use of it.
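
As a rough illustration only (the real change is a C macro in
virfile.h), the selection can be sketched like this. The function
names and the "libvirt_driver_" file name pattern are assumptions made
for the example.

```python
import sys


def module_ext(platform: str = sys.platform) -> str:
    """Pick the loadable-module suffix for the given platform string."""
    # Assumption: only macOS ("darwin") differs from the ".so" default.
    return ".dylib" if platform == "darwin" else ".so"


def driver_module_path(moddir: str, name: str, platform: str = sys.platform) -> str:
    """Build a driver module path without hardcoding ".so"."""
    return f"{moddir}/libvirt_driver_{name}{module_ext(platform)}"
```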
Signed-off-by: Scott Shambarger <scott-libvirt@shambarger.net> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
The previous patch handled the runtime case where a non-x86 host
fetches /proc/cpuinfo data looking for microcode info that we know
doesn't exist. That change alone sped everything up a bit for
non-x86, but there is at least one major culprit left.
qemuxml2argvtest does several arch-specific tests, and a good
chunk of them are x86 exclusive. This means that 'hostArch'
will be seen as x86 for these tests, even when running in
non-x86 hosts. In a Power 9 server with 128 CPUs, qemuxml2argvtest
takes 298 seconds to complete on average, and 'perf record'
indicates that 95% of the time is spent in
virHostCPUGetMicrocodeVersion().
This patch mocks virHostCPUGetMicrocodeVersion() to always return
0 in the tests, avoiding /proc/cpuinfo reads. This will make all
tests behave arch-agnostic, and the microcode value being 0 has no
impact on any existing test.
This is a CI speedup across the board for all archs, including x86,
given that we're not reading /proc/cpuinfo in the tests. For
a Thinkpad T480 laptop with 8 Intel i7 CPUs, qemuxml2argvtest
went from 15.50 seconds to 12.50 seconds. The performance gain is even
more noticeable for huge servers with lots of CPUs. For the
Power 9 server mentioned above, this patch brings qemuxml2argvtest
down to 9 seconds, from 298 seconds.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Reviewed-by: Ján Tomko <jtomko@redhat.com> Signed-off-by: Ján Tomko <jtomko@redhat.com>
virhostcpu.c: skip non x86 hosts in virHostCPUGetMicrocodeVersion()
Non-x86 archs do not have a 'microcode' version like x86. The
function already covers this by returning 0 if no microcode is
found. Regardless of that, a read of /proc/cpuinfo is always made,
and each read makes the kernel fill in the CPU details again.
Now let's consider a non-x86 host, like a Power 9 server with 128 CPUs.
Each /proc/cpuinfo read will need to fetch data for each CPU and it
won't even matter because we know beforehand that PowerPC chips don't
have microcode information.
We can do better for non-x86 hosts by skipping this process entirely.
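
The shortcut can be sketched as follows. This is an illustration, not
the code in virhostcpu.c: the function names, the arch strings and the
injected read_cpuinfo callable are assumptions for the example.

```python
from typing import Callable

X86_ARCHES = ("i686", "x86_64")  # assumption: arch names used for x86 hosts


def parse_microcode(cpuinfo_text: str) -> int:
    """Return the first 'microcode' value found, or 0 if there is none."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            return int(value.strip(), 0)  # base 0 accepts "0xde" style values
    return 0


def get_microcode_version(host_arch: str, read_cpuinfo: Callable[[], str]) -> int:
    """Skip the (expensive) cpuinfo read entirely on non-x86 hosts."""
    if host_arch not in X86_ARCHES:
        return 0
    return parse_microcode(read_cpuinfo())
```

Passing the reader in as a callable keeps the sketch testable without
touching /proc/cpuinfo.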
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Jim Fehlig [Thu, 30 Jul 2020 19:25:20 +0000 (13:25 -0600)]
Xen: Add support for qemu command-line passthrough
Xen supports passing arbitrary arguments to the QEMU device model via
the 'extra' member of the public libxl_domain_build_info structure.
This patch adds a 'xen' namespace extension, similar to the QEMU and
bhyve drivers, to map arbitrary arguments to the 'extra' member. Only
passthrough of arguments is supported. Passthrough of environment
variables or capabilities adjustments is not supported.
Signed-off-by: Jim Fehlig <jfehlig@suse.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Pavel Hrdina [Tue, 25 Aug 2020 13:09:53 +0000 (15:09 +0200)]
storage_util: fix qemu-img sparse allocation
Commit <c9ec7088c7a3f4cd26bb471f1f243931fff6f4f9> introduced support
for fully allocating qcow2 images when <allocation> matches <capacity>,
but it doesn't work as expected.
The issue is that info.size_arg is in KB but the info.allocation
introduced by the mentioned commit is in B. This results in using
"preallocation=falloc," in cases where "preallocation=metadata," should
be used.
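
To make the unit mismatch concrete, here is a small sketch. It is not
the storage_util code: the function names and the exact comparison are
assumptions. It only shows how comparing a KiB value against a byte
value picks the wrong mode.

```python
def to_kib(size_bytes: int) -> int:
    """Convert bytes to KiB, rounding up."""
    return -(-size_bytes // 1024)


def preallocation_buggy(capacity_bytes: int, allocation_bytes: int) -> str:
    # Bug: capacity is converted to KiB, allocation is left in bytes.
    size_arg = to_kib(capacity_bytes)
    return "metadata" if size_arg > allocation_bytes else "falloc"


def preallocation_fixed(capacity_bytes: int, allocation_bytes: int) -> str:
    # Fix: compare both values in the same unit.
    size_arg = to_kib(capacity_bytes)
    return "metadata" if size_arg > to_kib(allocation_bytes) else "falloc"
```

With a 1 GiB capacity and a 2 MiB allocation, the buggy comparison
sees 1048576 > 2097152 as false and falls through to "falloc", while
the fixed one correctly picks "metadata".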
Signed-off-by: Pavel Hrdina <phrdina@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Jin Yan [Thu, 13 Aug 2020 03:37:56 +0000 (11:37 +0800)]
virnetserver: fix some memory leaks in virNetTLSContextReloadForServer
These leaks were introduced in commit 15d280fa97b0. Use g_autofree for
all cert_path pointers.
Signed-off-by: Jin Yan <jinyan12@huawei.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Han Han [Tue, 25 Aug 2020 03:50:33 +0000 (11:50 +0800)]
virsh: Add source-initiator opt to build the initiator of pool XML
For an iscsi-direct pool, an initiator is required when defining the pool:
<pool type="iscsi-direct">
...
<initiator>
<iqn name="iqn.2013-06.com.example:iscsi-initiator"/>
</initiator>
...
</pool>
Add --source-initiator to fill the initiator iqn for
pool-create-as/pool-define-as subcommands.
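
What the new option has to produce can be sketched as below. This is
not the virsh implementation; the function name and parameters are
made up, and placing <initiator> inside <source> is an assumption
about the part elided with "..." above.

```python
import xml.etree.ElementTree as ET


def build_iscsi_direct_pool(name: str, host: str, device: str,
                            initiator_iqn: str) -> str:
    """Build an iscsi-direct pool XML string with an initiator IQN."""
    pool = ET.Element("pool", type="iscsi-direct")
    ET.SubElement(pool, "name").text = name
    source = ET.SubElement(pool, "source")
    ET.SubElement(source, "host", name=host)
    ET.SubElement(source, "device", path=device)
    # Assumption: the initiator lives under <source>.
    initiator = ET.SubElement(source, "initiator")
    ET.SubElement(initiator, "iqn", name=initiator_iqn)
    return ET.tostring(pool, encoding="unicode")
```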
Document the new <audio> element, which allows specifying a
host audio backend for a guest <sound> device, and update
the <sound> element description with the new <audio>
sub-element which specifies the other end of the mapping.
Signed-off-by: Roman Bogorodskiy <bogorodskiy@gmail.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
bhyve supports Intel HDA sound devices that can be specified
on the command line using "-s 1:0,hda,play=$play_dev,rec=$rec_dev",
where "1:0" is a PCI address, and "$play_dev" and "$rec_dev"
point to the playback and recording device on the host respectively.
Currently, the schema of the 'sound' element doesn't allow specifying
either playback or recording devices, so for now hardcode
/dev/dsp0, which is the first audio device on the host.
Signed-off-by: Roman Bogorodskiy <bogorodskiy@gmail.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Laine Stump [Sun, 23 Aug 2020 03:42:52 +0000 (23:42 -0400)]
conf: properly clear out autogenerated macvtap names when formatting/parsing
Back when macvtap support was added in commit 315baab9443 in Feb. 2010
(libvirt-0.7.7), it was set up to autogenerate a name for the device if
one wasn't supplied, in the pattern "macvtap%d" (or "macvlan%d"),
similar to the way an unspecified standard tap device name will lead
to an autogenerated "vnet%d".
As a matter of fact, in commit ca1b7cc8e45, added in May 2010, the code
was changed to *always* ignore a supplied device name for macvtap
interfaces by deleting *any* name immediately during the <interface>
parsing. This was intended to prevent one domain which had failed to
completely start from deleting the macvtap device of another domain
which had subsequently been provided the same device name (this will
seem mildly ironic later). This was later fixed to only clear the
device name when inactive XML was being parsed. HOWEVER - this was
only done if the XML was <interface type='direct'> - autogenerated
names were not cleared for <interface type='network'> (which could
also result in a macvtap device).
Although the names of "vnetX" tap devices had always been
automatically cleared when parsing <interface> (see commit d1304583d
from July 2008 (!)), at the time macvtap support was added, both vnetX
and macvtapX device names were always included when formatting the
XML.
Then in commit a8be259d0cc (July 2011, libvirt-0.9.4), <interface>
formatting was changed to also clear out "vnetX" device names during
XML formatting. However, the same treatment wasn't given to
"macvtapX".
Now in 2020, there has been a report that a failed migration leads to
the macvtap device of some other unrelated guest on the destination
host losing its network connectivity. It was determined that this was
due to the domain XML in the migration containing a macvtap device
name, e.g. "macvtap0", that was already in use by the other guest on
the destination. Normally this wouldn't be a problem, because libvirt
would see that the device was already in use, and then find a
different unused name. But in this case, other external problems were
causing the migration to fail prior to selecting a macvtap device and
successfully opening it, and during error recovery, qemuProcessStop()
was called, which went through all def->nets objects and (if they were
macvtap) deleted the device specified in net->ifname; since libvirt
hadn't gotten to the point of replacing the incoming "macvtap0" with
the name of a device it actually created for this guest, that meant
that "macvtap0" was deleted, *even though it was currently in use by a
different guest*!
Whew!
So, it turns out that when formatting "migratable" XML, "vnetX"
devices are omitted, just as when formatting "inactive" XML. By making
the code in both interface parsing and formatting consistent for
"vnetX", "macvtapX", and "macvlanX", we can thus make sure that the
autogenerated (and unneeded / completely *not* wanted) macvtap device
name will not be sent with the migration XML. This way when a
migration fails, net->ifname will be NULL, and libvirt won't have any
device to try and (erroneously) delete.
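
The consistent treatment can be sketched like this. It is an
illustration, not the conf/ code: the helper names are hypothetical,
and matching "prefix followed by digits" is an assumption based on the
"macvtap%d" pattern described above.

```python
import re
from typing import Optional

# Assumption: autogenerated names follow the "<prefix>%d" pattern.
_AUTOGENERATED = re.compile(r"^(vnet|macvtap|macvlan)\d+$")


def is_autogenerated_ifname(name: Optional[str]) -> bool:
    """True for names libvirt would have generated itself."""
    return bool(name) and _AUTOGENERATED.match(name) is not None


def ifname_for_xml(name: Optional[str], inactive_or_migratable: bool) -> Optional[str]:
    """Drop autogenerated names from inactive and migratable XML."""
    if inactive_or_migratable and is_autogenerated_ifname(name):
        return None
    return name
```

With this, a "macvtap0" left over from the source host never reaches
the destination, while a user-chosen name such as "mytap0" is kept.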
Signed-off-by: Laine Stump <laine@redhat.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Laine Stump [Sat, 22 Aug 2020 21:43:24 +0000 (17:43 -0400)]
qemu: remove unreachable code in qemuProcessStart()
Back when the original version of this chunk of code was added (commit 41b087198 in libvirt-0.8.1 in April 2010), we used virExecDaemonize()
to start the qemu process, and would continue on in the function
(which at that time was called qemudStartVMDaemon()) even if a -1 was
returned. So it was possible to get to this code with rv == -1 (it was
called "ret" in that version of the code).
In modern libvirt code, qemu is started with virCommandRun(); then we
call virPidFileReadPath(); those are the only two ways of setting "rv"
prior to this code being removed, and in either case if the new value
of rv < 0, then we immediately skip over the rest of the code to the
cleanup: label.
This means that the code being removed by this patch is
unreachable.
Signed-off-by: Laine Stump <laine@redhat.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Michal Privoznik [Fri, 21 Aug 2020 13:49:29 +0000 (15:49 +0200)]
qemu_namespace: Don't build namespace if domain doesn't have it enabled
Even if namespaces are disabled, the domain startup code still
tries to populate the domain's (nonexistent) namespace, due to a
missing check at the beginning of qemuDomainBuildNamespace().
Fixes: 8da362fe62766b4eee209cd3ce591ceb62299d13 Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
The existing auto-align behavior for pSeries is meant to spare
users from working out the NVDIMM size themselves, given that the
alignment calculation is not trivial (256MiB alignment of the
mem->size - mem->label_size value, a.k.a. the guest area). We
align mem->size down to avoid end of file problems.
The end result is not ideal though. We do not touch the domain
XML, meaning that the guest can see an NVDIMM up to 255MiB smaller
than the size reported in the XML. It also adds one more
thing to consider in case the guest is reporting less memory
than declared, since the auto-align is transparent to the
user.
Following Andrea's suggestion in [1], let's instead do a
size alignment validation. If the NVDIMM is unaligned, error out
and suggest a rounded up value. This can be bothersome to users,
but will bring consistency of NVDIMM size between the domain XML
and the guest.
This approach will force existing non-running pSeries guests to
readjust the NVDIMM value in their XMLs, if necessary. No changes
were made for x86 NVDIMM support.
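
The validation can be sketched as below. This is only an illustration
of the arithmetic described above, not the qemu driver code; the
function name and the use of KiB as the unit are assumptions.

```python
from typing import Optional

ALIGN_KIB = 256 * 1024  # 256MiB expressed in KiB


def suggest_nvdimm_size(size_kib: int, label_kib: int) -> Optional[int]:
    """Return None if the guest area is aligned, else a rounded up size."""
    guest_area = size_kib - label_kib
    if guest_area % ALIGN_KIB == 0:
        return None
    # Round the guest area up to the next 256MiB boundary, then add
    # the label back to get the value the user should put in the XML.
    aligned = -(-guest_area // ALIGN_KIB) * ALIGN_KIB
    return aligned + label_kib
```

For example, a 500000 KiB NVDIMM with a 128 KiB label has a guest area
of 499872 KiB, which is not a multiple of 262144 KiB, so the suggested
size is 524288 + 128 = 524416 KiB.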
Suggested-by: Andrea Bolognani <abologna@redhat.com> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
qemu_domain.c: make qemuDomainGetMemorySizeAlignment() public
Next patch will use it outside of qemu_domain.c.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Michal Privoznik [Mon, 24 Aug 2020 16:29:44 +0000 (18:29 +0200)]
qemuDomainGetMemorySizeAlignment: Mark domain @def const
This function is not changing the domain definition, it's only
reading from it. The function is going to be used from another
function which already takes const virDomainDef. Make the @def
const to avoid typecasting it.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
- Update the descriptions of --current & --config flags.
For --config, the reason to rephrase "next boot" to "next start"
is: "Next boot may still imply somebody selecting "reboot" in the
guest OS and fully expecting the changes to be applied." (per Peter
Krempa)
For --current, existing documentation says:
"If *--current* is specified, affect the current guest state."
It's not entirely clear which states "current" can mean or imply. So
rephrase it in the context of the other two related flags --live and
--config.
- While at it, I also took the liberty to replace the few occurrences
of "persistent domain[s]" with "persistent guest[s]".
Fix all occurrences (i.e. as many as I could spot) of this.
(Thanks: Dan Berrangé on IRC.)
Signed-off-by: Kashyap Chamarthy <kchamart@redhat.com> Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Peter Krempa [Mon, 24 Aug 2020 15:38:05 +0000 (17:38 +0200)]
qemu: Move virQEMUFileOpenAs to qemu_domain.c
Commit 43620689794507308fbd3def6992a68ee2f8fa97 moved the function to
util/virqemu.c, which is also compiled on win32, where
geteuid()/getegid() don't exist.
Move it to qemu_domain.c which is compiled only when the qemu driver is
enabled. Originally I didn't want to put it here as qemu_domain.c is a
code dump for helper functions but this is the least invasive fix.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Peter Krempa [Thu, 16 Jul 2020 09:40:34 +0000 (11:40 +0200)]
qemuFileWrapperFDClose: move to qemu_domain.c
Move the code to qemu_domain.c so that it can be reused in other parts
of the qemu driver. 'qemu_domain' was chosen as we check the domain
state after closing the wrapper.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Peter Krempa [Thu, 16 Jul 2020 09:17:47 +0000 (11:17 +0200)]
qemuOpenFile: Move to qemu_domain.c
Move the code to qemu_domain.c so that it can be reused in other parts
of the qemu driver. 'qemu_domain' was chosen as the permissions are
based on the domain configuration.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Peter Krempa [Wed, 19 Aug 2020 11:17:06 +0000 (13:17 +0200)]
qemuMigrationParamsFromJSON: Unify return value handling with other functions
This function doesn't have an overly verbose cleanup section as there
isn't any error code path. Unify it with the rest of the functions which
will simplify adding a possible error path.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
When receiving a stream (on virStorageVolUpload() and subsequent
virStreamSparseSendAll()) we may receive a hole. If the volume we
are saving the incoming data into is a regular file we just
lseek() and ftruncate() to create the hole. But this won't work
if the file is a block device. If that is the case we must write
zeroes so that any subsequent reader reads nothing but zeroes
(just like they would from a hole in a regular file).
The virshStreamInData() callback is used by virStreamSparseSendAll()
to detect whether the file the data is read from is in a data or hole
section. SendAll() will then send the corresponding type of virStream
message to make the server create a hole or write actual data. But the
callback uses virFileInData() even for block devices, which results in
an error. Just like in the previous commit, emulate a DATA section
for block devices.
When handling a sparse stream, a thread is executed. This thread
runs a read() or write() loop (depending on what API is called; in
this case it's virStorageVolDownload() and thus the thread runs a
read() loop). The read() is handled in virFDStreamThreadDoRead()
which is then data/hole section aware, meaning it uses
virFileInData() to detect data and hole sections and sends
TYPE_DATA or TYPE_HOLE virStream messages accordingly.
However, virFileInData() does not work with block devices, simply
because block devices don't have data and hole sections. What we
can do though, is to mimic being always in a DATA section.
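
A rough sketch of that emulation, with hypothetical names: for a block
device we never probe for holes and just report a DATA section.
Reporting everything up to the end of the device as the section length
is an assumption made for the example.

```python
from typing import Callable, Tuple

Section = Tuple[bool, int]  # (in_data, section_length)


def section_at(offset: int, size: int, is_block_device: bool,
               probe_file: Callable[[int], Section]) -> Section:
    """Return whether @offset is in a data section and how long it is."""
    if is_block_device:
        # Block devices have no holes: pretend it is all data.
        return True, size - offset
    # Regular files: defer to a real data/hole probe (not shown here).
    return probe_file(offset)
```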
This callback is called when the server sends us STREAM_HOLE,
meaning there is no real data, only zeroes. For regular files
we would just seek() beyond EOF and ftruncate() to create the
hole. But for block devices this won't work. Not only can't we
seek() beyond EOF and ftruncate() will fail, but this approach
also won't fill the device with zeroes. We have to do it manually.
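
The two code paths can be sketched as follows. This is not the virsh
callback itself: the function name, the chunk size and the explicit
is_block_device flag are assumptions for the example.

```python
import os

ZERO_CHUNK = 64 * 1024  # assumption: write zeroes in 64KiB pieces


def skip_hole(fd: int, length: int, is_block_device: bool) -> None:
    """Create a hole of @length bytes at the current position of @fd."""
    if is_block_device:
        # No holes on block devices: write the zeroes out by hand.
        zeroes = bytes(ZERO_CHUNK)
        remaining = length
        while remaining > 0:
            written = os.write(fd, zeroes[:min(remaining, ZERO_CHUNK)])
            remaining -= written
    else:
        # Regular file: seek past the hole and set the file size there.
        end = os.lseek(fd, length, os.SEEK_CUR)
        os.ftruncate(fd, end)
```

Either way a later reader gets @length zero bytes; only the regular
file path avoids writing them to disk.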
virsh: Track if vol-upload or vol-download work over a block device
We can't use virFileInData() with block devices, but we can
emulate being in a data section all the time (vol-upload case).
Likewise, we can't just lseek() beyond EOF with block
devices to create a hole, we will have to write zeroes
(vol-download case). But to decide we need to know if the FD we
are reading data from / writing data to is a block device. Store
this information in _virshStreamCallbackData.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
virsh: Pass virshStreamCallbackDataPtr to virshStreamSink() and virshStreamSkip()
These callbacks will need to know more than the FD they are
working on. Pass the structure that is passed to other stream
callbacks (e.g. virshStreamSource() or virshStreamSourceSkip())
instead of inventing a new one.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
libvirt-storage: Document volume upload/download stream format
For libvirt, the volume is just a binary blob and it doesn't
interpret data on volume upload/download. But as it turns out,
this unspoken assumption is not clear to our users. Document it
explicitly.
Peter Krempa [Thu, 6 Aug 2020 17:43:51 +0000 (19:43 +0200)]
testutilsqemuschema: Add template checker for schema entries
We'll need to match that a certain part of the qemu schema hasn't grown
new properties unexpectedly. Add a helper which matches an 'object' QMP
schema entry against a template and reports errors if expected types
don't match or new entries are added.
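
The kind of check meant here can be sketched as below. This is not the
testutilsqemuschema helper: the function name and the flat
"member name -> type name" representation of a schema object are
assumptions, and reporting missing members is an extra not mentioned
above.

```python
from typing import Dict, List


def check_schema_object(entry: Dict[str, str],
                        template: Dict[str, str]) -> List[str]:
    """Compare an 'object' schema entry against a template."""
    errors = []
    for name, expected in template.items():
        if name not in entry:
            errors.append(f"missing member '{name}'")
        elif entry[name] != expected:
            errors.append(f"member '{name}': expected type '{expected}', "
                          f"got '{entry[name]}'")
    # Anything not in the template is a property qemu has newly grown.
    for name in entry:
        if name not in template:
            errors.append(f"unexpected new member '{name}'")
    return errors
```

An empty list means the schema entry still matches the template; any
new property shows up as an "unexpected new member" error.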
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Andrea Bolognani [Wed, 19 Aug 2020 09:15:35 +0000 (11:15 +0200)]
meson: Improve RPATH handling
Right now we're unconditionally adding RPATH information to the
installed binaries and libraries, but that's not always desired.
autotools seem to be smart enough to only include that information
when targeting a non-standard prefix, so most distro packages
don't actually contain it; moreover, both Debian and Fedora have
wiki pages encouraging packagers to avoid setting RPATH:
Implement RPATH logic that Does The Right Thing™ in the most
common cases, while still offering users the ability to override
the default behavior if they have specific needs.
Signed-off-by: Andrea Bolognani <abologna@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Andrea Bolognani [Mon, 24 Aug 2020 07:52:43 +0000 (09:52 +0200)]
ABOUT-NLS: Drop symlink
The ABOUT-NLS symlink pointing to po/README.rst is a leftover
from when we were using autotools as the build system, and now
that we're using Meson we can drop it.
Signed-off-by: Andrea Bolognani <abologna@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>