Peter Krempa [Thu, 27 Feb 2020 08:08:26 +0000 (09:08 +0100)]
kbase: backing_chains: Add steps how to securely probe image format
We document steps how to fix images if they are rejected for missing
the 'backing file format' field. Document also how to securely probe
the image format if it's unknown.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Pavel Hrdina [Wed, 26 Feb 2020 12:23:00 +0000 (13:23 +0100)]
daemon: set default memlock limit for systemd service
The default memlock limit is 64k which is not enough to start a single
VM. The requirements for one VM are 12k, 8k for eBPF map and 4k for eBPF
program, however, it fails to create eBPF map and program with 64k limit.
By testing I figured out that the minimal limit is 80k to start a single
VM with functional eBPF and if I add 12k I can start another one.
This leads into following calculation:
80k as memlock limit worked to start a VM with eBPF which means there
is 68k of lock memory that I was not able to figure out what was using
it. So to get a number for 4096 VMs:
68 + 12 * 4096 = 49220
If we round it up we will get 64M of memory lock limit to support 4096
VMs with default map size which can hold 64 entries for devices.
This should be good enough as a sane default and users can change it if
the need to.
Jiri Denemark [Tue, 25 Feb 2020 15:05:06 +0000 (16:05 +0100)]
qemu: Do not set default CPU for archs without CPU driver
Whenever there is a guest CPU configured in domain XML, we will call
some CPU driver APIs to validate the CPU definition and check its
compatibility with the hypervisor. Thus domains with guest CPU
specification can only be started if the guest architecture is supported
by the CPU driver. But we would add a default CPU to any domain as long
as QEMU reports it causing failures to start any domain on affected
architectures.
Peter Krempa [Mon, 17 Feb 2020 09:08:25 +0000 (10:08 +0100)]
virStorageFileGetMetadataRecurse: Allow format probing under special circumstances
Allow format probing to work around lazy clients which did not specify
their format in the overlay. Format probing will be allowed only, if we
are able to probe the image, the probing result was successful and the
probed image does not have any backing or data file.
This relaxes the restrictions which were imposed in commit 3615e8b39bad
in cases when we know that the image probing will not result in security
issues or data corruption.
We perform the image format detection and in the case that we were able
to probe the format and the format does not specify a backing store (or
doesn't support backing store) we can use this format.
With pre-blockdev configurations this will restore the previous
behaviour for the images mentioned above as qemu would probe the format
anyways. It also improves error reporting compared to the old state as
we now report that the backing chain will be broken in case when there
is a backing file.
In blockdev configurations this ensures that libvirt will not cause data
corruption by ending the chain prematurely without notifying the user,
but still allows the old semantics when the users forgot to specify the
format.
Users thus don't have to re-invent when image format detection is safe
to do.
The price for this is that libvirt will need to keep the image format
detector still current and working or replace it by invocation of
qemu-img.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Peter Krempa [Tue, 25 Feb 2020 12:28:10 +0000 (13:28 +0100)]
qemu: domain: Convert detected 'iso' image format into 'raw'
While our code can detect ISO as a separate format, qemu does not use it
as such and just passes it through as raw. Add conversion for detected
parts of the backing chain so that the validation code does not reject
it right away.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Julio Faracco [Mon, 24 Feb 2020 14:24:27 +0000 (11:24 -0300)]
lxc: Replacing default strings definitions by g_autofree statement
There are a lots of strings being handled inside some LXC functions.
They can be moved to g_autofree to avoid declaring a return value to get
proper code cleanups. This commit is changing functions from
lxc_{controller,cgroup,fuse} only.
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Julio Faracco <jcfaracco@gmail.com>
Ján Tomko [Sat, 22 Feb 2020 12:56:19 +0000 (13:56 +0100)]
tests: libxl: do not run the emulator
Ever since commit c5a00350 the libxl parser invokes the emulator
to probe which device model to use.
Commit b90c4b5 introduced a workaround that used a stable path
which was very likely to result in the answer matching the default.
However the test is still affected by the host state and the binary
gets invoked if present.
Mock the libxlDomainGetEmulatorType function to stop wasting CPU
cycles every time a 'make check' is run on a system with xen installed.
For example xlconfigtest gets faster by 90 %
Signed-off-by: Ján Tomko <jtomko@redhat.com> Fixes: b90c4b5f505698d600303c5b4f03f5d229b329dd Reviewed-by: Jim Fehlig <jfehlig@suse.com>
Ján Tomko [Sat, 22 Feb 2020 11:44:45 +0000 (12:44 +0100)]
testutilsxen: error out on initialization failure
libxlDriverConfigNew can possibly fail on wrong
firmware values (unlikely) or on failure to create
the log directory (possible if you're debugging
tests with VIR_FILE_ACCESS)
Signed-off-by: Ján Tomko <jtomko@redhat.com> Fixes: 4a4132b4625778cf80acb9c92d06351b44468ac3 Reviewed-by: Jim Fehlig <jfehlig@suse.com>
Michal Privoznik [Thu, 20 Feb 2020 14:38:43 +0000 (15:38 +0100)]
security: Don't fail if locking a file on NFS mount fails
The way that our file locking works is that we open() the file we
want to lock and then use fcntl(fd, F_SETLKW, ...) to lock it.
The problem is, we are doing all of these as root which doesn't
work if the file lives on root squashed NFS, because if it does
then the open() fails. The way to resolve this is to make this a
non fatal error and leave callers deal with this (i.e. disable
remembering) - implemented in the previous commit.
Michal Privoznik [Thu, 20 Feb 2020 14:38:10 +0000 (15:38 +0100)]
security: Don't remember seclabel for paths we haven't locked successfully
There are some cases where we want to remember the original owner
of a file but we fail to lock it for XATTR change (e.g. root
squashed NFS). If that is the case we error out and refuse to
start a domain. Well, we can do better if we disable remembering
for paths we haven't locked successfully.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Michal Privoznik [Thu, 20 Feb 2020 14:37:48 +0000 (15:37 +0100)]
virSecurityManagerMetadataLock: Store locked paths
So far, in the lock state we are storing only the file
descriptors of the files we've locked. Therefore, when unlocking
them and something does wrong the only thing we can report is FD
number, which is not user friendly at all. But if we store paths
among with FDs we can do better error reporting.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Ján Tomko [Sat, 22 Feb 2020 15:22:54 +0000 (16:22 +0100)]
Remove virutil.h where possible
Historically, this file was a dump for most of our helper
functions and needed almost everywhere.
With the introduction of virfile.h and virstring.h,
and more importantly, virenum.h and the introduction
of GLib, that is no longer true.
Remove its include from C files that don't even use it.
Signed-off-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
This *is* a no-op, but there was a period of sickening dread while
auditing to be sure that no actual confusion between bus and slot had
occurred. I hope to avoid that by following the conventional order.
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Andrea Bolognani [Mon, 24 Feb 2020 13:59:20 +0000 (14:59 +0100)]
ci: Fix handling of $PKG_CONFIG_LIBDIR
There are two environment variables that are baked into our
cross-compilation container images at build time, $CONFIGURE_OPTS
and $PKG_CONFIG_LIBDIR: the former contain the options necessary
to convince configure to perform a cross build rather than a
native one, and the latter is necessary so that pkg-config will
locate the .pc files for MinGW libraries. Container images that
are not intended for cross-compilation will not have either one
defined.
The problem is that, while an empty $CONFIGURE_OPTS is completely
harmless, setting $PKG_CONFIG_LIBDIR to an emtpy value will
result in pkg-config not looking in its default search path, thus
not finding any library, and subsequently breaking native builds.
To work around this issue, only pass $PKG_CONFIG_LIBDIR to sudo
when the value is set in the calling environment.
Fixes: 71517ae4db35c4dcc6c358d60d3a6d5da0615d39 Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Peter Krempa [Fri, 21 Feb 2020 11:51:35 +0000 (12:51 +0100)]
virStorageSourceNewFromBacking: Also transfer the format
When we create the new virStorageSource from the definitions stored in
the parent we should also use the 'backingStoreRawFormat' field to
populate the format.
Callers which use virStorageSourceNewFromBacking are also fixed to stop
setting the format manually.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
This helper is provided by qemu for vhost-user-gpu and thereby being
in the same path as qemu_bridge_helper. Due to that adding a rule allowing
to call uses the same path list.
Reviewed-by: Jim Fehlig <jfehlig@suse.com> Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
docs: add a kbase explaining security protections for QEMU passthrough
When using command line passthrough users will often trip up over the
security protections like SELinux, DAC, namespaces, etc which will
deny access to files they are passing. This document explains the
various protections and how to deal with their policy, and/or how
to disable them.
Reviewed-by: Kashyap Chamarthy <kchamart@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Ján Tomko [Mon, 24 Feb 2020 12:32:30 +0000 (13:32 +0100)]
qemu: use correct backendType when checking memfd capability
The backend name is memory-backend-memfd but we've been checking
for memory-backend-memory.
Reported by GCC on rawhide:
../../../src/internal.h:75:22: error: 'strcmp' of a string of length 21 and
an array of size 21 evaluates to nonzero [-Werror=string-compare]
../../../src/qemu/qemu_command.c:3525:20: note: in expansion of macro 'STREQ'
3525 | } else if (STREQ(backendType, "memory-backend-memory") &&
| ^~~~~
Signed-off-by: Ján Tomko <jtomko@redhat.com> Fixes: 24b74d187cab48a9dc9f409ea78900154c709579 Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Andrea Bolognani [Mon, 10 Feb 2020 17:02:55 +0000 (18:02 +0100)]
ci: Make container environment available to scripts
For container images targeted at cross-building, we bake a small
amount of architecture-specific information in the environment so
that builds can work as expected without requiring additional work
from the user; unfortunately this information got lost as soon as
we called sudo. Explicitly allow it.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
While we have CI testing coverage for many platforms, we don't test any
non-glibc based Linux and there are other non-Linux platforms we don't
officially target, both of which might hit regressions.
Reviewed-by: Andrea Bolognani <abologna@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
docs: reduce excessive spacing in ToC for RST files
The table of contents in the RST based files uses <p> tags inside the
<li>, which results in 1em's worth of spacing above & below each
entry. This results in way too much whitespace in the ToC.
Reviewed-by: Ján Tomko <jtomko@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Rikard Falkeborn [Sat, 22 Feb 2020 23:22:47 +0000 (00:22 +0100)]
vz: Fix return value in error path
If PrlVmDev_GetType(), PrlVmDev_GetIndex() or PrlVmCfg_GetBootDevCount()
fails, return false to indicate error. Returning -1 would be interpreted
as true when used in an if-statement.
Fixes: 8c9252aa6d95247537da0939b54fdd2f31695e32 Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com> Signed-off-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Rikard Falkeborn [Sat, 22 Feb 2020 23:22:25 +0000 (00:22 +0100)]
esx: Same order of arguments in definition and declaration
The order of arguments were not the same in the definition and
declaration. All callers use the same order as the definition, so there
is no bug, but change the function declaration to match the
implementation to avoid confusion.
Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com> Signed-off-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Ján Tomko [Wed, 19 Feb 2020 00:00:49 +0000 (01:00 +0100)]
util: vireventglibwatch: watch for G_IO_HUP and G_IO_ERR
To more closely match the previous usage in virEventPollDispatchHandles,
where called the handle callback for any revents returned by poll.
This should fix the virtlogd error on subsequent domain startup:
error: can't connect to virtlogd: Cannot open log file:
'/var/log/libvirt/qemu/f28live.log': Device or resource busy
as well as virtlogd spinning caused by virLogHandlerDomainLogFileEvent
never being called on hangup.
Signed-off-by: Ján Tomko <jtomko@redhat.com> Fixes: f8ab47cb4491dd72d866c1a96a9d94b8c3341de9 Fixes: 946a25274c46ffff46323c62f567ae7e753aa921 Tested-by: Andrea Bolognani <abologna@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
After the introduction of virDomainDriverMergeBlkioDevice() in a
previous patch, it is now clear that lxcDomainSetBlkioParameters() and
qemuDomainSetBlkioParameters() uses the same loop to set cgroup
blkio parameter of a domain.
Avoid the repetition by adding a new helper called
virDomainCgroupSetupDomainBlkioParameters().
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
lxcDomainMergeBlkioDevice() and qemuDomainMergeBlkioDevice()
are the same functions. This duplicated code can't be put in
the existing domain_cgroup.c since it's not cgroup related.
This patch introduces a new src/hypervisor/domain_driver.c to
host this more generic code that can be shared between virt
drivers. This new file is then used to create a new helper
called virDomainDeivceMergeBlkioDevice() to eliminate the code
repetition mentioned above. Callers in LXC and QEMU files
were updated.
This change is a preliminary step for more code reduction of
cgroup related code inside lxcDomainSetBlkioParameters() and
qemuDomainSetBlkioParameters().
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
qemuSetupCgroupVcpuBW() and lxcSetVcpuBWLive() shares the
same code to set CPU CFS period and quota. This code can be
moved to a new virCgroupSetupCpuPeriodQuota() helper to
avoid code repetition.
A similar code is also executed in virLXCCgroupSetupCpuTune(),
but without the rollback on error. Use the new helper in this
function as well since the 'period' rollback, if not a
straight improvement for virLXCCgroupSetupCpuTune(), is
benign. And we end up cutting more code repetition.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
The code that calls virCgroupSetCpuShares() and virCgroupGetCpuShares()
is repeated in 4 different places. Let's put it in a new
virCgroupSetupCpuShares() to avoid code repetition.
There's a reason of why we execute a Get in the same value we
just executed Set, explained in detail by commit 97814d8ab3.
Let's add a gist of the reasoning behind it as a comment in
this new function as well.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
There is duplicated code between virt drivers that needs to
be moved to avoid code repetition. In the case of duplicated
code between lxc_cgroup.c and qemu_cgroup.c a common place
would be utils/vircgroup.c. The problem is that this would
introduce /conf related definitions that shouldn't be imported
to vircgroup.c, which is supposed to be a place for utilitary
cgroups functions only. And syntax-check would forbid it anyway
due to cross-directory includes being used.
An alternative would be to overload domain_conf.c, which already
contains all the definitions required. But that file is already
crowded with XML handling code and we wouldn't do any favors to
it by putting more utilitary, non-XML parsing/formatting code
there.
In [1], Cole suggested a 'domain_cgroup' file to host common code
between lxc_cgroup and qemu_cgroup, and Daniel suggested a
'src/hypervisor' dir to host these type of files. This patch
introduces src/hypervisor/domain_cgroup.c and, to get started,
introduces a new virDomainCgroupSetupBlkio() function to host shared
code between virLXCCgroupSetupBlkioTune() and qemuSetupBlkioCgroup().
vircgroup.c: turn virCgroup{Get/Set}BlkioDevice* into static
Previous patch moved all duplicated code that were setting
and getting BlkioDevice parameters to vircgroup.c. We can
turn them into static and spare a few symbols in
libvirt_private.syms.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
There are code repetition of set() and get() blkio device
parameters across lxc and qemu files. Use the new vircgroup
helpers to trim the repetition a bit.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
The current use of the functions that set and get
BlkioDevice attributes is doing a set(), followed by
a get() of the same parameter right after. This is done
because there is no guarantee that the kernel will accept
the desired value given by the set() call, thus we need to
execute a get() right after to get the actual value.
This patch adds helpers inside vircgroup.c to execute these
operations. Next patch will use these helpers to reduce
code repetition in LXC and QEMU files.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Ryan Moeller [Sat, 22 Feb 2020 05:01:39 +0000 (00:01 -0500)]
Add virtlockd and virtlogd init scripts
These are missing files for OpenRC.
Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Andrea Bolognani [Tue, 18 Feb 2020 20:43:08 +0000 (21:43 +0100)]
docs: Expand documentation for the tickpolicy timer attribute
The current documentation is fairly terse and not easy to decode
for someone who's not intimately familiar with the inner workings
of timer devices. Expand on it by providing a somewhat verbose
description of what behavior each policy will result in, as seen
from both the guest OS and host point of view.
Michal Privoznik [Fri, 21 Feb 2020 07:28:13 +0000 (08:28 +0100)]
qemuTestParseCapabilitiesArch: Free @binary
The variable is allocated, but never freed.
==119642== 29 bytes in 1 blocks are definitely lost in loss record 409 of 671
==119642== at 0x483579F: malloc (vg_replace_malloc.c:309)
==119642== by 0x5AB075F: __vasprintf_internal (in /lib64/libc-2.29.so)
==119642== by 0x57C1A28: g_vasprintf (in /usr/lib64/libglib-2.0.so.0.6000.7)
==119642== by 0x579A0CC: g_strdup_vprintf (in /usr/lib64/libglib-2.0.so.0.6000.7)
==119642== by 0x4AE6D58: vir_g_strdup_printf (glibcompat.c:197)
==119642== by 0x136EEE: qemuTestParseCapabilitiesArch (testutilsqemu.c:291)
==119642== by 0x138506: testQemuInfoSetArgs (testutilsqemu.c:763)
==119642== by 0x135FFF: mymain (qemuxml2argvtest.c:3093)
==119642== by 0x13A60E: virTestMain (testutils.c:839)
==119642== by 0x1368C2: main (qemuxml2argvtest.c:3121)
Fixes: 42b3e5b9e4b919644afe55a815992c07fb79b9dc Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>