Pavel Hrdina [Thu, 9 Jan 2025 15:23:44 +0000 (16:23 +0100)]
qemu: snapshot: delete disk image only if parent snapshot is external
When we are deleting external snapshot that is not active we only need
to delete overlay disk image of the parent snapshot. This works
correctly even if parent snapshot is external and active as it will have
another overlay created when user reverted to that snapshot.
In case the parent snapshot is internal there are no overlay disk images
created as everything is stored internally within the disk image. In
this case we would delete the actual disk image storing internal
snapshots and most likely the original disk image as well resulting in
data loss once the VM is shutoff.
Fixes: https://gitlab.com/libvirt/libvirt/-/issues/734 Signed-off-by: Pavel Hrdina <phrdina@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Jiri Denemark [Thu, 9 Jan 2025 14:06:24 +0000 (15:06 +0100)]
docs: Clarify documentation of host-model CPU mode
The host-model CPU mode was described as similar to copying the host CPU
definition from capabilities, which has not been the case for ages. The
host-model definition from domain capabilities is used instead.
Only the first sentence changed, but it required reformatting
essentially the whole paragraph so I used this as an opportunity to
reformat it a little bit more and split the long paragraph into several
smaller ones for better readability.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
util: don't attempt to acquire logind inhibitor if not requested
When VIR_INHIBITOR_WHAT_NONE is passed to virInhibitorNew, it is
an indication that daemon shutdown should be inhibited, but no
OS level inhibitors acquired. This is done by the virtnetworkd
daemon, for example, to prevent shutdown while running virtual
machines are present, without blocking / delaying OS shutdown.
Unfortunately the code forgot to skip the DBus call in this case,
resulting in errors being logged.
Reviewed-by: Laine Stump <laine@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
When debugging it is useful to know what signals are being received and
metadata related to them. Log this data before calling the signal
handling callbacks.
Reviewed-by: Jiri Denemark <jdenemar@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Laine Stump [Fri, 13 Dec 2024 17:47:39 +0000 (12:47 -0500)]
qemu: allow migration of guest with mdev vGPU to VF vGPU
GPU vendors are moving away from using mdev to create virtual GPUs
towards using SRIOV VFs that are vGPUs. In both cases, once created
the vGPUs are assigned to guests via <hostdev> (i.e. VFIO device
assignment), and inside the guest the devices look identical, but mdev
vGPUs are located by QEMU/VFIO using a uuid, while VF vGPUs are
located with a PCI address. So although we generally require the
device on the source host to exactly match the device on the
destination host, in the case of mdev-created vGPU vs. VF vGPU
migration *can* potentially work, except that libvirt has a hard-coded
check that prevents us from even trying.
This patch loosens up that check so that we will allow attempts to
migrate a guest from a source host that has mdev-created vGPUs to a
destination host that has VF vGPUs (and vice versa). The expectation
is that if this doesn't actually work then QEMU will fail and generate
an error that we can report.
Georgia Garcia [Tue, 7 Jan 2025 15:23:38 +0000 (12:23 -0300)]
apparmor: fix UUID specification
There is a common misconception when writing AppArmor policy that
[0-9]* applies * to the [0-9] class, but that's not the case. For this
example, [0-9]* matches a single digit followed by any number of
characters except for /
Create a UUID variable that uses the following format 8-4-4-4-12.
Signed-off-by: Georgia Garcia <georgia.garcia@canonical.com> Reviewed-by: Jim Fehlig <jfehlig@suse.com>
meson: remove unneeded dependency on libdevmapper for storage_disk
In commit dfa0e11 the last direct usage of devmapper for storage_disk was
removed. There is one stale include remaining, which is unused even longer
since df1011ca. Remove the include and change meson.build so we can use
storage_disk without devmapper.
I'm running it right now with a stripped-down config on a small arm64
router with openwrt.
Signed-off-by: Stefan Hellermann <stefan@the2masters.de> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Commit 247357cc292a added support for direct and extended modes for
tlbflush, but forgot to do the formatting as well.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com> Signed-off-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Adam Julis [Fri, 3 Jan 2025 13:22:23 +0000 (14:22 +0100)]
virtiofs: Allow read only mode
Resolves: https://issues.redhat.com/browse/RHEL-72192 Signed-off-by: Adam Julis <ajulis@redhat.com> Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Jim Fehlig [Sat, 4 Jan 2025 03:44:19 +0000 (20:44 -0700)]
security: apparmor: Remove hardcoded "libvirtd" profile name
The apparmor driver probe function checks for an active profile matching
the full path of the running daemon binary. If not found, it checks for
a profile named "libvirtd". This works fine when the running daemon is the
old monolithic libvirtd, but fails with modular daemons.
Remove the check for a hardcoded "libvirtd" profile and replace with the
basename of the running daemon binary.
Signed-off-by: Jim Fehlig <jfehlig@suse.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
The 'description' and 'message' fields in polkit policy files should be
translated into the user's chosen language. xgettext is told to search
in both and source and build dirs by meson.
Unfortunately a bug in xgettext means that when it searches for built
files in XML format, it'll trigger a warning message due to failure to
load the generated file from the source dir:
xgettext: cannot read ..snip../libvirt/src/access/org.libvirt.api.policy: failed to load external entity "..snip../libvirt/src/access/org.libvirt.api.policy"
This is harmless since it then goes on to try the build dir and
succeeds, but will pollute the output of 'ninja libvirt-pot'
Related: https://gitlab.com/libvirt/libvirt/-/merge_requests/387 Reviewed-by: Ján Tomko <jtomko@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
po: add its rules for translating polkit file strings
xgettext / msgfmt have generic support for extracting / merging strings
in XML files, however, they need to be told something about the schema
to know which fields are translatable. This is done by providing 'its'
rules. Usually the 'its' rules would be shipped in a -devel package of
the app which owns the schema definition, but polkit does not do this.
Thus libvirt (and other apps) must ship their own local 'its' rules for
polkit.
Reviewed-by: Ján Tomko <jtomko@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
qemu_tpm: lock the state explicitly when running swtpm
Commit bb5e26749fe5b ("qemu: explicit swtpm state locking") attempted to
lock the state, but only for swtpm-setup. The capability
"tpmstate-opt-lock" is actually only exposed by swtpm.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Michal Privoznik [Thu, 19 Dec 2024 10:02:59 +0000 (11:02 +0100)]
hyperv: Introduce and export 'facility' variable.
In its upstream commit [1] openwsman dropped 'facility' variable
which is documented as:
* all processes that use the libu must define a "facility" variable somewhere
* to satisfy this external linkage reference.
*
* Such variable will be used as the syslog(3) facility argument.
Well, prior to that commit, openwsman itself declared the
variable (and set it to LOG_DAEMON). Now it's up to us.
Yeah, the variable naming is terrible and also I we are not using
libu directly, but apparently libwsman.so requires it anyway:
1: https://github.com/Openwsman/openwsman/commit/d72c51f21b9c85a773b7955ac587d2d3cea982c1 Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
qemu: Add support for direct and extended tlbflush features
They require special handling since they are dependent on the basic
tlbflush feature itself and therefore are not handled automatically as
part of virDomainHyperv enum, just like the stimer-direct feature.
Resolves: https://issues.redhat.com/browse/RHEL-7122 Signed-off-by: Martin Kletzander <mkletzan@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
While doing so, also drop QEMU specific arguments from
domainLogContextNew() and replace them with hypervisor agnostic
ones.
Signed-off-by: Praveen K Paladugu <praveenkpaladugu@gmail.com> Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Libxml2 has awful error reporting behaviour when reading files. When
we fail to load a file from the test driver we see:
$ virsh -c test:///wibble.xml
I/O warning : failed to load external entity "/wibble.xml"
error: failed to connect to the hypervisor
error: XML error: failed to parse xml document '/wibble.xml'
where the I/O warning line is something printed by libxml2 itself,
which also lacks any useful detail.
Switching to our own file reading code we can massively improve
things:
$ ./build/tools/virsh -c test:///wibble.xml
error: failed to connect to the hypervisor
error: Failed to open file '/wibble.xml': No such file or directory
Using 10 MB as an upper limit on XML file size ought to be sufficient
for any XML files libvirt is reading.
Reviewed-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
util/xml: don't assume libxml2 has the filename of the document
The libxml2 error handling gets the filename from a libxml2 struct, but
it is better to not assume libxml2 knows the filename being parsed, as
we might have simply provided it a pre-loaded string.
Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Currently given an input of '<dom\n' we emit an error:
error: Failed to define domain from tests/qemuxmlconfdata/broken-xml-invalid.xml
error: at line 2: Couldn't find end of Start Tag dom line 1
(null)
^
With this fix we emit:
error: Failed to define domain from tests/qemuxmlconfdata/broken-xml-invalid.xml
error: at line 2: Couldn't find end of Start Tag dom line 1
<dom
----^
Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
The virNetDaemon code now only concerns itself with preventing auto
shutdown of the local daemon. Logind is now handled by the new
virInhibitor object, for QEMU, LXC and LibXL. This fixes two notable
bugs
* Running virtual networks would prevent system shutdown
* Loaded ephemeral secrets would prevent system shutdown
src: convert drivers over to new virInhibitor APIs
This initial conversion of the drivers switches them over to use
the virInhibitor APIs in local daemon only mode. Communication to
logind is still handled by the virNetDaemon class logic.
This mostly just replaces upto 3 fields in the driver state
with a single new virInhibitor object, but otherwise should not
change functionality besides replacing atomics with mutex protected
APIs.
The exception is the LXC driver which has been trying to inhibit
shutdown shutdown but silently failing to, since nothing ever
remembered to set the 'inhibitCallback' pointer in the driver
state struct.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
util: introduce object for holding a system inhibitor lock
The system inhibitor locks are currently handled by code in the
virNetDaemon class. The driver code invokes a callback provided
by the daemon when it wants to start or end inhibition.
When the first inhibition is started, the daemon will call out
to logind to apply it system wide.
This has many flaws
* A single message is registered with logind regardless of
what driver holds the inhibition
* An inhibition of daemon shutdown can't be acquired
without also inhibiting system shutdown
* Config of the inhibitions cannot be tailored by the
driver
The new virInhibitor object addresses these:
* The object directly manages an inhibition with logind
privately to the driver, enabling custom messages to
be set.
* It is possible to acquire an inhibition locally to the
daemon without forwarding it to logind.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Jiri Denemark [Tue, 10 Dec 2024 11:28:53 +0000 (12:28 +0100)]
qemu: Call migrate-incoming with exit-on-error=false
The exit-on-error=false argument of migrate-incoming tells the QEMU
process to keep running when incoming migration fails, which helps us in
two ways:
1. When migration enters Finish phase to cleanup the process, the domain
might not even exist on the destination (because it has already been
cleaned up by EOF monitor callback) and we would get rather unhelpful
"operation failed: domain is no longer running" error message.
2. We can get the error that caused incoming migration to fail directly
from QEMU via query-migrate QMP command.
https://issues.redhat.com/browse/RHEL-7041
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Jiri Denemark [Thu, 12 Dec 2024 09:45:38 +0000 (10:45 +0100)]
qemu: Replace qemuDomainCheckMonitor with qemuMigrationJobCheckStatus
The function is only used during incoming migration in the beginning of
Finish phase to detect if QEMU already died but EOF handler haven't had
a chance to do its job yet. It calls query-status QMP command, but
ignores the result. By calling query-migrate instead we can achieve the
same functionality if QEMU is dead and even get meaningful error from
"error-desc" in case the incoming migration failed and QEMU is still
running.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Jiri Denemark [Mon, 9 Dec 2024 13:47:50 +0000 (14:47 +0100)]
qemu: Detect exit-on-error argument of migrate-incoming
The exit-on-error argument (added in QEMU 9.1.0) can be used to tell
QEMU not to exit when incoming migration fails so that the error can be
retrieved via QMP. This patch adds a new capability bit indicating
support for the new argument.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Michal Privoznik [Wed, 18 Dec 2024 15:15:56 +0000 (16:15 +0100)]
qemu_capabilities: Avoid memleak in virQEMUCapsProbeFullDeprecatedProperties()
As one of its arguments, the
virQEMUCapsProbeFullDeprecatedProperties() gets a pointer to
GStrv (a string list), which it may eventually replace. It's
single caller (virQEMUCapsProbeQMPHostCPU()) passes a string list
indeed. Now, when replacing one string list with another plain
g_free() is not enough as we need to free individual strings too.
==13573== 34 bytes in 8 blocks are definitely lost in loss record 271 of 576
==13573== at 0x4844878: malloc (vg_replace_malloc.c:446)
==13573== by 0x51789D1: g_malloc (in /usr/lib64/libglib-2.0.so.0.7800.6)
==13573== by 0x5193E82: g_strdup (in /usr/lib64/libglib-2.0.so.0.7800.6)
==13573== by 0x4997F73: g_strdup_inline (gstrfuncs.h:321)
==13573== by 0x4997F73: virJSONValueArrayToStringList (virjson.c:1296)
==13573== by 0x5027CF7: qemuMonitorJSONParseCPUModelExpansion (qemu_monitor_json.c:5139)
==13573== by 0x50281C9: qemuMonitorJSONGetCPUModelExpansion (qemu_monitor_json.c:5245)
==13573== by 0x501044F: qemuMonitorGetCPUModelExpansion (qemu_monitor.c:3261)
==13573== by 0x4F190D0: virQEMUCapsProbeQMPHostCPU (qemu_capabilities.c:3227)
==13573== by 0x4F2145E: virQEMUCapsInitQMPMonitor (qemu_capabilities.c:5758)
==13573== by 0x10FFF8: testQemuCaps (qemucapabilitiestest.c:111)
==13573== by 0x110B53: virTestRun (testutils.c:143)
==13573== by 0x11063E: doCapsTest (qemucapabilitiestest.c:200)
Fixes: 51c098347d7f2af9b4386ac0adc4431997d06f3d Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Michal Privoznik [Wed, 18 Dec 2024 10:07:26 +0000 (11:07 +0100)]
qemu: Enable I/O APIC even more frequently
In my previous commit v10.10.0-48-g2d222ecf6e I've made us enable
I/O APIC when there is an IOMMU with EIM. This works well. What
does not work is case when there's just an IOMMU without EIM but
with 256+ vCPUS. Problem is that post parsing happens in two
stages: general domain post parse (where
qemuDomainDefEnableDefaultFeatures() is called) and then per
device post parse (where qemuDomainIOMMUDefPostParse() is
called). Now, in aforementioned case it is the device post parse
phase where EIM is enabled but the code that would enable
VIR_DOMAIN_FEATURE_IOAPIC has already run.
To resolve this, make the domain post parse callback "foresee"
the future enabling of EIM so that it can turn on I/O APIC
beforehand.
An RPM must own any directories its creates, unless it can guarantee a
dependancy has ownership. Two packages owning the same directory is fine
if permissions are consistent.
We don't require augeas as a dep in most packages, so we must own the
augeas lens directories. Likewise for systemtap tapset dirs.
Our own cpu map dir also needs ownership.
A few files are re-sorted, so that the files are listed immediately
adjacent to the %dir that contains them.
https://bugzilla.redhat.com/show_bug.cgi?id=2280979 Reviewed-by: Jiri Denemark <jdenemar@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Collin Walling [Mon, 16 Dec 2024 23:03:58 +0000 (18:03 -0500)]
conf: add deprecated_features attribute
Add a new a attribute, deprecated_features='on|off' to the <cpu>
element. This is used to toggle features flagged as deprecated on the
CPU model on or off. When this attribute is paired with 'on',
deprecated features will not be filtered. When paired with 'off', any
CPU features that are flagged as deprecated will be listed under the
CPU model with the 'disable' policy.
The absence of this attribute is equivalent to the 'on' option.
The deprecated features that will populate the domain XML are the same
features that result in the virsh domcapabilities command with the
--disable-deprecated-features argument present.
It is recommended to define a domain XML with this attribute set to
'off' to ensure migration to machines that may outright drop these
features in the future.
Collin Walling [Mon, 16 Dec 2024 23:03:57 +0000 (18:03 -0500)]
virsh: add --disable-deprecated-features flag to domcapabilities
Add a new flag, --disable-deprecated-features, to the domcapabilities
command. This will modify the output to show the 'host-model' CPU
with features flagged as deprecated paired with the 'disable' policy.
Collin Walling [Mon, 16 Dec 2024 23:03:56 +0000 (18:03 -0500)]
qemu_capabilities: filter deprecated features if requested
If flag VIR_CONNECT_GET_DOMAIN_CAPABILITIES_DISABLE_DEPRECATED_FEATURES
is passed to qemuConnectGetDomainCapabilities, then the domain's CPU
model features will be updated to set any deprecated features to the
'disabled' policy.
Collin Walling [Mon, 16 Dec 2024 23:03:54 +0000 (18:03 -0500)]
qemu_capabilities: query deprecated features for host-model
Add QEMU_CAPS_QUERY_CPU_MODEL_EXPANSION_DEPRECATED_PROPS for detecting
if query-cpu-model-expansion can report deprecated CPU model properties.
QEMU introduced this capability in 9.1 release. Add flag and deprecated
features to the capabilities test data for QEMU 9.1 and 9.2 replies/XML
since it can now be accounted for.
When probing for the host CPU, perform a full CPU model expansion to
retrieve the list of features deprecated across the entire architecture.
The list and count are stored in the host's CPU model info within the
QEMU capabilities. Other info resulting from this query (e.g. model
name, etc) is ignored.
The new capabilities flag is used to fence off the extra query for
architectures/QEMU binaries that do not report deprecated CPU model
features.
Collin Walling [Mon, 16 Dec 2024 23:03:53 +0000 (18:03 -0500)]
qemu: parse deprecated-props from query-cpu-model-expansion response
query-cpu-model-expansion may report an array of deprecated properties.
This array is optional, and may not be supported for a particular
architecture or reported for a particular CPU model. If the output is
present, then capture it and store in a qemuMonitorCPUModelInfo struct
for later use.
The deprecated features will be retained in qemuCaps->kvm->hostCPU.info
and will be stored in the capabilities cache file under the <hostCPU>
element using the following format:
Refactor the CPU Model parsing functions within
qemuMonitorJSONGetCPUModelExpansion. The new functions,
qemuMonitorJSONParseCPUModelExpansionData and
qemuMonitorJSONParseCPUModelExpansion invoke the functions they
replace and leave room for a subsequent patch to handle parsing the
(optional) deprecated_props field resulting from the command.
Michal Privoznik [Thu, 12 Dec 2024 09:02:43 +0000 (10:02 +0100)]
qemu: Enable I/O APIC if needed
This is a follow up of my previous commits. If the number of
vCPUs exceeds some arbitrary value (255) then QEMU requires IOMMU
with EIM and intremap enabled. But in turn, intremap IOMMU
requires split I/O APIC (per virDomainDefIOMMUValidate()). Since
after my previous commits (e.g. v10.10.0-rc1~183) IOMMU is added
automagically, the I/O APIC can be also enabled automagically.
Relates to: https://issues.redhat.com/browse/RHEL-65844 Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Summary: original problem was unexpected failure of update-device when
the user hadn't changed anything other than online status of the guest
NIC (which should always be allowed).
The first commit "fixed" this by avoiding the allocation of a new
ActualNetDef (i.e. creating a new networkport) for *all* network
device updates (because that was inappropriately changing which
ethernet physdev should be used for a macvtap connection, which by
design can't be handled in an update-device).
But this commit caused a regression for update-device of bridge-based
network devices (because some the updates of certain attributes *do*
require the ActualNetDef be re-allocated), so...
The 2nd commit narrowed the list of network types that get the "don't
allocate new ActualNetDef" treatment (so that only interfaces
connected to a network that uses a pool of ethernet VFs *being used in
passthrough mode* qualify).
But then it was pointed out that this re-broke simple updates of
devices that used a direct/macvtap network in "bridge" mode (because
it's possible to list multiple physdevs to use for bridge mode, in
which case the network driver attempts to "load balance" (and so a new
allocation might have a different ethernet physdev which, again, can't
be supported in a device-update).
So this (single line of code) patch *widens* the list of network types
that don't allocate a new ActualNetDef to also include the other
direct (macvtap) modes, e.g. bridge, private, etc.
Signed-off-by: Laine Stump <laine@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>