Laine Stump [Mon, 21 Oct 2024 03:02:56 +0000 (23:02 -0400)]
network: add rule to nftables backend that zeroes checksum of DHCP responses
Many years ago (April 2010), soon after "vhost" in-kernel packet
processing was added to the virtio-net driver, people running RHEL5
virtual machines with a virtio-net interface connected via a libvirt
virtual network noticed that when vhost packet processing was enabled,
their VMs could no longer get an IP address via DHCP - the guest was
ignoring the DHCP response packets sent by the host.
(I've been informed by danpb that the same issue had been encountered,
and "fixed" even earlier than that, in 2006, with Xen as the
hypervisor.)
The "gory details" of the 2010 discussion are chronicled here:
but basically it was because packet checksums weren't being fully
computed on the host side (because QEMU on the host and the NIC driver
in the guest had agreed between themselves to turn off checksums
because they were unnecessary due to the "link" between the two being
entirely in local memory rather than an error-prone physical cable),
but:
1) a partial checksum was being put into the packets at some point by
"someone",
2) the "don't use checksums" info was known by the guest kernel, which
would properly ignore the "bad" checksum, and
3) the packets were being read by the dhclient application on the
guest side with a "raw" socket (thus bypassing the guest kernel UDP
processing that would have known the checksum was irrelevant and
ignored it).
The "fix" for this ended up being two-tiered:
1) The ISC DHCP package (which contains the aforementioned dhclient
program) made a fix to their dhclient code which caused it to accept
packets anyway even if they didn't have a proper checksum (NB: that's
not a full explanation, and possibly not accurate). This remedied the
problem for guests with an updated dhclient. Here is the code with the
fix to ISC DHCP:
This eliminated the issue for any new/updated guests that had the
fixed dhclient, but it didn't solve the problem for existing/old guest
images that didn't/couldn't get their dhclient updated. This brings us
to:
2) iptables added a new "CHECKSUM" target and "--checksum-fill"
action:
http://patchwork.ozlabs.org/patch/58525/
and libvirt added an iptables rule for each virtual network to match
DHCP response packets and perform --checksum-fill. This way, by the
time dhclient on the guest read the raw packet, the checksum would be
corrected, and the packet would be accepted. This was pushed upstream
in libvirt commit v0.8.2-142-gfd5b15ff1a.
The word at the time from those more knowledgeable than me was that
the bad checksum problem was really specific to ISC's dhclient running
on Linux, and so once their fix was in use everywhere dhclient was
used, bad checksums would be a thing of the past and the
--checksum-fill iptables rules would no longer be needed (but would
otherwise be harmless if they were still there).
(Plot twist: the dhclient code in fix (1) above apparently is on a
Linux-only code path - this is very important later!)
Based on this information (and also due to the opinion that fixing it
by having iptables modify the packet checksum was really the wrong way
to permanently fix things, i.e. an "ugly hack"), the nftables
developers made the decision to not implement an equivalent to
--checksum-fill in nftables. As a result, when I wrote the nftables
firewall backend for libvirt virtual networks earlier this year, it
didn't add in any rule to "fix" broken UDP checksums (since there was
apparently no equivalent in nftables and, after all, that was fixed
somewhere else 14 years ago, right???)
But last week, when Rich Jones was doing routine testing using a Fedora
40 host (the first Fedora release to use the nftables backend of libvirt's
network driver by default) and a FreeBSD guest, for "some strange
reason", the FreeBSD guest was unable to get an IP address from DHCP!!
A few quick tests proved that it was the same old "bad checksum"
problem from 2010 come back to haunt us - it wasn't a Linux-only issue
after all.
Phil Sutter and Eric Garver (nftables people) pointed out that, while
nftables doesn't have an action that will *compute* the checksum of a
packet, it *does* have an action that will set the checksum to 0, and
suggested we try adding a "zero the checksum" rule for dhcp response
packets to our nftables ruleset. (Why? Because a checksum value of 0
in an IPv4 UDP packet is defined by RFC 768 to mean "no checksum
generated", implying "checksum not needed"). It turns out that this
works - dhclient properly recognizes that a 0 checksum means "don't
bother with the checksum", and accepts the packet as valid.
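The RFC 768 rule this fix relies on can be sketched in C. This is a minimal illustration, not libvirt, nftables or dhclient code: every function name is hypothetical. It shows how a sender computes the UDP checksum over the IPv4 pseudo-header plus the UDP header and payload (sending a computed 0 as 0xFFFF), and how a receiver reading raw packets treats a checksum field of 0 as "not generated" and accepts the packet without verifying it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Add the big-endian 16-bit words of buf to a 32-bit accumulator. An odd
 * trailing byte is padded with a zero byte, as RFC 768 specifies. */
static uint32_t sum16(const uint8_t *buf, size_t len, uint32_t acc)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        acc += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (len & 1)
        acc += (uint32_t)buf[len - 1] << 8;
    return acc;
}

/* Fold carries back in (end-around carry of one's-complement addition). */
static uint32_t fold(uint32_t acc)
{
    while (acc >> 16)
        acc = (acc & 0xFFFF) + (acc >> 16);
    return acc;
}

/* IPv4 pseudo-header: source and destination address (host byte order),
 * a zero byte plus the protocol number (17 = UDP), and the UDP length. */
static uint32_t pseudo_header_sum(uint32_t src, uint32_t dst, size_t udp_len)
{
    return (src >> 16) + (src & 0xFFFF) + (dst >> 16) + (dst & 0xFFFF) +
           17u + (uint32_t)udp_len;
}

/* Checksum a sender would place in the UDP header. The checksum field
 * itself (bytes 6-7) is skipped. udp_len must be at least 8. */
static uint16_t udp4_checksum(uint32_t src, uint32_t dst,
                              const uint8_t *udp, size_t udp_len)
{
    uint32_t acc = pseudo_header_sum(src, dst, udp_len);
    uint16_t csum;

    acc = sum16(udp, 6, acc);
    acc = sum16(udp + 8, udp_len - 8, acc);
    csum = (uint16_t)~fold(acc);
    /* A computed 0 is sent as 0xFFFF: 0 is reserved for "no checksum". */
    return csum == 0 ? 0xFFFF : csum;
}

/* Receiver-side check, as a raw-socket client has to do it itself.
 * A zero field means "no checksum generated", so accept the packet;
 * otherwise the one's-complement sum over everything must be 0xFFFF. */
static bool udp4_checksum_ok(uint32_t src, uint32_t dst,
                             const uint8_t *udp, size_t udp_len)
{
    if (udp[6] == 0 && udp[7] == 0)
        return true;
    return fold(pseudo_header_sum(src, dst, udp_len) +
                sum16(udp, udp_len, 0)) == 0xFFFF;
}
```

Zeroing the field therefore does not make a bad checksum "good"; it tells an RFC-conforming receiver not to look at it, which is why a rule that only sets the field to 0 is enough.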
So to once again fix this timeless bug, this patch adds such a
checksum zeroing rule to the nftables rules setup for each virtual
network.
This has been verified (on a Fedora 40 host) to fix DHCP with FreeBSD
and OpenBSD guests, while not breaking it for Fedora or Windows (10)
guests.
Fixes: b89c4991daa0ee9371f10937fab3b03c5ffdabc6 Reported-by: Rich Jones <rjones@redhat.com> Fix-Suggested-by: Eric Garver <egarver@redhat.com> Fix-Suggested-by: Phil Sutter <psutter@redhat.com> Signed-off-by: Laine Stump <laine@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Laine Stump [Tue, 22 Oct 2024 01:31:37 +0000 (21:31 -0400)]
network: don't unset the firewalld zone if it's going to be immediately re-set
Any time the firewalld zone for an interface is set, by definition
that removes it from any previous zone that it was in, so there is
really no point in unsetting the zone if it's just going to be
immediately set again.
This is useful because when firewalld reloads its rules, 4 things happen:
1) firewalld flushes *all* firewall rules (including those added by libvirt)
2) firewalld unsets the zones for all interfaces (including those set
by libvirt)
3) firewalld re-adds its own rules, and sets the zone for all the
interfaces it manages
4) firewalld sends a dbus message that libvirt is watching for, and
when libvirt receives that message, it reloads all of the
libvirt-generated rules, and also re-sets the firewalld zone for
the bridge interfaces managed by libvirt.
libvirt accomplishes step 4 by a) calling
networkRemoveFirewallRules(), and then b) calling
networkAddFirewallRules(). But (because it is useful in other
contexts) networkRemoveFirewallRules() will attempt to *unset* the
zone for each bridge interface, and when firewalld receives this
request, it sees that the bridge interface *has no zone* (because it
was unset by firewalld in step (2) above), and thus logs an error
message.
There is no way for libvirt to suppress an error message that is
logged by firewalld when a request to firewalld fails. But what
libvirt *can* do is realize that in these cases, the firewalld zone is
about to be set again anyway, and so we don't need to unset the zone.
This patch handles that by adding a bool unsetZone to the arguments of
networkRemoveFirewallRules(); most calls to networkRemoveFirewallRules()
have unsetZone=true, but in two cases where the zone is about to be
reset, networkRemoveFirewallRules() is called with unsetZone=false,
which prevents the call to virFirewallDInterfaceUnsetZone() and thus
avoids the unnecessary (and confusing to users!) error message that
would have been logged by firewalld.
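The shape of the change can be sketched in C. Everything here is hypothetical and heavily simplified: the struct, the stub standing in for virFirewallDInterfaceUnsetZone() and the request counter exist only to make the control flow visible; the real networkRemoveFirewallRules() has different arguments.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified view of a virtual network. */
struct virt_net {
    const char *bridge;
    bool zone_was_set;      /* did we set a firewalld zone at start-up? */
};

static int unset_zone_requests;  /* stub bookkeeping: requests sent */

/* Stand-in for virFirewallDInterfaceUnsetZone(): just count the call. */
static void firewalld_unset_zone(const char *ifname)
{
    (void)ifname;
    unset_zone_requests++;
}

static void remove_firewall_rules(struct virt_net *net, bool unset_zone)
{
    /* ... remove libvirt's own firewall rules here ... */
    if (unset_zone && net->zone_was_set)
        firewalld_unset_zone(net->bridge);
}

static void add_firewall_rules(struct virt_net *net)
{
    /* ... re-add libvirt's rules and set the bridge's zone again ... */
    (void)net;
}

/* Reaction to firewalld's "reloaded" D-Bus message. firewalld has already
 * removed the zone and we are about to set it again, so asking firewalld
 * to unset it would only produce a confusing error in its log. */
static void handle_firewalld_reload(struct virt_net *net)
{
    remove_firewall_rules(net, false);
    add_firewall_rules(net);
}
```

Other callers (for example, tearing down a network) keep passing true, so the zone is still removed when nothing is going to set it again.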
Signed-off-by: Laine Stump <laine@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Laine Stump [Mon, 21 Oct 2024 17:55:16 +0000 (13:55 -0400)]
network: ignore/don't log errors when unsetting firewalld zone
The most common "error" when trying to unset the firewalld zone of an
interface is for firewalld to tell us that the interface already isn't
in any zone. Since this is what we want, no need to alarm the user by
logging it as an error.
Signed-off-by: Laine Stump <laine@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Jiri Denemark [Fri, 18 Oct 2024 12:40:48 +0000 (14:40 +0200)]
domain_capabilities: Report CPU blockers
When a CPU model is reported as usable='no' an additional
<blockers model='...'> element is added for that CPU model to show which
features are missing for the CPU model to become usable.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Jiri Denemark [Tue, 15 Oct 2024 16:18:25 +0000 (18:18 +0200)]
qemu: Change CPU comparison algorithm for future models
When starting a domain we check whether the guest CPU definition is
compatible with the host (i.e., when the host supports all features
required both explicitly and by the specified CPU model) as long as
check == 'partial', which is the default.
We are doing so by taking our definition of the CPU model from the CPU
map, amending it with explicitly mentioned features, and comparing it to
the features QEMU would enable when started with -cpu host. But since our
CPU model definitions often slightly differ from QEMU we may be checking
features which are not actually needed and on the other hand not
checking something that is part of the CPU model in QEMU.
This patch changes the algorithm for CPU models added in the future
(changing it for existing models could cause them to suddenly become
incompatible with the host and domains using them would fail to start).
The new algorithm uses information we probe from QEMU about features
that block each model from being directly usable. If all those features
are explicitly disabled in the CPU definition we consider the base model
compatible with the host. Then we only need to check that all explicitly
required features are supported by QEMU on the host to get the result
for the whole CPU definition.
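The two-step check described above can be sketched as follows, assuming the blockers reported by QEMU, the explicitly disabled and required features, and the host features QEMU supports are all available as plain string arrays. The function and parameter names are hypothetical, not the libvirt API, and the feature names in the usage are only examples.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static bool contains(const char *const *list, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(list[i], name) == 0)
            return true;
    return false;
}

/* Returns true if the CPU definition is compatible with the host:
 * 1. every feature QEMU reports as blocking the base model must be
 *    explicitly disabled in the definition, and
 * 2. every explicitly required feature must be supported by QEMU on
 *    the host. */
static bool cpu_definition_usable(const char *const *blockers, size_t nblockers,
                                  const char *const *disabled, size_t ndisabled,
                                  const char *const *required, size_t nrequired,
                                  const char *const *host, size_t nhost)
{
    for (size_t i = 0; i < nblockers; i++)
        if (!contains(disabled, ndisabled, blockers[i]))
            return false;   /* base model is still blocked */

    for (size_t i = 0; i < nrequired; i++)
        if (!contains(host, nhost, required[i]))
            return false;   /* host cannot provide a required feature */

    return true;
}
```

Note that the CPU map's own definition of the model never enters the decision; only QEMU's view of the model and of the host does.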
After this, we only use the model definitions (for newly added models)
from the CPU map for creating a CPU definition for host-model.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Jiri Denemark [Tue, 15 Oct 2024 16:15:17 +0000 (18:15 +0200)]
cpu: Introduce virCPUCompareUnusable
As opposed to the existing virCPUCompare{,XML} this function does not
use CPU model definitions from CPU map. It relies on CPU model usability
info from a hypervisor with a list of blockers that make the selected
CPU model unusable. Explicitly requested features are checked against
the hypervisor's view of a host CPU.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Jiri Denemark [Thu, 10 Oct 2024 08:11:25 +0000 (10:11 +0200)]
qemu: Separate partial CPU check into a function
The new qemuDomainCheckCPU function is used as a replacement for
virCPUCompare to make sure all callers use the same comparison
algorithm. As a side effect qemuConnectCompareHypervisorCPU now properly
reports CPU compatibility for CPU models that are considered runnable by
QEMU even if our definition of the model disagrees.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Jiri Denemark [Tue, 15 Oct 2024 10:31:09 +0000 (12:31 +0200)]
qemu: Use virCPUCompare in qemuConnectCompareHypervisorCPU directly
The function already parses CPU XML on s390. By parsing it consistently
on all architectures, we can switch to virCPUCompare and easily replace it
with a QEMU specific helper in the following patch.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Jiri Denemark [Wed, 9 Oct 2024 14:52:08 +0000 (16:52 +0200)]
cpu: Introduce virCPUGetCheckMode
On x86 the function returns whether an old style compat check mode
should be used for a specified CPU model according to the CPU map. All
other architectures will always use compat mode.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Jiri Denemark [Wed, 9 Oct 2024 11:26:40 +0000 (13:26 +0200)]
cpu_x86: Introduce <check> element for CPU models
CPU models in the CPU map may be marked with <check partial="compat"/>
to indicate a backward compatible partial check (comparing our
definition of the model with the host CPU) should be performed. Other
models will be checked using just runnability info from QEMU.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Peter Krempa [Wed, 23 Oct 2024 06:49:07 +0000 (08:49 +0200)]
ci: Move definition of exit codes allowed to fail for cirrus jobs
Update with latest lcitool.
Update the build templates to move the definition of exit codes which
are allowed to fail for cirrus jobs for cases when we run out of CI
minutes. The previous location was overridden with the per-job
'allow_failure' value and thus didn't apply.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Jim Fehlig [Mon, 21 Oct 2024 20:49:23 +0000 (14:49 -0600)]
spec: Drop nwfilter dependency in libvirt-daemon-xen
The libvirt xen driver does not support nwfilters. In fact, since
commit d721b6840f, the driver rejects VM configuration referencing
nwfilters. Drop the needless nwfilter dependency from
libvirt-daemon-xen.
Signed-off-by: Jim Fehlig <jfehlig@suse.com> Reviewed-by: Laine Stump <Laine@redhat.com>
Peter Krempa [Fri, 18 Oct 2024 14:13:15 +0000 (16:13 +0200)]
qemu: snapshot: Delete leftover overlay files for <transient/> disks
When a VM is terminated by a host reboot, libvirt doesn't get to clean
up the temporary overlay file used for transient disks. Since we create
those files with a very specific suffix, it's almost guaranteed that if
such a file exists, it's a leftover from a previous libvirt run. Delete
them instead of complaining, to preserve functionality.
Closes: https://gitlab.com/libvirt/libvirt/-/issues/684 Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Andrea Bolognani [Fri, 18 Oct 2024 07:57:19 +0000 (09:57 +0200)]
rpm: Require dmidecode on more architectures
It's not only used on x86_64 these days. See virSysinfoRead().
Technically we should include loongarch64 in the list as well,
but Fedora hasn't been bootstrapped on the architecture yet,
and when the time comes several more changes are going to be
necessary anyway.
Signed-off-by: Andrea Bolognani <abologna@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Peter Krempa [Thu, 17 Oct 2024 11:55:23 +0000 (13:55 +0200)]
util: bitmap: Rewrite virBitmapShrink using new helpers
Rather than reimplementing everything manually, use virBitmapBuffsize to
find the current number of units, realloc the buffer and clear the tail
using virBitmapClearTail().
This fixes a corner case where the buffer would be over-allocated by one
unit when shrinking to the boundary of the unit size.
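The corner case is the classic rounding mistake when converting a bit count into a number of storage units. A minimal sketch, assuming 64-bit units; the helper names are hypothetical, and the "buggy" variant only illustrates this kind of off-by-one, it is not the removed libvirt code.

```c
#include <stddef.h>
#include <stdint.h>

#define UNIT_BITS 64  /* assumption: one storage unit is a uint64_t */

/* Illustrative buggy form: always adds a unit, so an exact multiple of
 * UNIT_BITS ends up over-allocated by one unit. */
static size_t units_for_bits_buggy(size_t nbits)
{
    return nbits / UNIT_BITS + 1;
}

/* Correct form: round up to whole units. */
static size_t units_for_bits(size_t nbits)
{
    return (nbits + UNIT_BITS - 1) / UNIT_BITS;
}

/* Clear bits at positions >= nbits in the last used unit, so stale bits
 * from before the shrink cannot reappear if the bitmap grows again. */
static void clear_tail(uint64_t *map, size_t nbits)
{
    size_t used = nbits % UNIT_BITS;

    if (used != 0)
        map[nbits / UNIT_BITS] &= (UINT64_C(1) << used) - 1;
}
```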
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Peter Krempa [Thu, 17 Oct 2024 07:47:07 +0000 (09:47 +0200)]
virBitmapNewCopy: Honor sizes of either bitmap when doing memcpy()
'virBitmapNewCopy()' allocates a new bitmap with the same number of bits
but uses the internal allocation length as argument for the memcpy()
operation to copy the bits. Due to bugs in other code these may not be
the same, resulting in a buffer overflow if the source is
over-allocated. Use the buffer length of the target bitmap instead.
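A minimal sketch of a safe copy, using a hypothetical bitmap struct that records both the logical size in bits and the allocated length in units (this is not libvirt's actual virBitmap layout). It bounds the copy by the destination's allocation, as the commit does, and additionally by the source's, in the spirit of the subject line's "either bitmap".

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define UNIT_BITS 64  /* assumption: 64-bit storage units */

/* Hypothetical bitmap: logical size plus the allocated length, which
 * (because of bugs elsewhere) may be larger than nbits requires. */
struct bitmap {
    size_t nbits;
    size_t map_len;     /* allocated units */
    uint64_t *map;
};

static struct bitmap *bitmap_new(size_t nbits)
{
    struct bitmap *b = calloc(1, sizeof(*b));

    if (!b)
        return NULL;
    b->nbits = nbits;
    b->map_len = (nbits + UNIT_BITS - 1) / UNIT_BITS;
    /* allocate at least one unit so calloc(0, ...) is never requested */
    b->map = calloc(b->map_len ? b->map_len : 1, sizeof(*b->map));
    if (!b->map) {
        free(b);
        return NULL;
    }
    return b;
}

static void bitmap_free(struct bitmap *b)
{
    if (!b)
        return;
    free(b->map);
    free(b);
}

/* Copy src into a new bitmap of the same logical size. The memcpy()
 * length is bounded by the destination's allocation (and by the source's),
 * so an over-allocated source can no longer overflow the new buffer. */
static struct bitmap *bitmap_new_copy(const struct bitmap *src)
{
    struct bitmap *dst = bitmap_new(src->nbits);
    size_t units;

    if (!dst)
        return NULL;
    units = dst->map_len < src->map_len ? dst->map_len : src->map_len;
    memcpy(dst->map, src->map, units * sizeof(*dst->map));
    return dst;
}
```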
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Michal Privoznik [Thu, 17 Oct 2024 14:50:12 +0000 (16:50 +0200)]
NEWS: Fix naming of DISK_DETECT_ZEROES migration parameter
There's a typo in NEWS.rst where
VIR_MIGRATE_PARAM_MIGRATE_DISKS_DETECT_ZEROES has the _ZEROES
suffix duplicated referring to a non-existent migration
parameter. Drop the suffix.
Fixes: 2e29ab3269701535f71cf56cc51165e7eeb1e49f Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
qemu: Do not hardcode Hyper-V feature names on command line
When constructing the command line for QEMU, some Hyper-V features were
hardcoded, probably due to the fact that they could not have been
automatically translated from the libvirt feature name to QEMU CPU
feature name.
Well now they can be, thanks to their additions to the
virQEMUCapsCPUFeaturesX86 translation table.
Translate all such features the same way when constructing the command
line. This way, any future feature that is not translated will be caught
by tests (if a test is added for it), which was not the case when it was
just hardcoded. Hopefully this avoids at least some possible future
issues.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com> Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
qemu: Add more translations to virQEMUCapsCPUFeatureTranslationTable
Hyper-V enlightenment features can have hyphenated names which libvirt
exposes under Hyper-V features with underscored names. When libvirt
checks that all requested features were enabled by QEMU (on x86
architectures) it first queries for all those that QEMU knows and
compiles them in a map while using the virQEMUCapsCPUFeaturesX86 for
translations.
Some features (well, all Hyper-V features with underscores) were not
present in the translation table and were incorrectly reported as not
enabled, consequently failing the start of any such domain.
Add all hyphenated/underscored Hyper-V feature names into the
aforementioned translation table. That way domains with these features
enabled can be started when QEMU and the kernel support them.
Resolves: https://issues.redhat.com/browse/RHEL-7122 Signed-off-by: Martin Kletzander <mkletzan@redhat.com> Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
wireshark: drop gmodule.h include to avoid glib warnings
The wireshark address.h header uses 'g_memdup2' but this triggers
warnings under clang due to the max version cap:
In file included from ../tools/wireshark/src/plugin.c:27:
In file included from /usr/include/wireshark/epan/proto.h:30:
In file included from /usr/include/wireshark/epan/packet_info.h:15:
/usr/include/wireshark/epan/address.h:107:18: error: 'g_memdup2' is deprecated: Not available before 2.68 [-Werror,-Wdeprecated-declarations]
107 | addr->priv = g_memdup2(&val, sizeof(val));
| ^
/usr/include/glib-2.0/glib/gstrfuncs.h:341:1: note: 'g_memdup2' has been explicitly marked deprecated here
341 | GLIB_AVAILABLE_IN_2_68
| ^
/usr/include/glib-2.0/glib/glib-visibility.h:771:32: note: expanded from macro 'GLIB_AVAILABLE_IN_2_68'
771 | #define GLIB_AVAILABLE_IN_2_68 GLIB_UNAVAILABLE (2, 68)
| ^
/usr/include/glib-2.0/glib/glib-visibility.h:32:35: note: expanded from macro 'GLIB_UNAVAILABLE'
32 | #define GLIB_UNAVAILABLE(maj,min) G_UNAVAILABLE(maj,min) _GLIB_EXTERN
| ^
/usr/include/glib-2.0/glib/gmacros.h:1285:47: note: expanded from macro 'G_UNAVAILABLE'
1285 | #define G_UNAVAILABLE(maj,min) __attribute__((deprecated("Not available before " #maj "." #min)))
| ^
1 error generated.
It is unclear why clang warns, but gcc does not. Our plugin doesn't
actually use the inline helper in address.h that references g_memdup2,
but we get the warning regardless.
Interestingly removing the 'gmodule.h' include avoids the warning. Since
there is nothing in plugin.c that appears to need gmodule.h, removing it
should be safe & done regardless.
Reviewed-by: Peter Krempa <pkrempa@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
tests: stop stubbing libselinux APIs for purpose of data overrides
We currently create stub 'setcon', 'setcon_raw' and 'security_disable'
APIs in the securityselinuxhelper.c mock, which set env variables to
control how other mock'd libselinux APIs respond. These stubs merely
set some env variables, and we have no need to call these stubs from
the library code, only test code.
The 'security_disable' API is now deprecated in libselinux, so
stubbing it generates compiler warnings. Rather than work around that,
just stop stubbing these APIs and set the required env variables
directly. With this change, we now only mock API calls we actually
use from the library code.
Reviewed-by: Peter Krempa <pkrempa@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Adam Julis [Tue, 15 Oct 2024 09:51:38 +0000 (11:51 +0200)]
lxc: fix variable storage order before call
virDomainConfNWFilterInstantiate() was called before net->ifname was
updated, which in some cases caused an error message to be thrown. If
the function fails, the change is reverted.
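The fix follows a common "update, call, roll back on failure" pattern. A hypothetical sketch: the struct, the stub and its failure condition are illustrative only and stand in for the real call to virDomainConfNWFilterInstantiate().

```c
/* Hypothetical, minimal network definition. */
struct net_def {
    const char *ifname;
};

/* Stub for the filter instantiation: in this sketch it fails when no
 * usable interface name is set, mimicking a call that needs the final
 * ifname. */
static int instantiate_filter(const struct net_def *net)
{
    return (net->ifname && net->ifname[0]) ? 0 : -1;
}

/* Store the new name *before* the call, and restore the old one if the
 * call fails so the definition is left as it was. */
static int set_ifname_and_instantiate(struct net_def *net,
                                      const char *new_ifname)
{
    const char *old = net->ifname;

    net->ifname = new_ifname;
    if (instantiate_filter(net) < 0) {
        net->ifname = old;
        return -1;
    }
    return 0;
}
```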
Resolves: https://gitlab.com/libvirt/libvirt/-/issues/658 Signed-off-by: Adam Julis <ajulis@redhat.com> Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
qemu_namespace: Only replicate labels on created files
Function qemuNamespaceMknodOne() is trying to replicate a file from the
parent namespace as perfectly as possible, with the same permissions,
labels, ACLs, etc.
If that file already existed it means that the qemu process is probably
using it already and the current setting is probably more correct than
the ones from the parent namespace.
To reflect that, only replicate the file metadata when the file was
(re-)created in this function.
Resolves: https://issues.redhat.com/browse/RHEL-62174 Signed-off-by: Martin Kletzander <mkletzan@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Function qemuNamespaceMknodOne() is supposed to return 0 if the file did
not exist before this function. If, however, the file existed, but was
removed and recreated by this function the @existed flag should be reset
to its proper state (false) because the function then behaves the same
way as if the file did not exist as it needed to be recreated.
So reset the @existed flag to properly reflect what happened.
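The combined effect of the two qemuNamespaceMknodOne() changes above can be sketched as follows. The function, struct and boolean inputs are hypothetical; the sketch only shows how the "existed" state is tracked and when metadata gets replicated, and the actual file operations are left as comments.

```c
#include <stdbool.h>

/* Hypothetical outcome of creating one device node in the namespace. */
struct mknod_result {
    bool existed;          /* the node on disk is the one found on entry */
    bool replicated_meta;  /* labels/ACLs were copied from the parent */
};

static struct mknod_result mknod_one(bool existed_on_entry,
                                     bool needs_recreate)
{
    struct mknod_result r = { existed_on_entry, false };

    if (r.existed && needs_recreate) {
        /* ... unlink() and re-create the node here ... */
        r.existed = false;   /* now behaves as if it never existed */
    } else if (!r.existed) {
        /* ... create the node here ... */
    }

    /* Only a node created by this function gets the parent's metadata;
     * a pre-existing one is probably already in use by QEMU. */
    if (!r.existed)
        r.replicated_meta = true;

    return r;
}
```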
Signed-off-by: Martin Kletzander <mkletzan@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
The boolean actually tells whether the file existed when the function
was called and using it in more places later on makes them
confusing (e.g. do something with a file if it does not exist). To
better reflect the above and prepare for next patch rename this
variable.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Notable changes:
- new 9.2 machine types
- 'gluster' disk backend deprecated
- 'reconnect' option of chardevs replaced by 'reconnect-ms'
- this includes test output changes happening in this patch
as 'reconnect' was deprecated in the same patch that
introduced 'reconnect-ms' and thus couldn't be changed
incrementally
- cpu flags:
- 'ibpb-brtype' added
- 'vmx-exit-secondary-ctls' added
- 'vmx-entry-load-rtit-ctl' added
- migration capabilities/parameters
- 'zero-blocks' deprecated
- 'multifd-qatzip-level' added
- 'pty' chardev backend gained 'path' attribute
- 'cris' and 'she4b' arches removed (from 'query-cpus-fast' data)
- 'copy-before-write' block filter gained 'min-cluster-size'
- 'vhost-user-scmi', 'serial-mm' removed
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Peter Krempa [Fri, 11 Oct 2024 12:20:32 +0000 (14:20 +0200)]
qemu: chardev: Use 'reconnect-ms' instead of deprecated 'reconnect'
qemu-9.2 will deprecate the 'reconnect' field in favor of
'reconnect-ms'. As libvirt currently doesn't track the timeouts in
milliseconds we simply convert them to avoid use of the deprecated
field.
Quite a lot of churn is caused by the need to plumb 'qemuCaps' into the
chardev props generator.
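The conversion itself is a plain unit change. A minimal sketch, with a hypothetical helper and a boolean standing in for libvirt's qemuCaps capability check; only the property names 'reconnect' and 'reconnect-ms' come from the text above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name/value pair for the chardev reconnect property. */
struct reconnect_prop {
    const char *name;
    uint64_t value;
};

/* The timeout is tracked in seconds; newer QEMU accepts milliseconds. */
static struct reconnect_prop reconnect_prop_for(unsigned int timeout_sec,
                                                bool has_reconnect_ms)
{
    struct reconnect_prop p;

    if (has_reconnect_ms) {
        p.name = "reconnect-ms";
        /* widen before multiplying so large values cannot overflow */
        p.value = (uint64_t)timeout_sec * 1000;
    } else {
        p.name = "reconnect";   /* deprecated field, seconds */
        p.value = timeout_sec;
    }
    return p;
}
```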
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
New qemu introduced the 'reconnect-ms' field for character devices
allowing the reconnect timeout to be specified in milliseconds, which
also deprecates the existing 'reconnect' field that libvirt uses.
To avoid use of deprecated interfaces add a capability which will allow
us to use the new field.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Peter Krempa [Thu, 10 Oct 2024 15:57:36 +0000 (17:57 +0200)]
qemuxml(conf|active)test: Use 'nbd' instead of 'gluster' in 'disk-backing-chains-(no)index' cases
The gluster protocol will be deprecated by qemu-9.2. Convert the tests
to NBD as it's trivial and the test cases are not concerned with a
specific protocol.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Peter Krempa [Thu, 10 Oct 2024 15:39:46 +0000 (17:39 +0200)]
qemublocktest: Mark 'gluster' case in image creation test as deprecated
The gluster protocol backend will be deprecated as of qemu-9.2. Allow
it for now in the QMP schema validator and mark them to be dropped once
gluster is removed.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Peter Krempa [Thu, 10 Oct 2024 15:33:46 +0000 (17:33 +0200)]
qemublocktest: Mark gluster cases in XML->json->XML tests as deprecated
The gluster protocol backend will be deprecated as of qemu-9.2. Allow it
for now in the QMP schema validator and mark them to be dropped once
gluster is removed.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Peter Krempa [Thu, 10 Oct 2024 15:21:40 +0000 (17:21 +0200)]
qemublocktest: Convert all 'gluster' instances to 'nbd' in 'xml2json' cases
Gluster will be deprecated in the upcoming qemu version thus we need to
replace the network protocol by something which will stay supported so
that we can keep the tests around.
Convert all cases referencing 'gluster' to 'nbd'.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Andrea Bolognani [Tue, 15 Oct 2024 09:50:36 +0000 (11:50 +0200)]
apparmor: Allow running i686 VMs on Debian 12
In Debian 12, the qemu-system-i386 binary in /usr/bin is a wrapper
script, with the actual executable living in /usr/libexec instead.
This makes it impossible to run i686 VMs when AppArmor is enabled.
Peter Krempa [Mon, 14 Oct 2024 06:09:06 +0000 (08:09 +0200)]
qemu: snapshot: Remove dead code in 'qemuSnapshotDeleteBlockJobRunning'
'qemuSnapshotDeleteBlockJobIsRunning' returns only 0 and 1. Convert it
to bool and remove the dead code handling -1 return in the caller.
Closes: https://gitlab.com/libvirt/libvirt/-/issues/682 Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Peter Krempa [Thu, 10 Oct 2024 06:42:20 +0000 (08:42 +0200)]
docs: Add warning about using a cleared image with VIR_MIGRATE_PARAM_MIGRATE_DISKS_DETECT_ZEROES_ZEROES
The migration parameter causes zero detection to be enabled and zero
blocks are *not* transferred to the destination. This means that users
must provide pre-cleared images that read as all zeroes; otherwise, any
non-zero blocks on the destination which reside in places where the
source has zero blocks would be kept intact, corrupting the image.
As not transferring and overwriting the zero blocks is exactly what the
feature is supposed to do, users need to provide the proper environment.
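Why a non-cleared destination gets corrupted can be shown with a toy block copy that, like zero detection, skips blocks that are entirely zero on the source. This is an illustration only (the block size, buffers and function names are made up), not how QEMU implements the feature.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BLK 4  /* toy block size in bytes */

static bool block_is_zero(const unsigned char *b)
{
    for (size_t i = 0; i < BLK; i++)
        if (b[i] != 0)
            return false;
    return true;
}

/* Copy nblocks from src to dst, skipping blocks that read as zero on the
 * source: the destination keeps whatever it already had there. */
static void copy_skip_zero(unsigned char *dst, const unsigned char *src,
                           size_t nblocks)
{
    for (size_t i = 0; i < nblocks; i++) {
        const unsigned char *s = src + i * BLK;

        if (block_is_zero(s))
            continue;
        memcpy(dst + i * BLK, s, BLK);
    }
}
```

Copying into a zero-filled destination reproduces the source exactly; copying into one that still holds old data leaves that data in every block the source had as zero.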
Document the requirement, both in API and in the virsh man page for the
'--migrate-disks-detect-zeroes' option.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Peter Krempa [Fri, 11 Oct 2024 16:31:28 +0000 (18:31 +0200)]
qemu: migration: Fix blockdev config with VIR_MIGRATE_PARAM_MIGRATE_DISKS_DETECT_ZEROES
The idea of migration with VIR_MIGRATE_PARAM_MIGRATE_DISKS_DETECT_ZEROES
populated is to sparsify the image. The QEMU NBD client, as it was
configured in commit 621f879adf98e2c93ac5c8c869733a57f06cd9aa, would
signal to the destination to do thick allocation of holes, which would
result in a non-sparse image for any backend except a qcow2 image, which
I used to test it.
Switch to VIR_DOMAIN_DISK_DETECT_ZEROES_UNMAP and
VIR_DOMAIN_DISK_DISCARD_UNMAP which tells the NBD client (and that in
turn the NBD server) to preserve the sparse blocks it detected from the
image.
Fixes: 621f879adf98e2c93ac5c8c869733a57f06cd9aa Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Jiri Denemark [Thu, 10 Oct 2024 13:14:36 +0000 (15:14 +0200)]
util: Rename variable "major" in virIsDevMapperDevice
major() is a macro defined in sys/sysmacros.h, so luckily the code works,
but it's very confusing. Let's rename the local variable to make the
difference between it and the macro more obvious. And while touching the
line we can also initialize it to make sure "clever" analyzers do not
think it may be used uninitialized.
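The old code worked because major() is a function-like macro, which the preprocessor only expands when the name is followed by "(". A minimal sketch assuming glibc's <sys/sysmacros.h>; the function and variable names are illustrative, not the actual virIsDevMapperDevice() code.

```c
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Returns the major number of a device number. The local variable is
 * named dev_major rather than "major" so it cannot be confused with the
 * major() macro (a variable called "major" would still compile, because
 * the macro is only expanded when followed by a parenthesis). It is also
 * initialized right away, which keeps static analyzers quiet. */
static unsigned int device_major(dev_t dev)
{
    unsigned int dev_major = major(dev);

    return dev_major;
}
```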
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Laine Stump <laine@redhat.com>
With watchdog action=dump, the actual watchdog action is set to pause and
the daemon then proceeds to dump the process. After that, the domain is
resumed. This has been the case since the feature was added. However,
resuming the domain might be unexpected, especially when compared to a
HW watchdog, which will never resume the guest from the point where it
got interrupted.
Document the pre-existing behaviour, since any change might be
unexpected as well. Changing the behaviour would require new options like
dump+reset, dump+pause, etc. Adding such options is still possible, but
orthogonal to this change.
Resolves: https://issues.redhat.com/browse/RHEL-753 Signed-off-by: Martin Kletzander <mkletzan@redhat.com> Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Laine Stump [Fri, 30 Aug 2024 16:37:05 +0000 (12:37 -0400)]
network: inhibit idle timeout of daemon if there are any active networks
When the daemons were split out from the monolithic libvirtd, the
network driver didn't implement "inhibit idle timeout if there are any
active objects" as was done for other drivers, so virtnetworkd would
always exit after 120 seconds of no incoming connections. This didn't
ever cause any visible problem, although it did mean that any time a
network API was called after an idle time > 120 seconds, the
restarting virtnetworkd would flush and reload all the
iptables/nftables rules for any active networks.
This patch replicates what is done in the QEMU driver - an nactive is
added to the network driver object, along with an inhibitCallback; the
latter is passed into networkStateInitialize when the driver is
loaded, and the former is incremented for each already-active network,
then incremented/decremented each time a network is started or
stopped. If nactive transitions from 0 to 1 or 1 to 0, inhibitCallback
is called, and it "does the right stuff" to prevent/enable the idle
timeout.
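The counting scheme can be sketched as below. The driver struct, callback type and function names are hypothetical simplifications of the approach described; a real daemon would likely need atomic or locked updates of the counter. The key point is that the callback fires only on the 0 to 1 and 1 to 0 transitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Called with inhibit == true to stop the daemon's idle timeout and with
 * inhibit == false to allow it again. */
typedef void (*inhibit_cb)(bool inhibit, void *opaque);

/* Hypothetical subset of the network driver state. */
struct net_driver {
    unsigned int nactive;   /* number of active networks */
    inhibit_cb inhibit;
    void *opaque;
};

/* Call for each network found active when the driver loads, and
 * whenever a network is started. */
static void net_driver_active_inc(struct net_driver *drv)
{
    if (drv->nactive++ == 0 && drv->inhibit)
        drv->inhibit(true, drv->opaque);    /* 0 -> 1 */
}

/* Call whenever a network is stopped. */
static void net_driver_active_dec(struct net_driver *drv)
{
    if (drv->nactive == 0)
        return;
    if (--drv->nactive == 0 && drv->inhibit)
        drv->inhibit(false, drv->opaque);   /* 1 -> 0 */
}

/* Example callback used only to demonstrate the transitions. */
static int inhibit_count, release_count;

static void count_transitions(bool inhibit, void *opaque)
{
    (void)opaque;
    if (inhibit)
        inhibit_count++;
    else
        release_count++;
}
```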
Signed-off-by: Laine Stump <laine@redhat.com> Reviewed-by: Martin Kletzander <mkletzan@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Jim Fehlig [Fri, 6 Sep 2024 22:08:05 +0000 (16:08 -0600)]
libxl: Reject VM config referencing nwfilters
The Xen libxl driver does not support nwfilter. Introduce a
deviceValidateCallback function with a check for nwfilters, returning
VIR_ERR_CONFIG_UNSUPPORTED if any are found. Also fail to start any
existing VMs referencing nwfilters.
Drivers generally ignore unrecognized XML configuration, but ignoring
a user's request to filter VM network traffic can be viewed as a
security issue.
Signed-off-by: Jim Fehlig <jfehlig@suse.com> Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Laine Stump [Fri, 4 Oct 2024 22:43:02 +0000 (18:43 -0400)]
network: a different implementation of *un*setting firewalld zone when network is destroyed
This is a remake of commit v10.7.0-78-g200f60b2e1, which was reverted
due to a regression in another patch it was dependent on. The new
implementation just adds the call to virFirewallDInterfaceUnsetZone()
into the existing networkRemoveFirewallRules() (but only if we had set
a zone when the network was first started).
Laine Stump [Fri, 4 Oct 2024 22:14:36 +0000 (18:14 -0400)]
network: a different way of supporting firewalld zone for mode='open' networks
Now that networkAddFirewallRules and networkRemoveFirewallRules() are
being called for mode='open' networks, we just need to move the code
that sets the zone outside of the if (mode != ...OPEN) clause, so that
it's done for all forward modes, with the exception of setting the
implied 'libvirt*' zones, which are set when no zone is specified for
all forward modes *except* 'open'.
This was previously done in commit v10.7.0-76-g1a72b83d56, but in a
manner that caused the zone to be unset whenever firewalld reloaded
its rules. That patch was reverted, and this new better patch takes
its place.
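The zone selection rule described above, sketched under assumptions: the enum and sketchPickZone() are hypothetical, and "libvirt" is only a placeholder for whichever implied 'libvirt*' zone the real code picks for a given forward mode.

```c
#include <stddef.h>

/* Hypothetical, simplified forward modes. */
typedef enum {
    SKETCH_FWD_NONE,
    SKETCH_FWD_NAT,
    SKETCH_FWD_ROUTE,
    SKETCH_FWD_OPEN,
} SketchForwardMode;

/* Returns the zone to assign to the bridge, or NULL to leave it alone. */
static const char *
sketchPickZone(SketchForwardMode mode, const char *configuredZone)
{
    /* An explicitly configured zone is honored for every forward mode,
     * including 'open'. */
    if (configuredZone != NULL)
        return configuredZone;

    /* Otherwise an implied zone is used for all modes except 'open'.
     * "libvirt" is a placeholder for the real 'libvirt*' zone choice. */
    if (mode != SKETCH_FWD_OPEN)
        return "libvirt";

    return NULL;
}
```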
Laine Stump [Fri, 4 Oct 2024 21:17:59 +0000 (17:17 -0400)]
network: call network(Add|Remove)FirewallRules() for forward mode='open'
Previously networkAddFirewallRules() and networkRemoveFirewallRules()
were only called if the forward mode was none, 'route', or 'nat', so
those functions didn't check the forward mode. Although their current
contents shouldn't be executed for forward mode='open', soon they will
have extra functionality that should be executed for all the current
forward modes and also mode='open'.
This patch modifies all places where either of the functions is called
to make sure they are called for mode='open' in addition to the current
modes (by either adding 'case ..._OPEN:' to a switch statement, or just
removing an 'if (mode != ...OPEN)' around the calls). To balance that
out, it puts the entire contents of both functions inside
if (mode != ...OPEN) to retain the current behavior. (An upcoming patch
will add code outside that if clause.)
Debug log messages were also added to make it easier to test that the
right thing is being done in all cases.
Signed-off-by: Laine Stump <laine@redhat.com> Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
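A sketch of the call-site and guard pattern, assuming a simplified forward-mode enum. sketchStartNetwork() and sketchAddFirewallRules() are hypothetical and only count how often the pre-existing rule code would run.

```c
#include <stdbool.h>

/* Hypothetical, simplified forward modes. */
typedef enum {
    SKETCH_FWD_NONE,
    SKETCH_FWD_NAT,
    SKETCH_FWD_ROUTE,
    SKETCH_FWD_OPEN,
    SKETCH_FWD_BRIDGE,
} SketchForwardMode;

typedef struct {
    int ruleAdds;   /* times the pre-existing rule code ran */
} SketchCounters;

static void
sketchAddFirewallRules(SketchForwardMode mode, SketchCounters *c)
{
    if (mode != SKETCH_FWD_OPEN) {
        /* Entire pre-existing body: unchanged for none/nat/route. */
        c->ruleAdds++;
    }
    /* Code meant for all modes, including 'open', would go here,
     * outside the guard. */
}

/* Returns true if the firewall function was called for this mode. */
static bool
sketchStartNetwork(SketchForwardMode mode, SketchCounters *c)
{
    switch (mode) {
    case SKETCH_FWD_NONE:
    case SKETCH_FWD_NAT:
    case SKETCH_FWD_ROUTE:
    case SKETCH_FWD_OPEN:   /* the newly added case */
        sketchAddFirewallRules(mode, c);
        return true;
    default:
        return false;
    }
}
```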
Laine Stump [Fri, 4 Oct 2024 17:46:20 +0000 (13:46 -0400)]
Revert "network: support setting firewalld zone for bridge device of open networks"
This reverts commit 1a72b83d566df952033529001b0f88a66d7f4393. That
patch had made the incorrect assumption that the firewalld zone of a
bridge would not be changed/removed when firewalld reloaded its rules
(e.g. with "killall -HUP firewalld"). It turns out my memory was
faulty, and this *does* remove the bridge interface's zone, which
results in guest networking failure after a firewalld reload, until
the virtual network is restarted.
The functionality removed by this revert will be
added back in an upcoming patch that keeps the zone setting in
networkAddFirewallRules() (rather than moving it into a separate
function) so that it is called every time the network's firewall rules
are reloaded (including the reload that happens in response to a
reload notification from firewalld).
Signed-off-by: Laine Stump Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Jim Fehlig [Thu, 15 Aug 2024 23:21:50 +0000 (17:21 -0600)]
qemu: Use consistent naming for save image format
The image format setting in qemu.conf is named 'save_image_format'. The
enum of supported format types is declared with name 'virQEMUSaveFormat'.
Let's be consistent and use 'format' instead of 'compressed' when referring
to the save image format.
Signed-off-by: Jim Fehlig <jfehlig@suse.com> Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Jim Fehlig [Thu, 15 Aug 2024 22:52:57 +0000 (16:52 -0600)]
qemu: conf: Improve the foo_image_format setting descriptions
The current description of the various foo_image_format settings can
be construed to imply the setting is only used to control compression
of the image. Improve the documentation to clarify that format describes
the representation of guest memory blocks on disk, which includes
compression among other possible layouts.
Signed-off-by: Jim Fehlig <jfehlig@suse.com> Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Peter Krempa [Tue, 8 Oct 2024 13:06:17 +0000 (15:06 +0200)]
docs: Prohibit 'external' links within the webpage
Enforce that relative links are used within the page, so that local
installations don't require an internet connection and/or don't redirect to
the web needlessly.
This is done by looking for any local link (barring exceptions) when
checking links with 'check-html-references.py'.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
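The check itself lives in the Python helper script; purely to illustrate the idea, here is the core test expressed in C. sketchIsSelfLink() is hypothetical, matches only the 'https://libvirt.org/' prefix, and omits the exception list the commit mentions.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical prefix identifying links back into the site itself. */
static const char *const sketchSitePrefix = "https://libvirt.org/";

/* True if href is an absolute link into the site that should have been
 * written as a relative link. Other hosts (e.g. download.libvirt.org)
 * do not match the prefix. */
static bool
sketchIsSelfLink(const char *href)
{
    return strncmp(href, sketchSitePrefix, strlen(sketchSitePrefix)) == 0;
}
```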
Peter Krempa [Tue, 8 Oct 2024 13:07:03 +0000 (15:07 +0200)]
docs: newreposetup: Drop section about 'libvirt project server'
Now that most things have been migrated off the old server which hosted
the 'libvirt.org' web (it now handles only 'https://download.libvirt.org'
and no longer even hosts the cgit web interface; any link redirects to
gitlab), the whole section is obsolete. Remove it.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Peter Krempa [Tue, 8 Oct 2024 07:49:48 +0000 (09:49 +0200)]
docs: Use relative links within the web page
Replace full/external links which point to content within
'https://libvirt.org/' with relative links so that the web page works
fully locally.
This does not change the links in 'docs/manpages', as we want the
installed man pages to work from everywhere (even when the local docs are
not installed), nor the links in the generated API docs, which are taken
from the C source.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Peter Krempa [Tue, 8 Oct 2024 11:38:34 +0000 (13:38 +0200)]
docs: Reject non-https external links
Add a '--require-https' switch to the 'check-html-references' helper script
which will error out if any non-https external link is used on our web,
and use it while building docs.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
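Similarly, a hypothetical C rendering of what '--require-https' checks for a single href. Treating anything without '://' as a relative (non-external) link is a simplifying assumption, not necessarily how the script classifies links.

```c
#include <stdbool.h>
#include <string.h>

/* True if href is an external link that does not use https.
 * Links without a "scheme://" part are treated as relative and pass. */
static bool
sketchViolatesRequireHttps(const char *href)
{
    if (strstr(href, "://") == NULL)
        return false;   /* relative link, not external */

    return strncmp(href, "https://", strlen("https://")) != 0;
}
```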
Peter Krempa [Tue, 1 Oct 2024 13:16:14 +0000 (15:16 +0200)]
qemu: snapshot: Allow internal snapshots with PFLASH nvram
With the new snapshot QMP command we can select which block device
backend receives the VM state and thus the main issue with internal
snapshots with pflash was addressed.
Thus we can relax the check and allow snapshots if the pflash nvram is
on qcow2.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Peter Krempa [Thu, 3 Oct 2024 11:24:31 +0000 (13:24 +0200)]
qemuSnapshotActiveInternalDeleteGetDevices: Add warning when deleting inconsistent snapshot
As explained in the commit which added the new internal snapshot
deletion code, we don't want to do any form of strict checking of whether
the libvirt metadata is consistent with the on-disk state, as we didn't
historically do that.
To be able to spot such cases, add a warning to the logs if such a state
is encountered. While warnings are easy to miss, it's the only reasonable
way to do that. Users will be encouraged to file an issue with the
information, without requiring them to enable debug logs, as reproducing
the issue may involve very old historical state.
The checker is deliberately added separately so that it can be easily
reverted once it's no longer needed.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Peter Krempa [Thu, 3 Oct 2024 11:23:20 +0000 (13:23 +0200)]
qemu snapshot: use QMP snapshot-delete for internal snapshots deletion
Switch to using the modern QMP command.
As the user-visible logic when deleting internal snapshots using the old
'delvm' command was very lax in terms of catching inconsistencies
between the snapshot metadata and the on-disk state, we re-implement this
behaviour even when using the new command. We could improve the validation,
but that would come at the cost of possible failures which users might not
expect.
As 'delvm' simply ignored any kind of failure, the selection of
devices to delete the snapshot from is based on first querying qemu for
which top-level images actually contain the internal snapshot, and then
proceeding only with those.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
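One way to picture the device selection: given per-image lists of internal snapshot names (as could be gathered from 'query-named-block-nodes'), keep only the images that actually contain the snapshot. SketchImage and sketchSelectDeleteTargets() are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of one top-level image. */
typedef struct {
    const char *nodeName;
    const char *const *snapshots;   /* names of internal snapshots */
    size_t nsnapshots;
} SketchImage;

/* Store in 'out' the node names of images that contain the snapshot
 * 'name'. Images without it are skipped rather than treated as errors,
 * mirroring the lax 'delvm' behavior. Returns the number selected;
 * 'out' must have room for nimgs entries. */
static size_t
sketchSelectDeleteTargets(const SketchImage *imgs, size_t nimgs,
                          const char *name, const char **out)
{
    size_t n = 0;

    for (size_t i = 0; i < nimgs; i++) {
        for (size_t j = 0; j < imgs[i].nsnapshots; j++) {
            if (strcmp(imgs[i].snapshots[j], name) == 0) {
                out[n++] = imgs[i].nodeName;
                break;
            }
        }
    }
    return n;
}
```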
qemu snapshot: use QMP snapshot-save for internal snapshots creation
The usage of HMP commands is highly discouraged by qemu. Moreover, the
current snapshot creation routine does not provide flexibility in
choosing the target device for the VM state snapshot.
This patch makes use of the QMP command snapshot-save and by
default chooses the first writable non-shared qcow2 disk (if present)
as the target for the VM state.
Signed-off-by: Nikolai Barybin <nikolai.barybin@virtuozzo.com> Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
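The default target selection can be sketched as a linear scan. SketchDisk and sketchPickVmstateDisk() are hypothetical simplifications of libvirt's disk definitions, keeping only the three properties the commit message mentions.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { SKETCH_FMT_RAW, SKETCH_FMT_QCOW2 } SketchFormat;

/* Hypothetical, simplified disk definition. */
typedef struct {
    const char *target;   /* e.g. "vda" */
    SketchFormat format;
    bool readonly;
    bool shareable;
} SketchDisk;

/* Return the index of the first writable, non-shared qcow2 disk,
 * or -1 if none qualifies. */
static int
sketchPickVmstateDisk(const SketchDisk *disks, size_t ndisks)
{
    for (size_t i = 0; i < ndisks; i++) {
        const SketchDisk *d = &disks[i];

        if (d->format == SKETCH_FMT_QCOW2 && !d->readonly && !d->shareable)
            return (int)i;
    }
    return -1;
}
```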
Peter Krempa [Wed, 2 Oct 2024 12:24:31 +0000 (14:24 +0200)]
qemu: monitor: Store internal snapshot names from 'query-named-block-nodes'
Store the names of internal snapshots present in supported images in the
data we dump from 'query-named-block-nodes' so that the upcoming changes
to the internal snapshot code can access it.
To test this we use the bitmap detection test cases which can be easily
extended to dump this data.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
The 'snapshot-save/delete' QMP commands were introduced in QEMU 6.0.0,
so we add a corresponding capability to check whether the target QEMU binary supports them.
Signed-off-by: Nikolai Barybin <nikolai.barybin@virtuozzo.com> Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
qemu: blockjob: Add job types for 'snapshot-save/delete'
The snapshot creation/deletion QMP commands use the qemu 'job' API
to signal completion, thus we need to add corresponding job types.
As the job handles everything internally we don't store anything about
the job.
Signed-off-by: Nikolai Barybin <nikolai.barybin@virtuozzo.com> Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
qemu: monitor: Add plumbing for 'snapshot-save'/'snapshot-delete' QMP commands
Signed-off-by: Nikolai Barybin <nikolai.barybin@virtuozzo.com> Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Jiri Denemark [Tue, 8 Oct 2024 10:26:46 +0000 (12:26 +0200)]
cpu_map: Drop vmx-invvpid-single-context from CPU models
QEMU calls the same feature differently, but translating the names in
libvirt does not make sense because the name in QEMU conflicts with
another feature. QEMU will not change the name for compatibility reasons
so we can just drop our invented name as it is not supported by QEMU.
Apart from this slightly different reason behind the feature being
unsupported by QEMU the situation is similar to vmx-ept-{uc,wb} dropped
in the previous patch, and so are the implications.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Jiri Denemark [Tue, 8 Oct 2024 10:26:45 +0000 (12:26 +0200)]
cpu_map: Drop vmx-ept-{uc,wb} features from CPU models
Although QEMU knows and enables the corresponding MSR bits, it does not
allow users to configure them (there are no names attached to them).
They should have never been added to the CPU map and definitely not to
CPU models, as the features will always be considered disabled regardless
of their actual state because QEMU will not report them.
While we cannot drop them completely for backward compatibility, we can
at least remove them from all CPU models.
This is effectively no change for CPU models where the features were
marked with added='yes', because the migration source would always remove
the features from the domain XML, so not adding them to the live XML does
not hurt. On the other hand, the destination could never be surprised by
the features being suddenly enabled, as QEMU never reports them, which
means libvirt considers them disabled all the time.
The GraniteRapids CPU model is the only one which has contained the feature
ever since it was introduced in libvirt, but it was never possible to migrate
a domain with such a CPU. The source would always mark vmx-ept-wb as
disabled and the destination without the fixes in this series would drop
the feature from the XML completely as it is unsupported by QEMU and
disabled, but when probing for the actual CPU created by QEMU, libvirt
would expect the feature to be enabled (as it is included in the CPU
model and not explicitly mentioned in the domain definition) and fail
the migration. There's nothing the source could do to work around the
behavior on the destination, and migration to older libvirt will still be
broken. But it's possible to migrate a domain with GraniteRapids to a
destination with this series applied from both old and new sources.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Jiri Denemark [Tue, 8 Oct 2024 10:26:43 +0000 (12:26 +0200)]
qemu: Translate vmx-invvpid-single-context-noglobals CPU feature
This feature is called "vmx-invept-single-context-noglobals" in QEMU and
our CPU map even contains the appropriate alias. But we failed to
actually translate the name when talking to QEMU.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
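A minimal sketch of the missing translation step, assuming a simple alias table. The table and sketchToQemuName() are hypothetical and contain only the pair named in this commit, whereas the real alias data lives in the CPU map.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical libvirt-name to QEMU-name alias entry. */
typedef struct {
    const char *libvirtName;
    const char *qemuName;
} SketchFeatureAlias;

static const SketchFeatureAlias sketchAliases[] = {
    { "vmx-invvpid-single-context-noglobals",
      "vmx-invept-single-context-noglobals" },
};

/* Map a libvirt feature name to the name QEMU expects; names without an
 * alias are passed through unchanged. */
static const char *
sketchToQemuName(const char *name)
{
    size_t n = sizeof(sketchAliases) / sizeof(sketchAliases[0]);

    for (size_t i = 0; i < n; i++) {
        if (strcmp(name, sketchAliases[i].libvirtName) == 0)
            return sketchAliases[i].qemuName;
    }
    return name;
}
```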
Jiri Denemark [Tue, 8 Oct 2024 10:26:42 +0000 (12:26 +0200)]
cpu-data.py: Properly handle aliases
The script is used to create data files for cputest from QEMU replies.
By ignoring aliases we might end up thinking a feature is not enabled by
QEMU just because its name differs from the primary one in the CPU map.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
Jiri Denemark [Tue, 8 Oct 2024 10:26:41 +0000 (12:26 +0200)]
qemu: Do not drop unknown CPU features from domain XML
CPU features with policy='disable' which are unknown to QEMU may be
safely skipped when generating the -cpu command line, but we should
still keep them in the domain definition so that we can properly check
they are disabled after migrating the domain to a newer QEMU.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
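A sketch of the distinction between the command line and the definition, assuming QEMU's 'name=on'/'name=off' feature property syntax. SketchFeature and sketchBuildCpuFeatures() are hypothetical; the function only reads the feature array, so skipped features remain in the (hypothetical) definition.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef enum { SKETCH_POLICY_REQUIRE, SKETCH_POLICY_DISABLE } SketchPolicy;

/* Hypothetical CPU feature as stored in the domain definition. */
typedef struct {
    const char *name;
    SketchPolicy policy;
    bool knownToQemu;
} SketchFeature;

/* Build the feature part of a -cpu argument into buf (buflen >= 1).
 * Returns the number of characters written. */
static size_t
sketchBuildCpuFeatures(const SketchFeature *feats, size_t n,
                       char *buf, size_t buflen)
{
    size_t used = 0;

    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        const SketchFeature *f = &feats[i];
        int r;

        /* Unknown to QEMU and disabled: safe to omit from the command
         * line, but the entry is kept in 'feats' (the definition). */
        if (!f->knownToQemu && f->policy == SKETCH_POLICY_DISABLE)
            continue;

        r = snprintf(buf + used, buflen - used, ",%s=%s", f->name,
                     f->policy == SKETCH_POLICY_DISABLE ? "off" : "on");
        if (r < 0 || (size_t)r >= buflen - used)
            break;   /* would truncate; a real builder would grow buf */
        used += (size_t)r;
    }
    return used;
}
```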
Jiri Denemark [Tue, 8 Oct 2024 10:26:40 +0000 (12:26 +0200)]
qemu: Drop vmx-* from migratable CPU model only when origCPU is set
When qemuDomainMakeCPUMigratable is called with origCPU == NULL, the code
removed all vmx-* features marked as added in the specified CPU model,
just as it does when origCPU is not NULL but does not list any of the
vmx-* features. But this is wrong: we should not touch these features at
all when no origCPU is supplied, which happens when parsing XML passed
by a user (e.g., migration XML). Such XML is supposed to be generated by
libvirt as migration XML and contains only vmx-* features explicitly
requested by a user.
https://issues.redhat.com/browse/RHEL-52314
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
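A simplified model of the fixed behavior. SketchCpuFeature, SketchOrigCpu and sketchMakeCpuMigratable() are hypothetical; the rule applied when origCPU is present (drop vmx-* features marked as added that origCPU does not list) is inferred from the message above rather than copied from qemuDomainMakeCPUMigratable().

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical feature entry of the CPU definition being migrated. */
typedef struct {
    const char *name;
    bool addedToModel;   /* marked added='yes' in the CPU model */
    bool present;        /* still listed in the migratable definition */
} SketchCpuFeature;

/* Hypothetical origCPU: vmx-* features listed in the original CPU. */
typedef struct {
    const char *const *vmxFeatures;
    size_t nvmxFeatures;
} SketchOrigCpu;

static bool
sketchOrigListsFeature(const SketchOrigCpu *orig, const char *name)
{
    for (size_t i = 0; i < orig->nvmxFeatures; i++) {
        if (strcmp(orig->vmxFeatures[i], name) == 0)
            return true;
    }
    return false;
}

static void
sketchMakeCpuMigratable(SketchCpuFeature *feats, size_t n,
                        const SketchOrigCpu *orig)
{
    /* The fix: with no origCPU (user-supplied XML), leave vmx-* alone. */
    if (orig == NULL)
        return;

    for (size_t i = 0; i < n; i++) {
        SketchCpuFeature *f = &feats[i];

        if (strncmp(f->name, "vmx-", 4) == 0 && f->addedToModel &&
            !sketchOrigListsFeature(orig, f->name))
            f->present = false;
    }
}
```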