The Linux MADV_DONTDUMP flag was added to Linux kernels > 3.3,
back in 2012, and the dump-guest-core flag was added to QEMU
> 1.0 at the same time.
IOW, on Linux we have long been able to assume that QEMU core
dumps will exclude guest memory, unless the user has overridden
the host level defaults in the domain XML.
It is desirable to permit QEMU core dumps out of the box to make
it easier for users to report crashes to their OS vendor without
having to reconfigure and restart libvirt daemons and their
running guests.
While there is a risk that an admin may have set 'dump_guest_core'
to true, while leaving 'max_core' to 0, on balance the benefits
of easier troubleshooting outweigh the risk of changing the
defaults to permit core dumps.
Reviewed-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Leigh Brown [Tue, 3 Dec 2024 16:02:08 +0000 (16:02 +0000)]
lxc: remove no longer working netns check
Since iproute2 v6.12.0, the command "ip link set lo netns -1" can
no longer be used to check for netns support, as it now validates
PIDs are not less than zero.
Since every kernel we care about has the support, just remove the
check.
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Leigh Brown <leigh@solinno.co.uk>
Peter Krempa [Thu, 28 Nov 2024 07:48:49 +0000 (08:48 +0100)]
virschematest: Don't skip all "directory" tests
Due to a bug in the optimization to avoid testing symlinked tests
multiple times all tests were skipped.
In commit f997fcca71a16b102e6ee663 I made an attempt to optimize the
tests by avoiding testing symlinks. This optimization was buggy as I've
passed the 'd_name' field of 'struct dirent' which is just the filename
to 'g_lstat()'. 'g_lstat()' obviously always failed with ENOENT. As the
logic checked only for successful return of 'g_lstat()' the optimizatio
was a dud.
Now in 4d8ebbfee83edb2 the 'g_lstat()' call was replaced by
'virFileIsLink()' checking all non-zero values. This meant that if
'virFileIsLink()' failed the test was skipped. Now since a bad argument
was passed this failed always and thus was always skipped making
'virschematest' useless.
Fix it by passing the full path of the test and also explicitly check
for '1' return value instead of any non-zero.
Fixes: f997fcca71a16b102e6ee663a3fb86bed8de9d7d Fixes: 4d8ebbfee83edb26b19a62465b9f98d0126db991 Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Peter Krempa [Thu, 28 Nov 2024 08:05:13 +0000 (09:05 +0100)]
schemas: domain: Make <identity> subelement of NFS disk source optional
Both the 'user' and 'group' attribute are optional so <identity> can
be empty. Allow it to be omitted completely. The parser and qemu code
can handle that.
Peter Krempa [Tue, 26 Nov 2024 08:35:47 +0000 (09:35 +0100)]
qemuDomainVirStorageSourceFindByNodeName: Match also '<dataStore>' sources
As the source for the data file is a completely separate
virStorageSource including it's own index we need to match it
explicitly, so that code such as storage threshold events work properly
and separately for the data file.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Peter Krempa [Tue, 26 Nov 2024 08:12:01 +0000 (09:12 +0100)]
qemu: block: Ensure that <dataStore> is in appropriate state
In contrast to normal backing chain members where qemu does honour the
'auto-read-only' property the 'data-file' nodes are not automatically
reopened by qemu. Libvirt now has the infrastructure to reopen them
explicitly so use it for all transitions of the 'commit' block job.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Peter Krempa [Tue, 26 Nov 2024 08:10:10 +0000 (09:10 +0100)]
qemuBlockReopenAccess: Don't require backing chain terminator for non-chained images
Add an exception for image formats not supporting backing images so that
they can be reopened RW/RO without the need for adding a terminating
virStorageSource as they simply can't have a backing image.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Peter Krempa [Tue, 26 Nov 2024 12:20:56 +0000 (13:20 +0100)]
qemuBlockReopenAccess: Fix update of 'readonly' state
Refactors done in 24b667eeed78d2df (and also 9ec0e28e876b17df9)
broke the expected handling of the update of 'readonly' flag of a
virStorage. The source is actually set to the proper state but rolled
back to the previous state as the 'cleanup' label should have been
'error' and thus not reached on success.
Additionally some of the code paths violate the statement in the comment
after updating 'readonly' that only 'goto error' must be used.
Fixes: 24b667eeed78d2df0376a38a592ed9d8c2744bdc Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Michal Privoznik [Wed, 27 Nov 2024 11:25:12 +0000 (12:25 +0100)]
qemu: Validate QoS values in qemuDomainSetInterfaceParameters()
This is similar to one of my previous commits (v10.7.0-rc1~22)
which introduced a check that <bandwidth/> values fit into
certain limits. My original commit validated values when parsing
<bandwidth/> XML, but completely missed the case when values are
set over virDomainSetInterfaceParameters() API.
Solution is simple - just perform validation after bandwidth
structure is reconstructed from arguments passed to the API.
Resolves: https://issues.redhat.com/browse/RHEL-65372 Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Jiri Denemark [Wed, 27 Nov 2024 07:34:52 +0000 (08:34 +0100)]
cpu: Check blockers in virCPUCompareUnusable only if they exist
virCPUCompareUnusable can be called with blockers == NULL in case the
CPU model itself is usable (i.e., QEMU reports an empty list of
blockers), but the CPU definition contains some additional features
which have to be checked.
Fixes: v10.8.0-129-g5f8abbb7d0 Reported-by: Han Han <hhan@redhat.com> Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Tested-by: Han Han <hhan@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Laine Stump [Tue, 26 Nov 2024 03:24:49 +0000 (22:24 -0500)]
network: add tc filter rule to nftables backend to fix checksum of DHCP responses
Please see the commit log for commit v10.9.0-rc1-1-g42ab0148dd for the
history and explanation of the problem that this patch is fixing.
A shorter explanation is that when a guest is connected to a libvirt
virtual network using a virtio-net adapter with in-kernel "vhost-net"
packet processing enabled, it will fail to acquire an IP address from
a DHCP seever running on the host.
In commit v10.9.0-rc1-1-g42ab0148dd we tried fixing this by *zeroing
out* the checksums of these packets with an nftables rule (nftables
can't recompute the checksum, but it can set it to 0) . This
*appeared* to work initially, but it turned out that zeroing the
checksum ends up breaking dhcp packets on *non* virtio/vhost-net guest
interfaces. That attempt was reverted in commit v10.9.0-rc2.
Fortunately, there is an existing way to recompute the checksum of a
packet as it leaves an interface - the "tc" (traffic control) utility
that libvirt already uses for bandwidth management. This patch uses a
tc filter rule to match dhcp response packets on the bridge and
recompute their checksum.
The filter rule must be attached to a tc qdisc, which may also have a
filter attached for bandwidth management (in the <bandwidth> element
of the network config). Not only must we add the qdisc only once
(which was already handled by the patch two prior to this one), but
also the filter rule for checksum fixing and the filter rule for
bandwidth management must be different priorities so they don't clash;
this is solved by adding the checksum-fix filter with "priority 2",
while the bandwidth management filter remains "priority 1" (both will
always be evaluated anyway, it's just a matter of which is evaluated
first).
So far this method has worked with every different guest we could
throw at it, including several that failed with the previous method.
Fixes: b89c4991daa0ee9371f10937fab3b03c5ffdabc6 Reported-by: Rich Jones <rjones@redhat.com> Reported-by: Andrea Bolognani <abologna@redhat.com> Fix-Suggested-by: Eric Garver <egarver@redhat.com> Fix-Suggested-by: Phil Sutter <psutter@redhat.com> Signed-off-by: Laine Stump <laine@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Laine Stump [Tue, 26 Nov 2024 03:24:48 +0000 (22:24 -0500)]
util: add new "tc" layer for virFirewallCmd objects
If the layer of a virFirewallCmd is "tc", then the "tc" utility will
be executed using the arguments that had been added to the
virFirewallCmd
tc layer doesn't support auto-rollback command creation (any rollback
needs to be added manually with virFirewallAddRollbackCmd()), and also
tc layer isn't supported by the iptables backend (it would have been
straightforward to add, but the iptables backend doesn't need it, and
I didn't want to take the chance of causing a regression in that
code for no good reason).
Signed-off-by: Laine Stump <laine@redhat.com> Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Laine Stump [Tue, 26 Nov 2024 03:24:47 +0000 (22:24 -0500)]
util: don't re-add the qdisc used for tx filters if it already exists
There will soon be two separate users of tc on virtual networks, and
both will use the "qdisc root handle 1: htb" to add tx filters. One or the
other could get the first chance to add the qdisc, and then if at a
later time the other decides to use it, we need to prevent the 2nd
user from attempting to re-add the qdisc (because that just generates
an error).
We do this by running "tc qdisc show dev $bridge handle 1:" then
checking if the output of that command contains both "qdisc" and " 1:
".[*] If it does then the qdisc has already been added. If not then we
need to add it now.
[*]As of this writing, the output more exactly starts with "qdisc
htb 1: root", but our comparison is made purposefully generous to
increase the chances that it will continue to work properly if tc
modifies the format of its output.
Signed-off-by: Laine Stump <laine@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Laine Stump [Tue, 26 Nov 2024 03:24:46 +0000 (22:24 -0500)]
util: put the command that adds a tx filter qdisc into a separate function
virNetDevBandwidthSet() adds a queue discipline (qdisc) for each
interface that it will need to add tc transmit filters to, and the
filters are then attached to the qdisc.
There are other circumstances where some other function will need to
add tc transmit filters to an interface (in particular an upcoming
patch to the network driver nftables backend that will use a tc tx
filter to fix the checksum of dhcp packets), so that function will
also need a qdisc for the tx filter. To assure both always use exactly
the same qdisc, this patch puts the command that adds the tx filter
qdisc into a separate helper function that can (and will) be called
from either place
Signed-off-by: Laine Stump <laine@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Laine Stump [Tue, 26 Nov 2024 03:24:45 +0000 (22:24 -0500)]
util: make it optional to clear existing tc qdiscs/filters in virNetDevBandwidthSet()
virNetDevBandwidthSet() always clears all existing qdiscs and their
subordinate filters before adding all the new qdiscs/filters. This is
normally exactly what we want, but there is one case (the network
driver) where the Qdisc added by virNetDevBandwidthSet() may already
be in use by the nftables backend (which will add a rule to fix the
checksum of dhcp packets); in that case, we *don't* want
virNetDevBandwidthSet() to clear out the qdisc that was already added
for nftables, and none of the bandwidth filters have been added yet,
so there already aren't any "old" filters that need to be removed
either - it is safe to just skip virNetDevBandwidthClear() in this
case.
To allow the network driver to set bandwidth without first clearing
it, this patch adds the flag VIR_NETDEV_BANDWIDTH_SET_CLEAR_ALL to the
virNetDevBandwidthSetFlags enum, and recognizes it in
virNetDevBandwidthSet() - if the flag is set, then
virNetDevBandwidth() will call virNetDevBandwidthClear() just as it
always has. But if the flag isn't set it *won't* call
virNetDevBandwidthClear().
As suggested above, VIR_NETDEV_BANDWIDTH_SET_CLEAR_ALL is set for all
calls to virNetdevBandwidthSet() except for two places in the network
driver.
Signed-off-by: Laine Stump <laine@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Laine Stump [Tue, 26 Nov 2024 03:24:44 +0000 (22:24 -0500)]
util: use a single flags arg for virNetDevBandwidthSet(), not multiple bools
Having two bools in the arg list is on the borderline of being
confusing to anyone trying to read the code, but we're about to add a
3rd. This patch replaces the two bools with a single flags argument
which will instead have one or more bits from virNetDevBandwidthFlags
set.
Signed-off-by: Laine Stump <laine@redhat.com> Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Jiri Denemark [Fri, 22 Nov 2024 18:09:39 +0000 (19:09 +0100)]
cpu_x86: Record relations between CPU models
Record a fact a specific CPU model was derived from another one. The
original model is also marked as an alias of the new one in case it did
not change any properties of the original CPU.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Jiri Denemark [Tue, 22 Oct 2024 10:38:04 +0000 (12:38 +0200)]
sync_qemu_models_i386: Copy signatures from base model
The signatures in the CPU map are used for matching physical CPUs and
thus we need to cover all possible real world variants we know about.
When adding a new version of an existing CPU model, we should copy the
signature(s) of the existing model rather than replacing it with the
signature that QEMU uses.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Jiri Denemark [Tue, 22 Oct 2024 07:16:46 +0000 (09:16 +0200)]
cpu_map: Properly group models in index.xml
We already visually group the included models using comments. This patch
introduces a new <group name='...'> element for doing it properly in a
machine friendly way.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Jiri Denemark [Tue, 12 Nov 2024 10:24:23 +0000 (11:24 +0100)]
sync_qemu_models_i386: Store extra info in a separate file
We don't really need or want the extra info to be included in the CPU
model definitions in git, it's mostly useful for verifying the output of
the script. Let's store it in a separate file rather than in a comment
block of the CPU model definition itself.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Jiri Denemark [Mon, 21 Oct 2024 11:25:51 +0000 (13:25 +0200)]
sync_qemu_models_i386: Add support for versioned CPU models
Each CPU model with -v* suffix is defined as a standalone model copying
all attributes of the previous version. CPU model versions with an alias
are handled differently. The full definition is used for the alias and
the versioned model is created as an identical copy of the alias.
To avoid breaking migration compatibility of host-model CPUs all
versioned models are marked with <decode guest='off'/> so that they are
ignored when selecting candidates for host-model. It's not ideal but not
doing so would break almost all host-model CPUs as the new versioned CPU
models have all vmx-* features included since their introduction while
existing CPU models were updated later. This meas existing models would
be accompanied with a long list of vmx-* features to properly describe a
host CPU while the newly added CPU models would have those features
enabled implicitly and their list of features would be significantly
shorter. Thus the new models would always be better candidates for
host-model than the existing models.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Jiri Denemark [Mon, 21 Oct 2024 11:00:26 +0000 (13:00 +0200)]
sync_qemu_models_i386: Do not require full path to QEMU's cpu.c
While the script for synchronizing CPU features expects a path to QEMU
source tree, this CPU model script insisted on getting a full patch to
cpu.c file, even though it could easily deduce it from the path to QEMU
source tree.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Jiri Denemark [Mon, 21 Oct 2024 10:55:32 +0000 (12:55 +0200)]
sync_qemu_models_i386: Do not overwrite existing models
We don't change definitions of CPU models which were already included in
a libvirt release to maintain migration compatibility. Thus the script
can just skip existing models and save us from having to drop the
changes it would do to them.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Jiri Denemark [Mon, 21 Oct 2024 09:54:50 +0000 (11:54 +0200)]
sync_qemu_features_i386: Add some removed features back
When removing features unknown to QEMU (they have a different name or
are completely missing as they are not configurable by a user) I should
not have removed them from the list of features unknown to QEMU in the
script for synchronizing QEMU features to the CPU map.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Jiri Denemark [Fri, 15 Nov 2024 13:52:12 +0000 (14:52 +0100)]
cpu_x86: Promote added/removed from ancestor
When a CPU model is defined based on another model, we were completely
ignoring features marked as added to or removed from the original model
after it was released. For added features this is the right thing to do
as it will promote them to become normal features included in the new
model. But features marked as removed would become included in the new
model as well. We need to explicitly remove them as if they were never
included in the model.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Nikolai Barybin [Wed, 20 Nov 2024 15:48:49 +0000 (18:48 +0300)]
qemuxmlconftest: Add test cases for the <dataStore> feature
Signed-off-by: Nikolai Barybin <nikolai.barybin@virtuozzo.com> Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Nikolai Barybin [Wed, 20 Nov 2024 15:48:49 +0000 (18:48 +0300)]
qemuxmlactivetest: Add tests for <dataStore>
Signed-off-by: Nikolai Barybin <nikolai.barybin@virtuozzo.com> Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Nikolai Barybin [Wed, 20 Nov 2024 15:48:49 +0000 (18:48 +0300)]
tests: virstoragetest: Add tests for detection of qcow2 'data_file' feature
Add two test images showing the use of 'data_file' and 'data_file_raw'
(although the latter is not detected by libvirt) so that we can see that
the qcow2 metadata parser and backing chain populators work correctly.
Note that 'data_file_raw' is mutually exclusive with backing images.
Signed-off-by: Nikolai Barybin <nikolai.barybin@virtuozzo.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Nikolai Barybin [Wed, 20 Nov 2024 15:48:48 +0000 (18:48 +0300)]
qemu: block: Add support for 'data-file' feature of qcow2
Add the block infrastructure for detecting and landling the data file
for images and starting qemu with the configuration.
Signed-off-by: Nikolai Barybin <nikolai.barybin@virtuozzo.com> Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Nikolai Barybin [Wed, 20 Nov 2024 15:48:46 +0000 (18:48 +0300)]
qemu: put data-file path to VM's cgroup and namespace
Signed-off-by: Nikolai Barybin <nikolai.barybin@virtuozzo.com> Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Nikolai Barybin [Wed, 20 Nov 2024 15:48:42 +0000 (18:48 +0300)]
storage file: fill in src->dataFileStore during file probe
Signed-off-by: Nikolai Barybin <nikolai.barybin@virtuozzo.com> Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Nikolai Barybin [Wed, 20 Nov 2024 15:48:41 +0000 (18:48 +0300)]
storage file: add qcow2 data-file path parsing from header
In qcow2 header data file is represented by incompitible feature bit
and its path is saved to header extension table.
Thus, we implement here the logic similar to backing file probing.
Signed-off-by: Nikolai Barybin <nikolai.barybin@virtuozzo.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>