Eugenio Pérez [Mon, 19 Jan 2026 14:33:03 +0000 (15:33 +0100)]
vduse: merge tree search logic of IOTLB_GET_FD and IOTLB_GET_INFO ioctls
The next patch adds new ioctl with the ASID member per entry. Abstract
these two so it can be build on top easily.
Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20260119143306.1818855-11-eperezma@redhat.com>
Eugenio Pérez [Mon, 19 Jan 2026 14:33:02 +0000 (15:33 +0100)]
vduse: take out allocations from vduse_dev_alloc_coherent
The function vduse_dev_alloc_coherent will be called under rwlock in
next patches. Make it out of the lock to avoid increasing its fail
rate.
Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20260119143306.1818855-10-eperezma@redhat.com>
Eugenio Pérez [Mon, 19 Jan 2026 14:33:00 +0000 (15:33 +0100)]
vduse: refactor vdpa_dev_add for goto err handling
Next patches introduce more error paths in this function. Refactor it
so they can be accommodated through gotos.
Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xie Yongji <xieyongji@bytedance.com> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20260119143306.1818855-8-eperezma@redhat.com>
Eugenio Pérez [Mon, 19 Jan 2026 14:32:59 +0000 (15:32 +0100)]
vhost: forbid change vq groups ASID if DRIVER_OK is set
Only vdpa_sim support it. Forbid this behaviour as there is no use for
it right now, we can always enable it in the future with a feature flag.
Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20260119143306.1818855-7-eperezma@redhat.com>
Eugenio Pérez [Mon, 19 Jan 2026 14:32:58 +0000 (15:32 +0100)]
vdpa: document set_group_asid thread safety
Document that the function races with the check of DRIVER_OK.
Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20260119143306.1818855-6-eperezma@redhat.com>
Eugenio Pérez [Mon, 19 Jan 2026 14:32:57 +0000 (15:32 +0100)]
vduse: return internal vq group struct as map token
Return the internal struct that represents the vq group as virtqueue map
token, instead of the device. This allows the map functions to access
the information per group.
At this moment all the virtqueues share the same vq group, that only
can point to ASID 0. This change prepares the infrastructure for actual
per-group address space handling
Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20260119143306.1818855-5-eperezma@redhat.com>
Eugenio Pérez [Mon, 19 Jan 2026 14:32:56 +0000 (15:32 +0100)]
vduse: add vq group support
This allows separate the different virtqueues in groups that shares the
same address space. Asking the VDUSE device for the groups of the vq at
the beginning as they're needed for the DMA API.
Allocating 3 vq groups as net is the device that need the most groups:
* Dataplane (guest passthrough)
* CVQ
* Shadowed vrings.
Future versions of the series can include dynamic allocation of the
groups array so VDUSE can declare more groups.
Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xie Yongji <xieyongji@bytedance.com> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20260119143306.1818855-4-eperezma@redhat.com>
Eugenio Pérez [Mon, 19 Jan 2026 14:32:55 +0000 (15:32 +0100)]
vduse: add v1 API definition
This allows the kernel to detect whether the userspace VDUSE device
supports the VQ group and ASID features. VDUSE devices that don't set
the V1 API will not receive the new messages, and vdpa device will be
created with only one vq group and asid.
The next patches implement the new feature incrementally, only enabling
the VDUSE device to set the V1 API version by the end of the series.
Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xie Yongji <xieyongji@bytedance.com> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20260119143306.1818855-3-eperezma@redhat.com>
Eugenio Pérez [Mon, 19 Jan 2026 14:32:54 +0000 (15:32 +0100)]
vhost: move vdpa group bound check to vhost_vdpa
Remove duplication by consolidating these here. This reduces the
posibility of a parent driver missing them.
While we're at it, fix a bug in vdpa_sim where a valid ASID can be
assigned to a group equal to ngroups, causing an out of bound write.
Cc: stable@vger.kernel.org Fixes: bda324fd037a ("vdpasim: control virtqueue support") Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20260119143306.1818855-2-eperezma@redhat.com>
Currently, cacheline group macros trigger checkpatch warnings.
For example:
$ ./scripts/checkpatch.pl -g ba7e025a6c84aed012421468d83639e5dae982b0
WARNING: Missing a blank line after declarations
#58: FILE: drivers/gpio/gpio-virtio.c:32:
+ struct virtio_gpio_response res;
+ __dma_from_device_group_end();
$ ./scripts/checkpatch.pl -g 5d4cc87414c5d11345c4b11d61377d351b5c28a2
WARNING: Missing a blank line after declarations
#267: FILE: include/net/sock.h:431:
+ int sk_rcvlowat;
+ __cacheline_group_end(sock_read_rx);
But these are not actually statements - the following macros
all expand to zero-length fields:
__cacheline_group_begin()
__cacheline_group_end()
__cacheline_group_begin_aligned()
__cacheline_group_end_aligned()
__dma_from_device_group_begin()
__dma_from_device_group_end()
Add them to $declaration_macros so checkpatch recognizes this fact.
The res and ires buffers in struct virtio_gpio_line and struct
vgpio_irq_line respectively are used for DMA_FROM_DEVICE via
virtqueue_add_sgs(). However, within these structs, even though these
elements are tagged as ____cacheline_aligned, adjacent struct elements
can share DMA cachelines on platforms where ARCH_DMA_MINALIGN >
L1_CACHE_BYTES (e.g., arm64 with 128-byte DMA alignment but 64-byte
cache lines).
The existing ____cacheline_aligned annotation aligns to L1_CACHE_BYTES
which is not always sufficient for DMA alignment. For example, with
L1_CACHE_BYTES = 32 and ARCH_DMA_MINALIGN = 128
- irq_lines[0].ires at offset 128
- irq_lines[1].type at offset 192
both in same 128-byte DMA cacheline [128-256)
When the device writes to irq_lines[0].ires and the CPU concurrently
modifies one of irq_lines[1].type/disabled/masked/queued flags,
corruption can occur on non-cache-coherent platforms.
Fix by using __dma_from_device_group_begin()/end() annotations on the
DMA buffers. Drop ____cacheline_aligned - it's not required to isolate
request and response, and keeping them would increase the memory cost.
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Message-ID: <ba7e025a6c84aed012421468d83639e5dae982b0.1767601130.git.mst@redhat.com> Acked-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reorder struct virtio_vsock fields to place the DMA buffer (event_list)
last. This eliminates the padding from aligning the struct size on
ARCH_DMA_MINALIGN.
virtio_input: use virtqueue_add_inbuf_cache_clean for events
The evts array contains 64 small (8-byte) input events that share
cachelines with each other. When CONFIG_DMA_API_DEBUG is enabled,
this can trigger warnings about overlapping DMA mappings within
the same cacheline.
Previous patch isolated the array in its own cachelines,
so the warnings are now spurious.
Use virtqueue_add_inbuf_cache_clean() to indicate that the CPU does not
write into these cache lines, suppressing these warnings.
The data buffer in struct virtrng_info is used for DMA_FROM_DEVICE via
virtqueue_add_inbuf() and shares cachelines with the adjacent
CPU-written fields (data_avail, data_idx).
The device writing to the DMA buffer and the CPU writing to adjacent
fields could corrupt each other's data on non-cache-coherent platforms.
Add __dma_from_device_group_begin()/end() annotations to place these
in distinct cache lines.
Current struct virtio_scsi_event_node layout has two problems:
The event (DMA_FROM_DEVICE) and work (CPU-written via
INIT_WORK/queue_work) fields share a cacheline.
On non-cache-coherent platforms, CPU writes to work can
corrupt device-written event data.
If ARCH_DMA_MINALIGN is large enough, the 8 events in event_list share
cachelines, triggering CONFIG_DMA_API_DEBUG warnings.
Fix the corruption by moving event buffers to a separate array and
aligning using __dma_from_device_group_begin()/end().
Suppress the (now spurious) DMA debug warnings using
virtqueue_add_inbuf_cache_clean().
On non-cache-coherent platforms, when a structure contains a buffer
used for DMA alongside fields that the CPU writes to, cacheline sharing
can cause data corruption.
The evts array is used for DMA_FROM_DEVICE operations via
virtqueue_add_inbuf(). The adjacent lock and ready fields are written
by the CPU during normal operation. If these share cachelines with evts,
CPU writes can corrupt DMA data.
Add __dma_from_device_group_begin()/end() annotations to ensure evts is
isolated in its own cachelines.
vsock/virtio: use virtqueue_add_inbuf_cache_clean for events
The event_list array contains 8 small (4-byte) events that share
cachelines with each other. When CONFIG_DMA_API_DEBUG is enabled,
this can trigger warnings about overlapping DMA mappings within
the same cacheline.
The previous patch isolated event_list in its own cache lines
so the warnings are spurious.
Use virtqueue_add_inbuf_cache_clean() to indicate that the CPU does not
write into these fields, suppressing the warnings.
Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
Message-ID: <4b5bf63a7ebb782d87f643466b3669df567c9fe1.1767601130.git.mst@redhat.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
On non-cache-coherent platforms, when a structure contains a buffer
used for DMA alongside fields that the CPU writes to, cacheline sharing
can cause data corruption.
The event_list array is used for DMA_FROM_DEVICE operations via
virtqueue_add_inbuf(). The adjacent event_run and guest_cid fields are
written by the CPU while the buffer is available, so mapped for the
device. If these share cachelines with event_list, CPU writes can
corrupt DMA data.
Add __dma_from_device_group_begin()/end() annotations to ensure event_list
is isolated in its own cachelines.
Add virtqueue_add_inbuf_cache_clean() for passing DMA_ATTR_CPU_CACHE_CLEAN
to virtqueue operations. This suppresses DMA debug cacheline overlap
warnings for buffers where proper cache management is ensured by the
caller.
If a driver is buggy and has 2 overlapping mappings but only
sets cache clean flag on the 1st one of them, we warn.
But if it only does it for the 2nd one, we don't.
Document DMA_ATTR_CPU_CACHE_CLEAN as implemented in the
previous patch.
Message-ID: <0720b4be31c1b7a38edca67fd0c97983d2a56936.1767601130.git.mst@redhat.com> Reviewed-by: Petr Tesarik <ptesarik@suse.com> Acked-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
When multiple small DMA_FROM_DEVICE or DMA_BIDIRECTIONAL buffers share a
cacheline, and DMA_API_DEBUG is enabled, we get this warning:
cacheline tracking EEXIST, overlapping mappings aren't supported.
This is because when one of the mappings is removed, while another one
is active, CPU might write into the buffer.
Add an attribute for the driver to promise not to do this, making the
overlapping safe, and suppressing the warning.
Message-ID: <2d5d091f9d84b68ea96abd545b365dd1d00bbf48.1767601130.git.mst@redhat.com> Reviewed-by: Petr Tesarik <ptesarik@suse.com> Acked-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Document the __dma_from_device_group_begin()/end() annotations.
Message-ID: <01ea88055ded4d70cac70ba557680fd5fa7d9ff5.1767601130.git.mst@redhat.com> Acked-by: Marek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: Petr Tesarik <ptesarik@suse.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
When a structure contains a buffer that DMA writes to alongside fields
that the CPU writes to, cache line sharing between the DMA buffer and
CPU-written fields can cause data corruption on non-cache-coherent
platforms.
Add __dma_from_device_group_begin()/end() annotations to ensure proper
alignment to prevent this:
Message-ID: <19163086d5e4704c316f18f6da06bc1c72968904.1767601130.git.mst@redhat.com> Acked-by: Marek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: Petr Tesarik <ptesarik@suse.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Jason Wang [Tue, 30 Dec 2025 06:46:49 +0000 (14:46 +0800)]
virtio_ring: add in order support
This patch implements in order support for both split virtqueue and
packed virtqueue. Performance could be gained for the device where the
memory access could be expensive (e.g vhost-net or a real PCI device):
Benchmark also shows no performance impact for in_order=off for queue
size with 256 and 1024.
Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251230064649.55597-20-jasowang@redhat.com>
Jason Wang [Tue, 30 Dec 2025 06:46:48 +0000 (14:46 +0800)]
virtio_ring: factor out split detaching logic
This patch factors out the split core detaching logic that could be
reused by in order feature into a dedicated function.
Acked-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251230064649.55597-19-jasowang@redhat.com>
Jason Wang [Tue, 30 Dec 2025 06:46:47 +0000 (14:46 +0800)]
virtio_ring: factor out split indirect detaching logic
Factor out the split indirect descriptor detaching logic in order to
allow it to be reused by the in order support.
Acked-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251230064649.55597-18-jasowang@redhat.com>
Jason Wang [Tue, 30 Dec 2025 06:46:46 +0000 (14:46 +0800)]
virtio_ring: factor out core logic for updating last_used_idx
Factor out the core logic for updating last_used_idx to be reused by
the packed in order implementation.
Acked-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251230064649.55597-17-jasowang@redhat.com>
Jason Wang [Tue, 30 Dec 2025 06:46:45 +0000 (14:46 +0800)]
virtio_ring: factor out core logic of buffer detaching
Factor out core logic of buffer detaching and leave the free list
management to the caller so in_order can just call the core logic.
Acked-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251230064649.55597-16-jasowang@redhat.com>
Jason Wang [Tue, 30 Dec 2025 06:46:44 +0000 (14:46 +0800)]
virtio_ring: determine descriptor flags at one time
Let's determine the last descriptor by counting the number of sg. This
would be consistent with packed virtqueue implementation and ease the
future in-order implementation.
Acked-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251230064649.55597-15-jasowang@redhat.com>
Jason Wang [Tue, 30 Dec 2025 06:46:43 +0000 (14:46 +0800)]
virtio_ring: introduce virtqueue ops
This patch introduces virtqueue ops which is a set of callbacks
that will be called for different queue layout or features. This would
help to avoid branches for split/packed and will ease the future
implementation like in order.
Note that in order to eliminate the indirect calls this patch uses
global array of const ops to allow compiler to avoid indirect
branches.
Tested with CONFIG_MITIGATION_RETPOLINE, no performance differences
were noticed.
Acked-by: Eugenio Pérez <eperezma@redhat.com> Suggested-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251230064649.55597-14-jasowang@redhat.com>
Jason Wang [Tue, 30 Dec 2025 06:46:42 +0000 (14:46 +0800)]
virtio_ring: switch to use unsigned int for virtqueue_poll_packed()
Switch to use unsigned int for virtqueue_poll_packed() to match
virtqueue_poll() and virtqueue_poll_split() and to ease the
abstraction of the virtqueue ops.
Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251230064649.55597-13-jasowang@redhat.com>
Jason Wang [Tue, 30 Dec 2025 06:46:41 +0000 (14:46 +0800)]
virtio_ring: switch to use vring_virtqueue for detach_unused_buf variants
Those variants are used internally so let's switch to use
vring_virtqueue as parameter to be consistent with other internal
virtqueue helpers.
Acked-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251230064649.55597-12-jasowang@redhat.com>
Jason Wang [Tue, 30 Dec 2025 06:46:40 +0000 (14:46 +0800)]
virtio_ring: switch to use vring_virtqueue for disable_cb variants
Those variants are used internally so let's switch to use
vring_virtqueue as parameter to be consistent with other internal
virtqueue helpers.
Acked-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251230064649.55597-11-jasowang@redhat.com>
Jason Wang [Tue, 30 Dec 2025 06:46:39 +0000 (14:46 +0800)]
virtio_ring: use vring_virtqueue for enable_cb_delayed variants
Those variants are used internally so let's switch to use
vring_virtqueue as parameter to be consistent with other internal
virtqueue helpers.
Acked-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251230064649.55597-10-jasowang@redhat.com>
Jason Wang [Tue, 30 Dec 2025 06:46:38 +0000 (14:46 +0800)]
virtio_ring: switch to use vring_virtqueue for enable_cb_prepare variants
Those variants are used internally so let's switch to use
vring_virtqueue as parameter to be consistent with other internal
virtqueue helpers.
Acked-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251230064649.55597-9-jasowang@redhat.com>
Jason Wang [Tue, 30 Dec 2025 06:46:37 +0000 (14:46 +0800)]
virtio: switch to use vring_virtqueue for virtqueue_get variants
Those variants are used internally so let's switch to use
vring_virtqueue as parameter to be consistent with other internal
virtqueue helpers.
Acked-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251230064649.55597-8-jasowang@redhat.com>
Jason Wang [Tue, 30 Dec 2025 06:46:36 +0000 (14:46 +0800)]
virtio_ring: switch to use vring_virtqueue for virtqueue_add variants
Those variants are used internally so let's switch to use
vring_virtqueue as parameter to be consistent with other internal
virtqueue helpers.
Acked-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251230064649.55597-7-jasowang@redhat.com>
Jason Wang [Tue, 30 Dec 2025 06:46:35 +0000 (14:46 +0800)]
virtio_ring: switch to use vring_virtqueue for virtqueue_kick_prepare variants
Those variants are used internally so let's switch to use
vring_virtqueue as parameter to be consistent with other internal
virtqueue helpers.
Acked-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251230064649.55597-6-jasowang@redhat.com>
Jason Wang [Tue, 30 Dec 2025 06:46:34 +0000 (14:46 +0800)]
virtio_ring: switch to use vring_virtqueue for virtqueue resize variants
Those variants are used internally so let's switch to use
vring_virtqueue as parameter to be consistent with other internal
virtqueue helpers.
Acked-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251230064649.55597-5-jasowang@redhat.com>
Jason Wang [Tue, 30 Dec 2025 06:46:33 +0000 (14:46 +0800)]
virtio_ring: unify logic of virtqueue_poll() and more_used()
This patch unifies the logic of virtqueue_poll() and more_used() for
better code reusing and ease the future in order implementation.
Acked-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251230064649.55597-4-jasowang@redhat.com>
Jason Wang [Tue, 30 Dec 2025 06:46:32 +0000 (14:46 +0800)]
virtio_ring: switch to use vring_virtqueue in virtqueue_poll variants
Those variants are used internally so let's switch to use
vring_virtqueue as parameter to be consistent with other internal
virtqueue helpers.
Acked-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251230064649.55597-3-jasowang@redhat.com>
Jason Wang [Tue, 30 Dec 2025 06:46:31 +0000 (14:46 +0800)]
virtio_ring: rename virtqueue_reinit_xxx to virtqueue_reset_xxx()
To be consistent with virtqueue_reset().
Acked-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251230064649.55597-2-jasowang@redhat.com>
Jon Kohler [Thu, 13 Nov 2025 00:55:28 +0000 (17:55 -0700)]
vhost: use "checked" versions of get_user() and put_user()
vhost_get_user and vhost_put_user leverage __get_user and __put_user,
respectively, which were both added in 2016 by commit 6b1e6cc7855b
("vhost: new device IOTLB API"). In a heavy UDP transmit workload on a
vhost-net backed tap device, these functions showed up as ~11.6% of
samples in a flamegraph of the underlying vhost worker thread.
Quoting Linus from [1]:
Anyway, every single __get_user() call I looked at looked like
historical garbage. [...] End result: I get the feeling that we
should just do a global search-and-replace of the __get_user/
__put_user users, replace them with plain get_user/put_user instead,
and then fix up any fallout (eg the coco code).
Switch to plain get_user/put_user in vhost, which results in a slight
throughput speedup. get_user now about ~8.4% of samples in flamegraph.
Basic iperf3 test on a Intel 5416S CPU with Ubuntu 25.10 guest:
TX: taskset -c 2 iperf3 -c <rx_ip> -t 60 -p 5200 -b 0 -u -i 5
RX: taskset -c 2 iperf3 -s -p 5200 -D
Before: 6.08 Gbits/sec
After: 6.32 Gbits/sec
As to what drives the speedup, Sean's patch [2] explains:
Use the normal, checked versions for get_user() and put_user() instead of
the double-underscore versions that omit range checks, as the checked
versions are actually measurably faster on modern CPUs (12%+ on Intel,
25%+ on AMD).
The performance hit on the unchecked versions is almost entirely due to
the added LFENCE on CPUs where LFENCE is serializing (which is effectively
all modern CPUs), which was added by commit 304ec1b05031 ("x86/uaccess:
Use __uaccess_begin_nospec() and uaccess_try_nospec"). The small
optimizations done by commit b19b74bc99b1 ("x86/mm: Rework address range
check in get_user() and put_user()") likely shave a few cycles off, but
the bulk of the extra latency comes from the LFENCE.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Sean Christopherson <seanjc@google.com> Signed-off-by: Jon Kohler <jon@nutanix.com>
Message-Id: <20251113005529.2494066-1-jon@nutanix.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Since the return value of vring_unmap_one_split() is exactly
vq->split.desc_extra[i].next, 'i = vq->split.desc_extra[i].next' is
redundant. Assign vring_unmap_one_split() to i instead.
Since vq->split.desc_extra is assigned to extra, use extra[i].next
instead of vq->split.desc_extra[i].next to improve readability.
No change in functionality.
Signed-off-by: zhangdongchuan <zhangdongchuan@eswincomputing.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <202511261140162936986@eswincomputing.com>
Thomas Weißschuh [Mon, 22 Dec 2025 08:00:33 +0000 (09:00 +0100)]
virtio: uapi: avoid usage of libc types
Using libc types and headers from the UAPI headers is problematic as it
introduces a dependency on a full C toolchain.
On Linux 'unsigned long' works as a replacement for 'uintptr_t' and does
not depend on libc.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251222-uapi-virtio-v1-1-29390f87bcad@linutronix.de>
vhost/vsock: improve RCU read sections around vhost_vsock_get()
vhost_vsock_get() uses hash_for_each_possible_rcu() to find the
`vhost_vsock` associated with the `guest_cid`. hash_for_each_possible_rcu()
should only be called within an RCU read section, as mentioned in the
following comment in include/linux/rculist.h:
/**
* hlist_for_each_entry_rcu - iterate over rcu list of given type
* @pos: the type * to use as a loop cursor.
* @head: the head for your list.
* @member: the name of the hlist_node within the struct.
* @cond: optional lockdep expression if called from non-RCU protection.
*
* This list-traversal primitive may safely run concurrently with
* the _rcu list-mutation primitives such as hlist_add_head_rcu()
* as long as the traversal is guarded by rcu_read_lock().
*/
Currently, all calls to vhost_vsock_get() are between rcu_read_lock()
and rcu_read_unlock() except for calls in vhost_vsock_set_cid() and
vhost_vsock_reset_orphans(). In both cases, the current code is safe,
but we can make improvements to make it more robust.
About vhost_vsock_set_cid(), when building the kernel with
CONFIG_PROVE_RCU_LIST enabled, we get the following RCU warning when the
user space issues `ioctl(dev, VHOST_VSOCK_SET_GUEST_CID, ...)` :
WARNING: suspicious RCU usage
6.18.0-rc7 #62 Not tainted
-----------------------------
drivers/vhost/vsock.c:74 RCU-list traversed in non-reader section!!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
1 lock held by rpc-libvirtd/3443:
#0: ffffffffc05032a8 (vhost_vsock_mutex){+.+.}-{4:4}, at: vhost_vsock_dev_ioctl+0x2ff/0x530 [vhost_vsock]
This is not a real problem, because the vhost_vsock_get() caller, i.e.
vhost_vsock_set_cid(), holds the `vhost_vsock_mutex` used by the hash
table writers. Anyway, to prevent that warning, add lockdep_is_held()
condition to hash_for_each_possible_rcu() to verify that either the
caller is in an RCU read section or `vhost_vsock_mutex` is held when
CONFIG_PROVE_RCU_LIST is enabled; and also clarify the comment for
vhost_vsock_get() to better describe the locking requirements and the
scope of the returned pointer validity.
About vhost_vsock_reset_orphans(), currently this function is only
called via vsock_for_each_connected_socket(), which holds the
`vsock_table_lock` spinlock (which is also an RCU read-side critical
section). However, add an explicit RCU read lock there to make the code
more robust and explicit about the RCU requirements, and to prevent
issues if the calling context changes in the future or if
vhost_vsock_reset_orphans() is called from other contexts.
Fixes: 834e772c8db0 ("vhost/vsock: fix use-after-free in network stack callers") Cc: stefanha@redhat.com Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20251126133826.142496-1-sgarzare@redhat.com>
Message-ID: <20251126210313.GA499503@fedora> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
oot build tends to help uncover bugs so it's worth keeping around,
as long as it's low effort.
add stubs for a couple of macros virtio gained recently,
and disable vdpa in the test build.
tools/virtio: add dev_WARN_ONCE and is_vmalloc_addr stubs
Add dev_WARN_ONCE and is_vmalloc_addr stubs needed by virtio_ring.c.
is_vmalloc_addr stub always returns false - that's fine since it's
merely a sanity check.
Add #undef __user before and after including compiler_types.h to avoid
redefinition warnings when compiling with system headers that also
define __user. This allows tools/virtio to build without warnings.
Linus Torvalds [Sun, 21 Dec 2025 23:28:59 +0000 (15:28 -0800)]
Merge tag 'coccinelle-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/jlawall/linux
Pull Coccinelle fixes from Julia Lawall:
"These fix a typo and make the coccicheck script more robust by
ensuring that only compatible semantic patches are executed for the
chosen mode"
* tag 'coccinelle-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/jlawall/linux:
Coccinelle: pm_runtime: Fix typo in report message
scripts: coccicheck: filter *.cocci files by MODE
Linus Torvalds [Sun, 21 Dec 2025 22:41:29 +0000 (14:41 -0800)]
Merge tag 'x86-urgent-2025-12-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
- Fix FPU core dumps on certain CPU models
- Fix htmldocs build warning
- Export TLB tracing event name via header
- Remove unused constant from <linux/mm_types.h>
- Fix comments
- Fix whitespace noise in documentation
- Fix variadic structure's definition to un-confuse UBSAN
- Fix posted MSI interrupts irq_retrigger() bug
- Fix asm build failure with older GCC builds
* tag 'x86-urgent-2025-12-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/bug: Fix old GCC compile fails
x86/msi: Make irq_retrigger() functional for posted MSI
x86/platform/uv: Fix UBSAN array-index-out-of-bounds
mm: Remove tlb_flush_reason::NR_TLB_FLUSH_REASONS from <linux/mm_types.h>
x86/mm/tlb/trace: Export the TLB_REMOTE_WRONG_CPU enum in <trace/events/tlb.h>
x86/sgx: Remove unmatched quote in __sgx_encl_extend function comment
x86/boot/Documentation: Fix whitespace noise in boot.rst
x86/fpu: Fix FPU state core dump truncation on CPUs with no extended xfeatures
x86/boot/Documentation: Fix htmldocs build warning due to malformed table in boot.rst
Songwei Chai [Fri, 6 Jun 2025 06:09:36 +0000 (14:09 +0800)]
scripts: coccicheck: filter *.cocci files by MODE
Enhance the coccicheck script to filter *.cocci files based on the
specified MODE (e.g., report, patch). This ensures that only compatible
semantic patch files are executed, preventing errors such as:
"virtual rule report not supported"
This error occurs when a .cocci file does not define a 'virtual <MODE>'
rule, yet is executed in that mode.
For example:
make coccicheck M=drivers/hwtracing/coresight/ MODE=report
In this case, running "secs_to_jiffies.cocci" would trigger the error
because it lacks support for 'report' mode. With this change, such files
are skipped automatically, improving robustness and developer
experience.
Signed-off-by: Songwei Chai <quic_songchai@quicinc.com> Reviewed-by: Julia Lawall <Julia.Lawall@inria.fr>
Linus Torvalds [Sun, 21 Dec 2025 00:54:42 +0000 (16:54 -0800)]
Merge tag 'spi-fix-v6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A small collection of fixes for various SPI drivers, plus a relaxation
of constraints in the DT for the DesignWare controller to reflect
hardware that's been seen.
There's several fixes for the Cadence QuadSPI driver since a fix
during the last release made some existing issues with error handling
during probe more readily visible"
* tag 'spi-fix-v6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: mt65xx: Use IRQF_ONESHOT with threaded IRQ
spi: dt-bindings: snps,dw-abp-ssi: Allow up to 16 chip-selects
spi: cadence-quadspi: Fix clock disable on probe failure path
spi: cadence-quadspi: Add error logging for DMA request failure
spi: fsl-cpm: Check length parity before switching to 16 bit mode
spi: mpfs: Fix an error handling path in mpfs_spi_probe()
// Three instructions to ajust the stack, read the per-cpu canary
// and copy it to 8(%rsp) ffffffff82067a3a: 48 83 ec 10 sub $0x10,%rsp ffffffff82067a3e: 65 48 8b 05 da 15 45 02 mov %gs:0x24515da(%rip),%rax # <__stack_chk_guard> ffffffff82067a46: 48 89 44 24 08 mov %rax,0x8(%rsp)
Linus Torvalds [Sat, 20 Dec 2025 20:45:35 +0000 (12:45 -0800)]
Merge tag 'xfs-fixes-6.19-rc2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Carlos Maiolino:
"This contains a few fixes for zoned devices support, an UAF and a
compiler warning, and some cleaning up"
* tag 'xfs-fixes-6.19-rc2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: fix the zoned RT growfs check for zone alignment
xfs: validate that zoned RT devices are zone aligned
xfs: fix XFS_ERRTAG_FORCE_ZERO_RANGE for zoned file system
xfs: fix a memory leak in xfs_buf_item_init()
xfs: fix stupid compiler warning
xfs: fix a UAF problem in xattr repair
xfs: ignore discard return value
Linus Torvalds [Sat, 20 Dec 2025 20:22:53 +0000 (12:22 -0800)]
Merge tag 'hwmon-for-v6.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
- ltc4282: Fix reset_history file permissions
- ds620: Update broken Datasheet URL in driver documentation
- tmp401: Fix overflow caused by default conversion rate value
- ibmpex: Fix use-after-free in high/low store
- dell-smm: Limit fan multiplier to avoid overflow
* tag 'hwmon-for-v6.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (ltc4282): Fix reset_history file permissions
hwmon: (DS620) Update broken Datasheet URL in driver documentation
hwmon: (tmp401) fix overflow caused by default conversion rate value
hwmon: (ibmpex) fix use-after-free in high/low store
hwmon: (dell-smm) Limit fan multiplier to avoid overflow
Linus Torvalds [Sat, 20 Dec 2025 20:18:32 +0000 (12:18 -0800)]
Merge tag 'mmc-v6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC host fixes from Ulf Hansson:
- sdhci-esdhc-imx: Fix build problem dependency
- sdhci-of-arasan: Increase card-detect stable timeout to 2 seconds
- sdhci-of-aspeed: Fix DT doc for missing properties
* tag 'mmc-v6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci-esdhc-imx: add alternate ARCH_S32 dependency to Kconfig
mmc: sdhci-of-arasan: Increase CD stable timeout to 2 seconds
dt-bindings: mmc: sdhci-of-aspeed: Switch ref to sdhci-common.yaml
Linus Torvalds [Sat, 20 Dec 2025 20:08:02 +0000 (12:08 -0800)]
Merge tag 'drm-fixes-2025-12-20' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"rc2 fixes for the week, mostly xe, with amdgpu as usual. Then a
smattering of small fixes across the core/tests/panel and amdxdna.
I expect things will be quiet for rc3/4 as teams take a break, and I'm
travelling but will keep an eye on things.
core:
- fix gem handle leak on DRM_IOCTL_GEM_CHANGE_HANDLE
xe:
- Limit num_syncs to prevent oversized kernel allocations
- Disallow 0 OA property values
- Disallow 0 EU stall property values
- Fix kobject leak
- Workaround
- Loop variable reference fix
- Fix a CONFIG corner-case incorrect number of argument
- Skip reason prefix while emitting array
- VF migration fix
- Fix context in mei interrupt top half
- Don't include the CCS metadata in the dma-buf sg-table
- VF queueing recovery work fix
- Increase TDF timeout
- GT reset registers vs scheduler ordering fix
- Adjust long-running workload timeslices
- Always set OA_OAGLBCTXCTRL_COUNTER_RESUME
- Fix a return value
- Drop preempt-fences when destroying imported dma-bufs
- Use usleep_range for accurate long-running workload timeslicing
* tag 'drm-fixes-2025-12-20' of https://gitlab.freedesktop.org/drm/kernel: (34 commits)
drm/xe: Use usleep_range for accurate long-running workload timeslicing
drm/xe: Drop preempt-fences when destroying imported dma-bufs.
drm/xe/eustall: Disallow 0 EU stall property values
drm/xe/oa: Disallow 0 OA property values
drm/xe/xe_sriov_vfio: Fix return value in xe_sriov_vfio_migration_supported()
drm/xe/oa: Always set OAG_OAGLBCTXCTRL_COUNTER_RESUME
drm/xe: Adjust long-running workload timeslices to reasonable values
drm/xe/oa: Limit num_syncs to prevent oversized allocations
drm/xe: Limit num_syncs to prevent oversized allocations
drm/amdkfd: Fix improper NULL termination of queue restore SMI event string
drm/amd/pm: restore SCLK settings after S0ix resume
drm/amdgpu: fix a job->pasid access race in gpu recovery
drm/amd/display: Fix DP no audio issue
drm/amd/display: Fix scratch registers offsets for DCN351
drm/amd/display: Fix scratch registers offsets for DCN35
drm/amd: Resume the device in thaw() callback when console suspend is disabled
drm/panel: visionox-rm69299: Depend on BACKLIGHT_CLASS_DEVICE
accel/amdxdna: Block running under a hypervisor
drm/panel: sony-td4353-jdi: Enable prepare_prev_first
drm/xe: Restore engine registers before restarting schedulers after GT reset
...
Linus Torvalds [Sat, 20 Dec 2025 19:59:06 +0000 (11:59 -0800)]
Merge tag 'linux_kselftest-kunit-fixes-6.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kunit fixes from Shuah Khan:
"Drop unused parameter from kunit_device_register_internal and make
FAULT_TEST default to n when PANIC_ON_OOPS"
* tag 'linux_kselftest-kunit-fixes-6.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: make FAULT_TEST default to n when PANIC_ON_OOPS
kunit: Drop unused parameter from kunit_device_register_internal
Linus Torvalds [Sat, 20 Dec 2025 19:40:51 +0000 (11:40 -0800)]
Merge tag 'mips-fixes_6.19_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS fixes from Thomas Bogendoerfer:
- Fix build error for Alchemy
- Fix reference leak
* tag 'mips-fixes_6.19_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: Fix a reference leak bug in ip22_check_gio()
MIPS: Alchemy: Remove bogus static/inline specifiers
Linus Torvalds [Sat, 20 Dec 2025 19:34:37 +0000 (11:34 -0800)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
"Two left-over updates that could not go into -rc1 due to conflicts
with other series:
- Simplify checks in arch_kfence_init_pool() since
force_pte_mapping() already takes BBML2-noabort (break-before-make
Level 2 with no aborts generated) into account
- Remove unneeded SVE/SME fallback preserve/store handling in the
arm64 EFI. With the recent updates, the fallback path is only taken
for EFI runtime calls from hardirq or NMI contexts. In practice,
this only happens under panic/oops/emergency_restart() and no
restoring of the user state expected.
There's a corresponding lkdtm update to trigger a BUG() or panic()
from hardirq context together with a fixup not to confuse
clang/objtool about the control flow
GCS (guarded control stacks) fix: flush the GCS locking state on exec,
otherwise the new task will not be able to enable GCS (locked as
disabled)"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
lkdtm/bugs: Do not confuse the clang/objtool with busy wait loop
arm64/gcs: Flush the GCS locking state on exec
arm64/efi: Remove unneeded SVE/SME fallback preserve/store handling
lkdtm/bugs: Add cases for BUG and PANIC occurring in hardirq context
arm64: mm: Simplify check in arch_kfence_init_pool()
Linus Torvalds [Sat, 20 Dec 2025 19:31:37 +0000 (11:31 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull x86 kvm fixes from Paolo Bonzini:
"x86 fixes. Everyone else is already in holiday mood apparently.
- Add a missing 'break' to fix param parsing in the rseq selftest
- Apply runtime updates to the _current_ CPUID when userspace is
setting CPUID, e.g. as part of vCPU hotplug, to fix a false
positive and to avoid dropping the pending update
- Disallow toggling KVM_MEM_GUEST_MEMFD on an existing memslot, as
it's not supported by KVM and leads to a use-after-free due to KVM
failing to unbind the memslot from the previously-associated
guest_memfd instance
- Harden against similar KVM_MEM_GUEST_MEMFD goofs, and prepare for
supporting flags-only changes on KVM_MEM_GUEST_MEMFD memlslots,
e.g. for dirty logging
- Set exit_code[63:32] to -1 (all 0xffs) when synthesizing a nested
SVM_EXIT_ERR (a.k.a. VMEXIT_INVALID) #VMEXIT, as VMEXIT_INVALID is
defined as -1ull (a 64-bit value)
- Update SVI when activating APICv to fix a bug where a
post-activation EOI for an in-service IRQ would effective be lost
due to SVI being stale
- Immediately refresh APICv controls (if necessary) on a nested
VM-Exit instead of deferring the update via KVM_REQ_APICV_UPDATE,
as the request is effectively ignored because KVM thinks the vCPU
already has the correct APICv settings"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: nVMX: Immediately refresh APICv controls as needed on nested VM-Exit
KVM: VMX: Update SVI during runtime APICv activation
KVM: nSVM: Set exit_code_hi to -1 when synthesizing SVM_EXIT_ERR (failed VMRUN)
KVM: nSVM: Clear exit_code_hi in VMCB when synthesizing nested VM-Exits
KVM: Harden and prepare for modifying existing guest_memfd memslots
KVM: Disallow toggling KVM_MEM_GUEST_MEMFD on an existing memslot
KVM: selftests: Add a CPUID testcase for KVM_SET_CPUID2 with runtime updates
KVM: x86: Apply runtime updates to current CPUID during KVM_SET_CPUID{,2}
KVM: selftests: Add missing "break" in rseq_test's param parsing
Linus Torvalds [Sat, 20 Dec 2025 19:24:42 +0000 (11:24 -0800)]
Merge tag 'slab-for-6.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab fix from Vlastimil Babka:
- A stable fix for a missing tag reset that can happen in
kfree_nolock() with KASAN+SLUB_TINY configs (Deepanshu Kartikey)
* tag 'slab-for-6.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
mm/slub: reset KASAN tag in defer_free() before accessing freed memory
Linus Torvalds [Sat, 20 Dec 2025 19:18:32 +0000 (11:18 -0800)]
Merge tag 'iommu-fixes-v6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux
Pull iommu fixes from Joerg Roedel:
- iommupt: Fix an oops found by syzcaller in the new generic
IO-page-table code.
- AMD-Vi: Fix IO_PAGE_FAULTs in kdump kernels triggered by re-using
domain-ids from previous kernel.
* tag 'iommu-fixes-v6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
amd/iommu: Make protection domain ID functions non-static
amd/iommu: Preserve domain ids inside the kdump kernel
iommupt: Return ERR_PTR from _table_alloc()
Linus Torvalds [Sat, 20 Dec 2025 17:48:56 +0000 (09:48 -0800)]
Merge tag 'block-6.19-20251218' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block fixes from Jens Axboe:
- ublk selftests for missing coverage
- two fixes for the block integrity code
- fix for the newly added newly added PR read keys ioctl, limiting the
memory that can be allocated
- work around for a deadlock that can occur with ublk, where partition
scanning ends up recursing back into file closure, which needs the
same mutex grabbed. Not the prettiest thing in the world, but an
acceptable work-around until we can eliminate the reliance on
disk->open_mutex for this
- fix for a race between enabling writeback throttling and new IO
submissions
- move a bit of bio flag handling code. No changes, but needed for a
patchset for a future kernel
- fix for an init time id leak failure in rnbd
- loop/zloop state check fix
* tag 'block-6.19-20251218' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
block: validate interval_exp integrity limit
block: validate pi_offset integrity limit
block: rnbd-clt: Fix leaked ID in init_dev()
ublk: fix deadlock when reading partition table
block: add allocation size check in blkdev_pr_read_keys()
Documentation: admin-guide: blockdev: replace zone_capacity with zone_capacity_mb when creating devices
zloop: use READ_ONCE() to read lo->lo_state in queue_rq path
loop: use READ_ONCE() to read lo->lo_state without locking
block: fix race between wbt_enable_default and IO submission
selftests: ublk: add user copy test cases
selftests: ublk: add support for user copy to kublk
selftests: ublk: forbid multiple data copy modes
selftests: ublk: don't share backing files between ublk servers
selftests: ublk: use auto_zc for PER_IO_DAEMON tests in stress_04
selftests: ublk: fix fio arguments in run_io_and_recover()
selftests: ublk: remove unused ios map in seq_io.bt
selftests: ublk: correct last_rw map type in seq_io.bt
selftests: ublk: fix overflow in ublk_queue_auto_zc_fallback()
block: move around bio flagging helpers
Linus Torvalds [Sat, 20 Dec 2025 17:38:56 +0000 (09:38 -0800)]
Merge tag 'io_uring-6.19-20251218' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull io_uring fix from Jens Axboe:
"Just a single fix this week, for an issue with the calculation of the
number of segments in the ublk kbuf import path"
* tag 'io_uring-6.19-20251218' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
io_uring: fix nr_segs calculation in io_import_kbuf
Catalin Marinas [Fri, 19 Dec 2025 15:09:09 +0000 (15:09 +0000)]
lkdtm/bugs: Do not confuse the clang/objtool with busy wait loop
Since commit eb972eab0794 ("lkdtm/bugs: Add cases for BUG and PANIC
occurring in hardirq context"), building with clang for x86_64 results
in the following warnings:
vmlinux.o: warning: objtool: lkdtm_PANIC_IN_HARDIRQ(): unexpected end of section .text.lkdtm_PANIC_IN_HARDIRQ
vmlinux.o: warning: objtool: lkdtm_BUG_IN_HARDIRQ(): unexpected end of section .text.lkdtm_BUG_IN_HARDIRQ
caused by busy "while (wait_for_...);" loops. Add READ_ONCE() and
cpu_relax() to better indicate the intention and avoid any unwanted
compiler optimisations.
Fixes: eb972eab0794 ("lkdtm/bugs: Add cases for BUG and PANIC occurring in hardirq context") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202512190111.jxFSqxUH-lkp@intel.com/ Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Sairaj Kodilkar [Fri, 21 Nov 2025 09:11:16 +0000 (14:41 +0530)]
amd/iommu: Make protection domain ID functions non-static
So that both iommu.c and init.c can utilize them. Also define a new
function 'pdom_id_destroy()' to destroy 'pdom_ids' instead of directly
calling ida functions.
Sairaj Kodilkar [Fri, 21 Nov 2025 09:11:15 +0000 (14:41 +0530)]
amd/iommu: Preserve domain ids inside the kdump kernel
Currently AMD IOMMU driver does not reserve domain ids programmed in the
DTE while reusing the device table inside kdump kernel. This can cause
reallocation of these domain ids for newer domains that are created by
the kdump kernel, which can lead to potential IO_PAGE_FAULTs
Hence reserve these ids inside pdom_ids.
Fixes: 38e5f33ee359 ("iommu/amd: Reuse device table for kdump") Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com> Reported-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Junjie Cao [Fri, 19 Dec 2025 05:56:59 +0000 (21:56 -0800)]
Input: ti_am335x_tsc - fix off-by-one error in wire_order validation
The current validation 'wire_order[i] > ARRAY_SIZE(config_pins)' allows
wire_order[i] to equal ARRAY_SIZE(config_pins), which causes out-of-bounds
access when used as index in 'config_pins[wire_order[i]]'.
Since config_pins has 4 elements (indices 0-3), the valid range for
wire_order should be 0-3. Fix the off-by-one error by using >= instead
of > in the validation check.
Signed-off-by: Junjie Cao <junjie.cao@intel.com> Link: https://patch.msgid.link/20251114062817.852698-1-junjie.cao@intel.com Fixes: bb76dc09ddfc ("input: ti_am33x_tsc: Order of TSC wires, made configurable") Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Gergo Koteles [Thu, 13 Nov 2025 16:02:58 +0000 (17:02 +0100)]
Input: add ABS_SND_PROFILE
ABS_SND_PROFILE used to describe the state of a multi-value sound profile
switch. This will be used for the alert-slider on OnePlus phones or other
phones.
Profile values added as SND_PROFLE_(SILENT|VIBRATE|RING) identifiers
to input-event-codes.h so they can be used from DTS.