David Matlack [Wed, 26 Nov 2025 23:17:28 +0000 (23:17 +0000)]
vfio: selftests: Stop passing device for IOMMU operations
Drop the struct vfio_pci_device wrappers for IOMMU map/unmap functions
and require tests to directly call iommu_map(), iommu_unmap(), etc. This
results in more concise code, and also makes it clear that the map
operations happen on a struct iommu, not necessarily on a specific
device, which matters especially once multi-device tests are introduced.
Do the same for iova_allocator_init() as that function only needs the
struct iommu, not struct vfio_pci_device.
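As a hedged sketch of the resulting call shape (the wrapper name and the
iommu field are assumptions based on the description, not taken from the
patch):

    /* before: map through the struct vfio_pci_device wrapper */
    vfio_pci_dma_map(device, &region);
    /* after: call the IOMMU helper directly on the struct iommu */
    iommu_map(device->iommu, &region);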
Reviewed-by: Alex Mastro <amastro@fb.com> Tested-by: Alex Mastro <amastro@fb.com> Reviewed-by: Raghavendra Rao Ananta <rananta@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Link: https://lore.kernel.org/r/20251126231733.3302983-14-dmatlack@google.com Signed-off-by: Alex Williamson <alex@shazbot.org>
David Matlack [Wed, 26 Nov 2025 23:17:27 +0000 (23:17 +0000)]
vfio: selftests: Move IOVA allocator into iova_allocator.c
Move the IOVA allocator into its own file, to provide better separation
between the allocator and the struct vfio_pci_device helper code.
The allocator could go into iommu.c, but it is standalone enough that a
separate file seems cleaner. This also continues the trend of having a
.c for every major object in VFIO selftests (vfio_pci_device.c,
vfio_pci_driver.c, iommu.c, and now iova_allocator.c).
No functional change intended.
Reviewed-by: Alex Mastro <amastro@fb.com> Tested-by: Alex Mastro <amastro@fb.com> Reviewed-by: Raghavendra Rao Ananta <rananta@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Link: https://lore.kernel.org/r/20251126231733.3302983-13-dmatlack@google.com Signed-off-by: Alex Williamson <alex@shazbot.org>
David Matlack [Wed, 26 Nov 2025 23:17:26 +0000 (23:17 +0000)]
vfio: selftests: Move IOMMU library code into iommu.c
Move all the IOMMU related library code into their own file iommu.c.
This provides a better separation between the vfio_pci_device helper
code and the iommu code.
No functional change intended.
Reviewed-by: Alex Mastro <amastro@fb.com> Tested-by: Alex Mastro <amastro@fb.com> Reviewed-by: Raghavendra Rao Ananta <rananta@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Link: https://lore.kernel.org/r/20251126231733.3302983-12-dmatlack@google.com Signed-off-by: Alex Williamson <alex@shazbot.org>
David Matlack [Wed, 26 Nov 2025 23:17:25 +0000 (23:17 +0000)]
vfio: selftests: Rename struct vfio_dma_region to dma_region
Rename struct vfio_dma_region to dma_region. This is in preparation for
separating the VFIO PCI device library code from the IOMMU library code.
This name change also better reflects the fact that DMA mappings can be
managed by either VFIO or IOMMUFD. i.e. the "vfio_" prefix is
misleading.
Reviewed-by: Alex Mastro <amastro@fb.com> Tested-by: Alex Mastro <amastro@fb.com> Reviewed-by: Raghavendra Rao Ananta <rananta@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Link: https://lore.kernel.org/r/20251126231733.3302983-11-dmatlack@google.com Signed-off-by: Alex Williamson <alex@shazbot.org>
David Matlack [Wed, 26 Nov 2025 23:17:24 +0000 (23:17 +0000)]
vfio: selftests: Upgrade driver logging to dev_err()
Upgrade various logging in the VFIO selftests drivers from dev_info() to
dev_err(). All of these logs indicate scenarios that may be unexpected.
For example, the logging during probing indicates devices that match but
aren't supported by the driver. And the memcpy errors can indicate
a problem if the caller was not trying to do something like exercise I/O
fault handling. Exercising I/O fault handling is certainly a valid thing
to do, but the driver can't infer the caller's expectations, so better
to just log with dev_err().
Suggested-by: Raghavendra Rao Ananta <rananta@google.com> Reviewed-by: Alex Mastro <amastro@fb.com> Tested-by: Alex Mastro <amastro@fb.com> Reviewed-by: Raghavendra Rao Ananta <rananta@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Link: https://lore.kernel.org/r/20251126231733.3302983-10-dmatlack@google.com Signed-off-by: Alex Williamson <alex@shazbot.org>
David Matlack [Wed, 26 Nov 2025 23:17:22 +0000 (23:17 +0000)]
vfio: selftests: Eliminate overly chatty logging
Eliminate overly chatty logs that are printed during almost every test.
These logs are adding more noise than value. If a test cares about this
information it can log it itself. This is especially true as the VFIO
selftests gain support for multiple devices in a single test (which
multiplies all these logs).
Reviewed-by: Alex Mastro <amastro@fb.com> Tested-by: Alex Mastro <amastro@fb.com> Reviewed-by: Raghavendra Rao Ananta <rananta@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Link: https://lore.kernel.org/r/20251126231733.3302983-8-dmatlack@google.com Signed-off-by: Alex Williamson <alex@shazbot.org>
David Matlack [Wed, 26 Nov 2025 23:17:19 +0000 (23:17 +0000)]
vfio: selftests: Rename struct vfio_iommu_mode to iommu_mode
Rename struct vfio_iommu_mode to struct iommu_mode since the mode can
include iommufd. This also prepares for splitting out all the IOMMU code
into its own structs/helpers/files which are independent from the
vfio_pci_device code.
No functional change intended.
Reviewed-by: Alex Mastro <amastro@fb.com> Tested-by: Alex Mastro <amastro@fb.com> Reviewed-by: Raghavendra Rao Ananta <rananta@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Link: https://lore.kernel.org/r/20251126231733.3302983-5-dmatlack@google.com Signed-off-by: Alex Williamson <alex@shazbot.org>
David Matlack [Wed, 26 Nov 2025 23:17:18 +0000 (23:17 +0000)]
vfio: selftests: Allow passing multiple BDFs on the command line
Add support for passing multiple device BDFs to a test via the command
line. This is a prerequisite for multi-device tests.
Single-device tests can continue using vfio_selftests_get_bdf(), which
will continue to return argv[argc - 1] (if it is a BDF string), or the
environment variable $VFIO_SELFTESTS_BDF otherwise.
For multi-device tests, a new helper called vfio_selftests_get_bdfs() is
introduced which will return an array of all BDFs found at the end of
argv[], as well as the number of BDFs found (passed back to the caller
via argument). The array of BDFs returned does not need to be freed by
the caller.
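A hedged usage sketch (the exact prototype is an assumption based on the
description above):

    int nr_bdfs;
    const char **bdfs = vfio_selftests_get_bdfs(argc, argv, &nr_bdfs);

    for (int i = 0; i < nr_bdfs; i++)
        printf("device %d: %s\n", i, bdfs[i]);
    /* bdfs does not need to be freed by the caller */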
The environment variable VFIO_SELFTESTS_BDF continues to support only a
single BDF for the time being.
Reviewed-by: Alex Mastro <amastro@fb.com> Tested-by: Alex Mastro <amastro@fb.com> Reviewed-by: Raghavendra Rao Ananta <rananta@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Link: https://lore.kernel.org/r/20251126231733.3302983-4-dmatlack@google.com Signed-off-by: Alex Williamson <alex@shazbot.org>
David Matlack [Wed, 26 Nov 2025 23:17:17 +0000 (23:17 +0000)]
vfio: selftests: Split run.sh into separate scripts
Split run.sh into separate scripts (setup.sh, run.sh, cleanup.sh) to
enable multi-device testing, and prepare for VFIO selftests
automatically detecting which devices to use for testing by storing
device metadata on the filesystem.
- setup.sh takes one or more BDFs as arguments and sets up each device.
Metadata about each device is stored on the filesystem in the
directory:
${TMPDIR:-/tmp}/vfio-selftests-devices
Within this directory is a directory for each BDF, containing the files
that cleanup.sh uses to clean up the device.
- run.sh runs a selftest by passing it the BDFs of all set up devices.
- cleanup.sh takes zero or more BDFs as arguments and cleans up each
device. If no BDFs are provided, it cleans up all devices.
This split enables multi-device testing by allowing multiple BDFs to be
set up and passed into tests.
In the future, VFIO selftests can automatically detect already-set-up
devices by inspecting ${TMPDIR:-/tmp}/vfio-selftests-devices. This will
avoid the
need for the run.sh script.
Reviewed-by: Alex Mastro <amastro@fb.com> Tested-by: Alex Mastro <amastro@fb.com> Reviewed-by: Raghavendra Rao Ananta <rananta@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Link: https://lore.kernel.org/r/20251126231733.3302983-3-dmatlack@google.com Signed-off-by: Alex Williamson <alex@shazbot.org>
David Matlack [Wed, 26 Nov 2025 23:17:16 +0000 (23:17 +0000)]
vfio: selftests: Move run.sh into scripts directory
Move run.sh into a new sub-directory, scripts/. This directory will be used
to house various helper scripts to be used by humans and automation for
running VFIO selftests.
Opportunistically also switch run.sh from TEST_PROGS_EXTENDED to
TEST_FILES. The former is for actual test executables that are just not
run by default. TEST_FILES is a better fit for helper scripts.
No functional change intended.
Reviewed-by: Alex Mastro <amastro@fb.com> Tested-by: Alex Mastro <amastro@fb.com> Reviewed-by: Raghavendra Rao Ananta <rananta@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Link: https://lore.kernel.org/r/20251126231733.3302983-2-dmatlack@google.com Signed-off-by: Alex Williamson <alex@shazbot.org>
Ankit Agrawal [Thu, 27 Nov 2025 17:06:32 +0000 (17:06 +0000)]
vfio/nvgrace-gpu: wait for the GPU mem to be ready
Speculative CPU prefetches to GPU memory before the GPU is ready
after reset can cause harmless corrected RAS events to be logged
on Grace systems. It is thus preferred that the mapping not be
re-established until the GPU is ready post reset.
GPU readiness can be checked through BAR0 registers, similar to the
check done at device probe time.
It can take several seconds for the GPU to be ready. So it is
desirable that the time overlaps as much of the VM startup as
possible to reduce impact on the VM bootup time. The GPU
readiness state is thus checked on the first fault/huge_fault
request or read/write access, which amortizes the GPU readiness
time.
The first fault and read/write access check the GPU state when the
reset_done flag - which denotes that the GPU has just been reset -
is set. The memory_lock is taken across map/access to avoid races
with GPU reset.
Also check that memory is enabled before waiting for the GPU to be
ready; otherwise the readiness check would block for 30 seconds.
Lastly, add PM handling around read/write access.
Cc: Shameer Kolothum <skolothumtho@nvidia.com> Cc: Alex Williamson <alex@shazbot.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Vikram Sethi <vsethi@nvidia.com> Reviewed-by: Shameer Kolothum <skolothumtho@nvidia.com> Suggested-by: Alex Williamson <alex@shazbot.org> Signed-off-by: Ankit Agrawal <ankita@nvidia.com> Link: https://lore.kernel.org/r/20251127170632.3477-7-ankita@nvidia.com Signed-off-by: Alex Williamson <alex@shazbot.org>
Ankit Agrawal [Thu, 27 Nov 2025 17:06:31 +0000 (17:06 +0000)]
vfio/nvgrace-gpu: Inform devmem unmapped after reset
Introduce a new flag, reset_done, to indicate that the GPU has just
been reset and the mapping to the GPU memory has been zapped.
Implement the reset_done handler to set this new variable. It
will be used later in the series to wait for the GPU memory
to be ready before doing any mapping or access.
Cc: Jason Gunthorpe <jgg@ziepe.ca> Reviewed-by: Shameer Kolothum <skolothumtho@nvidia.com> Suggested-by: Alex Williamson <alex@shazbot.org> Signed-off-by: Ankit Agrawal <ankita@nvidia.com> Link: https://lore.kernel.org/r/20251127170632.3477-6-ankita@nvidia.com Signed-off-by: Alex Williamson <alex@shazbot.org>
Ankit Agrawal [Thu, 27 Nov 2025 17:06:30 +0000 (17:06 +0000)]
vfio/nvgrace-gpu: split the code to wait for GPU ready
Split the function that checks for the GPU device being ready at
probe time.
Move the code that waits for the GPU to be ready through BAR0 register
reads into a separate function so that it can be reused.
This also fixes a bug where the return status in case of timeout
gets overridden by the return from pci_enable_device(). With the fix,
a timeout generates an error as initially intended.
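The bug pattern, roughly (the helper name is hypothetical):

    ret = nvgrace_gpu_wait_device_ready(nvdev);  /* may return a timeout error */
    ret = pci_enable_device(pdev);               /* before: clobbered that error */

    ret = nvgrace_gpu_wait_device_ready(nvdev);  /* after: timeout is propagated */
    if (ret)
        return ret;
    ret = pci_enable_device(pdev);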
Fixes: d85f69d520e6 ("vfio/nvgrace-gpu: Check the HBM training and C2C link status") Reviewed-by: Zhi Wang <zhiw@nvidia.com> Reviewed-by: Shameer Kolothum <skolothumtho@nvidia.com> Signed-off-by: Ankit Agrawal <ankita@nvidia.com> Link: https://lore.kernel.org/r/20251127170632.3477-5-ankita@nvidia.com Signed-off-by: Alex Williamson <alex@shazbot.org>
Ankit Agrawal [Thu, 27 Nov 2025 17:06:28 +0000 (17:06 +0000)]
vfio/nvgrace-gpu: Add support for huge pfnmap
NVIDIA's Grace based systems have large device memory. The device
memory is mapped as VM_PFNMAP in the VMM VMA. The nvgrace-gpu
module could make use of the huge PFNMAP support added in mm [1].
To make use of the huge pfnmap support, a fault/huge_fault ops based
mapping mechanism needs to be implemented. Currently the nvgrace-gpu
module relies on remap_pfn_range() to do the mapping during VM bootup.
Replace it with fault handling that uses vfio_pci_vmf_insert_pfn
to set up the mapping.
Moreover, to enable huge pfnmap, the nvgrace-gpu module gains a
huge_fault ops implementation that establishes the mapping according
to the requested order. Note that if the PFN or the VMA address is
unaligned to the order, the mapping falls back to the PTE level.
Alex Mastro [Wed, 26 Nov 2025 01:11:18 +0000 (17:11 -0800)]
dma-buf: fix integer overflow in fill_sg_entry() for buffers >= 8GiB
fill_sg_entry() splits large DMA buffers into multiple scatter-gather
entries, each holding up to UINT_MAX bytes. When calculating the DMA
address for entries beyond the second one, the expression (i * UINT_MAX)
causes integer overflow due to 32-bit arithmetic.
This manifests when the input length is >= 8 GiB, which results in loop
iterations with i >= 2.
Fix by casting i to dma_addr_t before multiplication.
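In other words (field and variable names assumed from the description):

    /* before: 'i' is a 32-bit unsigned int, so i * UINT_MAX wraps for i >= 2 */
    sgl->dma_address = addr + i * UINT_MAX;
    /* after: promote to 64-bit dma_addr_t before the multiplication */
    sgl->dma_address = addr + (dma_addr_t)i * UINT_MAX;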
Fixes: 3aa31a8bb11e ("dma-buf: provide phys_vec to scatter-gather mapping routine") Signed-off-by: Alex Mastro <amastro@fb.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Leon Romanovsky <leon@kernel.org> Link: https://lore.kernel.org/r/20251125-dma-buf-overflow-v1-1-b70ea1e6c4ba@fb.com Signed-off-by: Alex Williamson <alex@shazbot.org>
Remove the pci_bus_sem->igate leg of the triangle by using RCU
to peek at the eventfd rather than locking it with igate.
Fixes: 3be3a074cf5b ("vfio-pci: Don't use device_lock around AER interrupt setup") Signed-off-by: Alex Williamson <alex.williamson@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20251124223623.2770706-1-alex@shazbot.org Signed-off-by: Alex Williamson <alex@shazbot.org>
Jason Gunthorpe [Thu, 20 Nov 2025 09:28:30 +0000 (11:28 +0200)]
vfio/nvgrace: Support get_dmabuf_phys
Call vfio_pci_core_fill_phys_vec() with the proper physical ranges for the
synthetic BAR 2 and BAR 4 regions. Otherwise use the normal flow based on
the PCI BAR.
This demonstrates a DMABUF that follows the region info report to only
allow mapping parts of the region that are mmapable. Since the BAR is
power-of-two sized and the "CXL" region is just page aligned, there can
be a padding region at the end that is not mmapped or passed into the
DMABUF.
The "CXL" ranges that are remapped into BAR 2 and BAR 4 areas are not PCI
MMIO, they actually run over the CXL-like coherent interconnect and for
the purposes of DMA behave identically to DRAM. We don't try to model this
distinction between true PCI BAR memory that takes a real PCI path and the
"CXL" memory that takes a different path in the p2p framework for now.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Alex Mastro <amastro@fb.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Ankit Agrawal <ankita@nvidia.com> Reviewed-by: Ankit Agrawal <ankita@nvidia.com> Link: https://lore.kernel.org/r/20251120-dmabuf-vfio-v9-11-d7f71607f371@nvidia.com Signed-off-by: Alex Williamson <alex@shazbot.org>
Leon Romanovsky [Thu, 20 Nov 2025 09:28:29 +0000 (11:28 +0200)]
vfio/pci: Add dma-buf export support for MMIO regions
Add support for exporting PCI device MMIO regions through dma-buf,
enabling safe sharing of non-struct page memory with controlled
lifetime management. This allows RDMA and other subsystems to import
dma-buf FDs and build them into memory regions for PCI P2P operations.
The implementation provides a revocable attachment mechanism using
dma-buf move operations. MMIO regions are normally pinned as BARs
don't change physical addresses, but access is revoked when the VFIO
device is closed or a PCI reset is issued. This ensures kernel
self-defense against potentially hostile userspace.
Currently VFIO can take MMIO regions from the device's BAR and map
them into a PFNMAP VMA with special PTEs. This mapping type ensures
the memory cannot be used with things like pin_user_pages(), hmm, and
so on. In practice only the user process CPU and KVM can safely make
use of these VMAs. When VFIO shuts down, these VMAs are cleaned by
unmap_mapping_range() to prevent any UAF of the MMIO beyond driver
unbind.
However, VFIO type 1 has an insecure behavior where it uses
follow_pfnmap_*() to fish a MMIO PFN out of a VMA and program it back
into the IOMMU. This has a long history of enabling P2P DMA inside
VMs, but has serious lifetime problems by allowing a UAF of the MMIO
after the VFIO driver has been unbound.
Introduce DMABUF as a new safe way to export a FD based handle for the
MMIO regions. This can be consumed by existing DMABUF importers like
RDMA or DRM without opening a UAF. A following series will add an
importer to iommufd to obsolete the type 1 code and allow safe
UAF-free MMIO P2P in VM cases.
DMABUF has a built in synchronous invalidation mechanism called
move_notify. VFIO keeps track of all drivers importing its MMIO and
can invoke a synchronous invalidation callback to tell the importing
drivers to DMA unmap and forget about the MMIO pfns. This process is
being called revoke. This synchronous invalidation fully prevents any
lifecycle problems. VFIO will do this before unbinding its driver
ensuring there is no UAF of the MMIO beyond the driver lifecycle.
Further, VFIO has additional behavior to block access to the MMIO
during things like Function Level Reset. This is because some poor
platforms may experience an MCE-type crash when touching MMIO of a PCI
device that is undergoing a reset. Today this is done by using
unmap_mapping_range() on the VMAs. Extend that into the DMABUF world
and temporarily revoke the MMIO from the DMABUF importers during FLR
as well. This will more robustly prevent an errant P2P from possibly
upsetting the platform.
A DMABUF FD is a preferred handle for MMIO compared to using something
like a pgmap because:
- VFIO is supported, including its P2P feature, on archs that don't
support pgmap
- PCI devices have all sorts of BAR sizes, including ones smaller
than a section so a pgmap cannot always be created
- It is undesirable to waste a lot of memory for struct pages,
especially for a case like a GPU with ~100GB of BAR size
- We want a synchronous revoke semantic to support FLR with light
hardware requirements
Use the P2P subsystem to help generate the DMA mapping. This is a
significant upgrade over the abuse of dma_map_resource() that has
historically been used by DMABUF exporters. Experience with an OOT
version of this patch shows that real systems do need this. This
approach deals with all the P2P scenarios:
- Non-zero PCI bus_offset
- ACS flags routing traffic to the IOMMU
- ACS flags that bypass the IOMMU - though vfio noiommu is required
to hit this.
There will be further work to formalize the revoke semantic in
DMABUF. For now this acts like a move_notify dynamic exporter where
importer fault handling will get a failure when they attempt to map.
This means that only fully restartable fault capable importers can
import the VFIO DMABUFs. A future revoke semantic should open this up
to more HW as the HW only needs to invalidate, not handle restartable
faults.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Ankit Agrawal <ankita@nvidia.com> Link: https://lore.kernel.org/r/20251120-dmabuf-vfio-v9-10-d7f71607f371@nvidia.com Signed-off-by: Alex Williamson <alex@shazbot.org>
Leon Romanovsky [Thu, 20 Nov 2025 09:28:28 +0000 (11:28 +0200)]
vfio/pci: Enable peer-to-peer DMA transactions by default
Make sure that all VFIO PCI devices have peer-to-peer capabilities
enabled, so that their MMIO memory can be exported through DMABUF.
VFIO has always supported P2P mappings with itself. VFIO type 1
insecurely reads PFNs directly out of a VMA's PTEs and programs them
into the IOMMU allowing any two VFIO devices to perform P2P to each
other.
All existing VMMs use this capability to export P2P into a VM where
the VM could set up any kind of DMA it likes. Projects like DPDK/SPDK
are also known to make use of this, though less frequently.
As a first step to more properly integrating VFIO with the P2P
subsystem unconditionally enable P2P support for VFIO PCI devices. The
struct p2pdma_provider will act as a handle to the P2P subsystem to
do things like DMA mapping.
While real PCI devices have to support P2P (they can't even tell if an
IOVA is P2P or not) there may be fake PCI devices that may trigger
some kind of catastrophic system failure. To date VFIO has never
tripped up on such a case, but if one is discovered the plan is to add
a PCI quirk and have pcim_p2pdma_init() fail. This will fully block
the broken device throughout any users of the P2P subsystem in the
kernel.
Thus P2P through DMABUF will follow the historical VFIO model and be
unconditionally enabled by vfio-pci.
Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Alex Mastro <amastro@fb.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Ankit Agrawal <ankita@nvidia.com> Link: https://lore.kernel.org/r/20251120-dmabuf-vfio-v9-9-d7f71607f371@nvidia.com Signed-off-by: Alex Williamson <alex@shazbot.org>
Vivek Kasireddy [Thu, 20 Nov 2025 09:28:27 +0000 (11:28 +0200)]
vfio/pci: Share the core device pointer while invoking feature functions
There is no need to share the main device pointer (struct vfio_device *)
with all the feature functions as they only need the core device
pointer. Therefore, extract the core device pointer once in the
caller (vfio_pci_core_ioctl_feature) and share it instead.
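A hedged sketch of the extraction, assuming the usual container_of()
pattern used by vfio-pci core:

    struct vfio_pci_core_device *vdev =
        container_of(device, struct vfio_pci_core_device, vdev);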
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Alex Mastro <amastro@fb.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Ankit Agrawal <ankita@nvidia.com> Link: https://lore.kernel.org/r/20251120-dmabuf-vfio-v9-8-d7f71607f371@nvidia.com Signed-off-by: Alex Williamson <alex@shazbot.org>
Leon Romanovsky [Thu, 20 Nov 2025 09:28:25 +0000 (11:28 +0200)]
dma-buf: provide phys_vec to scatter-gather mapping routine
Add dma_buf_phys_vec_to_sgt() and dma_buf_free_sgt() helpers to convert
an array of MMIO physical address ranges into scatter-gather tables with
proper DMA mapping.
These common functions are a starting point and support any PCI
drivers creating mappings from their BAR's MMIO addresses. VFIO is one
case, as shortly will be RDMA. We can review existing DRM drivers to
refactor them separately. We hope this will evolve into routines to
help common DRM code that includes mixed CPU and MMIO mappings.
Compared to the dma_map_resource() abuse this implementation handles
the complicated PCI P2P scenarios properly, especially when an IOMMU
is enabled:
- Direct bus address mapping without IOVA allocation for
PCI_P2PDMA_MAP_BUS_ADDR, using pci_p2pdma_bus_addr_map(). This
happens if the IOMMU is enabled but the PCIe switch ACS flags allow
transactions to avoid the host bridge.
Further, this handles the slightly obscure case of MMIO with a
phys_addr_t that is different from the physical BAR programming
(bus offset). The phys_addr_t is converted to a dma_addr_t and
accommodates this effect. This enables certain real systems to
work, especially on ARM platforms.
- Mapping through host bridge with IOVA allocation and DMA_ATTR_MMIO
attribute for MMIO memory regions (PCI_P2PDMA_MAP_THRU_HOST_BRIDGE).
This happens when the IOMMU is enabled and the ACS flags are forcing
all traffic to the IOMMU - ie for virtualization systems.
- Cases where P2P is not supported through the host bridge/CPU. The
P2P subsystem is the proper place to detect this and block it.
Helper functions fill_sg_entry() and calc_sg_nents() handle the
scatter-gather table construction, splitting large regions into
UINT_MAX-sized chunks to fit within sg->length field limits.
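A hedged sketch of that chunking (not the exact helper bodies; variable
names are assumptions):

    u64 remaining = len;
    for (unsigned int i = 0; remaining; i++, sgl = sg_next(sgl)) {
        u32 chunk = min_t(u64, remaining, UINT_MAX);

        /* each chunk's DMA address is offset by whole UINT_MAX strides */
        sgl->dma_address = addr + (dma_addr_t)i * UINT_MAX;
        sg_dma_len(sgl) = chunk;
        remaining -= chunk;
    }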
Since the physical address based DMA API forbids use of the CPU list
of the scatterlist, this produces a mangled scatterlist with a fully
zero-length, NULL'd CPU list: all the struct page pointers are NULL
and every entry is zero sized. This is stronger
and more robust than the existing mangle_sg_table() technique. It is
a future project to migrate DMABUF as a subsystem away from using
scatterlist for this data structure.
Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Alex Mastro <amastro@fb.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Ankit Agrawal <ankita@nvidia.com> Link: https://lore.kernel.org/r/20251120-dmabuf-vfio-v9-6-d7f71607f371@nvidia.com Signed-off-by: Alex Williamson <alex@shazbot.org>
Leon Romanovsky [Thu, 20 Nov 2025 09:28:23 +0000 (11:28 +0200)]
PCI/P2PDMA: Provide an access to pci_p2pdma_map_type() function
Provide an access to pci_p2pdma_map_type() function to allow subsystems
to determine the appropriate mapping type for P2PDMA transfers between
a provider and target device.
The pci_p2pdma_map_type() function is the core P2P layer version of
the existing public, but struct page focused, pci_p2pdma_state()
function. It returns the same result. It is required to use the p2p
subsystem from drivers that don't use the struct page layer.
Like __pci_p2pdma_update_state() it is not an exported function. The
idea is that only subsystem code will implement mapping helpers for
taking in phys_addr_t lists, this is deliberately not made accessible
to every driver to prevent abuse.
Following patches will use this function to implement a shared DMA
mapping helper for DMABUF.
Tested-by: Alex Mastro <amastro@fb.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Ankit Agrawal <ankita@nvidia.com> Link: https://lore.kernel.org/r/20251120-dmabuf-vfio-v9-4-d7f71607f371@nvidia.com Signed-off-by: Alex Williamson <alex@shazbot.org>
Leon Romanovsky [Thu, 20 Nov 2025 09:28:22 +0000 (11:28 +0200)]
PCI/P2PDMA: Refactor to separate core P2P functionality from memory allocation
Refactor the PCI P2PDMA subsystem to separate the core peer-to-peer DMA
functionality from the optional memory allocation layer. This creates a
two-tier architecture:
The core layer provides P2P mapping functionality for physical addresses
based on PCI device MMIO BARs and integrates with the DMA API for
mapping operations. This layer is required for all P2PDMA users.
The optional upper layer provides memory allocation capabilities
including gen_pool allocator, struct page support, and sysfs interface
for user space access.
This separation allows subsystems like DMABUF to use only the core P2P
mapping functionality without the overhead of memory allocation features
they don't need. The core functionality is now available through the
new pcim_p2pdma_provider() function that returns a p2pdma_provider
structure.
Tested-by: Alex Mastro <amastro@fb.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Ankit Agrawal <ankita@nvidia.com> Link: https://lore.kernel.org/r/20251120-dmabuf-vfio-v9-3-d7f71607f371@nvidia.com Signed-off-by: Alex Williamson <alex@shazbot.org>
Leon Romanovsky [Thu, 20 Nov 2025 09:28:21 +0000 (11:28 +0200)]
PCI/P2PDMA: Simplify bus address mapping API
Update the pci_p2pdma_bus_addr_map() function to take a direct pointer
to the p2pdma_provider structure instead of the pci_p2pdma_map_state.
This simplifies the API by removing the need for callers to extract
the provider from the state structure.
The change updates all callers across the kernel (block layer, IOMMU,
DMA direct, and HMM) to pass the provider pointer directly, making
the code more explicit and reducing unnecessary indirection. This
also removes the runtime warning check since callers now have direct
control over which provider they use.
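The call-site change is mechanical, roughly:

    /* before: the provider was extracted from the map state */
    dma_addr = pci_p2pdma_bus_addr_map(&state, phys);
    /* after: pass the provider directly */
    dma_addr = pci_p2pdma_bus_addr_map(provider, phys);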
Tested-by: Alex Mastro <amastro@fb.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Ankit Agrawal <ankita@nvidia.com> Link: https://lore.kernel.org/r/20251120-dmabuf-vfio-v9-2-d7f71607f371@nvidia.com Signed-off-by: Alex Williamson <alex@shazbot.org>
Leon Romanovsky [Thu, 20 Nov 2025 09:28:20 +0000 (11:28 +0200)]
PCI/P2PDMA: Separate the mmap() support from the core logic
Currently the P2PDMA code requires a pgmap and a struct page to
function. These were serving three important purposes:
- DMA API compatibility, where scatterlist required a struct page as
input
- Life cycle management, the percpu_ref is used to prevent UAF during
device hot unplug
- A way to get the P2P provider data through the pci_p2pdma_pagemap
The DMA API now has a new flow, and has gained phys_addr_t support, so
it no longer needs struct pages to perform P2P mapping.
Lifecycle management can be delegated to the user, DMABUF for instance
has a suitable invalidation protocol that does not require struct page.
Finding the P2P provider data can also be managed by the caller
without need to look it up from the phys_addr.
Split the P2PDMA code into two layers. The optional upper layer,
effectively, provides a way to mmap() P2P memory into a VMA by
providing struct page, pgmap, a genalloc and sysfs.
The lower layer provides the actual P2P infrastructure and is wrapped
up in a new struct p2pdma_provider. Rework the mmap layer to use new
p2pdma_provider based APIs.
Drivers that do not want to put P2P memory into VMAs can allocate a
struct p2pdma_provider after probe() starts and free it before
remove() completes. When DMA mapping the driver must convey the struct
p2pdma_provider to the DMA mapping code along with a phys_addr of the
MMIO BAR slice to map. The driver must ensure that no DMA mapping
outlives the lifetime of the struct p2pdma_provider.
The intended target of this new API layer is DMABUF. There is usually
only a single p2pdma_provider for a DMABUF exporter. Most drivers can
establish the p2pdma_provider during probe, access the single instance
during DMABUF attach and use that to drive the DMA mapping.
DMABUF provides an invalidation mechanism that can guarantee all DMA
is halted and the DMA mappings are undone prior to destroying the
struct p2pdma_provider. This ensures there is no UAF through DMABUFs
that are lingering past driver removal.
The new p2pdma_provider layer cannot be used to create P2P memory that
can be mapped into VMAs, be used with pin_user_pages(), O_DIRECT, and
so on. These use cases must still use the mmap() layer. The
p2pdma_provider layer is principally for DMABUF-like use cases where
DMABUF natively manages the life cycle and access instead of
vmas/pin_user_pages()/struct page.
In addition, remove the bus_off field from pci_p2pdma_map_state since
it duplicates information already available in the pgmap structure.
The bus_offset is only used in one location (pci_p2pdma_bus_addr_map)
and is always identical to pgmap->bus_offset.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Alex Mastro <amastro@fb.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Ankit Agrawal <ankita@nvidia.com> Link: https://lore.kernel.org/r/20251120-dmabuf-vfio-v9-1-d7f71607f371@nvidia.com Signed-off-by: Alex Williamson <alex@shazbot.org>
Jason Gunthorpe [Fri, 7 Nov 2025 17:41:17 +0000 (13:41 -0400)]
vfio: Provide a get_region_info op
Instead of hooking the general ioctl op, have the core code directly
decode VFIO_DEVICE_GET_REGION_INFO and call an op just for it.
This is intended to allow mechanical changes to the drivers to pull their
VFIO_DEVICE_GET_REGION_INFO into a function. Later patches will improve
the function signature to consolidate more code.
Alex Mastro [Tue, 11 Nov 2025 18:48:27 +0000 (10:48 -0800)]
vfio: selftests: replace iova=vaddr with allocated iovas
vfio_dma_mapping_test and vfio_pci_driver_test currently use iova=vaddr
as part of DMA mapping operations. However, not all IOMMUs support the
same virtual address width as the processor. For instance, older Intel
consumer platforms only support 39-bits of IOMMU address space. On such
platforms, using the virtual address as the IOVA fails.
Make the tests more robust by using iova_allocator to vend IOVAs, which
queries legally accessible IOVAs from the underlying IOMMUFD or VFIO
container.
Alex Mastro [Tue, 11 Nov 2025 18:48:26 +0000 (10:48 -0800)]
vfio: selftests: add iova allocator
Add struct iova_allocator, which gives tests a convenient way to generate
legally-accessible IOVAs to map. This allocator traverses the sorted
available IOVA ranges linearly, requires power-of-two size allocations,
and does not support freeing iova allocations. The assumption is that
tests are not IOVA space-bounded, and will not need to recycle IOVAs.
This is based on Alex Williamson's patch series for adding an IOVA
allocator [1].
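A minimal sketch of such an allocator (struct layout and names are
assumptions; the real implementation lives in the selftest library):

    struct iova_allocator {
        struct iommu_iova_range *ranges;  /* sorted available ranges */
        u32 nranges;
        u32 range_idx;
        u64 next;                         /* next candidate IOVA in current range */
    };

    /* power-of-two sizes only; allocations are never freed */
    static u64 iova_allocator_alloc(struct iova_allocator *a, u64 size)
    {
        for (; a->range_idx < a->nranges; a->range_idx++, a->next = 0) {
            struct iommu_iova_range *r = &a->ranges[a->range_idx];
            u64 iova = ALIGN(max(a->next, r->start), size);

            if (iova + size - 1 <= r->last) {
                a->next = iova + size;
                return iova;
            }
        }
        VFIO_FAIL("out of IOVA space");   /* hypothetical assert helper */
        return 0;
    }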
Alex Mastro [Tue, 11 Nov 2025 18:48:25 +0000 (10:48 -0800)]
vfio: selftests: fix map limit tests to use last available iova
Use the newly available vfio_pci_iova_ranges() to determine the last
legal IOVA, and use this as the basis for vfio_dma_map_limit_test tests.
Fixes: de8d1f2fd5a5 ("vfio: selftests: add end of address space DMA map/unmap tests") Reviewed-by: David Matlack <dmatlack@google.com> Tested-by: David Matlack <dmatlack@google.com> Signed-off-by: Alex Mastro <amastro@fb.com> Link: https://lore.kernel.org/r/20251111-iova-ranges-v3-2-7960244642c5@fb.com Signed-off-by: Alex Williamson <alex@shazbot.org>
Alex Mastro [Tue, 11 Nov 2025 18:48:24 +0000 (10:48 -0800)]
vfio: selftests: add iova range query helpers
VFIO selftests need to map IOVAs from legally accessible ranges, which
could vary between hardware. Tests in vfio_dma_mapping_test.c are making
excessively strong assumptions about which IOVAs can be mapped.
Add vfio_iommu_iova_ranges(), which queries IOVA ranges from the
IOMMUFD or VFIO container associated with the device. The queried ranges
are normalized to IOMMUFD's iommu_iova_range representation so that
handling of IOVA ranges up the stack can be implementation-agnostic.
iommu_iova_range and vfio_iova_range are equivalent, so bias to using the
new interface's struct.
Query IOMMUFD's ranges with IOMMU_IOAS_IOVA_RANGES.
Query VFIO container's ranges with VFIO_IOMMU_GET_INFO and
VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE.
The underlying vfio_iommu_type1_info buffer-related functionality has
been kept generic so the same helpers can be used to query other
capability chain information, if needed.
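A hedged usage sketch (the prototype is an assumption based on the
description above):

    struct iommu_iova_range *ranges;
    u32 nranges = vfio_iommu_iova_ranges(device, &ranges);
    u64 last_iova = ranges[nranges - 1].last;  /* e.g. basis for map-limit tests */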
hisi_acc_vfio_pci: Add .match_token_uuid callback in hisi_acc_vfio_pci_migrn_ops
Commit 86624ba3b522 ("vfio/pci: Do vf_token checks for
VFIO_DEVICE_BIND_IOMMUFD") accidentally omitted the .match_token_uuid
callback from the hisi_acc_vfio_pci_migrn_ops struct.
Introduce the missed callback here.
Fixes: 86624ba3b522 ("vfio/pci: Do vf_token checks for VFIO_DEVICE_BIND_IOMMUFD") Cc: stable@vger.kernel.org Suggested-by: Longfang Liu <liulongfang@huawei.com> Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Reviewed-by: Longfang Liu <liulongfang@huawei.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20251031170603.2260022-3-rananta@google.com Signed-off-by: Alex Williamson <alex@shazbot.org>
vfio: Fix ksize arg while copying user struct in vfio_df_ioctl_bind_iommufd()
For cases where the user includes a non-zero value in the 'token_uuid_ptr'
field of 'struct vfio_device_bind_iommufd', the copy_struct_from_user()
in vfio_df_ioctl_bind_iommufd() fails with -E2BIG. For the 'minsz' passed,
copy_struct_from_user() expects the newly introduced field to be zeroed,
which would be incorrect in this case.
Fix this by passing the actual size of the kernel struct. If working
with a newer userspace, copy_struct_from_user() would copy the
'token_uuid_ptr' field, and if working with an old userspace, it would
zero out this field, thus still retaining backward compatibility.
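In effect (variable names assumed):

    /* before: bytes past minsz must be zero, so a set token_uuid_ptr -> -E2BIG */
    ret = copy_struct_from_user(&bind, minsz, arg, usize);
    /* after: pass the full kernel struct size; older userspace still works
     * because copy_struct_from_user() zero-fills the tail */
    ret = copy_struct_from_user(&bind, sizeof(bind), arg, usize);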
Fixes: 86624ba3b522 ("vfio/pci: Do vf_token checks for VFIO_DEVICE_BIND_IOMMUFD") Cc: stable@vger.kernel.org Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20251031170603.2260022-2-rananta@google.com Signed-off-by: Alex Williamson <alex@shazbot.org>
Longfang Liu [Thu, 30 Oct 2025 01:57:44 +0000 (09:57 +0800)]
hisi_acc_vfio_pci: adapt to new migration configuration
On new platforms greater than QM_HW_V3, the migration region has been
relocated from the VF to the PF. The VF's own configuration space is
restored to the complete 64KB, and there is no need to divide the
size of the BAR configuration space equally. The driver should be
modified accordingly to adapt to the new hardware device.
On the older hardware platform QM_HW_V3, the live migration configuration
region is placed in the latter 32K portion of the VF's BAR2 configuration
space. On the new hardware platform QM_HW_V4, the live migration
configuration region also exists in the same 32K area immediately following
the VF's BAR2, just like on QM_HW_V3.
However, access to this region is now controlled by hardware. Additionally,
a copy of the live migration configuration region is present in the PF's
BAR2 configuration space. On the new hardware platform QM_HW_V4, when an
older version of the driver is loaded, it behaves like QM_HW_V3 and uses
the configuration region in the VF, ensuring that the live migration
function continues to work normally. When the new version of the driver is
loaded, it directly uses the configuration region in the PF. Meanwhile,
hardware configuration disables the live migration configuration region
in the VF's BAR2: reads return all 0xF values, and writes are silently
ignored.
Longfang Liu [Thu, 30 Oct 2025 01:57:43 +0000 (09:57 +0800)]
crypto: hisilicon - qm updates BAR configuration
On new platforms greater than QM_HW_V3, the configuration region for the
live migration function of the accelerator device is no longer
placed in the VF, but is instead placed in the PF.
Therefore, the configuration region of the live migration function
needs to be opened when the QM driver is loaded. When the QM driver
is uninstalled, the driver needs to clear this configuration.
Signed-off-by: Longfang Liu <liulongfang@huawei.com> Reviewed-by: Shameer Kolothum <shameerkolothum@gmail.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Link: https://lore.kernel.org/r/20251030015744.131771-2-liulongfang@huawei.com Signed-off-by: Alex Williamson <alex@shazbot.org>
David Matlack [Mon, 22 Sep 2025 22:48:57 +0000 (22:48 +0000)]
vfio: selftests: Store libvfio build outputs in $(OUTPUT)/libvfio
Store the tools/testing/selftests/vfio/lib outputs (e.g. object files)
in $(OUTPUT)/libvfio rather than in $(OUTPUT)/lib. This is in
preparation for building the VFIO selftests library into the KVM
selftests (see Link below).
Specifically this will avoid name conflicts between
tools/testing/selftests/{vfio,kvm}/lib and also avoid leaving behind
empty directories under tools/testing/selftests/kvm after a make clean.
Linus Torvalds [Sat, 1 Nov 2025 17:49:12 +0000 (10:49 -0700)]
Merge tag 'regulator-fix-v6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fix from Mark Brown:
"A simple fix for a missed part of an API conversion in the bd718x7
driver"
* tag 'regulator-fix-v6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: bd718x7: Fix voltages scaled by resistor divider
Linus Torvalds [Sat, 1 Nov 2025 17:45:39 +0000 (10:45 -0700)]
Merge tag 'regmap-fix-v6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap fixes from Mark Brown:
"One documentation fix and a fix for a problem with the slimbus regmap
which was uncovered by some changes in one of the drivers"
* tag 'regmap-fix-v6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: irq: Correct documentation of wake_invert flag
regmap: slimbus: fix bus_context pointer in regmap init calls
Linus Torvalds [Sat, 1 Nov 2025 17:20:07 +0000 (10:20 -0700)]
Merge tag 'x86-urgent-2025-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 fixes from Ingo Molnar:
- Limit AMD microcode Entrysign sha256 signature checking to
known CPU generations
- Disable AMD RDSEED32 on certain Zen5 CPUs whose microcode
predates the microcode-based fix issued for the AMD-SB-7055
erratum
- Fix FPU AMD XFD state synchronization on signal delivery
- Fix (work around) an SSE4a-disassembly related build failure
on X86_NATIVE_CPU=y builds
- Extend the AMD Zen6 model space with a new range of models
- Fix <asm/intel-family.h> CPU model comments
- Fix the CONFIG_CFI=y and CONFIG_LTO_CLANG_FULL=y build, which
was unhappy due to missing kCFI type annotations of clear_page()
variants
* tag 'x86-urgent-2025-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Ensure clear_page() variants always have __kcfi_typeid_ symbols
x86/cpu: Add/fix core comments for {Panther,Nova} Lake
x86/CPU/AMD: Extend Zen6 model range
x86/build: Disable SSE4a
x86/fpu: Ensure XFD state on signal delivery
x86/CPU/AMD: Add RDSEED fix for Zen5
x86/microcode/AMD: Limit Entrysign signature checking to known generations
Linus Torvalds [Sat, 1 Nov 2025 17:17:40 +0000 (10:17 -0700)]
Merge tag 'perf-urgent-2025-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf event fixes from Ingo Molnar:
"Miscellaneous fixes and CPU model updates:
- Fix an out-of-bounds access on non-hybrid platforms in the Intel
PMU DS code, reported by KASAN
- Add WildcatLake PMU and uncore support: it's identical to the
PantherLake version"
* tag 'perf-urgent-2025-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/uncore: Add uncore PMU support for Wildcat Lake
perf/x86/intel: Add PMU support for WildcatLake
perf/x86/intel: Fix KASAN global-out-of-bounds warning
Linus Torvalds [Sat, 1 Nov 2025 17:07:35 +0000 (10:07 -0700)]
Merge tag 'objtool-urgent-2025-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool fix from Ingo Molnar:
"Fix objtool warning when faced with raw STAC/CLAC instructions"
* tag 'objtool-urgent-2025-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Fix skip_alt_group() for non-alternative STAC/CLAC
Linus Torvalds [Sat, 1 Nov 2025 17:04:35 +0000 (10:04 -0700)]
Merge tag 'xfs-fixes-6.18-rc4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Carlos Maiolino:
"Just a single bug fix (and documentation for the issue)"
* tag 'xfs-fixes-6.18-rc4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: document another racy GC case in xfs_zoned_map_extent
xfs: prevent gc from picking the same zone twice
Linus Torvalds [Sat, 1 Nov 2025 17:00:53 +0000 (10:00 -0700)]
Merge tag 'kbuild-fixes-6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux
Pull Kbuild fixes from Nathan Chancellor:
- Formally adopt Kconfig in MAINTAINERS
- Fix install-extmod-build for more O= paths
- Align end of .modinfo to fix Authenticode calculation in EDK2
- Restore dynamic check for '-fsanitize=kernel-memory' in
CONFIG_HAVE_KMSAN_COMPILER to ensure backend target has support
for it
- Initialize locale in menuconfig and nconfig to fix UTF-8 terminals
that may not support VT100 ACS by default like PuTTY
* tag 'kbuild-fixes-6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux:
kconfig/nconf: Initialize the default locale at startup
kconfig/mconf: Initialize the default locale at startup
KMSAN: Restore dynamic check for '-fsanitize=kernel-memory'
kbuild: align modinfo section for Secureboot Authenticode EDK2 compat
kbuild: install-extmod-build: Fix when given dir outside the build dir
MAINTAINERS: Update Kconfig section
Josh Poimboeuf [Wed, 29 Oct 2025 19:54:08 +0000 (12:54 -0700)]
objtool: Fix skip_alt_group() for non-alternative STAC/CLAC
If an insn->alt points to a STAC/CLAC instruction, skip_alt_group()
assumes it's part of an alternative ("alt group") as opposed to some
other kind of "alt" such as an exception fixup.
While that assumption may hold true in the current code base, Linus has
an out-of-tree patch which breaks that assumption by replacing the
STAC/CLAC alternatives with raw STAC/CLAC instructions.
Make skip_alt_group() more robust by making sure it's actually an alt
group before continuing.
Jakub Horký [Tue, 14 Oct 2025 14:44:06 +0000 (16:44 +0200)]
kconfig/nconf: Initialize the default locale at startup
Fix bug where make nconfig doesn't initialize the default locale, which
causes ncurses menu borders to be displayed incorrectly (lqqqqk) in
UTF-8 terminals that don't support VT100 ACS by default, such as PuTTY.
Jakub Horký [Tue, 14 Oct 2025 15:49:32 +0000 (17:49 +0200)]
kconfig/mconf: Initialize the default locale at startup
Fix bug where make menuconfig doesn't initialize the default locale, which
causes ncurses menu borders to be displayed incorrectly (lqqqqk) in
UTF-8 terminals that don't support VT100 ACS by default, such as PuTTY.
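The usual fix for this class of ncurses bug is to honor the environment's
locale before initializing the screen, along these lines (a sketch of the
standard setlocale() idiom, not necessarily the exact patch):

    #include <locale.h>

    setlocale(LC_ALL, "");  /* pick up the UTF-8 locale before initscr() */
    initscr();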
- Reject negative head_room in __bpf_skb_change_head (Daniel Borkmann)
- Conditionally include dynptr copy kfuncs (Malin Jonsson)
- Sync pending IRQ work before freeing BPF ring buffer (Noorain Eqbal)
- Do not audit capability check in x86 do_jit() (Ondrej Mosnacek)
- Fix arm64 JIT of BPF_ST insn when it writes into arena memory
(Puranjay Mohan)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf/arm64: Fix BPF_ST into arena memory
bpf: Make migrate_disable always inline to avoid partial inlining
bpf: Reject negative head_room in __bpf_skb_change_head
bpf: Conditionally include dynptr copy kfuncs
libbpf: Fix powerpc's stack register definition in bpf_tracing.h
bpf: Do not audit capability check in do_jit()
bpf: Sync pending IRQ work before freeing ring buffer
x86/mm: Ensure clear_page() variants always have __kcfi_typeid_ symbols
When building with CONFIG_CFI=y and CONFIG_LTO_CLANG_FULL=y, there is a series
of errors from the various versions of clear_page() not having __kcfi_typeid_
symbols.
$ cat kernel/configs/repro.config
CONFIG_CFI=y
# CONFIG_LTO_NONE is not set
CONFIG_LTO_CLANG_FULL=y
$ make -skj"$(nproc)" ARCH=x86_64 LLVM=1 clean defconfig repro.config bzImage
ld.lld: error: undefined symbol: __kcfi_typeid_clear_page_rep
>>> referenced by ld-temp.o
>>> vmlinux.o:(__cfi_clear_page_rep)
With full LTO, it is possible for LLVM to realize that these functions never
have their address taken (as they are only used within an alternative, which
will make them a direct call) across the whole kernel and either drop or skip
generating their kCFI type identification symbols.
clear_page_{rep,orig,erms}() are defined in clear_page_64.S with
SYM_TYPED_FUNC_START as a result of
2981557cb040 ("x86,kcfi: Fix EXPORT_SYMBOL vs kCFI"),
as exported functions are free to be called indirectly thus need kCFI type
identifiers.
Use KCFI_REFERENCE with these clear_page() functions to force LLVM to see
them as address-taken, and to generate and keep the kCFI type identifiers.
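i.e. roughly:

    /* force LLVM to treat these as address-taken so the kCFI type IDs survive LTO */
    KCFI_REFERENCE(clear_page_rep);
    KCFI_REFERENCE(clear_page_orig);
    KCFI_REFERENCE(clear_page_erms);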
Linus Torvalds [Fri, 31 Oct 2025 21:47:02 +0000 (14:47 -0700)]
Merge tag 'drm-fixes-2025-10-31' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Simona Vetter:
"Looks like stochastics conspired to make this one a bit bigger, but
nothing scary at all. Also first examples of the new Link: tags, yay!"
* tag 'drm-fixes-2025-10-31' of https://gitlab.freedesktop.org/drm/kernel: (44 commits)
drm/ast: Clear preserved bits from register output value
drm/imx: parallel-display: add the bridge before attaching it
drm/imx: parallel-display: convert to devm_drm_bridge_alloc() API
drm/panel: kingdisplay-kd097d04: Disable EoTp
drm/panel: sitronix-st7789v: fix sync flags for t28cp45tn89
drm/xe: Do not wake device during a GT reset
drm/xe: Fix uninitialized return value from xe_validation_guard()
drm/msm/dpu: Fix adjusted mode clock check for 3d merge
drm/msm/dpu: Disable broken YUV on QSEED2 hardware
drm/msm/dpu: Require linear modifier for writeback framebuffers
drm/msm/dpu: Fix pixel extension sub-sampling
drm/msm/dpu: Disable scaling for unsupported scaler types
drm/msm/dpu: Propagate error from dpu_assign_plane_resources
drm/msm/dpu: Fix allocation of RGB SSPPs without scaling
drm/msm: dsi: fix PLL init in bonded mode
drm/i915/dmc: Clear HRR EVT_CTL/HTP to zero on ADL-S
drm/amd/display: Fix incorrect return of vblank enable on unconfigured crtc
drm/amd/display: Add HDR workaround for a specific eDP
drm/amdgpu: fix SPDX header on cyan_skillfish_reg_init.c
drm/amdgpu: fix SPDX header on irqsrcs_vcn_5_0.h
...
Linus Torvalds [Fri, 31 Oct 2025 21:24:32 +0000 (14:24 -0700)]
Merge tag 'pci-v6.18-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci fixes from Bjorn Helgaas:
- Restore custom qcom ASPM enablement code so L1 PM Substates are
enabled as they were in v6.17 even though the PCI core now enables
just L0s and L1 by default (Bjorn Helgaas)
- Size prefetchable bridge windows only when they actually exist, to
avoid a WARN_ON() regression (Ilpo Järvinen)
* tag 'pci-v6.18-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
PCI: Do not size non-existing prefetchable window
Revert "PCI: qcom: Remove custom ASPM enablement code"
Linus Torvalds [Fri, 31 Oct 2025 21:20:09 +0000 (14:20 -0700)]
Merge tag 'vfio-v6.18-rc4' of https://github.com/awilliam/linux-vfio
Pull VFIO fixes from Alex Williamson:
- Fix overflows in vfio type1 backend for mappings at the end of the
64-bit address space, resulting in leaked pinned memory.
New selftest support included to avoid such issues in the future
(Alex Mastro)
* tag 'vfio-v6.18-rc4' of https://github.com/awilliam/linux-vfio:
vfio: selftests: add end of address space DMA map/unmap tests
vfio: selftests: update DMA map/unmap helpers to support more test kinds
vfio/type1: handle DMA map/unmap up to the addressable limit
vfio/type1: move iova increment to unmap_unpin_*() caller
vfio/type1: sanitize for overflow using check_*_overflow()
Ilpo Järvinen [Mon, 27 Oct 2025 13:24:23 +0000 (15:24 +0200)]
PCI: Do not size non-existing prefetchable window
pbus_size_mem() should only be called for bridge windows that exist, but
__pci_bus_size_bridges() may point 'pref' to a resource that does not exist
(has zero flags) in the case of non-root buses.
When the prefetchable bridge window does not exist, the same
non-prefetchable bridge window is sized more than once, which may result
in duplicate entries in the realloc_head list. Duplicated entries are shown in this
log and trigger a WARN_ON() because realloc_head had residual entries after
the resource assignment algorithm:
pci 0000:00:03.0: [11ab:6820] type 01 class 0x060400 PCIe Root Port
pci 0000:00:03.0: PCI bridge to [bus 00]
pci 0000:00:03.0: bridge window [io 0x0000-0x0fff]
pci 0000:00:03.0: bridge window [mem 0x00000000-0x000fffff]
pci 0000:00:03.0: bridge window [mem 0x00200000-0x003fffff] to [bus 02] add_size 200000 add_align 200000
pci 0000:00:03.0: bridge window [mem 0x00200000-0x003fffff] to [bus 02] add_size 200000 add_align 200000
pci 0000:00:03.0: bridge window [mem 0xe0000000-0xe03fffff]: assigned
pci 0000:00:03.0: PCI bridge to [bus 02]
pci 0000:00:03.0: bridge window [mem 0xe0000000-0xe03fffff]
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at drivers/pci/setup-bus.c:2373 pci_assign_unassigned_root_bus_resources+0x1bc/0x234
Check resource flags of 'pref' and only size the prefetchable window if the
resource has the IORESOURCE_PREFETCH flag.
Fixes: ae88d0b9c57f ("PCI: Use pbus_select_window_for_type() during mem window sizing") Reported-by: Klaus Kudielka <klaus.kudielka@gmail.com> Closes: https://lore.kernel.org/r/51e8cf1c62b8318882257d6b5a9de7fdaaecc343.camel@gmail.com/ Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Klaus Kudielka <klaus.kudielka@gmail.com> Link: https://patch.msgid.link/20251027132423.8841-1-ilpo.jarvinen@linux.intel.com
Prior to a729c1664619 ("PCI: qcom: Remove custom ASPM enablement code"),
the qcom controller driver enabled ASPM, including L0s, L1, and L1 PM
Substates, for all devices powered on at the time the controller driver
enumerates them.
ASPM was *not* enabled for devices powered on later by pwrctrl (unless the
kernel was built with PCIEASPM_POWERSAVE or PCIEASPM_POWER_SUPERSAVE, or
the user enabled ASPM via module parameter or sysfs).
After f3ac2ff14834 ("PCI/ASPM: Enable all ClockPM and ASPM states for
devicetree platforms"), the PCI core enabled all ASPM states for all
devices whether powered on initially or by pwrctrl, so a729c1664619 was
unnecessary and reverted.
But f3ac2ff14834 was too aggressive and broke platforms that didn't support
CLKREQ# or required device-specific configuration for L1 Substates, so df5192d9bb0e ("PCI/ASPM: Enable only L0s and L1 for devicetree platforms")
enabled only L0s and L1.
On Qualcomm platforms, this left L1 Substates disabled, which was a
regression. Revert a729c1664619 so L1 Substates will be enabled on devices
that are initially powered on. Devices powered on by pwrctrl will be
addressed later.
Fixes: df5192d9bb0e ("PCI/ASPM: Enable only L0s and L1 for devicetree platforms") Reported-by: Johan Hovold <johan@kernel.org> Closes: https://lore.kernel.org/lkml/aPuXZlaawFmmsLmX@hovoldconsulting.com/ Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Johan Hovold <johan@kernel.org> Reviewed-by: Manivannan Sadhasivam <mani@kernel.org> Link: https://patch.msgid.link/20251024210514.1365996-1-helgaas@kernel.org
Linus Torvalds [Fri, 31 Oct 2025 19:57:19 +0000 (12:57 -0700)]
Merge tag 'block-6.18-20251031' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block fixes from Jens Axboe:
- Fix blk-crypto reporting EIO when EINVAL is the correct error code
- Two bug fixes for the block zone support
- NVME pull request via Keith:
- Target side authentication fixup
- Peer-to-peer metadata fixup
- null_blk DMA alignment fix
* tag 'block-6.18-20251031' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
null_blk: set dma alignment to logical block size
blk-crypto: use BLK_STS_INVAL for alignment errors
block: make REQ_OP_ZONE_OPEN a write operation
block: fix op_is_zone_mgmt() to handle REQ_OP_ZONE_RESET_ALL
nvme-pci: use blk_map_iter for p2p metadata
nvmet-auth: update sc_c in host response
Linus Torvalds [Fri, 31 Oct 2025 19:50:35 +0000 (12:50 -0700)]
Merge tag 's390-6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Heiko Carstens:
- Use correct locking in zPCI event code to avoid deadlock
- Get rid of irqs_registered flag in zpci_dev structure and restore IRQ
unconditionally for zPCI devices. This fixes situations where the
flag was not correctly updated
- Disable (revert) ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP for s390 again.
The optimized hugetlb vmemmap code modifies kernel page tables in a
way which does not work on s390 and leads to reproducible kernel
crashes due to stale TLB entries. This needs to be addressed with
some larger changes. For now simply disable the feature
- Update defconfigs
* tag 's390-6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: Disable ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP
s390/mm: Fix memory leak in add_marker() when kvrealloc() fails
s390/pci: Restore IRQ unconditionally for the zPCI device
s390: Update defconfigs
s390/pci: Avoid deadlock between PCI error recovery and mlx5 crdump
Puranjay Mohan [Thu, 30 Oct 2025 12:17:14 +0000 (12:17 +0000)]
bpf/arm64: Fix BPF_ST into arena memory
The arm64 JIT supports BPF_ST with BPF_PROBE_MEM32 (arena) by using the
tmp2 register to hold the dst + arena_vm_base value and using tmp2 as the
new dst register. But this is broken because, when is_lsi_offset()
returns false, tmp2 will be clobbered by emit_a64_mov_i(1, tmp2, off,
ctx), and hence the emitted store instruction will be of the form:
strb w10, [x11, x11]
Fix this by using the third temporary register to hold the dst +
arena_vm_base.
Two functions with identical names but different addresses are
considered ambiguous and removed by "pahole" from vmlinux BTF.
Later resolve_btfids warns since it cannot find them.
Commit 378b7708194f ("sched: Make migrate_{en,dis}able() inline") made
them inlineable in most places, but in vmlinux built with llvm 21 and 22
there are four symbols for migrate_{enable,disable}:
three static functions and one global function.
Fix the issue by marking migrate_{enable,disable} as always inline.
The alternative is to mark them as notrace/nokprobe which is more
drastic. Only bpf programs are prevented from attaching to these
functions. The rest of the tracing shouldn't be affected.
[note: Peter ok-ed the patch, Alexei rewrote commit log]
Fixes: 378b7708194f ("sched: Make migrate_{en,dis}able() inline") Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Menglong Dong <menglong.dong@linux.dev> Link: https://lore.kernel.org/r/20251029183646.3811774-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Linus Torvalds [Fri, 31 Oct 2025 16:34:21 +0000 (09:34 -0700)]
Merge tag '6.18-rc3-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- fix potential UAF in statfs
- DFS fix for expired referrals
- fix minor modinfo typo
- small improvement to reconnect for smbdirect
* tag '6.18-rc3-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: call smbd_destroy() in the same splace as kernel_sock_shutdown()/sock_release()
smb: client: handle lack of IPC in dfs_cache_refresh()
smb: client: fix potential cfid UAF in smb2_query_info_compound
cifs: fix typo in enable_gcm_256 module parameter
Hans Holmberg [Fri, 31 Oct 2025 09:48:26 +0000 (10:48 +0100)]
null_blk: set dma alignment to logical block size
This driver assumes that bio vectors are memory aligned to the logical
block size, so set the queue limit to reflect that.
Unless we set up the limit based on the logical block size, we will go
out of page bounds in copy_to_nullb / copy_from_nullb.
Apparently this wasn't noticed so far because none of the tests generate
such buffers, but since commit 851c4c96db00 ("xfs: implement
XFS_IOC_DIOINFO in terms of vfs_getattr") xfstests generates unaligned
I/O, which now lead to memory corruption when using null_blk devices
with 4k block size.
Fixes: bf8d08532bc1 ("iomap: add support for dma aligned direct-io") Fixes: b1a000d3b8ec ("block: relax direct io memory alignment") Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Fri, 31 Oct 2025 14:29:09 +0000 (07:29 -0700)]
Merge tag 'sound-6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of small fixes. It became slightly bigger than usual due
to timing issues (holidays, etc), but all changes are rather
device-specific fixes, so not really worrisome.
- ASoC Cirrus codec fixes for AMD
- Various fixes for ASoC Intel AVS, Qualcomm, SoundWire, FSL,
Mediatek, Renesas
- A few HD-audio quirks, and USB-audio regression fixes for Presonus"
* tag 'sound-6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (24 commits)
ALSA: hda/realtek: Enable mic on Vaio RPL
ASoC: dt-bindings: pm4125-sdw: correct number of soundwire ports
ASoC: renesas: rz-ssi: Use proper dma_buffer_pos after resume
ASoC: soc_sdw_utils: remove cs42l43 component_name
ASoC: fsl_sai: Fix sync error in consumer mode
ASoC: Fix build for sdw_utils
ALSA: hda/realtek: Fix mute led for HP Victus 15-fa1xxx (MB 8C2D)
ASoC: rt721: fix prepare clock stop failed
ALSA: usb-audio: don't log messages meant for 1810c when initializing 1824c
ASoC: mediatek: Fix double pm_runtime_disable in remove functions
ASoC: fsl_micfil: correct the endian format for DSD
ASoC: fsl_sai: fix bit order for DSD format
ASoC: Intel: avs: Use snd_codec format when initializing probe
ASoC: Intel: avs: Disable periods-elapsed work when closing PCM
ASoC: Intel: avs: Unprepare a stream when XRUN occurs
ASoC: sdw_utils: add name_prefix for rt1321 part id
ASoC: qdsp6: q6asm: do not sleep while atomic
ASoC: Intel: soc-acpi-intel-ptl-match: Remove cs42l43 match from sdw link3
ASOC: max98090/91: fix for filter configuration: AHPF removed DMIC2_HPF added
ASoC: amd: acp: Add ACP7.0 match entries for cs35l56 and cs42l43
...
Linus Torvalds [Fri, 31 Oct 2025 14:25:10 +0000 (07:25 -0700)]
Merge tag 'v6.18-p4' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
- Fix double free in aspeed
- Fix req->nbytes clobbering in s390/phmac
* tag 'v6.18-p4' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: aspeed - fix double free caused by devm
crypto: s390/phmac - Do not modify the req->nbytes value