git.ipfire.org Git - thirdparty/kernel/linux.git/log
5 weeks ago drm/xe: Validate preferred system memory placement in xe_svm_range_validate
Matthew Brost [Tue, 6 Jan 2026 21:34:43 +0000 (13:34 -0800)] 
drm/xe: Validate preferred system memory placement in xe_svm_range_validate

Ensure preferred system memory placement is checked in
xe_svm_range_validate when dpagemap is NULL. Without this check, a
prefetch to system memory may become a no-op because device memory is
considered a valid placement.

Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Fixes: 238dbc9d9f4a ("drm/xe: Use the vma attibute drm_pagemap to select where to migrate")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patch.msgid.link/20260106213443.1866797-1-matthew.brost@intel.com
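
To make the fix above concrete, here is a toy model of the placement check, written in Python for brevity (the driver itself is C). The function names, the placement labels, and the use of None to stand for a NULL dpagemap (system memory preferred) are assumptions for illustration; they do not mirror the real xe_svm_range_validate signature.

```python
def placement_valid_without_check(current, preferred):
    """Assumed behaviour before the fix: a NULL preference accepts any placement."""
    if preferred is None:
        return True  # device memory is wrongly treated as a valid placement
    return current == preferred


def placement_valid(current, preferred):
    """Assumed behaviour after the fix: a system memory preference is checked too."""
    return current == preferred


# A range currently resident in device memory, with a prefetch to system memory.
current, preferred = "vram0", None
print(placement_valid_without_check(current, preferred))  # True: prefetch becomes a no-op
print(placement_valid(current, preferred))                # False: migration is still needed
```

In this model the prefetch only turns into a no-op when the range already sits in system memory, which is the behaviour the commit message asks for.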
5 weeks ago drm/xe/doc: Remove KEEP_ACTIVE feature
Niranjana Vishwanathapura [Tue, 6 Jan 2026 19:10:51 +0000 (11:10 -0800)] 
drm/xe/doc: Remove KEEP_ACTIVE feature

The KEEP_ACTIVE feature is being reverted; update the documentation accordingly.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260106191051.2866538-6-niranjana.vishwanathapura@intel.com
5 weeks ago Revert "drm/xe/multi_queue: Support active group after primary is destroyed"
Niranjana Vishwanathapura [Tue, 6 Jan 2026 19:10:50 +0000 (11:10 -0800)] 
Revert "drm/xe/multi_queue: Support active group after primary is destroyed"

This reverts commit 3131a43ecb346ae3b5287ee195779fc38c6fcd11.

There is no must-have requirement for this feature from the Compute UMD.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260106191051.2866538-5-niranjana.vishwanathapura@intel.com
5 weeks ago drm/xe/i2c: Force polling mode in survivability
Raag Jadav [Mon, 5 Jan 2026 08:07:50 +0000 (13:37 +0530)] 
drm/xe/i2c: Force polling mode in survivability

SGUnit interrupts are not initialized in survivability mode. Force the I2C
controller to polling mode while in survivability mode.

v2: Use helper function instead of manual check (Riana)

Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://patch.msgid.link/20260105080750.16605-1-raag.jadav@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
7 weeks ago drm/xe: Improve rebar log messages
Lucas De Marchi [Fri, 19 Dec 2025 21:16:49 +0000 (13:16 -0800)] 
drm/xe: Improve rebar log messages

Some minor improvements to the log messages in the rebar logic:
use xe-oriented printk, switch unit from M to MiB in a few places for
consistency and use ilog2(SZ_1M) for clarity.

Suggested-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Suggested-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patch.msgid.link/20251219211650.1908961-6-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
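
As a rough illustration of why ilog2(SZ_1M) reads more clearly than a bare 20, the sketch below models how a resizable BAR size index relates to a size in MiB. It is Python rather than kernel C, and the helper names are made up for this example; the only fact relied on is that a resizable BAR size index n stands for 1 MiB << n.

```python
SZ_1M = 1 << 20


def ilog2(n):
    """Floor of log2 for a positive integer."""
    if n <= 0:
        raise ValueError("ilog2 requires a positive integer")
    return n.bit_length() - 1


def rebar_bytes_to_size(nbytes):
    """Encode a byte count as a resizable BAR size index (hypothetical helper)."""
    if nbytes <= 0:
        raise ValueError("BAR size must be positive")
    rounded = 1 << (nbytes - 1).bit_length()  # round up to a power of two
    return max(ilog2(rounded), ilog2(SZ_1M)) - ilog2(SZ_1M)


def rebar_size_to_mib(size):
    """Decode a size index back to MiB."""
    return 1 << size


def format_mib(nbytes):
    """Format a byte count with the MiB unit used consistently in log messages."""
    return f"{nbytes // SZ_1M} MiB"


print(rebar_bytes_to_size(256 * SZ_1M))  # 8
print(format_mib(256 * SZ_1M))           # 256 MiB
```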
7 weeks ago drm/xe: Move rebar to its own file
Lucas De Marchi [Fri, 19 Dec 2025 21:16:48 +0000 (13:16 -0800)] 
drm/xe: Move rebar to its own file

Now that xe_pci.c calls the rebar directly, it doesn't make sense to
keep it in xe_vram.c since it's closer to the PCI initialization than to
the VRAM. Move it to its own file.

While at it, add a better comment to document the possible values for
the vram_bar_size module parameter.

Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patch.msgid.link/20251219211650.1908961-5-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
7 weeks ago drm/xe/guc: READ/WRITE_ONCE ct->state
Jonathan Cavitt [Mon, 22 Dec 2025 20:20:00 +0000 (20:20 +0000)] 
drm/xe/guc: READ/WRITE_ONCE ct->state

Use READ_ONCE and WRITE_ONCE when operating on ct->state
to prevent the compiler from ignoring important modifications
to its value.

Suggested-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251222201957.63245-6-jonathan.cavitt@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
7 weeks ago drm/xe/guc: READ/WRITE_ONCE g2h_fence->done
Jonathan Cavitt [Mon, 22 Dec 2025 20:19:59 +0000 (20:19 +0000)] 
drm/xe/guc: READ/WRITE_ONCE g2h_fence->done

Use READ_ONCE and WRITE_ONCE when operating on g2h_fence->done
to prevent the compiler from ignoring important modifications
to its value.

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Suggested-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251222201957.63245-5-jonathan.cavitt@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
7 weeks ago drm/xe/soc_remapper: Add system controller config for SoC remapper
Umesh Nerlige Ramappa [Tue, 23 Dec 2025 18:39:47 +0000 (10:39 -0800)] 
drm/xe/soc_remapper: Add system controller config for SoC remapper

Define system controller config bits and helpers for SoC remapper.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patch.msgid.link/20251223183943.3175941-8-umesh.nerlige.ramappa@intel.com
7 weeks ago drm/xe/soc_remapper: Use SoC remapper helper from VSEC code
Umesh Nerlige Ramappa [Tue, 23 Dec 2025 18:39:46 +0000 (10:39 -0800)] 
drm/xe/soc_remapper: Use SoC remapper helper from VSEC code

Since different drivers can use SoC remapper, modify VSEC code to
access SoC remapper via a helper that would synchronize such accesses.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patch.msgid.link/20251223183943.3175941-7-umesh.nerlige.ramappa@intel.com
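
The idea of funnelling every access through one synchronizing helper can be sketched as below. This is a Python model, not the driver's C code: the class, the field masks, and the register layout are all hypothetical and only show a lock-protected read-modify-write shared by several users.

```python
import threading


class SocRemapper:
    """Hypothetical model of a shared remapper register guarded by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._reg = 0

    def update_field(self, mask, value):
        """Read-modify-write one field without disturbing the other bits."""
        with self._lock:
            self._reg = (self._reg & ~mask) | (value & mask)
            return self._reg


# Hypothetical field layout: one field per user of the remapper.
TELEM_INDEX_MASK = 0x0F
SYSCTRL_INDEX_MASK = 0xF0

remapper = SocRemapper()
remapper.update_field(TELEM_INDEX_MASK, 0x3)
final = remapper.update_field(SYSCTRL_INDEX_MASK, 0x50)
print(hex(final))  # 0x53
```

Because both users go through update_field(), neither can overwrite the other's field with a stale copy of the register.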
7 weeks ago drm/xe/soc_remapper: Initialize SoC remapper during Xe probe
Umesh Nerlige Ramappa [Tue, 23 Dec 2025 18:39:45 +0000 (10:39 -0800)] 
drm/xe/soc_remapper: Initialize SoC remapper during Xe probe

SoC remapper is used to map different HW functions in the SoC to their
respective drivers. Initialize SoC remapper during driver load.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patch.msgid.link/20251223183943.3175941-6-umesh.nerlige.ramappa@intel.com
7 weeks ago drm/xe: Don't use absolute path in generated header comment
Calvin Owens [Mon, 22 Dec 2025 16:54:42 +0000 (11:54 -0500)] 
drm/xe: Don't use absolute path in generated header comment

Building the XE driver through Yocto throws this QA warning:

    WARNING: mc:house:linux-stable-6.17-r0 do_package_qa: QA Issue: File /usr/src/debug/linux-stable/6.17/drivers/gpu/drm/xe/generated/xe_device_wa_oob.h in package linux-stable-src contains reference to TMPDIR [buildpaths]
    WARNING: mc:house:linux-stable-6.17-r0 do_package_qa: QA Issue: File /usr/src/debug/linux-stable/6.17/drivers/gpu/drm/xe/generated/xe_wa_oob.h in package linux-stable-src contains reference to TMPDIR [buildpaths]

...because the comment at the top of the generated header contains the
absolute path to the rules file at build time:

    * This file was generated from rules: /home/calvinow/git/meta-house/build/tmp-house/work-shared/nuc14rvhu7/kernel-source/drivers/gpu/drm/xe/xe_device_wa_oob.rules

Fix this minor annoyance by putting the basename of the rules file in
the generated comment instead of the absolute path, so the generated
header contents no longer depend on the location of the kernel source.

Signed-off-by: Calvin Owens <calvin@wbinvd.org>
Link: https://patch.msgid.link/20251222165441.516102-2-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
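
The effect of the change above can be shown with a small sketch. It is written in Python for illustration only (the kernel's generator is a separate build-time tool), and the function name and the second path are assumptions; the comment text and the first path are taken from the commit message.

```python
import os.path


def generated_header_comment(rules_path):
    """Build the header comment line using only the basename of the rules file."""
    return f" * This file was generated from rules: {os.path.basename(rules_path)}"


# The same rules file seen from two different build locations.
yocto_path = ("/home/calvinow/git/meta-house/build/tmp-house/work-shared/"
              "nuc14rvhu7/kernel-source/drivers/gpu/drm/xe/xe_device_wa_oob.rules")
local_path = "/src/linux/drivers/gpu/drm/xe/xe_device_wa_oob.rules"

print(generated_header_comment(yocto_path))
print(generated_header_comment(yocto_path) == generated_header_comment(local_path))  # True
```

Since the output no longer depends on where the source tree lives, the generated header is identical across build directories and the buildpaths QA check has nothing to flag.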
7 weeks ago drm/xe/migrate: Configure migration queue as low latency
Francois Dugast [Tue, 23 Dec 2025 11:53:27 +0000 (12:53 +0100)] 
drm/xe/migrate: Configure migration queue as low latency

Commit 5488bec96bcc ("drm/xe/uapi: Use hint for guc to set GT frequency")
introduced a low latency hint for use by user space when creating an exec
queue. This instructs SLPC to ramp the GT frequency aggressively.

SVM relies on an internal exec queue to migrate memory upon page faults.
This change creates this exec queue with the low latency hint to speed up
migration.

This should not impact systems where the GT frequency is set over sysfs, or
long-running workloads, which give the frequency enough time to ramp up. An
example of a memory access pattern that shows improved SVM performance is
running IGT eu-fault-2m-once-device in xe_exec_system_allocator hundreds of
times. The copy duration reported by the GT stats in svm_2M_device_copy_us
shows, per GPU page fault:
    ~ 165 μs without low latency hint
    ~ 130 μs with low latency hint

Suggested-by: Matthew Brost <matthew.brost@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
Link: https://patch.msgid.link/20251223115327.49555-1-francois.dugast@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
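
For a sense of scale, the two approximate copy durations quoted above can be turned into a relative figure. The snippet below is plain arithmetic in Python; the helper name is made up, and the inputs are the rounded values from the commit message, so the result is only indicative.

```python
def relative_reduction(before_us, after_us):
    """Fraction by which the copy duration shrinks."""
    return (before_us - after_us) / before_us


print(f"{relative_reduction(165, 130):.1%}")  # 21.2%
```

That is roughly a fifth less copy time per 2M GPU page fault in this particular test.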
7 weeks ago drm/xe/svm: Serialize migration to device if racing
Thomas Hellström [Fri, 19 Dec 2025 11:33:20 +0000 (12:33 +0100)] 
drm/xe/svm: Serialize migration to device if racing

Introduce an rw-semaphore to serialize migration to device if
it's likely that migration races with another device migration
of the same CPU address space range.
This is a temporary fix to attempt to mitigate a livelock that
might happen if many devices try to migrate a range at the same
time, and it affects only devices using the xe driver.
A longer-term fix probably requires improvements in the core mm
migration layer.

Suggested-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251219113320.183860-25-thomas.hellstrom@linux.intel.com
7 weeks ago drm/pagemap: Support source migration over interconnect
Thomas Hellström [Fri, 19 Dec 2025 11:33:19 +0000 (12:33 +0100)] 
drm/pagemap: Support source migration over interconnect

Support source interconnect migration by using the copy_to_ram() op
of the source device private pages.

Source interconnect migration is required to flush the L2 cache of
the source device, which among other things is a requirement for
correct global atomic operation. It also enables the source GPU to
potentially decompress any compressed content which is not
understood by peers, and finally for the PCIe case, it's expected
that writes over PCIe will be faster than reads.

The implementation can probably be improved by coalescing subregions
with the same source.

v5:
- Update waiting for the pre_migrate_fence and comments around that,
  previously in another patch. (Himal).
- Actually select device private pages to migrate when
  source_peer_migrates is true.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> # For merging through drm-xe.
Link: https://patch.msgid.link/20251219113320.183860-24-thomas.hellstrom@linux.intel.com
7 weeks ago drm/pagemap, drm/xe: Support destination migration over interconnect
Thomas Hellström [Fri, 19 Dec 2025 11:33:18 +0000 (12:33 +0100)] 
drm/pagemap, drm/xe: Support destination migration over interconnect

Support destination migration over interconnect when migrating from
device-private pages with the same dev_pagemap owner.

Since we now also collect device-private pages to migrate,
also abort migration if the range to migrate is already
fully populated with pages from the desired pagemap.

Finally return -EBUSY from drm_pagemap_populate_mm()
if the migration can't be completed without first migrating all
pages in the range to system. It is expected that the caller
will perform that before retrying the call to
drm_pagemap_populate_mm().

v3:
- Fix a bug where the p2p dma-address was never used.
- Postpone enabling destination interconnect migration,
  since xe devices require source interconnect migration to
  ensure the source L2 cache is flushed at migration time.
- Update the drm_pagemap_migrate_to_devmem() interface to
  pass migration details.
v4:
- Define XE_INTERCONNECT_P2P unconditionally (CI)
- Include a missing header (CI)
v5:
- Use page order increments where possible (Matt Brost).
- Fix a negated value of can_migrate_same_pagemap.
- Move removal of some dead code to a separate patch (Matt Brost).
- Remove an unnecessary zdd get() and put() (Matt Brost).

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> # For merging through drm-xe.
Link: https://patch.msgid.link/20251219113320.183860-23-thomas.hellstrom@linux.intel.com
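
The -EBUSY contract described in the commit above can be modelled with a short sketch. This is Python, not the DRM code: populate_mm(), evict_to_system(), the placement labels, and the can_migrate_directly flag are hypothetical stand-ins, and only the control flow (abort when already populated, return -EBUSY, evict to system, retry) is taken from the commit message.

```python
import errno

SYSTEM = "system"


def populate_mm(pages, target, can_migrate_directly):
    """Toy stand-in for a populate call; pages is a list of placement labels."""
    if all(p == target for p in pages):
        return 0  # already fully populated from the desired pagemap: nothing to do
    foreign = any(p not in (SYSTEM, target) for p in pages)
    if foreign and not can_migrate_directly:
        return -errno.EBUSY  # caller must migrate the range to system first
    pages[:] = [target] * len(pages)
    return 0


def evict_to_system(pages):
    pages[:] = [SYSTEM] * len(pages)


def migrate_with_retry(pages, target, can_migrate_directly=False):
    """Caller side: on -EBUSY, move everything to system memory and retry once."""
    ret = populate_mm(pages, target, can_migrate_directly)
    if ret == -errno.EBUSY:
        evict_to_system(pages)
        ret = populate_mm(pages, target, can_migrate_directly)
    return ret


pages = ["dev0", SYSTEM]
print(migrate_with_retry(pages, "dev1"), pages)  # 0 ['dev1', 'dev1']
```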
7 weeks ago drm/xe: Use drm_gpusvm_scan_mm()
Thomas Hellström [Fri, 19 Dec 2025 11:33:17 +0000 (12:33 +0100)] 
drm/xe: Use drm_gpusvm_scan_mm()

Use drm_gpusvm_scan_mm() to avoid unnecessarily calling into
drm_pagemap_populate_mm().

v3:
- New patch.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patch.msgid.link/20251219113320.183860-22-thomas.hellstrom@linux.intel.com
7 weeks ago drm/gpusvm: Introduce a function to scan the current migration state
Thomas Hellström [Fri, 19 Dec 2025 11:33:16 +0000 (12:33 +0100)] 
drm/gpusvm: Introduce a function to scan the current migration state

With multi-device we are much more likely to have multiple
drm-gpusvm ranges pointing to the same struct mm range.

To avoid calling into drm_pagemap_populate_mm(), which is always
very costly, introduce a much less costly drm_gpusvm function,
drm_gpusvm_scan_mm() to scan the current migration state.
The device fault-handler and prefetcher can use this function to
determine whether migration is really necessary.

There are a couple of performance improvements that can be done
for this function if it turns out to be too costly. Those are
documented in the code.

v3:
- New patch.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> # For merging through drm-xe.
Link: https://patch.msgid.link/20251219113320.183860-21-thomas.hellstrom@linux.intel.com
7 weeks ago drm/pagemap, drm/xe: Clean up the use of the device-private page owner
Thomas Hellström [Fri, 19 Dec 2025 11:33:15 +0000 (12:33 +0100)] 
drm/pagemap, drm/xe: Clean up the use of the device-private page owner

Use the dev_pagemap->owner field wherever possible, simplifying
the code slightly.

v3: New patch

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> # For merging through drm-xe.
Link: https://patch.msgid.link/20251219113320.183860-20-thomas.hellstrom@linux.intel.com
7 weeks ago drm/xe/svm: Document how xe keeps drm_pagemap references
Thomas Hellström [Fri, 19 Dec 2025 11:33:14 +0000 (12:33 +0100)] 
drm/xe/svm: Document how xe keeps drm_pagemap references

As an aid to understanding the lifetime of the drm_pagemaps used
by the xe driver, document how the xe driver keeps the
drm_pagemap references.

v3:
- Fix formatting (Matt Brost)

Suggested-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251219113320.183860-19-thomas.hellstrom@linux.intel.com
7 weeks ago drm/xe/vm: Add a couple of VM debug printouts
Thomas Hellström [Fri, 19 Dec 2025 11:33:13 +0000 (12:33 +0100)] 
drm/xe/vm: Add a couple of VM debug printouts

Add debug printouts that are valuable for pagemap prefetch,
migration and page collection.

v2:
- Add additional debug printouts around migration and page collection.
- Require CONFIG_DRM_XE_DEBUG_VM.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com> #v1
Link: https://patch.msgid.link/20251219113320.183860-18-thomas.hellstrom@linux.intel.com
7 weeks ago drm/xe: Support pcie p2p dma as a fast interconnect
Thomas Hellström [Fri, 19 Dec 2025 11:33:12 +0000 (12:33 +0100)] 
drm/xe: Support pcie p2p dma as a fast interconnect

Mimic the dma-buf method using dma_[map|unmap]_resource to map
for pcie-p2p dma.

There's an ongoing area of work upstream to sort out how this best
should be done. One method proposed is to add an additional
pci_p2p_dma_pagemap aliasing the device_private pagemap and use
the corresponding pci_p2p_dma_pagemap page as input for
dma_map_page(). However, that would incur double the amount of
memory and latency to set up the drm_pagemap and given the huge
amount of memory present on modern GPUs, that would really not work.
Hence the simple approach used in this patch.

v2:
- Simplify xe_page_to_pcie(). (Matt Brost)

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251219113320.183860-17-thomas.hellstrom@linux.intel.com
7 weeks ago drm/xe/uapi: Extend the madvise functionality to support foreign pagemap placement...
Thomas Hellström [Fri, 19 Dec 2025 11:33:11 +0000 (12:33 +0100)] 
drm/xe/uapi: Extend the madvise functionality to support foreign pagemap placement for svm

Use device file descriptors and regions to represent pagemaps on
foreign or local devices.

The underlying files are type-checked at madvise time, and
references are kept on the drm_pagemap as long as there are
madvises pointing to it.

Extend the madvise preferred_location UAPI to support the region
instance to identify the foreign placement.

v2:
- Improve UAPI documentation. (Matt Brost)
- Sanitize preferred_mem_loc.region_instance madvise. (Matt Brost)
- Clarify madvise drm_pagemap vs xe_pagemap refcounting. (Matt Brost)
- Don't allow a foreign drm_pagemap madvise without a fast
  interconnect.
v3:
- Add a comment about reference-counting in xe_devmem_open() and
  remove the reference-count get-and-put. (Matt Brost)

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251219113320.183860-16-thomas.hellstrom@linux.intel.com
7 weeks ago drm/xe: Simplify madvise_preferred_mem_loc()
Thomas Hellström [Fri, 19 Dec 2025 11:33:10 +0000 (12:33 +0100)] 
drm/xe: Simplify madvise_preferred_mem_loc()

Simplify madvise_preferred_mem_loc by removing repetitive patterns
in favour of local variables.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251219113320.183860-15-thomas.hellstrom@linux.intel.com
7 weeks ago drm/xe: Use the vma attibute drm_pagemap to select where to migrate
Thomas Hellström [Fri, 19 Dec 2025 11:33:09 +0000 (12:33 +0100)] 
drm/xe: Use the vma attibute drm_pagemap to select where to migrate

Honor the drm_pagemap vma attribute when migrating SVM pages.
When the desired placement is validated as device memory, also
check that the requested drm_pagemap is consistent with the
current one.

v2:
- Initialize a struct drm_pagemap pointer to NULL that could
  otherwise be dereferenced uninitialized. (CI)
- Remove a redundant assignment (Matt Brost)
- Slightly improved commit message (Matt Brost)
- Extended drm_pagemap validation.

v3:
- Fix a compilation error if CONFIG_DRM_GPUSVM is not enabled.
  (kernel test robot <lkp@intel.com>)

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patch.msgid.link/20251219113320.183860-14-thomas.hellstrom@linux.intel.com
7 weeks ago drm/xe: Pass a drm_pagemap pointer around with the memory advise attributes
Thomas Hellström [Fri, 19 Dec 2025 11:33:08 +0000 (12:33 +0100)] 
drm/xe: Pass a drm_pagemap pointer around with the memory advise attributes

As a consequence, struct xe_vma_mem_attr can't simply be assigned
or freed without taking the reference count of individual members
into account. Also add helpers to do that.

v2:
- Move some calls to xe_vma_mem_attr_fini() to xe_vma_free(). (Matt Brost)
v3:
- Rebase.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com> #v2
Link: https://patch.msgid.link/20251219113320.183860-13-thomas.hellstrom@linux.intel.com
7 weeks ago drm/xe: Use the drm_pagemap_util helper to get a svm pagemap owner
Thomas Hellström [Fri, 19 Dec 2025 11:33:07 +0000 (12:33 +0100)] 
drm/xe: Use the drm_pagemap_util helper to get a svm pagemap owner

Register a driver-wide owner list, provide a callback to identify
fast interconnects and use the drm_pagemap_util helper to allocate
or reuse a suitable owner struct. For now we consider pagemaps on
different tiles on the same device as having fast interconnect and
thus the same owner.

v2:
- Fix up the error onion unwind in xe_pagemap_create(). (Matt Brost)

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251219113320.183860-12-thomas.hellstrom@linux.intel.com
7 weeks ago drm/pagemap_util: Add a utility to assign an owner to a set of interconnected gpus
Thomas Hellström [Fri, 19 Dec 2025 11:33:06 +0000 (12:33 +0100)] 
drm/pagemap_util: Add a utility to assign an owner to a set of interconnected gpus

The hmm_range_fault() and the migration helpers currently need a common
"owner" to identify pagemaps and clients with fast interconnect.
Add a drm_pagemap utility to set up such owners by registering
drm_pagemaps in a registry and, for each new drm_pagemap,
querying which existing drm_pagemaps have fast interconnects with the new
drm_pagemap.

The "owner" scheme is limited in that it is static at drm_pagemap creation.
Ideally one would want the owner to be adjusted at run-time, but that
requires changes to hmm. If the proposed scheme becomes too limited,
we need to revisit.

v2:
- Improve documentation of DRM_PAGEMAP_OWNER_LIST_DEFINE(). (Matt Brost)

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> # For merging through drm-xe.
Link: https://patch.msgid.link/20251219113320.183860-11-thomas.hellstrom@linux.intel.com
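
The registry scheme described above can be sketched roughly as follows. This is a Python model under stated assumptions, not the drm_pagemap_util code: the class and method names are invented, and the rule that a new pagemap joins an owner only if it has a fast interconnect with every existing peer of that owner is an assumption made for this example.

```python
class OwnerList:
    """Toy registry that hands out a shared owner id to interconnected pagemaps."""

    def __init__(self, has_fast_interconnect):
        self._has_fast_interconnect = has_fast_interconnect  # driver callback
        self._owners = []  # each entry is the list of peers sharing one owner

    def acquire_owner(self, pagemap):
        """Reuse an owner whose peers are all reachable, else allocate a new one."""
        for owner_id, peers in enumerate(self._owners):
            if all(self._has_fast_interconnect(pagemap, peer) for peer in peers):
                peers.append(pagemap)
                return owner_id
        self._owners.append([pagemap])
        return len(self._owners) - 1


def same_device(a, b):
    """Example callback: tiles on the same device count as fast interconnect."""
    return a[0] == b[0]


owners = OwnerList(same_device)
pagemaps = [("gpu0", "tile0"), ("gpu0", "tile1"), ("gpu1", "tile0")]
print([owners.acquire_owner(pm) for pm in pagemaps])  # [0, 0, 1]
```

The owner is decided once, when the pagemap is registered, which matches the limitation noted in the commit message: the assignment is static at creation time.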
7 weeks ago drm/pagemap: Remove the drm_pagemap_create() interface
Thomas Hellström [Fri, 19 Dec 2025 11:33:05 +0000 (12:33 +0100)] 
drm/pagemap: Remove the drm_pagemap_create() interface

With the drm_pagemap_init() interface, drm_pagemap_create() is not
used anymore.

v2:
- Slightly more verbose commit message. (Matt Brost)

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> # For merging through drm-xe.
Link: https://patch.msgid.link/20251219113320.183860-10-thomas.hellstrom@linux.intel.com
7 weeks ago drm/xe: Use the drm_pagemap cache and shrinker
Thomas Hellström [Fri, 19 Dec 2025 11:33:04 +0000 (12:33 +0100)] 
drm/xe: Use the drm_pagemap cache and shrinker

Define a struct xe_pagemap that embeds all pagemap-related
data used by xekmd, and use the drm_pagemap cache and
shrinker to manage lifetime.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251219113320.183860-9-thomas.hellstrom@linux.intel.com
7 weeks ago drm/pagemap: Add a drm_pagemap cache and shrinker
Thomas Hellström [Fri, 19 Dec 2025 11:33:03 +0000 (12:33 +0100)] 
drm/pagemap: Add a drm_pagemap cache and shrinker

Pagemaps are costly to set up and tear down, and they consume a lot
of system memory for the struct pages. Ideally they should be
created only when needed.

Add a caching mechanism to allow doing just that: Create the drm_pagemaps
when needed for migration. Keep them around to avoid destruction and
re-creation latencies and destroy inactive/unused drm_pagemaps on memory
pressure using a shrinker.

Only add the helper functions. They will be hooked up to the xe driver
in the upcoming patch.

v2:
- Add lockdep checking for drm_pagemap_put(). (Matt Brost)
- Add a copyright notice. (Matt Brost)

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> # For merging through drm-xe.
Link: https://patch.msgid.link/20251219113320.183860-8-thomas.hellstrom@linux.intel.com
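
The caching mechanism described above (create on demand, keep unused pagemaps around, destroy them under memory pressure) can be modelled in a few lines. The sketch is Python and every name in it is hypothetical; it shows the lifecycle only, not the kernel's locking or shrinker registration.

```python
class PagemapCache:
    """Toy cache: active entries are refcounted, idle ones wait for the shrinker."""

    def __init__(self, create):
        self._create = create   # costly constructor, called only on a miss
        self._active = {}       # key -> [pagemap, refcount]
        self._inactive = {}     # key -> pagemap, kept to avoid re-creation

    def get(self, key):
        if key in self._active:
            self._active[key][1] += 1
        elif key in self._inactive:
            self._active[key] = [self._inactive.pop(key), 1]  # revive, no creation cost
        else:
            self._active[key] = [self._create(key), 1]
        return self._active[key][0]

    def put(self, key):
        entry = self._active[key]
        entry[1] -= 1
        if entry[1] == 0:
            self._inactive[key] = self._active.pop(key)[0]  # keep around, do not destroy

    def shrink(self, nr_to_scan):
        """Destroy up to nr_to_scan inactive entries; return how many were freed."""
        freed = 0
        while self._inactive and freed < nr_to_scan:
            self._inactive.popitem()
            freed += 1
        return freed


created = []
cache = PagemapCache(lambda key: created.append(key) or f"pagemap-{key}")
cache.get("vram0"); cache.put("vram0")   # created once, then parked as inactive
cache.get("vram0"); cache.put("vram0")   # revived from the cache, not re-created
print(len(created), cache.shrink(8))     # 1 1
```

Only after shrink() has run does the next get() pay the creation cost again.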
7 weeks ago drm/pagemap, drm/xe: Manage drm_pagemap provider lifetimes
Thomas Hellström [Fri, 19 Dec 2025 11:33:02 +0000 (12:33 +0100)] 
drm/pagemap, drm/xe: Manage drm_pagemap provider lifetimes

If a device holds a reference on a foreign device's drm_pagemap,
and a device unbind is executed on the foreign device,
that foreign device would typically evict its device-private
pages and then continue its device-managed cleanup, eventually
releasing its drm device and possibly allowing for module unload.
However, since we're still holding a reference on a drm_pagemap,
when that reference is released and the provider module is
unloaded we'd execute out of undefined memory.

Therefore keep a reference on the provider device and module until
the last drm_pagemap reference is gone.

Note that in theory, the drm_gpusvm_helper module may be unloaded
as soon as the final module_put() of the provider driver module is
executed, so we need to add a module_exit() function that waits
until the work item executing the module_put() has completed.

v2:
- Better commit message (Matt Brost)

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> # For merging through drm-xe.
Link: https://patch.msgid.link/20251219113320.183860-7-thomas.hellstrom@linux.intel.com
7 weeks ago drm/pagemap: Add a refcounted drm_pagemap backpointer to struct drm_pagemap_zdd
Thomas Hellström [Fri, 19 Dec 2025 11:33:01 +0000 (12:33 +0100)] 
drm/pagemap: Add a refcounted drm_pagemap backpointer to struct drm_pagemap_zdd

To be able to keep track of drm_pagemap usage, add a refcounted
backpointer to struct drm_pagemap_zdd. This will keep the drm_pagemap
reference count from dropping to zero as long as there are drm_pagemap
pages present in a CPU address space.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> # For merging through drm-xe.
Link: https://patch.msgid.link/20251219113320.183860-6-thomas.hellstrom@linux.intel.com
7 weeks ago drm/pagemap, drm/xe: Add refcounting to struct drm_pagemap
Thomas Hellström [Fri, 19 Dec 2025 11:33:00 +0000 (12:33 +0100)] 
drm/pagemap, drm/xe: Add refcounting to struct drm_pagemap

With the end goal of being able to free unused pagemaps
and allocate them on demand, add a refcount to struct drm_pagemap,
remove the xe embedded drm_pagemap, allocating and freeing it
explicitly.

v2:
- Make the drm_pagemap pointer in drm_gpusvm_pages reference-counted.
v3:
- Call drm_pagemap_get() before drm_pagemap_put() in drm_gpusvm_pages
  (Himal Prasad Ghimiray)

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com> #v1
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> # For merging through drm-xe.
Link: https://patch.msgid.link/20251219113320.183860-5-thomas.hellstrom@linux.intel.com
7 weeks ago drm/pagemap, drm/xe: Ensure that the devmem allocation is idle before use
Thomas Hellström [Fri, 19 Dec 2025 11:32:59 +0000 (12:32 +0100)] 
drm/pagemap, drm/xe: Ensure that the devmem allocation is idle before use

In situations where no system memory is migrated to devmem, and in
upcoming patches where another GPU is performing the migration to
the newly allocated devmem buffer, there is nothing to ensure any
ongoing clear to the devmem allocation or async eviction from the
devmem allocation is complete.

Address that by passing a struct dma_fence down to the copy
functions, and ensure it is waited for before migration is marked
complete.

v3:
- New patch.
v4:
- Update the logic used for determining when to wait for the
  pre_migrate_fence.
- Update the logic used for determining when to warn for the
  pre_migrate_fence since the scheduler fences apparently
  can signal out-of-order.
v5:
- Fix a UAF (CI)
- Remove references to source P2P migration (Himal)
- Put the pre_migrate_fence after migration.
v6:
- Pipeline the pre_migrate_fence dependency (Matt Brost)

Fixes: c5b3eb5a906c ("drm/xe: Add GPUSVM device memory copy vfunc functions")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org> # v6.15+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> # For merging through drm-xe.
Link: https://patch.msgid.link/20251219113320.183860-4-thomas.hellstrom@linux.intel.com
7 weeks ago drm/pagemap: Remove some dead code
Thomas Hellström [Fri, 19 Dec 2025 11:32:58 +0000 (12:32 +0100)] 
drm/pagemap: Remove some dead code

The page pointer can't be NULL.

v5:
- New patch. (Matt Brost)

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> # For merging through drm-xe.
Link: https://patch.msgid.link/20251219113320.183860-3-thomas.hellstrom@linux.intel.com
7 weeks ago drm/xe/svm: Fix a debug printout
Thomas Hellström [Fri, 19 Dec 2025 11:32:57 +0000 (12:32 +0100)] 
drm/xe/svm: Fix a debug printout

Avoid spamming the log with drm_info(). Use drm_dbg() instead.

Fixes: cc795e041034 ("drm/xe/svm: Make xe_svm_range_needs_migrate_to_vram() public")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: <stable@vger.kernel.org> # v6.17+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patch.msgid.link/20251219113320.183860-2-thomas.hellstrom@linux.intel.com
7 weeks ago drm/xe/pf: Add debugfs to set EQ and PT for scheduler groups
Daniele Ceraolo Spurio [Thu, 18 Dec 2025 22:38:58 +0000 (14:38 -0800)] 
drm/xe/pf: Add debugfs to set EQ and PT for scheduler groups

Debugfs files are added to allow a user to provide a comma-separated list
of values to assign to each group for each VF.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251218223846.1146344-26-daniele.ceraolospurio@intel.com
7 weeks agodrm/xe/pf: Add functions to set preempt timeouts for each group
Daniele Ceraolo Spurio [Thu, 18 Dec 2025 22:38:57 +0000 (14:38 -0800)] 
drm/xe/pf: Add functions to set preempt timeouts for each group

The KLV to set the preemption timeout for each group works exactly the
same way as the one for the exec quantums, so we add similar functions.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251218223846.1146344-25-daniele.ceraolospurio@intel.com
7 weeks agodrm/xe/pf: Add functions to set exec quantums for each group
Daniele Ceraolo Spurio [Thu, 18 Dec 2025 22:38:56 +0000 (14:38 -0800)] 
drm/xe/pf: Add functions to set exec quantums for each group

The GuC has a new dedicated KLV to set the EQs for the groups. The GuC
always sets the EQs for all the groups (even the ones not enabled). If
we provide fewer values than the max number of groups (8), the GuC will
set the remaining ones to 0 (infinity).
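
The padding behaviour can be modelled with a small sketch. This is plain C written for illustration, not GuC or driver code; the function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_SCHED_GROUPS 8 /* max number of groups in the GuC interface */

/*
 * Model of the described behaviour: all 8 slots are always written,
 * and slots without a provided value end up as 0 (infinity).
 */
static void apply_group_eqs(uint32_t eqs[MAX_SCHED_GROUPS],
			    const uint32_t *values, unsigned int count)
{
	if (count > MAX_SCHED_GROUPS)
		count = MAX_SCHED_GROUPS;

	memset(eqs, 0, MAX_SCHED_GROUPS * sizeof(eqs[0]));
	if (count)
		memcpy(eqs, values, count * sizeof(values[0]));
}
```

Providing two values therefore also resets groups 2 to 7 to an infinite exec quantum, even if they held other values before.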

Note that the new KLV can be used even when groups are disabled (as the
GuC always considers group0 to be active), so we can use it when encoding
the SRIOV config.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251218223846.1146344-24-daniele.ceraolospurio@intel.com
7 weeks agodrm/xe/pf: Prep for multiple exec quantums and preemption timeouts
Daniele Ceraolo Spurio [Thu, 18 Dec 2025 22:38:55 +0000 (14:38 -0800)] 
drm/xe/pf: Prep for multiple exec quantums and preemption timeouts

Each scheduler group can be independently configured with its own exec
quantum and preemption timeouts. The existing KLVs to configure those
parameters will apply the value to all groups (even if they're not
enabled at the moment).

When scheduler groups are disabled, the GuC uses the values from Group 0.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251218223846.1146344-23-daniele.ceraolospurio@intel.com
7 weeks agodrm/xe/pf: Add debugfs with scheduler groups information
Daniele Ceraolo Spurio [Thu, 18 Dec 2025 22:38:54 +0000 (14:38 -0800)] 
drm/xe/pf: Add debugfs with scheduler groups information

Under a new subfolder, an entry is created for each group to list the
engines assigned to them. We create enough entries for each possible
group, with the disabled groups just returning an empty list.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251218223846.1146344-22-daniele.ceraolospurio@intel.com
7 weeks agodrm/xe/pf: Add debugfs to enable scheduler groups
Daniele Ceraolo Spurio [Thu, 18 Dec 2025 22:38:53 +0000 (14:38 -0800)] 
drm/xe/pf: Add debugfs to enable scheduler groups

Reading the debugfs file lists the available configurations by name.
Writing the name of a configuration to the file will enable it.
Note that while this debugfs is PF-only, follow up patches will add some
debugfs files that are applicable to VF as well, so the function accepts
a vfid parameter to be ready for that.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251218223846.1146344-21-daniele.ceraolospurio@intel.com
7 weeks agodrm/xe/vf: Check if scheduler groups are enabled
Daniele Ceraolo Spurio [Thu, 18 Dec 2025 22:38:52 +0000 (14:38 -0800)] 
drm/xe/vf: Check if scheduler groups are enabled

VF can check if PF has enabled scheduler groups with a dedicated KLV
query. If scheduler groups are enabled, MLRC queue registrations are
forbidden.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251218223846.1146344-20-daniele.ceraolospurio@intel.com
7 weeks agodrm/xe/pf: Scheduler groups are incompatible with multi-lrc
Daniele Ceraolo Spurio [Thu, 18 Dec 2025 22:38:51 +0000 (14:38 -0800)] 
drm/xe/pf: Scheduler groups are incompatible with multi-lrc

Since engines in the same class can be divided across multiple groups,
the GuC does not allow scheduler groups to be active if there are
multi-lrc contexts. This means that:

1) if a MLRC context is registered when we enable scheduler groups, the
   GuC will silently ignore the configuration
2) if a MLRC context is registered after scheduler groups are enabled,
   the GuC will disable the groups and generate an adverse event.

The expectation is that the admin will ensure that all apps that use
MLRC on PF have been terminated before scheduler groups are created. A
check is added anyway to make sure we don't still have contexts waiting
to be cleaned up lying around. A check is also added at queue creation
time to block MLRC queue creation if scheduler groups have been enabled.
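
The two checks can be sketched as a pair of guards on shared state. The struct, the function names, and the error codes below are hypothetical and only illustrate the mutual exclusion described above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical state: registered MLRC contexts and the groups toggle. */
struct sched_state {
	unsigned int mlrc_contexts;
	bool groups_enabled;
};

/* Refuse to enable groups while any MLRC context is still around. */
static int enable_sched_groups(struct sched_state *s)
{
	if (s->mlrc_contexts)
		return -EBUSY;

	s->groups_enabled = true;
	return 0;
}

/* Refuse to create an MLRC queue once groups have been enabled. */
static int create_mlrc_queue(struct sched_state *s)
{
	if (s->groups_enabled)
		return -EINVAL;

	s->mlrc_contexts++;
	return 0;
}
```

Whichever feature is used first blocks the other, so the GuC never sees the combination it would ignore or flag as an adverse event.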

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251218223846.1146344-19-daniele.ceraolospurio@intel.com
7 weeks agodrm/xe/pf: Add support for enabling scheduler groups
Daniele Ceraolo Spurio [Thu, 18 Dec 2025 22:38:50 +0000 (14:38 -0800)] 
drm/xe/pf: Add support for enabling scheduler groups

Scheduler groups are enabled by sending a specific policy configuration
KLV to the GuC. We don't allow changing this policy if there are active
VFs, since the expectation is that a VF will only check if the
feature is enabled during driver initialization.

While the GuC interface supports a maximum of 8 groups, the actual
number of groups that can be enabled can be lower than that and
can be different on different devices. For now, all devices support up
to 2 groups, so we check that we do not have more groups than that.

The functions added by this patch will be used by sysfs/debugfs, coming
in follow up patches.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251218223846.1146344-18-daniele.ceraolospurio@intel.com
7 weeks agodrm/xe/pf: Initialize scheduler groups
Daniele Ceraolo Spurio [Thu, 18 Dec 2025 22:38:49 +0000 (14:38 -0800)] 
drm/xe/pf: Initialize scheduler groups

Scheduler groups (a.k.a. Engine Groups Scheduling, or EGS) is a GuC
feature that allows the driver to define groups of engines that are
independently scheduled across VFs, which allows different VFs to be
active on the HW at the same time on different groups. The feature is
available for BMG and newer HW starting on GuC 70.53.0, but some
required fixes have been added to GuC 70.55.1.

This is intended for specific scenarios where the admin knows that the
VFs are not going to fully utilize the HW and therefore assigning all of
it to a single VF would lead to part of it being permanently idle.
We do not allow the admin to decide how to divide the engines across
groups, but we instead support specific configurations that are designed
for specific use-cases. During PF initialization we detect which
configurations are possible on a given GT and create the relevant
groups. Since the GuC expects a mask for each class for each group, that
is what we save when we init the configs.

Right now we only have one use-case on the media GT. If the VFs are
running a frame render + encoding at a not-too-high resolution (e.g.
1080p@30fps) the render can produce frames faster than the video engine
can encode them, which means that the maximum number of parallel VFs is
limited by the VCS bandwidth. Since our products can have multiple VCS
engines, allowing multiple VFs to be active on the different VCS engines
at the same time allows us to run more parallel VFs on the same HW.
Given that engines in the same media slice share some resources (e.g.
SFC), we assign each media slice to a different scheduling group. We
refer to this configuration as "media_slices", given that each slice
gets its own group. Since upcoming products have a different number of
video engines per-slice, for now we limit the media_slices mode to BMG,
but we expect to add support for newer HW soon.
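
A minimal sketch of how a "media_slices" style configuration could be turned into per-group, per-class masks. The number of engines per slice and all names here are assumptions chosen for the example; the real values depend on the platform.

```c
#include <assert.h>
#include <stdint.h>

#define VCS_PER_SLICE	2 /* hypothetical engine count per media slice */
#define VECS_PER_SLICE	1 /* hypothetical engine count per media slice */

/* One instance mask per engine class, as the GuC expects for each group. */
struct group_masks {
	uint32_t vcs;
	uint32_t vecs;
};

/*
 * Assign each media slice that has at least one engine present to its
 * own group. Returns the number of groups created (at most max_groups).
 */
static unsigned int media_slices_groups(uint32_t vcs_present,
					uint32_t vecs_present,
					struct group_masks *groups,
					unsigned int max_groups)
{
	unsigned int slice, n = 0;

	for (slice = 0; slice < max_groups; slice++) {
		uint32_t vcs = vcs_present &
			(((1u << VCS_PER_SLICE) - 1) << (slice * VCS_PER_SLICE));
		uint32_t vecs = vecs_present &
			(((1u << VECS_PER_SLICE) - 1) << (slice * VECS_PER_SLICE));

		if (!vcs && !vecs)
			continue;	/* slice fused off, no group needed */

		groups[n].vcs = vcs;
		groups[n].vecs = vecs;
		n++;
	}

	return n;
}
```

With VCS0 and VCS2 present, each engine lands in a different group, so two VFs can be encoding at the same time.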

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251218223846.1146344-17-daniele.ceraolospurio@intel.com
7 weeks agodrm/gt/guc: extract scheduler-related defines from guc_fwif.h
Daniele Ceraolo Spurio [Thu, 18 Dec 2025 22:38:48 +0000 (14:38 -0800)] 
drm/gt/guc: extract scheduler-related defines from guc_fwif.h

Some upcoming KLVs are sized based on the engine counts, so we need
those defines to be moved to a separate file to include them from
guc_klv_abi.h (which is already included by guc_fwif.h).
Instead of moving just the engine-related defines, it is cleaner to
move all scheduler-related defines (i.e., everything engine or context
related). Note that the legacy GuC defines have not been moved and have
instead been dropped because Xe doesn't support any GuC old enough to
still use them.

While at it, struct guc_ctxt_registration_info has been moved to
guc_submit.c since it doesn't come from the GuC specs (we added it to
make things simpler in our code).

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251218223846.1146344-16-daniele.ceraolospurio@intel.com
7 weeks agodrm/xe/gt: Add engine masks for each class
Daniele Ceraolo Spurio [Thu, 18 Dec 2025 22:38:47 +0000 (14:38 -0800)] 
drm/xe/gt: Add engine masks for each class

Follow up patches will need the engine masks for VCS and VECS engines.
Since we already have a macro for the CCS engines, just extend the same
approach to all classes.

To avoid confusion with the XE_HW_ENGINE_*_MASK masks, the new macros
use the _INSTANCES suffix instead. For consistency, rename CCS_MASK to
CCS_INSTANCES as well.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251218223846.1146344-15-daniele.ceraolospurio@intel.com
7 weeks agodrm/xe: Print GuC queue submission state on engine reset
Matthew Brost [Thu, 18 Dec 2025 22:45:46 +0000 (14:45 -0800)] 
drm/xe: Print GuC queue submission state on engine reset

Print the GuC queue submission state when an engine reset occurs, as
this provides clues about the cause of the reset.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/20251218224546.4057424-1-matthew.brost@intel.com
7 weeks agodrm/xe: Increase log level for unhandled page faults
Matthew Brost [Thu, 18 Dec 2025 22:37:45 +0000 (14:37 -0800)] 
drm/xe: Increase log level for unhandled page faults

Set the kernel log level for unhandled page faults to match the log
level (info) for engine resets. Currently, dmesg output can be confusing
because it shows an engine reset without indicating the page fault that
caused it. Without this change, the GuC log must be examined to
determine the source of the engine reset.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://patch.msgid.link/20251218223745.4045207-1-matthew.brost@intel.com
7 weeks agodrm/xe/xe_survivability: Add index bound check
Riana Tauro [Fri, 19 Dec 2025 10:52:27 +0000 (16:22 +0530)] 
drm/xe/xe_survivability: Add index bound check

Fix an issue reported by a static analysis tool. Add an index bound check
before accessing the info array to prevent an out-of-bounds access.
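
The general shape of such a check, as a standalone sketch. The array contents and the function name are hypothetical and do not reflect the survivability code itself.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical info array standing in for the one in the driver. */
static const uint32_t info[4] = { 0x11, 0x22, 0x33, 0x44 };

/* Validate the index before it is used to access the array. */
static int read_info(size_t index, uint32_t *value)
{
	if (index >= ARRAY_SIZE(info))
		return -EINVAL;

	*value = info[index];
	return 0;
}
```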

Fixes: f4e9fc967afd ("drm/xe/xe_survivability: Redesign survivability mode")
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/20251219105224.871930-6-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
7 weeks agodrm/xe/xe_survivability: Use static for survivability info attributes
Riana Tauro [Fri, 19 Dec 2025 10:52:26 +0000 (16:22 +0530)] 
drm/xe/xe_survivability: Use static for survivability info attributes

Fix sparse warnings. Use static for survivability info attributes.

Fixes: f4e9fc967afd ("drm/xe/xe_survivability: Redesign survivability mode")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202512101919.G12cuhBJ-lkp@intel.com/
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/20251219105224.871930-5-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
7 weeks agodrm/xe/pmu: Replace sprintf() with sysfs_emit()
Madhur Kumar [Sun, 14 Dec 2025 08:36:59 +0000 (14:06 +0530)] 
drm/xe/pmu: Replace sprintf() with sysfs_emit()

Replace sprintf() calls with sysfs_emit() to follow current kernel
coding standards.

sysfs_emit() is the preferred method for formatting sysfs output as it
provides better bounds checking and is more secure.
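
For readers unfamiliar with the difference, here is a userspace analogue of the idea. emit_bounded() is a hypothetical stand-in written for this example: like sysfs_emit() it caps the output at a fixed buffer size, whereas sprintf() writes as much as the format produces.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#define EMIT_BUF_SIZE 4096 /* stand-in for PAGE_SIZE, the size of a sysfs buffer */

/*
 * Format into buf without ever writing past EMIT_BUF_SIZE bytes.
 * Returns the number of characters stored, excluding the trailing NUL.
 */
static int emit_bounded(char *buf, const char *fmt, ...)
{
	va_list args;
	int len;

	va_start(args, fmt);
	len = vsnprintf(buf, EMIT_BUF_SIZE, fmt, args);
	va_end(args);

	if (len < 0)
		return 0;
	if (len >= EMIT_BUF_SIZE)
		len = EMIT_BUF_SIZE - 1;	/* output was truncated */

	return len;
}
```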

Signed-off-by: Madhur Kumar <madhurkumar004@gmail.com>
Link: https://patch.msgid.link/20251214083659.2412218-1-madhurkumar004@gmail.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Rodrigo adjusted commit message while pushing it]

7 weeks agoMerge drm/drm-next into drm-xe-next
Thomas Hellström [Fri, 19 Dec 2025 10:51:22 +0000 (11:51 +0100)] 
Merge drm/drm-next into drm-xe-next

Backmerging to bring in 6.19-rc1, both for an important upstream bugfix
and to help unblock PTL CI.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
8 weeks agodrm/xe: Fix documentation heading levels in xe_guc_pc.c
Swaraj Gaikwad [Tue, 9 Dec 2025 09:48:36 +0000 (09:48 +0000)] 
drm/xe: Fix documentation heading levels in xe_guc_pc.c

Sphinx reports htmldocs warnings:

  Documentation/gpu/xe/xe_firmware:31: ./drivers/gpu/drm/xe/xe_guc_pc.c:76: ERROR: A level 2 section cannot be used here.
  Documentation/gpu/xe/xe_firmware:31: ./drivers/gpu/drm/xe/xe_guc_pc.c:87: ERROR: A level 2 section cannot be used here.

The xe_guc_pc.c documentation is included inside xe_firmware.rst.
The headers in the C file currently use '=' underlines, which conflict with
the parent document's section levels.

Fix this by demoting "Frequency management" and "Render-C States" headers
from '=' to '-' to correctly nest them as subsections.

Build environment: Python 3.13.7 Sphinx 8.2.3 docutils 0.22.3

Signed-off-by: Swaraj Gaikwad <swarajgaikwad1925@gmail.com>
Link: https://patch.msgid.link/20251209094836.18589-1-swarajgaikwad1925@gmail.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
8 weeks agodrm/xe/xe_survivability: Remove unused index
Riana Tauro [Thu, 18 Dec 2025 10:51:53 +0000 (16:21 +0530)] 
drm/xe/xe_survivability: Remove unused index

Remove unused index variable and fix for loop.

Fixes: f4e9fc967afd ("drm/xe/xe_survivability: Redesign survivability mode")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Closes: https://lore.kernel.org/intel-xe/20251210075757.GA1206705@ax162/
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/20251218105151.586575-5-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
8 weeks agodrm/xe/nvm: enable cri platform
Alexander Usyskin [Tue, 16 Dec 2025 11:10:34 +0000 (13:10 +0200)] 
drm/xe/nvm: enable cri platform

Mark CRI as a platform that has the CSC NVM device.
Update the writable override flow to take the information from
the scratch register for CRI.

Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/20251216111034.3093507-1-alexander.usyskin@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
8 weeks agodrm/xe: Drop preempt-fences when destroying imported dma-bufs.
Thomas Hellström [Wed, 17 Dec 2025 09:34:41 +0000 (10:34 +0100)] 
drm/xe: Drop preempt-fences when destroying imported dma-bufs.

When imported dma-bufs are destroyed, TTM is not fully
individualizing the dma-resv, but it *is* copying the fences that
need to be waited for before declaring idle. So in the case where
the bo->resv != bo->_resv we can still drop the preempt-fences, but
make sure we do that on bo->_resv which contains the fence-pointer
copy.

In the case where the copying fails, bo->_resv will typically not
contain any fence pointers at all, so there will be nothing to
drop. In that case, TTM would have ensured all fences that would
have been copied are signaled, including any remaining preempt
fences.

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Fixes: fa0af721bd1f ("drm/ttm: test private resv obj on release/destroy")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org> # v6.16+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Tested-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251217093441.5073-1-thomas.hellstrom@linux.intel.com
8 weeks agodrm/xe/eustall: Disallow 0 EU stall property values
Ashutosh Dixit [Fri, 12 Dec 2025 06:18:50 +0000 (22:18 -0800)] 
drm/xe/eustall: Disallow 0 EU stall property values

An EU stall property value of 0 is invalid and will cause a NULL pointer
dereference (NPD).

Reported-by: Peter Senna Tschudin <peter.senna@linux.intel.com>
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6453
Fixes: 1537ec85ebd7 ("drm/xe/uapi: Introduce API for EU stall sampling")
Cc: stable@vger.kernel.org
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Harish Chegondi <harish.chegondi@intel.com>
Link: https://patch.msgid.link/20251212061850.1565459-4-ashutosh.dixit@intel.com
8 weeks agodrm/xe/oa: Disallow 0 OA property values
Ashutosh Dixit [Fri, 12 Dec 2025 06:18:49 +0000 (22:18 -0800)] 
drm/xe/oa: Disallow 0 OA property values

An OA property value of 0 is invalid and will cause a NULL pointer
dereference (NPD).
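
One way such a crash can arise, sketched under an assumption the commit message does not spell out: properties index a table of setter callbacks whose slot 0 is unpopulated, so calling through it dereferences NULL. All names below are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

struct params {
	uint64_t unit_id;
	uint64_t sample;
};

typedef int (*set_prop_fn)(struct params *p, uint64_t value);

static int set_unit_id(struct params *p, uint64_t value)
{
	p->unit_id = value;
	return 0;
}

static int set_sample(struct params *p, uint64_t value)
{
	p->sample = value;
	return 0;
}

/* Property ids start at 1 in this sketch, so slot 0 is implicitly NULL. */
static const set_prop_fn prop_funcs[] = {
	[1] = set_unit_id,
	[2] = set_sample,
};

static int set_property(struct params *p, uint64_t prop, uint64_t value)
{
	/* Reject 0 explicitly; without this, prop_funcs[0] would be called. */
	if (!prop || prop >= ARRAY_SIZE(prop_funcs))
		return -EINVAL;

	return prop_funcs[prop](p, value);
}
```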

Reported-by: Peter Senna Tschudin <peter.senna@linux.intel.com>
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6452
Fixes: cc4e6994d5a2 ("drm/xe/oa: Move functions up so they can be reused for config ioctl")
Cc: stable@vger.kernel.org
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Harish Chegondi <harish.chegondi@intel.com>
Link: https://patch.msgid.link/20251212061850.1565459-3-ashutosh.dixit@intel.com
8 weeks agodrm/xe/oa: Move default oa unit assignment earlier during stream open
Ashutosh Dixit [Fri, 12 Dec 2025 06:18:48 +0000 (22:18 -0800)] 
drm/xe/oa: Move default oa unit assignment earlier during stream open

De-referencing param.oa_unit, when an OA unit id is not provided during
stream open, results in NPD below.

  Oops: general protection fault, probably for non-canonical address...
  KASAN: null-ptr-deref in range...
  RIP: 0010:xe_oa_stream_open_ioctl+0x169/0x38a0
   xe_observation_ioctl+0x19f/0x270
   drm_ioctl_kernel+0x1f4/0x410

Fix this by moving default oa unit assignment before the dereference.

Reported-by: Peter Senna Tschudin <peter.senna@linux.intel.com>
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6840
Fixes: c7e269aa565f ("drm/xe/oa: Allow exec_queue's to be specified only for OAG OA unit")
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Harish Chegondi <harish.chegondi@intel.com>
Link: https://patch.msgid.link/20251212061850.1565459-2-ashutosh.dixit@intel.com
8 weeks agodrm/xe/pf: Add handling for MLRC adverse event threshold
Daniele Ceraolo Spurio [Tue, 16 Dec 2025 21:48:59 +0000 (22:48 +0100)] 
drm/xe/pf: Add handling for MLRC adverse event threshold

Since it is illegal to register a MLRC context when scheduler groups are
enabled, the GuC considers the VF doing so as an adverse event. Like for
other adverse events, there is a threshold for how many times the event
can happen before the GuC throws an error, which we need to add support
for.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Acked-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251216214902.1429-5-michal.wajdeczko@intel.com
8 weeks agodrm/xe/pf: Prepare for new threshold KLVs
Michal Wajdeczko [Tue, 16 Dec 2025 21:48:58 +0000 (22:48 +0100)] 
drm/xe/pf: Prepare for new threshold KLVs

We want to extend our macro-based KLV list definitions with new
information about the version from which a given KLV is supported.
Prepare our code generators to emit a dedicated version check if
a KLV was defined with the version information.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Acked-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251216214902.1429-4-michal.wajdeczko@intel.com
8 weeks agodrm/xe/guc: Introduce GUC_FIRMWARE_VER_AT_LEAST helper
Michal Wajdeczko [Tue, 16 Dec 2025 21:48:57 +0000 (22:48 +0100)] 
drm/xe/guc: Introduce GUC_FIRMWARE_VER_AT_LEAST helper

There are already a few places in the code where we need to check the GuC
firmware version. Wrap the existing raw conditions into a named helper
macro to make the intent clear and avoid explicit calls to MAKE_GUC_VER.
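
The idea can be sketched as follows. MAKE_VER, FW_VER_AT_LEAST and struct fw_info are hypothetical names for this example, and the 8-bit packing of minor and patch is an assumption; the actual helper in the driver may take different arguments.

```c
#include <assert.h>
#include <stdint.h>

/* Pack major.minor.patch into one comparable integer (minor, patch < 256). */
#define MAKE_VER(maj, min, pat) \
	(((uint32_t)(maj) << 16) | ((uint32_t)(min) << 8) | (uint32_t)(pat))

/* Hypothetical firmware descriptor holding the packed running version. */
struct fw_info {
	uint32_t ver;
};

/* Named helper: reads as a sentence at the call site, hides the packing. */
#define FW_VER_AT_LEAST(fw, maj, min, pat) \
	((fw)->ver >= MAKE_VER(maj, min, pat))
```

A call site then reads FW_VER_AT_LEAST(fw, 70, 53, 0) instead of open-coding the comparison against a packed version.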

Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Acked-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251216214902.1429-3-michal.wajdeczko@intel.com
8 weeks agodrm/xe: Introduce IF_ARGS macro utility
Michal Wajdeczko [Wed, 17 Dec 2025 22:40:18 +0000 (23:40 +0100)] 
drm/xe: Introduce IF_ARGS macro utility

We want to extend our macro-based KLV list definitions with new
information about the version from which a given KLV is supported.
Add a utility IF_ARGS macro that can be used in code generators to
emit different code based on the presence of additional arguments.

Introduce the macro itself and extend our kunit tests to cover it.
We will use this macro in the next patch.
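
To illustrate the concept only, here is one way to get this behaviour with __VA_OPT__ (C23, also accepted by recent GCC and Clang in other modes). This is not the implementation added by the patch, and the (then, otherwise, ...) argument order is an assumption.

```c
#include <assert.h>

/* Select the second argument of whatever list we are handed. */
#define PICK_SECOND(a, b, ...) b

/*
 * With extra arguments, __VA_OPT__ prepends "~, then," so "then" becomes
 * the second element. Without them, "otherwise" is the second element.
 * Neither "then" nor "otherwise" may contain top-level commas.
 */
#define IF_ARGS(then, otherwise, ...) \
	PICK_SECOND(__VA_OPT__(~, then,) ~, otherwise, ~)

/* A list entry without version information takes the "otherwise" branch. */
int klv_without_version(void)
{
	return IF_ARGS(1, 0);
}

/* A list entry carrying a version (70, 53, 0) takes the "then" branch. */
int klv_with_version(void)
{
	return IF_ARGS(1, 0, 70, 53, 0);
}
```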

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Acked-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251217224018.3490-1-michal.wajdeczko@intel.com
8 weeks agodrm/xe: Fix NULL pointer dereference in xe_exec_ioctl
Tapani Pälli [Wed, 17 Dec 2025 13:24:12 +0000 (15:24 +0200)] 
drm/xe: Fix NULL pointer dereference in xe_exec_ioctl

The helper function xe_sync_needs_wait() expects sync->fence to be set when
accessing its flags. Make sure we only call it when sync->fence exists.

v2: move null checking to xe_sync_needs_wait and make
    xe_sync_entry_wait utilize this helper (Matthew Auld)
v3: further simplify code (Matthew Auld)
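
The shape of the v2 approach, with the NULL check living inside the helper, can be sketched like this. The struct layouts and names are simplified stand-ins, not the driver's types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for a fence and a sync entry. */
struct fence {
	bool signaled;
};

struct sync_entry {
	struct fence *fence; /* may be NULL when no fence is attached */
};

/* The helper checks the pointer itself, so every caller is safe. */
static bool sync_needs_wait(const struct sync_entry *sync)
{
	return sync->fence && !sync->fence->signaled;
}
```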

Fixes NULL pointer dereference seen with Vulkan workloads:

[  118.410401] RIP: 0010:xe_sync_needs_wait+0x27/0x50 [xe]

Fixes: 4ac9048d0501 ("drm/xe: Wait on in-syncs when swicthing to dma-fence mode")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251217132412.435755-1-tapani.palli@intel.com
8 weeks agoMAINTAINERS: Update Xe driver maintainers
Rodrigo Vivi [Thu, 4 Dec 2025 19:34:04 +0000 (14:34 -0500)] 
MAINTAINERS: Update Xe driver maintainers

Add Matt Brost, one of the Xe driver creators, as maintainer.

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona.vetter@ffwll.ch>
Cc: dri-devel@lists.freedesktop.org
Cc: linux-kernel@vger.kernel.org
Acked-by: Simona Vetter <simona.vetter@ffwll.ch>
Acked-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/20251204193403.930328-2-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
8 weeks agodrm/xe/xe_sriov_vfio: Fix return value in xe_sriov_vfio_migration_supported()
Dan Carpenter [Fri, 5 Dec 2025 11:39:19 +0000 (14:39 +0300)] 
drm/xe/xe_sriov_vfio: Fix return value in xe_sriov_vfio_migration_supported()

The xe_sriov_vfio_migration_supported() function is type bool so
returning -EPERM means returning true.  Return false instead.
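
The conversion rule behind this bug is easy to reproduce in a standalone sketch. The function names are hypothetical; only the bool conversion is the point.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Buggy shape: any non-zero value, including -EPERM, converts to true. */
static bool migration_supported_buggy(bool allowed)
{
	if (!allowed)
		return -EPERM;

	return true;
}

/* Fixed shape: report "not supported" as false. */
static bool migration_supported_fixed(bool allowed)
{
	if (!allowed)
		return false;

	return true;
}
```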

Fixes: 17f22465c5a5 ("drm/xe/pf: Export helpers for VFIO")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/aTLEZ4g-FD-iMQ2V@stanley.mountain
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
8 weeks agodrm/xe/vf: fix return type in vf_migration_init_late()
Dan Carpenter [Fri, 5 Dec 2025 11:10:31 +0000 (14:10 +0300)] 
drm/xe/vf: fix return type in vf_migration_init_late()

The vf_migration_init_late() function is supposed to return zero on
success and negative error codes on failure.  The error code
eventually gets propagated back to the probe() function and returned.
The problem is it's declared as type bool so it returns true on
error.  Change it to type int instead.
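
A small sketch of how the error is lost on its way to probe(). The function names and the choice of -ENODEV are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Buggy shape: the negative errno collapses to true (1) on return. */
static bool init_late_buggy(bool fail)
{
	return fail ? -ENODEV : 0;
}

/* Fixed shape: the errno survives the return. */
static int init_late_fixed(bool fail)
{
	return fail ? -ENODEV : 0;
}

/* Caller in the style of probe(): passes on whatever it was given. */
static int probe_buggy(bool fail)
{
	int err = init_late_buggy(fail);

	return err; /* 1 on failure, which no "err < 0" check will catch */
}

static int probe_fixed(bool fail)
{
	return init_late_fixed(fail);
}
```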

Fixes: 2e2dab20dd66 ("drm/xe/vf: Enable VF migration only on supported GuC versions")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/aTK9pwJ_roc8vpDi@stanley.mountain
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
8 weeks agodrm/xe/oa: Always set OAG_OAGLBCTXCTRL_COUNTER_RESUME
Ashutosh Dixit [Fri, 5 Dec 2025 21:26:13 +0000 (13:26 -0800)] 
drm/xe/oa: Always set OAG_OAGLBCTXCTRL_COUNTER_RESUME

Reports can be written out to the OA buffer using ways other than periodic
sampling. These include mmio trigger and context switches. To support these
use cases, when periodic sampling is not enabled,
OAG_OAGLBCTXCTRL_COUNTER_RESUME must be set.

Fixes: 1db9a9dc90ae ("drm/xe/oa: OA stream initialization (OAG)")
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patch.msgid.link/20251205212613.826224-4-ashutosh.dixit@intel.com
8 weeks agodrm/xe/rtp: Whitelist OAMERT MMIO trigger registers
Ashutosh Dixit [Fri, 5 Dec 2025 21:26:12 +0000 (13:26 -0800)] 
drm/xe/rtp: Whitelist OAMERT MMIO trigger registers

Whitelist OAMERT registers to enable userspace to execute MMIO triggers on
OAMERT units. Registers are whitelisted for compute and copy class engines.

Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patch.msgid.link/20251205212613.826224-3-ashutosh.dixit@intel.com
8 weeks agodrm/xe/oa/uapi: Expose MERT OA unit
Ashutosh Dixit [Fri, 5 Dec 2025 21:26:11 +0000 (13:26 -0800)] 
drm/xe/oa/uapi: Expose MERT OA unit

A MERT OA unit is available in the SoC on some platforms. Add support
for this OA unit and expose it to userspace. The MERT OA unit does not
have any HW engines attached, but is otherwise similar to an OAM unit.

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patch.msgid.link/20251205212613.826224-2-ashutosh.dixit@intel.com
8 weeks agodrm/xe: Add more GT stats around pagefault mode switch flows
Matthew Brost [Fri, 12 Dec 2025 18:28:47 +0000 (10:28 -0800)] 
drm/xe: Add more GT stats around pagefault mode switch flows

Add GT stats to measure the time spent switching between pagefault mode
and dma-fence mode. Also add a GT stat to indicate when pagefault
suspend is skipped because the system is idle. These metrics will help
profile pagefault workloads while 3D and display are enabled.

v2:
 - Use GT stats helper functions (Francois)

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Link: https://patch.msgid.link/20251212182847.1683222-8-matthew.brost@intel.com
8 weeks agodrm/xe: Add GT stats ktime helpers
Matthew Brost [Fri, 12 Dec 2025 18:28:46 +0000 (10:28 -0800)] 
drm/xe: Add GT stats ktime helpers

Normalize GT stats that record execution periods in code paths by
adding helpers to perform the ktime calculation. Use these helpers in
the SVM code.

Suggested-by: Francois Dugast <francois.dugast@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/20251212182847.1683222-7-matthew.brost@intel.com
8 weeks agodrm/xe: Wait on in-syncs when swicthing to dma-fence mode
Matthew Brost [Fri, 12 Dec 2025 18:28:45 +0000 (10:28 -0800)] 
drm/xe: Wait on in-syncs when swicthing to dma-fence mode

If a dma-fence submission has in-fences and pagefault queues are running
work, there is little incentive to kick the pagefault queues off the
hardware until the dma-fence submission is ready to run. Therefore, wait
on the in-fences of the dma-fence submission before removing the
pagefault queues from the hardware.

v2:
 - Fix kernel doc (CI)
 - Don't wait under lock (Thomas)
 - Make wait interruptible

Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/20251212182847.1683222-6-matthew.brost@intel.com
8 weeks agodrm/xe: Skip exec queue schedule toggle if queue is idle during suspend
Matthew Brost [Fri, 12 Dec 2025 18:28:44 +0000 (10:28 -0800)] 
drm/xe: Skip exec queue schedule toggle if queue is idle during suspend

If an exec queue is idle, there is no need to issue a schedule disable
to the GuC when suspending the queue’s execution. Opportunistically skip
this step if the queue is idle and not a parallel queue. Parallel queues
must have their scheduling state flipped in the GuC due to limitations
in how submission is implemented in run_job().
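
The condition described above boils down to a simple predicate. The struct and function names are hypothetical and only capture the decision, not the GuC interaction.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical view of the two queue properties that matter here. */
struct exec_queue {
	bool idle;
	bool parallel;
};

/*
 * The schedule disable can only be skipped for a queue that is idle
 * and not parallel. Parallel queues always need the state flip.
 */
static bool suspend_needs_sched_disable(const struct exec_queue *q)
{
	return !(q->idle && !q->parallel);
}
```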

Also if all pagefault queues can skip the schedule disable during a
switch to dma-fence mode, do not schedule a resume for the pagefault
queues after the next submission.

v2:
 - Don't touch the LRC tail if queue is suspended but enabled in run_job
   (CI)

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/20251212182847.1683222-5-matthew.brost@intel.com
8 weeks agodrm/xe: Add debugfs knobs to control long running workload timeslicing
Matthew Brost [Fri, 12 Dec 2025 18:28:43 +0000 (10:28 -0800)] 
drm/xe: Add debugfs knobs to control long running workload timeslicing

Add debugfs knobs to control timeslicing for long-running workloads,
allowing quick tuning of values when running benchmarks.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/20251212182847.1683222-4-matthew.brost@intel.com
8 weeks agodrm/xe: Use usleep_range for accurate long-running workload timeslicing
Matthew Brost [Fri, 12 Dec 2025 18:28:42 +0000 (10:28 -0800)] 
drm/xe: Use usleep_range for accurate long-running workload timeslicing

msleep is not very accurate in terms of how long it actually sleeps,
whereas usleep_range is precise. Replace the timeslice sleep for
long-running workloads with the more accurate usleep_range to avoid
jitter if the sleep period is less than 20ms.

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: stable@vger.kernel.org
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/20251212182847.1683222-3-matthew.brost@intel.com
8 weeks agodrm/xe: Adjust long-running workload timeslices to reasonable values
Matthew Brost [Fri, 12 Dec 2025 18:28:41 +0000 (10:28 -0800)] 
drm/xe: Adjust long-running workload timeslices to reasonable values

A 10ms timeslice for long-running workloads is far too long and causes
significant jitter in benchmarks when the system is shared. Adjust the
value to 5ms for preempt-fencing VMs, as the resume step there is quite
costly as memory is moved around, and set it to zero for pagefault VMs,
since switching back to pagefault mode after dma-fence mode is
relatively fast.

Also change min_run_period_ms to 'unsigned int' type rather than 's64' as
only positive values make sense.

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: stable@vger.kernel.org
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/20251212182847.1683222-2-matthew.brost@intel.com
8 weeks agodrm/xe/oa: Limit num_syncs to prevent oversized allocations
Shuicheng Lin [Fri, 5 Dec 2025 23:47:18 +0000 (23:47 +0000)] 
drm/xe/oa: Limit num_syncs to prevent oversized allocations

The OA open parameters did not validate num_syncs, allowing
userspace to pass arbitrarily large values, potentially
leading to excessive allocations.

Add a check to ensure that num_syncs does not exceed DRM_XE_MAX_SYNCS,
returning -EINVAL when the limit is violated.
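
As a rough userspace illustration of the bounds check described above (not the driver code): check_num_syncs() is a hypothetical helper name, and the value 1024 is the DRM_XE_MAX_SYNCS limit quoted in the related exec/vm_bind commit.

```c
#include <errno.h>
#include <stdint.h>

/* Limit quoted in the exec/vm_bind commit message. */
#define DRM_XE_MAX_SYNCS 1024

/*
 * Hypothetical helper (not a real driver function): reject a
 * userspace-supplied sync count before any array is sized from it.
 */
static int check_num_syncs(uint32_t num_syncs)
{
	if (num_syncs > DRM_XE_MAX_SYNCS)
		return -EINVAL;
	return 0;
}
```

Doing the comparison before the allocation is the point of the fix: the count never reaches the allocator unless it is already known to be small.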

v2: use XE_IOCTL_DBG() and drop duplicated check. (Ashutosh)

Fixes: c8507a25cebd ("drm/xe/oa/uapi: Define and parse OA sync properties")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251205234715.2476561-6-shuicheng.lin@intel.com
8 weeks agodrm/xe: Limit num_syncs to prevent oversized allocations
Shuicheng Lin [Fri, 5 Dec 2025 23:47:17 +0000 (23:47 +0000)] 
drm/xe: Limit num_syncs to prevent oversized allocations

The exec and vm_bind ioctls allow userspace to specify an arbitrary
num_syncs value. Without bounds checking, a very large num_syncs
can force an excessively large allocation, leading to kernel warnings
from the page allocator as below.

Introduce DRM_XE_MAX_SYNCS (set to 1024) and reject any request
exceeding this limit.

"
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1217 at mm/page_alloc.c:5124 __alloc_frozen_pages_noprof+0x2f8/0x2180 mm/page_alloc.c:5124
...
Call Trace:
 <TASK>
 alloc_pages_mpol+0xe4/0x330 mm/mempolicy.c:2416
 ___kmalloc_large_node+0xd8/0x110 mm/slub.c:4317
 __kmalloc_large_node_noprof+0x18/0xe0 mm/slub.c:4348
 __do_kmalloc_node mm/slub.c:4364 [inline]
 __kmalloc_noprof+0x3d4/0x4b0 mm/slub.c:4388
 kmalloc_noprof include/linux/slab.h:909 [inline]
 kmalloc_array_noprof include/linux/slab.h:948 [inline]
 xe_exec_ioctl+0xa47/0x1e70 drivers/gpu/drm/xe/xe_exec.c:158
 drm_ioctl_kernel+0x1f1/0x3e0 drivers/gpu/drm/drm_ioctl.c:797
 drm_ioctl+0x5e7/0xc50 drivers/gpu/drm/drm_ioctl.c:894
 xe_drm_ioctl+0x10b/0x170 drivers/gpu/drm/xe/xe_device.c:224
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:598 [inline]
 __se_sys_ioctl fs/ioctl.c:584 [inline]
 __x64_sys_ioctl+0x18b/0x210 fs/ioctl.c:584
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xbb/0x380 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
...
"

v2: Add "Reported-by" and Cc stable kernels.
v3: Change XE_MAX_SYNCS from 64 to 1024. (Matt & Ashutosh)
v4: s/XE_MAX_SYNCS/DRM_XE_MAX_SYNCS/ (Matt)
v5: Do the check at the top of the exec func. (Matt)

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Reported-by: Koen Koning <koen.koning@intel.com>
Reported-by: Peter Senna Tschudin <peter.senna@linux.intel.com>
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6450
Cc: <stable@vger.kernel.org> # v6.12+
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Michal Mrozek <michal.mrozek@intel.com>
Cc: Carl Zhang <carl.zhang@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Ivan Briano <ivan.briano@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251205234715.2476561-5-shuicheng.lin@intel.com
8 weeks agodrm/xe/guc: Fix version check for page-reclaim feature
Michal Wajdeczko [Mon, 15 Dec 2025 17:04:33 +0000 (18:04 +0100)] 
drm/xe/guc: Fix version check for page-reclaim feature

Page reclamation interfaces were introduced in GuC firmware version
70.31.0 (which corresponds to GuC ABI version 1.14.0), but since this
feature is also available for the VFs and VFs don't know the firmware
version, use GuC compatibility version check instead.
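
A minimal sketch of such a compatibility-version comparison, assuming the 1.14.0 ABI version quoted above is the threshold. struct ver, ver_pack() and has_page_reclaim() are hypothetical names for illustration; the driver's own version types and macros may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical version triple; not the driver's own type. */
struct ver {
	uint32_t major, minor, patch;
};

/*
 * Pack a triple so that plain integer comparison orders versions.
 * Assumes minor and patch each fit in 8 bits.
 */
static uint32_t ver_pack(struct ver v)
{
	return (v.major << 16) | (v.minor << 8) | v.patch;
}

/* True when the compatibility version is at least 1.14.0. */
static bool has_page_reclaim(struct ver compat)
{
	const struct ver min = { 1, 14, 0 };

	return ver_pack(compat) >= ver_pack(min);
}
```

Because the check depends only on the compatibility version, the same test works on a VF that cannot see the firmware release number.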

Fixes: 77ebc7c10d16 ("drm/xe/guc: Add page reclamation interface to GuC")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Brian Nguyen <brian3.nguyen@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Brian Nguyen <brian3.nguyen@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251215170433.196398-1-michal.wajdeczko@intel.com
2 months agoLinux 6.19-rc1 v6.19-rc1
Linus Torvalds [Sun, 14 Dec 2025 04:05:07 +0000 (16:05 +1200)] 
Linux 6.19-rc1

2 months agoMerge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Sun, 14 Dec 2025 03:35:35 +0000 (15:35 +1200)] 
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "The only core fix is in doc; all the others are in drivers, with the
  biggest impacts in libsas being the rollback on error handling and in
  ufs coming from a couple of error handling fixes, one causing a crash
  if it's activated before scanning and the other fixing W-LUN
  resumption"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: ufs: qcom: Fix confusing cleanup.h syntax
  scsi: libsas: Add rollback handling when an error occurs
  scsi: device_handler: Return error pointer in scsi_dh_attached_handler_name()
  scsi: ufs: core: Fix a deadlock in the frequency scaling code
  scsi: ufs: core: Fix an error handler crash
  scsi: Revert "scsi: libsas: Fix exp-attached device scan after probe failure scanned in again after probe failed"
  scsi: ufs: core: Fix RPMB link error by reversing Kconfig dependencies
  scsi: qla4xxx: Use time conversion macros
  scsi: qla2xxx: Enable/disable IRQD_NO_BALANCING during reset
  scsi: ipr: Enable/disable IRQD_NO_BALANCING during reset
  scsi: imm: Fix use-after-free bug caused by unfinished delayed work
  scsi: target: sbp: Remove KMSG_COMPONENT macro
  scsi: core: Correct documentation for scsi_device_quiesce()
  scsi: mpi3mr: Prevent duplicate SAS/SATA device entries in channel 1
  scsi: target: Reset t_task_cdb pointer in error case
  scsi: ufs: core: Fix EH failure after W-LUN resume error

2 months agoMerge tag 'ceph-for-6.19-rc1' of https://github.com/ceph/ceph-client
Linus Torvalds [Sun, 14 Dec 2025 03:24:10 +0000 (15:24 +1200)] 
Merge tag 'ceph-for-6.19-rc1' of https://github.com/ceph/ceph-client

Pull ceph updates from Ilya Dryomov:
 "We have a patch that adds an initial set of tracepoints to the MDS
  client from Max, a fix that hardens osdmap parsing code from myself
  (marked for stable) and a few assorted fixups"

* tag 'ceph-for-6.19-rc1' of https://github.com/ceph/ceph-client:
  rbd: stop selecting CRC32, CRYPTO, and CRYPTO_AES
  ceph: stop selecting CRC32, CRYPTO, and CRYPTO_AES
  libceph: make decode_pool() more resilient against corrupted osdmaps
  libceph: Amend checking to fix `make W=1` build breakage
  ceph: Amend checking to fix `make W=1` build breakage
  ceph: add trace points to the MDS client
  libceph: fix log output race condition in OSD client

2 months agoMerge tag 'tomoyo-pr-20251212' of git://git.code.sf.net/p/tomoyo/tomoyo
Linus Torvalds [Sun, 14 Dec 2025 03:21:02 +0000 (15:21 +1200)] 
Merge tag 'tomoyo-pr-20251212' of git://git.code.sf.net/p/tomoyo/tomoyo

Pull tomoyo update from Tetsuo Handa:
 "Trivial optimization"

* tag 'tomoyo-pr-20251212' of git://git.code.sf.net/p/tomoyo/tomoyo:
  tomoyo: Use local kmap in tomoyo_dump_page()

2 months agoMerge tag 'smp-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 13 Dec 2025 18:12:46 +0000 (06:12 +1200)] 
Merge tag 'smp-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull CPU hotplug fix from Ingo Molnar:

 - Fix CPU hotplug callbacks to disable interrupts on UP kernels

* tag 'smp-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  cpu: Make atomic hotplug callbacks run with interrupts disabled on UP

2 months agoMerge tag 'perf-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 13 Dec 2025 18:10:35 +0000 (06:10 +1200)] 
Merge tag 'perf-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf event fixes from Ingo Molnar:

 - Fix NULL pointer dereference crash in the Intel PMU driver

 - Fix missing read event generation on task exit

 - Fix AMD uncore driver init error handling

 - Fix whitespace noise

* tag 'perf-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Fix NULL event dereference crash in handle_pmi_common()
  perf/core: Fix missing read event generation on task exit
  perf/x86/amd/uncore: Fix the return value of amd_uncore_df_event_init() on error
  perf/uprobes: Remove <space><Tab> whitespace noise

2 months agoMerge tag 'irq-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 13 Dec 2025 18:07:09 +0000 (06:07 +1200)] 
Merge tag 'irq-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Ingo Molnar:

 - Fix error code in the irqchip/mchp-eic driver

 - Fix setup_percpu_irq() affinity assumptions

 - Remove the unused irq_domain_add_tree() function

* tag 'irq-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/mchp-eic: Fix error code in mchp_eic_domain_alloc()
  irqdomain: Delete irq_domain_add_tree()
  genirq: Allow NULL affinity for setup_percpu_irq()

2 months agoMerge tag 'core-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 13 Dec 2025 18:04:16 +0000 (06:04 +1200)] 
Merge tag 'core-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc core fixes from Ingo Molnar:

 - Improve bug reporting

 - Suppress W=1 format warning

 - Improve rseq scalability on Clang builds

* tag 'core-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  rseq: Always inline rseq_debug_syscall_return()
  bug: Hush suggest-attribute=format for __warn_printf()
  bug: Let report_bug_entry() provide the correct bugaddr

2 months agoMerge tag 'mm-nonmm-stable-2025-12-11-11-47' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Sat, 13 Dec 2025 08:55:12 +0000 (20:55 +1200)] 
Merge tag 'mm-nonmm-stable-2025-12-11-11-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc updates from Andrew Morton:
 "There are no significant series in this small merge. Please see the
  individual changelogs for details"

[ Editor's note: it's mainly ocfs2 and a couple of random fixes ]

* tag 'mm-nonmm-stable-2025-12-11-11-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm: memfd_luo: add CONFIG_SHMEM dependency
  mm: shmem: avoid build warning for CONFIG_SHMEM=n
  ocfs2: fix memory leak in ocfs2_merge_rec_left()
  ocfs2: invalidate inode if i_mode is zero after block read
  ocfs2: avoid -Wflex-array-member-not-at-end warning
  ocfs2: convert remaining read-only checks to ocfs2_emergency_state
  ocfs2: add ocfs2_emergency_state helper and apply to setattr
  checkpatch: add uninitialized pointer with __free attribute check
  args: fix documentation to reflect the correct numbers
  ocfs2: fix kernel BUG in ocfs2_find_victim_chain
  liveupdate: luo_core: fix redundant bound check in luo_ioctl()
  ocfs2: validate inline xattr size and entry count in ocfs2_xattr_ibody_list
  fs/fat: remove unnecessary wrapper fat_max_cache()
  ocfs2: replace deprecated strcpy with strscpy
  ocfs2: check tl_used after reading it from truncate log inode
  liveupdate: luo_file: don't use invalid list iterator

2 months agoMerge tag 'mm-stable-2025-12-11-11-39' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 13 Dec 2025 08:35:41 +0000 (20:35 +1200)] 
Merge tag 'mm-stable-2025-12-11-11-39' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull more MM updates from Andrew Morton:

 - "powerpc/pseries/cmm: two smaller fixes" (David Hildenbrand)
   fixes a couple of minor things in ppc land

 - "Improve folio split related functions" (Zi Yan)
   some cleanups and minorish fixes in the folio splitting code

* tag 'mm-stable-2025-12-11-11-39' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm/damon/tests/core-kunit: avoid damos_test_commit stack warning
  mm: vmscan: correct nr_requested tracing in scan_folios
  MAINTAINERS: add idr core-api doc file to XARRAY
  mm/hugetlb: fix incorrect error return from hugetlb_reserve_pages()
  mm: fix CONFIG_STACK_GROWSUP typo in mm.h
  mm/huge_memory: fix folio split stats counting
  mm/huge_memory: make min_order_for_split() always return an order
  mm/huge_memory: replace can_split_folio() with direct refcount calculation
  mm/huge_memory: change folio_split_supported() to folio_check_splittable()
  mm/sparse: fix sparse_vmemmap_init_nid_early definition without CONFIG_SPARSEMEM
  powerpc/pseries/cmm: adjust BALLOON_MIGRATE when migrating pages
  powerpc/pseries/cmm: call balloon_devinfo_init() also without CONFIG_BALLOON_COMPACTION

2 months agofile: ensure cleanup
Christian Brauner [Sat, 13 Dec 2025 07:45:23 +0000 (08:45 +0100)] 
file: ensure cleanup

Brown paper bag time. This is a silly oversight where I forgot to drop
the error condition checking to ensure we clean up on early error
returns. I have an internal unit testset coming up for this which will
catch all such issues going forward.

Reported-by: Chris Mason <clm@fb.com>
Reported-by: Jeff Layton <jlayton@kernel.org>
Fixes: 011703a9acd7 ("file: add FD_{ADD,PREPARE}()")
Signed-off-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 months agox86/hv: Add gitignore entry for generated header file
Linus Torvalds [Sat, 13 Dec 2025 07:57:41 +0000 (19:57 +1200)] 
x86/hv: Add gitignore entry for generated header file

Commit 7bfe3b8ea6e3 ("Drivers: hv: Introduce mshv_vtl driver") added a
new generated header file for the offsets into the mshv_vtl_cpu_context
structure to be used by the low-level assembly code.  But it didn't add
the .gitignore file to go with it, so 'git status' and friends will
mention it.

Let's add the gitignore file before somebody thinks that generated
header should be committed.

Fixes: 7bfe3b8ea6e3 ("Drivers: hv: Introduce mshv_vtl driver")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 months agoMerge tag 'drm-fixes-2025-12-13' of https://gitlab.freedesktop.org/drm/kernel
Linus Torvalds [Sat, 13 Dec 2025 05:39:28 +0000 (17:39 +1200)] 
Merge tag 'drm-fixes-2025-12-13' of https://gitlab.freedesktop.org/drm/kernel

Pull more drm fixes from Dave Airlie:
 "These are the enqueued fixes that ended up in our fixes branch,
  nouveau mostly, along with some small fixes in other places.

  plane:
   - Handle IS_ERR vs NULL in drm_plane_create_hotspot_properties()

  ttm:
   - fix devcoredump for evicted bos

  panel:
   - Fix stack usage warning in novatek-nt35560

  nouveau:
   - alloc fwsec sb at boot to avoid s/r problems
   - fix strcpy usage
   - fix i2c encoder crash

  bridge:
   - Ignore spurious PLL_UNLOCK bit in ti-sn65dsi83

  mgag200:
   - Fix bigendian handling in mgag200

  tilcdc:
   - Fix probe failure in tilcdc"

* tag 'drm-fixes-2025-12-13' of https://gitlab.freedesktop.org/drm/kernel:
  drm/mgag200: Fix big-endian support
  drm/tilcdc: Fix removal actions in case of failed probe
  drm/ttm: Avoid NULL pointer deref for evicted BOs
  drm: nouveau: Replace sprintf() with sysfs_emit()
  drm/nouveau: fix circular dep oops from vendored i2c encoder
  drm/nouveau: refactor deprecated strcpy
  drm/plane: Fix IS_ERR() vs NULL check in drm_plane_create_hotspot_properties()
  drm/bridge: ti-sn65dsi83: ignore PLL_UNLOCK errors
  drm/nouveau/gsp: Allocate fwsec-sb at boot
  drm/panel: novatek-nt35560: avoid on-stack device structure

2 months agoMerge tag 'drm-next-2025-12-13' of https://gitlab.freedesktop.org/drm/kernel
Linus Torvalds [Sat, 13 Dec 2025 05:25:26 +0000 (17:25 +1200)] 
Merge tag 'drm-next-2025-12-13' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
 "This is the weekly fixes for what is in next tree, mostly amdgpu and
  some i915, panthor and a core revert.

  core:
   - revert dumb bo 8 byte alignment

  amdgpu:
   - SI fix
   - DC reduce stack usage
   - HDMI fixes
   - VCN 4.0.5 fix
   - DP MST fix
   - DC memory allocation fix

  amdkfd:
   - SVM fix
   - Trap handler fix
   - VGPR fixes for GC 11.5

  i915:
   - Fix format string truncation warning
   - Fix runtime PM reference during fbdev BO creation

  panthor:
   - fix UAF

  renesas:
   - fix sync flag handling"

* tag 'drm-next-2025-12-13' of https://gitlab.freedesktop.org/drm/kernel:
  Revert "drm/amd/display: Fix pbn to kbps Conversion"
  drm/amd: Fix unbind/rebind for VCN 4.0.5
  drm/i915: Fix format string truncation warning
  drm/i915/fbdev: Hold runtime PM ref during fbdev BO creation
  drm/amd/display: Improve HDMI info retrieval
  drm/amdkfd: bump minimum vgpr size for gfx1151
  drm/amd/display: shrink struct members
  drm/amdkfd: Export the cwsr_size and ctl_stack_size to userspace
  drm/amd/display: Refactor dml_core_mode_support to reduce stack frame
  drm/amdgpu: don't attach the tlb fence for SI
  drm/amd/display: Use GFP_ATOMIC in dc_create_plane_state()
  drm/amdkfd: Trap handler support for expert scheduling mode
  drm/amdkfd: Use huge page size to check split svm range alignment
  drm/rcar-du: dsi: Handle both DRM_MODE_FLAG_N.SYNC and !DRM_MODE_FLAG_P.SYNC
  drm/gem-shmem: revert the 8-byte alignment constraint
  drm/gem-dma: revert the 8-byte alignment constraint
  drm/panthor: Prevent potential UAF in group creation

2 months agodrm/xe/lnl: Drop pre-production workaround support
Matt Roper [Fri, 12 Dec 2025 18:14:13 +0000 (10:14 -0800)] 
drm/xe/lnl: Drop pre-production workaround support

LNL has been out long enough that all of our internal usage of
pre-production hardware has been phased out and we no longer need to
maintain workarounds that were exclusive to pre-production parts.

Production LNL hardware always has B0 or later steppings for both
graphics and media IP.  Eliminate all workarounds that were exclusive to
A-step hardware and set the 'has_prod_wa_only' device flag for LNL to
make sure we warn and taint if someone tries to load the driver on an
old pre-production part.

Bspec: 70821
Reviewed-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com>
Link: https://patch.msgid.link/20251212181411.294854-4-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months agodrm/xe: Track pre-production workaround support
Matt Roper [Fri, 12 Dec 2025 18:14:12 +0000 (10:14 -0800)] 
drm/xe: Track pre-production workaround support

When we're initially enabling driver support for a new platform/IP, we
usually implement all workarounds documented in the WA database in the
driver.  Many of those workarounds are restricted to early steppings
that only showed up in pre-production hardware (i.e., internal test
chips that are not available to the general public).  Since the
workarounds for early, pre-production steppings tend to be some of the
ugliest and most complicated workarounds, we generally want to eliminate
them and simplify the code once the platform has launched and our
internal usage of those pre-production parts has been phased out.

Let's add a flag to the device info that tracks which platforms still
have support for pre-production workarounds so that we can print a
warning and taint if someone tries to load the driver on a
pre-production part for a platform without pre-production workarounds.
This will help our internal users understand the likely problems they'll
encounter if they try to load the driver on an old pre-production
device.

The Xe behavior here is similar to what we've done for many years on
i915 (see intel_detect_preproduction_hw()), except that instead of
manually coding up ranges of device steppings that we believe to be
pre-production hardware, Xe will use the hardware's own production vs
pre-production fusing status, which we can read from the FUSE2 register.
This fuse didn't exist on older Intel hardware, but should be present on
all platforms supported by the Xe driver.

Going forward, let's set the expectation that we'll start looking into
removing pre-production workarounds for a platform around the time that
platforms of the next major IP stepping are having their force_probe
requirement lifted.  This timing is just a rough guideline; there may be
cases where some instances of pre-production parts are still being
actively used in CI farms, internal device pools, etc. and we'll need to
wait a bit longer for those to be swapped out.

v2:
 - Fix inverted forcewake check

v3:
 - Invert flag and add it to the platforms on which we still have
   pre-prod workarounds.  (Jani, Lucas)

v4:
 - Avoid checking pre-production on VF since they don't have access to
   the FUSE2 register.

Bspec: 78271, 52544
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patch.msgid.link/20251212181411.294854-3-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months agoMerge tag 'i3c/for-6.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux
Linus Torvalds [Sat, 13 Dec 2025 05:15:16 +0000 (17:15 +1200)] 
Merge tag 'i3c/for-6.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux

Pull further i3c update from Alexandre Belloni:
 "We are removing a legacy API callback and having this sooner rather
  than later will help ensure no one introduces a new driver using it.

  I've also added patches removing the "__free(...) = NULL" pattern
  because I'm sure we won't avoid people sending those following the
  mailing list discussion..."

* tag 'i3c/for-6.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux:
  i3c: adi: Fix confusing cleanup.h syntax
  i3c: master: Fix confusing cleanup.h syntax
  i3c: master: cleanup callback .priv_xfers()
  i3c: master: switch to use new callback .i3c_xfers() from .priv_xfers()