]> git.ipfire.org Git - thirdparty/kernel/linux.git/log
thirdparty/kernel/linux.git
2 months agodrm/xe: Add VM.uapi_flags to VM snapshot capture
Matthew Brost [Wed, 26 Nov 2025 18:59:49 +0000 (10:59 -0800)] 
drm/xe: Add VM.uapi_flags to VM snapshot capture

Add VM.uapi_flags to VM snapshot capture VM snapshot capture. This is
useful information for debug and will help build a robust GPU hang
replay tool.

The current format is:

VM.uapi_flags: 0x%x

Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patch.msgid.link/20251126185952.546277-7-matthew.brost@intel.com
2 months agodrm/xe: Add cpu_caching to properties line in VM snapshot capture
Matthew Brost [Wed, 26 Nov 2025 18:59:48 +0000 (10:59 -0800)] 
drm/xe: Add cpu_caching to properties line in VM snapshot capture

Add CPU caching to properties line in VM snapshot capture indicating
the BO caching properites. This is useful information for debug and
will help build a robust GPU hang replay tool.

The current format is:

[<vma address>]: <permissions>|<type>|mem_region=0x%x|pat_index=%d|cpu_caching=%d

Permissions has two options, either "read_only" or "read_write".

Type has three options, either "userptr", "null_sparse", or "bo".

Memory region is a bit mask of where the memory is located.

Pat index corresponds to the value setup upon VM bind.

CPU caching corresponds to the value of BO setup upon creation.

v2:
 - Save off cpu_caching value rather than looking at BO (Carlos)
v4:
 - Fix NULL ptr dereference (Carlos)

Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patch.msgid.link/20251126185952.546277-6-matthew.brost@intel.com
2 months agodrm/xe: Add pat_index to properties line in VM snapshot capture
Matthew Brost [Wed, 26 Nov 2025 18:59:47 +0000 (10:59 -0800)] 
drm/xe: Add pat_index to properties line in VM snapshot capture

Add pat index to properties line in VM snapshot capture indicating
the VMA caching properites. This is useful information for debug and
will help build a robust GPU hang replay tool.

The current format is:

[<vma address>]: <permissions>|<type>|mem_region=0x%x|pat_index=%d

Permissions has two options, either "read_only" or "read_write".

Type has three options, either "userptr", "null_sparse", or "bo".

Memory region is a bit mask of where the memory is located.

Pat index corresponds to the value setup upon VM bind.

Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patch.msgid.link/20251126185952.546277-5-matthew.brost@intel.com
2 months agodrm/xe: Add mem_region to properties line in VM snapshot capture
Matthew Brost [Wed, 26 Nov 2025 18:59:46 +0000 (10:59 -0800)] 
drm/xe: Add mem_region to properties line in VM snapshot capture

Add memory region to properties line in VM snapshot capture indicating
where the memory is located. The memory region corresponds to regions in
the uAPI. This is useful information for debug and will help build a
robust GPU hang replay tool.

The current format is:

[<vma address>]: <permissions>|<type>|mem_region=0x%x

Permissions has two options, either "read_only" or "read_write".

Type has three options, either "userptr", "null_sparse", or "bo".

Memory region is a bit mask of where the memory is located.

Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patch.msgid.link/20251126185952.546277-4-matthew.brost@intel.com
2 months agodrm/xe: Add "null_sparse" type to VM snap properties
Matthew Brost [Wed, 26 Nov 2025 18:59:45 +0000 (10:59 -0800)] 
drm/xe: Add "null_sparse" type to VM snap properties

Add "null_sparse" type to VM snap properties indicating the VMA reads
zero and writes are droppped. This is useful information for debug and
will help build a robust GPU hang replay tool.

The current format is:

[<vma address>]: <permissions>|<type>

Permissions has two options, either "read_only" or "read_write".

Type has three options, either "userptr", "null_sparse", or "bo".

Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patch.msgid.link/20251126185952.546277-3-matthew.brost@intel.com
2 months agodrm/xe: Add properties line to VM snapshot capture
Matthew Brost [Wed, 26 Nov 2025 18:59:44 +0000 (10:59 -0800)] 
drm/xe: Add properties line to VM snapshot capture

Add properties line to VM snapshot capture which includes additional
information about VMA being dumped. This is helpful for debug purposes
but also to build a robust GPU hang replay tool.

The current format is:

[<vma address>]: <permissions>|<type>

Permissions has two options, either "read_only" or "read_write".

Type has two options, either "userptr" or "bo".

Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patch.msgid.link/20251126185952.546277-2-matthew.brost@intel.com
2 months agodrm/xe: Apply Wa_14020316580 in xe_gt_idle_enable_pg()
Vinay Belgaumkar [Sat, 29 Nov 2025 05:25:48 +0000 (21:25 -0800)] 
drm/xe: Apply Wa_14020316580 in xe_gt_idle_enable_pg()

Wa_14020316580 was getting clobbered by power gating init code
later in the driver load sequence. Move the Wa so that
it applies correctly.

Fixes: 7cd05ef89c9d ("drm/xe/xe2hpm: Add initial set of workarounds")
Suggested-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patch.msgid.link/20251129052548.70766-1-vinay.belgaumkar@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months agodrm/xe: Fix freq kobject leak on sysfs_create_files failure
Shuicheng Lin [Fri, 14 Nov 2025 20:56:39 +0000 (20:56 +0000)] 
drm/xe: Fix freq kobject leak on sysfs_create_files failure

Ensure gt->freq is released when sysfs_create_files() fails
in xe_gt_freq_init(). Without this, the kobject would leak.
Add kobject_put() before returning the error.

Fixes: fdc81c43f0c1 ("drm/xe: use devm_add_action_or_reset() helper")
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Alex Zuo <alex.zuo@intel.com>
Reviewed-by: Xin Wang <x.wang@intel.com>
Link: https://patch.msgid.link/20251114205638.2184529-2-shuicheng.lin@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months agodrm/xe/xe3_lpg: Apply Wa_16028005424
Balasubramani Vivekanandan [Fri, 21 Nov 2025 10:08:23 +0000 (15:38 +0530)] 
drm/xe/xe3_lpg: Apply Wa_16028005424

Applied Wa_16028005424 to Graphics version from 30.00 to 30.05

Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://patch.msgid.link/20251121100822.20076-2-balasubramani.vivekanandan@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months agovfio/xe: Add device specific vfio_pci driver variant for Intel graphics
Michał Winiarski [Thu, 27 Nov 2025 09:39:34 +0000 (10:39 +0100)] 
vfio/xe: Add device specific vfio_pci driver variant for Intel graphics

In addition to generic VFIO PCI functionality, the driver implements
VFIO migration uAPI, allowing userspace to enable migration for Intel
Graphics SR-IOV Virtual Functions.
The driver binds to VF device and uses API exposed by Xe driver to
transfer the VF migration data under the control of PF device.

Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Alex Williamson <alex@shazbot.org>
Link: https://patch.msgid.link/20251127093934.1462188-5-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
2 months agodrm/xe/pf: Export helpers for VFIO
Michał Winiarski [Thu, 27 Nov 2025 09:39:33 +0000 (10:39 +0100)] 
drm/xe/pf: Export helpers for VFIO

Device specific VFIO driver variant for Xe will implement VF migration.
Export everything that's needed for migration ops.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251127093934.1462188-4-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
2 months agodrm/xe/pci: Introduce a helper to allow VF access to PF xe_device
Michał Winiarski [Thu, 27 Nov 2025 09:39:32 +0000 (10:39 +0100)] 
drm/xe/pci: Introduce a helper to allow VF access to PF xe_device

In certain scenarios (such as VF migration), VF driver needs to interact
with PF driver.
Add a helper to allow VF driver access to PF xe_device.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251127093934.1462188-3-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
2 months agodrm/xe/pf: Enable SR-IOV VF migration
Michał Winiarski [Thu, 27 Nov 2025 09:39:31 +0000 (10:39 +0100)] 
drm/xe/pf: Enable SR-IOV VF migration

All of the necessary building blocks are now in place to support SR-IOV
VF migration.
Flip the enable/disable logic to match VF code and disable the feature
only for platforms that don't meet the necessary prerequisites.
To allow more testing and experiments, on DEBUG builds any missing
prerequisites will be ignored.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251127093934.1462188-2-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
2 months agodrm/xe/gt: Introduce runtime suspend/resume
Raag Jadav [Thu, 30 Oct 2025 12:23:57 +0000 (17:53 +0530)] 
drm/xe/gt: Introduce runtime suspend/resume

If power state is retained between suspend/resume cycle, we don't need
to perform full GT re-initialization. Introduce runtime helpers for GT
which greatly reduce suspend/resume delay.

v2: Drop redundant xe_gt_sanitize() and xe_guc_ct_stop() (Daniele)
    Use runtime naming for guc helpers (Daniele)
v3: Drop redundant logging, add kernel doc (Michal)
    Use runtime naming for ct helpers (Michal)
v4: Fix tags (Rodrigo)
v5: Include host_l2_vram workaround (Daniele)
    Reuse xe_guc_submit_enable/disable() helpers (Daniele)

Co-developed-by: Riana Tauro <riana.tauro@intel.com>
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Acked-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patch.msgid.link/20251030122357.128825-5-raag.jadav@intel.com
2 months agodrm/xe/pm: Assert on runtime suspend if VFs are enabled
Raag Jadav [Thu, 30 Oct 2025 12:23:56 +0000 (17:53 +0530)] 
drm/xe/pm: Assert on runtime suspend if VFs are enabled

We hold an additional reference to the runtime PM to keep PF in D0
during VFs lifetime, as our VFs do not implement the PM capability.
This means we should never be runtime suspending as long as VFs are
enabled.

v8: Add !IS_SRIOV_VF() assert (Matthew Brost)

Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patch.msgid.link/20251030122357.128825-4-raag.jadav@intel.com
2 months agodrm/xe/guc_submit: Introduce pause/unpause() helpers for PF
Raag Jadav [Thu, 30 Oct 2025 12:23:55 +0000 (17:53 +0530)] 
drm/xe/guc_submit: Introduce pause/unpause() helpers for PF

Introduce pause/unpause() helpers which stop/start further runs of
submission tasks on given GuC and can be called from PF context. This
is in preparation of usecases where we simply need to stop/start the
scheduler without losing GuC state and don't require dealing with VF
migration.

v7: Reword commit message (Matthew Brost)

Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patch.msgid.link/20251030122357.128825-3-raag.jadav@intel.com
2 months agodrm/xe/vf: Update pause/unpause() helpers with VF naming
Raag Jadav [Thu, 30 Oct 2025 12:23:54 +0000 (17:53 +0530)] 
drm/xe/vf: Update pause/unpause() helpers with VF naming

Now that pause/unpause() helpers have been updated for VF migration
usecase, update their naming to match the functionality and while at it,
add IS_SRIOV_VF() assert to make sure they are not abused.

v7: Add IS_SRIOV_VF() assert (Matthew Brost)
    Use "vf" suffix (Michal)

Suggested-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patch.msgid.link/20251030122357.128825-2-raag.jadav@intel.com
2 months agodrm/xe: Move VRAM MM debugfs creation to tile level
Piotr Piórkowski [Thu, 27 Nov 2025 07:36:43 +0000 (08:36 +0100)] 
drm/xe: Move VRAM MM debugfs creation to tile level

Previously, VRAM TTM resource manager debugfs entries (vram0_mm / vram1_mm)
were created globally in the XE debugfs root directory. But technically,
each tile has an associated VRAM TTM manager, which it can own.
Let's create VRAM memory manager debugfs entries directly under each tile's
debugfs directory for better alignment with the per-tile memory layout.

Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://patch.msgid.link/20251127073643.144379-1-piotr.piorkowski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
2 months agodrm/xe/xe_sriov_packet: Return int from pf_descriptor_init
Jonathan Cavitt [Mon, 17 Nov 2025 19:01:15 +0000 (19:01 +0000)] 
drm/xe/xe_sriov_packet: Return int from pf_descriptor_init

pf_descriptor_init currently returns a size_t, which is an unsigned
integer data type.  This conflicts with it returning a negative errno
value on failure.

Make it return an int instead.  This mirrors how pf_trailer_init is used
later.

Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Alex Zuo <alex.zuo@intel.com>
Link: https://patch.msgid.link/20251117190114.69953-2-jonathan.cavitt@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
2 months agodrm/xe: Covert return of -EBUSY to -ENOMEM in VM bind IOCTL
Matthew Brost [Sat, 22 Nov 2025 01:25:02 +0000 (17:25 -0800)] 
drm/xe: Covert return of -EBUSY to -ENOMEM in VM bind IOCTL

xe_vma_userptr_pin_pages can return -EBUSY but -EBUSY has special
meaning in VM bind IOCTLs that user fence is pending that is attached to
the VMA. Convert -EBUSY to -ENOMEM in this case as -EBUSY in practice
means we are low or out of memory.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Link: https://patch.msgid.link/20251122012502.382587-2-matthew.brost@intel.com
2 months agodrm/gpusvm: Limit the number of retries in drm_gpusvm_get_pages
Matthew Brost [Sat, 22 Nov 2025 01:25:01 +0000 (17:25 -0800)] 
drm/gpusvm: Limit the number of retries in drm_gpusvm_get_pages

drm_gpusvm_get_pages should not be allowed to retry forever, cap the
time spent in the function to HMM_RANGE_DEFAULT_TIMEOUT has this is
essentially a wrapper around hmm_range_fault.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patch.msgid.link/20251122012502.382587-1-matthew.brost@intel.com
2 months agodrm/xe: Add caching pagetable flag
Zbigniew Kempczyński [Tue, 25 Nov 2025 15:37:33 +0000 (16:37 +0100)] 
drm/xe: Add caching pagetable flag

Introduce device xe_caching_pt flag to selectively turn it on for
supported platforms. It allows to eliminate version check and enable
this feature for the future platforms.

Signed-off-by: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patch.msgid.link/20251125153732.400766-2-zbigniew.kempczynski@intel.com
2 months agodrm/xe/vm: Skip ufence association for CPU address mirror VMA during MAP
Himal Prasad Ghimiray [Tue, 25 Nov 2025 07:56:28 +0000 (13:26 +0530)] 
drm/xe/vm: Skip ufence association for CPU address mirror VMA during MAP

The MAP operation for a CPU address mirror VMA does not require ufence
association because such mappings are not GPU-synchronized and do not
participate in GPU job completion signaling.

Remove the unnecessary ufence addition for this case to avoid -EBUSY
failure in check_ufence of unbind ops.

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251125075628.1182481-6-himal.prasad.ghimiray@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
2 months agodrm/xe/svm: Enable UNMAP for VMA merging operations
Himal Prasad Ghimiray [Tue, 25 Nov 2025 07:56:27 +0000 (13:26 +0530)] 
drm/xe/svm: Enable UNMAP for VMA merging operations

ALLOW UNMAP of VMAs associated with SVM mappings when the MAP operation
is intended to merge adjacent CPU_ADDR_MIRROR VMAs.

v2
- Remove mapping exist check in garbage collector

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251125075628.1182481-5-himal.prasad.ghimiray@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
2 months agodrm/xe/svm: Extend MAP range to reduce vma fragmentation
Himal Prasad Ghimiray [Tue, 25 Nov 2025 07:56:26 +0000 (13:26 +0530)] 
drm/xe/svm: Extend MAP range to reduce vma fragmentation

When DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR is set during VM_BIND_OP_MAP,
the mapping logic now checks adjacent cpu_addr_mirror VMAs with default
attributes and expands the mapping range accordingly. This ensures that
bo_unmap operations ideally target the same area and helps reduce
fragmentation by coalescing nearby compatible VMAs into a single mapping.

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251125075628.1182481-4-himal.prasad.ghimiray@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
2 months agodrm/xe: Merge adjacent default-attribute VMAs during garbage collection
Himal Prasad Ghimiray [Tue, 25 Nov 2025 07:56:25 +0000 (13:26 +0530)] 
drm/xe: Merge adjacent default-attribute VMAs during garbage collection

While restoring default memory attributes for VMAs during garbage
collection, extend the target range by checking neighboring VMAs. If
adjacent VMAs are CPU-address-mirrored and have default attributes,
include them in the mergeable range to reduce fragmentation and improve
VMA reuse.

v2
-Rebase

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251125075628.1182481-3-himal.prasad.ghimiray@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
2 months agodrm/xe: Add helper to extend CPU-mirrored VMA range for merge
Himal Prasad Ghimiray [Tue, 25 Nov 2025 07:56:24 +0000 (13:26 +0530)] 
drm/xe: Add helper to extend CPU-mirrored VMA range for merge

Introduce xe_vm_find_cpu_addr_mirror_vma_range(), which computes an
extended range around a given range by including adjacent VMAs that are
CPU-address-mirrored and have default memory attributes. This helper is
useful for determining mergeable range without performing the actual merge.

v2
- Add assert
- Move unmap check to this patch

v3
- Decrease offset to check by SZ_4K to avoid wrong vma return in fast
  lookup path

v4
- *start should be >= SZ_4K (Matt)

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251125075628.1182481-2-himal.prasad.ghimiray@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
2 months agodrm/xe: Protect against unset LRC when pausing submissions
Tomasz Lis [Mon, 24 Nov 2025 22:28:53 +0000 (23:28 +0100)] 
drm/xe: Protect against unset LRC when pausing submissions

While pausing submissions, it is possible to encouner an exec queue
which is during creation, and therefore doesn't have a valid xe_lrc
struct reference.

Protect agains such situation, by checking for NULL before access.

Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Fixes: c25c1010df88 ("drm/xe/vf: Replay GuC submission state on pause / unpause")
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251124222853.1900800-1-tomasz.lis@intel.com
2 months agodrm/xe/vf: Start re-emission from first unsignaled job during VF migration
Matthew Brost [Fri, 21 Nov 2025 15:27:50 +0000 (07:27 -0800)] 
drm/xe/vf: Start re-emission from first unsignaled job during VF migration

The LRC software ring tail is reset to the first unsignaled pending
job's head.

Fix the re-emission logic to begin submitting from the first unsignaled
job detected, rather than scanning all pending jobs, which can cause
imbalance.

v2:
 - Include missing local changes
v3:
 - s/skip_replay/restore_replay (Tomasz)

Fixes: c25c1010df88 ("drm/xe/vf: Replay GuC submission state on pause / unpause")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Tomasz Lis <tomasz.lis@intel.com>
Link: https://patch.msgid.link/20251121152750.240557-1-matthew.brost@intel.com
2 months agodrm/xe/pf: Handle MERT catastrophic errors
Lukasz Laguna [Mon, 24 Nov 2025 19:02:37 +0000 (20:02 +0100)] 
drm/xe/pf: Handle MERT catastrophic errors

The MERT block triggers an interrupt when a catastrophic error occurs.
Update the interrupt handler to read the MERT catastrophic error type
and log appropriate debug message.

Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251124190237.20503-5-lukasz.laguna@intel.com
2 months agodrm/xe/pf: Add TLB invalidation support for MERT
Lukasz Laguna [Mon, 24 Nov 2025 19:02:36 +0000 (20:02 +0100)] 
drm/xe/pf: Add TLB invalidation support for MERT

Add support for triggering and handling MERT TLB invalidation. After
LMTT updates, the MERT TLB invalidation is initiated to ensure memory
translations remain coherent.

Completion of the invalidation is signaled via MERT interrupt (bit 13 in
the GFX master interrupt register). Detect and handle this interrupt to
properly synchronize the invalidation flow.

Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251124190237.20503-4-lukasz.laguna@intel.com
2 months agodrm/xe/pf: Configure LMTT in MERT
Lukasz Laguna [Mon, 24 Nov 2025 19:02:35 +0000 (20:02 +0100)] 
drm/xe/pf: Configure LMTT in MERT

On platforms with standalone MERT, the PF driver needs to program LMTT
in MERT's LMEM_CFG register.

Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251124190237.20503-3-lukasz.laguna@intel.com
2 months agodrm/xe: Add device flag to indicate standalone MERT
Lukasz Laguna [Mon, 24 Nov 2025 19:02:34 +0000 (20:02 +0100)] 
drm/xe: Add device flag to indicate standalone MERT

The MERT subsystem manages memory accesses between host and device. On
the Crescent Island platform, it requires direct management by the
driver.

Introduce a device flag and corresponding helpers to identify platforms
with standalone MERT, enabling proper initialization and handling.

Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251124190237.20503-2-lukasz.laguna@intel.com
2 months agodrm/xe/uc: Change assertion to error on huc authentication failure
Zhanjun Dong [Mon, 27 Oct 2025 21:42:12 +0000 (17:42 -0400)] 
drm/xe/uc: Change assertion to error on huc authentication failure

The fault injection test can cause the xe_huc_auth function to fail.
This is an intentional failure, so in this scenario we don't want to
throw an assert and taint the kernel, because that will impact CI
execution.

Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patch.msgid.link/20251027214212.2856903-1-zhanjun.dong@intel.com
2 months agodrm/xe/guc: Cleanup GuC log buffer macros and helpers
Zhanjun Dong [Wed, 5 Nov 2025 23:31:43 +0000 (18:31 -0500)] 
drm/xe/guc: Cleanup GuC log buffer macros and helpers

Cleanup GuC log buffer macros and helpers, add Xe style macro prefix.
Update buffer type values to align with the GuC specification
Update buffer offset calculation.
Remove helper functions, replaced with macros.

Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patch.msgid.link/20251105233143.1168759-1-zhanjun.dong@intel.com
2 months agodrm/xe/pf: Fix .bulk_profile/sched_priority description
Michal Wajdeczko [Sat, 15 Nov 2025 15:26:58 +0000 (16:26 +0100)] 
drm/xe/pf: Fix .bulk_profile/sched_priority description

The .bulk_profile/sched_priority file is always write-only, unlike
the profile/sched_priority files which can be either read-write or
read-only (in case of PF or VFs respectively).

Fixes: 6b514ed2d9a7 ("drm/xe/pf: Add documentation for sriov_admin attributes")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patch.msgid.link/20251115152659.10853-1-michal.wajdeczko@intel.com
2 months agodrm/xe/pf: Use div_u64 when calculating GGTT profile
Michal Wajdeczko [Sat, 15 Nov 2025 15:13:22 +0000 (16:13 +0100)] 
drm/xe/pf: Use div_u64 when calculating GGTT profile

This will fix the following error seen on some 32-bit config:

"ERROR: modpost: "__udivdi3" [drivers/gpu/drm/xe/xe.ko] undefined!"

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202511150929.3vUi6PEJ-lkp@intel.com/
Fixes: e448372e8a8e ("drm/xe/pf: Use migration-friendly GGTT auto-provisioning")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patch.msgid.link/20251115151323.10828-1-michal.wajdeczko@intel.com
2 months agodrm/xe: Fix conversion from clock ticks to milliseconds
Harish Chegondi [Mon, 17 Nov 2025 19:48:43 +0000 (11:48 -0800)] 
drm/xe: Fix conversion from clock ticks to milliseconds

When tick counts are large and multiplication by MSEC_PER_SEC is larger
than 64 bits, the conversion from clock ticks to milliseconds can go bad.

Use mul_u64_u32_div() instead.

Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Suggested-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Fixes: 49cc215aad7f ("drm/xe: Add xe_gt_clock_interval_to_ms helper")
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patch.msgid.link/1562f1b62d5be3fbaee100f09107f3cc49e40dd1.1763408584.git.harish.chegondi@intel.com
2 months agodrm/xe/guc_ct: Cleanup ifdef'ry
Lucas De Marchi [Wed, 19 Nov 2025 15:21:57 +0000 (07:21 -0800)] 
drm/xe/guc_ct: Cleanup ifdef'ry

Better split CONFIG_DRM_XE_DEBUG and CONFIG_DRM_XE_DEBUG_GUC optional
parts from the main code, creating smaller ct_dead_* and fast_req_*
interfaces.

Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patch.msgid.link/20251119152157.1675188-2-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2 months agodrm/xe/guc: Fix stack_depot usage
Lucas De Marchi [Tue, 18 Nov 2025 19:08:11 +0000 (11:08 -0800)] 
drm/xe/guc: Fix stack_depot usage

Add missing stack_depot_init() call when CONFIG_DRM_XE_DEBUG_GUC is
enabled to fix the following call stack:

[] BUG: kernel NULL pointer dereference, address: 0000000000000000
[] Workqueue:  drm_sched_run_job_work [gpu_sched]
[] RIP: 0010:stack_depot_save_flags+0x172/0x870
[] Call Trace:
[]  <TASK>
[]  fast_req_track+0x58/0xb0 [xe]

Fixes: 16b7e65d299d ("drm/xe/guc: Track FAST_REQ H2Gs to report where errors came from")
Tested-by: Sagar Ghuge <sagar.ghuge@intel.com>
Cc: stable@vger.kernel.org # v6.17+
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://patch.msgid.link/20251118-fix-debug-guc-v1-1-9f780c6bedf8@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2 months agodrm/xe/guc: Fix resource leak in xe_guc_ct_init_noalloc()
Shuicheng Lin [Mon, 10 Nov 2025 18:45:23 +0000 (18:45 +0000)] 
drm/xe/guc: Fix resource leak in xe_guc_ct_init_noalloc()

xe_guc_ct_init_noalloc() allocates the CT workqueue and other helpers
before it tries to initialize ct->lock. If drmm_mutex_init() fails
we currently bail out without releasing those resources because the
guc_ct_fini() hasn’t been registered yet.

Since destroy_workqueue() in guc_ct_fini() may flush the workqueue, which
in turn can take the ct lock, the initialization sequence is restructured
to first initialize the ct->lock, then set up all CT state, and finally
register guc_ct_fini().

v2: guc_ct_fini() does take ct lock. (Matt)
v3: move primelockdep() together with drmm_mutex_init(). (Lucas)

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patch.msgid.link/20251110184522.1581001-2-shuicheng.lin@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2 months agodrm/xe: Fix memory leak when handling pagefault vma
Mika Kuoppala [Thu, 20 Nov 2025 16:14:35 +0000 (18:14 +0200)] 
drm/xe: Fix memory leak when handling pagefault vma

When the pagefault handling code was moved to a new file, an extra
drm_exec_init() was added to the VMA path. This call is unnecessary because
xe_validation_ctx_init() already performs a drm_exec_init(), resulting in a
memory leak reported by kmemleak.

Remove the redundant drm_exec_init() from the VMA pagefault handling code.

Fixes: fb544b844508 ("drm/xe: Implement xe_pagefault_queue_work")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: intel-xe@lists.freedesktop.org
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251120161435.3674556-1-mika.kuoppala@linux.intel.com
2 months agodrm/xe/oa: Fix potential UAF in xe_oa_add_config_ioctl()
Sanjay Yadav [Tue, 18 Nov 2025 11:49:00 +0000 (17:19 +0530)] 
drm/xe/oa: Fix potential UAF in xe_oa_add_config_ioctl()

In xe_oa_add_config_ioctl(), we accessed oa_config->id after dropping
metrics_lock. Since this lock protects the lifetime of oa_config, an
attacker could guess the id and call xe_oa_remove_config_ioctl() with
perfect timing, freeing oa_config before we dereference it, leading to
a potential use-after-free.

Fix this by caching the id in a local variable while holding the lock.

v2: (Matt A)
- Dropped mutex_unlock(&oa->metrics_lock) ordering change from
  xe_oa_remove_config_ioctl()

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6614
Fixes: cdf02fe1a94a7 ("drm/xe/oa/uapi: Add/remove OA config perf ops")
Cc: <stable@vger.kernel.org> # v6.11+
Suggested-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Sanjay Yadav <sanjay.kumar.yadav@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patch.msgid.link/20251118114859.3379952-2-sanjay.kumar.yadav@intel.com
2 months agodrm/xe/debugfs: Use scope-based runtime PM
Matt Roper [Tue, 18 Nov 2025 16:44:06 +0000 (08:44 -0800)] 
drm/xe/debugfs: Use scope-based runtime PM

Switch the debugfs code to use scope-based runtime PM where possible,
for consistency with other parts of the driver.

v2:
 - Drop unnecessary 'ret' variables.  (Gustavo)

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-56-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months agodrm/xe/sysfs: Use scope-based runtime power management
Matt Roper [Tue, 18 Nov 2025 16:44:05 +0000 (08:44 -0800)] 
drm/xe/sysfs: Use scope-based runtime power management

Switch sysfs to use scope-based runtime power management to slightly
simplify the code.

v2:
 - Drop unnecessary local variables.  (Gustavo)

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-55-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months agodrm/xe/tests: Use scope-based runtime PM
Matt Roper [Tue, 18 Nov 2025 16:44:04 +0000 (08:44 -0800)] 
drm/xe/tests: Use scope-based runtime PM

Use scope-based handling of runtime PM in the kunit tests for
consistency with other parts of the driver.

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-54-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months agodrm/xe/sriov: Use scope-based runtime PM
Matt Roper [Tue, 18 Nov 2025 16:44:03 +0000 (08:44 -0800)] 
drm/xe/sriov: Use scope-based runtime PM

Use scope-based runtime power management in the SRIOV code for
consistency with other parts of the driver.

v2:
 - Drop unnecessary 'ret' variables.  (Gustavo)

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-53-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months agodrm/xe/hwmon: Use scope-based runtime PM
Matt Roper [Tue, 18 Nov 2025 16:44:02 +0000 (08:44 -0800)] 
drm/xe/hwmon: Use scope-based runtime PM

Use scope-based runtime power management in the hwmon code for
consistency with other parts of the driver.

v2:
 - Drop unnecessary 'ret' variables.  (Gustavo)

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-52-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months agodrm/xe/ggtt: Use scope-based runtime pm
Matt Roper [Tue, 18 Nov 2025 16:44:01 +0000 (08:44 -0800)] 
drm/xe/ggtt: Use scope-based runtime pm

Switch the GGTT code to scope-based runtime PM for consistency with
other parts of the driver.

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-51-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months agodrm/xe/bo: Use scope-based runtime PM
Matt Roper [Tue, 18 Nov 2025 16:44:00 +0000 (08:44 -0800)] 
drm/xe/bo: Use scope-based runtime PM

Use scope-based runtime power management in the BO code for consistency
with other parts of the driver.

v2:
 - Drop unnecessary 'ret' variable.  (Gustavo)

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-50-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months agodrm/xe/vram: Use scope-based forcewake
Matt Roper [Tue, 18 Nov 2025 16:43:59 +0000 (08:43 -0800)] 
drm/xe/vram: Use scope-based forcewake

Switch VRAM code to use scope-based forcewake for consistency with other
parts of the driver.

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-49-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months agodrm/xe/reg_sr: Use scope-based forcewake
Matt Roper [Tue, 18 Nov 2025 16:43:58 +0000 (08:43 -0800)] 
drm/xe/reg_sr: Use scope-based forcewake

Use scope-based forcewake to slightly simplify the reg_sr code.

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-48-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months agodrm/xe/query: Use scope-based forcewake
Matt Roper [Tue, 18 Nov 2025 16:43:57 +0000 (08:43 -0800)] 
drm/xe/query: Use scope-based forcewake

Use scope-based forcewake handling for consistency with other parts of
the driver.

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-47-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months agodrm/xe/huc: Use scope-based forcewake
Matt Roper [Tue, 18 Nov 2025 16:43:56 +0000 (08:43 -0800)] 
drm/xe/huc: Use scope-based forcewake

Use scope-based forcewake in the HuC code for a small simplification and
consistency with other parts of the driver.

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-46-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months agodrm/xe/gt_debugfs: Use scope-based cleanup
Matt Roper [Tue, 18 Nov 2025 16:43:55 +0000 (08:43 -0800)] 
drm/xe/gt_debugfs: Use scope-based cleanup

Use scope-based cleanup for forcewake and runtime PM to simplify the
debugfs code slightly.

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-45-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months agodrm/xe/drm_client: Use scope-based cleanup
Matt Roper [Tue, 18 Nov 2025 16:43:54 +0000 (08:43 -0800)] 
drm/xe/drm_client: Use scope-based cleanup

Use scope-based cleanup for forcewake and runtime PM.

v2:
 - Use xe_force_wake_release_only rather than a custom one-off class for
   "any engine" forcewake.  (Gustavo)

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-44-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months agodrm/xe: Return forcewake reference type from force_wake_get_any_engine()
Matt Roper [Tue, 18 Nov 2025 16:43:53 +0000 (08:43 -0800)] 
drm/xe: Return forcewake reference type from force_wake_get_any_engine()

Adjust the signature of force_wake_get_any_engine() such that it returns
a 'struct xe_force_wake_ref' rather than a boolean success/failure.
Failure cases are now recognized by inspecting the hardware engine
returned by reference; a NULL hwe indicates that no engine's forcewake
could be obtained.

These changes will make it cleaner and easier to incorporate scope-based
cleanup in force_wake_get_any_engine()'s caller in a future patch.

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-43-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months agodrm/xe/display: Use scoped-cleanup
Matt Roper [Tue, 18 Nov 2025 16:43:52 +0000 (08:43 -0800)] 
drm/xe/display: Use scoped-cleanup

Eliminate some goto-based cleanup by utilizing scoped cleanup helpers.

v2:
 - Eliminate unnecessary 'ret' variable in intel_hdcp_gsc_check_status()
   (Gustavo)

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-42-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months agodrm/xe/devcoredump: Use scope-based cleanup
Matt Roper [Tue, 18 Nov 2025 16:43:51 +0000 (08:43 -0800)] 
drm/xe/devcoredump: Use scope-based cleanup

Use scope-based cleanup for forcewake and runtime PM in the devcoredump
code.  This eliminates some goto-based error handling and slightly
simplifies other functions.

v2:
 - Move the forcewake acquisition slightly higher in
   devcoredump_snapshot() so that we maintain an easy-to-understand LIFO
   cleanup order.  (Gustavo)

Cc: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-41-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months agodrm/xe/device: Use scope-based cleanup
Matt Roper [Tue, 18 Nov 2025 16:43:50 +0000 (08:43 -0800)] 
drm/xe/device: Use scope-based cleanup

Convert device code to use scope-based forcewake and runtime PM.

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-40-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months agodrm/xe/gsc: Use scope-based cleanup
Matt Roper [Tue, 18 Nov 2025 16:43:49 +0000 (08:43 -0800)] 
drm/xe/gsc: Use scope-based cleanup

Use scope-based cleanup for forcewake and runtime PM to eliminate some
goto-based error handling and simplify other functions.

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-39-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months agodrm/xe/pxp: Use scope-based cleanup
Matt Roper [Tue, 18 Nov 2025 16:43:48 +0000 (08:43 -0800)] 
drm/xe/pxp: Use scope-based cleanup

Use scope-based cleanup for forcewake and runtime pm.  This allows us to
eliminate some goto-based error handling and simplify other functions.

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-38-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months agodrm/xe/pat: Use scope-based forcewake
Matt Roper [Tue, 18 Nov 2025 16:43:47 +0000 (08:43 -0800)] 
drm/xe/pat: Use scope-based forcewake

Use scope-based cleanup for forcewake in the PAT code to slightly
simplify the code.

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-37-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months agodrm/xe/mocs: Use scope-based cleanup
Matt Roper [Tue, 18 Nov 2025 16:43:46 +0000 (08:43 -0800)] 
drm/xe/mocs: Use scope-based cleanup

Using scope-based cleanup for runtime PM and forcewake in the MOCS code
allows us to eliminate some goto-based error handling and simplify some
other functions.

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-36-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months agodrm/xe/guc_pc: Use scope-based cleanup
Matt Roper [Tue, 18 Nov 2025 16:43:45 +0000 (08:43 -0800)] 
drm/xe/guc_pc: Use scope-based cleanup

Use scope-based cleanup for forcewake and runtime PM in the GuC PC code.
This allows us to eliminate to goto-based cleanup and simplifies some
other functions.

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-35-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months agodrm/xe/guc: Use scope-based cleanup
Matt Roper [Tue, 18 Nov 2025 16:43:44 +0000 (08:43 -0800)] 
drm/xe/guc: Use scope-based cleanup

Use scope-based cleanup for forcewake and runtime PM.

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-34-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months agodrm/xe/gt_idle: Use scope-based cleanup
Matt Roper [Tue, 18 Nov 2025 16:43:43 +0000 (08:43 -0800)] 
drm/xe/gt_idle: Use scope-based cleanup

Use scope-based cleanup for runtime PM and forcewake in the GT idle
code.

v2:
 - Use scoped_guard() over guard() in idle_status_show() and
   idle_residency_ms_show().  (Gustavo)
 - Eliminate unnecessary 'ret' local variable in name_show().

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-33-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months agodrm/xe/gt: Use scope-based cleanup
Matt Roper [Tue, 18 Nov 2025 16:43:42 +0000 (08:43 -0800)] 
drm/xe/gt: Use scope-based cleanup

Using scope-based cleanup for forcewake and runtime PM allows us to
reduce or eliminate some of the goto-based error handling and simplify
several functions.

v2:
 - Drop changes to do_gt_restart().  This function still has goto-based
   logic, making scope-based cleanup unsafe for now.  (Gustavo)

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-32-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months agodrm/xe/pm: Add scope-based cleanup helper for runtime PM
Matt Roper [Tue, 18 Nov 2025 16:43:41 +0000 (08:43 -0800)] 
drm/xe/pm: Add scope-based cleanup helper for runtime PM

Add a scope-based helpers for runtime PM that may be used to simplify
cleanup logic and potentially avoid goto-based cleanup.

For example, using

        guard(xe_pm_runtime)(xe);

will get runtime PM and cause a corresponding put to occur automatically
when the current scope is exited.  'xe_pm_runtime_noresume' can be used
as a guard replacement for the corresponding 'noresume' variant.
There's also an xe_pm_runtime_ioctl conditional guard that can be used
as a replacement for xe_runtime_ioctl():

        ACQUIRE(xe_pm_runtime_ioctl, pm)(xe);
        if ((ret = ACQUIRE_ERR(xe_pm_runtime_ioctl, &pm)) < 0)
                /* failed */

In a few rare cases (such as gt_reset_worker()) we need to ensure that
runtime PM is dropped when the function is exited by any means
(including error paths), but the function does not need to acquire
runtime PM because that has already been done earlier by a different
function.  For these special cases, an 'xe_pm_runtime_release_only'
guard can be used to handle the release without doing an acquisition.

These guards will be used in future patches to eliminate some of our
goto-based cleanup.

v2:
 - Specify success condition for xe_pm runtime_ioctl as _RET >= 0 so
   that positive values will be properly identified as success and
   trigger destructor cleanup properly.

v3:
 - Add comments to the kerneldoc for the existing 'get' functions
   indicating that scope-based handling should be preferred where
   possible.  (Gustavo)

Cc: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-31-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months agodrm/xe/forcewake: Add scope-based cleanup for forcewake
Matt Roper [Tue, 18 Nov 2025 16:43:40 +0000 (08:43 -0800)] 
drm/xe/forcewake: Add scope-based cleanup for forcewake

Since forcewake uses a reference counting get/put model, there are many
places where we need to be careful to drop the forcewake reference when
bailing out of a function early on an error path.  Add scope-based
cleanup options that can be used in place of explicit get/put to help
prevent mistakes in this area.

Examples:

   CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FW_GT);

       Obtain forcewake on the XE_FW_GT domain and hold it until the
       end of the current block.  The wakeref will be dropped
       automatically when the current scope is exited by any means
       (return, break, reaching the end of the block, etc.).

   xe_with_force_wake(fw_ref, gt_to_fw(ss->gt), XE_FORCEWAKE_ALL) {
        ...
   }

       Hold all forcewake domains for the following block.  As with the
       CLASS usage, forcewake will be dropped automatically when the
       block is exited by any means.

Use of these cleanup helpers should allow us to remove some ugly
goto-based error handling and help avoid mistakes in functions with lots
of early error exits.

An 'xe_force_wake_release_only' class is also added for cases where a
forcewake reference is passed in from another function and the current
function is responsible for releasing it in every flow and error path.

v2:
 - Create a separate constructor that just wraps xe_force_wake_get for
   use in the class.  This eliminates the need to update the signature
   of xe_force_wake_get().  (Michal)

v3:
 - Wrap xe_with_force_wake's 'done' marker in __UNIQUE_ID.  (Gustavo)
 - Add a note to xe_force_wake_get()'s kerneldoc explaining that
   scope-based cleanup is preferred when possible.  (Gustavo)
 - Add an xe_force_wake_release_only class.  (Gustavo)

v4:
 - Add NULL check on fw in release_only variant.  (Gustavo)

Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-30-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months agodrm/xe/vm: Use for_each_tlb_inval() to calculate invalidation fences
Matt Roper [Tue, 18 Nov 2025 20:26:05 +0000 (12:26 -0800)] 
drm/xe/vm: Use for_each_tlb_inval() to calculate invalidation fences

ops_execute() calculates the size of a fence array based on
XE_MAX_GT_PER_TILE, while the code that actually fills in the fence
array uses a for_each_tlb_inval() iterator.  This works out okay today
since both approaches come up with the same number of invalidation
fences (2: primary GT invalidation + media GT invalidation), but could
be problematic in the future if there isn't a 1:1 relationship between
TLBs needing invalidation and potential GTs on the tile.

Adjust the allocation code to use the same for_each_tlb_inval()
counting logic as the code that fills the array to future-proof the
code.

Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251118202604.3715782-2-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months agodrm/xe/vf: Shadow buffer management for CCS read/write operations
Satyanarayana K V P [Tue, 18 Nov 2025 12:07:45 +0000 (12:07 +0000)] 
drm/xe/vf: Shadow buffer management for CCS read/write operations

CCS copy command consist of 5-dword sequence. If vCPU halts during
save/restore operations while these sequences are being programmed,
incomplete writes can cause page faults during IGPU CCS metadata saving.

Use shadow buffer management to prevent partial write issues during CCS
operations.

Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Suggested-by: Matthew Brost <matthew.brost@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251118120745.3460172-3-satyanarayana.k.v.p@intel.com
2 months agodrm/xe/sa: Shadow buffer support in the sub-allocator pool
Satyanarayana K V P [Tue, 18 Nov 2025 12:07:44 +0000 (12:07 +0000)] 
drm/xe/sa: Shadow buffer support in the sub-allocator pool

The existing sub-allocator is limited to managing a single buffer object.
This enhancement introduces shadow buffer functionality to support
scenarios requiring dual buffer management.

The changes include added shadow buffer object creation capability,
Management for both primary and shadow buffers, and appropriate locking
mechanisms for thread-safe operations.

This enables more flexible buffer allocation strategies in scenarios where
shadow buffering is required.

Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Suggested-by: Matthew Brost <matthew.brost@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251118120745.3460172-2-satyanarayana.k.v.p@intel.com
2 months agodrm/xe/irq: Handle msix vector0 interrupt
Venkata Ramana Nayana [Fri, 7 Nov 2025 08:31:41 +0000 (14:01 +0530)] 
drm/xe/irq: Handle msix vector0 interrupt

Current gu2host handler registered as MSI-X vector 0 and as per bspec for
a msix vector 0 interrupt, the driver must check the legacy registers
190008(TILE_INT_REG), 190060h (GT INTR Identity Reg 0) and other registers
mentioned in "Interrupt Service Routine Pseudocode" otherwise it will block
the next interrupts. To overcome this issue replacing guc2host handler
with legacy xe_irq_handler.

Fixes: da889070be7b2 ("drm/xe/irq: Separate MSI and MSI-X flows")
Bspec: 62357
Signed-off-by: Venkata Ramana Nayana <venkata.ramana.nayana@intel.com>
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://patch.msgid.link/20251107083141.2080189-1-venkata.ramana.nayana@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months agodrm/xe/pf: Check for fence error on VRAM save/restore
Michał Winiarski [Fri, 14 Nov 2025 12:23:39 +0000 (13:23 +0100)] 
drm/xe/pf: Check for fence error on VRAM save/restore

The code incorrectly assumes that the VRAM save/restore fence is valid.
Fix it by checking for error.

Fixes: 49cf1b9b609fe ("drm/xe/pf: Handle VRAM migration data as part of PF control")
Suggested-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251114122339.1791026-1-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
2 months agodrm/xe/pf: Drop the VF VRAM BO reference on successful restore
Michał Winiarski [Fri, 14 Nov 2025 10:07:13 +0000 (11:07 +0100)] 
drm/xe/pf: Drop the VF VRAM BO reference on successful restore

The reference is only dropped on error. Fix it by adding the missing
xe_bo_put().

Fixes: 49cf1b9b609fe ("drm/xe/pf: Handle VRAM migration data as part of PF control")
Reported-by: Adam Miszczak <adam.miszczak@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patch.msgid.link/20251114100713.1776073-1-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
2 months agodrm/xe: Remove duplicate DRM_EXEC selection from Kconfig
Shuicheng Lin [Mon, 10 Nov 2025 23:26:58 +0000 (23:26 +0000)] 
drm/xe: Remove duplicate DRM_EXEC selection from Kconfig

There are 2 identical "select DRM_EXEC" lines for DRM_XE.
Remove one to clean up the configuration.

Fixes: d490ecf57790 ("drm/xe: Rework xe_exec and the VM rebind worker to use the drm_exec helper")
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Nitin Gote <nitin.r.gote@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patch.msgid.link/20251110232657.1807998-2-shuicheng.lin@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2 months agodrm/xe/kunit: Fix forcewake assertion in mocs test
Matt Roper [Thu, 13 Nov 2025 23:40:39 +0000 (15:40 -0800)] 
drm/xe/kunit: Fix forcewake assertion in mocs test

The MOCS kunit test calls KUNIT_ASSERT_TRUE_MSG() with a condition of
'true;' this prevents the assertion from ever failing.  Replace
KUNIT_ASSERT_TRUE_MSG with KUNIT_FAIL_AND_ABORT to get the intended
failure behavior in cases where forcewake was not acquired successfully.

Fixes: 51c0ee84e4dc ("drm/xe/tests/mocs: Hold XE_FORCEWAKE_ALL for LNCF regs")
Cc: Tejas Upadhyay <tejas.upadhyay@intel.com>
Cc: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251113234038.2256106-2-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months agodrm/xe/pf: Fix kernel-doc warning in migration_save_consume
Michał Winiarski [Fri, 14 Nov 2025 13:40:30 +0000 (14:40 +0100)] 
drm/xe/pf: Fix kernel-doc warning in migration_save_consume

The kernel-doc for xe_sriov_pf_migration_save_consume() contained
multiple "Return:" sections, causing a warning.
Fix it by removing the extra line.

Fixes: 67df4a5cbc583 ("drm/xe/pf: Add data structures and handlers for migration rings")
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251114134030.1795947-1-michal.winiarski@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2 months agodrm/xe: Prevent BIT() overflow when handling invalid prefetch region
Shuicheng Lin [Wed, 12 Nov 2025 18:10:06 +0000 (18:10 +0000)] 
drm/xe: Prevent BIT() overflow when handling invalid prefetch region

If user provides a large value (such as 0x80) for parameter
prefetch_mem_region_instance in vm_bind ioctl, it will cause
BIT(prefetch_region) overflow as below:
"
 ------------[ cut here ]------------
 UBSAN: shift-out-of-bounds in drivers/gpu/drm/xe/xe_vm.c:3414:7
 shift exponent 128 is too large for 64-bit type 'long unsigned int'
 CPU: 8 UID: 0 PID: 53120 Comm: xe_exec_system_ Tainted: G        W           6.18.0-rc1-lgci-xe-kernel+ #200 PREEMPT(voluntary)
 Tainted: [W]=WARN
 Hardware name: ASUS System Product Name/PRIME Z790-P WIFI, BIOS 0812 02/24/2023
 Call Trace:
  <TASK>
  dump_stack_lvl+0xa0/0xc0
  dump_stack+0x10/0x20
  ubsan_epilogue+0x9/0x40
  __ubsan_handle_shift_out_of_bounds+0x10e/0x170
  ? mutex_unlock+0x12/0x20
  xe_vm_bind_ioctl.cold+0x20/0x3c [xe]
 ...
"
Fix it by validating prefetch_region before the BIT() usage.

v2: Add Closes and Cc stable kernels. (Matt)

Reported-by: Koen Koning <koen.koning@intel.com>
Reported-by: Peter Senna Tschudin <peter.senna@linux.intel.com>
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6478
Cc: <stable@vger.kernel.org> # v6.8+
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patch.msgid.link/20251112181005.2120521-2-shuicheng.lin@intel.com
2 months agodrm/xe/pat: Add helper to query compression enable status
Xin Wang [Mon, 10 Nov 2025 22:14:58 +0000 (22:14 +0000)] 
drm/xe/pat: Add helper to query compression enable status

Add xe_pat_index_get_comp_en() helper function to check whether
compression is enabled for a given PAT index by extracting the
XE2_COMP_EN bit from the PAT table entry.

There are no current users, however there are multiple in-flight series
which will all use this helper.

CC: Nitin Gote <nitin.r.gote@intel.com>
CC: Sanjay Yadav <sanjay.kumar.yadav@intel.com>
CC: Matt Roper <matthew.d.roper@intel.com>
Suggested-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Xin Wang <x.wang@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Nitin Gote <nitin.r.gote@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Sanjay Yadav <sanjay.kumar.yadav@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patch.msgid.link/20251110221458.1864507-2-x.wang@intel.com
3 months agodrm/xe/oa: Store forcewake reference in stream structure
Matt Roper [Mon, 10 Nov 2025 23:20:21 +0000 (15:20 -0800)] 
drm/xe/oa: Store forcewake reference in stream structure

Calls to xe_force_wake_put() should generally pass the exact reference
returned by xe_force_wake_get().  Since OA grabs and releases forcewake
in different functions, xe_oa_stream_destroy() is currently calling put
with a hardcoded ALL mask.  Although this works for now, it's somewhat
fragile in case OA moves to more precise power domain management in the
future.

Stash the original reference obtained during stream initialization
inside the stream structure so that we can use it directly when the
stream is destroyed.

Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patch.msgid.link/20251110232017.1475869-35-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
3 months agodrm/xe/eustall: Store forcewake reference in stream structure
Matt Roper [Mon, 10 Nov 2025 23:20:20 +0000 (15:20 -0800)] 
drm/xe/eustall: Store forcewake reference in stream structure

Calls to xe_force_wake_put() should generally pass the exact reference
returned by xe_force_wake_get().  Since EU stall grabs and releases
forcewake in different functions, xe_eu_stall_disable_locked() is
currently calling put with a hardcoded RENDER domain.  Although this
works for now, it's somewhat fragile in case the power domain(s)
required by stall sampling change in the future, or if workarounds show
up that require us to obtain additional domains.

Stash the original reference obtained during stream enable inside the
stream structure so that we can use it directly when the stream is
disabled.

Cc: Harish Chegondi <harish.chegondi@intel.com>
Reviewed-by: Harish Chegondi <harish.chegondi@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251110232017.1475869-34-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
3 months agodrm/xe/forcewake: Improve kerneldoc
Matt Roper [Mon, 10 Nov 2025 23:20:19 +0000 (15:20 -0800)] 
drm/xe/forcewake: Improve kerneldoc

Improve the kerneldoc for forcewake a bit to give more detail about what
the structures represent.

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://patch.msgid.link/20251110232017.1475869-33-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
3 months agodrm/xe/pf: Use migration-friendly GGTT auto-provisioning
Michal Wajdeczko [Wed, 12 Nov 2025 12:44:08 +0000 (13:44 +0100)] 
drm/xe/pf: Use migration-friendly GGTT auto-provisioning

Instead of trying very hard to find the largest fair GGTT size that
could be allocated for VFs on the current tile, pick some smaller
rounded down to power-of-two value that is more likely to be
provisioned in the same manner by the other PF instance:

  num VFs | GGTT space (MiB)
  --------+-----------------
   63..57 | 56
   56..29 | 64
   28..15 | 128
   14..8  | 256
    7..4  | 512
    3..2  | 1024
       1  | 2048 (regular PF)
       1  | 3584 (admin only PF)

Note that due to FW/HW limitations we can't share all 4GiB GGTT
address space with VFs, so for the larger (>7) number of the VFs
the change in the outcome is happening at different points than
we have in case of GuC contexts/doorbells IDs.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patch.msgid.link/20251112124408.8094-1-michal.wajdeczko@intel.com
3 months agodrm/intel/bmg: Allow device ID usage with single-argument macros
Michał Winiarski [Wed, 12 Nov 2025 13:22:20 +0000 (14:22 +0100)] 
drm/intel/bmg: Allow device ID usage with single-argument macros

When INTEL_BMG_G21_IDS were added as a subplatform, token concatenation
operator usage was omitted, making INTEL_BMG_IDS not usable with
single-argument macros.
Fix that by adding the missing operator.

Fixes: 78de8f876683 ("drm/xe: Handle Wa_22010954014 and Wa_14022085890 as device workarounds")
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-25-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 months agodrm/xe/pf: Add wait helper for VF FLR
Michał Winiarski [Wed, 12 Nov 2025 13:22:19 +0000 (14:22 +0100)] 
drm/xe/pf: Add wait helper for VF FLR

VF FLR requires additional processing done by PF driver.
The processing is done after FLR is already finished from PCIe
perspective.
In order to avoid a scenario where migration state transitions while
PF processing is still in progress, additional synchronization
point is needed.
Add a helper that will be used as part of VF driver struct
pci_error_handlers .reset_done() callback.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-24-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 months agodrm/xe/pf: Handle VRAM migration data as part of PF control
Michał Winiarski [Wed, 12 Nov 2025 13:22:18 +0000 (14:22 +0100)] 
drm/xe/pf: Handle VRAM migration data as part of PF control

Connect the helpers to allow save and restore of VRAM migration data in
stop_copy / resume device state.

Co-developed-by: Lukasz Laguna <lukasz.laguna@intel.com>
Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-23-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 months agodrm/xe/migrate: Add function to copy of VRAM data in chunks
Lukasz Laguna [Wed, 12 Nov 2025 13:22:17 +0000 (14:22 +0100)] 
drm/xe/migrate: Add function to copy of VRAM data in chunks

Introduce a new function to copy data between VRAM and sysmem objects.
The existing xe_migrate_copy() is tailored for eviction and restore
operations, which involves additional logic and operates on entire
objects.
The xe_migrate_vram_copy_chunk() allows copying chunks of data to or
from a dedicated buffer object, which is essential in case of VF
migration.

Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-22-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 months agodrm/xe/pf: Add helper to retrieve VF's LMEM object
Lukasz Laguna [Wed, 12 Nov 2025 13:22:16 +0000 (14:22 +0100)] 
drm/xe/pf: Add helper to retrieve VF's LMEM object

Instead of accessing VF's lmem_obj directly, introduce a helper function
to make the access more convenient.

Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-21-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 months agodrm/xe/pf: Handle MMIO migration data as part of PF control
Michał Winiarski [Wed, 12 Nov 2025 13:22:15 +0000 (14:22 +0100)] 
drm/xe/pf: Handle MMIO migration data as part of PF control

Implement the helpers and use them for save and restore of MMIO
migration data in stop_copy / resume device state.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-20-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 months agodrm/xe/pf: Handle GGTT migration data as part of PF control
Michał Winiarski [Wed, 12 Nov 2025 13:22:14 +0000 (14:22 +0100)] 
drm/xe/pf: Handle GGTT migration data as part of PF control

Connect the helpers to allow save and restore of GGTT migration data in
stop_copy / resume device state.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-19-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 months agodrm/xe/pf: Add helpers for VF GGTT migration data handling
Michał Winiarski [Wed, 12 Nov 2025 13:22:13 +0000 (14:22 +0100)] 
drm/xe/pf: Add helpers for VF GGTT migration data handling

In an upcoming change, the VF GGTT migration data will be handled as
part of VF control state machine. Add the necessary helpers to allow the
migration data transfer to/from the HW GGTT resource.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-18-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 months agodrm/xe/pf: Handle GuC migration data as part of PF control
Michał Winiarski [Wed, 12 Nov 2025 13:22:12 +0000 (14:22 +0100)] 
drm/xe/pf: Handle GuC migration data as part of PF control

Connect the helpers to allow save and restore of GuC migration data in
stop_copy / resume device state.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-17-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 months agodrm/xe/pf: Switch VF migration GuC save/restore to struct migration data
Michał Winiarski [Wed, 12 Nov 2025 13:22:11 +0000 (14:22 +0100)] 
drm/xe/pf: Switch VF migration GuC save/restore to struct migration data

In upcoming changes, the GuC VF migration data will be handled as part
of separate SAVE/RESTORE states in VF control state machine.
Now that the data is decoupled from both guc_state debugfs and PAUSE
state, we can safely remove the struct xe_gt_sriov_state_snapshot and
modify the GuC save/restore functions to operate on struct
xe_sriov_migration_data.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-16-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 months agodrm/xe/pf: Don't save GuC VF migration data on pause
Michał Winiarski [Wed, 12 Nov 2025 13:22:10 +0000 (14:22 +0100)] 
drm/xe/pf: Don't save GuC VF migration data on pause

In upcoming changes, the GuC VF migration data will be handled as part
of separate SAVE/RESTORE states in VF control state machine.
Remove it from PAUSE state.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-15-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 months agodrm/xe/pf: Remove GuC migration data save/restore from GT debugfs
Michał Winiarski [Wed, 12 Nov 2025 13:22:09 +0000 (14:22 +0100)] 
drm/xe/pf: Remove GuC migration data save/restore from GT debugfs

In upcoming changes, SR-IOV VF migration data will be extended beyond
GuC data and exported to userspace using VFIO interface (with a
vendor-specific variant driver) and a device-level debugfs interface.
Remove the GT-level debugfs.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-14-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 months agodrm/xe/pf: Increase PF GuC Buffer Cache size and use it for VF migration
Michał Winiarski [Wed, 12 Nov 2025 13:22:08 +0000 (14:22 +0100)] 
drm/xe/pf: Increase PF GuC Buffer Cache size and use it for VF migration

Contiguous PF GGTT VMAs can be scarce after creating VFs.
Increase the GuC buffer cache size to 8M for PF so that we can fit GuC
migration data (which currently maxes out at just over 4M) and use the
cache instead of allocating fresh BOs.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-13-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 months agodrm/xe: Allow the caller to pass guc_buf_cache size
Michał Winiarski [Wed, 12 Nov 2025 13:22:07 +0000 (14:22 +0100)] 
drm/xe: Allow the caller to pass guc_buf_cache size

An upcoming change will use GuC buffer cache as a place where GuC
migration data will be stored, and the memory requirement for that is
larger than indirect data.
Allow the caller to pass the size based on the intended usecase.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-12-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 months agodrm/xe: Add sa/guc_buf_cache sync interface
Michał Winiarski [Wed, 12 Nov 2025 13:22:06 +0000 (14:22 +0100)] 
drm/xe: Add sa/guc_buf_cache sync interface

In upcoming changes the cached buffers are going to be used to read data
produced by the GuC. Add a counterpart to flush, which synchronizes the
CPU-side of suballocation with the GPU data and propagate the interface
to GuC Buffer Cache.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-11-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>