git.ipfire.org Git - thirdparty/kernel/linux.git/log
2 months ago drm/gpusvm: Limit the number of retries in drm_gpusvm_get_pages
Matthew Brost [Sat, 22 Nov 2025 01:25:01 +0000 (17:25 -0800)] 
drm/gpusvm: Limit the number of retries in drm_gpusvm_get_pages

drm_gpusvm_get_pages should not be allowed to retry forever. Cap the
time spent in the function to HMM_RANGE_DEFAULT_TIMEOUT, as this is
essentially a wrapper around hmm_range_fault.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patch.msgid.link/20251122012502.382587-1-matthew.brost@intel.com
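
The bounded-retry idea in the drm/gpusvm commit above can be sketched in
userspace C. This is only an illustration under stated assumptions: the
callback type try_fault_fn, the function get_pages_bounded() and the
millisecond budget are hypothetical names, not taken from the patch, and
kernel code would normally derive its deadline from jiffies rather than
clock_gettime().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical callback: returns 0 on success, -EBUSY to ask for a retry. */
typedef int (*try_fault_fn)(void *ctx);

/* Monotonic time in milliseconds. */
static uint64_t now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000u + (uint64_t)ts.tv_nsec / 1000000u;
}

/* Retry try_fault() while it reports -EBUSY, but give up after timeout_ms. */
int get_pages_bounded(try_fault_fn try_fault, void *ctx, uint64_t timeout_ms)
{
	uint64_t deadline = now_ms() + timeout_ms;
	int err;

	for (;;) {
		err = try_fault(ctx);
		if (err != -EBUSY)
			return err;	/* success or a hard error */
		if (now_ms() > deadline)
			return -EBUSY;	/* budget exhausted: stop retrying */
	}
}

/* Example callback that never succeeds. */
int always_busy(void *ctx)
{
	(void)ctx;
	return -EBUSY;
}

/* Example callback that succeeds after *ctx busy replies. */
int busy_then_ok(void *ctx)
{
	int *remaining = ctx;

	return (*remaining)-- > 0 ? -EBUSY : 0;
}
```

With always_busy() the loop returns -EBUSY once the budget has elapsed
instead of spinning forever; with busy_then_ok() it returns 0 as soon as
the callback stops asking for a retry.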
2 months ago drm/xe: Add caching pagetable flag
Zbigniew Kempczyński [Tue, 25 Nov 2025 15:37:33 +0000 (16:37 +0100)] 
drm/xe: Add caching pagetable flag

Introduce a device xe_caching_pt flag so that the feature can be turned
on selectively for supported platforms. This eliminates the version check
and allows the feature to be enabled on future platforms.

Signed-off-by: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patch.msgid.link/20251125153732.400766-2-zbigniew.kempczynski@intel.com
2 months ago drm/xe/vm: Skip ufence association for CPU address mirror VMA during MAP
Himal Prasad Ghimiray [Tue, 25 Nov 2025 07:56:28 +0000 (13:26 +0530)] 
drm/xe/vm: Skip ufence association for CPU address mirror VMA during MAP

The MAP operation for a CPU address mirror VMA does not require ufence
association because such mappings are not GPU-synchronized and do not
participate in GPU job completion signaling.

Remove the unnecessary ufence addition for this case to avoid an -EBUSY
failure in check_ufence during unbind ops.

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251125075628.1182481-6-himal.prasad.ghimiray@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
2 months ago drm/xe/svm: Enable UNMAP for VMA merging operations
Himal Prasad Ghimiray [Tue, 25 Nov 2025 07:56:27 +0000 (13:26 +0530)] 
drm/xe/svm: Enable UNMAP for VMA merging operations

Allow UNMAP of VMAs associated with SVM mappings when the MAP operation
is intended to merge adjacent CPU_ADDR_MIRROR VMAs.

v2
- Remove mapping exist check in garbage collector

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251125075628.1182481-5-himal.prasad.ghimiray@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
2 months ago drm/xe/svm: Extend MAP range to reduce vma fragmentation
Himal Prasad Ghimiray [Tue, 25 Nov 2025 07:56:26 +0000 (13:26 +0530)] 
drm/xe/svm: Extend MAP range to reduce vma fragmentation

When DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR is set during VM_BIND_OP_MAP,
the mapping logic now checks adjacent cpu_addr_mirror VMAs with default
attributes and expands the mapping range accordingly. This ensures that
bo_unmap operations ideally target the same area and helps reduce
fragmentation by coalescing nearby compatible VMAs into a single mapping.

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251125075628.1182481-4-himal.prasad.ghimiray@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
2 months ago drm/xe: Merge adjacent default-attribute VMAs during garbage collection
Himal Prasad Ghimiray [Tue, 25 Nov 2025 07:56:25 +0000 (13:26 +0530)] 
drm/xe: Merge adjacent default-attribute VMAs during garbage collection

While restoring default memory attributes for VMAs during garbage
collection, extend the target range by checking neighboring VMAs. If
adjacent VMAs are CPU-address-mirrored and have default attributes,
include them in the mergeable range to reduce fragmentation and improve
VMA reuse.

v2
- Rebase

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251125075628.1182481-3-himal.prasad.ghimiray@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
2 months ago drm/xe: Add helper to extend CPU-mirrored VMA range for merge
Himal Prasad Ghimiray [Tue, 25 Nov 2025 07:56:24 +0000 (13:26 +0530)] 
drm/xe: Add helper to extend CPU-mirrored VMA range for merge

Introduce xe_vm_find_cpu_addr_mirror_vma_range(), which computes an
extended range around a given range by including adjacent VMAs that are
CPU-address-mirrored and have default memory attributes. This helper is
useful for determining the mergeable range without performing the actual
merge.

v2
- Add assert
- Move unmap check to this patch

v3
- Decrease offset to check by SZ_4K to avoid wrong vma return in fast
  lookup path

v4
- *start should be >= SZ_4K (Matt)

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251125075628.1182481-2-himal.prasad.ghimiray@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
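
The range-extension helper described in the commit above can be
illustrated with a toy model. Everything below is an assumption made for
the sketch: struct toy_vma, toy_find_vma() and toy_extend_range() are
hypothetical stand-ins, ranges are half-open and page-aligned, and only
the immediate neighbour on each side is considered. It is not the code of
xe_vm_find_cpu_addr_mirror_vma_range(); it only mirrors the changelog
notes about probing SZ_4K below the start and requiring *start >= SZ_4K.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SZ_4K 0x1000ull

/* Toy VMA covering [start, end); the fields are illustrative only. */
struct toy_vma {
	uint64_t start, end;
	bool cpu_addr_mirror;
	bool default_attrs;
};

/* Linear lookup of the VMA that contains addr, or NULL. */
static const struct toy_vma *toy_find_vma(const struct toy_vma *vmas,
					  size_t n, uint64_t addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr >= vmas[i].start && addr < vmas[i].end)
			return &vmas[i];
	return NULL;
}

static bool toy_mergeable(const struct toy_vma *vma)
{
	return vma && vma->cpu_addr_mirror && vma->default_attrs;
}

/* Widen [*start, *end) over mergeable neighbours; nothing is merged here. */
void toy_extend_range(const struct toy_vma *vmas, size_t n,
		      uint64_t *start, uint64_t *end)
{
	const struct toy_vma *prev, *next;

	assert(*start < *end);

	/* Probe one page below the range; skip when no page exists below. */
	if (*start >= SZ_4K) {
		prev = toy_find_vma(vmas, n, *start - SZ_4K);
		if (toy_mergeable(prev))
			*start = prev->start;
	}

	/* The first byte past the range belongs to the next VMA, if any. */
	next = toy_find_vma(vmas, n, *end);
	if (toy_mergeable(next))
		*end = next->end;
}
```

The caller gets back a possibly wider range and can decide separately
whether to perform the merge, which matches the "compute without merging"
intent stated in the commit message.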
2 months ago drm/xe: Protect against unset LRC when pausing submissions
Tomasz Lis [Mon, 24 Nov 2025 22:28:53 +0000 (23:28 +0100)] 
drm/xe: Protect against unset LRC when pausing submissions

While pausing submissions, it is possible to encounter an exec queue
which is still being created, and therefore doesn't have a valid xe_lrc
struct reference.

Protect against such a situation by checking for NULL before access.

Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Fixes: c25c1010df88 ("drm/xe/vf: Replay GuC submission state on pause / unpause")
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251124222853.1900800-1-tomasz.lis@intel.com
2 months ago drm/xe/vf: Start re-emission from first unsignaled job during VF migration
Matthew Brost [Fri, 21 Nov 2025 15:27:50 +0000 (07:27 -0800)] 
drm/xe/vf: Start re-emission from first unsignaled job during VF migration

The LRC software ring tail is reset to the first unsignaled pending
job's head.

Fix the re-emission logic to begin submitting from the first unsignaled
job detected, rather than scanning all pending jobs, which can cause
imbalance.

v2:
 - Include missing local changes
v3:
 - s/skip_replay/restore_replay (Tomasz)

Fixes: c25c1010df88 ("drm/xe/vf: Replay GuC submission state on pause / unpause")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Tomasz Lis <tomasz.lis@intel.com>
Link: https://patch.msgid.link/20251121152750.240557-1-matthew.brost@intel.com
2 months ago drm/xe/pf: Handle MERT catastrophic errors
Lukasz Laguna [Mon, 24 Nov 2025 19:02:37 +0000 (20:02 +0100)] 
drm/xe/pf: Handle MERT catastrophic errors

The MERT block triggers an interrupt when a catastrophic error occurs.
Update the interrupt handler to read the MERT catastrophic error type
and log an appropriate debug message.

Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251124190237.20503-5-lukasz.laguna@intel.com
2 months ago drm/xe/pf: Add TLB invalidation support for MERT
Lukasz Laguna [Mon, 24 Nov 2025 19:02:36 +0000 (20:02 +0100)] 
drm/xe/pf: Add TLB invalidation support for MERT

Add support for triggering and handling MERT TLB invalidation. After
LMTT updates, the MERT TLB invalidation is initiated to ensure memory
translations remain coherent.

Completion of the invalidation is signaled via MERT interrupt (bit 13 in
the GFX master interrupt register). Detect and handle this interrupt to
properly synchronize the invalidation flow.

Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251124190237.20503-4-lukasz.laguna@intel.com
2 months ago drm/xe/pf: Configure LMTT in MERT
Lukasz Laguna [Mon, 24 Nov 2025 19:02:35 +0000 (20:02 +0100)] 
drm/xe/pf: Configure LMTT in MERT

On platforms with standalone MERT, the PF driver needs to program LMTT
in MERT's LMEM_CFG register.

Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251124190237.20503-3-lukasz.laguna@intel.com
2 months ago drm/xe: Add device flag to indicate standalone MERT
Lukasz Laguna [Mon, 24 Nov 2025 19:02:34 +0000 (20:02 +0100)] 
drm/xe: Add device flag to indicate standalone MERT

The MERT subsystem manages memory accesses between host and device. On
the Crescent Island platform, it requires direct management by the
driver.

Introduce a device flag and corresponding helpers to identify platforms
with standalone MERT, enabling proper initialization and handling.

Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251124190237.20503-2-lukasz.laguna@intel.com
2 months ago drm/xe/uc: Change assertion to error on huc authentication failure
Zhanjun Dong [Mon, 27 Oct 2025 21:42:12 +0000 (17:42 -0400)] 
drm/xe/uc: Change assertion to error on huc authentication failure

The fault injection test can cause the xe_huc_auth function to fail.
This is an intentional failure, so in this scenario we don't want to
throw an assert and taint the kernel, because that will impact CI
execution.

Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patch.msgid.link/20251027214212.2856903-1-zhanjun.dong@intel.com
2 months ago drm/xe/guc: Cleanup GuC log buffer macros and helpers
Zhanjun Dong [Wed, 5 Nov 2025 23:31:43 +0000 (18:31 -0500)] 
drm/xe/guc: Cleanup GuC log buffer macros and helpers

Clean up GuC log buffer macros and helpers and add the Xe-style macro
prefix.
Update buffer type values to align with the GuC specification.
Update the buffer offset calculation.
Remove helper functions, replacing them with macros.

Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patch.msgid.link/20251105233143.1168759-1-zhanjun.dong@intel.com
2 months ago drm/xe/pf: Fix .bulk_profile/sched_priority description
Michal Wajdeczko [Sat, 15 Nov 2025 15:26:58 +0000 (16:26 +0100)] 
drm/xe/pf: Fix .bulk_profile/sched_priority description

The .bulk_profile/sched_priority file is always write-only, unlike
the profile/sched_priority files which can be either read-write or
read-only (in case of PF or VFs respectively).

Fixes: 6b514ed2d9a7 ("drm/xe/pf: Add documentation for sriov_admin attributes")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patch.msgid.link/20251115152659.10853-1-michal.wajdeczko@intel.com
2 months ago drm/xe/pf: Use div_u64 when calculating GGTT profile
Michal Wajdeczko [Sat, 15 Nov 2025 15:13:22 +0000 (16:13 +0100)] 
drm/xe/pf: Use div_u64 when calculating GGTT profile

This will fix the following error seen on some 32-bit configs:

"ERROR: modpost: "__udivdi3" [drivers/gpu/drm/xe/xe.ko] undefined!"

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202511150929.3vUi6PEJ-lkp@intel.com/
Fixes: e448372e8a8e ("drm/xe/pf: Use migration-friendly GGTT auto-provisioning")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patch.msgid.link/20251115151323.10828-1-michal.wajdeczko@intel.com
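
Background for the commit above: on 32-bit builds a plain "/" on a 64-bit
operand is lowered by the compiler to a libgcc call such as __udivdi3,
which the kernel does not link, so the kernel provides div_u64() in
<linux/math64.h> for a 64-bit dividend and a 32-bit divisor. The function
below is a hypothetical userspace stand-in, div_u64_sketch(), that shows
how such a division can be done with shifts and subtractions only. It is
not the kernel's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Divide a 64-bit value by a 32-bit value using shift-and-subtract. */
uint64_t div_u64_sketch(uint64_t dividend, uint32_t divisor)
{
	uint64_t quotient = 0;
	uint64_t rem = 0;	/* always < divisor before each shift */

	assert(divisor != 0);

	for (int bit = 63; bit >= 0; bit--) {
		/* Bring down the next dividend bit, most significant first. */
		rem = (rem << 1) | ((dividend >> bit) & 1);
		if (rem >= divisor) {
			rem -= divisor;
			quotient |= (uint64_t)1 << bit;
		}
	}
	return quotient;
}
```

Because the remainder stays below a 32-bit divisor, the left shift never
overflows the 64-bit accumulator.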
2 months ago drm/xe: Fix conversion from clock ticks to milliseconds
Harish Chegondi [Mon, 17 Nov 2025 19:48:43 +0000 (11:48 -0800)] 
drm/xe: Fix conversion from clock ticks to milliseconds

When tick counts are large, the multiplication by MSEC_PER_SEC can
overflow 64 bits, and the conversion from clock ticks to milliseconds
then returns a wrong value.

Use mul_u64_u32_div() instead.

Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Suggested-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Fixes: 49cc215aad7f ("drm/xe: Add xe_gt_clock_interval_to_ms helper")
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patch.msgid.link/1562f1b62d5be3fbaee100f09107f3cc49e40dd1.1763408584.git.harish.chegondi@intel.com
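
The overflow described in the commit above can be reproduced in
userspace. The sketch below is illustrative only: ticks_to_ms_naive(),
ticks_to_ms_safe() and mul_u64_u32_div_sketch() are hypothetical names,
and the helper is a portable stand-in written for this example rather
than the kernel's mul_u64_u32_div(). It keeps the product in two 32-bit
halves so that no intermediate value is truncated before the division.

```c
#include <assert.h>
#include <stdint.h>

#define MSEC_PER_SEC 1000u

/* Compute a * mul / div without truncating the intermediate product. */
uint64_t mul_u64_u32_div_sketch(uint64_t a, uint32_t mul, uint32_t div)
{
	uint64_t lo, hi, q_hi, rem, q_lo;

	assert(div != 0);

	lo = (a & 0xffffffffu) * mul;		/* low half times mul */
	hi = (a >> 32) * mul + (lo >> 32);	/* high half plus carry */

	q_hi = hi / div;			/* upper quotient bits */
	rem = hi % div;				/* rem < div, fits 32 bits */
	q_lo = ((rem << 32) | (lo & 0xffffffffu)) / div;

	/* Assumes the final quotient fits in 64 bits. */
	return (q_hi << 32) + q_lo;
}

/* Naive conversion: ticks * 1000 wraps for large tick counts. */
uint64_t ticks_to_ms_naive(uint64_t ticks, uint32_t freq_hz)
{
	return ticks * MSEC_PER_SEC / freq_hz;
}

/* Overflow-safe conversion using the helper above. */
uint64_t ticks_to_ms_safe(uint64_t ticks, uint32_t freq_hz)
{
	return mul_u64_u32_div_sketch(ticks, MSEC_PER_SEC, freq_hz);
}
```

For small inputs both functions agree; once ticks * 1000 no longer fits in
64 bits, only the safe variant matches the exact quotient.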
2 months ago drm/xe/guc_ct: Cleanup ifdef'ry
Lucas De Marchi [Wed, 19 Nov 2025 15:21:57 +0000 (07:21 -0800)] 
drm/xe/guc_ct: Cleanup ifdef'ry

Better split CONFIG_DRM_XE_DEBUG and CONFIG_DRM_XE_DEBUG_GUC optional
parts from the main code, creating smaller ct_dead_* and fast_req_*
interfaces.

Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patch.msgid.link/20251119152157.1675188-2-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2 months ago drm/xe/guc: Fix stack_depot usage
Lucas De Marchi [Tue, 18 Nov 2025 19:08:11 +0000 (11:08 -0800)] 
drm/xe/guc: Fix stack_depot usage

Add missing stack_depot_init() call when CONFIG_DRM_XE_DEBUG_GUC is
enabled to fix the following call stack:

[] BUG: kernel NULL pointer dereference, address: 0000000000000000
[] Workqueue:  drm_sched_run_job_work [gpu_sched]
[] RIP: 0010:stack_depot_save_flags+0x172/0x870
[] Call Trace:
[]  <TASK>
[]  fast_req_track+0x58/0xb0 [xe]

Fixes: 16b7e65d299d ("drm/xe/guc: Track FAST_REQ H2Gs to report where errors came from")
Tested-by: Sagar Ghuge <sagar.ghuge@intel.com>
Cc: stable@vger.kernel.org # v6.17+
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://patch.msgid.link/20251118-fix-debug-guc-v1-1-9f780c6bedf8@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2 months ago drm/xe/guc: Fix resource leak in xe_guc_ct_init_noalloc()
Shuicheng Lin [Mon, 10 Nov 2025 18:45:23 +0000 (18:45 +0000)] 
drm/xe/guc: Fix resource leak in xe_guc_ct_init_noalloc()

xe_guc_ct_init_noalloc() allocates the CT workqueue and other helpers
before it tries to initialize ct->lock. If drmm_mutex_init() fails
we currently bail out without releasing those resources because the
guc_ct_fini() hasn’t been registered yet.

Since destroy_workqueue() in guc_ct_fini() may flush the workqueue, which
in turn can take the ct lock, the initialization sequence is restructured
to first initialize the ct->lock, then set up all CT state, and finally
register guc_ct_fini().

v2: guc_ct_fini() does take ct lock. (Matt)
v3: move primelockdep() together with drmm_mutex_init(). (Lucas)

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patch.msgid.link/20251110184522.1581001-2-shuicheng.lin@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2 months ago drm/xe: Fix memory leak when handling pagefault vma
Mika Kuoppala [Thu, 20 Nov 2025 16:14:35 +0000 (18:14 +0200)] 
drm/xe: Fix memory leak when handling pagefault vma

When the pagefault handling code was moved to a new file, an extra
drm_exec_init() was added to the VMA path. This call is unnecessary because
xe_validation_ctx_init() already performs a drm_exec_init(), resulting in a
memory leak reported by kmemleak.

Remove the redundant drm_exec_init() from the VMA pagefault handling code.

Fixes: fb544b844508 ("drm/xe: Implement xe_pagefault_queue_work")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: intel-xe@lists.freedesktop.org
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251120161435.3674556-1-mika.kuoppala@linux.intel.com
2 months ago drm/xe/oa: Fix potential UAF in xe_oa_add_config_ioctl()
Sanjay Yadav [Tue, 18 Nov 2025 11:49:00 +0000 (17:19 +0530)] 
drm/xe/oa: Fix potential UAF in xe_oa_add_config_ioctl()

In xe_oa_add_config_ioctl(), we accessed oa_config->id after dropping
metrics_lock. Since this lock protects the lifetime of oa_config, an
attacker could guess the id and call xe_oa_remove_config_ioctl() with
perfect timing, freeing oa_config before we dereference it, leading to
a potential use-after-free.

Fix this by caching the id in a local variable while holding the lock.

v2: (Matt A)
- Dropped mutex_unlock(&oa->metrics_lock) ordering change from
  xe_oa_remove_config_ioctl()

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6614
Fixes: cdf02fe1a94a7 ("drm/xe/oa/uapi: Add/remove OA config perf ops")
Cc: <stable@vger.kernel.org> # v6.11+
Suggested-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Sanjay Yadav <sanjay.kumar.yadav@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patch.msgid.link/20251118114859.3379952-2-sanjay.kumar.yadav@intel.com
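
The fix pattern from the commit above, copying a field into a local while
the lock that pins the object is still held, can be sketched with a
pthread mutex. The types and functions here (struct toy_oa,
toy_add_config(), toy_remove_config()) are hypothetical and use a single
config slot for brevity; they are not the driver's data structures.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct toy_config {
	int id;
};

struct toy_oa {
	pthread_mutex_t metrics_lock;	/* protects the lifetime of *config */
	struct toy_config *config;	/* single slot, for brevity */
	int next_id;
};

/* Register a config and return its id without touching it after unlock. */
int toy_add_config(struct toy_oa *oa)
{
	struct toy_config *cfg = malloc(sizeof(*cfg));
	int id;

	if (!cfg)
		return -1;

	pthread_mutex_lock(&oa->metrics_lock);
	if (oa->config) {
		/* Slot already taken: nothing was published, so free it. */
		pthread_mutex_unlock(&oa->metrics_lock);
		free(cfg);
		return -1;
	}
	cfg->id = oa->next_id++;
	oa->config = cfg;
	id = cfg->id;		/* copy while the lock still pins cfg */
	pthread_mutex_unlock(&oa->metrics_lock);

	/* A concurrent remove may free cfg from here on: use only 'id'. */
	return id;
}

/* Remove and free the config if 'id' matches; returns 0 on success. */
int toy_remove_config(struct toy_oa *oa, int id)
{
	struct toy_config *cfg = NULL;

	pthread_mutex_lock(&oa->metrics_lock);
	if (oa->config && oa->config->id == id) {
		cfg = oa->config;
		oa->config = NULL;
	}
	pthread_mutex_unlock(&oa->metrics_lock);

	if (!cfg)
		return -1;
	free(cfg);
	return 0;
}
```

Returning cfg->id after the unlock would read memory that a racing
toy_remove_config() may already have freed; returning the local copy
avoids that window.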
2 months ago drm/xe/debugfs: Use scope-based runtime PM
Matt Roper [Tue, 18 Nov 2025 16:44:06 +0000 (08:44 -0800)] 
drm/xe/debugfs: Use scope-based runtime PM

Switch the debugfs code to use scope-based runtime PM where possible,
for consistency with other parts of the driver.

v2:
 - Drop unnecessary 'ret' variables.  (Gustavo)

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-56-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months ago drm/xe/sysfs: Use scope-based runtime power management
Matt Roper [Tue, 18 Nov 2025 16:44:05 +0000 (08:44 -0800)] 
drm/xe/sysfs: Use scope-based runtime power management

Switch sysfs to use scope-based runtime power management to slightly
simplify the code.

v2:
 - Drop unnecessary local variables.  (Gustavo)

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-55-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months ago drm/xe/tests: Use scope-based runtime PM
Matt Roper [Tue, 18 Nov 2025 16:44:04 +0000 (08:44 -0800)] 
drm/xe/tests: Use scope-based runtime PM

Use scope-based handling of runtime PM in the kunit tests for
consistency with other parts of the driver.

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-54-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months ago drm/xe/sriov: Use scope-based runtime PM
Matt Roper [Tue, 18 Nov 2025 16:44:03 +0000 (08:44 -0800)] 
drm/xe/sriov: Use scope-based runtime PM

Use scope-based runtime power management in the SRIOV code for
consistency with other parts of the driver.

v2:
 - Drop unnecessary 'ret' variables.  (Gustavo)

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-53-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months ago drm/xe/hwmon: Use scope-based runtime PM
Matt Roper [Tue, 18 Nov 2025 16:44:02 +0000 (08:44 -0800)] 
drm/xe/hwmon: Use scope-based runtime PM

Use scope-based runtime power management in the hwmon code for
consistency with other parts of the driver.

v2:
 - Drop unnecessary 'ret' variables.  (Gustavo)

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-52-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months ago drm/xe/ggtt: Use scope-based runtime pm
Matt Roper [Tue, 18 Nov 2025 16:44:01 +0000 (08:44 -0800)] 
drm/xe/ggtt: Use scope-based runtime pm

Switch the GGTT code to scope-based runtime PM for consistency with
other parts of the driver.

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-51-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months ago drm/xe/bo: Use scope-based runtime PM
Matt Roper [Tue, 18 Nov 2025 16:44:00 +0000 (08:44 -0800)] 
drm/xe/bo: Use scope-based runtime PM

Use scope-based runtime power management in the BO code for consistency
with other parts of the driver.

v2:
 - Drop unnecessary 'ret' variable.  (Gustavo)

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-50-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months ago drm/xe/vram: Use scope-based forcewake
Matt Roper [Tue, 18 Nov 2025 16:43:59 +0000 (08:43 -0800)] 
drm/xe/vram: Use scope-based forcewake

Switch VRAM code to use scope-based forcewake for consistency with other
parts of the driver.

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-49-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months ago drm/xe/reg_sr: Use scope-based forcewake
Matt Roper [Tue, 18 Nov 2025 16:43:58 +0000 (08:43 -0800)] 
drm/xe/reg_sr: Use scope-based forcewake

Use scope-based forcewake to slightly simplify the reg_sr code.

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-48-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months ago drm/xe/query: Use scope-based forcewake
Matt Roper [Tue, 18 Nov 2025 16:43:57 +0000 (08:43 -0800)] 
drm/xe/query: Use scope-based forcewake

Use scope-based forcewake handling for consistency with other parts of
the driver.

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-47-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months ago drm/xe/huc: Use scope-based forcewake
Matt Roper [Tue, 18 Nov 2025 16:43:56 +0000 (08:43 -0800)] 
drm/xe/huc: Use scope-based forcewake

Use scope-based forcewake in the HuC code for a small simplification and
consistency with other parts of the driver.

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-46-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months ago drm/xe/gt_debugfs: Use scope-based cleanup
Matt Roper [Tue, 18 Nov 2025 16:43:55 +0000 (08:43 -0800)] 
drm/xe/gt_debugfs: Use scope-based cleanup

Use scope-based cleanup for forcewake and runtime PM to simplify the
debugfs code slightly.

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-45-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months ago drm/xe/drm_client: Use scope-based cleanup
Matt Roper [Tue, 18 Nov 2025 16:43:54 +0000 (08:43 -0800)] 
drm/xe/drm_client: Use scope-based cleanup

Use scope-based cleanup for forcewake and runtime PM.

v2:
 - Use xe_force_wake_release_only rather than a custom one-off class for
   "any engine" forcewake.  (Gustavo)

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-44-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months ago drm/xe: Return forcewake reference type from force_wake_get_any_engine()
Matt Roper [Tue, 18 Nov 2025 16:43:53 +0000 (08:43 -0800)] 
drm/xe: Return forcewake reference type from force_wake_get_any_engine()

Adjust the signature of force_wake_get_any_engine() such that it returns
a 'struct xe_force_wake_ref' rather than a boolean success/failure.
Failure cases are now recognized by inspecting the hardware engine
returned by reference; a NULL hwe indicates that no engine's forcewake
could be obtained.

These changes will make it cleaner and easier to incorporate scope-based
cleanup in force_wake_get_any_engine()'s caller in a future patch.

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-43-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months ago drm/xe/display: Use scoped-cleanup
Matt Roper [Tue, 18 Nov 2025 16:43:52 +0000 (08:43 -0800)] 
drm/xe/display: Use scoped-cleanup

Eliminate some goto-based cleanup by utilizing scoped cleanup helpers.

v2:
 - Eliminate unnecessary 'ret' variable in intel_hdcp_gsc_check_status()
   (Gustavo)

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-42-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months ago drm/xe/devcoredump: Use scope-based cleanup
Matt Roper [Tue, 18 Nov 2025 16:43:51 +0000 (08:43 -0800)] 
drm/xe/devcoredump: Use scope-based cleanup

Use scope-based cleanup for forcewake and runtime PM in the devcoredump
code.  This eliminates some goto-based error handling and slightly
simplifies other functions.

v2:
 - Move the forcewake acquisition slightly higher in
   devcoredump_snapshot() so that we maintain an easy-to-understand LIFO
   cleanup order.  (Gustavo)

Cc: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-41-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months ago drm/xe/device: Use scope-based cleanup
Matt Roper [Tue, 18 Nov 2025 16:43:50 +0000 (08:43 -0800)] 
drm/xe/device: Use scope-based cleanup

Convert device code to use scope-based forcewake and runtime PM.

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-40-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months ago drm/xe/gsc: Use scope-based cleanup
Matt Roper [Tue, 18 Nov 2025 16:43:49 +0000 (08:43 -0800)] 
drm/xe/gsc: Use scope-based cleanup

Use scope-based cleanup for forcewake and runtime PM to eliminate some
goto-based error handling and simplify other functions.

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-39-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months ago drm/xe/pxp: Use scope-based cleanup
Matt Roper [Tue, 18 Nov 2025 16:43:48 +0000 (08:43 -0800)] 
drm/xe/pxp: Use scope-based cleanup

Use scope-based cleanup for forcewake and runtime pm.  This allows us to
eliminate some goto-based error handling and simplify other functions.

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-38-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months ago drm/xe/pat: Use scope-based forcewake
Matt Roper [Tue, 18 Nov 2025 16:43:47 +0000 (08:43 -0800)] 
drm/xe/pat: Use scope-based forcewake

Use scope-based cleanup for forcewake in the PAT code to slightly
simplify the code.

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-37-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months ago drm/xe/mocs: Use scope-based cleanup
Matt Roper [Tue, 18 Nov 2025 16:43:46 +0000 (08:43 -0800)] 
drm/xe/mocs: Use scope-based cleanup

Using scope-based cleanup for runtime PM and forcewake in the MOCS code
allows us to eliminate some goto-based error handling and simplify some
other functions.

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-36-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months ago drm/xe/guc_pc: Use scope-based cleanup
Matt Roper [Tue, 18 Nov 2025 16:43:45 +0000 (08:43 -0800)] 
drm/xe/guc_pc: Use scope-based cleanup

Use scope-based cleanup for forcewake and runtime PM in the GuC PC code.
This allows us to eliminate goto-based cleanup and simplifies some
other functions.

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-35-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months ago drm/xe/guc: Use scope-based cleanup
Matt Roper [Tue, 18 Nov 2025 16:43:44 +0000 (08:43 -0800)] 
drm/xe/guc: Use scope-based cleanup

Use scope-based cleanup for forcewake and runtime PM.

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-34-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months ago drm/xe/gt_idle: Use scope-based cleanup
Matt Roper [Tue, 18 Nov 2025 16:43:43 +0000 (08:43 -0800)] 
drm/xe/gt_idle: Use scope-based cleanup

Use scope-based cleanup for runtime PM and forcewake in the GT idle
code.

v2:
 - Use scoped_guard() over guard() in idle_status_show() and
   idle_residency_ms_show().  (Gustavo)
 - Eliminate unnecessary 'ret' local variable in name_show().

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-33-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months ago drm/xe/gt: Use scope-based cleanup
Matt Roper [Tue, 18 Nov 2025 16:43:42 +0000 (08:43 -0800)] 
drm/xe/gt: Use scope-based cleanup

Using scope-based cleanup for forcewake and runtime PM allows us to
reduce or eliminate some of the goto-based error handling and simplify
several functions.

v2:
 - Drop changes to do_gt_restart().  This function still has goto-based
   logic, making scope-based cleanup unsafe for now.  (Gustavo)

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-32-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months ago drm/xe/pm: Add scope-based cleanup helper for runtime PM
Matt Roper [Tue, 18 Nov 2025 16:43:41 +0000 (08:43 -0800)] 
drm/xe/pm: Add scope-based cleanup helper for runtime PM

Add scope-based helpers for runtime PM that may be used to simplify
cleanup logic and potentially avoid goto-based cleanup.

For example, using

        guard(xe_pm_runtime)(xe);

will get runtime PM and cause a corresponding put to occur automatically
when the current scope is exited.  'xe_pm_runtime_noresume' can be used
as a guard replacement for the corresponding 'noresume' variant.
There's also an xe_pm_runtime_ioctl conditional guard that can be used
as a replacement for xe_runtime_ioctl():

        ACQUIRE(xe_pm_runtime_ioctl, pm)(xe);
        if ((ret = ACQUIRE_ERR(xe_pm_runtime_ioctl, &pm)) < 0)
                /* failed */

In a few rare cases (such as gt_reset_worker()) we need to ensure that
runtime PM is dropped when the function is exited by any means
(including error paths), but the function does not need to acquire
runtime PM because that has already been done earlier by a different
function.  For these special cases, an 'xe_pm_runtime_release_only'
guard can be used to handle the release without doing an acquisition.

These guards will be used in future patches to eliminate some of our
goto-based cleanup.
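
The guard pattern described above can be illustrated outside the kernel.
What follows is only a userspace sketch, assuming a GCC/Clang-style
cleanup attribute; 'struct fake_dev', fake_pm_get(), fake_pm_put() and
FAKE_PM_GUARD() are hypothetical stand-ins, not the xe helpers added by
this patch.

```c
#include <assert.h>

/* Hypothetical stand-in for a device with a runtime PM usage count. */
struct fake_dev {
	int pm_refs;
};

static void fake_pm_get(struct fake_dev *dev)
{
	dev->pm_refs++;
}

static void fake_pm_put(struct fake_dev *dev)
{
	assert(dev->pm_refs > 0);	/* catch unbalanced puts */
	dev->pm_refs--;
}

/* Cleanup callback: receives a pointer to the guarded variable. */
static void fake_pm_guard_exit(struct fake_dev **devp)
{
	if (*devp)
		fake_pm_put(*devp);
}

/* Take a reference now; drop it when the enclosing scope is left. */
#define FAKE_PM_GUARD(name, dev) \
	struct fake_dev *name __attribute__((cleanup(fake_pm_guard_exit))) = \
		(fake_pm_get(dev), (dev))

/* Returns the usage count seen while the guard is held, or -1 on error. */
static int do_work(struct fake_dev *dev, int fail_early)
{
	FAKE_PM_GUARD(pm, dev);

	if (fail_early)
		return -1;	/* no explicit put needed on the error path */

	return pm->pm_refs;
}
```

Both the early return and the normal return leave the usage count where
it started, which is the property that lets goto-based unwinding go away.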

v2:
 - Specify success condition for xe_pm_runtime_ioctl as _RET >= 0 so
   that positive values will be properly identified as success and
   trigger destructor cleanup properly.

v3:
 - Add comments to the kerneldoc for the existing 'get' functions
   indicating that scope-based handling should be preferred where
   possible.  (Gustavo)

Cc: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-31-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months agodrm/xe/forcewake: Add scope-based cleanup for forcewake
Matt Roper [Tue, 18 Nov 2025 16:43:40 +0000 (08:43 -0800)] 
drm/xe/forcewake: Add scope-based cleanup for forcewake

Since forcewake uses a reference counting get/put model, there are many
places where we need to be careful to drop the forcewake reference when
bailing out of a function early on an error path.  Add scope-based
cleanup options that can be used in place of explicit get/put to help
prevent mistakes in this area.

Examples:

   CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FW_GT);

       Obtain forcewake on the XE_FW_GT domain and hold it until the
       end of the current block.  The wakeref will be dropped
       automatically when the current scope is exited by any means
       (return, break, reaching the end of the block, etc.).

   xe_with_force_wake(fw_ref, gt_to_fw(ss->gt), XE_FORCEWAKE_ALL) {
        ...
   }

       Hold all forcewake domains for the following block.  As with the
       CLASS usage, forcewake will be dropped automatically when the
       block is exited by any means.

Use of these cleanup helpers should allow us to remove some ugly
goto-based error handling and help avoid mistakes in functions with lots
of early error exits.

An 'xe_force_wake_release_only' class is also added for cases where a
forcewake reference is passed in from another function and the current
function is responsible for releasing it in every flow and error path.
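
The block form shown in the examples can be approximated in userspace
with a one-iteration for loop whose init declaration carries a cleanup
attribute. This is a sketch under that assumption only; 'struct fake_fw',
fake_fw_get() and fake_with_fw() are hypothetical names and do not
reproduce the real xe_with_force_wake() implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical forcewake state: just a reference count. */
struct fake_fw {
	int refs;
};

/* Hypothetical reference handle returned by the 'get' call. */
struct fake_fw_ref {
	struct fake_fw *fw;
};

static struct fake_fw_ref fake_fw_get(struct fake_fw *fw)
{
	fw->refs++;
	return (struct fake_fw_ref){ .fw = fw };
}

/* Runs automatically when the handle goes out of scope. */
static void fake_fw_ref_exit(struct fake_fw_ref *ref)
{
	if (ref->fw) {
		assert(ref->fw->refs > 0);
		ref->fw->refs--;
	}
}

/* Hold a reference for exactly one pass through the following block. */
#define fake_with_fw(ref, fw) \
	for (struct fake_fw_ref ref \
		     __attribute__((cleanup(fake_fw_ref_exit))) = fake_fw_get(fw), \
		     *ref##_done = NULL; \
	     !ref##_done; ref##_done = &ref)

/* Returns the count observed inside the block, or -1 on early exit. */
static int read_regs(struct fake_fw *fw, int bail)
{
	int held = 0;

	fake_with_fw(ref, fw) {
		if (bail)
			return -1;	/* reference is still dropped */
		held = fw->refs;
	}

	return held;
}
```

The 'done' pointer only exists to stop the loop after one iteration; the
release itself is tied to the lifetime of the handle variable.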

v2:
 - Create a separate constructor that just wraps xe_force_wake_get for
   use in the class.  This eliminates the need to update the signature
   of xe_force_wake_get().  (Michal)

v3:
 - Wrap xe_with_force_wake's 'done' marker in __UNIQUE_ID.  (Gustavo)
 - Add a note to xe_force_wake_get()'s kerneldoc explaining that
   scope-based cleanup is preferred when possible.  (Gustavo)
 - Add an xe_force_wake_release_only class.  (Gustavo)

v4:
 - Add NULL check on fw in release_only variant.  (Gustavo)

Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-30-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months agodrm/xe/vm: Use for_each_tlb_inval() to calculate invalidation fences
Matt Roper [Tue, 18 Nov 2025 20:26:05 +0000 (12:26 -0800)] 
drm/xe/vm: Use for_each_tlb_inval() to calculate invalidation fences

ops_execute() calculates the size of a fence array based on
XE_MAX_GT_PER_TILE, while the code that actually fills in the fence
array uses a for_each_tlb_inval() iterator.  This works out okay today
since both approaches come up with the same number of invalidation
fences (2: primary GT invalidation + media GT invalidation), but could
be problematic in the future if there isn't a 1:1 relationship between
TLBs needing invalidation and potential GTs on the tile.

Adjust the allocation code to use the same for_each_tlb_inval()
counting logic as the code that fills the array to future-proof the
code.
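
As an illustration of sizing an array with the same iterator that later
fills it, here is a small userspace sketch. 'struct fake_tile' and
fake_for_each_tlb() are hypothetical and only mimic the shape of
for_each_tlb_inval(); they are not the driver's definitions.

```c
#include <assert.h>
#include <stdlib.h>

#define FAKE_MAX_TLB 4

/* Hypothetical tile: slot i is nonzero if that TLB needs invalidation. */
struct fake_tile {
	int tlb_present[FAKE_MAX_TLB];
};

/* Hypothetical iterator: visits only the slots that are present. */
#define fake_for_each_tlb(tile, i) \
	for ((i) = 0; (i) < FAKE_MAX_TLB; (i)++) \
		if (!(tile)->tlb_present[i]) {} else

/* Count with the iterator instead of assuming a fixed maximum. */
static int count_fences(const struct fake_tile *tile)
{
	int i, n = 0;

	fake_for_each_tlb(tile, i)
		n++;

	return n;
}

/* Allocate exactly as many slots as the fill loop will write. */
static int *alloc_and_fill(const struct fake_tile *tile, int *count)
{
	int i, n = 0;
	int *fences;

	*count = count_fences(tile);
	fences = calloc(*count ? *count : 1, sizeof(*fences));
	if (!fences)
		return NULL;

	fake_for_each_tlb(tile, i)
		fences[n++] = i;	/* stand-in for a real fence */

	assert(n == *count);	/* both loops agree by construction */
	return fences;
}
```

Because the count and the fill share one iterator, a future change to
which TLBs are visited cannot make the two disagree.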

Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251118202604.3715782-2-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months agodrm/xe/vf: Shadow buffer management for CCS read/write operations
Satyanarayana K V P [Tue, 18 Nov 2025 12:07:45 +0000 (12:07 +0000)] 
drm/xe/vf: Shadow buffer management for CCS read/write operations

The CCS copy command consists of a 5-dword sequence. If the vCPU halts
during save/restore operations while these sequences are being programmed,
incomplete writes can cause page faults during iGPU CCS metadata saving.

Use shadow buffer management to prevent partial write issues during CCS
operations.

Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Suggested-by: Matthew Brost <matthew.brost@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251118120745.3460172-3-satyanarayana.k.v.p@intel.com
2 months agodrm/xe/sa: Shadow buffer support in the sub-allocator pool
Satyanarayana K V P [Tue, 18 Nov 2025 12:07:44 +0000 (12:07 +0000)] 
drm/xe/sa: Shadow buffer support in the sub-allocator pool

The existing sub-allocator is limited to managing a single buffer object.
This enhancement introduces shadow buffer functionality to support
scenarios requiring dual buffer management.

The changes add shadow buffer object creation, management of both the
primary and shadow buffers, and appropriate locking mechanisms for
thread-safe operations.

This enables more flexible buffer allocation strategies in scenarios where
shadow buffering is required.

Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Suggested-by: Matthew Brost <matthew.brost@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251118120745.3460172-2-satyanarayana.k.v.p@intel.com
2 months agodrm/xe/irq: Handle msix vector0 interrupt
Venkata Ramana Nayana [Fri, 7 Nov 2025 08:31:41 +0000 (14:01 +0530)] 
drm/xe/irq: Handle msix vector0 interrupt

The guc2host handler is currently registered for MSI-X vector 0. Per bspec,
on an MSI-X vector 0 interrupt the driver must check the legacy registers
190008h (TILE_INT_REG), 190060h (GT INTR Identity Reg 0) and the other
registers mentioned in "Interrupt Service Routine Pseudocode"; otherwise
subsequent interrupts will be blocked. To fix this, replace the guc2host
handler with the legacy xe_irq_handler.

Fixes: da889070be7b2 ("drm/xe/irq: Separate MSI and MSI-X flows")
Bspec: 62357
Signed-off-by: Venkata Ramana Nayana <venkata.ramana.nayana@intel.com>
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://patch.msgid.link/20251107083141.2080189-1-venkata.ramana.nayana@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months agodrm/xe/pf: Check for fence error on VRAM save/restore
Michał Winiarski [Fri, 14 Nov 2025 12:23:39 +0000 (13:23 +0100)] 
drm/xe/pf: Check for fence error on VRAM save/restore

The code incorrectly assumes that the VRAM save/restore fence is valid.
Fix it by checking for error.

Fixes: 49cf1b9b609fe ("drm/xe/pf: Handle VRAM migration data as part of PF control")
Suggested-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251114122339.1791026-1-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
2 months agodrm/xe/pf: Drop the VF VRAM BO reference on successful restore
Michał Winiarski [Fri, 14 Nov 2025 10:07:13 +0000 (11:07 +0100)] 
drm/xe/pf: Drop the VF VRAM BO reference on successful restore

The reference is only dropped on error. Fix it by adding the missing
xe_bo_put().

Fixes: 49cf1b9b609fe ("drm/xe/pf: Handle VRAM migration data as part of PF control")
Reported-by: Adam Miszczak <adam.miszczak@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patch.msgid.link/20251114100713.1776073-1-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 months agodrm/xe: Remove duplicate DRM_EXEC selection from Kconfig
Shuicheng Lin [Mon, 10 Nov 2025 23:26:58 +0000 (23:26 +0000)] 
drm/xe: Remove duplicate DRM_EXEC selection from Kconfig

There are 2 identical "select DRM_EXEC" lines for DRM_XE.
Remove one to clean up the configuration.

Fixes: d490ecf57790 ("drm/xe: Rework xe_exec and the VM rebind worker to use the drm_exec helper")
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Nitin Gote <nitin.r.gote@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patch.msgid.link/20251110232657.1807998-2-shuicheng.lin@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
3 months agodrm/xe/kunit: Fix forcewake assertion in mocs test
Matt Roper [Thu, 13 Nov 2025 23:40:39 +0000 (15:40 -0800)] 
drm/xe/kunit: Fix forcewake assertion in mocs test

The MOCS kunit test calls KUNIT_ASSERT_TRUE_MSG() with a condition of
'true'; this prevents the assertion from ever failing.  Replace
KUNIT_ASSERT_TRUE_MSG with KUNIT_FAIL_AND_ABORT to get the intended
failure behavior in cases where forcewake was not acquired successfully.

Fixes: 51c0ee84e4dc ("drm/xe/tests/mocs: Hold XE_FORCEWAKE_ALL for LNCF regs")
Cc: Tejas Upadhyay <tejas.upadhyay@intel.com>
Cc: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251113234038.2256106-2-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
3 months agodrm/xe/pf: Fix kernel-doc warning in migration_save_consume
Michał Winiarski [Fri, 14 Nov 2025 13:40:30 +0000 (14:40 +0100)] 
drm/xe/pf: Fix kernel-doc warning in migration_save_consume

The kernel-doc for xe_sriov_pf_migration_save_consume() contained
multiple "Return:" sections, causing a warning.
Fix it by removing the extra line.

Fixes: 67df4a5cbc583 ("drm/xe/pf: Add data structures and handlers for migration rings")
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251114134030.1795947-1-michal.winiarski@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
3 months agodrm/xe: Prevent BIT() overflow when handling invalid prefetch region
Shuicheng Lin [Wed, 12 Nov 2025 18:10:06 +0000 (18:10 +0000)] 
drm/xe: Prevent BIT() overflow when handling invalid prefetch region

If the user provides a large value (such as 0x80) for the
prefetch_mem_region_instance parameter of the vm_bind ioctl, the
BIT(prefetch_region) expression overflows as shown below:
"
 ------------[ cut here ]------------
 UBSAN: shift-out-of-bounds in drivers/gpu/drm/xe/xe_vm.c:3414:7
 shift exponent 128 is too large for 64-bit type 'long unsigned int'
 CPU: 8 UID: 0 PID: 53120 Comm: xe_exec_system_ Tainted: G        W           6.18.0-rc1-lgci-xe-kernel+ #200 PREEMPT(voluntary)
 Tainted: [W]=WARN
 Hardware name: ASUS System Product Name/PRIME Z790-P WIFI, BIOS 0812 02/24/2023
 Call Trace:
  <TASK>
  dump_stack_lvl+0xa0/0xc0
  dump_stack+0x10/0x20
  ubsan_epilogue+0x9/0x40
  __ubsan_handle_shift_out_of_bounds+0x10e/0x170
  ? mutex_unlock+0x12/0x20
  xe_vm_bind_ioctl.cold+0x20/0x3c [xe]
 ...
"
Fix it by validating prefetch_region before the BIT() usage.
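
A minimal sketch of the "validate before shifting" idea, written as
standalone C rather than the actual xe_vm.c change. The function name
check_prefetch_region() and the 'valid_mask' parameter are assumptions
made for illustration; BIT() and BITS_PER_LONG are redefined locally.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

#define BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define BIT(n)		(1UL << (n))

/*
 * Hypothetical check: 'region' comes from userspace, 'valid_mask' has one
 * bit set per memory region the device actually has.
 */
static int check_prefetch_region(unsigned int region, unsigned long valid_mask)
{
	/* Range-check first so the shift below is always defined. */
	if (region >= BITS_PER_LONG)
		return -EINVAL;

	/* Only now is it safe to build the bit and test it. */
	if (!(BIT(region) & valid_mask))
		return -EINVAL;

	return 0;
}
```

With this ordering a value such as 0x80 is rejected before it ever
reaches the shift, so the undefined behaviour reported by UBSAN cannot
occur.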

v2: Add Closes and Cc stable kernels. (Matt)

Reported-by: Koen Koning <koen.koning@intel.com>
Reported-by: Peter Senna Tschudin <peter.senna@linux.intel.com>
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6478
Cc: <stable@vger.kernel.org> # v6.8+
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patch.msgid.link/20251112181005.2120521-2-shuicheng.lin@intel.com
3 months agodrm/xe/pat: Add helper to query compression enable status
Xin Wang [Mon, 10 Nov 2025 22:14:58 +0000 (22:14 +0000)] 
drm/xe/pat: Add helper to query compression enable status

Add xe_pat_index_get_comp_en() helper function to check whether
compression is enabled for a given PAT index by extracting the
XE2_COMP_EN bit from the PAT table entry.

There are no current users, however there are multiple in-flight series
which will all use this helper.

CC: Nitin Gote <nitin.r.gote@intel.com>
CC: Sanjay Yadav <sanjay.kumar.yadav@intel.com>
CC: Matt Roper <matthew.d.roper@intel.com>
Suggested-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Xin Wang <x.wang@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Nitin Gote <nitin.r.gote@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Sanjay Yadav <sanjay.kumar.yadav@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patch.msgid.link/20251110221458.1864507-2-x.wang@intel.com
3 months agodrm/xe/oa: Store forcewake reference in stream structure
Matt Roper [Mon, 10 Nov 2025 23:20:21 +0000 (15:20 -0800)] 
drm/xe/oa: Store forcewake reference in stream structure

Calls to xe_force_wake_put() should generally pass the exact reference
returned by xe_force_wake_get().  Since OA grabs and releases forcewake
in different functions, xe_oa_stream_destroy() is currently calling put
with a hardcoded ALL mask.  Although this works for now, it's somewhat
fragile in case OA moves to more precise power domain management in the
future.

Stash the original reference obtained during stream initialization
inside the stream structure so that we can use it directly when the
stream is destroyed.

Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patch.msgid.link/20251110232017.1475869-35-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
3 months agodrm/xe/eustall: Store forcewake reference in stream structure
Matt Roper [Mon, 10 Nov 2025 23:20:20 +0000 (15:20 -0800)] 
drm/xe/eustall: Store forcewake reference in stream structure

Calls to xe_force_wake_put() should generally pass the exact reference
returned by xe_force_wake_get().  Since EU stall grabs and releases
forcewake in different functions, xe_eu_stall_disable_locked() is
currently calling put with a hardcoded RENDER domain.  Although this
works for now, it's somewhat fragile in case the power domain(s)
required by stall sampling change in the future, or if workarounds show
up that require us to obtain additional domains.

Stash the original reference obtained during stream enable inside the
stream structure so that we can use it directly when the stream is
disabled.

Cc: Harish Chegondi <harish.chegondi@intel.com>
Reviewed-by: Harish Chegondi <harish.chegondi@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251110232017.1475869-34-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
3 months agodrm/xe/forcewake: Improve kerneldoc
Matt Roper [Mon, 10 Nov 2025 23:20:19 +0000 (15:20 -0800)] 
drm/xe/forcewake: Improve kerneldoc

Improve the kerneldoc for forcewake a bit to give more detail about what
the structures represent.

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://patch.msgid.link/20251110232017.1475869-33-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
3 months agodrm/xe/pf: Use migration-friendly GGTT auto-provisioning
Michal Wajdeczko [Wed, 12 Nov 2025 12:44:08 +0000 (13:44 +0100)] 
drm/xe/pf: Use migration-friendly GGTT auto-provisioning

Instead of trying very hard to find the largest fair GGTT size that
could be allocated for VFs on the current tile, pick a smaller value,
rounded down to a power of two, that is more likely to be provisioned
in the same manner by the other PF instance:

  num VFs | GGTT space (MiB)
  --------+-----------------
   63..57 | 56
   56..29 | 64
   28..15 | 128
   14..8  | 256
    7..4  | 512
    3..2  | 1024
       1  | 2048 (regular PF)
       1  | 3584 (admin only PF)

Note that due to FW/HW limitations we can't share all 4GiB GGTT
address space with VFs, so for larger numbers of VFs (>7) the
outcome changes at different points than it does for GuC
context/doorbell IDs.
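
For reference, the table above can be expressed as a simple lookup. This
is only a sketch that copies the listed rows; the driver may well compute
these sizes differently, and the names 'struct ggtt_row' and
ggtt_mib_for_num_vfs() are hypothetical. The admin-only PF row is not
modelled.

```c
#include <assert.h>
#include <stddef.h>

/* One row of the table: applies when num_vfs >= min_vfs. */
struct ggtt_row {
	unsigned int min_vfs;
	unsigned int mib;
};

/* Rows copied from the table, largest VF count first. */
static const struct ggtt_row ggtt_rows[] = {
	{ 57,   56 },
	{ 29,   64 },
	{ 15,  128 },
	{  8,  256 },
	{  4,  512 },
	{  2, 1024 },
	{  1, 2048 },	/* regular PF case only */
};

/* Returns the GGTT space in MiB listed for num_vfs, or 0 if out of range. */
static unsigned int ggtt_mib_for_num_vfs(unsigned int num_vfs)
{
	size_t i;

	if (num_vfs == 0 || num_vfs > 63)
		return 0;

	for (i = 0; i < sizeof(ggtt_rows) / sizeof(ggtt_rows[0]); i++) {
		if (num_vfs >= ggtt_rows[i].min_vfs)
			return ggtt_rows[i].mib;
	}

	return 0;
}
```

Two PF instances that use the same fixed mapping arrive at the same size
for the same VF count, which is the migration-friendly property the
commit is after.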

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patch.msgid.link/20251112124408.8094-1-michal.wajdeczko@intel.com
3 months agodrm/intel/bmg: Allow device ID usage with single-argument macros
Michał Winiarski [Wed, 12 Nov 2025 13:22:20 +0000 (14:22 +0100)] 
drm/intel/bmg: Allow device ID usage with single-argument macros

When INTEL_BMG_G21_IDS were added as a subplatform, token concatenation
operator usage was omitted, making INTEL_BMG_IDS not usable with
single-argument macros.
Fix that by adding the missing operator.

Fixes: 78de8f876683 ("drm/xe: Handle Wa_22010954014 and Wa_14022085890 as device workarounds")
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-25-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 months agodrm/xe/pf: Add wait helper for VF FLR
Michał Winiarski [Wed, 12 Nov 2025 13:22:19 +0000 (14:22 +0100)] 
drm/xe/pf: Add wait helper for VF FLR

VF FLR requires additional processing done by PF driver.
The processing is done after FLR is already finished from PCIe
perspective.
In order to avoid a scenario where migration state transitions while
PF processing is still in progress, additional synchronization
point is needed.
Add a helper that will be used as part of VF driver struct
pci_error_handlers .reset_done() callback.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-24-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 months agodrm/xe/pf: Handle VRAM migration data as part of PF control
Michał Winiarski [Wed, 12 Nov 2025 13:22:18 +0000 (14:22 +0100)] 
drm/xe/pf: Handle VRAM migration data as part of PF control

Connect the helpers to allow save and restore of VRAM migration data in
stop_copy / resume device state.

Co-developed-by: Lukasz Laguna <lukasz.laguna@intel.com>
Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-23-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 months agodrm/xe/migrate: Add function to copy of VRAM data in chunks
Lukasz Laguna [Wed, 12 Nov 2025 13:22:17 +0000 (14:22 +0100)] 
drm/xe/migrate: Add function to copy of VRAM data in chunks

Introduce a new function to copy data between VRAM and sysmem objects.
The existing xe_migrate_copy() is tailored for eviction and restore
operations, which involve additional logic and operate on entire
objects.
The xe_migrate_vram_copy_chunk() allows copying chunks of data to or
from a dedicated buffer object, which is essential in case of VF
migration.

Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-22-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 months agodrm/xe/pf: Add helper to retrieve VF's LMEM object
Lukasz Laguna [Wed, 12 Nov 2025 13:22:16 +0000 (14:22 +0100)] 
drm/xe/pf: Add helper to retrieve VF's LMEM object

Instead of accessing VF's lmem_obj directly, introduce a helper function
to make the access more convenient.

Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-21-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 months agodrm/xe/pf: Handle MMIO migration data as part of PF control
Michał Winiarski [Wed, 12 Nov 2025 13:22:15 +0000 (14:22 +0100)] 
drm/xe/pf: Handle MMIO migration data as part of PF control

Implement the helpers and use them for save and restore of MMIO
migration data in stop_copy / resume device state.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-20-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 months agodrm/xe/pf: Handle GGTT migration data as part of PF control
Michał Winiarski [Wed, 12 Nov 2025 13:22:14 +0000 (14:22 +0100)] 
drm/xe/pf: Handle GGTT migration data as part of PF control

Connect the helpers to allow save and restore of GGTT migration data in
stop_copy / resume device state.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-19-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 months agodrm/xe/pf: Add helpers for VF GGTT migration data handling
Michał Winiarski [Wed, 12 Nov 2025 13:22:13 +0000 (14:22 +0100)] 
drm/xe/pf: Add helpers for VF GGTT migration data handling

In an upcoming change, the VF GGTT migration data will be handled as
part of VF control state machine. Add the necessary helpers to allow the
migration data transfer to/from the HW GGTT resource.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-18-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 months agodrm/xe/pf: Handle GuC migration data as part of PF control
Michał Winiarski [Wed, 12 Nov 2025 13:22:12 +0000 (14:22 +0100)] 
drm/xe/pf: Handle GuC migration data as part of PF control

Connect the helpers to allow save and restore of GuC migration data in
stop_copy / resume device state.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-17-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 months agodrm/xe/pf: Switch VF migration GuC save/restore to struct migration data
Michał Winiarski [Wed, 12 Nov 2025 13:22:11 +0000 (14:22 +0100)] 
drm/xe/pf: Switch VF migration GuC save/restore to struct migration data

In upcoming changes, the GuC VF migration data will be handled as part
of separate SAVE/RESTORE states in VF control state machine.
Now that the data is decoupled from both guc_state debugfs and PAUSE
state, we can safely remove the struct xe_gt_sriov_state_snapshot and
modify the GuC save/restore functions to operate on struct
xe_sriov_migration_data.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-16-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 months agodrm/xe/pf: Don't save GuC VF migration data on pause
Michał Winiarski [Wed, 12 Nov 2025 13:22:10 +0000 (14:22 +0100)] 
drm/xe/pf: Don't save GuC VF migration data on pause

In upcoming changes, the GuC VF migration data will be handled as part
of separate SAVE/RESTORE states in VF control state machine.
Remove it from PAUSE state.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-15-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 months agodrm/xe/pf: Remove GuC migration data save/restore from GT debugfs
Michał Winiarski [Wed, 12 Nov 2025 13:22:09 +0000 (14:22 +0100)] 
drm/xe/pf: Remove GuC migration data save/restore from GT debugfs

In upcoming changes, SR-IOV VF migration data will be extended beyond
GuC data and exported to userspace using VFIO interface (with a
vendor-specific variant driver) and a device-level debugfs interface.
Remove the GT-level debugfs.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-14-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 months agodrm/xe/pf: Increase PF GuC Buffer Cache size and use it for VF migration
Michał Winiarski [Wed, 12 Nov 2025 13:22:08 +0000 (14:22 +0100)] 
drm/xe/pf: Increase PF GuC Buffer Cache size and use it for VF migration

Contiguous PF GGTT VMAs can be scarce after creating VFs.
Increase the GuC buffer cache size to 8M for PF so that we can fit GuC
migration data (which currently maxes out at just over 4M) and use the
cache instead of allocating fresh BOs.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-13-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 months agodrm/xe: Allow the caller to pass guc_buf_cache size
Michał Winiarski [Wed, 12 Nov 2025 13:22:07 +0000 (14:22 +0100)] 
drm/xe: Allow the caller to pass guc_buf_cache size

An upcoming change will use GuC buffer cache as a place where GuC
migration data will be stored, and the memory requirement for that is
larger than indirect data.
Allow the caller to pass the size based on the intended use case.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-12-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 months agodrm/xe: Add sa/guc_buf_cache sync interface
Michał Winiarski [Wed, 12 Nov 2025 13:22:06 +0000 (14:22 +0100)] 
drm/xe: Add sa/guc_buf_cache sync interface

In upcoming changes the cached buffers are going to be used to read data
produced by the GuC. Add a counterpart to flush, which synchronizes the
CPU-side of suballocation with the GPU data and propagate the interface
to GuC Buffer Cache.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-11-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 months agodrm/xe/pf: Expose VF migration data size over debugfs
Michał Winiarski [Wed, 12 Nov 2025 13:22:05 +0000 (14:22 +0100)] 
drm/xe/pf: Expose VF migration data size over debugfs

The size is normally used to make a decision on when to stop the device
(mainly when it's in a pre_copy state).

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-10-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 months agodrm/xe/pf: Add minimalistic migration descriptor
Michał Winiarski [Wed, 12 Nov 2025 13:22:04 +0000 (14:22 +0100)] 
drm/xe/pf: Add minimalistic migration descriptor

The descriptor reuses the KLV format used by GuC and contains metadata
that can be used to quickly fail migration when source is incompatible
with destination.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-9-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 months agodrm/xe/pf: Add support for encap/decap of bitstream to/from packet
Michał Winiarski [Wed, 12 Nov 2025 13:22:03 +0000 (14:22 +0100)] 
drm/xe/pf: Add support for encap/decap of bitstream to/from packet

Add debugfs handlers for migration state and handle bitstream
.read()/.write() to convert from bitstream to/from migration data
packets.
As descriptor/trailer are handled at this layer - add handling for both
save and restore side.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-8-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 months agodrm/xe/pf: Add helpers for migration data packet allocation / free
Michał Winiarski [Wed, 12 Nov 2025 13:22:02 +0000 (14:22 +0100)] 
drm/xe/pf: Add helpers for migration data packet allocation / free

Now that it's possible to free the packets - connect the restore
handling logic with the ring.
The helpers will also be used in upcoming changes that will start
producing migration data packets.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-7-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 months agodrm/xe/pf: Add data structures and handlers for migration rings
Michał Winiarski [Wed, 12 Nov 2025 13:22:01 +0000 (14:22 +0100)] 
drm/xe/pf: Add data structures and handlers for migration rings

Migration data is queued in a per-GT ptr_ring to decouple the worker
responsible for handling the data transfer from the .read() and .write()
syscalls.
Add the data structures and handlers that will be used in future
commits.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-6-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 months agodrm/xe/pf: Add save/restore control state stubs and connect to debugfs
Michał Winiarski [Wed, 12 Nov 2025 13:22:00 +0000 (14:22 +0100)] 
drm/xe/pf: Add save/restore control state stubs and connect to debugfs

The states will be used by upcoming changes to produce (in case of save)
or consume (in case of resume) the VF migration data.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-5-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 months agodrm/xe/pf: Convert control state to bitmap
Michał Winiarski [Wed, 12 Nov 2025 13:21:59 +0000 (14:21 +0100)] 
drm/xe/pf: Convert control state to bitmap

In upcoming changes, the number of states will increase as a result of
introducing SAVE and RESTORE states.
This means that using unsigned long as underlying storage won't work on
32-bit architectures, as we'll run out of bits.
Use bitmap instead.
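
To illustrate why a bitmap removes the 32-bit limit, here is a userspace
sketch of state bits stored in an array of unsigned long. In the kernel
this role is played by DECLARE_BITMAP() with set_bit()/test_bit(); the
names 'struct vf_state', state_set(), state_test() and the NUM_STATES
value below are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define NUM_STATES	40	/* hypothetical count, larger than 32 */
#define STATE_WORDS	((NUM_STATES + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Enough words to hold NUM_STATES bits on both 32-bit and 64-bit builds. */
struct vf_state {
	unsigned long bits[STATE_WORDS];
};

static void state_set(struct vf_state *s, unsigned int nr)
{
	assert(nr < NUM_STATES);
	s->bits[nr / BITS_PER_LONG] |= 1UL << (nr % BITS_PER_LONG);
}

static bool state_test(const struct vf_state *s, unsigned int nr)
{
	assert(nr < NUM_STATES);
	return s->bits[nr / BITS_PER_LONG] & (1UL << (nr % BITS_PER_LONG));
}
```

A state number such as 35 lands in the second word on a 32-bit build and
in the first word on a 64-bit build, without the callers having to care.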

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202510231918.XlOqymLC-lkp@intel.com/
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-4-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 months agodrm/xe: Move migration support to device-level struct
Michał Winiarski [Wed, 12 Nov 2025 13:21:58 +0000 (14:21 +0100)] 
drm/xe: Move migration support to device-level struct

Upcoming changes will allow users to control VF state and obtain its
migration data with a device-level granularity (not tile/gt).
Change the data structures to reflect that and move the GT-level
migration init to happen after device-level init.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-3-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 months agodrm/xe/pf: Remove GuC version check for migration support
Michał Winiarski [Wed, 12 Nov 2025 13:21:57 +0000 (14:21 +0100)] 
drm/xe/pf: Remove GuC version check for migration support

Since commit 4eb0aab6e4434 ("drm/xe/guc: Bump minimum required GuC
version to v70.29.2"), the minimum GuC version required by the driver
is v70.29.2, which should already include everything that we need for
migration.
Remove the version check.

Suggested-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-2-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 months agodrm/xe/guc: Eliminate RPa frequency caching
Sk Anirban [Wed, 12 Nov 2025 18:51:56 +0000 (00:21 +0530)] 
drm/xe/guc: Eliminate RPa frequency caching

Remove the cached pc->rpa_freq field and refactor RPa frequency handling
to fetch values directly from hardware registers on each request.

v2: Check graphics version instead of platform (Rodrigo)
v3: Fix graphics version check (Badal)

Suggested-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Suggested-by: Badal Nilawar <badal.nilawar@intel.com>
Signed-off-by: Sk Anirban <sk.anirban@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/20251112185153.3593145-6-sk.anirban@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
3 months agodrm/xe/guc: Eliminate RPe caching for SLPC parameter handling
Sk Anirban [Wed, 12 Nov 2025 18:51:55 +0000 (00:21 +0530)] 
drm/xe/guc: Eliminate RPe caching for SLPC parameter handling

RPe is runtime-determined by PCODE and caching it caused stale values,
leading to incorrect GuC SLPC parameter settings.
Drop the cached rpe_freq field and query fresh values from hardware
on each use to ensure GuC SLPC parameters reflect current RPe.

v2: Remove cached RPe frequency field (Rodrigo)
v3: Remove extra variable (Vinay)
    Modify function name (Vinay)
v4: Maintain a separate function for PVC (Rodrigo)
v5: Avoid RPn update while fetching RPe frequency (Rodrigo)
v6: Split platform-specific RPe comments (Vinay)

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5166
Signed-off-by: Sk Anirban <sk.anirban@intel.com>
Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://patch.msgid.link/20251112185153.3593145-5-sk.anirban@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
3 months agodrm/xe/pf: Allow to lockdown the PF using custom guard
Michal Wajdeczko [Sun, 9 Nov 2025 16:24:50 +0000 (17:24 +0100)] 
drm/xe/pf: Allow to lockdown the PF using custom guard

Some driver components, like eudebug or ccs-mode, can't be used
when VFs are enabled.  Add functions to allow those components
to block the PF from enabling VFs for the requested duration.

Introduce a trivial counter to allow lockdown or exclusive access
that can be used in scenarios where we can't follow the strict
owner semantics required by the rw_semaphore implementation.

Before enabling VFs, the PF will try to arm the "vfs_enabling"
guard for the exclusive access.  This will fail if there are
some lockdown requests already initiated by the other components.

For testing purposes, add a debugfs file which will call these new
functions from the file's open/close hooks.
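
The counter semantics described above can be sketched as follows. This
is a minimal single-threaded illustration, not the driver's code: the
names (struct guard, guard_lockdown(), guard_arm() and so on), the use
of -EBUSY and the encoding of the counter are all assumptions, and a
real implementation would need a lock or atomics around the counter.

```c
#include <errno.h>

/*
 * Hypothetical encoding:
 *   count  > 0  number of active lockdown requests
 *   count == 0  idle
 *   count == -1 armed for exclusive access
 */
struct guard {
	int count;
};

/* Request a lockdown; refused while the guard is armed. */
static int guard_lockdown(struct guard *g)
{
	if (g->count < 0)
		return -EBUSY;
	g->count++;
	return 0;
}

/* Drop one lockdown request; the caller need not be the requester. */
static void guard_release_lockdown(struct guard *g)
{
	if (g->count > 0)
		g->count--;
}

/* Arm for exclusive access; refused while any lockdown is active. */
static int guard_arm(struct guard *g)
{
	if (g->count != 0)
		return -EBUSY;
	g->count = -1;
	return 0;
}

/* Give up exclusive access. */
static void guard_disarm(struct guard *g)
{
	if (g->count == -1)
		g->count = 0;
}
```

Because the state is a plain counter, the release may come from a
different context than the request, which is what owner-tracking locks
do not allow.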

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Christoph Manszewski <christoph.manszewski@intel.com>
Reviewed-by: Christoph Manszewski <christoph.manszewski@intel.com>
Link: https://patch.msgid.link/20251109162451.4779-1-michal.wajdeczko@intel.com
3 months agodrm/xe/pcode: Rework error mapping
Lucas De Marchi [Mon, 10 Nov 2025 16:41:08 +0000 (08:41 -0800)] 
drm/xe/pcode: Rework error mapping

The sparse array used for error decoding is unnecessarily big. It
would be better handled by a switch statement, which will also make
this code easier to improve.

Add a CASE_ERR() macro to keep the table compact and use it instead of
the 256-entry array, which saves some space:

$ bloat-o-meter xe_pcode.o.old xe_pcode.o
add/remove: 0/1 grow/shrink: 2/0 up/down: 190/-4096 (-3906)
Function                                     old     new   delta
__pcode_mailbox_rw                           363     465    +102
__pcode_mailbox_rw.cold                       58     146     +88
err_decode                                  4096       -   -4096
Total: Before=7890, After=3984, chg -49.51%
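
The shape of such a switch can be sketched as below. This is an
illustration only: the error codes, the errno values, the strings,
struct err_info and decode_err() are hypothetical, and the CASE_ERR()
shown here is one possible definition rather than the one in the patch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical mailbox error codes; the real values live in the driver. */
enum {
	MBOX_ILLEGAL_CMD = 0x1,
	MBOX_TIMEOUT     = 0x2,
	MBOX_LOCKED      = 0x6,
};

struct err_info {
	int errno_val;
	const char *str;
};

/* One table row per known code; expands to a complete case body. */
#define CASE_ERR(_code, _errno, _str)	\
	case _code:			\
		info->errno_val = (_errno);	\
		info->str = (_str);		\
		return 1

/* Returns 1 and fills *info for a known code, 0 otherwise. */
static int decode_err(unsigned int code, struct err_info *info)
{
	switch (code) {
	CASE_ERR(MBOX_ILLEGAL_CMD, -ENXIO, "Illegal Command");
	CASE_ERR(MBOX_TIMEOUT, -ETIMEDOUT, "Timed out");
	CASE_ERR(MBOX_LOCKED, -EBUSY, "Mailbox locked");
	default:
		return 0;
	}
}
```

Only the codes that exist cost anything, whereas a lookup array indexed
by code pays for every slot. The 4096 bytes reported for err_decode
above are consistent with 256 entries of 16 bytes each.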

Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://patch.msgid.link/20251110-pcode-errmap-v2-1-cb18c8f54238@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
3 months agodrm/xe: fix kernel-doc function name mismatch in xe_pm.c
Kriish Sharma [Mon, 10 Nov 2025 18:42:06 +0000 (18:42 +0000)] 
drm/xe: fix kernel-doc function name mismatch in xe_pm.c

Documentation build reported:

   WARNING: ./drivers/gpu/drm/xe/xe_pm.c:131 expecting prototype for xe_pm_might_block_on_suspend(). Prototype was for xe_pm_block_on_suspend() instead

The kernel-doc comment for xe_pm_block_on_suspend() incorrectly used
the function name xe_pm_might_block_on_suspend(). Fix the header to
match the actual function prototype.

No functional changes.

Fixes: f73f6dd312a5 ("drm/xe/pm: Add lockdep annotation for the pm_block completion")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202511061736.CiuroL7H-lkp@intel.com/
Signed-off-by: Kriish Sharma <kriish.sharma2006@gmail.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/20251110184206.2113830-1-kriish.sharma2006@gmail.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
3 months agodrm/xe/pf: Add runtime registers for GFX ver >= 35
Piotr Piórkowski [Fri, 7 Nov 2025 21:18:45 +0000 (22:18 +0100)] 
drm/xe/pf: Add runtime registers for GFX ver >= 35

Add a dedicated runtime register list for GFX ver >= 35.
Compared to the list for GFX ver >= 30, this variant drops
HUC_KERNEL_LOAD_INFO and MIRROR_FUSE1, and adds SERVICE_COPY_ENABLE.

v2:
 - drop MIRROR_FUSE1 register
 - update commit message

Fixes: 5e0de2dfbc1b ("drm/xe/cri: Add CRI platform definition")
Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251107211845.3633633-1-piotr.piorkowski@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
3 months agodrm/xe/vram: Move forcewake down to get_flat_ccs_offset()
Lucas De Marchi [Fri, 7 Nov 2025 18:23:45 +0000 (10:23 -0800)] 
drm/xe/vram: Move forcewake down to get_flat_ccs_offset()

With SG_TILE_ADDR_RANGE use, the only thing requiring GT forcewake while
probing for vram size is the get_flat_ccs_offset(). Move the forcewake
down where it's needed.

Suggested-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patch.msgid.link/20251107-tile-addr-v1-2-a3014aadc2e7@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
3 months agodrm/xe: Use SG_TILE_ADDR_RANGE instead of TILE_ADDR_RANGE
Fei Yang [Fri, 7 Nov 2025 18:23:44 +0000 (10:23 -0800)] 
drm/xe: Use SG_TILE_ADDR_RANGE instead of TILE_ADDR_RANGE

The TILE_ADDR_RANGE register is deprecated and will not be available on
all platforms going forward; it is being replaced by equivalent
registers within SoC MMIO space. Until that replacement happens, the
SG_TILE_ADDR_RANGE (base 0x1083a0) is still valid for all platforms
supported by xe. Use that instead.

BSpec: 59353, 54991
Signed-off-by: Fei Yang <fei.yang@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patch.msgid.link/20251107-tile-addr-v1-1-a3014aadc2e7@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
3 months agodrm/xe: Fix MTL vm_max_level
Rodrigo Vivi [Sat, 8 Nov 2025 04:06:35 +0000 (23:06 -0500)] 
drm/xe: Fix MTL vm_max_level

MTL was broken after the vm_max_level movement. Get it back to a
working value.

[   37.722413] xe 0000:00:02.0: [drm] Tile0: GT0: VM job timed out on non-killed execqueue
[   37.722465] WARNING: CPU: 0 PID: 12 at drivers/gpu/drm/xe/xe_guc_submit.c:1379 guc_exec_queue_timedout_job+0x2f3/0xe00 [xe]
[   37.722559] Modules linked in: xt_REDIRECT nft_compat nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables qrtr sunrpc bnep snd_ctl_led snd_soc_s\
of_sdw snd_soc_intel_hda_dsp_common snd_soc_sdw_utils snd_sof_probes snd_soc_rt712_sdca regmap_sdw_mbq snd_hda_codec_intelhdmi regmap_sdw snd_soc_dmic snd_hda_intel snd_sof_pci_intel_mtl iwlmvm snd_sof_intel_hda_generic soundwire_intel snd_sof_intel_hda_sdw_bpt snd_sof_intel_hda_common snd_soc_hdac_hda snd_sof_intel_hda_mlink\
 snd_sof_intel_hda snd_hda_codec_hdmi soundwire_cadence snd_sof_pci snd_sof_xtensa_dsp binfmt_misc snd_sof mac80211 vfat snd_sof_utils fat snd_hda_ext_core snd_hda_codec snd_hda_core snd_intel_dspcfg snd_intel_sdw_acpi snd_soc_acpi_intel_match snd_soc_acpi_intel_sdca_quirks soundwire_generic_allocation snd_soc_acpi snd_hwdep \
crc8 soundwire_bus libarc4 snd_soc_sdca snd_soc_core
[   37.722584]  snd_compress ac97_bus uvcvideo snd_pcm_dmaengine iwlwifi snd_seq uvc videobuf2_vmalloc snd_seq_device videobuf2_memops videobuf2_v4l2 snd_pcm processor_thermal_device_pci videobuf2_common processor_thermal_device btusb intel_uncore_frequency processor_thermal_wt_hint intel_uncore_frequency_common platform_temp\
erature_control videodev btmtk spi_nor processor_thermal_soc_slider x86_pkg_temp_thermal btrtl snd_timer iTCO_wdt processor_thermal_rfim intel_powerclamp btbcm intel_pmc_bxt snd intel_rapl_msr processor_thermal_rapl coretemp iTCO_vendor_support mei_gsc_proxy btintel intel_rapl_common rapl intel_cstate cfg80211 bluetooth mc in\
tel_pmc_core mtd soundcore acer_wmi mei_me intel_uncore processor_thermal_wt_req i2c_i801 spi_intel_pci pmt_telemetry platform_profile mei processor_thermal_power_floor spi_intel i2c_smbus pmt_discovery igen6_edac pcspkr rfkill wmi_bmof idma64 processor_thermal_mbox intel_hid pmt_class int3403_thermal int3400_thermal joydev i\
nt340x_thermal_zone acpi_pad sparse_keymap
[   37.722611]  intel_pmc_ssram_telemetry acpi_thermal_rel acer_wireless loop nfnetlink zram lz4hc_compress lz4_compress dm_crypt xe drm_ttm_helper drm_suballoc_helper gpu_sched drm_gpuvm drm_exec drm_gpusvm_helper i915 nvme i2c_algo_bit nvme_core drm_buddy ucsi_acpi ttm typec_ucsi typec nvme_keyring nvme_auth hkdf drm_displa\
y_helper hid_multitouch polyval_clmulni thunderbolt intel_vpu ghash_clmulni_intel cec vmd i2c_hid_acpi video intel_vsec i2c_hid wmi pinctrl_meteorlake serio_raw i2c_dev fuse
[   37.722638] CPU: 0 UID: 0 PID: 12 Comm: kworker/u88:0 Not tainted 6.18.0-rc2+ #37 PREEMPT(voluntary)
[   37.722641] Hardware name: Acer Swift SFG14-72/Coral_MTH, BIOS V1.01 11/06/2023
[   37.722643] Workqueue: gt-ordered-wq drm_sched_job_timedout [gpu_sched]
[   37.722649] RIP: 0010:guc_exec_queue_timedout_job+0x2f3/0xe00 [xe]
[   37.722722] Code: 4c 24 10 44 89 44 24 08 e8 5a 95 f1 d4 44 8b 44 24 08 8b 4c 24 10 48 c7 c7 00 b7 25 c1 48 8b 54 24 18 48 89 c6 e8 4d 59 37 d4 <0f> 0b 80 3c 24 00 0f 85 55 03 00 00 49 8b 47 58 a8 01 75 1a 49 8b
[   37.722723] RSP: 0018:ffffd468000f7d80 EFLAGS: 00010246
[   37.722725] RAX: 0000000000000000 RBX: ffff8e3d4e215c00 RCX: 0000000000000027
[   37.722726] RDX: ffff8e40ae61cfc8 RSI: 0000000000000001 RDI: ffff8e40ae61cfc0
[   37.722727] RBP: 00000000fffffffb R08: 0000000000000000 R09: ffffd468000f7c20
[   37.722727] R10: ffff8e40c09fffa8 R11: 00000000fffbffff R12: ffff8e3d44c00028
[   37.722728] R13: ffff8e3d807d4000 R14: ffff8e3d807d4018 R15: ffff8e3d95c9d600
[   37.722729] FS:  0000000000000000(0000) GS:ffff8e4116110000(0000) knlGS:0000000000000000
[   37.722729] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   37.722730] CR2: 00007ff1f3e02720 CR3: 0000000113c8d005 CR4: 0000000000f70ef0
[   37.722731] PKRU: 55555554
[   37.722731] Call Trace:
[   37.722734]  <TASK>
[   37.722735]  ? __pfx_autoremove_wake_function+0x10/0x10
[   37.722740]  drm_sched_job_timedout+0x81/0x170 [gpu_sched]

Fixes: 50292f9af8ec ("drm/xe: Move 'vm_max_level' flag back to platform descriptor")
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Gustavo Sousa <gustavo.sousa@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patch.msgid.link/20251108040634.6376-2-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
3 months agodrm/xe/vf: Enable VF resource fixup unconditionally
Michał Winiarski [Fri, 7 Nov 2025 16:10:00 +0000 (17:10 +0100)] 
drm/xe/vf: Enable VF resource fixup unconditionally

All the feature enabling code is in place - drop the debug flag
requirement for VF resource fixup.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251107161000.1938186-1-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 months agodrm/xe/tests: Add KUnit tests for PF fair provisioning
Michal Wajdeczko [Thu, 6 Nov 2025 16:59:32 +0000 (17:59 +0100)] 
drm/xe/tests: Add KUnit tests for PF fair provisioning

Add test cases to check the outcome of fair allocation of GuC context
IDs and doorbell IDs in regular and admin-only PF modes.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patch.msgid.link/20251106165932.2143-1-michal.wajdeczko@intel.com