git.ipfire.org Git - thirdparty/kernel/linux.git/log
8 weeks ago drm/xe/guc: Ratelimit diagnostic messages from the relay
Michal Wajdeczko [Sun, 5 Oct 2025 17:39:46 +0000 (19:39 +0200)] 
drm/xe/guc: Ratelimit diagnostic messages from the relay

There might be some malicious VFs that, by sending invalid VF2PF
relay messages, will flood the PF's dmesg with our diagnostic messages.

Rate limit all relay messages, unless running in DEBUG_SRIOV mode.
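
The idea can be sketched in userspace as a fixed-window limiter. This is
only an illustration under stated assumptions: struct relay_ratelimit,
relay_ratelimit_ok() and relay_diag() are hypothetical names, not the
kernel's ratelimit API nor the code from this patch, and the "unlimited"
field stands in for a debug build that wants every message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical fixed-window limiter; not the kernel's ratelimit API. */
struct relay_ratelimit {
	long window_start;	/* start of the current window, in seconds */
	long interval;		/* window length, in seconds */
	int burst;		/* messages allowed per window */
	int printed;		/* messages printed in the current window */
	int suppressed;		/* messages dropped in the current window */
	bool unlimited;		/* debug mode: never drop a message */
};

/* Return true if a message may be printed at time 'now'. */
static bool relay_ratelimit_ok(struct relay_ratelimit *rl, long now)
{
	assert(rl && rl->interval > 0 && rl->burst > 0);

	if (rl->unlimited)
		return true;
	if (now - rl->window_start >= rl->interval) {
		/* A new window begins: reset the counters. */
		rl->window_start = now;
		rl->printed = 0;
		rl->suppressed = 0;
	}
	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}
	rl->suppressed++;
	return false;
}

/* Print a diagnostic only if the limiter allows it; return 1 if printed. */
static int relay_diag(struct relay_ratelimit *rl, long now, const char *msg)
{
	if (!relay_ratelimit_ok(rl, now))
		return 0;
	printf("PF: relay: %s\n", msg);
	return 1;
}
```

With interval = 5 and burst = 2, a VF sending three bad messages within
the same window gets only the first two logged.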

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20251005173946.2784-1-michal.wajdeczko@intel.com
8 weeks ago drm/xe: Update MEMIRQ to use tile-based printk macros
Michal Wajdeczko [Sun, 5 Oct 2025 13:36:40 +0000 (15:36 +0200)] 
drm/xe: Update MEMIRQ to use tile-based printk macros

Since we already have tile-based printk macros, there is no need to
manually prepare MEMIRQ-specific messages that include the tile ID.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20251005133641.2651-5-michal.wajdeczko@intel.com
8 weeks ago drm/xe/pf: Update LMTT to use tile-based messages
Michal Wajdeczko [Sun, 5 Oct 2025 13:36:39 +0000 (15:36 +0200)] 
drm/xe/pf: Update LMTT to use tile-based messages

Now that we have tile-based SR-IOV printk macros, there is no
need to manually prepare the LMTT-specific warning message (which
is now upgraded to a proper error-level message) nor to use a generic
debug message without tile/LMTT identification.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20251005133641.2651-4-michal.wajdeczko@intel.com
8 weeks ago drm/xe: Add tile-based SRIOV printk macros
Michal Wajdeczko [Sun, 5 Oct 2025 13:36:38 +0000 (15:36 +0200)] 
drm/xe: Add tile-based SRIOV printk macros

We already have device and GT level SR-IOV specific macros, but
unlike the native case, we don't yet have tile-based ones.

Add macros to match the native use case and also update the GT-based
macros to rely on those new tile-based SR-IOV macros. This will
slightly rearrange the output of the GT logs and instead of:

  [...] Tile0: GT0: PF: pushed VF1 config with 2 KLVs...

we might see:

  [...] PF: Tile0: GT0: pushed VF1 config with 2 KLVs...

but that's even better.
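
The new ordering follows from the layering: when the GT helper calls the
tile helper, and the tile helper calls the SR-IOV helper that prepends
the mode, the mode prefix ends up outermost. A minimal userspace sketch
of that layering, where all macro and function names are hypothetical
stand-ins and not the driver's macros:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the layered helpers; each level adds a prefix. */
#define sriov_snprintf(buf, len, ...) \
	snprintf(buf, len, "PF: " __VA_ARGS__)

#define tile_sriov_snprintf(buf, len, tile, fmt, ...) \
	sriov_snprintf(buf, len, "Tile%u: " fmt, tile, __VA_ARGS__)

#define gt_sriov_snprintf(buf, len, tile, gt, fmt, ...) \
	tile_sriov_snprintf(buf, len, tile, "GT%u: " fmt, gt, __VA_ARGS__)

/* Format the example message from the commit text into 'buf'. */
static int format_push_config(char *buf, size_t len, unsigned int tile,
			      unsigned int gt, unsigned int vf,
			      unsigned int klvs)
{
	assert(buf && len > 0);
	return gt_sriov_snprintf(buf, len, tile, gt,
				 "pushed VF%u config with %u KLVs", vf, klvs);
}
```

The format strings are concatenated at compile time, so the GT helper
never needs to know about the "PF: " prefix.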

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20251005133641.2651-3-michal.wajdeczko@intel.com
8 weeks ago drm/xe: Update SRIOV printk macros
Michal Wajdeczko [Sun, 5 Oct 2025 13:36:37 +0000 (15:36 +0200)] 
drm/xe: Update SRIOV printk macros

Recently we introduced xe-based printk macros, use them instead
of plain drm-based ones.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20251005133641.2651-2-michal.wajdeczko@intel.com
8 weeks ago drm/xe/pf: Make the late-initialization really late
Michal Wajdeczko [Sat, 4 Oct 2025 16:20:08 +0000 (18:20 +0200)] 
drm/xe/pf: Make the late-initialization really late

While the late PF per-GT initialization is done quite late in the
single GT initialization flow, in case of multi-GT platforms, it
may still be done before other GT early initialization. That leads
to some issues during unwind, when there are cross-GT dependencies,
like resource cleanup that is shared by both GTs, but the other GT
may already be sanitized or disabled.

The following errors could be observed when trying to unload the PF
driver with some LMEM/VRAM already provisioned for a few VFs:

 [ ] xe 0000:03:00.0: DEVRES REL ffff88814708f240 fini_config (16 bytes)
 [ ] xe 0000:03:00.0: [drm:lmtt_write_pte [xe]] PF: LMTT: WRITE level=2 index=1 pte=0x0
 [ ] xe 0000:03:00.0: [drm:lmtt_invalidate_hw [xe]] PF: LMTT: num_fences=2 err=-19
 [ ] xe 0000:03:00.0: [drm:lmtt_pt_free [xe]] PF: LMTT: level=0 addr=53a470000
 [ ] xe 0000:03:00.0: [drm:lmtt_pt_free [xe]] PF: LMTT: level=1 addr=53a4b0000
 [ ] xe 0000:03:00.0: [drm:lmtt_invalidate_hw [xe]] PF: LMTT: num_fences=2 err=-19
 [ ] xe 0000:03:00.0: [drm] PF: LMTT0 invalidation failed (-ENODEV)
 [ ] xe 0000:03:00.0: [drm:lmtt_write_pte [xe]] PF: LMTT: WRITE level=2 index=2 pte=0x0
 [ ] xe 0000:03:00.0: [drm:lmtt_invalidate_hw [xe]] PF: LMTT: num_fences=2 err=-19
 [ ] xe 0000:03:00.0: [drm:lmtt_pt_free [xe]] PF: LMTT: level=0 addr=539b70000
 [ ] xe 0000:03:00.0: [drm:lmtt_pt_free [xe]] PF: LMTT: level=1 addr=539bf0000
 [ ] xe 0000:03:00.0: [drm:lmtt_invalidate_hw [xe]] PF: LMTT: num_fences=2 err=-19
 [ ] xe 0000:03:00.0: [drm] PF: LMTT0 invalidation failed (-ENODEV)

Move all PF per-GT late initialization to the already defined late
SR-IOV initialization function to allow proper order of the cleanup
actions.

While around, format all PF function stubs as one-liners, like many
other stubs are defined in the Xe driver.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20251004162008.1782-1-michal.wajdeczko@intel.com
8 weeks ago drm/xe/xe_late_bind_fw: Fix and simplify parsing user input
Michal Wajdeczko [Thu, 2 Oct 2025 19:27:36 +0000 (21:27 +0200)] 
drm/xe/xe_late_bind_fw: Fix and simplify parsing user input

Code was wrongly passing sizeof(uval) as the number base to use,
and unlike other debugfs entries that represent bool data, it
wasn't using the dedicated function to parse user input as bool.
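
To illustrate the first problem: assuming a 4-byte uval (an assumption,
the type is not stated here), sizeof(uval) as the base means the input
is parsed in base 4, so a digit such as 7 is rejected. The userspace
sketch below uses strtoul() to show the effect, next to a simplified
bool parser in the spirit of the kernel's kstrtobool(). parse_uint()
and parse_bool() are hypothetical names, not driver functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Parse 's' as an unsigned number in 'base'; 0 on success, -EINVAL otherwise. */
static int parse_uint(const char *s, int base, unsigned long *out)
{
	char *end;

	assert(s && out);
	errno = 0;
	*out = strtoul(s, &end, base);
	if (errno || end == s || *end != '\0')
		return -EINVAL;
	return 0;
}

/* Simplified bool parser: accepts only y/Y/1 and n/N/0 as the first char. */
static int parse_bool(const char *s, bool *out)
{
	assert(s && out);
	switch (s[0]) {
	case 'y': case 'Y': case '1':
		*out = true;
		return 0;
	case 'n': case 'N': case '0':
		*out = false;
		return 0;
	default:
		return -EINVAL;
	}
}
```

Parsing "7" with base 4 fails, while "10" quietly yields 4 rather than
10, which is why the wrong base is easy to miss when testing with "0"
and "1" only.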

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Badal Nilawar <badal.nilawar@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://lore.kernel.org/r/20251002192736.203186-1-michal.wajdeczko@intel.com
8 weeks ago drm/xe: Don't force DRM_XE_DEBUG_MEMIRQ for SR-IOV debug
Michal Wajdeczko [Thu, 2 Oct 2025 17:13:08 +0000 (19:13 +0200)] 
drm/xe: Don't force DRM_XE_DEBUG_MEMIRQ for SR-IOV debug

For pure SR-IOV debugging there is no need to select the already
separate config for debugging memory-based interrupts, as the
latter is also very noisy on its own. Change the config order
and use a weak reverse dependency instead.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20251002171308.203127-1-michal.wajdeczko@intel.com
8 weeks ago drm/xe: Fix copyright and function naming in xe_ttm_vram_mgr
Shuicheng Lin [Sat, 4 Oct 2025 00:03:31 +0000 (00:03 +0000)] 
drm/xe: Fix copyright and function naming in xe_ttm_vram_mgr

- Correct copyright year from "2002" to "2022".
- Rename ttm_vram_mgr_fini() to xe_ttm_vram_mgr_fini() to avoid
  confusion with generic TTM helpers.

No functional changes intended.

Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Nitin Gote <nitin.r.gote@intel.com>
Link: https://lore.kernel.org/r/20251004000425.2489291-2-shuicheng.lin@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
8 weeks ago drm/xe: Combine userspace context check
Piotr Piórkowski [Fri, 3 Oct 2025 16:26:19 +0000 (18:26 +0200)] 
drm/xe: Combine userspace context check

Both vm->xef and XE_LRC_CREATE_USER_CTX indicate in xe_lrc_init that
the context originates from userspace. However, XE_LRC_CREATE_USER_CTX
has a broader scope as it may be set even when no vm->xef is present.
The XE_BO_FLAG_PINNED_LATE_RESTORE flag can be extended to both cases,
so there is no point in handling the two cases separately.
Let's combine vm->xef and XE_LRC_CREATE_USER_CTX checks to detect
userspace context.

Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Suggested-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://lore.kernel.org/r/20251003162619.1984236-6-piotr.piorkowski@intel.com
8 weeks ago drm/xe/pf: Force use user VRAM for LMEM provisioning
Piotr Piórkowski [Fri, 3 Oct 2025 16:26:18 +0000 (18:26 +0200)] 
drm/xe/pf: Force use user VRAM for LMEM provisioning

The LMEM assigned to VFs should be allocated from the general-purpose
VRAM pool, not from the kernel-reserved region.
Let's force the use of general-purpose VRAM for BOs intended for VFs.

Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://lore.kernel.org/r/20251003162619.1984236-5-piotr.piorkowski@intel.com
8 weeks ago drm/xe: Force user context allocations in user VRAM
Piotr Piórkowski [Fri, 3 Oct 2025 16:26:17 +0000 (18:26 +0200)] 
drm/xe: Force user context allocations in user VRAM

In general, kernel structures should be allocated in the kernel-dedicated
VRAM region. However, userspace context data - while used by the kernel -
does not need to reside there.
Let's force the allocation of such data in the general-purpose VRAM region
accessible to userspace.

Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://lore.kernel.org/r/20251003162619.1984236-4-piotr.piorkowski@intel.com
8 weeks ago drm/xe: Introduce new BO flag XE_BO_FLAG_FORCE_USER_VRAM
Piotr Piórkowski [Fri, 3 Oct 2025 16:26:16 +0000 (18:26 +0200)] 
drm/xe: Introduce new BO flag XE_BO_FLAG_FORCE_USER_VRAM

When using a separate VRAM region for kernel allocations,
some kernel structures, such as context userspace data,
should not reside in the VRAM region dedicated to the kernel.
The VRAM kernel region is intended only for allocations necessary
for driver operation. Allocations created via ioctl are long-lived
and not easily evictable. If this region runs out of space,
there may not be a fallback, which could cause failures.
To prevent this, add a new BO flag that explicitly forces the BO to be
allocated in the general-purpose VRAM region accessible to userspace,
avoiding the kernel-only VRAM region.
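
The intended effect of such a flag can be sketched as a placement
decision. Everything below is hypothetical (flag names, region names and
pick_vram_region() are made up for illustration) and only mirrors the
rule described above, not the driver's actual placement code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag and region names, for illustration only. */
#define BO_FLAG_VRAM		(1u << 0)	/* BO wants VRAM */
#define BO_FLAG_KERNEL		(1u << 1)	/* kernel-owned allocation */
#define BO_FLAG_FORCE_USER_VRAM	(1u << 2)	/* never use the kernel region */

enum vram_region { VRAM_NONE, VRAM_USER, VRAM_KERNEL };

/* Pick a VRAM region; the force flag wins over the kernel flag. */
static enum vram_region pick_vram_region(unsigned int flags,
					 bool has_kernel_region)
{
	if (!(flags & BO_FLAG_VRAM))
		return VRAM_NONE;
	if (flags & BO_FLAG_FORCE_USER_VRAM)
		return VRAM_USER;
	if ((flags & BO_FLAG_KERNEL) && has_kernel_region)
		return VRAM_KERNEL;
	return VRAM_USER;
}
```

A kernel BO that carries userspace context data would set both the
kernel flag and the force flag, and still land in the general-purpose
region.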

v2:
 - update commit message (Matthew)

Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://lore.kernel.org/r/20251003162619.1984236-3-piotr.piorkowski@intel.com
8 weeks ago drm/xe: Add initial support for separate kernel VRAM region on the tile
Piotr Piórkowski [Fri, 3 Oct 2025 16:26:15 +0000 (18:26 +0200)] 
drm/xe: Add initial support for separate kernel VRAM region on the tile

So far, kernel and userspace allocations have shared the same VRAM region.
However, in some scenarios, it may be necessary to reserve a separate
VRAM area exclusively for kernel allocations.
Let's add preliminary support for such a configuration.

v2:
- replaced for_each_bo_flag_vram with the improved
  for_each_set_bo_vram_flag helper (Matthew)
- moved the VRAM flag iteration macro definition into xe_bo.c (Matthew)
- drop unused bo_flags from bo_vram_flags_to_vram_placement (Matthew)
- use hweight32 helper in __xe_bo_fixed_placement for readability
  (Matthew)
v3: remove unnecessary VRAM fixup id

Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://lore.kernel.org/r/20251003162619.1984236-2-piotr.piorkowski@intel.com
2 months ago Revert "drm/xe/vf: Fixup CTB send buffer messages after migration"
Matthew Brost [Thu, 2 Oct 2025 23:38:24 +0000 (01:38 +0200)] 
Revert "drm/xe/vf: Fixup CTB send buffer messages after migration"

This reverts commit cef88d1265cac7d415606af73ba58926fd3cd8b7.

Due to a change in the VF migration recovery design, this code
is no longer needed.

v3:
 - Add commit message (Michal / Lucas)

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20251002233824.203417-4-michal.wajdeczko@intel.com
2 months ago Revert "drm/xe/vf: Post migration, repopulate ring area for pending request"
Matthew Brost [Thu, 2 Oct 2025 23:38:23 +0000 (01:38 +0200)] 
Revert "drm/xe/vf: Post migration, repopulate ring area for pending request"

This reverts commit a0dda25d24e636df5c30a9370464b7cebc709faf.

Due to a change in the VF migration recovery design, this code
is no longer needed.

v3:
 - Add commit message (Michal / Lucas)

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20251002233824.203417-3-michal.wajdeczko@intel.com
2 months ago Revert "drm/xe/vf: Rebase exec queue parallel commands during migration recovery"
Matthew Brost [Thu, 2 Oct 2025 23:38:22 +0000 (01:38 +0200)] 
Revert "drm/xe/vf: Rebase exec queue parallel commands during migration recovery"

This reverts commit ba180a362128cb71d16c3f0ce6645448011d2607.

Due to a change in the VF migration recovery design, this code
is no longer needed.

v3:
 - Add commit message (Michal / Lucas)

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20251002233824.203417-2-michal.wajdeczko@intel.com
2 months ago drm/xe/pf: Synchronize VF FLR between all GTs
Michal Wajdeczko [Tue, 30 Sep 2025 23:35:24 +0000 (01:35 +0200)] 
drm/xe/pf: Synchronize VF FLR between all GTs

The PF part of the VF FLR processing shall be done after all GuCs
confirm that they finished their part of the VF FLR processing, otherwise
the PF may start clearing the VF's GGTT that another GuC may still be
accessing.
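
The ordering rule can be sketched with a simple bitmask barrier: each GT
records its GuC confirmation, and the PF part runs only once every
expected bit is set. This is a single-threaded userspace illustration
with hypothetical names (struct vf_flr_sync, vf_flr_start(),
vf_flr_confirm()); the real code needs proper locking and state
tracking.

```c
#include <assert.h>
#include <stdbool.h>

struct vf_flr_sync {
	unsigned int expected;	/* bitmask of GTs taking part in the FLR */
	unsigned int confirmed;	/* bitmask of GTs whose GuC has confirmed */
};

/* Begin a VF FLR that involves GTs 0..num_gts-1. */
static void vf_flr_start(struct vf_flr_sync *s, unsigned int num_gts)
{
	assert(s && num_gts > 0 && num_gts < 32);
	s->expected = (1u << num_gts) - 1;
	s->confirmed = 0;
}

/* Record the confirmation from 'gt_id'; true once the PF part may run. */
static bool vf_flr_confirm(struct vf_flr_sync *s, unsigned int gt_id)
{
	assert(s && gt_id < 32 && (s->expected & (1u << gt_id)));
	s->confirmed |= 1u << gt_id;
	return s->confirmed == s->expected;
}
```

On a two-GT platform the first confirmation returns false, so clearing
the VF's GGTT would be deferred until the second GuC reports in.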

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://lore.kernel.org/r/20250930233525.201263-7-michal.wajdeczko@intel.com
2 months ago drm/xe/pf: Split VF FLR processing function
Michal Wajdeczko [Tue, 30 Sep 2025 23:35:23 +0000 (01:35 +0200)] 
drm/xe/pf: Split VF FLR processing function

On multi-GT platforms (like PTL) we may want to run VF FLR on each
GuC (render and media) in parallel. Split our FLR function so that we
can wait for GT VF FLR completion separately.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://lore.kernel.org/r/20250930233525.201263-6-michal.wajdeczko@intel.com
2 months ago drm/xe/pf: Unify VF state tracking log
Michal Wajdeczko [Tue, 30 Sep 2025 23:35:22 +0000 (01:35 +0200)] 
drm/xe/pf: Unify VF state tracking log

By using a single function that dumps VF state transitions, the final
logs are easier to analyze as there is always the same call site
in every debug message.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://lore.kernel.org/r/20250930233525.201263-5-michal.wajdeczko@intel.com
2 months ago drm/xe/pf: Expose VF control operations over debugfs
Michal Wajdeczko [Tue, 30 Sep 2025 23:35:21 +0000 (01:35 +0200)] 
drm/xe/pf: Expose VF control operations over debugfs

To allow the user to control the activity of individual VFs,
expose basic VF control operations (pause, resume, stop, reset)
over the debugfs as write-only files:

  /sys/kernel/debug/dri/BDF/sriov/
  ├── vf1
  │   ├── pause
  │   ├── reset
  │   ├── resume
  │   ├── stop
  │   :
  ├── vf2
  :   :

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://lore.kernel.org/r/20250930233525.201263-4-michal.wajdeczko@intel.com
2 months ago drm/xe/pf: Log only top level VF state changes
Michal Wajdeczko [Tue, 30 Sep 2025 23:35:20 +0000 (01:35 +0200)] 
drm/xe/pf: Log only top level VF state changes

The user likely only cares about top-level VF state changes, so any VF
state logs on a per-GT basis can be demoted to the debug level.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://lore.kernel.org/r/20250930233525.201263-3-michal.wajdeczko@intel.com
2 months ago drm/xe/pf: Add top level functions to control VFs
Michal Wajdeczko [Tue, 30 Sep 2025 23:35:19 +0000 (01:35 +0200)] 
drm/xe/pf: Add top level functions to control VFs

We already have control functions that we use to control the VF
state on a per-GT basis, but that is a low-level detail from the
point of view of the user, who rather expects VF-level functions.

For now add simple functions that just iterate over all GTs and
call the per-GT control function. Some of them will soon be usable
from user-facing interfaces like debugfs.
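
The shape of such a top-level function can be sketched as follows. The
names (gt_control_fn, pf_control_vf() and the two stubs) are
hypothetical, and the error policy shown here, visiting every GT and
returning the first error, is an assumption rather than something this
commit states:

```c
#include <assert.h>
#include <errno.h>

/* Per-GT control operation: returns 0 or a negative errno. */
typedef int (*gt_control_fn)(unsigned int gt_id, unsigned int vfid);

/* Apply 'fn' to the VF on every GT; report the first error seen. */
static int pf_control_vf(unsigned int num_gts, unsigned int vfid,
			 gt_control_fn fn)
{
	unsigned int gt_id;
	int result = 0;

	assert(fn);
	for (gt_id = 0; gt_id < num_gts; gt_id++) {
		int err = fn(gt_id, vfid);

		if (err && !result)
			result = err;
	}
	return result;
}

/* Stub per-GT operations, used only to exercise the sketch. */
static int gt_pause_ok(unsigned int gt_id, unsigned int vfid)
{
	(void)gt_id;
	(void)vfid;
	return 0;
}

static int gt_pause_fails_on_gt1(unsigned int gt_id, unsigned int vfid)
{
	(void)vfid;
	return gt_id == 1 ? -EIO : 0;
}
```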

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://lore.kernel.org/r/20250930233525.201263-2-michal.wajdeczko@intel.com
2 months ago drm/xe: Detect GT workqueue allocation failure
Michal Wajdeczko [Wed, 1 Oct 2025 14:40:51 +0000 (16:40 +0200)] 
drm/xe: Detect GT workqueue allocation failure

The allocation of the per-GT workqueue may fail and we shouldn't
ignore that.  While around use drm managed allocation function
to drop our custom fini action.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20251001144051.202040-1-michal.wajdeczko@intel.com
2 months ago drm/xe/doc: Add documentation for Execution Queues
Niranjana Vishwanathapura [Thu, 2 Oct 2025 04:43:20 +0000 (21:43 -0700)] 
drm/xe/doc: Add documentation for Execution Queues

Add documentation for Xe Execution Queues and add xe_exec_queue.rst
file.

v2: Add info about how Execution queue interfaces
    with other components in the driver (Matt Brost)

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20251002044319.450181-2-niranjana.vishwanathapura@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2 months ago drm/xe/i2c: Don't rely on d3cold.allowed flag in system PM path
Raag Jadav [Thu, 18 Sep 2025 10:32:00 +0000 (16:02 +0530)] 
drm/xe/i2c: Don't rely on d3cold.allowed flag in system PM path

In S3 and above sleep states, the device can lose power regardless of
the d3cold.allowed flag. Bring up the I2C controller explicitly in the
system PM path to ensure its normal operation after losing power.

v2: Cover S3 and above states (Rodrigo)

Fixes: 0ea07b69517a ("drm/xe/pm: Wire up suspend/resume for I2C controller")
Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250918103200.2952576-1-raag.jadav@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2 months ago drm/xe/xe_late_bind_fw: Initialize uval variable in xe_late_bind_fw_num_fans()
Mallesh Koujalagi [Thu, 2 Oct 2025 00:56:48 +0000 (06:26 +0530)] 
drm/xe/xe_late_bind_fw: Initialize uval variable in xe_late_bind_fw_num_fans()

Initialize the uval variable to 0 in xe_late_bind_fw_num_fans() to fix
a potential use of uninitialized variable warning and ensure predictable
behavior.

The variable is passed by reference to xe_pcode_read() which should
populate it on success, but initializing it to 0 provides a safe
default value and follows kernel coding best practices.

v2:
- uval = 0 which serves as both a safe default and the fallback
  value when the pcode read operation fails.

v3:
- Handle MMIO failure (Rodrigo)
- The function should probably return the error and make the uval as
  pointer-argument, like the pcode_read.
- Change the caller of this function to propagate the error
  upwards if mmio failed.
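
The v3 shape described above (return the error, hand the value back
through a pointer) can be sketched in userspace like this. The reader
callback and all names are hypothetical stand-ins; this is not the
xe_pcode_read() signature nor the driver's function:

```c
#include <assert.h>
#include <errno.h>

/* Stand-in for a register/pcode read that may fail. */
typedef int (*fan_reader_fn)(unsigned int *val);

/* Return 0 and store the fan count, or return the reader's error. */
static int num_fans_read(fan_reader_fn reader, unsigned int *num_fans)
{
	unsigned int uval = 0;	/* safe default if the reader leaves it unset */
	int err;

	assert(reader && num_fans);
	err = reader(&uval);
	if (err)
		return err;	/* the caller propagates this upwards */
	*num_fans = uval;
	return 0;
}

/* Stub readers, used only to exercise the sketch. */
static int reader_ok(unsigned int *val)
{
	*val = 3;
	return 0;
}

static int reader_fail(unsigned int *val)
{
	(void)val;
	return -EIO;
}
```

On failure the output argument is left untouched, so the caller cannot
mistake a stale or default value for a successful read.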

Fixes: 45832bf9c10f3 ("drm/xe/xe_late_bind_fw: Initialize late binding firmware")
Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
Link: https://lore.kernel.org/r/20251002005648.3185636-1-mallesh.koujalagi@intel.com
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2 months ago drm/gpusvm, drm/xe: Fix userptr to not allow device private pages
Thomas Hellström [Tue, 30 Sep 2025 12:27:52 +0000 (14:27 +0200)] 
drm/gpusvm, drm/xe: Fix userptr to not allow device private pages

When userptr is used on SVM-enabled VMs, a non-NULL
hmm_range::dev_private_owner value might mean that
hmm_range_fault() attempts to return device private pages.
Either that will fail, or the userptr code will not know
how to handle those.

Use NULL for hmm_range::dev_private_owner to migrate
such pages to system. In order to do that, move the
struct drm_gpusvm::device_private_page_owner field to
struct drm_gpusvm_ctx::device_private_page_owner so that
it doesn't remain immutable over the drm_gpusvm lifetime.

v2:
- Don't conditionally compile xe_svm_devm_owner().
- Kerneldoc xe_svm_devm_owner().

Fixes: 9e9787414882 ("drm/xe/userptr: replace xe_hmm with gpusvm")
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://lore.kernel.org/r/20250930122752.96034-1-thomas.hellstrom@linux.intel.com
2 months ago drm/xe/sysfs: Drop redundant runtime PM usage
Raag Jadav [Thu, 18 Sep 2025 11:48:04 +0000 (17:18 +0530)] 
drm/xe/sysfs: Drop redundant runtime PM usage

The device is expected to be in D0 state during driver probe. No need to
resume it in ->is_visible() callbacks or non I/O operations.

Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250918114804.2957177-3-raag.jadav@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2 months ago drm/xe/hwmon: Drop redundant runtime PM usage
Raag Jadav [Thu, 18 Sep 2025 11:48:03 +0000 (17:18 +0530)] 
drm/xe/hwmon: Drop redundant runtime PM usage

The device is expected to be in D0 state during driver probe. No need to
resume it in ->is_visible() callbacks.

Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250918114804.2957177-2-raag.jadav@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2 months ago drm/xe/xe_late_bind_fw: Fix missing initialization of variable offset
Colin Ian King [Wed, 24 Sep 2025 10:22:08 +0000 (11:22 +0100)] 
drm/xe/xe_late_bind_fw: Fix missing initialization of variable offset

The variable offset is not being initialized, and it is only set inside
a for-loop if entry->name is the same as manifest_entry. In the case
where it is not set, the non-zero check on offset is potentially checking
a bogus uninitialized value. Fix this by initializing offset to zero.
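
The pattern is shown below as a standalone userspace sketch; the struct
layout and the name find_entry_offset() are hypothetical and only mirror
the loop described above:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct manifest_entry {
	const char *name;
	size_t offset;
};

/* Return the offset of the entry named 'wanted', or 0 if there is none. */
static size_t find_entry_offset(const struct manifest_entry *entries,
				size_t count, const char *wanted)
{
	size_t offset = 0;	/* without this, "not found" reads garbage */
	size_t i;

	assert(entries && wanted);
	for (i = 0; i < count; i++) {
		if (strcmp(entries[i].name, wanted) == 0) {
			offset = entries[i].offset;
			break;
		}
	}
	return offset;
}
```

With the initializer in place, a later "if (offset)" check reliably
distinguishes a found entry from a missing one, assuming a valid entry
never has offset zero.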

Fixes: efa29317a553 ("drm/xe/xe_late_bind_fw: Extract and print version info")
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://lore.kernel.org/r/20250924102208.9216-1-colin.i.king@gmail.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2 months ago drm/xe/bo: Fix an idle assertion for local bos
Thomas Hellström [Mon, 29 Sep 2025 11:26:49 +0000 (13:26 +0200)] 
drm/xe/bo: Fix an idle assertion for local bos

Before calling ttm_bo_populate() in the CPU fault path of a bo,
we assert that the bo is not being migrated. However, for
local bos we share the reservation object with other local bos
that might be in the process of being migrated. Also some VM
operations may attach USAGE_KERNEL fences to the common
reservation object and trigger false positives from the assert.

So remove the assert and instead wait for bo idle. This may
unnecessarily wait for idle in some cases but since we're
doing this wait later in the fault path anyway we might as
well do it here as well.

This fixes warnings like:
Sep 25 14:56:23 desky kernel: ------------[ cut here ]------------
Sep 25 14:56:23 desky kernel: xe 0000:03:00.0: [drm] Assertion `dma_resv_test_signaled(tbo->base.resv, DMA_RESV_USAGE_KERNEL) || (tbo->ttm && ttm_tt_is_populated(tbo->ttm))` failed!
                              platform: BATTLEMAGE subplatform: 1
                              graphics: Xe2_HPG 20.01 step A0
                              media: Xe2_HPM 13.01 step A1
Sep 25 14:56:23 desky kernel: WARNING: CPU: 6 PID: 24767 at drivers/gpu/drm/xe/xe_bo.c:1748 xe_bo_fault_migrate+0x1bb/0x300 [xe]
Sep 25 14:56:23 desky kernel: Modules linked in: cpuid dm_crypt xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bridge stp llc xfrm_user xfr>
Sep 25 14:56:23 desky kernel:  snd_soc_sdca snd_seq_midi prime_numbers coretemp snd_seq_midi_event drm_ttm_helper snd_hda_codec drm_buddy drm_exec snd_rawmidi snd_soc_core snd_hda_cor>
Sep 25 14:56:23 desky kernel: CPU: 6 UID: 1000 PID: 24767 Comm: steamwebhelper Tainted: G     U  W           6.17.0-rc7+ #32 PREEMPT(voluntary)
Sep 25 14:56:23 desky kernel: Tainted: [U]=USER, [W]=WARN
Sep 25 14:56:23 desky kernel: Hardware name: Micro-Star International Co., Ltd. MS-7D36/PRO Z690-P DDR4 (MS-7D36), BIOS A.A1 10/18/2022
Sep 25 14:56:23 desky kernel: RIP: 0010:xe_bo_fault_migrate+0x1bb/0x300 [xe]
Sep 25 14:56:23 desky kernel: Code: fa 64 29 f9 48 c7 c7 40 e0 d3 c1 51 48 c7 c1 c0 e3 d3 c1 52 4c 8b 45 c0 41 50 44 8b 4d c8 4d 89 e0 48 8b 55 a8 e8 25 27 95 ef <0f> 0b 48 83 c4 40 4>
Sep 25 14:56:23 desky kernel: RSP: 0000:ffffae1ca88c7b10 EFLAGS: 00010286
Sep 25 14:56:23 desky kernel: RAX: 0000000000000000 RBX: ffff8d7cfd7e6800 RCX: 0000000000000027
Sep 25 14:56:23 desky kernel: RDX: ffff8d845019cec8 RSI: 0000000000000001 RDI: ffff8d845019cec0
Sep 25 14:56:23 desky kernel: RBP: ffffae1ca88c7bc8 R08: 0000000000000000 R09: 0000000000000000
Sep 25 14:56:23 desky kernel: R10: 0000000000000000 R11: 0000000000000004 R12: ffffffffc1db1faa
Sep 25 14:56:23 desky kernel: R13: ffffffffc1db2ab4 R14: 0000000000000001 R15: ffffae1ca88c7bd8
Sep 25 14:56:23 desky kernel: FS:  00007fb1baf31940(0000) GS:ffff8d849c870000(0000) knlGS:0000000000000000
Sep 25 14:56:23 desky kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 25 14:56:23 desky kernel: CR2: 00007fb1b2860020 CR3: 00000001705a9004 CR4: 0000000000772ef0
Sep 25 14:56:23 desky kernel: PKRU: 55555558
Sep 25 14:56:23 desky kernel: Call Trace:
Sep 25 14:56:23 desky kernel:  <TASK>
Sep 25 14:56:23 desky kernel:  xe_bo_cpu_fault_fastpath+0x11e/0x220 [xe]
Sep 25 14:56:23 desky kernel:  xe_bo_cpu_fault+0x84/0x410 [xe]
Sep 25 14:56:23 desky kernel:  ? __x64_sys_mmap+0x33/0x50
Sep 25 14:56:23 desky kernel:  ? x64_sys_call+0x1b2e/0x20d0
Sep 25 14:56:23 desky kernel:  ? do_syscall_64+0x9d/0x1f0
Sep 25 14:56:23 desky kernel:  ? __check_object_size+0x4a/0x2e0
Sep 25 14:56:23 desky kernel:  __do_fault+0x36/0x190
Sep 25 14:56:23 desky kernel:  do_fault+0xcf/0x570
Sep 25 14:56:23 desky kernel:  __handle_mm_fault+0x92b/0xfe0
Sep 25 14:56:23 desky kernel:  ? ktime_get_mono_fast_ns+0x39/0xd0
Sep 25 14:56:23 desky kernel:  handle_mm_fault+0x164/0x2c0
Sep 25 14:56:23 desky kernel:  do_user_addr_fault+0x2cb/0x840
Sep 25 14:56:23 desky kernel:  exc_page_fault+0x75/0x180
Sep 25 14:56:23 desky kernel:  asm_exc_page_fault+0x27/0x30
Sep 25 14:56:23 desky kernel: RIP: 0033:0x7fb1bc388bb7
Sep 25 14:56:23 desky kernel: Code: 48 ff c7 48 01 fe 48 8d 54 11 80 0f 1f 84 00 00 00 00 00 c5 fe 6f 0e c5 fe 6f 56 20 c5 fe 6f 5e 40 c5 fe 6f 66 60 48 83 ee 80 <c5> fd 7f 0f c5 fd 7>
Sep 25 14:56:23 desky kernel: RSP: 002b:00007ffd7814fad8 EFLAGS: 00010207
Sep 25 14:56:23 desky kernel: RAX: 00007fb1b2860000 RBX: 0000000000000690 RCX: 00007fb1b2860000
Sep 25 14:56:23 desky kernel: RDX: 00007fb1b2860610 RSI: 0000556eda79f4c0 RDI: 00007fb1b2860020
Sep 25 14:56:23 desky kernel: RBP: 00007ffd7814fb60 R08: 0000000000000000 R09: 000000012be0e000
Sep 25 14:56:23 desky kernel: R10: 00007fb1b2860000 R11: 0000000000000246 R12: 0000556edd39a240
Sep 25 14:56:23 desky kernel: R13: 00007fb1b2dcb010 R14: 0000556eda79f420 R15: 0000000000000000
Sep 25 14:56:23 desky kernel:  </TASK>

Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5250
Fixes: c2ae94cf8cd8 ("drm/xe: Convert the CPU fault handler for exhaustive eviction")
Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250929112649.6131-1-thomas.hellstrom@linux.intel.com
2 months ago drm/xe/debugfs: Update xe_pat_dump signature
Michal Wajdeczko [Tue, 23 Sep 2025 21:16:13 +0000 (23:16 +0200)] 
drm/xe/debugfs: Update xe_pat_dump signature

Our debugfs helper xe_gt_debugfs_show_with_rpm() expects print()
functions to return int. New signature allows us to drop wrapper.
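
Why the signature matters can be sketched with a function-pointer type:
a helper that expects "int (*)(printer)" cannot take a void dump
function directly, so a wrapper is needed until the dump function
itself returns int. All names below are hypothetical userspace
stand-ins, not the driver's helpers:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct printer {
	char buf[64];
};

typedef int (*show_fn)(struct printer *p);

/* Old style: a void dump function needs a wrapper to match show_fn. */
static void pat_dump_void(struct printer *p)
{
	snprintf(p->buf, sizeof(p->buf), "PAT table");
}

static int pat_dump_wrapper(struct printer *p)
{
	pat_dump_void(p);
	return 0;
}

/* New style: the dump function returns int and matches show_fn directly. */
static int pat_dump(struct printer *p)
{
	int n = snprintf(p->buf, sizeof(p->buf), "PAT table");

	return n < 0 ? n : 0;
}

/* Stand-in for a helper that brackets the call with runtime PM get/put. */
static int show_with_rpm(show_fn fn, struct printer *p)
{
	int ret;

	assert(fn && p);
	/* a runtime PM reference would be taken here */
	ret = fn(p);
	/* and released here */
	return ret;
}
```

Both pat_dump_wrapper and pat_dump can be passed to show_with_rpm(),
but only the second lets the wrapper be deleted.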

While around, move kernel-doc closer to the function definition,
as suggested in the doc-guide.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250923211613.193347-6-michal.wajdeczko@intel.com
2 months ago drm/xe/debugfs: Update xe_mocs_dump signature
Michal Wajdeczko [Tue, 23 Sep 2025 21:16:12 +0000 (23:16 +0200)] 
drm/xe/debugfs: Update xe_mocs_dump signature

Our debugfs helper xe_gt_debugfs_show_with_rpm() expects print()
functions to return int. New signature allows us to drop wrapper.

While around, move kernel-doc closer to the function definition,
as suggested in the doc-guide.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250923211613.193347-5-michal.wajdeczko@intel.com
2 months ago drm/xe/debugfs: Update xe_tuning_dump signature
Michal Wajdeczko [Tue, 23 Sep 2025 21:16:11 +0000 (23:16 +0200)] 
drm/xe/debugfs: Update xe_tuning_dump signature

Our debugfs helper xe_gt_debugfs_show_with_rpm() expects print()
functions to return int. New signature allows us to drop wrapper.

While around, print additional separation lines using puts() to
avoid output with leading \n which might confuse some printers.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250923211613.193347-4-michal.wajdeczko@intel.com
2 months ago drm/xe/debugfs: Update xe_wa_dump signature
Michal Wajdeczko [Tue, 23 Sep 2025 21:16:10 +0000 (23:16 +0200)] 
drm/xe/debugfs: Update xe_wa_dump signature

Our debugfs helper xe_gt_debugfs_show_with_rpm() expects print()
functions to return int. New signature allows us to drop wrapper.

While around, print additional separation lines using puts() to
avoid output with leading \n which might confuse some printers.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250923211613.193347-3-michal.wajdeczko@intel.com
2 months agodrm/xe/debugfs: Update xe_gt_topology_dump signature
Michal Wajdeczko [Tue, 23 Sep 2025 21:16:09 +0000 (23:16 +0200)] 
drm/xe/debugfs: Update xe_gt_topology_dump signature

Our debugfs helper xe_gt_debugfs_show_with_rpm() expects print()
functions to return int. New signature allows us to drop wrapper.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250923211613.193347-2-michal.wajdeczko@intel.com
2 months agodrm/xe/pf: Make GGTT/LMEM debugfs files per-tile
Michal Wajdeczko [Sun, 28 Sep 2025 14:00:28 +0000 (16:00 +0200)] 
drm/xe/pf: Make GGTT/LMEM debugfs files per-tile

Due to initial design of the Xe debugfs, the GGTT and LMEM files
were defined on the primary GT, instead of being per-tile.

While PF provisioning code is now still maintaining GGTT and LMEM
also on the per primary-GT level, this will be refactored soon,
but we can fix debugfs layout now, as part of the new SR-IOV tree.

For backward compatibility we will provide some symlinks that can
be removed once our tools are fully converted.

As we are making all those changes in the user facing interface,
take this as an opportunity to also start replacing the "LMEM" term,
used by the SR-IOV code, with the "VRAM" term, used by Xe driver.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250928140029.198847-7-michal.wajdeczko@intel.com
2 months agodrm/xe/debugfs: Promote xe_tile_debugfs_simple_show
Michal Wajdeczko [Sun, 28 Sep 2025 14:00:27 +0000 (16:00 +0200)] 
drm/xe/debugfs: Promote xe_tile_debugfs_simple_show

We will want to use this helper function in other files.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250928140029.198847-6-michal.wajdeczko@intel.com
2 months agodrm/xe/pf: Move SR-IOV GT debugfs files to new tree
Michal Wajdeczko [Sun, 28 Sep 2025 14:00:26 +0000 (16:00 +0200)] 
drm/xe/pf: Move SR-IOV GT debugfs files to new tree

Instead of expanding GT debugfs directories with large number of
SR-IOV files, as those are replicated per each SR-IOV function,
move them to our new debugfs tree, organized by the function.

But to avoid breaking IGT tests that use current layout, provide
symlinks which could be removed once transition period is over,
or we can leave them for convenience.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250928140029.198847-5-michal.wajdeczko@intel.com
2 months agodrm/xe/pf: Populate SR-IOV debugfs tree with tiles
Michal Wajdeczko [Sun, 28 Sep 2025 14:00:25 +0000 (16:00 +0200)] 
drm/xe/pf: Populate SR-IOV debugfs tree with tiles

Populate new per SR-IOV function debugfs directories with next
level directories that represent tiles. There are no files yet,
but we will continue updating that tree in upcoming patches.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250928140029.198847-4-michal.wajdeczko@intel.com
2 months agodrm/xe/pf: Create separate debugfs tree for SR-IOV files
Michal Wajdeczko [Sun, 28 Sep 2025 14:00:24 +0000 (16:00 +0200)] 
drm/xe/pf: Create separate debugfs tree for SR-IOV files

Currently we expose debugfs files related to SR-IOV functions
together with other native files, but that approach will not
scale well as we plan to add more attributes and also expose
some of them on the per-tile basis.

Start building separate tree for SR-IOV specific debugfs files
where we can replicate similar files per every SR-IOV function:

   /sys/kernel/debug/dri/BDF/
   ├── sriov
   │   ├── pf
   │   │   ├── tile0
   │   │   │   ├── gt0
   │   │   │   ├── gt1
   │   │   │   :
   │   │   ├── tile1
   │   │   :
   │   ├── vf1
   │   │   ├── tile0
   │   │   │   ├── gt0
   │   │   │   ├── gt1
   │   │   │   :
   │   │   :
   │   ├── vf2
   │   ├── ...

We will populate this new tree in upcoming patches.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250928140029.198847-3-michal.wajdeczko@intel.com
2 months agodrm/xe/pf: Promote PF debugfs function to its own file
Michal Wajdeczko [Sun, 28 Sep 2025 14:00:23 +0000 (16:00 +0200)] 
drm/xe/pf: Promote PF debugfs function to its own file

In upcoming patches, we will build on the PF separate debugfs
tree for all SR-IOV related files and this new code will need
dedicated file. To minimize large diffs later, move existing
function now as-is, so any future modifications will be done
directly in target file.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250928140029.198847-2-michal.wajdeczko@intel.com
2 months agodrm/xe/vf: Don't claim support for firmware late-bind if VF
Michal Wajdeczko [Sun, 28 Sep 2025 17:48:11 +0000 (19:48 +0200)] 
drm/xe/vf: Don't claim support for firmware late-bind if VF

In general, VFs can't load firmware, so an attempt to initialize
the firmware late-bind component leads to errors like:

 [] xe 0000:03:00.1: [drm] *ERROR* Late bind component not bound

Fixes: 918bd789d62e ("drm/xe/xe_late_bind_fw: Introduce xe_late_bind_fw")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6190
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://lore.kernel.org/r/20250928174811.198933-3-michal.wajdeczko@intel.com
2 months agodrm/xe/vf: Rename sriov_update_device_info
Michal Wajdeczko [Sun, 28 Sep 2025 17:48:10 +0000 (19:48 +0200)] 
drm/xe/vf: Rename sriov_update_device_info

This is a VF only function and its name should reflect that to
avoid any confusion. Move the VF check to the caller side.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250928174811.198933-2-michal.wajdeczko@intel.com
2 months agodrm/xe/hw_engine_group: Fix double write lock release in error path
Shuicheng Lin [Thu, 25 Sep 2025 02:31:46 +0000 (02:31 +0000)] 
drm/xe/hw_engine_group: Fix double write lock release in error path

In xe_hw_engine_group_get_mode(), a write lock is acquired before
calling switch_mode(), which in turn invokes
xe_hw_engine_group_suspend_faulting_lr_jobs().

On failure inside xe_hw_engine_group_suspend_faulting_lr_jobs(),
the write lock is released there, and then again in
xe_hw_engine_group_get_mode(), leading to a double release.

Fix this by keeping both acquire and release operations in
xe_hw_engine_group_get_mode().
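
The ownership rule behind this fix can be sketched in plain userspace C.
This is only an illustration under stated assumptions: fake_rwsem,
suspend_jobs() and get_mode() are hypothetical stand-ins, not the driver's
functions, and the counter merely models a write lock so that an
unbalanced release would become visible.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a rw semaphore: tracks the holder and bad releases. */
struct fake_rwsem {
	int write_held;
	int unbalanced_releases;
};

static void fake_down_write(struct fake_rwsem *sem)
{
	assert(sem->write_held == 0);
	sem->write_held = 1;
}

static void fake_up_write(struct fake_rwsem *sem)
{
	if (sem->write_held == 0) {
		sem->unbalanced_releases++;	/* a double release would land here */
		return;
	}
	sem->write_held = 0;
}

/* Callee: reports failure but never touches the caller's lock. */
static int suspend_jobs(bool fail)
{
	return fail ? -1 : 0;
}

/* Caller: the only place where the lock is acquired and released. */
int get_mode(struct fake_rwsem *sem, bool fail)
{
	int err;

	fake_down_write(sem);
	err = suspend_jobs(fail);
	fake_up_write(sem);	/* single release on both success and error */

	return err;
}
```

Calling get_mode() with fail set to true returns the error while leaving
unbalanced_releases at zero, which is the property the fix is after.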

Fixes: 770bd1d34113 ("drm/xe/hw_engine_group: Ensure safe transition between execution modes")
Cc: Francois Dugast <francois.dugast@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Link: https://lore.kernel.org/r/20250925023145.1203004-2-shuicheng.lin@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2 months agodrm/xe/guc: Refactor GuC load to use poll_timeout_us()
Lucas De Marchi [Mon, 22 Sep 2025 19:58:33 +0000 (12:58 -0700)] 
drm/xe/guc: Refactor GuC load to use poll_timeout_us()

Currently there are 2 wait loops for loading GuC: one in
xe_mmio_wait32_not() and one in guc_wait_ucode(). Now that there's a
generic poll_timeout_us(), refactor the code to use that to be more
readable.

Main change in behavior is that there's no exponential wait anymore:
that is now replaced by a 10msec retry.

Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://lore.kernel.org/r/20250922-xe-iopoll-v4-5-06438311a63f@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2 months agodrm/xe/guc: Extract function to print load error
Lucas De Marchi [Mon, 22 Sep 2025 19:58:32 +0000 (12:58 -0700)] 
drm/xe/guc: Extract function to print load error

Move the error parsing and print out of guc_wait_ucode() into a helper
to clean up the wait function. Since now the `load_done != 1` condition
has a return statement, also simplify the if/else chain.

Reviewed-by: John Harrison <John.C.Harrison@Intel.com> # v2
Link: https://lore.kernel.org/r/20250922-xe-iopoll-v4-4-06438311a63f@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2 months agodrm/xe/guc: Drop helper to read freq
Lucas De Marchi [Mon, 22 Sep 2025 19:58:31 +0000 (12:58 -0700)] 
drm/xe/guc: Drop helper to read freq

As the forcewake is already held during GuC load, there's no need to use
a helper function to call xe_guc_pc_get_cur_freq(). Just call
xe_guc_pc_get_cur_freq_fw() directly.

Suggested-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://lore.kernel.org/r/20250922-xe-iopoll-v4-3-06438311a63f@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2 months agodrm/xe/guc_pc: Use poll_timeout_us() for waiting
Lucas De Marchi [Mon, 22 Sep 2025 19:58:30 +0000 (12:58 -0700)] 
drm/xe/guc_pc: Use poll_timeout_us() for waiting

Convert wait_for_pc_state() and wait_for_act_freq_limit() to
poll_timeout_us(). This brings 2 changes in behavior: Drop the
exponential wait and fix a potential much longer sleep.

usleep_range() will wait anywhere between `wait` and `wait << 1`, so
it's not correct to assume `slept += wait`.  This code is not really
accurate. Pairing this with the exponential wait increase, it could be
waiting much longer than intended.
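
The accounting problem can be replayed without real timers. The sketch
below is a simplified model, not the driver code: wait_model and
model_exponential_wait() are hypothetical names, the loop omits any
clamping the original functions may have done, and it assumes each sleep
nominally lasts between `wait` and `wait << 1` as described above.

```c
#include <assert.h>

/* Result of replaying the old-style loop on paper. */
struct wait_model {
	unsigned long accounted_us;	/* what `slept += wait` adds up to */
	unsigned long worst_case_us;	/* if every sleep lasts `wait << 1` */
	unsigned int sleeps;		/* number of sleeps performed */
};

struct wait_model model_exponential_wait(unsigned long first_wait_us,
					 unsigned long timeout_us)
{
	struct wait_model m = { 0, 0, 0 };
	unsigned long wait = first_wait_us;

	assert(first_wait_us > 0);

	while (m.accounted_us < timeout_us) {
		m.sleeps++;
		m.accounted_us += wait;		/* the loop's own bookkeeping */
		m.worst_case_us += wait << 1;	/* nominal upper bound of the sleep */
		wait <<= 1;			/* exponential back-off */
	}

	return m;
}
```

With a 10 us first sleep and a 1000 us budget, this model performs seven
sleeps, accounts for 1270 us and may nominally spend up to 2540 us, so
both the back-off and the optimistic bookkeeping push the real wait past
the intended timeout.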

Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://lore.kernel.org/r/20250922-xe-iopoll-v4-2-06438311a63f@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2 months agodrm/xe/device: Use poll_timeout_us() to wait for lmem
Lucas De Marchi [Mon, 22 Sep 2025 19:58:29 +0000 (12:58 -0700)] 
drm/xe/device: Use poll_timeout_us() to wait for lmem

Now that there's a generic poll_timeout_us(), use it to wait for
LMEM_INIT in GU_CNTL.

Reviewed-by: Maarten Lankhorst <dev@lankhorst.se>
Link: https://lore.kernel.org/r/20250922-xe-iopoll-v4-1-06438311a63f@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2 months agodrm/xe/configfs: Improve doc for ctx_restore* attributes
Lucas De Marchi [Wed, 24 Sep 2025 15:27:11 +0000 (08:27 -0700)] 
drm/xe/configfs: Improve doc for ctx_restore* attributes

Spell out the syntax instead of only using examples. The <engine-class>
part is particularly important since it differs from engines_allowed
and may confuse users. The same batch buffer is used for all engines of
a certain class.

Cc: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Fixes: e2a9854d806e ("drm/xe/configfs: Allow to select by class only")
Link: https://lore.kernel.org/r/20250924152709.659483-4-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2 months agodrm/xe/configfs: Fix engine class parsing
Lucas De Marchi [Wed, 24 Sep 2025 15:27:10 +0000 (08:27 -0700)] 
drm/xe/configfs: Fix engine class parsing

If mask is NULL, only the engine class should be accepted, so the
pattern string should be completely parsed. This should fix passing e.g.
rcs0 to ctx_restore_post_bb when it's only expecting the engine class.
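
The rule can be illustrated with a small userspace parser. This is a
sketch under assumptions, not the driver's implementation:
parse_engine_pattern() and the class table are hypothetical, wildcard
handling is left out, and the only behavior taken from the text above is
that a NULL mask means the whole string must be consumed by the class name.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative subset of engine class names. */
static const char *const engine_classes[] = { "rcs", "bcs", "ccs" };

/*
 * Returns true if @pattern is acceptable.
 * With @mask == NULL only a bare class name such as "rcs" is accepted.
 * With @mask != NULL a class name plus instance such as "rcs0" is expected.
 */
bool parse_engine_pattern(const char *pattern, unsigned int *mask)
{
	size_t i;

	assert(pattern);

	for (i = 0; i < sizeof(engine_classes) / sizeof(engine_classes[0]); i++) {
		size_t len = strlen(engine_classes[i]);
		const char *rest;
		char *end;
		unsigned long instance;

		if (strncmp(pattern, engine_classes[i], len) != 0)
			continue;

		rest = pattern + len;
		if (!mask)
			return *rest == '\0';	/* class only: nothing may follow */

		if (!isdigit((unsigned char)*rest))
			return false;

		instance = strtoul(rest, &end, 10);
		if (*end != '\0' || instance >= 32)
			return false;

		*mask = 1u << instance;
		return true;
	}

	return false;
}
```

In this sketch "rcs" with a NULL mask is accepted, while "rcs0" with a
NULL mask is rejected because the trailing instance is left unparsed.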

Reported-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Closes: https://lore.kernel.org/r/20250922155544.67712-1-jonathan.cavitt@intel.com
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/r/aNJKnrCQmL9xS9Gv@stanley.mountain
Fixes: e2a9854d806e ("drm/xe/configfs: Allow to select by class only")
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://lore.kernel.org/r/20250924152709.659483-3-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2 months agodrm/xe/uapi: loosen used tracking restriction
Matthew Auld [Fri, 19 Sep 2025 12:20:53 +0000 (13:20 +0100)] 
drm/xe/uapi: loosen used tracking restriction

Currently this is hidden behind perfmon_capable() since this is
technically an info leak, given that this is a system wide metric.
However the granularity reported here is always PAGE_SIZE aligned, which
matches what the core kernel is already willing to expose to userspace
if querying how many free RAM pages there are on the system, and that
doesn't need any special privileges. In addition other drm drivers seem
happy to expose this.

The motivation here is oneAPI, where they want to use the system
wide 'used' reporting here, so not the per-client fdinfo stats. This has
also come up with some perf overlay applications wanting this
information.

Fixes: 1105ac15d2a1 ("drm/xe/uapi: restrict system wide accounting")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Joshua Santosh <joshua.santosh.ranjan@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250919122052.420979-2-matthew.auld@intel.com
2 months agodrm/xe/tests: Fix build break on clang 16.0.6
Michal Wajdeczko [Mon, 22 Sep 2025 10:12:07 +0000 (12:12 +0200)] 
drm/xe/tests: Fix build break on clang 16.0.6

The following error was reported when building with clang 16.0.6:

   In file included from drivers/gpu/drm/xe/xe_pci.c:1104:
>> drivers/gpu/drm/xe/tests/xe_pci.c:214:2: error: initializer \
   element is not a compile-time constant
           graphics_ip_xelp,
           ^~~~~~~~~~~~~~~~
   drivers/gpu/drm/xe/tests/xe_pci.c:221:2: error: initializer \
   element is not a compile-time constant
           media_ip_xem,
           ^~~~~~~~~~~~
   2 errors generated.

Fix that by explicit re-definition of pre-GMDID IPs, as there are
not so many of them.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202509192041.tQwdE4DS-lkp@intel.com/
Fixes: 5bb5258e357e ("drm/xe/tests: Add pre-GMDID IP descriptors to param generators")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250922101207.192028-1-michal.wajdeczko@intel.com
2 months agodrm/xe/debugfs: Improve .show() helper for GT-based attributes
Michal Wajdeczko [Fri, 19 Sep 2025 16:04:30 +0000 (18:04 +0200)] 
drm/xe/debugfs: Improve .show() helper for GT-based attributes

Like we did for tile-based attributes, introduce separate show()
helper that implicitly takes an RPM reference prior to the call
to the actual print() function. This translates into some savings.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250919160430.573-3-michal.wajdeczko@intel.com
2 months agodrm/xe/debugfs: Make ggtt file per-tile
Michal Wajdeczko [Fri, 19 Sep 2025 16:04:29 +0000 (18:04 +0200)] 
drm/xe/debugfs: Make ggtt file per-tile

Due to initial lack of per-tile debugfs directories, the ggtt file
attribute was created as per-GT file. Fix that since now we have
proper per-tile directories.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250919160430.573-2-michal.wajdeczko@intel.com
2 months agodrm/xe/psmi: Do not return NULL
Lucas De Marchi [Mon, 22 Sep 2025 22:11:34 +0000 (15:11 -0700)] 
drm/xe/psmi: Do not return NULL

The checks for id and bo_size are impossible conditions. If they were
possible, then the caller should not be using IS_ERR(). Just replace
them with asserts which should be compiled out when not debugging and
at the same time prevent other refactors from breaking this assumption.
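
Why a NULL return does not mix with an IS_ERR() check can be shown with a
userspace imitation of the error-pointer convention. Everything below is
an assumption made for illustration: err_ptr(), is_err() and
alloc_capture_buffer() are local, hypothetical helpers that only mimic
the kernel's ERR_PTR()/IS_ERR() idea.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* True only for pointers in the top SKETCH_MAX_ERRNO addresses. */
static inline bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Hypothetical allocator: failure is reported only as an encoded error. */
void *alloc_capture_buffer(size_t size)
{
	static char storage[64];

	/* Impossible input is asserted instead of being reported as NULL. */
	assert(size > 0);

	if (size > sizeof(storage))
		return err_ptr(-12);	/* -ENOMEM */

	return storage;
}
```

Since is_err(NULL) is false, a callee that returned NULL would slip past
a caller that only checks is_err(), which is why the impossible
conditions are asserted rather than turned into a NULL return.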

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/r/aK1nZjyAF0s7bnHg@stanley.mountain
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250922221133.109921-2-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2 months agodrm/xe/pm: Add lockdep annotation for the pm_block completion
Thomas Hellström [Thu, 18 Sep 2025 14:28:48 +0000 (16:28 +0200)] 
drm/xe/pm: Add lockdep annotation for the pm_block completion

Similar to how we annotate dma-fences, add lockdep annotation to
the pm_block completion to ensure we don't wait for it while holding
locks that are needed in the pm notifier or in the device
suspend / resume callbacks.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250918142848.21807-3-thomas.hellstrom@linux.intel.com
2 months agodrm/xe/pm: Hold the validation lock around evicting user-space bos for suspend
Thomas Hellström [Thu, 18 Sep 2025 14:28:47 +0000 (16:28 +0200)] 
drm/xe/pm: Hold the validation lock around evicting user-space bos for suspend

During pm notifier eviction we may still race with validations.
Ensure those are blocked out during eviction to ensure we have
access to as much system memory as possible.

During the suspend operation itself, we run single-threaded so that
shouldn't be a problem.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250918142848.21807-2-thomas.hellstrom@linux.intel.com
2 months agodrm/xe/dma-buf: Allow pinning of p2p dma-buf
Thomas Hellström [Thu, 18 Sep 2025 09:22:07 +0000 (11:22 +0200)] 
drm/xe/dma-buf: Allow pinning of p2p dma-buf

RDMA NICs typically require the VRAM dma-bufs to be pinned in
VRAM for pcie-p2p communication, since they don't fully support
the move_notify() scheme. We would like to support that.

However allowing unaccounted pinning of VRAM creates a DoS vector
so up until now we haven't allowed it.

However with cgroups support in TTM, the amount of VRAM allocated
to a cgroup can be limited, and since also the pinned memory is
accounted as allocated VRAM we should be safe.

An analogy with system memory can be made if we observe the
similarity with kernel system memory that is allocated as the
result of user-space action and that is accounted using __GFP_ACCOUNT.

Ideally, to be more flexible, we would add a "pinned_memory",
or possibly "kernel_memory" limit to the dmem cgroups controller,
that would additionally limit the memory that is pinned in this way.
If we let that limit default to the dmem::max limit we can
introduce that without needing to care about regressions.

Considering that we already pin VRAM in this way for at least
page-table memory and LRC memory, and the above path to greater
flexibility, allow this also for dma-bufs.

v2:
- Update comments about pinning in the dma-buf kunit test
  (Niranjana Vishwanathapura)

Cc: Dave Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona.vetter@ffwll.ch>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Acked-by: Simona Vetter <simona.vetter@ffwll.ch>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://lore.kernel.org/r/20250918092207.54472-4-thomas.hellstrom@linux.intel.com
2 months agodrm/xe: Pre-allocate system memory for pinned external bos in the pm notfier
Thomas Hellström [Thu, 18 Sep 2025 09:22:06 +0000 (11:22 +0200)] 
drm/xe: Pre-allocate system memory for pinned external bos in the pm notfier

Similarly to what we do for other pinned bos, pre-allocate
system memory for pinned external bos in the pm notifier,
where swapping is still possible.

This hasn't been needed until now when we're about to allow
pinning of external VRAM bos.

Cc: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250918092207.54472-3-thomas.hellstrom@linux.intel.com
2 months agodrm/xe: Don't copy pinned kernel bos twice on suspend
Thomas Hellström [Thu, 18 Sep 2025 09:22:05 +0000 (11:22 +0200)] 
drm/xe: Don't copy pinned kernel bos twice on suspend

We were copying the content of the bos on the list
"xe->pinned.late.kernel_bo_present" twice on suspend.

Presumably the intent is to copy the pinned external bos on
the first pass.

This is harmless since we (currently) should have no pinned
external bos needing copy since
a) external system bos don't have compressed content,
b) We do not (yet) allow pinning of VRAM bos.

Still, fix this up so that we copy pinned external bos on
the first pass. We're about to allow bos pinned in VRAM.

Fixes: c6a4d46ec1d7 ("drm/xe: evict user memory in PM notifier")
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: <stable@vger.kernel.org> # v6.16+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250918092207.54472-2-thomas.hellstrom@linux.intel.com
2 months agoMerge drm/drm-next into drm-xe-next
Thomas Hellström [Mon, 22 Sep 2025 08:15:27 +0000 (10:15 +0200)] 
Merge drm/drm-next into drm-xe-next

Initial backmerge for 6.19 development.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2 months agoMerge tag 'amd-drm-next-6.18-2025-09-19' of https://gitlab.freedesktop.org/agd5f...
Dave Airlie [Sun, 21 Sep 2025 22:44:52 +0000 (08:44 +1000)] 
Merge tag 'amd-drm-next-6.18-2025-09-19' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.18-2025-09-19:

amdgpu:
- Fence drv clean up fix
- DPC fixes
- Misc display fixes
- Support the MMIO remap page as a ttm pool
- JPEG parser updates
- UserQ updates
- VCN ctx handling fixes
- Documentation updates
- Misc cleanups
- SMU 13.0.x updates
- SI DPM updates
- GC 11.x cleaner shader updates
- DMCUB updates
- DML fixes
- Improve fallback handling for pixel encoding
- VCN reset improvements
- DCE6 DC updates
- DSC fixes
- Use devm for i2c buses
- GPUVM locking updates
- GPUVM documentation improvements
- Drop non-DC DCE11 code
- S0ix fixes
- Backlight fix
- SR-IOV fixes

amdkfd:
- SVM updates

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250919193354.2989255-1-alexander.deucher@amd.com
2 months agoMerge tag 'drm-xe-next-2025-09-19' of https://gitlab.freedesktop.org/drm/xe/kernel...
Dave Airlie [Sun, 21 Sep 2025 21:42:05 +0000 (07:42 +1000)] 
Merge tag 'drm-xe-next-2025-09-19' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next

UAPI Changes:
 - Drop L3 bank mask reporting from the media GT on Xe3 and later. Only
   do that for the primary GT. No userspace needs or uses it for media
   and some platforms may report bogus values.
 - Add SLPC power_profile sysfs interface with support for base and
   power_saving modes (Vinay Belgaumkar, Rodrigo Vivi)
 - Add configfs attributes to add post/mid context-switch commands
   (Lucas De Marchi)

Cross-subsystem Changes:
 - Fix hmm_pfn_to_map_order() usage in gpusvm and refactor APIs to
   align with pieces previously handled by xe_hmm (Matthew Auld)

Core Changes:
 - Add MEI driver for Late Binding Firmware Update/Upload
   (Alexander Usyskin)

Driver Changes:
 - Fix GuC CT teardown wrt TLB invalidation (Satyanarayana)
 - Fix CCS save/restore on VF (Satyanarayana)
 - Increase default GuC crash buffer size (Zhanjun)
 - Allow to clear GT stats in debugfs to aid debugging (Matthew Brost)
 - Add more SVM GT stats to debugfs (Matthew Brost)
 - Fix error handling in VMA attr query (Himal)
 - Move sa_info in debugfs to be per tile (Michal Wajdeczko)
 - Limit number of retries upon receiving NO_RESPONSE_RETRY from GuC to
   avoid endless loop (Michal Wajdeczko)
 - Fix configfs handling for survivability_mode undoing user choice when
   unbinding the module (Michal Wajdeczko)
 - Refactor configfs attribute visibility to future-proof it and stop
   exposing survivability_mode if not applicable (Michal Wajdeczko)
 - Constify some functions (Harish Chegondi, Michal Wajdeczko)
 - Add/extend more HW workarounds for Xe2 and Xe3
   (Harish Chegondi, Tangudu Tilak Tirumalesh)
 - Replace xe_hmm with gpusvm (Matthew Auld)
 - Improve fake pci and WA kunit handling for testing new platforms
   (Michal Wajdeczko)
 - Reduce unnecessary PTE writes when migrating (Sanjay Yadav)
 - Cleanup GuC interface definitions and log message (John Harrison)
 - Small improvements around VF CCS (Michal Wajdeczko)
 - Enable bus mastering for the I2C controller (Raag Jadav)
 - Prefer devm_mutex over hand rolling it (Christophe JAILLET)
 - Drop sysfs and debugfs attributes not available for VF (Michal Wajdeczko)
 - GuC CT devm actions improvements (Michal Wajdeczko)
 - Recommend new GuC versions for PTL and BMG (Julia Filipchuk)
 - Improve driver handling for exhaustive eviction using new
   xe_validation wrapper around drm_exec (Thomas Hellström)
 - Add and use printk wrappers for tile and device (Michal Wajdeczko)
 - Better document workaround handling in Xe (Lucas De Marchi)
 - Improvements on ARRAY_SIZE and ERR_CAST usage (Lucas De Marchi,
   Fushuai Wang)
 - Align CSS firmware headers with the GuC APIs (John Harrison)
 - Test GuC to GuC (G2G) communication to aid debug in pre-production
   firmware (John Harrison)
 - Bail out driver probing if GuC fails to load (John Harrison)
 - Allow error injection in xe_pxp_exec_queue_add()
   (Daniele Ceraolo Spurio)
 - Minor refactors in xe_svm (Shuicheng Lin)
 - Fix madvise ioctl error handling (Shuicheng Lin)
 - Use attribute groups to simplify sysfs registration
   (Michal Wajdeczko)
 - Add Late Binding Firmware implementation in Xe to work together with
   the MEI component (Badal Nilawar, Daniele Ceraolo Spurio, Rodrigo
   Vivi)
 - Fix build with CONFIG_MODULES=n (Lucas De Marchi)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/c2et6dnkst2apsgt46dklej4nprqdukjosb55grpaknf3pvcxy@t7gtn3hqtp6n
2 months agodrm/xe: Fix build with CONFIG_MODULES=n
Lucas De Marchi [Fri, 12 Sep 2025 21:54:51 +0000 (14:54 -0700)] 
drm/xe: Fix build with CONFIG_MODULES=n

When building with CONFIG_MODULES=n, the __exit functions are dropped.
However our init functions may call them for error handling, so they are
not good candidates for the exit sections.

Fix this error reported by 0day:

ld.lld: error: relocation refers to a symbol in a discarded section: xe_configfs_exit
>>> defined in vmlinux.a(drivers/gpu/drm/xe/xe_configfs.o)
>>> referenced by xe_module.c
>>>               drivers/gpu/drm/xe/xe_module.o:(init_funcs) in archive vmlinux.a

This is the only exit function using __exit. Drop it to fix the build.
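
The reason init code ends up referencing exit functions can be sketched
with a table-driven init sequence in userspace C. This is an assumed,
simplified model: init_step, run_init_steps() and the step functions are
hypothetical and only imitate the idea of an init table whose error path
unwinds earlier steps.

```c
#include <assert.h>
#include <stddef.h>

/* Each step pairs an init function with the exit function that undoes it. */
struct init_step {
	int (*init)(void);
	void (*exit)(void);
};

/* Call trace: positive ids are init calls, negative ids are exit calls. */
static int trace[8];
static size_t trace_len;

static void record(int id)
{
	assert(trace_len < sizeof(trace) / sizeof(trace[0]));
	trace[trace_len++] = id;
}

static int configfs_init(void) { record(1); return 0; }
static void configfs_exit(void) { record(-1); }
static int hw_init(void) { record(2); return 0; }
static void hw_exit(void) { record(-2); }
static int failing_init(void) { record(3); return -1; }

static const struct init_step steps[] = {
	{ configfs_init, configfs_exit },
	{ hw_init, hw_exit },
	{ failing_init, NULL },
};

/* Run all steps; on failure undo the completed ones in reverse order. */
int run_init_steps(void)
{
	size_t i;

	for (i = 0; i < sizeof(steps) / sizeof(steps[0]); i++) {
		int err = steps[i].init();

		if (err) {
			while (i--)
				if (steps[i].exit)
					steps[i].exit();
			return err;
		}
	}

	return 0;
}
```

Because the error path calls configfs_exit() from init context, the exit
function must remain linked even when no module unload path exists, which
is why placing it in a discarded section breaks the build.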

Cc: Riana Tauro <riana.tauro@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202506092221.1FmUQmI8-lkp@intel.com/
Fixes: 16280ded45fb ("drm/xe: Add configfs to enable survivability mode")
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://lore.kernel.org/r/20250912-fix-nomodule-build-v1-1-d11b70a92516@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2 months agoMerge tag 'drm-intel-next-2025-09-12' of https://gitlab.freedesktop.org/drm/i915...
Dave Airlie [Fri, 19 Sep 2025 02:59:29 +0000 (12:59 +1000)] 
Merge tag 'drm-intel-next-2025-09-12' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-next

Cross-subsystem Changes:
- Overflow: add range_overflows and range_end_overflows (Jani)

Core Changes:
- Get rid of dev->struct_mutex (Luiz)

Non-display related:
 - GVT: Remove redundant ternary operators (Liao)
 - Various i915_utils clean-ups (Jani)

 Display related:
 - Wait PSR idle before on dsb commit (Jouni)
 - Fix size for for_each_set_bit() in abox iteration (Jani)
 - Abstract figuring out encoder name (Jani)
 - Remove FBC modulo 4 restriction for ADL-P+ (Uma)
 - Panic: refactor framebuffer allocation (Jani)
 - Backlight luminance control improvements (Suraj, Aaron)
 - Add intel_display_device_present (Jani)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/aMxX_lBxm7wd5wmi@intel.com
2 months agoMerge tag 'drm-misc-next-fixes-2025-09-18' of https://gitlab.freedesktop.org/drm...
Dave Airlie [Fri, 19 Sep 2025 02:50:22 +0000 (12:50 +1000)] 
Merge tag 'drm-misc-next-fixes-2025-09-18' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next

Short summary of fixes pull:

pixpaper:
- Fix mode_valid function signature

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://lore.kernel.org/r/20250918064558.GA10017@linux.fritz.box
2 months agodrm/xe/configfs: Add mid context restore bb
Lucas De Marchi [Tue, 16 Sep 2025 21:15:44 +0000 (14:15 -0700)] 
drm/xe/configfs: Add mid context restore bb

Like done for post context restore, allow the user to add commands to
the middle of context restore, at the beginning of engine restore
commands.

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250916-wa-bb-cmds-v5-7-306bddbc15da@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2 months agodrm/xe/lrc: Allow to add user commands mid context switch
Lucas De Marchi [Tue, 16 Sep 2025 21:15:43 +0000 (14:15 -0700)] 
drm/xe/lrc: Allow to add user commands mid context switch

Like done for post-context-restore commands, allow to add commands from
configfs in the middle of context restore. Since currently the indirect
ctx hardcodes the offset to CTX_INDIRECT_CTX_OFFSET_DEFAULT, this is
executed in the very beginning of engine context restore.

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250916-wa-bb-cmds-v5-6-306bddbc15da@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2 months agodrm/xe/lrc: Allow INDIRECT_CTX for more engine classes
Lucas De Marchi [Tue, 16 Sep 2025 21:15:42 +0000 (14:15 -0700)] 
drm/xe/lrc: Allow INDIRECT_CTX for more engine classes

Currently it's only allowed for render and compute. Going forward we
want to enable it for more engine classes. Let the XE_LRC_FLAG_INDIRECT_CTX
flag (and thus gt_engine_needs_indirect_ctx()) be the deciding factor
for its availability.

While at it, add the missing const to rcs_funcs array. Since
CTX_INDIRECT_CTX_OFFSET_DEFAULT already matches the HW default and
gt_engine_needs_indirect_ctx() only ever enables it for rcs/ccs, there
is no change in behavior; this is only preparation for a future use case.

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250916-wa-bb-cmds-v5-5-306bddbc15da@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2 months agodrm/xe/configfs: Add post context restore bb
Lucas De Marchi [Tue, 16 Sep 2025 21:15:41 +0000 (14:15 -0700)] 
drm/xe/configfs: Add post context restore bb

Allow the user to specify commands to execute during a context restore.
Currently it's possible to parse 2 types of actions:

- cmd: the instructions are added as is to the bb
- reg: just use the address and value, without worrying about
  encoding the right LRI instruction. This is possibly the most
  useful use case, so added a dedicated action for that.

This also prepares for future BBs: mid context restore and rc6 context
restore that can re-use the same parsing functions.

Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://lore.kernel.org/r/20250916-wa-bb-cmds-v5-4-306bddbc15da@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2 months agodrm/xe/lrc: Allow to add user commands on context switch
Lucas De Marchi [Tue, 16 Sep 2025 21:15:40 +0000 (14:15 -0700)] 
drm/xe/lrc: Allow to add user commands on context switch

During validation it's useful to allow additional commands to be
executed on context switch. Fetch the commands from configfs (to be
added) and add them to the WA BB.

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250916-wa-bb-cmds-v5-3-306bddbc15da@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2 months agodrm/xe/configfs: Allow to select by class only
Lucas De Marchi [Tue, 16 Sep 2025 21:15:39 +0000 (14:15 -0700)] 
drm/xe/configfs: Allow to select by class only

For a future configfs attribute, it's desirable to select by engine mask
only as the instance doesn't make sense.

Rename the function lookup_engine_mask() to lookup_engine_info() and
make it return the entry. This allows parse_engine() to still return an
item if the caller wants to allow parsing a class-only string like
"rcs", "bcs", "ccs", etc.
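
As a rough user-space illustration of the class-only parsing described
above (not the driver's code): the table contents, the instance limits,
and the names struct engine_class and parse_engine_sketch() are all
assumptions made up for this sketch.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical table entry; instance limits are illustrative only. */
struct engine_class {
	const char *name;
	unsigned int max_instances;
};

static const struct engine_class classes[] = {
	{ "rcs", 1 }, { "bcs", 9 }, { "vcs", 8 }, { "vecs", 4 }, { "ccs", 4 },
};

/*
 * Return the matching table entry, or NULL if the string is invalid.
 * *instance is set to -1 when only a class ("rcs") was given, which is
 * accepted only if the caller passes allow_class_only.
 */
static const struct engine_class *
parse_engine_sketch(const char *s, bool allow_class_only, int *instance)
{
	for (size_t i = 0; i < sizeof(classes) / sizeof(classes[0]); i++) {
		size_t n = strlen(classes[i].name);
		const char *rest;
		char *end;
		unsigned long v;

		if (strncmp(s, classes[i].name, n))
			continue;

		rest = s + n;
		if (*rest == '\0') {
			/* Class-only string such as "rcs" */
			if (!allow_class_only)
				return NULL;
			*instance = -1;
			return &classes[i];
		}

		/* Otherwise a decimal instance must follow, e.g. "bcs2" */
		if (*rest < '0' || *rest > '9')
			return NULL;
		v = strtoul(rest, &end, 10);
		if (*end != '\0' || v >= classes[i].max_instances)
			return NULL;
		*instance = (int)v;
		return &classes[i];
	}
	return NULL;
}
```

Returning the entry rather than a mask lets the caller decide whether a
missing instance is acceptable for a given attribute.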

Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://lore.kernel.org/r/20250916-wa-bb-cmds-v5-2-306bddbc15da@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2 months agodrm/xe/configfs: Extract function to parse engine
Lucas De Marchi [Tue, 16 Sep 2025 21:15:38 +0000 (14:15 -0700)] 
drm/xe/configfs: Extract function to parse engine

Move the part that copies the engine to a local buffer so it can be
shared in future for other configfs attributes parsing an engine.

Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://lore.kernel.org/r/20250916-wa-bb-cmds-v5-1-306bddbc15da@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2 months agodrm/amd/display: Only restore backlight after amdgpu_dm_init or dm_resume
Matthew Schwartz [Thu, 11 Sep 2025 17:48:51 +0000 (10:48 -0700)] 
drm/amd/display: Only restore backlight after amdgpu_dm_init or dm_resume

On clients that utilize AMD_PRIVATE_COLOR properties for HDR support,
brightness sliders can include a hardware controlled portion and a
gamma-based portion. This is the case on the Steam Deck OLED when using
gamescope with Steam as a client.

When a user sets a brightness level while HDR is active, the gamma-based
portion and/or hardware portion are adjusted to achieve the desired
brightness. However, when a modeset takes place while the gamma-based
portion is in-use, restoring the hardware brightness level overrides the
user's overall brightness level and results in a mismatch between what
the slider reports and the display's current brightness.

To avoid overriding gamma-based brightness, only restore HW backlight
level after boot or resume. This ensures that the backlight level is
set correctly after the DC layer resets it while avoiding interference
with subsequent modesets.

Fixes: 7875afafba84 ("drm/amd/display: Fix brightness level not retained over reboot")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4551
Signed-off-by: Matthew Schwartz <matthew.schwartz@linux.dev>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu/atom: Check kcalloc() for WS buffer in amdgpu_atom_execute_table_locked()
Guangshuo Li [Thu, 18 Sep 2025 10:57:05 +0000 (18:57 +0800)] 
drm/amdgpu/atom: Check kcalloc() for WS buffer in amdgpu_atom_execute_table_locked()

kcalloc() may fail. When WS is non-zero and allocation fails, ectx.ws
remains NULL while ectx.ws_size is set, leading to a potential NULL
pointer dereference in atom_get_src_int() when accessing WS entries.

Return -ENOMEM on allocation failure to avoid the NULL dereference.
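
The pattern can be sketched in user space as follows, with calloc()
standing in for kcalloc(). struct exec_ctx, exec_ctx_init() and
exec_ctx_fini() are hypothetical names for illustration, not the
driver's actual code.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the execution context. */
struct exec_ctx {
	uint32_t *ws;    /* workspace buffer, NULL when ws_size == 0 */
	size_t ws_size;  /* number of workspace entries */
};

/*
 * Allocate the workspace and return -ENOMEM if the allocation fails,
 * so ws_size is never non-zero while ws is NULL.
 */
static int exec_ctx_init(struct exec_ctx *ectx, size_t ws_entries)
{
	ectx->ws = NULL;
	ectx->ws_size = 0;

	if (ws_entries) {
		ectx->ws = calloc(ws_entries, sizeof(*ectx->ws));
		if (!ectx->ws)
			return -ENOMEM;
		ectx->ws_size = ws_entries;
	}
	return 0;
}

/* Release the workspace and reset the bookkeeping. */
static void exec_ctx_fini(struct exec_ctx *ectx)
{
	free(ectx->ws);
	ectx->ws = NULL;
	ectx->ws_size = 0;
}
```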

Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu: revert to old status lock handling v3
Christian König [Wed, 27 Aug 2025 09:45:45 +0000 (11:45 +0200)] 
drm/amdgpu: revert to old status lock handling v3

It turned out that protecting the status of each bo_va with a
spinlock was just hiding problems instead of solving them.

Revert the whole approach, add a separate stats_lock and lockdep
assertions that the correct reservation lock is held all over the place.

This not only allows better checks of whether a state transition is
properly protected by a lock, but also allows switching back to list
macros to iterate over the state of lists protected by the dma_resv
lock of the root PD.

v2: re-add missing check
v3: split into two patches

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/xe/xe_late_bind_fw: Extract and print version info
Badal Nilawar [Fri, 5 Sep 2025 15:49:53 +0000 (21:19 +0530)] 
drm/xe/xe_late_bind_fw: Extract and print version info

Extract and print version info of the late binding binary.

v2: Some refinements (Daniele)

Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250905154953.3974335-10-badal.nilawar@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2 months agodrm/xe/xe_late_bind_fw: Introduce debug fs node to disable late binding
Badal Nilawar [Fri, 5 Sep 2025 15:49:52 +0000 (21:19 +0530)] 
drm/xe/xe_late_bind_fw: Introduce debug fs node to disable late binding

Introduce a debug filesystem node to disable late binding fw reload
during the system or runtime resume. This is intended for situations
where the late binding fw needs to be loaded from user mode,
particularly for validation purposes.
Note that xe kmd doesn't participate in the late binding flow from user
space. A binary loaded from user space is lost upon entering D3cold,
hence the user space app needs to handle this situation.

v2:
  - s/(uval == 1) ? true : false/!!uval/ (Daniele)
v3:
  - Refine the commit message (Daniele)

Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250905154953.3974335-9-badal.nilawar@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2 months agodrm/xe/xe_late_bind_fw: Reload late binding fw during system resume
Badal Nilawar [Fri, 5 Sep 2025 15:49:51 +0000 (21:19 +0530)] 
drm/xe/xe_late_bind_fw: Reload late binding fw during system resume

Reload late binding fw during resume from system suspend

v2:
  - Unconditionally reload late binding fw (Rodrigo)
  - Flush worker during system suspend

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250905154953.3974335-8-badal.nilawar@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2 months agodrm/xe/xe_late_bind_fw: Reload late binding fw in rpm resume
Badal Nilawar [Fri, 5 Sep 2025 15:49:50 +0000 (21:19 +0530)] 
drm/xe/xe_late_bind_fw: Reload late binding fw in rpm resume

Reload late binding fw during runtime resume.

Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250905154953.3974335-7-badal.nilawar@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2 months agodrm/xe/xe_late_bind_fw: Load late binding firmware
Badal Nilawar [Fri, 5 Sep 2025 15:49:49 +0000 (21:19 +0530)] 
drm/xe/xe_late_bind_fw: Load late binding firmware

Load late binding firmware

v2:
 - s/EAGAIN/EBUSY/
 - Flush worker in suspend and driver unload (Daniele)
v3:
 - Use retry interval of 6s, in steps of 200ms, to allow
   other OS components release MEI CL handle (Sasha)
v4:
 - return -ENODEV if component not added (Daniele)
 - parse and print status returned by csc
v5:
 - Use payload to check firmware valid (Daniele)
 - Obtain the RPM reference before scheduling the worker to
   ensure the device remains awake until the worker completes
   firmware loading (Rodrigo)
v6:
 - In case of error do not re-attempt fw download (Daniele)
v7 (Rodrigo):
 - Rename of mei structs and callback.
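
The retry policy from v3 (up to 6s in steps of 200ms) combined with the
v6 rule (no re-attempt on error) might look roughly like the sketch
below. It is a user-space illustration under stated assumptions: only a
busy (-EBUSY) result is retried, and push_with_retry(), struct
fake_client, fake_push() and fake_delay() are hypothetical names, with
the delay replaced by a counter so the logic can be exercised quickly.

```c
#include <errno.h>

#define RETRY_INTERVAL_MS 200
#define RETRY_TOTAL_MS    6000

typedef int (*push_fn)(void *data);
typedef void (*delay_fn)(unsigned int ms);

/*
 * Call push() until it stops reporting -EBUSY or the total wait budget
 * is used up. Any other result, success or error, is returned at once.
 */
static int push_with_retry(push_fn push, void *data, delay_fn delay)
{
	unsigned int waited = 0;
	int ret;

	for (;;) {
		ret = push(data);
		if (ret != -EBUSY)
			return ret;
		if (waited >= RETRY_TOTAL_MS)
			return ret;
		delay(RETRY_INTERVAL_MS);
		waited += RETRY_INTERVAL_MS;
	}
}

/* Fake client that reports busy a fixed number of times, then succeeds. */
struct fake_client {
	int busy_left;
	int calls;
};

static int fake_push(void *data)
{
	struct fake_client *c = data;

	c->calls++;
	if (c->busy_left > 0) {
		c->busy_left--;
		return -EBUSY;
	}
	return 0;
}

/* Records the requested delay instead of sleeping. */
static unsigned int slept_ms;

static void fake_delay(unsigned int ms)
{
	slept_ms += ms;
}
```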

Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250905154953.3974335-6-badal.nilawar@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2 months agodrm/xe/xe_late_bind_fw: Initialize late binding firmware
Badal Nilawar [Fri, 5 Sep 2025 15:49:48 +0000 (21:19 +0530)] 
drm/xe/xe_late_bind_fw: Initialize late binding firmware

Search for late binding firmware binaries and populate the meta data of
firmware structures.

v2 (Daniele):
 - drm_err if firmware size is more than max payload size
 - s/request_firmware/firmware_request_nowarn/ as firmware will
   not be available for all possible cards
v3 (Daniele):
 - init firmware from within xe_late_bind_init, propagate error
 - switch late_bind_fw to array to handle multiple firmware types
v4 (Daniele):
 - Alloc payload dynamically, fix nits
v6 (Daniele)
 - %s/MAX_PAYLOAD_SIZE/XE_LB_MAX_PAYLOAD_SIZE/

Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250905154953.3974335-5-badal.nilawar@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2 months agodrm/xe/xe_late_bind_fw: Introduce xe_late_bind_fw
Badal Nilawar [Fri, 5 Sep 2025 15:49:47 +0000 (21:19 +0530)] 
drm/xe/xe_late_bind_fw: Introduce xe_late_bind_fw

Introduce xe_late_bind_fw to enable firmware loading for the devices,
such as the fan controller, during the driver probe. Typically,
firmware for such devices is part of the IFWI flash image but can be
replaced at probe after OEM tuning.
This patch binds mei late binding component to enable firmware loading.

v2:
 - Add devm_add_action_or_reset to remove the component (Daniele)
 - Add INTEL_MEI_GSC check in xe_late_bind_init() (Daniele)
v3:
 - Fail driver probe if late bind initialization fails,
   add has_late_bind flag (Daniele)
v4:
 - %s/I915_COMPONENT_LATE_BIND/INTEL_COMPONENT_LATE_BIND/
v6:
 - rebased
v7:
 - rebased
 - In xe_late_bind_init, use drm_err when returning an error to
   stop the probe (Lucas)
 - Use imperative mode in commit message (Lucas)

Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250905154953.3974335-4-badal.nilawar@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2 months agomei: late_bind: add late binding component driver
Alexander Usyskin [Fri, 5 Sep 2025 15:49:46 +0000 (21:19 +0530)] 
mei: late_bind: add late binding component driver

Introduce a new MEI client driver to support Late Binding firmware
upload/update for Intel discrete graphics platforms.

Late Binding is a runtime firmware upload/update mechanism that allows
payloads, such as fan control and voltage regulator, to be securely
delivered and applied without requiring SPI flash updates or
system reboots. This driver enables the Xe graphics driver and other
user-space tools to push such firmware blobs to the authentication
firmware via the MEI interface.

The driver handles authentication, versioning, and communication
with the authentication firmware, which in turn coordinates with
the PUnit/PCODE to apply the payload.

This is a foundational component for enabling dynamic, secure,
and re-entrant configuration updates on platforms like Battlemage.

Cc: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20250905154953.3974335-3-badal.nilawar@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2 months agomei: bus: add mei_cldev_mtu interface
Alexander Usyskin [Fri, 5 Sep 2025 15:49:45 +0000 (21:19 +0530)] 
mei: bus: add mei_cldev_mtu interface

Add a new helper function that allows MEI client drivers
to query the maximum transmission unit (MTU) for a connected
MEI client.

This is useful for clients that need to transmit large payloads,
such as firmware blobs, allowing them to determine the maximum
message size that can be safely sent before starting transmission and
size of the buffer to allocate when receiving data.
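
A client could use the reported MTU for a size check along these lines
before starting a transfer. This is only a sketch: check_msg_size() and
its parameters are hypothetical, and the mtu argument stands for
whatever value the new helper returns.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Return 0 if a message made of a header and a payload fits in one
 * transfer of at most mtu bytes, -EMSGSIZE otherwise. The comparison is
 * written so that hdr_len + payload_len cannot overflow.
 */
static int check_msg_size(size_t mtu, size_t hdr_len, size_t payload_len)
{
	if (hdr_len > mtu || payload_len > mtu - hdr_len)
		return -EMSGSIZE;
	return 0;
}
```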

Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20250905154953.3974335-2-badal.nilawar@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2 months agodrm/amdgpu: add missing comment for the new argument
Sunil Khatri [Thu, 18 Sep 2025 04:03:51 +0000 (09:33 +0530)] 
drm/amdgpu: add missing comment for the new argument

In function 'amdgpu_vm_lock_done_list' update the comment
for the new argument 'vm'.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202509180211.UAqME0zj-lkp@intel.com/
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu: suspend KFD and KGD user queues for S0ix
Alex Deucher [Wed, 17 Sep 2025 16:42:11 +0000 (12:42 -0400)] 
drm/amdgpu: suspend KFD and KGD user queues for S0ix

We need to make sure the user queues are preempted so
GFX can enter gfxoff.

Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Tested-by: David Perry <david.perry@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu/userq: Optimize S0ix handling
Alex Deucher [Wed, 17 Sep 2025 16:42:10 +0000 (12:42 -0400)] 
drm/amdgpu/userq: Optimize S0ix handling

In S0i3, GFX state is retained, so it's preferable to
preempt queues rather than unmapping them as the overhead
is lower.

Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Tested-by: David Perry <david.perry@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu: Fix PRT flag for gfx12
Joe.Wang [Wed, 17 Sep 2025 06:58:49 +0000 (14:58 +0800)] 
drm/amdgpu: Fix PRT flag for gfx12

The AMDGPU_PTE_PRT_GFX12 flag was missed during the page table rework; add it back.

Fixes: 6716a823d18d ("drm/amdgpu: rework how PTE flags are generated v3")
Signed-off-by: Joe Wang <joe.wang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu: Check VF critical region before RAS poison injection
Xiang Liu [Tue, 19 Aug 2025 05:06:24 +0000 (13:06 +0800)] 
drm/amdgpu: Check VF critical region before RAS poison injection

Check VF critical region before RAS poison injection to ensure that the
poison injection will not hit the VF critical region.

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Shravan Kumar Gande <Shravankumar.Gande@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdkfd: add proper handling for S0ix
Alex Deucher [Wed, 17 Sep 2025 16:42:09 +0000 (12:42 -0400)] 
drm/amdkfd: add proper handling for S0ix

When in S0i3, the GFX state is retained, so all we need to do
is stop the runlist so GFX can enter gfxoff.

Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Tested-by: David Perry <david.perry@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu: Introduce VF critical region check for RAS poison injection
Xiang Liu [Tue, 19 Aug 2025 04:51:28 +0000 (12:51 +0800)] 
drm/amdgpu: Introduce VF critical region check for RAS poison injection

The SR-IOV guest sends a request to the host via the mailbox to check
whether the poison injection address is in the VF critical region.

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Shravan Kumar Gande <Shravankumar.Gande@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu: remove non-DC DCE 11 code
Alex Deucher [Wed, 20 Aug 2025 20:04:18 +0000 (16:04 -0400)] 
drm/amdgpu: remove non-DC DCE 11 code

DC has been the default for ~8 years now and supports
many things that the non-DC code does not (audio, DP MST, etc.).
No DCE 11.x IPs ever supported analog encoders so that is not
an issue.  Finally drop this code.

Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amd/pm: Enable npm metrics data
Asad Kamal [Mon, 15 Sep 2025 12:28:49 +0000 (20:28 +0800)] 
drm/amd/pm: Enable npm metrics data

Enable npm metrics data for smu_v13_0_12

v3: Add node id check for setting NPM_CAPS (Lijo)

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amd/pm: Fetch npm data from system metrics table
Asad Kamal [Fri, 29 Aug 2025 04:25:54 +0000 (12:25 +0800)] 
drm/amd/pm: Fetch npm data from system metrics table

Fetch npm data from system metrics table for smu_v13_0_12

v3: Remove intermittent type for npm data, remove node id check,
move npm caps check to npm_get_data function (Lijo)

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amd/pm: Add sysfs node for node power
Asad Kamal [Wed, 27 Aug 2025 13:22:13 +0000 (21:22 +0800)] 
drm/amd/pm: Add sysfs node for node power

Add sysfs node to expose node power limit for smu_v13_0_12

v2: Remove support check from visible function (Kevin)

v3: Update comments (Kevin)
    Remove sysfs remove file, change format specifier
    for sysfs_emit, use attribute_group.name (Lijo)

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amd/pm: Allow system metrics table in 1vf mode
Asad Kamal [Mon, 15 Sep 2025 09:53:19 +0000 (17:53 +0800)] 
drm/amd/pm: Allow system metrics table in 1vf mode

Allow fetching system metrics table in 1VF mode

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>