]> git.ipfire.org Git - thirdparty/kernel/linux.git/log
thirdparty/kernel/linux.git
2 months agodrm/xe/multi_queue: Handle CGP context error
Niranjana Vishwanathapura [Thu, 11 Dec 2025 01:02:59 +0000 (17:02 -0800)] 
drm/xe/multi_queue: Handle CGP context error

Trigger multi-queue context cleanup upon CGP context error
notification from GuC.

v4: Fix error message

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251211010249.1647839-30-niranjana.vishwanathapura@intel.com
2 months agodrm/xe/multi_queue: Set QUEUE_DRAIN_MODE for Multi Queue batches
Niranjana Vishwanathapura [Thu, 11 Dec 2025 01:02:58 +0000 (17:02 -0800)] 
drm/xe/multi_queue: Set QUEUE_DRAIN_MODE for Multi Queue batches

To properly support soft light restore between batches
being arbitrated at the CFEG, PIPE_CONTROL instructions
have a new bit in the first DW, QUEUE_DRAIN_MODE. When
set, this indicates to the CFEG that it should only
drain the current queue.

Additionally we no longer want to set the CS_STALL bit
for these multi queue queues as this causes the entire
pipeline to stall waiting for completion of the prior
batch, preventing this soft light restore from occurring
between queues in a queue group.

v4: Assert !multi_queue where applicable (Matt Roper)

Bspec: 56551
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patch.msgid.link/20251211010249.1647839-29-niranjana.vishwanathapura@intel.com
2 months agodrm/xe/multi_queue: Handle tearing down of a multi queue
Niranjana Vishwanathapura [Thu, 11 Dec 2025 01:02:57 +0000 (17:02 -0800)] 
drm/xe/multi_queue: Handle tearing down of a multi queue

As all queues of a multi queue group use the primary queue of the group
to interface with GuC. Hence there is a dependency between the queues of
the group. So, when primary queue of a multi queue group is cleaned up,
also trigger a cleanup of the secondary queues also. During cleanup, stop
and re-start submission for all queues of a multi queue group to avoid
any submission happening in parallel when a queue is being cleaned up.

v2: Initialize group->list_lock, add fs_reclaim dependency, remove
    unwanted secondary queues cleanup (Matt Brost)
v3: Properly handle cleanup of multi-queue group (Matt Brost)
v4: Fix IS_ENABLED(CONFIG_LOCKDEP) check (Matt Brost)
    Revert stopping/restarting of submissions on queues of the
    group in TDR as it is not needed.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251211010249.1647839-28-niranjana.vishwanathapura@intel.com
2 months agodrm/xe/multi_queue: Add multi queue information to guc_info dump
Niranjana Vishwanathapura [Thu, 11 Dec 2025 01:02:56 +0000 (17:02 -0800)] 
drm/xe/multi_queue: Add multi queue information to guc_info dump

Dump multi queue specific information in the guc exec queue
dump.

v2: Move multi queue related fields inside the multi_queue
    sub-structure (Matt Brost)

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251211010249.1647839-27-niranjana.vishwanathapura@intel.com
2 months agodrm/xe/multi_queue: Add support for multi queue dynamic priority change
Niranjana Vishwanathapura [Thu, 11 Dec 2025 01:02:55 +0000 (17:02 -0800)] 
drm/xe/multi_queue: Add support for multi queue dynamic priority change

Support dynamic priority change for multi queue group queues via
exec queue set_property ioctl. Issue CGP_SYNC command to GuC through
the drm scheduler message interface for priority to take effect.

v2: Move is_multi_queue check to exec_queue layer and assert
    is_multi_queue being set in guc submission layer (Matt Brost)
v3: Assert CGP_SYNC message length is valid (Matt Brost)

Signed-off-by: Pallavi Mishra <pallavi.mishra@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251211010249.1647839-26-niranjana.vishwanathapura@intel.com
2 months agodrm/xe/multi_queue: Add exec_queue set_property ioctl support
Niranjana Vishwanathapura [Thu, 11 Dec 2025 01:02:54 +0000 (17:02 -0800)] 
drm/xe/multi_queue: Add exec_queue set_property ioctl support

This patch adds support for exec_queue set_property ioctl.
It is derived from the original work which is part of
https://patchwork.freedesktop.org/series/112188/

Currently only DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY
property can be dynamically set.

v2: Check for and update kernel-doc which property this ioctl
    supports (Matt Brost)

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Pallavi Mishra <pallavi.mishra@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251211010249.1647839-25-niranjana.vishwanathapura@intel.com
2 months agodrm/xe/multi_queue: Handle invalid exec queue property setting
Niranjana Vishwanathapura [Thu, 11 Dec 2025 01:02:53 +0000 (17:02 -0800)] 
drm/xe/multi_queue: Handle invalid exec queue property setting

Only MULTI_QUEUE_PRIORITY property is valid for secondary queues of a
multi queue group. MULTI_QUEUE_PRIORITY only applies to multi queue group
queues. Detect invalid user queue property setting and return error.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251211010249.1647839-24-niranjana.vishwanathapura@intel.com
2 months agodrm/xe/multi_queue: Add multi queue priority property
Niranjana Vishwanathapura [Thu, 11 Dec 2025 01:02:52 +0000 (17:02 -0800)] 
drm/xe/multi_queue: Add multi queue priority property

Add support for queues of a multi queue group to set
their priority within the queue group by adding property
DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY.
This is the only other property supported by secondary
queues of a multi queue group, other than
DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE.

v2: Add kernel doc for enum xe_multi_queue_priority,
    Add assert for priority values, fix includes and
    declarations (Matt Brost)
v3: update uapi kernel-doc (Matt Brost)
v4: uapi change due to rebase

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251211010249.1647839-23-niranjana.vishwanathapura@intel.com
2 months agodrm/xe/multi_queue: Add GuC interface for multi queue support
Niranjana Vishwanathapura [Thu, 11 Dec 2025 01:02:51 +0000 (17:02 -0800)] 
drm/xe/multi_queue: Add GuC interface for multi queue support

Implement GuC commands and response along with the Context
Group Page (CGP) interface for multi queue support.

Ensure that only primary queue (q0) of a multi queue group
communicate with GuC. The secondary queues of the group only
need to maintain LRCA and interface with drm scheduler.

Use primary queue's submit_wq for all secondary queues of a multi
queue group. This serialization avoids any locking around CGP
synchronization with GuC.

v2: Fix G2H_LEN_DW_MULTI_QUEUE_CONTEXT value, add more comments
    (Matt Brost)
v3: Minor code refactro, use xe_gt_assert
v4: Use xe_guc_ct_wake_waiters(), remove vf recovery support
    (Matt Brost)

Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251211010249.1647839-22-niranjana.vishwanathapura@intel.com
2 months agodrm/xe/multi_queue: Add user interface for multi queue support
Niranjana Vishwanathapura [Thu, 11 Dec 2025 01:02:50 +0000 (17:02 -0800)] 
drm/xe/multi_queue: Add user interface for multi queue support

Multi Queue is a new mode of execution supported by the compute and
blitter copy command streamers (CCS and BCS, respectively). It is an
enhancement of the existing hardware architecture and leverages the
same submission model. It enables support for efficient, parallel
execution of multiple queues within a single context. All the queues
of a group must use the same address space (VM).

The new DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE execution queue
property supports creating a multi queue group and adding queues to
a queue group. All queues of a multi queue group share the same
context.

A exec queue create ioctl call with above property specified with value
DRM_XE_SUPER_GROUP_CREATE will create a new multi queue group with the
queue being created as the primary queue (aka q0) of the group. To add
secondary queues to the group, they need to be created with the above
property with id of the primary queue as the value. The properties of
the primary queue (like priority, timeslice) applies to the whole group.
So, these properties can't be set for secondary queues of a group.

Once destroyed, the secondary queues of a multi queue group can't be
replaced. However, they can be dynamically added to the group up to a
total of 64 queues per group. Once the primary queue is destroyed,
secondary queues can't be added to the queue group.

v2: Remove group->lock, fix xe_exec_queue_group_add()/delete()
    function semantics, add additional comments, remove unused
    group->list_lock, add XE_BO_FLAG_GGTT_INVALIDATE for cgp bo,
    Assert LRC is valid, update uapi kernel doc.
    (Matt Brost)
v3: Use XE_BO_FLAG_PINNED_LATE_RESTORE/USER_VRAM/GGTT_INVALIDATE
    flags for cgp bo (Matt)
v4: Ensure queue is not a vm_bind queue
    uapi change due to rebase

Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251211010249.1647839-21-niranjana.vishwanathapura@intel.com
2 months agodrm/xe/multi_queue: Add multi_queue_enable_mask to gt information
Niranjana Vishwanathapura [Thu, 11 Dec 2025 01:02:49 +0000 (17:02 -0800)] 
drm/xe/multi_queue: Add multi_queue_enable_mask to gt information

Add multi_queue_enable_mask field to the gt information structure
which is bitmask of all engine classes with multi queue support
enabled.

v2: Rename multi_queue_enable_mask to multi_queue_engine_class_mask
    (Matt Brost)

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251211010249.1647839-20-niranjana.vishwanathapura@intel.com
2 months agodrm/xe/vf: Reset recovery_queued after issuing RESFIX_START
Satyanarayana K V P [Wed, 10 Dec 2025 05:25:49 +0000 (05:25 +0000)] 
drm/xe/vf: Reset recovery_queued after issuing RESFIX_START

During VF_RESTORE or VF_RESUME, the GuC sends a migration interrupt and
clears the RESFIX_START marker. If migration or resume occurs before the
VF issues its own RESFIX_START, VF KMD may receive two back-to-back
migration interrupts. VF then sends RESFIX_START to indicate the beginning
of fixups and RESFIX_DONE to mark completion. However, the second
RESFIX_START fails because the GuC is already in the RUNNING state.

Clear the recovery_queued flag after sending a RESFIX_START message to
ignore duplicated IRQs seen before we start actual recovery.

This ensures the state is reset only after the fixup process begins,
avoiding redundant work item queuing.

Fixes: b5fbb94341a2 ("drm/xe/vf: Introduce RESFIX start marker support")
Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Tomasz Lis <tomasz.lis@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251210052546.622809-6-satyanarayana.k.v.p@intel.com
2 months agodrm/xe/vf: Fix queuing of recovery work
Satyanarayana K V P [Wed, 10 Dec 2025 05:25:48 +0000 (05:25 +0000)] 
drm/xe/vf: Fix queuing of recovery work

Ensure VF migration recovery work is only queued when no recovery is
already queued and teardown is not in progress.

Fixes: b47c0c07c350 ("drm/xe/vf: Teardown VF post migration worker on driver unload")
Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Tomasz Lis <tomasz.lis@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251210052546.622809-5-satyanarayana.k.v.p@intel.com
2 months agodrm/xe/bo: Don't include the CCS metadata in the dma-buf sg-table
Thomas Hellström [Tue, 9 Dec 2025 20:49:20 +0000 (21:49 +0100)] 
drm/xe/bo: Don't include the CCS metadata in the dma-buf sg-table

Some Xe bos are allocated with extra backing-store for the CCS
metadata. It's never been the intention to share the CCS metadata
when exporting such bos as dma-buf. Don't include it in the
dma-buf sg-table.

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Link: https://patch.msgid.link/20251209204920.224374-1-thomas.hellstrom@linux.intel.com
2 months agodrm/xe/hw_engine_group: Add stats for mode switching
Francois Dugast [Wed, 10 Dec 2025 16:50:00 +0000 (17:50 +0100)] 
drm/xe/hw_engine_group: Add stats for mode switching

The GT stats interface is extended to include counters of how many
queues are either interrupted or waited on in the hardware engine
groups. This can help application debugging.

v2: Rename to queue as those operations are queue-based (Matthew Brost)

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20251210165000.60789-1-francois.dugast@intel.com
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
2 months agodrm/me/gsc: mei interrupt top half should be in irq disabled context
Junxiao Chang [Fri, 7 Nov 2025 03:31:52 +0000 (11:31 +0800)] 
drm/me/gsc: mei interrupt top half should be in irq disabled context

MEI GSC interrupt comes from i915 or xe driver. It has top half and
bottom half. Top half is called from i915/xe interrupt handler. It
should be in irq disabled context.

With RT kernel(PREEMPT_RT enabled), by default IRQ handler is in
threaded IRQ. MEI GSC top half might be in threaded IRQ context.
generic_handle_irq_safe API could be called from either IRQ or
process context, it disables local IRQ then calls MEI GSC interrupt
top half.

This change fixes B580 GPU boot issue with RT enabled.

Fixes: e02cea83d32d ("drm/xe/gsc: add Battlemage support")
Tested-by: Baoli Zhang <baoli.zhang@intel.com>
Signed-off-by: Junxiao Chang <junxiao.chang@intel.com>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251107033152.834960-1-junxiao.chang@intel.com
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
2 months agodrm/xe/guc: Recommend GuC v70.54.0 for BMG, PTL
Julia Filipchuk [Tue, 25 Nov 2025 01:41:40 +0000 (17:41 -0800)] 
drm/xe/guc: Recommend GuC v70.54.0 for BMG, PTL

UAPI compatibility version 1.27.0

Update recommended GuC version for BMG, PTL.

Signed-off-by: Julia Filipchuk <julia.filipchuk@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://patch.msgid.link/20251125014134.2075988-15-julia.filipchuk@intel.com
2 months agodrm/xe/guc: Recommend GuC v70.53.0 for MTL, DG2, LNL
Julia Filipchuk [Tue, 25 Nov 2025 01:41:39 +0000 (17:41 -0800)] 
drm/xe/guc: Recommend GuC v70.53.0 for MTL, DG2, LNL

UAPI compatibility version 1.26.0

Update recommended GuC version for MTL, DG2, LNL.

Signed-off-by: Julia Filipchuk <julia.filipchuk@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://patch.msgid.link/20251125014134.2075988-14-julia.filipchuk@intel.com
2 months agodrm/xe/xe_survivability: Add support for survivability mode v2
Riana Tauro [Mon, 8 Dec 2025 08:45:42 +0000 (14:15 +0530)] 
drm/xe/xe_survivability: Add support for survivability mode v2

v2 survivability breadcrumbs introduces a new mode called
SPI Flash Descriptor Override mode (FDO). This is enabled by
PCODE when MEI itself fails and firmware cannot be updated via
MEI using igsc. This mode provides the ability to update
the firmware directly via SPI driver.

Xe KMD initializes the nvm aux driver if FDO mode is enabled.

Userspace should check FDO mode entry in survivability info sysfs before
using the SPI driver to update firmware.

/sys/bus/pci/devices/<device>/survivability_info/fdo_mode

v2 also supports survivability mode for critical boot errors.

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/20251208084539.3652902-6-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2 months agodrm/xe/xe_survivability: Redesign survivability mode
Riana Tauro [Mon, 8 Dec 2025 08:45:41 +0000 (14:15 +0530)] 
drm/xe/xe_survivability: Redesign survivability mode

Redesign survivability mode to have only one value per file.

1) Retain the survivability_mode sysfs to indicate the type

cat /sys/bus/pci/devices/0000\:03\:00.0/survivability_mode
(Boot / Runtime)

2) Add survivability_info directory to expose boot breadcrumbs.
Entries in survivability mode sysfs are only visible when
boot breadcrumb registers are populated.

/sys/bus/pci/devices/0000:03:00.0/survivability_info
├── aux_info0
├── aux_info1
├── aux_info2
├── aux_info3
├── aux_info4
├── capability_info
├── postcode_trace
└── postcode_trace_overflow

Capability Info:

Provides data about boot status and has bits that
indicate the support for the other breadcrumbs

Postcode Trace / Postcode Trace Overflow :

Each postcode is represented as an 8-bit value and represents
a boot failure event. When a new failure event is logged by Pcode
the existing postcodes are shifted left. These entries provide a
history of 8 postcodes.

Auxiliary Info:

Some failures have additional debug information.

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/20251208084539.3652902-5-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2 months agodrm/xe/vf: Stop waiting for ring space on VF post migration recovery
Tomasz Lis [Thu, 4 Dec 2025 20:08:20 +0000 (21:08 +0100)] 
drm/xe/vf: Stop waiting for ring space on VF post migration recovery

If wait for ring space started just before migration, it can delay
the recovery process, by waiting without bailout path for up to 2
seconds.

Two second wait for recovery is not acceptable, and if the ring was
completely filled even without the migration temporarily stopping
execution, then such a wait will result in up to a thousand new jobs
(assuming constant flow) being added while the wait is happening.

While this will not cause data corruption, it will lead to warning
messages getting logged due to reset being scheduled on a GT under
recovery. Also several seconds of unresponsiveness, as the backlog
of jobs gets progressively executed.

Add a bailout condition, to make sure the recovery starts without
much delay. The recovery is expected to finish in about 100 ms when
under moderate stress, so the condition verification period needs to be
below that - settling at 64 ms.

The theoretical max time which the recovery can take depends on how
many requests can be emitted to engine rings and be pending execution.
While stress testing, it was possible to reach 10k pending requests
on rings when a platform with two GTs was used. This resulted in max
recovery time of 5 seconds. But in real life situations, it is very
unlikely that the amount of pending requests will ever exceed 100,
and for that the recovery time will be around 50 ms - well within our
claimed limit of 100ms.

Fixes: a4dae94aad6a ("drm/xe/vf: Wakeup in GuC backend on VF post migration recovery")
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251204200820.2206168-1-tomasz.lis@intel.com
2 months agodrm/xe/xe3p_xpc: Enable Indirect Ring State for xe3p_xpc
Niranjana Vishwanathapura [Thu, 4 Dec 2025 06:34:52 +0000 (22:34 -0800)] 
drm/xe/xe3p_xpc: Enable Indirect Ring State for xe3p_xpc

The xe3p_xpc platform supports Indirect Ring State and
it is required for the upcoming multi-queue feature.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patch.msgid.link/20251204063451.1180387-2-niranjana.vishwanathapura@intel.com
2 months agodrm/xe/throttle: Skip reason prefix while emitting array
Raag Jadav [Wed, 3 Dec 2025 12:33:55 +0000 (18:03 +0530)] 
drm/xe/throttle: Skip reason prefix while emitting array

The newly introduced "reasons" attribute already signifies possible
reasons for throttling and makes the prefix in individual attribute
names redundant while emitting them as an array. Skip the prefix.

Fixes: 83ccde67a3f7 ("drm/xe/gt_throttle: Avoid TOCTOU when monitoring reasons")
Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Sk Anirban <sk.anirban@intel.com>
Link: https://patch.msgid.link/20251203123355.571606-1-raag.jadav@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2 months agodrm/xe/guc_ct: Assert on credits mismatch during runtime suspend
Raag Jadav [Tue, 2 Dec 2025 08:33:34 +0000 (14:03 +0530)] 
drm/xe/guc_ct: Assert on credits mismatch during runtime suspend

G2H credits should be in fully idle state when runtime suspending GuC CT.
Assert on mismatch.

Suggested-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patch.msgid.link/20251202083334.554045-1-raag.jadav@intel.com
2 months agodrm/xe: expose PAT software config to debugfs
Xin Wang [Fri, 5 Dec 2025 07:06:33 +0000 (07:06 +0000)] 
drm/xe: expose PAT software config to debugfs

The existing "pat" debugfs node dumps the live PAT registers. Under
SR-IOV the VF cannot touch those registers, so the file vanishes and
users lose all PAT visibility. Add a VF-safe "pat_sw_config" entry to
the VF-safe debugfs list. It prints the cached PAT table the driver
programmed, rather than poking HW, so PF and VF instances present the
same view.

This lets IGT and other tools query the PAT configuration without
carrying platform-specific tables or mirroring kernel logic.

v2: (Jonathan)
- Only append "(* = reserved entry)" to the PAT table header on Xe2+
  platforms where it actually applies.
- Deduplicate the PTA/ATS mode printing by introducing the small
  drm_printf_pat_mode() helper macro.

v3: (Matt)
- Print IDX[XE_CACHE_NONE_COMPRESSION] on every Xe2+ platform so the
  dump always reflects the value the driver might use (even if it defaults
  to 0) and future IP revisions don’t need extra condition tweaks.

v4:
- Drop the drm_printf_pat_mode macro and introduce a real helper
  xe2_pat_entry_dump(). (Jani)
- Reuse the helper across all PTA/ATS/PAT dumps for xe2+ entries to keep
  output format identical.

v5: (Matt)
- Split the original patch into two: one for refactoring helpers, one for
  the new debugfs entry.

CC: Jani Nikula <jani.nikula@intel.com>
Suggested-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Xin Wang <x.wang@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patch.msgid.link/20251205070633.28072-1-x.wang@intel.com
2 months agodrm/xe: Refactor PAT dump to use shared helpers
Xin Wang [Fri, 5 Dec 2025 07:02:19 +0000 (07:02 +0000)] 
drm/xe: Refactor PAT dump to use shared helpers

Move the PAT entry formatting into shared helper functions to ensure
consistency and enable code reuse.

This preparation is necessary for a follow-up patch that introduces a
software-based PAT dump, which is required for debugging on VFs where
hardware access is limited.

V2: (Matt)
- Xe3p XPC doesn’t define COMP_EN; omit it to match bspec and avoid
confusion.

Suggested-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Xin Wang <x.wang@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patch.msgid.link/20251205070220.27859-1-x.wang@intel.com
2 months agodrm/xe/guc: Add new debugfs entry for lfd format output
Zhanjun Dong [Thu, 27 Nov 2025 17:07:59 +0000 (12:07 -0500)] 
drm/xe/guc: Add new debugfs entry for lfd format output

Add new debugfs entry "guc_log_lfd", prepared for output guc log
in LFD(Log Format Descriptors) format.

Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patch.msgid.link/20251127170759.2620994-7-zhanjun.dong@intel.com
2 months agodrm/xe/guc: Only add GuC crash dump if available
Zhanjun Dong [Thu, 27 Nov 2025 17:07:58 +0000 (12:07 -0500)] 
drm/xe/guc: Only add GuC crash dump if available

Add GuC crash dump data empty check. LFD will only include crash dump
section when data is not empty.

Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patch.msgid.link/20251127170759.2620994-6-zhanjun.dong@intel.com
2 months agodrm/xe/guc: Add GuC log event buffer output in LFD format
Zhanjun Dong [Thu, 27 Nov 2025 17:07:57 +0000 (12:07 -0500)] 
drm/xe/guc: Add GuC log event buffer output in LFD format

Add GuC log event buffer output in LFD format.

Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patch.msgid.link/20251127170759.2620994-5-zhanjun.dong@intel.com
2 months agodrm/xe/guc: Add GuC log init config in LFD format
Zhanjun Dong [Thu, 27 Nov 2025 17:07:56 +0000 (12:07 -0500)] 
drm/xe/guc: Add GuC log init config in LFD format

Add support to output GuC log init config (LIC) in LFD format.

Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patch.msgid.link/20251127170759.2620994-4-zhanjun.dong@intel.com
2 months agodrm/xe/guc: Add LFD related abi definitions
Zhanjun Dong [Thu, 27 Nov 2025 17:07:55 +0000 (12:07 -0500)] 
drm/xe/guc: Add LFD related abi definitions

Add GuC LFD (Log Format Descriptors) related ABI definitions.

Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patch.msgid.link/20251127170759.2620994-3-zhanjun.dong@intel.com
2 months agodrm/xe/guc: Add log init config abi definitions
Zhanjun Dong [Thu, 27 Nov 2025 17:07:54 +0000 (12:07 -0500)] 
drm/xe/guc: Add log init config abi definitions

Add GuC log init config (LIC) ABI definitions.

Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patch.msgid.link/20251127170759.2620994-2-zhanjun.dong@intel.com
2 months agodrm/xe/rtp: Whitelist OAM MMIO trigger registers
Ashutosh Dixit [Tue, 2 Dec 2025 02:51:15 +0000 (18:51 -0800)] 
drm/xe/rtp: Whitelist OAM MMIO trigger registers

Whitelist OAM registers to enable userspace to execute MMIO triggers on OAM
units.

Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patch.msgid.link/20251202025115.373546-6-ashutosh.dixit@intel.com
2 months agodrm/xe/rtp: Refactor OAG MMIO trigger register whitelisting
Ashutosh Dixit [Tue, 2 Dec 2025 02:51:14 +0000 (18:51 -0800)] 
drm/xe/rtp: Refactor OAG MMIO trigger register whitelisting

Minor refactor of OAG MMIO trigger register whitelisting for code reuse
with OAM MMIO trigger register whitelisting.

Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patch.msgid.link/20251202025115.373546-5-ashutosh.dixit@intel.com
2 months agodrm/xe/oa: Allow exec_queue's to be specified only for OAG OA unit
Ashutosh Dixit [Tue, 2 Dec 2025 02:51:13 +0000 (18:51 -0800)] 
drm/xe/oa: Allow exec_queue's to be specified only for OAG OA unit

Exec_queue's are only used for OAR/OAC functionality for OAG unit. Make
this requirement explicit, which avoids complications in the code for
other (non-OAG) OA units.

Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patch.msgid.link/20251202025115.373546-4-ashutosh.dixit@intel.com
2 months agodrm/xe/oa/uapi: Add gt_id to struct drm_xe_oa_unit
Ashutosh Dixit [Tue, 2 Dec 2025 02:51:12 +0000 (18:51 -0800)] 
drm/xe/oa/uapi: Add gt_id to struct drm_xe_oa_unit

gt_id was previously omitted from 'struct drm_xe_oa_unit' because it could
be determine from hwe's attached to the OA unit. However, we now have OA
units which don't have any hwe's attached to them. Hence add gt_id to
'struct drm_xe_oa_unit' in order to provide this needed information to
userspace.

Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patch.msgid.link/20251202025115.373546-3-ashutosh.dixit@intel.com
2 months agodrm/xe/oa: Use explicit struct initialization for struct xe_oa_regs
Ashutosh Dixit [Tue, 2 Dec 2025 02:51:11 +0000 (18:51 -0800)] 
drm/xe/oa: Use explicit struct initialization for struct xe_oa_regs

Use explicit struct initialization for struct xe_oa_regs to reduce chance
of error. Also add .oa_mmio_trg field to struct for completeness.

Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patch.msgid.link/20251202025115.373546-2-ashutosh.dixit@intel.com
2 months agodrm/xe: fix drm_gpusvm_init() arguments
Arnd Bergmann [Thu, 4 Dec 2025 09:46:58 +0000 (10:46 +0100)] 
drm/xe: fix drm_gpusvm_init() arguments

The Xe driver fails to build when CONFIG_DRM_XE_GPUSVM is disabled
but CONFIG_DRM_GPUSVM is turned on, due to the clash of two commits:

In file included from drivers/gpu/drm/xe/xe_vm_madvise.c:8:
drivers/gpu/drm/xe/xe_svm.h: In function 'xe_svm_init':
include/linux/stddef.h:8:14: error: passing argument 5 of 'drm_gpusvm_init' makes integer from pointer without a cast [-Wint-conversion]
drivers/gpu/drm/xe/xe_svm.h:217:38: note: in expansion of macro 'NULL'
  217 |                                NULL, NULL, 0, 0, 0, NULL, NULL, 0);
      |                                      ^~~~
In file included from drivers/gpu/drm/xe/xe_bo_types.h:11,
                 from drivers/gpu/drm/xe/xe_bo.h:11,
                 from drivers/gpu/drm/xe/xe_vm_madvise.c:11:
include/drm/drm_gpusvm.h:254:35: note: expected 'long unsigned int' but argument is of type 'void *'
  254 |                     unsigned long mm_start, unsigned long mm_range,
      |                     ~~~~~~~~~~~~~~^~~~~~~~
In file included from drivers/gpu/drm/xe/xe_vm_madvise.c:14:
drivers/gpu/drm/xe/xe_svm.h:216:16: error: too many arguments to function 'drm_gpusvm_init'; expected 10, have 11
  216 |         return drm_gpusvm_init(&vm->svm.gpusvm, "Xe SVM (simple)", &vm->xe->drm,
      |                ^~~~~~~~~~~~~~~
  217 |                                NULL, NULL, 0, 0, 0, NULL, NULL, 0);
      |                                                                 ~
include/drm/drm_gpusvm.h:251:5: note: declared here

Adapt the caller to the new argument list by removing the extraneous
NULL argument.

Fixes: 9e9787414882 ("drm/xe/userptr: replace xe_hmm with gpusvm")
Fixes: 10aa5c806030 ("drm/gpusvm, drm/xe: Fix userptr to not allow device private pages")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/20251204094704.1030933-1-arnd@kernel.org
2 months agodrm/xe/pf: fix VFIO link error
Arnd Bergmann [Thu, 4 Dec 2025 09:41:36 +0000 (10:41 +0100)] 
drm/xe/pf: fix VFIO link error

The Makefile logic for building xe_sriov_vfio.o was added incorrectly,
as setting CONFIG_XE_VFIO_PCI=m means it doesn't get included into a
built-in xe driver:

ERROR: modpost: "xe_sriov_vfio_stop_copy_enter" [drivers/vfio/pci/xe/xe-vfio-pci.ko] undefined!
ERROR: modpost: "xe_sriov_vfio_stop_copy_exit" [drivers/vfio/pci/xe/xe-vfio-pci.ko] undefined!
ERROR: modpost: "xe_sriov_vfio_suspend_device" [drivers/vfio/pci/xe/xe-vfio-pci.ko] undefined!
ERROR: modpost: "xe_sriov_vfio_wait_flr_done" [drivers/vfio/pci/xe/xe-vfio-pci.ko] undefined!
ERROR: modpost: "xe_sriov_vfio_error" [drivers/vfio/pci/xe/xe-vfio-pci.ko] undefined!
ERROR: modpost: "xe_sriov_vfio_resume_data_enter" [drivers/vfio/pci/xe/xe-vfio-pci.ko] undefined!
ERROR: modpost: "xe_sriov_vfio_resume_device" [drivers/vfio/pci/xe/xe-vfio-pci.ko] undefined!
ERROR: modpost: "xe_sriov_vfio_resume_data_exit" [drivers/vfio/pci/xe/xe-vfio-pci.ko] undefined!
ERROR: modpost: "xe_sriov_vfio_data_write" [drivers/vfio/pci/xe/xe-vfio-pci.ko] undefined!
ERROR: modpost: "xe_sriov_vfio_migration_supported" [drivers/vfio/pci/xe/xe-vfio-pci.ko] undefined!
WARNING: modpost: suppressed 3 unresolved symbol warnings because there were too many)

Check for CONFIG_XE_VFIO_PCI being enabled in the Makefile to decide whether to
include the object instead.

Fixes: 17f22465c5a5 ("drm/xe/pf: Export helpers for VFIO")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/20251204094154.1029357-1-arnd@kernel.org
2 months agodrm/xe/uapi: Add NO_COMPRESSION BO flag and query capability
Sanjay Yadav [Thu, 4 Dec 2025 04:04:03 +0000 (09:34 +0530)] 
drm/xe/uapi: Add NO_COMPRESSION BO flag and query capability

Introduce DRM_XE_GEM_CREATE_FLAG_NO_COMPRESSION to let userspace
opt out of CCS compression on a per-BO basis. When set, the driver
maps this to XE_BO_FLAG_NO_COMPRESSION, skips CCS metadata
allocation/clearing, and rejects compressed PAT indices at vm_bind.
This avoids extra memory ops and manual CCS state handling for buffers.

To allow userspace to detect at runtime whether the kernel supports this
feature, add DRM_XE_QUERY_CONFIG_FLAG_HAS_NO_COMPRESSION_HINT and expose
it via query_config() on Xe2+ platforms.

Mesa PR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38425
IGT PR: https://patchwork.freedesktop.org/patch/685180/

v2
- Changed error code from -EINVAL to -EOPNOTSUPP for unsupported flag
  usage on pre-Xe2 platforms
- Fixed checkpatch warning in xe_vm.c
- Fixed kernel-doc formatting in xe_drm.h

v3
- Rebase
- Updated commit title and description
- Added UAPI for DRM_XE_QUERY_CONFIG_FLAG_HAS_NO_COMPRESSION_HINT and
  exposed it via query_config()

v4
- Rebase

v5
- Included Mesa PR and IGT PR in the commit description
- Used xe_pat_index_get_comp_en() to extract the compression

v6
- Added XE_IOCTL_DBG() checks for argument validation

Suggested-by: Matthew Auld <matthew.auld@intel.com>
Suggested-by: José Roberto de Souza <jose.souza@intel.com>
Acked-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Sanjay Yadav <sanjay.kumar.yadav@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patch.msgid.link/20251204040402.2692921-2-sanjay.kumar.yadav@intel.com
2 months agodrm/xe/xe3: Remove graphics IP 30.01 from Wa_18041344222 IP list
Harish Chegondi [Fri, 21 Nov 2025 21:59:17 +0000 (13:59 -0800)] 
drm/xe/xe3: Remove graphics IP 30.01 from Wa_18041344222 IP list

As per the updated WA database, Wa_18041344222 no longer applies to
graphics IP version 30.01. Remove it.

Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Gustavo Sousa <gustavo.sousa@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patch.msgid.link/280ab3e8dce8d7a40540ae634a5432694ac17ab0.1763762330.git.harish.chegondi@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months agodrm/xe: Do not reference loop variable directly
Matthew Brost [Wed, 3 Dec 2025 01:18:09 +0000 (17:18 -0800)] 
drm/xe: Do not reference loop variable directly

Do not reference the loop variable job after the loop has exited.
Instead, save the job from the last iteration of the loop.

Fixes: 00937fe1921a ("drm/xe/vf: Start re-emission from first unsignaled job during VF migration")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/r/202511291102.jnnKP6IB-lkp@intel.com/
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com>
Link: https://patch.msgid.link/20251203011809.968893-1-matthew.brost@intel.com
2 months agodrm/xe/sync: Use for_each_tlb_inval() to calculate invalidation fences
Matt Roper [Tue, 2 Dec 2025 22:25:52 +0000 (14:25 -0800)] 
drm/xe/sync: Use for_each_tlb_inval() to calculate invalidation fences

xe_sync_in_fence_get() uses the same kind of mismatched fence array
allocation vs looping logic that was previously noted and changed by
commit 0a4c2ddc711a ("drm/xe/vm: Use for_each_tlb_inval() to calculate
invalidation fences").  As with that commit, the mismatch doesn't cause
any problem at the moment since for_each_tlb_inval() loops the same
number of times as XE_MAX_GT_PER_TILE (2).  However we don't want to
assume that these will always be the same in the future, so switch to
using for_each_tlb_inval() in both places to future-proof the code.

Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251202222551.1858930-2-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months agoMerge drm/drm-next into drm-xe-next
Thomas Hellström [Wed, 3 Dec 2025 10:22:18 +0000 (11:22 +0100)] 
Merge drm/drm-next into drm-xe-next

Backmerging to bring in a needed dependency for the Xe VFIO
driver variant. This should ideally have been done before we
commited that, so we now have a small window in drm-xe-next
where that driver doesn't compile.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202512030331.I8CveRre-lkp@intel.com/
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2 months agoMerge tag 'amd-drm-next-6.19-2025-12-02' of https://gitlab.freedesktop.org/agd5f...
Dave Airlie [Tue, 2 Dec 2025 23:43:09 +0000 (09:43 +1000)] 
Merge tag 'amd-drm-next-6.19-2025-12-02' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.19-2025-12-02:

amdgpu:
- Unified MES fix
- SMU 11 unbalanced irq fix
- Fix for driver reloading on APUs
- pp_table sysfs fix
- Fix memory leak in fence handling
- HDMI fix
- DC cursor fixes
- eDP panel parsing fix
- Brightness fix
- DC analog fixes
- EDID retry fixes
- UserQ fixes
- RAS fixes
- IP discovery fix
- Add missing locking in amdgpu_ttm_access_memory_sdma()
- Smart Power OLED fix
- PRT and page fault fixes for GC 6-8
- VMID reservation fix
- ACP platform device fix
- Add missing vm fault handling for GC 11-12
- VPE fix

amdkfd:
- Partitioning fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patch.msgid.link/20251202220101.2039347-1-alexander.deucher@amd.com
2 months agoRevert "drm/amd: Skip power ungate during suspend for VPE"
Mario Limonciello (AMD) [Sun, 30 Nov 2025 01:46:31 +0000 (19:46 -0600)] 
Revert "drm/amd: Skip power ungate during suspend for VPE"

Skipping power ungate exposed some scenarios that will fail
like below:

```
amdgpu: Register(0) [regVPEC_QUEUE_RESET_REQ] failed to reach value 0x00000000 != 0x00000001n
amdgpu 0000:c1:00.0: amdgpu: VPE queue reset failed
...
amdgpu: [drm] *ERROR* wait_for_completion_timeout timeout!
```

The underlying s2idle issue that prompted this commit is going to
be fixed in BIOS.
This reverts commit 2a6c826cfeedd7714611ac115371a959ead55bda.

Fixes: 2a6c826cfeed ("drm/amd: Skip power ungate during suspend for VPE")
Cc: stable@vger.kernel.org
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reported-by: Konstantin <answer2019@yandex.ru>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220812
Reported-by: Matthew Schwartz <matthew.schwartz@linux.dev>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu: use common defines for HUB faults
Alex Deucher [Tue, 18 Nov 2025 21:56:54 +0000 (16:56 -0500)] 
drm/amdgpu: use common defines for HUB faults

Use common definitions for the fault bits in the IH sourc
data for the gmc9-12 memory hub faults

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu/gmc12: add amdgpu_vm_handle_fault() handling
Alex Deucher [Thu, 13 Nov 2025 20:57:43 +0000 (15:57 -0500)] 
drm/amdgpu/gmc12: add amdgpu_vm_handle_fault() handling

We need to call amdgpu_vm_handle_fault() on page fault
on all gfx9 and newer parts to properly update the
page tables, not just for recoverable page faults.

Cc: stable@vger.kernel.org
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu/gmc11: add amdgpu_vm_handle_fault() handling
Alex Deucher [Thu, 13 Nov 2025 20:55:19 +0000 (15:55 -0500)] 
drm/amdgpu/gmc11: add amdgpu_vm_handle_fault() handling

We need to call amdgpu_vm_handle_fault() on page fault
on all gfx9 and newer parts to properly update the
page tables, not just for recoverable page faults.

Cc: stable@vger.kernel.org
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu: use static ids for ACP platform devs
Brady Norander [Tue, 25 Mar 2025 21:05:17 +0000 (17:05 -0400)] 
drm/amdgpu: use static ids for ACP platform devs

mfd_add_hotplug_devices() assigns child platform devices with
PLATFORM_DEVID_AUTO, but the ACP machine drivers expect the platform
device names to never change. Use mfd_add_devices() instead and give
each cell a unique id.

Signed-off-by: Brady Norander <bradynorander@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu/sdma6: Update SDMA 6.0.3 FW version to include UMQ protected-fence fix
Srinivasan Shanmugam [Tue, 25 Nov 2025 15:50:45 +0000 (21:20 +0530)] 
drm/amdgpu/sdma6: Update SDMA 6.0.3 FW version to include UMQ protected-fence fix

On GFX11.0.3, earlier SDMA firmware versions issue the
PROTECTED_FENCE write from the user VMID (e.g. VMID 8) instead of
VMID 0. This causes a GPU VM protection fault when SDMA tries to
write the secure fence location, as seen in the UMQ SDMA test
(cs-sdma-with-IP-DMA-UMQ)

Fixes the below GPU page fault:
[  514.037189] amdgpu 0000:0b:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:8 pasid:32770)
[  514.037199] amdgpu 0000:0b:00.0: amdgpu:  Process  pid 0 thread  pid 0
[  514.037205] amdgpu 0000:0b:00.0: amdgpu:   in page starting at address 0x00007fff00409000 from client 10
[  514.037212] amdgpu 0000:0b:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00841A51
[  514.037217] amdgpu 0000:0b:00.0: amdgpu:      Faulty UTCL2 client ID: SDMA0 (0xd)
[  514.037223] amdgpu 0000:0b:00.0: amdgpu:      MORE_FAULTS: 0x1
[  514.037227] amdgpu 0000:0b:00.0: amdgpu:      WALKER_ERROR: 0x0
[  514.037232] amdgpu 0000:0b:00.0: amdgpu:      PERMISSION_FAULTS: 0x5
[  514.037236] amdgpu 0000:0b:00.0: amdgpu:      MAPPING_ERROR: 0x0
[  514.037241] amdgpu 0000:0b:00.0: amdgpu:      RW: 0x1

v2: Updated commit message
v3: s/gfx11.0.3/sdma 6.0.3/ in patch title (Alex)

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu: Forward VMID reservation errors
Natalie Vock [Mon, 1 Dec 2025 17:52:38 +0000 (12:52 -0500)] 
drm/amdgpu: Forward VMID reservation errors

Otherwise userspace may be fooled into believing it has a reserved VMID
when in reality it doesn't, ultimately leading to GPU hangs when SPM is
used.

Fixes: 80e709ee6ecc ("drm/amdgpu: add option params to enforce process isolation between graphics and compute")
Cc: stable@vger.kernel.org
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu/gmc8: Delegate VM faults to soft IRQ handler ring
Timur Kristóf [Wed, 26 Nov 2025 13:29:52 +0000 (14:29 +0100)] 
drm/amdgpu/gmc8: Delegate VM faults to soft IRQ handler ring

On old GPUs, it may be an issue that handling the interrupts from
VM faults is too slow and the interrupt handler (IH) ring may
overflow, which can cause an eventual hang.

Delegate the processing of all VM faults to the soft
IRQ handler ring.

As a result, we spend much less time in the IRQ handler that
interacts with the HW IH ring, which significantly reduces the
chance of hangs/reboots.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu/gmc7: Delegate VM faults to soft IRQ handler ring
Timur Kristóf [Wed, 26 Nov 2025 13:29:51 +0000 (14:29 +0100)] 
drm/amdgpu/gmc7: Delegate VM faults to soft IRQ handler ring

On old GPUs, it may be an issue that handling the interrupts from
VM faults is too slow and the interrupt handler (IH) ring may
overflow, which can cause an eventual hang.

Delegate the processing of all VM faults to the soft
IRQ handler ring.

As a result, we spend much less time in the IRQ handler that
interacts with the HW IH ring, which significantly reduces the
chance of hangs/reboots.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu/gmc6: Delegate VM faults to soft IRQ handler ring
Timur Kristóf [Wed, 26 Nov 2025 13:29:50 +0000 (14:29 +0100)] 
drm/amdgpu/gmc6: Delegate VM faults to soft IRQ handler ring

On old GPUs, it may be an issue that handling the interrupts from
VM faults is too slow and the interrupt handler (IH) ring may
overflow, which can cause an eventual hang.

Delegate the processing of all VM faults to the soft
IRQ handler ring.

As a result, we spend much less time in the IRQ handler that
interacts with the HW IH ring, which significantly reduces the
chance of hangs/reboots.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu/gmc6: Cache VM fault info
Timur Kristóf [Wed, 26 Nov 2025 13:29:49 +0000 (14:29 +0100)] 
drm/amdgpu/gmc6: Cache VM fault info

Call amdgpu_vm_update_fault_cache on GMC v6 similarly to how we
do in GMC v7-v8 so that VM fault info can be used later by
userspace for debugging.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu/gmc6: Don't print MC client as it's unknown
Timur Kristóf [Wed, 26 Nov 2025 13:29:48 +0000 (14:29 +0100)] 
drm/amdgpu/gmc6: Don't print MC client as it's unknown

The VM_CONTEXT1_PROTECTION_FAULT_MCCLIENT register
doesn't exist on GMC v6 so we can't print the MC client as a
string like we do on GMC v7-v8. However, we still print the
mc_id from VM_CONTEXT1_PROTECTION_FAULT_STATUS.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu/cz_ih: Enable soft IRQ handler ring
Timur Kristóf [Wed, 26 Nov 2025 13:29:47 +0000 (14:29 +0100)] 
drm/amdgpu/cz_ih: Enable soft IRQ handler ring

We are going to use the soft IRQ handler ring on GMC v8
to process interrupts from VM faults.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu/tonga_ih: Enable soft IRQ handler ring
Timur Kristóf [Wed, 26 Nov 2025 13:29:46 +0000 (14:29 +0100)] 
drm/amdgpu/tonga_ih: Enable soft IRQ handler ring

We are going to use the soft IRQ handler ring on GMC v8
to process interrupts from VM faults.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu/iceland_ih: Enable soft IRQ handler ring
Timur Kristóf [Wed, 26 Nov 2025 13:29:45 +0000 (14:29 +0100)] 
drm/amdgpu/iceland_ih: Enable soft IRQ handler ring

We are going to use the soft IRQ handler ring on GMC v8
to process interrupts from VM faults.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu/cik_ih: Enable soft IRQ handler ring
Timur Kristóf [Wed, 26 Nov 2025 13:29:44 +0000 (14:29 +0100)] 
drm/amdgpu/cik_ih: Enable soft IRQ handler ring

We are going to use the soft IRQ handler ring on GMC v7 (CIK)
to process interrupts from VM faults.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu/si_ih: Enable soft IRQ handler ring
Timur Kristóf [Wed, 26 Nov 2025 13:29:43 +0000 (14:29 +0100)] 
drm/amdgpu/si_ih: Enable soft IRQ handler ring

We are going to use the soft IRQ handler ring on GMC v6 (SI)
to process interrupts from VM faults.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amd/display: fix typo in display_mode_core_structs.h
Aditya Gollamudi [Sun, 12 Oct 2025 19:13:19 +0000 (12:13 -0700)] 
drm/amd/display: fix typo in display_mode_core_structs.h

Fix a typo in a comment, change "enviroment" to "environment" in
drivers/gpu/drm/amd/display/dc/dml2/display_mode_core_structs.h

Fixes: e6a8a000cfe6 ("drm/amd/display: Rename dml2 to dml2_0 folder")
Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Aditya Gollamudi <adigollamudi@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amd/display: fix Smart Power OLED not working after S4
Ian Chen [Thu, 13 Nov 2025 05:07:58 +0000 (13:07 +0800)] 
drm/amd/display: fix Smart Power OLED not working after S4

[HOW]
Before enable smart power OLED, we need to call set pipe to let
DMUB get correct ABM config.

Reviewed-by: Robin Chen <robin.chen@amd.com>
Signed-off-by: Ian Chen <ian.chen@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amd/display: Move RGB-type check for audio sync to DCE HW sequence
Ivan Lipski [Fri, 21 Nov 2025 20:03:57 +0000 (15:03 -0500)] 
drm/amd/display: Move RGB-type check for audio sync to DCE HW sequence

[Why&How]
DVI-A & VGA connectors are applicable to DCE ASICs, so move them to
dce110_hwseq.c to block audio sync on SIGNAL_TYPE_RGB for DCE ASICs.

Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu: add missing lock to amdgpu_ttm_access_memory_sdma
Pierre-Eric Pelloux-Prayer [Tue, 25 Nov 2025 09:48:39 +0000 (10:48 +0100)] 
drm/amdgpu: add missing lock to amdgpu_ttm_access_memory_sdma

Users of ttm entities need to hold the gtt_window_lock before using them
to guarantee proper ordering of jobs.

Cc: stable@vger.kernel.org
Fixes: cb5cc4f573e1 ("drm/amdgpu: improve debug VRAM access performance using sdma")
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/xe/gt: Use scope-based forcewake
Raag Jadav [Fri, 28 Nov 2025 08:22:12 +0000 (13:52 +0530)] 
drm/xe/gt: Use scope-based forcewake

Switch runtime PM code to use scope-based forcewake for consistency with
other parts of the driver.

Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patch.msgid.link/20251128082212.294592-1-raag.jadav@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months agodrm/xe/vf: Add debugfs entries to test VF double migration
Satyanarayana K V P [Mon, 1 Dec 2025 09:50:16 +0000 (15:20 +0530)] 
drm/xe/vf: Add debugfs entries to test VF double migration

VF migration sends a marker to the GUC before resource fixups begin,
and repeats the marker with the RESFIX_DONE notification. This prevents
the GUC from submitting jobs during double migration events.

To reliably test double migration, a second migration must be triggered
while fixups from the first migration are still in progress. Since fixups
complete quickly, reproducing this scenario is difficult. Introduce
debugfs controls to add delays in the post-fixup phase, creating a
deterministic window for subsequent migrations.

New debugfs entries:
/sys/kernel/debug/dri/BDF/
├── tile0
│   ├─gt0
│   │ ├──vf
│   │ │  ├── resfix_stoppers

resfix_stoppers: Predefined checkpoints that allow the migration process
to pause at specific stages. The stages are given below.

VF_MIGRATION_WAIT_RESFIX_START - BIT(0)
VF_MIGRATION_WAIT_FIXUPS - BIT(1)
VF_MIGRATION_WAIT_RESTART_JOBS - BIT(2)
VF_MIGRATION_WAIT_RESFIX_DONE - BIT(3)

Each state will pause with a 1-second delay per iteration, continuing until
its corresponding bit is cleared.

Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Tomasz Lis <tomasz.lis@intel.com>
Acked-by: Adam Miszczak <adam.miszczak@linux.intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251201095011.21453-10-satyanarayana.k.v.p@intel.com
2 months agodrm/xe/vf: Requeue recovery on GuC MIGRATION error during VF post-migration
Satyanarayana K V P [Mon, 1 Dec 2025 09:50:15 +0000 (15:20 +0530)] 
drm/xe/vf: Requeue recovery on GuC MIGRATION error during VF post-migration

Handle GuC response `XE_GUC_RESPONSE_VF_MIGRATED` as a special case in the
VF post-migration recovery flow. When this error occurs, it indicates that
a new migration was detected while the resource fixup process was still in
progress. Instead of failing immediately, requeue the VF into the recovery
path to allow proper handling of the new migration event.

This improves robustness of VF recovery in SR-IOV environments where
migrations can overlap with resource fixup steps.

Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Tomasz Lis <tomasz.lis@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251201095011.21453-9-satyanarayana.k.v.p@intel.com
2 months agodrm/xe/vf: Introduce RESFIX start marker support
Satyanarayana K V P [Mon, 1 Dec 2025 09:50:14 +0000 (15:20 +0530)] 
drm/xe/vf: Introduce RESFIX start marker support

In scenarios involving double migration, the VF KMD may encounter
situations where it is instructed to re-migrate before having the
opportunity to send RESFIX_DONE for the initial migration. This can occur
when the fix-up for the prior migration is still underway, but the VF KMD
is migrated again.

Consequently, this may lead to the possibility of sending two migration
notifications (i.e., pending fix-up for the first migration and a second
notification for the new migration). Upon receiving the first RES_FIX
notification, the GuC will resume VF submission on the GPU, potentially
resulting in undefined behavior, such as system hangs or crashes.

To avoid this, post migration, a marker is sent to the GUC prior to the
start of resource fixups to indicate start of resource fixups. The same
marker is sent along with RESFIX_DONE notification so that GUC can avoid
submitting jobs to HW in case of double migration.

Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Tomasz Lis <tomasz.lis@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251201095011.21453-8-satyanarayana.k.v.p@intel.com
2 months agodrm/xe/vf: Enable VF migration only on supported GuC versions
Satyanarayana K V P [Mon, 1 Dec 2025 09:50:13 +0000 (15:20 +0530)] 
drm/xe/vf: Enable VF migration only on supported GuC versions

Enable VF migration starting with GuC 70.54.0 (compatibility version
1.27.0) which supports additional VF2GUC_RESFIX_START message required
to handle migration recovery in a more robust way.

Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Tomasz Lis <tomasz.lis@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251201095011.21453-7-satyanarayana.k.v.p@intel.com
2 months agoMerge tag 'drm-misc-next-2025-12-01-1' of https://gitlab.freedesktop.org/drm/misc...
Dave Airlie [Tue, 2 Dec 2025 08:09:01 +0000 (18:09 +1000)] 
Merge tag 'drm-misc-next-2025-12-01-1' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next

Extra drm-misc-next for v6.19-rc1:

UAPI Changes:
- Add support for drm colorop pipeline.
- Add COLOR PIPELINE plane property.
- Add DRM_CLIENT_CAP_PLANE_COLOR_PIPELINE.

Cross-subsystem Changes:
- Attempt to use higher order mappings in system heap allocator.
- Always taint kernel with sw-sync.

Core Changes:
- Small fixes to drm/gem.
- Support emergency restore to drm-client.
- Allocate and release fb_info in single place.
- Rework ttm pipelined eviction fence handling.

Driver Changes:
- Support the drm color pipeline in vkms, amdgfx.
- Add NVJPG driver for tegra.
- Assorted small fixes and updates to rockchip, bridge/dw-hdmi-qp,
  panthor.
- Add ASL CS5263 DP-to-HDMI simple bridge.
- Add and improve support for G LD070WX3-SL01 MIPI DSI, Samsung LTL106AL0,
  Samsung LTL106AL01, Raystar RFF500F-AWH-DNN, Winstar WF70A8SYJHLNGA,
  Wanchanglong w552946aaa, Samsung SOFEF00, Lenovo X13s panel.
- Add support for it66122 to it66121.
- Support mali-G1 gpu in panthor.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patch.msgid.link/aa5cbd50-7676-4a59-bbed-e8428af86804@linux.intel.com
2 months agoMAINTAINERS: Remove myself from xe maintainers
Lucas De Marchi [Wed, 26 Nov 2025 22:43:58 +0000 (14:43 -0800)] 
MAINTAINERS: Remove myself from xe maintainers

As I'm leaving Intel soon, drop myself from the list of Xe maintainers.
Also update the mailmap to switch to my kernel.org address.

Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/20251126224357.2482051-2-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2 months agodrm/xe: Implement DRM_XE_EXEC_QUEUE_SET_HANG_REPLAY_STATE
Matthew Brost [Wed, 26 Nov 2025 18:59:52 +0000 (10:59 -0800)] 
drm/xe: Implement DRM_XE_EXEC_QUEUE_SET_HANG_REPLAY_STATE

Implement DRM_XE_EXEC_QUEUE_SET_HANG_REPLAY_STATE which sets the exec
queue default state to user data passed in. The intent is for a Mesa
tool to use this replay GPU hangs.

v2:
 - Enable the flag DRM_XE_EXEC_QUEUE_SET_HANG_REPLAY_STATE
 - Fix the page size math calculation to avoid a crash
v4:
 - Use vmemdup_user (Maarten)
 - Copy default state first into LRC, then replay state (Testing, Carlos)

Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patch.msgid.link/20251126185952.546277-10-matthew.brost@intel.com
2 months agodrm/xe: Add replay_offset and replay_length lines to LRC HWCTX snapshot
Matthew Brost [Wed, 26 Nov 2025 18:59:51 +0000 (10:59 -0800)] 
drm/xe: Add replay_offset and replay_length lines to LRC HWCTX snapshot

Add replay_offset and replay_length lines to LRC HWCTX snapshot with the
idea being this information can be used extract the data which needs to
be pass to exec queue extension DRM_XE_EXEC_QUEUE_SET_HANG_REPLAY_STATE
so GPU hang can be replayed via a Mesa tool.

The additional lines look like:

[HWCTX].replay_offset: 0x%x
[HWCTX].replay_length: 0x%x

Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patch.msgid.link/20251126185952.546277-9-matthew.brost@intel.com
2 months agodrm/xe/uapi: Add DRM_XE_EXEC_QUEUE_SET_HANG_REPLAY_STATE
Matthew Brost [Wed, 26 Nov 2025 18:59:50 +0000 (10:59 -0800)] 
drm/xe/uapi: Add DRM_XE_EXEC_QUEUE_SET_HANG_REPLAY_STATE

Add DRM_XE_EXEC_QUEUE_SET_HANG_REPLAY_STATE which accepts a user pointer
to populate the exec queue state so that a GPU hang can be replayed via
a Mesa tool.

v2: Update the value for HANG_REPLAY_STATE flag

Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Carlos Santa <carlos.santa@intel-corp-partner.google.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Acked-by: José Roberto de Souza <jose.souza@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/20251126185952.546277-8-matthew.brost@intel.com
2 months agodrm/xe: Add VM.uapi_flags to VM snapshot capture
Matthew Brost [Wed, 26 Nov 2025 18:59:49 +0000 (10:59 -0800)] 
drm/xe: Add VM.uapi_flags to VM snapshot capture

Add VM.uapi_flags to VM snapshot capture VM snapshot capture. This is
useful information for debug and will help build a robust GPU hang
replay tool.

The current format is:

VM.uapi_flags: 0x%x

Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patch.msgid.link/20251126185952.546277-7-matthew.brost@intel.com
2 months agodrm/xe: Add cpu_caching to properties line in VM snapshot capture
Matthew Brost [Wed, 26 Nov 2025 18:59:48 +0000 (10:59 -0800)] 
drm/xe: Add cpu_caching to properties line in VM snapshot capture

Add CPU caching to properties line in VM snapshot capture indicating
the BO caching properites. This is useful information for debug and
will help build a robust GPU hang replay tool.

The current format is:

[<vma address>]: <permissions>|<type>|mem_region=0x%x|pat_index=%d|cpu_caching=%d

Permissions has two options, either "read_only" or "read_write".

Type has three options, either "userptr", "null_sparse", or "bo".

Memory region is a bit mask of where the memory is located.

Pat index corresponds to the value setup upon VM bind.

CPU caching corresponds to the value of BO setup upon creation.

v2:
 - Save off cpu_caching value rather than looking at BO (Carlos)
v4:
 - Fix NULL ptr dereference (Carlos)

Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patch.msgid.link/20251126185952.546277-6-matthew.brost@intel.com
2 months agodrm/xe: Add pat_index to properties line in VM snapshot capture
Matthew Brost [Wed, 26 Nov 2025 18:59:47 +0000 (10:59 -0800)] 
drm/xe: Add pat_index to properties line in VM snapshot capture

Add pat index to properties line in VM snapshot capture indicating
the VMA caching properites. This is useful information for debug and
will help build a robust GPU hang replay tool.

The current format is:

[<vma address>]: <permissions>|<type>|mem_region=0x%x|pat_index=%d

Permissions has two options, either "read_only" or "read_write".

Type has three options, either "userptr", "null_sparse", or "bo".

Memory region is a bit mask of where the memory is located.

Pat index corresponds to the value setup upon VM bind.

Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patch.msgid.link/20251126185952.546277-5-matthew.brost@intel.com
2 months agodrm/xe: Add mem_region to properties line in VM snapshot capture
Matthew Brost [Wed, 26 Nov 2025 18:59:46 +0000 (10:59 -0800)] 
drm/xe: Add mem_region to properties line in VM snapshot capture

Add memory region to properties line in VM snapshot capture indicating
where the memory is located. The memory region corresponds to regions in
the uAPI. This is useful information for debug and will help build a
robust GPU hang replay tool.

The current format is:

[<vma address>]: <permissions>|<type>|mem_region=0x%x

Permissions has two options, either "read_only" or "read_write".

Type has three options, either "userptr", "null_sparse", or "bo".

Memory region is a bit mask of where the memory is located.

Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patch.msgid.link/20251126185952.546277-4-matthew.brost@intel.com
2 months agodrm/xe: Add "null_sparse" type to VM snap properties
Matthew Brost [Wed, 26 Nov 2025 18:59:45 +0000 (10:59 -0800)] 
drm/xe: Add "null_sparse" type to VM snap properties

Add "null_sparse" type to VM snap properties indicating the VMA reads
zero and writes are droppped. This is useful information for debug and
will help build a robust GPU hang replay tool.

The current format is:

[<vma address>]: <permissions>|<type>

Permissions has two options, either "read_only" or "read_write".

Type has three options, either "userptr", "null_sparse", or "bo".

Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patch.msgid.link/20251126185952.546277-3-matthew.brost@intel.com
2 months agodrm/xe: Add properties line to VM snapshot capture
Matthew Brost [Wed, 26 Nov 2025 18:59:44 +0000 (10:59 -0800)] 
drm/xe: Add properties line to VM snapshot capture

Add properties line to VM snapshot capture which includes additional
information about VMA being dumped. This is helpful for debug purposes
but also to build a robust GPU hang replay tool.

The current format is:

[<vma address>]: <permissions>|<type>

Permissions has two options, either "read_only" or "read_write".

Type has two options, either "userptr" or "bo".

Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patch.msgid.link/20251126185952.546277-2-matthew.brost@intel.com
2 months agodrm/xe: Apply Wa_14020316580 in xe_gt_idle_enable_pg()
Vinay Belgaumkar [Sat, 29 Nov 2025 05:25:48 +0000 (21:25 -0800)] 
drm/xe: Apply Wa_14020316580 in xe_gt_idle_enable_pg()

Wa_14020316580 was getting clobbered by power gating init code
later in the driver load sequence. Move the Wa so that
it applies correctly.

Fixes: 7cd05ef89c9d ("drm/xe/xe2hpm: Add initial set of workarounds")
Suggested-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patch.msgid.link/20251129052548.70766-1-vinay.belgaumkar@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months agodrm/xe: Fix freq kobject leak on sysfs_create_files failure
Shuicheng Lin [Fri, 14 Nov 2025 20:56:39 +0000 (20:56 +0000)] 
drm/xe: Fix freq kobject leak on sysfs_create_files failure

Ensure gt->freq is released when sysfs_create_files() fails
in xe_gt_freq_init(). Without this, the kobject would leak.
Add kobject_put() before returning the error.

Fixes: fdc81c43f0c1 ("drm/xe: use devm_add_action_or_reset() helper")
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Alex Zuo <alex.zuo@intel.com>
Reviewed-by: Xin Wang <x.wang@intel.com>
Link: https://patch.msgid.link/20251114205638.2184529-2-shuicheng.lin@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months agodrm/xe/xe3_lpg: Apply Wa_16028005424
Balasubramani Vivekanandan [Fri, 21 Nov 2025 10:08:23 +0000 (15:38 +0530)] 
drm/xe/xe3_lpg: Apply Wa_16028005424

Applied Wa_16028005424 to Graphics version from 30.00 to 30.05

Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://patch.msgid.link/20251121100822.20076-2-balasubramani.vivekanandan@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2 months agovfio/xe: Add device specific vfio_pci driver variant for Intel graphics
Michał Winiarski [Thu, 27 Nov 2025 09:39:34 +0000 (10:39 +0100)] 
vfio/xe: Add device specific vfio_pci driver variant for Intel graphics

In addition to generic VFIO PCI functionality, the driver implements
VFIO migration uAPI, allowing userspace to enable migration for Intel
Graphics SR-IOV Virtual Functions.
The driver binds to VF device and uses API exposed by Xe driver to
transfer the VF migration data under the control of PF device.

Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Alex Williamson <alex@shazbot.org>
Link: https://patch.msgid.link/20251127093934.1462188-5-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
2 months agodrm/xe/pf: Export helpers for VFIO
Michał Winiarski [Thu, 27 Nov 2025 09:39:33 +0000 (10:39 +0100)] 
drm/xe/pf: Export helpers for VFIO

Device specific VFIO driver variant for Xe will implement VF migration.
Export everything that's needed for migration ops.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251127093934.1462188-4-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
2 months agodrm/xe/pci: Introduce a helper to allow VF access to PF xe_device
Michał Winiarski [Thu, 27 Nov 2025 09:39:32 +0000 (10:39 +0100)] 
drm/xe/pci: Introduce a helper to allow VF access to PF xe_device

In certain scenarios (such as VF migration), VF driver needs to interact
with PF driver.
Add a helper to allow VF driver access to PF xe_device.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251127093934.1462188-3-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
2 months agodrm/xe/pf: Enable SR-IOV VF migration
Michał Winiarski [Thu, 27 Nov 2025 09:39:31 +0000 (10:39 +0100)] 
drm/xe/pf: Enable SR-IOV VF migration

All of the necessary building blocks are now in place to support SR-IOV
VF migration.
Flip the enable/disable logic to match VF code and disable the feature
only for platforms that don't meet the necessary prerequisites.
To allow more testing and experiments, on DEBUG builds any missing
prerequisites will be ignored.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251127093934.1462188-2-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
2 months agoMerge tag 'drm-misc-next-fixes-2025-11-26' of https://gitlab.freedesktop.org/drm...
Dave Airlie [Thu, 27 Nov 2025 22:40:47 +0000 (08:40 +1000)] 
Merge tag 'drm-misc-next-fixes-2025-11-26' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next

drm-misc-next-fixes for v6.19:
- Restrict the pointer size of flush pages to prevent a regression.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patch.msgid.link/0090a4fc-9cc4-4c03-bfe5-d1b1f0cc7e05@linux.intel.com
2 months agoMerge tag 'drm-rust-next-2025-11-21' of https://gitlab.freedesktop.org/drm/rust/kerne...
Dave Airlie [Thu, 27 Nov 2025 20:49:40 +0000 (06:49 +1000)] 
Merge tag 'drm-rust-next-2025-11-21' of https://gitlab.freedesktop.org/drm/rust/kernel into drm-next

Core Changes:

- Fix warning in documentation builds on older rustc versions.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alice Ryhl <aliceryhl@google.com>
Link: https://patch.msgid.link/aSA5pshsJ7TeJIbu@google.com
2 months agoMerge tag 'drm-xe-next-fixes-2025-11-21' of https://gitlab.freedesktop.org/drm/xe...
Dave Airlie [Thu, 27 Nov 2025 20:46:35 +0000 (06:46 +1000)] 
Merge tag 'drm-xe-next-fixes-2025-11-21' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next

Driver Changes:
- A couple of SR-IOV fixes (Michal Winiarski)
- Fix a potential UAF (Sanjay)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/aSA08EW9JMU3LkIu@fedora
2 months agodrm/xe/gt: Introduce runtime suspend/resume
Raag Jadav [Thu, 30 Oct 2025 12:23:57 +0000 (17:53 +0530)] 
drm/xe/gt: Introduce runtime suspend/resume

If power state is retained between suspend/resume cycle, we don't need
to perform full GT re-initialization. Introduce runtime helpers for GT
which greatly reduce suspend/resume delay.

v2: Drop redundant xe_gt_sanitize() and xe_guc_ct_stop() (Daniele)
    Use runtime naming for guc helpers (Daniele)
v3: Drop redundant logging, add kernel doc (Michal)
    Use runtime naming for ct helpers (Michal)
v4: Fix tags (Rodrigo)
v5: Include host_l2_vram workaround (Daniele)
    Reuse xe_guc_submit_enable/disable() helpers (Daniele)

Co-developed-by: Riana Tauro <riana.tauro@intel.com>
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Acked-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patch.msgid.link/20251030122357.128825-5-raag.jadav@intel.com
2 months agodrm/xe/pm: Assert on runtime suspend if VFs are enabled
Raag Jadav [Thu, 30 Oct 2025 12:23:56 +0000 (17:53 +0530)] 
drm/xe/pm: Assert on runtime suspend if VFs are enabled

We hold an additional reference to the runtime PM to keep PF in D0
during VFs lifetime, as our VFs do not implement the PM capability.
This means we should never be runtime suspending as long as VFs are
enabled.

v8: Add !IS_SRIOV_VF() assert (Matthew Brost)

Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patch.msgid.link/20251030122357.128825-4-raag.jadav@intel.com
2 months agodrm/xe/guc_submit: Introduce pause/unpause() helpers for PF
Raag Jadav [Thu, 30 Oct 2025 12:23:55 +0000 (17:53 +0530)] 
drm/xe/guc_submit: Introduce pause/unpause() helpers for PF

Introduce pause/unpause() helpers which stop/start further runs of
submission tasks on given GuC and can be called from PF context. This
is in preparation of usecases where we simply need to stop/start the
scheduler without losing GuC state and don't require dealing with VF
migration.

v7: Reword commit message (Matthew Brost)

Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patch.msgid.link/20251030122357.128825-3-raag.jadav@intel.com
2 months agodrm/xe/vf: Update pause/unpause() helpers with VF naming
Raag Jadav [Thu, 30 Oct 2025 12:23:54 +0000 (17:53 +0530)] 
drm/xe/vf: Update pause/unpause() helpers with VF naming

Now that pause/unpause() helpers have been updated for VF migration
usecase, update their naming to match the functionality and while at it,
add IS_SRIOV_VF() assert to make sure they are not abused.

v7: Add IS_SRIOV_VF() assert (Matthew Brost)
    Use "vf" suffix (Michal)

Suggested-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patch.msgid.link/20251030122357.128825-2-raag.jadav@intel.com
2 months agodrm/xe: Move VRAM MM debugfs creation to tile level
Piotr Piórkowski [Thu, 27 Nov 2025 07:36:43 +0000 (08:36 +0100)] 
drm/xe: Move VRAM MM debugfs creation to tile level

Previously, VRAM TTM resource manager debugfs entries (vram0_mm / vram1_mm)
were created globally in the XE debugfs root directory. But technically,
each tile has an associated VRAM TTM manager, which it can own.
Let's create VRAM memory manager debugfs entries directly under each tile's
debugfs directory for better alignment with the per-tile memory layout.

Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://patch.msgid.link/20251127073643.144379-1-piotr.piorkowski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
2 months agodrm/xe/xe_sriov_packet: Return int from pf_descriptor_init
Jonathan Cavitt [Mon, 17 Nov 2025 19:01:15 +0000 (19:01 +0000)] 
drm/xe/xe_sriov_packet: Return int from pf_descriptor_init

pf_descriptor_init currently returns a size_t, which is an unsigned
integer data type.  This conflicts with it returning a negative errno
value on failure.

Make it return an int instead.  This mirrors how pf_trailer_init is used
later.

Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Alex Zuo <alex.zuo@intel.com>
Link: https://patch.msgid.link/20251117190114.69953-2-jonathan.cavitt@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
2 months agodrm/amd/display: Enable support for Gamma 2.2
Alex Hung [Sat, 15 Nov 2025 00:02:16 +0000 (17:02 -0700)] 
drm/amd/display: Enable support for Gamma 2.2

This patchset enables support for the Gamma 2.2.

With this patch the following IGT subtests pass:

kms_colorop --run plane-XR30-XR30-gamma_2_2

kms_colorop --run plane-XR30-XR30-gamma_2_2_inv-gamma_2_2

kms_colorop --run plane-XR30-XR30-gamma_2_2_inv-gamma_2_2-gamma_2_2_inv

Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Simon Ser <contact@emersion.fr>
Link: https://patch.msgid.link/20251115000237.3561250-52-alex.hung@amd.com
2 months agodrm/colorop: Add DRM_COLOROP_1D_CURVE_GAMMA22 to 1D Curve
Alex Hung [Sat, 15 Nov 2025 00:02:15 +0000 (17:02 -0700)] 
drm/colorop: Add DRM_COLOROP_1D_CURVE_GAMMA22 to 1D Curve

Add "DRM_COLOROP_1D_CURVE_GAMMA22" and DRM_COLOROP_1D_CURVE_GAMMA22_INV
subtypes to drm_colorop of DRM_COLOROP_1D_CURVE.

Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Simon Ser <contact@emersion.fr>
Link: https://patch.msgid.link/20251115000237.3561250-51-alex.hung@amd.com