git.ipfire.org Git - thirdparty/kernel/linux.git/log

Merge tag 'drm-xe-next-2025-11-14' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next

Driver Changes:

Avoid TOCTOU when montoring throttle reasons (Lucas)
Add/extend workaround (Nitin)
SRIOV migration work / plumbing (Michal Wajdeczko, Michal Winiarski, Lukasz)
Drop debug flag requirement for VF resource fixup
Fix MTL vm_max_level (Rodrigo)
Changes around TILE_ADDR_RANGE for platform compatibility
(Fei, Lucas)
Add runtime registers for GFX ver >= 35 (Piotr)
Kerneldoc fix (Kriish)
Rework pcode error mapping (Lucas)
Allow lockdown the PF (Michal)
Eliminate GUC code caching of some frequency values (Sk)
Improvements around forcewake referencing (Matt Roper)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/aRcJOrisG2qPbucE@fedora

Merge tag 'drm-xe-next-2025-11-05' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next

UAPI Changes:

Limit number of jobs per exec queue (Shuicheng)
Add sriov_admin sysfs tree (Michal)

Driver Changes:

Fix an uninitialized value (Thomas)
Expose a residency counter through debugfs (Mohammed Thasleem)
Workaround enabling and improvement (Tapani, Tangudu)
More Crescent Island-specific support (Sk Anirban, Lucas)
PAT entry dump imprement (Xin)
Inline gt_reset in the worker (Lucas)
Synchronize GT reset with device unbind (Balasubramani)
Do clean shutdown also when using flr (Jouni)
Fix serialization on burst of unbinds (Matt Brost)
Pagefault Refactor (Matt Brost)
Remove some unused code (Gwan-gyeong)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/aQuBECxNOhudc0Bz@fedora

drm/xe/oa: Store forcewake reference in stream structure

Calls to xe_force_wake_put() should generally pass the exact reference
returned by xe_force_wake_get(). Since OA grabs and releases forcewake
in different functions, xe_oa_stream_destroy() is currently calling put
with a hardcoded ALL mask. Although this works for now, it's somewhat
fragile in case OA moves to more precise power domain management in the
future.

Stash the original reference obtained during stream initialization
inside the stream structure so that we can use it directly when the
stream is destroyed.

Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patch.msgid.link/20251110232017.1475869-35-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>

drm/xe/eustall: Store forcewake reference in stream structure

Calls to xe_force_wake_put() should generally pass the exact reference
returned by xe_force_wake_get(). Since EU stall grabs and releases
forcewake in different functions, xe_eu_stall_disable_locked() is
currently calling put with a hardcoded RENDER domain. Although this
works for now, it's somewhat fragile in case the power domain(s)
required by stall sampling change in the future, or if workarounds show
up that require us to obtain additional domains.

Stash the original reference obtained during stream enable inside the
stream structure so that we can use it directly when the stream is
disabled.

Cc: Harish Chegondi <harish.chegondi@intel.com>
Reviewed-by: Harish Chegondi <harish.chegondi@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251110232017.1475869-34-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>

drm/xe/forcewake: Improve kerneldoc

Improve the kerneldoc for forcewake a bit to give more detail about what
the structures represent.

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://patch.msgid.link/20251110232017.1475869-33-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>

drm/xe/pf: Use migration-friendly GGTT auto-provisioning

Instead of trying very hard to find the largest fair GGTT size that
could be allocated for VFs on the current tile, pick some smaller
rounded down to power-of-two value that is more likely to be
provisioned in the same manner by the other PF instance:

  num VFs | GGTT space (MiB)
  --------+-----------------
   63..57 | 56
   56..29 | 64
   28..15 | 128
   14..8  | 256
    7..4  | 512
    3..2  | 1024
       1  | 2048 (regular PF)
       1  | 3584 (admin only PF)

Note that due to FW/HW limitations we can't share all 4GiB GGTT
address space with VFs, so for the larger (>7) number of the VFs
the change in the outcome is happening at different points than
we have in case of GuC contexts/doorbells IDs.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patch.msgid.link/20251112124408.8094-1-michal.wajdeczko@intel.com

drm/intel/bmg: Allow device ID usage with single-argument macros

When INTEL_BMG_G21_IDS were added as a subplatform, token concatenation
operator usage was omitted, making INTEL_BMG_IDS not usable with
single-argument macros.
Fix that by adding the missing operator.

Fixes: 78de8f876683 ("drm/xe: Handle Wa_22010954014 and Wa_14022085890 as device workarounds")
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-25-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>

drm/xe/pf: Add wait helper for VF FLR

VF FLR requires additional processing done by PF driver.
The processing is done after FLR is already finished from PCIe
perspective.
In order to avoid a scenario where migration state transitions while
PF processing is still in progress, additional synchronization
point is needed.
Add a helper that will be used as part of VF driver struct
pci_error_handlers .reset_done() callback.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-24-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>

drm/xe/pf: Handle VRAM migration data as part of PF control

Connect the helpers to allow save and restore of VRAM migration data in
stop_copy / resume device state.

Co-developed-by: Lukasz Laguna <lukasz.laguna@intel.com>
Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-23-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>

drm/xe/migrate: Add function to copy of VRAM data in chunks

Introduce a new function to copy data between VRAM and sysmem objects.
The existing xe_migrate_copy() is tailored for eviction and restore
operations, which involves additional logic and operates on entire
objects.
The xe_migrate_vram_copy_chunk() allows copying chunks of data to or
from a dedicated buffer object, which is essential in case of VF
migration.

Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-22-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>

drm/xe/pf: Add helper to retrieve VF's LMEM object

Instead of accessing VF's lmem_obj directly, introduce a helper function
to make the access more convenient.

Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-21-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>

drm/xe/pf: Handle MMIO migration data as part of PF control

Implement the helpers and use them for save and restore of MMIO
migration data in stop_copy / resume device state.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-20-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>

drm/xe/pf: Handle GGTT migration data as part of PF control

Connect the helpers to allow save and restore of GGTT migration data in
stop_copy / resume device state.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-19-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>

drm/xe/pf: Add helpers for VF GGTT migration data handling

In an upcoming change, the VF GGTT migration data will be handled as
part of VF control state machine. Add the necessary helpers to allow the
migration data transfer to/from the HW GGTT resource.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-18-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>

drm/xe/pf: Handle GuC migration data as part of PF control

Connect the helpers to allow save and restore of GuC migration data in
stop_copy / resume device state.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-17-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>

drm/xe/pf: Switch VF migration GuC save/restore to struct migration data

In upcoming changes, the GuC VF migration data will be handled as part
of separate SAVE/RESTORE states in VF control state machine.
Now that the data is decoupled from both guc_state debugfs and PAUSE
state, we can safely remove the struct xe_gt_sriov_state_snapshot and
modify the GuC save/restore functions to operate on struct
xe_sriov_migration_data.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-16-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>

drm/xe/pf: Don't save GuC VF migration data on pause

In upcoming changes, the GuC VF migration data will be handled as part
of separate SAVE/RESTORE states in VF control state machine.
Remove it from PAUSE state.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-15-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>

drm/xe/pf: Remove GuC migration data save/restore from GT debugfs

In upcoming changes, SR-IOV VF migration data will be extended beyond
GuC data and exported to userspace using VFIO interface (with a
vendor-specific variant driver) and a device-level debugfs interface.
Remove the GT-level debugfs.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-14-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>

drm/xe/pf: Increase PF GuC Buffer Cache size and use it for VF migration

Contiguous PF GGTT VMAs can be scarce after creating VFs.
Increase the GuC buffer cache size to 8M for PF so that we can fit GuC
migration data (which currently maxes out at just over 4M) and use the
cache instead of allocating fresh BOs.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-13-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>

drm/xe: Allow the caller to pass guc_buf_cache size

An upcoming change will use GuC buffer cache as a place where GuC
migration data will be stored, and the memory requirement for that is
larger than indirect data.
Allow the caller to pass the size based on the intended usecase.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-12-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>

drm/xe: Add sa/guc_buf_cache sync interface

In upcoming changes the cached buffers are going to be used to read data
produced by the GuC. Add a counterpart to flush, which synchronizes the
CPU-side of suballocation with the GPU data and propagate the interface
to GuC Buffer Cache.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-11-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>

drm/xe/pf: Expose VF migration data size over debugfs

The size is normally used to make a decision on when to stop the device
(mainly when it's in a pre_copy state).

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-10-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>

drm/xe/pf: Add minimalistic migration descriptor

The descriptor reuses the KLV format used by GuC and contains metadata
that can be used to quickly fail migration when source is incompatible
with destination.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-9-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>

drm/xe/pf: Add support for encap/decap of bitstream to/from packet

Add debugfs handlers for migration state and handle bitstream
.read()/.write() to convert from bitstream to/from migration data
packets.
As descriptor/trailer are handled at this layer - add handling for both
save and restore side.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-8-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>

drm/xe/pf: Add helpers for migration data packet allocation / free

Now that it's possible to free the packets - connect the restore
handling logic with the ring.
The helpers will also be used in upcoming changes that will start
producing migration data packets.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-7-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>

drm/xe/pf: Add data structures and handlers for migration rings

Migration data is queued in a per-GT ptr_ring to decouple the worker
responsible for handling the data transfer from the .read() and .write()
syscalls.
Add the data structures and handlers that will be used in future
commits.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-6-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>

drm/xe/pf: Add save/restore control state stubs and connect to debugfs

The states will be used by upcoming changes to produce (in case of save)
or consume (in case of resume) the VF migration data.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-5-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>

drm/xe/pf: Convert control state to bitmap

In upcoming changes, the number of states will increase as a result of
introducing SAVE and RESTORE states.
This means that using unsigned long as underlying storage won't work on
32-bit architectures, as we'll run out of bits.
Use bitmap instead.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202510231918.XlOqymLC-lkp@intel.com/
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-4-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>

drm/xe: Move migration support to device-level struct

Upcoming changes will allow users to control VF state and obtain its
migration data with a device-level granularity (not tile/gt).
Change the data structures to reflect that and move the GT-level
migration init to happen after device-level init.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-3-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>

drm/xe/pf: Remove GuC version check for migration support

Since commit 4eb0aab6e4434 ("drm/xe/guc: Bump minimum required GuC
version to v70.29.2"), the minimum GuC version required by the driver
is v70.29.2, which should already include everything that we need for
migration.
Remove the version check.

Suggested-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-2-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>

drm/xe/guc: Eliminate RPa frequency caching

Remove the cached pc->rpa_freq field and refactor RPA frequency handling
to fetch values directly from hardware registers on each request.

v2: Check graphics version instead of platform (Rodrigo)
v3: Fix graphics version check (Badal)

Suggested-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Suggested-by: Badal Nilawar <badal.nilawar@intel.com>
Signed-off-by: Sk Anirban <sk.anirban@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/20251112185153.3593145-6-sk.anirban@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe/guc: Eliminate RPe caching for SLPC parameter handling

RPe is runtime-determined by PCODE and caching it caused stale values,
leading to incorrect GuC SLPC parameter settings.
Drop the cached rpe_freq field and query fresh values from hardware
on each use to ensure GuC SLPC parameters reflect current RPe.

v2: Remove cached RPe frequency field (Rodrigo)
v3: Remove extra variable (Vinay)
Modify function name (Vinay)
v4: Maintain a separate function for PVC (Rodrigo)
v5: Avoid RPn update while fetching RPe frequency (Rodrigo)
v6: Split platform-specific RPe comments (Vinay)

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5166
Signed-off-by: Sk Anirban <sk.anirban@intel.com>
Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://patch.msgid.link/20251112185153.3593145-5-sk.anirban@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe/pf: Allow to lockdown the PF using custom guard

Some driver components, like eudebug or ccs-mode, can't be used
when VFs are enabled. Add functions to allow those components
to block the PF from enabling VFs for the requested duration.

Introduce trivial counter to allow lockdown or exclusive access
that can be used in the scenarios where we can't follow the strict
owner semantics as required by the rw_semaphore implementation.

Before enabling VFs, the PF will try to arm the "vfs_enabling"
guard for the exclusive access. This will fail if there are
some lockdown requests already initiated by the other components.

For testing purposes, add debugfs file which will call these new
functions from the file's open/close hooks.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Christoph Manszewski <christoph.manszewski@intel.com>
Reviewed-by: Christoph Manszewski <christoph.manszewski@intel.com>
Link: https://patch.msgid.link/20251109162451.4779-1-michal.wajdeczko@intel.com

drm/xe/pcode: Rework error mapping

The sparse array used for error decoding from is unnecessarily big. It
should be better handled by a switch statement that will also allow us
to more easily improve this code.

Add a CASE_ERR() macro to keep the table compact and use it instead of
the 256-entries array, which saves some space:

$ bloat-o-meter xe_pcode.o.old xe_pcode.o
add/remove: 0/1 grow/shrink: 2/0 up/down: 190/-4096 (-3906)
Function                                     old     new   delta
__pcode_mailbox_rw                           363     465    +102
__pcode_mailbox_rw.cold                       58     146     +88
err_decode                                  4096       -   -4096
Total: Before=7890, After=3984, chg -49.51%

Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://patch.msgid.link/20251110-pcode-errmap-v2-1-cb18c8f54238@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>

drm/xe: fix kernel-doc function name mismatch in xe_pm.c

Documentation build reported:

WARNING: ./drivers/gpu/drm/xe/xe_pm.c:131 expecting prototype for xe_pm_might_block_on_suspend(). Prototype was for xe_pm_block_on_suspend() instead

The kernel-doc comment for xe_pm_block_on_suspend() incorrectly used
the function name xe_pm_might_block_on_suspend(). Fix the header to
match the actual function prototype.

No functional changes.

Fixes: f73f6dd312a5 ("drm/xe/pm: Add lockdep annotation for the pm_block completion")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202511061736.CiuroL7H-lkp@intel.com/
Signed-off-by: Kriish Sharma <kriish.sharma2006@gmail.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/20251110184206.2113830-1-kriish.sharma2006@gmail.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

Merge tag 'amd-drm-next-6.19-2025-11-07' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.19-2025-11-07:

amdgpu:
- Misc fixes
- HMM cleanup
- HDP flush rework
- RAS updates
- SMU 13.x updates
- SI DPM cleanup
- Suspend rework
- UQ reset support
- Replay/PSR fixes
- HDCP updates
- DC PMO fixes
- DC pstate fixes
- DCN4 fixes
- GPUVM fixes
- SMU 13 parition metrics
- Fix possible fence leak in job cleanup
- Hibernation fix
- MST fix

amdkfd:
- HMM cleanup
- Process cleanup fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patch.msgid.link/20251107145938.26669-1-alexander.deucher@amd.com

drm/xe/pf: Add runtime registers for GFX ver >= 35

Add a dedicated runtime register list for GFX ver >= 35.
Compared to the list for GFX >= 30, this variant drops
HUC_KERNEL_LOAD_INFO, MIRROR_FUSE1 and adds SERVICE_COPY_ENABLE.

v2:
- drop MIRROR_FUSE1 register
- update commit message

Fixes: 5e0de2dfbc1b ("drm/xe/cri: Add CRI platform definition")
Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251107211845.3633633-1-piotr.piorkowski@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>

drm/xe/vram: Move forcewake down to get_flat_ccs_offset()

With SG_TILE_ADDR_RANGE use, the only thing requiring GT forcewake while
probing for vram size is the get_flat_ccs_offset(). Move the forcewake
down where it's needed.

Suggested-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patch.msgid.link/20251107-tile-addr-v1-2-a3014aadc2e7@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>

drm/xe: Use SG_TILE_ADDR_RANGE instead of TILE_ADDR_RANGE

The TILE_ADDR_RANGE register is not available on all platforms going
forward as it was deprecated and is being replaced by equivalent
registers within SoC MMIO space. While that doesn't happen, the
SG_TILE_ADDR_RANGE (base 0x1083a0) is still valid for all platforms
supported by xe. Use that instead.

BSpec: 59353, 54991
Signed-off-by: Fei Yang <fei.yang@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patch.msgid.link/20251107-tile-addr-v1-1-a3014aadc2e7@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>

drm/xe: Fix MTL vm_max_level

MTL was broken after the vm_max_level movement. Get it back to a
working value.

[   37.722413] xe 0000:00:02.0: [drm] Tile0: GT0: VM job timed out on non-killed execqueue
[   37.722465] WARNING: CPU: 0 PID: 12 at drivers/gpu/drm/xe/xe_guc_submit.c:1379 guc_exec_queue_timedout_job+0x2f3/0xe00 [xe]
[   37.722559] Modules linked in: xt_REDIRECT nft_compat nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables qrtr sunrpc bnep snd_ctl_led snd_soc_s\
of_sdw snd_soc_intel_hda_dsp_common snd_soc_sdw_utils snd_sof_probes snd_soc_rt712_sdca regmap_sdw_mbq snd_hda_codec_intelhdmi regmap_sdw snd_soc_dmic snd_hda_intel snd_sof_pci_intel_mtl iwlmvm snd_sof_intel_hda_generic soundwire_intel snd_sof_intel_hda_sdw_bpt snd_sof_intel_hda_common snd_soc_hdac_hda snd_sof_intel_hda_mlink\
snd_sof_intel_hda snd_hda_codec_hdmi soundwire_cadence snd_sof_pci snd_sof_xtensa_dsp binfmt_misc snd_sof mac80211 vfat snd_sof_utils fat snd_hda_ext_core snd_hda_codec snd_hda_core snd_intel_dspcfg snd_intel_sdw_acpi snd_soc_acpi_intel_match snd_soc_acpi_intel_sdca_quirks soundwire_generic_allocation snd_soc_acpi snd_hwdep \
crc8 soundwire_bus libarc4 snd_soc_sdca snd_soc_core
[   37.722584]  snd_compress ac97_bus uvcvideo snd_pcm_dmaengine iwlwifi snd_seq uvc videobuf2_vmalloc snd_seq_device videobuf2_memops videobuf2_v4l2 snd_pcm processor_thermal_device_pci videobuf2_common processor_thermal_device btusb intel_uncore_frequency processor_thermal_wt_hint intel_uncore_frequency_common platform_temp\
erature_control videodev btmtk spi_nor processor_thermal_soc_slider x86_pkg_temp_thermal btrtl snd_timer iTCO_wdt processor_thermal_rfim intel_powerclamp btbcm intel_pmc_bxt snd intel_rapl_msr processor_thermal_rapl coretemp iTCO_vendor_support mei_gsc_proxy btintel intel_rapl_common rapl intel_cstate cfg80211 bluetooth mc in\
tel_pmc_core mtd soundcore acer_wmi mei_me intel_uncore processor_thermal_wt_req i2c_i801 spi_intel_pci pmt_telemetry platform_profile mei processor_thermal_power_floor spi_intel i2c_smbus pmt_discovery igen6_edac pcspkr rfkill wmi_bmof idma64 processor_thermal_mbox intel_hid pmt_class int3403_thermal int3400_thermal joydev i\
nt340x_thermal_zone acpi_pad sparse_keymap
[   37.722611]  intel_pmc_ssram_telemetry acpi_thermal_rel acer_wireless loop nfnetlink zram lz4hc_compress lz4_compress dm_crypt xe drm_ttm_helper drm_suballoc_helper gpu_sched drm_gpuvm drm_exec drm_gpusvm_helper i915 nvme i2c_algo_bit nvme_core drm_buddy ucsi_acpi ttm typec_ucsi typec nvme_keyring nvme_auth hkdf drm_displa\
y_helper hid_multitouch polyval_clmulni thunderbolt intel_vpu ghash_clmulni_intel cec vmd i2c_hid_acpi video intel_vsec i2c_hid wmi pinctrl_meteorlake serio_raw i2c_dev fuse
[   37.722638] CPU: 0 UID: 0 PID: 12 Comm: kworker/u88:0 Not tainted 6.18.0-rc2+ #37 PREEMPT(voluntary)
[   37.722641] Hardware name: Acer Swift SFG14-72/Coral_MTH, BIOS V1.01 11/06/2023
[   37.722643] Workqueue: gt-ordered-wq drm_sched_job_timedout [gpu_sched]
[   37.722649] RIP: 0010:guc_exec_queue_timedout_job+0x2f3/0xe00 [xe]
[   37.722722] Code: 4c 24 10 44 89 44 24 08 e8 5a 95 f1 d4 44 8b 44 24 08 8b 4c 24 10 48 c7 c7 00 b7 25 c1 48 8b 54 24 18 48 89 c6 e8 4d 59 37 d4 <0f> 0b 80 3c 24 00 0f 85 55 03 00 00 49 8b 47 58 a8 01 75 1a 49 8b
[   37.722723] RSP: 0018:ffffd468000f7d80 EFLAGS: 00010246
[   37.722725] RAX: 0000000000000000 RBX: ffff8e3d4e215c00 RCX: 0000000000000027
[   37.722726] RDX: ffff8e40ae61cfc8 RSI: 0000000000000001 RDI: ffff8e40ae61cfc0
[   37.722727] RBP: 00000000fffffffb R08: 0000000000000000 R09: ffffd468000f7c20
[   37.722727] R10: ffff8e40c09fffa8 R11: 00000000fffbffff R12: ffff8e3d44c00028
[   37.722728] R13: ffff8e3d807d4000 R14: ffff8e3d807d4018 R15: ffff8e3d95c9d600
[   37.722729] FS:  0000000000000000(0000) GS:ffff8e4116110000(0000) knlGS:0000000000000000
[   37.722729] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   37.722730] CR2: 00007ff1f3e02720 CR3: 0000000113c8d005 CR4: 0000000000f70ef0
[   37.722731] PKRU: 55555554
[   37.722731] Call Trace:
[   37.722734]  <TASK>
[   37.722735]  ? __pfx_autoremove_wake_function+0x10/0x10
[   37.722740]  drm_sched_job_timedout+0x81/0x170 [gpu_sched]

Fixes: 50292f9af8ec ("drm/xe: Move 'vm_max_level' flag back to platform descriptor")
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Gustavo Sousa <gustavo.sousa@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patch.msgid.link/20251108040634.6376-2-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe/vf: Enable VF resource fixup unconditionally

All the feature enabling code is in place - drop the debug flag
requirement for VF resource fixup.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251107161000.1938186-1-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>

drm/xe/tests: Add KUnit tests for PF fair provisioning

Add test cases to check outcome of fair GuC context or doorbells
IDs allocations for regular and admin-only PF mode.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patch.msgid.link/20251106165932.2143-1-michal.wajdeczko@intel.com

drm/xe/pf: Use migration-friendly doorbells auto-provisioning

Instead of trying very hard to find the largest fair number of GuC
doorbell IDs that could be allocated for VFs on the current GT, pick
some smaller rounded down to power-of-two value that is more likely
to be provisioned in the same manner by the other PF instance:

  num VFs | num doorbells
  --------+--------------
   63..32 | 4
   31..16 | 8
   15..8  | 16
    7..4  | 32
    3..2  | 64
       1  | 128 (regular PF)
       1  | 240 (admin only PF)

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patch.msgid.link/20251105183253.863-3-michal.wajdeczko@intel.com

drm/xe/pf: Use migration-friendly context IDs auto-provisioning

Instead of trying very hard to find the largest fair number of GuC
context IDs that could be allocated for VFs on the current GT, pick
some smaller rounded down to power-of-two value that is more likely
to be provisioned in the same manner by the other PF instance:

num VFs | num contexts
--------+-------------
  63..32 | 1024
  31..16 | 2048
  15..8  | 4096
   7..4  | 8192
   3..2  | 16384
      1  | 32768 (regular PF)
      1  | 64512 (admin only PF)

Add also helper function to determine if the PF is admin-only,
and for now use .probe_display flag for that.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patch.msgid.link/20251105183253.863-2-michal.wajdeczko@intel.com

Merge tag 'drm-misc-next-2025-11-05-1' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next

drm-misc-next for v6.19-rc1:

UAPI Changes:
- Add userptr support to ivpu.
- Add IOCTL's for resource and telemetry data in amdxdna.

Core Changes:
- Improve some atomic state checking handling.
- drm/client updates.
- Use forward declarations instead of including drm_print.h
- RUse allocation flags in ttm_pool/device_init and allow specifying max
  useful pool size and propagate ENOSPC.
- Updates and fixes to scheduler and bridge code.
- Add support for quirking DisplayID checksum errors.

Driver Changes:
- Assorted cleanups and fixes in rcar-du, accel/ivpu, panel/nv3052cf,
  sti, imxm, accel/qaic, accel/amdxdna, imagination, tidss, sti,
  panthor, vkms.
- Add Samsung S6E3FC2X01 DDIC/AMS641RW, Synaptics TDDI series DSI,
  TL121BVMS07-00 (IL79900A) panels.
- Add mali MediaTek MT8196 SoC gpu support.
- Add etnaviv GC8000 Nano Ultra VIP r6205 support.
- Document powervr ge7800 support in the devicetree.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patch.msgid.link/5afae707-c9aa-4a47-b726-5e1f1aa7a106@linux.intel.com

Merge tag 'drm-intel-next-2025-11-04' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-next

drm/i915 feature pull for v6.19:

Features and functionality:
- Enable LNL+ content adaptive sharpness filter (CASF) (Nemesa)
- Use optimized VRR guardband (Ankit, Ville)
- Enable Xe3p LT PHY (Suraj)
- Enable FBC support for Xe3p_LPD display (Sai Teja, Vinod)
- Specify DMC firmware for display version 30.02 (Dnyaneshwar)
- Report reason for disabling PSR to debugfs (Michał)
- Extend i915_display_info with Type-C port details (Khaled)
- Log DSI send packet sequence errors and contents

Refactoring and cleanups:
- Refactoring to prepare for VRR guardband optimization (Ankit)
- Abstract VRR live status wait (Ankit)
- Refactor VRR and DSB timing to handle Set Context Latency explicitly (Ankit)
- Helpers for prefill latency calculations (Ville)
- Refactor SKL+ watermark latency setup (Ville)
- VRR refactoring and cleanups (Ville)
- SKL+ universal plane cleanups (Ville)
- Decouple CDCLK from state->modeset refactor (Ville)
- Refactor VLV/CHV clock functions (Jani)
- Refactor fbdev handling (Jani)
- Call i915 and xe runtime PM from display via function pointers (Jouni)
- IRQ code refactoring (Jani)
- Drop display dependency on i915 feature check macros (Jani)
- Refactor and unify i915 and xe stolen memory interfaces towards display (Jani)
- Switch to driver agnostic drm to display pointer chase (Jani)
- Use display version over graphics version in display code (Matt A)
- GVT cleanups (Jonathan, Andi)
- Rename a VLV clock function to unify (Michał)
- Explicitly sanitize DMC package header num entries (Luca)
- Remove redundant port clock check from ALPM (Jouni)
- Use sysfs_emit() instead of sprintf() in PMU sysfs (Madhur Kumar)
- Clean up C20 PHY PLL register macros (Imre, Mika))
- Abstract "address in MMIO table" helper for general use (Matt A)
- Improve VRR platform abstractions (Ville)
- Move towards more standard PCI PM code usage (Ville)
- Framebuffer refactoring (Ville)
- Drop display dependency on i915_utils.h (Jani)
- Include cleanups (Jani)

Fixes:
- Workaround docking station DSC issues with high pixel clock and bpp (Imre)
- Fix Panel Replay in DSC mode (Imre)
- Disable tracepoints for PREEMPT_RT as a workaround (Maarten)
- Fix intel_crtc_get_vblank_counter() on PREEMPT_RT (Maarten)
- Fix C10 PHY identification on PTL/WCL (Dnyaneshwar)
- Take AS SDP into account with optimized guardband (Jouni)
- Fix panic structure allocation memory leak (Jani)
- Adjust an FBC workaround platforms (Vinod)
- Add fallback for CDCLK selection (Naladala)
- Avoid using invalid transcoder in MST transport select (Suraj)
- Don't use cursor size reduction on display version 14+ (Nemesa)
- Fix C20 PHY PLL register programming (Imre, Mika)
- Fix PSR frontbuffer flush handling (Jouni)
- Store ALPM parameters in crtc state (Jouni)
- Defeature DRRS on LNL+ (Ville)
- Fix the scope of the large DRAM DIMM workaround (Ville)
- Fix PICA vs. AUX power ordering issue (Gustavo)
- Fix pixel rate for computing watermark line time (Ville)
- Fix framebuffer set_tiling vs. addfb race (Ville)
- DMC event handler fixes (Ville)

DRM Core:
- CRTC sharpness strength property (Nemesa)
- DPCD DSC quirk for Synaptics Panamera devices (Imre)
- Helpers to query the branch DSC max throughput/line-width (Imre)

Merges:
- Backmerge drm-next for v6.18-rc and to sync with drm-xe-next (Jani)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patch.msgid.link/ec5a05f2df6d597a62033ee2d57225cce707b320@intel.com

drm/xe/xe3lpg: Extend Wa_15016589081 for xe3lpg

Wa_15016589081 applies to Xe3_LPG renderCS

Signed-off-by: Nitin Gote <nitin.r.gote@intel.com>
Link: https://patch.msgid.link/20251106100516.318863-2-nitin.r.gote@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>

drm/amd/pm: Update default power1_cap

Update default power1_cap to max limit for smu_v13_0_6 and smu_v13_0_12

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: skip writing eeprom when PMFW manages RAS data

Only update bad page number in legacy eeprom write path.

v2: add null pointer check for con.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Enable mst when it's detected but yet to be initialized

[Why]
drm_dp_mst_topology_queue_probe() is used under the assumption that
mst is already initialized. If we connect system with SST first
then switch to the mst branch during suspend, we will fail probing
topology by calling the wrong API since the mst manager is yet to
be initialized.

[How]
At dm_resume(), once it's detected as mst branc connected, check if
the mst is initialized already. If not, call
dm_helpers_dp_mst_start_top_mgr() instead to initialize mst

V2: Adjust the commit msg a bit

Fixes: bc068194f548 ("drm/amd/display: Don't write DP_MSTM_CTRL after LT")
Cc: Fangzhi Zuo <jerry.zuo@amd.com>
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: support to load RAS bad pages from PMFW

PMFW manages eeprom bad page records, update bad page loading
accrodingly.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Fix wait after reset sequence in S3

For a mode-1 reset done at the end of S3 on PSPv11 dGPUs, only check if
TOS is unloaded.

Fixes: 32f73741d6ee ("drm/amdgpu: Wait for bootloader after PSPv11 reset")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4649
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add ras_eeprom_read_idx interface

PMFW will manage RAS eeprom data by itself, add new interface to read
eeprom data via PMFW, we can read part of records by setting index.

v2: use IPID parse interface.
pa is not used and set it to a fixed value.
v3: optimize the null pointer check for IPID parse interface.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: make MCA IPID parse global

So we can call it in other blocks.

v2: add a new IPID parse interface for umc and we can
implement it for each ASIC.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd: Fix suspend failure with secure display TA

commit c760bcda83571 ("drm/amd: Check whether secure display TA loaded
successfully") attempted to fix extra messages, but failed to port the
cleanup that was in commit 5c6d52ff4b61e ("drm/amd: Don't try to enable
secure display TA multiple times") to prevent multiple tries.

Add that to the failure handling path even on a quick failure.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4679
Fixes: c760bcda8357 ("drm/amd: Check whether secure display TA loaded successfully")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/ras: Fix the issue of incorrect function call

When amdgpu_device_health_check fails, amdgpu_ras_pre_reset
will not be called and therefore amdgpu_ras_post_reset
cannot be called either.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fix gpu page fault after hibernation on PF passthrough

On PF passthrough environment, after hibernate and then resume, coralgemm
will cause gpu page fault.

Mode1 reset happens during hibernate, but partition mode is not restored
on resume, register mmCP_HYP_XCP_CTL and mmCP_PSP_XCP_CTL is not right
after resume. When CP access the MQD BO, wrong stride size is used,
this will cause out of bound access on the MQD BO, resulting page fault.

The fix is to ensure gfx_v9_4_3_switch_compute_partition() is called
when resume from a hibernation.
KFD resume is called separately during a reset recovery or resume from
suspend sequence. Hence it's not required to be called as part of
partition switch.

Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/ras: ras supports i2c eeprom for mp1 v13_0_12

ras supports i2c eeprom for mp1 v13_0_12.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Do not wait for queue op response during reset

This patch adds the condition to not wait for
the queue response for unmap, if the gpu is in reset.

Signed-off-by: Ahmad Rehman <Ahmad.Rehman@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/userq: need to unref bo

unref bo after amdgpu_bo_reserve() failure as it has
called amdgpu_bo_ref() already

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: initialize max record count after table reset

initialize max record count and record offset after table reset

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: check pmfw eeprom feature bit

get and check the pmfw eeprom feature bit to
decide if pmfw eeprom is supported

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add check function for pmfw eeprom

add check function for pmfw eeprom

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add initialization function for pmfw eeprom

add initialization function for pmfw eeprom

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: adapt reset function for pmfw eeprom

adapt reset function for pmfw eeprom

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/xe/gt_throttle: Avoid TOCTOU when monitoring reasons

It's currently not possible to safely monitor if there's throttling
happening and what are the reasons. The approach of reading the status
and then reading the reasons is not reliable as by the time sysadmin
reads the reason, the throttling could not be happening anymore.

Previous tentative to fix that[1] was breaking the ABI and potentially
sysadmin's scripts. This takes a different approach of adding and
documenting the additional attribute. It's still valuable, though
redundant, to provide the simpler 0/1 interface.

In order to avoid userspace knowledge on the bitmask meaning and to be
able to maintain the kernel side in sync with possible changes in
future, just walk the attribute group and check what are the masks that
match the value read.

[1] https://lore.kernel.org/intel-xe/20241025092238.167042-1-raag.jadav@intel.com/

Cc: Raag Jadav <raag.jadav@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://patch.msgid.link/20251104-gt-throttle-cri-v5-1-4948b060bbfd@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>

drm/xe: Remove never used code in xe_vm_create()

Clang is not happy with set but unused variable (this is visible
with `make LLVM=1` build:

drivers/gpu/drm/xe/xe_vm.c:1462:11: error: variable 'number_tiles' set
but not used [-Werror,-Wunused-but-set-variable]

The use of this variable was removed in the commit mentioned below as
"Fixes:" but only its declaration and update remain.
It seems like the variable is not used along with the assignment that
does not have side effects as far as I can see.
Remove those altogether.

Fixes: cb99e12ba8cb ("drm/xe: Decouple bind queue last fence from TLB invalidations")
Signed-off-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patch.msgid.link/20251105011311.3177875-1-gwan-gyeong.mun@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>

dt-bindings: gpu: img,powervr-rogue: Document GE7800 GPU in Renesas R-Car M3-N

Document Imagination Technologies PowerVR Rogue GE7800 BNVC 15.5.1.64
present in Renesas R-Car R8A77965 M3-N SoC.

Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Marek Vasut <marek.vasut+renesas@mailbox.org>
Reviewed-by: Matt Coster <matt.coster@imgtec.com>
Link: https://patch.msgid.link/20251104135716.12497-2-marek.vasut+renesas@mailbox.org
Signed-off-by: Matt Coster <matt.coster@imgtec.com>

dt-bindings: gpu: img,powervr-rogue: Keep lists sorted alphabetically

Sort the enum: list alphabetically. No functional change.

Signed-off-by: Marek Vasut <marek.vasut+renesas@mailbox.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Matt Coster <matt.coster@imgtec.com>
Link: https://patch.msgid.link/20251104135716.12497-1-marek.vasut+renesas@mailbox.org
Signed-off-by: Matt Coster <matt.coster@imgtec.com>

drm: rcar-du: fix incorrect return in rcar_du_crtc_cleanup()

The rcar_du_crtc_cleanup() function has a void return type, but
incorrectly uses a return statement with a call to drm_crtc_cleanup(),
which also returns void.

Remove the return statement to ensure proper function semantics.
No functional change intended.

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Reviewed-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Link: https://patch.msgid.link/20251017191634.1454201-1-alok.a.tiwari@oracle.com
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>

accel/ivpu: Improve debug and warning messages

Add IOCTL debug bit for logging user provided parameter validation
errors.

Refactor several warning and error messages to better reflect fault
reason. User generated faults should not flood kernel messages with
warnings or errors, so change those to ivpu_dbg(). Add additional debug
logs for parameter validation in IOCTLs.

Check size provided by in metric streamer start and return -EINVAL
together with a debug message print.

Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Link: https://patch.msgid.link/20251104132418.970784-1-karol.wachowski@linux.intel.com

drm/xe: Remove unused GT page fault code

With the Xe page fault layer and GuC page layer in place, this is now
dead code and can be removed. ACC code is also removed, but this was
dead code.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Tested-by: Francois Dugast <francois.dugast@intel.com>
Link: https://patch.msgid.link/20251031165416.2871503-8-matthew.brost@intel.com

drm/xe: Add xe_guc_pagefault layer

Add xe_guc_pagefault layer (producer) which parses G2H fault messages
messages into struct xe_pagefault, forwards them to the page fault layer
(consumer) for servicing, and provides a vfunc to acknowledge faults to
the GuC upon completion. Replace the old (and incorrect) GT page fault
layer with this new layer throughout the driver.

As part of this change, the ACC handling code has been removed, as it is
dead code that is currently unused.

v2:
- Include engine instance (Stuart)

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Tested-by: Francois Dugast <francois.dugast@intel.com>
Link: https://patch.msgid.link/20251031165416.2871503-7-matthew.brost@intel.com

drm/xe: Implement xe_pagefault_queue_work

Implement a worker that services page faults, using the same
implementation as in xe_gt_pagefault.c.

v2:
- Rebase on exhaustive eviction changes
- Include engine instance in debug prints (Stuart)

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Tested-by: Francois Dugast <francois.dugast@intel.com>
Link: https://patch.msgid.link/20251031165416.2871503-6-matthew.brost@intel.com

drm/xe: Implement xe_pagefault_handler

Enqueue (copy) the input struct xe_pagefault into a queue (i.e., into a
memory buffer) and schedule a worker to service it.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Tested-by: Francois Dugast <francois.dugast@intel.com>
Link: https://patch.msgid.link/20251031165416.2871503-5-matthew.brost@intel.com

drm/xe: Implement xe_pagefault_reset

Squash any pending faults on the GT being reset by setting the GT field
in struct xe_pagefault to NULL.

v4:
- Only do reset it page faults queues initialized (CI)

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Tested-by: Francois Dugast <francois.dugast@intel.com>
Link: https://patch.msgid.link/20251031165416.2871503-4-matthew.brost@intel.com

drm/xe: Implement xe_pagefault_init

Create pagefault queues and initialize them.

v2:
- Fix kernel doc + add comment for number PF queue (Francois)
v4:
- Move init after GT init (CI, Francois)

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Tested-by: Francois Dugast <francois.dugast@intel.com>
Link: https://patch.msgid.link/20251031165416.2871503-3-matthew.brost@intel.com

accel/amdxdna: Add IOCTL parameter for telemetry data

Extend DRM_IOCTL_AMDXDNA_GET_INFO to include additional parameters
that allow collection of telemetry data.

Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20251104062546.833771-3-lizhi.hou@amd.com

accel/amdxdna: Add IOCTL parameter for resource data

Extend DRM_IOCTL_AMDXDNA_GET_INFO to include additional parameters
that allow collection of resource data.

Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20251104062546.833771-2-lizhi.hou@amd.com

drm/xe: Stub out new pagefault layer

Stub out the new page fault layer and add kernel documentation. This is
intended as a replacement for the GT page fault layer, enabling multiple
producers to hook into a shared page fault consumer interface.

v2:
- Fix kernel doc typo (checkpatch)
- Remove comment around GT (Stuart)
- Add explaination around reclaim (Francois)
- Add comment around u8 vs enum (Francois)
- Include engine instance (Stuart)
v3:
- Fix XE_PAGEFAULT_TYPE_ATOMIC_ACCESS_VIOLATION kernel doc (Stuart)

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Tested-by: Francois Dugast <francois.dugast@intel.com>
Link: https://patch.msgid.link/20251031165416.2871503-2-matthew.brost@intel.com

accel/amdxdna: Add hardware specific attributes

Add three hardware specific attributes to describe device capabilities:
  hwctx_limit: The maximum number of hardware context supported.
  max_tops: The maximum TOPS supported.
  curr_tops: The TOPS achievable with the current power and frequency
             configuration.

Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20251104062546.833771-1-lizhi.hou@amd.com

drm/amdgpu: fix possible fence leaks from job structure

If we don't end up initializing the fences, free them when
we free the job. We can't set the hw_fence to NULL after
emitting it because we need it in the cleanup path for the
submit direct case.

v2: take a reference to the fences if we emit them
v3: handle non-job fence in error paths

Fixes: db36632ea51e ("drm/amdgpu: clean up and unify hw fence handling")
Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> (v1)
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: suspend ras module before gpu reset

During gpu reset, all GPU-related resources are
inaccessible. To avoid affecting ras functionality,
suspend ras module before gpu reset and resume
it after gpu reset is complete.

V2:
  Rename functions to avoid misunderstanding.

V3:
  Move flush_delayed_work to amdgpu_ras_process_pause,
  Move schedule_delayed_work to amdgpu_ras_process_unpause.

V4:
  Rename functions.

V5:
  Move the function to amdgpu_ras.c.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add wrapper functions for pmfw eeprom interface

add wrapper functions for pmfw eeprom interface, for these interfaces
to be easily and safely called

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add function to check if pmfw eeprom is supported

add function to check if pmfw is supported, skip eeprom
check and recover when pmfw eeprom is supported

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: add smu ras driver framework

add functions to get smu ras driver

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: implement ras_smu_drv interface for smu v13.0.12

implement ras_smu_drv interface for smu v13.0.12

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: add new message definitions for pmfw eeprom interface

Add new message definitions for pmfw eeprom interface

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Revert "drm/amdkfd: Improve signal event slow path"

To fix regression report on gfx8, which requires the exhaustive search
path for signaled event.

The high CPU usage of KFD interrupt wq issue is gone after HIP/ROCr add
option to reduce HW event interrupts, safe to revert this optimization
patch now.

This reverts commit de844846f72b152119faaef1b363448dc8ea368f.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Fix NULL deref in debugfs odm_combine_segments

When a connector is connected but inactive (e.g., disabled by desktop
environments), pipe_ctx->stream_res.tg will be destroyed. Then, reading
odm_combine_segments causes kernel NULL pointer dereference.

BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP NOPTI
CPU: 16 UID: 0 PID: 26474 Comm: cat Not tainted 6.17.0+ #2 PREEMPT(lazy)  e6a17af9ee6db7c63e9d90dbe5b28ccab67520c6
Hardware name: LENOVO 21Q4/LNVNB161216, BIOS PXCN25WW 03/27/2025
RIP: 0010:odm_combine_segments_show+0x93/0xf0 [amdgpu]
Code: 41 83 b8 b0 00 00 00 01 75 6e 48 98 ba a1 ff ff ff 48 c1 e0 0c 48 8d 8c 07 d8 02 00 00 48 85 c9 74 2d 48 8b bc 07 f0 08 00 00 <48> 8b 07 48 8b 80 08 02 00>
RSP: 0018:ffffd1bf4b953c58 EFLAGS: 00010286
RAX: 0000000000005000 RBX: ffff8e35976b02d0 RCX: ffff8e3aeed052d8
RDX: 00000000ffffffa1 RSI: ffff8e35a3120800 RDI: 0000000000000000
RBP: 0000000000000000 R08: ffff8e3580eb0000 R09: ffff8e35976b02d0
R10: ffffd1bf4b953c78 R11: 0000000000000000 R12: ffffd1bf4b953d08
R13: 0000000000040000 R14: 0000000000000001 R15: 0000000000000001
FS:  00007f44d3f9f740(0000) GS:ffff8e3caa47f000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000006485c2000 CR4: 0000000000f50ef0
PKRU: 55555554
Call Trace:
  <TASK>
  seq_read_iter+0x125/0x490
  ? __alloc_frozen_pages_noprof+0x18f/0x350
  seq_read+0x12c/0x170
  full_proxy_read+0x51/0x80
  vfs_read+0xbc/0x390
  ? __handle_mm_fault+0xa46/0xef0
  ? do_syscall_64+0x71/0x900
  ksys_read+0x73/0xf0
  do_syscall_64+0x71/0x900
  ? count_memcg_events+0xc2/0x190
  ? handle_mm_fault+0x1d7/0x2d0
  ? do_user_addr_fault+0x21a/0x690
  ? exc_page_fault+0x7e/0x1a0
  entry_SYSCALL_64_after_hwframe+0x6c/0x74
RIP: 0033:0x7f44d4031687
Code: 48 89 fa 4c 89 df e8 58 b3 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00>
RSP: 002b:00007ffdb4b5f0b0 EFLAGS: 00000202 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 00007f44d3f9f740 RCX: 00007f44d4031687
RDX: 0000000000040000 RSI: 00007f44d3f5e000 RDI: 0000000000000003
RBP: 0000000000040000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000202 R12: 00007f44d3f5e000
R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000040000
  </TASK>
Modules linked in: tls tcp_diag inet_diag xt_mark ccm snd_hrtimer snd_seq_dummy snd_seq_midi snd_seq_oss snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device x>
  snd_hda_codec_atihdmi snd_hda_codec_realtek_lib lenovo_wmi_helpers think_lmi snd_hda_codec_generic snd_hda_codec_hdmi snd_soc_core kvm snd_compress uvcvideo sn>
  platform_profile joydev amd_pmc mousedev mac_hid sch_fq_codel uinput i2c_dev parport_pc ppdev lp parport nvme_fabrics loop nfnetlink ip_tables x_tables dm_cryp>
CR2: 0000000000000000
---[ end trace 0000000000000000 ]---
RIP: 0010:odm_combine_segments_show+0x93/0xf0 [amdgpu]
Code: 41 83 b8 b0 00 00 00 01 75 6e 48 98 ba a1 ff ff ff 48 c1 e0 0c 48 8d 8c 07 d8 02 00 00 48 85 c9 74 2d 48 8b bc 07 f0 08 00 00 <48> 8b 07 48 8b 80 08 02 00>
RSP: 0018:ffffd1bf4b953c58 EFLAGS: 00010286
RAX: 0000000000005000 RBX: ffff8e35976b02d0 RCX: ffff8e3aeed052d8
RDX: 00000000ffffffa1 RSI: ffff8e35a3120800 RDI: 0000000000000000
RBP: 0000000000000000 R08: ffff8e3580eb0000 R09: ffff8e35976b02d0
R10: ffffd1bf4b953c78 R11: 0000000000000000 R12: ffffd1bf4b953d08
R13: 0000000000040000 R14: 0000000000000001 R15: 0000000000000001
FS:  00007f44d3f9f740(0000) GS:ffff8e3caa47f000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000006485c2000 CR4: 0000000000f50ef0
PKRU: 55555554

Fix this by checking pipe_ctx->stream_res.tg before dereferencing.

Fixes: 07926ba8a44f ("drm/amd/display: Add debugfs interface for ODM combine info")
Signed-off-by: Rong Zhang <i@rong.moe>
Reviewed-by: Mario Limoncello <mario.limonciello@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Don't clear PT after process killed

If process is killed. the vm entity is stopped, submit pt update job
will trigger the error message "*ERROR* Trying to push to a killed
entity", job will not execute.

Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/ras: Add ras support for umc v12_5_0

Add ras support for umc v12_5_0.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/ras: Add ras support for nbio v7_9_1

Add ras support for nbio v7_9_1.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add ras ip block name

Add ras ip block name.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/ras: Increase ras switch control range

Increase ras switch control range.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/smu: Handle S0ix for vangogh

Fix the flows for S0ix. There is no need to stop
rlc or reintialize PMFW in S0ix.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4659
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reported-by: Antheas Kapenekakis <lkml@antheas.dev>
Tested-by: Antheas Kapenekakis <lkml@antheas.dev>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Update SMUv13.0.12 partition metrics

Update SMUv13.0.12 partition metrics to partition metrics v1.1 schema.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Update SMUv13.0.6 partition metrics

For SMU v13.0.6 SOCs, move to partition metrics v1.1 schema

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Add schema v1.1 for parition metrics

Use a schema similar to gpu metrics v1.9 for partition metrics also. It
will have field type encoded followed by the field value(s). The
attribute ids used will be shared with gpu metrics. The structure
definition is only to distinguish between gpu metrics and partition
metrics though both gpu metrics v1.9 and partition metrics v1.1 follow
the same definition.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Use gpu metrics 1.9 for SMUv13.0.12

Fill and publish GPU metrics in v1.9 format for SMUv13.0.12 SOCs

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>