git.ipfire.org Git - thirdparty/kernel/stable.git/log

drm/xe: Never report L3 bank mask for media GT going forward

We currently report an L3 bank mask as part of the GT topology on both
GTs (primary and media) because a copy of the L3 bank fuse register
exists on both GTs (e.g., $gsi_offset + 0x9130 on Xe3).  After recent
discussions it's come to light that the only known userspace software
that uses this part of the uapi (the compute UMD and Mesa) only uses the
value reported for the primary GT; the value reported for the media GT
is ignored by both projects, and the media UMDs don't have any use for
L3 information today.  Since we always strive to have our uapi match the
specific needs of userspace and not include additional unused baggage,
let's officially drop L3 bank reporting on the media GT going forward
and only keep it around for the primary GT where it actually gets used.
This change will only apply to future platforms (Xe3 and later); even
though it would probably be safe to remove it from Xe1/Xe2 as well, we
don't want to take any chances with changing existing ABI.

Note that we'd already disabled reading/reporting of the L3 bank for the
media GT on PTL in commit 9ab440a9d042 ("drm/xe/ptl: L3bank mask is not
available on the media GT") because it was discovered that the copy of
the fuse registers on the media GT were just reporting a bogus ~0 value
rather than an accurate mask.  So this is just extending that PTL
behavior forward to WCL and other future platforms.  Note that we're
also free to reinstate this part of the uapi in the future if/when some
new userspace consumer emerges that _does_ have a use for media-specific
L3 bank masks.

Cc: Fei Yang <fei.yang@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://lore.kernel.org/r/20250905215614.796247-3-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>

drm/xe/debugfs: Don't expose dgfx residencies attributes on VF

In addition of checking if we are running on the BATTLEMAGE,
we should also check for not being a VF driver, as VFs can't
access necessary registers, and doing so leads to:

| .. [drm] GT0: VF is trying to read an inaccessible register 0x35b004+0x0
| RIP: 0010:xe_gt_sriov_vf_read32+0x5e2/0x8a0 [xe]
| Call Trace:
|  xe_mmio_read32+0x110/0x280 [xe]
|  read_residency_counter+0x42/0xd0 [xe]
|  dgfx_pkg_residencies_show+0x115/0x190 [xe]
| .. [drm] Package G2 counter failed to read, ret -19

or

| .. [drm] GT0: VF is trying to read an inaccessible register 0x35b004+0x0
| RIP: 0010:xe_gt_sriov_vf_read32+0x5e2/0x8a0 [xe]
| Call Trace:
|  xe_mmio_read32+0x110/0x280 [xe]
|  read_residency_counter+0x42/0xd0 [xe]
|  dgfx_pcie_link_residencies_show+0xe7/0x160 [xe]
| .. [drm] PCIE LINK L0 RESIDENCY counter failed to read, ret -19

Similarly, there is no point to expose inject_csc_hw_error on VFs,
as HW errors support is already disabled for VFs.

Bspec: 53221
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Soham Purkait <soham.purkait@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250905173625.8398-1-michal.wajdeczko@intel.com

drm/xe/hwmon: Use devm_mutex_init()

Use devm_mutex_init() instead of hand-writing it.

This saves some LoC, improves readability and saves some space in the
generated .o file.

Before:
======
   text    data     bss     dec     hex filename
  36884   10296      64   47244    b88c drivers/gpu/drm/xe/xe_hwmon.o

After:
=====
   text    data     bss     dec     hex filename
  36651   10224      64   46939    b75b drivers/gpu/drm/xe/xe_hwmon.o

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/989e96369e9e1f8a44b816962917ec76877c912d.1757252520.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>

drm/xe/debugfs: Make residencies definitions const

No need to keep them non-const. Also fix declaration of .name
member, as it points to the const string. This translates to:

  add/remove: 1/0 grow/shrink: 0/2 up/down: 80/-248 (-168)
  Function                                     old     new   delta
  residencies                                    -      80     +80
  dgfx_pcie_link_residencies_show              365     263    -102
  dgfx_pkg_residencies_show                    454     308    -146
  Total: Before=2821548, After=2821380, chg -0.01%

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Soham Purkait <soham.purkait@intel.com>
Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
Cc: Karthik Poosa <karthik.poosa@intel.com>
Cc: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250905180225.8434-1-michal.wajdeczko@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>

drm/xe/i2c: Enable bus mastering

Enable bus mastering for I2C controller to support device initiated
in-band transactions.

Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://lore.kernel.org/r/20250908055320.2549722-1-raag.jadav@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>

drm/xe/vf: Move VF CCS debugfs attribute

The VF CCS handling is per-device so its debugfs file should not
be exposed on per-GT basis. Move it up to the device level.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Link: https://lore.kernel.org/r/20250908123025.747-8-michal.wajdeczko@intel.com

drm/xe/vf: Move VF CCS data to xe_device

We only need single set of VF CCS contexts, they are not per-tile
as initial implementation might suggest. Move all VF CCS data from
xe_tile.sriov.vf to xe_device.sriov.vf. Also rename some structs to
align with the usage and fix their kernel-doc.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Link: https://lore.kernel.org/r/20250908123025.747-7-michal.wajdeczko@intel.com

drm/xe/bo: Add xe_bo_has_valid_ccs_bb helper

This will allow as to drop ugly IS_VF_CCS_BB_VALID macro.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Link: https://lore.kernel.org/r/20250908123025.747-6-michal.wajdeczko@intel.com

drm/xe/vf: Use single check when calling VF CCS functions

All xe_sriov_vf_ccs() functions but init() expect to be called
when initialization was successful and CCS handling is ready.

Update IS_VF_CCS_READY macro and use it as single entry guard.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Link: https://lore.kernel.org/r/20250908123025.747-5-michal.wajdeczko@intel.com

drm/xe/vf: Drop IS_VF_CCS_INIT_NEEDED macro

We only use this macro once and we can open-code it to explicitly
show relevant conditions and avoid duplications.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Link: https://lore.kernel.org/r/20250908123025.747-4-michal.wajdeczko@intel.com

drm/xe/guc: Use proper flag definitions when registering context

In H2G action context type is specified in flags dword in bits 2:1.
Use generic FIELD_PREP macro instead of misleading BIT logic.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250908123025.747-3-michal.wajdeczko@intel.com

drm/xe/guc: Rename xe_guc_register_exec_queue

This function is dedicated for use by the VFs, we shouldn't use
name that might suggests it's general purpose. While there, update
asserts to better reflect intended usage.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Link: https://lore.kernel.org/r/20250908123025.747-2-michal.wajdeczko@intel.com

drm/xe/configfs: Use config_group_put()

configfs has a config_group_put() helper that was adopted by
commit 88df7939d728 ("drm/xe/configfs: Rename struct xe_config_device").
Another pending work to add psmi later landed in commit
afe902848b41 ("drm/xe/configfs: Allow to enable PSMI") and didn't use
the helper.

Use config_group_put() consistently to hide the inner workings of
configfs. No change in behavior since it does exactly the same thing
as currently being done.

Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://lore.kernel.org/r/20250905162236.578117-2-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>

drm/xe/guc: Fix badly worded error message

If a GuC id lookup failed, the error message was 'Not engine present',
which is bad in multiple ways - incorrect English and 'engines' are
now called 'exec queues' in this context. So fix it.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com>
Link: https://lore.kernel.org/r/20250904195752.3846138-3-John.C.Harrison@Intel.com

drm/xe/guc: Clean up of GuC 'CTL' defines

All the field generation for the CTL defines (used for GuC init data)
were hand-rolled rather than using FIELD_PREP/REG_GENMASK/BIT macros.

Also, there were a bunch of macros defined for verbosity settings that
were never used.

So fix that all up.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250904195752.3846138-2-John.C.Harrison@Intel.com

drm/xe: Extend Wa_13011645652 to PTL-H, WCL

Expand workaround to additional graphics architectures.

Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: intel-xe@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v6.17+
Signed-off-by: Julia Filipchuk <julia.filipchuk@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250903190122.1028373-2-julia.filipchuk@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>

drm/xe: Block exec and rebind worker while evicting for suspend / hibernate

When the xe pm_notifier evicts for suspend / hibernate, there might be
racing tasks trying to re-validate again. This can lead to suspend taking
excessive time or get stuck in a live-lock. This behaviour becomes
much worse with the fix that actually makes re-validation bring back
bos to VRAM rather than letting them remain in TT.

Prevent that by having exec and the rebind worker waiting for a completion
that is set to block by the pm_notifier before suspend and is signaled
by the pm_notifier after resume / wakeup.

It's probably still possible to craft malicious applications that block
suspending. More work is pending to fix that.

v3:
- Avoid wait_for_completion() in the kernel worker since it could
  potentially cause work item flushes from freezable processes to
  wait forever. Instead terminate the rebind workers if needed and
  re-launch at resume. (Matt Auld)
v4:
- Fix some bad naming and leftover debug printouts.
- Fix kerneldoc.
- Use drmm_mutex_init() for the xe->rebind_resume_lock (Matt Auld).
- Rework the interface of xe_vm_rebind_resume_worker (Matt Auld).

Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4288
Fixes: c6a4d46ec1d7 ("drm/xe: evict user memory in PM notifier")
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: <stable@vger.kernel.org> # v6.16+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250904160715.2613-4-thomas.hellstrom@linux.intel.com

drm/xe: Allow the pm notifier to continue on failure

Its actions are opportunistic anyway and will be completed
on device suspend.

Marking as a fix to simplify backporting of the fix
that follows in the series.

v2:
- Keep the runtime pm reference over suspend / hibernate and
document why. (Matt Auld, Rodrigo Vivi):

Fixes: c6a4d46ec1d7 ("drm/xe: evict user memory in PM notifier")
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: <stable@vger.kernel.org> # v6.16+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250904160715.2613-3-thomas.hellstrom@linux.intel.com

drm/xe: Attempt to bring bos back to VRAM after eviction

VRAM+TT bos that are evicted from VRAM to TT may remain in
TT also after a revalidation following eviction or suspend.

This manifests itself as applications becoming sluggish
after buffer objects get evicted or after a resume from
suspend or hibernation.

If the bo supports placement in both VRAM and TT, and
we are on DGFX, mark the TT placement as fallback. This means
that it is tried only after VRAM + eviction.

This flaw has probably been present since the xe module was
upstreamed but use a Fixes: commit below where backporting is
likely to be simple. For earlier versions we need to open-
code the fallback algorithm in the driver.

v2:
- Remove check for dgfx. (Matthew Auld)
- Update the xe_dma_buf kunit test for the new strategy (CI)
- Allow dma-buf to pin in current placement (CI)
- Make xe_bo_validate() for pinned bos a NOP.

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5995
Fixes: a78a8da51b36 ("drm/ttm: replace busy placement with flags v6")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: <stable@vger.kernel.org> # v6.9+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250904160715.2613-2-thomas.hellstrom@linux.intel.com

drm/xe/migrate: Remove unneeded emit_pte() when copying CCS only

In xe_migrate_copy(), when copy_only_ccs is true, we only need two
emit_pte() calls one for the BO and one for the raw CCS storage.
However, the current implementation issues three emit_pte() calls,
resulting in an unnecessary PTE programming job.

This fix removes the redundant emit_pte() call to avoid programming
the same PTEs twice and reducing overhead during CCS-only migration.

v2: Preserve correct behavior on DG2, which requires both CCS and
page copies.

Signed-off-by: Sanjay Yadav <sanjay.kumar.yadav@intel.com>
Suggested-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250904161423.2448727-1-sanjay.kumar.yadav@intel.com

drm/xe: Fix broken kernel-doc for the struct xe_bo

Use correct multi-line kernel-doc style if required.
Some members were described only in the commit message.
Some other members were described using wrong names.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Link: https://lore.kernel.org/r/20250904144026.7222-1-michal.wajdeczko@intel.com

drm/xe/kunit: Drop xe_wa_test_exit

Remove xe_wa_test_exit() as it could crach the KUnit kernel in
case of hitting some asserts in xe_wa_test_init() as test->priv
could not be pointing to expected data.

|    # xe_wa_gt: ASSERTION FAILED at drivers/gpu/drm/xe/tests/xe_wa_test.c:34
|    Expected ret == 0, but
|        ret == -19 (0xffffffffffffffed)
|Bus error - the host /dev/shm or /tmp mount likely just ran out of space
|Kernel panic - not syncing: Kernel mode signal 7

Note that there is no need to call drm_kunit_helper_free_device()
since our fake device allocated by drm_kunit_helper_alloc_device()
will be cleaned up automatically.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250829171922.572-7-michal.wajdeczko@intel.com

drm/xe/kunit: Promote fake platform parameter list

The list of all known representative platforms defined in xe_wa
could be used in more places by other test suites.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250829171922.572-6-michal.wajdeczko@intel.com

drm/xe/kunit: Drop custom struct platform_test_case

Custom struct platform_test_case definition in xe_wa is now almost
identical to generic struct xe_pci_fake_data defintiion except the
.name member, which could be generated by xe_pci_fake_data_desc().

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250829171922.572-5-michal.wajdeczko@intel.com

drm/xe/kunit: Introduce xe_pci_fake_data_desc()

We already use struct xe_pci_fake_data to provide custom config of
the fake PCI device and soon we will be using this struct also as
direct parameter for the parameterized Xe KUnit tests.

Add function to generate description based on that config data.

For platform or subplatform name lookup pciidlist which already
have definitions of all supported platforms.

Examples:

  TIGERLAKE
  TIGERLAKE A0
  TIGERLAKE SR-IOV PF
  ...
  PANTHERLAKE 30.00(Xe3_LPG) 30.00(Xe3_LPM)
  PANTHERLAKE 30.00(Xe3_LPG) A0 30.00(Xe3_LPM) A0
  PANTHERLAKE 30.00(Xe3_LPG) A0 30.00(Xe3_LPM) A0 SR-IOV VF

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250829171922.572-4-michal.wajdeczko@intel.com

drm/xe/kunit: Update struct xe_pci_fake_data step declarations

The struct xe_pci_fake_data has fields that specify graphics and
media stepping of the fake PCI device used during KUnit testing.

Change definitions of those separate step fields and use existing
struct xe_step_info definition that already have required fields.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250829171922.572-3-michal.wajdeczko@intel.com

drm/xe: Allow to stub lookup for graphics and media IP

In upcoming patch we will want to replace lookup code during the
test to relax the strict match that we use in production.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250829171922.572-2-michal.wajdeczko@intel.com

drm/xe: improve dma-resv handling for backup object

Since the dma-resv is shared we don't need to reserve and add a fence
slot fence twice, plus no need to loop through the dependencies.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://lore.kernel.org/r/20250829164715.720735-2-matthew.auld@intel.com

drm/xe/pt: unify xe_pt_svm_pre_commit with userptr

We now use the same notifier lock for SVM and userptr, with that we can
combine xe_pt_userptr_pre_commit and xe_pt_svm_pre_commit.

v2: (Matt B)
  - Re-use xe_svm_notifier_lock/unlock for userptr.
  - Combine svm/userptr handling further down into op_check_svm_userptr.
v3:
  - Only hide the ops if we lack DRM_GPUSVM, since we also need them for
    userptr.

Suggested-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250828142430.615826-18-matthew.auld@intel.com

drm/xe/userptr: replace xe_hmm with gpusvm

Goal here is cut over to gpusvm and remove xe_hmm, relying instead on
common code. The core facilities we need are get_pages(), unmap_pages()
and free_pages() for a given useptr range, plus a vm level notifier
lock, which is now provided by gpusvm.

v2:
  - Reuse the same SVM vm struct we use for full SVM, that way we can
    use the same lock (Matt B & Himal)
v3:
  - Re-use svm_init/fini for userptr.
v4:
  - Allow building xe without userptr if we are missing DRM_GPUSVM
    config. (Matt B)
  - Always make .read_only match xe_vma_read_only() for the ctx. (Dafna)
v5:
  - Fix missing conversion with CONFIG_DRM_XE_USERPTR_INVAL_INJECT
v6:
  - Convert the new user in xe_vm_madise.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Dafna Hirschfeld <dafna.hirschfeld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250828142430.615826-17-matthew.auld@intel.com

drm/xe/vm: split userptr bits into separate file

This will simplify compiling out the bits that depend on DRM_GPUSVM in a
later patch. Without this we end up littering the code with ifdef
checks, plus it becomes hard to be sure that something won't blow at
runtime due to something not being initialised, even though it passed
the build. Should be no functional change here.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250828142430.615826-16-matthew.auld@intel.com

drm/gpusvm: export drm_gpusvm_pages API

Export get/unmap/free pages API. We also need to tweak the SVM init to
allow skipping much of the unneeded parts.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250828142430.615826-15-matthew.auld@intel.com

drm/gpusvm: refactor core API to use pages struct

Refactor the core API of get/unmap/free pages to all operate on
drm_gpusvm_pages. In the next patch we want to export a simplified core
API without needing fully blown svm range etc.

Suggested-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250828142430.615826-14-matthew.auld@intel.com

drm/gpusvm: pull out drm_gpusvm_pages substructure

Pull the pages stuff from the svm range into its own substructure, with
the idea of having the main pages related routines, like get_pages(),
unmap_pages() and free_pages() all operating on some lower level
structures, which can then be re-used for stuff like userptr.

v2:
- Move seq into pages struct (Matt B)
v3:
- Small kernel-doc fixes

Suggested-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250828142430.615826-13-matthew.auld@intel.com

drm/gpusvm: use more selective dma dir in get_pages()

If we are only reading the memory then from the device pov the direction
can be DMA_TO_DEVICE. This aligns with the xe-userptr code. Using the
most restrictive data direction to represent the access is normally a
good idea.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250828142430.615826-12-matthew.auld@intel.com

drm/gpusvm: fix hmm_pfn_to_map_order() usage

Handle the case where the hmm range partially covers a huge page (like
2M), otherwise we can potentially end up doing something nasty like
mapping memory which is outside the range, and maybe not even mapped by
the mm. Fix is based on the xe userptr code, which in a future patch
will directly use gpusvm, so needs alignment here.

v2:
- Add kernel-doc (Matt B)
- s/fls/ilog2/ (Thomas)

Reported-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://lore.kernel.org/r/20250828142430.615826-11-matthew.auld@intel.com

drm/xe/xe2hpg: Add Wa_18041344222 for Xe2_HPG

Add Wa_18041344222 for Xe2_HPG that requires disabling
the perf mode for subslice count for eustall sampling
when the enabled slices are discontiguous.

Bspec: 79483, 56024
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/b6a631a13a9fb7360e89d679e0797fae42d5a09e.1756855529.git.harish.chegondi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>

drm/xe/mcr: Make xe_gt_mcr_get_dss_steering() input gt a const

Make gt, input parameter to xe_gt_mcr_get_dss_steering(), a
constant. This would allow xe_gt_mcr_get_dss_steering() to
be called from functions that have gt as const to struct xe_gt.

Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Link: https://lore.kernel.org/r/9dc621a90880f62ac8e2951afea7952277f7eb0e.1756855529.git.harish.chegondi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>

drm/xe/configfs: Don't expose survivability_mode if not applicable

The survivability_mode attribute is applicable only for DGFX and
platforms newer than BATTLEMAGE. Use .is_visible() hook to hide
this attribute when above conditions are not met. Remove code that
was trying to fix such configuration during the runtime.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/20250902131744.5076-4-michal.wajdeczko@intel.com

drm/xe/configfs: Prepare to filter-out configfs attributes

Implement empty ops.is_visible hook to allow filtering-out any not
supported attributes, as not all of them are applicable on all xe
platforms.

Since during creation of each new configfs directory we are looking
for xe device descriptor to validate that xe driver supports given
PCI device, store reference to that descriptor to allow later use
while doing attribute filtering.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/20250902131744.5076-3-michal.wajdeczko@intel.com

drm/xe/configfs: Don't touch survivability_mode on fini

This is a user controlled configfs attribute, we should not
modify that outside the configfs attr.store() implementation.

Fixes: bc417e54e24b ("drm/xe: Enable configfs support for survivability mode")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250904103521.7130-1-michal.wajdeczko@intel.com

drm/xe/guc: Set upper limit of H2G retries over CTB

The GuC communication protocol allows GuC to send NO_RESPONSE_RETRY
reply message to indicate that due to some interim condition it can
not handle incoming H2G request and the host shall resend it.

But in some cases, due to errors, this unsatisfied condition might
be final and this could lead to endless retries as it was recently
seen on the CI:

[drm] GT0: PF: VF1 FLR didn't finish in 5000 ms (-ETIMEDOUT)
[drm] GT0: PF: VF1 resource sanitizing failed (-ETIMEDOUT)
[drm] GT0: PF: VF1 FLR failed!
[drm:guc_ct_send_recv [xe]] GT0: H2G action 0x5503 retrying: reason 0x0
[drm:guc_ct_send_recv [xe]] GT0: H2G action 0x5503 retrying: reason 0x0
[drm:guc_ct_send_recv [xe]] GT0: H2G action 0x5503 retrying: reason 0x0
[drm:guc_ct_send_recv [xe]] GT0: H2G action 0x5503 retrying: reason 0x0

To avoid such dangerous loops allow only limited number of retries
(for now 50) and add some delays (n * 5ms) to slow down the rate of
resending this repeated request.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com>
Link: https://lore.kernel.org/r/20250903223330.6408-1-michal.wajdeczko@intel.com

drm/xe/debugfs: Move sa_info from gt to tile directory

Our drm-based suballocator is implemented per-tile so it is better
to show its debug information also per-tile debugfs directory, not
under per-gt directory as it is done today.

To allow adding more per-tile attributes, prepare necessary helper
functions, like we already did for per-gt or per-uc attributes.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250829201106.1263-1-michal.wajdeczko@intel.com

drm/xe/vm: Fix error handling in xe_vm_query_vmas_attrs_ioctl()

copy_to_user() returns the number of bytes not copied on failure, not a
negative error code. Update the logic to return -EFAULT instead of the
number of bytes to correctly signal the error.

Fixes: 418807860e94 ("drm/xe/uapi: Add UAPI for querying VMA count and memory attributes")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Link: https://lore.kernel.org/r/20250828104933.3839825-3-himal.prasad.ghimiray@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>

drm/xe: Fix indentation in xe_zap_ptes_in_madvise_range

Fix misleading indentation around WRITE_ONCE in pte zap loop.
No functional change intended.

Fixes: ada7486c5668 ("drm/xe: Implement madvise ioctl for xe")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Link: https://lore.kernel.org/r/20250828104933.3839825-2-himal.prasad.ghimiray@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>

drm/xe: Add more SVM GT stats

Add more SVM GT stats which give visibility to where time is spent in
the SVM page fault handler. Stats include number of faults at a given
size, total SVM page fault time, migration time in us, copy time in us,
copy kb, get pages time in us, and bind time in us. Will help in tuning
SVM for performance.

v2:
- Include local changes
v3:
- Add tlb invalidation + valid page fault + per size copy size stats
v4:
- Ensure gt not NULL when incrementing SVM copy stats
- Normalize stats names
- Use magic macros to generate increment functions for ranges
v7:
- Use DEF_STAT_STR (Michal)

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Link: https://lore.kernel.org/r/20250829172232.1308004-3-matthew.brost@intel.com

drm/xe: Add clearing stats to GT debugfs

It helpful to clear GT stats, run a test cases which is being profiled,
and look at the results of the stats from the individual test case. Make
stats entry writable and upon write clear the stats.

v5:
- Drop clear_stats debugfs entry (Lucas)
v6:
- Use xe_gt_stats_clear rather than helper (Michal)
- Rework loop in xe_gt_stats_clear (Michal)

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://lore.kernel.org/r/20250829172232.1308004-2-matthew.brost@intel.com

drm/xe: Extend Wa_22021007897 to Xe3 platforms

WA 22021007897 should also be applied to Graphics Versions 30.00, 30.01
and 30.03. To make it simple, simply use the range [3000, 3003] that
should be ok as there isn't a 3002 and if it's added, the WA list would
need to be revisited anyway.

Cc: Matt Atwood <matthew.s.atwood@intel.com>
Cc: Gustavo Sousa <gustavo.sousa@intel.com>
Signed-off-by: Tangudu Tilak Tirumalesh <tilak.tirumalesh.tangudu@intel.com>
Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com>
Link: https://lore.kernel.org/r/20250827-wa-22021007897-v1-1-96922eb52af4@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>

drm/xe/guc: Increase GuC crash dump buffer size

There are platforms already have a maximum dump size of 12KB, to avoid
data truncating, increase GuC crash dump buffer size to 16KB.

Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://lore.kernel.org/r/20250829160427.1245732-1-zhanjun.dong@intel.com

drm/xe/vf: Enable CCS save/restore only on supported GUC versions

CCS save/restore is supported starting with GuC 70.48.0 (compatibility
version 1.23.0). Gate the feature on the GuC firmware version and keep it
disabled on older or unsupported versions.

Fixes: f3009272ff2e ("drm/xe/vf: Create contexts for CCS read write")
Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Andi Shyti <andi.shyti@kernel.org>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://lore.kernel.org/r/20250902103256.21658-2-satyanarayana.k.v.p@intel.com

drm/xe/guc: Add devm release action to safely tear down CT

When a buffer object (BO) is allocated with the XE_BO_FLAG_GGTT_INVALIDATE
flag, the driver initiates TLB invalidation requests via the CTB mechanism
while releasing the BO. However a premature release of the CTB BO can lead
to system crashes, as observed in:

Oops: Oops: 0000 [#1] SMP NOPTI
RIP: 0010:h2g_write+0x2f3/0x7c0 [xe]
Call Trace:
guc_ct_send_locked+0x8b/0x670 [xe]
xe_guc_ct_send_locked+0x19/0x60 [xe]
send_tlb_invalidation+0xb4/0x460 [xe]
xe_gt_tlb_invalidation_ggtt+0x15e/0x2e0 [xe]
ggtt_invalidate_gt_tlb.part.0+0x16/0x90 [xe]
ggtt_node_remove+0x110/0x140 [xe]
xe_ggtt_node_remove+0x40/0xa0 [xe]
xe_ggtt_remove_bo+0x87/0x250 [xe]

Introduce a devm-managed release action during xe_guc_ct_init() and
xe_guc_ct_init_post_hwconfig() to ensure proper CTB disablement before
resource deallocation, preventing the use-after-free scenario.

Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Summers Stuart <stuart.summers@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://lore.kernel.org/r/20250901072541.31461-1-satyanarayana.k.v.p@intel.com

drm/xe: Fix incorrect migration of backed-up object to VRAM

If an object is backed up to shmem it is incorrectly identified
as not having valid data by the move code. This means moving
to VRAM skips the -EMULTIHOP step and the bo is cleared. This
causes all sorts of weird behaviour on DGFX if an already evicted
object is targeted by the shrinker.

Fix this by using ttm_tt_is_swapped() to identify backed-up
objects.

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5996
Fixes: 00c8efc3180f ("drm/xe: Add a shrinker for xe bos")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: <stable@vger.kernel.org> # v6.15+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250828134837.5709-1-thomas.hellstrom@linux.intel.com

drm/xe/uapi: Fix kernel-doc formatting for madvise and vma_query

Correct kernel-doc formatting issues in the UAPI definitions for
madvise and VMA query interfaces to resolve docutils warnings during
documentation build.

Fixes: 418807860e94 ("drm/xe/uapi: Add UAPI for querying VMA count and memory attributes")
Fixes: 231bb0ee7aa5 ("drm/xe/uapi: Add madvise interface")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250828071516.3838110-1-himal.prasad.ghimiray@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>

drm/xe/nvm: Use root tile mmio

To allow initialization of nvm during early probe for future usecases,
use root tile instead of root gt to access mmios, as gt is not
yet initialized at early probe.

v2: fix commit message (Lucas)

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250825103537.2551837-1-riana.tauro@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>

drm/xe/tests: Make cross-device dma-buf BOs CPU-visible on small BAR

Small-BAR systems (e.g., SR-IOV VFs in VMs) expose only a subset of
VRAM via PCI/BAR. Exporting a BO outside that window fails, and the
selftests also do CPU fill/verify.

Set XE_BO_FLAG_NEEDS_CPU_ACCESS for cross-device variants to force
CPU-mappable placement and keep tests reliable. Large-BAR/P2P setups
are unaffected.

Suggested-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250814145950.430231-1-marcin.bernatowicz@linux.intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>

drm/xe/migrate: make MI_TLB_INVALIDATE conditional

When clearing VRAM we should be able to skip invalidating the TLBs if we
are only using the identity map to access VRAM (which is the common
case), since no modifications are made to PTEs on the fly. Also since we
use huge 1G entries within the identity map, there should be a pretty
decent chance that the next packet(s) (if also clears) can avoid a tree
walk if we don't shoot down the TLBs, like if we have to process a long
stream of clears.

For normal moves/copies, we usually always end up with the src or dst
being system memory, meaning we can't only rely on the identity map and
will also need to emit PTEs and so will always require a TLB flush.

v2:
- Update commit to explain the situation for normal copies (Matt B)
- Rebase on latest changes

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250808110452.467513-2-matthew.auld@intel.com

drm/xe: Split TLB invalidation code in frontend and backend

The frontend exposes an API to the driver to send invalidations, handles
sequence number assignment, synchronization (fences), and provides a
timeout mechanism. The backend issues the actual invalidation to the
hardware (or firmware).

The new layering easily allows issuing TLB invalidations to different
hardware or firmware interfaces.

Normalize some naming while here too.

Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250826182911.392550-10-stuart.summers@intel.com

drm/xe: Add helpers to send TLB invalidations

Break out the GuC specific code into helpers as part of the process to
decouple frontback TLB invalidation code from the backend.

Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250826182911.392550-9-stuart.summers@intel.com

drm/xe: Prep TLB invalidation fence before sending

It is a bit backwards to add a TLB invalidation fence to the pending
list after issuing the invalidation. Perform this step before issuing
the TLB invalidation in a helper function.

v2: Make sure the seqno_lock mutex covers the send as well (Matt)

Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250826182911.392550-8-stuart.summers@intel.com

drm/xe: Decouple TLB invalidations from GT

Decouple TLB invalidations from the GT by updating the TLB invalidation
layer to accept a `struct xe_tlb_inval` instead of a `struct xe_gt`.
Also, rename *gt_tlb* to *tlb*. The internals of the TLB invalidation
code still operate on a GT, but this is now hidden from the rest of the
driver.

Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250826182911.392550-7-stuart.summers@intel.com

drm/xe: Add xe_gt_tlb_invalidation_done_handler

Decouple GT TLB seqno handling from G2H handler.

v2:
- Add kernel doc

Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250826182911.392550-6-stuart.summers@intel.com

drm/xe: Add xe_tlb_inval structure

Extract TLB invalidation state into a structure to decouple TLB
invalidations from the GT, allowing the structure to be embedded
anywhere in the driver.

Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250826182911.392550-5-stuart.summers@intel.com

drm/xe: s/tlb_invalidation/tlb_inval

tlb_invalidation is a bit verbose leading to ugly wraps in the code,
shorten to tlb_inval.

Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250826182911.392550-4-stuart.summers@intel.com

drm/xe: Cancel pending TLB inval workers on teardown

Add a new _fini() routine on the GT TLB invalidation
side to handle this worker cleanup on driver teardown.

v2: Move the TLB teardown to the gt fini() routine called during
gt_init rather than in gt_alloc. This way the GT structure stays
alive for while we reset the TLB state.

Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250826182911.392550-3-stuart.summers@intel.com

drm/xe: Move explicit CT lock in TLB invalidation sequence

Currently the CT lock is used to cover TLB invalidation
sequence number updates. In an effort to separate the GuC
back end tracking of communication with the firmware from
the front end TLB sequence number tracking, add a new lock
here to specifically track those sequence number updates
coming in from the user.

Apart from the CT lock, we also have a pending lock to
cover both pending fences and sequence numbers received
from the back end. Those cover interrupt cases and so
it makes not to overload those with sequence numbers
coming in from new transactions. In that way, we'll employ
a mutex here.

v2: Actually add the correct lock rather than just dropping
it... (Matt)

Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250826182911.392550-2-stuart.summers@intel.com

drm/xe/configfs: Block runtime attribute changes

Although it's possible to change the attributes in runtime, they have no
effect after the driver is already bound to the device. Check for that
and return -EBUSY in that case.

This should help users understand what's going on when the behavior is
not changing even if the value from the configfs is "right", but it got
to that state too late.

Reviewed-by: Riana Tauro <riana.tauro@intel.com>
Link: https://lore.kernel.org/r/20250826153210.3068808-2-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>

drm/xe: Ensure GT is in C0 during resumes

This patch ensures the gt will be awake for the entire duration
of the resume sequences until GuCRC takes over and GT-C6 gets
re-enabled.

Before suspending GT-C6 is kept enabled, but upon resume, GuCRC
is not yet alive to properly control the exits and some cases of
instability and corruption related to GT-C6 can be observed.

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4037
Suggested-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Xin Wang <x.wang@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4037
Link: https://lore.kernel.org/r/20250827000633.1369890-3-x.wang@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: make xe_gt_idle_disable_c6() handle the forcewake internally

Move forcewake_get() into xe_gt_idle_enable_c6() to streamline the
code and make it easier to use.

Suggested-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Xin Wang <x.wang@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250827000633.1369890-2-x.wang@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe/wcl: Extend L3bank mask workaround

The commit 9ab440a9d042 ("drm/xe/ptl: L3bank mask is not
available on the media GT") added a workaround to ignore
the fuse register that L3 bank availability as it did not
contain valid values. Same is true for WCL therefore extend
the workaround to cover it.

Signed-off-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
Reviewed-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com>
Link: https://lore.kernel.org/r/20250822002512.1129144-1-chaitanya.kumar.borah@intel.com
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>

drm/xe/xe_hw_error: Add fault injection to trigger csc error handler

Add a debugfs fault handler to trigger csc error handler that
wedges the device and enables runtime survivability mode.

v2: add debugfs only for bmg (Umesh)
v3: do not use csc_fault attribute if debugfs is not enabled
v4: rebase

Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://lore.kernel.org/r/20250826063419.3022216-11-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe/xe_hw_error: Handle CSC Firmware reported Hardware errors

Add support to handle CSC firmware reported errors. When CSC firmware
errors are encoutered, a error interrupt is received by the GFX device as
a MSI interrupt.

Device Source control registers indicates the source of the error as CSC
The HEC error status register indicates that the error is firmware reported
Depending on the type of error, the error cause is written to the HEC
Firmware error register.

On encountering such CSC firmware errors, the graphics device is
non-recoverable from driver context. The only way to recover from these
errors is firmware flash.

System admin/userspace is notified of the necessity of firmware flash
with a combination of vendor-specific drm device edged uevent, dmesg logs
and runtime survivability sysfs. It is the responsiblity of the consumer
to verify all the actions and then trigger a firmware flash using tools
like fwupd.

$ udevadm monitor --property --kernel
monitor will print the received events for:
KERNEL - the kernel uevent

KERNEL[754.709341] change   /devices/pci0000:00/0000:00:01.0/0000:01:00.0/0000:02:01.0/0000:03:00.0/drm/card0 (drm)
ACTION=change
DEVPATH=/devices/pci0000:00/0000:00:01.0/0000:01:00.0/0000:02:01.0/0000:03:00.0/drm/card0
SUBSYSTEM=drm
WEDGED=vendor-specific
DEVNAME=/dev/dri/card0
DEVTYPE=drm_minor
SEQNUM=5973
MAJOR=226
MINOR=0

Logs

xe 0000:03:00.0: [drm] *ERROR* [Hardware Error]: Tile0 reported NONFATAL error 0x20000
xe 0000:03:00.0: [drm] *ERROR* [Hardware Error]: NONFATAL: HEC Uncorrected FW FD Corruption error reported, bit[2] is set
xe 0000:03:00.0: Runtime Survivability mode enabled
xe 0000:03:00.0: [drm] *ERROR* CRITICAL: Xe has declared device 0000:03:00.0 as wedged.
               IOCTLs and executions are blocked. Only a rebind may clear the failure
               Please file a _new_ bug report at https://gitlab.freedesktop.org/drm/xe/kernel/issues/new
xe 0000:03:00.0: [drm] device wedged, needs recovery
xe 0000:03:00.0: Firmware flash required, Please refer to the userspace documentation for more details!

Runtime survivability Sysfs:

/sys/bus/pci/devices/<device>/survivability_mode

v2: use vendor recovery method with
    runtime survivability (Christian, Rodrigo, Raag)
v3: move declare wedged to runtime survivability mode (Rodrigo)
v4: update commit message

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://lore.kernel.org/r/20250826063419.3022216-10-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Add support to handle hardware errors

Gfx device reports two classes of errors: uncorrectable and
correctable. Depending on the severity uncorrectable errors are further
classified Non-Fatal and Fatal.

Correctable and Non-Fatal errors: These errors are reported as MSI. Bits in
the Master Interrupt Register indicate the class of the error.
The source of the error is then read from the Device Error Source
Register.

Fatal errors: These are reported as PCIe errors
When a PCIe error is asserted, the OS will perform a SBR (Secondary
Bus reset) which causes the driver to reload. The error registers are
sticky and the values are maintained through SBR.

Add basic support to handle these errors.

Bspec: 50875, 53073, 53074, 53075, 53076

v2: Format commit message (Umesh)
v3: fix documentation (Stuart)

Cc: Stuart Summers <stuart.summers@intel.com>
Co-developed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://lore.kernel.org/r/20250826063419.3022216-9-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe/doc: Document device wedged and runtime survivability

Add documentation for vendor specific device wedged recovery method
and runtime survivability.

v2: fix documentation (Raag)
v3: add userspace tool for firmware update (Raag)
v4: use consistent documentation (Raag)
v5: add more documentation

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://lore.kernel.org/r/20250826063419.3022216-8-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe/xe_survivability: Add support for Runtime survivability mode

Certain runtime firmware errors can cause the device to be in a unusable
state requiring a firmware flash to restore normal operation.
Runtime Survivability Mode indicates firmware flash is necessary by
wedging the device and exposing survivability mode sysfs.

The below sysfs is an indication that device is in survivability mode

/sys/bus/pci/devices/<device>/survivability_mode

v2: Fix kernel-doc (Umesh)
v3: Add user friendly dmesg (Frank)

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://lore.kernel.org/r/20250826063419.3022216-7-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe/xe_survivability: Refactor survivability mode

Refactor survivability mode code to support both boot
and runtime survivability.

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://lore.kernel.org/r/20250826063419.3022216-6-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Add a helper function to set recovery method

Add a helper function to set recovery method. The recovery
method can be set before declaring the device wedged and sending the
drm wedged uevent. If no method is set, default unbind/re-bind method
will be set.

v2: fix documentation (Raag)

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://lore.kernel.org/r/20250826063419.3022216-5-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Set GT as wedged before sending wedged uevent

Userspace should be notified after setting the device as wedged.
Re-order function calls to set gt wedged before sending uevent.

Cc: Matthew Brost <matthew.brost@intel.com>
Suggested-by: Raag Jadav <raag.jadav@intel.com>
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250826063419.3022216-4-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm: Add a vendor-specific recovery method to drm device wedged uevent

Address the need for a recovery method (firmware flash on Firmware errors)
introduced in the later patches of Xe KMD.
Whenever XE KMD detects a firmware error, a firmware flash is required to
recover the device to normal operation.

The initial proposal to use 'firmware-flash' as a recovery method was
not applicable to other drivers and could cause multiple recovery
methods specific to vendors to be added.
To address this a more generic 'vendor-specific' method is introduced,
guiding users to refer to vendor specific documentation and system logs
for detailed vendor specific recovery procedure.

Add a recovery method 'WEDGED=vendor-specific' for such errors.
Vendors must provide additional recovery documentation if this method
is used.

It is the responsibility of the consumer to refer to the correct vendor
specific documentation and usecase before attempting a recovery.

For example: If driver is XE KMD, the consumer must refer
to the documentation of 'Device Wedging' under 'Documentation/gpu/xe/'.

v2: fix documentation (Raag)
v3: add more details to commit message (Sima, Rodrigo, Raag)
add an example script to the documentation (Raag)
v4: use consistent naming (Raag)
v5: fix commit message
v6: add more documentation

Cc: André Almeida <andrealmeid@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona.vetter@ffwll.ch>
Cc: Maxime Ripard <mripard@kernel.org>
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Maxime Ripard <mripard@kernel.org>
Link: https://lore.kernel.org/r/20250826063419.3022216-3-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Add documentation for Xe Device Wedging

Add documentation for Xe Device Wedging so that
file can be referenced in following patches.

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250826063419.3022216-2-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe/uapi: Add UAPI for querying VMA count and memory attributes

Introduce the DRM_IOCTL_XE_VM_QUERY_MEMORY_RANGE_ATTRS ioctl to allow
userspace to query memory attributes of VMAs within a user specified
virtual address range.

Userspace first calls the ioctl with num_mem_ranges = 0,
sizeof_mem_ranges_attr = 0 and vector_of_vma_mem_attr = NULL to retrieve
the number of memory ranges (vmas) and size of each memory range attribute.
Then, it allocates a buffer of that size and calls the ioctl again to fill
the buffer with memory range attributes.

This two-step interface allows userspace to first query the required
buffer size, then retrieve detailed attributes efficiently.

v2 (Matthew Brost)
- Use same ioctl to overload functionality

v3
- Add kernel-doc

v4
- Make uapi future proof by passing struct size (Matthew Brost)
- make lock interruptible (Matthew Brost)
- set reserved bits to zero (Matthew Brost)
- s/__copy_to_user/copy_to_user (Matthew Brost)
- Avod using VMA term in uapi (Thomas)
- xe_vm_put(vm) is missing (Shuicheng)

v5
- Nits
- Fix kernel-doc

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Shuicheng Lin <shuicheng.lin@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250821173104.3030148-21-himal.prasad.ghimiray@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>

drm/xe: Enable madvise ioctl for xe

Ioctl enables setting up of memory attributes in user provided range.

Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250821173104.3030148-20-himal.prasad.ghimiray@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>

drm/xe: Reset VMA attributes to default in SVM garbage collector

Restore default memory attributes for VMAs during garbage collection
if they were modified by madvise. Reuse existing VMA if fully overlapping;
otherwise, allocate a new mirror VMA.

v2 (Matthew Brost)
- Add helper for vma split
- Add retry to get updated vma

v3
- Rebase on gpuvm layer

Suggested-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250821173104.3030148-19-himal.prasad.ghimiray@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>

drm/xe/vm: Add helper to check for default VMA memory attributes

Introduce a new helper function `xe_vma_has_default_mem_attrs()` to
determine whether a VMA's memory attributes are set to their default
values. This includes checks for atomic access, PAT index, and preferred
location.

Also, add a new field `default_pat_index` to `struct xe_vma_mem_attr`
to track the initial PAT index set during the first bind. This helps
distinguish between default and user-modified pat index, such as those
changed via madvise.

Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250821173104.3030148-18-himal.prasad.ghimiray@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>

drm/xe/madvise: Skip vma invalidation if mem attr are unchanged

If a VMA within the madvise input range already has the same memory
attribute as the one requested by the user, skip PTE zapping for that
VMA to avoid unnecessary invalidation.

v2 (Matthew Brost)
- fix skip_invalidation for new attributes
- s/u32/bool
- Remove unnecessary assignment for kzalloc'ed

Suggested-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250821173104.3030148-17-himal.prasad.ghimiray@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>

drm/xe/bo: Update atomic_access attribute on madvise

Update the bo_atomic_access based on user-provided input and determine
the migration to smem during a CPU fault

v2 (Matthew Brost)
- Avoid cpu unmapping if bo is already in smem
- check atomics on smem too for ioctl
- Add comments

v3
- Avoid migration in prefetch

v4 (Matthew Brost)
- make sanity check function bool
- add assert for smem placement
- fix doc

v5 (Matthew Brost)
- NACK atomic fault with DRM_XE_ATOMIC_CPU

Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250821173104.3030148-16-himal.prasad.ghimiray@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>

drm/xe/bo: Add attributes field to xe_bo

A single BO can be linked to multiple VMAs, making VMA attributes
insufficient for determining the placement and PTE update attributes
of the BO. To address this, an attributes field has been added to the
BO.

Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://lore.kernel.org/r/20250821173104.3030148-15-himal.prasad.ghimiray@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>

drm/xe/svm: Consult madvise preferred location in prefetch

When prefetch region is DRM_XE_CONSULT_MEM_ADVISE_PREF_LOC, prefetch svm
ranges to preferred location provided by madvise.

v2 (Matthew Brost)
- Fix region, devmem_fd usages
- consult madvise is applicable for other vma's too.

v3
- Fix atomic handling

v4
- Fix xe_svm_range_validate to check for
DRM_XE_CONSULT_MEM_ADVISE_PREF_LOC too.

Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250821173104.3030148-14-himal.prasad.ghimiray@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>

drm/xe/uapi: Add flag for consulting madvise hints on svm prefetch

Introduce flag DRM_XE_CONSULT_MEM_ADVISE_PREF_LOC to ensure prefetching
in madvise-advised memory regions

v2 (Matthew Brost)
- Add kernel-doc

v3 (Matthew Brost)
- Fix kernel-doc

Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://lore.kernel.org/r/20250821173104.3030148-13-himal.prasad.ghimiray@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>

drm/xe/svm: Support DRM_XE_SVM_MEM_RANGE_ATTR_PAT memory attribute

This attributes sets the pat_index for the svm used vma range, which is
utilized to ascertain the coherence.

v2 (Matthew Brost)
- Pat index sanity check

Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250821173104.3030148-12-himal.prasad.ghimiray@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>

drm/xe/madvise: Update migration policy based on preferred location

When the user sets the valid devmem_fd as a preferred location, GPU fault
will trigger migration to tile of device associated with devmem_fd.

If the user sets an invalid devmem_fd the preferred location is current
placement(smem) only.

v2(Matthew Brost)
- Default should be faulting tile
- remove devmem_fd used as region

v3 (Matthew Brost)
- Add migration_policy
- Fix return condition
- fix migrate condition

v4
-Rebase

v5
- Add check for userptr and bo based vmas

Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250821173104.3030148-11-himal.prasad.ghimiray@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>

drm/xe/svm: Add svm ranges migration policy on atomic access

If the platform does not support atomic access on system memory, and the
ranges are in system memory, but the user requires atomic accesses on
the VMA, then migrate the ranges to VRAM. Apply this policy for prefetch
operations as well.

v2
- Drop unnecessary vm_dbg

v3 (Matthew Brost)
- fix atomic policy
- prefetch shouldn't have any impact of atomic
- bo can be accessed from vma, avoid duplicate parameter

v4 (Matthew Brost)
- Remove TODO comment
- Fix comment
- Dont allow gpu atomic ops when user is setting atomic attr as CPU

v5 (Matthew Brost)
- Fix atomic checks
- Add userptr checks

Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250821173104.3030148-10-himal.prasad.ghimiray@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>

drm/xe: Implement madvise ioctl for xe

This driver-specific ioctl enables UMDs to control the memory attributes
for GPU VMAs within a specified input range. If the start or end
addresses fall within an existing VMA, the VMA is split accordingly. The
attributes of the VMA are modified as provided by the users. The old
mappings of the VMAs are invalidated, and TLB invalidation is performed
if necessary.

v2(Matthew brost)
- xe_vm_in_fault_mode can't be enabled by Mesa, hence allow ioctl in non
fault mode too
- fix tlb invalidation skip for same ranges in multiple op
- use helper for tlb invalidation
- use xe_svm_notifier_lock/unlock helper
- s/lockdep_assert_held/lockdep_assert_held_write
- Add kernel-doc

v3(Matthew Brost)
- make vfunc fail safe
- Add sanitizing input args before vfunc

v4(Matthew Brost/Shuicheng)
- Make locks interruptable
- Error handling fixes
- vm_put fixes

v5(Matthew Brost)
- Flush garbage collector before any locking.
- Add check for null vma

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250821173104.3030148-9-himal.prasad.ghimiray@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>

drm/xe/svm: Add xe_svm_ranges_zap_ptes_in_range() for PTE zapping

Introduce xe_svm_ranges_zap_ptes_in_range(), a function to zap page table
entries (PTEs) for all SVM ranges within a user-specified address range.

-v2 (Matthew Brost)
Lock should be called even for tlb_invalidation

v3(Matthew Brost)
- Update comment
- s/notifier->itree.start/drm_gpusvm_notifier_start
- s/notifier->itree.last + 1/drm_gpusvm_notifier_end
- use WRITE_ONCE

Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250821173104.3030148-8-himal.prasad.ghimiray@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>

drm/xe: Allow CPU address mirror VMA unbind with gpu bindings for madvise

In the case of the MADVISE ioctl, if the start or end addresses fall
within a VMA and existing SVM ranges are present, remove the existing
SVM mappings. Then, continue with ops_parse to create new VMAs by REMAP
unmapping of old one.

v2 (Matthew Brost)
- Use vops flag to call unmapping of ranges in vm_bind_ioctl_ops_parse
- Rename the function

v3
- Fix doc

v4
- check if range is already in garbage collector (Matthew Brost)

Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250821173104.3030148-7-himal.prasad.ghimiray@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>

drm/xe/svm: Split system allocator vma incase of madvise call

If the start or end of input address range lies within system allocator
vma split the vma to create new vma's as per input range.

v2 (Matthew Brost)
- Add lockdep_assert_write for vm->lock
- Remove unnecessary page aligned checks
- Add kerrnel-doc and comments
- Remove unnecessary unwind_ops and return

v3
- Fix copying of attributes

v4
- Nit fixes

v5
- Squash identifier for madvise in xe_vma_ops to this patch

v6/v7/v8
- Rebase on drm_gpuvm changes

Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250821173104.3030148-6-himal.prasad.ghimiray@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>

drm/xe/vma: Modify new_vma to accept struct xe_vma_mem_attr as parameter

This change simplifies the logic by ensuring that remapped previous or
next VMAs are created with the same memory attributes as the original VMA.
By passing struct xe_vma_mem_attr as a parameter, we maintain consistency
in memory attributes.

-v2
*dst = *src (Matthew Brost)

-v3 (Matthew Brost)
Drop unnecessary helper
pass attr ptr as input to new_vma and vma_create

Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250821173104.3030148-5-himal.prasad.ghimiray@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>

drm/xe/vma: Move pat_index to vma attributes

The PAT index determines how PTEs are encoded and can be modified by
madvise. Therefore, it is now part of the vma attributes.

Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250821173104.3030148-4-himal.prasad.ghimiray@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>

drm/xe/vm: Add attributes struct as member of vma

The attribute of xe_vma will determine the migration policy and the
encoding of the page table entries (PTEs) for that vma.
This attribute helps manage how memory pages are moved and how their
addresses are translated. It will be used by madvise to set the
behavior of the vma.

v2 (Matthew Brost)
- Add docs

v3 (Matthew Brost)
- Add uapi references
- 80 characters line wrap

Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://lore.kernel.org/r/20250821173104.3030148-3-himal.prasad.ghimiray@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>

drm/xe/uapi: Add madvise interface

This commit introduces a new madvise interface to support
driver-specific ioctl operations. The madvise interface allows for more
efficient memory management by providing hints to the driver about the
expected memory usage and pte update policy for gpuvma.

v2 (Matthew/Thomas)
- Drop num_ops support
- Drop purgeable support
- Add kernel-docs
- IOWR/IOW

v3 (Matthew/Thomas)
- Reorder attributes
- use __u16 for migration_policy
- use __u64 for reserved in unions
- Avoid usage of vma

Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250821173104.3030148-2-himal.prasad.ghimiray@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>