Shuicheng Lin [Thu, 22 Jan 2026 21:40:54 +0000 (21:40 +0000)]
drm/xe: Skip address copy for sync-only execs
For parallel exec queues, xe_exec_ioctl() copied the batch buffer address
array from userspace without checking num_batch_buffer.
If user creates a sync-only exec that doesn't use the address field, the
exec will fail with -EFAULT.
Add num_batch_buffer check to skip the copy, and the exec could be executed
successfully.
Here is the sync-only exec:
struct drm_xe_exec exec = {
.extensions = 0,
.exec_queue_id = qid,
.num_syncs = 1,
.syncs = (uintptr_t)&sync,
.address = 0, /* ignored for sync-only */
.num_batch_buffer = 0, /* sync-only */
};
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260122214053.3189366-2-shuicheng.lin@intel.com
Nitin Gote [Tue, 20 Jan 2026 05:47:25 +0000 (11:17 +0530)]
drm/xe: derive mem copy capability from graphics version
Drop .has_mem_copy_instr from the platform descriptors and set it
in xe_info_init() after handle_gmdid() populates graphics_verx100.
Centralizing the GRAPHICS_VER(xe) >= 20 check keeps MEM_COPY enabled
on Xe2+ and removes redundant per-platform plumbing.
Bspec: 57561
Fixes: 1e12dbae9d72 ("drm/xe/migrate: support MEM_COPY instruction") Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Suggested-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Nitin Gote <nitin.r.gote@intel.com> Link: https://patch.msgid.link/20260120054724.1982608-2-nitin.r.gote@intel.com Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Sanjay Yadav [Wed, 21 Jan 2026 11:14:17 +0000 (16:44 +0530)]
drm/xe: Use DRM_BUDDY_CONTIGUOUS_ALLOCATION for contiguous allocations
The VRAM/stolen memory managers do not currently set
DRM_BUDDY_CONTIGUOUS_ALLOCATION for contiguous allocations. Enabling
this flag activates the buddy allocator's try_harder path, which helps
handle fragmented memory scenarios.
This enables the __alloc_contig_try_harder fallback in the buddy
allocator, allowing contiguous allocation requests to succeed even when
memory is fragmented by combining allocations from both(RHS and LHS)
sides of a large free block.
v2: (Matt B)
- Remove redundant logic for rounding allocation size and trimming when
TTM_PL_FLAG_CONTIGUOUS is set, since drm_buddy now handles this when
DRM_BUDDY_CONTIGUOUS_ALLOCATION is enabled
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6713 Suggested-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Sanjay Yadav <sanjay.kumar.yadav@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patch.msgid.link/20260121111416.3104399-2-sanjay.kumar.yadav@intel.com
Thomas Hellström [Wed, 21 Jan 2026 09:10:48 +0000 (10:10 +0100)]
drm/xe: Select CONFIG_DEVICE_PRIVATE when DRM_XE_GPUSVM is selected
CONFIG_DEVICE_PRIVATE is a prerequisite for DRM_XE_GPUSVM.
Explicitly select it so that DRM_XE_GPUSVM is not unintentionally
left out from distro configs not explicitly enabling
CONFIG_DEVICE_PRIVATE.
v2:
- Select also CONFIG_ZONE_DEVICE since it's needed by
CONFIG_DEVICE_PRIVATE.
v3:
- Depend on CONFIG_ZONE_DEVICE rather than selecting it.
Cc: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: <dri-devel@lists.freedesktop.org> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patch.msgid.link/20260121091048.41371-3-thomas.hellstrom@linux.intel.com
Thomas Hellström [Wed, 21 Jan 2026 09:10:47 +0000 (10:10 +0100)]
drm, drm/xe: Fix xe userptr in the absence of CONFIG_DEVICE_PRIVATE
CONFIG_DEVICE_PRIVATE is not selected by default by some distros,
for example Fedora, and that leads to a regression in the xe driver
since userptr support gets compiled out.
It turns out that DRM_GPUSVM, which is needed for xe userptr support
compiles also without CONFIG_DEVICE_PRIVATE, but doesn't compile
without CONFIG_ZONE_DEVICE.
Exclude the drm_pagemap files from compilation with !CONFIG_ZONE_DEVICE,
and remove the CONFIG_DEVICE_PRIVATE dependency from CONFIG_DRM_GPUSVM and
the xe driver's selection of it, re-enabling xe userptr for those configs.
v2:
- Don't compile the drm_pagemap files unless CONFIG_ZONE_DEVICE is set.
- Adjust the drm_pagemap.h header accordingly.
Fixes: 9e9787414882 ("drm/xe/userptr: replace xe_hmm with gpusvm") Cc: Matthew Auld <matthew.auld@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: dri-devel@lists.freedesktop.org Cc: <stable@vger.kernel.org> # v6.18+ Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patch.msgid.link/20260121091048.41371-2-thomas.hellstrom@linux.intel.com
Matthew Auld [Tue, 20 Jan 2026 11:06:11 +0000 (11:06 +0000)]
drm/xe/migrate: fix job lock assert
We are meant to be checking the user vm for the bind queue, but actually
we are checking the migrate vm. For various reasons this is not
currently firing but this will likely change in the future.
Now that we have the user_vm attached to the bind queue, we can fix this
by directly checking that here.
Fixes: dba89840a920 ("drm/xe: Add GT TLB invalidation jobs") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Arvind Yadav <arvind.yadav@intel.com> Link: https://patch.msgid.link/20260120110609.77958-4-matthew.auld@intel.com
Matthew Auld [Tue, 20 Jan 2026 11:06:10 +0000 (11:06 +0000)]
drm/xe/uapi: disallow bind queue sharing
Currently this is very broken if someone attempts to create a bind
queue and share it across multiple VMs. For example currently we assume
it is safe to acquire the user VM lock to protect some of the bind queue
state, but if allow sharing the bind queue with multiple VMs then this
quickly breaks down.
To fix this reject using a bind queue with any VM that is not the same
VM that was originally passed when creating the bind queue. This a uAPI
change, however this was more of an oversight on kernel side that we
didn't reject this, and expectation is that userspace shouldn't be using
bind queues in this way, so in theory this change should go unnoticed.
Based on a patch from Matt Brost.
v2 (Matt B):
- Hold the vm lock over queue create, to ensure it can't be closed as
we attach the user_vm to the queue.
- Make sure we actually check for NULL user_vm in destruction path.
v3:
- Fix error path handling.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Reported-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: José Roberto de Souza <jose.souza@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Michal Mrozek <michal.mrozek@intel.com> Cc: Carl Zhang <carl.zhang@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Acked-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Arvind Yadav <arvind.yadav@intel.com> Acked-by: Michal Mrozek <michal.mrozek@intel.com> Link: https://patch.msgid.link/20260120110609.77958-3-matthew.auld@intel.com
Matthew Brost [Fri, 16 Jan 2026 22:17:31 +0000 (14:17 -0800)]
drm/xe: Add context-based invalidation to GuC TLB invalidation backend
Introduce context-based invalidation support to the GuC TLB invalidation
backend. This is implemented by iterating over each exec queue per GT
within a VM, skipping inactive queues, and issuing a context-based (GuC
ID) H2G TLB invalidation. All H2G messages, except the final one, are
sent with an invalid seqno, which the G2H handler drops to ensure the
TLB invalidation fence is only signaled once all H2G messages are
completed.
A watermark mechanism is also added to switch between context-based TLB
invalidations and full device-wide invalidations, as the return on
investment for context-based invalidation diminishes when many exec
queues are mapped.
v2:
- Fix checkpatch warnings
v3:
- Rebase on PRL
- Use ref counting to avoid racing with deregisters
v4:
- Extra braces (Stuart)
- Use per GT list (CI)
- Reorder put
Matthew Brost [Fri, 16 Jan 2026 22:17:30 +0000 (14:17 -0800)]
drm/xe: Add exec queue active vfunc
If an exec queue is inactive (e.g., not registered or scheduling is
disabled), TLB invalidations are not issued for that queue. Add a
virtual function to determine the active state, which TLB invalidation
logic can hook into.
Matthew Brost [Fri, 16 Jan 2026 22:17:29 +0000 (14:17 -0800)]
drm/xe: Add xe_tlb_inval_idle helper
Introduce the xe_tlb_inval_idle helper to detect whether any TLB
invalidations are currently in flight. This is used in context-based TLB
invalidations to determine whether dummy TLB invalidations need to be
sent to maintain proper TLB invalidation fence ordering..
v2:
- Implement xe_tlb_inval_idle based on pending list
Matthew Brost [Fri, 16 Jan 2026 22:17:28 +0000 (14:17 -0800)]
drm/xe: Add send_tlb_inval_ppgtt helper
Extract the common code that issues a TLB invalidation H2G for PPGTTs
into a helper function. This helper can be reused for both ASID-based
and context-based TLB invalidations.
Matthew Brost [Fri, 16 Jan 2026 22:17:27 +0000 (14:17 -0800)]
drm/xe: Rename send_tlb_inval_ppgtt to send_tlb_inval_asid_ppgtt
Context-based TLB invalidations have their own set of GuC TLB
invalidation operations. Rename the current PPGTT invalidation function,
which operates on ASIDs, to a more descriptive name that reflects its
purpose.
Matthew Brost [Fri, 16 Jan 2026 22:17:25 +0000 (14:17 -0800)]
drm/xe: Add vm to exec queues association
Maintain a list of exec queues per vm which will be used by TLB
invalidation code to do context-ID based tlb invalidations.
v4:
- More asserts (Stuart)
- Per GT list (CI)
- Skip adding / removal if context TLB invalidatiions not supported
(Stuart)
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Tested-by: Stuart Summers <stuart.summers@intel.com> Link: https://patch.msgid.link/20260116221731.868657-6-matthew.brost@intel.com
Matthew Brost [Fri, 16 Jan 2026 22:17:24 +0000 (14:17 -0800)]
drm/xe: Add xe_device_asid_to_vm helper
Introduce the xe_device_asid_to_vm helper, which can be used throughout
the driver to resolve the VM from a given ASID.
v4:
- Move forward declare after includes (Stuart)
Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Tested-by: Stuart Summers <stuart.summers@intel.com> Link: https://patch.msgid.link/20260116221731.868657-5-matthew.brost@intel.com
Matthew Brost [Fri, 16 Jan 2026 22:17:22 +0000 (14:17 -0800)]
drm/xe: Make usm.asid_to_vm allocation use GFP_NOWAIT
Ensure the asid_to_vm lookup is reclaim-safe so it can be performed
during TLB invalidations, which is necessary for context-based TLB
invalidation support.
Matthew Brost [Fri, 16 Jan 2026 22:17:21 +0000 (14:17 -0800)]
drm/xe: Add normalize_invalidation_range
Extract the code that determines the alignment of TLB invalidation into
a helper function — normalize_invalidation_range. This will be useful
when adding context-based invalidations to the GuC TLB invalidation
backend.
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Tested-by: Stuart Summers <stuart.summers@intel.com> Link: https://patch.msgid.link/20260116221731.868657-2-matthew.brost@intel.com
Matthew Brost [Fri, 16 Jan 2026 22:03:32 +0000 (14:03 -0800)]
drm/xe: Ban entire multi-queue group on any job timeout
In multi-queue mode, we only have control over the entire group, so we
cannot ban individual queues or signal fences until the whole group is
removed from hardware. Implement banning of the entire group if any job
within it times out.
v2:
- Fix lock inversion (Niranjana)
- Initialize new queues in group to stopped
v3:
- Blindly call xe_exec_queue_multi_queue_primary (Niranjana)
- More comments around temporary list when stopping (Niranjana)
- Restart group on false timeout (Niranjana)
Nakshtra Goyal [Tue, 13 Jan 2026 09:19:28 +0000 (14:49 +0530)]
drm/xe/xe_query: Remove check for gt
There's no need to check a userspace-provided GT ID (which may come from
any tile) against the number of GTs that can be present on a single
tile. The xe_device_get_gt() lookup already checks that the GT ID passed
is valid for the current device.(Matt Roper)
Matthew Brost [Wed, 14 Jan 2026 18:49:05 +0000 (10:49 -0800)]
drm/xe: Reduce LRC timestamp stuck message on VFs to notice
An LRC timestamp getting stuck is a somewhat normal occurrence. If a
single VF submits a job that does not get timesliced, the LRC timestamp
will not increment. Reduce the LRC timestamp stuck message on VFs to
notice (same log level as job timeout) to avoid false CI bugs in tests
where a VF submits a job that does not get timesliced.
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/7032 Fixes: bb63e7257e63 ("drm/xe: Avoid toggling schedule state to check LRC timestamp in TDR") Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://patch.msgid.link/20260114184905.4189026-1-matthew.brost@intel.com
Matt Roper [Thu, 15 Jan 2026 03:28:02 +0000 (19:28 -0800)]
drm/xe: Cleanup unused header includes
clangd reports many "unused header" warnings throughout the Xe driver.
Start working to clean this up by removing unnecessary includes in our
.c files and/or replacing them with explicit includes of other headers
that were previously being included indirectly.
By far the most common offender here was unnecessary inclusion of
xe_gt.h. That likely originates from the early days of xe.ko when
xe_mmio did not exist and all register accesses, including those
unrelated to GTs, were done with GT functions.
There's still a lot of additional #include cleanup that can be done in
the headers themselves; that will come as a followup series.
v2:
- Squash the 79-patch series down to a single patch. (MattB)
Michal Wajdeczko [Mon, 12 Jan 2026 18:37:16 +0000 (19:37 +0100)]
drm/xe/mert: Improve handling of MERT CAT errors
All MERT catastrophic errors but VF's LMTT fault are serious, so
we shouldn't limit our handling only to print debug messages.
Change CATERR message to error level and then declare the device
as wedged to match expectation from the design document. For the
LMTT faults, add a note about adding tracking of this unexpected
VF activity.
While at it, rename register fields defnitions to match the BSpec.
Also drop trailing include guard name from the regs.h file.
Fei Yang [Mon, 12 Jan 2026 22:03:30 +0000 (14:03 -0800)]
drm/xe: vram addr range is expanded to bit[17:8]
The bit field used to be [14:8] with [17:15] marked as SPARE and
defaulted to 0. So, simply expand the read to bit[17:8] assuming
the platforms using only bit[14:8] have zeros in the expanded bits.
Marco Crivellari [Mon, 12 Jan 2026 09:44:06 +0000 (10:44 +0100)]
drm/xe: Replace use of system_wq with tlb_inval->timeout_wq
This patch continues the effort to refactor workqueue APIs, which has begun
with the changes introducing new workqueues and a new alloc_workqueue flag:
commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")
The point of the refactoring is to eventually alter the default behavior of
workqueues to become unbound by default so that their workload placement is
optimized by the scheduler.
Before that to happen, workqueue users must be converted to the better named
new workqueues with no intended behaviour changes:
This way the old obsolete workqueues (system_wq, system_unbound_wq) can be
removed in the future.
After a carefully evaluation, because this is the fence signaling path, we
changed the code in order to use one of the Xe's workqueue.
So, a new workqueue named 'timeout_wq' has been added to
'struct xe_tlb_inval' and has been initialized with 'gt->ordered_wq'
changing the system_wq uses with tlb_inval->timeout_wq.
Karthik Poosa [Mon, 12 Jan 2026 20:35:21 +0000 (02:05 +0530)]
drm/xe/hwmon: Expose individual VRAM channel temperature
Expose individual VRAM temperature attributes.
Update Xe hwmon documentation for this entry.
v2:
- Avoid using default switch case for VRAM individual temperatures.
- Append labels with VRAM channel number.
- Update kernel version in Xe hwmon documentation.
v3:
- Add missing brackets in Xe hwmon documentation from VRAM channel sysfs.
- Reorder BMG_VRAM_TEMPERATURE_N macro in xe_pcode_regs.h.
- Add api to check if VRAM is available on the channel.
v4:
- Improve VRAM label handling to eliminate temp variable by
introducing a dedicated array vram_label in xe_hwmon_thermal_info.
- Remove a magic number.
- Change the label from vram_X to vram_ch_X.
v5:
- Address review comments from Raag.
- Change vram to VRAM in commit title and subject.
- Refactor BMG_VRAM_TEMPERATURE_N macro.
- Refactor is_vram_ch_available().
- Rephrase a comment.
- Check individual VRAM temperature limits in addition to VRAM
availability in xe_hwmon_temp_is_visible. (Raag)
- Move VRAM label change out of this patch.
v6:
- Use in_range() for VRAM_N index check instead of if check. (Raag)
- Minor aesthetic changes.
Karthik Poosa [Mon, 12 Jan 2026 20:35:20 +0000 (02:05 +0530)]
drm/xe/hwmon: Expose GPU PCIe temperature
Expose GPU PCIe average temperature and its limits via hwmon sysfs entry
temp5_xxx.
Update Xe hwmon sysfs documentation for this.
v2: Update kernel version in Xe hwmon documentation. (Raag)
v3:
- Address review comments from Raag.
- Remove redundant debug log.
- Update kernel version in Xe hwmon documentation. (Raag)
v4:
- Address review comments from Raag.
- Group new temperature attributes with existing temperature attributes
as per channel index in Xe hwmon documentation.
- Use TEMP_MASK instead of TEMP_MASK_MAILBOX.
- Add PCIE_SENSOR_MASK which uses REG_FIELD_GET as replacement of
PCIE_SENSOR_SHIFT.
v5:
- Address review comments from Raag.
- Use REG_FIELD_GET to get PCIe temperature.
- Move PCIE_SENSOR_GROUP_ID and PCIE_SENSOR_MASK to xe_pcode_api.h
- Cosmetic change.
Karthik Poosa [Mon, 12 Jan 2026 20:35:19 +0000 (02:05 +0530)]
drm/xe/hwmon: Expose memory controller temperature
Expose GPU memory controller average temperature and its limits under
temp4_xxx.
Update Xe hwmon documentation for this.
v2:
- Rephrase commit message. (Badal)
- Update kernel version in Xe hwmon documentation. (Raag)
v3:
- Update kernel version in Xe hwmon documentation.
- Address review comments from Raag.
- Remove obvious comments.
- Remove redundant debug logs.
- Remove unnecessary checks.
- Avoid magic numbers.
- Add new comments.
- Use temperature sensors count to make memory controller visible.
- Use temperature limits of package for memory controller.
v4:
- Address review comments from Raag.
- Group new temperature attributes with existing temperature attributes
as per channel index in Xe hwmon documentation.
- Use DIV_ROUND_UP to calculate dwords needed for temperature limits.
- Minor aesthetic refinements.
- Remove unused TEMP_MASK_MAILBOX.
v5:
- Use REG_FIELD_GET to get count from READ_THERMAL_DATA output. (Raag)
- Change count print from decimal to hexadecimal.
- Cosmetic changes.
Karthik Poosa [Mon, 12 Jan 2026 20:35:18 +0000 (02:05 +0530)]
drm/xe/hwmon: Expose temperature limits
Read temperature limits using pcode mailbox and expose shutdown
temperature limit as tempX_emergency, critical temperature limit as
tempX_crit and GPU max temperature limit as temp2_max.
Update Xe hwmon documentation with above entries.
v2:
- Resolve a documentation warning.
- Address below review comments from Raag.
- Update date and kernel version in Xe hwmon documentation.
- Remove explicit disable of has_mbx_thermal_info for unsupported
platforms.
- Remove unnecessary default case in switches.
- Remove obvious comments.
- Use TEMP_LIMIT_MAX to compute number of dwords needed in
xe_hwmon_thermal_info.
- Remove THERMAL_LIMITS_DWORDS macro.
- Use has_mbx_thermal_info for checking thermal mailbox support.
v3:
- Address below minor comments. (Raag)
- Group new temperature attributes with existing temperature attributes
as per channel index in Xe hwmon documentation.
- Rename enums of xe_temp_limit to improve clarity.
- Use DIV_ROUND_UP to calculate dwords needed for temperature limits.
- Use return instead of breaks in xe_hwmon_temp_read.
- Minor aesthetic refinements.
v4:
- Remove a redundant break. (Raag)
- Update drm_dbg to drm_warn to inform user of unavailability for
thermal mailbox on expected platforms.
drm/xe/gsc: Make GSC FW load optional for newer platforms
On newer platforms GSC FW is only required for content protection
features, so the core driver features work perfectly fine without it
(and we did in fact not enable it to start with on PTL). Therefore, we
can selectively enable the GSC only if the FW is found on disk, without
failing if it is not found.
Note that this means that the FW can now be enabled (i.e., we're looking
for it) but not available (i.e., we haven't found it), so checks on FW
support should use the latter state to decide whether to go on or not.
As part of the rework, the message for FW not found has been cleaned up
to be more readable.
While at it, drop the comment about xe_uc_fw_init() since the code has
been reworked and the statement no longer applies.
drm/xe/device: Convert wait for lmem init into an assert
Prior to lmem init check, driver is waiting for the pcode uncore_init
status. uncore_init status will be flagged after the complete boot and
initialization of the SoC by the pcode. uncore_init confirms that lmem
init and mmio unblock has been already completed.
It makes no sense to check for lmem init after the pcode uncore_init
check. So change the wait for lmem init check into an assert which
confirms lmem init is set.
A careful inspection of __xe_ggtt_insert_bo_at() shows that
the ggtt_node can always be seen as inserted from xe_bo.c
due to the way error handling is performed.
The checks are also a little bit too paranoid, since we
never create a bo with ggtt_node[id] initialised but not
inserted into the GGTT, which can be seen by looking at
__xe_ggtt_insert_bo_at()
Additionally, the size of the GGTT is never bigger than 4 GB,
so adding a check at that level is incorrect.
drm/xe: Convert xe_fb_pin to use a callback for insertion into GGTT
The rotation details belong in xe_fb_pin.c, while the operations involving
GGTT belong to xe_ggtt.c. As directly locking xe_ggtt etc results in
exposing all of xe_ggtt details anyway, create a special function that
allocates a ggtt_node, and allow display to populate it using a callback
as a compromise.
drm/xe: Start using ggtt->start in preparation of balloon removal
Instead of having ggtt->size point to the end of ggtt, have ggtt->size
be the actual size of the GGTT, and introduce ggtt->start to point to
the beginning of GGTT.
This will allow a massive cleanup of GGTT in case of SRIOV-VF.
drm/xe/mert: Use local mert variable to simplify the code
There is no need to always refer to MERT data using tile pointer.
Use of local mert pointer will simplify the code and make it look
like other existing MERT function.
Michal Wajdeczko [Sun, 11 Jan 2026 21:38:47 +0000 (22:38 +0100)]
drm/xe/mert: Always refer to MERT using xe_device
There is only one MERT instance and while it is located on the root
tile, it is safer to refer to it using xe_device rather than xe_tile.
This will also allow to align signature with other MERT function.
Matthew Brost [Sat, 10 Jan 2026 01:27:39 +0000 (17:27 -0800)]
drm/xe: Avoid toggling schedule state to check LRC timestamp in TDR
We now have proper infrastructure to accurately check the LRC timestamp
without toggling the scheduling state for non-VFs. For VFs, it is still
possible to get an inaccurate view if the context is on hardware. We
guard against free-running contexts on VFs by banning jobs whose
timestamps are not moving. In addition, VFs have a timeslice quantum
that naturally triggers context switches when more than one VF is
running, thus updating the LRC timestamp.
For multi-queue, it is desirable to avoid scheduling toggling in the TDR
because this scheduling state is shared among many queues. Furthermore,
this change simplifies the GuC state machine. The trade-off for VF cases
seems worthwhile.
v5:
- Add xe_lrc_timestamp helper (Umesh)
v6:
- Reduce number of tries on stuck timestamp (VF testing)
- Convert job timestamp save to a memory copy (VF testing)
v7:
- Save ctx timestamp to LRC when start VF job (VF testing)
Matthew Brost [Sat, 10 Jan 2026 01:27:38 +0000 (17:27 -0800)]
drm/xe: Disable timestamp WA on VFs
The timestamp WA does not work on a VF because it requires reading MMIO
registers, which are inaccessible on a VF. This timestamp WA confuses
LRC sampling on a VF during TDR, as the LRC timestamp would always read
as 1 for any active context. Disable the timestamp WA on VFs to avoid
this confusion.
Matthew Brost [Sat, 10 Jan 2026 01:27:37 +0000 (17:27 -0800)]
drm/xe: Remove special casing for LR queues in submission
Now that LR jobs are tracked by the DRM scheduler, there's no longer a
need to special-case LR queues. This change removes all LR
queue-specific handling, including dedicated TDR logic, reference
counting schemes, and other related mechanisms.
Matthew Brost [Sat, 10 Jan 2026 01:27:36 +0000 (17:27 -0800)]
drm/xe: Do not deregister queues in TDR
Deregistering queues in the TDR introduces unnecessary complexity,
requiring reference-counting techniques to function correctly,
particularly to prevent use-after-free (UAF) issues while a
deregistration initiated from the TDR is in progress.
All that's needed in the TDR is to kick the queue off the hardware,
which is achieved by disabling scheduling. Queue deregistration should
be handled in a single, well-defined point in the cleanup path, tied to
the queue's reference count.
v4:
- Explain why extra ref were needed prior to this patch (Niranjana)
Matthew Brost [Sat, 10 Jan 2026 01:27:35 +0000 (17:27 -0800)]
drm/xe: Only toggle scheduling in TDR if GuC is running
If the firmware is not running during TDR (e.g., when the driver is
unloading), there's no need to toggle scheduling in the GuC. In such
cases, skip this step.
Matthew Brost [Sat, 10 Jan 2026 01:27:34 +0000 (17:27 -0800)]
drm/xe: Stop abusing DRM scheduler internals
Use new pending job list iterator and new helper functions in Xe to
avoid reaching into DRM scheduler internals.
Part of this change involves removing pending jobs debug information
from debugfs and devcoredump. As agreed, the pending job list should
only be accessed when the scheduler is stopped. However, it's not
straightforward to determine whether the scheduler is stopped from the
shared debugfs/devcoredump code path. Additionally, the pending job list
provides little useful information, as pending jobs can be inferred from
seqnos and ring head/tail positions. Therefore, this debug information
is being removed.
v4:
- Add comment around DRM_GPU_SCHED_STAT_NO_HANG (Niranjana)
Xin Wang [Fri, 9 Jan 2026 09:30:06 +0000 (09:30 +0000)]
drm/xe: Allow compressible surfaces to be 1-way coherent
Previously, compressible surfaces were required to be non-coherent
(allocated as WC) because compression and coherency were mutually
exclusive. Starting with Xe3, hardware supports combining compression
with 1-way coherency, allowing compressible surfaces to be allocated as
WB memory. This provides applications with more efficient memory
allocation by avoiding WC allocation overhead that can cause system
stuttering and memory management challenges.
The implementation adds support for compressed+coherent PAT entry for
the xe3_lpg devices and updates the driver logic to handle the new
compression capabilities.
v2: (Matthew Auld)
- Improved error handling with XE_IOCTL_DBG()
- Enhanced documentation and comments
- Fixed xe_bo_needs_ccs_pages() outdated compression assumptions
v3:
- Improve WB compression support detection by checking PAT table
instead of version check
v4:
- Add XE_CACHE_WB_COMPRESSION, which simplifies the logic.
v5:
- Use U16_MAX for the invalid PAT index. (Matthew Auld)
Bspec: 71582, 59361, 59399 Cc: Matthew Auld <matthew.auld@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Xin Wang <x.wang@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patch.msgid.link/20260109093007.546784-1-x.wang@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Jani Nikula [Wed, 7 Jan 2026 15:54:00 +0000 (17:54 +0200)]
drm/xe/vm: fix xe_vm_validation_exec() kernel-doc
Fix kernel-doc warnings on xe_vm_validation_exec():
Warning: ../drivers/gpu/drm/xe/xe_vm.h:392 expecting prototype for
xe_vm_set_validation_exec(). Prototype was for xe_vm_validation_exec()
instead
Fixes: 0131514f9789 ("drm/xe: Pass down drm_exec context to validation") Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patch.msgid.link/20260107155401.2379127-4-jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Fix kernel-doc warnings on struct guc_lfd_file_header:
Warning: ../drivers/gpu/drm/xe/abi/guc_lfd_abi.h:168 expecting prototype
for struct guc_logfile_header. Prototype was for struct
guc_lfd_file_header instead
Fixes: 7eeb0e5408bd ("drm/xe/guc: Add LFD related abi definitions") Cc: Zhanjun Dong <zhanjun.dong@intel.com> Cc: Julia Filipchuk <julia.filipchuk@intel.com> Cc: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Zhanjun Dong <zhanjun.dong@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patch.msgid.link/20260107155401.2379127-1-jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Brian Nguyen [Wed, 7 Jan 2026 01:04:52 +0000 (09:04 +0800)]
drm/xe: Add page reclamation related stats
Add page reclaim list (PRL) related stats to GT stats to assist in
debugging and tuning of page reclaim related actions. Include counters
of page sizes added to PRL and if PRL action is issued.
v2:
- Add PRL_ABORTED_COUNT stats and corresponding changes. (Matthew B)
Brian Nguyen [Wed, 7 Jan 2026 01:04:51 +0000 (09:04 +0800)]
drm/xe: Fix page reclaim entry handling for large pages
For 64KB pages, XE_PTE_PS64 is defined for all consecutive 4KB pages and
are all considered leaf nodes, so existing check was falsely adding
multiple 64KB pages to PRL.
For larger entries such as 2MB PDE, the check for pte->base.children is
insufficient since this array is always defined for page directory,
level 1 and above, so perform a check on the entry itself pointing to
the correct page.
For unmaps, if the range is properly covered by the page full directory,
page walker may finish without walking to the leaf nodes.
For example, a 1G range can be fully covered by 512 2MB pages if
alignment allows. In this case, the page walker will walk until
it reaches this corresponding directory which can correlate to the 1GB
range. Page walker will simply complete its walk and the individual 2MB
PDE leaves won't get accessed.
In this case, PRL invalidation is also required, so add a check to see if
pt entry cover the entire range since the walker will complete the walk.
There are possible race conditions that will cause driver to read a pte
that hasn't been written to yet. The 2 scenarios are:
- Another issued TLB invalidation such as from userptr or MMU notifier.
- Dependencies on original bind that has yet to be executed with an
unbind on that job.
The expectation is these race conditions are likely rare cases so simply
perform a fallback to full PPC flush invalidation instead.
v2:
- Reword commit and updated zero-pte handling. (Matthew B)
v3:
- Rework if statement for abort case with additional comments. (Matthew B)
Fixes: b912138df299 ("drm/xe: Create page reclaim list on unbind") Signed-off-by: Brian Nguyen <brian3.nguyen@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260107010447.4125005-9-brian3.nguyen@intel.com
Brian Nguyen [Wed, 7 Jan 2026 01:04:50 +0000 (09:04 +0800)]
drm/xe: Add explicit abort page reclaim list
PRLs could be invalidated to indicate its getting dropped from current
scope but are still valid. So standardize calls and add abort to clearly
define when an invalidation is a real abort and PRL should fallback.
drm/xe: fix WQ_MEM_RECLAIM passed as max_active to alloc_workqueue()
Workqueue xe-ggtt-wq has been allocated using WQ_MEM_RECLAIM, but
the flag has been passed as 3rd parameter (max_active) instead
of 2nd (flags) creating the workqueue as per-cpu with max_active = 8
(the WQ_MEM_RECLAIM value).
So change this by set WQ_MEM_RECLAIM as the 2nd parameter with a
default max_active.
Fixes: 60df57e496e4 ("drm/xe: Mark GGTT work queue with WQ_MEM_RECLAIM") Cc: stable@vger.kernel.org Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260108180148.423062-1-marco.crivellari@suse.com
Osama Abdelkader [Wed, 24 Dec 2025 21:21:16 +0000 (22:21 +0100)]
drm/xe: Add missing newlines to drm_warn messages
The drm_warn() calls in the default cases of various switch statements
in xe_vm.c were missing trailing newlines, which can cause log messages
to be concatenated with subsequent output. Add '\n' to all affected
messages.
Lukasz Laguna [Wed, 7 Jan 2026 17:47:41 +0000 (18:47 +0100)]
drm/xe/pf: Allow upon-any-hang wedged mode only in debug config
The GuC reset policy is global, so disabling it on PF can affect all
running VFs. To avoid unintended side effects, restrict setting
upon-any-hang (2) wedged mode on the PF to debug builds only.
Lukasz Laguna [Wed, 7 Jan 2026 17:47:39 +0000 (18:47 +0100)]
drm/xe: Update wedged.mode only after successful reset policy change
Previously, the driver's internal wedged.mode state was updated without
verifying whether the corresponding engine reset policy update in GuC
succeeded. This could leave the driver reporting a wedged.mode state
that doesn't match the actual reset behavior programmed in GuC.
With this change, the reset policy is updated first, and the driver's
wedged.mode state is modified only if the policy update succeeds on all
available GTs.
This patch also introduces two functional improvements:
- The policy is sent to GuC only when a change is required. An update
is needed only when entering or leaving XE_WEDGED_MODE_UPON_ANY_HANG,
because only in that case the reset policy changes. For example,
switching between XE_WEDGED_MODE_UPON_CRITICAL_ERROR and
XE_WEDGED_MODE_NEVER doesn't affect the reset policy, so there is no
need to send the same value to GuC.
- An inconsistent_reset flag is added to track cases where reset policy
update succeeds only on a subset of GTs. If such inconsistency is
detected, future wedged mode configuration will force a retry of the
reset policy update to restore a consistent state across all GTs.
Lukasz Laguna [Wed, 7 Jan 2026 17:47:38 +0000 (18:47 +0100)]
drm/xe: Validate wedged_mode parameter and define enum for modes
Check correctness of the wedged_mode parameter input to ensure only
supported values are accepted. Additionally, replace magic numbers with
a clearly defined enum.
Matthew Brost [Wed, 7 Jan 2026 18:27:16 +0000 (10:27 -0800)]
drm/pagemap: Disable device-to-device migration
Device-to-device migration is causing xe_exec_system_allocator --r
*race*no* to intermittently fail with engine resets and a kernel hang on
a page lock. This should work but is clearly buggy somewhere. Disable
device-to-device migration in the interim until the issue can be
root-caused.
The only downside of disabling device-to-device migration is that memory
will bounce through system memory during migration. However, this path
should be rare, as it only occurs when madvise attributes are changed or
atomics are used.
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Fixes: ec265e1f1cfc ("drm/pagemap: Support source migration over interconnect") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Francois Dugast <francois.dugast@intel.com> Link: https://patch.msgid.link/20260107182716.2236607-3-matthew.brost@intel.com
Matthew Brost [Wed, 7 Jan 2026 20:57:32 +0000 (12:57 -0800)]
drm/xe: Adjust page count tracepoints in shrinker
Page accounting can change via the shrinker without calling
xe_ttm_tt_unpopulate(), which normally updates page count tracepoints
through update_global_total_pages. Add a call to
update_global_total_pages when the shrinker successfully shrinks a BO.
v2:
- Don't adjust global accounting when pinning (Stuart)
Matthew Brost [Tue, 6 Jan 2026 21:34:43 +0000 (13:34 -0800)]
drm/xe: Validate preferred system memory placement in xe_svm_range_validate
Ensure preferred system memory placement is checked in
xe_svm_range_validate when dpagemap is NULL. Without this check, a
prefetch to system memory may become a no-op because device memory is
considered a valid placement.
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Fixes: 238dbc9d9f4a ("drm/xe: Use the vma attibute drm_pagemap to select where to migrate") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patch.msgid.link/20260106213443.1866797-1-matthew.brost@intel.com
Dave Airlie [Sat, 27 Dec 2025 07:17:39 +0000 (17:17 +1000)]
Merge tag 'drm-xe-next-2025-12-19' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next
[airlied: fix guc submit double definition]
UAPI Changes:
- Multi-Queue support (Niranjana)
- Add DRM_XE_EXEC_QUEUE_SET_HANG_REPLAY_STATE (Brost)
- Add NO_COMPRESSION BO flag and query capability (Sanjay)
- Add gt_id to struct drm_xe_oa_unit (Ashutosh)
- Expose MERT OA unit (Ashutosh)
- Sysfs Survivability refactor (Riana)
Cross-subsystem Changes:
- VFIO: Add device specific vfio_pci driver variant for Intel graphics (Winiarski)
Driver Changes:
- MAINTAINERS update (Lucas -> Matt)
- Add helper to query compression enable status (Xin)
- Xe_VM fixes and updates (Shuicheng, Himal)
- Documentation fixes (Winiarski, Swaraj, Niranjana)
- Kunit fix (Roper)
- Fix potential leaks, uaf, null derref, and oversized
allocations (Shuicheng, Sanjay, Mika, Tapani)
- Other minor fixes like kbuild duplication and sysfs_emit (Shuicheng, Madhur)
- Handle msix vector0 interrupt (Venkata)
- Scope-based forcewake and runtime PM (Roper, Raag)
- GuC/HuC related fixes and refactors (Lucas, Zhanjun, Brost, Julia, Wajdeczko)
- Fix conversion from clock ticks to milliseconds (Harish)
- SRIOV PF PF: Add support for MERT (Lukasz)
- Enable SR-IOV VF migration and other SRIOV updates (Winiarski,
Satya, Brost, Wajdeczko, Piotr, Tomasz, Daniele)
- Optimize runtime suspend/resume and other PM improvements (Raag)
- Some W/a additions and updates (Bala, Harish, Roper)
- Use for_each_tlb_inval() to calculate invalidation fences (Roper)
- Fix VFIO link error (Arnd)
- Fix ix drm_gpusvm_init() arguments (Arnd)
- Other OA refactor (Ashutosh)
- Refactor PAT and expose debugfs (Xin)
- Enable Indirect Ring State for xe3p_xpc (Niranjana)
- MEI interrupt fix (Junxiao)
- Add stats for mode switching on hw_engine_group (Francois)
- DMA-Buf related changes (Thomas)
- Multi Queue feature support (Niranjana)
- Enable I2C controller for Crescent Island (Raag)
- Enable NVM for Crescent Island (Sasha)
- Increase TDF timeout (Jagmeet)
- Restore engine registers before restarting schedulers after GT reset (Jan)
- Page Reclamation Support for Xe3p Platforms (Brian, Brost, Oak)
- Fix performance when pagefaults and 3d/display share resources (Brost)
- More OA MERT work (Ashutosh)
- Fix return values (Dan)
- Some log level and messages improvements (Brost)
Dave Airlie [Sat, 27 Dec 2025 06:25:56 +0000 (16:25 +1000)]
Merge tag 'drm-intel-next-2025-12-19' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-next
Beyond Display related:
- Switch to use kernel standard fault injection in i915 (Juha-Pekka)
Display uAPI related:
- Display uapi vs. hw state fixes (Ville)
- Expose sharpness only if num_scalers is >= 2 (Nemesa)
Display related:
- More display driver refactor and clean-ups, specially towards separation (Jani)
- Add initial support Xe3p_LPD for NVL (Gustavo, Sai, )
- BMG FBC W/a (Vinod)
- RPM fix (Dibin)
- Add MTL+ platforms to support dpll framework (Mika, Imre)
- Other PLL related fixes (Imre)
- Fix DIMM_S DRAM decoding on ICL (Ville)
- Async flip refactor (Ville, Jouni)
- Go back to using AUX interrupts (Ville)
- Reduce severity of failed DII FEC enabling (Grzelak)
- Enable system cache support for FBC (Vinod)
- Move PSR/Panel Replay sink data into intel_connector and other PSR changes (Jouni)
- Detect AuxCCS support via display parent interface (Tvrtko)
- Clean up link BW/DSC slice config computation(Imre)
- Toggle powerdown states for C10 on HDMI (Gustavo)
- Add parent interface for PC8 forcewake tricks (Ville)
- atomic: Add drm_device pointer to drm_private_obj
- bridge: Introduce drm_bridge_unplug, drm_bridge_enter, and
drm_bridge_exit
- dma-buf: Improve sg_table debugging
- dma-fence: Add new helpers, and use them when needed
- dp_mst: Avoid out-of-bounds access with VCPI==0
- gem: Reduce page table overhead with transparent huge pages
- panic: Report invalid panic modes
- sched: Add TODO entries
- ttm: Various cleanups
- vblank: Various refactoring and cleanups
- Kconfig cleanups
- Removed support for kdb
Driver Changes:
- amdxdna: Fix race conditions at suspend, Improve handling of zero
tail pointers, Fix cu_idx being overwritten during command setup
- ast: Support imported cursor buffers
-
- panthor: Enable timestamp propagation, Multiple improvements and
fixes to improve the overall robustness, notably of the scheduler.
- panels:
- panel-edp: Support for CSW MNE007QB3-1, AUO B140HAN06.4, AUO B140QAX01.H
Lucas De Marchi [Fri, 19 Dec 2025 21:16:49 +0000 (13:16 -0800)]
drm/xe: Improve rebar log messages
Some minor improvements to the log messages in the rebar logic:
use xe-oriented printk, switch unit from M to MiB in a few places for
consistency and use ilog2(SZ_1M) for clarity.
Suggested-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Suggested-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patch.msgid.link/20251219211650.1908961-6-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Lucas De Marchi [Fri, 19 Dec 2025 21:16:48 +0000 (13:16 -0800)]
drm/xe: Move rebar to its own file
Now that xe_pci.c calls the rebar directly, it doesn't make sense to
keep it in xe_vram.c since it's closer to the PCI initialization than to
the VRAM. Move it to its own file.
While at it, add a better comment to document the possible values for
the vram_bar_size module parameter.
Calvin Owens [Mon, 22 Dec 2025 16:54:42 +0000 (11:54 -0500)]
drm/xe: Don't use absolute path in generated header comment
Building the XE driver through Yocto throws this QA warning:
WARNING: mc:house:linux-stable-6.17-r0 do_package_qa: QA Issue: File /usr/src/debug/linux-stable/6.17/drivers/gpu/drm/xe/generated/xe_device_wa_oob.h in package linux-stable-src contains reference to TMPDIR [buildpaths]
WARNING: mc:house:linux-stable-6.17-r0 do_package_qa: QA Issue: File /usr/src/debug/linux-stable/6.17/drivers/gpu/drm/xe/generated/xe_wa_oob.h in package linux-stable-src contains reference to TMPDIR [buildpaths]
...because the comment at the top of the generated header contains the
absolute path to the rules file at build time:
* This file was generated from rules: /home/calvinow/git/meta-house/build/tmp-house/work-shared/nuc14rvhu7/kernel-source/drivers/gpu/drm/xe/xe_device_wa_oob.rules
Fix this minor annoyance by putting the basename of the rules file in
the generated comment instead of the absolute path, so the generated
header contents no longer depend on the location of the kernel source.
Francois Dugast [Tue, 23 Dec 2025 11:53:27 +0000 (12:53 +0100)]
drm/xe/migrate: Configure migration queue as low latency
Commit 5488bec96bcc ("drm/xe/uapi: Use hint for guc to set GT frequency")
introduced low latency hint for use by user space when creating an exec
queue. This instructs SLPC to ramp the GT frequency aggressively.
SVM relies on an internal exec queue to migrate memory upon page faults.
This change creates this exec queue with the low latency hint to speed up
migration.
This should not impact systems where GT frequency is set over sysfs, or
with long running workloads which give enough time for the frequency to
ramp up. An example of memory access pattern that shows an improvement of
SVM performance is running hundreds of times IGT eu-fault-2m-once-device
in xe_exec_system_allocator. The copy duration provided by GT stats in
svm_2M_device_copy_us shows per GPU page fault:
~ 165 μs without low latency hint
~ 130 μs with low latency hint
Thomas Hellström [Fri, 19 Dec 2025 11:33:20 +0000 (12:33 +0100)]
drm/xe/svm: Serialize migration to device if racing
Introduce an rw-semaphore to serialize migration to device if
it's likely that migration races with another device migration
of the same CPU address space range.
This is a temporary fix to attempt to mitigate a livelock that
might happen if many devices try to migrate a range at the same
time, and it affects only devices using the xe driver.
A longer term fix is probably improvements in the core mm
migration layer.
Thomas Hellström [Fri, 19 Dec 2025 11:33:19 +0000 (12:33 +0100)]
drm/pagemap: Support source migration over interconnect
Support source interconnect migration by using the copy_to_ram() op
of the source device private pages.
Source interconnect migration is required to flush the L2 cache of
the source device, which among other things is a requirement for
correct global atomic operation. It also enables the source GPU to
potentially decompress any compressed content which is not
understood by peers, and finally for the PCIe case, it's expected
that writes over PCIe will be faster than reads.
The implementation can probably be improved by coalescing subregions
with the same source.
v5:
- Update waiting for the pre_migrate_fence and comments around that,
previously in another patch. (Himal).
- Actually select device private pages to migrate when
source_peer_migrates is true.
Thomas Hellström [Fri, 19 Dec 2025 11:33:18 +0000 (12:33 +0100)]
drm/pagemap, drm/xe: Support destination migration over interconnect
Support destination migration over interconnect when migrating from
device-private pages with the same dev_pagemap owner.
Since we now also collect device-private pages to migrate,
also abort migration if the range to migrate is already
fully populated with pages from the desired pagemap.
Finally return -EBUSY from drm_pagemap_populate_mm()
if the migration can't be completed without first migrating all
pages in the range to system. It is expected that the caller
will perform that before retrying the call to
drm_pagemap_populate_mm().
v3:
- Fix a bug where the p2p dma-address was never used.
- Postpone enabling destination interconnect migration,
since xe devices require source interconnect migration to
ensure the source L2 cache is flushed at migration time.
- Update the drm_pagemap_migrate_to_devmem() interface to
pass migration details.
v4:
- Define XE_INTERCONNECT_P2P unconditionally (CI)
- Include a missing header (CI)
v5:
- Use page order increments where possible (Matt Brost).
- Fix a negated value of can_migrate_same_pagemap.
- Move removal of some dead code to a separate patch (Matt Brost).
- Remove an unnecessary zdd get() and put() (Matt Brost).
Thomas Hellström [Fri, 19 Dec 2025 11:33:16 +0000 (12:33 +0100)]
drm/gpusvm: Introduce a function to scan the current migration state
With multi-device we are much more likely to have multiple
drm-gpusvm ranges pointing to the same struct mm range.
To avoid calling into drm_pagemap_populate_mm(), which is always
very costly, introduce a much less costly drm_gpusvm function,
drm_gpusvm_scan_mm() to scan the current migration state.
The device fault-handler and prefetcher can use this function to
determine whether migration is really necessary.
There are a couple of performance improvements that can be done
for this function if it turns out to be too costly. Those are
documented in the code.
v3:
- New patch.
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> # For merging through drm-xe. Link: https://patch.msgid.link/20251219113320.183860-21-thomas.hellstrom@linux.intel.com