drm/xe: Defer buffer object shrinker write-backs and GPU waits
When the xe buffer-object shrinker allows GPU waits and write-back
(typically when called from kswapd), perform multiple passes, skipping
subsequent passes once the shrinker's target number of scanned objects
is reached:
1) Without GPU waits and write-back
2) Without write-back
3) With both GPU-waits and write-back
This is to avoid stalls and costly write- and readbacks unless they
are really necessary.
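A minimal sketch of the pass loop, assuming hypothetical pass flags and
a shrink_pass() helper (the real bookkeeping lives in xe_shrinker.c):

  /* Illustrative only: three passes, cheapest first. */
  struct shrink_pass { bool gpu_wait, writeback; };
  static const struct shrink_pass passes[] = {
          { .gpu_wait = false, .writeback = false },
          { .gpu_wait = true,  .writeback = false },
          { .gpu_wait = true,  .writeback = true  },
  };

  for (i = 0; i < ARRAY_SIZE(passes) && scanned < sc->nr_to_scan; i++) {
          if (!can_block && (passes[i].gpu_wait || passes[i].writeback))
                  break;
          scanned += shrink_pass(shrinker, &passes[i],
                                 sc->nr_to_scan - scanned);
  }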
v2:
- Don't test for scan completion twice. (Stuart Summers)
- Update tags.
Reported-by: melvyn <melvyn2@dnsense.pub>
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5557
Cc: Summers Stuart <stuart.summers@intel.com>
Fixes: 00c8efc3180f ("drm/xe: Add a shrinker for xe bos")
Cc: <stable@vger.kernel.org> # v6.15+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/20250805074842.11359-1-thomas.hellstrom@linux.intel.com
Matthew Auld [Thu, 31 Jul 2025 09:38:11 +0000 (10:38 +0100)]
drm/xe/migrate: prevent potential UAF
If we hit the error path, the previous fence (if there is one) has
already been put() prior to this, so doing a fence_wait on it could
lead to a UAF. Tweak the flow so the put() is only done after the wait.
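A sketch of the reordering (names assumed; the xe code operates on the
migrate fence):

  /* Before: fence was put() first, then waited on -> potential UAF. */
  /* After: wait while we still hold the reference, then drop it. */
  if (fence) {
          long ret = dma_fence_wait(fence, false);

          dma_fence_put(fence);
          if (ret < 0)
                  return ret;
  }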
Fixes: 270172f64b11 ("drm/xe: Update xe_ttm_access_memory to use GPU for non-visible access")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Maciej Patelczyk <maciej.patelczyk@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/20250731093807.207572-8-matthew.auld@intel.com
Matthew Auld [Thu, 31 Jul 2025 09:38:10 +0000 (10:38 +0100)]
drm/xe/migrate: don't overflow max copy size
With a non-page-aligned copy, we need to use a 4-byte-aligned pitch;
however the size itself might still be close to our maximum of ~8M, so
the dimensions of the copy can easily exceed the S16_MAX limit of the
copy command, leading to the following assert:
WARNING: CPU: 23 PID: 10605 at drivers/gpu/drm/xe/xe_migrate.c:673 emit_copy+0x4b5/0x4e0 [xe]
To fix this, account for the pitch when calculating the number of
bytes to copy in the current pass.
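A hedged sketch of the clamping (variable names assumed):

  /* The copy height is size / pitch; keep it within S16_MAX rows. */
  u64 max_bytes = (u64)S16_MAX * pitch;
  u32 cur_bytes = min_t(u64, bytes_left, max_bytes);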
Fixes: 270172f64b11 ("drm/xe: Update xe_ttm_access_memory to use GPU for non-visible access")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Maciej Patelczyk <maciej.patelczyk@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/20250731093807.207572-7-matthew.auld@intel.com
Matthew Auld [Thu, 31 Jul 2025 09:38:09 +0000 (10:38 +0100)]
drm/xe/migrate: prevent infinite recursion
If the buf + offset is not aligned to XE_CACHELINE_BYTES we fall back
to using a bounce buffer. However, the bounce buffer here is allocated
on the stack, and its only alignment requirement is natural u8
alignment, not XE_CACHELINE_BYTES. If the bounce buffer is also
misaligned we then recurse back into the function again; however, the
new bounce buffer might also not be aligned, and might never be, until
we eventually blow through the stack as we keep recursing.
Instead of using the stack use kmalloc, which should respect the
power-of-two alignment request here. Fixes a kernel panic when
triggering this path through eudebug.
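A sketch of the fix, relying on kmalloc() naturally aligning
power-of-two allocation sizes (XE_CACHELINE_BYTES is one):

  /* Heap buffer: aligned to its power-of-two size, unlike the stack. */
  void *bounce = kmalloc(XE_CACHELINE_BYTES, GFP_KERNEL);

  if (!bounce)
          return -ENOMEM;
  /* ... bounce the misaligned access through the buffer ... */
  kfree(bounce);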
Fixes: 270172f64b11 ("drm/xe: Update xe_ttm_access_memory to use GPU for non-visible access")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Maciej Patelczyk <maciej.patelczyk@intel.com>
Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/20250731093807.207572-6-matthew.auld@intel.com
drm/xe/pf: Program LMTT directory pointer on all GTs within a tile
Previously, the LMTT directory pointer was only programmed for the
primary GT within a tile. However, to ensure correct Local Memory
access by VFs, the LMTT configuration must be programmed on all GTs
within the tile.
Let's program the LMTT directory pointer on every GT of the tile to
guarantee proper LMEM access by VFs across all GTs.
HSD: 18042797646
Bspec: 67468
Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/20250805091850.1508240-1-piotr.piorkowski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Francois Dugast [Tue, 5 Aug 2025 13:59:07 +0000 (15:59 +0200)]
drm/xe/svm: Migrate folios when possible
The DMA mapping can now correspond to a folio (order > 0), so move
the iterator by the number of pages in the folio in order to migrate
all pages at once.
This requires forcing contiguous memory for SVM BOs, which greatly
simplifies the code and enables 2MB device page support, allowing a
major performance improvement. Negative effects like extra eviction
are unlikely as SVM BOs have a maximum size of 2MB.
A workaround ensures all addresses are populated in the array, as this
is expected when creating the copy batch. This is required because the
migrate layer does not support 2MB GPU pages yet. A proper fix will
come in a follow-up.
Francois Dugast [Tue, 5 Aug 2025 13:59:05 +0000 (15:59 +0200)]
drm/pagemap: Allocate folios when possible
If the order is greater than zero, allocate a folio when populating the
RAM PFNs instead of allocating individual pages one after the other. For
example if 2MB folios are used instead of 4KB pages, this reduces the
number of calls to the allocation API by 512.
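A minimal sketch of the idea, with hypothetical pfns[] bookkeeping:

  /* One order-9 folio (2MB with 4K pages) instead of 512 page allocs. */
  struct folio *folio = folio_alloc(GFP_HIGHUSER, order);

  if (folio)
          for (i = 0; i < (1UL << order); i++)
                  pfns[i] = folio_pfn(folio) + i;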
v2:
- Use page order instead of extra argument (Matthew Brost)
- Allocate with folio_alloc() (Matthew Brost)
- Loop for mpages and free_pages based on order (Matthew Brost)
v3:
- Fix loops in drm_pagemap_migrate_populate_ram_pfn() (Matthew Brost)
v4:
- Use folio_trylock(), set local variable to NULL (Matthew Brost)
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://lore.kernel.org/r/20250805140028.599361-5-francois.dugast@intel.com
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
Francois Dugast [Tue, 5 Aug 2025 13:59:04 +0000 (15:59 +0200)]
drm/pagemap: DMA map folios when possible
If the page is part of a folio, DMA map the whole folio at once instead of
mapping individual pages one after the other. For example if 2MB folios
are used instead of 4KB pages, this reduces the number of DMA mappings by
512.
The folio order (and consequently, the size) is persisted in the struct
drm_pagemap_device_addr to be available at the time of unmapping.
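A sketch of mapping the whole folio in one call (device and direction
assumed):

  /* One DMA mapping covers every page of the folio. */
  dma_addr_t addr = dma_map_page(dev, folio_page(folio, 0), 0,
                                 PAGE_SIZE << order, DMA_BIDIRECTIONAL);

  if (dma_mapping_error(dev, addr))
          return -EFAULT;
  /* Persist 'order' so the unmap side uses the same size. */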
v2:
- Initialize order variable (Matthew Brost)
- Set proto and dir for completeness (Matthew Brost)
- Do not populate drm_pagemap_addr, document it (Matthew Brost)
- Add and use macro NR_PAGES(order) (Matthew Brost)
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://lore.kernel.org/r/20250805140028.599361-4-francois.dugast@intel.com
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
Francois Dugast [Tue, 5 Aug 2025 13:59:03 +0000 (15:59 +0200)]
drm/pagemap: Use struct drm_pagemap_addr in mapping and copy functions
This struct embeds more information than just the DMA address. This will
help later to support folio orders greater than zero. At this point, there
is no functional change as the only struct member used is addr.
In Xe, adapt to the new drm_gpusvm_devmem_ops type signatures using struct
drm_pagemap_addr, as well as the internal xe SVM functions implementing
those operations. The use of this struct is propagated to xe_migrate,
which makes indexed accesses to the next DMA address, as these are no
longer contiguous.
v2:
- Rename drm_pagemap_device_addr to drm_pagemap_addr (Matthew Brost)
- Squash with patch for Xe (Matthew Brost)
- Set proto and dir for completeness (Matthew Brost)
- Assess DMA map protocol (Matthew Brost)
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://lore.kernel.org/r/20250805140028.599361-3-francois.dugast@intel.com
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
Michal Wajdeczko [Thu, 31 Jul 2025 21:21:45 +0000 (23:21 +0200)]
drm/xe/configfs: Allow adding configurations for future VFs
Since we expect all configuration directory names to match one of the
existing devices, we can't provide any configuration for the VFs until
they are actually enabled.
But we can relax that restriction by just checking if there is a PF
device that could create a given VF. This is easy since all our PF
devices are always present at function 0, and we can query the PF
device for the number of VFs it could support.
Then, for a system with the PF device at 0000:00:02.0, we can add
configs for all VFs:
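For illustration only (hypothetical VF addresses; the exact VF BDFs
depend on the platform's SR-IOV layout):

  /sys/kernel/config/xe/0000:00:02.1/
  /sys/kernel/config/xe/0000:00:02.2/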
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250731212145.179898-1-michal.wajdeczko@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Michal Wajdeczko [Thu, 31 Jul 2025 19:33:38 +0000 (21:33 +0200)]
drm/xe/configfs: Only allow configurations for supported devices
Since we already look up the real PCI device before allowing creation
of its directory config, we can also check if the found device matches
our driver's PCI ID list. This will prevent creation of directory
configs for unsupported devices.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250731193339.179829-11-michal.wajdeczko@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Michal Wajdeczko [Thu, 31 Jul 2025 19:33:36 +0000 (21:33 +0200)]
drm/xe/configfs: Keep default device config settings together
For easier maintenance, add a placeholder where we can keep all default
device configuration settings in one place.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250731193339.179829-9-michal.wajdeczko@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
This time it will hold just pure configuration parameters, without any
configfs-related stuff. This will help us define default data without
wasting space on unneeded members.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250731193339.179829-8-michal.wajdeczko@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
This helper's name shouldn't suggest that it is a part of the core
configfs API family. While around, switch to a different helper to
release a reference.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250731193339.179829-7-michal.wajdeczko@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Michal Wajdeczko [Thu, 31 Jul 2025 19:33:33 +0000 (21:33 +0200)]
drm/xe/configfs: Rename struct xe_config_device
Rename it to struct xe_config_group_device to better match its purpose.
It will also let us reintroduce the same struct name in an upcoming
patch, this time holding only configuration data.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250731193339.179829-6-michal.wajdeczko@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Michal Wajdeczko [Thu, 31 Jul 2025 19:33:32 +0000 (21:33 +0200)]
drm/xe/configfs: Drop redundant init() error message
There is no need to print a separate error message since we will also
print one in xe_init(). Also drop a temporary variable, which was
likely just taken from the example code.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250731193339.179829-5-michal.wajdeczko@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Michal Wajdeczko [Thu, 31 Jul 2025 19:33:31 +0000 (21:33 +0200)]
drm/xe/configfs: Destroy xe_configfs.su_mutex on exit/error
While mutex_destroy() is a NOP when CONFIG_DEBUG_MUTEXES is not
enabled, we should still call it.
While around, drop a trailing line.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250731193339.179829-4-michal.wajdeczko@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Michal Wajdeczko [Thu, 31 Jul 2025 19:33:30 +0000 (21:33 +0200)]
drm/xe: Print module init abort code
We should provide a hint to the user why the module refused to
load. This will also allow us to drop individual error messages
from init steps.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250731193339.179829-3-michal.wajdeczko@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Michal Wajdeczko [Thu, 31 Jul 2025 19:33:29 +0000 (21:33 +0200)]
drm/xe: Simplify module initialization code
There is no need for extra checks and WARN() in the helpers: instead of
an index of the entry with function pointers, we can pass a pointer to
the entry, prepared directly in the main loop, which is guaranteed to
be valid.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250731193339.179829-2-michal.wajdeczko@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Presently, multiple versions of the guc_waklv_enable_.* functions
exist, each adding a different number of dwords to the klv_entry array.
This is not extensible, and more duplicates of the function would need
to be created if it ever becomes necessary to support 3 or more dwords
per workaround in the future.
Consolidate the disparate guc_waklv_enable functions into a single
guc_waklv_enable function that can take an arbitrary number of dword
values.
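A hedged sketch of the consolidated helper's shape; the KLV header
macros are from the GuC KLV ABI, while the parameter list and the ADS
map helpers used here are assumptions:

  static void guc_waklv_enable(struct xe_guc_ads *ads, u32 klv_id,
                               const u32 *data, u32 data_len_dw,
                               u32 *offset)
  {
          u32 klv_hdr = FIELD_PREP(GUC_KLV_0_KEY, klv_id) |
                        FIELD_PREP(GUC_KLV_0_LEN, data_len_dw);

          xe_map_wr(ads_to_xe(ads), ads_to_map(ads), *offset, u32, klv_hdr);
          xe_map_memcpy_to(ads_to_xe(ads), ads_to_map(ads),
                           *offset + sizeof(u32), data,
                           data_len_dw * sizeof(u32));
          *offset += (1 + data_len_dw) * sizeof(u32);
  }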
v2:
- Update length value properly (Shuicheng)
v3: (Harrison)
- Use data as a term instead of dwords or arr
- Reformat warning message to use hex values
- Eliminate need for kzalloc and klv_entry array
- Reorder function parameters to fix line wrapping
v4:
- Miscellaneous formatting fixes (Cavitt)
v5: (Harrison)
- s/data_range/data_len_dw
- Use data_len_dw to calculate size for xe_map_memcpy_to
Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Cc: Lucas De Marchi <lucas.demarch@intel.com>
Cc: Shuicheng Lin <shuicheng.lin@intel.com>
Cc: John Harrison <john.c.harrison@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://lore.kernel.org/r/20250728194806.68176-2-jonathan.cavitt@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Tomasz Lis [Sat, 2 Aug 2025 03:10:45 +0000 (05:10 +0200)]
drm/xe/vf: Rebase exec queue parallel commands during migration recovery
Parallel exec queues have an additional command streamer buffer which holds
a GGTT reference to data within context status. The GGTT references have to
be fixed after VF migration.
v2: Properly handle nop entry, verify if parsing goes ok
v3: Improve error/warn logging, add propagation of errors,
give names to magic offsets
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://lore.kernel.org/r/20250802031045.1127138-9-tomasz.lis@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Tomasz Lis [Sat, 2 Aug 2025 03:10:44 +0000 (05:10 +0200)]
drm/xe/vf: Refresh utilization buffer during migration recovery
The WA buffer we use to capture context utilization contains GGTT
references. This means its instructions have to be either fixed or
re-emitted during VF post-migration recovery.
This patch adds re-emitting content of the utilization WA BB during
the recovery.
The way we write to VRAM requires a scratch buffer to be used before
the whole block is memcopied. We are re-using a scratch buffer
introduced in an earlier part of the recovery. This is not a
performance optimization, but a necessity to avoid creating
dependencies between locks.
v2: Notable rebase after "Prepare WA BB setup for more users" patch
v3: Added error propagation
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Reviewed-by: Michal Winiarski <michal.winiarski@intel.com>
Link: https://lore.kernel.org/r/20250802031045.1127138-8-tomasz.lis@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Tomasz Lis [Sat, 2 Aug 2025 03:10:43 +0000 (05:10 +0200)]
drm/xe/vf: Post migration, repopulate ring area for pending request
The commands within the ring area allocated for a request may contain
references to GGTT. These references require an update after VF
migration, in order to continue any preempted LRCs, or jobs which were
emitted to the ring but not yet sent to the GuC.
This change calls the emit function again for all such jobs,
as part of post-migration recovery.
v2: Moved few functions to better files
v3: Take job_list_lock
v4: Rephrased comments
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Reviewed-by: Michal Winiarski <michal.winiarski@intel.com>
Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://lore.kernel.org/r/20250802031045.1127138-7-tomasz.lis@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Tomasz Lis [Sat, 2 Aug 2025 03:10:42 +0000 (05:10 +0200)]
drm/xe/vf: Rebase MEMIRQ structures for all contexts after migration
All contexts require an update of state data, as the data includes
GGTT references to memirq-related buffers.
Default contexts need these references updated as well, because they
are not refreshed when a new context is created from them.
The way we write to VRAM requires a scratch buffer to be used before
the whole block is memcopied. Since using kmalloc() within specific
recovery functions would lead to unintended relations between locks,
we are allocating the buffer earlier, before any locks are taken. The
same buffer will be used for other steps of the recovery.
v2: Update addresses by xe_lrc_write_ctx_reg() rather than
set_memory_based_intr()
v3: Renamed parameter, reordered parameters in some functs
v4: Check if have MEMIRQ, move `xe_gt*` funct to proper file
v5: Revert back to requiring scratch buffer, but allocate it
earlier this time
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Acked-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Reviewed-by: Michal Winiarski <michal.winiarski@intel.com>
Link: https://lore.kernel.org/r/20250802031045.1127138-6-tomasz.lis@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Tomasz Lis [Sat, 2 Aug 2025 03:10:41 +0000 (05:10 +0200)]
drm/xe/vf: Rebase HWSP of all contexts after migration
All contexts require an update due to GGTT range shift, as that
affects their HWSP.
The HW status page of a context contains GGTT references, which
need to be shifted to a new range (or re-computed using the
previously updated vma nodes). The references include ring start
address and indirect state address.
v2: move some functions to better matched files
v3: Add missing kerneldocs
v4: Style fix
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Acked-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Reviewed-by: Michal Winiarski <michal.winiarski@intel.com>
Link: https://lore.kernel.org/r/20250802031045.1127138-5-tomasz.lis@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Tomasz Lis [Sat, 2 Aug 2025 03:10:40 +0000 (05:10 +0200)]
drm/xe: Block reset while recovering from VF migration
Resetting the GuC during recovery could interfere with the recovery
process. Such a reset might also be triggered without justification,
due to the migration taking time, rather than due to the workload not
progressing.
Doing a GuC reset during the recovery would cause an exit from the
RESFIX state, and therefore continuation of GuC work while fixups are
still being applied. To avoid that, resets need to be blocked during
the recovery.
This patch blocks resets during the recovery. A reset request in that
time range will be stalled, and unblocked only after the GuC goes out
of the RESFIX state.
In case a reset procedure has already started when the recovery is
triggered, there isn't much we can do: we cannot wait for it to
finish, as that involves waiting for hardware, and we can't be sure at
which exact point of the reset procedure the GPU got switched.
Therefore, the rare cases where migration happens while a reset is in
progress are still dangerous. Resets are not part of the standard flow
and cause unfinished workloads; that will happen during a reset
interrupted by migration as well, so it doesn't diverge much from what
normally happens during such resets.
v2: Introduce a new atomic for reset blocking, as we cannot reuse
`stopped` atomic (that could lead to losing a workload).
v3: Switched atomic functs to ones which include proper barriers
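A sketch of the blocking scheme, using atomics with the barriers that
v3 mentions (member names assumed):

  static void guc_submit_reset_block(struct xe_guc *guc)
  {
          atomic_inc(&guc->submission_state.reset_blocked);
          smp_mb__after_atomic();         /* order against reset checks */
  }

  static bool guc_submit_reset_blocked(struct xe_guc *guc)
  {
          return atomic_read_acquire(&guc->submission_state.reset_blocked);
  }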
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Reviewed-by: Michal Winiarski <michal.winiarski@intel.com>
Link: https://lore.kernel.org/r/20250802031045.1127138-4-tomasz.lis@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Tomasz Lis [Sat, 2 Aug 2025 03:10:39 +0000 (05:10 +0200)]
drm/xe/vf: Pause submissions during RESFIX fixups
While post-migration fixups are being applied to a VF, the GuC will
not respond to any commands. This means submissions have no way of
finishing.
To avoid acquiring additional resources and then stalling
on hardware access, pause the submission work. This will
decrease the chance of depleting resources, and speed up
the recovery.
v2: Commented xe_irq_resume() call
v3: Typo fix
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://lore.kernel.org/r/20250802031045.1127138-3-tomasz.lis@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Tomasz Lis [Sat, 2 Aug 2025 03:10:38 +0000 (05:10 +0200)]
drm/xe/sa: Avoid caching GGTT address within the manager
Non-virtualized resources require fixups after SR-IOV VF migration.
Caching GGTT references rather than re-computing them from the
underlying buffer object is something we want to avoid, as such code
would require an additional fixup step and additional locking around
all the places where the address is accessed.
This change removes the cached GPU address from the Sub-Allocation
Manager, and introduces a function which recomputes and returns
the address instead.
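A sketch of the accessor, assuming the SA manager keeps a pointer to
its backing BO:

  u64 xe_sa_manager_gpu_addr(struct xe_sa_manager *sa_manager)
  {
          /* Recompute from the BO instead of caching a stale address. */
          return xe_bo_ggtt_addr(sa_manager->bo);
  }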
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://lore.kernel.org/r/20250802031045.1127138-2-tomasz.lis@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Simon Richter [Sat, 2 Aug 2025 02:40:36 +0000 (11:40 +0900)]
Mark xe driver as BROKEN if kernel page size is not 4kB
This driver, for the time being, assumes that the kernel page size is 4kB,
so it fails on loong64 and aarch64 with 16kB pages, and ppc64el with 64kB
pages.
Signed-off-by: Simon Richter <Simon.Richter@hogyros.de>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: stable@vger.kernel.org # v6.8+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://lore.kernel.org/r/20250802024152.3021-1-Simon.Richter@hogyros.de
drm/xe/pf: Don't resume device from restart worker
The PF's restart worker shouldn't attempt to resume the device on
its own, since its goal is to finish PF and VFs reprovisioning on
the recently reset GuC. Take an extra RPM reference while scheduling
the work and release it from the worker, or when we cancel the work.
The PF driver might be resumed just to configure VFs, but since it is
doing some asynchronous GuC reconfigurations after a fresh reset, we
should wait until all pending work is completed.
This is especially important in case of LMEM provisioning, since
we also need to update the LMTT and send invalidation requests
to all GuCs, which are expected to be already in the VGT mode.
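A sketch of the reference dance; the RPM helpers are from xe_pm, while
the workqueue and worker wiring shown here are assumptions:

  /* Scheduling: take a reference that the worker (or cancel) releases. */
  xe_pm_runtime_get_noresume(xe);
  if (!queue_work(wq, &pf->restart_worker))
          xe_pm_runtime_put(xe);          /* work was already queued */

  /* Worker (and cancel path): drop the reference when done. */
  xe_pm_runtime_put(xe);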
Fixes: 68ae022278a1 ("drm/xe/pf: Force GuC virtualization mode")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://lore.kernel.org/r/20250801142822.180530-3-michal.wajdeczko@intel.com
drm/xe/pf: Disable PF restart worker on device removal
We can't let the restart worker run once the device is removed, since
other data that it might want to access could already be released.
Explicitly disable the worker as part of the device cleanup action.
Fixes: a4d1c5d0b99b ("drm/xe/pf: Move VFs reprovisioning to worker")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://lore.kernel.org/r/20250801142822.180530-2-michal.wajdeczko@intel.com
drm/xe/pf: Skip LMTT update if no LMEM was provisioned
During VF unprovisioning, if the VF was not provisioned with LMEM,
there is no need to trigger an LMTT update, as the VF LMTT was never
set. This will spare us sending full TLB invalidation requests.
drm/xe/devcoredump: Defer devcoredump initialization during probe
Initializing devcoredump before the GT looks harmless, but it leads to
a problem during driver unbind. Because of this order, the GT/engine
release functions will be called before the xe devcoredump release
function (xe_driver_devcoredump_fini), leading to the following kernel
crash[1], because the devcoredump functions might still use GT/engine
data structures after those are freed.
The following crash is observed while running the IGT
xe_wedged@wedged-at-any-timeout. The test forces a wedged state by
submitting a workload which hangs, then does an unbind/rebind of the
driver to recover from the wedged state.
The hung workload leads to a devcoredump. The following crash is
noticed when the devcoredump capture races with the driver unbind.
During driver unbind, the release function hw_engine_fini() will be
called which assigns NULL to hwe->gt. But the same data structure is
accessed during the coredump capture in the function
xe_engine_snapshot_print by reading snapshot->hwe->gt.
With this patch, we make sure the devcoredump is stopped before
deinitializing the core driver functions.
drm/xe: Fix oops in xe_gem_fault when running core_hotunplug test.
I saw an oops in xe_gem_fault when running the xe-fast-feedback
testlist against the realtime kernel without debug options enabled.
The panic happens after core_hotunplug unbind-rebind finishes.
Presumably what happens is that a process mmaps, unlocks because
of the FAULT_FLAG_RETRY_NOWAIT logic, has no process memory left,
causing ttm_bo_vm_dummy_page() to return VM_FAULT_NOPAGE, since
there was nothing left to populate, and then oopses in
"mem_type_is_vram(tbo->resource->mem_type)" because tbo->resource
is NULL.
It's convoluted, but fits the data and explains the oops after
the test exits.
The VF CCS save/restore series (patchwork #149108) has a dependency
on the migration framework. A recent migration update in commit d65ff1ec8535 ("drm/xe: Split xe_migrate allocation from initialization")
caused a VM crash during XE driver release for iGPU devices.
Update the VF CCS migration initialization sequence to align with the new
migration framework changes, resolving the release-time crash.
Fixes: f3009272ff2e ("drm/xe/vf: Create contexts for CCS read write")
Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250729120720.13990-1-satyanarayana.k.v.p@intel.com
Michal Wajdeczko [Fri, 25 Jul 2025 09:05:08 +0000 (11:05 +0200)]
drm/xe/hw_engine_group: Don't use drm_warn to catch missed case
Since hwe->class is an enumeration, we can rely on the compiler to
catch any unhandled engine class case at compile time thanks to
[-Werror=switch]. Any unexpected use of the special CLASS_MAX enum
case can be guarded by our xe_gt_assert() instead, which is compiled
out on production builds.
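A sketch of the pattern; with no default case, [-Werror=switch] flags
any unhandled class, while the group names here are hypothetical:

  switch (hwe->class) {
  case XE_ENGINE_CLASS_RENDER:
  case XE_ENGINE_CLASS_COMPUTE:
          return group_rcs_ccs;
  case XE_ENGINE_CLASS_COPY:
          return group_bcs;
  case XE_ENGINE_CLASS_VIDEO_DECODE:
  case XE_ENGINE_CLASS_VIDEO_ENHANCE:
  case XE_ENGINE_CLASS_OTHER:
          return group_vcs_vecs;
  case XE_ENGINE_CLASS_MAX:
          break;
  }
  xe_gt_assert(gt, false);        /* compiled out on production builds */
  return NULL;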
John Harrison [Sat, 26 Jul 2025 02:43:37 +0000 (19:43 -0700)]
drm/xe/guc: Add more GuC load error status codes
The GuC load process will abort if certain status codes (which are
indicative of a fatal error) are reported. Otherwise, it keeps waiting
until the 'success' code is returned. New error codes have been added
in recent GuC releases, so add support for aborting on those as well.
v2: Shuffle HWCONFIG_START to the front of the switch to keep the
ordering as per the enum define for clarity (review feedback by
Jonathan). Also add a description for the basic 'invalid init data'
code which was missing.
Ilia Levi [Mon, 14 Jul 2025 12:26:58 +0000 (15:26 +0300)]
drm/xe: Support for mmap-ing mmio regions
Allow the driver to expose hardware register spaces to userspace
through GEM objects with fake mmap offsets. This can be useful
for userspace-firmware communication, debugging, etc.
v2: Minor doc fix (CI)
v3: Enforce MAP_SHARED (Tejas)
Add fault handler with dummy page (Tejas, Matt Auld)
Store physical address instead of xe_mmio in the GEM object (MattB)
v4: Separate xe_mmio_gem from xe_mmio and make it private (MattB)
Signed-off-by: Ilia Levi <ilia.levi@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Link: https://lore.kernel.org/r/20250714122658.1803-1-ilia.levi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
The description calls for it to be emitted as the indirect context buffer
workaround for render and compute, and from the workaround batch buffer
for the other engines. Therefore we plug into the previously added
respective top level emission functions.
The actual command streamer programming sequence differs from what is
described in the PRM, in that it assumes the listed LRCA offset was
actually supposed to refer to the location of the CTX_TIMESTAMP
register instead of LRCA + 0x180c (which is in GPR space). The latter
appears to make more sense under the assumption that multiple writes
help with restoring the CTX_TIMESTAMP register content from the saved
context state.
Michal Wajdeczko [Tue, 22 Jul 2025 14:10:56 +0000 (16:10 +0200)]
drm/xe/configfs: Use pci_name() for lookup
There is no need to manually build the PCI device name from BDF data,
since it was already prepared and assigned, and can be accessed by
calling the pci_name() function.
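The lookup then reduces to a compare against the name the PCI core
already assigned (variable names assumed):

  /* Instead of rebuilding "%04x:%02x:%02x.%x" from BDF numbers... */
  match = !strcmp(name, pci_name(pdev));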
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://lore.kernel.org/r/20250722141059.30707-4-michal.wajdeczko@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Michal Wajdeczko [Tue, 22 Jul 2025 14:10:55 +0000 (16:10 +0200)]
drm/xe/configfs: Enforce canonical device names
While we expect config directory names to match the PCI device name,
currently we only scan the provided names for domain, bus, device and
function numbers, without checking their format. This would let
slightly broken entries pass.
To avoid such mistakes, check if the provided name exactly matches the
canonical PCI device address format, which we recreate from the parsed
BDF data. Also simplify the scanf format, as it can't really catch all
formatting errors.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250722141059.30707-3-michal.wajdeczko@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Michal Wajdeczko [Tue, 22 Jul 2025 14:10:54 +0000 (16:10 +0200)]
drm/xe/configfs: Fix pci_dev reference leak
We are using the pci_get_domain_bus_and_slot() function to verify if
the given config directory name matches any existing PCI device, but
we missed calling the matching pci_dev_put() to release the reference.
While around, also change the error code in case of no device match,
to make it more specific than a generic formatting error.
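A sketch of the fixed lookup (BDF variables assumed):

  struct pci_dev *pdev = pci_get_domain_bus_and_slot(domain, bus,
                                                     PCI_DEVFN(slot, fn));

  if (!pdev)
          return -ENODEV;         /* more specific than a format error */
  pci_dev_put(pdev);              /* drop the reference the lookup took */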
Fixes: 16280ded45fb ("drm/xe: Add configfs to enable survivability mode")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://lore.kernel.org/r/20250722141059.30707-2-michal.wajdeczko@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Shuicheng Lin [Thu, 24 Jul 2025 19:38:55 +0000 (19:38 +0000)]
drm/xe/hw_engine_group: Avoid call kfree() for drmm_kzalloc()
Memory allocated with drmm_kzalloc() should not be freed using
kfree(), as it is managed by the DRM subsystem. The memory will
be automatically freed when the associated drm_device is released.
These 3 group pointers are allocated using drmm_kzalloc() in
hw_engine_group_alloc(), so they don't require manual deallocation.
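A sketch of the drmm_kzalloc() contract:

  group = drmm_kzalloc(&xe->drm, sizeof(*group), GFP_KERNEL);
  if (!group)
          return ERR_PTR(-ENOMEM);
  /* No kfree(group): DRM frees it when the drm_device is released. */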
Fixes: 67979060740f ("drm/xe/hw_engine_group: Fix potential leak")
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://lore.kernel.org/r/20250724193854.1124510-2-shuicheng.lin@intel.com
Remove unused GT TLB invalidation trace points after converting to GT
TLB invalidation jobs. The removed trace points were used during early
bring-up of the unstable driver; with a stable driver there is no need
to replace them with new trace points.
Matthew Brost [Thu, 24 Jul 2025 19:12:15 +0000 (12:12 -0700)]
drm/xe: Use GT TLB invalidation jobs in PT layer
Rather than open-coding GT TLB invalidations in the PT layer, use GT TLB
invalidation jobs. The real benefit is that GT TLB invalidation jobs use
a single dma-fence context, allowing the generated fences to be squashed
in dma-resv/DRM scheduler.
v2:
- s/;;/; (checkpatch)
- Move ijob/mjob job push after range fence install
v3:
- Remove extra newline (Stuart)
- Set ijob/mjob near creation (Stuart)
- Add comment back in (Stuart)
Matthew Brost [Thu, 24 Jul 2025 19:12:13 +0000 (12:12 -0700)]
drm/xe: Add dependency scheduler for GT TLB invalidations to bind queues
Add a generic dependency scheduler for GT TLB invalidations, used to
schedule jobs that issue GT TLB invalidations to bind queues.
v2:
- Use shared GT TLB invalidation queue for dep scheduler
- Break allocation of dep scheduler into its own function
- Add define for max number tlb invalidations
- Skip media if not present
Matthew Brost [Thu, 24 Jul 2025 19:12:12 +0000 (12:12 -0700)]
drm/xe: Create ordered workqueue for GT TLB invalidation jobs
It makes no sense to schedule GT TLB invalidation jobs targeting the
same GT in parallel, given these all contend on the same lock, so
create an ordered workqueue for GT TLB invalidation jobs.
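A sketch of the setup (the field name is an assumption):

  /* All invalidation jobs for a GT run strictly one after another. */
  gt->tlb_inval_wq = alloc_ordered_workqueue("xe-gt-tlb-inval-jobs", 0);
  if (!gt->tlb_inval_wq)
          return -ENOMEM;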
Matthew Brost [Thu, 24 Jul 2025 19:12:11 +0000 (12:12 -0700)]
drm/xe: Add generic dependency jobs / scheduler
Add generic dependency jobs / scheduler which serves as a wrapper for
the DRM scheduler. This is useful when we want to delay a generic
operation until a dma-fence signals.
Existing use cases could be destroying of resources based fences /
dma-resv, the preempt rebind worker, and pipelined GT TLB invalidations.
Written in such a way it could be moved to DRM subsystem if needed.
drm/xe: Rename MCFG_MCR_SELECTOR to STEER_SEMAPHORE
The register at offset 0xfd0 was incorrectly named MCFG_MCR_SELECTOR,
likely copied from i915. According to the hardware specification (Bspec),
this register is actually called STEER_SEMAPHORE.
Rename the register definition and update its usage in xe_gt_mcr.c to
match the official hardware documentation.
Michal Wajdeczko [Wed, 23 Jul 2025 17:56:39 +0000 (19:56 +0200)]
drm/xe/guc: Clear whole g2h_fence during initialization
The struct g2h_fence must be explicitly initialized using the
g2h_fence_init() function to avoid trash values in its members, but we
missed updating this helper function with the new member.
To fix that, and to avoid any future mistakes, memset the whole struct
first, then update the remaining non-zero members.
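A sketch of the resulting helper (the non-zero defaults are roughly
what the xe CT code uses, but the exact members are assumptions):

  static void g2h_fence_init(struct g2h_fence *g2h_fence,
                             u32 *response_buffer)
  {
          memset(g2h_fence, 0, sizeof(*g2h_fence));   /* no trash values */
          g2h_fence->response_buffer = response_buffer;
          g2h_fence->seqno = ~0x0;                    /* non-zero default */
  }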
Fixes: 94de94d24ea8 ("drm/xe/guc: Cancel ongoing H2G requests when stopping CT")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Lukasz Laguna <lukasz.laguna@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250723175639.206875-1-michal.wajdeczko@intel.com
Michal Wajdeczko [Wed, 23 Jul 2025 13:30:15 +0000 (15:30 +0200)]
drm/xe: Make GGTT TLB invalidation failure message GT oriented
GGTT TLB invalidation is performed on a specific GT, thus any failure
message shall also be GT specific. And to help investigate any
unexpected failures, promote the message from warn level to WARN to
get the full call stack of this unlikely case.
Michal Wajdeczko [Tue, 22 Jul 2025 18:26:18 +0000 (20:26 +0200)]
drm/xe: Enable SR-IOV for TGL
While we don't have official CI SR-IOV coverage for the Tigerlake
platforms, we were using this platform for the feature enabling, and
the Xe driver already has all required changes to support it.
Since TGL platforms are guarded by the xe.require_force_probe flag,
enable the SR-IOV feature on them, like we recently did for ADL/ATSM.
Suggested-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250722182618.30811-5-michal.wajdeczko@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Michal Wajdeczko [Tue, 22 Jul 2025 18:26:17 +0000 (20:26 +0200)]
drm/xe: Enable SR-IOV for ADL/ATSM
We were already testing those two platforms for a while on our CI,
but enabling flag (has_sriov) was only available on the topic branch
and only for builds with CONFIG_DRM_XE_DEBUG config.
Since those two platforms are guarded by another enabling flag
(require_force_probe) and we believe our SR-IOV support for them is at
a sufficient level to start enjoying the feature, turn on the SR-IOV
enabling flag unconditionally.
Suggested-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250722182618.30811-4-michal.wajdeczko@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Michal Wajdeczko [Tue, 22 Jul 2025 18:26:16 +0000 (20:26 +0200)]
drm/xe/pf: Enable SR-IOV PF mode by default
We already claim official support for SR-IOV PF/VF modes on PTL
and BMG platforms, but by default we start the Xe driver on those
platforms in non-virtualized mode (native) since we still have
max_vfs modparam set to disable creation of the VFs.
It's time to let the Xe driver support SR-IOV PF mode by default.
We were already testing this on our CI, relying on a patch that
enabled it for the CONFIG_DRM_XE_DEBUG builds used there.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250722182618.30811-3-michal.wajdeczko@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Lucas De Marchi [Tue, 22 Jul 2025 19:52:08 +0000 (12:52 -0700)]
drm/xe: Fix build without debugfs
When CONFIG_DEBUG_FS is off, drivers/gpu/drm/xe/xe_gt_debugfs.o
is not built and build fails on some setups with:
ld: drivers/gpu/drm/xe/xe_gt.o: in function `xe_fault_inject_gt_reset':
drivers/gpu/drm/xe/xe_gt.h:27:(.text+0x1659): undefined reference to `gt_reset_failure'
ld: drivers/gpu/drm/xe/xe_gt.h:27:(.text+0x1c16): undefined reference to `gt_reset_failure'
collect2: error: ld returned 1 exit status
Do not use the gt_reset_failure attribute if debugfs is not enabled.
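One way to guard the attribute (a sketch; the fault_attr itself lives
in the debugfs-only object):

  static inline bool xe_fault_inject_gt_reset(void)
  {
          return IS_ENABLED(CONFIG_DEBUG_FS) &&
                 should_fail(&gt_reset_failure, 1);
  }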
Fixes: 8f3013e0b222 ("drm/xe: Introduce fault injection for gt reset")
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://lore.kernel.org/r/20250722-xe-fix-build-fault-v1-1-157384d50987@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
drm/xe/vf: Register CCS read/write contexts with Guc
Register the read/write contexts, with the newly added flags, with the
GuC and enable the contexts immediately after registration.
Re-register the contexts with the GuC when resuming from runtime
suspend, as a soft reset is applied to the GuC during
xe_pm_runtime_resume().
Make ring head = tail while unbinding the device to avoid issues with
VF pause after the device is unbound.
Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250722120506.6483-4-satyanarayana.k.v.p@intel.com
drm/xe/vf: Attach and detach CCS copy commands with BO
Attach CCS read/write copy commands to the BO for old and new mem
types as NULL -> tt or system -> tt.
Detach the CCS read/write copy commands from the BO while deleting the
TTM BO from xe_ttm_bo_delete_mem_notify().
Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250722120506.6483-3-satyanarayana.k.v.p@intel.com
Create two LRCs to handle CCS metadata read / write from the CCS pool
in the VM. The read context is used to hold GPU instructions to be
executed at save time, and the write context is used to hold GPU
instructions to be executed at restore time.
Allocate batch buffer pool using suballocator for both read and write
contexts.
Migration framework is reused to create LRCAs for read and write.
Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250722120506.6483-2-satyanarayana.k.v.p@intel.com
Fixes: b2c4ac219fa4 ("drm/xe/uc: Disable GuC communication on hardware initialization error")
Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://lore.kernel.org/r/20250721214520.954014-1-zhanjun.dong@intel.com
drm/xe/xe_debugfs: Expose G-State and PCIe link state residency counters through debugfs
Add debug nodes, "dgfx_pkg_residencies" for G-states (G2, G6, G8, G10,
ModS) and "dgfx_pcie_link_residencies" for PCIe link states (L0, L1, L1.2)
residency counters.
v1:
- Expose all G-State residency counter values under
dgfx_pkg_residencies. (Anshuman)
- Include runtime_get/put. (Riana)
v2:
- Move offset macros to drm/xe/regs/xe_pmt. (Riana)
v3:
- Include debugfs node "dgfx_pcie_link_residencies" for pcie link
residency counter values. (Anshuman)
v4:
- Include check for BMG and add helper function for repetitive
code. (Riana)
- Add for loop and local struct to avoid repetition. (Riana)
- Use "drm_debugfs_create_files" to create debugfs. (Karthik)
v5:
- Reorder commits to reflect the correct dependency hierarchy. (Jonathan)
- Simplification of commit message and rectified register offset.(Karthik)
- Error handling and return before printing. (Riana)
v6:
- Remove check for DGFX as BMG is discrete. (Karthik)
- Rearrange residency offsets in ascending order. (Riana)
v7:
- Squash the macros into the patch they are used in. (Lucas)
Piotr Piórkowski [Mon, 14 Jul 2025 18:48:18 +0000 (20:48 +0200)]
drm/xe: Unify the initialization of VRAM regions
Currently in the driver we have VRAM regions defined per device and
per tile. Initialization of these regions is done in two completely
different ways. To simplify the logic of the code and make it easier
to add new regions in the future, let's unify the way we initialize
VRAM regions.
v2:
- fix doc comments in struct xe_vram_region
- remove unnecessary includes (Jani)
v3:
- move code from xe_vram_init_regions_managers to xe_tile_init_noalloc
(Matthew)
- replace ioremap_wc to devm_ioremap_wc for mapping VRAM BAR
(Matthew)
- Replace the tile id parameter with vram region in the xe_pf_begin
function.
v4:
- remove the tile back pointer from struct xe_vram_region
- add new back pointers, xe and migrate, to xe_vram_region
Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com> # rev3
Acked-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250714184818.89201-6-piotr.piorkowski@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Piotr Piórkowski [Mon, 14 Jul 2025 18:48:17 +0000 (20:48 +0200)]
drm/xe: Split xe_migrate allocation from initialization
Currently, xe_migrate_init handles both allocation and initialization.
Let's move allocation to xe_tile_alloc via a new xe_migrate_alloc
function, and keep initialization in xe_migrate_init.
This will allow the migration pointers to be passed to other structures
before full initialization.
Also replace devm_kzalloc with drmm_kzalloc so the allocation is
DRM-managed.
Piotr Piórkowski [Mon, 14 Jul 2025 18:48:16 +0000 (20:48 +0200)]
drm/xe: Move struct xe_vram_region to a dedicated header
Let's move the xe_vram_region structure to a new header dedicated to VRAM
to improve modularity and avoid unnecessary dependencies when only
VRAM-related structures are needed.
v2: Fix build if CONFIG_DRM_XE_DEVMEM_MIRROR is enabled
v3: Fix build if CONFIG_DRM_XE_DISPLAY is enabled
v4: Move helper to get tile dpagemap to xe_svm.c
Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Suggested-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> # rev3
Acked-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250714184818.89201-4-piotr.piorkowski@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Piotr Piórkowski [Mon, 14 Jul 2025 18:48:15 +0000 (20:48 +0200)]
drm/xe: Use dynamic allocation for tile and device VRAM region structures
In future platforms, we will need to represent the device and tile
VRAM regions in a more dynamic way, so let's abandon the static
allocation of these structures and start using dynamic allocation.
v2:
- Add a helpers for accessing fields of the xe_vram_region structure
v3:
- Add missing EXPORT_SYMBOL_IF_KUNIT for
xe_vram_region_actual_physical_size
Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Acked-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250714184818.89201-3-piotr.piorkowski@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Piotr Piórkowski [Mon, 14 Jul 2025 18:48:14 +0000 (20:48 +0200)]
drm/xe: Use devm_ioremap_wc for VRAM mapping and drop manual unmap
Let's replace the manual call to the ioremap_wc function with
devm_ioremap_wc, ensuring that VRAM mappings are automatically
released when the driver is detached.
Since devm_ioremap_wc registers the mapping with the device's managed
resources, the explicit iounmap call in vram_fini is no longer needed,
so let's remove it.
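A sketch of the managed mapping (field names assumed):

  vram->mapping = devm_ioremap_wc(&pdev->dev, vram->io_start,
                                  vram->io_size);
  if (!vram->mapping)
          return -ENOMEM;
  /* No iounmap() in vram_fini(): devres unmaps on driver detach. */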
Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Suggested-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Acked-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250714184818.89201-2-piotr.piorkowski@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Michal Wajdeczko [Mon, 14 Jul 2025 19:36:45 +0000 (21:36 +0200)]
drm/xe: Move debugfs GT attributes under tile directory
While in sysfs we correctly try to reflect the hardware architecture
and expose GT attributes in a per-tile hierarchy, in debugfs we expose
GT attributes at a flat level, without tiles.
Create debugfs directories to represent each tile and move GT
attributes under the matching parent tile directory. To not break
existing debugfs tools, create a symlink under the old location:
Ashutosh Dixit [Tue, 15 Jul 2025 18:14:22 +0000 (11:14 -0700)]
drm/xe/oa: Fix static checker warning about null gt
There is a static checker warning that the gt returned by
xe_device_get_gt can be NULL and is then dereferenced. Use
xe_root_mmio_gt instead, which is equivalent and cannot return a NULL
gt 0.
drm/xe: Don't fail probe on unsupported mailbox command
If the device is running older pcode firmware, it is possible that newer
mailbox commands are not supported by it. The sysfs attributes aren't
useful in that case, but we shouldn't fail driver probe because of it.
As of now, it is unknown if we can distinguish unsupported commands before
attempting them. But until we figure out a way to do that, fix the
regressions.
v2: Add debug message (Lucas)
Fixes: cdc36b66cd41 ("drm/xe: Expose fan control and voltage regulator version")
Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Tested-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://lore.kernel.org/r/20250714215503.2897748-1-raag.jadav@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Michal Wajdeczko [Fri, 11 Jul 2025 19:33:16 +0000 (21:33 +0200)]
drm/xe/pf: Invalidate LMTT after completing changes
Once we finish populating all leaf pages in the VF's LMTT, we should
make sure that the hardware will not access any stale data. Explicitly
force LMTT invalidation (as was already planned in the past).
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250711193316.1920-7-michal.wajdeczko@intel.com
Michal Wajdeczko [Fri, 11 Jul 2025 19:33:15 +0000 (21:33 +0200)]
drm/xe/pf: Invalidate LMTT during LMEM unprovisioning
Invalidate LMTT immediately after removing VF's LMTT page tables
and clearing root PTE in the LMTT PD to avoid any invalid access
by the hardware (and VF) due to stale data.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250711193316.1920-6-michal.wajdeczko@intel.com
Michal Wajdeczko [Fri, 11 Jul 2025 19:33:14 +0000 (21:33 +0200)]
drm/xe/pf: Force GuC virtualization mode
By default the GuC starts in the 'native' mode and enables the VGT
mode (aka 'virtualization' mode) only after it receives at least one
set of VF configuration data. While this happens naturally when the PF
begins VFs provisioning, we might need this sooner, as some actions,
like TLB_INVALIDATION_ALL (0x7002), are supported by the GuC only in
the VGT mode.
And this becomes a real problem if we want to use the above action to
invalidate the LMTT early during VFs auto-provisioning, before VFs are
enabled, as such an H2G would be rejected:
This could be mitigated by pushing earlier a PF self-configuration
with some hard-coded values that cover unlimited access to the GGTT,
use of all GuC contexts and doorbells. This step is sufficient for
the GuC to switch into the VGT mode.
Michal Wajdeczko [Fri, 11 Jul 2025 19:33:12 +0000 (21:33 +0200)]
drm/xe/pf: Resend PF provisioning after GT reset
If we reload the GuC due to suspend/resume or a GT reset, then we have
to resend not only any VF provisioning data, but also the PF
configuration, like scheduling parameters (EQ, PT), as otherwise the
GuC will continue to use default values.
Michal Wajdeczko [Fri, 11 Jul 2025 19:33:11 +0000 (21:33 +0200)]
drm/xe/pf: Prepare to stop SR-IOV support prior GT reset
As part of the resume or GT reset, the PF driver schedules work which
is then used to complete restarting of the SR-IOV support, including
resending the configurations of provisioned VFs to the GuC.
However, in case of a short delay between those two actions, errors
could be seen by triggering a GT reset on the suspended device:
While this VF reprovisioning will be successful during the next spin
of the worker, to avoid those errors make sure to cancel the restart
worker if we are about to trigger the next reset.
I was debugging some unrelated issue and noticed the current code was
very verbose. We can improve it easily by using the more common batch
buffer building pattern.
add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-348 (-348)
Function old new delta
xe_gt_record_default_lrcs.cold 304 296 -8
xe_gt_record_default_lrcs 2200 1860 -340
Total: Before=13554, After=13206, chg -2.57%
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250710-lrc-refactors-v2-7-a5e2ca03f6bd@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Lucas De Marchi [Thu, 10 Jul 2025 20:33:51 +0000 (13:33 -0700)]
drm/xe/gt: Drop third submission for default context
There's no need to submit the nop job again on the first queue. Any
state needed is already saved when the first LRC is switched out. The
comment is a little misleading regarding indirect W/As: first of all,
there are still no indirect W/As enabled, and secondly, even after
there are, there's no need to submit this job again to have their
state propagated: the indirect W/A will actually run on every LRC
switch.
Lucas De Marchi [Thu, 10 Jul 2025 20:33:50 +0000 (13:33 -0700)]
drm/xe/lrc: Remove leftover TODO/FIXME
There isn't anything to set for CTX_TIMESTAMP handling in the empty
LRC: that is set on every LRC init since it should always start from 0
rather than the value saved in the image after first submission.
The FIXME about perma-pinning also doesn't make much sense, as we are
always going to pin the LRC, and the GGTT mapping has nothing to do
with VM bind.
Lucas De Marchi [Thu, 10 Jul 2025 20:33:48 +0000 (13:33 -0700)]
drm/xe/gt: Extract emit_job_sync()
Both the nop and wa jobs go through the same boilerplate calls to emit
the job with a timeout and handle errors for both the bb and the job.
Extract emit_job_sync() so those functions create the bb and handle
possible errors, delegating the part that actually emits the job and
waits for its completion.
Lucas De Marchi [Thu, 10 Jul 2025 20:33:47 +0000 (13:33 -0700)]
drm/xe: Count dwords before allocating
The bb allocation in emit_wa_job() is wrong in 2 ways: first, it's
allocating enough space for the 3DSTATE or hardcoding 4k depending on
the engine. In the first case it doesn't account for the WAs and in
the latter it may not be sufficient. Secondly, it's using the size
instead of the number of dwords, causing the buffer to be 4x bigger
than needed: xe_bb_new() receives the number of dwords as a parameter
and its declaration was also not following its implementation.
Lastly, reword the debug message since it's not only about the LRC WAs
anymore as it also include the 3DSTATE for render.
While it's unlikely this is causing any real issue, let's calculate the
needed space and allocate just enough.
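A sketch of the sizing (the dword-counting helpers are hypothetical):

  /* xe_bb_new() takes dwords, not bytes: count first, then allocate. */
  num_dw = count_wa_dwords(hwe) + count_state_dwords(hwe);
  bb = xe_bb_new(gt, num_dw, false);
  if (IS_ERR(bb))
          return PTR_ERR(bb);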
Lucas De Marchi [Thu, 10 Jul 2025 20:33:46 +0000 (13:33 -0700)]
drm/xe/lrc: Reduce scope of empty lrc data
The only case in which new lrc data is created from scratch is when it's
called prior to recording the default lrc. There's no need to check for
NULL init_data since in that case the function already failed: just move
the allocation where it's needed.
Michal Wajdeczko [Sun, 13 Jul 2025 10:36:24 +0000 (12:36 +0200)]
drm/xe/pf: Stop requiring VF/PF version negotiation on every GT
While some VF/PF relay actions must be handled at the GT level, like
the query for runtime registers, it was clarified by the arch team
that the initial version negotiation can be done by the VF just once,
using any available GuC/GT.
Move handling of the VF/PF ABI version negotiation on the PF side from
the GT level functions to the device level functions.