David Francis [Tue, 12 Aug 2025 18:19:18 +0000 (14:19 -0400)]
drm/amdgpu: Allow more flags to be set on gem create.
The GEM create flag AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE
specifies that gem memory contains sensitive information and
should be cleared to prevent snooping.
The COHERENT and UNCACHED gem create flags enable memory
features related to sharing memory across devices.
For CRIU we need to re-create KFD BOs through the
GEM_CREATE IOCTL, so allow those KFD specific flags here as well.
This will also aid us in the future and allows to move
the KFD components over using the render node for allocations.
Signed-off-by: David Francis <David.Francis@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Dave Airlie [Tue, 2 Sep 2025 01:23:39 +0000 (11:23 +1000)]
Merge tag 'drm-intel-gt-next-2025-09-01' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-next
Driver Changes:
- Apply multiple JSL/EHL/Gen7/Gen6 workaround properly at context level (Sebastian)
- Protect against overflow in active_engine() (Krzysztof)
- Use try_cmpxchg64() in __active_lookup() (Uros)
- Add a vendor-specific recovery method to drm device wedged uevent
(Riana Tauro)
Driver Changes:
- Use same directory structure in debugfs as in sysfs (Michal Wajdeczko)
- Cleanup and future-proof VRAM region initialization (Piotr Piórkowski)
- Add G-states and PCIe link states to debugfs (Soham Purkait)
- Cleanup eustall debug messages (Harish Chegondi)
- Add SR-IOV support to restore Compression Control Surface (CCS) to
Xe2 and later (Satyanarayana K V P)
- Enable SR-IOV PF mode by default on supported platforms without
needing CONFIG_DRM_XE_DEBUG and mark some platforms behind
force_probe as supported (Michal Wajdeczko)
- More targeted log messages (Michal Wajdeczko)
- Cleanup STEER_SEMAPHORE/MCFG_MCR_SELECTOR usage (Nitin Gote)
- Use common code to emit flush (Tvrtko Ursulin)
- Add/extend more HW workarounds and tunings for Xe2 and Xe3
(Sk Anirban, Tangudu Tilak Tirumalesh, Nitin Gote, Chaitanya Kumar Borah)
- Add a generic dependency scheduler to help with TLB invalidations
and future scenarios (Matthew Brost)
- Use DRM scheduler for delayed GT TLB invalidations (Matthew Brost)
- Error out on incorrect device use in configfs
(Michal Wajdeczko, Lucas De Marchi)
- Refactor configfs attributes (Michal Wajdeczko / Lucas De Marchi)
- Allow configuring future VF devices via configfs (Michal Wajdeczko)
- Implement some missing XeLP workarounds (Tvrtko Ursulin)
- Generalize WA BB setup/emission and add support for
mid context restore BB, aka indirect context (Tvrtko Ursulin)
- Prepare the driver to expose mmio regions to userspace
in future (Ilia Levi)
- Add more GuC load error status codes (John Harrison)
- Document DRM_XE_GEM_CREATE_FLAG_DEFER_BACKING (Priyanka Dandamudi)
- Disable CSC and RPM on VFs (Lukasz Laguna, Satyanarayana K V P)
- Fix oops in xe_gem_fault with PREEMPT_RT (Maarten Lankhorst)
- Skip LMTT update if no LMEM was provisioned (Michal Wajdeczko)
- Add support to VF migration (Tomasz Lis)
- Use a helper for guc_waklv_enable functions (Jonathan Cavitt)
- Prepare GPU SVM for migration of THP (Francois Dugast)
- Program LMTT directory pointer on all GTs within a tile
(Piotr Piórkowski)
- Rename XE_WA to XE_GT_WA to better convey its scope vs the device WAs
(Matt Atwood)
- Allow to match devices on PCI devid/vendorid only (Lucas De Marchi)
- Improve PDE PAT index selection (Matthew Brost)
- Consolidate ASID allocation in xe_vm_create() vs
xe_vm_create_ioctl() (Piotr Piórkowski)
- Resize VF BARS to max possible size according to number of VFs
(Michał Winiarski)
- Untangle vm_bind_ioctl cleanup order (Christoph Manszewski)
- Start fixing usage of XE_PAGE_SIZE vs PAGE_SIZE to improve
compatibility with non-x86 arch (Simon Richter)
- Improve tile vs gt initialization order and accounting
(Gustavo Sousa)
- Extend WA kunit test to PTL
- Ensure data is initialized before transferring to pcode
(Stuart Summers)
- Add PSMI support for HW validation (Lucas De Marchi,
Vinay Belgaumkar, Badal Nilawar)
- Improve xe_dma_buf test (Thomas Hellström, Marcin Bernatowicz)
- Fix basename() usage in generator with !glibc (Carlos Llamas)
- Ensure GT is in C0 during resumes (Xin Wang)
- Add TLB invalidation abstraction (Matt Brost, Stuart Summers)
- Make MI_TLB_INVALIDATE conditional on migrate (Matthew Auld)
- Prepare xe_nvm to be initialized early for future use cases
(Riana Tauro)
Timur Kristóf [Thu, 28 Aug 2025 14:50:36 +0000 (16:50 +0200)]
drm/amdgpu: Respect max pixel clock for HDMI and DVI-D (v2)
Update the legacy (non-DC) display code to respect the maximum
pixel clock for HDMI and DVI-D. Reject modes that would require
a higher pixel clock than can be supported.
Also update the maximum supported HDMI clock value depending on
the ASIC type.
For reference, see the DC code:
check max_hdmi_pixel_clock in dce*_resource.c
v2:
Fix maximum clocks for DVI-D and DVI/HDMI adapters.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fangzhi Zuo [Thu, 14 Aug 2025 18:41:44 +0000 (14:41 -0400)]
drm/amd/display: Skip Check Runtime Link Setting for Specific Branch Device
[why]
Read link setting inside mode validation is not always the final downlink setting.
It is found true in Synaptics branch device.
At bootup, the preferred mode being set right after 1080p is set. It occurred
before graphic load. That modeset switch in a short period of time makes
the branch device switch back and forth from lower and higher link rate,
observed at Synaptics branch device.
DP2 RTK hub on the other hand, sticks to highest available downlink rate after bootup.
Existing check of runtime downlink setting in mode validation shows asynchronous at
branch device link switch, i.e., downlink switch to higher link rate not yet complete
when the mode validation tries to probe the downlink setting. That makes mode validation
checking downlink setting making wrong decision by pruning modes that should pass the
validation after the downlink setting switch is complete.
[how]
If Synaptics is found at the last branch, skip checking downlink setting
at mode validation.
Reviewed-by: Wayne Lin <wayne.lin@amd.com> Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ivan Lipski [Wed, 20 Aug 2025 19:46:52 +0000 (15:46 -0400)]
drm/amd/display: Clear the CUR_ENABLE register on DCN314 w/out DPP PG
[Why&How]
ON DCN314, clearing DPP SW structure without power gating it can cause a
double cursor in full screen with non-native scaling.
A W/A that clears CURSOR0_CONTROL cursor_enable flag if
dcn10_plane_atomic_power_down is called and DPP power gating is disabled.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4168 Reviewed-by: Sun peng (Leo) Li <sunpeng.li@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Clay King [Wed, 20 Aug 2025 19:04:29 +0000 (15:04 -0400)]
drm/amd/display: incorrect conditions for failing dto calculations
[Why & How]
Previously, when calculating dto phase, we would incorrectly fail when phase
<=0 without additionally checking for the integer value. This meant that
calculations would incorrectly fail when the desired pixel clock was an exact
multiple of the reference clock.
Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Clay King <clayking@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Austin Zheng [Thu, 14 Aug 2025 13:54:45 +0000 (09:54 -0400)]
drm/amd/display: Add Component To Handle Bounding Box Values and IP Caps
[Why]
Bounding box values can be stored in multiple locations. (e.g. PMFW, VBIOS, DMUB).
The source and interpretation of these values can vary with DCN revision
so there should be a component that can gather these values and translate
them accordingly
[How]
Have component start with the statically defined values as a base.
Then update them as needed with DCN-specific logic
Guard this component with FPU flags since values need to be in float point.
Reviewed-by: Jun Lei <jun.lei@amd.com> Signed-off-by: Austin Zheng <Austin.Zheng@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Custom brightness curve works by walking through all data points one
by one. When the brightness value is at either extreme this is a lot
of data points to walk. This is especially noticeable when moving a
brightness slider around how it can lag.
[How]
Bisect the data points to find the closest for interpolation.
Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Relja Vojvodic [Thu, 14 Aug 2025 15:33:22 +0000 (11:33 -0400)]
drm/amd/display: Increase minimum clock for TMDS 420 with pipe splitting
[Why]
-Pipe splitting allows for clocks to be reduced, but when using TMDS 420,
reduced clocks lead to missed clocks cycles on clock resyncing
[How]
-Impose a minimum clock when using TMDS 420
Reviewed-by: Chris Park <chris.park@amd.com> Signed-off-by: Relja Vojvodic <rvojvodi@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Although compositors will add their own modes, Xorg won't use it's own
modes and will only stick to modes advertised by the driver. This mean a
user that used to pick 1024x768 could no longer access it unless the
panel's native resolution was 1024x768.
[How]
Revert commit 6d396e7ac1ce3 ("drm/amd/display: Disable common modes for
LVDS") and commit 7948afb46af92 ("drm/amd/display: Disable common modes
for eDP").
The panel will still use scaling for any non-native modes due to
commit 978fa2f6d0b12 ("drm/amd/display: Use scaling for non-native
resolutions on eDP")
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4538 Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250828140856.2887993-1-superm1@kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lijo Lazar [Thu, 14 Aug 2025 08:22:50 +0000 (13:52 +0530)]
drm/amdgpu: Check vcn state before profile switch
The patch uses power state of VCN instances for requesting video
profile.
In idle worker of a vcn instance, when there is no outstanding
submisssion or fence, the instance is put to power gated state. When
all instances are powered off that means video profile is no longer
required. A request is made to turn off video profile.
A job submission starts with begin_use of ring, and at that time
vcn instance state is changed to power on. Subsequently a check is
made for active video profile, and if not active, a request is made.
Fixes: 3b669df92c85 ("drm/amdgpu/vcn: adjust workload profile handling") Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Yang Wang [Mon, 25 Aug 2025 01:38:32 +0000 (09:38 +0800)]
drm/amd/amdgpu: unified amdgpu ip block name
v1:
1. Unified amdgpu ip block name print with format
"{ip_type}_v{major}_{minor}_{rev}"
2. Avoid IP block name conflicts for SMU/PSP ip block
v2:
Update IP block print format to keep legacy IP block name (Alex)
"{ip_type}_v{major}_{minor}_{rev} ({funcs->name})"
Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jesse.Zhang [Wed, 27 Aug 2025 05:29:17 +0000 (13:29 +0800)]
drm/amdgpu/sdma: bump firmware version checks for user queue support
Using the previous firmware could lead to problems with
PROTECTED_FENCE_SIGNAL commands, specifically causing register
conflicts between MCU_DBG0 and MCU_DBG1.
The updated firmware versions ensure proper alignment
and unification of the SDMA_SUBOP_PROTECTED_FENCE_SIGNAL value with SDMA 7.x,
resolving these hardware coordination issues
Fixes: e8cca30d8b34 ("drm/amdgpu/sdma6: add ucode version checks for userq support") Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amdgpu/vcn: add instance number to VCN version message
For multiple VCN instances case we get multiple lines of the same
message like below:
amdgpu 0000:43:00.0: amdgpu: Found VCN firmware Version ENC: 1.24 DEC: 9 VEP: 0 Revision: 11
amdgpu 0000:43:00.0: amdgpu: Found VCN firmware Version ENC: 1.24 DEC: 9 VEP: 0 Revision: 11
By adding instance number to the log message for multiple VCN instances,
each line will clearly indicate which VCN instance it refers to.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch corrects several typographical errors in atomfirmware.h.
The fixes improve readability and maintain consistency in the codebase.
No functional changes are introduced.
Thomas Hellström [Thu, 28 Aug 2025 13:48:37 +0000 (15:48 +0200)]
drm/xe: Fix incorrect migration of backed-up object to VRAM
If an object is backed up to shmem it is incorrectly identified
as not having valid data by the move code. This means moving
to VRAM skips the -EMULTIHOP step and the bo is cleared. This
causes all sorts of weird behaviour on DGFX if an already evicted
object is targeted by the shrinker.
Fix this by using ttm_tt_is_swapped() to identify backed-up
objects.
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5996 Fixes: 00c8efc3180f ("drm/xe: Add a shrinker for xe bos") Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: <stable@vger.kernel.org> # v6.15+ Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://lore.kernel.org/r/20250828134837.5709-1-thomas.hellstrom@linux.intel.com
Dave Airlie [Thu, 28 Aug 2025 22:55:29 +0000 (08:55 +1000)]
Merge tag 'drm-misc-next-2025-08-28' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
drm-misc-next for v6.18:
UAPI Changes:
atomic:
- Reallow no-op async page flips
Cross-subsystem Changes:
hid:
- i2c-hid: Make elan touch controllers power on after panel is enabled
video:
- Improve pixel-format handling for struct screen_info
Core Changes:
display:
- dp: Fix command length
Driver Changes:
amdxdna:
- Fixes
bridge:
- Add support for Radxa Ra620 plus DT bindings
msm:
- Fix VMA allocation
panel:
- ilitek-ili9881c: Refactor mode setting; Add support for Bestar
BSD1218-A101KL68 LCD plus DT bindings
- lvds: Add support for Ampire AMP19201200B5TZQW-T03 to DT bindings
rockchip:
- dsi2: Add support for RK3576 plus DT bindings
drm/xe/uapi: Fix kernel-doc formatting for madvise and vma_query
Correct kernel-doc formatting issues in the UAPI definitions for
madvise and VMA query interfaces to resolve docutils warnings during
documentation build.
Fixes: 418807860e94 ("drm/xe/uapi: Add UAPI for querying VMA count and memory attributes") Fixes: 231bb0ee7aa5 ("drm/xe/uapi: Add madvise interface") Cc: Matthew Brost <matthew.brost@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250828071516.3838110-1-himal.prasad.ghimiray@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Riana Tauro [Mon, 25 Aug 2025 10:35:37 +0000 (16:05 +0530)]
drm/xe/nvm: Use root tile mmio
To allow initialization of nvm during early probe for future usecases,
use root tile instead of root gt to access mmios, as gt is not
yet initialized at early probe.
drm/xe/tests: Make cross-device dma-buf BOs CPU-visible on small BAR
Small-BAR systems (e.g., SR-IOV VFs in VMs) expose only a subset of
VRAM via PCI/BAR. Exporting a BO outside that window fails, and the
selftests also do CPU fill/verify.
Set XE_BO_FLAG_NEEDS_CPU_ACCESS for cross-device variants to force
CPU-mappable placement and keep tests reliable. Large-BAR/P2P setups
are unaffected.
Suggested-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://lore.kernel.org/r/20250814145950.430231-1-marcin.bernatowicz@linux.intel.com Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Matthew Auld [Fri, 8 Aug 2025 11:04:53 +0000 (12:04 +0100)]
drm/xe/migrate: make MI_TLB_INVALIDATE conditional
When clearing VRAM we should be able to skip invalidating the TLBs if we
are only using the identity map to access VRAM (which is the common
case), since no modifications are made to PTEs on the fly. Also since we
use huge 1G entries within the identity map, there should be a pretty
decent chance that the next packet(s) (if also clears) can avoid a tree
walk if we don't shoot down the TLBs, like if we have to process a long
stream of clears.
For normal moves/copies, we usually always end up with the src or dst
being system memory, meaning we can't only rely on the identity map and
will also need to emit PTEs and so will always require a TLB flush.
v2:
- Update commit to explain the situation for normal copies (Matt B)
- Rebase on latest changes
Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250808110452.467513-2-matthew.auld@intel.com
Dan Carpenter [Wed, 27 Aug 2025 12:57:31 +0000 (15:57 +0300)]
HID: i2c-hid: Fix test in i2c_hid_core_register_panel_follower()
Bitwise AND was intended instead of OR. With the current code the
condition is always true.
Fixes: cbdd16b818ee ("HID: i2c-hid: Make elan touch controllers power on after panel is enabled") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Pin-yen Lin <treapking@chromium.org> Acked-by: Jiri Kosina <jkosina@suse.com> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/aK8Au3CgZSTvfEJ6@stanley.mountain
Matthew Brost [Tue, 26 Aug 2025 18:29:11 +0000 (18:29 +0000)]
drm/xe: Split TLB invalidation code in frontend and backend
The frontend exposes an API to the driver to send invalidations, handles
sequence number assignment, synchronization (fences), and provides a
timeout mechanism. The backend issues the actual invalidation to the
hardware (or firmware).
The new layering easily allows issuing TLB invalidations to different
hardware or firmware interfaces.
Matthew Brost [Tue, 26 Aug 2025 18:29:09 +0000 (18:29 +0000)]
drm/xe: Prep TLB invalidation fence before sending
It is a bit backwards to add a TLB invalidation fence to the pending
list after issuing the invalidation. Perform this step before issuing
the TLB invalidation in a helper function.
v2: Make sure the seqno_lock mutex covers the send as well (Matt)
Matthew Brost [Tue, 26 Aug 2025 18:29:08 +0000 (18:29 +0000)]
drm/xe: Decouple TLB invalidations from GT
Decouple TLB invalidations from the GT by updating the TLB invalidation
layer to accept a `struct xe_tlb_inval` instead of a `struct xe_gt`.
Also, rename *gt_tlb* to *tlb*. The internals of the TLB invalidation
code still operate on a GT, but this is now hidden from the rest of the
driver.
Matthew Brost [Tue, 26 Aug 2025 18:29:06 +0000 (18:29 +0000)]
drm/xe: Add xe_tlb_inval structure
Extract TLB invalidation state into a structure to decouple TLB
invalidations from the GT, allowing the structure to be embedded
anywhere in the driver.
Stuart Summers [Tue, 26 Aug 2025 18:29:04 +0000 (18:29 +0000)]
drm/xe: Cancel pending TLB inval workers on teardown
Add a new _fini() routine on the GT TLB invalidation
side to handle this worker cleanup on driver teardown.
v2: Move the TLB teardown to the gt fini() routine called during
gt_init rather than in gt_alloc. This way the GT structure stays
alive for while we reset the TLB state.
Stuart Summers [Tue, 26 Aug 2025 18:29:03 +0000 (18:29 +0000)]
drm/xe: Move explicit CT lock in TLB invalidation sequence
Currently the CT lock is used to cover TLB invalidation
sequence number updates. In an effort to separate the GuC
back end tracking of communication with the firmware from
the front end TLB sequence number tracking, add a new lock
here to specifically track those sequence number updates
coming in from the user.
Apart from the CT lock, we also have a pending lock to
cover both pending fences and sequence numbers received
from the back end. Those cover interrupt cases and so
it makes not to overload those with sequence numbers
coming in from new transactions. In that way, we'll employ
a mutex here.
v2: Actually add the correct lock rather than just dropping
it... (Matt)
Lucas De Marchi [Tue, 26 Aug 2025 15:32:11 +0000 (08:32 -0700)]
drm/xe/configfs: Block runtime attribute changes
Although it's possible to change the attributes in runtime, they have no
effect after the driver is already bound to the device. Check for that
and return -EBUSY in that case.
This should help users understand what's going on when the behavior is
not changing even if the value from the configfs is "right", but it got
to that state too late.
Jesse.Zhang [Tue, 26 Aug 2025 09:30:58 +0000 (17:30 +0800)]
drm/amdgpu: update firmware version checks for user queue support
The minimum firmware versions required for user queue functionality
have been increased to address an issue where the queue privilege
state was lost during queue connect operations.
The problem occurred because the privilege state was being restored
to its initial value at the beginning of the function, overwriting
the state that was properly set during the queue connect case.
This commit updates the minimum version requirements:
- ME firmware from 2390 to 2420
- PFP firmware from 2530 to 2580
- MEC firmware from 2600 to 2650
- MES firmware remains at 120
These updated firmware versions contain the necessary fixes to
properly maintain queue privilege state throughout connect operations.
Fixes: 61ca97e9590c ("drm/amdgpu: Add fw minimum version check for usermode queue") Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Yang Wang [Mon, 25 Aug 2025 04:54:01 +0000 (12:54 +0800)]
drm/amd/amdgpu: disable hwmon power1_cap* for gfx 11.0.3 on vf mode
the PPSMC_MSG_GetPptLimit msg is not valid for gfx 11.0.3 on vf mode,
so skiped to create power1_cap* hwmon sysfs node.
Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Shaoyun Liu [Fri, 11 Jul 2025 01:42:16 +0000 (21:42 -0400)]
drm/amd/amdgpu : Use the MES INV_TLBS API for tlb invalidation on gfx12
From MES version 0x81, it provide the new API INV_TLBS that support
invalidate tlbs with PASID.
Signed-off-by: Shaoyun Liu <shaoyun.liu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Shaoyun Liu [Fri, 4 Jul 2025 16:30:10 +0000 (12:30 -0400)]
drm/amd/include : Update MES v12 API header(INV_TLBS)
The requirement from driver side is to have an API that can do the
tlb invalidation on dedicate pasid since driver don't know the vmid
and process mapping.
Make the API generic to support different tlb invalidation related
request. Driver can specify pasid, vmid, hub_id and vm address range
need to be invalidated.
With this API the old INV_GART in MISC Op can be deprecated.
Signed-off-by: Shaoyun Liu <shaoyun.liu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jesse.Zhang [Sat, 23 Aug 2025 07:00:06 +0000 (15:00 +0800)]
drm/amdgpu: fix shift-out-of-bounds in amdgpu_debugfs_jpeg_sched_mask_set
Fix a UBSAN shift-out-of-bounds warning in amdgpu_debugfs_jpeg_sched_mask_set
when the shift exponent reaches or exceeds 32 bits. The issue occurred because
a 32-bit integer '1' was being shifted by up to 32 bits, which is undefined
behavior.
Replace '1' with '1ULL' to ensure 64-bit arithmetic, matching the u64 type of
'val' and preventing the shift overflow. This is consistent with the existing
mask calculation that already uses 1ULL.
The error manifested as:
UBSAN: shift-out-of-bounds in drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c:373:17
shift exponent 32 is too large for 32-bit type 'int'
v2: remove debug log
* Firmware releases for multiple asics
* CodeQL fixes
* Fix for double cursor with 180 degree rotation on large resolutions
* Misc bug fixes for DSC, PSR/Replay, DPIA etc.
Signed-off-by: Nicholas Carbones <ncarbone@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Reviewed-by: Leo Li <sunpeng.li@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Taimur Hassan [Fri, 15 Aug 2025 23:23:50 +0000 (19:23 -0400)]
drm/amd/display: [FW Promotion] Release 0.1.24.0
Add two new IPS residency data modes.
Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Reviewed-by: Leo Li <sunpeng.li@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Dillon Varone [Thu, 14 Aug 2025 16:01:15 +0000 (12:01 -0400)]
drm/amd/display: Consider sink max slice width limitation for dsc
[WHY&HOW]
The sink max slice width limitation should be considered for DSC, but
was removed in "refactor DSC cap calculations".
This patch adds it back and takes the valid minimum between the sink and
source.
Signed-off-by: Dillon Varone <Dillon.Varone@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Reviewed-by: Wenjing Liu <Wenjing.Liu@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ivan Lipski [Thu, 7 Aug 2025 13:45:26 +0000 (09:45 -0400)]
drm/amd/display: Support HW cursor 180 rot for any number of pipe splits
[Why]
For the HW cursor, its current position in the pipe_ctx->stream struct is
not affected by the 180 rotation, i. e. the top left corner is still at
0,0. However, the DPP & HUBP set_cursor_position functions require rotated
position.
The current approach is hard-coded for ODM 2:1, thus it's failing for
ODM 4:1, resulting in a double cursor.
[How]
Instead of calculating the new cursor position relatively to the
viewports, we calculate it using a viewavable clip_rect of each plane.
The clip_rects are first offset and scaled to the same space as the
src_rect, i. e. Stream space -> Plane space.
In case of a pipe split, which divides the plane into 2 or more viewports,
the clip_rect is the union of all the viewports of the given plane.
With the assumption that the viewports in HUBP's set_cursor_position are
in the Plane space as well, it should produce a correct cursor position
for any number of pipe splits.
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Signed-off-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Amber Lin [Fri, 15 Aug 2025 18:04:15 +0000 (14:04 -0400)]
drm/amdkfd: Tie UNMAP_LATENCY to queue_preemption
When KFD asks CP to preempt queues, other than preempt CP queues, CP
also requests SDMA to preempt SDMA queues with UNMAP_LATENCY timeout.
Currently queue_preemption_timeout_ms is 9000 ms by default but can be
configured via module parameter. KFD_UNMAP_LATENCY_MS is hard coded as
4000 ms though. This patch ties KFD_UNMAP_LATENCY_MS to
queue_preemption_timeout_ms so in a slow system such as emulator, both
CP and SDMA slowness are taken into account.
Signed-off-by: Amber Lin <Amber.Lin@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Update the conditions for setting the SMU vcn reset caps in the SMU v13.0.6 PPT
initialization function. Specifically:
- Add support for VCN reset capability for firmware versions 0x00558200 and
above when the program version is 0.
- Add support for VCN reset capability for firmware versions 0x05551800 and
above when the program version is 5.
v2: correct the smu mp1 version for program 5 (Lijo)
Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Eric Huang [Mon, 18 Aug 2025 18:22:53 +0000 (14:22 -0400)]
drm/amdkfd: fix vram allocation failure for a special case
When it only allocates vram without va, which is 0, and a
SVM range allocated stays in this range, the vram allocation
returns failure. It should be skipped for this case from
SVM usage check.
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sunday Clement [Fri, 30 May 2025 14:58:08 +0000 (10:58 -0400)]
drm/amdkfd: Allow device error to be logged
The addition of a WARN_ON() check in order to return early in the
kq_initialize function retroactively causes the default case in the
following switch statement to never be executed, preventing dev_err
from logging device errors in the kernel. Both logs are now checked
in the default case.
Rakuram Eswaran [Thu, 21 Aug 2025 02:59:55 +0000 (08:29 +0530)]
docs: gpu: amdgpu: Fix spelling in amdgpu documentation
Fixed following typos reported by Codespell
1. propogated ==> propagated
aperatures ==> apertures
In Documentation/gpu/amdgpu/debugfs.rst
2. parition ==> partition
In Documentation/gpu/amdgpu/process-isolation.rst
3. conections ==> connections
In Documentation/gpu/amdgpu/display/programming-model-dcn.rst
In addition to above,
Fixed wrong bit-partition naming in gpu/amdgpu/process-isolation.rst
from "fourth" partition to "third" partition.
Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Suggested-by: Randy Dunlap <rdunlap@infradead.org> Suggested-by: Alexander Deucher <Alexander.Deucher@amd.com> Signed-off-by: Rakuram Eswaran <rakuram.e96@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Brahmajit Das [Thu, 21 Aug 2025 12:01:33 +0000 (17:31 +0530)]
drm/amd/display: clean-up dead code in dml2_mall_phantom
pipe_idx in funtion dml2_svp_validate_static_schedulabilit, although set
is never actually used. While building with GCC 16 this gives a warning:
drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/dml2_mall_phantom.c: In function ‘set_phantom_stream_timing’:
drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/dml2_mall_phantom.c:657:25: warning: variable ‘pipe_idx’ set but not used [-Wunused-but-set-variable=]
657 | unsigned int i, pipe_idx;
| ^~~~~~~~
Signed-off-by: Brahmajit Das <listout@listout.xyz> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Yifan Zhang [Wed, 20 Aug 2025 06:55:41 +0000 (14:55 +0800)]
drm/amdgpu: remove redundant AMDGPU_HAS_VRAM
AMDGPU_HAS_VRAM is redundant with is_app_apu, as both refer to
APUs with no carve-out. Since AMDGPU_HAS_VRAM only occurs once,
remove AMDGPU_HAS_VRAM definition. The tmr allocation can be covered
with AMDGPU_GEM_DOMAIN_GTT | AMDGPU_GEM_DOMAIN_VRAM in both vram and
non vram ASICs.
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ce Sun [Wed, 20 Aug 2025 09:18:57 +0000 (17:18 +0800)]
drm/amdgpu: Correct the loss of aca bank reg info
By polling, poll ACA bank count to ensure that valid
ACA bank reg info can be obtained
v2: add corresponding delay before send msg to SMU to query mca bank info
(Stanley)
v3: the loop cannot exit. (Thomas)
v4: remove amdgpu_aca_clear_bank_count. (Kevin)
v5: continuously inject ce. If a creation interruption
occurs at this time, bank reg info will be lost. (Thomas)
v5: each cycle is delayed by 100ms. (Tao)
Signed-off-by: Ce Sun <cesun102@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ce Sun [Tue, 19 Aug 2025 06:47:05 +0000 (14:47 +0800)]
drm/amdgpu: Add a mutex lock to protect poison injection
When poison is triggered multiple times, competition will occur.
Add a mutex lock to protect poison injection
Signed-off-by: Ce Sun <cesun102@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ce Sun [Thu, 7 Aug 2025 04:36:05 +0000 (12:36 +0800)]
drm/amdgpu: Correct the counts of nr_banks and nr_errors
Correct the counts of nr_banks and nr_errors
Signed-off-by: Ce Sun <cesun102@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Liao Yuanhong [Tue, 19 Aug 2025 14:24:50 +0000 (22:24 +0800)]
drm/amd/display: Remove redundant header files
The header file "dc_stream.h" is already included on line 1507. Remove the
redundant include.
This is because the header file was initially included towards the latter
part of the code. Subsequent commits had to include the header file again
earlier in the code. In my opinion, this doesn't count as a fix; it just
requires removing the redundant header inclusion.
Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Liao Yuanhong [Tue, 19 Aug 2025 08:25:21 +0000 (16:25 +0800)]
drm/amdgpu/fence: Remove redundant 0 value initialization
The amdgpu_fence struct is already zeroed by kzalloc(). It's redundant to
initialize am_fence->context to 0.
Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 24 Jun 2025 15:38:14 +0000 (11:38 -0400)]
drm/amdgpu/gfx12: set MQD as appriopriate for queue types
Set the MQD as appropriate for the kernel vs user queues.
Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 24 Jun 2025 15:37:16 +0000 (11:37 -0400)]
drm/amdgpu/gfx11: set MQD as appriopriate for queue types
Set the MQD as appropriate for the kernel vs user queues.
Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Xin Wang [Wed, 27 Aug 2025 00:06:33 +0000 (17:06 -0700)]
drm/xe: Ensure GT is in C0 during resumes
This patch ensures the gt will be awake for the entire duration
of the resume sequences until GuCRC takes over and GT-C6 gets
re-enabled.
Before suspending GT-C6 is kept enabled, but upon resume, GuCRC
is not yet alive to properly control the exits and some cases of
instability and corruption related to GT-C6 can be observed.
drm/sysfb: Do not deref unexisting CRTC state in atomic_disable
Do not access CRTC state in drm_sysfb_plane_helper_atomic_disable().
Use format from sysfb device for clearing scanout buffer. This is
the behavior from before commit 061963cd9e5b ("drm/sysfb: Blit to
CRTC destination format").
When being disabled, the plane has no associated CRTC. Trying to deref
the format pointer results in a segmentation fault. An example stack
track is shown below.
The commit 9ab440a9d042 ("drm/xe/ptl: L3bank mask is not
available on the media GT") added a workaround to ignore
the fuse register that L3 bank availability as it did not
contain valid values. Same is true for WCL therefore extend
the workaround to cover it.
Lizhi Hou [Tue, 26 Aug 2025 17:19:51 +0000 (10:19 -0700)]
accel/amdxdna: Fix incorrect type used for a local variable
drivers/accel/amdxdna/aie2_pci.c:794:13: sparse: sparse: incorrect type in assignment (different address spaces)
Fixes: c8cea4371e5e ("accel/amdxdna: Add a function to walk hardware contexts") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202508230855.0b9efFl6-lkp@intel.com/ Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://lore.kernel.org/r/20250826171951.801585-1-lizhi.hou@amd.com
Add support to handle CSC firmware reported errors. When CSC firmware
errors are encoutered, a error interrupt is received by the GFX device as
a MSI interrupt.
Device Source control registers indicates the source of the error as CSC
The HEC error status register indicates that the error is firmware reported
Depending on the type of error, the error cause is written to the HEC
Firmware error register.
On encountering such CSC firmware errors, the graphics device is
non-recoverable from driver context. The only way to recover from these
errors is firmware flash.
System admin/userspace is notified of the necessity of firmware flash
with a combination of vendor-specific drm device edged uevent, dmesg logs
and runtime survivability sysfs. It is the responsiblity of the consumer
to verify all the actions and then trigger a firmware flash using tools
like fwupd.
$ udevadm monitor --property --kernel
monitor will print the received events for:
KERNEL - the kernel uevent
Riana Tauro [Tue, 26 Aug 2025 06:34:15 +0000 (12:04 +0530)]
drm/xe: Add support to handle hardware errors
Gfx device reports two classes of errors: uncorrectable and
correctable. Depending on the severity uncorrectable errors are further
classified Non-Fatal and Fatal.
Correctable and Non-Fatal errors: These errors are reported as MSI. Bits in
the Master Interrupt Register indicate the class of the error.
The source of the error is then read from the Device Error Source
Register.
Fatal errors: These are reported as PCIe errors
When a PCIe error is asserted, the OS will perform a SBR (Secondary
Bus reset) which causes the driver to reload. The error registers are
sticky and the values are maintained through SBR.
Add basic support to handle these errors.
Bspec: 50875, 53073, 53074, 53075, 53076
v2: Format commit message (Umesh)
v3: fix documentation (Stuart)
Riana Tauro [Tue, 26 Aug 2025 06:34:13 +0000 (12:04 +0530)]
drm/xe/xe_survivability: Add support for Runtime survivability mode
Certain runtime firmware errors can cause the device to be in a unusable
state requiring a firmware flash to restore normal operation.
Runtime Survivability Mode indicates firmware flash is necessary by
wedging the device and exposing survivability mode sysfs.
The below sysfs is an indication that device is in survivability mode
/sys/bus/pci/devices/<device>/survivability_mode
v2: Fix kernel-doc (Umesh)
v3: Add user friendly dmesg (Frank)