Samuel Zhang [Wed, 5 Nov 2025 03:04:08 +0000 (03:04 +0000)]
drm/amdgpu: fix gpu page fault after hibernation on PF passthrough
On PF passthrough environment, after hibernate and then resume, coralgemm
will cause gpu page fault.
Mode1 reset happens during hibernate, but partition mode is not restored
on resume, register mmCP_HYP_XCP_CTL and mmCP_PSP_XCP_CTL is not right
after resume. When CP access the MQD BO, wrong stride size is used,
this will cause out of bound access on the MQD BO, resulting page fault.
The fix is to ensure gfx_v9_4_3_switch_compute_partition() is called
when resume from a hibernation.
KFD resume is called separately during a reset recovery or resume from
suspend sequence. Hence it's not required to be called as part of
partition switch.
Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
unref bo after amdgpu_bo_reserve() failure as it has
called amdgpu_bo_ref() already
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Wed, 22 Oct 2025 21:11:38 +0000 (17:11 -0400)]
drm/amdgpu: fix possible fence leaks from job structure
If we don't end up initializing the fences, free them when
we free the job. We can't set the hw_fence to NULL after
emitting it because we need it in the cleanup path for the
submit direct case.
v2: take a reference to the fences if we emit them
v3: handle non-job fence in error paths
Fixes: db36632ea51e ("drm/amdgpu: clean up and unify hw fence handling") Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> (v1) Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
YiPeng Chai [Tue, 28 Oct 2025 08:18:31 +0000 (16:18 +0800)]
drm/amdgpu: suspend ras module before gpu reset
During gpu reset, all GPU-related resources are
inaccessible. To avoid affecting ras functionality,
suspend ras module before gpu reset and resume
it after gpu reset is complete.
V2:
Rename functions to avoid misunderstanding.
V3:
Move flush_delayed_work to amdgpu_ras_process_pause,
Move schedule_delayed_work to amdgpu_ras_process_unpause.
V4:
Rename functions.
V5:
Move the function to amdgpu_ras.c.
Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Gangliang Xie <ganglxie@amd.com> Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/pm: implement ras_smu_drv interface for smu v13.0.12
implement ras_smu_drv interface for smu v13.0.12
Signed-off-by: Gangliang Xie <ganglxie@amd.com> Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Wed, 29 Oct 2025 13:41:04 +0000 (09:41 -0400)]
Revert "drm/amdkfd: Improve signal event slow path"
To fix regression report on gfx8, which requires the exhaustive search
path for signaled event.
The high CPU usage of KFD interrupt wq issue is gone after HIP/ROCr add
option to reduce HW event interrupts, safe to revert this optimization
patch now.
Signed-off-by: Philip Yang <Philip.Yang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Rong Zhang [Mon, 13 Oct 2025 16:47:35 +0000 (00:47 +0800)]
drm/amd/display: Fix NULL deref in debugfs odm_combine_segments
When a connector is connected but inactive (e.g., disabled by desktop
environments), pipe_ctx->stream_res.tg will be destroyed. Then, reading
odm_combine_segments causes kernel NULL pointer dereference.
Philip Yang [Fri, 31 Oct 2025 14:50:02 +0000 (10:50 -0400)]
drm/amdkfd: Don't clear PT after process killed
If process is killed. the vm entity is stopped, submit pt update job
will trigger the error message "*ERROR* Trying to push to a killed
entity", job will not execute.
Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lijo Lazar [Mon, 18 Aug 2025 06:33:41 +0000 (12:03 +0530)]
drm/amd/pm: Add schema v1.1 for parition metrics
Use a schema similar to gpu metrics v1.9 for partition metrics also. It
will have field type encoded followed by the field value(s). The
attribute ids used will be shared with gpu metrics. The structure
definition is only to distinguish between gpu metrics and partition
metrics though both gpu metrics v1.9 and partition metrics v1.1 follow
the same definition.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sunil Khatri [Fri, 31 Oct 2025 08:40:13 +0000 (14:10 +0530)]
drm/amdgpu: validate the bo from done list for NULL
Make sure the bo is valid before using it.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amdgpu: lock bo before calling amdgpu_vm_bo_update_shared
BO's reservation object must be locked before using
amdgpu_vm_bo_update_shared otherwise dma_resv_assert_held will
complain in amdgpu_vm_update_shared.
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Fri, 31 Oct 2025 08:21:36 +0000 (09:21 +0100)]
drm/amdgpu: grab a BO reference in vm_lock_done_list.
Otherwise it is possible that between dropping the status lock and
locking the BO that the BO is freed up.
Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Xiang Liu [Thu, 30 Oct 2025 14:38:49 +0000 (22:38 +0800)]
drm/amd/ras: Fix format truncation
../ras/rascore/ras_cper.c: In function ‘cper_generate_fatal_record.isra’:
../ras/rascore/ras_cper.c:75:36: error: ‘%llX’ directive output may be truncated writing between 1 and 14 bytes into a region of size between 0 and 7 [-Werror=format-truncation=]
75 | snprintf(record_id, 9, "%d:%llX", dev_info.socket_id,
| ^~~~
../ras/rascore/ras_cper.c:75:32: note: directive argument in the range [0, 72057594037927935]
75 | snprintf(record_id, 9, "%d:%llX", dev_info.socket_id,
| ^~~~~~~~~
../ras/rascore/ras_cper.c:75:9: note: ‘snprintf’ output between 4 and 27 bytes into a destination of size 9
75 | snprintf(record_id, 9, "%d:%llX", dev_info.socket_id,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
76 | RAS_LOG_SEQNO_TO_BATCH_IDX(trace->seqno));
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../ras/rascore/ras_cper.c: In function ‘cper_generate_runtime_record.isra’:
../ras/rascore/ras_cper.c:75:36: error: ‘%llX’ directive output may be truncated writing between 1 and 14 bytes into a region of size between 0 and 7 [-Werror=format-truncation=]
75 | snprintf(record_id, 9, "%d:%llX", dev_info.socket_id,
| ^~~~
../ras/rascore/ras_cper.c:75:32: note: directive argument in the range [0, 72057594037927935]
75 | snprintf(record_id, 9, "%d:%llX", dev_info.socket_id,
| ^~~~~~~~~
../ras/rascore/ras_cper.c:75:9: note: ‘snprintf’ output between 4 and 27 bytes into a destination of size 9
75 | snprintf(record_id, 9, "%d:%llX", dev_info.socket_id,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
76 | RAS_LOG_SEQNO_TO_BATCH_IDX(trace->seqno));
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Taimur Hassan [Sat, 25 Oct 2025 00:38:14 +0000 (19:38 -0500)]
drm/amd/display: Promote DC to 3.2.357
This version brings along following update:
- HDCP2 FW locality check refactors
- Fix black screen issue with HDMI output
- Increase IB mem size
- Revert max buffered cursor size to 64
- Extend inbox0 lock to run Replay / PSR
- Refactor VActive implementation
- Add Pstate viewport reduction
- Persist stream refcount through restore
Acked-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Taimur Hassan [Fri, 24 Oct 2025 22:42:39 +0000 (18:42 -0400)]
drm/amd/display: [FW Promotion] Release 0.1.34.0
Release hightlights
DCN35/36
* Dynamically clock gate before and after prefetch
Acked-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Hung [Wed, 22 Oct 2025 22:19:34 +0000 (16:19 -0600)]
drm/amd/display: Fix black screen with HDMI outputs
[Why & How]
This fixes the black screen issue on certain APUs with HDMI,
accompanied by the following messages:
amdgpu 0000:c4:00.0: amdgpu: [drm] Failed to setup vendor info
frame on connector DP-1: -22
amdgpu 0000:c4:00.0: [drm] Cannot find any crtc or sizes [drm]
Cannot find any crtc or sizes
Fixes: 489f0f600ce2 ("drm/amd/display: Fix DVI-D/HDMI adapters") Suggested-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alvin Lee [Thu, 23 Oct 2025 17:56:32 +0000 (13:56 -0400)]
drm/amd/display: Increase IB mem size
[Why & How]
Increase IB mem size to match size of largest structure that will
use IB transfer between driver and DMU.
Reviewed-by: Oleh Kuzhylnyi <oleh.kuzhylnyi@amd.com> Signed-off-by: Alvin Lee <Alvin.Lee2@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Dillon Varone [Thu, 23 Oct 2025 21:07:04 +0000 (17:07 -0400)]
drm/amd/display: Revert DCN4 max buffered cursor size to 64
[Why & How]
The buffered cursor cap is expressed assuming a square cursor, and usage
of the cursor buffer is limited by the request size. For greater than 32
pixels, the request size is fixed at 256 bytes, so the maximum width
must be floored to the nearest 256th byte. At 4bpp this means even with
24kB DCN4 can only hold a 64x64 cursor in the buffer as even 65 pixels
would require 512 bytes per line instead of 256.
Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Signed-off-by: Dillon Varone <Dillon.Varone@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Joshua Aberback [Thu, 23 Oct 2025 20:43:56 +0000 (16:43 -0400)]
drm/amd/display: Persist stream refcount through restore
[Why & How]
Overwriting the refcount on stream restore can lead to double-free errors
or memory leaks if an unbalanced number of retains and releases occurs
between a backup and restore.
Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Joshua Aberback <joshua.aberback@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Austin Zheng [Fri, 3 Oct 2025 14:39:49 +0000 (10:39 -0400)]
drm/amd/display: Refactor VActive implementation
[Why & How]
Refactors VActive accounting in PMO, and breaks down fill time
requirement by P-State type as it can result in drasitcally different
bandwidth requirements depending on the blackout length.
Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Austin Zheng <Austin.Zheng@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Andrew Mazour [Wed, 15 Oct 2025 16:19:49 +0000 (12:19 -0400)]
drm/amd/display: Extend inbox0 lock to run Replay/PSR
[Why]
The inbox1 infrastructure is deprecated, so to support display
power features requiring a DMUB interlock moving forward extend
the inbox0 locking conditions to also include Replay or PSR.
[How]
Implemented a series of changes to improve HW lock handling:
- Deprecated should_use_dmub_inbox1_lock() and guarded it with
DCN401 flag.
- Migrated lock checks into inbox0 helpers and added PSR/Replay
enablement checks to ensure correct behavior.
- Updated HWSS fast update path to acquire HW lock as needed
using the new helpers.
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Andrew Mazour <Andrew.Mazour@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Wenjing Liu [Fri, 3 Oct 2025 15:59:39 +0000 (11:59 -0400)]
drm/amd/display: fw locality check refactors
[why]
There are some new changes for HDCP2 firmware locality check. The
implementation doesn't perfectly fit the intended design and clarity.
1. Clarify and consolidate variable responsibilities.
The previous implementation introduced the following variables:
- config.ddc.funcs.atomic_write_poll_read_i2c (optional pointer)
- hdcp->config.ddc.funcs.atomic_write_poll_read_aux (optional pointer)
- hdcp->connection.link.adjust.hdcp2.force_sw_locality_check (bool)
- hdcp->config.debug.lc_enable_sw_fallback (bool)
- use_fw (bool)
They will be used together to determine two operations:
- Whether to use FW locality check
- Whether to use SW fallback on FW locality check failure
The refactor streamlines this by introducing two variables in the hdcp2
link adjustment, while ensuring function pointers are always assigned
and remain independent from policy decisions:
- use_fw_locality_check (bool) -> true if fw locality should be used.
- use_sw_locality_fallback (bool) -> true to reset use_fw_locality_check
back to false and retry on fw locality check failure.
2. Mixed meanings of l_prime_read transition input
l_prime_read originally means if l_prime is read when sw locality check
is used. When FW locality check is used, l_prime_read means if lc init
write, l prime poll and l_prime read combo operation is successful. The
mix of meanings is confusing. The refactor introduces a new variable
l_prime_combo_read to isolate the second meaning into its own variable.
3. Missing specific error code on firmware locality error.
The original change reuses the generic DDC failure error code when
firmware fails to return locality check result. This is not ideal as
DDC failure indicates an error occurred during an I2C/AUX transaction.
FW locality failure could be caused by polling timeout in firmware or
failure to acquire firmware access. Which sits at a higher level of
abstraction above DDC hardware. An incorrect error code could mislead
the debug into a wrong direction.
4. Correcting misplaced comments. The previous implementation of the
firmware locality check resulted in some comments in hdcp2_transition
being incorrectly positioned. This refactor relocates those comments to
their appropriate locations for better clarity.
Reviewed-by: Aric Cyr <aric.cyr@amd.com> Signed-off-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jesse.Zhang [Fri, 24 Oct 2025 02:51:52 +0000 (10:51 +0800)]
drm/amdgpu: Implement user queue reset functionality
This patch adds robust reset handling for user queues (userq) to improve
recovery from queue failures. The key components include:
1. Queue detection and reset logic:
- amdgpu_userq_detect_and_reset_queues() identifies failed queues
- Per-IP detect_and_reset callbacks for targeted recovery
- Falls back to full GPU reset when needed
2. Reset infrastructure:
- Adds userq_reset_work workqueue for async reset handling
- Implements pre/post reset handlers for queue state management
- Integrates with existing GPU reset framework
3. Error handling improvements:
- Enhanced state tracking with HUNG state
- Automatic reset triggering on critical failures
- VRAM loss handling during recovery
4. Integration points:
- Added to device init/reset paths
- Called during queue destroy, suspend, and isolation events
- Handles both individual queue and full GPU resets
The reset functionality works with both gfx/compute and sdma queues,
providing better resilience against queue failures while minimizing
disruption to unaffected queues.
v2: add detection and reset calls when preemption/unmaped fails.
add a per device userq counter for each user queue type.(Alex)
v3: make sure we hold the adev->userq_mutex when we call amdgpu_userq_detect_and_reset_queues. (Alex)
warn if the adev->userq_mutex is not held.
v4: make sure we have all of the uqm->userq_mutex held.
warn if the uqm->userq_mutex is not held.
v5: Use array for user queue type counters.(Alex)
all of the uqm->userq_mutex need to be held when calling detect and reset. (Alex)
v6: fix lock dep warning in amdgpu_userq_fence_dence_driver_process
v7: add the queue types in an array and use a loop in amdgpu_userq_detect_and_reset_queues (Lijo)
v8: remove atomic_set(&userq_mgr->userq_count[i], 0).
it should already be 0 since we kzalloc the structure (Alex)
v9: For consistency with kernel queues, We may want something like:
amdgpu_userq_is_reset_type_supported (Alex)
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: Don't stretch non-native images by default in eDP
commit 978fa2f6d0b12 ("drm/amd/display: Use scaling for non-native
resolutions on eDP") started using the GPU scaler hardware to scale
when a non-native resolution was picked on eDP. This scaling was done
to fill the screen instead of maintain aspect ratio.
The idea was supposed to be that if a different scaling behavior is
preferred then the compositor would request it. The not following
aspect ratio behavior however isn't desirable, so adjust it to follow
aspect ratio and still try to fill screen.
Note: This will lead to black bars in some cases for non-native
resolutions. Compositors can request the previous behavior if desired.
Fixes: 978fa2f6d0b1 ("drm/amd/display: Use scaling for non-native resolutions on eDP") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4538 Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sunday Clement [Mon, 27 Oct 2025 18:00:59 +0000 (14:00 -0400)]
drm/amdkfd: Fix Unchecked Return Values
Properly Check for return values from calls to debug functions in
runtime_disable().
v2: storing the last non zero returned value from the loop.
Signed-off-by: Sunday Clement <Sunday.Clement@amd.com> Reviewed-by: Jonathan Kim <Jonathan.Kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If device suspend has failed, add a recovery flow that will attempt
to unwind the suspend and get things back up and running.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4627 Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd: Add an unwind for failures in amdgpu_device_ip_suspend_phase2()
If any hardware IPs involved with the second phase of suspend fail, unwind
all steps to restore back to original state.
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd: Add an unwind for failures in amdgpu_device_ip_suspend_phase1()
If any hardware IPs involved with the first phase of suspend fail, unwind
all steps to restore back to original state.
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Sun, 26 Oct 2025 04:29:36 +0000 (23:29 -0500)]
drm/amdgpu: Drop PMFW RLC notifier from amdgpu_device_suspend()
For S3 on vangogh, PMFW needs to be notified before the
driver powers down RLC. This already happens in smu_disable_dpms()
so drop the superfluous call in amdgpu_device_suspend().
Co-developed-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lukas Bulwahn [Thu, 30 Oct 2025 14:37:37 +0000 (15:37 +0100)]
MAINTAINERS: adjust file entry in AMD DISPLAY CORE - DML
Commit e6a8a000cfe6 ("drm/amd/display: Rename dml2 to dml2_0 folder")
renames the directory dml2 to dml2_0 in ./drivers/gpu/drm/amd/display/dc,
but misses to adjust the file entry in AMD DISPLAY CORE - DML.
Adjust the file entry after this directory renaming.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Timur Kristóf [Thu, 30 Oct 2025 09:15:56 +0000 (10:15 +0100)]
drm/amd/pm/si: Delete unused structs and fields
The contents of si_dpm.h seem to have been copied from the
old radeon driver, including a lot of structs and fields which
were only relevant to GPU generations even older than SI.
A lot of these can be deleted without causing much churn to the
actual SI DPM code. Let's delete them to make the code easier
to understand.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lijo Lazar [Mon, 11 Aug 2025 13:37:05 +0000 (19:07 +0530)]
drm/amd/pm: Add helper functions for gpu metrics
Add helper macros to define metrics struct definitions. It will define
structs with field type followed by actual field. A helper macro is also
added to initialize the field encoding for all fields and to initialize
the field members to 0xFFs.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Yang Wang [Thu, 30 Oct 2025 05:06:24 +0000 (13:06 +0800)]
drm/amd/pm: fix missing device_attr cleanup in amdgpu_pm_sysfs_init()
Use the correct label to complete all cleanup work.
Fixes: 4d154b1ca580 ("drm/amd/pm: Add support for DPM policies") Fixes: 25e82f2e2c59 ("drm/amd/pm: Add temperature metrics sysfs entry") Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Yang Wang [Mon, 27 Oct 2025 07:22:54 +0000 (15:22 +0800)]
drm/amd/pm: fix the issue of size calculation error for smu 13.0.6
v1:
the driver should handle return value of smu_v13_0_6_printk_clk_levels()
to return the correct size for sysfs reads.
v2:
fix the issue of size calculation error in smu_v13_0_6_print_clks()
Fixes: cdfdec6f1608 ("drm/amd/pm: Avoid writing nulls into `pp_od_clk_voltage`") Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Harry Wentland [Wed, 29 Oct 2025 14:28:14 +0000 (10:28 -0400)]
drm/amd/display: Fix null pointer on analog detection
Check if we have an amdgpu_dm_connector->dc_sink first before
adding common modes for analog outputs. If we don't have a
sink yet we can safely skip this.
Fixes: 70181ad96ec2 ("drm/amd/display: Add common modes to analog displays without EDID") Signed-off-by: Harry Wentland <harry.wentland@amd.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Asad Kamal [Tue, 28 Oct 2025 14:08:28 +0000 (22:08 +0800)]
drm/amdgpu: Remove invalidate and flush hdp macros
Remove amdgpu_asic_flush_hdp & amdgpu_asic_invalidate_hdp functions and
directly use the mapped ones
Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Xiang Liu [Wed, 22 Oct 2025 07:11:42 +0000 (15:11 +0800)]
drm/amd/ras: Add CPER ring read for uniras
Read CPER raw data from debugfs node "/sys/kernel/debug/dri/*/
amdgpu_ring_cper".
Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Asad Kamal [Mon, 27 Oct 2025 10:11:52 +0000 (18:11 +0800)]
drm/amdgpu: Update invalidate and flush hdp function
Update asic_invalidate_hdp and asic_flush_hdp function to check if ip
function exist, if not return void
v2: Use else/if (Kevin)
Update function name (Lijo)
Signed-off-by: Asad Kamal <asad.kamal@amd.com> Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sunil Khatri [Tue, 28 Oct 2025 12:09:27 +0000 (17:39 +0530)]
drm/amdgpu: caller should make sure not to double free
Remove the NULL check from amdgpu_hmm_range_free for hmm_pfns
as caller is responsible not to call amdgpu_hmm_range_free
more than once.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 14 Oct 2025 20:45:17 +0000 (16:45 -0400)]
drm/amdgpu: set default gfx reset masks for gfx6-8
These were not set so soft recovery was inadvertantly
disabled.
Fixes: 6ac55eab4fc4 ("drm/amdgpu: move reset support type checks into the caller") Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sunil Khatri [Tue, 28 Oct 2025 08:19:24 +0000 (13:49 +0530)]
drm/amdkfd: clean up the code to free hmm_range
a. hmm_range is either NULL or a valid pointer so we
do not need to set range to NULL ever.
b. keep the hmm_range_free in the end irrespective of
the other conditions to avoid some additional checks
and also avoid double free issue.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Simona Vetter [Fri, 31 Oct 2025 17:57:54 +0000 (18:57 +0100)]
Merge tag 'drm-intel-gt-next-2025-10-29' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-next
Driver Changes:
Fixes/improvements/new stuff:
- Set O_LARGEFILE in __create_shmem() (Taotao Chen)
- Fix incorrect error handling in shmem_pwrite() (Taotao Chen)
- Skip GuC communication warning on reset in progress [guc] (Zhanjun Dong)
- Fix conversion between clock ticks and nanoseconds [guc] (Umesh Nerlige Ramappa)
Miscellaneous:
- Avoid accessing uninitialized context in emit_rpcs_query() [selftests] (Krzysztof Karas)
- Fix typo in comment (I915_EXEC_NO_RELOC) [gem] (Marlon Henrique Sanches)
Backmerges:
- Merge drm/drm-next into drm-intel-gt-next (Joonas Lahtinen)
Simona Vetter [Fri, 31 Oct 2025 17:47:16 +0000 (18:47 +0100)]
Merge tag 'drm-misc-next-2025-10-28' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
drm-misc-next for v6.19-rc1:
UAPI Changes:
Cross-subsystem Changes:
- Update DT bindings for renesas and powervr-rogue.
- Update MAINTAINERS email and add spsc_queue.
Core Changes:
- Allow ttm page protection flags on risc-v.
- Move freeing of drm client memory to driver.
Driver Changes:
- Assorted small fixes and updates to qaic, ivpu, st7571-i2c, gud,
amdxdna.
- Allow configuration of vkms' display through configfs.
- Add Arm Ethos-U65/U85 accel driver.
Simona Vetter [Fri, 31 Oct 2025 17:40:53 +0000 (18:40 +0100)]
Merge tag 'drm-xe-next-2025-10-28' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next
Driver Changes:
More xe3p support (Harish, Brian, Balasubramani, Matt Roper)
Make panic support work on VRAM for display (Maarten)
Fix stolen size check (Shuicheng)
xe_pci_test update (Gustavo)
VF migration updates (Tomasz)
A couple of fixes around allocation and PM references (Matt Brost)
Migration update for the MEM_COPY instruction (Matt Auld)
Initial CRI support (Balasubramani, Matt Roper)
Use SVM range helpers in PT layer (Matt Brost)
Drop MAX_GT_TYPE_CHARS constant (Matt Roper)
Fix spelling and typos (Sanjay)
Fix VF FLR synchronization between all GTs (Michal)
Add a Workaround (Nitin)
Access VF's register using dedicated MMIO view (Michal)
pm_runtime_put_autosuspend(), pm_runtime_put_sync_autosuspend(),
pm_runtime_autosuspend() and pm_request_autosuspend() now include a call
to pm_runtime_mark_last_busy(). Remove the now-redundant explicit call to
pm_runtime_mark_last_busy().
Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
pm_runtime_put_autosuspend(), pm_runtime_put_sync_autosuspend(),
pm_runtime_autosuspend() and pm_request_autosuspend() now include a call
to pm_runtime_mark_last_busy(). Remove the now-redundant explicit call to
pm_runtime_mark_last_busy().
Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Timur Kristóf [Fri, 26 Sep 2025 18:02:03 +0000 (20:02 +0200)]
drm/amdgpu: Use DC by default for Bonaire
Now that DC supports analog connectors, there is nothing stopping
us from using it by default on Bonaire.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Timur Kristóf [Fri, 26 Sep 2025 18:02:02 +0000 (20:02 +0200)]
drm/amd/display: Don't add freesync modes to analog displays (v2)
VRR is not supported on analog signals.
Don't add freesync modes to analog displays or when
VRR is unsupported by DC.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Timur Kristóf [Fri, 26 Sep 2025 18:02:01 +0000 (20:02 +0200)]
drm/amd/display: Add common modes to analog displays without EDID
When the EDID of an analog display is not available, we can't
know the possible modes supported by the display. However, we
still need to offer the user to select from a variety of common
modes. It will be up to the user to select the best one, though.
This is how it works on other operating systems as well as the
legacy display code path in amdgpu.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Timur Kristóf [Fri, 26 Sep 2025 18:02:00 +0000 (20:02 +0200)]
drm/amd/display: Use DAC load detection on analog connectors (v2)
This feature is useful for analog connections without EDID:
- Really old monitors with a VGA connector
- Cheap DVI/VGA adapters that don't connect DDC pins
When a connection is established through DAC load detection,
the driver is supposed to fill in the supported modes for the
display, which we already do in amdgpu_dm_connector_get_modes.
Also, because the load detection causes visible glitches, do not
attempt to poll the connector again after it was detected this
way. Note that it will still be polled after sleep/resume or
when force is enabled, which is okay.
v2:
Add dc_connection_dac_load connection type.
Properly release sink when no display is connected.
Don't print error when EDID isn't read from an analog display.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Timur Kristóf [Fri, 26 Sep 2025 18:01:59 +0000 (20:01 +0200)]
drm/amd/display: Add DAC_LoadDetection to BIOS parser (v2)
DAC_LoadDetection can be used to determine whether something
is connected to an analog connector by determining if there is
an analog load. This causes visible flickering on displays, so
we only resort to using this when the connected display doesn't
have an EDID.
For reference, see the legacy display code:
amdgpu_atombios_encoder_dac_load_detect
v2:
Only clear corresponding bit from BIOS_SCRATCH_0.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Timur Kristóf [Fri, 26 Sep 2025 18:01:58 +0000 (20:01 +0200)]
drm/amd/display: Make get_support_mask_for_device_id reusable
This will be reused by DAC load detection.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Timur Kristóf [Fri, 26 Sep 2025 18:01:57 +0000 (20:01 +0200)]
drm/amd/display: Add DCE BIOS_SCRATCH_0 register
The BIOS uses this register to write the results of the
DAC_LoadDetection command, so we'll need to read this
in order to make DAC load detection work.
As a reference, I used the mmBIOS_SCRATCH_0 definition from
the amdgpu legacy display code.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Timur Kristóf [Fri, 26 Sep 2025 18:01:56 +0000 (20:01 +0200)]
drm/amd/display: Poll analog connectors (v3)
VGA connectors don't support any hotplug detection, so the kernel
needs to periodically poll them to see if a display is connected.
DVI-I connectors have hotplug detection for digital signals, and
some analog DVI cables pull up that pin to work with that.
However, in general not all DVI cables do this so we can't rely on
this feature, therefore we need to poll DVI-I connectors as well.
v2:
Call drm_kms_helper_poll_fini in amdgpu_dm_hpd_fini.
Disable/enable polling on suspend/resume.
Don't call full link detection when already connected.
v3:
Encounter CLANG build failure. Remove unused variable:
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_irq.c:980:7:
error: variable 'use_polling' set but not used [-Werror,-Wunused-but-
set-variable]
980 | bool use_polling = false;
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Wayne Lin <wayne.lin@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Prepare for polling analog connectors.
Document the function better.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Timur Kristóf [Fri, 26 Sep 2025 18:01:54 +0000 (20:01 +0200)]
drm/amd/display: Add analog link detection (v2)
Analog displays typically have a DDC connection which can be
used by the GPU to read EDID. This commit adds the capability
to probe analog displays using DDC, reading the EDID header and
deciding whether the analog link is connected based on the data
that was read.
Note that VGA has no HPD (hotplug detection), so we need to
to do analog link detection for VGA before checking HPD.
In case of DVI-I, while the connector supports HPD, not all
analog cables connect the HPD pins, so we can't rely on HPD
either.
For reference, see the legacy display code:
amdgpu_connector_vga_detect
amdgpu_display_ddc_probe
DAC load detection will be implemented in a separate commit.
v2:
Fix crash / black screen on newer GPUs during link detection.
Ignore HPD pin for analog connectors.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Timur Kristóf [Fri, 26 Sep 2025 18:01:53 +0000 (20:01 +0200)]
drm/amd/display: Support DAC in dce110_hwseq
The dce110_hwseq is used by all DCE hardware,
so add the DAC support here.
When enabling/disabling a stream for a RGB signal,
this will call the VBIOS to enable/disable the DAC.
Additionally, when applying the controller context,
call SelectCRTC_Source from VBIOS in order to
direct the CRTC output to the DAC.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Timur Kristóf [Fri, 26 Sep 2025 18:01:52 +0000 (20:01 +0200)]
drm/amd/display: Implement DCE analog link encoders (v2)
We support two kinds of analog connections:
1. DVI-I, which allows both digital and analog signals:
The DC code base only allows 1 encoder per connector, and the
preferred engine type is still going to be digital. So, for DVI-I
to work, we need to make sure the pre-existing link encoder can
also work with analog signals.
1. VGA, which only supports analog signals:
For VGA, we need to create a link encoder that only works with the
DAC without perturbing any digital transmitter functionality.
Since dce110_link_encoder already supports analog DVI-I,
just reuse that code for VGA as well.
v2:
Reduce code churn by reusing same link encoder for VGA and DVI-I.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Timur Kristóf [Fri, 26 Sep 2025 18:01:51 +0000 (20:01 +0200)]
drm/amd/display: Implement DCE analog stream encoders
Add analog stream encoders for DCE which will be used when
connecting an analog display through VGA or DVI-I.
Considering that all stream encoder functions currently deal
with digital streams, there is nothing for an analog stream
encoder to do, making them basically a no-op.
That being said, we still need some kind of stream encoder to
represent an analog stream, and it is beneficial to split them
from digital stream encoders in the code to make sure they
don't accidentally write any DIG* registers.
On supported chips there is currently up to 1 analog encoder,
which is DACA. There are references to DACB in some code such
as VBIOS commands and register files but it seems to be
not present on DCE 6 and newer.
Set num_analog_stream_encoder = 1 so that we can support
the analog connectors on DCE 6-10, for now.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Timur Kristóf [Fri, 26 Sep 2025 18:01:50 +0000 (20:01 +0200)]
drm/amd/display: Add concept of analog encoders (v2)
Add a num_analog_stream_encoders field to indicate how many
analog stream encoders are present. When analog stream encoders
are present, create them.
Additionally, add an analog_engine field to link encoders and
search for supported analog encoders in the BIOS for each link.
When connecting an RGB signal, search for analog stream encoders.
The actual DCE analog link and stream encoder is going to be
added in a subsequent commit.
v2:
Add check to see if an analog engine is really supported.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Timur Kristóf [Fri, 26 Sep 2025 18:01:49 +0000 (20:01 +0200)]
drm/amd/display: Determine early if a link has supported encoders (v2)
Avoid initializing DDC, HPD, etc. when we know that the link is
not going to be constructed because it has no supported encoders.
This is mainly useful for old GPUs which may have encoders such
as TRAVIS and NUTMEG that are not yet supported by DC.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Timur Kristóf [Fri, 26 Sep 2025 18:01:48 +0000 (20:01 +0200)]
drm/amd/display: Don't try to enable/disable HPD when unavailable
VGA connectors don't have HPD (hotplug detection), so don't
touch any HPD related registers for VGA.
Determine whether hotplug detection is available by checking that
the interrupt source is invalid.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>