drm/amd/display: Add clk_mgr NULL checks in dcn32_initialize_min_clocks()
dcn32_init_hw() checks dc->clk_mgr before calling init_clocks(), so the
clock manager is not treated as unconditionally present on this path.
However, dcn32_initialize_min_clocks() later dereferences dc->clk_mgr,
bw_params, and clk_mgr callbacks without validating them.
Add the required guards in dcn32_initialize_min_clocks() before
accessing clk_mgr-dependent state, and check callback presence before
calling get_dispclk_from_dentist() and update_clocks().
Also guard the later update_bw_bounding_box() call in the FAMS2-disabled
path since it also dereferences dc->clk_mgr->bw_params.
This keeps clk_mgr handling consistent in the DCN32 HW init flow and
avoids possible NULL pointer dereferences reported by Smatch.
Fixes the below:
drivers/gpu/drm/amd/amdgpu/../display/dc/hwss/dcn32/dcn32_hwseq.c:1012 dcn32_init_hw() error: we previously assumed 'dc->clk_mgr' could be null (see line 978)
Cc: Roman Li <roman.li@amd.com> Cc: Alex Hung <alex.hung@amd.com> Cc: Jerry Zuo <jerry.zuo@amd.com> Cc: Sun peng Li <sunpeng.li@amd.com> Cc: Tom Chung <chiahsuan.chung@amd.com> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Yang Wang [Thu, 23 Oct 2025 03:04:40 +0000 (11:04 +0800)]
drm/amd/pm: add get_gpu_metrics support for 15.0.8
export .get_gpu_metrics interface for 15.0.8
v2: Remove members already exposed by other interfaces, use mask,
logical conversion (Lijo)
v3: Use correct logic for hbm stacks loop (Lijo)
Remove buffer allocation
v4: Make out of bound check outside loop (Lijo)
v5: fix locking in error case (Alex)
Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Asad Kamal [Wed, 26 Nov 2025 10:41:34 +0000 (18:41 +0800)]
drm/amd/pm: Add get_pm_metrics support for smu 15.0.8
export .get_pm_metrics interface for smu 15.0.8.
v2: Make tmo as unsigned (Lijo)
Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Yang Wang [Wed, 26 Nov 2025 08:02:41 +0000 (16:02 +0800)]
drm/amd/pm: Setup driver pptable for smu 15.0.8
Setup driver pptable and initialize data from static metrics table for
smu_v15_0_8
v2: Remove unrelated changes and update description (Lijo)
v3: Use ARRAY_SIZE (Lijo)
v4: Move structure to header file
v5: squash in static metrics support (Asad)
Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Asad Kamal [Mon, 24 Nov 2025 17:19:13 +0000 (01:19 +0800)]
drm/amd/pm: Add mode2 support for smu_v15_0_8
Add initial mode2 support for smu_v15_0_8
v2: Move out non smu code, remove pci save/restore logic (Lijo)
v3: squash in updated msg (Alex)
Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sunil Khatri [Fri, 20 Mar 2026 11:59:01 +0000 (17:29 +0530)]
drm/amdgpu/userq: cleanup amdgpu_userq_get/put where not needed
amdgpu_userq_put/get are not needed in case we already holding
the userq_mutex and reference is valid already from queue create
time or from signal ioctl. These additional get/put could be a
potential reason for deadlock in case the ref count reaches zero
and destroy is called which again try to take the userq_mutex.
Due to the above change we avoid deadlock between suspend/restore
calling destroy queues trying to take userq_mutex again.
Cc: Prike Liang <Prike.Liang@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Asad Kamal [Wed, 18 Mar 2026 05:52:57 +0000 (13:52 +0800)]
drm/amd/pm: Return -EOPNOTSUPP for unsupported OD_MCLK on smu_v13_0_6
When SET_UCLK_MAX capability is absent, return -EOPNOTSUPP from
smu_v13_0_6_emit_clk_levels() for OD_MCLK instead of 0. This makes
unsupported OD_MCLK reporting consistent with other clock types
and allows callers to skip the entry cleanly.
Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Asad Kamal [Wed, 18 Mar 2026 05:48:30 +0000 (13:48 +0800)]
drm/amd/pm: Skip redundant UCLK restore in smu_v13_0_6
Only reapply UCLK soft limits during PP_OD_RESTORE_DEFAULT when the
current max differs from the DPM table max. This avoids redundant
SMC updates and prevents -EINVAL on restore when no change is needed.
Fixes: b7a900344546 ("drm/amd/pm: Allow setting max UCLK on SMU v13.0.6") Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Taimur Hassan [Fri, 13 Mar 2026 22:42:59 +0000 (17:42 -0500)]
drm/amd/display: Promote DC to 3.2.375
This version brings along following fixes:
- Rework YCbCr422 DSC policy
- Restore full update for tiling change to linear
- add dccg FGCG mask init
- Remove unnecessary completion flag for secure display
- Agument live + capture with CVT case.
- remove dc_clock_limit for apu
- Fix Signed/Unsigned Int Usage Compiler Warning
- Hardcode dtbclk value in bw_params
- Revert inbox0 lock for cursor due to deadlock
- Add 3DLUT DMA broadcast support
- Fix Silence warnings
- export get_power_profile interface for later use
- pg cntl update based on previous asic.
- remove disable_sutter touch pstate debug code
- Refactor DC update checks
- Fix drm_edid leak in amdgpu_dm
- Add Extra SMU Log for dtbclk
- Clamp min DS DCFCLK value to DCN limit
- Update dpia supported configuration
- Multiple DCN42 updates
Joshua Aberback [Thu, 12 Mar 2026 22:33:49 +0000 (18:33 -0400)]
drm/amd/display: Restore full update for tiling change to linear
[Why]
There was previously a dc debug flag to indicate that tiling
changes should only be a medium update instead of full. The
function get_plane_info_type was refactored to not rely on dc
state, but in the process the logic was unintentionally changed,
which leads to screen corruption in some cases.
[How]
- add flag to tiling struct to avoid full update when necessary
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Reviewed-by: Aric Cyr <aric.cyr@amd.com> Signed-off-by: Joshua Aberback <joshua.aberback@amd.com> Signed-off-by: Chuanyu Tseng <chuanyu.tseng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Wayne Lin [Fri, 13 Mar 2026 04:05:40 +0000 (12:05 +0800)]
drm/amd/display: Remove unnecessary completion flag for secure display
The completion flag is not used in secure display today.
Remove unnecessary code.
Reviewed-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Chuanyu Tseng <chuanyu.tseng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
ChunTao Tso [Wed, 4 Mar 2026 08:37:58 +0000 (16:37 +0800)]
drm/amd/display: Agument live + capture with CVT case.
1. Add LIVE_CAPTURE_WITH_CVT bit (bit[2]) in union replay_optimization
to control this feature via DalRegKey_ReplayOptimization.
2. Check the bit in mod_power_set_live_capture_with_cvt_activate function
before enabling live capture with CVT.
3. Use LIVE_CAPTURE_WITH_CVT to control if Replay want to send CVT in
live + capture or not.
Reviewed-by: Leon Huang <leon.huang1@amd.com> Signed-off-by: ChunTao Tso <chuntao.tso@amd.com> Signed-off-by: Chuanyu Tseng <chuanyu.tseng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Charlene Liu [Wed, 11 Mar 2026 21:05:19 +0000 (17:05 -0400)]
drm/amd/display: remove dc_clock_limit for apu
[why]
current apu pmfw does not support dc_clock_limit
Reviewed-by: Roman Li <roman.li@amd.com> Signed-off-by: Charlene Liu <Charlene.Liu@amd.com> Signed-off-by: Chuanyu Tseng <chuanyu.tseng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: Fix Signed/Unsigned Int Usage Compiler Warning
[Why] Compiler generates compiler warnings when signed enum
constants or literal -1 are implicitly converted to unsigned
integer types, cluttering build output and masking genuine issues.
[How] Use UINT_MAX as the invalid sentinel for unsigned IDs and align
loop/index types to unsigned where appropriate to remove implicit
signed-to-unsigned conversions, with no functional behavior change.
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Gaghik Khachatrian <gaghik.khachatrian@amd.com> Signed-off-by: Chuanyu Tseng <chuanyu.tseng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Matthew Stewart [Wed, 11 Mar 2026 19:16:00 +0000 (15:16 -0400)]
drm/amd/display: Hardcode dtbclk value in bw_params
[why&how]
dtbclk should always be 600MHz. Previous logic was to get the real value
from SMU, but this returns 0 when dtbclk is off. Not a problem during
boot when pre-OS enables dtbclk, but PnP was broken due to this.
Reviewed-by: Charlene Liu <charlene.liu@amd.com> Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Matthew Stewart <Matthew.Stewart2@amd.com> Signed-off-by: Chuanyu Tseng <chuanyu.tseng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: Revert inbox0 lock for cursor due to deadlock
[Why]
A deadlock occurs when using inbox0 lock for cursor operations on
PSR-SU and Replays that does not when using the inbox1 locking path.
This is because of a priority inversion issue where inbox1 work
cannot be serviced while holding the HW lock from driver and sending
cursor notifications to DMUB.
Typically the lower priority of inbox1 for the lock command would
allow the PSR and Replay FSMs to complete their transition prior
to giving driver the lock but this is no longer the case with inbox0
having the highest priority in servicing.
[How]
This will reintroduce any synchronization bugs that were there
with Replay or PSR-SU touching the cursor at the same time as driver.
Reviewed-by: Charlene Liu <charlene.liu@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Chuanyu Tseng <chuanyu.tseng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Dillon Varone [Sat, 7 Mar 2026 05:53:03 +0000 (05:53 +0000)]
drm/amd/display: Add 3DLUT DMA broadcast support
[WHY&HOW]
A single HUBP can be used to fetch 3DLUT and broadcast to a
single HUBP. Add logic to select the top pipe for a given
plane and use it's HUBP as the broadcast source for multiple
MPC's.
Charlene Liu [Fri, 6 Mar 2026 15:40:07 +0000 (10:40 -0500)]
drm/amd/display: export get_power_profile interface for later use
[why]
export dcn401 get_power_profile for later asic.
Reviewed-by: Roman Li <roman.li@amd.com> Signed-off-by: Charlene Liu <Charlene.Liu@amd.com> Signed-off-by: Chuanyu Tseng <chuanyu.tseng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Dillon Varone [Thu, 5 Mar 2026 21:42:29 +0000 (16:42 -0500)]
drm/amd/display: Refactor DC update checks
[WHY&HOW]
DC currently has fragmented definitions of update types. This changes
consolidates them into a single interface, and adds expanded
functionality to accommodate all use cases.
- adds `dc_check_update_state_and_surfaces_for_stream` to determine
update type including state, surface, and stream changes.
- adds missing surface/stream update checks to
`dc_check_update_surfaces_for_stream`
- adds new update type `UPDATE_TYPE_ADDR_ONLY` to accomodate flows where
further distinction from `UPDATE_TYPE_FAST` was needed
- removes caller reliance on `enable_legacy_fast_update` to determine
which commit function to use, instead embedding it in the update type
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Dillon Varone <Dillon.Varone@amd.com> Signed-off-by: Chuanyu Tseng <chuanyu.tseng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Hung [Mon, 9 Mar 2026 17:16:08 +0000 (11:16 -0600)]
drm/amd/display: Fix drm_edid leak in amdgpu_dm
[WHAT]
When a sink is connected, aconnector->drm_edid was overwritten without
freeing the previous allocation, causing a memory leak on resume.
[HOW]
Free the previous drm_edid before updating it.
Reviewed-by: Roman Li <roman.li@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Chuanyu Tseng <chuanyu.tseng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: Fix DCN42 memory clock table using MemClk instead of UClk
[Why]
DCN42 was using UClk values instead of MemClk from MemPstateTable, causing
DML to see half the actual DRAM bandwidth on DDR5 systems and reject high
refresh rate modes.
[How]
Change dcn42_init_clocks() to use MemPstateTable[i].MemClk instead of
MemPstateTable[i].UClk for memclk_mhz initialization.
Reviewed-by: Charlene Liu <charlene.liu@amd.com> Signed-off-by: Alexander Chechik <alexander.chechik@amd.com> Signed-off-by: Chuanyu Tseng <chuanyu.tseng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Roman Li [Tue, 17 Mar 2026 00:17:57 +0000 (20:17 -0400)]
drm/amd/display: Update underflow detection for DCN42
[Why]
The DCN42 underflow detection functions in dcn42_optc.c use
OPTC_RSMU_UNDERFLOW register but the register offset definitions
were missing from dcn_4_2_0_offset.h and dcn42_resource.h.
[How]
Add missing register definitions.
Fixes: e56e3cff2a1b ("drm/amd/display: Sync dcn42 with DC 3.2.373") Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Roman Li <Roman.Li@amd.com> Signed-off-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Tue, 3 Feb 2026 16:30:57 +0000 (17:30 +0100)]
drm/amdgpu: fix some more bug in amdgpu_gem_va_ioctl
Some illegal combination of input flags were not checked and we need to
take the PDEs into account when returning the fence as well.
Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Roman Li [Sat, 14 Mar 2026 00:34:48 +0000 (20:34 -0400)]
drm/amd/display: Clamp min DS DCFCLK value to DCN limit
[why & how]
DCN has a global limit for minimum DS DCFCLK during any operation.
Adhere to that limit and add a debug flag.
Reviewed-by: Charlene Liu <charlene.liu@amd.com> Signed-off-by: Ovidiu Bunea <ovidiu.bunea@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Signed-off-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: Split arbiter programming for DCN42
[Why]
We don't want to update the timeout threshold for stall recovery in
firmware dynamically for DCN42 as we're not using FAMS.
Firmware should own programming of this register since the recovery
can be broken if driver updates the value to 0.
[How]
Split program_arbiter for dcn42 and skip the part that updates the
timeout threshold.
Reviewed-by: Leo Chen <leo.chen@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Signed-off-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Roman Li [Sat, 14 Mar 2026 00:27:54 +0000 (20:27 -0400)]
drm/amd/display: Add missing dcn42 hubbub function pointers
This aligning commit combines:
- fix dcn42 det programming)
- fix missing dcn42 pointers
- fix SDPIF_Request_Rate_Limit programming value
V2: Add back dchvm_init for DCN42
Reviewed-by: Alex Hung <alex.hung@amd.com> Reviewed-by: Leo Chen <leo.chen@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Signed-off-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Roman Li [Fri, 13 Mar 2026 23:34:02 +0000 (19:34 -0400)]
drm/amd/display: Add get_default_tiling_info for dcn42
Add DCN42 portion that was stripped during previously.
Fixes: 8333f22e44a9 ("drm/amd/display: Query DC for gfx handling when setting linear tiling") Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Signed-off-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Move it out of smu present block for cases where it isn't
Reviewed-by: Ivan Lipski <ivan.lipski@amd.com> Signed-off-by: Dmytro Laktyushkin <dmytro.laktyushkin@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Signed-off-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Charlene Liu [Thu, 5 Mar 2026 15:14:39 +0000 (10:14 -0500)]
drm/amd/display: System Hang When System enters to S0i3 w/ iGPU
[why]
System Hang when system enters to S0i3 w/ iGPU
some link_enc are NULL due to BIOS integration info table not correct,
but driver should have enough null pointer protection.
Reviewed-by: Leo Chen <leo.chen@amd.com> Signed-off-by: Charlene Liu <Charlene.Liu@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Signed-off-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ivan Lipski [Wed, 4 Mar 2026 01:07:58 +0000 (20:07 -0500)]
drm/amd/display: Move DPM clk read to clk_mgr_construct in DCN42
[Why&How]
The DPM clocks on DCN42 are currently read on every dm_resume, which can
cause in gpu memory freeing while the device is still in suspend.
Move the DPM clock read functionality to clk_mgr_construct() so it
completes once on driver enablement.
Reviewed-by: Charlene Liu <charlene.liu@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Signed-off-by: Dmytro Laktyushkin <dmytro.laktyushkin@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Signed-off-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
DCN401 didn't have a MRQ present so these fields didn't exist.
They are still present on DCN42 so we need to continue programming
them like we did on DCN35 or we can block have poor meta requesting
efficiency which blocks p-state.
[How]
Add `hubp42_program_requestor` which takes DML21 input and programs
the registers like DCN35 and prior.
Reviewed-by: Leo Chen <leo.chen@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Signed-off-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Charlene Liu [Mon, 2 Mar 2026 20:45:41 +0000 (15:45 -0500)]
drm/amd/display: dcn42 don't round up disclk and dppclk
[why]
dml2 based on num_enabled clock != 2 to do clock ramming to dpm.
apu has 8 levels dispclk/dppclk/dcfclk/fclk, but only 4 levels of memclk.
to avoid mapping dispclk/dppclk to DPM clock,
based on arch review, force dispclk/dppclk num_level as 2.
Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Charlene Liu <Charlene.Liu@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Signed-off-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Eric Huang [Mon, 16 Mar 2026 15:01:30 +0000 (11:01 -0400)]
drm/amdgpu: prevent immediate PASID reuse case
PASID resue could cause interrupt issue when process
immediately runs into hw state left by previous
process exited with the same PASID, it's possible that
page faults are still pending in the IH ring buffer when
the process exits and frees up its PASID. To prevent the
case, it uses idr cyclic allocator same as kernel pid's.
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ruijing Dong [Tue, 17 Mar 2026 17:54:11 +0000 (13:54 -0400)]
drm/amdgpu: fix strsep() corrupting lockup_timeout on multi-GPU (v3)
amdgpu_device_get_job_timeout_settings() passes a pointer directly
to the global amdgpu_lockup_timeout[] buffer into strsep().
strsep() destructively replaces delimiter characters with '\0'
in-place.
On multi-GPU systems, this function is called once per device.
When a multi-value setting like "0,0,0,-1" is used, the first
GPU's call transforms the global buffer into "0\00\00\0-1". The
second GPU then sees only "0" (terminated at the first '\0'),
parses a single value, hits the single-value fallthrough
(index == 1), and applies timeout=0 to all rings — causing
immediate false job timeouts.
Fix this by copying into a stack-local array before calling
strsep(), so the global module parameter buffer remains intact
across calls. The buffer is AMDGPU_MAX_TIMEOUT_PARAM_LENGTH
(256) bytes, which is safe for the stack.
v2: wrap commit message to 72 columns, add Assisted-by tag.
v3: use stack array with strscpy() instead of kstrdup()/kfree()
to avoid unnecessary heap allocation (Christian).
This patch was developed with assistance from Claude (claude-opus-4-6).
Assisted-by: Claude:claude-opus-4-6 Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Thu, 19 Feb 2026 23:20:27 +0000 (18:20 -0500)]
drm/amdgpu/gfx11: look at the right prop for gfx queue priority
Look at hqd_queue_priority rather than hqd_pipe_priority.
In practice, it didn't matter as both were always set for
kernel queues, but that will change in the future.
Fixes: 2e216b1e6ba2 ("drm/amdgpu/gfx11: handle priority setup for gfx pipe1")
Reviewed-by:Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Thu, 19 Feb 2026 23:18:28 +0000 (18:18 -0500)]
drm/amdgpu/gfx10: look at the right prop for gfx queue priority
Look at hqd_queue_priority rather than hqd_pipe_priority.
In practice, it didn't matter as both were always set for
kernel queues, but that will change in the future.
Fixes: b07d1d73b09e ("drm/amd/amdgpu: Enable high priority gfx queue")
Reviewed-by:Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 17 Mar 2026 20:34:41 +0000 (16:34 -0400)]
drm/amdgpu/pm: drop SMU driver if version not matched messages
It just leads to user confusion.
Cc: Yang Wang <kevinyang.wang@amd.com> Cc: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amdgpu: Skip discovery dump when topology is unavailable
When generating a devcoredump, amdgpu_discovery_dump() prints the IP
discovery topology.
The function already needs to handle the case where
adev->discovery.ip_top is NULL to avoid a crash.
Currently, the code prints a section header and an additional message
when the topology is unavailable.
However, for platforms where discovery is not used, this section is not
expected to be present. Printing an extra message adds unnecessary
output.
Simplify this by skipping the entire section when ip_top is NULL.
The NULL check is kept to avoid a crash, but no output is generated when
the discovery topology is unavailable.
Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
amdgpu_discovery_dump() uses adev->discovery.ip_top. However,
ip_top may be NULL if the discovery topology was never initialized.
The current code does not check for this before using ip_top. As a
result, when ip_top is NULL, the coredump worker crashes while taking
the spinlock for ip_top->die_kset.
Fix this by checking for a missing ip_top before walking the discovery
topology. If it is unavailable, print a short message in the dump and
return safely.
- If ip_top is NULL, print a message and skip the dump
- Also add the same check in the cleanup path
This makes the coredump and cleanup paths safe even when the
discovery topology is not available.
v2: Updated commit message - Clarified that ip_top is not freed, it can
just be NULL if discovery was not initialized. (Christian/Lijo)
v3: Removed the extra drm_warn() for sysfs init failure as sysfs already
reports errors. (Christian)
Fixes: e81eff80aad6 ("drm/amdgpu: include ip discovery data in devcoredump") Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Yussuf Khalil [Fri, 6 Mar 2026 12:06:35 +0000 (12:06 +0000)]
drm/amd/display: Do not skip unrelated mode changes in DSC validation
Starting with commit 17ce8a6907f7 ("drm/amd/display: Add dsc pre-validation in
atomic check"), amdgpu resets the CRTC state mode_changed flag to false when
recomputing the DSC configuration results in no timing change for a particular
stream.
However, this is incorrect in scenarios where a change in MST/DSC configuration
happens in the same KMS commit as another (unrelated) mode change. For example,
the integrated panel of a laptop may be configured differently (e.g., HDR
enabled/disabled) depending on whether external screens are attached. In this
case, plugging in external DP-MST screens may result in the mode_changed flag
being dropped incorrectly for the integrated panel if its DSC configuration
did not change during precomputation in pre_validate_dsc().
At this point, however, dm_update_crtc_state() has already created new streams
for CRTCs with DSC-independent mode changes. In turn,
amdgpu_dm_commit_streams() will never release the old stream, resulting in a
memory leak. amdgpu_dm_atomic_commit_tail() will never acquire a reference to
the new stream either, which manifests as a use-after-free when the stream gets
disabled later on:
BUG: KASAN: use-after-free in dc_stream_release+0x25/0x90 [amdgpu]
Write of size 4 at addr ffff88813d836524 by task kworker/9:9/29977
Since there is no reliable way of figuring out whether a CRTC has unrelated
mode changes pending at the time of DSC validation, remember the value of the
mode_changed flag from before the point where a CRTC was marked as potentially
affected by a change in DSC configuration. Reset the mode_changed flag to this
earlier value instead in pre_validate_dsc().
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/5004 Fixes: 17ce8a6907f7 ("drm/amd/display: Add dsc pre-validation in atomic check") Signed-off-by: Yussuf Khalil <dev@pp3345.net> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/ras: Remove redundant NULL check in pending bad-bank list iteration
ras_umc_log_pending_bad_bank() walks through a list of pending ECC
bad-bank entries. These entries are saved when a bad-bank error cannot
be processed immediately, for example during a GPU reset.
Later, this function iterates over the pending list and retries logging
each bad-bank error. If logging succeeds, the entry is removed from the
list and the memory for that node is freed.
The loop uses list_for_each_entry_safe(), which already guarantees that
ecc_node points to a valid list entry while the loop body is executing.
Checking "ecc_node &&" inside the loop is therefore unnecessary and
redundant.
Fixes the below:
drivers/gpu/drm/amd/amdgpu/../ras/rascore/ras_umc.c:225 ras_umc_log_pending_bad_bank() warn: variable dereferenced before check 'ecc_node' (see line 223)
Fixes: 7a3f9c0992c4 ("drm/amd/ras: Add umc common ras functions") Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: YiPeng Chai <YiPeng.Chai@amd.com> Cc: Tao Zhou <tao.zhou1@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: YiPeng Chai <YiPeng.Chai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Thu, 29 Jan 2026 11:58:10 +0000 (12:58 +0100)]
drm/amdgpu: make amdgpu_user_wait_ioctl more resilent v2
When the memory allocated by userspace isn't sufficient for all the
fences then just wait on them instead of returning an error.
v2: use correct variable as pointed out by Sunil
Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jesse.Zhang [Fri, 13 Mar 2026 06:17:10 +0000 (14:17 +0800)]
drm/amdgpu: replace WARN with DRM_ERROR for invalid sched priority
amdgpu_sched_ioctl() currently uses WARN(1, ...) when userspace passes
an out-of-range context priority value. WARN(1, ...) is unconditional
and produces a full stack trace, which is disproportionate for a simple
input validation failure -- the invalid value is already rejected with
-EINVAL on the next line.
Replace WARN(1, ...) with DRM_ERROR() to log the invalid value at an
appropriate level without generating a stack dump. The -EINVAL return
to userspace is unchanged.
No functional change for well-formed userspace callers.
v2:
- Reworked commit message to focus on appropriate log level for
parameter validation
- Clarified that -EINVAL behavior is preserved (Vitaly)
v3: completely drop that warning.
Invalid parameters should never clutter the system log. (Christian)
Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fixes: 63fff551318f ("drm/amd/display: Add NV12/P010 formats to primary plane") Signed-off-by: Harry Wentland <harry.wentland@amd.com> Reviewed-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amdgpu: Remove dead negative offset check in amdgpu_virt_init_critical_region()
amdgpu_virt_init_critical_region() stores init_hdr_offset as u64.
The subsequent check for init_hdr_offset < 0 is therefore always false.
Drop the unreachable validation and rely on the existing
check_add_overflow() and VRAM end bounds check for offset validation.
This resolves the Smatch warning about comparing an unsigned value
against zero.
drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c:953 amdgpu_virt_init_critical_region() warn: unsigned 'init_hdr_offset' is never less than zero.
Fixes: 07009df6494d ("drm/amdgpu: Introduce SRIOV critical regions v2 during VF init") Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Ellen Pan <yunru.pan@amd.com> Cc: Lijo Lazar <lijo.lazar@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Bokun Zhang <bokun.zhang@amd.com> Reviewed-by: Ellen Pan <yunru.pan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amdgpu: Move amdgpu_vm_is_bo_always_valid() before first use
Smatch reports that 'bo' could be NULL in amdgpu_vm_bo_update(), even
though amdgpu_vm_is_bo_always_valid() already checks for a NULL BO.
Move amdgpu_vm_is_bo_always_valid() earlier in the file so the helper
definition appears before its first use. This allows static analysis
tools to see the NULL check performed by the helper and avoids the
warning.
Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Thu, 29 Jan 2026 11:29:26 +0000 (12:29 +0100)]
drm/amdgpu: rework amdgpu_userq_wait_ioctl v4
Lockdep was complaining about a number of issues here. Especially lock
inversion between syncobj, dma_resv and copying things into userspace.
Rework the functionality. Split it up into multiple functions,
consistenly use memdup_array_user(), fix the lock inversions and a few
more bugs in error handling.
v2: drop the dma_fence leak fix, turned out that was actually correct,
just not well documented. Apply some more cleanup suggestion from
Tvrtko.
v3: rebase on already done cleanups
v4: add missing dma_fence_put() in error path.
Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Wed, 28 Jan 2026 15:07:03 +0000 (16:07 +0100)]
drm/amdgpu: fix adding eviction fence
We can't add the eviction fence without validating the BO.
Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Fri, 30 Jan 2026 15:46:36 +0000 (16:46 +0100)]
drm/amdgpu: fix eviction fence and userq manager shutdown
That is a really complicated dance and wasn't implemented fully correct.
Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
First of all a lot of checks were placed at incorrect locations, especially if
the resume worker should run or not.
Then a bunch of code was just mid-layering because of incorrect assignment who
should do what.
And finally comments explaining what happens instead of why.
Just re-write it from scratch, that should at least fix some of the hangs we
are seeing.
Use RCU for the eviction fence pointer in the manager, the spinlock usage was
mostly incorrect as well. Then finally remove all the nonsense checks and
actually add them in the correct locations.
v2: some typo fixes and cleanups suggested by Sunil
Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Mon, 16 Mar 2026 15:04:46 +0000 (11:04 -0400)]
drm/amdgpu: rework how we handle TLB fences
Add a new VM flag to indicate whether or not we need
a TLB fence. Userqs (KFD or KGD) require a TLB fence.
A TLB fence is not strictly required for kernel queues,
but it shouldn't hurt. That said, enabling this
unconditionally should be fine, but it seems to tickle
some issues in KIQ/MES. Only enable them for KFD,
or when KGD userq queues are enabled (currently via module
parameter).
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4798 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4749 Fixes: f3854e04b708 ("drm/amdgpu: attach tlb fence to the PTs update") Cc: Christian König <christian.koenig@amd.com> Cc: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Prike Liang <Prike.Liang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/ras: Fix NULL deref in ras_core_get_utc_second_timestamp()
ras_core_get_utc_second_timestamp() retrieves the current UTC timestamp
(in seconds since the Unix epoch) through a platform-specific RAS system
callback and is used for timestamping RAS error events.
The function checks ras_core in the conditional statement before calling
the sys_fn callback. However, when the condition fails, the function
prints an error message using ras_core->dev.
If ras_core is NULL, this can lead to a potential NULL pointer
dereference when accessing ras_core->dev.
Add an early NULL check for ras_core at the beginning of the function
and return 0 when the pointer is not valid. This prevents the
dereference and makes the control flow clearer.
Fixes: 13c91b5b4378 ("drm/amd/ras: Add rascore unified interface function") Cc: YiPeng Chai <YiPeng.Chai@amd.com> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Tao Zhou <tao.zhou1@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: YiPeng Chai <YiPeng.Chai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pratap Nirujogi [Wed, 11 Mar 2026 16:15:09 +0000 (12:15 -0400)]
drm/amdgpu: Fix ISP segfault issue in kernel v7.0
Add NULL pointer checks for dev->type before accessing
dev->type->name in ISP genpd add/remove functions to
prevent kernel crashes.
This regression was introduced in v7.0 as the wakeup sources
are registered using physical device instead of ACPI device.
This led to adding wakeup source device as the first child of
AMDGPU device without initializing dev-type variable, and
resulted in segfault when accessed it in the amdgpu isp driver.
Fixes: 057edc58aa59 ("ACPI: PM: Register wakeup sources under physical devices") Suggested-by: Bin Du <Bin.Du@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Mon, 16 Mar 2026 19:51:08 +0000 (15:51 -0400)]
drm/amdgpu/gmc9.0: add bounds checking for cid
The value should never exceed the array size as those
are the only values the hardware is expected to return,
but add checks anyway.
Cc: Benjamin Cheng <benjamin.cheng@amd.com> Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>