Philip Yang [Tue, 9 Dec 2025 20:13:23 +0000 (15:13 -0500)]
drm/amdkfd: Unreserve bo if queue update failed
The error handling path should unreserve the BO before returning the failure.
Fixes: 305cd109b761 ("drm/amdkfd: Validate user queue update") Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Alex Sierra <alex.sierra@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Taimur Hassan [Fri, 27 Feb 2026 22:40:05 +0000 (17:40 -0500)]
drm/amd/display: Promote DC to 3.2.373
This version brings along the following updates:
- [FW Promotion] Release 0.1.50.0
- Sync DCN42 with DC 3.2.373
- Add DML support for dcn42
- Enable dcn42 DC clk_mgr
- Clean up unused code
- Add back missing memory type in array
- Fix compile warnings in dml2_0
- Check for S0i3 to be done before DCCG init on DCN21
- Add documentation and cleanup DMUB HW lock manager
- Add new types to replay config
- Fix HWSS v3 fast path determination
- Add missing DCCG register entries for DCN20-DCN316
- Add ESD detection for replay recovery
- Update underflow detection
- Add COLOR_ENCODING/COLOR_RANGE to overlay planes
- Add NV12/P010 formats to primary plane
- Set chroma taps to 1 if luma taps are 1
- Add min clock init for DML21 mode programming
- Return early from vesa replay enable function
- Clean up NULL pointer warnings in dml2
Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Roman Li [Thu, 5 Mar 2026 17:56:09 +0000 (12:56 -0500)]
drm/amd/display: Sync dcn42 with DC 3.2.373
This patch provides a bulk merge to align driver
support for DCN42 with Display Core version 3.2.373.
It includes upgrades for:
- clk_mgr
- dml2/dml21
- optc
- hubp
- mpc
- hwseq
Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Roman Li <Roman.Li@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Roman Li [Tue, 3 Mar 2026 17:00:55 +0000 (12:00 -0500)]
drm/amd/display: Add DML support for dcn42
DML support for DCN 4.2
Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Roman Li <Roman.Li@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Roman Li [Fri, 20 Feb 2026 21:48:14 +0000 (16:48 -0500)]
drm/amd/display: Enable dcn42 DC clk_mgr
Add support for DCN 4.2 clock manager.
Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Roman Li <Roman.Li@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Clay King [Fri, 27 Feb 2026 17:34:34 +0000 (12:34 -0500)]
drm/amd/display: Clean up unused code
[WHAT]
Silence warning by cleaning up unused code.
Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Clay King <clayking@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tom Chung [Thu, 26 Feb 2026 08:16:19 +0000 (16:16 +0800)]
drm/amd/display: Add back missing memory type in array
[WHY & HOW]
Add back the missing memory types in window_memory_type.
The array should stay aligned with enum dmub_window_id.
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ivan Lipski [Thu, 26 Feb 2026 02:48:36 +0000 (21:48 -0500)]
drm/amd/display: Check for S0i3 to be done before DCCG init on DCN21
[WHY]
On DCN21, dccg2_init() is called in dcn10_init_hw() before
bios_golden_init(). During S0i3 resume, BIOS sets MICROSECOND_TIME_BASE_DIV
to 0x00120464 as a marker. dccg2_init() overwrites this to 0x00120264,
causing dcn21_s0i3_golden_init_wa() to misdetect the state and skip golden
init.
Eventually during the resume sequence, a flip timeout occurs.
[HOW]
On DCN21, check dccg2_is_s0i3_golden_init_wa_done() before DCCG init and
skip the init based on the result.
Fixes: 4c595e75110e ("drm/amd/display: Migrate DCCG registers access from hwseq to dccg component.") Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
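The marker problem described above can be sketched in plain C. This is only an illustration: the two register values come from the commit message, while struct fake_dccg, marker_present() and both init functions are hypothetical stand-ins. The sketch also assumes that the fix amounts to leaving the register untouched while the BIOS marker is still present.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values quoted in the commit message. */
#define S0I3_MARKER     0x00120464u /* left by BIOS during S0i3 resume */
#define DCCG_INIT_VALUE 0x00120264u /* written by the DCCG init path */

/* Hypothetical stand-in for the MICROSECOND_TIME_BASE_DIV register. */
struct fake_dccg {
	uint32_t microsecond_time_base_div;
};

/* Hypothetical helper: true while the BIOS marker is still in place. */
static bool marker_present(const struct fake_dccg *d)
{
	return d->microsecond_time_base_div == S0I3_MARKER;
}

/* Unguarded init: always overwrites the register and destroys the marker. */
static void dccg_init_unguarded(struct fake_dccg *d)
{
	d->microsecond_time_base_div = DCCG_INIT_VALUE;
}

/* Guarded init (assumed shape of the fix): keep the marker if it is present. */
static void dccg_init_guarded(struct fake_dccg *d)
{
	if (marker_present(d))
		return;
	d->microsecond_time_base_div = DCCG_INIT_VALUE;
}
```

With the unguarded variant, a later check for the marker can no longer tell that the system came out of S0i3, which matches the misdetection described in [WHY].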
ChunTao Tso [Fri, 6 Feb 2026 07:41:54 +0000 (15:41 +0800)]
drm/amd/display: Add new types to replay config
[WHAT]
Add FRAME_SKIPPING_ERROR_STATUS to dpcd_replay_configuration.
Add received_frame_skipping_error_hpd to replay_config.
Add REPLAY_GENERAL_CMD_SET_COASTING_VTOTAL_WITHOUT_FRAME_UPDATE to
dmub_cmd_replay_general_subtype.
Reviewed-by: Aric Cyr <aric.cyr@amd.com> Signed-off-by: ChunTao Tso <chuntao.tso@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: Fix HWSS v3 fast path determination
[WHY]
We're checking surface and stream updates after they've been applied to
their respective states within `update_planes_and_stream_state`.
Medium updates under the HWSS V3 fast path that are not supported or
tested are getting implicitly applied if they don't trigger a DML validation,
and are getting updated in place on the dc->current_state context.
[HOW]
Fix this issue by moving up the fast path determination check prior
to `update_planes_and_stream_state`. This is how the V2 path works
and how the V3 path used to work prior to the refactors in this area.
Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ivan Lipski [Tue, 24 Feb 2026 21:28:00 +0000 (16:28 -0500)]
drm/amd/display: Add missing DCCG register entries for DCN20-DCN316
Commit 4c595e75110e ("drm/amd/display: Migrate DCCG registers access
from hwseq to dccg component.") moved register writes from hwseq to
dccg2_*() functions but did not add the registers to the DCCG register
list macros. The struct fields default to 0, so REG_WRITE() targets
MMIO offset 0, causing a GPU hang on resume (seen on DCN21/DCN30
during IGT kms_cursor_crc@cursor-suspend).
Add
- MICROSECOND_TIME_BASE_DIV
- MILLISECOND_TIME_BASE_DIV
- DCCG_GATE_DISABLE_CNTL
- DCCG_GATE_DISABLE_CNTL2
- DC_MEM_GLOBAL_PWR_REQ_CNTL
to macros in dcn20_dccg.h, dcn301_dccg.h, dcn31_dccg.h, and dcn314_dccg.h.
Fixes: 4c595e75110e ("drm/amd/display: Migrate DCCG registers access from hwseq to dccg component.") Reported-by: Rafael Passos <rafael@rcpassos.me> Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
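A small sketch of why a missing register list entry is dangerous: a field that is never populated keeps its default of 0, so a write through it lands on offset 0. The register table, the fake MMIO array and fake_reg_write() below are hypothetical; the zero-offset guard is only there to make the failure visible and is not what the commit does (the commit adds the missing entries to the macros).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register table; real tables are filled by the DCCG register list macros. */
struct fake_dccg_regs {
	uint32_t MICROSECOND_TIME_BASE_DIV;
	uint32_t DCCG_GATE_DISABLE_CNTL;
};

#define FAKE_MMIO_WORDS 64
static uint32_t fake_mmio[FAKE_MMIO_WORDS];

/* Write through the table; refuses an unpopulated (zero) or out-of-range offset. */
static bool fake_reg_write(uint32_t offset, uint32_t value)
{
	if (offset == 0 || offset >= FAKE_MMIO_WORDS)
		return false;
	fake_mmio[offset] = value;
	return true;
}
```

If only MICROSECOND_TIME_BASE_DIV is listed in the initializer, DCCG_GATE_DISABLE_CNTL stays 0 and every write through it would target offset 0 unless something rejects it.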
Weiguang Li [Mon, 8 Dec 2025 06:13:20 +0000 (14:13 +0800)]
drm/amd/display: Add ESD detection for replay recovery
[HOW]
Add Replay recovery flow so that when HPD occurs and ESD is detected,
Replay can restore the system back to normal.
Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Reviewed-by: Robin Chen <robin.chen@amd.com> Reviewed-by: Aric Cyr <aric.cyr@amd.com> Signed-off-by: Weiguang Li <wei-guang.li@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Charlene Liu [Mon, 23 Feb 2026 19:28:14 +0000 (14:28 -0500)]
drm/amd/display: Update underflow detection
[WHY]
Add underflow detection for later ASICs.
Reviewed-by: Leo Chen <leo.chen@amd.com> Signed-off-by: Charlene Liu <Charlene.Liu@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Harry Wentland [Thu, 15 Jan 2026 19:12:46 +0000 (14:12 -0500)]
drm/amd/display: Add COLOR_ENCODING/COLOR_RANGE to overlay planes
Extend COLOR_ENCODING and COLOR_RANGE property creation to overlay
planes in addition to primary planes. This allows overlay planes to
use YUV formats with proper color space configuration when the
hardware supports NV12/P010 formats.
These properties control the YUV-to-RGB conversion matrix selection
(BT.601/BT.709/BT.2020) and range handling (limited/full range).
Assisted-by: Claude: claude-sonnet-4.5 Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Harry Wentland [Thu, 15 Jan 2026 19:12:05 +0000 (14:12 -0500)]
drm/amd/display: Add NV12/P010 formats to primary plane
Add NV12, NV21, and P010 YUV formats to the primary plane's supported
format list, enabling YUV content to be scanned out directly from the
primary plane.
Assisted-by: Claude: claude-sonnet-4.5 Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Harry Wentland [Fri, 16 Jan 2026 14:39:05 +0000 (09:39 -0500)]
drm/amd/display: Set chroma taps to 1 if luma taps are 1
When luma is unscaled we also want chroma to be pixel-perfect.
When luma taps are > 1 the result will be a blurred luma plane,
even when the image isn't scaled.
This makes IGT tests for CSC colorop pass.
Assisted-by: Claude: claude-sonnet-4.5 Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
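The rule in the commit above can be written as a tiny helper. The struct and field names are hypothetical, and applying the rule per direction is an assumption of this sketch rather than something the commit message states.

```c
/* Hypothetical scaler tap configuration; field names are illustrative. */
struct fake_taps {
	int h_taps;   /* horizontal luma taps */
	int v_taps;   /* vertical luma taps */
	int h_taps_c; /* horizontal chroma taps */
	int v_taps_c; /* vertical chroma taps */
};

/* When luma is unscaled (1 tap), force chroma to 1 tap as well. */
static void fake_match_chroma_to_luma(struct fake_taps *t)
{
	if (t->h_taps == 1)
		t->h_taps_c = 1;
	if (t->v_taps == 1)
		t->v_taps_c = 1;
}
```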
Peichen Huang [Fri, 6 Feb 2026 07:10:25 +0000 (15:10 +0800)]
drm/amd/display: Return early from vesa replay enable function
[WHY & HOW]
If the enable state is already as expected, just return.
Reviewed-by: Robin Chen <robin.chen@amd.com> Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: Peichen Huang <PeiChen.Huang@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: Clean up NULL pointer warnings in dml2
This commit addresses multiple warnings by adding defensive
checks for NULL pointers before dereferencing them. The changes ensure
that the affected pointers are validated, preventing potential undefined
behavior.
Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Gaghik Khachatrian <gaghik.khachatrian@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sunil Khatri [Fri, 6 Mar 2026 11:15:14 +0000 (16:45 +0530)]
drm/amdgpu: push userq debugfs function in amdgpu_debugfs files
Debugfs files for amdgpu are better handled in the dedicated
amdgpu_debugfs.c/.h files.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sunil Khatri [Fri, 6 Mar 2026 06:54:29 +0000 (12:24 +0530)]
drm/amdgpu/userq: declutter the code with goto
Clean up the failure handling in amdgpu_userq_create using
goto labels. This avoids replicating the cleanup for every
failure condition.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
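The goto-based cleanup style mentioned above looks roughly like this in a userspace sketch. struct fake_queue, its fields and the fail_at parameter are hypothetical and exist only to exercise each failure path; this is not the code of amdgpu_userq_create.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical queue with two resources acquired in order. */
struct fake_queue {
	int *doorbell;
	int *mqd;
};

/* fail_at selects which step fails (0 = none); purely for demonstration. */
static int fake_queue_create(struct fake_queue **out, int fail_at)
{
	struct fake_queue *q;
	int r;

	q = calloc(1, sizeof(*q));
	if (!q)
		return -ENOMEM;

	q->doorbell = (fail_at == 1) ? NULL : malloc(sizeof(int));
	if (!q->doorbell) {
		r = -ENOMEM;
		goto free_queue;
	}

	q->mqd = (fail_at == 2) ? NULL : malloc(sizeof(int));
	if (!q->mqd) {
		r = -ENOMEM;
		goto free_doorbell;
	}

	*out = q;
	return 0;

	/* Labels unwind in reverse order of acquisition. */
free_doorbell:
	free(q->doorbell);
free_queue:
	free(q);
	return r;
}

static void fake_queue_destroy(struct fake_queue *q)
{
	free(q->mqd);
	free(q->doorbell);
	free(q);
}
```

Each failure site only names the label matching what has been acquired so far, so adding a new step means adding one label instead of copying the whole cleanup sequence.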
Sunil Khatri [Tue, 3 Mar 2026 19:08:06 +0000 (00:38 +0530)]
drm/amdgpu/userq: defer queue publication until create completes
The userq create path publishes queues to global xarrays such as
userq_doorbell_xa and userq_xa before creation was fully complete.
Later on if create queue fails, teardown could free an already
visible queue, opening a UAF race with concurrent queue walkers.
Also calling amdgpu_userq_put in such cases complicates the cleanup.
The solution is to defer queue publication until create succeeds, so that
no partially initialized queue is exposed.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
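A minimal sketch of "publish last", assuming a plain array as a stand-in for a global xarray such as userq_xa. All names here (fake_userq, fake_registry, fake_hw_init, fake_userq_create) are hypothetical, and the sketch is single-threaded.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define FAKE_MAX_QUEUES 8

struct fake_userq {
	int id;
	int ready;
};

/* Hypothetical stand-in for a globally visible lookup table. */
static struct fake_userq *fake_registry[FAKE_MAX_QUEUES];

/* Hypothetical setup step that can fail. */
static int fake_hw_init(struct fake_userq *q, int should_fail)
{
	if (should_fail)
		return -EIO;
	q->ready = 1;
	return 0;
}

static int fake_userq_create(int id, int should_fail)
{
	struct fake_userq *q;
	int r;

	if (id < 0 || id >= FAKE_MAX_QUEUES || fake_registry[id])
		return -EINVAL;

	q = calloc(1, sizeof(*q));
	if (!q)
		return -ENOMEM;
	q->id = id;

	r = fake_hw_init(q, should_fail);
	if (r) {
		/* Never published, so no walker can hold a pointer to it. */
		free(q);
		return r;
	}

	/* Publish only after every step has succeeded. */
	fake_registry[id] = q;
	return 0;
}
```

Because the failure path frees an object that was never visible, it does not need to unpublish anything or coordinate with concurrent readers.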
David Francis [Fri, 16 Jan 2026 15:21:15 +0000 (10:21 -0500)]
drm/amdgpu: Check for multiplication overflow in checkpoint stack size
get_checkpoint_info() in kfd_mqd_manager_v9.c computes the 32-bit value
ctl_stack_size by multiplying two 32-bit values. This can overflow to a
lower value, which could result in copying outside the bounds of
a buffer in checkpoint_mqd() in the same file.
Put in a check for the overflow, and fail with -EINVAL if detected.
v2: use check_mul_overflow()
Signed-off-by: David Francis <David.Francis@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
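The overflow check can be illustrated in userspace with the GCC/Clang builtin __builtin_mul_overflow(), which the kernel's check_mul_overflow() wraps. The helper name and parameter names below are hypothetical; only the pattern (detect the overflow, fail with -EINVAL) follows the commit message.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical helper: multiply two 32-bit values, failing on overflow. */
static int fake_ctl_stack_size(uint32_t per_unit, uint32_t units, uint32_t *out)
{
	uint32_t total;

	/* Returns true when the product does not fit in total. */
	if (__builtin_mul_overflow(per_unit, units, &total))
		return -EINVAL;

	*out = total;
	return 0;
}
```

Without the check, 0x10000 * 0x10000 would wrap to 0 in 32 bits, and a later copy sized from that result would not match the real amount of data.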
Sunil Khatri [Fri, 6 Mar 2026 07:23:30 +0000 (12:53 +0530)]
drm/amdkfd: fix the warning for potential insecure string
Below is the warning thrown by the clang compiler:
linux/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_process.c:588:9: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
stats_dir_filename);
^~~~~~~~~~~~~~~~~~
linux/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_process.c:588:9: note: treat the string as an argument to avoid this
stats_dir_filename);
^
"%s",
linux/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_process.c:635:18: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
p->kobj, counters_dir_filename);
^~~~~~~~~~~~~~~~~~~~~
linux/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_process.c:635:18: note: treat the string as an argument to avoid this
p->kobj, counters_dir_filename);
^
"%s",
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> CC: Philip Yang <philip.yang@amd.com> CC: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sunil Khatri [Fri, 6 Mar 2026 11:18:02 +0000 (16:48 +0530)]
drm/amdgpu: fix warning for potentially insecure string
linux/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:2358:24: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
sprintf(ring->name, amdgpu_sw_ring_name(i));
^~~~~~~~~~~~~~~~~~~~~~
linux/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:2358:24: note: treat the string as an argument to avoid this
sprintf(ring->name, amdgpu_sw_ring_name(i));
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
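The usual way to silence -Wformat-security is the one the compiler note suggests: pass the string as an argument to a literal "%s" format. A userspace sketch follows; fake_set_ring_name() is a hypothetical helper, and snprintf() is used here instead of sprintf() to bound the copy.

```c
#include <stdio.h>

/* Hypothetical helper: copy a name without treating it as a format string. */
static void fake_set_ring_name(char *dst, size_t dst_size, const char *name)
{
	/* Literal "%s" format: any '%' inside name is copied, not interpreted. */
	snprintf(dst, dst_size, "%s", name);
}
```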
Yang Wang [Wed, 4 Mar 2026 23:45:45 +0000 (18:45 -0500)]
drm/amdgpu: fix gpu idle power consumption issue for gfx v12
Older versions of the MES firmware may cause abnormal GPU power consumption.
When performing inference tasks on the GPU (e.g., with Ollama using ROCm),
the GPU may show abnormal power consumption in idle state and incorrect GPU load information.
This issue has been fixed in firmware version 0x8b and newer.
Closes: https://github.com/ROCm/ROCm/issues/5706 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Joshua Peisach [Tue, 3 Mar 2026 21:18:22 +0000 (16:18 -0500)]
drm/amdgpu/amdgpu_connectors: use struct drm_edid instead of struct edid
Some amdgpu code is still using deprecated edid functions. Switch to
the newer functions and update the amdgpu_connector struct's edid type
to the drm_edid type.
At the same time, use the raw EDID when we need to for speaker
allocations and for determining if the input is digital.
Signed-off-by: Joshua Peisach <jpeisach@ubuntu.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd: Fix a few more NULL pointer dereference in device cleanup
I found a few more paths where cleanup fails due to a NULL version pointer
on unsupported hardware.
Add NULL checks as applicable.
Fixes: 39fc2bc4da00 ("drm/amdgpu: Protect GPU register accesses in powergated state in some paths") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/radeon: Test for fbdev GEM object with generic helper
Replace radeon's test for the fbdev GEM object with a call to the
generic helper.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amdgpu: Move test for fbdev GEM object into generic helper
Provide a generic helper that tests if fbdev emulation is backed by
a specific GEM object. Not all drivers use client buffers (yet), hence
also test against the first GEM object in the fbdev framebuffer.
Convert amdgpu. The helper will also be useful for radeon.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Yang Wang [Wed, 4 Mar 2026 02:14:10 +0000 (21:14 -0500)]
drm/amd/pm: add missing od setting PP_OD_FEATURE_ZERO_FAN_BIT for smu v14
Add the missing OD setting PP_OD_FEATURE_ZERO_FAN_BIT for SMU v14.0.2/14.0.3.
Fixes: 9710b84e2a6a ("drm/amd/pm: add overdrive support on smu v14.0.2/3") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/5018 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sunil Khatri [Tue, 3 Mar 2026 17:00:04 +0000 (22:30 +0530)]
drm/amdgpu/userq: remove queue from doorbell xa during clean up
If amdgpu_userq_map_helper fails, we need to clean
up and remove the queue from the userq_doorbell_xa.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sunil Khatri [Tue, 3 Mar 2026 16:55:57 +0000 (22:25 +0530)]
drm/amdgpu/userq: remove queue from doorbell xarray
In case of failure in xa_alloc, remove the queue from
the userq_doorbell_xa during cleanup.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd: Fix NULL pointer dereference in device cleanup
When GPU initialization fails due to an unsupported HW block,
IP blocks may have a NULL version pointer. During cleanup in
amdgpu_device_fini_hw, the code calls amdgpu_device_set_pg_state and
amdgpu_device_set_cg_state which iterate over all IP blocks and access
adev->ip_blocks[i].version without NULL checks, leading to a kernel
NULL pointer dereference.
Add NULL checks for adev->ip_blocks[i].version in both
amdgpu_device_set_cg_state and amdgpu_device_set_pg_state to prevent
dereferencing NULL pointers during GPU teardown when initialization has
failed.
Fixes: 39fc2bc4da00 ("drm/amdgpu: Protect GPU register accesses in powergated state in some paths") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
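The NULL check described above follows a common pattern: skip entries whose descriptor was never filled in. The types and the function below are hypothetical simplifications, not the amdgpu structures.

```c
#include <stddef.h>

struct fake_ip_version {
	int type;
};

struct fake_ip_block {
	const struct fake_ip_version *version; /* may be NULL after a failed init */
};

/* Visits every block that has a version and returns how many were visited. */
static int fake_set_state(const struct fake_ip_block *blocks, int n)
{
	int visited = 0;

	for (int i = 0; i < n; i++) {
		if (!blocks[i].version)
			continue; /* unsupported or uninitialized block */
		(void)blocks[i].version->type; /* safe to dereference here */
		visited++;
	}
	return visited;
}
```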
Yang Wang [Wed, 4 Mar 2026 02:10:11 +0000 (21:10 -0500)]
drm/amd/pm: add missing od setting PP_OD_FEATURE_ZERO_FAN_BIT for smu v13
Add the missing OD setting PP_OD_FEATURE_ZERO_FAN_BIT for SMU v13.0.0/13.0.7.
Fixes: cfffd980bf21 ("drm/amd/pm: add zero RPM OD setting support for SMU13") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/5018 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amdgpu: Fix mutex handling in amdgpu_benchmark_do_move() v3
amdgpu_benchmark_do_move() can exit the loop early if
amdgpu_copy_buffer() or dma_fence_wait() fails.
In the error path, the function jumps to the exit label
without releasing adev->mman.default_entity.lock, which
leaves the mutex held and results in a lock imbalance.
This can block subsequent users of default_entity and
potentially cause deadlocks.
Move the mutex_unlock() to the common exit path so the
lock is released on both success and error returns.
This fixes:
drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c:57 amdgpu_benchmark_do_move()
warn: inconsistent returns '&adev->mman.default_entity.lock'.
v2:
- Drop unrelated initialization of 'r'
- Keep the change focused on the mutex imbalance fix (Pierre).
v3:
- Removed empty line
Fixes: 30f2daedf4d8 ("drm/amdgpu: add missing lock in amdgpu_benchmark_do_move") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
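The lock imbalance and its fix can be sketched with a toy lock that only counts acquisitions. struct fake_lock, fake_do_move() and the fail_at parameter are hypothetical; a real implementation would use a mutex, but the control flow is the point here.

```c
#include <errno.h>

/* Toy lock: held is 0 when balanced. */
struct fake_lock {
	int held;
};

static void fake_lock_acquire(struct fake_lock *l) { l->held++; }
static void fake_lock_release(struct fake_lock *l) { l->held--; }

/* fail_at: iteration on which the copy "fails", or -1 for success. */
static int fake_do_move(struct fake_lock *l, int n, int fail_at)
{
	int r = 0;

	fake_lock_acquire(l);
	for (int i = 0; i < n; i++) {
		if (i == fail_at) {
			r = -EIO;
			goto exit_do_move; /* early exit still reaches the unlock */
		}
	}

exit_do_move:
	/* Unlock lives on the common exit path, so success and error both release. */
	fake_lock_release(l);
	return r;
}
```

If the release sat at the end of the loop body's success path instead, the early goto would skip it and leave the lock held.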
Asad Kamal [Mon, 2 Mar 2026 05:35:30 +0000 (13:35 +0800)]
drm/amd/pm: Avoid overflow when sorting pp_feature list
pp_features sorting uses int8_t sort_feature[] to store driver
feature enum indices. On newer ASICs the enum index can exceed 127,
causing signed overflow and silently dropping entries from the output.
Switch the array to int16_t so all enum indices are preserved.
Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
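A short sketch of the narrowing problem: an index above 127 does not survive a round trip through int8_t, while int16_t preserves it. The helper names are hypothetical. Converting an out-of-range value to a signed type is implementation-defined in C, so the sketch only relies on the value changing, not on the exact result.

```c
#include <stdint.h>

/* Narrow slot: indices above 127 are not representable. */
static int fake_store_narrow(int feature_index)
{
	int8_t slot = (int8_t)feature_index;

	return slot;
}

/* Wider slot: indices up to 32767 are preserved. */
static int fake_store_wide(int feature_index)
{
	int16_t slot = (int16_t)feature_index;

	return slot;
}
```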
sguttula [Wed, 25 Feb 2026 08:27:01 +0000 (13:57 +0530)]
drm/amdgpu/psp: Use Indirect access address for GFX to PSP mailbox
The reason the RAP is not granting access to 0x58200 is that
a dedicated RSMU slot would have to be spent for this address range,
and MPASP is close to running out of RSMU slots.
This helps fix the PSP TOC load failure during secure boot.
The GFX driver needs to use indirect access for SMN address registers.
Tvrtko Ursulin [Wed, 7 Jan 2026 12:43:50 +0000 (12:43 +0000)]
drm/amdgpu: Remove redundant missing hw ip handling
Now that it is guaranteed there can be no entity if there is no hw ip
block we can remove the open coded protection during CS parsing.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
References: 55414ad5c983 ("drm/amdgpu: error out on entity with no run queue") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tvrtko Ursulin [Wed, 7 Jan 2026 12:43:49 +0000 (12:43 +0000)]
drm/amdgpu: Reject impossible entities early
Currently there are two different behaviour modes when userspace tries to
operate on HW IP blocks that are not present. On a machine without UVD, VCE
and VPE blocks, this can be observed for example like this:
$ sudo ./amd_fuzzing --r cs-wait-fuzzing
...
amd_cs_wait_fuzzing DRM_IOCTL_AMDGPU_CTX r 0
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_GFX r 0
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_COMPUTE r 0
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_DMA r 0
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_UVD r -1
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_VCE r 0
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_UVD_ENC r -1
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_VCN_DEC r 0
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_VCN_ENC r 0
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_VCN_JPEG r 0
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_VPE r 0
We can see that UVD returns an errno (-EINVAL) from the CS_WAIT ioctl,
while VCE and VPE return unexpected successes.
The difference stems from the fact that UVD is a load balancing engine
which retains the context, so it is covered by a workaround implemented in
amdgpu_ctx_init_entity(), which does not account for the fact that the
hardware block may not be present.
This causes a single NULL scheduler to be passed to
drm_sched_entity_init(), which immediately rejects this with -EINVAL.
The not present VCE and VPE cases on the other hand pass zero schedulers
to drm_sched_entity_init(), which is explicitly allowed and results in
unusable entities.
However, as the UVD case shows, call paths can handle the errors, so we can
consolidate this into a single path which will always return -EINVAL if
the HW IP block is not present.
We do this by rejecting it early and not calling drm_sched_entity_init()
when there is no backing hardware.
This also removes the need for the drm_sched_entity_init() to handle the
zero schedulers and NULL scheduler cases, which means that we can follow
up by removing the special casing from the DRM scheduler.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
References: f34e8bb7d6c6 ("drm/sched: fix null-ptr-deref in init entity") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
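The consolidated behaviour can be sketched as a single early check. The device struct, the IP identifiers and fake_ctx_init_entity() are hypothetical simplifications; the only behaviour taken from the commit message is that an absent HW IP block always yields -EINVAL before any entity is initialised.

```c
#include <errno.h>

/* Hypothetical IP identifiers for the sketch. */
#define FAKE_IP_GFX 0u
#define FAKE_IP_UVD 1u
#define FAKE_IP_VCE 2u
#define FAKE_IP_NUM 3u

/* Hypothetical device: number of schedulers available per HW IP. */
struct fake_device {
	unsigned int num_scheds[FAKE_IP_NUM];
};

static int fake_ctx_init_entity(const struct fake_device *dev, unsigned int hw_ip)
{
	if (hw_ip >= FAKE_IP_NUM)
		return -EINVAL;

	/* No backing hardware: reject before any entity setup happens. */
	if (dev->num_scheds[hw_ip] == 0)
		return -EINVAL;

	/* Entity initialisation would go here. */
	return 0;
}
```

With one rejection point, every absent block reports the same error instead of some returning -EINVAL and others returning success with an unusable entity.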
Sunil Khatri [Mon, 2 Mar 2026 13:20:46 +0000 (18:50 +0530)]
drm/amdgpu/userq: refcount userqueues to avoid any race conditions
To avoid race conditions and UAF cases, implement kref-based
queues and protect the operations below using the xa lock:
a. Getting a queue from the xarray
b. Incrementing/decrementing its refcount
Every time someone wants to access a queue, always get it via
amdgpu_userq_get to make sure the locks are in place and the
object is returned only if active.
A userqueue is destroyed when the last reference is dropped, which
typically happens via IOCTL or during fini.
v2: Add the missing drop in one of the conditions in the signal ioctl [Alex]
v3: remove the queue from the xarray first in the free queue ioctl path
[Christian]
- Pass queue to the amdgpu_userq_put directly.
- make amdgpu_userq_put xa_lock free since we are doing put for each get
only and final put is done via destroy and we remove the queue from xa
with lock.
- use userq_put in fini too so cleanup is done fully.
v4: Use xa_erase directly rather than doing load and erase in free
ioctl. Also remove some of the error logs which could be exploited
by the user to flood the logs [Christian]
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
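A single-threaded sketch of the get/put lifecycle described above. Everything here is hypothetical: a plain array replaces the xarray, a plain int replaces the kref, and no real lock is taken; the comments mark where the xa lock would be held in the kernel. It is meant to show the ordering (unpublish first, free on last put), not the actual amdgpu implementation.

```c
#include <stddef.h>
#include <stdlib.h>

struct fake_userq {
	int refcount;
};

#define FAKE_SLOTS 4
static struct fake_userq *fake_xa[FAKE_SLOTS];
static int fake_destroyed; /* counts objects actually freed */

/* Lookup plus increment; in the kernel both would happen under the xa lock. */
static struct fake_userq *fake_userq_get(unsigned int id)
{
	struct fake_userq *q;

	if (id >= FAKE_SLOTS)
		return NULL;
	q = fake_xa[id];
	if (q)
		q->refcount++;
	return q;
}

/* Drop one reference; the last put frees the object. */
static void fake_userq_put(struct fake_userq *q)
{
	if (--q->refcount == 0) {
		free(q);
		fake_destroyed++;
	}
}

/* Free path: unpublish first, then drop the reference the table held. */
static void fake_userq_free(unsigned int id)
{
	struct fake_userq *q;

	if (id >= FAKE_SLOTS)
		return;
	q = fake_xa[id];
	fake_xa[id] = NULL;
	if (q)
		fake_userq_put(q);
}
```

A user that obtained the queue before the free keeps it alive until its own put, while later lookups already see an empty slot.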
Alysa Liu [Thu, 5 Feb 2026 16:21:45 +0000 (11:21 -0500)]
drm/amdgpu: Fix use-after-free race in VM acquire
Replace non-atomic vm->process_info assignment with cmpxchg()
to prevent race when parent/child processes sharing a drm_file
both try to acquire the same VM after fork().
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alysa Liu <Alysa.Liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
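The compare-and-exchange idea can be shown with C11 atomics, where atomic_compare_exchange_strong() plays the role that cmpxchg() plays in the kernel. The types and fake_vm_acquire() are hypothetical; the sketch only demonstrates that exactly one caller can install its pointer into an unowned slot.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct fake_process_info {
	int id;
};

struct fake_vm {
	_Atomic(struct fake_process_info *) process_info;
};

/* Returns true only for the caller that installs info into an unowned VM. */
static bool fake_vm_acquire(struct fake_vm *vm, struct fake_process_info *info)
{
	struct fake_process_info *expected = NULL;

	/* Succeeds only if process_info is still NULL at the moment of the swap. */
	return atomic_compare_exchange_strong(&vm->process_info, &expected, info);
}
```

A plain "check for NULL, then assign" sequence lets two racing callers both pass the check; the atomic exchange collapses check and assignment into one step, so the loser can detect it and back off.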
Yang Wang [Thu, 26 Feb 2026 03:51:06 +0000 (22:51 -0500)]
drm/amd/pm: remove invalid gpu_metrics.energy_accumulator on smu v13.0.x
v1:
The metrics->EnergyAccumulator field has been deprecated on newer pmfw.
v2:
add smu 13.0.0/13.0.7/13.0.10 support.
Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fixes: db1882b3ff0c ("drm/amdkfd: Update LDS, Scratch base for 57bit address") Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Lang Yu <lang.yu@amd.com> Acked-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Colin Ian King [Sat, 28 Feb 2026 09:59:38 +0000 (09:59 +0000)]
drm/amd/display: remove extra ; from statement, remove extra tabs
There is a statement that has a ;; at the end, remove the extraneous ;
and remove extra tabs in the code block.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amdgpu: Drop redundant syncobj handle limit checks in userq ioctls
Clang warns that comparing a __u16 value against 65536 is always false.
num_syncobj_handles is defined as __u16 in both the userq signal and
wait ioctl argument structs, so it can never exceed 65535. The checks
against AMDGPU_USERQ_MAX_HANDLES are therefore redundant and trigger
-Wtautological-constant-out-of-range-compare.
Fixes: Clang -Wtautological-constant-out-of-range-compare in userq
signal/wait ioctls
Fixes: d8e760b7996d ("drm/amdgpu: update type for num_syncobj_handles in drm_amdgpu_userq_signal") Fixes: c561d2320492 ("drm/amdgpu: update type for num_syncobj_handles in drm_amdgpu_userq_wait") Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Gangliang Xie [Fri, 12 Dec 2025 06:16:17 +0000 (14:16 +0800)]
drm/amd/ras: add pmfw eeprom smu interfaces
Add SMU interfaces and their data structures for
pmfw eeprom in uniras.
v2: add 'const' to smu messages array, and specify
index for each member when initializing.
Signed-off-by: Gangliang Xie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Gangliang Xie [Fri, 12 Dec 2025 07:42:49 +0000 (15:42 +0800)]
drm/amd/pm: add feature query interface for uniras
Add amdgpu_smu_ras_feature_is_enabled to query whether a feature
is supported.
v2: change default return value from -EOPNOTSUPP to 0
Signed-off-by: Gangliang Xie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Gangliang Xie [Thu, 11 Dec 2025 10:14:46 +0000 (18:14 +0800)]
drm/amd/pm: add pmfw eeprom messages into uniras interface
add pmfw eeprom related messages into smu_v13_0_6_ras_send_msg
v2: add sriov check before sending smu commands
Signed-off-by: Gangliang Xie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sunil Khatri [Thu, 26 Feb 2026 15:48:51 +0000 (21:18 +0530)]
drm/amdgpu: update type for num_syncobj_handles in drm_amdgpu_userq_wait
Update the type of num_syncobj_handles from __u32 to __u16 with the
required padding.
This breaks the UAPI for big-endian platforms, but this is deliberate
and harmless since userqueues are still a beta feature. The feature is
enabled via a module parameter and needs the right fw support to work.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
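Why such a type change is only visible on big-endian can be sketched with two hypothetical layouts. These structs are not the real ioctl argument struct, which has more fields; they only model a 32-bit count being replaced by a 16-bit count plus 16 bits of padding at the same offset.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical old and new layouts of the same 4 bytes. */
struct fake_args_old {
	uint32_t num_handles;
};

struct fake_args_new {
	uint16_t num_handles;
	uint16_t pad;
};

/* Read bytes written by an old-layout caller through the new layout. */
static uint16_t fake_read_new_from_old(uint32_t value)
{
	struct fake_args_old old_args = { value };
	struct fake_args_new new_args;

	memcpy(&new_args, &old_args, sizeof(new_args));
	return new_args.num_handles;
}

/* Runtime endianness probe used to interpret the result. */
static int fake_is_little_endian(void)
{
	const uint16_t probe = 1;
	unsigned char first;

	memcpy(&first, &probe, 1);
	return first == 1;
}
```

On little-endian the low 16 bits sit first in memory, so a small count written with the old layout reads back unchanged. On big-endian the first two bytes are the high half, so the same count reads back as 0.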
Sunil Khatri [Thu, 26 Feb 2026 15:44:27 +0000 (21:14 +0530)]
drm/amdgpu: update type for num_syncobj_handles in drm_amdgpu_userq_signal
Update the type of num_syncobj_handles from __u64 to __u16 with the
required padding.
This breaks the UAPI for big-endian platforms, but this is deliberate
and harmless since userqueues are still a beta feature. The feature is
enabled via a module parameter and needs the right fw support to work.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Taimur Hassan [Fri, 20 Feb 2026 22:53:34 +0000 (17:53 -0500)]
drm/amd/display: Promote DC to 3.2.372
This version brings along the following updates:
- Prevent integer overflow when mhz to khz
- Remove always-false branches
- Remove redundant initializers
- Silence unused variable warning
- Initialize replay_state to PR_STATE_INVALID
- Fallback to boot snapshot for dispclk
- Skip cursor cache reset if hubp powergating is disabled
Reviewed-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Hung [Wed, 18 Feb 2026 17:38:33 +0000 (10:38 -0700)]
drm/amd/display: Prevent integer overflow when mhz to khz
[WHAT]
Cast to long long before multiplication to prevent overflow
when converting mhz to khz by multiplying by 1000.
This is reported as INTEGER_OVERFLOW errors by Coverity.
Reviewed-by: Roman Li <roman.li@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
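The cast-before-multiply pattern looks like this in isolation. The helper name is hypothetical; the point is that the cast must be applied to an operand, not to the finished product.

```c
/* Widen first, then multiply, so the product is computed in 64 bits. */
static long long fake_mhz_to_khz(int mhz)
{
	/* Writing (long long)(mhz * 1000) would overflow in int before the cast. */
	return (long long)mhz * 1000;
}
```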
Alex Hung [Wed, 18 Feb 2026 16:50:15 +0000 (09:50 -0700)]
drm/amd/display: Remove always-false branches
[WHAT]
program_prealpha_dealpha and hpo_frl_stream_enc_acquired are always
false and all branches depending on them will never be taken.
This is reported as DEADCODE errors by Coverity.
Reviewed-by: Roman Li <roman.li@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Hung [Wed, 18 Feb 2026 16:30:32 +0000 (09:30 -0700)]
drm/amd/display: Remove redundant initializers
[WHAT]
Remove unnecessary default value assignments for variables that
are unconditionally assigned before use.
Linux kernel code style prefers no assignments during initialization
when variables are assigned unconditionally, as they can obscure
the actual data flow. In addition, compilers will be able to catch cases
where variables are used without being updated later in all conditions.
This is reported as UNUSED_VALUE errors by Coverity.
Reviewed-by: Roman Li <roman.li@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Austin Zheng <austin.zheng@amd.com> Signed-off-by: Clay King <clayking@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ivan Lipski [Wed, 18 Feb 2026 21:19:15 +0000 (16:19 -0500)]
drm/amd/display: Initialize replay_state to PR_STATE_INVALID
[WHY & HOW]
Initialize the replay_state variable to PR_STATE_INVALID instead of
PR_STATE_0 before retrieving the actual replay state.
Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Dillon Varone [Wed, 18 Feb 2026 19:34:28 +0000 (14:34 -0500)]
drm/amd/display: Fallback to boot snapshot for dispclk
[WHY & HOW]
If the dentist is unavailable, fall back to reading CLKIP via the boot
snapshot to get the current dispclk.
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Dillon Varone <Dillon.Varone@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>