]> git.ipfire.org Git - thirdparty/linux.git/log
thirdparty/linux.git
3 weeks agodrm/amdgpu: fix amdgpu_hmm_range_get_pages
Christian König [Wed, 18 Feb 2026 11:53:27 +0000 (12:53 +0100)] 
drm/amdgpu: fix amdgpu_hmm_range_get_pages

The notifier sequence must only be read once or otherwise we could work
with invalid pages.

While at it also fix the coding style, e.g. drop the pre-initialized
return value and use the common define for 2G range.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Tested-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amd/ras: cap pending_ecc_list size
Stanley.Yang [Mon, 11 May 2026 11:44:16 +0000 (19:44 +0800)] 
drm/amd/ras: cap pending_ecc_list size

Drop new entries once pending_ecc_count hits RAS_UMC_PENDING_ECC_MAX
(8192) so an ECC storm or repeated UMC error injection cannot exhaust
kernel memory. Dropped events are counted and reported via a
rate-limited warning.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amdgpu: init locals in umc_v12_0_convert_error_address
Stanley.Yang [Mon, 11 May 2026 09:27:29 +0000 (17:27 +0800)] 
drm/amdgpu: init locals in umc_v12_0_convert_error_address

row, col, col_lower, row_lower, row_high and bank could be read on
code paths that never assign them. Initialize them to 0.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amdgpu/userq: use array instead of list for userq_vas
Sunil Khatri [Wed, 20 May 2026 11:09:49 +0000 (16:39 +0530)] 
drm/amdgpu/userq: use array instead of list for userq_vas

Use arrays instead of list for userq_vas since we have fixed no
of bos. Also, we dont have to worry to free that memory later
since this array would be free along with queue only.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amdgpu/userq: move mqd_destroy to later stage to keep core obj valid
Sunil Khatri [Wed, 20 May 2026 10:55:50 +0000 (16:25 +0530)] 
drm/amdgpu/userq: move mqd_destroy to later stage to keep core obj valid

mqd_destroy cleans up queue core objects like mqd and fw_object
which are needed for any pending fence to signal properly.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amdkfd: fix a vulnerability of integer overflow in kfd debugger
Eric Huang [Tue, 12 May 2026 14:19:52 +0000 (10:19 -0400)] 
drm/amdkfd: fix a vulnerability of integer overflow in kfd debugger

get_queue_ids() computes array_size = num_queues * sizeof(uint32_t),
which could overflow on 32-bit size_t build. using array_size()
instead, it saturates to SIZE_MAX on overflow.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amd: Add dedicated helper for amdgpu_device_find_parent()
Mario Limonciello [Wed, 20 May 2026 15:46:17 +0000 (10:46 -0500)] 
drm/amd: Add dedicated helper for amdgpu_device_find_parent()

There are a few cases that code walks up the topology to find the
link partner of the integrated switch in a dGPU.  Split this out
to a helper and call in all places.

This does have a functional change that amdgpu_device_gpu_bandwidth()
doesn't cache the internal link but only the parent.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amdgpu/userq: remove amdgpu_userq_create/destroy_object wrapper
Sunil Khatri [Wed, 20 May 2026 10:43:09 +0000 (16:13 +0530)] 
drm/amdgpu/userq: remove amdgpu_userq_create/destroy_object wrapper

Remove the amdgpu_userq_create/destroy_object wrappers and
use directly the kernel bo allocation function which does all the
things which are done in wrapper.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amdgpu: Fix TOCTOU on UniRAS command response size
Chenglei Xie [Mon, 11 May 2026 18:13:45 +0000 (14:13 -0400)] 
drm/amdgpu: Fix TOCTOU on UniRAS  command response size

The guest maps the PF response in shared VRAM (struct ras_cmd_ctx in the
command buffer). After amdgpu_virt_send_remote_ras_cmd() returns, the code
validated rcmd->output_size against the caller buffer, then copied
rcmd->output_buff_raw using rcmd->output_size again. A malicious PF could
change output_size between those reads so the memcpy length exceeds the
caller’s output_size and overflows guest stack or heap buffers.

Snapshot output_size with READ_ONCE() once, assign cmd->output_size from
that value, and use the same snapshot for the bounds check and memcpy.
Also read cmd_res once with READ_ONCE() so the error branch and
cmd->cmd_res assignment do not observe different values from shared memory.

Signed-off-by: Chenglei Xie <Chenglei.Xie@amd.com>
Reviewed-by: YiPeng Chai <YiPeng.Chai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amdgpu: bound SR-IOV RAS CPER dump parsing against used_size
Chenglei Xie [Mon, 11 May 2026 19:24:29 +0000 (15:24 -0400)] 
drm/amdgpu: bound SR-IOV RAS CPER dump parsing against used_size

The VF copies a PF-provided CPER telemetry blob and walks records using
cper_dump->count and each entry's record_length. count is u64 while the
loop used u32, so a large count could loop indefinitely. record_length was
not limited to the kmemdup'd region, so the first iteration could read far
past the allocation; record_length == 0 could spin forever on the same
entry. Together that allowed a malicious hypervisor to leak heap past the
blob into the CPER ring or hang the guest.

Require used_size to cover the fixed header before buf and stay within the
telemetry cap. Track remaining bytes in buf, cap iterations with u64 and
CPER_MAX_ALLOWED_COUNT, and reject record_length outside
[sizeof(cper_hdr), remaining] before writing to the ring.

Signed-off-by: Chenglei Xie <Chenglei.Xie@amd.com>
Reviewed-by: YiPeng Chai <YiPeng.Chai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amd/pm/si: Notify the SMC when switching to AC
Jeremy Klarenbeek [Tue, 19 May 2026 08:41:58 +0000 (10:41 +0200)] 
drm/amd/pm/si: Notify the SMC when switching to AC

There are some platforms that don't have a dedicated
GPIO line to manage the AC/DC switch. In this case,
the SI SMC automatically notices when switching to DC,
but needs to be notified when switching to AC.

Fixup and use si_notify_hw_of_powersource() which was
previously hidden behind an "#if 0".

This fixes some SI laptop GPUs to be able to use their
performance power states after switching from DC to AC.

Some affected GPUs are:
FirePro W4170M - Dell Precision M2800
Radeon HD 8790M - Dell Latitude E6540

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Co-developed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Jeremy Klarenbeek <jeremy.klarenbeek99@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amd/pm/si: Fix updating clock limits from power states
Jeremy Klarenbeek [Tue, 19 May 2026 08:41:57 +0000 (10:41 +0200)] 
drm/amd/pm/si: Fix updating clock limits from power states

VBIOS can contain conflicting values between:
- the maximum allowed clocks and voltages on AC or DC
- the clocks and voltages in power states on AC or DC

Update maximum clock (and voltage) limits for both AC/DC
and take the highest value from the VBIOS limits and
the performance/battery power states. Previously this
was only done for AC, but is also needed for DC.

This commit fixes the behaviour on some laptop GPUs,
where the VBIOS limit was set to the lowest possible
clock frequency, so the GPU was stuck on the lowest
possible power level on battery.

Some affected GPUs are:
FirePro W4170M (Dell Precision M2800)
Radeon HD 8790M (Dell Latitude E6540)
and possibly other laptop GPUs.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Co-developed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Jeremy Klarenbeek <jeremy.klarenbeek99@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amd/pm/smu7: Notify SMU7 of DC->AC switch
Timur Kristóf [Tue, 19 May 2026 08:41:56 +0000 (10:41 +0200)] 
drm/amd/pm/smu7: Notify SMU7 of DC->AC switch

When ATOM_PP_PLATFORM_CAP_HARDWAREDC is set,
the SMU has a GPIO pin for detecting AC/DC switch
and everything works automatically.

Otherwise when there is no GPIO pin, the SMU can
automatically detect switching to DC, but needs
to be notified of switching to AC.

Use PPSMC_MSG_RunningOnAC to notify the SMC
when switching to AC.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amd/pm: Rename enable_bapm() to notify_ac_dc()
Timur Kristóf [Tue, 19 May 2026 08:41:55 +0000 (10:41 +0200)] 
drm/amd/pm: Rename enable_bapm() to notify_ac_dc()

No functional changes, just change the name of this
function pointer to be more generic.

BAPM refers to a specific feature on KV, but other kinds of
ASICs may also need the SMU to be notified on AC/DC changes.

Also remove the argument and use adev->pm.ac_power instead.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amd/pm/si: Disregard vblank time when no displays are connected
Timur Kristóf [Tue, 19 May 2026 08:41:54 +0000 (10:41 +0200)] 
drm/amd/pm/si: Disregard vblank time when no displays are connected

When no displays are connected, there is no vblank
happening so the power management code shouldn't
worry about it.

This fixes a regression that caused the memory clock
to be stuck at maximum when there were no displays
connected to a SI GPU.

Fixes: 9003a0746864 ("drm/amd/pm: Treat zero vblank time as too short in si_dpm (v3)")
Fixes: 9d73b107a61b ("drm/amd/pm: Use pm_display_cfg in legacy DPM (v2)")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Jeremy Klarenbeek <jeremy.klarenbeek99@gmail.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amd/pm: Delete PP_DAL_POWERLEVEL
Timur Kristóf [Tue, 19 May 2026 10:21:18 +0000 (12:21 +0200)] 
drm/amd/pm: Delete PP_DAL_POWERLEVEL

Not used and not needed anymore.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amd/pm: Delete get_dal_power_level
Timur Kristóf [Tue, 19 May 2026 10:21:17 +0000 (12:21 +0200)] 
drm/amd/pm: Delete get_dal_power_level

Not needed anymore.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amd/pm: Delete vddc_dep_on_dal_pwrl
Timur Kristóf [Tue, 19 May 2026 10:21:16 +0000 (12:21 +0200)] 
drm/amd/pm: Delete vddc_dep_on_dal_pwrl

It was not used by anything anymore.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amd/pm: Delete non-functional SMU8 get_dal_power_level implementation
Timur Kristóf [Tue, 19 May 2026 10:21:15 +0000 (12:21 +0200)] 
drm/amd/pm: Delete non-functional SMU8 get_dal_power_level implementation

This function was effectively a no-op because it always
returned the maximum possible power level, because the
maximum voltage is in millivolts while the dependency
table didn't contain actual voltages.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amd/pm: Delete dummy get_dal_power_level implementations
Timur Kristóf [Tue, 19 May 2026 10:21:14 +0000 (12:21 +0200)] 
drm/amd/pm: Delete dummy get_dal_power_level implementations

These implementations did not actually return
the DAL power level, so they were effectively
a no-op.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amd/pm: Delete unused get_display_power_level() function
Timur Kristóf [Tue, 19 May 2026 10:21:13 +0000 (12:21 +0200)] 
drm/amd/pm: Delete unused get_display_power_level() function

Was not called from anywhere.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amd/display: Delete dm_pp_clocks_state
Timur Kristóf [Tue, 19 May 2026 10:21:12 +0000 (12:21 +0200)] 
drm/amd/display: Delete dm_pp_clocks_state

It isn't used by anything anymore.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amd/display: Delete disp_clk_voltage from integrated info (v2)
Timur Kristóf [Tue, 19 May 2026 10:21:11 +0000 (12:21 +0200)] 
drm/amd/display: Delete disp_clk_voltage from integrated info (v2)

Only DCE 11.0 relies on this information and even that
didn't use this field, because it queries the information
from the pplib. It also filled the field incorrectly on
that version.

On newer GPUs, the VIOS integrated info no longer contains
display clock voltage dependencies, so we don't need it.

v2:
- Also delete some code wrapped in #if 0

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amd/display: Delete max_clks_by_state from DCE clock manager (v2)
Timur Kristóf [Tue, 19 May 2026 10:21:10 +0000 (12:21 +0200)] 
drm/amd/display: Delete max_clks_by_state from DCE clock manager (v2)

It was not used by anything anymore.

Note that the parts of DC that need this information actually
already query it from the pplib and don't use the hardcoded
information from max_clks_by_state.

v2:
- Also delete state_dependent_clocks

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amd/display: Set max supported display clock without max_clks_by_state (v2)
Timur Kristóf [Tue, 19 May 2026 10:21:09 +0000 (12:21 +0200)] 
drm/amd/display: Set max supported display clock without max_clks_by_state (v2)

The max_clks_by_state was based on hardcoded values, which are
not really used anywhere, only to know the maximum clock.
Just hardcode the same maximum clock for each DCE version.

v2:
- Use previous max display clock for DCE 11.2

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amd/display: Delete max_clocks_state
Timur Kristóf [Tue, 19 May 2026 10:21:08 +0000 (12:21 +0200)] 
drm/amd/display: Delete max_clocks_state

It's not used by anything anymore.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amd/display: Remove min/max clock levels from clk_mgr (v2)
Timur Kristóf [Tue, 19 May 2026 10:21:07 +0000 (12:21 +0200)] 
drm/amd/display: Remove min/max clock levels from clk_mgr (v2)

These fields are not used by anything anymore.

v2:
- Delete dm_pp_get_static_clocks()
- Delete pp_to_dc_powerlevel_state()

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amd/display: Delete dce_get_required_clocks_state()
Timur Kristóf [Tue, 19 May 2026 10:21:06 +0000 (12:21 +0200)] 
drm/amd/display: Delete dce_get_required_clocks_state()

It is not called from anywhere anymore.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amdkfd: Check for pdd drm file first in CRIU restore path
David Francis [Thu, 14 May 2026 14:31:20 +0000 (10:31 -0400)] 
drm/amdkfd: Check for pdd drm file first in CRIU restore path

CRIU restore ioctls are meant to be called by CRIU with no
existing drm file. There's an error path
for if the drm file unexpectedly exists. It was positioned so
it was missing a fput(drm_file).

Do that check earlier, as soon as we have the pdd.

Signed-off-by: David Francis <David.Francis@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amdgpu: fix potential overflow in fs_info.debugfs_name
Stanley.Yang [Mon, 11 May 2026 08:49:19 +0000 (16:49 +0800)] 
drm/amdgpu: fix potential overflow in fs_info.debugfs_name

Use snprintf() with sizeof(fs_info.debugfs_name) so a long RAS block
name plus the "_err_inject" suffix cannot overflow the 32-byte buffer.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amdgpu/userq: make sure queue is valid in the hang_detect_work
Sunil Khatri [Mon, 18 May 2026 14:28:08 +0000 (19:58 +0530)] 
drm/amdgpu/userq: make sure queue is valid in the hang_detect_work

Thread 1: Running amdgpu_userq_destroy which eventually remove
the queue from door bell and set userq_mgr = NULL.

Thread2: An interrupt might have scheduled the hang_detect_work
which still need userq_mgr to be valid but could get an NULL
ptrs.

To fix that make sure we cancel the hang_detect_work again before
setting userq_mgr to NULL.

Along with that we also need all the queue va to remain valid till
we could be running anything on the queue and hence moving the
userq_va post hang_detect handler is cancelled.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amdgpu/userq: reserve root bo without interruption
Sunil Khatri [Mon, 18 May 2026 13:25:25 +0000 (18:55 +0530)] 
drm/amdgpu/userq: reserve root bo without interruption

Fix the code to make it an uninterruptible reservation
for root bo.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amdgpu/userq: add amdgpu_bo_unpin when amdgpu_ttm_alloc_gart fails
Sunil Khatri [Mon, 18 May 2026 13:03:00 +0000 (18:33 +0530)] 
drm/amdgpu/userq: add amdgpu_bo_unpin when amdgpu_ttm_alloc_gart fails

Unpin the wptr_obj->obj when amdgpu_ttm_alloc_gart fails.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amdgpu: simplify return value in amdgpu_userq_get_doorbell_index
Sunil Khatri [Mon, 18 May 2026 12:12:15 +0000 (17:42 +0530)] 
drm/amdgpu: simplify return value in amdgpu_userq_get_doorbell_index

amdgpu_userq_get_doorbell_index returns a uint64 type index
as well as a int type failure values. Simplifying this and
using a int type return value and getting the index in input pointer
of type uint64 type.

Also since it's used at once place making it static would be better.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amdkfd: fix NULL pointer bug in svm_range_set_attr
Eric Huang [Thu, 7 May 2026 19:51:49 +0000 (15:51 -0400)] 
drm/amdkfd: fix NULL pointer bug in svm_range_set_attr

The process_info could be NULL if user doesn't call kfd_ioctl_acquire_vm
before calling kfd_ioctl_svm.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amd/display: Write REFCLK to 48MHz on DCN21
Ivan Lipski [Thu, 14 May 2026 15:53:50 +0000 (11:53 -0400)] 
drm/amd/display: Write REFCLK to 48MHz on DCN21

[Why&How]
dccg21_init() calls dccg2_init() which hardcodes 100MHz refclk values
for MICROSECOND_TIME_BASE_DIV and MILLISECOND_TIME_BASE_DIV. DCN21
uses 48MHz refclk, so the wrong values corrupt DCCG timing and cause eDP
link training failure on cold boot.

Write the correct 48MHz values directly instead of calling dccg2_init().

v2:
Fixed typo

Fixes: e6e2b956fc81 ("drm/amd/display: Add missing DCCG register entries for DCN20-DCN316")
Reported-by: Max Chernoff <git@maxchernoff.ca>
Tested-by: Max Chernoff <git@maxchernoff.ca>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amdgpu/userq: Fix the mutex_init cleanup for fence_drv_lock
Sunil Khatri [Tue, 19 May 2026 09:42:42 +0000 (15:12 +0530)] 
drm/amdgpu/userq: Fix the mutex_init cleanup for fence_drv_lock

mutex fence_drv_lock is destroyed in amdgpu_userq_fence_driver_free
also in one of the jump condition mutex_destroy is also called leading
to double mutex_destroy.

So rearranging the code so amdgpu_userq_fence_driver_free takes care
of the clean up along with mutex_destroy.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amd/display: Delete unimplemented dm_pp_apply_power_level_change_request() (v2)
Timur Kristóf [Tue, 19 May 2026 10:21:05 +0000 (12:21 +0200)] 
drm/amd/display: Delete unimplemented dm_pp_apply_power_level_change_request() (v2)

dm_pp_apply_power_level_change_request() was called from old
DCE clock manager implementations on DCE6, 8, 10, 11.2
but has not been implemented ever since the beginning of DC.

Affected GPUs have been working fine without that implementation
for many years. Let's delete it now.

v2:
- Delete dm_pp_apply_power_level_change_request too

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amdgpu/userq: Fix doorbell object cleanup of queue
Sunil Khatri [Tue, 19 May 2026 09:32:00 +0000 (15:02 +0530)] 
drm/amdgpu/userq: Fix doorbell object cleanup of queue

Unpin and unref the door bell obj if queue creation fails before
initialization is complete.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amdgpu: Replace use of system_unbound_wq with system_dfl_wq
Marco Crivellari [Thu, 14 May 2026 10:38:09 +0000 (12:38 +0200)] 
drm/amdgpu: Replace use of system_unbound_wq with system_dfl_wq

This patch continues the effort to refactor workqueue APIs, which has begun
with the changes introducing new workqueues and a new alloc_workqueue flag:

   commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
   commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")

The point of the refactoring is to eventually alter the default behavior of
workqueues to become unbound by default so that their workload placement is
optimized by the scheduler.

Before that to happen, workqueue users must be converted to the better named
new workqueues with no intended behaviour changes:

   system_wq -> system_percpu_wq
   system_unbound_wq -> system_dfl_wq

This way the old obsolete workqueues (system_wq, system_unbound_wq) can be
removed in the future.

Link: https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amd/display: Replace use of system_unbound_wq with system_dfl_wq
Marco Crivellari [Thu, 14 May 2026 10:38:08 +0000 (12:38 +0200)] 
drm/amd/display: Replace use of system_unbound_wq with system_dfl_wq

This patch continues the effort to refactor workqueue APIs, which has begun
with the changes introducing new workqueues and a new alloc_workqueue flag:

   commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
   commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")

The point of the refactoring is to eventually alter the default behavior of
workqueues to become unbound by default so that their workload placement is
optimized by the scheduler.

Before that to happen, workqueue users must be converted to the better named
new workqueues with no intended behaviour changes:

   system_wq -> system_percpu_wq
   system_unbound_wq -> system_dfl_wq

This way the old obsolete workqueues (system_wq, system_unbound_wq) can be
removed in the future.

Cc: Ray Wu <ray.wu@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Leo Li <sunpeng.li@amd.com>
Cc: Rodrigo Siqueira <siqueira@igalia.com>
Link: https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amdgpu: check num_entries in GEM_OP GET_MAPPING_INFO
Ziyi Guo [Sun, 8 Feb 2026 00:02:55 +0000 (00:02 +0000)] 
drm/amdgpu: check num_entries in GEM_OP GET_MAPPING_INFO

kvcalloc(args->num_entries, sizeof(*vm_entries), GFP_KERNEL) at
amdgpu_gem.c:1050 uses the user-supplied num_entries directly without
any upper bounds check. Since num_entries is a __u32 and
sizeof(drm_amdgpu_gem_vm_entry) is 32 bytes, a large num_entries
produces an allocation exceeding INT_MAX, triggering
WARNING in __kvmalloc_node_noprof(), causing a kernel WARNING,
TAINT_WARN, and panic on CONFIG_PANIC_ON_WARN=y systems.

Add a size bounds check before we invoke the kvzalloc() to
reject oversized num_entries early with -EINVAL.

Fixes: 4d82724f7f2b ("drm/amdgpu: Add mapping info option for GEM_OP ioctl")
Signed-off-by: Ziyi Guo <n7l8m4@u.northwestern.edu>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amdgpu: fix lock leak on ENOMEM in AMDGPU_GEM_OP_GET_MAPPING_INFO
Michael Bommarito [Sun, 17 May 2026 13:17:42 +0000 (09:17 -0400)] 
drm/amdgpu: fix lock leak on ENOMEM in AMDGPU_GEM_OP_GET_MAPPING_INFO

The AMDGPU_GEM_OP_GET_MAPPING_INFO branch of amdgpu_gem_op_ioctl()
holds three cleanup-tracked resources before calling kvcalloc():
the drm_gem_object reference from drm_gem_object_lookup(), the
drm_exec lock on the looked-up GEM via drm_exec_lock_obj(), and
the drm_exec lock on the per-process VM root page directory via
amdgpu_vm_lock_pd().  All three are released by the out_exec
label that every other error path in this function jumps to.
The kvcalloc() failure path returns -ENOMEM directly, skipping
out_exec and leaking all three.

The leaked per-process VM root PD dma_resv lock is the
load-bearing leak: any subsequent operation on the same VM
(further GEM ops, command-submission, eviction, TTM shrinker
callbacks) blocks on the held lock.  DRM_IOCTL_AMDGPU_GEM_OP is
DRM_AUTH | DRM_RENDER_ALLOW, so this is an unprivileged-local
denial of service against the caller's GPU context, reachable
by any process with /dev/dri/renderD* access.

Route the failure through out_exec so drm_exec_fini() and
drm_gem_object_put() run.

Reproduced on stock 7.0.0-10, Ryzen 7 5700U / Radeon Vega
(Lucienne): the failing ioctl returns -ENOMEM and a second
GET_MAPPING_INFO on the same fd then blocks in
drm_exec_lock_obj() on the leaked dma_resv.  SIGKILL on the
caller does not reap the task; the fd-release path during
process exit goes through amdgpu_gem_object_close() ->
drm_exec_prepare_obj() on the same lock, leaving the task in D
state until the box is rebooted.  The patched kernel was not
rebuilt and re-tested on this hardware; the fix is mechanical.
Tested on a single Lucienne / Vega box only.

Ziyi Guo posted an independent INT_MAX-bound check for
args->num_entries in the same branch [1]; the two patches are
complementary and can land in either order.

Fixes: 4d82724f7f2b ("drm/amdgpu: Add mapping info option for GEM_OP ioctl")
Link: https://lore.kernel.org/all/20260208000255.4073363-1-n7l8m4@u.northwestern.edu/
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amd/display: Fix amdgpu_dm KUnit allmodconfig build
Ray Wu [Tue, 19 May 2026 03:44:55 +0000 (11:44 +0800)] 
drm/amd/display: Fix amdgpu_dm KUnit allmodconfig build

[Why]
With CONFIG_DRM_AMD_DC_KUNIT_TEST=m, allmodconfig only defines the
_MODULE variant. Four KUnit helper headers gate their declarations
with #ifdef CONFIG_DRM_AMD_DC_KUNIT_TEST, so the declarations vanish
while the matching .c files (driven by IS_ENABLED() via
STATIC_IFN_KUNIT) keep the functions non-static. The build breaks
with implicit declarations and -Werror=missing-prototypes.

amdgpu_dm_crc.h additionally uses symbols that its test file does not
pull in indirectly, amdgpu_dm_colorop_test.c has a copy-paste
duplicate function with the wrong expected bitmask, and the three
colorop TF bitmasks are not exported for modpost.

[How]
- Switch the crc/hdcp/color/psr KUnit guards to IS_ENABLED().
- Make amdgpu_dm_crc.h self-contained (dc_types.h + forward decl).
- Rename the duplicated shaper test back to its intended name and
  fix its expected bitmask.
- Export amdgpu_dm_supported_{degam,shaper,blnd}_tfs via
  EXPORT_IF_KUNIT().

Assisted-by: Copilot:claude-4-opus
Reviewed-by: Alex Hung <alex.hung@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amdkfd: Fix UML build guards for x86_64-only code
Alex Hung [Thu, 14 May 2026 17:01:39 +0000 (11:01 -0600)] 
drm/amdkfd: Fix UML build guards for x86_64-only code

cpu_data().topo.apicid and kfd_fill_iolink_info_for_cpu() rely on
x86-specific structs not present on UML. The kfd_topology.c and
kfd_crat.c were guarded by CONFIG_X86_64 alone, causing build
failures when CONFIG_DRM_AMDGPU is selected on UML.

Update guards to '#if defined(CONFIG_X86_64) && !defined(CONFIG_UML)'
to ensure x86_64-only paths are excluded on UML builds.

Fixes: af3f2f5db265 ("drm/amdgpu: Remove UML build exclusion from Kconfig")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202605140506.TI8zPIBG-lkp@intel.com/
Cc: Harry Wentland <harry.wentland@amd.com>
Assisted-by: Copilot:Claude-Sonnet-4.6
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amd/display: Add KUnit test for ISM functions
Alex Hung [Thu, 23 Apr 2026 20:12:36 +0000 (14:12 -0600)] 
drm/amd/display: Add KUnit test for ISM functions

Add KUnit tests for three static functions in amdgpu_dm_ism.c:
dm_ism_next_state, dm_ism_get_sso_delay, and
dm_ism_get_idle_allow_delay.

The 32 test cases cover the full FSM transition table,
SSO delay calculation with various timings, and
hysteresis-based idle allow delay including circular
buffer wraparound and old history cutoff logic.

Conditionally remove static linkage and export the three
functions under CONFIG_DRM_AMD_DC_KUNIT_TEST so the test
module can call them.

Assisted-by: Copilot:Claude-Opus-4.6
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amd/display: Add KUnit test for replay
Alex Hung [Thu, 23 Apr 2026 01:51:47 +0000 (19:51 -0600)] 
drm/amd/display: Add KUnit test for replay

Add KUnit tests for amdgpu_dm_link_supports_replay() which
validates panel replay capability based on link DPCD caps,
freesync state, and VSDB info. Nine test cases cover the
positive path and each individual failure condition.

Export the function under CONFIG_DRM_AMD_DC_KUNIT_TEST and
add the amdgpu include path to the tests Makefile so that
amdgpu_dm.h can resolve amdgpu_mode.h types under UML.

Assisted-by: Copilot:Claude-Opus-4.6
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amd/display: Add KUnit test for PSR function
Alex Hung [Thu, 23 Apr 2026 00:12:46 +0000 (18:12 -0600)] 
drm/amd/display: Add KUnit test for PSR function

Add KUnit tests for amdgpu_dm_psr_fill_caps() which
validates PSR capability population from DPCD data.

Export amdgpu_dm_psr_fill_caps() conditionally when
CONFIG_DRM_AMD_DC_KUNIT_TEST is enabled, following the
existing pattern used by CRC and HDCP test files.

The test covers PSR version mapping, RFB setup time
calculation, link training flag, DPCD field passthrough,
rate control caps, and power optimization flags.

Assisted-by: Copilot:Claude-Opus-4.6
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amd/display: Add KUnit test for color helpers
Alex Hung [Wed, 22 Apr 2026 17:54:33 +0000 (11:54 -0600)] 
drm/amd/display: Add KUnit test for color helpers

Add KUnit tests for six pure-logic functions in
amdgpu_dm_color.c: amdgpu_dm_fixpt_from_s3132,
__is_lut_linear, __drm_ctm_to_dc_matrix,
__drm_ctm_3x4_to_dc_matrix, amdgpu_tf_to_dc_tf,
and amdgpu_colorop_tf_to_dc_tf.

Expose these static functions under CONFIG_DRM_AMD_DC_KUNIT_TEST
and add a new amdgpu_dm_color.h header with the KUnit-only
prototypes. The test file re-declares the dc and amdgpu
transfer function enums locally to avoid pulling in the full
DC/amdgpu include chain that fails under UML.

26 test cases cover signed-magnitude to two's complement
conversion, LUT linearity detection, CTM-to-DC matrix
conversion, and transfer function enum mapping.

Assisted-by: Copilot:Claude-Opus-4.6
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amd/display: Add KUnit test for colorop TF bitmasks
Alex Hung [Wed, 22 Apr 2026 17:37:19 +0000 (11:37 -0600)] 
drm/amd/display: Add KUnit test for colorop TF bitmasks

Add KUnit tests that verify the three supported transfer
function bitmask constants exported by amdgpu_dm_colorop.c:
amdgpu_dm_supported_degam_tfs, amdgpu_dm_supported_shaper_tfs,
and amdgpu_dm_supported_blnd_tfs.

Each bitmask is tested for presence of each expected curve
flag and absence of any unexpected bits.  A cross-check
confirms that degam and blnd bitmasks are identical.

amdgpu_dm_initialize_default_pipeline() is not tested
because it needs a fully initialised drm_plane backed by
an amdgpu_device with DC color caps.

Assisted-by: Copilot:Claude-Opus-4.6
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amd/display: Add KUnit test for HDCP process_output
Alex Hung [Wed, 22 Apr 2026 02:57:34 +0000 (20:57 -0600)] 
drm/amd/display: Add KUnit test for HDCP process_output

Expose process_output() as non-static when CONFIG_DRM_AMD_DC_KUNIT_TEST
is enabled and add KUnit tests exercising its full branch logic:

- property_validate_dwork is always enqueued (delay=0)
- callback_dwork is scheduled when callback_needed is set
- callback_dwork is cancelled when callback_stop is set
- watchdog_timer_dwork is scheduled when watchdog_timer_needed is set
- watchdog_timer_dwork is cancelled when watchdog_timer_stop is set
- Both dworks are scheduled independently when both flags are set

Assisted-by: Copilot:Claude-Sonnet-4.6
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amd/display: Add KUnit test for CRC function
Aurabindo Pillai [Wed, 4 Feb 2026 20:15:20 +0000 (15:15 -0500)] 
drm/amd/display: Add KUnit test for CRC function

DM CRC parsing functions are an easy candidate for exploring the use of
KUnit unit-testing frameworks. Add a few tests for the same.

The test file and .kunitconfig are placed under amdgpu_dm/tests/ to
follow the convention of keeping test code separate from production
sources.

Assisted-by: Copilot:Claude-Opus-4.6
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amdgpu: restructure VM state machine v4
Christian König [Tue, 20 Jan 2026 12:09:52 +0000 (13:09 +0100)] 
drm/amdgpu: restructure VM state machine v4

Instead of coming up with more sophisticated names for states a VM BO
can be in, group them by the type of BO first and then by the state.

So we end with BO type kernel, always_valid and individual and then states
evicted, moved and idle.

Not much functional change, except that evicted_user is moved back
together with the other BOs again which makes the handling in
amdgpu_vm_validate() a bit more complex.

Also fixes a problem with user queues and amdgpu_vm_ready(). We didn't
considered the VM ready when user BOs were not ideally placed, harmless
performance impact for kernel queues but a complete show stopper for
userqueues.

v2: fix a few typos in comments, rename the BO types to make them more
    descriptive, fix a couple of bugs found during testing
v3: squashed together with revert to old status lock handling, looks
    like the first patch still had some bug which this one here should fix.
    Fix a missing lock around debugfs printing.
v4: fix merge clash pointed out by Prike

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amd/ras: return error when converting records to nps pages fails
Gangliang Xie [Tue, 12 May 2026 07:14:33 +0000 (15:14 +0800)] 
drm/amd/ras: return error when converting records to nps pages fails

return error when converting records to nps pages fails

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amd/ras: add first record offset check
Gangliang Xie [Tue, 12 May 2026 07:09:06 +0000 (15:09 +0800)] 
drm/amd/ras: add first record offset check

check the upper and lower limits of first record offset

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amdgpu: check and drop invalid bad page records
YiPeng Chai [Tue, 12 May 2026 07:09:52 +0000 (15:09 +0800)] 
drm/amdgpu: check and drop invalid bad page records

Check and drop invalid bad page records.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amd/ras: copy ras log data instead of referencing pointers
YiPeng Chai [Tue, 19 May 2026 05:47:34 +0000 (13:47 +0800)] 
drm/amd/ras: copy ras log data instead of referencing pointers

When generating ras cper file, the original data nodes in the ras
log ring buffer may be deleted, leading to invalid pointer
access. Copy the data from the ras log ring instead of directly
referencing the pointers to avoid this issue.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amdgpu: validate and share PSP fw_pri_buf copies via psp_copy_fw
Candice Li [Wed, 13 May 2026 10:13:30 +0000 (18:13 +0800)] 
drm/amdgpu: validate and share PSP fw_pri_buf copies via psp_copy_fw

Change psp_copy_fw from void to int: return -ENODEV when drm_dev_enter
fails, and -EINVAL when the image size is zero or larger than the
1 MiB PSP private buffer.

Replace open-coded memset/memcpy into fw_pri_buf with psp_copy_fw.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amd/ras: add length check for ras command output buffer
YiPeng Chai [Tue, 19 May 2026 05:46:55 +0000 (13:46 +0800)] 
drm/amd/ras: add length check for ras command output buffer

Add length check for ras command output buffer.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amd/ras: fix memory leak on ras sw_init failure
YiPeng Chai [Tue, 12 May 2026 03:00:14 +0000 (11:00 +0800)] 
drm/amd/ras: fix memory leak on ras sw_init failure

Fix memory leak on ras sw_init failure.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amdgpu: fix handling in amdgpu_userq_create
Christian König [Mon, 27 Apr 2026 14:31:31 +0000 (16:31 +0200)] 
drm/amdgpu: fix handling in amdgpu_userq_create

Well mostly the same issues the other code had as well:

1. Memory allocation while holding the userq_mutex lock is forbidden!
2. Things were created/started/published in the wrong order.
3. The reset lock was taken in the wrong order and seems to be
   unecessary in the first place.
4. Error messages on invalid input parameters can spam the logs.
5. Error messages on memory allocation failures are usually superflous
   as well.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amd/ras: remove unused code
YiPeng Chai [Tue, 12 May 2026 02:40:19 +0000 (10:40 +0800)] 
drm/amd/ras: remove unused code

Remove unused code.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amd/ras: add error handling for seqno operations
YiPeng Chai [Tue, 12 May 2026 02:18:23 +0000 (10:18 +0800)] 
drm/amd/ras: add error handling for seqno operations

Add error handling for seqno operations.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amd/ras: use mutex to prevent concurrent access conflicts
YiPeng Chai [Tue, 12 May 2026 02:04:07 +0000 (10:04 +0800)] 
drm/amd/ras: use mutex to prevent concurrent access conflicts

Use mutex to prevent concurrent access conflicts.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amdgpu: add first record offset check
Gangliang Xie [Tue, 12 May 2026 07:05:16 +0000 (15:05 +0800)] 
drm/amdgpu: add first record offset check

check the upper and lower limits of first record offset

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amdgpu: Bound GPIO I2C table entry count from VBIOS
Candice Li [Wed, 13 May 2026 04:31:57 +0000 (12:31 +0800)] 
drm/amdgpu: Bound GPIO I2C table entry count from VBIOS

Reject undersized tables and cap the derived entry count
to AMDGPU_MAX_I2C_BUS so we do not overrun adev->i2c_bus[]
or walk an absurd number of entries on corrupt size fields.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amdgpu: cap ATOM command table nesting depth
Candice Li [Wed, 13 May 2026 03:11:15 +0000 (11:11 +0800)] 
drm/amdgpu: cap ATOM command table nesting depth

Cap nesting at 32 levels with execute_depth and
return -ELOOP when exceeded.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amd/ras: bound CPER record fetch buffer size
Candice Li [Wed, 13 May 2026 02:46:01 +0000 (10:46 +0800)] 
drm/amd/ras: bound CPER record fetch buffer size

Bound CPER record fetch allocation by buffer size.

v2: Drop redundant cap on cper_num and raise
    GET_CPER_RECORD max buffer size.

Suggested-by: YiPeng Chai <YiPeng.Chai@amd.com>
Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amd/ras: Add more IP versions for uniras
Ce Sun [Mon, 18 May 2026 07:01:52 +0000 (15:01 +0800)] 
drm/amd/ras: Add more IP versions for uniras

Add more IP versions for uniras

Signed-off-by: Ce Sun <cesun102@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amdgpu: Fix memory leak of i2s_pdata in ACP initialization
Ce Sun [Tue, 12 May 2026 01:53:21 +0000 (09:53 +0800)] 
drm/amdgpu: Fix memory leak of i2s_pdata in ACP initialization

Currently, the i2s_pdata structure is dynamically allocated in
acp_hw_init() but never freed in both the error handling path and
the acp_hw_fini() cleanup path, causing a permanent memory leak.

Signed-off-by: Ce Sun <cesun102@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/radeon/evergreen_cs: Add missing NULL prefix check in surface check
Vitaliy Triang3l Kuzmin [Fri, 15 May 2026 21:48:32 +0000 (00:48 +0300)] 
drm/radeon/evergreen_cs: Add missing NULL prefix check in surface check

'evergreen_surface_check' is called with a NULL warning prefix when
handling potentially recoverable issues or just to compute the alignment
requirements, and 'evergreen_surface_check' is called again in case of
failure (with the correct prefix, as opposed to NULL), therefore, the
initial check must not print a warning, because the surface may be
accepted successfully after having been corrected, however if it isn't,
the final check will print the warning anyway. The surface check
functions specific to array modes already implement this behavior, but
the 'evergreen_surface_check' function itself doesn't.

This is also supposed to fix the "'%s' directive argument is null
[-Werror=format-overflow=]" compiler warning.

Fixes: 285484e2d55e ("drm/radeon: add support for evergreen/ni tiling informations v11")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Vitaliy Triang3l Kuzmin <ml@triang3l.ru>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amd/ras: Fix SMU EEPROM record field decoding
Xiang Liu [Mon, 11 May 2026 08:45:08 +0000 (16:45 +0800)] 
drm/amd/ras: Fix SMU EEPROM record field decoding

The SMU EEPROM read paths pass byte-sized record field addresses
to mca_ipid_parse(), whose outputs are u32 pointers.

Writing through those widened pointers can clobber adjacent fields
and bytes beyond the record storage.

Parse the IPID values into local u32 temporaries instead, then
explicitly narrow the values when storing them in the EEPROM record.

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amd/ras: reset CPER ring on corrupt entry size
Xiang Liu [Mon, 11 May 2026 07:48:55 +0000 (15:48 +0800)] 
drm/amd/ras: reset CPER ring on corrupt entry size

When CPER ring overflow handling advances the read pointer, it trusts the
parsed entry size from the current ring contents. Corrupt CPER data can
produce an entry size that does not advance rptr after dword conversion
and pointer masking.

In that case the recovery loop keeps testing the same location while
holding the CPER ring mutex. This can hang the worker that is writing the
next CPER record.

Detect a no-progress rptr update and reset the CPER ring to an empty
state instead. This drops the corrupt contents and lets the writer leave
the recovery path without spinning.

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amdgpu: userq_va_mapped should remain true once done
Sunil Khatri [Wed, 13 May 2026 07:59:35 +0000 (13:29 +0530)] 
drm/amdgpu: userq_va_mapped should remain true once done

Multiple queues needs these bo_va objects belonging to
the same uq_mgr. So once they are mapped lets not unmap
them as at any point of time any of the queues might be
using it.

Also userq_va_mapped should be a boolean than atomic.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amdgpu: avoid integer overflow in VA range check
Ce Sun [Mon, 11 May 2026 10:04:57 +0000 (18:04 +0800)] 
drm/amdgpu: avoid integer overflow in VA range check

The original addition operation in 64-bit unsigned type may encounter
overflow situations. To prevent such issues and safely reject invalid
inputs, the check_add_overflow() function is used.

Signed-off-by: Ce Sun <cesun102@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amd/ras: Fix UMC error address allocation leak
Xiang Liu [Mon, 11 May 2026 13:28:59 +0000 (21:28 +0800)] 
drm/amd/ras: Fix UMC error address allocation leak

amdgpu_umc_handle_bad_pages() allocates err_data->err_addr before
querying UMC error information. In the direct and firmware query paths,
the pointer is reassigned to a fresh allocation before the original
buffer is released, so the initial allocation is leaked on each handled
event.

Free the existing buffer before replacing it in those query paths so the
function exit cleanup only owns the active allocation.

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amdgpu: unmap all user mappings of framebuffer and doorbell before mode1 reset
Yifan Zhang [Mon, 11 May 2026 14:14:23 +0000 (22:14 +0800)] 
drm/amdgpu: unmap all user mappings of framebuffer and doorbell before mode1 reset

During Mode 1 reset, the ASIC undergoes a reset cycle and becomes temporarily
inaccessible via PCIe. Any attempt to access framebuffer or MMIO registers during
this window can result in uncompleted PCIe transactions, leading to NMI panics or
system hangs.

To prevent this, Unmap all of the applications mappings of the framebuffer
and doorbell BARs before mode1 reset. Also prevent new mappings from coming in
during the reset process.

v2: remove inode in kfd_dev (Christian)
v3: correct unmap offset (Felix), remove prevent new mappings part
to avoid deadlock (Christian)

Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amd/display: Promote DC to 3.2.383
Taimur Hassan [Sat, 9 May 2026 20:51:05 +0000 (15:51 -0500)] 
drm/amd/display: Promote DC to 3.2.383

This version brings along the following updates:

 - Add amdgpu_dm KUnit test for:
   * CRC function
   * HDCP process_output
   * colorop TF bitmasks
   * color helpers
   * PSR and Replay functions
   * ISM functions
 - Fix eDP receiver ready status check in T7 sequence
 - Enable dcn42 pstate pmo
 - Refactor PSR. Replay and ABM functionality into dedicated power modules
 - Fix assertion due to disable/enable CM blocks
 - Enable additional wait for pipe pending checks
 - Fix ISM dc_lock deadlock during suspend
 - Use lockdep_assert_held() for dc_lock check
 - Fix clear PSR config flow
 - Exclude the MST overhead from BW deallocation
 - Allow power up even w/ powergating disabled on DCN42
 - Fix integer overflow in bios_get_image()
 - Validate GPIO pin LUT table size before iterating
 - Add Auxless-ALPM support in VESA Panel Replay
 - Add debug option for replay ESD recovery.
 - Validate payload length and link_index in dc_process_dmub_aux_transfer_async.
 - Add ADDR3 swizzle modes.

Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amd/display: Add ADDR3 swizzle modes
Wenxian Wang [Sat, 9 May 2026 02:46:23 +0000 (10:46 +0800)] 
drm/amd/display: Add ADDR3 swizzle modes

[Why]
New swizzle modes are needed for ADDR3 block support.

[How]
Add DC_ADDR3_SW_64KB_2D_Z and DC_ADDR3_SW_256KB_2D_Z enum
values to dc_hw_types.h.

Reviewed-by: Ilya Bakoulin <ilya.bakoulin@amd.com>
Signed-off-by: Wenxian Wang <wenxian.wang@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amd/display: Validate payload length and link_index in dc_process_dmub_aux_transf...
Harry Wentland [Thu, 7 May 2026 20:26:31 +0000 (16:26 -0400)] 
drm/amd/display: Validate payload length and link_index in dc_process_dmub_aux_transfer_async

[Why&How]
dc_process_dmub_aux_transfer_async() copies payload->length bytes into a
16-byte stack buffer (dpaux.data[16]) guarded only by an ASSERT(), which
is a no-op in release builds. If a caller ever passes length > 16 this
results in a stack buffer overflow via memcpy.

Additionally, link_index is used to dereference dc->links[] without
bounds checking against dc->link_count, risking an out-of-bounds access.

Replace the ASSERT with a hard runtime check that returns false when
payload->length exceeds the destination buffer size, and add a bounds
check for link_index before it is used.

Assisted-by: GitHub Copilot:Claude claude-4-opus
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amd/display: Add debug option for replay ESD recovery
Wei-Guang Li [Wed, 6 May 2026 12:32:33 +0000 (20:32 +0800)] 
drm/amd/display: Add debug option for replay ESD recovery

[Why&How]
Add a new debug option "enable_replay_esd_recovery" to control whether
to enable the replay ESD recovery feature.

Reviewed-by: Robin Chen <robin.chen@amd.com>
Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Wei-Guang Li <wei-guang.li@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amd/display: Add Auxless-ALPM support in VESA Panel Replay
Leon Huang [Thu, 30 Apr 2026 06:53:21 +0000 (14:53 +0800)] 
drm/amd/display: Add Auxless-ALPM support in VESA Panel Replay

[How]
Add Auxless-ALPM data in VESA PR initialization

Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Leon Huang <Leon.Huang1@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amd/display: Validate GPIO pin LUT table size before iterating
Harry Wentland [Mon, 4 May 2026 20:14:11 +0000 (16:14 -0400)] 
drm/amd/display: Validate GPIO pin LUT table size before iterating

[Why&How]
The GPIO pin table parsers in get_gpio_i2c_info() and
bios_parser_get_gpio_pin_info() derive an element count from the VBIOS
table_header.structuresize field, then iterate over gpio_pin[] entries.
However, GET_IMAGE() only validates that the table header itself fits
within the BIOS image. If the VBIOS reports a structuresize larger than
the actual mapped data, the loop reads past the end of the BIOS image,
causing an out-of-bounds read.

Fix this by calling bios_get_image() to validate that the full claimed
structuresize is accessible within the BIOS image before entering the
loop in both functions.

Assisted-by: GitHub Copilot:claude-opus-4-6
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amd/display: Fix integer overflow in bios_get_image()
Harry Wentland [Mon, 4 May 2026 15:14:45 +0000 (11:14 -0400)] 
drm/amd/display: Fix integer overflow in bios_get_image()

[Why&How]
The bounds check in bios_get_image() computes 'offset + size' using
unsigned 32-bit arithmetic before comparing against bios_size. If a
VBIOS image contains a near-UINT32_MAX offset the addition wraps to a
small value, the comparison passes, and the function returns a wild
pointer past the VBIOS mapping.

Additionally, the comparison uses '<' (strict), which incorrectly
rejects the valid exact-fit case where offset + size == bios_size.

Fix both issues by restructuring the check to avoid the addition
entirely: first reject if offset alone exceeds bios_size, then check
size against the remaining space (bios_size - offset). This eliminates
the overflow and correctly permits exact-fit accesses.

Assisted-by: GitHub Copilot:claude-opus-4.6
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amd/display: Refactor Replay functionality into dedicated power_replay module
Lohita Mudimela [Tue, 28 Apr 2026 11:54:09 +0000 (17:24 +0530)] 
drm/amd/display: Refactor Replay functionality into dedicated power_replay module

[Why]
Extract all Replay related functions from power.c and
power_helpers.c into a new power_replay.c module for
better code organization and maintainability.

[How]
Create new power_replay.c file containing
Replay-related functions moved from power.c
and power_helpers.c . Update mod_power.h with
function declarations. Maintain forward
declaration for type compatibility.

Reviewed-by: Robin Chen <robin.chen@amd.com>
Signed-off-by: Lohita Mudimela <lohita.mudimela@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amd/display: Allow power up when PG disallowed in driver
Charlene Liu [Mon, 27 Apr 2026 23:09:02 +0000 (19:09 -0400)] 
drm/amd/display: Allow power up when PG disallowed in driver

[Why]
Do not exit early dcn42 pg control functions on power up for pipe PG
failsafe.

Reviewed-by: Leo Chen <leo.chen@amd.com>
Signed-off-by: Charlene Liu <Charlene.Liu@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amd/display: Exclude the MST overhead from BW deallocation
Cruise Hung [Wed, 6 May 2026 13:19:10 +0000 (21:19 +0800)] 
drm/amd/display: Exclude the MST overhead from BW deallocation

[Why]
The MST overhead was incorrectly included
in the requested BW during BW deallocation.

[How]
Exclude the MST overhead from BW deallocation.

Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Reviewed-by: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com>
Signed-off-by: Cruise Hung <Cruise.Hung@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amd/display: Use lockdep_assert_held() for dc_lock check
Ray Wu [Mon, 4 May 2026 06:32:13 +0000 (14:32 +0800)] 
drm/amd/display: Use lockdep_assert_held() for dc_lock check

[Why]
mutex_is_locked() only tells whether *some* task holds the mutex, not
the current one, so the existing ASSERT can silently pass when the
caller violates the contract.

[How]
Use the kernel's lockdep debugging utility (include/linux/lockdep.h)
and replace ASSERT(mutex_is_locked(&dm->dc_lock)) with
lockdep_assert_held(&dm->dc_lock), which checks the current task's
held-lock stack.

Reviewed-by: Sun peng (Leo) Li <sunpeng.li@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amd/display: Fix ISM dc_lock deadlock during suspend
Ray Wu [Thu, 30 Apr 2026 02:08:16 +0000 (10:08 +0800)] 
drm/amd/display: Fix ISM dc_lock deadlock during suspend

[Why]
System hang observed during suspend/resume while video is playing.
amdgpu_dm_ism_disable() is called under dc_lock and waits for ISM
delayed work via disable_delayed_work_sync(). The work handlers
themselves take dc_lock, producing an ABBA deadlock when a worker is
in flight at suspend time.

[How]
Split the disable path into two phases with opposite locking
contracts:
  1. amdgpu_dm_ism_disable() -- quiesces workers, must NOT hold
     dc_lock.
  2. amdgpu_dm_ism_force_full_power() (new) -- drives the ISM FSM
     back to FULL_POWER_RUNNING, must hold dc_lock.

Reviewed-by: Sun peng (Leo) Li <sunpeng.li@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amd/display: Enable additional wait for pipe pending checks
Aric Cyr [Tue, 5 May 2026 20:49:47 +0000 (16:49 -0400)] 
drm/amd/display: Enable additional wait for pipe pending checks

[why]
In cases where there are two FULL updates within the same display frame,
it's possible for some blocks to be programmed a second time without having
been latched completely from the first programming.

DCN 3.5 and up already work around this with additional validation checks
for frame count and defer as needed via fsleep.

[how]
Enabled existing pipe checks generically for all DCN versions to avoid HW
programming hazards.

Also removed redundant max_frame_count which can be determined by the
register mask and shift.

Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Aric Cyr <Aric.Cyr@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amd/display: Fix assertion due to disable/enable CM blocks
Aric Cyr [Tue, 5 May 2026 20:27:29 +0000 (16:27 -0400)] 
drm/amd/display: Fix assertion due to disable/enable CM blocks

[why]
Some dc state transitions can result in CM blocks being disabled, then
re-enabled.  The disable will set a defer bit, but re-enable will not
clear it.  When optimizing later, an assert will be hit due to incorrect
expected HW state.

[how]
Clear defer bits if the block is re-enabled before optimization is
executed.

Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Aric Cyr <Aric.Cyr@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amd/display: Refactor PSR functionality into dedicated power_psr module
Lohita Mudimela [Wed, 22 Apr 2026 08:19:55 +0000 (13:49 +0530)] 
drm/amd/display: Refactor PSR functionality into dedicated power_psr module

[Why]
Extract all PSR (Panel Self Refresh) related functions from power.c
into a new power_psr.c module for better code organization and
maintainability.

[How]
Create new power_psr.c file containing all PSR-related functions
moved from power.c. Remove static qualifier from shared functions
to enable cross-file access:
- psr_context_to_mod_power_psr_context: Convert PSR context to
  module power PSR context
- map_index_from_stream: Map stream to power entity index
- delay_two_frames: Wait for two frame periods

Add function declarations to header. Maintain forward declaration of struct
core_power for type compatibility.

Reviewed-by: Anthony Koo <anthony.koo@amd.com>
Signed-off-by: Lohita Mudimela <lohita.mudimela@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amd/display: Fix eDP receiver ready status check in T7 sequence
Sung-huai Wang [Tue, 21 Apr 2026 04:53:56 +0000 (12:53 +0800)] 
drm/amd/display: Fix eDP receiver ready status check in T7 sequence

[Why]
Some eDP panels return sinkstatus as 0x5, causing the original sinkstatus == 1
check to never match and resulting in unnecessary polling delay. The
equality check is too restrictive and doesn't properly validate the
specific status bit that indicates receiver readiness.

[How]
Replace direct value comparison with proper bitmask check using
DP_RECEIVE_PORT_0_STATUS constant.

Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Sung-huai Wang <Danny.Wang@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agoRevert "drm/amd/display: dmub_cmd.h: add missing kernel-doc for enums"
James Lin [Tue, 5 May 2026 22:49:33 +0000 (06:49 +0800)] 
Revert "drm/amd/display: dmub_cmd.h: add missing kernel-doc for enums"

[Why & How]
This reverts commit f71d0b68ec58b781f4f44ea642846bedce075e85.

1. Auto-generated Header: The file 'dmub_cmd.h' is an auto-generated header
   managed in an external repository (dmu_stg). Manual changes made directly in
   this repository will be overwritten and lost during the next automated weekly
   synchronization.

2. Tooling Compatibility: This header is governed by internal AMD firmware
   standards which require Doxygen formatting for cross-team documentation.
   Moving to kernel-doc syntax may break internal documentation pipelines.

3. Suppressing Warnings: Current 'make htmldocs' and 'make W=1' builds
   do not actively scan 'dmub_cmd.h' for kernel-doc compliance, thus no warnings
   are triggered during standard compilation. To address warnings generated when
   manually running './scripts/kernel-doc', we have added a notice at the file
   header indicating that this is an auto-generated file that does not strictly
   follow kernel-doc formatting. This ensures that any future linting tools or
   manual checks recognize the formatting as intentional.

Acked-by: Harry Wentland <harry.wentland@amd.com>
Acked-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: James Lin <pinglei.lin@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amd/display: Add some missing code for dcn42
James Lin [Thu, 7 May 2026 00:01:51 +0000 (08:01 +0800)] 
drm/amd/display: Add some missing code for dcn42

[why & how]
Some DCN4.2 related code is missing from upstream

Fixes: e56e3cff2a1b ("drm/amd/display: Sync dcn42 with DC 3.2.373")
Acked-by: ChiaHsuan Chung <ChiaHsuan.Chung@amd.com>
Reviewed-by: Roman Li <Roman.Li@amd.com>
Signed-off-by: James Lin <pinglei.lin@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amd: Reduce code duplication in runtime PM
Mario Limonciello [Fri, 15 May 2026 20:30:57 +0000 (15:30 -0500)] 
drm/amd: Reduce code duplication in runtime PM

[Why]
amdgpu_pmops_runtime_suspend() runs almost the same code  that
amdgpu_pmops_runtime_idle() runs. That is there is pointless code
duplication.

[How]
Move amdgpu_pmops_runtime_idle() up, extract common code and then
call from both functions.  No intended functional changes.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amdkfd: Check bounds for allocate_sdma_queue restore_sdma_id
David Francis [Tue, 12 May 2026 19:18:18 +0000 (15:18 -0400)] 
drm/amdkfd: Check bounds for allocate_sdma_queue restore_sdma_id

allocate_sdma_queue has an option where the sdma queue id can be
specified (used by CRIU). We weren't bounds-checking that
value.

Confirm it's less than the maximum number of queues.

Signed-off-by: David Francis <David.Francis@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amdgpu: use atomic operation to achieve lockless serialization
Sunil Khatri [Thu, 14 May 2026 07:01:00 +0000 (12:31 +0530)] 
drm/amdgpu: use atomic operation to achieve lockless serialization

In amdgpu_seq64_alloc there is a possibility that two difference cores
from two separate NODES can try to and could get the same free slot.
So this fixes that race here using atomic test_and_set clear operations.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amdgpu/vce3: Fix VCE 3 firmware size and offsets
Timur Kristóf [Wed, 13 May 2026 20:04:16 +0000 (22:04 +0200)] 
drm/amdgpu/vce3: Fix VCE 3 firmware size and offsets

The VCPU BO contains the actual FW at an offset, but
it was not calculated into the VCPU BO size.
Subtract this from the FW size to make sure there is
no out of bounds access.

This may fix VM faults when using VCE 3.

Cc: John Olender <john.olender@gmail.com>
Fixes: e98226221467 ("drm/amdgpu: recalculate VCE firmware BO size")
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 weeks agodrm/amdkfd: Check bounds on allocate_doorbell
David Francis [Tue, 12 May 2026 19:15:33 +0000 (15:15 -0400)] 
drm/amdkfd: Check bounds on allocate_doorbell

allocated_doorbell has an option to set the doorbell id
to a specific value (used by CRIU). This value was not
bounds checked.

Check to confirm it's less than KFD_MAX_NUM_OF_QUEUES_PER_PROCESS.

Signed-off-by: David Francis <David.Francis@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>