git.ipfire.org Git - thirdparty/linux.git/log

drm/amd/display: Refactor DC update checks

[WHY&HOW]
DC currently has fragmented definitions of update types.  This change
consolidates them into a single interface, and adds expanded
functionality to accommodate all use cases.
- adds `dc_check_update_state_and_surfaces_for_stream` to determine
  update type including state, surface, and stream changes.
- adds missing surface/stream update checks to
  `dc_check_update_surfaces_for_stream`
- adds new update type `UPDATE_TYPE_ADDR_ONLY` to accommodate flows where
  further distinction from `UPDATE_TYPE_FAST` was needed
- removes caller reliance on `enable_legacy_fast_update` to determine
  which commit function to use, instead embedding it in the update type

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Dillon Varone <Dillon.Varone@amd.com>
Signed-off-by: Chuanyu Tseng <chuanyu.tseng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Fix drm_edid leak in amdgpu_dm

[WHAT]
When a sink is connected, aconnector->drm_edid was overwritten without
freeing the previous allocation, causing a memory leak on resume.

[HOW]
Free the previous drm_edid before updating it.

Reviewed-by: Roman Li <roman.li@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Chuanyu Tseng <chuanyu.tseng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
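
The ownership rule behind the drm_edid fix above (release the old allocation before storing the new one) can be sketched in plain C. This is a minimal illustration, not the amdgpu_dm code: `struct edid_blob`, `struct connector`, `connector_set_edid()` and the `live_blobs` counter are hypothetical names introduced only for this example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the real amdgpu_dm or drm_edid types. */
struct edid_blob {
	size_t len;
	unsigned char data[128];
};

struct connector {
	struct edid_blob *edid; /* owned by the connector */
};

static int live_blobs; /* allocation counter, used only to spot leaks */

static struct edid_blob *edid_blob_alloc(void)
{
	struct edid_blob *b = calloc(1, sizeof(*b));

	if (b)
		live_blobs++;
	return b;
}

static void edid_blob_free(struct edid_blob *b)
{
	if (!b)
		return;
	live_blobs--;
	free(b);
}

/* Release the previous allocation before storing the new one. */
static void connector_set_edid(struct connector *c, struct edid_blob *new_edid)
{
	edid_blob_free(c->edid);
	c->edid = new_edid;
}
```

Calling `connector_set_edid()` twice in a row leaves exactly one live blob; without the free, every call after the first would leak the previous one.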

drm/amd/display: Fix DCN42 memory clock table using MemClk instead of UClk

[Why]
DCN42 was using UClk values instead of MemClk from MemPstateTable, causing
DML to see half the actual DRAM bandwidth on DDR5 systems and reject high
refresh rate modes.

[How]
Change dcn42_init_clocks() to use MemPstateTable[i].MemClk instead of
MemPstateTable[i].UClk for memclk_mhz initialization.

Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Signed-off-by: Alexander Chechik <alexander.chechik@amd.com>
Signed-off-by: Chuanyu Tseng <chuanyu.tseng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Add Extra SMU Log for dtbclk

[why]
need to check dtbclk in log for confirmation

Reviewed-by: Gabe Teeger <gabe.teeger@amd.com>
Signed-off-by: Charlene Liu <Charlene.Liu@amd.com>
Signed-off-by: Chuanyu Tseng <chuanyu.tseng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Update underflow detection for DCN42

[Why]
The DCN42 underflow detection functions in dcn42_optc.c use
OPTC_RSMU_UNDERFLOW register but the register offset definitions
were missing from dcn_4_2_0_offset.h and dcn42_resource.h.

[How]
Add missing register definitions.

Fixes: e56e3cff2a1b ("drm/amd/display: Sync dcn42 with DC 3.2.373")
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Roman Li <Roman.Li@amd.com>
Signed-off-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fix some more bugs in amdgpu_gem_va_ioctl

Some illegal combinations of input flags were not checked, and we need to
take the PDEs into account when returning the fence as well.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Clamp min DS DCFCLK value to DCN limit

[why & how]
DCN has a global limit for minimum DS DCFCLK during any operation.

Adhere to that limit and add a debug flag.

Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Signed-off-by: Ovidiu Bunea <ovidiu.bunea@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Signed-off-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Split arbiter programming for DCN42

[Why]
We don't want to update the timeout threshold for stall recovery in
firmware dynamically for DCN42 as we're not using FAMS.

Firmware should own programming of this register since the recovery
can be broken if driver updates the value to 0.

[How]
Split program_arbiter for dcn42 and skip the part that updates the
timeout threshold.

Reviewed-by: Leo Chen <leo.chen@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Signed-off-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Add missing dcn42 hubbub function pointers

This aligning commit combines:
- fix dcn42 det programming
- fix missing dcn42 pointers
- fix SDPIF_Request_Rate_Limit programming value

V2: Add back dchvm_init for DCN42

Reviewed-by: Alex Hung <alex.hung@amd.com>
Reviewed-by: Leo Chen <leo.chen@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Signed-off-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Add get_default_tiling_info for dcn42

Add the DCN42 portion that was previously stripped.

Fixes: 8333f22e44a9 ("drm/amd/display: Query DC for gfx handling when setting linear tiling")
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Signed-off-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: move dcn42 bw_params init

Move it out of the smu present block for cases where smu isn't present.

Reviewed-by: Ivan Lipski <ivan.lipski@amd.com>
Signed-off-by: Dmytro Laktyushkin <dmytro.laktyushkin@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Signed-off-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: System Hang When System enters to S0i3 w/ iGPU

[why]
The system hangs when entering S0i3 with an iGPU.
Some link_enc are NULL because the BIOS integration info table is not
correct, but the driver should have enough NULL pointer protection.

Reviewed-by: Leo Chen <leo.chen@amd.com>
Signed-off-by: Charlene Liu <Charlene.Liu@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Signed-off-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Move DPM clk read to clk_mgr_construct in DCN42

[Why&How]
The DPM clocks on DCN42 are currently read on every dm_resume, which can
result in gpu memory being freed while the device is still in suspend.

Move the DPM clock read functionality to clk_mgr_construct() so it
completes once on driver enablement.

Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Signed-off-by: Dmytro Laktyushkin <dmytro.laktyushkin@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Signed-off-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Add MRQ programming for DCN42

[Why]
DCN401 didn't have a MRQ present so these fields didn't exist.

They are still present on DCN42, so we need to continue programming
them like we did on DCN35, or we can have poor meta requesting
efficiency, which blocks p-state.

[How]
Add `hubp42_program_requestor` which takes DML21 input and programs
the registers like DCN35 and prior.

Reviewed-by: Leo Chen <leo.chen@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Signed-off-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: dcn42 don't round up dispclk and dppclk

[why]
dml2 relies on num_enabled clock != 2 to do clock mapping to dpm.
The APU has 8 levels of dispclk/dppclk/dcfclk/fclk, but only 4 levels of memclk.
To avoid mapping dispclk/dppclk to the DPM clock,
force dispclk/dppclk num_level to 2, based on arch review.

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Charlene Liu <Charlene.Liu@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Signed-off-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Update dpia supported configuration

[Why & How]
Initialize a flag to track whether dpia was previously enabled,
and pass it on to the boot options.

Reviewed-by: Ovidiu Bunea <ovidiu.bunea@amd.com>
Signed-off-by: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Signed-off-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: prevent immediate PASID reuse case

PASID reuse could cause interrupt issues when a new process
immediately runs into hw state left behind by a previous
process that exited with the same PASID. It's possible that
page faults are still pending in the IH ring buffer when
the process exits and frees up its PASID. To prevent this
case, use the idr cyclic allocator, the same as kernel pids.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
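
To illustrate why a cyclic allocator avoids the immediate reuse described above, here is a small userspace sketch. It is not the kernel idr implementation; `struct cyclic_ids`, `cyclic_alloc()`, `cyclic_free()` and the tiny `MAX_IDS` space are assumptions made for the example. The only point is that the search for a free ID resumes after the last ID handed out, so a just-freed ID is not returned again until the counter wraps around.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_IDS 8 /* hypothetical tiny ID space for illustration */

struct cyclic_ids {
	bool used[MAX_IDS];
	int next; /* where the next search starts */
};

/* Return a free ID, searching from 'next' and wrapping once; -1 if full. */
static int cyclic_alloc(struct cyclic_ids *a)
{
	for (int i = 0; i < MAX_IDS; i++) {
		int id = (a->next + i) % MAX_IDS;

		if (!a->used[id]) {
			a->used[id] = true;
			a->next = (id + 1) % MAX_IDS;
			return id;
		}
	}
	return -1;
}

/* 'next' is left alone, so the freed ID is not handed out again right away. */
static void cyclic_free(struct cyclic_ids *a, int id)
{
	a->used[id] = false;
}
```

A lowest-free-ID allocator would return the same ID for an alloc, free, alloc sequence; the cyclic variant returns the next one instead, which gives stale state tied to the old ID time to drain.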

drm/amdgpu: fix strsep() corrupting lockup_timeout on multi-GPU (v3)

amdgpu_device_get_job_timeout_settings() passes a pointer directly
to the global amdgpu_lockup_timeout[] buffer into strsep().
strsep() destructively replaces delimiter characters with '\0'
in-place.

On multi-GPU systems, this function is called once per device.
When a multi-value setting like "0,0,0,-1" is used, the first
GPU's call transforms the global buffer into "0\00\00\0-1". The
second GPU then sees only "0" (terminated at the first '\0'),
parses a single value, hits the single-value fallthrough
(index == 1), and applies timeout=0 to all rings — causing
immediate false job timeouts.

Fix this by copying into a stack-local array before calling
strsep(), so the global module parameter buffer remains intact
across calls. The buffer is AMDGPU_MAX_TIMEOUT_PARAM_LENGTH
(256) bytes, which is safe for the stack.

v2: wrap commit message to 72 columns, add Assisted-by tag.
v3: use stack array with strscpy() instead of kstrdup()/kfree()
to avoid unnecessary heap allocation (Christian).

This patch was developed with assistance from Claude (claude-opus-4-6).

Assisted-by: Claude:claude-opus-4-6
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Ruijing Dong <ruijing.dong@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
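
The strsep() behaviour described in the lockup_timeout commit above can be reproduced in userspace. The sketch below is not the driver code: `lockup_timeout`, `count_values_in_place()` and `count_values_on_copy()` are illustrative names, and `snprintf()` stands in for the kernel's `strscpy()`. It only counts tokens, which is enough to show that the second call on the shared buffer sees a single value.

```c
#define _DEFAULT_SOURCE /* for strsep() on glibc */
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PARAM_LEN 256

/* Stand-in for the global module parameter buffer. */
static char lockup_timeout[PARAM_LEN] = "0,0,0,-1";

/* Buggy variant: tokenizes the global buffer in place. */
static int count_values_in_place(void)
{
	char *cursor = lockup_timeout;
	char *tok;
	int n = 0;

	while ((tok = strsep(&cursor, ",")) != NULL && *tok != '\0')
		n++;
	return n;
}

/* Fixed variant: tokenizes a stack-local copy, leaving the global intact. */
static int count_values_on_copy(void)
{
	char buf[PARAM_LEN];
	char *cursor = buf;
	char *tok;
	int n = 0;

	snprintf(buf, sizeof(buf), "%s", lockup_timeout);
	while ((tok = strsep(&cursor, ",")) != NULL && *tok != '\0')
		n++;
	return n;
}
```

Calling `count_values_on_copy()` repeatedly keeps returning 4. Calling `count_values_in_place()` returns 4 the first time and 1 afterwards, because the commas in the shared buffer have been overwritten with '\0'.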

drm/amdgpu/gfx11: look at the right prop for gfx queue priority

Look at hqd_queue_priority rather than hqd_pipe_priority.
In practice, it didn't matter as both were always set for
kernel queues, but that will change in the future.

Fixes: 2e216b1e6ba2 ("drm/amdgpu/gfx11: handle priority setup for gfx pipe1")
Reviewed-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/gfx10: look at the right prop for gfx queue priority

Look at hqd_queue_priority rather than hqd_pipe_priority.
In practice, it didn't matter as both were always set for
kernel queues, but that will change in the future.

Fixes: b07d1d73b09e ("drm/amd/amdgpu: Enable high priority gfx queue")
Reviewed-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/pm: drop SMU driver if version not matched messages

It just leads to user confusion.

Cc: Yang Wang <kevinyang.wang@amd.com>
Cc: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Skip discovery dump when topology is unavailable

When generating a devcoredump, amdgpu_discovery_dump() prints the IP
discovery topology.

The function already needs to handle the case where
adev->discovery.ip_top is NULL to avoid a crash.

Currently, the code prints a section header and an additional message
when the topology is unavailable.

However, for platforms where discovery is not used, this section is not
expected to be present. Printing an extra message adds unnecessary
output.

Simplify this by skipping the entire section when ip_top is NULL.

The NULL check is kept to avoid a crash, but no output is generated when
the discovery topology is unavailable.

Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
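
The "skip the whole section when ip_top is NULL" behaviour described above might look roughly like this in a userspace sketch. `struct ip_topology`, `dump_discovery()` and the section text are hypothetical; the real function prints through a drm_printer and walks a kset, neither of which is modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the discovery topology. */
struct ip_topology {
	int num_dies;
};

/*
 * Write the discovery section into buf and return the number of
 * characters produced. Nothing is written when the topology is missing.
 */
static int dump_discovery(char *buf, size_t size, const struct ip_topology *ip_top)
{
	if (size)
		buf[0] = '\0';
	if (!ip_top)
		return 0; /* no header, no message, no dereference */

	return snprintf(buf, size, "IP discovery topology\ndies: %d\n",
			ip_top->num_dies);
}
```

The NULL check does double duty: it prevents the dereference and it keeps the dump free of an empty section on platforms that never use discovery.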

drm/amd/ras: Add input pointer validation in ras core helpers

Add NULL checks for helper input/output pointers that are directly
dereferenced, such as tm, seqno, dev_info and init_config.

Cc: Tao Zhou <tao.zhou1@amd.com>
Cc: YiPeng Chai <YiPeng.Chai@amd.com>
Cc: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: YiPeng Chai <YiPeng.Chai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/ras: Add NULL checks for ras_core sys_fn callbacks

Some ras core helper functions access ras_core and its callback
table (sys_fn) without validating them first.

Cc: Tao Zhou <tao.zhou1@amd.com>
Cc: YiPeng Chai <YiPeng.Chai@amd.com>
Cc: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: YiPeng Chai <YiPeng.Chai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: annotate eviction fence signaling path

Make sure lockdep sees the dependencies here.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Avoid NULL dereference in discovery topology coredump path v3

When a GPU fault or timeout happens, the driver creates a devcoredump
to collect debug information.

During this, amdgpu_devcoredump_format() calls
amdgpu_discovery_dump() to print IP discovery data.

amdgpu_discovery_dump() uses:
  adev->discovery.ip_top

and then accesses:
  ip_top->die_kset

However, ip_top may be NULL if the discovery topology was never
initialized.

The current code does not check for this before using ip_top. As a
result, when ip_top is NULL, the coredump worker crashes while taking
the spinlock for ip_top->die_kset.

Fix this by checking for a missing ip_top before walking the discovery
topology. If it is unavailable, print a short message in the dump and
return safely.

- If ip_top is NULL, print a message and skip the dump
- Also add the same check in the cleanup path

This makes the coredump and cleanup paths safe even when the
discovery topology is not available.

KASAN trace:
[  522.228252] [IGT] amd_deadlock: starting subtest amdgpu-deadlock-sdma
[  522.240681] [IGT] amd_deadlock: starting dynamic subtest amdgpu-deadlock-sdma

...

[  522.952317] Write of size 4 at addr 0000000000000050 by task kworker/u129:5/5434
[  522.937526] BUG: KASAN: null-ptr-deref in _raw_spin_lock+0x66/0xc0
[  522.967659] Workqueue: events_unbound amdgpu_devcoredump_deferred_work [amdgpu]

...

[  522.969445] Call Trace:
[  522.969508]  _raw_spin_lock+0x66/0xc0
[  522.969518]  ? __pfx__raw_spin_lock+0x10/0x10
[  522.969534]  amdgpu_discovery_dump+0x61/0x530 [amdgpu]
[  522.971346]  ? pick_next_task_fair+0x3f6/0x1c60
[  522.971363]  amdgpu_devcoredump_format+0x84f/0x26f0 [amdgpu]
[  522.973188]  ? __pfx_amdgpu_devcoredump_format+0x10/0x10 [amdgpu]
[  522.975012]  ? psi_task_switch+0x2b5/0x9b0
[  522.975027]  ? __pfx___drm_printfn_coredump+0x10/0x10 [drm]
[  522.975198]  ? __pfx___drm_puts_coredump+0x10/0x10 [drm]
[  522.975366]  ? __schedule+0x113c/0x38d0
[  522.975381]  amdgpu_devcoredump_deferred_work+0x4c/0x1f0 [amdgpu]

v2: Updated commit message - Clarified that ip_top is not freed, it can
    just be NULL if discovery was not initialized. (Christian/Lijo)

v3: Removed the extra drm_warn() for sysfs init failure as sysfs already
    reports errors. (Christian)

Fixes: e81eff80aad6 ("drm/amdgpu: include ip discovery data in devcoredump")
Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Do not skip unrelated mode changes in DSC validation

Starting with commit 17ce8a6907f7 ("drm/amd/display: Add dsc pre-validation in
atomic check"), amdgpu resets the CRTC state mode_changed flag to false when
recomputing the DSC configuration results in no timing change for a particular
stream.

However, this is incorrect in scenarios where a change in MST/DSC configuration
happens in the same KMS commit as another (unrelated) mode change. For example,
the integrated panel of a laptop may be configured differently (e.g., HDR
enabled/disabled) depending on whether external screens are attached. In this
case, plugging in external DP-MST screens may result in the mode_changed flag
being dropped incorrectly for the integrated panel if its DSC configuration
did not change during precomputation in pre_validate_dsc().

At this point, however, dm_update_crtc_state() has already created new streams
for CRTCs with DSC-independent mode changes. In turn,
amdgpu_dm_commit_streams() will never release the old stream, resulting in a
memory leak. amdgpu_dm_atomic_commit_tail() will never acquire a reference to
the new stream either, which manifests as a use-after-free when the stream gets
disabled later on:

BUG: KASAN: use-after-free in dc_stream_release+0x25/0x90 [amdgpu]
Write of size 4 at addr ffff88813d836524 by task kworker/9:9/29977

Workqueue: events drm_mode_rmfb_work_fn
Call Trace:
<TASK>
dump_stack_lvl+0x6e/0xa0
print_address_description.constprop.0+0x88/0x320
? dc_stream_release+0x25/0x90 [amdgpu]
print_report+0xfc/0x1ff
? srso_alias_return_thunk+0x5/0xfbef5
? __virt_addr_valid+0x225/0x4e0
? dc_stream_release+0x25/0x90 [amdgpu]
kasan_report+0xe1/0x180
? dc_stream_release+0x25/0x90 [amdgpu]
kasan_check_range+0x125/0x200
dc_stream_release+0x25/0x90 [amdgpu]
dc_state_destruct+0x14d/0x5c0 [amdgpu]
dc_state_release.part.0+0x4e/0x130 [amdgpu]
dm_atomic_destroy_state+0x3f/0x70 [amdgpu]
drm_atomic_state_default_clear+0x8ee/0xf30
? drm_mode_object_put.part.0+0xb1/0x130
__drm_atomic_state_free+0x15c/0x2d0
atomic_remove_fb+0x67e/0x980

Since there is no reliable way of figuring out whether a CRTC has unrelated
mode changes pending at the time of DSC validation, remember the value of the
mode_changed flag from before the point where a CRTC was marked as potentially
affected by a change in DSC configuration. Reset the mode_changed flag to this
earlier value instead in pre_validate_dsc().

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/5004
Fixes: 17ce8a6907f7 ("drm/amd/display: Add dsc pre-validation in atomic check")
Signed-off-by: Yussuf Khalil <dev@pp3345.net>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
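
The save-and-restore idea from the DSC validation fix above can be sketched as follows. This is a simplified model under stated assumptions: `struct crtc_state`, the `mode_changed_before_dsc` field and both helper functions are hypothetical, and the sketch assumes that marking a CRTC as DSC-affected is what sets mode_changed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified CRTC state; field names are illustrative. */
struct crtc_state {
	bool mode_changed;
	bool mode_changed_before_dsc; /* snapshot taken before DSC marking */
};

/* Mark a CRTC as potentially affected by a DSC configuration change. */
static void mark_dsc_affected(struct crtc_state *s)
{
	s->mode_changed_before_dsc = s->mode_changed;
	s->mode_changed = true;
}

/* DSC recomputation found no timing change: undo only our own marking. */
static void dsc_timing_unchanged(struct crtc_state *s)
{
	/* Restore the earlier value instead of forcing false. */
	s->mode_changed = s->mode_changed_before_dsc;
}
```

A CRTC that already had an unrelated mode change pending keeps mode_changed set after the round trip, while an otherwise idle CRTC goes back to false.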

drm/amd/ras: Remove redundant NULL check in pending bad-bank list iteration

ras_umc_log_pending_bad_bank() walks through a list of pending ECC
bad-bank entries. These entries are saved when a bad-bank error cannot
be processed immediately, for example during a GPU reset.

Later, this function iterates over the pending list and retries logging
each bad-bank error. If logging succeeds, the entry is removed from the
list and the memory for that node is freed.

The loop uses list_for_each_entry_safe(), which already guarantees that
ecc_node points to a valid list entry while the loop body is executing.

Checking "ecc_node &&" inside the loop is therefore unnecessary and
redundant.

Fixes the below:
drivers/gpu/drm/amd/amdgpu/../ras/rascore/ras_umc.c:225 ras_umc_log_pending_bad_bank() warn: variable dereferenced before check 'ecc_node' (see line 223)

Fixes: 7a3f9c0992c4 ("drm/amd/ras: Add umc common ras functions")
Cc: Dan Carpenter <dan.carpenter@linaro.org>
Cc: YiPeng Chai <YiPeng.Chai@amd.com>
Cc: Tao Zhou <tao.zhou1@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: YiPeng Chai <YiPeng.Chai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
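
As a rough illustration of why the extra NULL test inside the loop is redundant, the sketch below walks a hand-rolled singly linked list in the same "save next, then maybe free" style. It does not use the kernel's list_head or list_for_each_entry_safe(); `struct ecc_node`, `log_bad_bank()`, `push_pending()` and `retry_pending()` are hypothetical names, and the rule that odd banks fail to log is made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct ecc_node {
	int bank;
	struct ecc_node *next;
};

/* Hypothetical logger: pretend odd banks still cannot be logged. */
static bool log_bad_bank(const struct ecc_node *n)
{
	return (n->bank % 2) == 0;
}

static bool push_pending(struct ecc_node **head, int bank)
{
	struct ecc_node *n = malloc(sizeof(*n));

	if (!n)
		return false;
	n->bank = bank;
	n->next = *head;
	*head = n;
	return true;
}

/* Retry every pending entry, freeing those that were logged. */
static int retry_pending(struct ecc_node **head)
{
	struct ecc_node **link = head;
	struct ecc_node *node, *tmp;
	int remaining = 0;

	/* The loop condition already guarantees node is valid in the body. */
	for (node = *head; node; node = tmp) {
		tmp = node->next; /* saved first, so freeing node is safe */
		if (log_bad_bank(node)) {
			*link = tmp; /* unlink the logged entry */
			free(node);
		} else {
			link = &node->next;
			remaining++;
		}
	}
	return remaining;
}
```

Because the loop header already establishes that `node` is valid, a second `node &&` test in the body can never be false, which is what the static checker flagged.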

drm/amd/pm: Add smu v15_0_8 pmfw header

Add smu v15_0_8 pmfw header

v2: squash in updates (Alex)

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Add smu v15_0_8 message header

Add smu v15_0_8 message header

v2: squash in updates (Alex)

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Add smu v15_0_8 driver interface header

Add smu v15_0_8 driver interface header

v2: squash in updates (Alex)

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: make amdgpu_user_wait_ioctl more resilient v2

When the memory allocated by userspace isn't sufficient for all the
fences then just wait on them instead of returning an error.

v2: use correct variable as pointed out by Sunil

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: replace WARN with DRM_ERROR for invalid sched priority

amdgpu_sched_ioctl() currently uses WARN(1, ...) when userspace passes
an out-of-range context priority value. WARN(1, ...) is unconditional
and produces a full stack trace, which is disproportionate for a simple
input validation failure -- the invalid value is already rejected with
-EINVAL on the next line.

Replace WARN(1, ...) with DRM_ERROR() to log the invalid value at an
appropriate level without generating a stack dump. The -EINVAL return
to userspace is unchanged.

No functional change for well-formed userspace callers.

v2:
- Reworked commit message to focus on appropriate log level for
parameter validation
- Clarified that -EINVAL behavior is preserved (Vitaly)

v3: completely drop that warning.
Invalid parameters should never clutter the system log. (Christian)

Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
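
The end state described in v3 above (reject the value, log nothing) can be sketched like this. The priority range and `set_priority()` are hypothetical and do not reproduce the real UAPI values or the ioctl signature.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical priority range; the real UAPI values are not reproduced. */
enum { PRIO_MIN = -2, PRIO_MAX = 2 };

/* Store the requested priority, or fail quietly if it is out of range. */
static int set_priority(int requested, int *current)
{
	if (requested < PRIO_MIN || requested > PRIO_MAX)
		return -EINVAL; /* reject silently: no WARN, no log line */

	*current = requested;
	return 0;
}
```

The caller still learns about the bad argument through the return code, so the system log stays quiet without hiding the error from userspace.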

Merge tag 'amd-drm-next-7.1-2026-03-19' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-7.1-2026-03-19:

amdgpu:
- Fix gamma 2.2 colorop TFs
- BO list fix
- LTO fix
- DC FP fix
- DisplayID handling fix
- DCN 2.01 fix
- MMHUB boundary fixes
- ISP fix
- TLB fence fix
- Hainan pm fix
- UserQ fixes
- MES 12.1 Updates
- GC 12.1 updates
- RAS fixes
- DML updates
- Cursor fixes
- SWSMU cleanups
- Misc cleanups
- Clean up duplicate format modifiers
- Devcoredump updates
- Cleanup mmhub cid handling
- Initial VCN 5.0.2 support
- Initial JPEG 5.0.2 support
- PSP 13.0.15 updates

amdkfd:
- Queue properties fix
- GC 12.1 updates

radeon:
- Hainan pm fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patch.msgid.link/20260319173334.479766-1-alexander.deucher@amd.com

drm/amdgpu: Add client ids for gmcv9 mmhubs

Initialize client ids for gmcv9 mmhubs

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add client ids for mmhub v2.x

Initialize client ids for mmhub v2.x

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add client ids for mmhub v3.x

Initialize client ids for mmhub v3.x

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add client ids for mmhub v4.x

Initialize client ids for mmhub v4.x

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add client id helpers to mmhub

Add data structure and helpers to get client id data of mmhub.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Revert "drm/amd/display: Add NV12/P010 formats to primary plane"

With this change we're adding NV12 and P010 twice to reported
formats on a primary plane, which causes us to hit an assert
in Weston.

This reverts commit 63fff551318f5e0814b94f709a6dfaec789dcd7a.

Fixes: 63fff551318f ("drm/amd/display: Add NV12/P010 formats to primary plane")
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Remove dead negative offset check in amdgpu_virt_init_critical_region()

amdgpu_virt_init_critical_region() stores init_hdr_offset as u64.
The subsequent check for init_hdr_offset < 0 is therefore always false.

Drop the unreachable validation and rely on the existing
check_add_overflow() and VRAM end bounds check for offset validation.

This resolves the Smatch warning about comparing an unsigned value
against zero.

drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c:953 amdgpu_virt_init_critical_region() warn: unsigned 'init_hdr_offset' is never less than zero.

Fixes: 07009df6494d ("drm/amdgpu: Introduce SRIOV critical regions v2 during VF init")
Cc: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Ellen Pan <yunru.pan@amd.com>
Cc: Lijo Lazar <lijo.lazar@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Bokun Zhang <bokun.zhang@amd.com>
Reviewed-by: Ellen Pan <yunru.pan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
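
A small sketch of the validation that remains once the dead `< 0` test is dropped, as described above. `range_in_vram()` is a hypothetical helper, and the GCC/Clang builtin `__builtin_add_overflow()` stands in here for the kernel's `check_add_overflow()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Check that [offset, offset + size) lies inside vram_size bytes. */
static bool range_in_vram(uint64_t offset, uint64_t size, uint64_t vram_size)
{
	uint64_t end;

	/* 'offset < 0' would be dead code: a uint64_t is never negative. */
	if (__builtin_add_overflow(offset, size, &end))
		return false; /* offset + size wrapped around */

	return end <= vram_size;
}
```

A "negative" offset passed in from signed arithmetic shows up as a very large unsigned value, so the overflow check and the end-of-VRAM bound already reject it.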

drm/amdgpu: Move amdgpu_vm_is_bo_always_valid() before first use

Smatch reports that 'bo' could be NULL in amdgpu_vm_bo_update(), even
though amdgpu_vm_is_bo_always_valid() already checks for a NULL BO.

Move amdgpu_vm_is_bo_always_valid() earlier in the file so the helper
definition appears before its first use. This allows static analysis
tools to see the NULL check performed by the helper and avoids the
warning.

Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Drop redundant queue NULL check in hang detect worker

amdgpu_userq_hang_detect_work() retrieves the queue pointer using
container_of() from the embedded work item.

Since the work structure is part of struct amdgpu_usermode_queue,
the returned queue pointer cannot be NULL in normal execution.

Remove the redundant !queue check and keep the validation for
queue->userq_mgr.

Fixes the below:
drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c:159 amdgpu_userq_hang_detect_work() warn: can 'queue' even be NULL?

Fixes: 290f46cf5726 ("drm/amdgpu: Implement user queue reset functionality")
Cc: Jesse Zhang <Jesse.Zhang@amd.com>
Cc: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
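
Why the result of container_of() cannot be NULL for a valid work pointer is easiest to see in a userspace sketch. The macro below is a simplified stand-in for the kernel's container_of() without its type checking, and `struct work_item`, `struct user_queue` and `queue_from_work()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(), minus type checks. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct work_item {
	int pending;
};

/* Hypothetical queue with an embedded work item. */
struct user_queue {
	int id;
	struct work_item hang_detect_work;
};

static struct user_queue *queue_from_work(struct work_item *work)
{
	/* Pure pointer arithmetic: a valid 'work' never yields NULL. */
	return container_of(work, struct user_queue, hang_detect_work);
}
```

The macro only subtracts a constant offset from the member address, so the enclosing structure is recovered exactly and a NULL test on the result adds nothing.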

drm/amdgpu : Update psp 13_0_15 ip block support

Included psp_13_0_15 ip block for RAS

Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: rework amdgpu_userq_wait_ioctl v4

Lockdep was complaining about a number of issues here. Especially lock
inversion between syncobj, dma_resv and copying things into userspace.

Rework the functionality. Split it up into multiple functions,
consistently use memdup_array_user(), fix the lock inversions and a few
more bugs in error handling.

v2: drop the dma_fence leak fix, turned out that was actually correct,
just not well documented. Apply some more cleanup suggestion from
Tvrtko.
v3: rebase on already done cleanups
v4: add missing dma_fence_put() in error path.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fix adding eviction fence

We can't add the eviction fence without validating the BO.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fix eviction fence and userq manager shutdown

That is a really complicated dance and wasn't implemented fully correctly.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: completely rework eviction fence handling v2

Well that was broken on multiple levels.

First of all a lot of checks were placed at incorrect locations, especially if
the resume worker should run or not.

Then a bunch of code was just mid-layering because of incorrect assignment of
who should do what.

And finally comments explaining what happens instead of why.

Just re-write it from scratch, that should at least fix some of the hangs we
are seeing.

Use RCU for the eviction fence pointer in the manager, the spinlock usage was
mostly incorrect as well. Then finally remove all the nonsense checks and
actually add them in the correct locations.

v2: some typo fixes and cleanups suggested by Sunil

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/radeon: apply state adjust rules to some additional HAINAN variants

They need a similar workaround.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1839
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: apply state adjust rules to some additional HAINAN variants

They need a similar workaround.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1839
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: rework how we handle TLB fences

Add a new VM flag to indicate whether or not we need
a TLB fence.  Userqs (KFD or KGD) require a TLB fence.
A TLB fence is not strictly required for kernel queues,
but it shouldn't hurt.  That said, enabling this
unconditionally should be fine, but it seems to tickle
some issues in KIQ/MES.  Only enable them for KFD,
or when KGD userq queues are enabled (currently via module
parameter).

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4798
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4749
Fixes: f3854e04b708 ("drm/amdgpu: attach tlb fence to the PTs update")
Cc: Christian König <christian.koenig@amd.com>
Cc: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add JPEG_v5_0_2 IP block

Add support for JPEG_5_0_2

v2: comment out RAS for now (Alex)
v3: drop some bringup leftovers (Alex)

Signed-off-by: Sonny Jiang <sonjiang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Set VCN_5_0_2 DPG mode

Set DPG flag for VCN_5_0_2

Signed-off-by: Sonny Jiang <sonjiang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add VCN_5_0_2 codecs capabilities support

Support VCN_5_0_2 codec query

Signed-off-by: Sonny Jiang <sonjiang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add VCN v5_0_2

Add support for VCN_5_0_2

v2: squash in RRMT enable bit fix from Sonny (Alex)
v3: squash in doorbell enablement patch (Alex)
v4: drop some bringup leftovers (Alex)

Signed-off-by: Sonny Jiang <sonjiang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Add mutex lock for metrics table

Add metrics table mutex lock in smu table context struct

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Update pm attributes

Update pm attributes show/hide for gc_v12_1_0

v2: Use multi-aid check (Lijo)

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Add fru eeprom info support

Add fru eeprom info support for smu_v15_0_8

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/ras: Fix NULL deref in ras_core_get_utc_second_timestamp()

ras_core_get_utc_second_timestamp() retrieves the current UTC timestamp
(in seconds since the Unix epoch) through a platform-specific RAS system
callback and is used for timestamping RAS error events.

The function checks ras_core in the conditional statement before calling
the sys_fn callback. However, when the condition fails, the function
prints an error message using ras_core->dev.

If ras_core is NULL, this can lead to a potential NULL pointer
dereference when accessing ras_core->dev.

Add an early NULL check for ras_core at the beginning of the function
and return 0 when the pointer is not valid. This prevents the
dereference and makes the control flow clearer.
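
A minimal userspace sketch of the early-return pattern described above.
The struct layouts and the callback name are hypothetical stand-ins, not
the driver's real definitions, and the error logging is omitted:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's types. */
struct ras_sys_fn {
	uint64_t (*get_utc_second_timestamp)(void *dev);
};

struct ras_core_context {
	void *dev;
	const struct ras_sys_fn *sys_fn;
};

/* Returns 0 when no timestamp can be obtained. */
uint64_t get_utc_second_timestamp(struct ras_core_context *ras_core)
{
	/* Early exit: nothing below may touch ras_core->dev if this is NULL. */
	if (!ras_core)
		return 0;

	if (ras_core->sys_fn && ras_core->sys_fn->get_utc_second_timestamp)
		return ras_core->sys_fn->get_utc_second_timestamp(ras_core->dev);

	/* An error message using ras_core->dev would be safe at this point. */
	return 0;
}
```

With the check hoisted to the top, the error path no longer has to
re-test ras_core before using it.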

Fixes: 13c91b5b4378 ("drm/amd/ras: Add rascore unified interface function")
Cc: YiPeng Chai <YiPeng.Chai@amd.com>
Cc: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Tao Zhou <tao.zhou1@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: YiPeng Chai <YiPeng.Chai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/atomfirmware: Add LpDDR5x and new fields for info v2_3

[Why]

Newer DCN bandwidth calculations require new definitions.

[How]

Add new fields cpu_id and vram_bit_width for
atom_integrated_system_info_v2_3, and add a memtype for LpDDR5x.

Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Fix ISP segfault issue in kernel v7.0

Add NULL pointer checks for dev->type before accessing
dev->type->name in ISP genpd add/remove functions to
prevent kernel crashes.

This regression was introduced in v7.0 as the wakeup sources
are registered using physical device instead of ACPI device.
This led to adding wakeup source device as the first child of
AMDGPU device without initializing the dev->type variable, and
resulted in a segfault when it was accessed in the amdgpu isp driver.

Fixes: 057edc58aa59 ("ACPI: PM: Register wakeup sources under physical devices")
Suggested-by: Bin Du <Bin.Du@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/gmc9.0: add bounds checking for cid

The value should never exceed the array size as those
are the only values the hardware is expected to return,
but add checks anyway.
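
As an illustration only, a bounds-checked table lookup of this kind could
look like the sketch below. The table contents and the function name are
hypothetical and do not come from the driver:

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical client-id table; the real tables are per-ASIC. */
static const char *const client_names[] = { "CB", "DB", "IA", "WD" };

/* Returns the client name, or NULL when cid is outside the table. */
const char *client_name(unsigned int cid)
{
	return cid < ARRAY_SIZE(client_names) ? client_names[cid] : NULL;
}
```

A caller can then print a fallback such as "unknown" when NULL comes back
instead of indexing past the end of the array.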

Cc: Benjamin Cheng <benjamin.cheng@amd.com>
Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/mmhub4.2.0: add bounds checking for cid

The value should never exceed the array size as those
are the only values the hardware is expected to return,
but add checks anyway.

Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/mmhub4.1.0: add bounds checking for cid

The value should never exceed the array size as those
are the only values the hardware is expected to return,
but add checks anyway.

Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/mmhub3.0: add bounds checking for cid

The value should never exceed the array size as those
are the only values the hardware is expected to return,
but add checks anyway.

Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/mmhub3.0.2: add bounds checking for cid

The value should never exceed the array size as those
are the only values the hardware is expected to return,
but add checks anyway.

Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/mmhub3.0.1: add bounds checking for cid

The value should never exceed the array size as those
are the only values the hardware is expected to return,
but add checks anyway.

Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/mmhub2.3: add bounds checking for cid

The value should never exceed the array size as those
are the only values the hardware is expected to return,
but add checks anyway.

Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/mmhub2.0: add bounds checking for cid

The value should never exceed the array size as those
are the only values the hardware is expected to return,
but add checks anyway.

Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add default case in DVI mode validation

amdgpu_connector_dvi_mode_valid() assigns max_digital_pixel_clock_khz
based on connector_object_id using a switch statement that lacks a
default case.

In practice this code path should never be hit because the existing
cases already cover all digital connector types that this function is
used for. This is also legacy display code which is not used for new
hardware.

Add a default case returning MODE_BAD to make the switch exhaustive and
silence the static analyzer smatch error. The new branch is effectively
defensive and should never be reached during normal operation.
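
A simplified sketch of the resulting switch shape. The enum values, the
connector subset and the clock limits are assumptions made for the
example and are not taken from the driver:

```c
/* Hypothetical subset of connector ids and mode status codes. */
enum connector_id { CONN_DVI_I = 1, CONN_DVI_D = 2, CONN_HDMI_A = 3, CONN_VGA = 4 };
enum mode_status { MODE_OK = 0, MODE_BAD = -1, MODE_CLOCK_HIGH = -2 };

enum mode_status dvi_mode_valid(enum connector_id id, int mode_clock_khz)
{
	int max_digital_pixel_clock_khz;

	switch (id) {
	case CONN_DVI_I:
	case CONN_DVI_D:
		max_digital_pixel_clock_khz = 165000; /* illustrative limit */
		break;
	case CONN_HDMI_A:
		max_digital_pixel_clock_khz = 340000; /* illustrative limit */
		break;
	default:
		/* Defensive: the limit is never read uninitialized. */
		return MODE_BAD;
	}

	return mode_clock_khz > max_digital_pixel_clock_khz ?
		MODE_CLOCK_HIGH : MODE_OK;
}
```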

Fixes: 585b2f685c56 ("drm/amdgpu: Respect max pixel clock for HDMI and DVI-D (v2)")
Cc: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Timur Kristóf <timur.kristof@gmail.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/ras: Fix NULL deref in ras_core_ras_interrupt_detected()

Fixes a NULL pointer dereference when ras_core is NULL and ras_core->dev
is accessed in the error path.

Fixes: 13c91b5b4378 ("drm/amd/ras: Add rascore unified interface function")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Cc: YiPeng Chai <YiPeng.Chai@amd.com>
Cc: Tao Zhou <tao.zhou1@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Drop unreachable return in amdgpu_reg_get_smn_base64()

amdgpu_reg_get_smn_base64() returns from all control-flow paths inside
the !adev->reg.smn.get_smn_base fallback path.

For version == 1, the function returns the base address from
amdgpu_reg_smn_v1_0_get_base(). For all other versions, the default
switch branch emits a dev_err_once() and returns 0.

The trailing return 0 after the switch is therefore unreachable and is
reported by Smatch as dead code:

drivers/gpu/drm/amd/amdgpu/amdgpu_reg_access.c:317
amdgpu_reg_get_smn_base64() warn: ignoring unreachable code

Remove the redundant return statement.

Fixes: 467ebfe65f6e ("drm/amdgpu: Add smn callbacks to register block")
Cc: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Lijo Lazar <lijo.lazar@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: validate fence_count in wait_fences ioctl

Add an early parameter check in amdgpu_cs_wait_fences_ioctl() to reject
a zero fence_count with -EINVAL.

dma_fence_wait_any_timeout() requires count > 0. When userspace passes
fence_count == 0, the call propagates down to dma_fence core which does
not expect a zero-length array and triggers a WARN_ON.

Return -EINVAL immediately so the caller gets a clear error instead of
hitting an unexpected warning in the DMA fence subsystem.

No functional change for well-formed userspace callers.
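
The check amounts to something like the following sketch. The argument
struct is a hypothetical, trimmed-down stand-in for the ioctl data and
the function name is made up for the example:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, trimmed-down ioctl argument. */
struct wait_fences_args {
	uint32_t fence_count;
};

int wait_fences_check(const struct wait_fences_args *args)
{
	/* Reject an empty fence array before it reaches the wait helper. */
	if (args->fence_count == 0)
		return -EINVAL;

	return 0; /* the real ioctl would go on to wait on the fences */
}
```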

v2:
- Reworked commit message to clarify the parameter validation rationale
- Removed verbose crash log from commit description
- Simplified inline code comment

Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: move devcoredump generation to a worker

Update the way drm_coredump_printer is used based on its documentation
and Xe's code: the main idea is to generate the final version in one go
and then use memcpy to return the chunks requested by the caller of
amdgpu_devcoredump_read.
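
A rough userspace sketch of the read side of that idea: the dump is
formatted once into a buffer, and each read only copies the requested
window out of it. The struct and function names are hypothetical:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical holder for a dump that was already formatted in one go. */
struct coredump {
	const char *formatted;
	size_t formatted_size;
};

/* Copies up to count bytes starting at offset; returns 0 at end of dump. */
size_t coredump_read(const struct coredump *cd, char *buf,
		     size_t offset, size_t count)
{
	if (!cd->formatted || offset >= cd->formatted_size)
		return 0;

	if (count > cd->formatted_size - offset)
		count = cd->formatted_size - offset;

	memcpy(buf, cd->formatted + offset, count);
	return count;
}
```

Each call is then a bounded memcpy rather than a fresh pass over the
whole dump.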

The generation is moved to a separate worker thread.

This cuts the time to copy the dump from 40s to ~0s on my machine.

---
v3:
- removed adev->coredump_in_progress and instead use work as
the synchronisation mechanism
- use kvfree instead of kfree
---

Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/amdgpu: Fix build errors due to declarations after labels

Before C23, a label must be followed by a statement, and a declaration
is not a statement, so a declaration cannot directly follow a label. The
switch cases in amdgpu_discovery.c and gmc_v12_1.c contained variable
declarations immediately after case labels, causing the compiler to
error:

drivers/gpu/drm/amd/amdgpu/gmc_v12_1.c:533:3: error: a label can only be
part of a statement and a declaration is not a statement

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/userq: unlock cancel_delayed_work_sync for hang_detect_work

cancel_delayed_work_sync() for hang_detect_work should not be called
with the mutex held, since amdgpu_userq_hang_detect_work() also needs the
same mutex and the two can deadlock when they run together.

We do not need to hold the mutex for
cancel_delayed_work_sync(&queue->hang_detect_work). With this in place,
if the cancel and the worker thread run at the same time they will not
deadlock.

If a hang detect and reset is triggered by any failure, there is a
deadlock scenario between the cancel and the running main thread:

[ 243.118276] task:kworker/9:0 state:D stack:0 pid:73 tgid:73 ppid:2 task_flags:0x4208060 flags:0x00080000
[ 243.118283] Workqueue: events amdgpu_userq_hang_detect_work [amdgpu]
[ 243.118636] Call Trace:
[ 243.118639] <TASK>
[ 243.118644] __schedule+0x581/0x1810
[ 243.118649] ? srso_return_thunk+0x5/0x5f
[ 243.118656] ? srso_return_thunk+0x5/0x5f
[ 243.118659] ? wake_up_process+0x15/0x20
[ 243.118665] schedule+0x64/0xe0
[ 243.118668] schedule_preempt_disabled+0x15/0x30
[ 243.118671] __mutex_lock+0x346/0x950
[ 243.118677] __mutex_lock_slowpath+0x13/0x20
[ 243.118681] mutex_lock+0x2c/0x40
[ 243.118684] amdgpu_userq_hang_detect_work+0x63/0x90 [amdgpu]
[ 243.118888] process_scheduled_works+0x1f0/0x450
[ 243.118894] worker_thread+0x27f/0x370
[ 243.118899] kthread+0x1ed/0x210
[ 243.118903] ? __pfx_worker_thread+0x10/0x10
[ 243.118906] ? srso_return_thunk+0x5/0x5f
[ 243.118909] ? __pfx_kthread+0x10/0x10
[ 243.118913] ret_from_fork+0x10f/0x1b0
[ 243.118916] ? __pfx_kthread+0x10/0x10
[ 243.118920] ret_from_fork_asm+0x1a/0x30

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/userq: fix dma_fence refcount underflow in userq path

An extra dma_fence_put() can drop the last reference to a fence while it is
still attached to a dma_resv object. This frees the fence prematurely via
dma_fence_release() while other users still hold the pointer.

Later accesses through dma_resv iteration may then operate on the freed
fence object, leading to refcount underflow warnings and potential hangs
when walking reservation fences.

Fix this by correcting the fence lifetime so the dma_resv object retains a
valid reference until it is done with the fence.

[ 31.133803] refcount_t: underflow; use-after-free.
[ 31.133805] WARNING: lib/refcount.c:28 at refcount_warn_saturate+0x58/0x90, CPU#18: kworker/u96:1/188

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fallback to default discovery offset/size in sriov guest

In an SRIOV guest environment, if the dynamic critical region
is not enabled, fall back to the default discovery offset
and size to ensure proper initialization.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/userq: Use kvfree instead of kfree in amdgpu_userq_signal_ioctl

In function amdgpu_userq_signal_ioctl, drm_gem_objects_lookup allocates
memory via kvmalloc, and hence that memory must be freed via kvfree.

Fixes: 4ca06f6fb45d ("drm/amdgpu/userq: Use drm_gem_objects_lookup in amdgpu_userq_signal_ioctl")
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: update flip bit setting of RAS bad page

The flip bit setting is different if the umc number is half of the
original configuration.

v2: block the flip bit setting for unsupported umc configuration.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Replace deprecated strcpy() in amdgpu_virt_write_vf2pf_data

strcpy() is deprecated as it does not do any bounds checking (as
specified in Documentation/process/deprecated.rst).

There is a risk of buffer overflow in the case that the value for
THIS_MODULE->version exceeds the 64 characters. This is unlikely, but
replacing the deprecated function will pre-emptively remove this risk
entirely.

Replace both instances of strcpy() with the safer strscpy() function.
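
strscpy() itself is a kernel helper, so the sketch below uses a
hypothetical userspace function, bounded_copy(), that only approximates
its contract: always NUL-terminate, and report truncation with -E2BIG:

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical userspace approximation of a bounded string copy.
 * Returns the number of characters copied (without the NUL), or
 * -E2BIG when src did not fit or size is 0.
 */
long bounded_copy(char *dst, const char *src, size_t size)
{
	size_t i;

	if (size == 0)
		return -E2BIG;

	for (i = 0; i < size - 1 && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';

	return src[i] == '\0' ? (long)i : -E2BIG;
}
```

Unlike strcpy(), the destination size is part of the call, so an
oversized source is truncated instead of overflowing the buffer.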

Changes have been compile tested.

Reviewed-by: Kees Cook <kees@kernel.org>
Signed-off-by: Yicong Hui <yiconghui@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd: fix dcn 2.01 check

The ASICREV_IS_BEIGE_GOBY_P check always took precedence, because it includes all chip revisions up to NV_UNKNOWN.

Fixes: 54b822b3eac3 ("drm/amd/display: Use dce_version instead of chip_id")
Signed-off-by: Andy Nguyen <theofficialflow1996@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Fix DisplayID not-found handling in parse_edid_displayid_vrr()

parse_edid_displayid_vrr() searches the EDID extension blocks for a
DisplayID extension before parsing the dynamic video timing range.

The code previously checked whether edid_ext was NULL after the search
loop. However, edid_ext is assigned during each iteration of the loop,
so it will never be NULL once the loop has executed. If no DisplayID
extension is found, edid_ext ends up pointing to the last extension
block, and the NULL check does not correctly detect the failure case.

Instead, check whether the loop completed without finding a matching
DisplayID block by testing "i == edid->extensions". This ensures the
function exits early when no DisplayID extension is present and avoids
parsing an unrelated EDID extension block.

Also simplify the EDID validation check using "!edid ||
!edid->extensions".
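
A standalone sketch of the corrected search, assuming 128-byte extension
blocks and the DisplayID extension tag 0x70. The function name and the
flat byte-array input are hypothetical simplifications:

```c
#include <stddef.h>
#include <stdint.h>

#define EXT_BLOCK_SIZE    128
#define DISPLAYID_EXT_TAG 0x70

/* Returns the first DisplayID extension block, or NULL if none exists. */
const uint8_t *find_displayid_ext(const uint8_t *ext_blocks, int extensions)
{
	const uint8_t *ext = NULL;
	int i;

	if (!ext_blocks || !extensions)
		return NULL;

	for (i = 0; i < extensions; i++) {
		ext = ext_blocks + (size_t)i * EXT_BLOCK_SIZE;
		if (ext[0] == DISPLAYID_EXT_TAG)
			break;
	}

	/* ext is never NULL here; only the loop index shows a miss. */
	if (i == extensions)
		return NULL;

	return ext;
}
```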

Fixes the below:
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:13079 parse_edid_displayid_vrr() warn: variable dereferenced before check 'edid_ext' (see line 13075)

Fixes: a638b837d0e6 ("drm/amd/display: Fix refresh rate range for some panel")
Cc: Roman Li <roman.li@amd.com>
Cc: Alex Hung <alex.hung@amd.com>
Cc: Jerry Zuo <jerry.zuo@amd.com>
Cc: Sun peng Li <sunpeng.li@amd.com>
Cc: Tom Chung <chiahsuan.chung@amd.com>
Cc: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: remove duplicate format modifier

amdgpu_dm_plane_get_plane_modifiers always adds DRM_FORMAT_MOD_LINEAR to
the list of modifiers. However, with gfx12,
amdgpu_dm_plane_add_gfx12_modifiers also adds that modifier to the list.
So we end up with two copies. Most apps just ignore this but some
(Weston) don't like it.

As a fix, we change amdgpu_dm_plane_add_gfx12_modifiers to not add
DRM_FORMAT_MOD_LINEAR to the list, matching the behavior of analogous
functions for other chips.

Signed-off-by: Erik Kurzinger <ekurzinger@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: switch XGMI sysfs show helpers to sysfs_emit_at()

The XGMI sysfs show helpers amdgpu_xgmi_show_num_hops() and
amdgpu_xgmi_show_num_links() currently populate the output buffer with
sprintf() and then call sysfs_emit(buf, "%s\n", buf) to append the final
newline.

Convert both helpers to use sysfs_emit_at() while tracking the current
offset. This keeps buffer construction in the sysfs helpers, avoids
feeding the output buffer back into the final formatted write, and
matches the style already used by
amdgpu_xgmi_show_connected_port_num().
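
The offset-tracking style can be sketched in userspace as below.
emit_at() is a hypothetical stand-in for sysfs_emit_at() built on
vsnprintf(), and the "%02x " output format is assumed for the example:

```c
#include <stdarg.h>
#include <stdio.h>

#define BUF_SIZE 4096 /* stands in for PAGE_SIZE */

/* Hypothetical stand-in: format into buf at offset 'at', bounded. */
static int emit_at(char *buf, int at, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (at < 0 || at >= BUF_SIZE)
		return 0;

	va_start(ap, fmt);
	n = vsnprintf(buf + at, BUF_SIZE - at, fmt, ap);
	va_end(ap);

	if (n < 0)
		return 0;
	return n < BUF_SIZE - at ? n : BUF_SIZE - at - 1;
}

/* Builds one line of per-node values and returns its length. */
int show_num_hops(char *buf, const unsigned char *hops, int count)
{
	int i, offset = 0;

	for (i = 0; i < count; i++)
		offset += emit_at(buf, offset, "%02x ", hops[i]);
	offset += emit_at(buf, offset, "\n");

	return offset;
}
```

The buffer is only ever written through the bounded helper and is never
passed back in as a "%s" argument to itself.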

Signed-off-by: David Baum <davidbaum461@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/discovery: Add braces to case statements in amdgpu_discovery_table_check()

When building with a version of clang that supports the narrower
'-fms-anonymous-structs' (as opposed to the wider '-fms-extensions')
along with the associated kernel support (such as in next-20260312 [1]),
there are warnings (or errors with CONFIG_WERROR=y / W=e) from the
switch statement added by commit 47ab777c16c7 ("drm/amdgpu/discovery:
use common function to check discovery table").

  drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c:560:3: error: label followed by a declaration is a C23 extension [-Werror,-Wc23-extensions]
    560 |                 struct ip_discovery_header *ihdr =
        |                 ^
  drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c:568:3: error: label followed by a declaration is a C23 extension [-Werror,-Wc23-extensions]
    568 |                 struct gpu_info_header *ghdr =
        |                 ^
  drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c:576:3: error: label followed by a declaration is a C23 extension [-Werror,-Wc23-extensions]
    576 |                 struct harvest_info_header *hhdr =
        |                 ^
  drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c:584:3: error: label followed by a declaration is a C23 extension [-Werror,-Wc23-extensions]
    584 |                 struct vcn_info_header *vhdr =
        |                 ^
  drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c:592:3: error: label followed by a declaration is a C23 extension [-Werror,-Wc23-extensions]
    592 |                 struct mall_info_header *mhdr =
        |                 ^

If '-fms-extensions' were not present, this would be a hard error in
older clang versions.

Add braces to the case statements that declare variables to clear up the
warnings.
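
For illustration, the braced form looks like the sketch below. The enum
and header structs are hypothetical, cut-down stand-ins for the discovery
table types:

```c
#include <stdint.h>

/* Hypothetical, cut-down table ids and headers. */
enum table_id { TABLE_IP_DISCOVERY, TABLE_GPU_INFO };

struct ip_hdr { uint16_t version; };
struct gpu_hdr { uint16_t version_major; };

int table_version(enum table_id id, const void *table)
{
	switch (id) {
	case TABLE_IP_DISCOVERY: {
		/* The brace opens a block, so the label precedes a statement. */
		const struct ip_hdr *ihdr = table;

		return ihdr->version;
	}
	case TABLE_GPU_INFO: {
		const struct gpu_hdr *ghdr = table;

		return ghdr->version_major;
	}
	default:
		return -1;
	}
}
```

Each declaration now lives in its own compound statement, which is valid
in every C dialect and also limits the variable's scope to its case.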

Fixes: 47ab777c16c7 ("drm/amdgpu/discovery: use common function to check discovery table")
Link: https://git.kernel.org/next/linux-next/c/0d3fccf68d9873a3c824fb70be0dbb2c4642aa90
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/ras: Pass ras poison consumption message to sriov host

Pass ras poison consumption message to sriov host.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/userq: Use kvfree instead of kfree in amdgpu_userq_wait_ioctl

In function amdgpu_userq_wait_ioctl, drm_gem_objects_lookup allocates
memory via kvmalloc, and hence that memory must be freed via kvfree.

Fixes: 2de9353e193f ("drm/amdgpu/userq: Use drm_gem_objects_lookup in amdgpu_userq_wait_ioctl")
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Use common smu fw check function for smu15

Use common smu fw check function for smu15 and remove dedicated ones

v2: Remove dedicated functions and directly use common one

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Use common smu fw check function for smu13

Use common smu fw check function for smu13 and remove dedicated ones

v2: Remove dedicated functions and directly use common one

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Promote DC to 3.2.374

This version brings along the following updates:

- Clamp dc_cursor_position x_hotspot to prevent integer overflow
- Query DC for gfx handling when setting linear tiling
- Add a buffer for boot time crc
- Silence static analysis warnings
- Plumb MRQ programming out of DML for dml2_1
- Add dcn_mrq_present Field
- Fix number of opp
- Add debugfs to disallow eDP Replay entry

Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Chuanyu Tseng <chuanyu.tseng@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Clamp dc_cursor_position x_hotspot to prevent integer overflow

why:
Workaround for duplicate cursor. Cursor offsetting via x_hotspot attempts
to write a 32 bit unsigned integer to the 8 bit field CURSOR_HOT_SPOT_X.
This wraps cursor position back into focus if x_hotspot exceeds 8 bits,
making duplicate cursors visible

how:
Clamp x_hotspot before writing to hardware
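
A small sketch of the clamp, assuming the 8-bit field width stated
above. The macro and function names are hypothetical:

```c
#include <stdint.h>

#define CURSOR_HOT_SPOT_X_MAX 0xFFu /* largest value an 8-bit field holds */

/* Saturate instead of letting the register write wrap around. */
uint32_t clamp_x_hotspot(uint32_t x_hotspot)
{
	return x_hotspot > CURSOR_HOT_SPOT_X_MAX ?
		CURSOR_HOT_SPOT_X_MAX : x_hotspot;
}
```

For example, an x_hotspot of 300 would wrap to 44 if only the low 8 bits
were written, whereas the clamped value is 255.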

Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Reviewed-by: Nevenko Stupar <nevenko.stupar@amd.com>
Signed-off-by: Benjamin Nwankwo <Benjamin.Nwankwo@amd.com>
Signed-off-by: Chuanyu Tseng <chuanyu.tseng@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Query DC for gfx handling when setting linear tiling

[Why]
Post-driver cases always use linear tiling yet gfx handling for this
case is improper, allowing for incorrect gfx structs to be populated and
used.

[How]
Query DC for the appropriate linear tiling mode and populate the DCN
specific gfx version structs.

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Nicholas Carbones <Nicholas.Carbones@amd.com>
Signed-off-by: Chuanyu Tseng <chuanyu.tseng@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Add a buffer for boot time crc

[Why]
We need to reserve a memory buffer for boot time crc test
during resume.

[How]
Create a buffer during boot up and send the buffer info to
DMUB.

Reviewed-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Chuanyu Tseng <chuanyu.tseng@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Silence static analysis warning

Silence static analysis warnings by ensuring swath size temporaries are
initialized before use. No functional change intended.

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Gaghik Khachatrian <gaghik.khachatrian@amd.com>
Signed-off-by: Chuanyu Tseng <chuanyu.tseng@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Plumb MRQ programming out of DML for dml2_1

[Why]
If the MRQ is present then these fields are also required to be
plumbed out to the requestor for programming.

[How]
Pipe the fields out through rq_dlg_get_rq_reg.

The implementation follows the previous generation in dml2_0 for DCN35
but adjusted for the new helpers and coding style of dml2_1.

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Chuanyu Tseng <chuanyu.tseng@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Add dcn_mrq_present Field

[Why/How]
Add MRQ flag so it can be passed from ip_caps to ip_params

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Austin Zheng <Austin.Zheng@amd.com>
Signed-off-by: Chuanyu Tseng <chuanyu.tseng@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Fix number of opp

[Why/How]
Patch number of opp based on IP caps

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Austin Zheng <Austin.Zheng@amd.com>
Signed-off-by: Chuanyu Tseng <chuanyu.tseng@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Add debugfs to disallow eDP Replay entry

[Why & How]
Test applications need to read CRC from the eDP sink side, but the sink
Replay feature prevents proper CRC reading and causes a timeout.

Add disallow_edp_enter_replay debugfs interface to allow test apps
to temporarily disable Replay for CRC operations.

Reviewed-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Signed-off-by: Chuanyu Tseng <chuanyu.tseng@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Wrap dcn32_override_min_req_memclk() in DC_FP_{START, END}

[Why]
The dcn32_override_min_req_memclk function is in dcn32_fpu.c, which is
compiled with CC_FLAGS_FPU into FP instructions. So when we call it we
must use DC_FP_{START,END} to save and restore the FP context, and
prepare the FP unit on architectures like LoongArch where the FP unit
isn't always on.

Reported-by: LiarOnce <liaronce@hotmail.com>
Fixes: ee7be8f3de1c ("drm/amd/display: Limit DCN32 8 channel or less parts to DPM1 for FPO")
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>