git.ipfire.org Git - thirdparty/kernel/linux.git/log
thirdparty/kernel/linux.git
5 weeks ago drm/amd/display: pause the workload setting in dm
Kenneth Feng [Fri, 28 Mar 2025 02:34:57 +0000 (10:34 +0800)] 
drm/amd/display: pause the workload setting in dm

v1:
Pause the workload setting in dm when doing idle optimization

v2:
Rebase patch to latest kernel code base (kernel 6.16)

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 weeks ago drm/radeon: Remove calls to drm_put_dev()
Daniel Palmer [Sat, 18 Oct 2025 05:44:51 +0000 (14:44 +0900)] 
drm/radeon: Remove calls to drm_put_dev()

Since the allocation of the driver's main structure was changed to
devm_drm_dev_alloc(), calling drm_put_dev() to trigger it being freed
should be left to devres.

However, drm_put_dev() is still in the probe error and device remove
paths. When the driver fails to probe, warnings like the following are
shown because devres is trying to drm_put_dev() after the driver
already did it.

[    5.642230] radeon 0000:01:05.0: probe with driver radeon failed with error -22
[    5.649605] ------------[ cut here ]------------
[    5.649607] refcount_t: underflow; use-after-free.
[    5.649620] WARNING: CPU: 0 PID: 357 at lib/refcount.c:28 refcount_warn_saturate+0xbe/0x110

Fixes: a9ed2f052c5c ("drm/radeon: change drm_dev_alloc to devm_drm_dev_alloc")
Signed-off-by: Daniel Palmer <daniel@0x0f.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
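
The double put described in the commit above can be illustrated with a
small userspace model. This is only a sketch under stated assumptions:
struct fake_dev, fake_dev_put() and fake_devres_release() are hypothetical
stand-ins and not the DRM or devres API. It shows why an explicit put in
the probe error path, followed by the devres release, drops the reference
count below zero.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted, devres-managed device. */
struct fake_dev {
	int refcount;      /* 1 after allocation */
	bool underflowed;  /* set when a put finds no reference left */
};

/* Drops one reference and records an underflow instead of going negative. */
static void fake_dev_put(struct fake_dev *dev)
{
	if (dev->refcount == 0) {
		dev->underflowed = true;
		return;
	}
	dev->refcount--;
}

/* Models devres running its release action after a failed probe. */
static void fake_devres_release(struct fake_dev *dev)
{
	fake_dev_put(dev);
}

/* Returns true if a failed probe ends in a refcount underflow. */
static bool failed_probe_underflows(bool driver_also_puts)
{
	struct fake_dev dev = { .refcount = 1, .underflowed = false };

	if (driver_also_puts)
		fake_dev_put(&dev);     /* old error path: explicit put */
	fake_devres_release(&dev);      /* devres always drops its reference */

	return dev.underflowed;
}
```

In this model only the variant where the driver also puts the device ends
in an underflow, which matches the refcount_t warning quoted in the commit
message.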
5 weeks ago drm/radeon: Do not kfree() devres managed rdev
Daniel Palmer [Sat, 18 Oct 2025 05:44:50 +0000 (14:44 +0900)] 
drm/radeon: Do not kfree() devres managed rdev

Since the allocation of the driver's main structure was changed to
devm_drm_dev_alloc(), rdev is managed by devres and we shouldn't be calling
kfree() on it.

This fixes things exploding if the driver probe fails and devres cleans up
the rdev after we already free'd it.

Fixes: a9ed2f052c5c ("drm/radeon: change drm_dev_alloc to devm_drm_dev_alloc")
Signed-off-by: Daniel Palmer <daniel@0x0f.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 weeks ago drm/radeon: Clean up pdev->dev instances in probe
Daniel Palmer [Sat, 18 Oct 2025 05:44:49 +0000 (14:44 +0900)] 
drm/radeon: Clean up pdev->dev instances in probe

Get a struct device pointer from the start and use it.

Signed-off-by: Daniel Palmer <daniel@0x0f.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 weeks ago drm/amd: Check that VPE has reached DPM0 in idle handler
Mario Limonciello [Thu, 16 Oct 2025 18:55:27 +0000 (13:55 -0500)] 
drm/amd: Check that VPE has reached DPM0 in idle handler

[Why]
Newer VPE microcode has functionality that will decrease DPM level
only when a workload has run for 2 or more seconds.  If VPE is turned
off before this DPM decrease and the PMFW doesn't reset it when
power gating VPE, the SOC can get stuck with a higher DPM level.

This can happen from amdgpu's ring buffer test because it's a short
quick workload for VPE and VPE is turned off after 1s.

[How]
In the idle handler, besides checking that fences are drained, check the
PMFW version to determine if it will reset DPM when power gating VPE.  If
PMFW will not do this, then check the VPE DPM level. If it is not DPM0,
reschedule the delayed work again until it is.

v2: squash in return fix (Alex)

Cc: Peyton.Lee@amd.com
Reported-by: Sultan Alsawaf <sultan@kerneltoast.com>
Reviewed-by: Sultan Alsawaf <sultan@kerneltoast.com>
Tested-by: Sultan Alsawaf <sultan@kerneltoast.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4615
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
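
The [How] section of the commit above amounts to a small decision
function. The following is a minimal sketch of that logic, not the
driver's code: vpe_idle_decide() and its parameters are hypothetical
names, and rescheduling while fences are still pending is an assumption
about the pre-existing handler rather than something the message states.

```c
#include <stdbool.h>

/* Possible outcomes of one run of the (hypothetical) idle handler. */
enum idle_action {
	IDLE_POWER_GATE,   /* safe to power gate VPE now */
	IDLE_RESCHEDULE,   /* run the delayed work again later */
};

static enum idle_action vpe_idle_decide(unsigned int fences_pending,
					bool pmfw_resets_dpm,
					unsigned int dpm_level)
{
	/* Assumed existing behaviour: wait until all fences are drained. */
	if (fences_pending != 0)
		return IDLE_RESCHEDULE;

	/* PMFW resets DPM while power gating, so no extra wait is needed. */
	if (pmfw_resets_dpm)
		return IDLE_POWER_GATE;

	/* Otherwise only gate once VPE has dropped back to DPM0. */
	if (dpm_level != 0)
		return IDLE_RESCHEDULE;

	return IDLE_POWER_GATE;
}
```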
6 weeks ago drm/amdgpu: Remove unused members in amdgpu_mman
Lijo Lazar [Thu, 16 Oct 2025 13:19:18 +0000 (18:49 +0530)] 
drm/amdgpu: Remove unused members in amdgpu_mman

Discovery related members are now part of amdgpu_discovery_info.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks ago drm/amdgpu: query block error count of ras module
YiPeng Chai [Sat, 11 Oct 2025 08:52:17 +0000 (16:52 +0800)] 
drm/amdgpu: query block error count of ras module

Query block error count of ras module.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks ago drm/amdgpu: Add logic for VF data exchange region to init from dynamic crit_region...
Ellen Pan [Wed, 8 Oct 2025 20:36:50 +0000 (15:36 -0500)] 
drm/amdgpu: Add logic for VF data exchange region to init from dynamic crit_region offsets

1. Added VF logic to init data exchange region using the offsets from dynamic(v2) critical regions;

Signed-off-by: Ellen Pan <yunru.pan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks ago drm/amdgpu: Add logic for VF ipd and VF bios to init from dynamic crit_region offsets
Ellen Pan [Tue, 7 Oct 2025 16:12:39 +0000 (11:12 -0500)] 
drm/amdgpu: Add logic for VF ipd and VF bios to init from dynamic crit_region offsets

1. Added VF logic in amdgpu_virt to init IP discovery using the offsets from dynamic(v2) critical regions;
2. Added VF logic in amdgpu_virt to init bios image using the offsets from dynamic(v2) critical regions;

Signed-off-by: Ellen Pan <yunru.pan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks ago drm/amdgpu: Reuse fw_vram_usage_* for dynamic critical region in SRIOV
Ellen Pan [Wed, 8 Oct 2025 20:01:10 +0000 (15:01 -0500)] 
drm/amdgpu: Reuse fw_vram_usage_* for dynamic critical region in SRIOV

- During guest driver init, as VFs receive the PF msg to
  init dynamic critical region(v2), VFs reuse fw_vram_usage_*
  from ttm to store critical region tables in a 5MB chunk.

Signed-off-by: Ellen Pan <yunru.pan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks ago drm/amdgpu: Introduce SRIOV critical regions v2 during VF init
Ellen Pan [Tue, 7 Oct 2025 16:00:16 +0000 (11:00 -0500)] 
drm/amdgpu: Introduce SRIOV critical regions v2 during VF init

1. Introduced amdgpu_virt_init_critical_region during VF init.
   - VFs use init_data_header_offset and init_data_header_size_kb
     transmitted via PF2VF mailbox to fetch the offset of
     critical regions' offsets/sizes in VRAM and save to
     adev->virt.crit_region_offsets and adev->virt.crit_region_sizes_kb.

Signed-off-by: Ellen Pan <yunru.pan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks ago drm/amdgpu: Add SRIOV crit_region_version support
Ellen Pan [Tue, 7 Oct 2025 14:46:16 +0000 (09:46 -0500)] 
drm/amdgpu: Add SRIOV crit_region_version support

1. Added enum amd_sriov_crit_region_version to support multi versions
2. Added logic in SRIOV mailbox to recognize crit_region version during
   req_gpu_init_data

Signed-off-by: Ellen Pan <yunru.pan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks ago drm/amdgpu: Updated naming of SRIOV critical region offsets/sizes with _V1 suffix
Ellen Pan [Mon, 6 Oct 2025 20:47:45 +0000 (15:47 -0500)] 
drm/amdgpu: Updated naming of SRIOV critical region offsets/sizes with _V1 suffix

- This change prepares for later patches that introduce the _v2 suffix for SRIOV critical regions

Signed-off-by: Ellen Pan <yunru.pan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks ago drm/amdgpu: query bad page info of ras module
YiPeng Chai [Sat, 11 Oct 2025 02:49:55 +0000 (10:49 +0800)] 
drm/amdgpu: query bad page info of ras module

Query bad page info of ras module.

V2:
  Update code to reuse bad page output code.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks ago drm/amdgpu: ras module supports error injection
YiPeng Chai [Tue, 14 Oct 2025 07:30:58 +0000 (15:30 +0800)] 
drm/amdgpu: ras module supports error injection

ras module supports error injection.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks ago drm/amd/ras: Update function and remove redundant code
YiPeng Chai [Fri, 17 Oct 2025 07:27:56 +0000 (15:27 +0800)] 
drm/amd/ras: Update function and remove redundant code

Update function and remove redundant code:
1. Update function to prepare for internal use.
2. Remove unused function code previously prepared
   for ioctl.

V2:
  Update commit message content.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks ago drm/amd/ras: Update ras command context structure name
YiPeng Chai [Thu, 25 Sep 2025 08:12:28 +0000 (16:12 +0800)] 
drm/amd/ras: Update ras command context structure name

Given the actual usage of this structure, it is more
appropriate to call it a context; a structure name
containing ioctl is easily misunderstood.

V2:
  Update commit message content.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks ago drm/amd/display: Promote DC to 3.2.355
Taimur Hassan [Sat, 11 Oct 2025 00:21:13 +0000 (19:21 -0500)] 
drm/amd/display: Promote DC to 3.2.355

This version brings along the following updates:

-Fix GFP_ATOMIC abuse
-Fix several checkpatch issues
-Set DCN32 to use update planes and stream version 3
-Write segment pointer with mot enabled for MST
-Control BW allocation in FW side
-Change clean dsc blocks condition in accelerated mode
-Check disable_fec flag before enabling FEC

Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks ago drm/amd/display: write segment pointer with mot enabled for mst
Meenakshikumar Somasundaram [Wed, 8 Oct 2025 17:35:07 +0000 (13:35 -0400)] 
drm/amd/display: write segment pointer with mot enabled for mst

[Why]
Some mst branches NAK segment pointer writes with mot disabled.
So reset of segment pointer to 0 should be performed with mot enabled.

[How]
Write segment pointer of mst branch devices with mot enabled.

Reviewed-by: Cruise Hung <cruise.hung@amd.com>
Signed-off-by: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks ago drm/amd/display: Control BW allocation in FW side
Cruise Hung [Wed, 8 Oct 2025 06:44:29 +0000 (14:44 +0800)] 
drm/amd/display: Control BW allocation in FW side

[Why]
The BW allocation feature should be controlled in FW side.

[How]
Pass the control bit to FW boot option.

Reviewed-by: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Cruise Hung <Cruise.Hung@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks ago drm/amd/display: Fix misc. checkpatch issues
Ilya Bakoulin [Tue, 7 Oct 2025 20:34:09 +0000 (16:34 -0400)] 
drm/amd/display: Fix misc. checkpatch issues

[Why/How]
Addresses various checkpatch issues related to the HWSS block sequence
function change.

Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Ilya Bakoulin <Ilya.Bakoulin@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks ago drm/amd/display: Change clean dsc blocks condition in accelerated mode
Lewis Huang [Tue, 7 Oct 2025 08:46:59 +0000 (16:46 +0800)] 
drm/amd/display: Change clean dsc blocks condition in accelerated mode

[Why]
On system resume from S4 with the lid closed,
DSC was not cleared because DPMS was already off.

[How]
In accelerated mode, clean up DSC blocks if eDP dpms off is true,
to align the DSC and dpms state when we are not in fast boot and
seamless boot.

Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Lewis Huang <Lewis.Huang@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks ago drm/amd/display: Set DCN32 to use update planes and stream version 3
Nicholas Carbones [Fri, 3 Oct 2025 22:36:18 +0000 (18:36 -0400)] 
drm/amd/display: Set DCN32 to use update planes and stream version 3

[Why]
Old minimal transition does not always wait for updates to complete
before proceeding, which can lead to corruption in multi display
scenarios for DCN32.

[How]
Set DCN32 to use update_planes_and_stream_v3 for better pipe transition
handling.

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Nicholas Carbones <Nicholas.Carbones@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks ago drm/amd/display: Check disable_fec flag before enabling fec.
Meenakshikumar Somasundaram [Tue, 7 Oct 2025 02:02:31 +0000 (22:02 -0400)] 
drm/amd/display: Check disable_fec flag before enabling fec.

[Why]
dc debug option disable_fec was not working.

[How]
Check dc debug option disable_fec flag before
enabling fec in dp_should_enable_fec().

Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks ago drm/amd/display: Fix GFP_ATOMIC abuse
Aurabindo Pillai [Fri, 3 Oct 2025 20:06:53 +0000 (16:06 -0400)] 
drm/amd/display: Fix GFP_ATOMIC abuse

There are a lot of GFP_ATOMIC allocations which are not in interrupt
context. Change them to use GFP_KERNEL instead.

Reviewed-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks ago drm/amdgpu: Enable ras module
YiPeng Chai [Thu, 20 Mar 2025 09:04:14 +0000 (17:04 +0800)] 
drm/amdgpu: Enable ras module

Enable ras module, disabled by default.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks ago drm/amd: Fix set but not used warnings
Tiezhu Yang [Thu, 16 Oct 2025 06:51:17 +0000 (14:51 +0800)] 
drm/amd: Fix set but not used warnings

There are many set but not used warnings under drivers/gpu/drm/amd when
compiling with the latest upstream mainline GCC:

  drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c:305:18: warning: variable ‘p’ set but not used [-Wunused-but-set-variable=]
  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h:103:26: warning: variable ‘internal_reg_offset’ set but not used [-Wunused-but-set-variable=]
  ...
  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h:164:26: warning: variable ‘internal_reg_offset’ set but not used [-Wunused-but-set-variable=]
  ...
  drivers/gpu/drm/amd/amdgpu/../display/dc/dc_dmub_srv.c:445:13: warning: variable ‘pipe_idx’ set but not used [-Wunused-but-set-variable=]
  drivers/gpu/drm/amd/amdgpu/../display/dc/dc_dmub_srv.c:875:21: warning: variable ‘pipe_idx’ set but not used [-Wunused-but-set-variable=]

Remove the variables that are actually unused, or add the __maybe_unused
attribute to the variables that are actually used, to fix them. Compile
tested only.

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks ago drm/amdgpu: Add ras module ip block to amdgpu discovery
YiPeng Chai [Mon, 31 Mar 2025 03:12:58 +0000 (11:12 +0800)] 
drm/amdgpu: Add ras module ip block to amdgpu discovery

Add ras module ip block to amdgpu discovery.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks ago drm/amdgpu: check save count before RAS bad page saving
Tao Zhou [Wed, 15 Oct 2025 08:19:07 +0000 (16:19 +0800)] 
drm/amdgpu: check save count before RAS bad page saving

It's possible that unit_num is larger than 0 but save_count is zero,
since we do get bad page address but the address is invalid. Check
unit_num and save_count together.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Candice Li <candice.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks ago drm/amdgpu: add the kernel docs for alloc/free/valid range
Sunil Khatri [Tue, 14 Oct 2025 07:41:02 +0000 (13:11 +0530)] 
drm/amdgpu: add the kernel docs for alloc/free/valid range

Add kernel docs for the functions related to hmm_range.

Documents added for functions:
amdgpu_hmm_range_valid
amdgpu_hmm_range_alloc
amdgpu_hmm_range_free

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks ago drm/amdgpu: use GPU_HDP_FLUSH for sriov
Victor Zhao [Thu, 9 Oct 2025 02:42:48 +0000 (10:42 +0800)] 
drm/amdgpu: use GPU_HDP_FLUSH for sriov

Currently SRIOV runtime uses kiq to write HDP_MEM_FLUSH_CNTL for
hdp flush. This register needs to be written from the CPU for nbif to be
aware of it, otherwise it will not work.

Implement amdgpu_kiq_hdp_flush and use kiq to do gpu hdp flush during
sriov runtime.

v2:
- fallback to amdgpu_asic_flush_hdp when amdgpu_kiq_hdp_flush failed
- add function amdgpu_mes_hdp_flush

v3:
- changed returned error

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks ago drm/amdgpu: Add kiq hdp flush callbacks
Victor Zhao [Thu, 9 Oct 2025 02:38:28 +0000 (10:38 +0800)] 
drm/amdgpu: Add kiq hdp flush callbacks

Add kiq hdp flush callbacks for gfx ips to support gpu hdp flush when no
ring is present

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks ago drm/amd: Add a helper to tell whether an IP block HW is enabled
Mario Limonciello [Tue, 14 Oct 2025 19:30:35 +0000 (14:30 -0500)] 
drm/amd: Add a helper to tell whether an IP block HW is enabled

There is already a helper for telling if a block is valid, but if
IP handling wants to check if its HW is enabled, no such helper
exists.

Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks ago drm/amdgpu: Fix vram_usage underflow
Alysa Liu [Fri, 10 Oct 2025 21:18:09 +0000 (17:18 -0400)] 
drm/amdgpu: Fix vram_usage underflow

vram_usage was subtracting non-vram memory size,
which caused it to become negative.

Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
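
The underflow described in the commit above can be sketched as follows.
This is an illustration only: enum mem_domain and vram_usage_after_free()
are hypothetical names, and the actual accounting code is not shown in the
commit message. The point is that the counter is only reduced when the
freed buffer really lived in VRAM.

```c
#include <stdint.h>

/* Hypothetical memory domains for this sketch. */
enum mem_domain {
	DOMAIN_VRAM,
	DOMAIN_GTT,
};

/* Returns the VRAM usage counter after freeing a buffer of `size` bytes. */
static uint64_t vram_usage_after_free(uint64_t vram_usage, uint64_t size,
				      enum mem_domain domain)
{
	/* Only VRAM buffers were ever added to the counter. */
	if (domain == DOMAIN_VRAM)
		return vram_usage - size;

	/* Non-VRAM sizes must not be subtracted, or the counter underflows. */
	return vram_usage;
}
```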
6 weeks ago drm/amd/pm: Avoid writing nulls into `pp_od_clk_voltage`
Ilya Zlobintsev [Mon, 13 Oct 2025 16:30:42 +0000 (19:30 +0300)] 
drm/amd/pm: Avoid writing nulls into `pp_od_clk_voltage`

Calling `smu_cmn_get_sysfs_buf` aligns the
offset used by `sysfs_emit_at` to the current page boundary, which was
previously directly returned from the various `print_clk_levels`
implementations to be added to the buffer position.
Instead, only the relative offset showing how much was written
to the buffer should be returned, regardless of how it was changed
for alignment purposes.

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Ilya Zlobintsev <ilya.zlobintsev@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
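
The difference between returning an absolute offset and a relative one,
as described in the commit above, can be shown with a userspace model.
This is a simplified sketch under stated assumptions: emit_at() only
loosely imitates sysfs_emit_at(), the start offset is passed in directly
instead of coming from `smu_cmn_get_sysfs_buf`, and print_levels() and
nul_gap() are hypothetical helpers with made-up clock level strings.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define PAGE_SZ 4096 /* hypothetical page size for this sketch */

/* Writes text at absolute offset `at`; returns the bytes actually written. */
static int emit_at(char *page, int at, const char *text)
{
	int room = PAGE_SZ - at;
	int n;

	if (room <= 0)
		return 0;
	n = snprintf(page + at, (size_t)room, "%s", text);
	if (n < 0)
		return 0;
	return n < room ? n : room - 1;
}

/* Emits two level lines starting at `start`.
 * relative == true:  return only the bytes added (the fixed behaviour).
 * relative == false: return the absolute end offset (the old behaviour). */
static int print_levels(char *page, int start, bool relative)
{
	int size = start;

	size += emit_at(page, size, "0: 500Mhz\n");
	size += emit_at(page, size, "1: 1000Mhz\n");

	return relative ? size - start : size;
}

/* Returns how far the caller's position runs past the real text. */
static int nul_gap(bool relative)
{
	static char page[PAGE_SZ];
	int pos;

	memset(page, 0, sizeof(page));
	pos = emit_at(page, 0, "OD_SCLK:\n");
	pos += print_levels(page, pos, relative);  /* caller adds the result */

	return pos - (int)strlen(page);
}
```

With the relative return value the caller's position matches the end of
the text. With the absolute value the start offset is counted twice, so
the position points past the text and the skipped bytes remain NUL.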
6 weeks ago drm/amdgpu: Use memset32 for IB padding
Tvrtko Ursulin [Thu, 11 Sep 2025 11:41:40 +0000 (12:41 +0100)] 
drm/amdgpu: Use memset32 for IB padding

Use memset32 instead of open coding it, just because it is
that bit nicer.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
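
As an illustration of the change above, padding a buffer with a 32-bit
fill helper instead of an open-coded loop might look like the sketch
below. It assumes userspace, so fill32() is a local stand-in with the same
shape as the kernel's memset32(), and pad_ib() with its parameters is a
hypothetical helper rather than the driver's function.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in shaped like the kernel's memset32(). */
static uint32_t *fill32(uint32_t *s, uint32_t v, size_t count)
{
	for (size_t i = 0; i < count; i++)
		s[i] = v;
	return s;
}

/* Pads an IB holding length_dw dwords up to the next multiple of
 * (align_mask + 1) with nop packets and returns the new length.
 * align_mask + 1 is assumed to be a power of two. */
static uint32_t pad_ib(uint32_t *ib, uint32_t length_dw,
		       uint32_t align_mask, uint32_t nop)
{
	uint32_t pad = (align_mask + 1 - (length_dw & align_mask)) & align_mask;

	fill32(&ib[length_dw], nop, pad);
	return length_dw + pad;
}
```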
6 weeks ago drm/amd/display: Promote DC to 3.2.354
Taimur Hassan [Fri, 3 Oct 2025 23:31:30 +0000 (18:31 -0500)] 
drm/amd/display: Promote DC to 3.2.354

Display Core v3.2.354 release highlights:

* DCN35 dispclk, dppclk & other fixes
* DCN401 cursor offload fix
* Add new block sequence-building/executing functions
* null ptr fixes
* DPIA hpd fix
* debug improvements
* Fix performance regression from full updates
* Firmware Release 0.1.31.0

Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks ago drm/amd/display: [FW Promotion] Release 0.1.31.0
Taimur Hassan [Fri, 3 Oct 2025 20:19:27 +0000 (16:19 -0400)] 
drm/amd/display: [FW Promotion] Release 0.1.31.0

Release highlights:

DCN35/351/36:
* fix video lag with replay
* DPP DTO programming sequence fix
* IPS exit programming sequence fix

DCN 3.1.5:
* fix video lag with replay

Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks ago drm/amd/display: add new block sequence-building/executing functions
Ilya Bakoulin [Wed, 3 Sep 2025 18:07:41 +0000 (14:07 -0400)] 
drm/amd/display: add new block sequence-building/executing functions

[Why/How]
Create functions for building/executing HW block programming steps

Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Ilya Bakoulin <Ilya.Bakoulin@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks ago drm/amd/display: add additional hdcp traces
Wenjing Liu [Thu, 2 Oct 2025 15:02:39 +0000 (11:02 -0400)] 
drm/amd/display: add additional hdcp traces

[why]
Current hdcp trace only tracks hdcp errors. We need to expand the trace
structure for more tracing information.

[how]
Add following traces for hdcp1:
- attempt_count
- downstream_device_count
Add following traces for hdcp2:
- attempt_count
- downstream_device_count
- hdcp1_device_downstream
- hdcp2_legacy_device_downstream

Reviewed-by: Sung Lee <sung.lee@amd.com>
Signed-off-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks ago drm/amd/display: Fix performance regression from full updates
Dominik Kaszewski [Fri, 3 Oct 2025 09:50:55 +0000 (11:50 +0200)] 
drm/amd/display: Fix performance regression from full updates

[Why]
full_update_required is too strict at update_planes_and_stream_state,
causing a performance regression due to too many updates being full.

[How]
* Carve out weak version of full_update_required for use inside
update_planes_and_stream_state.

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Dominik Kaszewski <dominik.kaszewski@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks ago drm/amd/display: Remove dc state from check_update
Dominik Kaszewski [Tue, 15 Jul 2025 12:02:40 +0000 (14:02 +0200)] 
drm/amd/display: Remove dc state from check_update

[Why]
dc_check_update_surfaces_for_stream should not have access to entire
DC, especially not a mutable one. Concurrent checks should be able
to run independently of one another, without risk of changing state.

[How]
* Remove access to dc state other than debug and capacity.
* Move some checks from DC to DM caller.

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Dominik Kaszewski <dominik.kaszewski@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks ago drm/amd/display: use GFP_NOWAIT for allocation in interrupt handler
Aurabindo Pillai [Thu, 25 Sep 2025 14:23:59 +0000 (10:23 -0400)] 
drm/amd/display: use GFP_NOWAIT for allocation in interrupt handler

schedule_dc_vmin_vmax() is called by dm_crtc_high_irq(). Hence, we
cannot have the former sleep. Use GFP_NOWAIT for allocation in this
function.

Fixes: c210b757b400 ("drm/amd/display: fix dmub access race condition")
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Sun peng (Leo) Li <sunpeng.li@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks ago drm/amd/display: Add sink/link debug logs
Aurabindo Pillai [Mon, 29 Sep 2025 20:06:28 +0000 (16:06 -0400)] 
drm/amd/display: Add sink/link debug logs

Add some extra logs to better help triage blackscreen issues.

* Dump all the links to see if they have sinks associated.
* Print the edid manufacturer & product id associated with a stream that
  was just created.

Reviewed-by: Jerry Zuo <jerry.zuo@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks ago drm/amd/display: Move all DCCG RCG into HWSS root_clock_control
Ovidiu Bunea [Thu, 2 Oct 2025 21:47:36 +0000 (17:47 -0400)] 
drm/amd/display: Move all DCCG RCG into HWSS root_clock_control

[why & how]
Enabling/disabling DCCG RCG should be done as a last-level step when
enabling/disabling blocks. This is handled by HWSS root_clock_control
already during optimize_bandwidth.
However, dccg35_dpp_root_clock_control was missing the RCG enable
call on the disable path.

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Ovidiu Bunea <ovidiu.bunea@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks ago drm/amd/display: update perfmon measurement interfaces
Wenjing Liu [Thu, 2 Oct 2025 18:20:02 +0000 (14:20 -0400)] 
drm/amd/display: update perfmon measurement interfaces

[how]
This commit updates the interfaces for dchubbub perfmon measurement to
better reflect our requirements.

Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks ago drm/amd/display: fix dppclk rcg poweron check
Yihan Zhu [Fri, 26 Sep 2025 14:07:46 +0000 (10:07 -0400)] 
drm/amd/display: fix dppclk rcg poweron check

[WHY & HOW]
A dppclk rcg power down flips the poweron flag in the cache, so that in
some conditions dppclk rcg never runs the rcg ungate sequence. Wait 10us
to let dpp dto fully ramp.

Reviewed-by: Ovidiu (Ovi) Bunea <ovidiu.bunea@amd.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Yihan Zhu <Yihan.Zhu@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks ago drm/amd/display: not skip hpd irq for bw alloc mode
Peichen Huang [Tue, 30 Sep 2025 05:39:02 +0000 (13:39 +0800)] 
drm/amd/display: not skip hpd irq for bw alloc mode

[WHY]
Driver only processes hpd irq for a branch device or when
the link is established. This causes some irqs for bw_alloc
mode of dp tunneling to be ignored.

[HOW]
Driver should process hpd irq if bw_alloc and dp tunneling
are enabled.

Reviewed-by: Cruise Hung <cruise.hung@amd.com>
Signed-off-by: Peichen Huang <PeiChen.Huang@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
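
The condition described in the commit above can be written as a single
predicate. This is a minimal sketch, assuming hypothetical flag names;
should_handle_hpd_irq() is not a function from the driver.

```c
#include <stdbool.h>

/* Decides whether an hpd irq should be processed (hypothetical helper). */
static bool should_handle_hpd_irq(bool is_branch_device,
				  bool link_established,
				  bool bw_alloc_enabled,
				  bool dp_tunneling_enabled)
{
	/* Cases that were already handled before the change. */
	if (is_branch_device || link_established)
		return true;

	/* New case: bw_alloc mode of dp tunneling must not be skipped. */
	return bw_alloc_enabled && dp_tunneling_enabled;
}
```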
6 weeks ago drm/amd/display: Update spacing in struct
Alvin Lee [Wed, 1 Oct 2025 17:40:57 +0000 (13:40 -0400)] 
drm/amd/display: Update spacing in struct

Update spacing so that fields with longer names will
still be aligned correctly (new fields to be added).

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alvin Lee <Alvin.Lee2@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks ago drm/amd/display: Update DCN401 path for cursor offload
Alvin Lee [Tue, 30 Sep 2025 21:28:54 +0000 (17:28 -0400)] 
drm/amd/display: Update DCN401 path for cursor offload

[Description]
The DCN401 cursor offload path needs to take into account
use_mall_for_cursor, and also needs to ensure the dcn32
function assigns the cursor cache fields (DCN401 uses the
dcn32 implementation).

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alvin Lee <Alvin.Lee2@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks ago drm/amd/display: increase max link count and fix link->enc NULL pointer access
Charlene Liu [Tue, 30 Sep 2025 00:29:30 +0000 (20:29 -0400)] 
drm/amd/display: increase max link count and fix link->enc NULL pointer access

[why]
1.) dc->links[MAX_LINKS] array size smaller than actual requested.
max_connector + max_dpia + 4 virtual = 14.
increase from 12 to 14.

2.) hw_init() access null LINK_ENC for dpia non display_endpoint.

Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com>
Reviewed-by: Chris Park <chris.park@amd.com>
Signed-off-by: Charlene Liu <Charlene.Liu@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks ago drm/amd/display: Rework HDMI data channel reads
Relja Vojvodic [Wed, 24 Sep 2025 13:33:35 +0000 (09:33 -0400)] 
drm/amd/display: Rework HDMI data channel reads

Fix the HDMI data channel reads to respect scdc_present field
to pass compliance test.

Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Relja Vojvodic <rvojvodi@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks ago drm/amd/display: fix typo in display_mode_core_structs.h
Adi Gollamudi [Sun, 12 Oct 2025 19:13:19 +0000 (12:13 -0700)] 
drm/amd/display: fix typo in display_mode_core_structs.h

Fix a typo in a comment, change "enviroment" to "environment" in
drivers/gpu/drm/amd/display/dc/dml2/display_mode_core_structs.h

Signed-off-by: Aditya Gollamudi <adigollamudi@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks ago drm/amd/display: add dccg dfs mask def
Charlene Liu [Mon, 29 Sep 2025 19:15:13 +0000 (15:15 -0400)] 
drm/amd/display: add dccg dfs mask def

[why]
add some register masks for DCCG

Reviewed-by: Yihan Zhu <yihan.zhu@amd.com>
Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Charlene Liu <Charlene.Liu@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amd/display: Remove unused field in DML
Alvin Lee [Mon, 29 Sep 2025 18:47:51 +0000 (14:47 -0400)] 
drm/amd/display: Remove unused field in DML

Remove unused fields.

Reviewed-by: Austin Zheng <austin.zheng@amd.com>
Signed-off-by: Alvin Lee <Alvin.Lee2@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amd/display: Fix NULL pointer dereference
Meenakshikumar Somasundaram [Mon, 29 Sep 2025 18:28:34 +0000 (14:28 -0400)] 
drm/amd/display: Fix NULL pointer dereference

[Why]
On an MST branch with a multi-display setup, the dc context is obsolete
after updating the first stream. Referencing the same dc context
for the next stream update to fetch the dc pointer leads to a NULL
pointer dereference.

[How]
Get the dc pointer from the link rather than context.

Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Signed-off-by: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amd/display: add dispclk ramping to dcn35.
Charlene Liu [Fri, 26 Sep 2025 19:51:15 +0000 (15:51 -0400)] 
drm/amd/display: add dispclk ramping to dcn35.

[why]
This logic is required by the HW programming guide.
Tested/ported on dcn401.

Reviewed-by: Yihan Zhu <yihan.zhu@amd.com>
Signed-off-by: Charlene Liu <Charlene.Liu@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amd/display: Add debug option to override EASF scaler taps
Samson Tam [Thu, 25 Sep 2025 19:01:23 +0000 (15:01 -0400)] 
drm/amd/display: Add debug option to override EASF scaler taps

[Why & How]
Add a new option, override_easf, to use in_taps instead of the internal
taps policy for debugging.

Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Samson Tam <Samson.Tam@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amd/display: fix duplicate aux command with AMD aux backlight
Harry VanZyllDeJong [Wed, 17 Sep 2025 20:46:13 +0000 (16:46 -0400)] 
drm/amd/display: fix duplicate aux command with AMD aux backlight

When using AMD aux backlight control, avoid sending backlight
update commands to DMUB firmware, because the backlight is controlled
by aux commands in the driver.

Reviewed-by: Iswara Nagulendran <iswara.nagulendran@amd.com>
Reviewed-by: Aric Cyr <aric.cyr@amd.com>
Signed-off-by: Harry VanZyllDeJong <hvanzyll@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amdgpu: Add ras module eeprom safety watermark check
YiPeng Chai [Wed, 26 Mar 2025 10:03:49 +0000 (18:03 +0800)] 
drm/amdgpu: Add ras module eeprom safety watermark check

Add ras module eeprom safety watermark check.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amdgpu: Avoid hive seqno increment in legacy ras
YiPeng Chai [Tue, 25 Mar 2025 06:11:10 +0000 (14:11 +0800)] 
drm/amdgpu: Avoid hive seqno increment in legacy ras

The hive->event_mgr variable is used by both the ras module
and legacy ras. To keep hive seqno growth continuous,
legacy ras must not operate on the event_mgr variable
once the ras module is enabled.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amdgpu: Add poison consumption sequence numbers for gfx and sdma
YiPeng Chai [Mon, 24 Mar 2025 10:20:17 +0000 (18:20 +0800)] 
drm/amdgpu: Add poison consumption sequence numbers for gfx and sdma

Add poison consumption sequence numbers for
gfx and sdma.

V3:
  Use RAS_EVENT_LOG to print ras log info.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amdgpu: Avoid loading bad pages into legacy ras
YiPeng Chai [Mon, 21 Jul 2025 07:15:53 +0000 (15:15 +0800)] 
drm/amdgpu: Avoid loading bad pages into legacy ras

When ras module is enabled, the bad pages will
be loaded by ras module.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amdgpu: add ras module rma check
YiPeng Chai [Tue, 17 Jun 2025 07:16:59 +0000 (15:16 +0800)] 
drm/amdgpu: add ras module rma check

Add ras module rma check.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amdgpu: Improve ras fatal error handling function
YiPeng Chai [Mon, 21 Jul 2025 07:14:03 +0000 (15:14 +0800)] 
drm/amdgpu: Improve ras fatal error handling function

In multi-gpu case, a fatal error will generate several
fatal error interrupts. After improving this function,
the ras module can reuse this function to only
handle the first interrupt.

V3:
  Initialize event_id using RAS_EVENT_INVALID_ID.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6 weeks agodrm/amdgpu: Intercept ras interrupts to ras module
YiPeng Chai [Sun, 28 Sep 2025 06:25:27 +0000 (14:25 +0800)] 
drm/amdgpu: Intercept ras interrupts to ras module

Intercept ras interrupts to ras module.

V2:
  Change function names in ras module.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd: Drop calls to restore power limit and clock from smu_resume()
Mario Limonciello [Thu, 9 Oct 2025 20:59:07 +0000 (15:59 -0500)] 
drm/amd: Drop calls to restore power limit and clock from smu_resume()

User requested power limits and clock settings are already restored as
part of smu_restore_dpm_user_profile(). It's unnecessary to call the
same restore as part of smu_resume().

Revert the following commits to drop that extra restore:
commit ed4efe426a49 ("drm/amd: Restore cached power limit during resume")
commit 796ff8a7e01b ("drm/amd: Restore cached manual clock settings during resume")
commit f9b80514a722 ("drm/amd: Only restore cached manual clock settings in restore if OD enabled")

Suggested-by: Lijo Lazar <Lijo.Lazar@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amdgpu: update remove after reset flag for MES remove queue
Jonathan Kim [Wed, 18 Jun 2025 14:45:55 +0000 (10:45 -0400)] 
drm/amdgpu: update remove after reset flag for MES remove queue

Remove queue after reset flag is required to remove a queue that has
been successfully reset to clean up the MES' internal state.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amdgpu: Add ras module files into amdgpu
YiPeng Chai [Tue, 30 Sep 2025 02:47:49 +0000 (10:47 +0800)] 
drm/amdgpu: Add ras module files into amdgpu

Add ras module files into amdgpu.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amdgpu/userqueue: validate userptrs for userqueues
Sunil Khatri [Thu, 25 Sep 2025 09:01:54 +0000 (14:31 +0530)] 
drm/amdgpu/userqueue: validate userptrs for userqueues

Userptrs can be changed by the user at any time, so
validate all the userptr bos while locking all the bos,
before the GPU starts processing.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amdgpu: update the functions to use amdgpu version of hmm
Sunil Khatri [Fri, 10 Oct 2025 12:39:57 +0000 (18:09 +0530)] 
drm/amdgpu: update the functions to use amdgpu version of hmm

At times we need a bo reference for hmm. For that, add
a new struct amdgpu_hmm_range which holds an optional
bo member and an hmm_range.

Use amdgpu_hmm_range instead of hmm_range, and make the bo
an optional argument so that callers can either have the
bo reference taken for them or handle that explicitly.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amdgpu: Reserve discovery TMR only if needed
Lijo Lazar [Fri, 10 Oct 2025 11:53:33 +0000 (17:23 +0530)] 
drm/amdgpu: Reserve discovery TMR only if needed

For legacy SOCs, discovery binary is sideloaded. Instead of checking for
binary blob, use a flag to determine if discovery region needs to be
reserved.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/pm: export a function amdgpu_smu_ras_send_msg to allow send msg directly
YiPeng Chai [Tue, 30 Sep 2025 02:46:05 +0000 (10:46 +0800)] 
drm/amd/pm: export a function amdgpu_smu_ras_send_msg to allow send msg directly

Provide an interface that allows a ras client to send messages to smu/pmfw directly.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/pm: Grant interface access after full init
Lijo Lazar [Wed, 8 Oct 2025 07:37:13 +0000 (13:07 +0530)] 
drm/amd/pm: Grant interface access after full init

Allow access to user interfaces like sysfs/hwmon only after full
initialization of the device. When device is part of XGMI hive and a
reset is required during initialization, the interface files will be
created as part of minimal device initialization. Full initialization of
the device will be done only after all devices in XGMI hive are probed
and a reset is done together on all.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amdgpu: Move reset-on-init sequence earlier
Lijo Lazar [Tue, 7 Oct 2025 13:00:24 +0000 (18:30 +0530)] 
drm/amdgpu: Move reset-on-init sequence earlier

Complete the reset-on-init sequence before sysfs interfaces are created.
Devices are properly initialized only after reset, and sysfs
interfaces should be made available only then.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amdgpu: Add amdgpu_discovery_info
Lijo Lazar [Fri, 10 Oct 2025 11:42:52 +0000 (17:12 +0530)] 
drm/amdgpu: Add amdgpu_discovery_info

Add amdgpu_discovery_info structure to keep all discovery related
information.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amdgpu: Reorganize sysfs ini/fini calls
Lijo Lazar [Tue, 7 Oct 2025 12:42:04 +0000 (18:12 +0530)] 
drm/amdgpu: Reorganize sysfs ini/fini calls

Aggregate sysfs ini/fini calls into separate functions. No functional
change.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amdgpu: clean up and unify hw fence handling
Alex Deucher [Wed, 27 Aug 2025 15:34:14 +0000 (11:34 -0400)] 
drm/amdgpu: clean up and unify hw fence handling

Decouple the amdgpu fence from the amdgpu_job structure.
This lets us clean up the separate fence ops for the embedded
fence and other fences.  This also allows us to allocate the
vm fence up front when we allocate the job.

v2: Additional cleanup suggested by Christian
v3: Additional cleanups suggested by Christian
v4: Additional cleanups suggested by David and
    vm fence fix
v5: cast seqno (David)

Cc: David.Wu3@amd.com
Cc: christian.koenig@amd.com
Tested-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd: Save and restore all limit types
Mario Limonciello [Thu, 9 Oct 2025 20:59:06 +0000 (15:59 -0500)] 
drm/amd: Save and restore all limit types

Vangogh has separate limits for default PPT and fast PPT. Add
infrastructure to save both of these limits and restore both of them.

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd: Remove second call to set_power_limit()
Mario Limonciello [Thu, 9 Oct 2025 20:59:05 +0000 (15:59 -0500)] 
drm/amd: Remove second call to set_power_limit()

The min/max limits only make sense for default PPT. Restructure
smu_set_power_limit() to only use them in that case.

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd: Stop overloading power limit with limit type
Mario Limonciello [Thu, 9 Oct 2025 20:59:04 +0000 (15:59 -0500)] 
drm/amd: Stop overloading power limit with limit type

When passed around internally the upper 8 bits of power limit include
the limit type. This is non-obvious without digging into the nuances
of each function. Instead pass the limit type as an argument to all
applicable layers.

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
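
The encoding described in "drm/amd: Stop overloading power limit with limit
type" can be illustrated with a small standalone sketch. It contrasts a 32-bit
value that carries the limit type in its upper 8 bits with an interface that
passes the type explicitly. All names here (LIMIT_TYPE_SHIFT, enum limit_type,
pack_limit(), struct limit_request and so on) are hypothetical and chosen for
illustration; they are not the driver's identifiers, and the 24-bit value width
is simply what "upper 8 bits" of a 32-bit word implies.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration only; not taken from the driver. */
#define LIMIT_TYPE_SHIFT 24u
#define LIMIT_VALUE_MASK ((1u << LIMIT_TYPE_SHIFT) - 1u)

enum limit_type {
	LIMIT_DEFAULT = 0,
	LIMIT_FAST = 1,
};

/* Overloaded style: the type rides in the upper 8 bits of the value. */
static uint32_t pack_limit(enum limit_type type, uint32_t value)
{
	/* The value must fit in the lower 24 bits or it corrupts the type. */
	assert(value <= LIMIT_VALUE_MASK);
	return ((uint32_t)type << LIMIT_TYPE_SHIFT) | value;
}

/* Every consumer has to know about the hidden encoding to decode it. */
static enum limit_type packed_type(uint32_t packed)
{
	return (enum limit_type)(packed >> LIMIT_TYPE_SHIFT);
}

static uint32_t packed_value(uint32_t packed)
{
	return packed & LIMIT_VALUE_MASK;
}

/* Explicit style: the type is a separate field, nothing to decode. */
struct limit_request {
	enum limit_type type;
	uint32_t value;
};

static struct limit_request make_request(enum limit_type type, uint32_t value)
{
	struct limit_request req = { .type = type, .value = value };
	return req;
}
```

With the explicit form a reader of any intermediate function can see which
limit is being set without knowing the bit layout, which is the readability
problem the commit message describes.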
7 weeks agodrm/amdgpu/userq: drop VCN and VPE doorbell handling
Alex Deucher [Wed, 8 Oct 2025 19:07:53 +0000 (15:07 -0400)] 
drm/amdgpu/userq: drop VCN and VPE doorbell handling

VCN and VPE userqs are not yet supported and this code is
not correct.  Userspace should provide the correct
doorbell offset within their doorbell page for the IP.
Adjusting it here will not work as expected as userspace
and the queue itself will have different offsets.

We need to add an INFO IOCTL query to get the offset and
range for each IP within the doorbell page to handle this
properly.

Cc: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Reviewed-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd: Pass userq suspend failures up to caller
Mario Limonciello [Thu, 2 Oct 2025 17:42:45 +0000 (12:42 -0500)] 
drm/amd: Pass userq suspend failures up to caller

If a userq failed to suspend the rest of the suspend sequence may
have problems.  Pass the error code up to the caller for a decision
on what to do.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd: Fix error handling with multiple userq IDRs
Mario Limonciello [Thu, 2 Oct 2025 17:42:44 +0000 (12:42 -0500)] 
drm/amd: Fix error handling with multiple userq IDRs

If multiple userq IDR are in use and there is an error handling one
at suspend or resume it will be silently discarded.
Switch the suspend/resume() code to use guards and return immediately.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
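
A minimal userspace sketch of the failure mode named in "drm/amd: Fix error
handling with multiple userq IDRs": when a loop keeps only the last return
code, an earlier error is silently discarded, whereas returning immediately
preserves it. This is an assumption about how such an error can get lost, not
a copy of the driver code. The function names, the fake_suspend() callback and
the -22 error value are hypothetical, and the scope-based lock guards mentioned
in the commit message are not modeled here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-manager operation: fails only for id 2. */
static int fake_suspend(int id)
{
	return id == 2 ? -22 : 0;
}

/* Lossy pattern: each iteration overwrites ret, so only the
 * result of the last manager is reported to the caller. */
static int suspend_all_lossy(const int *ids, size_t n, int (*op)(int))
{
	int ret = 0;

	for (size_t i = 0; i < n; i++)
		ret = op(ids[i]);
	return ret;
}

/* Strict pattern: stop at the first failure and report it. */
static int suspend_all_strict(const int *ids, size_t n, int (*op)(int))
{
	for (size_t i = 0; i < n; i++) {
		int ret = op(ids[i]);

		if (ret)
			return ret;
	}
	return 0;
}
```

For the ids 1, 2, 3 the lossy loop reports success because the last manager
succeeded, while the strict loop reports the failure from the second one.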
7 weeks agodrm/amd: Pass IP suspend errors up to callers
Mario Limonciello [Thu, 2 Oct 2025 17:42:43 +0000 (12:42 -0500)] 
drm/amd: Pass IP suspend errors up to callers

If IP suspend fails the callers should be notified so that they can
potentially react.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd: Don't always set IP block HW status to false
Mario Limonciello [Thu, 2 Oct 2025 17:42:42 +0000 (12:42 -0500)] 
drm/amd: Don't always set IP block HW status to false

amdgpu_device_ip_suspend_phase2() calls amdgpu_ip_block_suspend()
which already sets HW block status to false when succeeding with
IP suspend. Remove the explicit call in
amdgpu_device_ip_suspend_phase2() so that the status is accurate.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
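
The reasoning in "drm/amd: Don't always set IP block HW status to false" can be
sketched in standalone C as follows. The helper clears the status flag only
when the suspend callback succeeds, so a caller that clears it unconditionally
afterwards would hide a failed suspend. The struct, field and function names
below are hypothetical stand-ins, not the amdgpu definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an IP block; not the amdgpu structure. */
struct ip_block {
	bool hw_active;
	int (*suspend)(struct ip_block *blk);
};

static int fake_suspend_ok(struct ip_block *blk)
{
	(void)blk;
	return 0;
}

static int fake_suspend_fail(struct ip_block *blk)
{
	(void)blk;
	return -16;
}

/* Helper: the status is cleared only when the suspend succeeded. */
static int block_suspend(struct ip_block *blk)
{
	int ret = blk->suspend(blk);

	if (ret)
		return ret;
	blk->hw_active = false;
	return 0;
}

/* Caller: relies on the helper for the status and passes the error up.
 * An extra unconditional "blk->hw_active = false" here would mark a
 * block as suspended even though its suspend callback failed. */
static int suspend_phase(struct ip_block *blk)
{
	return block_suspend(blk);
}
```

After a failed suspend the flag stays set, so later code can still tell that
the block's hardware was not shut down.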
7 weeks agodrm/amd: Remove comment about handling errors in amdgpu_device_ip_suspend_phase1()
Mario Limonciello [Thu, 2 Oct 2025 17:42:41 +0000 (12:42 -0500)] 
drm/amd: Remove comment about handling errors in amdgpu_device_ip_suspend_phase1()

Error handling was introduced in commit e095026f0066 ("drm/amdgpu:
validate suspend before function call") so the comment about TODO is no
longer needed.

Fixes: e095026f0066 ("drm/amdgpu: validate suspend before function call")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd: Stop exporting amdgpu_device_ip_suspend() outside amdgpu_device
Mario Limonciello [Thu, 2 Oct 2025 17:42:40 +0000 (12:42 -0500)] 
drm/amd: Stop exporting amdgpu_device_ip_suspend() outside amdgpu_device

amdgpu_device_ip_suspend() doesn't have a caller outside of
amdgpu_device.c. Make it static.

No intended functional changes.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd: Unify shutdown() callback behavior
Mario Limonciello [Thu, 2 Oct 2025 17:42:39 +0000 (12:42 -0500)] 
drm/amd: Unify shutdown() callback behavior

[Why]
The shutdown() callback uses amdgpu_ip_suspend() which doesn't notify
drm clients during shutdown.  This could lead to hangs.

[How]
Change amdgpu_pci_shutdown() to call the same sequence as suspend/resume.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amdgpu: validate userq va for GEM unmap
Prike Liang [Fri, 19 Sep 2025 07:14:41 +0000 (15:14 +0800)] 
drm/amdgpu: validate userq va for GEM unmap

When a user unmaps a userq VA, the driver must ensure
the queue has no in-flight jobs. If there is pending work,
the kernel should wait for the attached eviction (bookkeeping)
fence to signal before deleting the mapping.

Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amdgpu: validate the queue va for resuming the queue
Prike Liang [Thu, 9 Oct 2025 08:45:27 +0000 (16:45 +0800)] 
drm/amdgpu: validate the queue va for resuming the queue

Validate that the userq VA is mapped before
trying to resume the queue.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amdgpu: keeping waiting userq fence infinitely
Prike Liang [Tue, 22 Jul 2025 05:43:51 +0000 (13:43 +0800)] 
drm/amdgpu: keeping waiting userq fence infinitely

Keep waiting on the userq fence indefinitely until a hang
is detected, then suspend the hung queue and
set the fence error.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amdgpu: track the userq bo va for its obj management
Prike Liang [Thu, 9 Oct 2025 08:44:31 +0000 (16:44 +0800)] 
drm/amdgpu: track the userq bo va for its obj management

Track the userq obj for its lifetime, and reference and
dereference the buffer flag when it is created and
destroyed.

Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amdgpu: add userq object va track helpers
Prike Liang [Mon, 29 Sep 2025 05:52:13 +0000 (13:52 +0800)] 
drm/amdgpu: add userq object va track helpers

Add the userq object virtual address list_add() helpers
for tracking the userq obj va address usage.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amdgpu: reduce queue timeout to 2 seconds v2
Christian König [Thu, 25 Sep 2025 10:09:56 +0000 (12:09 +0200)] 
drm/amdgpu: reduce queue timeout to 2 seconds v2

There have been multiple complaints that 10 seconds is usually too long.

The original requirement for a longer timeout came from compute tests on
AMDVLK. Since that is no longer a topic, reduce the timeout back to 2
seconds for all queues.

While at it also remove any special handling for compute queues under
SRIOV or pass through.

v2: fix checkpatch warning.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd: Remove some unncessary header includes
Mario Limonciello [Wed, 1 Oct 2025 18:03:33 +0000 (13:03 -0500)] 
drm/amd: Remove some unncessary header includes

Unnecessary headers can slow down the build, drop em.

No intended functional changes.

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Tested-by: Robert Beckett <bob.beckett@collabora.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd: Adjust whitespace for vangogh_ppt
Mario Limonciello [Mon, 6 Oct 2025 16:16:20 +0000 (11:16 -0500)] 
drm/amd: Adjust whitespace for vangogh_ppt

A few changes have more whitespace than needed.  Clean them up.

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Tested-by: Robert Beckett <bob.beckett@collabora.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amdgpu/mes: adjust the VMID masks
Alex Deucher [Mon, 6 Oct 2025 18:23:59 +0000 (14:23 -0400)] 
drm/amdgpu/mes: adjust the VMID masks

The firmware limits the max vmid, but align the
settings with the hw limits as well just to be safe.

Reviewed-by: Shaoyun liu <Shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amdgpu: Skip SDMA suspend during mode-2 reset
Lijo Lazar [Wed, 8 Oct 2025 04:56:47 +0000 (10:26 +0530)] 
drm/amdgpu: Skip SDMA suspend during mode-2 reset

For SDMA IP versions >= v4.4.2, firmware will take care of quiescing SDMA
before mode-2 reset.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amdgpu: remove gart_window_lock usage from gmc v12
Pierre-Eric Pelloux-Prayer [Mon, 15 Sep 2025 12:25:44 +0000 (14:25 +0200)] 
drm/amdgpu: remove gart_window_lock usage from gmc v12

This lock was part of the SDMA workaround originally implemented in
gmc_v10_0_flush_gpu_tlb (a70cb2176f7ef6f moved it to
amdgpu_gmc_flush_gpu_tlb).

This means this lock is useless and can be safely dropped.

Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>