git.ipfire.org Git - thirdparty/kernel/linux.git/log
Adi Gollamudi [Sun, 12 Oct 2025 19:13:19 +0000 (12:13 -0700)] 
drm/amd/display: fix typo in display_mode_core_structs.h

Fix a typo in a comment: change "enviroment" to "environment" in
drivers/gpu/drm/amd/display/dc/dml2/display_mode_core_structs.h

Signed-off-by: Aditya Gollamudi <adigollamudi@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Charlene Liu [Mon, 29 Sep 2025 19:15:13 +0000 (15:15 -0400)] 
drm/amd/display: add dccg dfs mask def

[why]
Add DFS register mask definitions for DCCG.

Reviewed-by: Yihan Zhu <yihan.zhu@amd.com>
Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Charlene Liu <Charlene.Liu@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alvin Lee [Mon, 29 Sep 2025 18:47:51 +0000 (14:47 -0400)] 
drm/amd/display: Remove unused field in DML

Remove unused fields.

Reviewed-by: Austin Zheng <austin.zheng@amd.com>
Signed-off-by: Alvin Lee <Alvin.Lee2@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Meenakshikumar Somasundaram [Mon, 29 Sep 2025 18:28:34 +0000 (14:28 -0400)] 
drm/amd/display: Fix NULL pointer dereference

[Why]
On an MST branch with a multi-display setup, the dc context becomes
obsolete after the first stream is updated. Using that same dc context
to fetch the dc pointer for the next stream update leads to a NULL
pointer dereference.

[How]
Get the dc pointer from the link rather than from the context.
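A minimal sketch of the pattern, using hypothetical and heavily simplified stand-ins for the DC structures (the real structs carry many more fields, and the exact pointer path used by the fix is an assumption here): the dc pointer is reached through the long-lived link instead of a per-update state that may already be obsolete.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the DC structures. */
struct dc { int id; };
struct dc_context { struct dc *dc; };
struct dc_link { struct dc_context *ctx; };  /* lives as long as the link */
struct dc_state_sketch { struct dc *dc; };   /* replaced on stream updates */

/* Old pattern: the state may already be obsolete after the first update. */
struct dc *dc_from_state(const struct dc_state_sketch *state)
{
	return state ? state->dc : NULL;
}

/* Fixed pattern: go through the link, which outlives each update. */
struct dc *dc_from_link(const struct dc_link *link)
{
	return (link && link->ctx) ? link->ctx->dc : NULL;
}
```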

Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Signed-off-by: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Charlene Liu [Fri, 26 Sep 2025 19:51:15 +0000 (15:51 -0400)] 
drm/amd/display: add dispclk ramping to dcn35.

[why]
This logic is required by the HW programming guide.
It has been tested/ported on dcn401.

Reviewed-by: Yihan Zhu <yihan.zhu@amd.com>
Signed-off-by: Charlene Liu <Charlene.Liu@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Samson Tam [Thu, 25 Sep 2025 19:01:23 +0000 (15:01 -0400)] 
drm/amd/display: Add debug option to override EASF scaler taps

[Why & How]
Add a new debug option, override_easf, that uses in_taps instead of
the internal taps policy.
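A sketch of how such an override might gate the taps choice. override_easf and in_taps come from the commit; struct debug_options_sketch, select_taps() and policy_taps are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical debug options; only override_easf is named in the commit. */
struct debug_options_sketch { bool override_easf; };

/* Pick the scaler taps: the caller's in_taps when the debug override is
 * set, otherwise whatever the internal policy computed. */
unsigned int select_taps(const struct debug_options_sketch *dbg,
			 unsigned int in_taps, unsigned int policy_taps)
{
	return dbg->override_easf ? in_taps : policy_taps;
}
```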

Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Samson Tam <Samson.Tam@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Harry VanZyllDeJong [Wed, 17 Sep 2025 20:46:13 +0000 (16:46 -0400)] 
drm/amd/display: fix duplicate aux command with AMD aux backlight

When AMD aux backlight control is in use, avoid sending backlight
update commands to the DMUB firmware, because the backlight is already
controlled by aux commands from the driver.
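A sketch of the intended behavior, assuming hypothetical names (struct panel_sketch, set_backlight_sketch()) and counting commands instead of sending them: with AMD aux backlight control, exactly one aux command goes out and no DMUB command does.

```c
#include <stdbool.h>

/* Hypothetical panel description and command counters. */
struct panel_sketch { bool amd_aux_backlight; };
struct bl_counters { int aux_cmds; int dmub_cmds; };

void set_backlight_sketch(const struct panel_sketch *p, struct bl_counters *c)
{
	if (p->amd_aux_backlight) {
		/* The driver drives the backlight over aux itself, so a
		 * DMUB command here would duplicate the update. */
		c->aux_cmds++;
		return;
	}
	c->dmub_cmds++;
}
```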

Reviewed-by: Iswara Nagulendran <iswara.nagulendran@amd.com>
Reviewed-by: Aric Cyr <aric.cyr@amd.com>
Signed-off-by: Harry VanZyllDeJong <hvanzyll@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
YiPeng Chai [Wed, 26 Mar 2025 10:03:49 +0000 (18:03 +0800)] 
drm/amdgpu: Add ras module eeprom safety watermark check

Add ras module eeprom safety watermark check.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
YiPeng Chai [Tue, 25 Mar 2025 06:11:10 +0000 (14:11 +0800)] 
drm/amdgpu: Avoid hive seqno increment in legacy ras

The hive->event_mgr variable is used by both the ras module
and legacy ras. To keep the hive seqno increasing continuously,
legacy ras must not operate on event_mgr once the ras module
is enabled.
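A simplified model of the ownership rule, assuming a single shared counter; struct event_mgr_sketch and the helper names are hypothetical, not the amdgpu types.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for hive->event_mgr. */
struct event_mgr_sketch { uint64_t seqno; };

/* The ras module is the only writer once it is enabled. */
uint64_t ras_module_next_seqno(struct event_mgr_sketch *mgr)
{
	return ++mgr->seqno;
}

/* Legacy ras bumps the seqno only while the ras module is disabled;
 * returns true if it touched the counter. */
bool legacy_ras_bump_seqno(struct event_mgr_sketch *mgr, bool ras_module_enabled)
{
	if (ras_module_enabled)
		return false;
	mgr->seqno++;
	return true;
}
```

With the ras module enabled, its own seqnos stay consecutive (1, 2, ...) even if legacy ras code runs in between.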

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
YiPeng Chai [Mon, 24 Mar 2025 10:20:17 +0000 (18:20 +0800)] 
drm/amdgpu: Add poison consumption sequence numbers for gfx and sdma

Add poison consumption sequence numbers for
gfx and sdma.

V3:
  Use RAS_EVENT_LOG to print ras log info.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
YiPeng Chai [Mon, 21 Jul 2025 07:15:53 +0000 (15:15 +0800)] 
drm/amdgpu: Avoid loading bad pages into legacy ras

When the ras module is enabled, bad pages are loaded by the ras
module instead.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
YiPeng Chai [Tue, 17 Jun 2025 07:16:59 +0000 (15:16 +0800)] 
drm/amdgpu: add ras module rma check

Add ras module rma check.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
YiPeng Chai [Mon, 21 Jul 2025 07:14:03 +0000 (15:14 +0800)] 
drm/amdgpu: Improve ras fatal error handling function

In the multi-GPU case, one fatal error generates several
fatal error interrupts. With this improvement, the ras
module can reuse this function so that only the first
interrupt is handled.
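One way to let only the first of several interrupts handle a fatal error is an atomic compare-and-exchange on a shared flag. This C11 sketch uses hypothetical names and is not the amdgpu code, which uses the kernel's own atomic helpers.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-hive flag shared by every GPU in the hive. */
struct fatal_error_state { atomic_bool handled; };

/* Returns true only for the first caller. Interrupts raised on the
 * other GPUs for the same fatal error see false and can bail out. */
bool fatal_error_claim(struct fatal_error_state *s)
{
	bool expected = false;
	return atomic_compare_exchange_strong(&s->handled, &expected, true);
}
```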

V3:
  Initialize event_id using RAS_EVENT_INVALID_ID.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
YiPeng Chai [Sun, 28 Sep 2025 06:25:27 +0000 (14:25 +0800)] 
drm/amdgpu: Intercept ras interrupts to ras module

Intercept ras interrupts to ras module.

V2:
  Change function names in ras module.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mario Limonciello [Thu, 9 Oct 2025 20:59:07 +0000 (15:59 -0500)] 
drm/amd: Drop calls to restore power limit and clock from smu_resume()

User requested power limits and clock settings are already restored as
part of smu_restore_dpm_user_profile(). It's unnecessary to call the
same restore as part of smu_resume().

Revert the following commits to drop that extra restore:
commit ed4efe426a49 ("drm/amd: Restore cached power limit during resume")
commit 796ff8a7e01b ("drm/amd: Restore cached manual clock settings during resume")
commit f9b80514a722 ("drm/amd: Only restore cached manual clock settings in restore if OD enabled")

Suggested-by: Lijo Lazar <Lijo.Lazar@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jonathan Kim [Wed, 18 Jun 2025 14:45:55 +0000 (10:45 -0400)] 
drm/amdgpu: update remove after reset flag for MES remove queue

The remove-queue-after-reset flag is required when removing a queue
that has been successfully reset, so that MES can clean up its internal state.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
YiPeng Chai [Tue, 30 Sep 2025 02:47:49 +0000 (10:47 +0800)] 
drm/amdgpu: Add ras module files into amdgpu

Add ras module files into amdgpu.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sunil Khatri [Thu, 25 Sep 2025 09:01:54 +0000 (14:31 +0530)] 
drm/amdgpu/userqueue: validate userptrs for userqueues

Userptrs can be changed by the user at any time, so validate all
userptr bos while locking all the bos before the GPU starts
processing.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sunil Khatri [Fri, 10 Oct 2025 12:39:57 +0000 (18:09 +0530)] 
drm/amdgpu: update the functions to use amdgpu version of hmm

Sometimes a bo reference is needed for hmm. For that, add
a new struct amdgpu_hmm_range which holds an optional
bo member alongside the hmm_range.

Use amdgpu_hmm_range instead of hmm_range, and make the bo
an optional argument so that callers can choose whether the
bo reference is taken for them or handle it explicitly.
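A simplified sketch of a range wrapper with an optional bo reference. struct amdgpu_hmm_range_sketch, struct bo_sketch and the init/fini helpers are hypothetical stand-ins; the real struct amdgpu_hmm_range wraps the kernel's struct hmm_range and uses real bo refcounting.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for a bo and an hmm range. */
struct bo_sketch { int refcount; };
struct hmm_range_sketch { unsigned long start, end; };

struct amdgpu_hmm_range_sketch {
	struct hmm_range_sketch range;
	struct bo_sketch *bo;   /* optional: NULL if the caller manages it */
};

void hmm_range_sketch_init(struct amdgpu_hmm_range_sketch *r,
			   unsigned long start, unsigned long end,
			   struct bo_sketch *bo)
{
	r->range.start = start;
	r->range.end = end;
	r->bo = bo;
	if (bo)
		bo->refcount++;   /* take the reference on the caller's behalf */
}

void hmm_range_sketch_fini(struct amdgpu_hmm_range_sketch *r)
{
	if (r->bo)
		r->bo->refcount--;   /* drop only what init took */
	r->bo = NULL;
}
```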

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lijo Lazar [Fri, 10 Oct 2025 11:53:33 +0000 (17:23 +0530)] 
drm/amdgpu: Reserve discovery TMR only if needed

For legacy SOCs, the discovery binary is sideloaded. Instead of checking
for the binary blob, use a flag to determine whether the discovery region
needs to be reserved.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
YiPeng Chai [Tue, 30 Sep 2025 02:46:05 +0000 (10:46 +0800)] 
drm/amd/pm: export a function amdgpu_smu_ras_send_msg to allow send msg directly

Provide an interface that allows ras clients to send messages to the SMU/PMFW directly.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lijo Lazar [Wed, 8 Oct 2025 07:37:13 +0000 (13:07 +0530)] 
drm/amd/pm: Grant interface access after full init

Allow access to user interfaces like sysfs/hwmon only after full
initialization of the device. When the device is part of an XGMI hive
and a reset is required during initialization, the interface files are
created as part of minimal device initialization. Full initialization of
the device is done only after all devices in the XGMI hive are probed
and a reset is done on all of them together.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lijo Lazar [Tue, 7 Oct 2025 13:00:24 +0000 (18:30 +0530)] 
drm/amdgpu: Move reset-on-init sequence earlier

Complete the reset-on-init sequence before sysfs interfaces are created.
Devices are properly initialized only after the reset, and only then
should sysfs interfaces be made available.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lijo Lazar [Fri, 10 Oct 2025 11:42:52 +0000 (17:12 +0530)] 
drm/amdgpu: Add amdgpu_discovery_info

Add amdgpu_discovery_info structure to keep all discovery related
information.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lijo Lazar [Tue, 7 Oct 2025 12:42:04 +0000 (18:12 +0530)] 
drm/amdgpu: Reorganize sysfs ini/fini calls

Aggregate sysfs ini/fini calls into separate functions. No functional
change.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Wed, 27 Aug 2025 15:34:14 +0000 (11:34 -0400)] 
drm/amdgpu: clean up and unify hw fence handling

Decouple the amdgpu fence from the amdgpu_job structure.
This lets us clean up the separate fence ops for the embedded
fence and other fences.  This also allows us to allocate the
vm fence up front when we allocate the job.

v2: Additional cleanup suggested by Christian
v3: Additional cleanups suggested by Christian
v4: Additional cleanups suggested by David and
    vm fence fix
v5: cast seqno (David)

Cc: David.Wu3@amd.com
Cc: christian.koenig@amd.com
Tested-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mario Limonciello [Thu, 9 Oct 2025 20:59:06 +0000 (15:59 -0500)] 
drm/amd: Save and restore all limit types

Vangogh has separate limits for default PPT and fast PPT. Add
infrastructure to save both of these limits and restore both of them.
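A sketch of saving one value per limit type instead of a single value. The enum and struct names are hypothetical, and the values are unitless placeholders.

```c
#include <stdint.h>

/* Hypothetical limit types: Vangogh has a default and a fast PPT. */
enum ppt_limit_type_sketch { PPT_DEFAULT = 0, PPT_FAST, PPT_LIMIT_COUNT };

struct power_limits_sketch { uint32_t limit[PPT_LIMIT_COUNT]; };

void save_limit(struct power_limits_sketch *saved,
		enum ppt_limit_type_sketch type, uint32_t value)
{
	saved->limit[type] = value;
}

/* Restore every saved limit type, not only the default PPT. */
void restore_limits(const struct power_limits_sketch *saved,
		    struct power_limits_sketch *hw)
{
	for (int t = 0; t < PPT_LIMIT_COUNT; t++)
		hw->limit[t] = saved->limit[t];
}
```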

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mario Limonciello [Thu, 9 Oct 2025 20:59:05 +0000 (15:59 -0500)] 
drm/amd: Remove second call to set_power_limit()

The min/max limits only make sense for default PPT. Restructure
smu_set_power_limit() to only use them in that case.
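A sketch of range-checking only the default PPT, with hypothetical names. Returning -EINVAL for an out-of-range default limit is an assumption made for illustration.

```c
#include <errno.h>
#include <stdint.h>

enum ppt_type_sketch { PPT_TYPE_DEFAULT = 0, PPT_TYPE_FAST };

/* The min/max range only describes the default PPT, so other limit
 * types skip the range check. */
int check_power_limit(enum ppt_type_sketch type, uint32_t req,
		      uint32_t min, uint32_t max)
{
	if (type == PPT_TYPE_DEFAULT && (req < min || req > max))
		return -EINVAL;
	return 0;
}
```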

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mario Limonciello [Thu, 9 Oct 2025 20:59:04 +0000 (15:59 -0500)] 
drm/amd: Stop overloading power limit with limit type

When passed around internally, the upper 8 bits of the power limit
encode the limit type. This is non-obvious without digging into the
nuances of each function. Instead, pass the limit type as an argument
to all applicable layers.
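The old packing can be illustrated as follows, assuming a 32-bit value whose upper 8 bits hold the type. The helper and struct names are hypothetical. The raw packed value no longer equals the limit itself, which is what makes it easy to misread.

```c
#include <stdint.h>

/* Old convention (illustrative): type in bits 31..24, limit below. */
uint32_t pack_power_limit(uint32_t type, uint32_t limit)
{
	return (type << 24) | (limit & 0x00FFFFFFu);
}

uint32_t packed_limit_type(uint32_t packed)
{
	return packed >> 24;
}

uint32_t packed_limit_value(uint32_t packed)
{
	return packed & 0x00FFFFFFu;
}

/* New convention (illustrative): the type travels as its own field or
 * argument, so the limit value means exactly what it says. */
struct power_limit_request {
	uint32_t type;
	uint32_t limit;
};
```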

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Wed, 8 Oct 2025 19:07:53 +0000 (15:07 -0400)] 
drm/amdgpu/userq: drop VCN and VPE doorbell handling

VCN and VPE userqs are not yet supported and this code is
not correct.  Userspace should provide the correct
doorbell offset within their doorbell page for the IP.
Adjusting it here will not work as expected, as userspace
and the queue itself will have different offsets.

We need to add an INFO IOCTL query to get the offset and
range for each IP within the doorbell page to handle this
properly.

Cc: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Reviewed-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mario Limonciello [Thu, 2 Oct 2025 17:42:45 +0000 (12:42 -0500)] 
drm/amd: Pass userq suspend failures up to caller

If a userq fails to suspend, the rest of the suspend sequence may
have problems.  Pass the error code up to the caller so it can decide
what to do.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mario Limonciello [Thu, 2 Oct 2025 17:42:44 +0000 (12:42 -0500)] 
drm/amd: Fix error handling with multiple userq IDRs

If multiple userq IDRs are in use and an error occurs while handling
one of them at suspend or resume, the error is silently discarded.
Switch the suspend()/resume() code to use guards and return immediately
on error.
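The bug class is easy to reproduce in a loop: if each iteration overwrites the return code, an early failure is lost when a later call succeeds. The sketch below uses a hypothetical array of per-queue results instead of real IDR iteration, and does not reproduce the kernel's scope-based guard() helpers.

```c
/* Each entry is a hypothetical per-queue result: 0 or a negative errno. */

/* Buggy shape: every iteration overwrites r, so an early error is lost
 * if a later queue succeeds. */
int suspend_all_overwrite(const int *results, int n)
{
	int r = 0;

	for (int i = 0; i < n; i++)
		r = results[i];
	return r;
}

/* Fixed shape: stop at the first failure and hand it to the caller. */
int suspend_all_return_early(const int *results, int n)
{
	for (int i = 0; i < n; i++) {
		if (results[i])
			return results[i];
	}
	return 0;
}
```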

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mario Limonciello [Thu, 2 Oct 2025 17:42:43 +0000 (12:42 -0500)] 
drm/amd: Pass IP suspend errors up to callers

If IP suspend fails the callers should be notified so that they can
potentially react.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mario Limonciello [Thu, 2 Oct 2025 17:42:42 +0000 (12:42 -0500)] 
drm/amd: Don't always set IP block HW status to false

amdgpu_device_ip_suspend_phase2() calls amdgpu_ip_block_suspend(),
which already sets the HW block status to false when IP suspend
succeeds. Remove the explicit call in
amdgpu_device_ip_suspend_phase2() so that the status stays accurate.
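A sketch of the intended invariant, with a hypothetical struct ip_block_sketch: the HW status is cleared only when the suspend actually succeeds, so a failed suspend leaves the block marked as active.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an IP block's status. */
struct ip_block_sketch { bool hw; };

/* hw_result is what the block's suspend hook returned (0 or -errno). */
int ip_block_suspend_sketch(struct ip_block_sketch *b, int hw_result)
{
	if (hw_result)
		return hw_result;   /* failed: keep reporting the block as up */
	b->hw = false;          /* succeeded: only now mark it as down */
	return 0;
}
```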

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mario Limonciello [Thu, 2 Oct 2025 17:42:41 +0000 (12:42 -0500)] 
drm/amd: Remove comment about handling errors in amdgpu_device_ip_suspend_phase1()

Error handling was introduced in commit e095026f0066 ("drm/amdgpu:
validate suspend before function call"), so the TODO comment is no
longer needed.

Fixes: e095026f0066 ("drm/amdgpu: validate suspend before function call")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mario Limonciello [Thu, 2 Oct 2025 17:42:40 +0000 (12:42 -0500)] 
drm/amd: Stop exporting amdgpu_device_ip_suspend() outside amdgpu_device

amdgpu_device_ip_suspend() doesn't have a caller outside of
amdgpu_device.c. Make it static.

No intended functional changes.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mario Limonciello [Thu, 2 Oct 2025 17:42:39 +0000 (12:42 -0500)] 
drm/amd: Unify shutdown() callback behavior

[Why]
The shutdown() callback uses amdgpu_ip_suspend() which doesn't notify
drm clients during shutdown.  This could lead to hangs.

[How]
Change amdgpu_pci_shutdown() to call the same sequence as suspend/resume.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Prike Liang [Fri, 19 Sep 2025 07:14:41 +0000 (15:14 +0800)] 
drm/amdgpu: validate userq va for GEM unmap

When a user unmaps a userq VA, the driver must ensure
the queue has no in-flight jobs. If there is pending work,
the kernel should wait for the attached eviction (bookkeeping)
fence to signal before deleting the mapping.

Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Prike Liang [Thu, 9 Oct 2025 08:45:27 +0000 (16:45 +0800)] 
drm/amdgpu: validate the queue va for resuming the queue

Validate that the userq VA is mapped before trying to resume the
queue.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Prike Liang [Tue, 22 Jul 2025 05:43:51 +0000 (13:43 +0800)] 
drm/amdgpu: keeping waiting userq fence infinitely

Keep waiting on the userq fence indefinitely until a hang is
detected, then suspend the hung queue and set the fence
error.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Prike Liang [Thu, 9 Oct 2025 08:44:31 +0000 (16:44 +0800)] 
drm/amdgpu: track the userq bo va for its obj management

Track the userq object over its lifetime, referencing and
dereferencing the buffer flag when the object is created and
destroyed.

Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Prike Liang [Mon, 29 Sep 2025 05:52:13 +0000 (13:52 +0800)] 
drm/amdgpu: add userq object va track helpers

Add list_add() helpers for the userq object virtual addresses,
to track userq object VA usage.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Thu, 25 Sep 2025 10:09:56 +0000 (12:09 +0200)] 
drm/amdgpu: reduce queue timeout to 2 seconds v2

There have been multiple complaints that 10 seconds is usually too long.

The original requirement for a longer timeout came from compute tests on
AMDVLK. Since that is no longer a concern, reduce the timeout back to 2
seconds for all queues.

While at it, also remove any special handling for compute queues under
SRIOV or pass through.

v2: fix checkpatch warning.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mario Limonciello [Wed, 1 Oct 2025 18:03:33 +0000 (13:03 -0500)] 
drm/amd: Remove some unncessary header includes

Unnecessary headers can slow down the build; drop them.

No intended functional changes.

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Tested-by: Robert Beckett <bob.beckett@collabora.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mario Limonciello [Mon, 6 Oct 2025 16:16:20 +0000 (11:16 -0500)] 
drm/amd: Adjust whitespace for vangogh_ppt

A few changes have more whitespace than needed.  Clean them up.

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Tested-by: Robert Beckett <bob.beckett@collabora.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Mon, 6 Oct 2025 18:23:59 +0000 (14:23 -0400)] 
drm/amdgpu/mes: adjust the VMID masks

The firmware limits the max vmid, but align the
settings with the hw limits as well just to be safe.

Reviewed-by: Shaoyun liu <Shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lijo Lazar [Wed, 8 Oct 2025 04:56:47 +0000 (10:26 +0530)] 
drm/amdgpu: Skip SDMA suspend during mode-2 reset

For SDMA IP versions >= v4.4.2, firmware will take care of quiescing SDMA
before mode-2 reset.
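A sketch of such a version gate. The IP_VERSION() encoding below mirrors the macro amdgpu uses for IP versions as far as we know; sdma_needs_suspend_for_mode2() is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Major, minor and revision packed into one integer so that versions
 * compare with ordinary integer comparisons. */
#define IP_VERSION(mj, mn, rv) (((mj) << 16) | ((mn) << 8) | (rv))

/* Hypothetical helper: from SDMA 4.4.2 on, firmware quiesces SDMA
 * before a mode-2 reset, so the driver skips its own SDMA suspend. */
bool sdma_needs_suspend_for_mode2(uint32_t sdma_ip_version)
{
	return sdma_ip_version < (uint32_t)IP_VERSION(4, 4, 2);
}
```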

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pierre-Eric Pelloux-Prayer [Mon, 15 Sep 2025 12:25:44 +0000 (14:25 +0200)] 
drm/amdgpu: remove gart_window_lock usage from gmc v12

This lock was part of the SDMA workaround originally implemented in
gmc_v10_0_flush_gpu_tlb (a70cb2176f7ef6f moved it to
amdgpu_gmc_flush_gpu_tlb).

This means the lock is no longer needed in gmc v12 and can be safely dropped.

Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pierre-Eric Pelloux-Prayer [Fri, 5 Sep 2025 08:19:29 +0000 (10:19 +0200)] 
drm/amdgpu: make non-NULL out fence mandatory

amdgpu_ttm_copy_mem_to_mem has a single caller, so require the out
fence to be non-NULL to simplify the code.
Since none of the pointers should be NULL, we can enable
__attribute__((nonnull)).

While at it, make the function static since it's only used from
amdgpu_ttm.c.
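A minimal sketch of the attribute on a static copy helper with a mandatory out fence; copy_mem_sketch() and struct fence_sketch are hypothetical. With no argument list, __attribute__((nonnull)) applies to every pointer parameter, and GCC and Clang can warn (-Wnonnull) when a caller passes a literal NULL.

```c
#include <stddef.h>

/* Hypothetical fence type for illustration. */
struct fence_sketch { unsigned int seqno; };

/* Every pointer parameter is declared non-NULL, so the body needs no
 * "if (fence)" branch for an optional out fence. */
static int copy_mem_sketch(const char *src, char *dst, size_t len,
			   struct fence_sketch **fence)
	__attribute__((nonnull));

static int copy_mem_sketch(const char *src, char *dst, size_t len,
			   struct fence_sketch **fence)
{
	static struct fence_sketch done = { 1 };

	for (size_t i = 0; i < len; i++)
		dst[i] = src[i];
	*fence = &done;   /* always written: the out fence is mandatory */
	return 0;
}
```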

Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lijo Lazar [Tue, 16 Sep 2025 05:50:42 +0000 (11:20 +0530)] 
drm/amdgpu: Remove redundant return value

gfx_v9_4_3_xcc_kcq_init_queue doesn't have a fail condition.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Prike Liang [Tue, 22 Jul 2025 03:14:28 +0000 (11:14 +0800)] 
drm/amdgpu/userq: extend userq state

Extend the userq state to identify
invalid userq cases.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Taimur Hassan [Fri, 26 Sep 2025 23:58:54 +0000 (18:58 -0500)] 
drm/amd/display: Promote DC to 3.2.353

- [FW Promotion] Release 0.1.30.0
- Driver implementation for cursor offloading to DMU
- Incorrect Mirror Cositing
- Enable Dynamic DTBCLK Switch
- Remove comparing uint32_t to zero
- Remove inaccessible URL

Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Taimur Hassan [Fri, 26 Sep 2025 20:15:01 +0000 (16:15 -0400)] 
drm/amd/display: [FW Promotion] Release 0.1.30.0

Add new SMART_POWER_HDR commands to optimize power consumption on
certain OLED LED panels by sending MaxCLL per frame to TCON.

Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Nicholas Kazlauskas [Tue, 26 Aug 2025 21:12:44 +0000 (17:12 -0400)] 
drm/amd/display: Driver implementation for cursor offloading to DMU

[Why]
Upcoming features require an interlock between driver and firmware.
Because the interlock could be needed on any single cursor programming
call, and we can't wait asynchronously for the firmware to respond, a
synchronous interlock would regress cursor performance by at least an
extra 40us per call.

High frequency gaming mice can deliver a cursor update every
20us - 100s, so this would cause stuttering or dropped updates and
hurt overall cursor performance.

We want a solution that can:

1. Interlock between other firmware features
2. Not stall out or require the DMCUB lock for every single update

[How]
When cursor offloading is enabled and supported by an ASIC, the driver
routes cursor programming to DMU as part of the regular DC stream cursor
programming interfaces for attributes and position.

The atomic pipe programming version will not be updated: this will still
follow the existing programming path by keeping track of a field that
specifies when the register writes should be deferred to DMU.

Cursor locking is not required when cursor offload is in progress since
the updates are consolidated and processed by DMU once at the end
of the frame in a periodic manner.
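A heavily simplified model of why the updates consolidate: the driver overwrites a shared state and the firmware consumes it once per frame. The struct and function names are hypothetical, and a real driver/firmware buffer would also need memory ordering guarantees that this sketch ignores.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared cursor state written by the driver and read by
 * the firmware. Memory ordering (barriers or a sequence counter) is
 * deliberately omitted. */
struct cursor_state_sketch {
	int32_t x, y;
	bool dirty;
};

/* Driver side: overwrite the latest position; no lock, no waiting. */
void driver_set_cursor_position(struct cursor_state_sketch *s,
				int32_t x, int32_t y)
{
	s->x = x;
	s->y = y;
	s->dirty = true;
}

/* Firmware side, once per frame: program only the latest position.
 * Returns true if anything was programmed. */
bool fw_process_cursor(struct cursor_state_sketch *s,
		       int32_t *hw_x, int32_t *hw_y)
{
	if (!s->dirty)
		return false;
	*hw_x = s->x;
	*hw_y = s->y;
	s->dirty = false;
	return true;
}
```

Three driver updates within one frame result in a single programming pass with the last position.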

The shared buffer the firmware queries from is allocated along with the
rest of the scratch state region in an area that's accessible by
both firmware and driver.

The size of the cursor offload (v1) state will not change, but it does
have a unique union per ASIC version with room for expansion if needed.

When the firmware features that notify DMU of DRR updates are not
enabled, the driver now sends an explicit vtotal min/max update to DMU
firmware whenever the vtotal max changes. This allows the cursor programming
to determine the appropriate latch update point offset from vupdate.

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Hung [Wed, 24 Sep 2025 15:27:35 +0000 (09:27 -0600)] 
drm/amd/display: Remove comparing uint32_t to zero

[WHAT]
These *bypass variables are uint32_t, so they can never be less than zero.
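A sketch of why such a check is dead code. The log does not show the original expressions, so the range check below is hypothetical; the old form is kept only as a comment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Before (hypothetical): return !(bypass < 0 || bypass > max);
 * "bypass < 0" is always false for uint32_t (GCC reports it under
 * -Wtype-limits), so the check reduces to the upper bound alone. */
bool bypass_in_range(uint32_t bypass, uint32_t max)
{
	return bypass <= max;
}

/* A "negative" input still arrives as a large positive value. */
uint32_t as_unsigned(int32_t v)
{
	return (uint32_t)v;
}
```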

Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Clay King [Mon, 22 Sep 2025 14:33:21 +0000 (10:33 -0400)] 
drm/amd/display: Remove inaccessible URL

[WHAT]
Remove inaccessible link.

Reviewed-by: Joshua Aberback <joshua.aberback@amd.com>
Signed-off-by: Clay King <clayking@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Taimur Hassan [Fri, 19 Sep 2025 22:20:17 +0000 (17:20 -0500)] 
drm/amd/display: Promote DC to 3.2.352

- Fix slice width calculation for YCbCr420
- Fix DTBCLK gating
- Use NRD cap as lttpr cap
- Consolidate DML2 FP guards
- DML2.1 Update
- Firmware Release 0.1.29.0 changes

Reviewed-by: Sun peng (Leo) Li <sunpeng.li@amd.com>
Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Taimur Hassan [Fri, 19 Sep 2025 20:22:48 +0000 (16:22 -0400)] 
drm/amd/display: [FW Promotion] Release 0.1.29.0

Add new interface for offloading cursor programming to DMUB.

Reviewed-by: Sun peng (Leo) Li <sunpeng.li@amd.com>
Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/display: Prevent Gating DTBCLK before It Is Properly Latched
Fangzhi Zuo [Thu, 18 Sep 2025 20:25:45 +0000 (16:25 -0400)] 
drm/amd/display: Prevent Gating DTBCLK before It Is Properly Latched

[why]
1. With allow_0_dtb_clk enabled, the time required to latch DTBCLK to 600 MHz
depends on the SMU. If DTBCLK is not latched to 600 MHz before set_mode completes,
gating DTBCLK causes the DP2 sink to lose its clock source.

2. The existing DTBCLK gating sequence ungates DTBCLK based on both pix_clk and ref_dtbclk,
but gates DTBCLK when either pix_clk or ref_dtbclk is zero.
pix_clk can be zero outside the set_mode sequence before DTBCLK is properly latched,
which can lead to DTBCLK being gated by mistake.

[how]
Consider both pixel_clk and ref_dtbclk when determining whether it is safe to gate DTBCLK,
instead of gating when either one alone is zero; this is more accurate.
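A minimal sketch contrasting the two gating rules, assuming (this is an interpretation of the description above, not the driver's code) that the fixed rule gates DTBCLK only when both clocks indicate it is unused. Function names are hypothetical; clocks are in kHz, so 600 MHz is 600000.

```c
#include <stdbool.h>
#include <stdint.h>

/* Old behaviour as described: gate as soon as either clock reads zero. */
static bool old_can_gate_dtbclk(uint32_t pix_clk_khz, uint32_t ref_dtbclk_khz)
{
	return pix_clk_khz == 0 || ref_dtbclk_khz == 0;
}

/* Assumed new behaviour: gate only when neither clock needs DTBCLK. */
static bool new_can_gate_dtbclk(uint32_t pix_clk_khz, uint32_t ref_dtbclk_khz)
{
	return pix_clk_khz == 0 && ref_dtbclk_khz == 0;
}
```

With pix_clk transiently zero while ref_dtbclk is still at 600 MHz, the old rule gates the clock and the assumed new rule does not.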

Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/display: lttpr cap should be nrd cap in bw_alloc mode
Peichen Huang [Thu, 11 Sep 2025 05:41:16 +0000 (13:41 +0800)] 
drm/amd/display: lttpr cap should be nrd cap in bw_alloc mode

[WHY]
When bw allocation mode is enabled, the DPIA may report an LTTPR cap
with a reduced common cap. This can prevent the driver from starting
pre-training with the maximum available bandwidth.

[How]
When bw allocation mode is enabled, use the NRD cap as the LTTPR cap.

Reviewed-by: Cruise Hung <cruise.hung@amd.com>
Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Peichen Huang <PeiChen.Huang@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/display: Rename FAMS2 global control lock to DMUB HW control lock
Nicholas Kazlauskas [Wed, 17 Sep 2025 17:00:58 +0000 (13:00 -0400)] 
drm/amd/display: Rename FAMS2 global control lock to DMUB HW control lock

[Why]
FAMS2 dictates whether the inbox0 HW lock is required, but it is not the
only feature that may determine this.

To leverage the faster inbox0 HW lock in place of the inbox1
ringbuffer-based control lock, it is desirable to reuse the HWSS-based
locking protocol that FAMS2 has already implemented.

[How]
Rename the FAMS2 global control lock to DMUB HW control lock.

This is purely a refactor with no functional change; the logic that
will determine which features need to enable this HW lock will be added
in a future commit.

Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/display: Rename should_use_dmub_lock to reflect inbox1 usage
Nicholas Kazlauskas [Wed, 17 Sep 2025 15:46:56 +0000 (11:46 -0400)] 
drm/amd/display: Rename should_use_dmub_lock to reflect inbox1 usage

[Why]
Newer DCN versions use the DMCUB HW lock via inbox0 for performance
reasons, while older ones use inbox1.

The should_use_dmub_lock() function does not describe whether the lock
in general should be used, but whether it should be used via inbox1.

[How]
Rename the function to should_use_dmub_inbox1_lock() to reflect this.

Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/display: Support possibly NULL link for should_use_dmub_lock
Nicholas Kazlauskas [Tue, 16 Sep 2025 17:57:17 +0000 (13:57 -0400)] 
drm/amd/display: Support possibly NULL link for should_use_dmub_lock

[Why]
It's possible to have a stream enabled without a link or link encoder.

There are cases where we'd still like to interlock driver programming
with firmware programming to ensure that we don't put the
hardware in an undefined (or error) state if two programming sequences
are simultaneously executed on the same hardware blocks.

[How]
Add an explicit DC parameter to should_use_dmub_lock().

Make the pointer parameters of should_use_dmub_lock() const, since it
is a checker function that shouldn't modify state.

Update the callsites to pass in DC explicitly.

Check that the link is non-NULL before dereferencing it and performing
link-based checks.
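A minimal sketch of the pattern described above: DC passed explicitly, const pointer parameters, and link-based checks guarded by a NULL test. The structures and fields (`fw_interlock_required`, `fw_owns_link`) are hypothetical simplifications, not the real DC types.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the DC structures. */
struct dc {
	bool fw_interlock_required;  /* DC-wide reason to interlock with firmware */
};

struct dc_link {
	bool fw_owns_link;           /* link-specific reason to interlock */
};

/* Checker: reads state only, so every pointer parameter is const. */
static bool should_use_dmub_lock(const struct dc *dc,
				 const struct dc_link *link)
{
	/* DC-level decision works even when the stream has no link. */
	if (dc->fw_interlock_required)
		return true;

	/* Only dereference the link when one actually exists. */
	return link != NULL && link->fw_owns_link;
}
```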

Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/display: Consolidate two DML2 FP guards
Ivan Lipski [Wed, 17 Sep 2025 15:21:30 +0000 (11:21 -0400)] 
drm/amd/display: Consolidate two DML2 FP guards

[Why&How]
Consolidate two FP guards in dml2 into one, since they are separated
by only one line of code that does not depend on the guard.

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/display: Correct slice width calculation for YCbCr420
Relja Vojvodic [Wed, 17 Sep 2025 16:30:51 +0000 (12:30 -0400)] 
drm/amd/display: Correct slice width calculation for YCbCr420

[Why]
-OVT compliance testing for 5120x2880p300Hz YCbCr420 was failing because
an incorrect slice width was being calculated

[How]
-Ensure the slice width is divisible by 2 for YCbCr420 to comply with the spec
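A minimal sketch of the rounding rule, assuming the per-slice width is a ceiling division of the picture width by the slice count and is then bumped to the next even value for YCbCr420. The helper name and the ceiling division are assumptions, not the driver's actual calculation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: DSC slice width for a given horizontal slice count. */
static uint32_t dsc_slice_width(uint32_t pic_width, uint32_t num_slices,
				bool is_ycbcr420)
{
	/* Ceiling division so that num_slices * width covers the picture. */
	uint32_t width = (pic_width + num_slices - 1) / num_slices;

	/* 4:2:0 chroma is subsampled horizontally by 2: keep the width even. */
	if (is_ycbcr420 && (width % 2))
		width++;

	return width;
}
```

For a 5120-pixel-wide picture split into 3 slices, the ceiling gives 1707, which is odd, so the 4:2:0 path rounds it to 1708.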

Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Relja Vojvodic <rvojvodi@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/display: DML2.1 Reintegration
Austin Zheng [Thu, 11 Sep 2025 19:37:32 +0000 (15:37 -0400)] 
drm/amd/display: DML2.1 Reintegration

[Summary of changes]
- Updated structs
- Renaming of variables for clarity

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Austin Zheng <Austin.Zheng@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/ras: Add unified ras module top-level makefile
YiPeng Chai [Mon, 24 Mar 2025 07:13:20 +0000 (15:13 +0800)] 
drm/amd/ras: Add unified ras module top-level makefile

Add unified ras module top-level makefile.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/ras: Add files to amdgpu ras manager makefile
YiPeng Chai [Mon, 17 Mar 2025 10:04:10 +0000 (18:04 +0800)] 
drm/amd/ras: Add files to amdgpu ras manager makefile

Add files to amdgpu ras manager makefile.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/ras: Add amdgpu ras management function.
YiPeng Chai [Mon, 17 Mar 2025 10:03:28 +0000 (18:03 +0800)] 
drm/amd/ras: Add amdgpu ras management function.

Add amdgpu system configuration parameters and
functions needed by rascore.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/ras: Amdgpu preprocesses ras interrupts
YiPeng Chai [Mon, 24 Mar 2025 10:46:06 +0000 (18:46 +0800)] 
drm/amd/ras: Amdgpu preprocesses ras interrupts

Amdgpu preprocesses ras interrupts.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/ras: Add amdgpu ras system functions
YiPeng Chai [Mon, 24 Mar 2025 10:44:11 +0000 (18:44 +0800)] 
drm/amd/ras: Add amdgpu ras system functions

Add amdgpu ras system functions.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/ras: Amdgpu handle ras ioctl command
YiPeng Chai [Mon, 17 Mar 2025 10:02:13 +0000 (18:02 +0800)] 
drm/amd/ras: Amdgpu handle ras ioctl command

Amdgpu handles ras ioctl commands.

V2:
  Remove non-standard device information.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/ras: Add amdgpu eeprom i2c configuration function
YiPeng Chai [Mon, 17 Mar 2025 10:00:48 +0000 (18:00 +0800)] 
drm/amd/ras: Add amdgpu eeprom i2c configuration function

Add amdgpu eeprom i2c configuration function.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/ras: Add amdgpu mp1 v13_0 configuration function
YiPeng Chai [Mon, 17 Mar 2025 09:59:25 +0000 (17:59 +0800)] 
drm/amd/ras: Add amdgpu mp1 v13_0 configuration function

Add amdgpu mp1 v13_0 configuration function.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/ras: Add amdgpu nbio v7_9 configuration function
YiPeng Chai [Mon, 24 Mar 2025 07:12:07 +0000 (15:12 +0800)] 
drm/amd/ras: Add amdgpu nbio v7_9 configuration function

Add amdgpu nbio v7_9 configuration function.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/ras: Add files to ras core Makefile
YiPeng Chai [Mon, 17 Mar 2025 09:55:37 +0000 (17:55 +0800)] 
drm/amd/ras: Add files to ras core Makefile

Add files to ras core Makefile.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/ras: Add rascore unified interface function
YiPeng Chai [Mon, 17 Mar 2025 09:13:19 +0000 (17:13 +0800)] 
drm/amd/ras: Add rascore unified interface function

1. Complete the initialization calls for all
   sub-functions.
2. Export common interfaces.

V2:
  Remove the use of typedef to define function pointer.
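The kernel coding style discourages typedefs, and function-pointer members can be declared directly in the ops structure instead. A minimal sketch, assuming a hypothetical `struct ras_sub_ops` (not the actual rascore definition) and an init loop over all sub-functions:

```c
#include <stddef.h>

/*
 * Hypothetical ops table: the function-pointer types are spelled out in
 * the member declarations rather than via a typedef such as
 * "typedef int (*ras_init_fn)(void *ctx);".
 */
struct ras_sub_ops {
	const char *name;
	int (*hw_init)(void *ctx);
	void (*hw_fini)(void *ctx);
};

static int dummy_init(void *ctx) { (void)ctx; return 0; }
static void dummy_fini(void *ctx) { (void)ctx; }

/* Call hw_init for every sub-function; stop at the first failure. */
static int ras_init_all(const struct ras_sub_ops *ops, size_t n, void *ctx)
{
	for (size_t i = 0; i < n; i++) {
		int ret = ops[i].hw_init ? ops[i].hw_init(ctx) : 0;

		if (ret)
			return ret;
	}
	return 0;
}
```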

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/ras: Add cper conversion function
YiPeng Chai [Mon, 17 Mar 2025 09:42:17 +0000 (17:42 +0800)] 
drm/amd/ras: Add cper conversion function

Add cper conversion function.

V3:
  Change commit message and update the calling function.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/ras: Use ring buffer to record ras ecc data
YiPeng Chai [Mon, 17 Mar 2025 09:37:35 +0000 (17:37 +0800)] 
drm/amd/ras: Use ring buffer to record ras ecc data

Use ring buffer to record ras ecc data.
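A minimal fixed-capacity ring buffer that overwrites its oldest entry when full, as one plausible shape for recording ECC data. The record layout, capacity, and function names are hypothetical; the commit does not describe them.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ECC_RING_SIZE 4  /* hypothetical capacity, small for illustration */

/* Hypothetical ECC record. */
struct ecc_record {
	uint64_t timestamp;
	uint32_t ce_count;  /* correctable errors */
	uint32_t ue_count;  /* uncorrectable errors */
};

struct ecc_ring {
	struct ecc_record buf[ECC_RING_SIZE];
	size_t head;   /* next slot to write */
	size_t count;  /* valid records, at most ECC_RING_SIZE */
};

/* Append a record; when the ring is full the oldest record is overwritten. */
static void ecc_ring_push(struct ecc_ring *r, const struct ecc_record *rec)
{
	r->buf[r->head] = *rec;
	r->head = (r->head + 1) % ECC_RING_SIZE;
	if (r->count < ECC_RING_SIZE)
		r->count++;
}

/* Copy the i-th oldest record (0 = oldest) into *out. */
static bool ecc_ring_get(const struct ecc_ring *r, size_t i,
			 struct ecc_record *out)
{
	size_t tail;

	if (i >= r->count)
		return false;
	tail = (r->head + ECC_RING_SIZE - r->count) % ECC_RING_SIZE;
	*out = r->buf[(tail + i) % ECC_RING_SIZE];
	return true;
}
```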

V3:
  Change commit message and rename the file and
  function names.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/ras: Add thread to handle ras events
YiPeng Chai [Mon, 17 Mar 2025 09:33:58 +0000 (17:33 +0800)] 
drm/amd/ras: Add thread to handle ras events

Add thread to handle ras events.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/ras: Add ras ioctl command handler
YiPeng Chai [Mon, 17 Mar 2025 09:31:24 +0000 (17:31 +0800)] 
drm/amd/ras: Add ras ioctl command handler

Add ras ioctl command handler.

V2:
  Remove ras global device list.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/ras: Add psp ras common functions
YiPeng Chai [Thu, 5 Jun 2025 09:46:51 +0000 (17:46 +0800)] 
drm/amd/ras: Add psp ras common functions

Add psp ras common functions.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/ras: Add psp v13_0 ras functions
YiPeng Chai [Thu, 5 Jun 2025 09:46:08 +0000 (17:46 +0800)] 
drm/amd/ras: Add psp v13_0 ras functions

Add psp v13_0 ras functions.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/ras: Add eeprom ras functions
YiPeng Chai [Mon, 17 Mar 2025 09:30:48 +0000 (17:30 +0800)] 
drm/amd/ras: Add eeprom ras functions

Add eeprom ras functions.

V5:
  Remove duplicate data structure definition.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/ras: Add gfx common ras functions
YiPeng Chai [Mon, 17 Mar 2025 09:28:14 +0000 (17:28 +0800)] 
drm/amd/ras: Add gfx common ras functions

Add gfx common ras functions.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/ras: Add gfx v9_0 ras functions
YiPeng Chai [Mon, 17 Mar 2025 09:29:02 +0000 (17:29 +0800)] 
drm/amd/ras: Add gfx v9_0 ras functions

Add gfx v9_0 ras functions.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/ras: Add umc common ras functions
YiPeng Chai [Mon, 17 Mar 2025 09:26:49 +0000 (17:26 +0800)] 
drm/amd/ras: Add umc common ras functions

Add umc common ras functions.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/ras: Add umc v12_0 ras functions
YiPeng Chai [Mon, 17 Mar 2025 09:27:44 +0000 (17:27 +0800)] 
drm/amd/ras: Add umc v12_0 ras functions

Add umc v12_0 ras functions.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/ras: Add nbio common ras functions
YiPeng Chai [Mon, 17 Mar 2025 09:25:04 +0000 (17:25 +0800)] 
drm/amd/ras: Add nbio common ras functions

Add nbio common ras functions.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/ras: Add nbio v7_9 ras functions
YiPeng Chai [Mon, 17 Mar 2025 09:25:59 +0000 (17:25 +0800)] 
drm/amd/ras: Add nbio v7_9 ras functions

Add nbio v7_9 ras functions.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/ras: Add mp1 common ras functions
YiPeng Chai [Mon, 17 Mar 2025 09:20:28 +0000 (17:20 +0800)] 
drm/amd/ras: Add mp1 common ras functions

Add mp1 common ras functions.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/ras: Add mp1 v13_0 ras functions
YiPeng Chai [Mon, 17 Mar 2025 09:21:41 +0000 (17:21 +0800)] 
drm/amd/ras: Add mp1 v13_0 ras functions

Add mp1 v13_0 ras functions.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/ras: Add aca common ras functions
YiPeng Chai [Mon, 17 Mar 2025 09:15:59 +0000 (17:15 +0800)] 
drm/amd/ras: Add aca common ras functions

Add aca common ras functions:
1. Aca hw init/fini.
2. Get ecc count of each ras block.
3. Update query ecc count from mp1.
4. Clear ras block ecc count.

V3:
  Update the calling function.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amd/ras: Add ras aca parser v1.0
YiPeng Chai [Mon, 17 Mar 2025 09:16:54 +0000 (17:16 +0800)] 
drm/amd/ras: Add ras aca parser v1.0

Add ras aca parser v1.0.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amdgpu: clean up amdgpu hmm range functions
Sunil Khatri [Tue, 30 Sep 2025 08:15:11 +0000 (13:45 +0530)] 
drm/amdgpu: clean up amdgpu hmm range functions

Clean up the amdgpu hmm range functions so that each
has a clearer definition.

a. Split amdgpu_ttm_tt_get_user_pages_done into two:
   1. amdgpu_hmm_range_valid: To check if the user pages
      are valid and update seq num
   2. amdgpu_hmm_range_free: Clean up the hmm range
      and pfn memory.

b. amdgpu_ttm_tt_get_user_pages_done and
   amdgpu_ttm_tt_discard_user_pages are similar functions, so remove
   discard and use amdgpu_hmm_range_free directly to clean up the
   hmm range and pfn memory.

Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amdgpu: use user provided hmm_range buffer in amdgpu_ttm_tt_get_user_pages
Sunil Khatri [Wed, 24 Sep 2025 06:53:26 +0000 (12:23 +0530)] 
drm/amdgpu: use user provided hmm_range buffer in amdgpu_ttm_tt_get_user_pages

Update amdgpu_ttm_tt_get_user_pages and all dependent functions,
along with their callers, to use a user-allocated hmm_range buffer
instead of having the hmm layer allocate the buffer.

This is needed so that hmm_range pointers are easily accessible
without going through the bo, which the userqueue requires in order
to lock the userptrs effectively.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amdkfd: fix suspend/resume all calls in mes based eviction path
Jonathan Kim [Wed, 18 Jun 2025 14:31:15 +0000 (10:31 -0400)] 
drm/amdkfd: fix suspend/resume all calls in mes based eviction path

Suspend/resume of all gangs should be done with the device lock held.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amdgpu: enable suspend/resume all for gfx 12
Jonathan Kim [Thu, 5 Jun 2025 14:18:37 +0000 (10:18 -0400)] 
drm/amdgpu: enable suspend/resume all for gfx 12

Suspend/resume all gangs has been available for GFX12 for a while now
so enable it.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amdgpu: fix hung reset queue array memory allocation
Jonathan Kim [Thu, 9 Oct 2025 15:28:19 +0000 (11:28 -0400)] 
drm/amdgpu: fix hung reset queue array memory allocation

By design, the MES returns an array result whose length is twice the
number of hung doorbells it can report.

i.e. if up to k reported doorbells are supported, then the
second half of the array, also of length k, holds the HQD information
(type/queue/pipe), where queue 1 corresponds to indices 0 and k,
queue 2 corresponds to indices 1 and k + 1, etc.

The driver will use the HQD info to target queue/pipe resets for
hardware-scheduled user compute queues.
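The index mapping can be sketched as follows, using zero-based indices so that the message's "queue 1" is i = 0. The helper names and the uint32_t element type are assumptions for illustration, not the driver's API.

```c
#include <stddef.h>
#include <stdint.h>

/* The result array for k reportable doorbells holds 2 * k entries. */
static size_t hung_array_len(size_t k)
{
	return 2 * k;
}

/* Entry i of the first half: the i-th hung doorbell. */
static uint32_t hung_doorbell(const uint32_t *arr, size_t i)
{
	return arr[i];
}

/* Entry k + i of the second half: HQD info (type/queue/pipe) for doorbell i. */
static uint32_t hung_hqd_info(const uint32_t *arr, size_t k, size_t i)
{
	return arr[k + i];
}
```

Allocating only k entries would leave no room for the HQD half, which is the allocation the commit corrects.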

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
7 weeks agodrm/amdgpu: fix initialization of doorbell array for detect and hang
Jonathan Kim [Thu, 9 Oct 2025 14:48:09 +0000 (10:48 -0400)] 
drm/amdgpu: fix initialization of doorbell array for detect and hang

Doorbell array entries should be initialized to the invalid value rather
than 0 to prevent the driver from over-counting hung doorbells, since it
counts them by checking each entry against the invalid value.
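A minimal sketch of why the initial value matters. `DOORBELL_INVALID` is a hypothetical sentinel (the driver's actual invalid value is not given here), and the counting helper assumes hung entries are exactly those that differ from it.

```c
#include <stddef.h>
#include <stdint.h>

#define DOORBELL_INVALID UINT32_MAX  /* hypothetical sentinel */

/* Fill with the sentinel, not 0: a 0 entry would look like a reported doorbell. */
static void init_doorbell_array(uint32_t *db, size_t n)
{
	for (size_t i = 0; i < n; i++)
		db[i] = DOORBELL_INVALID;
}

/* Count entries that the firmware actually filled in. */
static size_t count_hung_doorbells(const uint32_t *db, size_t n)
{
	size_t hung = 0;

	for (size_t i = 0; i < n; i++)
		if (db[i] != DOORBELL_INVALID)
			hung++;
	return hung;
}
```

A zero-initialized array of four entries would be counted as four hung doorbells, while a sentinel-initialized one correctly counts only the entries the firmware wrote.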

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>