git.ipfire.org Git - thirdparty/kernel/linux.git/log
2 months ago drm/amdgpu: fix gpu idle power consumption issue for gfx v12
Yang Wang [Wed, 4 Mar 2026 23:45:45 +0000 (18:45 -0500)] 
drm/amdgpu: fix gpu idle power consumption issue for gfx v12

Older versions of the MES firmware may cause abnormal GPU power consumption.
When performing inference tasks on the GPU (e.g., with Ollama using ROCm),
the GPU may show abnormal power consumption in idle state and incorrect GPU load information.
This issue has been fixed in firmware version 0x8b and newer.

Closes: https://github.com/ROCm/ROCm/issues/5706
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amdgpu: Fix kernel-doc comments for some LUT properties
Cristian Ciocaltea [Thu, 5 Mar 2026 11:16:36 +0000 (13:16 +0200)] 
drm/amdgpu: Fix kernel-doc comments for some LUT properties

The following members of struct amdgpu_mode_info do not have valid
references in the related kernel-doc sections:

 - plane_shaper_lut_property
 - plane_shaper_lut_size_property
 - plane_lut3d_size_property

Correct all affected comment blocks.

Fixes: f545d82479b4 ("drm/amd/display: add plane shaper LUT and TF driver-specific properties")
Fixes: 671994e3bf33 ("drm/amd/display: add plane 3D LUT driver-specific properties")
Reviewed-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amdgpu/amdgpu_connectors: remove amdgpu_connector_free_edid
Joshua Peisach [Tue, 3 Mar 2026 21:18:23 +0000 (16:18 -0500)] 
drm/amdgpu/amdgpu_connectors: remove amdgpu_connector_free_edid

Now that we are using struct drm_edid, we can just call drm_edid_free
directly. Remove the function and update calls to drm_edid_free.

Signed-off-by: Joshua Peisach <jpeisach@ubuntu.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amdgpu/amdgpu_connectors: use struct drm_edid instead of struct edid
Joshua Peisach [Tue, 3 Mar 2026 21:18:22 +0000 (16:18 -0500)] 
drm/amdgpu/amdgpu_connectors: use struct drm_edid instead of struct edid

Some amdgpu code is still using deprecated edid functions. Switch to
the newer functions and update the amdgpu_connector struct's edid type
to the drm_edid type.

At the same time, use the raw EDID when we need to for speaker
allocations and for determining if the input is digital.

Signed-off-by: Joshua Peisach <jpeisach@ubuntu.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amdgpu/gfx12.1: add support for disable_kq
Alex Deucher [Wed, 28 Jan 2026 19:50:31 +0000 (14:50 -0500)] 
drm/amdgpu/gfx12.1: add support for disable_kq

Plumb in support for disabling kernel queues and make it
the default.  For testing, kernel queues can be re-enabled
by setting amdgpu.user_queue=0

v2: integrate feedback from Lijo

Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amd: Fix a few more NULL pointer dereference in device cleanup
Mario Limonciello [Thu, 5 Mar 2026 15:06:11 +0000 (09:06 -0600)] 
drm/amd: Fix a few more NULL pointer dereference in device cleanup

I found a few more paths where cleanup fails due to a NULL version pointer
on unsupported hardware.

Add NULL checks as applicable.

Fixes: 39fc2bc4da00 ("drm/amdgpu: Protect GPU register accesses in powergated state in some paths")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amdgpu: gfx 12.1 cleanups
Alex Deucher [Wed, 4 Mar 2026 15:06:04 +0000 (10:06 -0500)] 
drm/amdgpu: gfx 12.1 cleanups

Remove some remnants from when the code was forked
from gfx 12.0.

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/radeon: Test for fbdev GEM object with generic helper
Thomas Zimmermann [Wed, 4 Mar 2026 12:58:39 +0000 (13:58 +0100)] 
drm/radeon: Test for fbdev GEM object with generic helper

Replace radeon's test for the fbdev GEM object with a call to the
generic helper.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amdgpu: Move test for fbdev GEM object into generic helper
Thomas Zimmermann [Wed, 4 Mar 2026 12:58:38 +0000 (13:58 +0100)] 
drm/amdgpu: Move test for fbdev GEM object into generic helper

Provide a generic helper that tests if fbdev emulation is backed by
a specific GEM object. Not all drivers use client buffers (yet), hence
also test against the first GEM object in the fbdev framebuffer.

Convert amdgpu. The helper will also be useful for radeon.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amd/pm: add missing od setting PP_OD_FEATURE_ZERO_FAN_BIT for smu v14
Yang Wang [Wed, 4 Mar 2026 02:14:10 +0000 (21:14 -0500)] 
drm/amd/pm: add missing od setting PP_OD_FEATURE_ZERO_FAN_BIT for smu v14

Add the missing OD setting PP_OD_FEATURE_ZERO_FAN_BIT for SMU v14.0.2/14.0.3.

Fixes: 9710b84e2a6a ("drm/amd/pm: add overdrive support on smu v14.0.2/3")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/5018
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amdgpu/userq: remove queue from doorbell xa during clean up
Sunil Khatri [Tue, 3 Mar 2026 17:00:04 +0000 (22:30 +0530)] 
drm/amdgpu/userq: remove queue from doorbell xa during clean up

If amdgpu_userq_map_helper fails, we need to clean up and remove the
queue from userq_doorbell_xa.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amdgpu/userq: remove queue from doorbell xarray
Sunil Khatri [Tue, 3 Mar 2026 16:55:57 +0000 (22:25 +0530)] 
drm/amdgpu/userq: remove queue from doorbell xarray

If xa_alloc fails, remove the queue from userq_doorbell_xa
during cleanup.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amd: Fix NULL pointer dereference in device cleanup
Mario Limonciello [Wed, 4 Mar 2026 20:07:40 +0000 (14:07 -0600)] 
drm/amd: Fix NULL pointer dereference in device cleanup

When GPU initialization fails due to an unsupported HW block,
IP blocks may have a NULL version pointer. During cleanup in
amdgpu_device_fini_hw, the code calls amdgpu_device_set_pg_state and
amdgpu_device_set_cg_state which iterate over all IP blocks and access
adev->ip_blocks[i].version without NULL checks, leading to a kernel
NULL pointer dereference.

Add NULL checks for adev->ip_blocks[i].version in both
amdgpu_device_set_cg_state and amdgpu_device_set_pg_state to prevent
dereferencing NULL pointers during GPU teardown when initialization has
failed.
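
The guard described above can be sketched in plain C. This is a minimal user-space illustration, not the driver code: `struct ip_version`, `struct ip_block`, and `count_usable_blocks` are hypothetical stand-ins for the real amdgpu structures and loops.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's IP block bookkeeping. */
struct ip_version { int type; };
struct ip_block { const struct ip_version *version; };

/*
 * Walk all blocks the way a teardown loop would, skipping entries whose
 * version pointer was never set because initialization failed early.
 * Returns how many blocks were actually visited.
 */
int count_usable_blocks(const struct ip_block *blocks, size_t n)
{
    int visited = 0;

    assert(blocks != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!blocks[i].version)
            continue;                      /* the added NULL check */
        if (blocks[i].version->type >= 0)  /* dereference is safe here */
            visited++;
    }
    return visited;
}
```

Without the `continue`, the dereference on the following line would fault for any block left uninitialized.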

Fixes: 39fc2bc4da00 ("drm/amdgpu: Protect GPU register accesses in powergated state in some paths")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amd/pm: add missing od setting PP_OD_FEATURE_ZERO_FAN_BIT for smu v13
Yang Wang [Wed, 4 Mar 2026 02:10:11 +0000 (21:10 -0500)] 
drm/amd/pm: add missing od setting PP_OD_FEATURE_ZERO_FAN_BIT for smu v13

Add the missing OD setting PP_OD_FEATURE_ZERO_FAN_BIT for SMU v13.0.0/13.0.7.

Fixes: cfffd980bf21 ("drm/amd/pm: add zero RPM OD setting support for SMU13")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/5018
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amdgpu: Fix mutex handling in amdgpu_benchmark_do_move() v3
Srinivasan Shanmugam [Sat, 28 Feb 2026 16:26:40 +0000 (21:56 +0530)] 
drm/amdgpu: Fix mutex handling in amdgpu_benchmark_do_move() v3

amdgpu_benchmark_do_move() can exit the loop early if
amdgpu_copy_buffer() or dma_fence_wait() fails.

In the error path, the function jumps to the exit label
without releasing adev->mman.default_entity.lock, which
leaves the mutex held and results in a lock imbalance.

This can block subsequent users of default_entity and
potentially cause deadlocks.

Move the mutex_unlock() to the common exit path so the
lock is released on both success and error returns.
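
The pattern of unlocking on a common exit path can be sketched as follows. This is a simplified user-space model, assuming a toy lock in place of the kernel mutex; `struct toy_lock`, `do_move`, and the `fail_at` parameter are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock that only tracks whether it is held (stand-in for a mutex). */
struct toy_lock { bool held; };

static void toy_lock_acquire(struct toy_lock *l)
{
    assert(!l->held);
    l->held = true;
}

static void toy_lock_release(struct toy_lock *l)
{
    assert(l->held);
    l->held = false;
}

/*
 * Run `steps` iterations; iteration `fail_at` simulates a failed copy or
 * fence wait (pass -1 for no failure). Returns 0 on success, -1 on error.
 */
int do_move(struct toy_lock *lock, int steps, int fail_at)
{
    int r = 0;

    toy_lock_acquire(lock);
    for (int i = 0; i < steps; i++) {
        if (i == fail_at) {
            r = -1;
            goto out;   /* early exit still reaches the unlock below */
        }
    }
out:
    toy_lock_release(lock);   /* single unlock for success and error */
    return r;
}
```

Because the release sits after the label, every return from the function leaves the lock free, which is the property the static checker warning was asking for.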

This fixes:
drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c:57 amdgpu_benchmark_do_move()
warn: inconsistent returns '&adev->mman.default_entity.lock'.

v2:
- Drop unrelated initialization of 'r'
- Keep the change focused on the mutex imbalance fix (Pierre).

v3:
- Removed empty line

Fixes: 30f2daedf4d8 ("drm/amdgpu: add missing lock in amdgpu_benchmark_do_move")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amd/pm: Avoid overflow when sorting pp_feature list
Asad Kamal [Mon, 2 Mar 2026 05:35:30 +0000 (13:35 +0800)] 
drm/amd/pm: Avoid overflow when sorting pp_feature list

pp_features sorting uses int8_t sort_feature[] to store driver
feature enum indices. On newer ASICs the enum index can exceed 127,
causing signed overflow and silently dropping entries from the output.
Switch the array to int16_t so all enum indices are preserved.
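
The narrowing problem can be reproduced in a few lines of standalone C. The helper names below are hypothetical; they only mimic storing an enum index into an 8-bit versus a 16-bit signed slot.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Store an index in an 8-bit signed slot and read it back. For values
 * above 127 the conversion is implementation-defined; on common
 * compilers it wraps to a negative number.
 */
int store_index_int8(int index)
{
    assert(index >= 0);
    int8_t slot = (int8_t)index;
    return slot;
}

/* Same round trip through a 16-bit signed slot. */
int store_index_int16(int index)
{
    assert(index >= 0 && index <= INT16_MAX);
    int16_t slot = (int16_t)index;
    return slot;
}
```

An index such as 130 survives the 16-bit round trip unchanged but cannot be represented in the 8-bit slot, so whatever value comes back no longer identifies the original feature.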

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amd/pm: Add pm firmware info to dmesg log
Lijo Lazar [Wed, 25 Feb 2026 12:23:31 +0000 (17:53 +0530)] 
drm/amd/pm: Add pm firmware info to dmesg log

Add PMFW info to dmesg log for SMUv13 SOCs. It's helpful as diagnostic
data for some driver load issues.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amdgpu/psp: Use Indirect access address for GFX to PSP mailbox
sguttula [Wed, 25 Feb 2026 08:27:01 +0000 (13:57 +0530)] 
drm/amdgpu/psp: Use Indirect access address for GFX to PSP mailbox

The reason the RAP is not granting access to 0x58200 is that
a dedicated RSMU slot would have to be spent for this address range,
and MPASP is close to running out of RSMU slots.

This helps fix the PSP TOC load failure during secure boot.
The GFX driver needs to use indirect access for SMN address registers.

Signed-off-by: sguttula <suresh.guttula@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amdgpu: Remove redundant missing hw ip handling
Tvrtko Ursulin [Wed, 7 Jan 2026 12:43:50 +0000 (12:43 +0000)] 
drm/amdgpu: Remove redundant missing hw ip handling

Now that it is guaranteed there can be no entity if there is no hw ip
block we can remove the open coded protection during CS parsing.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
References: 55414ad5c983 ("drm/amdgpu: error out on entity with no run queue")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amdgpu: Reject impossible entities early
Tvrtko Ursulin [Wed, 7 Jan 2026 12:43:49 +0000 (12:43 +0000)] 
drm/amdgpu: Reject impossible entities early

Currently there are two different behaviour modes when userspace tries to
operate on HW IP blocks that are not present. On a machine without UVD, VCE
and VPE blocks, this can be observed for example like this:

$ sudo ./amd_fuzzing --r cs-wait-fuzzing
...
amd_cs_wait_fuzzing DRM_IOCTL_AMDGPU_CTX r 0
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_GFX r 0
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_COMPUTE r 0
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_DMA r 0
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_UVD r -1
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_VCE r 0
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_UVD_ENC r -1
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_VCN_DEC r 0
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_VCN_ENC r 0
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_VCN_JPEG r 0
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_VPE r 0

We can see that UVD returns an errno (-EINVAL) from the CS_WAIT ioctl,
while VCE and VPE return unexpected successes.

The difference stems from the fact that UVD is a load balancing engine
which retains the context, so it is handled by a workaround implemented in
amdgpu_ctx_init_entity(), but that workaround does not account for the fact
that the hardware block may not be present.

This causes a single NULL scheduler to be passed to
drm_sched_entity_init(), which immediately rejects this with -EINVAL.

The absent VCE and VPE cases, on the other hand, pass zero schedulers
to drm_sched_entity_init(), which is explicitly allowed and results in
unusable entities.

As the UVD case however shows, call paths can handle the errors, so we can
consolidate this into a single path which will always return -EINVAL if
the HW IP block is not present.

We do this by rejecting it early and not calling drm_sched_entity_init()
when there is no backing hardware.
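
A minimal sketch of the early-reject idea, assuming a toy entity type: `struct toy_entity` and `toy_entity_init` are hypothetical and only model the control flow, not the DRM scheduler API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical entity; the real one carries scheduler pointers. */
struct toy_entity {
    unsigned int num_scheds;
    int initialized;
};

/*
 * Refuse to build an entity when no scheduler backs the requested HW IP,
 * instead of creating one that can never run anything.
 */
int toy_entity_init(struct toy_entity *entity, unsigned int num_scheds)
{
    assert(entity != NULL);
    if (num_scheds == 0)
        return -EINVAL;   /* no backing hardware: reject early */

    entity->num_scheds = num_scheds;
    entity->initialized = 1;
    return 0;
}
```

With a single rejection point, every absent block reports the same error to the caller, and the entity is left untouched on failure.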

This also removes the need for the drm_sched_entity_init() to handle the
zero schedulers and NULL scheduler cases, which means that we can follow
up by removing the special casing from the DRM scheduler.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
References: f34e8bb7d6c6 ("drm/sched: fix null-ptr-deref in init entity")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amdgpu/userq: refcount userqueues to avoid any race conditions
Sunil Khatri [Mon, 2 Mar 2026 13:20:46 +0000 (18:50 +0530)] 
drm/amdgpu/userq: refcount userqueues to avoid any race conditions

To avoid race conditions and use-after-free (UAF) cases, implement kref
based queues and protect the operations below using the xa lock:
a. Getting a queue from the xarray
b. Incrementing/decrementing its refcount

Every time someone wants to access a queue, always get it via
amdgpu_userq_get to make sure the locks are in place and the object
is returned only if it is active.

A userqueue is destroyed when the last reference is dropped, which
typically happens via IOCTL or during fini.
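
The get/put lifetime rule can be illustrated with a single-threaded toy model. This sketch deliberately omits the locking that the commit relies on, and the names (`struct toy_queue`, `toy_queue_get`, `toy_queue_put`) are hypothetical rather than the driver's.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy queue with a plain counter standing in for a kref. */
struct toy_queue {
    int refcount;
    bool destroyed;
};

/* A new queue starts with one reference owned by its creator. */
void toy_queue_init(struct toy_queue *q)
{
    q->refcount = 1;
    q->destroyed = false;
}

/* Take a reference only while the queue is still live. */
bool toy_queue_get(struct toy_queue *q)
{
    if (q->refcount == 0)
        return false;
    q->refcount++;
    return true;
}

/* Drop a reference; the last drop tears the queue down. */
void toy_queue_put(struct toy_queue *q)
{
    assert(q->refcount > 0);
    if (--q->refcount == 0)
        q->destroyed = true;   /* stand-in for freeing the object */
}
```

In the real driver the lookup and the increment have to happen under the xarray lock so that a concurrent destroy cannot free the object between the two steps; the toy model only shows the counting rule.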

v2: Add the missing drop in one of the conditions in the signal ioctl [Alex]

v3: remove the queue from the xarray first in the free queue ioctl path
    [Christian]

- Pass queue to the amdgpu_userq_put directly.
- make amdgpu_userq_put xa_lock free since we are doing put for each get
  only and final put is done via destroy and we remove the queue from xa
  with lock.
- use userq_put in fini too so cleanup is done fully.

v4: Use xa_erase directly rather than doing load and erase in free
    ioctl. Also remove some of the error logs which could be exploited
    by the user to flood the logs [Christian]

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amdgpu: Fix use-after-free race in VM acquire
Alysa Liu [Thu, 5 Feb 2026 16:21:45 +0000 (11:21 -0500)] 
drm/amdgpu: Fix use-after-free race in VM acquire

Replace non-atomic vm->process_info assignment with cmpxchg()
to prevent race when parent/child processes sharing a drm_file
both try to acquire the same VM after fork().
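
The compare-and-exchange idea can be sketched with C11 atomics in user space. This is an analogy to the kernel's cmpxchg(), not the driver code: `struct vm`, `struct process_info`, `vm_acquire`, and the return convention are assumptions made for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the VM and its owner record. */
struct process_info { int id; };
struct vm { _Atomic(struct process_info *) process_info; };

/*
 * Install `info` as the owner only if no owner is set yet.
 * Returns 0 if this caller won, -1 if the VM was already acquired.
 */
int vm_acquire(struct vm *vm, struct process_info *info)
{
    struct process_info *expected = NULL;

    assert(info != NULL);
    if (!atomic_compare_exchange_strong(&vm->process_info, &expected, info))
        return -1;
    return 0;
}
```

With a plain assignment, two callers could both observe NULL and both write; the single atomic step guarantees that exactly one of them succeeds.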

Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amd/pm: remove invalid gpu_metrics.energy_accumulator on smu v13.0.x
Yang Wang [Thu, 26 Feb 2026 03:51:06 +0000 (22:51 -0500)] 
drm/amd/pm: remove invalid gpu_metrics.energy_accumulator on smu v13.0.x

v1:
The metrics->EnergyAccumulator field has been deprecated on newer pmfw.

v2:
add smu 13.0.0/13.0.7/13.0.10 support.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amd/ras: adapt sync info func for pmfw eeprom
Gangliang Xie [Tue, 16 Dec 2025 06:31:23 +0000 (14:31 +0800)] 
drm/amd/ras: adapt sync info func for pmfw eeprom

adapt sync info func for pmfw eeprom

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amd/ras: add check func for pmfw eeprom
Gangliang Xie [Mon, 15 Dec 2025 07:54:35 +0000 (15:54 +0800)] 
drm/amd/ras: add check func for pmfw eeprom

add check func for pmfw eeprom

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amdgpu: GFX12.1 scratch memory limit up to 57-bit
Philip Yang [Thu, 26 Feb 2026 20:15:51 +0000 (15:15 -0500)] 
drm/amdgpu: GFX12.1 scratch memory limit up to 57-bit

The scratch aperture or gmc private aperture in flat memory contains
57 bits of data on gfx v12.1.0, compared to 32 bits on previous versions.

Add new helper kfd_init_apertures_v12 for gfx version >= v12.1.0 which
supports 57-bit VA space.

v2:
  - update pdd->scratch_limit (Yu, Lang)
  - update fixes tag (Felix Kuehling)
  - add helper kfd_init_apertures_v12

Fixes: db1882b3ff0c ("drm/amdkfd: Update LDS, Scratch base for 57bit address")
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Lang Yu <lang.yu@amd.com>
Acked-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amd/ras: add initialization func for pmfw eeprom
Gangliang Xie [Mon, 15 Dec 2025 07:48:44 +0000 (15:48 +0800)] 
drm/amd/ras: add initialization func for pmfw eeprom

add initialization func for pmfw eeprom

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amd/ras: adapt page retirement process for pmfw eeprom
Gangliang Xie [Mon, 15 Dec 2025 07:18:35 +0000 (15:18 +0800)] 
drm/amd/ras: adapt page retirement process for pmfw eeprom

read bad page data from pmfw eeprom when retirement
is triggered, use timestamp read from eeprom

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amd/ras: add read func for pmfw eeprom
Gangliang Xie [Mon, 15 Dec 2025 06:19:34 +0000 (14:19 +0800)] 
drm/amd/ras: add read func for pmfw eeprom

add read func for pmfw eeprom, and adapt address converting
for bad pages loaded from pmfw eeprom

v2: change label 'Out' to 'out'

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amd/ras: make MCA IPID parse global
Tao Zhou [Mon, 15 Dec 2025 05:53:59 +0000 (13:53 +0800)] 
drm/amd/ras: make MCA IPID parse global

add a new IPID parse interface for umc, so we can
implement it for each ASIC, and so we can call it
in other blocks

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amd/ras: add append func for pmfw eeprom
Gangliang Xie [Mon, 15 Dec 2025 05:22:55 +0000 (13:22 +0800)] 
drm/amd/ras: add append func for pmfw eeprom

add append func for pmfw eeprom

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amd/ras: add check safety watermark func for pmfw eeprom
Gangliang Xie [Mon, 15 Dec 2025 05:01:04 +0000 (13:01 +0800)] 
drm/amd/ras: add check safety watermark func for pmfw eeprom

add check safety watermark func for pmfw eeprom

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amd/ras: Add table reset func for pmfw eeprom
Gangliang Xie [Mon, 15 Dec 2025 04:34:41 +0000 (12:34 +0800)] 
drm/amd/ras: Add table reset func for pmfw eeprom

add table reset func for pmfw eeprom, add smu eeprom control
structure

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amd/display: remove extra ; from statement, remove extra tabs
Colin Ian King [Sat, 28 Feb 2026 09:59:38 +0000 (09:59 +0000)] 
drm/amd/display: remove extra ; from statement, remove extra tabs

There is a statement that has a ;; at the end, remove the extraneous ;
and remove extra tabs in the code block.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amdgpu: Use get_smn_base in aqua_vanjaram
Lijo Lazar [Tue, 9 Dec 2025 14:13:21 +0000 (19:43 +0530)] 
drm/amdgpu: Use get_smn_base in aqua_vanjaram

Use get_smn_base interface to get IP die instance's base offset in
aqua_vanjaram. encode_ext_smn_addressing is not used.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amdgpu: Add smn callbacks to register block
Lijo Lazar [Tue, 9 Dec 2025 12:31:25 +0000 (18:01 +0530)] 
drm/amdgpu: Add smn callbacks to register block

Add smn block to register access and callback interface definition to
get smn base.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amdgpu: Move pcie lock to register block
Lijo Lazar [Tue, 9 Dec 2025 06:02:42 +0000 (11:32 +0530)] 
drm/amdgpu: Move pcie lock to register block

Move pcie register access lock to register access block.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amdgpu: Add pcie64 extended to register block
Lijo Lazar [Tue, 9 Dec 2025 05:58:45 +0000 (11:28 +0530)] 
drm/amdgpu: Add pcie64 extended to register block

Add extended pcie 64-bit access method to register access block.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amdgpu: Add pcie64 indirect to register block
Lijo Lazar [Tue, 9 Dec 2025 05:41:41 +0000 (11:11 +0530)] 
drm/amdgpu: Add pcie64 indirect to register block

Move 64-bit pcie indirect read/writes to register access block.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amdgpu: Add pcie ext access to register block
Lijo Lazar [Tue, 9 Dec 2025 04:39:37 +0000 (10:09 +0530)] 
drm/amdgpu: Add pcie ext access to register block

Move pcie extended access (64-bit address) to register access block.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amdgpu: Add pcie indirect to register block
Lijo Lazar [Tue, 9 Dec 2025 04:17:13 +0000 (09:47 +0530)] 
drm/amdgpu: Add pcie indirect to register block

Move pcie indirect access to register access block.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amdgpu: Add pciep method to register block
Lijo Lazar [Tue, 9 Dec 2025 04:05:18 +0000 (09:35 +0530)] 
drm/amdgpu: Add pciep method to register block

Move pcie port method to register access block.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amdgpu: Add audio method to register block
Lijo Lazar [Mon, 8 Dec 2025 13:41:57 +0000 (19:11 +0530)] 
drm/amdgpu: Add audio method to register block

Move audio endpoint callbacks to register access block.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amdgpu: Add se cac method to register block
Lijo Lazar [Mon, 8 Dec 2025 13:34:47 +0000 (19:04 +0530)] 
drm/amdgpu: Add se cac method to register block

Move se cac access callbacks to register access block.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amdgpu: Add gc cac method to register block
Lijo Lazar [Mon, 8 Dec 2025 13:27:47 +0000 (18:57 +0530)] 
drm/amdgpu: Add gc cac method to register block

Move gc cac access callbacks to register access block.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amdgpu: Add didt method to register block
Lijo Lazar [Mon, 8 Dec 2025 13:22:42 +0000 (18:52 +0530)] 
drm/amdgpu: Add didt method to register block

Move didt callbacks to register access block.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amdgpu: Drop redundant syncobj handle limit checks in userq ioctls
Srinivasan Shanmugam [Sun, 1 Mar 2026 12:49:50 +0000 (18:19 +0530)] 
drm/amdgpu: Drop redundant syncobj handle limit checks in userq ioctls

Clang warns that comparing a __u16 value against 65536 is always false.

num_syncobj_handles is defined as __u16 in both the userq signal and
wait ioctl argument structs, so it can never exceed 65535. The checks
against AMDGPU_USERQ_MAX_HANDLES are therefore redundant and trigger
-Wtautological-constant-out-of-range-compare.
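
Why the check is dead code can be shown at compile time. In this sketch `MAX_HANDLES` is an assumed stand-in for AMDGPU_USERQ_MAX_HANDLES, using the 65536 value quoted above, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed limit, taken from the 65536 mentioned in the message. */
#define MAX_HANDLES 65536u

/* No 16-bit value can reach the limit, so a runtime check is redundant. */
static_assert(UINT16_MAX < MAX_HANDLES, "a 16-bit count cannot exceed the limit");

/* With a 32-bit count the comparison is meaningful. */
bool count32_exceeds_limit(uint32_t n)
{
    return n > MAX_HANDLES;
}

/* With a 16-bit count the same comparison can never be true. */
bool count16_exceeds_limit(uint16_t n)
{
    return (uint32_t)n > MAX_HANDLES;
}
```

The explicit widening in the 16-bit helper is only there to keep the illustration readable; the compile-time assertion already proves the result is always false.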

Fixes: Clang -Wtautological-constant-out-of-range-compare in userq
signal/wait ioctls

Fixes: d8e760b7996d ("drm/amdgpu: update type for num_syncobj_handles in drm_amdgpu_userq_signal")
Fixes: c561d2320492 ("drm/amdgpu: update type for num_syncobj_handles in drm_amdgpu_userq_wait")
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amd/ras: add wrapper funcs for pmfw eeprom
Gangliang Xie [Fri, 12 Dec 2025 08:15:56 +0000 (16:15 +0800)] 
drm/amd/ras: add wrapper funcs for pmfw eeprom

add wrapper funcs for the pmfw eeprom interface to make it
easier to call

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amd/ras: add uniras smu feature flag init func
Gangliang Xie [Fri, 12 Dec 2025 08:04:30 +0000 (16:04 +0800)] 
drm/amd/ras: add uniras smu feature flag init func

add flag to indicate if pmfw eeprom is supported or
not, and initialize it

v2: change copyright from 2025 to 2026

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amd/ras: add pmfw eeprom smu interfaces
Gangliang Xie [Fri, 12 Dec 2025 06:16:17 +0000 (14:16 +0800)] 
drm/amd/ras: add pmfw eeprom smu interfaces

add smu interfaces and their data structures for
pmfw eeprom in uniras

v2: add 'const' to smu messages array, and specify
    index for each member when initializing.

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amd/pm: add feature query interface for uniras
Gangliang Xie [Fri, 12 Dec 2025 07:42:49 +0000 (15:42 +0800)] 
drm/amd/pm: add feature query interface for uniras

add amdgpu_smu_ras_feature_is_enabled to query whether a feature
is supported or not

v2: change default return value from -EOPNOTSUPP to 0

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amd/pm: add pmfw eeprom messages into uniras interface
Gangliang Xie [Thu, 11 Dec 2025 10:14:46 +0000 (18:14 +0800)] 
drm/amd/pm: add pmfw eeprom messages into uniras interface

add pmfw eeprom related messages into smu_v13_0_6_ras_send_msg

v2: add sriov check before sending smu commands

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amdgpu: Add uvd indirect to register block
Lijo Lazar [Mon, 8 Dec 2025 13:08:52 +0000 (18:38 +0530)] 
drm/amdgpu: Add uvd indirect to register block

Add uvd indirect method to register access block and replace the
existing calls from adev.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amdgpu: Add smc method to register block
Lijo Lazar [Mon, 8 Dec 2025 12:56:09 +0000 (18:26 +0530)] 
drm/amdgpu: Add smc method to register block

Define register access block which consolidates different register access
methods. Add smc method to register access block.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amdgpu: clear related counter after RAS eeprom reset
Tao Zhou [Sat, 21 Feb 2026 12:11:03 +0000 (20:11 +0800)] 
drm/amdgpu: clear related counter after RAS eeprom reset

Make eeprom data and its counter consistent.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amdgpu: compatible with specific RAS old eeprom format
Tao Zhou [Sat, 21 Feb 2026 02:48:14 +0000 (10:48 +0800)] 
drm/amdgpu: compatible with specific RAS old eeprom format

Handle RAS eeprom record when UMC_CHANNEL_IDX_V2 is set.

v2: get the UMC_CHANNEL_IDX_V2 flag before clearing it.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months ago drm/amdgpu: update type for num_syncobj_handles in drm_amdgpu_userq_wait
Sunil Khatri [Thu, 26 Feb 2026 15:48:51 +0000 (21:18 +0530)] 
drm/amdgpu: update type for num_syncobj_handles in drm_amdgpu_userq_wait

Update the type of num_syncobj_handles from __u32 to __u16 with the
required padding.

This breaks the UAPI for big-endian platforms, but this is deliberate
and harmless since userqueues are still a beta feature. They are enabled
via a module parameter and need the right fw support to work.
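
The effect of narrowing a field while keeping the struct size can be sketched as below. The two layouts are hypothetical: only `num_syncobj_handles` comes from the message, while `queue_id`, `pad`, and the helper functions are invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical "before" layout with a 32-bit count. */
struct wait_args_old {
    uint32_t queue_id;
    uint32_t num_syncobj_handles;
};

/* Hypothetical "after" layout: 16-bit count plus explicit padding. */
struct wait_args_cur {
    uint32_t queue_id;
    uint16_t num_syncobj_handles;
    uint16_t pad;
};

/* Padding keeps the overall size and the field offset unchanged. */
static_assert(sizeof(struct wait_args_old) == sizeof(struct wait_args_cur),
              "struct size must not change");
static_assert(offsetof(struct wait_args_old, num_syncobj_handles) ==
              offsetof(struct wait_args_cur, num_syncobj_handles),
              "field offset must not change");

/* Returns 1 on a little-endian host, 0 otherwise. */
int host_is_little_endian(void)
{
    uint16_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Fill the old layout, then read the count through the new layout. */
uint16_t read_cur_from_old(uint32_t count)
{
    struct wait_args_old old = { 0, count };
    struct wait_args_cur cur;

    memcpy(&cur, &old, sizeof(cur));
    return cur.num_syncobj_handles;
}
```

On a little-endian host the low 16 bits of the old field sit exactly where the new field is, so small counts read back unchanged; on a big-endian host the new field overlaps the high half instead, which is the compatibility break the message refers to.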

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu: update type for num_syncobj_handles in drm_amdgpu_userq_signal
Sunil Khatri [Thu, 26 Feb 2026 15:44:27 +0000 (21:14 +0530)] 
drm/amdgpu: update type for num_syncobj_handles in drm_amdgpu_userq_signal

Update the type of num_syncobj_handles from __u64 to __u16 with
required padding.

This breaks the UAPI for big-endian platforms but this is deliberate
and harmless since userqueues is still a beta feature. It is enabled
via module parameter and needs the right fw support to work.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu/userq: change queue id type to u32 from int
Sunil Khatri [Thu, 26 Feb 2026 07:35:55 +0000 (13:05 +0530)] 
drm/amdgpu/userq: change queue id type to u32 from int

The queue id always remains a positive value and should
be of an unsigned type.

With this we also don't need to typecast the id to other
types, especially in xarray functions.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amd/display: Promote DC to 3.2.372
Taimur Hassan [Fri, 20 Feb 2026 22:53:34 +0000 (17:53 -0500)] 
drm/amd/display: Promote DC to 3.2.372

This version brings along the following updates:

- Prevent integer overflow when mhz to khz
- Remove always-false branches
- Remove redundant initializers
- Silence unused variable warning
- Initialize replay_state to PR_STATE_INVALID
- Fallback to boot snapshot for dispclk
- Skip cursor cache reset if hubp powergating is disabled

Reviewed-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu: Fix static assertion failure issue
YiPeng Chai [Thu, 26 Feb 2026 09:56:26 +0000 (17:56 +0800)] 
drm/amdgpu: Fix static assertion failure issue

Since the PAGE_SIZE is 8KB on sparc64, the size of
structure amdsriov_ras_telemetry will exceed 64KB,
so use absolute value to fix the buffer size.

Fixes the issue:
 drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h:522:2: error: static
 assertion failed due to requirement 'sizeof(struct
 amdsriov_ras_telemetry) <= 64 << 10': amdsriov_ras_telemetry must be 64 KB
 |  sizeof(struct amdsriov_ras_telemetry) <=
AMD_SRIOV_MSG_RAS_TELEMETRY_SIZE_KB_V1 << 10,
 drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h:522:40: note:
expression evaluates to '115616 <= 65536'
 |   sizeof(struct amdsriov_ras_telemetry) <=
AMD_SRIOV_MSG_RAS_TELEMETRY_SIZE_KB_V1 << 10,
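
A minimal sketch of the idea, assuming a buffer member whose size is an absolute byte count rather than a multiple of PAGE_SIZE. The struct, its members, and the macro names are hypothetical stand-ins, not the real amdsriov_ras_telemetry layout.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_LIMIT_KB		64	/* mirrors the 64 KB bound in the assertion */
#define SKETCH_CMD_BUF_BYTES	4096	/* hypothetical absolute size, not PAGE_SIZE */

/* Hypothetical stand-in for a telemetry structure. */
struct sketch_telemetry {
	uint8_t header[64];			/* hypothetical fixed header */
	uint8_t cmd_buf[SKETCH_CMD_BUF_BYTES];	/* same size on every architecture */
};

/* Holds regardless of the page size of the build target. */
static_assert(sizeof(struct sketch_telemetry) <= (SKETCH_LIMIT_KB << 10),
	      "sketch_telemetry must fit in 64 KB");
```

Had cmd_buf been sized from the page size, the same assertion could pass with 4 KB pages and fail with 8 KB pages.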

Fixes: cb48a6b2b61d ("drm/amd/ras: use dedicated memory as vf ras command buffer")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202602261700.rVOLIw4l-lkp@intel.com/
Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amd/display: Prevent integer overflow when mhz to khz
Alex Hung [Wed, 18 Feb 2026 17:38:33 +0000 (10:38 -0700)] 
drm/amd/display: Prevent integer overflow when mhz to khz

[WHAT]
Cast to long long before multiplication to prevent overflow
when converting mhz to khz by multiplying by 1000.

This is reported as INTEGER_OVERFLOW errors by Coverity.
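
For illustration only, a standalone sketch of the pattern, not the driver code. The helper names are hypothetical, and unsigned 32-bit inputs are used so that the wrapping in the narrow variant is well defined (a 32-bit unsigned int is assumed).

```c
#include <stdint.h>

/* Hypothetical helper: multiplies in 32-bit unsigned arithmetic, so the
 * product wraps modulo 2^32 before it is widened for the return value. */
static inline long long mhz_to_khz_narrow(uint32_t mhz)
{
	return mhz * 1000u;
}

/* Hypothetical helper: widens first, so the multiplication is done in
 * long long and cannot wrap for any 32-bit input. */
static inline long long mhz_to_khz_wide(uint32_t mhz)
{
	return (long long)mhz * 1000;
}
```

With an input of 5000000 the wide variant returns 5000000000, while the narrow variant returns 705032704.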

Reviewed-by: Roman Li <roman.li@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amd/display: Remove always-false branches
Alex Hung [Wed, 18 Feb 2026 16:50:15 +0000 (09:50 -0700)] 
drm/amd/display: Remove always-false branches

[WHAT]
program_prealpha_dealpha and hpo_frl_stream_enc_acquired are always
false and all branches depending on them will never be taken.

This is reported as DEADCODE errors by Coverity.

Reviewed-by: Roman Li <roman.li@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amd/display: Remove redundant initializers
Alex Hung [Wed, 18 Feb 2026 16:30:32 +0000 (09:30 -0700)] 
drm/amd/display: Remove redundant initializers

[WHAT]
Remove unnecessary default value assignments for variables that
are unconditionally assigned before use.

Linux kernel code style prefers no assignments during initialization
when variables are assigned unconditionally as they can obscure
the actual data flow. In addition, compilers will be able to catch them
if variables are used without being updated later in all conditions.
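
A small sketch of the style in question; the function is hypothetical and only shows the shape of the change.

```c
/* Hypothetical function: 'result' has no dummy "= 0" initializer because
 * every path below assigns it before it is read. */
static int sketch_pick(int mode)
{
	int result;

	if (mode > 0)
		result = 1;
	else
		result = -1;

	return result;
}
```

If one of the branches were later changed to stop assigning result, a compiler with uninitialized-use warnings enabled has a chance to report it, which a dummy initializer would hide.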

This is reported as UNUSED_VALUE errors by Coverity.

Reviewed-by: Roman Li <roman.li@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amd/display: Silence unused variable warning
Clay King [Fri, 20 Feb 2026 16:25:47 +0000 (11:25 -0500)] 
drm/amd/display: Silence unused variable warning

[WHY & HOW]
Remove unused dpp_pipe_count variable.

Reviewed-by: Austin Zheng <austin.zheng@amd.com>
Signed-off-by: Clay King <clayking@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amd/display: Initialize replay_state to PR_STATE_INVALID
Ivan Lipski [Wed, 18 Feb 2026 21:19:15 +0000 (16:19 -0500)] 
drm/amd/display: Initialize replay_state to PR_STATE_INVALID

[WHY & HOW]
Initialize the replay_state variable to PR_STATE_INVALID instead of
PR_STATE_0 before retrieving the actual replay state.

Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amd/display: Fallback to boot snapshot for dispclk
Dillon Varone [Wed, 18 Feb 2026 19:34:28 +0000 (14:34 -0500)] 
drm/amd/display: Fallback to boot snapshot for dispclk

[WHY & HOW]
If the dentist is unavailable, fall back to reading CLKIP via the boot
snapshot to get the current dispclk.

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Dillon Varone <Dillon.Varone@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amd/pm: fix kernel-doc warning for smu_msg_v1_send_msg()
Yujie Liu [Thu, 26 Feb 2026 03:00:37 +0000 (11:00 +0800)] 
drm/amd/pm: fix kernel-doc warning for smu_msg_v1_send_msg()

Warning: drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu_cmn.c:415 expecting prototype for smu_msg_proto_v1_send_msg(). Prototype was for smu_msg_v1_send_msg() instead

Fixes: 4f379370a49c ("drm/amd/pm: Add smu message control block")
Signed-off-by: Yujie Liu <yujie.liu@intel.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amd/ras: fix kernel-doc warning for ras_eeprom_append()
Yujie Liu [Thu, 26 Feb 2026 03:00:38 +0000 (11:00 +0800)] 
drm/amd/ras: fix kernel-doc warning for ras_eeprom_append()

Warning: drivers/gpu/drm/amd/amdgpu/../ras/rascore/ras_eeprom.c:845 function parameter 'ras_core' not described in 'ras_eeprom_append'
Warning: drivers/gpu/drm/amd/amdgpu/../ras/rascore/ras_eeprom.c:845 expecting prototype for ras_core_eeprom_append(). Prototype was for ras_eeprom_append() instead

Fixes: 5c3be5defc92 ("drm/amd/ras: Add eeprom ras functions")
Signed-off-by: Yujie Liu <yujie.liu@intel.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu: fix kernel-doc warning for amdgpu_ttm_alloc_mmio_remap_bo()
Yujie Liu [Thu, 26 Feb 2026 03:00:35 +0000 (11:00 +0800)] 
drm/amdgpu: fix kernel-doc warning for amdgpu_ttm_alloc_mmio_remap_bo()

Warning: drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c:1923 expecting prototype for amdgpu_ttm_mmio_remap_bo_init(). Prototype was for amdgpu_ttm_alloc_mmio_remap_bo() instead

Fixes: 96e97a562d06 ("drm/amdgpu: Drop MMIO_REMAP domain bit and keep it Internal")
Signed-off-by: Yujie Liu <yujie.liu@intel.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amd/ras: Fix type size of remainder argument
Kees Cook [Wed, 25 Feb 2026 17:47:03 +0000 (09:47 -0800)] 
drm/amd/ras: Fix type size of remainder argument

Forcing an int to be dereferenced at uint64_t for div64_u64_rem() runs
the risk of endian confusion and stack overflowing writes. Seen while
preparing to enable -Warray-bounds globally:

In file included from ../arch/x86/include/asm/processor.h:35,
                 from ../include/linux/sched.h:13,
                 from ../include/linux/ratelimit.h:6,
                 from ../include/linux/dev_printk.h:16,
                 from ../drivers/gpu/drm/amd/amdgpu/../ras/ras_mgr/ras_sys.h:29,
                 from ../drivers/gpu/drm/amd/amdgpu/../ras/rascore/ras.h:27,
                 from ../drivers/gpu/drm/amd/amdgpu/../ras/rascore/ras_core.c:24:
In function 'div64_u64_rem',
    inlined from 'ras_core_convert_timestamp_to_time' at ../drivers/gpu/drm/amd/amdgpu/../ras/rascore/ras_core.c:72:9:
../include/linux/math64.h:56:20: error: array subscript 'u64 {aka long long unsigned int}[0]' is partly outside array bounds of 'int[1]' [-Werror=array-bounds=]
   56 |         *remainder = dividend % divisor;
      |         ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
../drivers/gpu/drm/amd/amdgpu/../ras/rascore/ras_core.c: In function 'ras_core_convert_timestamp_to_time':
../drivers/gpu/drm/amd/amdgpu/../ras/rascore/ras_core.c:70:19: note: object 'remaining_seconds' of size 4
   70 |         int days, remaining_seconds;
      |                   ^~~~~~~~~~~~~~~~~

Use a 64-bit type for the remainder calculation, but leave
remaining_seconds as 32-bit to avoid 64-bit division later. The value of
remainder will always be less than seconds_per_day, so there's no
truncation risk.
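
The approach can be sketched in userspace as follows. div_u64_rem_sketch() is a stand-in written for this example with the same shape as div64_u64_rem(); struct sketch_day_time and split_timestamp() are hypothetical names.

```c
#include <stdint.h>

/* Userspace stand-in with the same shape as the kernel's div64_u64_rem():
 * the remainder is written through a pointer to a 64-bit object. */
static inline uint64_t div_u64_rem_sketch(uint64_t dividend, uint64_t divisor,
					  uint64_t *remainder)
{
	*remainder = dividend % divisor;
	return dividend / divisor;
}

/* Hypothetical result type for the example. */
struct sketch_day_time {
	int days;
	int remaining_seconds;
};

/* Splits a timestamp in seconds into whole days and leftover seconds. */
static inline struct sketch_day_time split_timestamp(uint64_t timestamp)
{
	const uint64_t seconds_per_day = 24 * 60 * 60;
	struct sketch_day_time out;
	uint64_t remainder;	/* 64-bit, matching the callee's pointer type */

	out.days = (int)div_u64_rem_sketch(timestamp, seconds_per_day, &remainder);
	/* remainder is below seconds_per_day (86400), so narrowing is lossless. */
	out.remaining_seconds = (int)remainder;
	return out;
}
```

For a timestamp of 200000 seconds this gives 2 days and 27200 remaining seconds.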

Fixes: ace232eff50e ("drm/amdgpu: Add ras module files into amdgpu")
Signed-off-by: Kees Cook <kees@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu: Move register access functions
Lijo Lazar [Mon, 8 Dec 2025 10:11:07 +0000 (15:41 +0530)] 
drm/amdgpu: Move register access functions

Move register access methods from amdgpu_device.c to a dedicated file.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdgpu: Enable DPG support for VCN5
sguttula [Sat, 21 Feb 2026 05:17:59 +0000 (10:47 +0530)] 
drm/amdgpu: Enable DPG support for VCN5

This will set DPG flags for enabling power gating on GFX11_5_4

Signed-off-by: sguttula <suresh.guttula@amd.com>
Reviewed-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amd/display: Enable DEGAMMA and reject COLOR_PIPELINE+DEGAMMA_LUT
Alex Hung [Fri, 27 Feb 2026 19:30:38 +0000 (12:30 -0700)] 
drm/amd/display: Enable DEGAMMA and reject COLOR_PIPELINE+DEGAMMA_LUT

[WHAT]
Create DEGAMMA properties even if color pipeline is enabled, and enforce
the mutual exclusion in atomic check by rejecting any commit that
attempts to enable both COLOR_PIPELINE on the plane and DEGAMMA_LUT on
the CRTC simultaneously.
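
The check amounts to something like the sketch below. The function name, the boolean parameters, and the -EINVAL return value are assumptions made for illustration; the real atomic check operates on DRM plane and CRTC state.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical check: the two features stay individually usable, but a
 * commit that enables both at once is rejected. */
static int sketch_atomic_check(bool plane_color_pipeline, bool crtc_degamma_lut)
{
	if (plane_color_pipeline && crtc_degamma_lut)
		return -EINVAL;	/* assumed error code */

	return 0;
}
```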

Fixes: 18a4127e9315 ("drm/amd/display: Disable CRTC degamma when color pipeline is enabled")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4963
Reviewed-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amd/display: Use mpc.preblend flag to indicate 3D LUT
Alex Hung [Fri, 27 Feb 2026 19:26:04 +0000 (12:26 -0700)] 
drm/amd/display: Use mpc.preblend flag to indicate 3D LUT

[WHAT]
New ASIC's 3D LUT is indicated by mpc.preblend.

Fixes: 0de2b1afea8d ("drm/amd/display: add 3D LUT colorop")
Reviewed-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 months agodrm/amdkfd: fix CWSR trap handler
Alex Deucher [Thu, 26 Feb 2026 16:18:29 +0000 (11:18 -0500)] 
drm/amdkfd: fix CWSR trap handler

Fix up what looks like a bad merge.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd: Disable MES LR compute W/A
Mario Limonciello [Wed, 25 Feb 2026 16:51:16 +0000 (10:51 -0600)] 
drm/amd: Disable MES LR compute W/A

A workaround was introduced in commit 1fb710793ce2 ("drm/amdgpu: Enable
MES lr_compute_wa by default") to help with some hangs observed in gfx1151.

This WA didn't fully fix the issue.  It was actually fixed by adjusting
the VGPR size to the correct value that matched the hardware in commit
b42f3bf9536c ("drm/amdkfd: bump minimum vgpr size for gfx1151").

There are reports of instability on other products with newer GC microcode
versions, and I believe they're caused by this workaround. As we don't
need the workaround any more, remove it.

Fixes: b42f3bf9536c ("drm/amdkfd: bump minimum vgpr size for gfx1151")
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu: Fix error handling in slot reset
Lijo Lazar [Tue, 24 Feb 2026 04:48:51 +0000 (10:18 +0530)] 
drm/amdgpu: Fix error handling in slot reset

If the device has not recovered after slot reset is called, it goes to
out label for error handling. There it could make decision based on
uninitialized hive pointer and could result in accessing an uninitialized
list.

Initialize the list and hive properly so that it handles the error
situation and also releases the reset domain lock which is acquired
during error_detected callback.

Fixes: 732c6cefc1ec ("drm/amdgpu: Replace tmp_adev with hive in amdgpu_pci_slot_reset")
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Ce Sun <cesun102@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/ras: Handle check address validity in SR-IOV
Jinzhou Su [Tue, 10 Feb 2026 06:49:35 +0000 (14:49 +0800)] 
drm/amd/ras: Handle check address validity in SR-IOV

Handle check address validity command in SR-IOV
guest.

Signed-off-by: Jinzhou Su <jinzhou.su@amd.com>
Reviewed-by: YiPeng Chai <YiPeng.Chai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu/vcn5: Add SMU dpm interface type
sguttula [Sat, 21 Feb 2026 04:33:32 +0000 (10:03 +0530)] 
drm/amdgpu/vcn5: Add SMU dpm interface type

This will set the AMDGPU_VCN_SMU_DPM_INTERFACE_* smu_type
based on the SoC type and fix the ring timeout issue seen
in the DPM enabled case.

Signed-off-by: sguttula <suresh.guttula@amd.com>
Reviewed-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/ras: Add function to convert retired address
Jinzhou Su [Tue, 10 Feb 2026 06:36:48 +0000 (14:36 +0800)] 
drm/amd/ras: Add function to convert retired address

Add function to convert retired address in SR-IOV
guest.

Signed-off-by: Jinzhou Su <jinzhou.su@amd.com>
Reviewed-by: YiPeng Chai <YiPeng.Chai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/ras: Handle address check in SR-IOV guest
Jinzhou Su [Tue, 10 Feb 2026 06:32:40 +0000 (14:32 +0800)] 
drm/amd/ras: Handle address check in SR-IOV guest

Handle address check validity command in SR-IOV guest

Signed-off-by: Jinzhou Su <jinzhou.su@amd.com>
Reviewed-by: YiPeng Chai <YiPeng.Chai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/ras: Add convert retired address structure
Jinzhou Su [Mon, 9 Feb 2026 07:42:25 +0000 (15:42 +0800)] 
drm/amd/ras: Add convert retired address structure

Add convert retired address command and structure
for uniras.

Signed-off-by: Jinzhou Su <jinzhou.su@amd.com>
Reviewed-by: YiPeng Chai <YiPeng.Chai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/ras: Add address check structure
Jinzhou Su [Tue, 10 Feb 2026 07:46:13 +0000 (15:46 +0800)] 
drm/amd/ras: Add address check structure

Add address check command and data structure
for uniras.

Signed-off-by: Jinzhou Su <jinzhou.su@amd.com>
Reviewed-by: YiPeng Chai <YiPeng.Chai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu: Fix locking bugs in error paths
Bart Van Assche [Mon, 23 Feb 2026 21:50:23 +0000 (13:50 -0800)] 
drm/amdgpu: Fix locking bugs in error paths

Do not unlock psp->ras_context.mutex if it has not been locked. This has
been detected by the Clang thread-safety analyzer.
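
The class of bug can be sketched with a toy lock that counts unbalanced releases. Everything here is hypothetical (struct sketch_lock, sketch_op() and its error codes); it is not the psp code, only the shape of an error path that must not reach the unlock label.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy lock that records misuse; a stand-in for a real mutex. */
struct sketch_lock {
	bool held;
	int bad_unlocks;
};

static void sketch_lock_acquire(struct sketch_lock *l)
{
	l->held = true;
}

static void sketch_lock_release(struct sketch_lock *l)
{
	if (!l->held)
		l->bad_unlocks++;	/* unlock without a matching lock */
	l->held = false;
}

/* Hypothetical operation with one failure before and one after locking. */
static int sketch_op(struct sketch_lock *l, const int *input)
{
	int ret = 0;

	if (!input)
		return -EINVAL;		/* lock not taken yet: return directly */

	sketch_lock_acquire(l);
	if (*input < 0) {
		ret = -ERANGE;
		goto out_unlock;	/* lock held: release it on the way out */
	}
	/* ... work under the lock would go here ... */
out_unlock:
	sketch_lock_release(l);
	return ret;
}
```

Routing the early failure through out_unlock instead of returning directly would increment bad_unlocks, which is the kind of imbalance a thread-safety analyzer flags.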

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: YiPeng Chai <YiPeng.Chai@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Fixes: b3fb79cda568 ("drm/amdgpu: add mutex to protect ras shared memory")
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu: Unlock a mutex before destroying it
Bart Van Assche [Mon, 23 Feb 2026 22:00:07 +0000 (14:00 -0800)] 
drm/amdgpu: Unlock a mutex before destroying it

Mutexes must be unlocked before these are destroyed. This has been detected
by the Clang thread-safety analyzer.

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Yang Wang <kevinyang.wang@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Fixes: f5e4cc8461c4 ("drm/amdgpu: implement RAS ACA driver framework")
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/display: Use GFP_ATOMIC in dc_create_stream_for_sink
Natalie Vock [Mon, 23 Feb 2026 11:45:37 +0000 (12:45 +0100)] 
drm/amd/display: Use GFP_ATOMIC in dc_create_stream_for_sink

This can be called while preemption is disabled, for example by
dcn32_internal_validate_bw which is called with the FPU active.

Fixes "BUG: scheduling while atomic" messages I encounter on my Navi31
machine.

Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu: add upper bound check on user inputs in wait ioctl
Sunil Khatri [Tue, 24 Feb 2026 06:43:09 +0000 (12:13 +0530)] 
drm/amdgpu: add upper bound check on user inputs in wait ioctl

Huge input values in amdgpu_userq_wait_ioctl can lead to an OOM and
could be exploited.

So check these input values against AMDGPU_USERQ_MAX_HANDLES,
which is a big enough value for genuine use cases and could
potentially avoid OOM.

v2: squash in Srini's fix

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu: add upper bound check on user inputs in signal ioctl
Sunil Khatri [Fri, 20 Feb 2026 08:17:58 +0000 (13:47 +0530)] 
drm/amdgpu: add upper bound check on user inputs in signal ioctl

Huge input values in amdgpu_userq_signal_ioctl can lead to an OOM and
could be exploited.

So check these input values against AMDGPU_USERQ_MAX_HANDLES,
which is a big enough value for genuine use cases and could
potentially avoid OOM.
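
A minimal sketch of such a bound check, done before any allocation is sized from the user-supplied count. SKETCH_MAX_HANDLES and its value are placeholders, since the value of AMDGPU_USERQ_MAX_HANDLES is not given here, and the function name and -EINVAL return are assumptions.

```c
#include <errno.h>
#include <stdint.h>

/* Placeholder limit; the real AMDGPU_USERQ_MAX_HANDLES value is not shown here. */
#define SKETCH_MAX_HANDLES 1024u

/* Hypothetical validation step run before any memory is allocated. */
static int sketch_check_handle_count(uint64_t num_handles)
{
	if (num_handles > SKETCH_MAX_HANDLES)
		return -EINVAL;	/* assumed error code */

	return 0;
}
```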

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu/userq: Use drm_gem_objects_lookup in amdgpu_userq_wait_ioctl
Tvrtko Ursulin [Mon, 23 Feb 2026 12:41:34 +0000 (12:41 +0000)] 
drm/amdgpu/userq: Use drm_gem_objects_lookup in amdgpu_userq_wait_ioctl

Use the existing helper instead of open coding it

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Sunil Khatri <sunil.khatrti@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu/userq: Use drm_gem_objects_lookup in amdgpu_userq_signal_ioctl
Tvrtko Ursulin [Mon, 23 Feb 2026 12:41:33 +0000 (12:41 +0000)] 
drm/amdgpu/userq: Use drm_gem_objects_lookup in amdgpu_userq_signal_ioctl

Use the existing helper instead of open coding it.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Sunil Khatri <sunil.khatrti@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/ras: use dedicated memory as vf ras command buffer
YiPeng Chai [Mon, 9 Feb 2026 08:29:57 +0000 (16:29 +0800)] 
drm/amd/ras: use dedicated memory as vf ras command buffer

Use dedicated memory as vf ras command buffer.

V2:
  Add lock to ensure serialization of sending vf ras commands.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Jinzhou Su <jinzhou.su@amd.com>
Tested-by: Jinzhou Su <jinzhou.su@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdkfd: Removed commented line for MQD queue priority
Andrew Martin [Mon, 23 Feb 2026 21:08:16 +0000 (16:08 -0500)] 
drm/amdkfd: Removed commented line for MQD queue priority

Missed deleting the commented line in the original patch.

Fixes: 73463e26f7e2 ("drm/amdkfd: Disable MQD queue priority")
Signed-off-by: Andrew Martin <andrew.martin@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu/userq: Consolidate wait ioctl exit path
Tvrtko Ursulin [Mon, 23 Feb 2026 12:41:32 +0000 (12:41 +0000)] 
drm/amdgpu/userq: Consolidate wait ioctl exit path

If we gate the fence destruction with a check telling us whether there are
valid pointers in there we can eliminate the need for dual, basically
identical, exit paths.
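
The pattern can be sketched as below, under the assumption that the array is zero-initialized so unused slots are NULL. The names are hypothetical, and unlike the ioctl this sketch releases everything on success as well, purely to keep it self-contained.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct sketch_fence {
	int id;
};

/* Counts live objects so the cleanup can be checked from outside. */
static int sketch_live_fences;

/* Hypothetical routine: fills 'wanted' slots and fails at index 'fail_at'. */
static int sketch_collect(size_t wanted, size_t fail_at)
{
	struct sketch_fence **fences;
	size_t i;
	int ret = 0;

	/* calloc() zeroes the array, so untouched slots are NULL. */
	fences = calloc(wanted ? wanted : 1, sizeof(*fences));
	if (!fences)
		return -ENOMEM;

	for (i = 0; i < wanted; i++) {
		if (i == fail_at) {
			ret = -EINVAL;
			break;		/* falls into the shared exit path */
		}
		fences[i] = malloc(sizeof(**fences));
		if (!fences[i]) {
			ret = -ENOMEM;
			break;
		}
		sketch_live_fences++;
	}

	/* Single exit path: only slots holding a valid pointer are released. */
	for (i = 0; i < wanted; i++) {
		if (fences[i]) {
			free(fences[i]);
			sketch_live_fences--;
		}
	}
	free(fences);
	return ret;
}
```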

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu/userq: Do not allow userspace to trivially triger kernel warnings
Tvrtko Ursulin [Mon, 23 Feb 2026 12:41:31 +0000 (12:41 +0000)] 
drm/amdgpu/userq: Do not allow userspace to trivially triger kernel warnings

Userspace can either deliberately pass in the too small num_fences, or the
required number can legitimately grow between the two calls to the userq
wait ioctl. In both cases we do not want to emit the kernel warning
backtrace since nothing is wrong with the kernel and userspace will simply
get an errno reported back. So let's simply drop the WARN_ONs.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Fixes: a292fdecd728 ("drm/amdgpu: Implement userqueue signal/wait IOCTL")
Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu/userq: Fix reference leak in amdgpu_userq_wait_ioctl
Tvrtko Ursulin [Mon, 23 Feb 2026 12:41:30 +0000 (12:41 +0000)] 
drm/amdgpu/userq: Fix reference leak in amdgpu_userq_wait_ioctl

Drop the references to the syncobj and timeline fence when aborting the
ioctl due to the output array being too small.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Fixes: a292fdecd728 ("drm/amdgpu: Implement userqueue signal/wait IOCTL")
Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/display: Fix kdoc formatting in dcn42_hwseq.c
Srinivasan Shanmugam [Mon, 23 Feb 2026 13:44:41 +0000 (19:14 +0530)] 
drm/amd/display: Fix kdoc formatting in dcn42_hwseq.c

Kernel-doc requires all lines within a documentation
comment to start with " *". The previous empty line
caused a "bad line" warning during build.

Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Mario Limonciello <superm1@kernel.org>
Cc: Alex Hung <alex.hung@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Cc: ChiaHsuan Chung <chiahsuan.chung@amd.com>
Cc: Roman Li <roman.li@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Roman Li <roman.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu/userq: Use memdup_array_user in amdgpu_userq_signal_ioctl
Tvrtko Ursulin [Fri, 5 Dec 2025 13:40:30 +0000 (13:40 +0000)] 
drm/amdgpu/userq: Use memdup_array_user in amdgpu_userq_signal_ioctl

Use the existing helper instead of multiplying the size.
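
For context, the benefit of an array-aware duplication helper is that the element count and element size are multiplied with an overflow check. A userspace analogue might look like the sketch below; sketch_memdup_array() is a hypothetical name, and __builtin_mul_overflow is a GCC/Clang builtin rather than standard C.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace analogue of an array-duplicating helper: returns a
 * heap copy of n elements of 'size' bytes each, or NULL when n * size would
 * overflow or the allocation fails. */
static void *sketch_memdup_array(const void *src, size_t n, size_t size)
{
	size_t bytes;
	void *dst;

	if (__builtin_mul_overflow(n, size, &bytes))
		return NULL;

	dst = malloc(bytes ? bytes : 1);
	if (dst && bytes)
		memcpy(dst, src, bytes);

	return dst;
}
```

An open-coded n * size can silently wrap to a small value, which would lead to an undersized allocation; the checked multiply turns that case into a clean failure.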

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu/userq: Use memdup_array_user in amdgpu_userq_wait_ioctl
Tvrtko Ursulin [Fri, 5 Dec 2025 13:40:29 +0000 (13:40 +0000)] 
drm/amdgpu/userq: Use memdup_array_user in amdgpu_userq_wait_ioctl

Use the existing helper instead of multiplying the size.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu/sdma7.1: adjust SDMA limits
Alex Deucher [Thu, 19 Feb 2026 15:48:39 +0000 (10:48 -0500)] 
drm/amdgpu/sdma7.1: adjust SDMA limits

SDMA 7.1 has increased transfer limits.

Cc: Vitaly Prosyak <vitaly.prosyak@amd.com>
Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>