Yang Wang [Wed, 4 Mar 2026 23:45:45 +0000 (18:45 -0500)]
drm/amdgpu: fix gpu idle power consumption issue for gfx v12
Older versions of the MES firmware may cause abnormal GPU power consumption.
When performing inference tasks on the GPU (e.g., with Ollama using ROCm),
the GPU may show abnormal power consumption in idle state and incorrect GPU load information.
This issue has been fixed in firmware version 0x8b and newer.
Closes: https://github.com/ROCm/ROCm/issues/5706 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Joshua Peisach [Tue, 3 Mar 2026 21:18:22 +0000 (16:18 -0500)]
drm/amdgpu/amdgpu_connectors: use struct drm_edid instead of struct edid
Some amdgpu code is still using deprecated edid functions. Switch to
the newer functions and update the amdgpu_connector struct's edid type
to the drm_edid type.
At the same time, use the raw EDID when we need to for speaker
allocations and for determining if the input is digital.
Signed-off-by: Joshua Peisach <jpeisach@ubuntu.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd: Fix a few more NULL pointer dereference in device cleanup
I found a few more paths that cleanup fails due to a NULL version pointer
on unsupported hardware.
Add NULL checks as applicable.
Fixes: 39fc2bc4da00 ("drm/amdgpu: Protect GPU register accesses in powergated state in some paths") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/radeon: Test for fbdev GEM object with generic helper
Replace radeon's test for the fbdev GEM object with a call to the
generic helper.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amdgpu: Move test for fbdev GEM object into generic helper
Provide a generic helper that tests if fbdev emulation is backed by
a specific GEM object. Not all drivers use client buffers (yet), hence
also test against the first GEM object in the fbdev framebuffer.
Convert amdgpu. The helper will also be useful for radeon.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Yang Wang [Wed, 4 Mar 2026 02:14:10 +0000 (21:14 -0500)]
drm/amd/pm: add missing od setting PP_OD_FEATURE_ZERO_FAN_BIT for smu v14
add missing od setting PP_OD_FEATURE_ZERO_FAN_BIT for smu v14.0.2/14.0.3
Fixes: 9710b84e2a6a ("drm/amd/pm: add overdrive support on smu v14.0.2/3") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/5018 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sunil Khatri [Tue, 3 Mar 2026 17:00:04 +0000 (22:30 +0530)]
drm/amdgpu/userq: remove queue from doorbell xa during clean up
If function amdgpu_userq_map_helper fails we do need to clean
up and remove the queue from the userq_doorbell_xa.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sunil Khatri [Tue, 3 Mar 2026 16:55:57 +0000 (22:25 +0530)]
drm/amdgpu/userq: remove queue from doorbell xarray
In case of failure in xa_alloc, remove the queue during
clean up from the userq_doorbell_xa.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd: Fix NULL pointer dereference in device cleanup
When GPU initialization fails due to an unsupported HW block
IP blocks may have a NULL version pointer. During cleanup in
amdgpu_device_fini_hw, the code calls amdgpu_device_set_pg_state and
amdgpu_device_set_cg_state which iterate over all IP blocks and access
adev->ip_blocks[i].version without NULL checks, leading to a kernel
NULL pointer dereference.
Add NULL checks for adev->ip_blocks[i].version in both
amdgpu_device_set_cg_state and amdgpu_device_set_pg_state to prevent
dereferencing NULL pointers during GPU teardown when initialization has
failed.
Fixes: 39fc2bc4da00 ("drm/amdgpu: Protect GPU register accesses in powergated state in some paths") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Yang Wang [Wed, 4 Mar 2026 02:10:11 +0000 (21:10 -0500)]
drm/amd/pm: add missing od setting PP_OD_FEATURE_ZERO_FAN_BIT for smu v13
add missing od setting PP_OD_FEATURE_ZERO_FAN_BIT for smu v13.0.0/13.0.7
Fixes: cfffd980bf21 ("drm/amd/pm: add zero RPM OD setting support for SMU13") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/5018 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amdgpu: Fix mutex handling in amdgpu_benchmark_do_move() v3
amdgpu_benchmark_do_move() can exit the loop early if
amdgpu_copy_buffer() or dma_fence_wait() fails.
In the error path, the function jumps to the exit label
without releasing adev->mman.default_entity.lock, which
leaves the mutex held and results in a lock imbalance.
This can block subsequent users of default_entity and
potentially cause deadlocks.
Move the mutex_unlock() to the common exit path so the
lock is released on both success and error returns.
This fixes:
drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c:57 amdgpu_benchmark_do_move()
warn: inconsistent returns '&adev->mman.default_entity.lock'.
v2:
- Drop unrelated initialization of 'r'
- Keep the change focused on the mutex imbalance fix (Pierre).
v3:
- Removed empty line
Fixes: 30f2daedf4d8 ("drm/amdgpu: add missing lock in amdgpu_benchmark_do_move") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Asad Kamal [Mon, 2 Mar 2026 05:35:30 +0000 (13:35 +0800)]
drm/amd/pm: Avoid overflow when sorting pp_feature list
pp_features sorting uses int8_t sort_feature[] to store driver
feature enum indices. On newer ASICs the enum index can exceed 127,
causing signed overflow and silently dropping entries from the output.
Switch the array to int16_t so all enum indices are preserved.
Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
sguttula [Wed, 25 Feb 2026 08:27:01 +0000 (13:57 +0530)]
drm/amdgpu/psp: Use Indirect access address for GFX to PSP mailbox
The reason the RAP is not granting access to 0x58200 is that
a dedicated RSMU slot would have to be spent for this address range,
and MPASP is close to running out of RSMU slots.
This will help to fix PSP TOC load failure during secureboot.
GFX Driver Need to use indirect access for SMN address regs.
Tvrtko Ursulin [Wed, 7 Jan 2026 12:43:50 +0000 (12:43 +0000)]
drm/amdgpu: Remove redundant missing hw ip handling
Now that it is guaranteed there can be no entity if there is no hw ip
block we can remove the open coded protection during CS parsing.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
References: 55414ad5c983 ("drm/amdgpu: error out on entity with no run queue") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tvrtko Ursulin [Wed, 7 Jan 2026 12:43:49 +0000 (12:43 +0000)]
drm/amdgpu: Reject impossible entities early
Currently there are two different behaviour modes when userspace tries to
operate on not present HW IP blocks. On a machine without UVD, VCE and VPE
blocks, this can be observed for example like this:
$ sudo ./amd_fuzzing --r cs-wait-fuzzing
...
amd_cs_wait_fuzzing DRM_IOCTL_AMDGPU_CTX r 0
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_GFX r 0
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_COMPUTE r 0
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_DMA r 0
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_UVD r -1
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_VCE r 0
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_UVD_ENC r -1
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_VCN_DEC r 0
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_VCN_ENC r 0
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_VCN_JPEG r 0
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_VPE r 0
We can see that UVD returns an errno (-EINVAL) from the CS_WAIT ioctl,
while VCE and VPE return unexpected successes.
The difference stems from the fact the UVD is a load balancing engine
which retains the context, so with a workaround implemented in
amdgpu_ctx_init_entity(), but which does not account for the fact hardware
block may not be present.
This causes a single NULL scheduler to be passed to
drm_sched_entity_init(), which immediately rejects this with -EINVAL.
The not present VCE and VPE cases on the other hand pass zero schedulers
to drm_sched_entity_init(), which is explicitly allowed and results in
unusable entities.
As the UVD case however shows, call paths can handle the errors, so we can
consolidate this into a single path which will always return -EINVAL if
the HW IP block is not present.
We do this by rejecting it early and not calling drm_sched_entity_init()
when there is no backing hardware.
This also removes the need for the drm_sched_entity_init() to handle the
zero schedulers and NULL scheduler cases, which means that we can follow
up by removing the special casing from the DRM scheduler.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
References: f34e8bb7d6c6 ("drm/sched: fix null-ptr-deref in init entity") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sunil Khatri [Mon, 2 Mar 2026 13:20:46 +0000 (18:50 +0530)]
drm/amdgpu/userq: refcount userqueues to avoid any race conditions
To avoid race condition and avoid UAF cases, implement kref
based queues and protect the below operations using xa lock
a. Getting a queue from xarray
b. Increment/Decrement it's refcount
Every time some one want to access a queue, always get via
amdgpu_userq_get to make sure we have locks in place and get
the object if active.
A userqueue is destroyed on the last refcount is dropped which
typically would be via IOCTL or during fini.
v2: Add the missing drop in one the condition in the signal ioclt [Alex]
v3: remove the queue from the xarray first in the free queue ioctl path
[Christian]
- Pass queue to the amdgpu_userq_put directly.
- make amdgpu_userq_put xa_lock free since we are doing put for each get
only and final put is done via destroy and we remove the queue from xa
with lock.
- use userq_put in fini too so cleanup is done fully.
v4: Use xa_erase directly rather than doing load and erase in free
ioctl. Also remove some of the error logs which could be exploited
by the user to flood the logs [Christian]
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alysa Liu [Thu, 5 Feb 2026 16:21:45 +0000 (11:21 -0500)]
drm/amdgpu: Fix use-after-free race in VM acquire
Replace non-atomic vm->process_info assignment with cmpxchg()
to prevent race when parent/child processes sharing a drm_file
both try to acquire the same VM after fork().
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alysa Liu <Alysa.Liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Yang Wang [Thu, 26 Feb 2026 03:51:06 +0000 (22:51 -0500)]
drm/amd/pm: remove invalid gpu_metrics.energy_accumulator on smu v13.0.x
v1:
The metrics->EnergyAccumulator field has been deprecated on newer pmfw.
v2:
add smu 13.0.0/13.0.7/13.0.10 support.
Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fixes: db1882b3ff0c ("drm/amdkfd: Update LDS, Scratch base for 57bit address") Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Lang Yu <lang.yu@amd.com> Acked-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Colin Ian King [Sat, 28 Feb 2026 09:59:38 +0000 (09:59 +0000)]
drm/amd/display: remove extra ; from statement, remove extra tabs
There is a statement that has a ;; at the end, remove the extraneous ;
and remove extra tabs in the code block.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amdgpu: Drop redundant syncobj handle limit checks in userq ioctls
Clang warns that comparing a __u16 value against 65536 is always false.
num_syncobj_handles is defined as __u16 in both the userq signal and
wait ioctl argument structs, so it can never exceed 65535. The checks
against AMDGPU_USERQ_MAX_HANDLES are therefore redundant and trigger
-Wtautological-constant-out-of-range-compare.
Fixes: Clang -Wtautological-constant-out-of-range-compare in userq
signal/wait ioctls
Fixes: d8e760b7996d ("drm/amdgpu: update type for num_syncobj_handles in drm_amdgpu_userq_signal") Fixes: c561d2320492 ("drm/amdgpu: update type for num_syncobj_handles in drm_amdgpu_userq_wait") Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Gangliang Xie [Fri, 12 Dec 2025 06:16:17 +0000 (14:16 +0800)]
drm/amd/ras: add pmfw eeprom smu interfaces
add smu interfaces and its data structures for
pmfw eeprom in uniras
v2: add 'const' to smu messages array, and specify
index for each member when initializing.
Signed-off-by: Gangliang Xie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Gangliang Xie [Fri, 12 Dec 2025 07:42:49 +0000 (15:42 +0800)]
drm/amd/pm: add feature query interface for uniras
add amdgpu_smu_ras_feature_is_enabled to query one feature
is supported or not
v2: change default return value from -EOPNOTSUPP to 0
Signed-off-by: Gangliang Xie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Gangliang Xie [Thu, 11 Dec 2025 10:14:46 +0000 (18:14 +0800)]
drm/amd/pm: add pmfw eeprom messages into uniras interface
add pmfw eeprom related messages into smu_v13_0_6_ras_send_msg
v2: add sriov check before sending smu commands
Signed-off-by: Gangliang Xie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sunil Khatri [Thu, 26 Feb 2026 15:48:51 +0000 (21:18 +0530)]
drm/amdgpu: update type for num_syncobj_handles in drm_amdgpu_userq_wait
update the type for num_syncobj_handles from __u32 to _u16 with
required padding.
This breaks the UAPI for big-endian platforms but this is deliberate
and harmless since userqueues is still a beta feature. It is enabled
via module parameter and need the right fw support to work.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sunil Khatri [Thu, 26 Feb 2026 15:44:27 +0000 (21:14 +0530)]
drm/amdgpu: update type for num_syncobj_handles in drm_amdgpu_userq_signal
update the type for num_syncobj_handles from __u64 to _u16 with
required padding.
This breaks the UAPI for big-endian platforms but this is deliberate
and harmless since userqueues is still a beta feature. It is enabled
via module parameter and need the right fw support to work.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Taimur Hassan [Fri, 20 Feb 2026 22:53:34 +0000 (17:53 -0500)]
drm/amd/display: Promote DC to 3.2.372
This version brings along the following updates:
- Prevent integer overflow when mhz to khz
- Remove always-false branches
- Remove redundant initializers
- Silence unused variable warning
- Initialize replay_state to PR_STATE_INVALID
- Fallback to boot snapshot for dispclk
- Skip cursor cache reset if hubp powergating is disabled
Reviewed-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Hung [Wed, 18 Feb 2026 17:38:33 +0000 (10:38 -0700)]
drm/amd/display: Prevent integer overflow when mhz to khz
[WHAT]
Cast to long long before multiplication to prevent overflow
when converting mhz to khz by multiplying by 1000.
This is reported as INTEGER_OVERFLOW errors by Coverity.
Reviewed-by: Roman Li <roman.li@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Hung [Wed, 18 Feb 2026 16:50:15 +0000 (09:50 -0700)]
drm/amd/display: Remove always-false branches
[WHAT]
program_prealpha_dealpha and hpo_frl_stream_enc_acquired are always
false and all branches depending on them will never be taken.
This is reported as DEADCODE errors by Coverity.
Reviewed-by: Roman Li <roman.li@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Hung [Wed, 18 Feb 2026 16:30:32 +0000 (09:30 -0700)]
drm/amd/display: Remove redundant initializers
[WHAT]
Remove unnecessary default value assignments for variables that
are unconditionally assigned before use.
Linux kernel code style prefers no assignments during initialization
when variables are assigned unconditionally as they can obscures
the actual data flow. In addition, compilers will be able to catch them
if variables are used without being updated later in all conditions.
This is reported as UNUSED_VALUE errors by Coverity.
Reviewed-by: Roman Li <roman.li@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Austin Zheng <austin.zheng@amd.com> Signed-off-by: Clay King <clayking@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ivan Lipski [Wed, 18 Feb 2026 21:19:15 +0000 (16:19 -0500)]
drm/amd/display: Initialize replay_state to PR_STATE_INVALID
[WHY & HOW]
Initialize the replay_state variable to PR_STATE_INVALID instead of
PR_STATE_0 before retrieving the actual replay state.
Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Dillon Varone [Wed, 18 Feb 2026 19:34:28 +0000 (14:34 -0500)]
drm/amd/display: Fallback to boot snapshot for dispclk
[WHY & HOW]
If the dentist is unavailable, fallback to reading CLKIP via the boot
snapshot to get the current dispclk.
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Dillon Varone <Dillon.Varone@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Yujie Liu [Thu, 26 Feb 2026 03:00:37 +0000 (11:00 +0800)]
drm/amd/pm: fix kernel-doc warning for smu_msg_v1_send_msg()
Warning: drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu_cmn.c:415 expecting prototype for smu_msg_proto_v1_send_msg(). Prototype was for smu_msg_v1_send_msg() instead
Fixes: 4f379370a49c ("drm/amd/pm: Add smu message control block") Signed-off-by: Yujie Liu <yujie.liu@intel.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Yujie Liu [Thu, 26 Feb 2026 03:00:38 +0000 (11:00 +0800)]
drm/amd/ras: fix kernel-doc warning for ras_eeprom_append()
Warning: drivers/gpu/drm/amd/amdgpu/../ras/rascore/ras_eeprom.c:845 function parameter 'ras_core' not described in 'ras_eeprom_append'
Warning: drivers/gpu/drm/amd/amdgpu/../ras/rascore/ras_eeprom.c:845 expecting prototype for ras_core_eeprom_append(). Prototype was for ras_eeprom_append() instead
Fixes: 5c3be5defc92 ("drm/amd/ras: Add eeprom ras functions") Signed-off-by: Yujie Liu <yujie.liu@intel.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Yujie Liu [Thu, 26 Feb 2026 03:00:35 +0000 (11:00 +0800)]
drm/amdgpu: fix kernel-doc warning for amdgpu_ttm_alloc_mmio_remap_bo()
Warning: drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c:1923 expecting prototype for amdgpu_ttm_mmio_remap_bo_init(). Prototype was for amdgpu_ttm_alloc_mmio_remap_bo() instead
Fixes: 96e97a562d06 ("drm/amdgpu: Drop MMIO_REMAP domain bit and keep it Internal") Signed-off-by: Yujie Liu <yujie.liu@intel.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Kees Cook [Wed, 25 Feb 2026 17:47:03 +0000 (09:47 -0800)]
drm/amd/ras: Fix type size of remainder argument
Forcing an int to be dereferenced at uint64_t for div64_u64_rem() runs
the risk of endian confusion and stack overflowing writes. Seen while
preparing to enable -Warray-bounds globally:
In file included from ../arch/x86/include/asm/processor.h:35,
from ../include/linux/sched.h:13,
from ../include/linux/ratelimit.h:6,
from ../include/linux/dev_printk.h:16,
from ../drivers/gpu/drm/amd/amdgpu/../ras/ras_mgr/ras_sys.h:29,
from ../drivers/gpu/drm/amd/amdgpu/../ras/rascore/ras.h:27,
from ../drivers/gpu/drm/amd/amdgpu/../ras/rascore/ras_core.c:24:
In function 'div64_u64_rem',
inlined from 'ras_core_convert_timestamp_to_time' at ../drivers/gpu/drm/amd/amdgpu/../ras/rascore/ras_core.c:72:9:
../include/linux/math64.h:56:20: error: array subscript 'u64 {aka long long unsigned int}[0]' is partly outside array bounds of 'int[1]' [-Werror=array-bounds=]
56 | *remainder = dividend % divisor;
| ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
../drivers/gpu/drm/amd/amdgpu/../ras/rascore/ras_core.c: In function 'ras_core_convert_timestamp_to_time':
../drivers/gpu/drm/amd/amdgpu/../ras/rascore/ras_core.c:70:19: note: object 'remaining_seconds' of size 4
70 | int days, remaining_seconds;
| ^~~~~~~~~~~~~~~~~
Use a 64-bit type for the remainder calculation, but leave
remaining_seconds as 32-bit to avoid 64-bit division later. The value of
remainder will always be less than seconds_per_day, so there's no
truncation risk.
Fixes: ace232eff50e ("drm/amdgpu: Add ras module files into amdgpu") Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Hung [Fri, 27 Feb 2026 19:30:38 +0000 (12:30 -0700)]
drm/amd/display: Enable DEGAMMA and reject COLOR_PIPELINE+DEGAMMA_LUT
[WHAT]
Create DEGAMMA properties even if color pipeline is enabled, and enforce
the mutual exclusion in atomic check by rejecting any commit that
attempts to enable both COLOR_PIPELINE on the plane and DEGAMMA_LUT on
the CRTC simultaneously.
Fixes: 18a4127e9315 ("drm/amd/display: Disable CRTC degamma when color pipeline is enabled") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4963 Reviewed-by: Melissa Wen <mwen@igalia.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
A workaround was introduced in commit 1fb710793ce2 ("drm/amdgpu: Enable
MES lr_compute_wa by default") to help with some hangs observed in gfx1151.
This WA didn't fully fix the issue. It was actually fixed by adjusting
the VGPR size to the correct value that matched the hardware in commit b42f3bf9536c ("drm/amdkfd: bump minimum vgpr size for gfx1151").
There are reports of instability on other products with newer GC microcode
versions, and I believe they're caused by this workaround. As we don't
need the workaround any more, remove it.
Fixes: b42f3bf9536c ("drm/amdkfd: bump minimum vgpr size for gfx1151") Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lijo Lazar [Tue, 24 Feb 2026 04:48:51 +0000 (10:18 +0530)]
drm/amdgpu: Fix error handling in slot reset
If the device has not recovered after slot reset is called, it goes to
out label for error handling. There it could make decision based on
uninitialized hive pointer and could result in accessing an uninitialized
list.
Initialize the list and hive properly so that it handles the error
situation and also releases the reset domain lock which is acquired
during error_detected callback.
Fixes: 732c6cefc1ec ("drm/amdgpu: Replace tmp_adev with hive in amdgpu_pci_slot_reset") Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Ce Sun <cesun102@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Bart Van Assche [Mon, 23 Feb 2026 22:00:07 +0000 (14:00 -0800)]
drm/amdgpu: Unlock a mutex before destroying it
Mutexes must be unlocked before these are destroyed. This has been detected
by the Clang thread-safety analyzer.
Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Yang Wang <kevinyang.wang@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Cc: amd-gfx@lists.freedesktop.org Fixes: f5e4cc8461c4 ("drm/amdgpu: implement RAS ACA driver framework") Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sunil Khatri [Tue, 24 Feb 2026 06:43:09 +0000 (12:13 +0530)]
drm/amdgpu: add upper bound check on user inputs in wait ioctl
Huge input values in amdgpu_userq_wait_ioctl can lead to a OOM and
could be exploited.
So check these input value against AMDGPU_USERQ_MAX_HANDLES
which is big enough value for genuine use cases and could
potentially avoid OOM.
v2: squash in Srini's fix
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sunil Khatri [Fri, 20 Feb 2026 08:17:58 +0000 (13:47 +0530)]
drm/amdgpu: add upper bound check on user inputs in signal ioctl
Huge input values in amdgpu_userq_signal_ioctl can lead to a OOM and
could be exploited.
So check these input value against AMDGPU_USERQ_MAX_HANDLES
which is big enough value for genuine use cases and could
potentially avoid OOM.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Andrew Martin [Mon, 23 Feb 2026 21:08:16 +0000 (16:08 -0500)]
drm/amdkfd: Removed commented line for MQD queue priority
Missed deleting the commented line in the original patch.
Fixes: 73463e26f7e2 ("drm/amdkfd: Disable MQD queue priority") Signed-off-by: Andrew Martin <andrew.martin@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If we gate the fence destruction with a check telling us whether there are
valid pointers in there we can eliminate the need for dual, basically
identical, exit paths.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tvrtko Ursulin [Mon, 23 Feb 2026 12:41:31 +0000 (12:41 +0000)]
drm/amdgpu/userq: Do not allow userspace to trivially triger kernel warnings
Userspace can either deliberately pass in the too small num_fences, or the
required number can legitimately grow between the two calls to the userq
wait ioctl. In both cases we do not want the emit the kernel warning
backtrace since nothing is wrong with the kernel and userspace will simply
get an errno reported back. So lets simply drop the WARN_ONs.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Fixes: a292fdecd728 ("drm/amdgpu: Implement userqueue signal/wait IOCTL") Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: Fix kdoc formatting in dcn42_hwseq.c
Kernel-doc requires all lines within a documentation
comment to start with " *". The previous empty line
caused a "bad line" warning during build.
Cc: Harry Wentland <harry.wentland@amd.com> Cc: Mario Limonciello <superm1@kernel.org> Cc: Alex Hung <alex.hung@amd.com> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Cc: ChiaHsuan Chung <chiahsuan.chung@amd.com> Cc: Roman Li <roman.li@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Roman Li <roman.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>