Xiang Liu [Fri, 29 May 2026 14:11:26 +0000 (22:11 +0800)]
drm/amd/ras: chunk UNIRAS CPER debugfs reads
Legacy CPER ring readers can issue one debugfs read with a buffer larger
than the UNIRAS RAS command payload limit. Passing that full size to
GET_CPER_RECORD makes the command reject the request, so userspace may
only see the ring prefix and treat the CPER stream as empty.
Commit 3c88fb7aa57d ("drm/amd/ras: bound CPER record fetch buffer
size") intentionally bounds CPER record fetch allocation by the command
buffer size. Keep the debugfs ABI as a single contiguous ring read by
splitting the internal GET_CPER_RECORD requests into
RAS_CMD_MAX_CPER_BUF_SZ chunks.
Accumulate the copied payload and update the legacy header write pointers
from the total bytes returned to userspace.
Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Prike Liang [Tue, 26 May 2026 02:25:26 +0000 (10:25 +0800)]
drm/amdgpu: improve the userq seq BO free bit lookup
Use find_next_zero_bit() to locate the next free seq slot bit
instead of the current walk, for more efficient bitmap scanning.
Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Yishai Hadas [Mon, 25 May 2026 14:21:36 +0000 (17:21 +0300)]
RDMA/core: Validate cpu_id against nr_cpu_ids in DMAH alloc
The cpu_id attribute supplied by user space through
UVERBS_ATTR_ALLOC_DMAH_CPU_ID is passed directly to cpumask_test_cpu()
without first verifying that the value is within the valid CPU range.
Passing such untrusted data to cpumask_test_cpu() may lead to an
out-of-bounds read of the underlying cpumask bitmap: the helper expands
to a test_bit() that indexes the bitmap by cpu_id / BITS_PER_LONG with
no bound check.
In addition, on kernels built with CONFIG_DEBUG_PER_CPU_MAPS it trips
the WARN_ON_ONCE() in cpumask_check(); combined with panic_on_warn this
turns a bad user input into a machine reboot.
Reject any cpu_id that is not smaller than nr_cpu_ids with -EINVAL
before it is used.
Reported by Smatch.
Fixes: d83edab562a4 ("RDMA/core: Introduce a DMAH object and its alloc/free APIs") Link: https://patch.msgid.link/r/20260525142136.28165-1-yishaih@nvidia.com Cc: stable@vger.kernel.org Reported-by: Dan Carpenter <error27@gmail.com> Closes: https://lore.kernel.org/r/ag68qoAW3P04J7pT@stanley.mountain/ Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
_PR3 detection was changed in commit 134b8c5d8674 ("drm/amd: Fix
detection of _PR3 on the PCIe root port") to look at the root port
of the topology containing the GPU. This however was too far because
it ignored whether or not all the intermediary bridges could power
off the device. The original design in commit b10c1c5b3a4e ("drm/amdgpu:
add check for ACPI power resources") was too narrow because it matched
the switches internal to the GPU.
Use the goldilocks approach and look for the first bridge outside of the
GPU and check for _PR3 on that device.
Fixes: 134b8c5d8674 ("drm/amd: Fix detection of _PR3 on the PCIe root port") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Chenglei Xie [Thu, 7 May 2026 14:29:10 +0000 (10:29 -0400)]
drm/amdgpu: grow VF RAS bad page table with bounded dynamic alloc
The VF RAS error handler used fixed-size bps[] / bps_bo[] arrays (512
slots). When the PF2VF bad-page block listed more entries than fit,
amdgpu_virt_ras_add_bps() could memcpy() past the end of those arrays.
Replace the fixed backing store with a dynamically grown table:
- Add capacity to track allocated slots separately from count.
- Start at 512 slots and realloc bps / bps_bo together when full.
- Refuse growth beyond maximum EEPROM record limit (AMDGPU_VIRT_RAS_BAD_PAGE_TABLE_MAX_CAPACITY).
- Return failure from amdgpu_virt_ras_add_bps() and stop processing
the PF2VF block if allocation fails or the cap is reached.
Sunil Khatri [Mon, 25 May 2026 07:48:00 +0000 (13:18 +0530)]
drm/amdgpu/userq: remove the vital queue unmap logging
Mesa userqueues free does not wait for the free to complete and go ahead
in unmapping the vital bos while kernel is still in queue free and
corresponding cleanup.
So ideally we don't need the logging for that and hence remove the warn
message as this is expected behaviour and functionally, we are making
sure to wait for the required fences before unmap.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Andrew Martin [Thu, 28 May 2026 16:54:39 +0000 (12:54 -0400)]
drm/amdkfd: Fix buffer overflow in SDMA queue checkpoint/restore on GFX11
The v11 MQD manager incorrectly assigned the CP-compute variants of
checkpoint_mqd/restore_mqd for KFD_MQD_TYPE_SDMA queues. These functions
use sizeof(struct v11_compute_mqd) (2048 bytes) instead of sizeof(struct
v11_sdma_mqd) (512 bytes), causing a 1536-byte overflow.
During CRIU checkpoint of an SDMA queue on Navi3x:
- checkpoint_mqd() reads 2048 bytes from a 512-byte SDMA MQD buffer,
leaking 1536 bytes of adjacent GTT memory to userspace
During CRIU restore:
- restore_mqd() writes 2048 bytes into a 512-byte SDMA MQD buffer,
corrupting 1536 bytes of adjacent GTT memory (often the ring buffer
or neighboring MQDs)
This is a copy-paste regression unique to v11. All other ASIC backends
(cik, vi, v9, v10, v12) correctly use the SDMA-specific variants.
Add checkpoint_mqd_sdma() and restore_mqd_sdma() functions that properly
handle the smaller v11_sdma_mqd structure, matching the pattern used in
other MQD managers.
Fixes: cc009e613de6 ("drm/amdkfd: Add KFD support for soc21 v3") Assisted-by: Claude:Sonnet 4-5 Signed-off-by: Andrew Martin <andrew.martin@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Muhammad Bilal [Sat, 23 May 2026 16:56:46 +0000 (16:56 +0000)]
drm/amdkfd: fix NULL dereference in get_queue_ids()
When usr_queue_id_array is NULL and num_queues is non-zero,
get_queue_ids() returns NULL. The callers check only IS_ERR() on the
return value; since IS_ERR(NULL) == false the check passes, and
suspend_queues() calls q_array_invalidate() which immediately
dereferences NULL while iterating num_queues times.
Userspace can trigger this via kfd_ioctl_set_debug_trap() by supplying
num_queues > 0 with a zero queue_array_ptr, causing a kernel panic.
A NULL usr_queue_id_array with num_queues == 0 is a legitimate no-op
(q_array_invalidate never executes, and resume_queues already guards
all queue_ids dereferences behind a NULL check). Return ERR_PTR(-EINVAL)
only when num_queues is non-zero and the pointer is absent; both callers
already propagate IS_ERR() returns correctly to userspace.
Fixes: a70a93fa568b ("drm/amdkfd: add debug suspend and resume process queues operation") Signed-off-by: Muhammad Bilal <meatuni001@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: widen FRL debug knobs to unsigned int
force_frl_rate, select_ffe and limit_ffe in dc_debug_options carry
non-negative configuration values: an FRL link-rate enum (0..0xF), an
FFE level selector and an FFE level limit. They are only ever compared
against 0/0xF, assigned, or cast to uint8_t before being written to
hardware. No call site relies on signed semantics.
Make the types unsigned int to match how the values are actually used
and to silence MISRA-style signedness warnings on internal builds.
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Vitaly Prosyak [Fri, 29 May 2026 17:50:38 +0000 (13:50 -0400)]
drm/amdgpu: set noretry=1 as default for GFX 10.1.x (Navi10/12/14)
Problem:
While developing the amd_close_race IGT test (which intentionally triggers
execute permission faults by removing VM_PAGE_EXECUTABLE from GPU page table
entries), we discovered that on Navi10 (GFX 10.1.x) these faults produce
zero diagnostic output. The GPU simply hangs silently for ~10s until the
scheduler timeout fires. There is no way to distinguish an execute
permission fault from any other type of GPU hang.
Root cause:
GFX 10.1.x defaults to noretry=0, which sets
RETRY_PERMISSION_OR_INVALID_PAGE_FAULT=1 in the GFXHUB UTCL2 registers
(gfxhub_v2_0.c line 313). With this bit set, permission faults (valid PTE,
wrong R/W/X bits) are handled entirely within the UTCL1/UTCL2 hardware
loop: UTCL2 returns an XNACK to UTCL1, and UTCL1 re-requests the
translation indefinitely, expecting software to eventually fix the
permission bits (as happens in SVM/HMM recovery). No interrupt of any kind
reaches the IH ring.
This is different from invalid-page faults (V=0) which DO generate a retry
fault interrupt that the driver can escalate to a no-retry fault. Permission
faults with valid PTEs loop silently forever in hardware.
GFX 10.3+ already defaults to noretry=1, which makes permission faults
generate immediate L2 protection fault interrupts. GFX 10.1.x was
inadvertently left out of this default.
Fix:
Change the noretry=1 threshold from IP_VERSION(10, 3, 0) to
IP_VERSION(10, 1, 0) in amdgpu_gmc_noretry_set(). This is a one-line
change that aligns GFX 10.1.x behavior with GFX 10.3+ and all newer
generations.
With noretry=1, the existing non-retry fault handler
(gmc_v10_0_process_interrupt) already decodes and prints the full
GCVM_L2_PROTECTION_FAULT_STATUS register including PERMISSION_FAULTS,
faulting address, VMID, PASID, and process name. No additional logging
code is needed — the fix is purely routing permission faults to the
existing, fully-capable non-retry interrupt handler.
v2: Dropped GFX10-specific logging from gmc_v10_0.c and
kfd_int_process_v10.c (Felix Kuehling). v1 added logging in the retry
fault handler, but with noretry=1 permission faults take the non-retry
path — the v1 retry handler code was dead and would never execute.
Tested on Navi10 (GFX 10.1.10):
- Execute permission faults now produce immediate, clear output:
[gfxhub] page fault (src_id:0 ring:64 vmid:4 pasid:592)
Process amd_close_race pid 13380 thread amd_close_race pid 13384
in page at address 0x40001000 from client 0x1b (UTCL2)
GCVM_L2_PROTECTION_FAULT_STATUS:0x00700881
PERMISSION_FAULTS: 0x8
- No regressions with properly-mapped GPU workloads
Cc: Christian Koenig <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Timur Kristóf [Mon, 25 May 2026 11:45:02 +0000 (13:45 +0200)]
drm/amdgpu/gfxhub: Program CRASH_ON_*_FAULT bits to 0 as needed
When the fault stop mode isn't AMDGPU_VM_FAULT_STOP_ALWAYS,
these bits should be programmed to 0.
Program CRASH_ON_NO_RETRY_FAULT and CRASH_ON_RETRY_FAULT
always, to make sure to clear the bits when we don't want
to crash.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Wed, 18 Feb 2026 12:05:46 +0000 (13:05 +0100)]
drm/amdgpu: fix waiting for all submissions for userptrs
Wait for all submissions when userptrs need to be invalidated by the MMU
notifier, not just the one the userptr was involved into.
Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Tested-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Yang Wang [Mon, 11 May 2026 08:33:37 +0000 (16:33 +0800)]
drm/amd/pm: zero unused SMU argument registers
SMU messages may use fewer arguments than the available argument registers,
the previous code only wrote used registers and left the rest unchanged,
so stale values from a prior message could persist.
Write all argument registers for each message and zero the unused tail
to keep command arguments deterministic and avoid unintended carry-over.
Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Yang Wang [Tue, 19 May 2026 03:18:12 +0000 (11:18 +0800)]
drm/amd/pm: fix smu13 power limit default/cap calculation
smu_v13_0_0_get_power_limit() and smu_v13_0_7_get_power_limit() mix
runtime power_limit with PP table limits when reporting default/min/max.
When current power limit query succeeds, default_power_limit was set to the
runtime value instead of the PP table default, and min/max could be derived
from inconsistent bases (MsgLimits/runtime), leading to incorrect cap info.
Use SocketPowerLimitAc/Dc as the PP default base (pp_limit), keep
current_power_limit as runtime value, and derive min/max from pp_limit with
OD percentages.
Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5227 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Yang Wang [Thu, 21 May 2026 14:36:37 +0000 (22:36 +0800)]
drm/amd/pm: apply SMU 13.0.10 workaround during MP1 unload
On SMU v13.0.10, sending PrepareMp1ForUnload with the default
parameter may leave the device in an inaccessible state. This can
affect runtime power management and partial PnP flows.
e.g: kexec, driver unload, boco/d3cold.
Pass the required workaround parameter 0x55, when preparing MP1 for
unload on SMU v13.0.10, keep the existing behavior for other SMU
versions.
Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5133 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Candice Li [Tue, 19 May 2026 10:42:09 +0000 (18:42 +0800)]
drm/amdgpu: NUL-terminate securedisplay debugfs input from userspace
Use strncpy_from_user() instead of copy_from_user() before sscanf() in
the securedisplay_test debugfs write handler so a full-length write
cannot leave the stack buffer without a terminator.
Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Candice Li [Tue, 19 May 2026 09:31:54 +0000 (17:31 +0800)]
drm/amdgpu: validate RAS EEPROM tbl_size before record count
Corrupt EEPROM data can set tbl_size below the table header size.
Guard the RAS_NUM_RECS macros against undersized tbl_size and reset
the table during init when tbl_size is below the minimum for the table
version instead of trusting the header.
Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Candice Li [Tue, 19 May 2026 09:31:21 +0000 (17:31 +0800)]
drm/amd/ras: validate RAS EEPROM tbl_size before record count
Corrupt EEPROM data can set tbl_size below the table header size.
Guard the RAS_NUM_RECS macros against undersized tbl_size and reset
the table during init when tbl_size is below the minimum for the table
version instead of trusting the header.
Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Asad Kamal [Tue, 12 May 2026 07:44:32 +0000 (15:44 +0800)]
drm/amd/pm: bound pp_dpm_set_pp_table() memcpy
The powerplay path allocates hardcode_pp_table once with kmemdup(...,
soft_pp_table_size). memcpy(..., size) used the sysfs store count (up to
PAGE_SIZE) with no upper bound, causing heap overflow. Reject
writes where size exceeds soft_pp_table_size.
Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The thermal_throttling_logging sysfs store function accepts negative
values like -1 and -9999999, which are nonsensical for a logging interval.
Current behavior:
- Values <= 0 disable logging (intended for 0 only)
- Values 1-3600 enable logging with interval in seconds
- Negative values are accepted and treated as disable
Issue:
Large negative values like -9999999 make no semantic sense and could
indicate input validation bypass attempts. While they functionally
disable logging (same as 0), accepting arbitrary negative values
suggests inadequate input validation.
Fix:
Add explicit check to reject values < 0 before processing.
Only accept:
- 0: disable thermal throttling logging
- 1-3600: enable with interval in seconds (existing validation)
This improves input validation and makes the interface more robust.
Test Results Before Fix:
thermal_throttling_logging: 6 failures
- Accepted: 0, -1, -9999999, -2147483648, empty string, 0777
Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The AMDGPU power management sysfs store functions accept whitespace-only
strings when they should reject them with -EINVAL. This was discovered via
systematic fuzzing of sysfs interfaces crossing the user/kernel trust
boundary.
Impact:
- Whitespace-only writes (e.g., "\n", " ") can cause unexpected behavior
- Better input validation at user/kernel trust boundary
- Defense-in-depth improvement
Root Cause:
The sysfs_streq() function matches whitespace-only strings against empty
string, allowing invalid input to be processed.
Fix:
Add explicit validation at the start of each affected store function:
if (count == 0 || sysfs_streq(buf, ""))
return -EINVAL;
This rejects whitespace-only inputs before they are processed. Note that
write() calls with count=0 (truly empty strings) are handled by the VFS
layer before reaching the sysfs .store() callback - the VFS returns 0
(success) without calling the kernel function. This is POSIX-compliant
behavior and cannot be changed at the kernel driver level.
What This Patch Fixes:
- Whitespace-only strings: "\n", " ", " ", etc. are now rejected
- Defense-in-depth: Explicit validation at trust boundary
- Code clarity: Intent to reject invalid input is explicit
What This Patch Cannot Fix:
- write(fd, "", 0) returning success - this is VFS layer behavior
- Fuzzer tests for empty strings (count=0) will still report "accepted"
because the VFS handles this before the kernel callback
Test Results After Fix:
- Whitespace strings ("\n", " ") now properly rejected
- Empty string tests (count=0) still show as "accepted" due to VFS behavior
- Overall improvement in input validation robustness
- No impact on valid inputs
This is a defense-in-depth improvement that hardens input validation
even though VFS layer behavior prevents catching all edge cases.
Tested: amd_fuzzing_sysfs IGT test
Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Vitaly Prosyak [Thu, 14 May 2026 22:55:42 +0000 (18:55 -0400)]
drm/amdgpu: fix KASAN slab-out-of-bounds in amdgpu_coredump ring dump
The ring content dump in amdgpu_coredump() uses two separate loops over
adev->rings[]: the first counts rings with unsignalled fences to size
the allocation, and the second copies ring data into the allocated
buffers.
Because last_seq is an atomic that is updated concurrently by the fence
signalling path, additional rings may appear unsignalled in the second
loop that were signalled during the first. When this happens, idx
exceeds the allocated ring_count and the store to coredump->rings[idx]
writes past the end of the kcalloc-ed buffer.
This was found during IGT stressful test amd_queue_reset which
triggers random GPU resets. The OVERSIZE subtest
(CMD_STREAM_EXEC_INVALID_PACKET_LENGTH_OVERSIZE on GFX ring) provokes
a ring timeout and subsequent coredump, which hits the race between
the counting and copying loops. The failure is non-deterministic and
depends on fence signalling timing during the reset.
KASAN log:
BUG: KASAN: slab-out-of-bounds in amdgpu_coredump+0x1274/0x12f0 [amdgpu]
Write of size 4 at addr ffff888106154258 by task kworker/u128:5/23625
CPU: 16 UID: 0 PID: 23625 Comm: kworker/u128:5 Not tainted 6.19.0+ #35
Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
Call Trace:
<TASK>
dump_stack_lvl+0xa5/0x110
print_report+0xd1/0x660
kasan_report+0xf3/0x130
__asan_report_store4_noabort+0x17/0x30
amdgpu_coredump+0x1274/0x12f0 [amdgpu]
amdgpu_job_timedout+0xef0/0x16c0 [amdgpu]
drm_sched_job_timedout+0x194/0x5c0 [gpu_sched]
process_one_work+0x84b/0x1990
worker_thread+0x6b8/0x11b0
</TASK>
The buggy address belongs to the object at ffff888106154200
which belongs to the cache kmalloc-rnd-09-96 of size 96
The buggy address is located 16 bytes to the right of
allocated 72-byte region [ffff888106154200, ffff888106154248)
72 bytes = 3 * sizeof(struct amdgpu_coredump_ring), so ring_count was 3
but idx reached 3+, writing ring_index (at struct offset 16) 16 bytes
past the allocation.
Fix by adding an idx < ring_count guard to the copy loop so it cannot
exceed the allocated count even when the fence state changes between
the two passes.
Fixes: eea85914d15b (drm/amdgpu: save ring content before resetting the device) Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Stanley.Yang [Tue, 12 May 2026 11:10:23 +0000 (19:10 +0800)]
drm/amdgpu: harden FRU PIA parsing with bounded helpers
Replace the open-coded TLV walk with fru_pia_advance()
and fru_pia_copy_field() helpers that bound every read
by the actual EEPROM data length, preventing out-of-bounds
reads on truncated or malformed FRU data.
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Xiang Liu [Fri, 29 May 2026 14:10:09 +0000 (22:10 +0800)]
drm/amd/ras: make UNIRAS CPER debugfs header legacy-compatible
The UNIRAS CPER debugfs path returned a zeroed 12-byte prefix and used
file offset directly as the CPER record index. Legacy CPER ring readers
expect the prefix to contain three 32-bit ring pointers followed
immediately by CPER payload data.
Build the same header shape for UNIRAS reads by reporting a zero read
pointer and matching write pointers for the returned payload size. Keep
an internal record cursor behind the debugfs offset so follow-up reads
continue from the correct CPER record while first reads still expose the
legacy prefix.
Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Stanley.Yang [Tue, 12 May 2026 07:44:28 +0000 (15:44 +0800)]
drm/amd/ras: snapshot remote cmd header to fix double-fetch
The response header lives in PF-controlled shared memory. Copy it
into a local struct once, then read cmd_res and output_size from the
snapshot so the PF cannot flip cmd_res or grow output_size between
checks.
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Timur Kristóf [Mon, 25 May 2026 11:45:01 +0000 (13:45 +0200)]
drm/amdgpu: Use gmc->noretry instead of amdgpu_noretry directly
Whether retry faults are actually enabled, is determined by
the amdgpu_gmc_noretry_set() function. The rest of the code
base should use gmc->noretry instead of the module parameter.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Timur Kristóf [Mon, 25 May 2026 11:22:04 +0000 (13:22 +0200)]
drm/amdgpu: Align amdgpu_gtt_mgr entries to TLB size on all SI
It seems that Pitcairn has the same issues as Tahiti
with regards to the TLB size. This commit fixes a
VCE1 FW validation timeout on suspend/resume on Pitcairn.
Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5336 Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Prike Liang [Thu, 14 May 2026 09:21:09 +0000 (17:21 +0800)]
drm/amdgpu: unmap userq for evicting user queue
If the driver only preempts queues, there can still be inflight waves,
pending dispatch state, or resume/redispatch possibility tied to the
same queue. Then the VM/TTM side may proceed to move/unmap queue related
BOs during evicting userq objects while shader TCP clients still need to
access them.
So for eviction, unmap is safer because it makes the queue nonrunnable
before memory backing is invalidated. Meanwhile, for a idle queue it's
more sutiable for unmapping it rather preempt and unmapping also can
save more processing time than preempt.
Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Prike Liang [Tue, 26 May 2026 08:42:14 +0000 (16:42 +0800)]
drm/amdgpu: reserve TTM move fences slot for rearming eviction fences
The eviction rearming does not cover possible TTM move fences. If TTM
moves the BO and consumes move fence slots, the later eviction fence
add can hit the dma_resv_add_fence() BUG.
Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Wed, 27 May 2026 19:41:59 +0000 (15:41 -0400)]
drm/amdgpu/sdma7.1: fix support for disable_kq
Set the flag in the ring structure.
Fixes: 80d4d3a45b86 ("drm/amdgpu/sdma7.1: add support for disable_kq") Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alysa Liu [Wed, 27 May 2026 15:31:35 +0000 (11:31 -0400)]
drm/amdkfd: fix UAF race in destroy_queue_cpsch
wait_on_destroy_queue() drops locks to wait for queue resume, allowing
a concurrent destroy to free the queue. Use is_being_destroyed flag to
serialize destruction.
Reviewed-by: Amir Shetaia <Amir.Shetaia@amd.com> Signed-off-by: Alysa Liu <Alysa.Liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
commit eb53125a7ad9 ("drm/amd: Add dedicated helper for
amdgpu_device_find_parent()") created a dedicated helper to find
the parent device outside of the dGPU but it had a logic error
that caused it to walk all the way up the topology and return
the wrong device.
Break out of the loop when the device is found.
Reviewed-by: Alexander Deucher <alexander.deucher@amd.com> Fixes: eb53125a7ad9 ("drm/amd: Add dedicated helper for amdgpu_device_find_parent()") Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: Add missing kdoc for ALLM parameters
Add descriptions for the missing parameters for ALLMEnabled and
ALLMValue to keep the function documentation synchronized with the
function prototype mod_build_hf_vsif_infopacket().
Fixes the below with gcc W=1:
../display/modules/info_packet/info_packet.c:507 function parameter 'ALLMEnabled' not described in 'mod_build_hf_vsif_infopacket'
../display/modules/info_packet/info_packet.c:507 function parameter 'ALLMValue' not described in 'mod_build_hf_vsif_infopacket'
Fixes: 3c2381b92cba ("drm/amd/display: add support for VSIP info packet") Cc: Roman Li <roman.li@amd.com> Cc: Alex Hung <alex.hung@amd.com> Cc: Tom Chung <chiahsuan.chung@amd.com> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Cc: Wayne Lin <Wayne.Lin@amd.com> Cc: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com> Cc: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: Fix kdoc parameter names for DSC padding helper
Replace incorrect kdoc parameter names with the actual function
parameter names used by
dc_update_modified_pix_clock_for_dsc_with_padding().
Fixes the below with gcc W=1:
../display/dc/core/dc_resource.c:4616 function parameter 'stream' not described in 'dc_update_modified_pix_clock_for_dsc_with_padding'
../display/dc/core/dc_resource.c:4616 function parameter 'timing' not described in 'dc_update_modified_pix_clock_for_dsc_with_padding'
../display/dc/core/dc_resource.c:4616 function parameter 'stream' not described in 'dc_update_modified_pix_clock_for_dsc_with_padding'
../display/dc/core/dc_resource.c:4616 function parameter 'timing' not described in 'dc_update_modified_pix_clock_for_dsc_with_padding'
Fixes: 547cc004c3c1 ("drm/amd/display: add HDMI 2.1 DSC over FRL support") Cc: Harry Wentland <harry.wentland@amd.com> Cc: Fangzhi Zuo <Jerry.Zuo@amd.com> Cc: Dan Wheeler <daniel.wheeler@amd.com> Cc: Roman Li <roman.li@amd.com> Cc: Alex Hung <alex.hung@amd.com> Cc: Tom Chung <chiahsuan.chung@amd.com> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fangzhi Zuo [Wed, 13 May 2026 21:02:03 +0000 (17:02 -0400)]
drm/amd/display: Disable FRL and add module param to enable it
FRL links don't yet support VRR. If we enable it by default
users will see a functional regression when connected to an FRL
capable display as the driver will now default to FRL and not
allow VRR.
VRR support will come soon, so instead of making an elaborate
TMDS fallback mechanism simply default FRL to disabled, but
provide a dcfeaturemask of 0x400 to enable it if anyone wants
to already try it out.
Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com> Signed-off-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Taimur Hassan [Fri, 22 May 2026 10:00:08 +0000 (06:00 -0400)]
drm/amd/display: Promote DC to 3.2.384
This version brings along the following updates:
- Enable DCN 4.2.1:
* Add register header files for DCN42B
* Add DCN42B DC resource files
* Add DCN42B DMUB support
* Add DCN42B code to DC and dcn42b_soc_bb to DML2
* Add DCN42 PMO init_for_pstate_support
* Enable DCN42 PMO policy and pstate pmo
* Enable DCN 4.2.1 in amdgpu_dm
* Enable DM for DCN 4.2.1
- Add no_native_i2c codepath
- Add amdgpu_dm KUnit tests for:
* amdgpu_dm_psr_set_event
* dm_ism_dispatch_next_event and additional ISM functions
* amdgpu_dm_colorop
* color LUT functions and transfer function helpers
- Enable gcov coverage for amdgpu_dm KUnit builds
- Extract dm_ism_dispatch_next_event and transfer function helpers
- Refactor amdgpu_dm_initialize_default_pipeline
- Clean up PSR helper functions
- Fix gamma 2.2 colorop TF direction in tests
- Handle aux_inst for connectors without DDC pin
- Fix DP_PIXEL_FORMAT fields & update clk_src for DCN4x
- Avoid DPMS-on for phantom stream
- Change default driver setting for "Force ODM2:1 for eDP" policy
- Add DC_VALIDATE_MODE_AND_PROGRAMMING condition check for force odm2:1
- Check for sharpening case when calculating max vtaps for scaler
- Add DRAM table fields to clk_mgr_internal
- Enable frame skipping in 0x37B
- Bound VBIOS record-chain walk loops
- Clamp HDMI HDCP2 rx_id_list read to buffer size
- Clamp VBIOS HDMI retimer register count to array size
- Reject gpio_bitshift >= 32 in bios_parser_get_gpio_pin_info()
- Use krealloc_array() in dal_vector_reserve()
- Fix NULL deref and buffer over-read in SDP debugfs
- Fix out-of-bounds read in dp_get_eq_aux_rd_interval()
- FW Release 0.1.61.0
Acked-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Taimur Hassan [Fri, 22 May 2026 09:57:40 +0000 (05:57 -0400)]
ddrm/amd/display: [FW Promotion] Release 0.1.61.0
[Why & How]
Update DMUB related command structure.
Acked-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Matthew Stewart [Tue, 19 May 2026 01:14:20 +0000 (21:14 -0400)]
drm/amd/display: Enable DM for DCN 4.2.1
[Why & How]
Add DM IP block to amdgpu_discovery
Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Matthew Stewart <Matthew.Stewart2@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Matthew Stewart [Mon, 4 May 2026 21:04:54 +0000 (17:04 -0400)]
drm/amd/display: Enable DCN 4.2.1 in amdgpu_dm
[Why & How]
Add checks for IP version 4.2.1.
Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Matthew Stewart <matthew.stewart2@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Matthew Stewart [Tue, 19 May 2026 01:49:14 +0000 (21:49 -0400)]
drm/amd/display: Add DCN42B DMUB support
[Why & How]
Add DMUB support for DCN42B
Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Matthew Stewart <matthew.stewart2@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Matthew Stewart [Mon, 18 May 2026 19:00:50 +0000 (15:00 -0400)]
drm/amd/display: Add DCN42B code to DC
[Why & How]
Add DCN42B code to DC
Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Matthew Stewart <Matthew.Stewart2@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Matthew Stewart [Mon, 18 May 2026 17:59:21 +0000 (13:59 -0400)]
drm/amd/display: Add dcn42b_soc_bb to DML2
[Why & How]
Add bounding box for dcn42b
Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Matthew Stewart <matthew.stewart2@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Matthew Stewart [Mon, 18 May 2026 00:07:20 +0000 (20:07 -0400)]
drm/amd/display: Add DCN42B DC resource files
[Why & How]
Add DC resource files for DCN42B.
Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Matthew Stewart <matthew.stewart2@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Matthew Stewart <Matthew.Stewart2@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Gabe Teeger [Thu, 21 May 2026 15:06:22 +0000 (11:06 -0400)]
drm/amd/display: Handle aux_inst for connectors without DDC pin
[Why & How]
Must use an alternative codepath to access AUX channel when
link->no_ddc_pin is set.
Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Gabe Teeger <gabe.teeger@amd.com> Signed-off-by: Matthew Stewart <Matthew.Stewart2@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Matthew Stewart [Tue, 19 May 2026 01:34:57 +0000 (21:34 -0400)]
drm/amd/display: Add no_native_i2c codepath
[Why]
ASICs which do not have native DDC capability must use a different
codepath to access the AUX channel.
[How]
- BIOS cap NO_DDC_PIN is set to 1 for links which do not have the DDC pin.
- dp_connector_no_native_i2c in dc_config must also be set to true to
use this codepath.
Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Matthew Stewart <Matthew.Stewart2@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Hung [Fri, 15 May 2026 22:30:45 +0000 (16:30 -0600)]
drm/amd/display: Clean up PSR helper functions
[Why & How]
Use the existing local dc variable in amdgpu_dm_set_psr_caps() instead
of redundantly dereferencing link->ctx->dc.
Simplify amdgpu_dm_psr_is_active_allowed() by replacing with early
return and inlining the intermediate stream variable.
No functional changes.
Assisted-by: Copilot:Claude-Sonnet-4.6 Reviewed-by: Ray Wu <ray.wu@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Hung [Tue, 2 Jun 2026 16:15:36 +0000 (10:15 -0600)]
drm/amd/display: Export symbols for KUnit test modules
Add missing EXPORT_IF_KUNIT() calls for amdgpu_dm_psr_set_event,
amdgpu_dm_ism_init, and amdgpu_dm_ism_fini so that the KUnit test
modules can resolve these symbols when built as modules, i.e.,
CONFIG_DRM_AMD_DC_KUNIT_TEST=m.
Fixes: 34f281489976 ("drm/amd/display: Add KUnit tests for amdgpu_dm_psr_set_event") Fixes: a3142b13fe9f ("drm/amd/display: Add more KUnit tests for amdgpu_dm_ism") Assisted-by: Copilot:Claude-Opus-4.6 Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Hung [Fri, 15 May 2026 22:09:48 +0000 (16:09 -0600)]
drm/amd/display: Add KUnit tests for amdgpu_dm_psr_set_event
[Why & How]
Add three KUnit tests covering the early-exit validation guard in
amdgpu_dm_psr_set_event():
- NULL stream argument returns false immediately
- Valid stream with NULL link returns false
- Valid stream/link with psr_feature_enabled == false returns false
Assisted-by: Copilot:Claude-Sonnet-4.6 Reviewed-by: Ray Wu <ray.wu@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Assisted-by: Copilot:Claude-Sonnet-4.6 Reviewed-by: Ray Wu <ray.wu@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why & How]
Separate the "should we emit IMMEDIATE?" decision into a pure,
side-effect-free helper so it can be tested in isolation without a
full DRM context.
- dm_ism_dispatch_next_event(current_state, delay_ns, sso_delay_ns)
returns DM_ISM_EVENT_IMMEDIATE when the state requires an immediate
follow-up (HYSTERESIS_WAITING with zero delay, OPTIMIZED_IDLE with
zero SSO delay, or TIMER_ABORTED), and DM_ISM_NUM_EVENTS otherwise.
- Removes the passthrough event parameter that was always
DM_ISM_NUM_EVENTS at the call site, making the sentinel explicit.
- Drops the now-unused event parameter from dm_ism_dispatch_power_state.
Assisted-by: Copilot:Claude-Sonnet-4.6 Reviewed-by: Ray Wu <ray.wu@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Hung [Fri, 15 May 2026 15:55:45 +0000 (09:55 -0600)]
drm/amd/display: Add more KUnit tests for amdgpu_dm_ism
[Why & How]
Add 8 more KUnit tests:
- dm_ism_get_idle_allow_delay: add a case where filter_entry_count
exceeds filter_history_size, exercising the max() branch in the
history_size calculation.
- amdgpu_dm_ism_init: verify all fields are initialised to
expected values after construction.
- amdgpu_dm_ism_fini: smoke-test cancellation of never-scheduled
delayed work items paired with a preceding init.
- dm_ism_set_last_idle_ts: verify last_idle_timestamp_ns is
updated to at least the value captured before the call.
- dm_ism_insert_record: verify index increment and duration
calculation; verify out-of-bounds index wraps to slot 0.
- dm_ism_trigger_event: verify current_state and previous_state
are updated on a valid transition and left unchanged on an
invalid one.
Assisted-by: Copilot:Claude-Sonnet-4.6 Reviewed-by: Ray Wu <ray.wu@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Hung [Thu, 21 May 2026 02:10:53 +0000 (20:10 -0600)]
drm/amd/display: Add KUnit tests for amdgpu_dm_colorop
[Why & How]
Add a KUnit pipeline test for amdgpu_dm_build_default_pipeline().
The pipeline test uses a mock drm_device (via DRM KUnit helpers),
calls amdgpu_dm_build_default_pipeline() with hw_3d_lut=true, then
walks the returned colorop chain asserting the correct type sequence
(degam_tf, mult, ctm, shaper_tf, shaper_lut, 3dlut, blnd_tf,
blnd_lut), that every op carries bypass_property, and that the
chain length is exactly eight. A kunit cleanup action calls
drm_colorop_pipeline_destroy() before the device is torn down.
Assisted-by: Copilot:Claude-Sonnet-4.6 Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Hung [Thu, 21 May 2026 02:10:42 +0000 (20:10 -0600)]
drm/amd/display: Fix gamma 2.2 colorop TF direction in tests
[Why & How]
Correct the gamma 2.2 TF direction used in the supported-TF bitmask
tests. Degam and blnd use DRM_COLOROP_1D_CURVE_GAMMA22 (EOTF
direction); shaper uses DRM_COLOROP_1D_CURVE_GAMMA22_INV (inverse
EOTF direction).
This aligns the tests with commit d8f9f42effd7
("drm/amd/display: Fix gamma 2.2 colorop TFs").
Assisted-by: Copilot:Claude-Sonnet-4.6 Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why & How]
Extract amdgpu_dm_initialize_default_pipeline() into a new
STATIC_IFN_KUNIT helper amdgpu_dm_build_default_pipeline().
This separation makes the pipeline-building logic testable via
KUnit without pulling in amdgpu_device and its dependencies
that are unavailable in the UML KUnit build environment.
Assisted-by: Copilot:Claude-Sonnet-4.6 Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Hung [Thu, 14 May 2026 20:29:52 +0000 (14:29 -0600)]
drm/amd/display: Add more color KUnit tests
[Why & How]
Add KUnit tests for __set_tf_bypass(), __set_tf_distributed_points(),
the bypass paths of amdgpu_dm_set_atomic_regamma(),
amdgpu_dm_atomic_shaper_lut(), and amdgpu_dm_atomic_blend_lut(), and
all three input branches of __set_colorop_in_tf_1d_curve().
Export the three shaper/blend/regamma helpers and
__set_colorop_in_tf_1d_curve with STATIC_IFN_KUNIT and EXPORT_IF_KUNIT
to make their branches reachable from tests.
Add the following test cases in amdgpu_dm_color_test.c:
- dm_test_set_tf_bypass: verify bypass TF setup
- dm_test_set_tf_distributed_points_srgb: validate sRGB gamma
- dm_test_set_tf_distributed_points_pq: validate PQ gamma
- dm_test_set_atomic_regamma_bypass: verify regamma bypass path
- dm_test_atomic_shaper_lut_bypass: verify shaper LUT bypass path
- dm_test_atomic_blend_lut_bypass: verify blend LUT bypass path
- dm_test_set_colorop_in_tf_1d_curve_invalid_type: verify invalid
colorop type returns -EINVAL
- dm_test_set_colorop_in_tf_1d_curve_unsupported_curve: verify
unsupported curve type returns -EINVAL
- dm_test_set_colorop_in_tf_1d_curve_bypass: verify bypass path sets
TF_TYPE_BYPASS and TRANSFER_FUNCTION_LINEAR
Assisted-by: Copilot:Claude-Sonnet-4.6 Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Hung [Thu, 14 May 2026 20:10:11 +0000 (14:10 -0600)]
drm/amd/display: Extract transfer function helpers
[Why]
Extract repeated inline dc_transfer_func setup patterns into two
small helper functions in amdgpu_dm_color.c.
Three functions (amdgpu_dm_set_atomic_regamma,
amdgpu_dm_atomic_shaper_lut, amdgpu_dm_atomic_blend_lut) each
contained identical two-line bypass setup and identical three-line
distributed-points setup. __set_colorop_in_tf_1d_curve contained
the same two-line bypass pattern as well.
[How]
Extract to __set_tf_bypass() and __set_tf_distributed_points().
Replace all four bypass sites with __set_tf_bypass(), and all three
distributed-points sites with __set_tf_distributed_points().
Assisted-by: Copilot:Claude-Sonnet-4.6 Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Hung [Thu, 14 May 2026 18:35:33 +0000 (12:35 -0600)]
drm/amd/display: Add KUnit tests for color LUT functions
[Why]
Add KUnit tests for three color management functions in
amdgpu_dm_color.c: amdgpu_dm_verify_lut_sizes,
amdgpu_dm_atomic_lut3d, and __set_colorop_3dlut.
[How]
Export amdgpu_dm_verify_lut_sizes with EXPORT_IF_KUNIT. Change
amdgpu_dm_atomic_lut3d and __set_colorop_3dlut from static to
STATIC_IFN_KUNIT and export them with EXPORT_IF_KUNIT. Add their
prototypes to amdgpu_dm_color.h inside the KUnit guard block.
Implement 14 test cases in amdgpu_dm_color_test.c:
- 8 tests for amdgpu_dm_verify_lut_sizes covering null LUTs,
valid and invalid degamma/gamma sizes, both valid, and priority
- 3 tests for amdgpu_dm_atomic_lut3d covering zero size clearing
initialized state, nonzero setting state bits and mode flags,
and LUT data forwarding to tetrahedral_17
- 3 tests for __set_colorop_3dlut covering zero size returning
-EINVAL and clearing initialized state, nonzero returning 0
and setting state bits, and 32-bit LUT data forwarding to
tetrahedral_17
Assisted-by: Copilot:Claude-Sonnet-4.6 Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ovidiu Bunea [Thu, 14 May 2026 21:39:25 +0000 (17:39 -0400)]
drm/amd/display: Fix DP_PIXEL_FORMAT fields & update clk_src for DCN4x
[Why & How]
The enc1_stream_encoder_dp_get_pixel_format() function reads fields of the
DP_PIXEL_FORMAT register that are deprecated on DCN4x. This breaks seamless
boot because driver cannot properly determine the pixel format programmed by VBIOS.
The previous changed submitted for this issue incorrectly calculated the DP DTO
frequency because register access to DCN4x DP DTO registers was not working.
Create a new function that reads the correct fields.
Update clk_src structs to support register access for new DCN4x registers & fields.
Reviewed-by: Charlene Liu <charlene.liu@amd.com> Signed-off-by: Ovidiu Bunea <ovidiu.bunea@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Harry Wentland [Tue, 12 May 2026 19:24:22 +0000 (15:24 -0400)]
drm/amd/display: Bound VBIOS record-chain walk loops
[Why & How]
All record-chain walk loops in bios_parser.c and bios_parser2.c use
for(;;) and only terminate on a 0xFF record_type sentinel or zero
record_size. A malformed VBIOS image missing the terminator record
causes unbounded iteration at probe time, potentially hundreds of
thousands of iterations with record_size=1. In the final iterations
near the BIOS image boundary, struct casts beyond the 2-byte header
validated by GET_IMAGE can also read out of bounds.
Cap all 14 record-chain walk loops to BIOS_MAX_NUM_RECORD (256)
iterations. The atombios.h defines up to 22 distinct record types
and atomfirmware.h has 13. Assuming an average of less than 10
records per type (which is reasonable since most are connector-
based) 256 is a generous upper bound.
Fixes: 4562236b3bc0 ("drm/amd/dc: Add dc display driver (v2)") Assisted-by: Copilot:claude-opus-4.6 Mythos Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Harry Wentland [Thu, 7 May 2026 19:38:37 +0000 (15:38 -0400)]
drm/amd/display: Clamp HDMI HDCP2 rx_id_list read to buffer size
[Why & How]
During HDCP 2.x repeater authentication over HDMI, the driver reads the
sink's RxStatus register and extracts a 10-bit message size field (max
value 1023). This value is used as the read length for the ReceiverID
list without being clamped to the size of the destination buffer
rx_id_list[177]. A malicious HDMI repeater could advertise a message
size larger than the buffer, causing an out-of-bounds write during the
I2C read.
Clamp the read length in mod_hdcp_read_rx_id_list() to the size of the
rx_id_list buffer, matching the approach already used in the DP branch.
Fixes: eff682f83c9c ("drm/amd/display: Add DDC handles for HDCP2.2") Assisted-by: Copilot:claude-opus-4.6 Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Harry Wentland [Tue, 5 May 2026 15:50:07 +0000 (11:50 -0400)]
drm/amd/display: Reject gpio_bitshift >= 32 in bios_parser_get_gpio_pin_info()
[Why & How]
gpio_bitshift is a uint8_t read directly from the VBIOS GPIO pin table.
If the value is >= 32, the expression "1 << gpio_bitshift" triggers
undefined behaviour in C (shift count exceeds type width). On x86 the
shift is silently masked to 5 bits, producing an incorrect GPIO mask
that may cause wrong MMIO register bits to be toggled.
Validate gpio_bitshift before use and return BP_RESULT_BADBIOSTABLE for
out-of-range values.
Fixes: ae79c310b1a6 ("drm/amd/display: Add DCE12 bios parser support") Assisted-by: Copilot:claude-opus-4.6 Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Harry Wentland [Tue, 5 May 2026 15:52:15 +0000 (11:52 -0400)]
drm/amd/display: Use krealloc_array() in dal_vector_reserve()
[Why & How]
dal_vector_reserve() computes the allocation size as
"capacity * vector->struct_size" using uint32_t arithmetic, which can
silently wrap to a small value on overflow. This would cause krealloc to
return a smaller buffer than expected, leading to heap overflows on
subsequent vector appends.
Replace krealloc() with krealloc_array() which performs an internal
overflow check and returns NULL on wrap, preventing the issue.
Fixes: 2004f45ef83f ("drm/amd/display: Use kernel alloc/free") Assisted-by: Copilot:claude-opus-4.6 Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>