From: Vitaly Prosyak Date: Tue, 14 Apr 2026 03:07:55 +0000 (-0400) Subject: drm/amdgpu: fix NULL pointer dereference in amdgpu_devcoredump_format X-Git-Tag: v7.1-rc1~24^2~3^2~58 X-Git-Url: http://git.ipfire.org/gitweb/?a=commitdiff_plain;h=d91db49e97382efe38b51f5943c1a7073c0f06cd;p=thirdparty%2Fkernel%2Flinux.git drm/amdgpu: fix NULL pointer dereference in amdgpu_devcoredump_format A race condition in the devcoredump code causes a NULL pointer dereference in amdgpu_devcoredump_format() when multiple GPU resets occur in quick succession. The sequence of events: 1. First reset calls amdgpu_coredump(), creates coredump1, sets adev->coredump = coredump1, and queues the deferred work. 2. The deferred work begins executing (work_pending() returns false since the work is now running, not just queued). 3. A second reset calls amdgpu_coredump(). work_pending() returns false because the work is running, so amdgpu_coredump() proceeds: creates coredump2, overwrites adev->coredump = coredump2, and re-queues the deferred work with queue_work(). 4. The first deferred work finishes and unconditionally sets adev->coredump = NULL, destroying the reference to coredump2. 5. The re-queued deferred work starts and reads adev->coredump = NULL. It then passes this NULL into amdgpu_devcoredump_format() which dereferences coredump->adev (offset 0 in the struct), triggering: KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] RIP: 0010:amdgpu_devcoredump_format+0xa6/0x36b0 [amdgpu] This was observed during the amd_deadlock IGT test where multiple subtests trigger rapid ring resets. The dmesg log shows four coredumps created within 120ms (at 102.377s, 104.424s, 104.492s, and 104.497s), with the crash occurring 13ms after the last one. Fix this with two changes: - Replace work_pending() with work_busy() in amdgpu_coredump() to also reject new coredumps while the deferred work is executing, not just when it is queued. This closes the main race window. - Add a defensive NULL check for adev->coredump at the start of amdgpu_devcoredump_deferred_work() to prevent the crash if the race still occurs (work_busy() is advisory, not a full barrier). v2: Drop the job->pasid NULL guard -- that fix was independently submitted and merged as commit 4c1f0a162da5 ("drm/amdgpu: add job->pasid in check as amdgpu_job could be NULL") by Sunil Khatri, reviewed by Christian König. Integrate with that patch as suggested by Christian. Fixes: 4bbba79a7f1d ("drm/amdgpu: move devcoredump generation to a worker") Cc: Pierre-Eric Pelloux-Prayer Cc: Christian König Cc: Alex Deucher Signed-off-by: Vitaly Prosyak Reviewed-by: Pierre-Eric Pelloux-Prayer Signed-off-by: Alex Deucher --- diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c index 3d7aa6b09815..5e353a83ec0d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c @@ -464,6 +464,9 @@ static void amdgpu_devcoredump_deferred_work(struct work_struct *work) struct amdgpu_device *adev = container_of(work, typeof(*adev), coredump_work); struct amdgpu_coredump_info *coredump = adev->coredump; + if (!coredump) + goto end; + /* Do a one-time preparation of the coredump output because * repeatingly calling drm_coredump_printer is very slow. */ @@ -499,7 +502,7 @@ void amdgpu_coredump(struct amdgpu_device *adev, bool skip_vram_check, int i, off, idx; /* No need to generate a new coredump if there's one in progress already. */ - if (work_pending(&adev->coredump_work)) + if (work_busy(&adev->coredump_work)) return; if (job && job->pasid)