author    Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
          Thu, 11 Dec 2025 15:55:20 +0000 (21:25 +0530)
committer Alex Deucher <alexander.deucher@amd.com>
          Mon, 5 Jan 2026 22:00:00 +0000 (17:00 -0500)
commit    bd8150a1b3370a9f7761c5814202a3fe5a79f44f
tree      6e6755a3da0e3dc85d4c8dcfef4e3d0730429f1f
parent    f9f281e839b7518109a4b7efa8dbee425bd8bfad
drm/amdgpu: Refactor amdgpu_gem_va_ioctl for Handling Last Fence Update and Timeline Management v4

This commit simplifies the amdgpu_gem_va_ioctl function. Key updates
include:
 - Moved the logic for managing the last update fence directly into
   amdgpu_gem_va_update_vm.
 - Introduced checks for the timeline point to enable conditional
   replacement or addition of fences.
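As a rough illustration of the timeline-point check, the sketch below models it in plain userspace C. All names (toy_fence, toy_syncobj, toy_export_fence) are hypothetical and are not the driver's code. It assumes that a timeline point of 0 means the syncobj is used as a binary object whose fence is replaced, while a non-zero point adds a new timeline entry. In the kernel, the DRM syncobj API offers drm_syncobj_replace_fence() and drm_syncobj_add_point() for these two cases.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; these are NOT the kernel's
 * struct dma_fence / struct drm_syncobj. */
struct toy_fence {
	int refcount;
};

#define TOY_MAX_POINTS 4

struct toy_syncobj {
	struct toy_fence *payload;                 /* binary syncobj payload */
	uint64_t points[TOY_MAX_POINTS];           /* timeline points added so far */
	struct toy_fence *point_fences[TOY_MAX_POINTS];
	size_t npoints;
};

static struct toy_fence *toy_fence_get(struct toy_fence *f)
{
	if (f)
		f->refcount++;
	return f;
}

static void toy_fence_put(struct toy_fence *f)
{
	if (f)
		f->refcount--;
}

/* Export @fence into @s. A timeline_point of 0 replaces the binary
 * payload; a non-zero point appends a timeline entry. In both cases
 * the syncobj takes its own reference on @fence. */
static int toy_export_fence(struct toy_syncobj *s, struct toy_fence *fence,
			    uint64_t timeline_point)
{
	if (timeline_point == 0) {
		struct toy_fence *old = s->payload;

		s->payload = toy_fence_get(fence);
		toy_fence_put(old);   /* drop the reference held on the old payload */
		return 0;
	}

	if (s->npoints == TOY_MAX_POINTS)
		return -1;            /* capacity limit of this toy model only */
	s->points[s->npoints] = timeline_point;
	s->point_fences[s->npoints] = toy_fence_get(fence);
	s->npoints++;
	return 0;
}
```

Replacing a binary payload releases the syncobj's reference on the previous fence, whereas adding a timeline point keeps earlier entries alive.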

v2: Addressed review comments from Christian.
v3: Updated comments (Christian).
v4: The previous version selected the fence too early and did not manage its
    reference correctly, which could lead to stale or freed fences being used.
    This resulted in refcount underflows and could crash when updating GPU
    timelines.
    The fence is now chosen only after the VA mapping work is completed, and its
    reference is taken safely. After exporting it to the VM timeline syncobj, the
    driver always drops its local fence reference, ensuring balanced refcounting
    and avoiding use-after-free on dma_fence.
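The v4 refcounting rule can be sketched with a small userspace model. It assumes, hypothetically, that the VM holds one reference on its last-update fence and that the timeline holds its own reference on whatever is exported to it. model_fence, model_vm, model_timeline and model_va_ioctl are illustrative names, not amdgpu or dma_fence APIs. The fence is picked only after the mapping work has updated the VM's last-update fence, a local reference is taken, the timeline takes its own reference, and the local reference is always dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of dma_fence-style refcounting.
 * None of these types or functions exist in the kernel. */
struct model_fence {
	int refcount;
	bool freed;   /* set when the count reaches zero; nothing is really freed */
};

static int model_underflows;  /* counts puts that arrive with no reference left */

static struct model_fence *model_get(struct model_fence *f)
{
	f->refcount++;
	return f;
}

static void model_put(struct model_fence *f)
{
	if (f->freed || f->refcount <= 0) {
		model_underflows++;   /* analogous to "refcount_t: underflow" */
		return;
	}
	if (--f->refcount == 0)
		f->freed = true;
}

/* Assumption: the VM owns one reference on its last-update fence. */
struct model_vm {
	struct model_fence *last_update;
};

/* Assumption: the timeline owns one reference on the fence it carries. */
struct model_timeline {
	struct model_fence *fence;
};

/* Mapping work finishes: the VM's last-update fence is replaced. */
static void model_vm_update(struct model_vm *vm, struct model_fence *work)
{
	struct model_fence *old = vm->last_update;

	vm->last_update = model_get(work);
	if (old)
		model_put(old);
}

static void model_va_ioctl(struct model_vm *vm, struct model_timeline *tl,
			   struct model_fence *work)
{
	struct model_fence *fence;

	/* 1. Do the VA mapping work first. */
	model_vm_update(vm, work);

	/* 2. Only now pick the fence and take a local reference on it. */
	fence = model_get(vm->last_update);

	/* 3. Export: the timeline takes its own reference. */
	if (tl->fence)
		model_put(tl->fence);
	tl->fence = model_get(fence);

	/* 4. Always drop the local reference. */
	model_put(fence);
}
```

Every get is paired with exactly one put, so only a put beyond the outstanding references is counted as an underflow. Had step 2 run before step 1 without taking a reference, the saved pointer could refer to the old last-update fence, which step 1 may already have released.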

Crash signature:
[  205.828135] refcount_t: underflow; use-after-free.
[  205.832963] WARNING: CPU: 30 PID: 7274 at lib/refcount.c:28 refcount_warn_saturate+0xbe/0x110
...
[  206.074014] Call Trace:
[  206.076488]  <TASK>
[  206.078608]  amdgpu_gem_va_ioctl+0x6ea/0x740 [amdgpu]
[  206.084040]  ? __pfx_amdgpu_gem_va_ioctl+0x10/0x10 [amdgpu]
[  206.089994]  drm_ioctl_kernel+0x86/0xe0 [drm]
[  206.094415]  drm_ioctl+0x26e/0x520 [drm]
[  206.098424]  ? __pfx_amdgpu_gem_va_ioctl+0x10/0x10 [amdgpu]
[  206.104402]  amdgpu_drm_ioctl+0x4b/0x80 [amdgpu]
[  206.109387]  __x64_sys_ioctl+0x96/0xe0
[  206.113156]  do_syscall_64+0x66/0x2d0
...
[  206.553351] BUG: unable to handle page fault for address: ffffffffc0dfde90
...
[  206.553378] RIP: 0010:dma_fence_signal_timestamp_locked+0x39/0xe0
...
[  206.553405] Call Trace:
[  206.553409]  <IRQ>
[  206.553415]  ? __pfx_drm_sched_fence_free_rcu+0x10/0x10 [gpu_sched]
[  206.553424]  dma_fence_signal+0x30/0x60
[  206.553427]  drm_sched_job_done.isra.0+0x123/0x150 [gpu_sched]
[  206.553434]  dma_fence_signal_timestamp_locked+0x6e/0xe0
[  206.553437]  dma_fence_signal+0x30/0x60
[  206.553441]  amdgpu_fence_process+0xd8/0x150 [amdgpu]
[  206.553854]  sdma_v4_0_process_trap_irq+0x97/0xb0 [amdgpu]
[  206.554270]  amdgpu_irq_dispatch+0x150/0x230 [amdgpu]
[  206.554702]  amdgpu_ih_process+0x6a/0x180 [amdgpu]
[  206.555101]  amdgpu_irq_handler+0x23/0x60 [amdgpu]
[  206.555500]  __handle_irq_event_percpu+0x4a/0x1c0
[  206.555506]  handle_irq_event+0x38/0x80
[  206.555509]  handle_edge_irq+0x92/0x1e0
[  206.555513]  __common_interrupt+0x3e/0xb0
[  206.555519]  common_interrupt+0x80/0xa0
[  206.555525]  </IRQ>
[  206.555527]  <TASK>
...
[  206.555650] RIP: 0010:dma_fence_signal_timestamp_locked+0x39/0xe0
...
[  206.555667] Kernel panic - not syncing: Fatal exception in interrupt

Link: https://patchwork.freedesktop.org/patch/654669/
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c