amdgpu_amdkfd_gpuvm_free_memory_of_gpu() unpinned DOORBELL and MMIO
remap BOs (which are pinned at allocation time) before checking whether
the BO is still mapped to the GPU. When the BO is still mapped, the
function returns -EBUSY and leaves the BO alive, but it has already
been unpinned. The BO is then unpinned again when it is finally freed
during process teardown, triggering a ttm_bo_unpin() underflow warning:

  WARNING: CPU: 18 PID: 15066 at ttm/ttm_bo.c:650 amdttm_bo_unpin+0x6d/0x80 [amdttm]
  Workqueue: kfd_process_wq kfd_process_wq_release [amdgpu]
  RIP: 0010:amdttm_bo_unpin+0x6d/0x80 [amdttm]
  Call Trace:
   amdgpu_bo_unpin+0x1a/0x90 [amdgpu]
   amdgpu_amdkfd_gpuvm_unpin_bo+0x31/0xb0 [amdgpu]
   amdgpu_amdkfd_gpuvm_free_memory_of_gpu+0x3bf/0x460 [amdgpu]
   kfd_process_free_outstanding_kfd_bos+0xd4/0x170 [amdgpu]
   kfd_process_wq_release+0x109/0x1b0 [amdgpu]
   process_one_work+0x1e2/0x3b0
   worker_thread+0x50/0x3a0
   kthread+0xdd/0x100
   ret_from_fork+0x29/0x50

Move the unpin after the mapped_to_gpu_memory check so it only happens
once we are committed to freeing the BO.

Fixes: d25e35bc26c3 ("drm/amdgpu: Pin MMIO/DOORBELL BO's in GTT domain")
Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 927c5b2defb9b09856444d94bebfd056a002bd75)

 	mutex_lock(&mem->lock);
-	/* Unpin MMIO/DOORBELL BO's that were pinned during allocation */
-	if (mem->alloc_flags &
-	    (KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL |
-	     KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP)) {
-		amdgpu_amdkfd_gpuvm_unpin_bo(mem->bo);
-	}
-
 	mapped_to_gpu_memory = mem->mapped_to_gpu_memory;
 	is_imported = mem->is_imported;
 	mutex_unlock(&mem->lock);
 		return -EBUSY;
 	}
+	/* At this point the BO is guaranteed to be freed, so unpin the
+	 * MMIO/DOORBELL BOs that were pinned during allocation.
+	 */
+	if (mem->alloc_flags &
+	    (KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL |
+	     KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP)) {
+		amdgpu_amdkfd_gpuvm_unpin_bo(mem->bo);
+	}
+
 	/* Make sure restore workers don't access the BO any more */
 	mutex_lock(&process_info->lock);
 	if (!list_empty(&mem->validate_list))