drm/amd: Clean up kfd node on surprise disconnect
Author:     Mario Limonciello (AMD) <superm1@kernel.org>
AuthorDate: Wed, 7 Jan 2026 21:37:28 +0000 (15:37 -0600)
Commit:     Alex Deucher <alexander.deucher@amd.com>
CommitDate: Thu, 8 Jan 2026 16:43:23 +0000 (11:43 -0500)

When an eGPU is unplugged, the KFD topology should also be destroyed
for that GPU. This never happens because the fini_sw callbacks never
get to run. Run them manually before calling amdgpu_device_ip_fini_early()
when a device has already been disconnected.

This location is intentionally chosen to make sure that the kfd locking
refcount doesn't get incremented unintentionally.

Cc: kent.russell@amd.com
Closes: https://community.frame.work/t/amd-egpu-on-linux/8691/33
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
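---
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index dd5d9da1c5a8bbddd33fdc02b89a884d3e15602b..33135b185fed3e9fc47f1ed656d195128ab8890f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c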
@@ -4920,6 +4920,14 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
 
        amdgpu_ttm_set_buffer_funcs_status(adev, false);
 
+       /*
+        * device went through surprise hotplug; we need to destroy topology
+        * before ip_fini_early to prevent kfd locking refcount issues by calling
+        * amdgpu_amdkfd_suspend()
+        */
+       if (drm_dev_is_unplugged(adev_to_drm(adev)))
+               amdgpu_amdkfd_device_fini_sw(adev);
+
        amdgpu_device_ip_fini_early(adev);
 
        amdgpu_irq_fini_hw(adev);
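
The ordering argument in the commit message can be illustrated with a small user-space toy model. This is only a sketch, not driver code: `struct toy_dev` and every `toy_*` function are hypothetical stand-ins, and the assumption that the suspend step takes a KFD lock reference whenever the node still exists is a simplification inferred from the comment in the hunk above, not a statement about how amdgpu_amdkfd_suspend() actually behaves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for driver state; NOT the real struct amdgpu_device. */
struct toy_dev {
	bool unplugged;        /* models drm_dev_is_unplugged() */
	bool kfd_node_present; /* KFD node/topology still registered */
	int  kfd_lock_count;   /* models the kfd locking refcount */
};

/* Models amdgpu_amdkfd_device_fini_sw(): tear down the KFD node. */
static void toy_kfd_fini_sw(struct toy_dev *d)
{
	d->kfd_node_present = false;
}

/*
 * Models the suspend step reached from ip_fini_early.
 * ASSUMPTION: it takes a lock reference only while the node exists.
 */
static void toy_kfd_suspend(struct toy_dev *d)
{
	if (d->kfd_node_present)
		d->kfd_lock_count++;
}

/* Models amdgpu_device_ip_fini_early(): only the suspend call matters here. */
static void toy_ip_fini_early(struct toy_dev *d)
{
	toy_kfd_suspend(d);
}

/*
 * Models amdgpu_device_fini_hw(). With patched == true, the KFD node of an
 * unplugged device is destroyed before ip_fini_early, as in the hunk above.
 * Returns the resulting lock count.
 */
static int toy_fini_hw(struct toy_dev *d, bool patched)
{
	assert(d != NULL);

	if (patched && d->unplugged)
		toy_kfd_fini_sw(d);

	toy_ip_fini_early(d);
	return d->kfd_lock_count;
}
```

In this model, calling `toy_fini_hw()` on an unplugged device with `patched` set leaves the lock count at zero and the node gone, while the unpatched order leaves a stray reference and the node still registered. That is the situation the commit message describes when it says the location was chosen so the refcount is not incremented unintentionally.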