From: Mario Limonciello (AMD) Date: Wed, 7 Jan 2026 21:37:28 +0000 (-0600) Subject: drm/amd: Clean up kfd node on surprise disconnect X-Git-Tag: v6.19-rc6~20^2~1^2~13 X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=28695ca09d326461f8078332aa01db516983e8a2;p=thirdparty%2Flinux.git drm/amd: Clean up kfd node on surprise disconnect When an eGPU is unplugged the KFD topology should also be destroyed for that GPU. This never happens because the fini_sw callbacks never get to run. Run them manually before calling amdgpu_device_ip_fini_early() when a device has already been disconnected. This location is intentionally chosen to make sure that the kfd locking refcount doesn't get incremented unintentionally. Cc: kent.russell@amd.com Closes: https://community.frame.work/t/amd-egpu-on-linux/8691/33 Signed-off-by: Mario Limonciello (AMD) Reviewed-by: Kent Russell Signed-off-by: Alex Deucher (cherry picked from commit 6a23e7b4332c10f8b56c33a9c5431b52ecff9aab) Cc: stable@vger.kernel.org --- diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index d5c44bd34d45..d2c3885de711 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -5063,6 +5063,14 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev) amdgpu_ttm_set_buffer_funcs_status(adev, false); + /* + * device went through surprise hotplug; we need to destroy topology + * before ip_fini_early to prevent kfd locking refcount issues by calling + * amdgpu_amdkfd_suspend() + */ + if (drm_dev_is_unplugged(adev_to_drm(adev))) + amdgpu_amdkfd_device_fini_sw(adev); + amdgpu_device_ip_fini_early(adev); amdgpu_irq_fini_hw(adev);