drm/amdgpu: Fix user queue deadlock by reordering mutex locking
author Jesse.Zhang <Jesse.Zhang@amd.com>
Fri, 9 May 2025 09:18:16 +0000 (17:18 +0800)
committer Alex Deucher <alexander.deucher@amd.com>
Tue, 13 May 2025 13:32:25 +0000 (09:32 -0400)
commit 648a0dc0d78c369233b16878e4f351efe7fd8df6
tree 0e4625bf01e204a222773f0f9002aa6144c7abbf
parent f71509fdd03e30789293133735e785ea0ca31060

This resolves a deadlock between user queue management and GPU reset
paths by enforcing consistent lock ordering.

The deadlock occurred when:

1. Process exit path (amdgpu_userq_mgr_fini) would:
   - Take uqm->userq_mutex
   - Then try to take adev->userq_mutex for list operations

2. GPU reset path (amdgpu_userq_pre_reset) would:
   - Take adev->userq_mutex first (for list traversal)
   - Then take uqm->userq_mutex

The solution establishes a strict top-down locking order:
1. Always take adev->userq_mutex before any uqm->userq_mutex
2. Maintain this order consistently across all code paths
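
The ordering rule can be illustrated with a small userspace sketch. This is
not the driver code: it uses pthread mutexes in place of kernel mutexes, and
every name prefixed with demo_ (the structs, the fixed-size manager array,
and the functions) is a hypothetical stand-in for illustration only. Both
paths take the device-level lock before any per-manager lock, so neither
thread can hold one lock while waiting on the other in the reverse order.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define DEMO_MAX_MGRS 4

struct demo_mgr;

/* Hypothetical stand-in for the device: the outer lock guards the manager list. */
struct demo_dev {
	pthread_mutex_t userq_mutex;
	struct demo_mgr *mgrs[DEMO_MAX_MGRS];
};

/* Hypothetical stand-in for a per-process manager: the inner lock guards its queues. */
struct demo_mgr {
	pthread_mutex_t userq_mutex;
	struct demo_dev *dev;
	int active_queues;
};

/* Process-exit path: device lock first, then the manager lock. */
static void demo_mgr_fini(struct demo_mgr *mgr)
{
	struct demo_dev *dev = mgr->dev;

	pthread_mutex_lock(&dev->userq_mutex);
	pthread_mutex_lock(&mgr->userq_mutex);
	mgr->active_queues = 0;
	for (size_t i = 0; i < DEMO_MAX_MGRS; i++)
		if (dev->mgrs[i] == mgr)
			dev->mgrs[i] = NULL;	/* unlink under the device lock */
	pthread_mutex_unlock(&mgr->userq_mutex);
	pthread_mutex_unlock(&dev->userq_mutex);
}

/* Reset path: the same order, device lock first, then each manager lock. */
static void demo_pre_reset(struct demo_dev *dev)
{
	pthread_mutex_lock(&dev->userq_mutex);
	for (size_t i = 0; i < DEMO_MAX_MGRS; i++) {
		struct demo_mgr *mgr = dev->mgrs[i];

		if (!mgr)
			continue;
		pthread_mutex_lock(&mgr->userq_mutex);
		mgr->active_queues = 0;
		pthread_mutex_unlock(&mgr->userq_mutex);
	}
	pthread_mutex_unlock(&dev->userq_mutex);
}

static void *demo_fini_thread(void *arg)
{
	demo_mgr_fini(arg);
	return NULL;
}

static void *demo_reset_thread(void *arg)
{
	demo_pre_reset(arg);
	return NULL;
}

/* Runs both paths concurrently; returns how many managers are still listed. */
int demo_run(void)
{
	struct demo_dev dev;
	struct demo_mgr mgr;
	pthread_t fini, reset;
	int remaining = 0;

	pthread_mutex_init(&dev.userq_mutex, NULL);
	for (size_t i = 0; i < DEMO_MAX_MGRS; i++)
		dev.mgrs[i] = NULL;

	pthread_mutex_init(&mgr.userq_mutex, NULL);
	mgr.dev = &dev;
	mgr.active_queues = 3;
	dev.mgrs[0] = &mgr;

	pthread_create(&fini, NULL, demo_fini_thread, &mgr);
	pthread_create(&reset, NULL, demo_reset_thread, &dev);
	pthread_join(fini, NULL);
	pthread_join(reset, NULL);

	assert(mgr.active_queues == 0);
	for (size_t i = 0; i < DEMO_MAX_MGRS; i++)
		if (dev.mgrs[i])
			remaining++;

	pthread_mutex_destroy(&mgr.userq_mutex);
	pthread_mutex_destroy(&dev.userq_mutex);
	return remaining;
}
```

Calling demo_run() from a main() (built with -pthread) should complete
regardless of which thread runs first. Swapping the two lock calls in
demo_mgr_fini() would recreate the inverted order described above, where
each thread can end up holding the lock the other one needs.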

Changes made:
- Reordered locking in amdgpu_userq_mgr_fini() to take device lock first
- Kept existing proper order in amdgpu_userq_pre_reset()
- Simplified the fini flow by removing redundant operations

This prevents circular dependencies while maintaining thread safety
during both normal operation and GPU reset scenarios.

Fixes: 4ce60dbada96 ("drm/amdgpu: store userq_managers in a list in adev")
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Arvind Yadav <Arvind.Yadav@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c