git.ipfire.org Git - thirdparty/linux.git/commit
drm/amdgpu: resume MES scheduling after user queue hang detection and recovery
author Jesse.Zhang <Jesse.Zhang@amd.com>
Fri, 7 Nov 2025 11:19:08 +0000 (19:19 +0800)
committer Alex Deucher <alexander.deucher@amd.com>
Wed, 12 Nov 2025 02:54:17 +0000 (21:54 -0500)
commit 46f2029fe1dbdbb2ff3d6a566b32002660d3944b
tree edea49af54d1e76b9a2b2177fa419d295f85fab6
parent 547985579932c1de13f57f8bcf62cd9361b9d3d3

This patch ensures the Micro-Engine Scheduler (MES) is properly resumed
after detecting and recovering from a user queue hang condition.

Key changes:
1. Track whether a hung user queue was detected, using a
   found_hung_queue flag
2. Call amdgpu_mes_resume() to restart MES scheduling once the hang
   recovery process has completed
3. Complement the existing recovery steps (fence force completion
   and device wedging) so that the scheduler can process new work

Without this resume call, MES may remain paused even after the hung
queue has been handled, which prevents newly submitted work from being
processed and can lead to system stalls.
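
The control flow described above can be sketched in a small userspace
mock. This is not the driver code: struct mock_device, struct mock_queue,
mock_mes_resume() and mock_detect_and_reset() are hypothetical stand-ins
invented for illustration, and the assumption that MES is paused when
recovery starts is modeled by the caller setting mes_paused. Only the
found_hung_queue flag and the ordering (force-complete fences, wedge the
device, then resume) come from the description in this commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a user queue; not the real amdgpu structure. */
struct mock_queue {
	int id;
	bool hung;
	bool fence_completed;
};

/* Hypothetical stand-in for the device and scheduler state. */
struct mock_device {
	bool mes_paused;
	bool wedged;
};

/* Stand-in for amdgpu_mes_resume(): here it only clears the paused flag. */
static int mock_mes_resume(struct mock_device *dev)
{
	dev->mes_paused = false;
	return 0;
}

/*
 * Mock of the recovery path: scan the queues, remember whether any of
 * them was hung, run the existing recovery steps, and only then resume
 * scheduling.
 */
static int mock_detect_and_reset(struct mock_device *dev,
				 struct mock_queue *queues, size_t n)
{
	bool found_hung_queue = false;
	size_t i;

	assert(dev != NULL);
	assert(queues != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (!queues[i].hung)
			continue;

		found_hung_queue = true;
		/* Existing step: force completion of the queue's fence. */
		queues[i].fence_completed = true;
		queues[i].hung = false;
	}

	if (found_hung_queue) {
		/* Existing step: mark the device as wedged. */
		dev->wedged = true;
		/* New step: restart scheduling after recovery. */
		return mock_mes_resume(dev);
	}

	return 0;
}
```

In this mock, a device that starts with mes_paused set and one hung
queue ends with the queue's fence completed, wedged set and mes_paused
cleared. Dropping the mock_mes_resume() call leaves mes_paused set, which
is the stall scenario this patch addresses.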

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drivers/gpu/drm/amd/amdgpu/mes_userqueue.c