git.ipfire.org Git - thirdparty/kernel/linux.git/commit

author	Jesse.Zhang <Jesse.Zhang@amd.com>
	Tue, 13 Jan 2026 08:13:47 +0000 (16:13 +0800)
committer	Alex Deucher <alexander.deucher@amd.com>
	Tue, 20 Jan 2026 22:16:12 +0000 (17:16 -0500)
commit	fc3336be9c6297282dd7968a597166b212cb0dc0
tree	e70239542e6bab67d5edb02dccfce4fe4cf8a54a
parent	5aaa5058dec5bfdcb24c42fe17ad91565a3037ca

drm/amd/amdgpu: Add independent hang detect work for user queue fence

In error scenarios (e.g., malformed commands), user queue fences may never
be signaled, causing processes to wait indefinitely. To address this while
preserving the requirement of infinite fence waits, implement an independent
timeout detection mechanism:

1. Initialize a hang-detect delayed work item when a user queue is created (one-time setup).
2. Arm the work with a queue-type-specific timeout (gfx/compute/sdma) when
   the last fence is created via amdgpu_userq_signal_ioctl (per-fence timing).
3. Trigger the queue reset logic if the timeout expires before the fence is signaled.
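
The three steps above can be modeled in user space as follows. This is only a
rough sketch, not the driver code: every name in it (`user_queue`,
`uq_fence_created`, `uq_hang_detect_tick`, and so on) is hypothetical, the
timeout values are illustrative rather than the driver defaults, and a
caller-supplied millisecond clock stands in for the kernel's delayed work. It
also assumes the work simply checks the fence state when it fires, which the
commit message does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical queue types mirroring the gfx/compute/sdma split. */
enum uq_type { UQ_GFX, UQ_COMPUTE, UQ_SDMA, UQ_TYPE_COUNT };

/* Illustrative per-type timeouts in ms (assumed values, not driver defaults). */
static const uint64_t uq_timeout_ms[UQ_TYPE_COUNT] = { 2000, 60000, 2000 };

struct uq_fence {
    bool signaled;
};

struct user_queue {
    enum uq_type type;
    bool work_armed;             /* stands in for a pending delayed work */
    uint64_t deadline_ms;        /* when the hang-detect work would run */
    struct uq_fence *last_fence; /* most recently created fence */
    unsigned int reset_count;    /* how often the reset path was taken */
};

/* Step 1: one-time setup when the queue is created. */
void uq_init(struct user_queue *q, enum uq_type type)
{
    assert(type < UQ_TYPE_COUNT);
    q->type = type;
    q->work_armed = false;
    q->deadline_ms = 0;
    q->last_fence = NULL;
    q->reset_count = 0;
}

/*
 * Step 2: a new fence was created. The timeout starts now, independent of
 * whether anyone is waiting on the fence.
 */
void uq_fence_created(struct user_queue *q, struct uq_fence *f, uint64_t now_ms)
{
    f->signaled = false;
    q->last_fence = f;
    q->deadline_ms = now_ms + uq_timeout_ms[q->type];
    q->work_armed = true;
}

/*
 * Step 3: models the work firing. Returns true if the deadline has passed
 * while the last fence is still unsignaled, i.e. a reset was triggered.
 */
bool uq_hang_detect_tick(struct user_queue *q, uint64_t now_ms)
{
    if (!q->work_armed || now_ms < q->deadline_ms)
        return false;

    q->work_armed = false;
    if (q->last_fence && !q->last_fence->signaled) {
        q->reset_count++;
        return true;
    }
    return false;
}
```

Because the deadline lives on the queue rather than on the fence wait, a
process can still wait on the fence without any timeout while the queue
independently notices that the fence never signaled.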

v2: make timeout per queue type (adev->gfx_timeout vs adev->compute_timeout vs adev->sdma_timeout) to be consistent with kernel queues. (Alex)
v3: The timeout detection must be independent of the fence, i.e. you don't wait for a timeout on the fence
but rather have the timeout start as soon as the fence is initialized. (Christian)
v4: replace the timer with the `hang_detect_work` delayed work.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
drivers/gpu/drm/amd/amdgpu/amdgpu_userq.h
drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c