drm/amdgpu: Fix the race condition for draining retry fault
author     Emily Deng <Emily.Deng@amd.com>
           Thu, 6 Mar 2025 00:40:01 +0000 (08:40 +0800)
committer  Greg Kroah-Hartman <gregkh@linuxfoundation.org>
           Sun, 20 Apr 2025 08:15:26 +0000 (10:15 +0200)
commit     e64be12f8401819662e608efa247638b61d023cd
tree       f9dbd4b491923a0ad1a08dff43424491a1406a8a
parent     8feefd106afb0592f8f574b31341dbeb034d9086

[ Upstream commit f844732e3ad9c4b78df7436232949b8d2096d1a6 ]

Issue:
In the scenario where svm_range_restore_pages is called but
svm->checkpoint_ts has not yet been set and the retry fault has not been
drained, svm_range_unmap_from_cpu may be triggered and call svm_range_free.
Meanwhile, svm_range_restore_pages continues executing and reaches
svm_range_from_addr. This results in a "failed to find prange..." error,
causing the page recovery to fail.

How to fix:
Move the svm->checkpoint_ts timestamp check under the protection of
svm->lock.
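The idea of the fix, doing the timestamp check and the range lookup inside one critical section, can be illustrated with a minimal user-space sketch. It is a simplified model under stated assumptions, not the kfd_svm.c code: struct svms_model, model_unmap_from_cpu and model_restore_pages are hypothetical names, a pthread mutex stands in for svm->lock, and it assumes the unmap path records the drain checkpoint and frees the range while holding that same lock. Timestamps are compared with a plain `<`, ignoring any counter wrap-around the real driver may have to handle.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of the SVM state (not the real structures). */
struct svms_model {
	pthread_mutex_t lock;   /* stands in for svm->lock */
	uint64_t checkpoint_ts; /* 0 means "no drain in progress" */
	bool range_present;     /* stands in for "the prange can be found" */
};

/* Assumed unmap path: under the lock, set the drain checkpoint and free the range. */
static void model_unmap_from_cpu(struct svms_model *s, uint64_t now_ts)
{
	pthread_mutex_lock(&s->lock);
	s->checkpoint_ts = now_ts;
	s->range_present = false;
	pthread_mutex_unlock(&s->lock);
}

/* Fault path with the checkpoint check moved under the lock. */
static int model_restore_pages(struct svms_model *s, uint64_t fault_ts)
{
	int r = 0;

	pthread_mutex_lock(&s->lock);
	if (s->checkpoint_ts != 0) {
		if (fault_ts < s->checkpoint_ts) {
			/* Stale retry fault raised before the unmap: drop it. */
			r = -EAGAIN;
			goto out_unlock;
		}
		/* Fault is newer than the checkpoint: the drain is over. */
		s->checkpoint_ts = 0;
	}
	if (!s->range_present) {
		/* Model of the "failed to find prange" case; error code is arbitrary. */
		r = -EFAULT;
		goto out_unlock;
	}
	/* ... restore pages for the range while the lock is still held ... */
out_unlock:
	pthread_mutex_unlock(&s->lock);
	return r;
}
```

Because the unmap cannot run between the check and the lookup, a fault that predates the unmap is always recognized as stale and dropped, instead of slipping past an unset checkpoint and then failing the lookup.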

v2:
Make sure all held locks are released before exiting.

v3:
Jump directly to out_unlock_svms and return -EAGAIN.
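The v2 and v3 notes describe the usual kernel pattern of exiting through unlock labels so that every path releases exactly the locks it took. A minimal user-space sketch follows; outer_lock, svms_lock, handle_fault and the locks_held counter are hypothetical illustrations, and the label out_unlock_svms is only borrowed from the note above, not copied from kfd_svm.c.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical locks; names are illustrative only. */
static pthread_mutex_t outer_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t svms_lock = PTHREAD_MUTEX_INITIALIZER;
static int locks_held; /* bookkeeping so callers can check nothing leaked */

static void take(pthread_mutex_t *m) { pthread_mutex_lock(m); locks_held++; }
static void drop(pthread_mutex_t *m) { locks_held--; pthread_mutex_unlock(m); }

/* Every return path unwinds the locks in reverse order of acquisition. */
static int handle_fault(bool no_range, bool stale)
{
	int r = 0;

	take(&outer_lock);
	if (no_range) {
		r = -EFAULT;
		goto out_unlock_outer; /* svms_lock not taken yet */
	}

	take(&svms_lock);
	if (stale) {
		r = -EAGAIN;
		goto out_unlock_svms; /* skip the work but still release both locks */
	}

	/* ... normal fault handling with both locks held ... */

out_unlock_svms:
	drop(&svms_lock);
out_unlock_outer:
	drop(&outer_lock);
	assert(locks_held == 0);
	return r;
}
```

Falling through the labels means each later exit point reuses the cleanup of the earlier ones, so adding a new early return only requires choosing the right label.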

v4:
Refine code.

Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
drivers/gpu/drm/amd/amdkfd/kfd_svm.c