From: Victor Skvortsov
Date: Sun, 19 May 2024 14:39:43 +0000 (-0400)
Subject: drm/amdgpu: Queue KFD reset workitem in VF FED
X-Git-Tag: v6.11-rc1~141^2~25^2~133
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=5434bc03f52de2ec57d6ce684b1853928f508cbc;p=thirdparty%2Fkernel%2Flinux.git

drm/amdgpu: Queue KFD reset workitem in VF FED

The guest recovery sequence is buggy in Fatal Error when both
FLR & KFD reset workitems are queued at the same time. In addition,
FLR guest recovery sequence is out of order when PF/VF communication
breaks due to a GPU fatal error

As a temporary work around, perform a KFD style reset (Initiate reset
request from the guest) inside the pf2vf thread on FED.

Signed-off-by: Victor Skvortsov
Reviewed-by: Zhigang Luo
Signed-off-by: Alex Deucher
---

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index d98d619fba975..3d5f58e76f2de 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -602,7 +602,7 @@ static void amdgpu_virt_update_vf2pf_work_item(struct work_struct *work)
 	    amdgpu_sriov_runtime(adev)) {
 		amdgpu_ras_set_fed(adev, true);
 		if (amdgpu_reset_domain_schedule(adev->reset_domain,
-						  &adev->virt.flr_work))
+						  &adev->kfd.reset_work))
 			return;
 		else
 			dev_err(adev->dev, "Failed to queue work! at %s", __func__);