From fccb446f82b9155c05758d1fa30af4a06494e0ec Mon Sep 17 00:00:00 2001
From: Lijo Lazar
Date: Mon, 9 Dec 2024 09:14:53 +0530
Subject: [PATCH] drm/amdgpu: Avoid VF for RAS recovery source check

A VF device sets the RAS flag when mailbox data can't be read properly,
so there is no conclusive way to tell whether the real source is a RAS
error. Therefore, the VF schedules a KFD-based reset, which doesn't set
the RAS source. Skip checking the RAS source for any VF-scheduled
recovery.

Signed-off-by: Lijo Lazar
Reported-by: Vojislav Tomasevic
Reviewed-by: Yiqing Yao
Tested-by: Yiqing Yao
Fixes: e1ee2111ca48 ("drm/amdgpu: Prefer RAS recovery for scheduler hang")
Signed-off-by: Alex Deucher
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 144295da9e4cc..e22fc7a8101f0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5866,6 +5866,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 	 * detected at the same time, let RAS recovery take care of it.
 	 */
 	if (amdgpu_ras_is_err_state(adev, AMDGPU_RAS_BLOCK__ANY) &&
+	    !amdgpu_sriov_vf(adev) &&
 	    reset_context->src != AMDGPU_RESET_SRC_RAS) {
 		dev_dbg(adev->dev,
 			"Gpu recovery from source: %d yielding to RAS error recovery handling",
-- 
2.39.5
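The condition in amdgpu_device_gpu_recover() decides whether a recovery request yields to RAS error recovery. The block below models that condition as a pure predicate in user space, as a minimal sketch. The predicate combines three inputs: amdgpu_ras_is_err_state(), amdgpu_sriov_vf() and reset_context->src.

This is not driver code. The names `enum reset_src`, `RESET_SRC_JOB`, `RESET_SRC_RAS`, `struct recovery_state`, `yield_to_ras()` and `check_truth_table()` are all hypothetical. The reset sources are reduced to two values for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified reset sources (modeled loosely on AMDGPU_RESET_SRC_*). */
enum reset_src {
	RESET_SRC_JOB,
	RESET_SRC_RAS,
};

/* Hypothetical snapshot of the inputs the check consults. */
struct recovery_state {
	bool ras_err_state; /* stands in for amdgpu_ras_is_err_state(adev, AMDGPU_RAS_BLOCK__ANY) */
	bool is_sriov_vf;   /* stands in for amdgpu_sriov_vf(adev) */
	enum reset_src src; /* stands in for reset_context->src */
};

/*
 * Returns true when a recovery request should yield to RAS error
 * recovery. With the patch applied, a VF never yields, because its
 * RAS flag may only mean that mailbox data could not be read.
 */
static bool yield_to_ras(const struct recovery_state *s)
{
	return s->ras_err_state && !s->is_sriov_vf && s->src != RESET_SRC_RAS;
}

/* Exercises every combination of the three modeled inputs. */
static void check_truth_table(void)
{
	for (int ras = 0; ras <= 1; ras++) {
		for (int vf = 0; vf <= 1; vf++) {
			for (int src = RESET_SRC_JOB; src <= RESET_SRC_RAS; src++) {
				struct recovery_state s = {
					.ras_err_state = ras,
					.is_sriov_vf = vf,
					.src = (enum reset_src)src,
				};
				/* Only bare metal, a RAS error, and a non-RAS source yields. */
				bool expected = ras && !vf && src != RESET_SRC_RAS;
				assert(yield_to_ras(&s) == expected);
			}
		}
	}
}
```

A test harness can call check_truth_table() to walk all eight input combinations. The one row the patch changes is a VF with the RAS flag set and a non-RAS source, such as the KFD-based reset described above. Before the patch, that row yielded to RAS recovery. Now it proceeds with its own recovery.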