From: Yunxiang Li Date: Fri, 5 Jun 2026 12:59:34 +0000 (-0400) Subject: drm/amdgpu: skip already suspended IP blocks in ip_suspend_phase2 X-Git-Tag: v7.2-rc1~10^2~1^2~28 X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=b29aaf0a72222bfafe4d73899dadb0486b5bcb09;p=thirdparty%2Fkernel%2Flinux.git drm/amdgpu: skip already suspended IP blocks in ip_suspend_phase2 The GPU reload test (S3 / mode1 reset / module reload) triggers a WARN_ON in amdgpu_irq_put() on gfx10 when unloading amdgpu: WARNING: CPU: 0 PID: 2314 at amd/amdgpu/amdgpu_irq.c:676 amdgpu_irq_put+0xc3/0xe0 [amdgpu] Call Trace: gfx_v10_0_hw_fini+0x41/0x150 [amdgpu] amdgpu_ip_block_hw_fini+0x29/0xc0 [amdgpu] amdgpu_device_fini_hw+0x315/0x610 [amdgpu] amdgpu_driver_unload_kms+0x7c/0x90 [amdgpu] amdgpu_pci_remove+0x51/0x90 [amdgpu] amdgpu_device_ip_resume_phase2() skips IP blocks whose status.hw is already set, but amdgpu_device_ip_suspend_phase2() never had the matching guard, so a block can be suspended twice (e.g. a reset or recovery issued while the device is already suspended). The second suspend runs hw_fini again, which now releases the gfx fault IRQs unconditionally, dropping a refcount that is already zero and tripping the WARN_ON in amdgpu_irq_put(). The fault/EOP IRQ get/put were balanced through late_init/hw_fini before, which masked the double-suspend; moving the get into hw_init made the suspend/resume asymmetry visible as an IRQ refcount underflow. Honor status.hw in ip_suspend_phase2() so suspend mirrors resume and a block is only torn down once. Fixes: 9117d8be850b ("drm/amdgpu/gfx: move fault and EOP IRQ get/put to hw_init/hw_fini") Fixes: 482f0e538580 ("drm/amdgpu: fix double ucode load by PSP(v3)") Signed-off-by: Yunxiang Li Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher (cherry picked from commit f44f2af13c418969be358b15743f939d705de998) --- diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 942f0251c748f..0fa2ce36c2ea9 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -3043,7 +3043,7 @@ static int amdgpu_device_ip_suspend_phase2(struct amdgpu_device *adev) amdgpu_dpm_gfx_state_change(adev, sGpuChangeState_D3Entry); for (i = adev->num_ip_blocks - 1; i >= 0; i--) { - if (!adev->ip_blocks[i].status.valid) + if (!adev->ip_blocks[i].status.valid || !adev->ip_blocks[i].status.hw) continue; /* displays are handled in phase1 */ if (adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_DCE)