drm/xe/vf: Workaround for race condition in GuC firmware during VF pause
Author:     Matthew Brost <matthew.brost@intel.com>
AuthorDate: Wed, 8 Oct 2025 21:45:27 +0000 (14:45 -0700)
Commit:     Matthew Brost <matthew.brost@intel.com>
CommitDate: Thu, 9 Oct 2025 10:22:57 +0000 (03:22 -0700)
A race condition in the GuC firmware allows an H2G request from a paused
VF to be processed and then rejected. The rejection is delivered to the
KMD as a FAST_REQ failure; the KMD then terminates the CT via the dead
worker and triggers a GT reset, which is an undesirable outcome.

This workaround mitigates the issue: when a FAST_REQ failure arrives
while VF post-migration recovery is pending, the failure is still
reported, but the CT is no longer declared dead and no error is returned.
The GuC firmware will fix this bug in an upcoming release. Once that
version is available and VF migration requires it, this workaround can
be safely removed.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Tomasz Lis <tomasz.lis@intel.com>
Link: https://lore.kernel.org/r/20251008214532.3442967-30-matthew.brost@intel.com
drivers/gpu/drm/xe/xe_guc_ct.c

index 3472e4ea2609b6553c753c5b7a6038c67d64141b..3ae1e8db143a41cc60a127f9037a4ed137f0d654 100644
@@ -1398,6 +1398,10 @@ static int parse_g2h_response(struct xe_guc_ct *ct, u32 *msg, u32 len)
 
                fast_req_report(ct, fence);
 
+               /* FIXME: W/A race in the GuC, will get in firmware soon */
+               if (xe_gt_recovery_pending(gt))
+                       return 0;
+
                CT_DEAD(ct, NULL, PARSE_G2H_RESPONSE);
 
                return -EPROTO;