git.ipfire.org Git - thirdparty/linux.git/commit

scsi: ufs: core: Handle PM commands timeout before SCSI EH

A PM START STOP sent from the UFS well-known LU resume path can race
with SCSI EH:

The "wl resume" task flow is:
  __ufshcd_wl_resume()
    ufshcd_set_dev_pwr_mode(UFS_ACTIVE_PWR_MODE)
      ufshcd_execute_start_stop()
        scsi_execute_cmd()
          blk_execute_rq           <-- wait
          scsi_check_passthrough() <-- may retry START STOP

If the first START STOP time out, SCSI EH may already recover the link and
reset the device before scsi_execute_cmd() returns:

  scsi_timeout()
    scsi_eh_scmd_add()
      scsi_error_handler()
        scsi_unjam_host()
          scsi_eh_ready_devs()
            scsi_eh_host_reset()
              ufshcd_eh_host_reset_handler()
                if (hba->pm_op_in_progress)
                  ufshcd_link_recovery()
                    ufshcd_device_reset()
                    ufshcd_host_reset_and_restore()
          ...
          scsi_eh_flush_done_q()   <-- wakeup "wl resume" task
        ...                        <-- host still in SHOST_RECOVERY
        scsi_restart_operations()

A later passthrough retry can then run while the host is still in
SHOST_RECOVERY and hit the SCMD_FAIL_IF_RECOVERING path:

  scsi_queue_rq()
    if (scsi_host_in_recovery(shost) &&
        cmd->flags & SCMD_FAIL_IF_RECOVERING)
      return BLK_STS_OFFLINE

That retry completes with DID_ERROR or DID_NO_CONNECT even though EH may
already have restored the device to an operational ACTIVE state.

Handle these PM timeouts directly from ufshcd_eh_timed_out() instead.
After ufshcd_link_recovery(), complete the timed-out command immediately
if it has not been completed already.

For regular SCSI commands, complete them with DID_REQUEUE to match the
existing MCQ force-completion semantics and allow scsi_execute_cmd() to
retry if needed. For reserved internal device-management commands,
finish the request with DID_TIME_OUT without calling
ufshcd_release_scsi_cmd() since those commands use different resource
lifetime rules.

The system_suspending flag is no longer needed because PM command
timeout handling now uses pm_op_in_progress.

Fixes: b8c3a7bac9b6 ("scsi: ufs: Have midlayer retry start stop errors")
Signed-off-by: Hongjie Fang <hongjiefang@asrmicro.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Peter Wang <peter.wang@mediatek.com>
Link: https://patch.msgid.link/20260605112034.3802540-1-hongjiefang@asrmicro.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

author	Hongjie Fang <hongjiefang@asrmicro.com>
	Fri, 5 Jun 2026 11:20:34 +0000 (19:20 +0800)
committer	Martin K. Petersen <martin.petersen@oracle.com>
	Mon, 8 Jun 2026 21:41:45 +0000 (17:41 -0400)
commit	01d5e237b33931b970dd190dd6a19c5ef32c105d
tree	5076c03bac11640ece0eb783451ccccb7f4a9a01	tree \| snapshot
parent	3c08f6034d7459f6d4d5c1599de7bb42c1d89521	commit \| diff

drivers/ufs/core/ufshcd.c		diff \| blob \| blame \| history
include/ufs/ufshcd.h		diff \| blob \| blame \| history