Xiang Chen [Fri, 2 Oct 2020 14:30:37 +0000 (22:30 +0800)]
scsi: hisi_sas: Filter out new PHY up events during suspend
Currently sas_resume_ha() is called while resuming the controller to wait
for all suspended PHYs to come up and all the libsas events to be
completed.
There is a scenario that causes a hung task. For direct attach with two
disks connected through two PHYs, disable phy0 before suspending the disk on
phy1 and the controller, then enable phy0 and resume the controller. The
task hangs as follows:
If an extra phy0 up event happens while the SAS controller is resuming, it
emits new libsas events (PORTE_BYTES_DMAED and DISCE_DISCOVER_DOMAIN).
Handling DISCE_DISCOVER_DOMAIN calls scsi_sysfs_add_sdev(), which calls
__pm_runtime_set_status() to resume the supplier (the host controller). In
the runtime PM core, if a device is in the resuming state, a later resume
request for that device waits synchronously for the previous one to
complete. At that point the controller is still resuming, because it waits
for all libsas events to complete, while the DISCE_DISCOVER_DOMAIN event is
blocked because the controller is resuming. The result is a deadlock.
To avoid the issue, filter out new PHY up events while the controller is
suspended.
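The following is a minimal userspace model of the filtering described above. `struct hba_model`, its `suspended` flag, and `phy_up_event()` are hypothetical stand-ins. The log does not show the driver's real fields or handler.

```c
#include <stdbool.h>

/* Hypothetical model of controller state; not the hisi_sas structure. */
struct hba_model {
	bool suspended;     /* true while the controller is suspended */
	int  events_queued; /* stands in for libsas events emitted */
};

/*
 * Handle a PHY-up interrupt. While the controller is suspended, drop the
 * event instead of emitting libsas events, so discovery cannot issue a
 * runtime-resume request that would wait on the in-progress resume.
 * Returns true if the event was forwarded.
 */
static bool phy_up_event(struct hba_model *hba)
{
	if (hba->suspended)
		return false;      /* filtered: avoids the resume deadlock */
	hba->events_queued++;  /* e.g. PORTE_BYTES_DMAED and friends */
	return true;
}
```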
Xiang Chen [Fri, 2 Oct 2020 14:30:36 +0000 (22:30 +0800)]
scsi: hisi_sas: Add device link between SCSI devices and hisi_hba
Runtime PM of SCSI devices is already supported in the SCSI layer, so each
SCSI device can be suspended and resumed separately. But if there is no link
between hisi_hba and the SCSI devices or SCSI targets, problems arise when
the controller is suspended while SCSI devices are still resuming. The
controller should be suspended only when all the SCSI devices under it are
suspended. Add a device link between the SCSI devices and the controller.
Xiang Chen [Fri, 2 Oct 2020 14:30:35 +0000 (22:30 +0800)]
scsi: hisi_sas: Add check for methods _PS0 and _PR0
To support system suspend/resume or runtime suspend/resume, the driver needs
to use pci_set_power_state() to change the power state. For v3 hw, this
requires the platform to provide at least the _PS0 or _PR0 method. So check
whether either method is supported, and print a warning if not.
A Kconfig dependency is added because there is no stub for
acpi_device_power_manageable().
Xiang Chen [Fri, 2 Oct 2020 14:30:33 +0000 (22:30 +0800)]
scsi: hisi_sas: Switch to new framework to support suspend and resume
For v3 hw we will add support for runtime PM, which is only supported by the
new framework. Legacy PM support and the new framework cannot be used
together, so switch to the new framework to support suspend and resume.
The reason is that if we use pci_alloc_irq_vectors_affinity() to allocate
IRQs, the number of CQ IRQs can only be less than or equal to the number of
online CPUs, but we use hisi_hba->queue_count (always 16) to iterate in
interrupt_disable_v3_hw().
Use hisi_hba->cq_nvecs instead of hisi_hba->queue_count to avoid calling
synchronize_irq() for CQ interrupts that were never allocated.
Dan Carpenter [Mon, 28 Sep 2020 09:13:00 +0000 (12:13 +0300)]
scsi: be2iscsi: Fix a theoretical leak in beiscsi_create_eqs()
The be_fill_queue() function can only fail when "eq_vaddress" is NULL, and
since it is non-NULL here, the call cannot fail. But if it could, we would
want to have stored "paddr" so that the DMA memory can be released.
Link: https://lore.kernel.org/r/20200928091300.GD377727@mwanda
Fixes: bfead3b2cb46 ("[SCSI] be2iscsi: Adding msix and mcc_rings V3")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
John Donnelly [Thu, 24 Sep 2020 00:19:20 +0000 (17:19 -0700)]
scsi: target: tcmu: Fix warning: 'page' may be used uninitialized
Corrects drivers/target/target_core_user.c:688:6: warning: 'page' may be
used uninitialized.
Link: https://lore.kernel.org/r/20200924001920.43594-1-john.p.donnelly@oracle.com
Fixes: 3c58f737231e ("scsi: target: tcmu: Optimize use of flush_dcache_page")
Cc: Mike Christie <michael.christie@oracle.com>
Acked-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: John Donnelly <john.p.donnelly@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Ye Bin [Wed, 30 Sep 2020 02:19:19 +0000 (10:19 +0800)]
scsi: fnic: Fix inconsistent format argument type in fnic_debugfs.c
Fix the following warnings:
[drivers/scsi/fnic/fnic_debugfs.c:123]: (warning) %u in format string (no. 1)
requires 'unsigned int' but the argument type is 'int'.
[drivers/scsi/fnic/fnic_debugfs.c:125]: (warning) %u in format string (no. 1)
requires 'unsigned int' but the argument type is 'int'.
[drivers/scsi/fnic/fnic_debugfs.c:127]: (warning) %u in format string (no. 1)
requires 'unsigned int' but the argument type is 'int'.
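One way to silence this class of warning is to make the conversion specifier match the argument's type. The sketch below assumes the affected value is an `int` and switches to `%d`. The log does not say whether the fix changed the format string or the variable's type, and `format_trace_enable()` is a hypothetical name.

```c
#include <stdio.h>

/*
 * Format an int-typed setting for a debugfs read. %u expects an unsigned
 * int, so passing an int is the mismatch reported above; %d matches the
 * argument type. 'trace_enable' is an illustrative name only.
 */
static int format_trace_enable(char *buf, size_t len, int trace_enable)
{
	return snprintf(buf, len, "%d\n", trace_enable);
}
```

With a mismatched `%u`, a value of -1 would typically print as 4294967295 on platforms with 32-bit int. With `%d` it prints `-1`.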
Hannes Reinecke [Fri, 15 May 2020 11:26:47 +0000 (13:26 +0200)]
scsi: fnic: Do not call 'scsi_done()' for unhandled commands
The fnic driver assigns an ioreq structure to each command and severs this
assignment once scsi_done() has been called and the command has been
completed.
When traversing commands to terminate outstanding I/O we should not call
scsi_done() on commands which do not have a corresponding ioreq structure;
these commands have either never entered the driver or have already been
completed.
[mkp: fixed unused label warning]
Link: https://lore.kernel.org/r/20200515112647.49260-1-hare@suse.de
Reported-by: kbuild test robot <lkp@intel.com>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Acked-by: Satish Kharat <satishkh@cisco.com>
Acked-by: Karan Tilak Kumar <kartilak@cisco.com>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
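The completion rule can be modelled in a few lines. This is a hypothetical userspace sketch, not fnic code. `struct cmd_model` stands in for a SCSI command, and a non-NULL `io_req` marks a command the driver still owns. Incrementing `done_calls` stands in for calling scsi_done().

```c
#include <stddef.h>

/* Hypothetical stand-ins for the driver's per-I/O state and a command. */
struct io_req_model { int tag; };

struct cmd_model {
	struct io_req_model *io_req; /* NULL: never entered or already done */
	int done_calls;              /* times "scsi_done()" was invoked */
};

/*
 * Terminate outstanding I/O: complete only commands that still own an
 * io_req, and sever the association so a later pass cannot complete the
 * same command twice. Returns the number of commands completed.
 */
static int terminate_outstanding(struct cmd_model *cmds, size_t n)
{
	int completed = 0;

	for (size_t i = 0; i < n; i++) {
		if (!cmds[i].io_req)
			continue;          /* unhandled: must not call scsi_done() */
		cmds[i].io_req = NULL; /* sever ioreq <-> command assignment */
		cmds[i].done_calls++;  /* stands in for scsi_done(cmd) */
		completed++;
	}
	return completed;
}
```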
Ye Bin [Wed, 30 Sep 2020 02:25:15 +0000 (10:25 +0800)]
scsi: qla2xxx: Fix inconsistent format argument type in qla_dbg.c
Fix the following warning:
[drivers/scsi/qla2xxx/qla_dbg.c:2451]: (warning) %ld in format string (no. 4)
requires 'long' but the argument type is 'unsigned long'.
Link: https://lore.kernel.org/r/20200930022515.2862532-4-yebin10@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Ye Bin <yebin10@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Ye Bin [Wed, 30 Sep 2020 02:25:14 +0000 (10:25 +0800)]
scsi: qla2xxx: Fix inconsistent format argument type in qla_os.c
Fix the following warnings:
[drivers/scsi/qla2xxx/qla_os.c:4882]: (warning) %ld in format string (no. 2)
requires 'long' but the argument type is 'unsigned long'.
[drivers/scsi/qla2xxx/qla_os.c:5011]: (warning) %ld in format string (no. 1)
requires 'long' but the argument type is 'unsigned long'.
Link: https://lore.kernel.org/r/20200930022515.2862532-3-yebin10@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Ye Bin <yebin10@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Ye Bin [Wed, 30 Sep 2020 02:25:13 +0000 (10:25 +0800)]
scsi: qla2xxx: Fix inconsistent format argument type in tcm_qla2xxx.c
Fix the following warnings:
[drivers/scsi/qla2xxx/tcm_qla2xxx.c:884]: (warning) %u in format string (no. 1)
requires 'unsigned int' but the argument type is 'signed int'.
[drivers/scsi/qla2xxx/tcm_qla2xxx.c:885]: (warning) %u in format string (no. 1)
requires 'unsigned int' but the argument type is 'signed int'.
[drivers/scsi/qla2xxx/tcm_qla2xxx.c:886]: (warning) %u in format string (no. 1)
requires 'unsigned int' but the argument type is 'signed int'.
[drivers/scsi/qla2xxx/tcm_qla2xxx.c:887]: (warning) %u in format string (no. 1)
requires 'unsigned int' but the argument type is 'signed int'.
[drivers/scsi/qla2xxx/tcm_qla2xxx.c:888]: (warning) %u in format string (no. 1)
requires 'unsigned int' but the argument type is 'signed int'.
Link: https://lore.kernel.org/r/20200930022515.2862532-2-yebin10@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Ye Bin <yebin10@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Mike Christie [Thu, 1 Oct 2020 15:35:54 +0000 (10:35 -0500)]
scsi: sd: Allow user to configure command retries
Some iSCSI targets went with the traditional "export N ports" approach and
then let the initiator multipath over them. Other targets went the opposite
direction: they export a single port, and software on the target side
performs load balancing and failover to other targets via an iSCSI-specific
feature or IP takeover.
The problem with the second type of config is that we quickly run out of
our five retries and get I/O errors. In these setups we want to reduce
resource use on the initiator side, so we want only one session and no
dm-multipath. To handle traditional multipath operations like failover, we
do IP takeover on the target side. So we would have an iSCSI target running
on node1. Some monitoring software decides it is dead or the node is
overloaded, so it starts the iSCSI target on node2. The problem is the
failover case, where we might have the equivalent of a dm-multipath
temporary all-paths-down condition, or we just have to try more than 5
nodes before finding a good one.
To handle this type of issue, allow the user to configure the disk command
retries from -1 to the current max of 5. -1 means infinite retries and
should be used for setups where some other setting controls when to fail.
For example, iSCSI has the replacement/recovery timeout, and FC (some users
have used FC with NPIV to do something similar to IP takeover) has
dev_loss_tmo/fast_io_fail, which will eventually expire and fail I/O.
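The following sketch shows the accepted range and what -1 means for the retry decision. `SD_MAX_RETRIES` (5) and `SCSI_CMD_RETRIES_NO_LIMIT` (-1) follow the values given in this entry and in the "Add limitless cmd retry support" entry. `set_max_retries()` and `may_retry()` are hypothetical helpers, not the sd sysfs handler.

```c
#include <errno.h>
#include <stdbool.h>

#define SCSI_CMD_RETRIES_NO_LIMIT (-1) /* -1: retry forever */
#define SD_MAX_RETRIES 5               /* "the current max of 5" */

/* Validate a user-supplied retry count: accept -1 through 5. */
static int set_max_retries(int *max_retries, int val)
{
	if (val < SCSI_CMD_RETRIES_NO_LIMIT || val > SD_MAX_RETRIES)
		return -EINVAL;
	*max_retries = val;
	return 0;
}

/* True if another retry is allowed; the -1 setting never runs out. */
static bool may_retry(int retries_done, int max_retries)
{
	return max_retries == SCSI_CMD_RETRIES_NO_LIMIT ||
	       retries_done < max_retries;
}
```

With -1, failure is left to the transport timers mentioned above (replacement/recovery timeout, dev_loss_tmo/fast_io_fail).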
Mike Christie [Thu, 1 Oct 2020 15:35:53 +0000 (10:35 -0500)]
scsi: core: Add limitless cmd retry support
Add infinite retry support to the SCSI midlayer by combining the common
retry checks into helper functions, and then checking for
-1/SCSI_CMD_RETRIES_NO_LIMIT.
Roman Bolshakov [Tue, 29 Sep 2020 12:59:57 +0000 (15:59 +0300)]
scsi: target: core: Add CONTROL field for trace events
trace-cmd report doesn't show events from the target subsystem because
scsi_command_size() leaks through the event format string:
[target:target_sequencer_start] function scsi_command_size not defined
[target:target_cmd_complete] function scsi_command_size not defined
Adding scsi_command_size() to plugin_scsi.c in trace-cmd doesn't help
because an expression is used inside TP_printk(). The trace-cmd event
parser doesn't understand a minus sign inside [ ]:
Error: expected ']' but read '-'
Rather than duplicating kernel code in plugin_scsi.c, provide a dedicated
field for the CONTROL byte.
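The value the new field carries can be computed once in the kernel, so trace-cmd never has to evaluate an expression. This sketch mirrors the kernel's CDB-length rule: the opcode group code is in bits 7..5, and the group-length table is the one the kernel uses. It also assumes that variable-length CDBs (opcode 0x7f) carry CONTROL in byte 1, as SPC defines. `cdb_control()` is an illustrative name, not necessarily the helper the patch added.

```c
/* CDB length per opcode group (opcode >> 5), mirroring the kernel's
 * scsi_command_size_tbl. */
static const unsigned char cdb_size_by_group[8] = { 6, 10, 10, 12, 16, 12, 10, 10 };

#define VARIABLE_LENGTH_CMD 0x7f

/* Return the CONTROL byte of a CDB. */
static unsigned char cdb_control(const unsigned char *cdb)
{
	if (cdb[0] == VARIABLE_LENGTH_CMD)
		return cdb[1]; /* variable-length CDBs: CONTROL is byte 1 */
	/* fixed-length CDBs: CONTROL is the last byte */
	return cdb[cdb_size_by_group[(cdb[0] >> 5) & 7] - 1];
}
```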
The driver was using a shorter timeout when waiting for a PLOGI from the
peer in point-to-point configurations. Some devices take some time (~4
seconds) to initiate the PLOGI. The peer initiates the PLOGI when it has
the higher P-WWN.
Increase the wait time based on the N2N R_A_TOV.
Link: https://lore.kernel.org/r/20200929102152.32278-7-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Normally, the MPI firmware is reset when an MPI dump is collected. If an
unsaved MPI dump already exists in the driver, though, an alternate
mechanism is used. This mechanism, which was not fully correct, is not
recommended; instead, an MPI dump template walk is suggested to perform the
MPI reset. To allow for the MPI dump template walk, extra space is reserved
in the MPI dump buffer, which gets used only when an MPI dump is already in
place.
The current code uses the wrong mailbox option to extract bbc from the
firmware. This field is nested inside the PLOGI payload. Extract bbc from
the PLOGI template payload.
Link: https://lore.kernel.org/r/20200929102152.32278-3-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
BIT_13 of the extended FW attributes indicates NVMe-2 support. Set BIT_15
of the special feature control block to enable SLER in FW, and set bit 8
(SLER supported) to 1 in the service parameter information when sending an
NVMe PRLI request. Set BIT_14 of the special feature control block to
enable PI Control in FW; the driver should set bit 9 (PI Control supported)
to 1 in the service parameter information when sending an NVMe PRLI
request. Set BIT_13 for NVMe Async events.
Link: https://lore.kernel.org/r/20200904045128.23631-13-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
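The bit assignments above can be written out as masks. The macro and function names are hypothetical. The sketch assumes that BIT_13 for async events belongs to the same special feature control block, and that all three features are enabled only when the firmware reports NVMe-2 support.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT(n) (1u << (n))

/* Hypothetical names for the bits described above. */
#define FW_ATTR_EXT_NVME2      BIT(13) /* extended FW attribute: NVMe-2 */
#define SFCB_SLER_ENABLE       BIT(15) /* special feature control block */
#define SFCB_PI_CTRL_ENABLE    BIT(14)
#define SFCB_NVME_ASYNC_EVENTS BIT(13) /* assumed to be in the same block */
#define PRLI_SVC_SLER          BIT(8)  /* NVMe PRLI service parameters */
#define PRLI_SVC_PI_CTRL       BIT(9)

/* Special-feature bits to program, assuming NVMe-2 support gates them. */
static uint16_t special_features(uint16_t fw_attr_ext)
{
	if (!(fw_attr_ext & FW_ATTR_EXT_NVME2))
		return 0;
	return SFCB_SLER_ENABLE | SFCB_PI_CTRL_ENABLE | SFCB_NVME_ASYNC_EVENTS;
}

/* Advertise SLER and PI Control in the PRLI service parameters. */
static uint32_t prli_service_params(uint32_t params, bool nvme2)
{
	if (nvme2)
		params |= PRLI_SVC_SLER | PRLI_SVC_PI_CTRL;
	return params;
}
```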
This patch tracks the number of IOCB resources used in the I/O fast path.
If the number of used IOCBs reaches a high-water limit, the driver returns
the I/O as busy and lets the upper layer retry. This prevents
oversubscription of IOCB resources, so that future error recovery commands
can still get through. Enable IOCB throttling by default.
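Here is a minimal model of high-water throttling. `struct iocb_pool`, `iocb_reserve()`, `iocb_release()`, and the limit are hypothetical, and the real driver's accounting differs.

```c
#include <errno.h>

/* Hypothetical IOCB accounting for the fast path. */
struct iocb_pool {
	unsigned int used;
	unsigned int high_water; /* fast-path limit; rest kept for recovery */
};

/*
 * Reserve IOCBs for a fast-path I/O. Past the high-water mark, return
 * -EBUSY so the upper layer retries, leaving headroom for error-recovery
 * commands.
 */
static int iocb_reserve(struct iocb_pool *p, unsigned int need)
{
	if (p->used + need > p->high_water)
		return -EBUSY;
	p->used += need;
	return 0;
}

/* Return IOCBs when the I/O completes. */
static void iocb_release(struct iocb_pool *p, unsigned int n)
{
	p->used -= n;
}
```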
scsi: qla2xxx: Fix I/O errors during LIP reset tests
In .fcp_io(), returning ENODEV as soon as remote port delete has started
can cause I/O errors. Fix this by returning EBUSY until the remote port
delete finishes.
Link: https://lore.kernel.org/r/20200904045128.23631-9-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
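The return-code rule can be sketched as a small state check. The enum and `fcp_io_check()` are hypothetical. They only show which error the I/O path reports at each stage of remote port deletion.

```c
#include <errno.h>

/* Hypothetical remote-port states for the sketch. */
enum rport_state { RPORT_ONLINE, RPORT_DELETE_PENDING, RPORT_DELETED };

/*
 * Admission check for an I/O: while the remote port delete is still in
 * progress, report -EBUSY so the I/O is retried rather than failed; only
 * once the delete has finished is -ENODEV returned.
 */
static int fcp_io_check(enum rport_state s)
{
	switch (s) {
	case RPORT_ONLINE:
		return 0;
	case RPORT_DELETE_PENDING:
		return -EBUSY;
	default:
		return -ENODEV;
	}
}
```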
scsi: qla2xxx: Honor status qualifier in FCP_RSP per spec
FCP-4 (the referenced revision is FCP-4 rev 2b) renames the field earlier
known as "retry delay timer" to "status qualifier", which is described in
SAM-5 and later specs. This fix makes the appropriate driver-side changes
to honor the new definition. The SAM document referenced was SAM-6 rev 5.
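The following sketch decodes the field under an assumed SAM layout for BUSY and TASK SET FULL. Bits 15..14 give the scope, and bits 13..0 give a retry delay in 100 ms units, where 0x0001 to 0x3fef means "wait this long". The masks and `decode_status_qualifier()` are illustrative, not the qla2xxx definitions.

```c
#include <stdint.h>

/* Assumed layout of the 16-bit STATUS QUALIFIER (BUSY / TASK SET FULL). */
#define SQ_SCOPE_MASK  0xc000u
#define SQ_SCOPE_SHIFT 14
#define SQ_QUAL_MASK   0x3fffu

struct retry_hint {
	unsigned int scope;    /* 0..3, meaning defined by SAM */
	unsigned int delay_ms; /* 0 when no retry delay is requested */
};

static struct retry_hint decode_status_qualifier(uint16_t sq)
{
	struct retry_hint h;
	unsigned int qual = sq & SQ_QUAL_MASK;

	h.scope = (sq & SQ_SCOPE_MASK) >> SQ_SCOPE_SHIFT;
	/* 0x0001..0x3fef: delay in 100 ms units; other values: no delay */
	h.delay_ms = (qual >= 0x0001 && qual <= 0x3fef) ? qual * 100u : 0;
	return h;
}
```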
scsi: qla2xxx: Fix I/O failures during remote port toggle testing
The driver was using a lower value for dev_loss_tmo, making it more prone
to I/O failures during remote port toggle testing. Set dev_loss_tmo to zero
during remote port registration so that the nvme-fc default dev_loss_tmo is
used, which is higher than what the driver was using.
Brian King [Wed, 16 Sep 2020 20:09:59 +0000 (15:09 -0500)]
scsi: ibmvfc: Protect vhost->task_set increment by the host lock
In the discovery thread, ibmvfc does a vhost->task_set++ without any lock
held. This could result in two targets getting the same cancel key, which
could have strange effects in error recovery. The actual probability of
this occurring should be extremely small, since this should all be done in
a single threaded loop from the discovery thread, but let's fix it up
anyway to be safe.
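The fix amounts to making the read-and-increment a single critical section. This userspace analogue uses a pthread mutex where the driver takes the SCSI host lock. `struct vhost_model`, `next_cancel_key()`, and `grab_keys()` are hypothetical.

```c
#include <pthread.h>

/* Hypothetical stand-in for vhost: a counter handed out as a cancel key. */
struct vhost_model {
	pthread_mutex_t lock; /* plays the role of the SCSI host lock */
	unsigned long task_set;
};

/* Take the next cancel key; the lock makes read-increment-return atomic. */
static unsigned long next_cancel_key(struct vhost_model *v)
{
	unsigned long key;

	pthread_mutex_lock(&v->lock);
	key = v->task_set++;
	pthread_mutex_unlock(&v->lock);
	return key;
}

/* Worker suitable for pthread_create(): takes 100000 keys. */
static void *grab_keys(void *arg)
{
	for (int i = 0; i < 100000; i++)
		next_cancel_key(arg);
	return NULL;
}
```

If two threads each run `grab_keys()` concurrently, `task_set` ends at 200000 and each key is handed out once. Without the lock, the increments can interleave, and two callers can receive the same key.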
Bodo Stroesser [Thu, 10 Sep 2020 15:50:41 +0000 (17:50 +0200)]
scsi: target: tcmu: Optimize scatter_data_area()
scatter_data_area() has two purposes:
1) Create the iovs for the data area buffer of a SCSI cmd.
2) If there is data in DMA_TO_DEVICE direction, copy
the data from sg_list to data area buffer.
Both are done in a common loop.
In the case of DMA_FROM_DEVICE data transfer, scatter_data_area() is called
with parameter copy_data = false. But this flag only skips the memcpy() of
the data, while radix_tree_lookup() is still called for every dbi of the
data area buffer, and kmap and kunmap are called for every page of sg_list
and the data area, as well as flush_dcache_page() for the data area pages.
Since the only thing to do with copy_data = false is to set up the iovs,
this is noticeable overhead. Rework the iov creation in the main loop of
scatter_data_area() by providing the new function new_block_to_iov(). Based
on this, create the short new function tcmu_setup_iovs() that only writes
the iovs, with no overhead. This new function is now called instead of
scatter_data_area() for bidi buffers and for data buffers in those cases
where memcpy() would have been skipped.
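Here is a simplified sketch of iov setup that never touches the data. It walks the block indices and merges consecutive blocks into one iovec. The 4096-byte block size, the use of `iov_base` as an offset into the data area, and `setup_iovs()` itself are assumptions for illustration, not the tcmu implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

#define DATA_BLOCK_SIZE 4096u /* assumed data-area block size */

/*
 * Build iovecs for 'len' bytes stored in the data-area blocks dbi[].
 * Consecutive block indices are merged into one iovec. iov_base holds an
 * offset into the data area (assumption), not a process pointer. No data
 * is read or copied. Returns the number of iovecs written.
 */
static size_t setup_iovs(const unsigned int *dbi, size_t nblocks, size_t len,
			 struct iovec *iov)
{
	size_t n = 0;

	for (size_t i = 0; i < nblocks && len > 0; i++) {
		size_t chunk = len < DATA_BLOCK_SIZE ? len : DATA_BLOCK_SIZE;
		uintptr_t off = (uintptr_t)dbi[i] * DATA_BLOCK_SIZE;

		if (i > 0 && dbi[i] == dbi[i - 1] + 1) {
			iov[n - 1].iov_len += chunk; /* contiguous: extend */
		} else {
			iov[n].iov_base = (void *)off; /* start a new iovec */
			iov[n].iov_len = chunk;
			n++;
		}
		len -= chunk;
	}
	return n;
}
```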
Bodo Stroesser [Thu, 10 Sep 2020 15:50:40 +0000 (17:50 +0200)]
scsi: target: tcmu: Optimize queue_cmd_ring()
queue_cmd_ring() needs to check whether there is enough space in cmd ring
and data area for the cmd to queue.
Currently the sequence is:
1) Calculate size the cmd will occupy on the ring based on estimation of
needed iovs.
2) Check whether there is enough space on the ring based on size from 1)
3) Allocate buffers in data area.
4) Calculate number of iovs the command really needs while copying
incoming data (if any) to data area.
5) Re-calculate real size of cmd on ring based on real number of iovs.
6) Set up possible padding and cmd on the ring.
Step 1) must not underestimate the cmd size, so it uses the max possible
number of iovs for the given I/O data size. The resulting overestimation
can be very high, so this sequence is not ideal. The real number of iovs can
be calculated at the earliest after data buffer allocation. Therefore rework
the code to implement the following sequence:
A) Allocate buffers on data area and calculate number of necessary iovs
during this.
B) Calculate real size of cmd on ring based on number of iovs.
C) Check whether there is enough space on the ring.
D) Set up possible padding and cmd on the ring.
The new sequence requires splitting the new function tcmu_alloc_data_space()
out of is_ring_space_avail(). Using this function, change queue_cmd_ring()
according to the new sequence.
Change the routines called by tcmu_alloc_data_space() to allow calculating
and returning the iov count. Remove the counting of iovs in
scatter_data_area().
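Steps A to C can be sketched as follows: count the iovs from the allocated block list, size the ring entry from that count, then check free space. Padding for ring wrap-around (step D) is left out. The `hdr` and `iov_size` parameters and both functions are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Step A: one iovec per run of consecutive block indices. */
static size_t count_iovs(const unsigned int *dbi, size_t nblocks)
{
	size_t n = 0;

	for (size_t i = 0; i < nblocks; i++)
		if (i == 0 || dbi[i] != dbi[i - 1] + 1)
			n++;
	return n;
}

/* Steps B and C: size the entry from the real iov count, then check the
 * ring. 'hdr' and 'iov_size' stand in for the entry header and per-iovec
 * sizes. */
static bool fits_on_ring(size_t ring_free, size_t hdr, size_t iov_size,
			 const unsigned int *dbi, size_t nblocks)
{
	size_t cmd_size = hdr + count_iovs(dbi, nblocks) * iov_size;

	return cmd_size <= ring_free;
}
```

For four contiguous blocks, the worst-case estimate from the old step 1) would reserve four iovecs, while the real count is one.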
Bodo Stroesser [Thu, 10 Sep 2020 15:50:39 +0000 (17:50 +0200)]
scsi: target: tcmu: Join tcmu_cmd_get_data_length() and tcmu_cmd_get_block_cnt()
Simplify code by joining tcmu_cmd_get_data_length() and
tcmu_cmd_get_block_cnt() into tcmu_cmd_set_block_cnts(). The new function
sets tcmu_cmd->dbi_cnt and also the new field tcmu_cmd->dbi_bidi_cnt which
is needed for further enhancements in following patches. Simplify some
code by using tcmu_cmd->dbi(_bidi)_cnt instead of calculation from length.
Please note: The calculation of the number of dbis needed for bidi was
wrong. It was based on the length of the first bidi sg only. I changed it
to correctly sum up the entire length of all bidi sgs.
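Here is the corrected count as a sketch: sum every bidi scatterlist entry, then round up to whole data blocks. `bidi_block_cnt()` and its plain-array input are hypothetical simplifications of the scatterlist walk.

```c
#include <stddef.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Blocks needed for the bidi buffer: the total length of ALL entries,
 * rounded up to whole blocks (the old code used only sg_len[0]). */
static size_t bidi_block_cnt(const size_t *sg_len, size_t nents,
			     size_t block_size)
{
	size_t total = 0;

	for (size_t i = 0; i < nents; i++)
		total += sg_len[i];
	return DIV_ROUND_UP(total, block_size);
}
```

For entries of 4096, 4096 and 100 bytes with 4096-byte blocks, this yields 3 blocks. Using only the first entry would have yielded 1.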
Ming Lei [Thu, 10 Sep 2020 07:50:56 +0000 (15:50 +0800)]
scsi: core: Only re-run queue in scsi_end_request() if device queue is busy
The request queue is currently run unconditionally in scsi_end_request()
if both the target queue and the host queue are ready.
Recently Long Li reported that the cost of a queue run can be very high
with a high queue depth. Improve this situation by only running the request
queue when this LUN is busy.
Link: https://lore.kernel.org/r/20200910075056.36509-1-ming.lei@redhat.com
Reported-by: Long Li <longli@microsoft.com>
Tested-by: Long Li <longli@microsoft.com>
Tested-by: Kashyap Desai <kashyap.desai@broadcom.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Matej Genci [Fri, 28 Aug 2020 12:21:35 +0000 (12:21 +0000)]
scsi: virtio_scsi: Rescan the entire target on transport reset when LUN is 0
VirtIO 1.0 spec says:
The removed and rescan events ... when sent for LUN 0, they MAY
apply to the entire target so the driver can ask the initiator
to rescan the target to detect this.
This change introduces the behaviour described above by scanning the entire
SCSI target when LUN is set to 0. This is both a functional and a
performance fix. It aligns the driver with the spec and allows control
planes to hotplug targets with large numbers of LUNs without having to
request a RESCAN for each one of them.
Jason Yan [Tue, 15 Sep 2020 08:40:18 +0000 (16:40 +0800)]
scsi: myrb: Make some symbols static
This addresses the following sparse warnings:
drivers/scsi/myrb.c:2229:27: warning: symbol 'myrb_template' was not
declared. Should it be static?
drivers/scsi/myrb.c:2318:31: warning: symbol 'myrb_raid_functions' was
not declared. Should it be static?
drivers/scsi/myrb.c:2492:6: warning: symbol 'myrb_err_status' was not
declared. Should it be static?
Jason Yan [Tue, 15 Sep 2020 08:40:08 +0000 (16:40 +0800)]
scsi: myrs: Make some symbols static
This addresses the following sparse warnings:
drivers/scsi/myrs.c:1532:5: warning: symbol 'myrs_host_reset' was not
declared. Should it be static?
drivers/scsi/myrs.c:1922:27: warning: symbol 'myrs_template' was not
declared. Should it be static?
drivers/scsi/myrs.c:2036:31: warning: symbol 'myrs_raid_functions' was
not declared. Should it be static?
drivers/scsi/myrs.c:2046:6: warning: symbol 'myrs_flush_cache' was not
declared. Should it be static?
Jason Yan [Sat, 12 Sep 2020 03:37:58 +0000 (11:37 +0800)]
scsi: bnx2fc: Make a bunch of symbols static in bnx2fc_fcoe.c
This eliminates the following sparse warnings:
drivers/scsi/bnx2fc/bnx2fc_fcoe.c:53:1: warning: symbol
'bnx2fc_global_lock' was not declared. Should it be static?
drivers/scsi/bnx2fc/bnx2fc_fcoe.c:111:6: warning: symbol
'bnx2fc_devloss_tmo' was not declared. Should it be static?
drivers/scsi/bnx2fc/bnx2fc_fcoe.c:116:6: warning: symbol
'bnx2fc_max_luns' was not declared. Should it be static?
drivers/scsi/bnx2fc/bnx2fc_fcoe.c:121:6: warning: symbol
'bnx2fc_queue_depth' was not declared. Should it be static?
drivers/scsi/bnx2fc/bnx2fc_fcoe.c:126:6: warning: symbol
'bnx2fc_log_fka' was not declared. Should it be static?
Jason Yan [Sat, 12 Sep 2020 03:37:49 +0000 (11:37 +0800)]
scsi: aacraid: Make some symbols static in aachba.c
This eliminates the following sparse warnings:
drivers/scsi/aacraid/aachba.c:245:5: warning: symbol 'aac_convert_sgl'
was not declared. Should it be static?
drivers/scsi/aacraid/aachba.c:293:5: warning: symbol 'acbsize' was not
declared. Should it be static?
drivers/scsi/aacraid/aachba.c:324:5: warning: symbol 'aac_wwn' was not
declared. Should it be static?
Ye Bin [Wed, 2 Sep 2020 06:16:46 +0000 (14:16 +0800)]
scsi: sym53c8xx_2: Delete unnecessary else-if in sym_xerr_cam_status()
If (x_status & XE_PARITY_ERR) is true, we set cam_status = DID_PARITY;
otherwise cam_status always ends up being DID_ERROR. Delete the superfluous
else-if statements.
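Here is the simplified mapping as a sketch. `DID_PARITY` and `DID_ERROR` use the kernel's host-byte values. The bit value of `XE_PARITY_ERR` is an assumption, and the sketch also assumes cam_status is left unchanged when no extended error is set.

```c
#define DID_PARITY 0x06         /* kernel host byte: parity error */
#define DID_ERROR  0x07         /* kernel host byte: internal error */
#define XE_PARITY_ERR (1 << 2)  /* assumed bit value, for illustration */

/* Map extended-error status to a host byte without an else-if chain. */
static int xerr_cam_status(int cam_status, int x_status)
{
	if (x_status) {
		if (x_status & XE_PARITY_ERR)
			cam_status = DID_PARITY;
		else
			cam_status = DID_ERROR; /* every other extended error */
	}
	return cam_status;
}
```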
Brian King [Fri, 11 Sep 2020 21:28:26 +0000 (16:28 -0500)]
scsi: ibmvfc: Avoid link down on FS9100 canister reboot
When a canister on an FS9100, or similar storage, running in NPIV mode, is
rebooted, its WWPNs fail over to another canister. When this occurs, we see
a WWPN go away from the fabric at one N-Port ID and, a short time later,
the same WWPN appear at a different N-Port ID. When the canister is fully
operational again, the WWPNs fail back to the original canister. If there
is any I/O outstanding to the target when this occurs, the implicit logout
that the ibmvfc driver issues before removing the rport fails. When the
WWPN then shows up at a different N-Port ID and we issue a PLOGI to it, the
VIOS sees that it still has a login for this WWPN at the old N-Port ID,
which results in the VIOS simulating a link down / link up sequence to the
client in order to get the VIOS and client LPAR in sync.
This patch improves the way we handle this scenario so as to avoid the
link bounce, which affects all targets under the virtual host adapter. The
change is to use the Move Login MAD, which works even when I/O is
outstanding to the target. The change only alters the target state machine
for the case where the implicit logout fails prior to deleting the rport.
If this implicit logout fails, we call fc_remote_port_delete() but defer
deleting the ibmvfc_target object. This enables us to later retry the
implicit logout after terminate_rport_io occurs, or to issue the Move Login
request if the WWPN shows up at a new N-Port ID before that happens.
This has been tested by IBM's storage interoperability team on a FS9100,
forcing the failover to occur. With debug tracing enabled in the ibmvfc
driver, we confirmed the move login was sent in this scenario and confirmed
the link bounce no longer occurred.
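Here is a heavily simplified model of the two decisions the patch adds. The enum, `struct tgt_model`, and both handlers are hypothetical. The retry of the implicit logout after terminate_rport_io is not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified target actions. */
enum tgt_action { TGT_NONE, TGT_DELETE, TGT_DEFER_DELETE, TGT_PLOGI, TGT_MOVE_LOGIN };

struct tgt_model {
	uint64_t wwpn;
	uint32_t scsi_id;       /* current N-Port ID */
	bool     logout_failed; /* implicit logout failed; object kept */
};

/* Implicit logout before rport deletion has completed. */
static enum tgt_action on_implicit_logout(struct tgt_model *t, bool ok)
{
	t->logout_failed = !ok;
	/* on failure the rport is still deleted, but the target object stays */
	return ok ? TGT_DELETE : TGT_DEFER_DELETE;
}

/* Discovery found 'wwpn' at N-Port ID 'new_id'. */
static enum tgt_action on_wwpn_found(struct tgt_model *t, uint64_t wwpn,
				     uint32_t new_id)
{
	if (t->wwpn != wwpn)
		return TGT_NONE;
	if (t->logout_failed && t->scsi_id != new_id) {
		t->scsi_id = new_id;
		t->logout_failed = false;
		return TGT_MOVE_LOGIN; /* move the old login: no link bounce */
	}
	return TGT_PLOGI;
}
```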
Damien Le Moal [Thu, 10 Sep 2020 07:48:42 +0000 (16:48 +0900)]
scsi: core: Update additional sense codes list
Add missing Additional Sense Codes listed in
http://www.t10.org/lists/asc-num.txt.
Link: https://lore.kernel.org/r/20200910074843.217661-3-damien.lemoal@wdc.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Damien Le Moal [Thu, 10 Sep 2020 07:48:41 +0000 (16:48 +0900)]
scsi: core: Clean up scsi_noretry_cmd()
No need for else after return.
Link: https://lore.kernel.org/r/20200910074843.217661-2-damien.lemoal@wdc.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Daejun Park [Wed, 2 Sep 2020 02:58:52 +0000 (11:58 +0900)]
scsi: ufs: Fix NOP OUT timeout value
Boot occasionally fails with some Samsung low-power UFS devices because
these devices have slightly higher latency for NOP OUT responses. The NOP
OUT command is issued during initialization to check whether the device
transport protocol is ready, so a late response makes boot fail. Increase
NOP_OUT_TIMEOUT from 30 ms to 50 ms.
Tomas Henzl [Thu, 10 Sep 2020 14:21:26 +0000 (16:21 +0200)]
scsi: mpt3sas: Fix sync irqs
_base_process_reply_queue() called from _base_interrupt() may schedule a
new irq poll. Fix this by calling synchronize_irq() first.
Also ensure that enable_irq() is called only when necessary to avoid
"Unbalanced enable for IRQ..." errors.
Link: https://lore.kernel.org/r/20200910142126.8147-1-thenzl@redhat.com
Fixes: 320e77acb327 ("scsi: mpt3sas: Irq poll to avoid CPU hard lockups")
Acked-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
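This userspace model covers only the balancing rule: an enable is issued only for a queue that was actually switched to IRQ polling. The synchronize_irq() ordering is not modelled, and all names are hypothetical. The depth counter imitates the tracking behind the "Unbalanced enable for IRQ" warning.

```c
#include <stdbool.h>

/* Hypothetical per-reply-queue state. */
struct reply_q_model {
	bool irq_poll_scheduled; /* IRQ was disabled and polling scheduled */
	int  disable_depth;      /* > 0 means the IRQ is disabled */
};

static void irq_disable(struct reply_q_model *q)
{
	q->disable_depth++;
}

/* Returns false where the kernel would warn about an unbalanced enable. */
static bool irq_enable(struct reply_q_model *q)
{
	if (q->disable_depth == 0)
		return false;
	q->disable_depth--;
	return true;
}

/* Re-enable the IRQ only for queues that were switched to polling. */
static void finish_poll(struct reply_q_model *q)
{
	if (q->irq_poll_scheduled) {
		q->irq_poll_scheduled = false;
		irq_enable(q);
	}
}
```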
Sreekanth Reddy [Fri, 14 Aug 2020 13:04:26 +0000 (13:04 +0000)]
scsi: mpt3sas: Detect tampered Aero and Sea adapters
The driver logs an error message when a tampered-type controller is
detected. The intent is to avoid interacting with any firmware that is not
secured/signed by Broadcom. Any tampering with a firmware component is
detected by hardware and communicated to the driver so that it avoids any
further interaction with that component.
Jason Yan [Tue, 15 Sep 2020 08:39:48 +0000 (16:39 +0800)]
scsi: megaraid: Make smp_affinity_enable static
This addresses the following sparse warning:
drivers/scsi/megaraid/megaraid_sas_base.c:80:5: warning: symbol
'smp_affinity_enable' was not declared. Should it be static?
Link: https://lore.kernel.org/r/20200915083948.2826598-1-yanaijie@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
scsi: zfcp: Clarify access to erp_action in zfcp_fsf_req_complete()
While reviewing commit 936e6b85da04 ("scsi: zfcp: Fix panic on ERP timeout
for previously dismissed ERP action"), I stumbled over
zfcp_fsf_req_complete() and wondered whether it has similar issues with
respect to concurrent modification of req->erp_action by
zfcp_erp_strategy_check_fsfreq().
But a closer look shows that both of its callers [zfcp_fsf_reqid_check(),
zfcp_fsf_req_dismiss_all()] remove the request from the adapter's req_list
under the req_list's lock. Hence we can trust that if
zfcp_erp_strategy_check_fsfreq() concurrently looks up the corresponding
req_id, it won't find this request and thus cannot modify it while it is
being processed by zfcp_fsf_req_complete().
Add a code comment that hopefully makes this easier for future readers, and
condense the two accesses to ->erp_action that made me trip over this code
path in the first place.
Jason Yan [Fri, 11 Sep 2020 09:10:21 +0000 (17:10 +0800)]
scsi: qla2xxx: Remove unneeded variable 'rval'
This addresses the following coccinelle warning:
drivers/scsi/qla2xxx/qla_init.c:7112:5-9: Unneeded variable: "rval".
Return "QLA_SUCCESS" on line 7115
Link: https://lore.kernel.org/r/20200911091021.2937708-1-yanaijie@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Adrian Hunter [Thu, 27 Aug 2020 07:20:30 +0000 (10:20 +0300)]
scsi: ufs-pci: Add LTR support for Intel controllers
Intel host controllers support the setting of latency tolerance.
Accordingly, implement the PM QoS ->set_latency_tolerance() callback. The
raw register values are also exposed via debugfs.
Link: https://lore.kernel.org/r/20200827072030.24655-1-adrian.hunter@intel.com
Fixes: 8c09d7527697 ("scsi: ufshdc-pci: Add Intel PCI IDs for EHL")
Fixes: 1ab27c9cf8b6 ("ufs: Add support for clock gating")
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Acked-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Ye Bin [Wed, 9 Sep 2020 08:27:16 +0000 (16:27 +0800)]
scsi: lpfc: Remove set but not used 'qp'
This addresses the following gcc warning with "make W=1":
drivers/scsi/lpfc/lpfc_debugfs.c: In function ‘lpfc_debugfs_hdwqstat_data’:
drivers/scsi/lpfc/lpfc_debugfs.c:1699:30: warning: variable ‘qp’ set but
not used [-Wunused-but-set-variable]
struct lpfc_sli4_hdw_queue *qp;
^
Link: https://lore.kernel.org/r/20200909082716.37787-1-yebin10@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Ye Bin <yebin10@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Ye Bin [Wed, 9 Sep 2020 08:26:26 +0000 (16:26 +0800)]
scsi: gdth: Remove set but not used 'cmd_index'
This addresses the following gcc warning with "make W=1":
drivers/scsi/gdth.c: In function ‘gdth_async_event’:
drivers/scsi/gdth.c:3010:9: warning: variable ‘cmd_index’ set but not
used [-Wunused-but-set-variable]
int cmd_index;
Ye Bin [Wed, 9 Sep 2020 08:26:27 +0000 (16:26 +0800)]
scsi: pmcraid: Remove set but not used 'res'
This addresses the following gcc warning with "make W=1":
drivers/scsi/pmcraid.c: In function ‘pmcraid_abort_cmd’:
drivers/scsi/pmcraid.c:2863:33: warning: variable ‘res’ set but not
used [-Wunused-but-set-variable]
struct pmcraid_resource_entry *res;
^
Jason Yan [Mon, 7 Sep 2020 07:45:18 +0000 (15:45 +0800)]
scsi: qla1280: Remove set but not used variable in qla1280_status_entry()
This addresses the following gcc warning with "make W=1":
drivers/scsi/qla1280.c: In function ‘qla1280_status_entry’:
drivers/scsi/qla1280.c:3607:28: warning: variable ‘lun’ set but not used
[-Wunused-but-set-variable]
3607 | unsigned int bus, target, lun;
| ^~~
drivers/scsi/qla1280.c:3607:20: warning: variable ‘target’ set but not
used [-Wunused-but-set-variable]
3607 | unsigned int bus, target, lun;
| ^~~~~~
drivers/scsi/qla1280.c:3607:15: warning: variable ‘bus’ set but not used
[-Wunused-but-set-variable]
3607 | unsigned int bus, target, lun;
| ^~~
Jason Yan [Mon, 7 Sep 2020 07:45:17 +0000 (15:45 +0800)]
scsi: qla1280: Remove set but not used variable in qla1280_mailbox_command()
This addresses the following gcc warning with "make W=1":
drivers/scsi/qla1280.c: In function ‘qla1280_mailbox_command’:
drivers/scsi/qla1280.c:2430:11: warning: variable ‘data’ set but not
used [-Wunused-but-set-variable]
2430 | uint16_t data;
| ^~~~
Jason Yan [Mon, 7 Sep 2020 07:45:16 +0000 (15:45 +0800)]
scsi: qla1280: Remove set but not used variable in qla1280_nvram_config()
This addresses the following gcc warning with "make W=1":
drivers/scsi/qla1280.c: In function ‘qla1280_nvram_config’:
drivers/scsi/qla1280.c:2188:36: warning: variable ‘ddma_conf’ set but
not used [-Wunused-but-set-variable]
2188 | uint16_t hwrev, cfg1, cdma_conf, ddma_conf;
| ^~~~~~~~~