3.18-stable patches

author Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Thu, 10 Jan 2019 14:15:55 +0000 (15:15 +0100)

committer Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Thu, 10 Jan 2019 14:15:55 +0000 (15:15 +0100)
author Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thu, 10 Jan 2019 14:15:55 +0000 (15:15 +0100)
committer Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thu, 10 Jan 2019 14:15:55 +0000 (15:15 +0100)
diff --git a/queue-3.18/scsi-zfcp-fix-posting-too-many-status-read-buffers-leading-to-adapter-shutdown.patch b/queue-3.18/scsi-zfcp-fix-posting-too-many-status-read-buffers-leading-to-adapter-shutdown.patch

new file mode 100644 (file)

index 0000000..3b27f1f
--- /dev/null
+++ b/queue-3.18/scsi-zfcp-fix-posting-too-many-status-read-buffers-leading-to-adapter-shutdown.patch
@@ -0,0 +1,92 @@
+From 60a161b7e5b2a252ff0d4c622266a7d8da1120ce Mon Sep 17 00:00:00 2001
+From: Steffen Maier <maier@linux.ibm.com>
+Date: Thu, 6 Dec 2018 17:31:20 +0100
+Subject: scsi: zfcp: fix posting too many status read buffers leading to adapter shutdown
+
+From: Steffen Maier <maier@linux.ibm.com>
+
+commit 60a161b7e5b2a252ff0d4c622266a7d8da1120ce upstream.
+
+Suppose adapter (open) recovery is between opened QDIO queues and before
+(the end of) initial posting of status read buffers (SRBs). This time
+window can be seconds long due to FSF_PROT_HOST_CONNECTION_INITIALIZING
+causing by design looping with exponential increase sleeps in the function
+performing exchange config data during recovery
+[zfcp_erp_adapter_strat_fsf_xconf()]. Recovery triggered by local link up.
+
+Suppose an event occurs for which the FCP channel would send an unsolicited
+notification to zfcp by means of a previously posted SRB.  We saw it with
+local cable pull (link down) in multi-initiator zoning with multiple
+NPIV-enabled subchannels of the same shared FCP channel.
+
+As soon as zfcp_erp_adapter_strategy_open_fsf() starts posting the initial
+status read buffers from within the adapter's ERP thread, the channel does
+send an unsolicited notification.
+
+Since v2.6.27 commit d26ab06ede83 ("[SCSI] zfcp: receiving an unsolicted
+status can lead to I/O stall"), zfcp_fsf_status_read_handler() schedules
+adapter->stat_work to re-fill the just consumed SRB from a work item.
+
+Now the ERP thread and the work item post SRBs in parallel.  Both contexts
+call the helper function zfcp_status_read_refill().  The tracking of
+missing (to be posted / re-filled) SRBs is not thread-safe due to separate
+atomic_read() and atomic_dec(), in order to depend on posting
+success. Hence, both contexts can see
+atomic_read(&adapter->stat_miss) == 1. One of the two contexts posts
+one too many SRB. Zfcp gets QDIO_ERROR_SLSB_STATE on the output queue
+(trace tag "qdireq1") leading to zfcp_erp_adapter_shutdown() in
+zfcp_qdio_handler_error().
+
+An obvious and seemingly clean fix would be to schedule stat_work from the
+ERP thread and wait for it to finish. This would serialize all SRB
+re-fills. However, we already have another work item wait on the ERP
+thread: adapter->scan_work runs zfcp_fc_scan_ports() which calls
+zfcp_fc_eval_gpn_ft(). The latter calls zfcp_erp_wait() to wait for all the
+open port recoveries during zfcp auto port scan, but in fact it waits for
+any pending recovery including an adapter recovery. This approach leads to
+a deadlock.  [see also v3.19 commit 18f87a67e6d6 ("zfcp: auto port scan
+resiliency"); v2.6.37 commit d3e1088d6873
+("[SCSI] zfcp: No ERP escalation on gpn_ft eval");
+v2.6.28 commit fca55b6fb587
+("[SCSI] zfcp: fix deadlock between wq triggered port scan and ERP")
+fixing v2.6.27 commit c57a39a45a76
+("[SCSI] zfcp: wait until adapter is finished with ERP during auto-port");
+v2.6.27 commit cc8c282963bd
+("[SCSI] zfcp: Automatically attach remote ports")]
+
+Instead make the accounting of missing SRBs atomic for parallel execution
+in both the ERP thread and adapter->stat_work.
+
+Signed-off-by: Steffen Maier <maier@linux.ibm.com>
+Fixes: d26ab06ede83 ("[SCSI] zfcp: receiving an unsolicted status can lead to I/O stall")
+Cc: <stable@vger.kernel.org> #2.6.27+
+Reviewed-by: Jens Remus <jremus@linux.ibm.com>
+Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+---
+ drivers/s390/scsi/zfcp_aux.c |    6 +++---
+ 1 file changed, 3 insertions(+), 3 deletions(-)
+
+--- a/drivers/s390/scsi/zfcp_aux.c
++++ b/drivers/s390/scsi/zfcp_aux.c
+@@ -275,16 +275,16 @@ static void zfcp_free_low_mem_buffers(st
+  */
+ int zfcp_status_read_refill(struct zfcp_adapter *adapter)
+ {
+-      while (atomic_read(&adapter->stat_miss) > 0)
++      while (atomic_add_unless(&adapter->stat_miss, -1, 0))
+               if (zfcp_fsf_status_read(adapter->qdio)) {
++                      atomic_inc(&adapter->stat_miss); /* undo add -1 */
+                       if (atomic_read(&adapter->stat_miss) >=
+                           adapter->stat_read_buf_num) {
+                               zfcp_erp_adapter_reopen(adapter, 0, "axsref1");
+                               return 1;
+                       }
+                       break;
+-              } else
+-                      atomic_dec(&adapter->stat_miss);
++              }
+       return 0;
+ }
+ 
diff --git a/queue-3.18/series b/queue-3.18/series

index 057b226733b99dbc8385c785e944c453e34e1b1f..5c4ed8730786bf33eacbf58070008db5ecfbc72c 100644 (file)
--- a/queue-3.18/series
+++ b/queue-3.18/series
@@ -28,3 +28,4 @@ media-vivid-free-bitmap_cap-when-updating-std-timings-etc.patch
  mips-ensure-pmd_present-returns-false-after-pmd_mknotpresent.patch
  mips-align-kernel-load-address-to-64kb.patch
  cifs-fix-error-mapping-for-smb2_lock-command-which-caused-ofd-lock-problem.patch
+scsi-zfcp-fix-posting-too-many-status-read-buffers-leading-to-adapter-shutdown.patch
author	Greg Kroah-Hartman <gregkh@linuxfoundation.org>
	Thu, 10 Jan 2019 14:15:55 +0000 (15:15 +0100)
committer	Greg Kroah-Hartman <gregkh@linuxfoundation.org>
	Thu, 10 Jan 2019 14:15:55 +0000 (15:15 +0100)
queue-3.18/scsi-zfcp-fix-posting-too-many-status-read-buffers-leading-to-adapter-shutdown.patch	[new file with mode: 0644]	patch \| blob
queue-3.18/series		patch \| blob \| blame \| history