From: Greg Kroah-Hartman Date: Mon, 23 Jul 2018 07:59:00 +0000 (+0200) Subject: 4.9-stable patches X-Git-Tag: v4.4.144~8 X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=1f88ca39d5729a7bc07f9fefbc31e8b43f6802fb;p=thirdparty%2Fkernel%2Fstable-queue.git 4.9-stable patches added patches: block-do-not-use-interruptible-wait-anywhere.patch xhci-fix-perceived-dead-host-due-to-runtime-suspend-race-with-event-handler.patch xprtrdma-return-enobufs-when-no-pages-are-available.patch --- diff --git a/queue-4.9/block-do-not-use-interruptible-wait-anywhere.patch b/queue-4.9/block-do-not-use-interruptible-wait-anywhere.patch new file mode 100644 index 00000000000..fb76a65faaa --- /dev/null +++ b/queue-4.9/block-do-not-use-interruptible-wait-anywhere.patch @@ -0,0 +1,72 @@ +From 1dc3039bc87ae7d19a990c3ee71cfd8a9068f428 Mon Sep 17 00:00:00 2001 +From: Alan Jenkins +Date: Thu, 12 Apr 2018 19:11:58 +0100 +Subject: block: do not use interruptible wait anywhere + +From: Alan Jenkins + +commit 1dc3039bc87ae7d19a990c3ee71cfd8a9068f428 upstream. + +When blk_queue_enter() waits for a queue to unfreeze, or unset the +PREEMPT_ONLY flag, do not allow it to be interrupted by a signal. + +The PREEMPT_ONLY flag was introduced later in commit 3a0a529971ec +("block, scsi: Make SCSI quiesce and resume work reliably"). Note the SCSI +device is resumed asynchronously, i.e. after un-freezing userspace tasks. + +So that commit exposed the bug as a regression in v4.15. A mysterious +SIGBUS (or -EIO) sometimes happened during the time the device was being +resumed. Most frequently, there was no kernel log message, and we saw Xorg +or Xwayland killed by SIGBUS.[1] + +[1] E.g. https://bugzilla.redhat.com/show_bug.cgi?id=1553979 + +Without this fix, I get an IO error in this test: + +# dd if=/dev/sda of=/dev/null iflag=direct & \ + while killall -SIGUSR1 dd; do sleep 0.1; done & \ + echo mem > /sys/power/state ; \ + sleep 5; killall dd # stop after 5 seconds + +The interruptible wait was added to blk_queue_enter in +commit 3ef28e83ab15 ("block: generic request_queue reference counting"). +Before then, the interruptible wait was only in blk-mq, but I don't think +it could ever have been correct. + +Reviewed-by: Bart Van Assche +Cc: stable@vger.kernel.org +Signed-off-by: Alan Jenkins +Signed-off-by: Jens Axboe +Signed-off-by: Sudip Mukherjee +Signed-off-by: Greg Kroah-Hartman +--- + block/blk-core.c | 9 +++------ + 1 file changed, 3 insertions(+), 6 deletions(-) + +--- a/block/blk-core.c ++++ b/block/blk-core.c +@@ -652,7 +652,6 @@ EXPORT_SYMBOL(blk_alloc_queue); + int blk_queue_enter(struct request_queue *q, bool nowait) + { + while (true) { +- int ret; + + if (percpu_ref_tryget_live(&q->q_usage_counter)) + return 0; +@@ -660,13 +659,11 @@ int blk_queue_enter(struct request_queue + if (nowait) + return -EBUSY; + +- ret = wait_event_interruptible(q->mq_freeze_wq, +- !atomic_read(&q->mq_freeze_depth) || +- blk_queue_dying(q)); ++ wait_event(q->mq_freeze_wq, ++ !atomic_read(&q->mq_freeze_depth) || ++ blk_queue_dying(q)); + if (blk_queue_dying(q)) + return -ENODEV; +- if (ret) +- return ret; + } + } + diff --git a/queue-4.9/series b/queue-4.9/series index 94fdf52856f..6a0fd5f2690 100644 --- a/queue-4.9/series +++ b/queue-4.9/series @@ -23,3 +23,6 @@ tg3-add-higher-cpu-clock-for-5762.patch net-usb-asix-replace-mii_nway_restart-in-resume-path.patch net-don-t-copy-pfmemalloc-flag-in-__copy_skb_header.patch skbuff-unconditionally-copy-pfmemalloc-in-__skb_clone.patch +xhci-fix-perceived-dead-host-due-to-runtime-suspend-race-with-event-handler.patch +xprtrdma-return-enobufs-when-no-pages-are-available.patch +block-do-not-use-interruptible-wait-anywhere.patch diff --git a/queue-4.9/xhci-fix-perceived-dead-host-due-to-runtime-suspend-race-with-event-handler.patch b/queue-4.9/xhci-fix-perceived-dead-host-due-to-runtime-suspend-race-with-event-handler.patch new file mode 100644 index 00000000000..fbb6f705fc9 --- /dev/null +++ b/queue-4.9/xhci-fix-perceived-dead-host-due-to-runtime-suspend-race-with-event-handler.patch @@ -0,0 +1,123 @@ +From 229bc19fd7aca4f37964af06e3583c1c8f36b5d6 Mon Sep 17 00:00:00 2001 +From: Mathias Nyman +Date: Thu, 21 Jun 2018 16:19:41 +0300 +Subject: xhci: Fix perceived dead host due to runtime suspend race with event handler + +From: Mathias Nyman + +commit 229bc19fd7aca4f37964af06e3583c1c8f36b5d6 upstream. + +Don't rely on event interrupt (EINT) bit alone to detect pending port +change in resume. If no change event is detected the host may be suspended +again, oterwise roothubs are resumed. + +There is a lag in xHC setting EINT. If we don't notice the pending change +in resume, and the controller is runtime suspeded again, it causes the +event handler to assume host is dead as it will fail to read xHC registers +once PCI puts the controller to D3 state. + +[ 268.520969] xhci_hcd: xhci_resume: starting port polling. +[ 268.520985] xhci_hcd: xhci_hub_status_data: stopping port polling. +[ 268.521030] xhci_hcd: xhci_suspend: stopping port polling. +[ 268.521040] xhci_hcd: // Setting command ring address to 0x349bd001 +[ 268.521139] xhci_hcd: Port Status Change Event for port 3 +[ 268.521149] xhci_hcd: resume root hub +[ 268.521163] xhci_hcd: port resume event for port 3 +[ 268.521168] xhci_hcd: xHC is not running. +[ 268.521174] xhci_hcd: handle_port_status: starting port polling. +[ 268.596322] xhci_hcd: xhci_hc_died: xHCI host controller not responding, assume dead + +The EINT lag is described in a additional note in xhci specs 4.19.2: + +"Due to internal xHC scheduling and system delays, there will be a lag +between a change bit being set and the Port Status Change Event that it +generated being written to the Event Ring. If SW reads the PORTSC and +sees a change bit set, there is no guarantee that the corresponding Port +Status Change Event has already been written into the Event Ring." + +Cc: +Signed-off-by: Mathias Nyman +Signed-off-by: Kai-Heng Feng +Signed-off-by: Greg Kroah-Hartman + +--- + drivers/usb/host/xhci.c | 40 +++++++++++++++++++++++++++++++++++++--- + drivers/usb/host/xhci.h | 4 ++++ + 2 files changed, 41 insertions(+), 3 deletions(-) + +--- a/drivers/usb/host/xhci.c ++++ b/drivers/usb/host/xhci.c +@@ -891,6 +891,41 @@ static void xhci_disable_port_wake_on_bi + spin_unlock_irqrestore(&xhci->lock, flags); + } + ++static bool xhci_pending_portevent(struct xhci_hcd *xhci) ++{ ++ __le32 __iomem **port_array; ++ int port_index; ++ u32 status; ++ u32 portsc; ++ ++ status = readl(&xhci->op_regs->status); ++ if (status & STS_EINT) ++ return true; ++ /* ++ * Checking STS_EINT is not enough as there is a lag between a change ++ * bit being set and the Port Status Change Event that it generated ++ * being written to the Event Ring. See note in xhci 1.1 section 4.19.2. ++ */ ++ ++ port_index = xhci->num_usb2_ports; ++ port_array = xhci->usb2_ports; ++ while (port_index--) { ++ portsc = readl(port_array[port_index]); ++ if (portsc & PORT_CHANGE_MASK || ++ (portsc & PORT_PLS_MASK) == XDEV_RESUME) ++ return true; ++ } ++ port_index = xhci->num_usb3_ports; ++ port_array = xhci->usb3_ports; ++ while (port_index--) { ++ portsc = readl(port_array[port_index]); ++ if (portsc & PORT_CHANGE_MASK || ++ (portsc & PORT_PLS_MASK) == XDEV_RESUME) ++ return true; ++ } ++ return false; ++} ++ + /* + * Stop HC (not bus-specific) + * +@@ -987,7 +1022,7 @@ EXPORT_SYMBOL_GPL(xhci_suspend); + */ + int xhci_resume(struct xhci_hcd *xhci, bool hibernated) + { +- u32 command, temp = 0, status; ++ u32 command, temp = 0; + struct usb_hcd *hcd = xhci_to_hcd(xhci); + struct usb_hcd *secondary_hcd; + int retval = 0; +@@ -1109,8 +1144,7 @@ int xhci_resume(struct xhci_hcd *xhci, b + done: + if (retval == 0) { + /* Resume root hubs only when have pending events. */ +- status = readl(&xhci->op_regs->status); +- if (status & STS_EINT) { ++ if (xhci_pending_portevent(xhci)) { + usb_hcd_resume_root_hub(xhci->shared_hcd); + usb_hcd_resume_root_hub(hcd); + } +--- a/drivers/usb/host/xhci.h ++++ b/drivers/usb/host/xhci.h +@@ -385,6 +385,10 @@ struct xhci_op_regs { + #define PORT_PLC (1 << 22) + /* port configure error change - port failed to configure its link partner */ + #define PORT_CEC (1 << 23) ++#define PORT_CHANGE_MASK (PORT_CSC | PORT_PEC | PORT_WRC | PORT_OCC | \ ++ PORT_RC | PORT_PLC | PORT_CEC) ++ ++ + /* Cold Attach Status - xHC can set this bit to report device attached during + * Sx state. Warm port reset should be perfomed to clear this bit and move port + * to connected state. diff --git a/queue-4.9/xprtrdma-return-enobufs-when-no-pages-are-available.patch b/queue-4.9/xprtrdma-return-enobufs-when-no-pages-are-available.patch new file mode 100644 index 00000000000..4234278eff2 --- /dev/null +++ b/queue-4.9/xprtrdma-return-enobufs-when-no-pages-are-available.patch @@ -0,0 +1,35 @@ +From a8f688ec437dc2045cc8f0c89fe877d5803850da Mon Sep 17 00:00:00 2001 +From: Chuck Lever +Date: Fri, 4 May 2018 15:35:46 -0400 +Subject: xprtrdma: Return -ENOBUFS when no pages are available + +From: Chuck Lever + +commit a8f688ec437dc2045cc8f0c89fe877d5803850da upstream. + +The use of -EAGAIN in rpcrdma_convert_iovs() is a latent bug: the +transport never calls xprt_write_space() when more pages become +available. -ENOBUFS will trigger the correct "delay briefly and call +again" logic. + +Fixes: 7a89f9c626e3 ("xprtrdma: Honor ->send_request API contract") +Signed-off-by: Chuck Lever +Cc: stable@vger.kernel.org # 4.8+ +Signed-off-by: Anna Schumaker +Signed-off-by: Sudip Mukherjee +Signed-off-by: Greg Kroah-Hartman +--- + net/sunrpc/xprtrdma/rpc_rdma.c | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +--- a/net/sunrpc/xprtrdma/rpc_rdma.c ++++ b/net/sunrpc/xprtrdma/rpc_rdma.c +@@ -229,7 +229,7 @@ rpcrdma_convert_iovs(struct rpcrdma_xprt + /* alloc the pagelist for receiving buffer */ + ppages[p] = alloc_page(GFP_ATOMIC); + if (!ppages[p]) +- return -EAGAIN; ++ return -ENOBUFS; + } + seg[n].mr_page = ppages[p]; + seg[n].mr_offset = (void *)(unsigned long) page_base;