git.ipfire.org Git - thirdparty/kernel/stable-queue.git/commitdiff
4.4-stable patches
author Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fri, 21 Feb 2020 07:19:42 +0000 (08:19 +0100)
committer Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fri, 21 Feb 2020 07:19:42 +0000 (08:19 +0100)
added patches:
enic-prevent-waking-up-stopped-tx-queues-over-watchdog-reset.patch

queue-4.4/enic-prevent-waking-up-stopped-tx-queues-over-watchdog-reset.patch [new file with mode: 0644]
queue-4.4/series

diff --git a/queue-4.4/enic-prevent-waking-up-stopped-tx-queues-over-watchdog-reset.patch b/queue-4.4/enic-prevent-waking-up-stopped-tx-queues-over-watchdog-reset.patch
new file mode 100644
index 0000000..f6a1275
--- /dev/null
@@ -0,0 +1,57 @@
+From foo@baz Fri 21 Feb 2020 08:19:03 AM CET
+From: Firo Yang <firo.yang@suse.com>
+Date: Wed, 12 Feb 2020 06:09:17 +0100
+Subject: enic: prevent waking up stopped tx queues over watchdog reset
+
+From: Firo Yang <firo.yang@suse.com>
+
+[ Upstream commit 0f90522591fd09dd201065c53ebefdfe3c6b55cb ]
+
+In recent months, our customer reported several kernel crashes, all
+preceded by the following message:
+NETDEV WATCHDOG: eth2 (enic): transmit queue 0 timed out
+Error message of one of those crashes:
+BUG: unable to handle kernel paging request at ffffffffa007e090
+
+After analyzing several vmcores, I found that most of the crashes are
+caused by memory corruption. And all the corrupted memory areas
+are overwritten by data of network packets. Moreover, I also found
+that the tx queues were enabled over watchdog reset.
+
+After going through the source code, I found that in enic_stop(),
+the tx queues stopped by netif_tx_disable() could be woken up over
+a small time window between netif_tx_disable() and the
+napi_disable() by the following code path:
+napi_poll->
+  enic_poll_msix_wq->
+     vnic_cq_service->
+        enic_wq_service->
+           netif_wake_subqueue(enic->netdev, q_number)->
+              test_and_clear_bit(__QUEUE_STATE_DRV_XOFF, &txq->state)
+In turn, the upper network stack could queue skbs to the ENIC NIC through
+enic_hard_start_xmit(), which might introduce a race condition.
+
+Our customer confirmed that this kind of kernel crash has not occurred for
+over 90 days since they applied this patch.
+
+Signed-off-by: Firo Yang <firo.yang@suse.com>
+Signed-off-by: David S. Miller <davem@davemloft.net>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ drivers/net/ethernet/cisco/enic/enic_main.c |    2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/drivers/net/ethernet/cisco/enic/enic_main.c
++++ b/drivers/net/ethernet/cisco/enic/enic_main.c
+@@ -1807,10 +1807,10 @@ static int enic_stop(struct net_device *
+       }
+       netif_carrier_off(netdev);
+-      netif_tx_disable(netdev);
+       if (vnic_dev_get_intr_mode(enic->vdev) == VNIC_DEV_INTR_MODE_MSIX)
+               for (i = 0; i < enic->wq_count; i++)
+                       napi_disable(&enic->napi[enic_cq_wq(enic, i)]);
++      netif_tx_disable(netdev);
+       if (!enic_is_dynamic(enic) && !enic_is_sriov_vf(enic))
+               enic_dev_del_station_addr(enic);
diff --git a/queue-4.4/series b/queue-4.4/series
index 52b3ce76d837f37f09481a23d464aec57fa8ced7..b674b11243447de09f20742820e023de5923160a 100644
@@ -79,3 +79,4 @@ irqchip-gic-v3-its-reference-to-its_invall_cmd-descr.patch
 microblaze-prevent-the-overflow-of-the-start.patch
 brd-check-and-limit-max_part-par.patch
 selinux-ensure-we-cleanup-the-internal-avc-counters-.patch
+enic-prevent-waking-up-stopped-tx-queues-over-watchdog-reset.patch
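
Illustrative note (not part of the patch): the ordering problem described in the commit message can be sketched as a small, single-threaded userspace model. Every name below (model_nic, model_napi_poll, model_tx_disable, model_napi_disable, model_stop) is hypothetical and is not a kernel API. The model also assumes, as a simplification, that a poll either runs completely or not at all, whereas the real napi_disable() waits for an in-flight poll to finish. It only shows why disabling NAPI before stopping the tx queues closes the window in which the completion path can wake a stopped queue.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_nic {
	bool tx_stopped;   /* stands in for the __QUEUE_STATE_DRV_XOFF bit */
	bool napi_enabled; /* the poll handler may only run while this is true */
};

/* Models the completion path that ends in netif_wake_subqueue(). */
static void model_napi_poll(struct model_nic *nic)
{
	if (nic->napi_enabled)
		nic->tx_stopped = false; /* wakes the queue */
}

static void model_tx_disable(struct model_nic *nic)
{
	nic->tx_stopped = true;
}

static void model_napi_disable(struct model_nic *nic)
{
	nic->napi_enabled = false;
}

/*
 * Runs the stop sequence with a poll attempt injected between the two
 * steps, and returns true if the tx queue is still stopped afterwards.
 */
static bool model_stop(bool napi_disable_first)
{
	struct model_nic nic = { .tx_stopped = false, .napi_enabled = true };

	if (napi_disable_first) {
		/* Order after the patch. */
		model_napi_disable(&nic);
		model_napi_poll(&nic); /* cannot run any more: no wake-up */
		model_tx_disable(&nic);
	} else {
		/* Order before the patch. */
		model_tx_disable(&nic);
		model_napi_poll(&nic); /* runs inside the window: wakes queue */
		model_napi_disable(&nic);
	}

	/* In both orders NAPI ends up disabled. */
	assert(!nic.napi_enabled);
	return nic.tx_stopped;
}
```

In this model, model_stop(false) leaves the queue awake after the stop sequence, while model_stop(true) leaves it stopped, which matches the reordering in the enic_stop() hunk.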