From: Sasha Levin Date: Fri, 17 May 2019 02:10:55 +0000 (-0400) Subject: fixes for 4.14 X-Git-Tag: v4.9.178~50 X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=9b9d21fb1bfe1a6383804a2d0859007776364bd6;p=thirdparty%2Fkernel%2Fstable-queue.git fixes for 4.14 Signed-off-by: Sasha Levin --- diff --git a/queue-4.14/locking-rwsem-prevent-decrement-of-reader-count-befo.patch b/queue-4.14/locking-rwsem-prevent-decrement-of-reader-count-befo.patch new file mode 100644 index 00000000000..412dcd08e96 --- /dev/null +++ b/queue-4.14/locking-rwsem-prevent-decrement-of-reader-count-befo.patch @@ -0,0 +1,129 @@ +From 37455e458d58dcceb59a7356d87f122a9c620b7a Mon Sep 17 00:00:00 2001 +From: Waiman Long +Date: Sun, 28 Apr 2019 17:25:38 -0400 +Subject: locking/rwsem: Prevent decrement of reader count before increment + +[ Upstream commit a9e9bcb45b1525ba7aea26ed9441e8632aeeda58 ] + +During my rwsem testing, it was found that after a down_read(), the +reader count may occasionally become 0 or even negative. Consequently, +a writer may steal the lock at that time and execute with the reader +in parallel thus breaking the mutual exclusion guarantee of the write +lock. In other words, both readers and writer can become rwsem owners +simultaneously. + +The current reader wakeup code does it in one pass to clear waiter->task +and put them into wake_q before fully incrementing the reader count. +Once waiter->task is cleared, the corresponding reader may see it, +finish the critical section and do unlock to decrement the count before +the count is incremented. This is not a problem if there is only one +reader to wake up as the count has been pre-incremented by 1. It is +a problem if there are more than one readers to be woken up and writer +can steal the lock. + +The wakeup was actually done in 2 passes before the following v4.9 commit: + + 70800c3c0cc5 ("locking/rwsem: Scan the wait_list for readers only once") + +To fix this problem, the wakeup is now done in two passes +again. In the first pass, we collect the readers and count them. +The reader count is then fully incremented. In the second pass, the +waiter->task is then cleared and they are put into wake_q to be woken +up later. + +Signed-off-by: Waiman Long +Acked-by: Linus Torvalds +Cc: Borislav Petkov +Cc: Davidlohr Bueso +Cc: Peter Zijlstra +Cc: Thomas Gleixner +Cc: Tim Chen +Cc: Will Deacon +Cc: huang ying +Fixes: 70800c3c0cc5 ("locking/rwsem: Scan the wait_list for readers only once") +Link: http://lkml.kernel.org/r/20190428212557.13482-2-longman@redhat.com +Signed-off-by: Ingo Molnar +Signed-off-by: Sasha Levin +--- + kernel/locking/rwsem-xadd.c | 44 +++++++++++++++++++++++++------------ + 1 file changed, 30 insertions(+), 14 deletions(-) + +diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c +index c75017326c37a..3f5be624c7649 100644 +--- a/kernel/locking/rwsem-xadd.c ++++ b/kernel/locking/rwsem-xadd.c +@@ -130,6 +130,7 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem, + { + struct rwsem_waiter *waiter, *tmp; + long oldcount, woken = 0, adjustment = 0; ++ struct list_head wlist; + + /* + * Take a peek at the queue head waiter such that we can determine +@@ -188,18 +189,42 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem, + * of the queue. We know that woken will be at least 1 as we accounted + * for above. Note we increment the 'active part' of the count by the + * number of readers before waking any processes up. 
++ * ++ * We have to do wakeup in 2 passes to prevent the possibility that ++ * the reader count may be decremented before it is incremented. It ++ * is because the to-be-woken waiter may not have slept yet. So it ++ * may see waiter->task got cleared, finish its critical section and ++ * do an unlock before the reader count increment. ++ * ++ * 1) Collect the read-waiters in a separate list, count them and ++ * fully increment the reader count in rwsem. ++ * 2) For each waiters in the new list, clear waiter->task and ++ * put them into wake_q to be woken up later. + */ +- list_for_each_entry_safe(waiter, tmp, &sem->wait_list, list) { +- struct task_struct *tsk; +- ++ list_for_each_entry(waiter, &sem->wait_list, list) { + if (waiter->type == RWSEM_WAITING_FOR_WRITE) + break; + + woken++; +- tsk = waiter->task; ++ } ++ list_cut_before(&wlist, &sem->wait_list, &waiter->list); ++ ++ adjustment = woken * RWSEM_ACTIVE_READ_BIAS - adjustment; ++ if (list_empty(&sem->wait_list)) { ++ /* hit end of list above */ ++ adjustment -= RWSEM_WAITING_BIAS; ++ } ++ ++ if (adjustment) ++ atomic_long_add(adjustment, &sem->count); ++ ++ /* 2nd pass */ ++ list_for_each_entry_safe(waiter, tmp, &wlist, list) { ++ struct task_struct *tsk; + ++ tsk = waiter->task; + get_task_struct(tsk); +- list_del(&waiter->list); ++ + /* + * Ensure calling get_task_struct() before setting the reader + * waiter to nil such that rwsem_down_read_failed() cannot +@@ -215,15 +240,6 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem, + /* wake_q_add() already take the task ref */ + put_task_struct(tsk); + } +- +- adjustment = woken * RWSEM_ACTIVE_READ_BIAS - adjustment; +- if (list_empty(&sem->wait_list)) { +- /* hit end of list above */ +- adjustment -= RWSEM_WAITING_BIAS; +- } +- +- if (adjustment) +- atomic_long_add(adjustment, &sem->count); + } + + /* +-- +2.20.1 + diff --git a/queue-4.14/net-core-another-layer-of-lists-around-pf_memalloc-s.patch b/queue-4.14/net-core-another-layer-of-lists-around-pf_memalloc-s.patch new file mode 100644 index 00000000000..5ee71c210c4 --- /dev/null +++ b/queue-4.14/net-core-another-layer-of-lists-around-pf_memalloc-s.patch @@ -0,0 +1,64 @@ +From d75510c5cd74ed3e2b30c08cb59917893c2d5df6 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Thu, 16 May 2019 21:30:49 -0400 +Subject: net: core: another layer of lists, around PF_MEMALLOC skb handling + +[ Upstream commit 78ed8cc25986ac5c21762eeddc1e86e94d422e36 ] + +First example of a layer splitting the list (rather than merely taking + individual packets off it). +Involves new list.h function, list_cut_before(), like list_cut_position() + but cuts on the other side of the given entry. + +Signed-off-by: Edward Cree +Signed-off-by: David S. Miller +[sl: cut out non list.h bits, we only want list_cut_before] +Signed-off-by: Sasha Levin +--- + include/linux/list.h | 30 ++++++++++++++++++++++++++++++ + 1 file changed, 30 insertions(+) + +diff --git a/include/linux/list.h b/include/linux/list.h +index 4b129df4d46b5..de04cc5ed5367 100644 +--- a/include/linux/list.h ++++ b/include/linux/list.h +@@ -285,6 +285,36 @@ static inline void list_cut_position(struct list_head *list, + __list_cut_position(list, head, entry); + } + ++/** ++ * list_cut_before - cut a list into two, before given entry ++ * @list: a new list to add all removed entries ++ * @head: a list with entries ++ * @entry: an entry within head, could be the head itself ++ * ++ * This helper moves the initial part of @head, up to but ++ * excluding @entry, from @head to @list. 
You should pass ++ * in @entry an element you know is on @head. @list should ++ * be an empty list or a list you do not care about losing ++ * its data. ++ * If @entry == @head, all entries on @head are moved to ++ * @list. ++ */ ++static inline void list_cut_before(struct list_head *list, ++ struct list_head *head, ++ struct list_head *entry) ++{ ++ if (head->next == entry) { ++ INIT_LIST_HEAD(list); ++ return; ++ } ++ list->next = head->next; ++ list->next->prev = list; ++ list->prev = entry->prev; ++ list->prev->next = list; ++ head->next = entry; ++ entry->prev = head; ++} ++ + static inline void __list_splice(const struct list_head *list, + struct list_head *prev, + struct list_head *next) +-- +2.20.1 + diff --git a/queue-4.14/pci-hv-add-hv_pci_remove_slots-when-we-unload-the-dr.patch b/queue-4.14/pci-hv-add-hv_pci_remove_slots-when-we-unload-the-dr.patch new file mode 100644 index 00000000000..fc4138f7e9d --- /dev/null +++ b/queue-4.14/pci-hv-add-hv_pci_remove_slots-when-we-unload-the-dr.patch @@ -0,0 +1,80 @@ +From de27bc86963c7bd69a939253588b17d266b36454 Mon Sep 17 00:00:00 2001 +From: Dexuan Cui +Date: Wed, 15 May 2019 15:59:15 -0700 +Subject: PCI: hv: Add hv_pci_remove_slots() when we unload the driver + +[ Upstream commit 15becc2b56c6eda3d9bf5ae993bafd5661c1fad1 ] + +When we unload the pci-hyperv host controller driver, the host does not +send us a PCI_EJECT message. + +In this case we also need to make sure the sysfs PCI slot directory is +removed, otherwise a command on a slot file eg: + +"cat /sys/bus/pci/slots/2/address" + +will trigger a + +"BUG: unable to handle kernel paging request" + +and, if we unload/reload the driver several times we would end up with +stale slot entries in PCI slot directories in /sys/bus/pci/slots/ + +root@localhost:~# ls -rtl /sys/bus/pci/slots/ +total 0 +drwxr-xr-x 2 root root 0 Feb 7 10:49 2 +drwxr-xr-x 2 root root 0 Feb 7 10:49 2-1 +drwxr-xr-x 2 root root 0 Feb 7 10:51 2-2 + +Add the missing code to remove the PCI slot and fix the current +behaviour. + +Fixes: a15f2c08c708 ("PCI: hv: support reporting serial number as slot information") +Signed-off-by: Dexuan Cui +[lorenzo.pieralisi@arm.com: reformatted the log] +Signed-off-by: Lorenzo Pieralisi +Reviewed-by: Stephen Hemminger +Reviewed-by: Michael Kelley +Cc: stable@vger.kernel.org +Signed-off-by: Sasha Levin +--- + drivers/pci/host/pci-hyperv.c | 16 ++++++++++++++++ + 1 file changed, 16 insertions(+) + +diff --git a/drivers/pci/host/pci-hyperv.c b/drivers/pci/host/pci-hyperv.c +index 292450c7da625..a5825bbcded72 100644 +--- a/drivers/pci/host/pci-hyperv.c ++++ b/drivers/pci/host/pci-hyperv.c +@@ -1513,6 +1513,21 @@ static void hv_pci_assign_slots(struct hv_pcibus_device *hbus) + } + } + ++/* ++ * Remove entries in sysfs pci slot directory. 
++ */ ++static void hv_pci_remove_slots(struct hv_pcibus_device *hbus) ++{ ++ struct hv_pci_dev *hpdev; ++ ++ list_for_each_entry(hpdev, &hbus->children, list_entry) { ++ if (!hpdev->pci_slot) ++ continue; ++ pci_destroy_slot(hpdev->pci_slot); ++ hpdev->pci_slot = NULL; ++ } ++} ++ + /** + * create_root_hv_pci_bus() - Expose a new root PCI bus + * @hbus: Root PCI bus, as understood by this driver +@@ -2719,6 +2734,7 @@ static int hv_pci_remove(struct hv_device *hdev) + pci_lock_rescan_remove(); + pci_stop_root_bus(hbus->pci_bus); + pci_remove_root_bus(hbus->pci_bus); ++ hv_pci_remove_slots(hbus); + pci_unlock_rescan_remove(); + hbus->state = hv_pcibus_removed; + } +-- +2.20.1 + diff --git a/queue-4.14/pci-hv-add-pci_destroy_slot-in-pci_devices_present_w.patch b/queue-4.14/pci-hv-add-pci_destroy_slot-in-pci_devices_present_w.patch new file mode 100644 index 00000000000..42280c1ed5b --- /dev/null +++ b/queue-4.14/pci-hv-add-pci_destroy_slot-in-pci_devices_present_w.patch @@ -0,0 +1,94 @@ +From df1a1c65fc7b1b9c9a5341973dd8cd02039368d0 Mon Sep 17 00:00:00 2001 +From: Dexuan Cui +Date: Wed, 15 May 2019 16:06:22 -0700 +Subject: PCI: hv: Add pci_destroy_slot() in pci_devices_present_work(), if + necessary + +[ Upstream commit 340d455699400f2c2c0f9b3f703ade3085cdb501 ] + +When we hot-remove a device, usually the host sends us a PCI_EJECT message, +and a PCI_BUS_RELATIONS message with bus_rel->device_count == 0. + +When we execute the quick hot-add/hot-remove test, the host may not send +us the PCI_EJECT message if the guest has not fully finished the +initialization by sending the PCI_RESOURCES_ASSIGNED* message to the +host, so it's potentially unsafe to only depend on the +pci_destroy_slot() in hv_eject_device_work() because the code path + +create_root_hv_pci_bus() + -> hv_pci_assign_slots() + +is not called in this case. Note: in this case, the host still sends the +guest a PCI_BUS_RELATIONS message with bus_rel->device_count == 0. + +In the quick hot-add/hot-remove test, we can have such a race before +the code path + +pci_devices_present_work() + -> new_pcichild_device() + +adds the new device into the hbus->children list, we may have already +received the PCI_EJECT message, and since the tasklet handler + +hv_pci_onchannelcallback() + +may fail to find the "hpdev" by calling + +get_pcichild_wslot(hbus, dev_message->wslot.slot) + +hv_pci_eject_device() is not called; Later, by continuing execution + +create_root_hv_pci_bus() + -> hv_pci_assign_slots() + +creates the slot and the PCI_BUS_RELATIONS message with +bus_rel->device_count == 0 removes the device from hbus->children, and +we end up being unable to remove the slot in + +hv_pci_remove() + -> hv_pci_remove_slots() + +Remove the slot in pci_devices_present_work() when the device +is removed to address this race. + +pci_devices_present_work() and hv_eject_device_work() run in the +singled-threaded hbus->wq, so there is not a double-remove issue for the +slot. 
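
As an illustration of the serialization argument above, the following self-contained userspace sketch (hypothetical names, not the driver code) models why a second destroy attempt is harmless: the two work functions run strictly one after the other, as they would on the ordered hbus->wq, and clearing the pointer after the first destroy turns the later pass into a no-op.

#include <stdio.h>
#include <stdlib.h>

struct slot_model {
	void *pci_slot;			/* stands in for hpdev->pci_slot */
};

static void destroy_slot_once(struct slot_model *hpdev, const char *who)
{
	if (!hpdev->pci_slot) {
		printf("%s: slot already gone, nothing to do\n", who);
		return;
	}
	free(hpdev->pci_slot);		/* models pci_destroy_slot() */
	hpdev->pci_slot = NULL;
	printf("%s: slot destroyed\n", who);
}

int main(void)
{
	struct slot_model hpdev = { .pci_slot = malloc(16) };

	/* An ordered, single-threaded workqueue runs queued work items one
	 * at a time, so the two calls below never overlap. */
	destroy_slot_once(&hpdev, "pci_devices_present_work");
	destroy_slot_once(&hpdev, "hv_eject_device_work");
	return 0;
}
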
+ +We cannot offload hv_pci_eject_device() from hv_pci_onchannelcallback() +to the workqueue, because we need the hv_pci_onchannelcallback() +synchronously call hv_pci_eject_device() to poll the channel +ringbuffer to work around the "hangs in hv_compose_msi_msg()" issue +fixed in commit de0aa7b2f97d ("PCI: hv: Fix 2 hang issues in +hv_compose_msi_msg()") + +Fixes: a15f2c08c708 ("PCI: hv: support reporting serial number as slot information") +Signed-off-by: Dexuan Cui +[lorenzo.pieralisi@arm.com: rewritten commit log] +Signed-off-by: Lorenzo Pieralisi +Reviewed-by: Stephen Hemminger +Reviewed-by: Michael Kelley +Cc: stable@vger.kernel.org +Signed-off-by: Sasha Levin +--- + drivers/pci/host/pci-hyperv.c | 4 ++++ + 1 file changed, 4 insertions(+) + +diff --git a/drivers/pci/host/pci-hyperv.c b/drivers/pci/host/pci-hyperv.c +index a5825bbcded72..f591de23f3d35 100644 +--- a/drivers/pci/host/pci-hyperv.c ++++ b/drivers/pci/host/pci-hyperv.c +@@ -1824,6 +1824,10 @@ static void pci_devices_present_work(struct work_struct *work) + hpdev = list_first_entry(&removed, struct hv_pci_dev, + list_entry); + list_del(&hpdev->list_entry); ++ ++ if (hpdev->pci_slot) ++ pci_destroy_slot(hpdev->pci_slot); ++ + put_pcichild(hpdev, hv_pcidev_ref_initial); + } + +-- +2.20.1 + diff --git a/queue-4.14/pci-hv-fix-a-memory-leak-in-hv_eject_device_work.patch b/queue-4.14/pci-hv-fix-a-memory-leak-in-hv_eject_device_work.patch new file mode 100644 index 00000000000..63495fb1e3e --- /dev/null +++ b/queue-4.14/pci-hv-fix-a-memory-leak-in-hv_eject_device_work.patch @@ -0,0 +1,51 @@ +From 50e8d77ba0e1e57675e72ea4244a3b7b0656d1f1 Mon Sep 17 00:00:00 2001 +From: Dexuan Cui +Date: Wed, 15 May 2019 15:42:07 -0700 +Subject: PCI: hv: Fix a memory leak in hv_eject_device_work() + +[ Upstream commit 05f151a73ec2b23ffbff706e5203e729a995cdc2 ] + +When a device is created in new_pcichild_device(), hpdev->refs is set +to 2 (i.e. the initial value of 1 plus the get_pcichild()). + +When we hot remove the device from the host, in a Linux VM we first call +hv_pci_eject_device(), which increases hpdev->refs by get_pcichild() and +then schedules a work of hv_eject_device_work(), so hpdev->refs becomes +3 (let's ignore the paired get/put_pcichild() in other places). But in +hv_eject_device_work(), currently we only call put_pcichild() twice, +meaning the 'hpdev' struct can't be freed in put_pcichild(). + +Add one put_pcichild() to fix the memory leak. + +The device can also be removed when we run "rmmod pci-hyperv". On this +path (hv_pci_remove() -> hv_pci_bus_exit() -> hv_pci_devices_present()), +hpdev->refs is 2, and we do correctly call put_pcichild() twice in +pci_devices_present_work(). 
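
To make that accounting concrete, here is a minimal self-contained userspace model (hypothetical struct and helper names, not the driver code) of the reference counting described above: the device starts life with two references, the eject path takes a third, and only three put_pcichild() calls bring the count back to zero.

#include <stdio.h>
#include <stdlib.h>

struct hpdev_model {
	int refs;			/* stands in for hpdev->refs */
};

static struct hpdev_model *new_pcichild_device(void)
{
	struct hpdev_model *hpdev = calloc(1, sizeof(*hpdev));

	hpdev->refs = 1;		/* initial reference */
	hpdev->refs++;			/* get_pcichild() for the child list */
	return hpdev;			/* refs == 2 on return */
}

static void put_pcichild(struct hpdev_model *hpdev, const char *why)
{
	if (--hpdev->refs == 0) {
		free(hpdev);
		printf("freed on put for %s\n", why);
		return;
	}
	printf("put for %s, refs now %d\n", why, hpdev->refs);
}

int main(void)
{
	struct hpdev_model *hpdev = new_pcichild_device();

	hpdev->refs++;			/* hv_pci_eject_device(): get before queueing the work */

	/* hv_eject_device_work(): two puts leave refs == 1 and leak the
	 * object; the third put is the one added by this patch. */
	put_pcichild(hpdev, "childlist");
	put_pcichild(hpdev, "pnp");
	put_pcichild(hpdev, "initial");
	return 0;
}
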
+ +Fixes: 4daace0d8ce8 ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs") +Signed-off-by: Dexuan Cui +[lorenzo.pieralisi@arm.com: commit log rework] +Signed-off-by: Lorenzo Pieralisi +Reviewed-by: Stephen Hemminger +Reviewed-by: Michael Kelley +Cc: stable@vger.kernel.org +Signed-off-by: Sasha Levin +--- + drivers/pci/host/pci-hyperv.c | 1 + + 1 file changed, 1 insertion(+) + +diff --git a/drivers/pci/host/pci-hyperv.c b/drivers/pci/host/pci-hyperv.c +index 53d1c08cef4dc..292450c7da625 100644 +--- a/drivers/pci/host/pci-hyperv.c ++++ b/drivers/pci/host/pci-hyperv.c +@@ -1941,6 +1941,7 @@ static void hv_eject_device_work(struct work_struct *work) + VM_PKT_DATA_INBAND, 0); + + put_pcichild(hpdev, hv_pcidev_ref_childlist); ++ put_pcichild(hpdev, hv_pcidev_ref_initial); + put_pcichild(hpdev, hv_pcidev_ref_pnp); + put_hvpcibus(hpdev->hbus); + } +-- +2.20.1 + diff --git a/queue-4.14/series b/queue-4.14/series new file mode 100644 index 00000000000..57ad6228d83 --- /dev/null +++ b/queue-4.14/series @@ -0,0 +1,5 @@ +locking-rwsem-prevent-decrement-of-reader-count-befo.patch +pci-hv-fix-a-memory-leak-in-hv_eject_device_work.patch +pci-hv-add-hv_pci_remove_slots-when-we-unload-the-dr.patch +pci-hv-add-pci_destroy_slot-in-pci_devices_present_w.patch +net-core-another-layer-of-lists-around-pf_memalloc-s.patch
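
As a closing note on the list helper that the rwsem fix above depends on (the reason the net-core list.h backport is part of this series), below is a small self-contained userspace sketch of list_cut_before(). The struct and helpers are simplified stand-ins for <linux/list.h>, but the cut itself is the same pointer surgery as the helper added above, splitting a five-node list the way __rwsem_mark_wake() cuts the woken readers off sem->wait_list.

#include <stdio.h>

struct list_head {
	struct list_head *next, *prev;
};

#define LIST_HEAD_INIT(name) { &(name), &(name) }

static void INIT_LIST_HEAD(struct list_head *list)
{
	list->next = list;
	list->prev = list;
}

static void list_add_tail(struct list_head *new, struct list_head *head)
{
	new->prev = head->prev;
	new->next = head;
	head->prev->next = new;
	head->prev = new;
}

/* Same logic as the helper added by the list.h patch above: move everything
 * in front of @entry from @head onto @list. */
static void list_cut_before(struct list_head *list,
			    struct list_head *head,
			    struct list_head *entry)
{
	if (head->next == entry) {
		INIT_LIST_HEAD(list);
		return;
	}
	list->next = head->next;
	list->next->prev = list;
	list->prev = entry->prev;
	list->prev->next = list;
	head->next = entry;
	entry->prev = head;
}

int main(void)
{
	struct list_head head = LIST_HEAD_INIT(head);
	struct list_head wlist;
	struct list_head nodes[5];
	int i;

	for (i = 0; i < 5; i++)
		list_add_tail(&nodes[i], &head);

	/* Cut nodes 0..2 onto wlist; head keeps nodes 3..4, the way the
	 * wakeup path keeps the remaining waiters on sem->wait_list. */
	list_cut_before(&wlist, &head, &nodes[3]);

	for (struct list_head *p = wlist.next; p != &wlist; p = p->next)
		printf("wlist: node %ld\n", (long)(p - nodes));
	for (struct list_head *p = head.next; p != &head; p = p->next)
		printf("head:  node %ld\n", (long)(p - nodes));
	return 0;
}
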