From: Greg Kroah-Hartman
Date: Wed, 28 Jun 2023 18:30:08 +0000 (+0200)
Subject: 5.15-stable patches
X-Git-Tag: v6.4.1~50
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=273f9b36d1d730d370b0cdfbafc73a97189e4af3;p=thirdparty%2Fkernel%2Fstable-queue.git

5.15-stable patches

added patches:
      mm-hwpoison-try-to-recover-from-copy-on-write-faults.patch
      mm-hwpoison-when-copy-on-write-hits-poison-take-page-offline.patch
      mptcp-consolidate-fallback-and-non-fallback-state-machine.patch
      mptcp-fix-possible-divide-by-zero-in-recvmsg.patch
      series
---

diff --git a/queue-5.15/mm-hwpoison-try-to-recover-from-copy-on-write-faults.patch b/queue-5.15/mm-hwpoison-try-to-recover-from-copy-on-write-faults.patch
new file mode 100644
index 00000000000..bc753e5bf6d
--- /dev/null
+++ b/queue-5.15/mm-hwpoison-try-to-recover-from-copy-on-write-faults.patch
@@ -0,0 +1,225 @@
+From stable-owner@vger.kernel.org Tue Jun 27 01:04:10 2023
+From: Jane Chu
+Date: Mon, 26 Jun 2023 17:02:18 -0600
+Subject: mm, hwpoison: try to recover from copy-on-write faults
+To: stable@vger.kernel.org
+Cc: tony.luck@intel.com, dan.j.williams@intel.com, naoya.horiguchi@nec.com, linmiaohe@huawei.com, glider@google.com, jane.chu@oracle.com
+Message-ID: <20230626230221.3064291-2-jane.chu@oracle.com>
+
+From: Tony Luck
+
+commit a873dfe1032a132bf89f9e19a6ac44f5a0b78754 upstream.
+
+Patch series "Copy-on-write poison recovery", v3.
+
+Part 1 deals with the process that triggered the copy-on-write fault with
+a store to a shared read-only page.  That process is sent a SIGBUS with
+the usual machine check decoration to specify the virtual address of the
+lost page, together with the scope.
+
+Part 2 sets up to asynchronously take the page with the uncorrected error
+offline to prevent additional machine check faults.  H/t to Miaohe Lin
+and Shuai Xue for pointing me to the existing function to queue a call to
+memory_failure().
+
+On x86 there is some duplicate reporting (because the error is also
+signalled by the memory controller as well as by the core that triggered
+the machine check).  Console logs look like this:
+
+This patch (of 2):
+
+If the kernel is copying a page as the result of a copy-on-write
+fault and runs into an uncorrectable error, Linux will crash because
+it does not have recovery code for this case where poison is consumed
+by the kernel.
+
+It is easy to set up a test case.  Just inject an error into a private
+page, fork(2), and have the child process write to the page.
+
+I wrapped that neatly into a test at:
+
+  git://git.kernel.org/pub/scm/linux/kernel/git/aegl/ras-tools.git
+
+just enable ACPI error injection and run:
+
+  # ./einj_mem-uc -f copy-on-write
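+
+In essence, that test boils down to the following sketch (illustrative
+only: it assumes the ACPI EINJ debugfs interface at
+/sys/kernel/debug/apei/einj, needs root, the helper names are made up
+for this example, and most error handling is elided):
+
+  /* cow-poison.c - plant an uncorrected error in a private page, then
+   * fork() and let the child write to it, so the kernel consumes the
+   * poison while doing the copy-on-write copy.  With this series the
+   * child is killed with SIGBUS instead of the machine crashing.
+   */
+  #include <fcntl.h>
+  #include <signal.h>
+  #include <stdint.h>
+  #include <stdio.h>
+  #include <stdlib.h>
+  #include <string.h>
+  #include <sys/mman.h>
+  #include <sys/wait.h>
+  #include <unistd.h>
+
+  #define EINJ "/sys/kernel/debug/apei/einj/"
+
+  static void einj_write(const char *file, const char *val)
+  {
+          char path[128];
+          int fd;
+
+          snprintf(path, sizeof(path), EINJ "%s", file);
+          fd = open(path, O_WRONLY);
+          if (fd < 0 || write(fd, val, strlen(val)) < 0) {
+                  perror(path);
+                  exit(1);
+          }
+          close(fd);
+  }
+
+  static uint64_t virt_to_phys(void *virt)
+  {
+          uint64_t ent, vpfn = (uintptr_t)virt / getpagesize();
+          int fd = open("/proc/self/pagemap", O_RDONLY);
+
+          /* pagemap: 64 bits per page, PFN in bits 0-54, bit 63 = present */
+          if (fd < 0 || pread(fd, &ent, 8, vpfn * 8) != 8 || !(ent >> 63)) {
+                  perror("pagemap");
+                  exit(1);
+          }
+          close(fd);
+          return (ent & ((1ULL << 55) - 1)) * getpagesize();
+  }
+
+  int main(void)
+  {
+          char *page = mmap(NULL, getpagesize(), PROT_READ | PROT_WRITE,
+                            MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
+          char phys[32];
+          int status;
+
+          memset(page, 0x5a, getpagesize());
+          snprintf(phys, sizeof(phys), "0x%llx",
+                   (unsigned long long)virt_to_phys(page));
+
+          /* plant a memory uncorrectable error (type 0x10) at the page's
+           * physical address, but do not trigger it: the CoW copy will
+           * consume it later
+           */
+          einj_write("error_type", "0x10");
+          einj_write("param1", phys);
+          einj_write("param2", "0xfffffffffffff000");
+          einj_write("notrigger", "1");
+          einj_write("error_inject", "1");
+
+          if (fork() == 0) {
+                  page[0] = 1;    /* write fault -> CoW copy hits poison */
+                  _exit(0);       /* not reached if the copy hit poison */
+          }
+          wait(&status);
+          puts(WIFSIGNALED(status) && WTERMSIG(status) == SIGBUS ?
+               "child got SIGBUS (recovered)" : "unexpected child status");
+          return 0;
+  }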
+
+Add a new copy_mc_user_highpage() function that uses copy_mc_to_kernel()
+on architectures where that is available (currently x86 and powerpc).
+When an error is detected during the page copy, return VM_FAULT_HWPOISON
+to the caller of wp_page_copy().  This propagates up the call stack.  Both
+x86 and powerpc have code in their fault handlers to deal with this case
+by sending a SIGBUS to the application.
+
+Note that this patch avoids a system crash and signals the process that
+triggered the copy-on-write action.  It does not take any action for the
+memory error that is still in the shared page.  To handle that a call to
+memory_failure() is needed.  But this cannot be done from wp_page_copy()
+because it holds mmap_lock().  Perhaps the architecture fault handlers
+can deal with this loose end in a subsequent patch?
+
+On Intel/x86 this loose end will often be handled automatically because
+the memory controller provides an additional notification of the h/w
+poison in memory; the handler for this will call memory_failure().  This
+isn't a 100% solution.  If there are multiple errors, not all may be
+logged in this way.
+
+Cc: stable@vger.kernel.org
+[tony.luck@intel.com: add call to kmsan_unpoison_memory(), per Miaohe Lin]
+  Link: https://lkml.kernel.org/r/20221031201029.102123-2-tony.luck@intel.com
+Link: https://lkml.kernel.org/r/20221021200120.175753-1-tony.luck@intel.com
+Link: https://lkml.kernel.org/r/20221021200120.175753-2-tony.luck@intel.com
+Signed-off-by: Tony Luck
+Reviewed-by: Dan Williams
+Reviewed-by: Naoya Horiguchi
+Reviewed-by: Miaohe Lin
+Reviewed-by: Alexander Potapenko
+Tested-by: Shuai Xue
+Cc: Christophe Leroy
+Cc: Matthew Wilcox (Oracle)
+Cc: Michael Ellerman
+Cc: Nicholas Piggin
+Signed-off-by: Andrew Morton
+[ Due to missing commits
+  c89357e27f20d ("mm: support GUP-triggered unsharing of anonymous pages")
+  662ce1dc9caf4 ("delayacct: track delays from write-protect copy")
+  b073d7f8aee4e ("mm: kmsan: maintain KMSAN metadata for page operations")
+  The impact of c89357e27f20d is a name change from cow_user_page() to
+  __wp_page_copy_user().
+  The impact of 662ce1dc9caf4 is the introduction of a new feature of
+  tracking write-protect copy in delayacct.
+  The impact of b073d7f8aee4e is the introduction of the KMSAN feature.
+  None of these commits establishes a meaningful dependency, hence resolve
+  by ignoring them. - jane]
+Signed-off-by: Jane Chu
+Signed-off-by: Greg Kroah-Hartman
+---
+ include/linux/highmem.h |   24 ++++++++++++++++++++++++
+ mm/memory.c             |   31 +++++++++++++++++++++----------
+ 2 files changed, 45 insertions(+), 10 deletions(-)
+
+--- a/include/linux/highmem.h
++++ b/include/linux/highmem.h
+@@ -247,6 +247,30 @@ static inline void copy_user_highpage(st
+ 
+ #endif
+ 
++#ifdef copy_mc_to_kernel
++static inline int copy_mc_user_highpage(struct page *to, struct page *from,
++                                        unsigned long vaddr, struct vm_area_struct *vma)
++{
++        unsigned long ret;
++        char *vfrom, *vto;
++
++        vfrom = kmap_local_page(from);
++        vto = kmap_local_page(to);
++        ret = copy_mc_to_kernel(vto, vfrom, PAGE_SIZE);
++        kunmap_local(vto);
++        kunmap_local(vfrom);
++
++        return ret;
++}
++#else
++static inline int copy_mc_user_highpage(struct page *to, struct page *from,
++                                        unsigned long vaddr, struct vm_area_struct *vma)
++{
++        copy_user_highpage(to, from, vaddr, vma);
++        return 0;
++}
++#endif
++
+ #ifndef __HAVE_ARCH_COPY_HIGHPAGE
+ 
+ static inline void copy_highpage(struct page *to, struct page *from)
+--- a/mm/memory.c
++++ b/mm/memory.c
+@@ -2753,10 +2753,16 @@ static inline int pte_unmap_same(struct
+         return same;
+ }
+ 
+-static inline bool cow_user_page(struct page *dst, struct page *src,
+-                                 struct vm_fault *vmf)
++/*
++ * Return:
++ *      0:              copy succeeded
++ *      -EHWPOISON:     copy failed due to hwpoison in source page
++ *      -EAGAIN:        copy failed (some other reason)
++ */
++static inline int cow_user_page(struct page *dst, struct page *src,
++                                struct vm_fault *vmf)
+ {
+-        bool ret;
++        int ret;
+         void *kaddr;
+         void __user *uaddr;
+         bool locked = false;
+@@ -2765,8 +2771,9 @@ static inline bool cow_user_page(struct
+         unsigned long addr = vmf->address;
+ 
+         if (likely(src)) {
+-                copy_user_highpage(dst, src, addr, vma);
+-                return true;
++                if (copy_mc_user_highpage(dst, src, addr, vma))
++                        return -EHWPOISON;
++                return 0;
+         }
+ 
+         /*
+@@ -2793,7 +2800,7 @@ static inline bool cow_user_page(struct
+          * and update local tlb only
+          */
+         update_mmu_tlb(vma, addr, vmf->pte);
+-        ret = false;
++        ret = -EAGAIN;
+         goto pte_unlock;
+         }
+ 
+@@ -2818,7 +2825,7 @@ static inline bool cow_user_page(struct
+         if (!likely(pte_same(*vmf->pte, vmf->orig_pte))) {
+                 /* The PTE changed under us, update local tlb */
+                 update_mmu_tlb(vma, addr, vmf->pte);
+-                ret = false;
++                ret = -EAGAIN;
+                 goto pte_unlock;
+         }
+ 
+@@ -2837,7 +2844,7 @@ warn:
+                 }
+         }
+ 
+-        ret = true;
++        ret = 0;
+ 
+ pte_unlock:
+         if (locked)
+@@ -3003,6 +3010,7 @@ static vm_fault_t wp_page_copy(struct vm
+         pte_t entry;
+         int page_copied = 0;
+         struct mmu_notifier_range range;
++        int ret;
+ 
+         if (unlikely(anon_vma_prepare(vma)))
+                 goto oom;
+@@ -3018,17 +3026,20 @@ static vm_fault_t wp_page_copy(struct vm
+                 if (!new_page)
+                         goto oom;
+ 
+-                if (!cow_user_page(new_page, old_page, vmf)) {
++                ret = cow_user_page(new_page, old_page, vmf);
++                if (ret) {
+                         /*
+                          * COW failed, if the fault was solved by other,
+                          * it's fine. If not, userspace would re-fault on
+                          * the same address and we will handle the fault
+                          * from the second attempt.
++                         * The -EHWPOISON case will not be retried.
+                          */
+                         put_page(new_page);
+                         if (old_page)
+                                 put_page(old_page);
+-                        return 0;
++
++                        return ret == -EHWPOISON ? VM_FAULT_HWPOISON : 0;
+                 }
+         }
+ 
diff --git a/queue-5.15/mm-hwpoison-when-copy-on-write-hits-poison-take-page-offline.patch b/queue-5.15/mm-hwpoison-when-copy-on-write-hits-poison-take-page-offline.patch
new file mode 100644
index 00000000000..3f60f25e8c1
--- /dev/null
+++ b/queue-5.15/mm-hwpoison-when-copy-on-write-hits-poison-take-page-offline.patch
@@ -0,0 +1,89 @@
+From stable-owner@vger.kernel.org Tue Jun 27 01:04:09 2023
+From: Jane Chu
+Date: Mon, 26 Jun 2023 17:02:19 -0600
+Subject: mm, hwpoison: when copy-on-write hits poison, take page offline
+To: stable@vger.kernel.org
+Cc: tony.luck@intel.com, dan.j.williams@intel.com, naoya.horiguchi@nec.com, linmiaohe@huawei.com, glider@google.com, jane.chu@oracle.com
+Message-ID: <20230626230221.3064291-3-jane.chu@oracle.com>
+
+From: Jane Chu
+
+From: Tony Luck
+
+commit d302c2398ba269e788a4f37ae57c07a7fcabaa42 upstream.
+
+memory_failure() cannot be called directly from the fault handler
+because mmap_lock (and others) are held.
+
+It is important, but not urgent, to mark the source page as h/w poisoned
+and unmap it from other tasks.
+
+Use memory_failure_queue() to request a call to memory_failure() for the
+page with the error.
+
+Also provide a stub version for CONFIG_MEMORY_FAILURE=n.
+
+Cc: stable@vger.kernel.org
+Link: https://lkml.kernel.org/r/20221021200120.175753-3-tony.luck@intel.com
+Signed-off-by: Tony Luck
+Reviewed-by: Miaohe Lin
+Cc: Christophe Leroy
+Cc: Dan Williams
+Cc: Matthew Wilcox (Oracle)
+Cc: Michael Ellerman
+Cc: Naoya Horiguchi
+Cc: Nicholas Piggin
+Cc: Shuai Xue
+Signed-off-by: Andrew Morton
+[ Due to missing commits
+  e591ef7d96d6e ("mm,hwpoison,hugetlb,memory_hotplug: hotremove memory section with hwpoisoned hugepage")
+  5033091de814a ("mm/hwpoison: introduce per-memory_block hwpoison counter")
+  The impact of e591ef7d96d6e is its introduction of an additional flag in
+  __get_huge_page_for_hwpoison() that serves as an indication that a
+  hwpoisoned hugetlb page should have its migratable bit cleared.
+  The impact of 5033091de814a is contextual.
+  Resolve by ignoring both missing commits. - jane]
+Signed-off-by: Jane Chu
+Signed-off-by: Greg Kroah-Hartman
+---
+ include/linux/mm.h |    5 ++++-
+ mm/memory.c        |    4 +++-
+ 2 files changed, 7 insertions(+), 2 deletions(-)
+
+--- a/include/linux/mm.h
++++ b/include/linux/mm.h
+@@ -3124,7 +3124,6 @@ enum mf_flags {
+         MF_SOFT_OFFLINE = 1 << 3,
+ };
+ extern int memory_failure(unsigned long pfn, int flags);
+-extern void memory_failure_queue(unsigned long pfn, int flags);
+ extern void memory_failure_queue_kick(int cpu);
+ extern int unpoison_memory(unsigned long pfn);
+ extern int sysctl_memory_failure_early_kill;
+@@ -3133,8 +3132,12 @@ extern void shake_page(struct page *p);
+ extern atomic_long_t num_poisoned_pages __read_mostly;
+ extern int soft_offline_page(unsigned long pfn, int flags);
+ #ifdef CONFIG_MEMORY_FAILURE
++extern void memory_failure_queue(unsigned long pfn, int flags);
+ extern int __get_huge_page_for_hwpoison(unsigned long pfn, int flags);
+ #else
++static inline void memory_failure_queue(unsigned long pfn, int flags)
++{
++}
+ static inline int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
+ {
+         return 0;
+--- a/mm/memory.c
++++ b/mm/memory.c
+@@ -2771,8 +2771,10 @@ static inline int cow_user_page(struct p
+         unsigned long addr = vmf->address;
+ 
+         if (likely(src)) {
+-                if (copy_mc_user_highpage(dst, src, addr, vma))
++                if (copy_mc_user_highpage(dst, src, addr, vma)) {
++                        memory_failure_queue(page_to_pfn(src), 0);
+                         return -EHWPOISON;
++                }
+                 return 0;
+         }
+ 
diff --git a/queue-5.15/mptcp-consolidate-fallback-and-non-fallback-state-machine.patch b/queue-5.15/mptcp-consolidate-fallback-and-non-fallback-state-machine.patch
new file mode 100644
index 00000000000..6fd5e679fb2
--- /dev/null
+++ b/queue-5.15/mptcp-consolidate-fallback-and-non-fallback-state-machine.patch
@@ -0,0 +1,203 @@
+From 81c1d029016001f994ce1c46849c5e9900d8eab8 Mon Sep 17 00:00:00 2001
+From: Paolo Abeni
+Date: Tue, 20 Jun 2023 18:24:21 +0200
+Subject: mptcp: consolidate fallback and non fallback state machine
+
+From: Paolo Abeni
+
+commit 81c1d029016001f994ce1c46849c5e9900d8eab8 upstream.
+
+An orphaned msk releases the used resources via the worker,
+when the latter first sees the msk in CLOSED status.
+
+If the msk status transitions to TCP_CLOSE in the release callback
+invoked by the worker's final release_sock(), such an instance of the
+workqueue will not take any action.
+
+Additionally the MPTCP code prevents scheduling the worker once the
+socket reaches the CLOSE status: such msk resources will be leaked.
+
+The only code path that can trigger the above scenario is
+__mptcp_check_send_data_fin() in fallback mode.
+
+Address the issue by removing the special handling of fallback sockets
+in __mptcp_check_send_data_fin(), consolidating the state machine
+for fallback and non-fallback sockets.
+
+Since fallback sockets do not send and do not receive data_fin,
+the mptcp code can update the msk internal status to match the next
+step in the SM every time a data fin (ack) should be generated or
+received.
+
+As a consequence we can remove a bunch of checks for fallback from
+the fastpath.
+
+Fixes: 6e628cd3a8f7 ("mptcp: use mptcp release_cb for delayed tasks")
+Cc: stable@vger.kernel.org
+Signed-off-by: Paolo Abeni
+Reviewed-by: Mat Martineau
+Signed-off-by: Matthieu Baerts
+Signed-off-by: Jakub Kicinski
+Signed-off-by: Greg Kroah-Hartman
+---
+ net/mptcp/protocol.c |   39 +++++++++++++++------------------------
+ net/mptcp/subflow.c  |   17 ++++++++++-------
+ 2 files changed, 25 insertions(+), 31 deletions(-)
+
+--- a/net/mptcp/protocol.c
++++ b/net/mptcp/protocol.c
+@@ -51,7 +51,7 @@ enum {
+ static struct percpu_counter mptcp_sockets_allocated;
+ 
+ static void __mptcp_destroy_sock(struct sock *sk);
+-static void __mptcp_check_send_data_fin(struct sock *sk);
++static void mptcp_check_send_data_fin(struct sock *sk);
+ 
+ DEFINE_PER_CPU(struct mptcp_delegated_action, mptcp_delegated_actions);
+ static struct net_device mptcp_napi_dev;
+@@ -355,8 +355,7 @@ static bool mptcp_pending_data_fin_ack(s
+ {
+         struct mptcp_sock *msk = mptcp_sk(sk);
+ 
+-        return !__mptcp_check_fallback(msk) &&
+-               ((1 << sk->sk_state) &
++        return ((1 << sk->sk_state) &
+                 (TCPF_FIN_WAIT1 | TCPF_CLOSING | TCPF_LAST_ACK)) &&
+                msk->write_seq == READ_ONCE(msk->snd_una);
+ }
+@@ -509,9 +508,6 @@ static bool mptcp_check_data_fin(struct
+         u64 rcv_data_fin_seq;
+         bool ret = false;
+ 
+-        if (__mptcp_check_fallback(msk))
+-                return ret;
+-
+         /* Need to ack a DATA_FIN received from a peer while this side
+          * of the connection is in ESTABLISHED, FIN_WAIT1, or FIN_WAIT2.
+          * msk->rcv_data_fin was set when parsing the incoming options
+@@ -549,7 +545,8 @@ static bool mptcp_check_data_fin(struct
+         }
+ 
+         ret = true;
+-        mptcp_send_ack(msk);
++        if (!__mptcp_check_fallback(msk))
++                mptcp_send_ack(msk);
+         mptcp_close_wake_up(sk);
+         }
+         return ret;
+@@ -1612,7 +1609,7 @@ out:
+         if (!mptcp_timer_pending(sk))
+                 mptcp_reset_timer(sk);
+         if (copied)
+-                __mptcp_check_send_data_fin(sk);
++                mptcp_check_send_data_fin(sk);
+ }
+ 
+ static void __mptcp_subflow_push_pending(struct sock *sk, struct sock *ssk)
+@@ -2451,7 +2448,6 @@ static void mptcp_worker(struct work_str
+         if (unlikely((1 << state) & (TCPF_CLOSE | TCPF_LISTEN)))
+                 goto unlock;
+ 
+-        mptcp_check_data_fin_ack(sk);
+         mptcp_flush_join_list(msk);
+ 
+         mptcp_check_fastclose(msk);
+@@ -2462,7 +2458,8 @@ static void mptcp_worker(struct work_str
+         if (test_and_clear_bit(MPTCP_WORK_EOF, &msk->flags))
+                 mptcp_check_for_eof(msk);
+ 
+-        __mptcp_check_send_data_fin(sk);
++        mptcp_check_send_data_fin(sk);
++        mptcp_check_data_fin_ack(sk);
+         mptcp_check_data_fin(sk);
+ 
+         /* There is no point in keeping around an orphaned sk timedout or
+@@ -2591,6 +2588,12 @@ void mptcp_subflow_shutdown(struct sock
+                 pr_debug("Fallback");
+                 ssk->sk_shutdown |= how;
+                 tcp_shutdown(ssk, how);
++
++                /* simulate the data_fin ack reception to let the state
++                 * machine move forward
++                 */
++                WRITE_ONCE(mptcp_sk(sk)->snd_una, mptcp_sk(sk)->snd_nxt);
++                mptcp_schedule_work(sk);
+         } else {
+                 pr_debug("Sending DATA_FIN on subflow %p", ssk);
+                 tcp_send_ack(ssk);
+@@ -2630,7 +2633,7 @@ static int mptcp_close_state(struct sock
+         return next & TCP_ACTION_FIN;
+ }
+ 
+-static void __mptcp_check_send_data_fin(struct sock *sk)
++static void mptcp_check_send_data_fin(struct sock *sk)
+ {
+         struct mptcp_subflow_context *subflow;
+         struct mptcp_sock *msk = mptcp_sk(sk);
+@@ -2648,18 +2651,6 @@ static void __mptcp_check_send_data_fin(
+ 
+         WRITE_ONCE(msk->snd_nxt, msk->write_seq);
+ 
+-        /* fallback socket will not get data_fin/ack, can move to the next
+-         * state now
+-         */
+-        if (__mptcp_check_fallback(msk)) {
+-                if ((1 << sk->sk_state) & (TCPF_CLOSING | TCPF_LAST_ACK)) {
+-                        inet_sk_state_store(sk, TCP_CLOSE);
+-                        mptcp_close_wake_up(sk);
+-                } else if (sk->sk_state == TCP_FIN_WAIT1) {
+-                        inet_sk_state_store(sk, TCP_FIN_WAIT2);
+-                }
+-        }
+-
+         mptcp_flush_join_list(msk);
+         mptcp_for_each_subflow(msk, subflow) {
+                 struct sock *tcp_sk = mptcp_subflow_tcp_sock(subflow);
+@@ -2680,7 +2671,7 @@ static void __mptcp_wr_shutdown(struct s
+         WRITE_ONCE(msk->write_seq, msk->write_seq + 1);
+         WRITE_ONCE(msk->snd_data_fin_enable, 1);
+ 
+-        __mptcp_check_send_data_fin(sk);
++        mptcp_check_send_data_fin(sk);
+ }
+ 
+ static void __mptcp_destroy_sock(struct sock *sk)
+--- a/net/mptcp/subflow.c
++++ b/net/mptcp/subflow.c
+@@ -1653,14 +1653,16 @@ static void subflow_state_change(struct
+ {
+         struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
+         struct sock *parent = subflow->conn;
++        struct mptcp_sock *msk;
+ 
+         __subflow_state_change(sk);
+ 
++        msk = mptcp_sk(parent);
+         if (subflow_simultaneous_connect(sk)) {
+                 mptcp_propagate_sndbuf(parent, sk);
+                 mptcp_do_fallback(sk);
+-                mptcp_rcv_space_init(mptcp_sk(parent), sk);
+-                pr_fallback(mptcp_sk(parent));
++                mptcp_rcv_space_init(msk, sk);
++                pr_fallback(msk);
+                 subflow->conn_finished = 1;
+                 mptcp_set_connected(parent);
+         }
+@@ -1676,11 +1678,12 @@ static void subflow_state_change(struct
+ 
+         subflow_sched_work_if_closed(mptcp_sk(parent), sk);
+ 
+-        if (__mptcp_check_fallback(mptcp_sk(parent)) &&
+-            !subflow->rx_eof && subflow_is_done(sk)) {
+-                subflow->rx_eof = 1;
+-                mptcp_subflow_eof(parent);
+-        }
++        /* when the fallback subflow closes the rx side, trigger a 'dummy'
++         * ingress data fin, so that the msk state will follow along
++         */
++        if (__mptcp_check_fallback(msk) && subflow_is_done(sk) && msk->first == sk &&
++            mptcp_update_rcv_data_fin(msk, READ_ONCE(msk->ack_seq), true))
++                mptcp_schedule_work(parent);
+ }
+ 
+ static int subflow_ulp_init(struct sock *sk)
diff --git a/queue-5.15/mptcp-fix-possible-divide-by-zero-in-recvmsg.patch b/queue-5.15/mptcp-fix-possible-divide-by-zero-in-recvmsg.patch
new file mode 100644
index 00000000000..08a6b326807
--- /dev/null
+++ b/queue-5.15/mptcp-fix-possible-divide-by-zero-in-recvmsg.patch
@@ -0,0 +1,95 @@
+From 0ad529d9fd2bfa3fc619552a8d2fb2f2ef0bce2e Mon Sep 17 00:00:00 2001
+From: Paolo Abeni
+Date: Tue, 20 Jun 2023 18:24:19 +0200
+Subject: mptcp: fix possible divide by zero in recvmsg()
+
+From: Paolo Abeni
+
+commit 0ad529d9fd2bfa3fc619552a8d2fb2f2ef0bce2e upstream.
+
+Christoph reported a divide by zero bug in mptcp_recvmsg():
+
+divide error: 0000 [#1] PREEMPT SMP
+CPU: 1 PID: 19978 Comm: syz-executor.6 Not tainted 6.4.0-rc2-gffcc7899081b #20
+Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
+RIP: 0010:__tcp_select_window+0x30e/0x420 net/ipv4/tcp_output.c:3018
+Code: 11 ff 0f b7 cd c1 e9 0c b8 ff ff ff ff d3 e0 89 c1 f7 d1 01 cb 21 c3 eb 17 e8 2e 83 11 ff 31 db eb 0e e8 25 83 11 ff 89 d8 99 <f7> 7c 24 04 29 d3 65 48 8b 04 25 28 00 00 00 48 3b 44 24 10 75 60
+RSP: 0018:ffffc90000a07a18 EFLAGS: 00010246
+RAX: 000000000000ffd7 RBX: 000000000000ffd7 RCX: 0000000000040000
+RDX: 0000000000000000 RSI: 000000000003ffff RDI: 0000000000040000
+RBP: 000000000000ffd7 R08: ffffffff820cf297 R09: 0000000000000001
+R10: 0000000000000000 R11: ffffffff8103d1a0 R12: 0000000000003f00
+R13: 0000000000300000 R14: ffff888101cf3540 R15: 0000000000180000
+FS:  00007f9af4c09640(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000
+CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
+CR2: 0000001b33824000 CR3: 000000012f241001 CR4: 0000000000170ee0
+Call Trace:
+ <TASK>
+ __tcp_cleanup_rbuf+0x138/0x1d0 net/ipv4/tcp.c:1611
+ mptcp_recvmsg+0xcb8/0xdd0 net/mptcp/protocol.c:2034
+ inet_recvmsg+0x127/0x1f0 net/ipv4/af_inet.c:861
+ ____sys_recvmsg+0x269/0x2b0 net/socket.c:1019
+ ___sys_recvmsg+0xe6/0x260 net/socket.c:2764
+ do_recvmmsg+0x1a5/0x470 net/socket.c:2858
+ __do_sys_recvmmsg net/socket.c:2937 [inline]
+ __se_sys_recvmmsg net/socket.c:2953 [inline]
+ __x64_sys_recvmmsg+0xa6/0x130 net/socket.c:2953
+ do_syscall_x64 arch/x86/entry/common.c:50 [inline]
+ do_syscall_64+0x47/0xa0 arch/x86/entry/common.c:80
+ entry_SYSCALL_64_after_hwframe+0x72/0xdc
+RIP: 0033:0x7f9af58fc6a9
+Code: 5c c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 4f 37 0d 00 f7 d8 64 89 01 48
+RSP: 002b:00007f9af4c08cd8 EFLAGS: 00000246 ORIG_RAX: 000000000000012b
+RAX: ffffffffffffffda RBX: 00000000006bc050 RCX: 00007f9af58fc6a9
+RDX: 0000000000000001 RSI: 0000000020000140 RDI: 0000000000000004
+RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
+R10: 0000000000000f00 R11: 0000000000000246 R12: 00000000006bc05c
+R13: fffffffffffffea8 R14: 00000000006bc050 R15: 000000000001fe40
+ </TASK>
+
+mptcp_recvmsg() is allowed to release the msk socket lock when
+blocking, and before re-acquiring it another thread could have
+switched the sock to TCP_LISTEN status - with a prior
+connect(AF_UNSPEC) - also clearing icsk_ack.rcv_mss.
+
+Address the issue by preventing the disconnect if some other process is
+concurrently performing a blocking syscall on the same socket, as in
+commit 4faeee0cf8a5 ("tcp: deny tcp_disconnect() when threads are waiting").
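+
+The user-visible effect of the fix can be sketched with two racing
+threads (illustrative only: it assumes a kernel with CONFIG_MPTCP and
+net.mptcp.enabled=1, omits most error handling, and builds with
+-pthread):
+
+  #include <netinet/in.h>
+  #include <pthread.h>
+  #include <stdio.h>
+  #include <sys/socket.h>
+  #include <unistd.h>
+
+  #ifndef IPPROTO_MPTCP
+  #define IPPROTO_MPTCP 262
+  #endif
+
+  static void *receiver(void *arg)
+  {
+          char buf[4096];
+
+          /* blocks waiting for data that never arrives; mptcp_recvmsg()
+           * drops the msk socket lock while sleeping here
+           */
+          recv(*(int *)arg, buf, sizeof(buf), 0);
+          return NULL;
+  }
+
+  int main(void)
+  {
+          struct sockaddr_in addr = { .sin_family = AF_INET,
+                                      .sin_addr.s_addr = htonl(INADDR_LOOPBACK) };
+          struct sockaddr sa = { .sa_family = AF_UNSPEC };
+          socklen_t alen = sizeof(addr);
+          int lfd, cfd;
+          pthread_t t;
+
+          /* loopback MPTCP connection that never carries any data */
+          lfd = socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP);
+          cfd = socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP);
+          bind(lfd, (struct sockaddr *)&addr, sizeof(addr));
+          listen(lfd, 1);
+          getsockname(lfd, (struct sockaddr *)&addr, &alen);
+          connect(cfd, (struct sockaddr *)&addr, sizeof(addr));
+
+          pthread_create(&t, NULL, receiver, &cfd);
+          sleep(1);       /* let the receiver block in recvmsg() */
+
+          /* connect(AF_UNSPEC) == disconnect.  Before the fix this could
+           * flip the socket state (clearing icsk_ack.rcv_mss) underneath
+           * the sleeping receiver; with the fix it is denied with EBUSY
+           * while a thread is still waiting.
+           */
+          if (connect(cfd, &sa, sizeof(sa)) < 0)
+                  perror("disconnect");
+
+          shutdown(cfd, SHUT_RDWR);       /* wake the receiver up */
+          pthread_join(t, NULL);
+          return 0;
+  }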
+
+Fixes: a6b118febbab ("mptcp: add receive buffer auto-tuning")
+Cc: stable@vger.kernel.org
+Reported-by: Christoph Paasch
+Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/404
+Signed-off-by: Paolo Abeni
+Tested-by: Christoph Paasch
+Reviewed-by: Matthieu Baerts
+Signed-off-by: Matthieu Baerts
+Signed-off-by: Jakub Kicinski
+Signed-off-by: Greg Kroah-Hartman
+---
+ net/mptcp/protocol.c |    7 +++++++
+ 1 file changed, 7 insertions(+)
+
+--- a/net/mptcp/protocol.c
++++ b/net/mptcp/protocol.c
+@@ -2807,6 +2807,12 @@ static int mptcp_disconnect(struct sock
+         struct mptcp_subflow_context *subflow;
+         struct mptcp_sock *msk = mptcp_sk(sk);
+ 
++        /* Deny disconnect if other threads are blocked in sk_wait_event()
++         * or inet_wait_for_connect().
++         */
++        if (sk->sk_wait_pending)
++                return -EBUSY;
++
+         mptcp_do_flush_join_list(msk);
+ 
+         mptcp_for_each_subflow(msk, subflow) {
+@@ -2845,6 +2851,7 @@ struct sock *mptcp_sk_clone(const struct
+         inet_sk(nsk)->pinet6 = mptcp_inet6_sk(nsk);
+ #endif
+ 
++        nsk->sk_wait_pending = 0;
+         __mptcp_init_sock(nsk);
+ 
+         msk = mptcp_sk(nsk);
diff --git a/queue-5.15/series b/queue-5.15/series
new file mode 100644
index 00000000000..f74c86f7040
--- /dev/null
+++ b/queue-5.15/series
@@ -0,0 +1,4 @@
+mptcp-fix-possible-divide-by-zero-in-recvmsg.patch
+mptcp-consolidate-fallback-and-non-fallback-state-machine.patch
+mm-hwpoison-try-to-recover-from-copy-on-write-faults.patch
+mm-hwpoison-when-copy-on-write-hits-poison-take-page-offline.patch