From: Sasha Levin
Date: Sun, 2 Mar 2025 14:46:03 +0000 (-0500)
Subject: Fixes for 6.1
X-Git-Tag: v6.6.81~37
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=48b37eda243630611ed0808a962da5d6b7c27bb6;p=thirdparty%2Fkernel%2Fstable-queue.git

Fixes for 6.1

Signed-off-by: Sasha Levin
---
diff --git a/queue-6.1/io_uring-net-save-msg_control-for-compat.patch b/queue-6.1/io_uring-net-save-msg_control-for-compat.patch
new file mode 100644
index 0000000000..1e41ccd9ab
--- /dev/null
+++ b/queue-6.1/io_uring-net-save-msg_control-for-compat.patch
@@ -0,0 +1,39 @@
+From bca0df5bb0beef50d9fdd1b7e0820e489c5a5ca9 Mon Sep 17 00:00:00 2001
+From: Sasha Levin
+Date: Tue, 25 Feb 2025 15:59:02 +0000
+Subject: io_uring/net: save msg_control for compat
+
+From: Pavel Begunkov
+
+[ Upstream commit 6ebf05189dfc6d0d597c99a6448a4d1064439a18 ]
+
+Match the compat part of io_sendmsg_copy_hdr() with its counterpart and
+save msg_control.
+
+Fixes: c55978024d123 ("io_uring/net: move receive multishot out of the generic msghdr path")
+Signed-off-by: Pavel Begunkov
+Link: https://lore.kernel.org/r/2a8418821fe83d3b64350ad2b3c0303e9b732bbd.1740498502.git.asml.silence@gmail.com
+Signed-off-by: Jens Axboe
+Signed-off-by: Sasha Levin
+---
+ io_uring/net.c | 4 +++-
+ 1 file changed, 3 insertions(+), 1 deletion(-)
+
+diff --git a/io_uring/net.c b/io_uring/net.c
+index dc7c1e44ec47b..d56e8a47e50f2 100644
+--- a/io_uring/net.c
++++ b/io_uring/net.c
+@@ -282,7 +282,9 @@ static int io_sendmsg_copy_hdr(struct io_kiocb *req,
+ 		if (unlikely(ret))
+ 			return ret;
+ 
+-		return __get_compat_msghdr(&iomsg->msg, &cmsg, NULL);
++		ret = __get_compat_msghdr(&iomsg->msg, &cmsg, NULL);
++		sr->msg_control = iomsg->msg.msg_control_user;
++		return ret;
+ 	}
+ #endif
+ 
+-- 
+2.39.5
+
diff --git a/queue-6.1/mm-don-t-pin-zero_page-in-pin_user_pages.patch b/queue-6.1/mm-don-t-pin-zero_page-in-pin_user_pages.patch
new file mode 100644
index 0000000000..c116e34580
--- /dev/null
+++ b/queue-6.1/mm-don-t-pin-zero_page-in-pin_user_pages.patch
@@ -0,0 +1,204 @@
+From a8599aaa353716a3bbc9c9054116540cc63836e0 Mon Sep 17 00:00:00 2001
+From: Sasha Levin
+Date: Fri, 26 May 2023 22:41:40 +0100
+Subject: mm: Don't pin ZERO_PAGE in pin_user_pages()
+
+From: David Howells
+
+[ Upstream commit c8070b78751955e59b42457b974bea4a4fe00187 ]
+
+Make pin_user_pages*() leave a ZERO_PAGE unpinned if it extracts a pointer
+to it from the page tables and make unpin_user_page*() correspondingly
+ignore a ZERO_PAGE when unpinning. We don't want to risk overrunning a
+zero page's refcount as we're only allowed ~2 million pins on it -
+something that userspace can conceivably trigger.
+
+Add a pair of functions to test whether a page or a folio is a ZERO_PAGE.
+
+Signed-off-by: David Howells
+cc: Christoph Hellwig
+cc: David Hildenbrand
+cc: Lorenzo Stoakes
+cc: Andrew Morton
+cc: Jens Axboe
+cc: Al Viro
+cc: Matthew Wilcox
+cc: Jan Kara
+cc: Jeff Layton
+cc: Jason Gunthorpe
+cc: Logan Gunthorpe
+cc: Hillf Danton
+cc: Christian Brauner
+cc: Linus Torvalds
+cc: linux-fsdevel@vger.kernel.org
+cc: linux-block@vger.kernel.org
+cc: linux-kernel@vger.kernel.org
+cc: linux-mm@kvack.org
+Reviewed-by: Lorenzo Stoakes
+Reviewed-by: Christoph Hellwig
+Acked-by: David Hildenbrand
+Link: https://lore.kernel.org/r/20230526214142.958751-2-dhowells@redhat.com
+Signed-off-by: Jens Axboe
+Stable-dep-of: bddf10d26e6e ("uprobes: Reject the shared zeropage in uprobe_write_opcode()")
+Signed-off-by: Sasha Levin
+---
+ Documentation/core-api/pin_user_pages.rst | 6 +++++
+ include/linux/mm.h | 26 +++++++++++++++++--
+ mm/gup.c | 31 ++++++++++++++++++++++-
+ 3 files changed, 60 insertions(+), 3 deletions(-)
+
+diff --git a/Documentation/core-api/pin_user_pages.rst b/Documentation/core-api/pin_user_pages.rst
+index b18416f4500fe..7995ce2b9676a 100644
+--- a/Documentation/core-api/pin_user_pages.rst
++++ b/Documentation/core-api/pin_user_pages.rst
+@@ -113,6 +113,12 @@ pages:
+ This also leads to limitations: there are only 31-10==21 bits available for a
+ counter that increments 10 bits at a time.
+ 
++* Because of that limitation, special handling is applied to the zero pages
++  when using FOLL_PIN. We only pretend to pin a zero page - we don't alter its
++  refcount or pincount at all (it is permanent, so there's no need). The
++  unpinning functions also don't do anything to a zero page. This is
++  transparent to the caller.
++
+ * Callers must specifically request "dma-pinned tracking of pages". In other
+   words, just calling get_user_pages() will not suffice; a new set of functions,
+   pin_user_page() and related, must be used.
+diff --git a/include/linux/mm.h b/include/linux/mm.h
+index 971186f0b7b07..03357c196e0ba 100644
+--- a/include/linux/mm.h
++++ b/include/linux/mm.h
+@@ -1610,6 +1610,28 @@ static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma,
+ 	return page_maybe_dma_pinned(page);
+ }
+ 
++/**
++ * is_zero_page - Query if a page is a zero page
++ * @page: The page to query
++ *
++ * This returns true if @page is one of the permanent zero pages.
++ */
++static inline bool is_zero_page(const struct page *page)
++{
++	return is_zero_pfn(page_to_pfn(page));
++}
++
++/**
++ * is_zero_folio - Query if a folio is a zero page
++ * @folio: The folio to query
++ *
++ * This returns true if @folio is one of the permanent zero pages.
++ */
++static inline bool is_zero_folio(const struct folio *folio)
++{
++	return is_zero_page(&folio->page);
++}
++
+ /* MIGRATE_CMA and ZONE_MOVABLE do not allow pin pages */
+ #ifdef CONFIG_MIGRATION
+ static inline bool is_longterm_pinnable_page(struct page *page)
+@@ -1620,8 +1642,8 @@ static inline bool is_longterm_pinnable_page(struct page *page)
+ 	if (mt == MIGRATE_CMA || mt == MIGRATE_ISOLATE)
+ 		return false;
+ #endif
+-	/* The zero page may always be pinned */
+-	if (is_zero_pfn(page_to_pfn(page)))
++	/* The zero page can be "pinned" but gets special handling. */
++	if (is_zero_page(page))
+ 		return true;
+ 
+ 	/* Coherent device memory must always allow eviction. */
+diff --git a/mm/gup.c b/mm/gup.c
+index e31d00443c4e6..b1daaa9d89aab 100644
+--- a/mm/gup.c
++++ b/mm/gup.c
+@@ -51,7 +51,8 @@ static inline void sanity_check_pinned_pages(struct page **pages,
+ 		struct page *page = *pages;
+ 		struct folio *folio = page_folio(page);
+ 
+-		if (!folio_test_anon(folio))
++		if (is_zero_page(page) ||
++		    !folio_test_anon(folio))
+ 			continue;
+ 		if (!folio_test_large(folio) || folio_test_hugetlb(folio))
+ 			VM_BUG_ON_PAGE(!PageAnonExclusive(&folio->page), page);
+@@ -128,6 +129,13 @@ struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
+ 	else if (flags & FOLL_PIN) {
+ 		struct folio *folio;
+ 
++		/*
++		 * Don't take a pin on the zero page - it's not going anywhere
++		 * and it is used in a *lot* of places.
++		 */
++		if (is_zero_page(page))
++			return page_folio(page);
++
+ 		/*
+ 		 * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a
+ 		 * right zone, so fail and let the caller fall back to the slow
+@@ -177,6 +185,8 @@ struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
+ static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
+ {
+ 	if (flags & FOLL_PIN) {
++		if (is_zero_folio(folio))
++			return;
+ 		node_stat_mod_folio(folio, NR_FOLL_PIN_RELEASED, refs);
+ 		if (folio_test_large(folio))
+ 			atomic_sub(refs, folio_pincount_ptr(folio));
+@@ -217,6 +227,13 @@ bool __must_check try_grab_page(struct page *page, unsigned int flags)
+ 	if (flags & FOLL_GET)
+ 		folio_ref_inc(folio);
+ 	else if (flags & FOLL_PIN) {
++		/*
++		 * Don't take a pin on the zero page - it's not going anywhere
++		 * and it is used in a *lot* of places.
++		 */
++		if (is_zero_page(page))
++			return 0;
++
+ 		/*
+ 		 * Similar to try_grab_folio(): be sure to *also*
+ 		 * increment the normal page refcount field at least once,
+@@ -3149,6 +3166,9 @@ EXPORT_SYMBOL_GPL(get_user_pages_fast);
+  *
+  * FOLL_PIN means that the pages must be released via unpin_user_page(). Please
+  * see Documentation/core-api/pin_user_pages.rst for further details.
++ *
++ * Note that if a zero_page is amongst the returned pages, it will not have
++ * pins in it and unpin_user_page() will not remove pins from it.
+  */
+ int pin_user_pages_fast(unsigned long start, int nr_pages,
+ 			unsigned int gup_flags, struct page **pages)
+@@ -3225,6 +3245,9 @@ EXPORT_SYMBOL_GPL(pin_user_pages_fast_only);
+  *
+  * FOLL_PIN means that the pages must be released via unpin_user_page(). Please
+  * see Documentation/core-api/pin_user_pages.rst for details.
++ *
++ * Note that if a zero_page is amongst the returned pages, it will not have
++ * pins in it and unpin_user_page*() will not remove pins from it.
+  */
+ long pin_user_pages_remote(struct mm_struct *mm,
+ 			   unsigned long start, unsigned long nr_pages,
+@@ -3260,6 +3283,9 @@ EXPORT_SYMBOL(pin_user_pages_remote);
+  *
+  * FOLL_PIN means that the pages must be released via unpin_user_page(). Please
+  * see Documentation/core-api/pin_user_pages.rst for details.
++ *
++ * Note that if a zero_page is amongst the returned pages, it will not have
++ * pins in it and unpin_user_page*() will not remove pins from it.
+  */
+ long pin_user_pages(unsigned long start, unsigned long nr_pages,
+ 		    unsigned int gup_flags, struct page **pages,
+@@ -3282,6 +3308,9 @@ EXPORT_SYMBOL(pin_user_pages);
+  * pin_user_pages_unlocked() is the FOLL_PIN variant of
+  * get_user_pages_unlocked(). Behavior is the same, except that this one sets
+  * FOLL_PIN and rejects FOLL_GET.
++ *
++ * Note that if a zero_page is amongst the returned pages, it will not have
++ * pins in it and unpin_user_page*() will not remove pins from it.
+  */
+ long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
+ 			     struct page **pages, unsigned int gup_flags)
+-- 
+2.39.5
+
diff --git a/queue-6.1/series b/queue-6.1/series
index f4af24f3e2..f20981853d 100644
--- a/queue-6.1/series
+++ b/queue-6.1/series
@@ -140,3 +140,7 @@ net-ipv6-seg6_iptunnel-mitigate-2-realloc-issue.patch
 net-ipv6-fix-dst-ref-loop-on-input-in-seg6-lwt.patch
 net-ipv6-rpl_iptunnel-mitigate-2-realloc-issue.patch
 net-ipv6-fix-dst-ref-loop-on-input-in-rpl-lwt.patch
+mm-don-t-pin-zero_page-in-pin_user_pages.patch
+uprobes-reject-the-shared-zeropage-in-uprobe_write_o.patch
+io_uring-net-save-msg_control-for-compat.patch
+x86-cpu-fix-warm-boot-hang-regression-on-amd-sc1100-.patch
diff --git a/queue-6.1/uprobes-reject-the-shared-zeropage-in-uprobe_write_o.patch b/queue-6.1/uprobes-reject-the-shared-zeropage-in-uprobe_write_o.patch
new file mode 100644
index 0000000000..d8a2db9f7b
--- /dev/null
+++ b/queue-6.1/uprobes-reject-the-shared-zeropage-in-uprobe_write_o.patch
@@ -0,0 +1,112 @@
+From ae46ae421edc0622f79130dc5759799b410ce276 Mon Sep 17 00:00:00 2001
+From: Sasha Levin
+Date: Mon, 24 Feb 2025 11:11:49 +0800
+Subject: uprobes: Reject the shared zeropage in uprobe_write_opcode()
+
+From: Tong Tiangen
+
+[ Upstream commit bddf10d26e6e5114e7415a0e442ec6f51a559468 ]
+
+We triggered the following crash in syzkaller tests:
+
+  BUG: Bad page state in process syz.7.38 pfn:1eff3
+  page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1eff3
+  flags: 0x3fffff00004004(referenced|reserved|node=0|zone=1|lastcpupid=0x1fffff)
+  raw: 003fffff00004004 ffffe6c6c07bfcc8 ffffe6c6c07bfcc8 0000000000000000
+  raw: 0000000000000000 0000000000000000 00000000fffffffe 0000000000000000
+  page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
+  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
+  Call Trace:
+
+   dump_stack_lvl+0x32/0x50
+   bad_page+0x69/0xf0
+   free_unref_page_prepare+0x401/0x500
+   free_unref_page+0x6d/0x1b0
+   uprobe_write_opcode+0x460/0x8e0
+   install_breakpoint.part.0+0x51/0x80
+   register_for_each_vma+0x1d9/0x2b0
+   __uprobe_register+0x245/0x300
+   bpf_uprobe_multi_link_attach+0x29b/0x4f0
+   link_create+0x1e2/0x280
+   __sys_bpf+0x75f/0xac0
+   __x64_sys_bpf+0x1a/0x30
+   do_syscall_64+0x56/0x100
+   entry_SYSCALL_64_after_hwframe+0x78/0xe2
+
+  BUG: Bad rss-counter state mm:00000000452453e0 type:MM_FILEPAGES val:-1
+
+The following syzkaller test case can be used to reproduce:
+
+  r2 = creat(&(0x7f0000000000)='./file0\x00', 0x8)
+  write$nbd(r2, &(0x7f0000000580)=ANY=[], 0x10)
+  r4 = openat(0xffffffffffffff9c, &(0x7f0000000040)='./file0\x00', 0x42, 0x0)
+  mmap$IORING_OFF_SQ_RING(&(0x7f0000ffd000/0x3000)=nil, 0x3000, 0x0, 0x12, r4, 0x0)
+  r5 = userfaultfd(0x80801)
+  ioctl$UFFDIO_API(r5, 0xc018aa3f, &(0x7f0000000040)={0xaa, 0x20})
+  r6 = userfaultfd(0x80801)
+  ioctl$UFFDIO_API(r6, 0xc018aa3f, &(0x7f0000000140))
+  ioctl$UFFDIO_REGISTER(r6, 0xc020aa00, &(0x7f0000000100)={{&(0x7f0000ffc000/0x4000)=nil, 0x4000}, 0x2})
+  ioctl$UFFDIO_ZEROPAGE(r5, 0xc020aa04, &(0x7f0000000000)={{&(0x7f0000ffd000/0x1000)=nil, 0x1000}})
+  r7 = bpf$PROG_LOAD(0x5, &(0x7f0000000140)={0x2, 0x3, &(0x7f0000000200)=ANY=[@ANYBLOB="1800000000120000000000000000000095"], &(0x7f0000000000)='GPL\x00', 0x7, 0x0, 0x0, 0x0, 0x0, '\x00', 0x0, @fallback=0x30, 0xffffffffffffffff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x10, 0x0, @void, @value}, 0x94)
+  bpf$BPF_LINK_CREATE_XDP(0x1c, &(0x7f0000000040)={r7, 0x0, 0x30, 0x1e, @val=@uprobe_multi={&(0x7f0000000080)='./file0\x00', &(0x7f0000000100)=[0x2], 0x0, 0x0, 0x1}}, 0x40)
+
+The cause is that zero pfn is set to the PTE without increasing the RSS
+count in mfill_atomic_pte_zeropage() and the refcount of zero folio does
+not increase accordingly. Then, the operation on the same pfn is performed
+in uprobe_write_opcode()->__replace_page() to unconditional decrease the
+RSS count and old_folio's refcount.
+
+Therefore, two bugs are introduced:
+
+ 1. The RSS count is incorrect, when process exit, the check_mm() report
+    error "Bad rss-count".
+
+ 2. The reserved folio (zero folio) is freed when folio->refcount is zero,
+    then free_pages_prepare->free_page_is_bad() report error
+    "Bad page state".
+
+There is more, the following warning could also theoretically be triggered:
+
+  __replace_page()
+    -> ...
+      -> folio_remove_rmap_pte()
+        -> VM_WARN_ON_FOLIO(is_zero_folio(folio), folio)
+
+Considering that uprobe hit on the zero folio is a very rare case, just
+reject zero old folio immediately after get_user_page_vma_remote().
+
+[ mingo: Cleaned up the changelog ]
+
+Fixes: 7396fa818d62 ("uprobes/core: Make background page replacement logic account for rss_stat counters")
+Fixes: 2b1444983508 ("uprobes, mm, x86: Add the ability to install and remove uprobes breakpoints")
+Signed-off-by: Tong Tiangen
+Signed-off-by: Ingo Molnar
+Reviewed-by: David Hildenbrand
+Reviewed-by: Oleg Nesterov
+Cc: Peter Zijlstra
+Cc: Masami Hiramatsu
+Link: https://lore.kernel.org/r/20250224031149.1598949-1-tongtiangen@huawei.com
+Signed-off-by: Sasha Levin
+---
+ kernel/events/uprobes.c | 5 +++++
+ 1 file changed, 5 insertions(+)
+
+diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
+index 9ee25351cecac..7a22db17f3b5e 100644
+--- a/kernel/events/uprobes.c
++++ b/kernel/events/uprobes.c
+@@ -484,6 +484,11 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
+ 	if (ret <= 0)
+ 		goto put_old;
+ 
++	if (is_zero_page(old_page)) {
++		ret = -EINVAL;
++		goto put_old;
++	}
++
+ 	if (WARN(!is_register && PageCompound(old_page),
+ 		 "uprobe unregister should never work on compound page\n")) {
+ 		ret = -EINVAL;
+-- 
+2.39.5
+
diff --git a/queue-6.1/x86-cpu-fix-warm-boot-hang-regression-on-amd-sc1100-.patch b/queue-6.1/x86-cpu-fix-warm-boot-hang-regression-on-amd-sc1100-.patch
new file mode 100644
index 0000000000..6749057f03
--- /dev/null
+++ b/queue-6.1/x86-cpu-fix-warm-boot-hang-regression-on-amd-sc1100-.patch
@@ -0,0 +1,95 @@
+From 98b06a94f40c45e36699127c9eb0408046265fbb Mon Sep 17 00:00:00 2001
+From: Sasha Levin
+Date: Tue, 25 Feb 2025 22:31:20 +0100
+Subject: x86/CPU: Fix warm boot hang regression on AMD SC1100 SoC systems
+
+From: Russell Senior
+
+[ Upstream commit bebe35bb738b573c32a5033499cd59f20293f2a3 ]
+
+I still have some Soekris net4826 in a Community Wireless Network I
+volunteer with. These devices use an AMD SC1100 SoC. I am running
+OpenWrt on them, which uses a patched kernel, that naturally has
+evolved over time. I haven't updated the ones in the field in a
+number of years (circa 2017), but have one in a test bed, where I have
+intermittently tried out test builds.
+
+A few years ago, I noticed some trouble, particularly when "warm
+booting", that is, doing a reboot without removing power, and noticed
+the device was hanging after the kernel message:
+
+  [ 0.081615] Working around Cyrix MediaGX virtual DMA bugs.
+
+If I removed power and then restarted, it would boot fine, continuing
+through the message above, thusly:
+
+  [ 0.081615] Working around Cyrix MediaGX virtual DMA bugs.
+  [ 0.090076] Enable Memory-Write-back mode on Cyrix/NSC processor.
+  [ 0.100000] Enable Memory access reorder on Cyrix/NSC processor.
+  [ 0.100070] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
+  [ 0.110058] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0
+  [ 0.120037] CPU: NSC Geode(TM) Integrated Processor by National Semi (family: 0x5, model: 0x9, stepping: 0x1)
+  [...]
+
+In order to continue using modern tools, like ssh, to interact with
+the software on these old devices, I need modern builds of the OpenWrt
+firmware on the devices. I confirmed that the warm boot hang was still
+an issue in modern OpenWrt builds (currently using a patched linux
+v6.6.65).
+
+Last night, I decided it was time to get to the bottom of the warm
+boot hang, and began bisecting. From preserved builds, I narrowed down
+the bisection window from late February to late May 2019. During this
+period, the OpenWrt builds were using 4.14.x. I was able to build
+using period-correct Ubuntu 18.04.6. After a number of bisection
+iterations, I identified a kernel bump from 4.14.112 to 4.14.113 as
+the commit that introduced the warm boot hang.
+
+  https://github.com/openwrt/openwrt/commit/07aaa7e3d62ad32767d7067107db64b6ade81537
+
+Looking at the upstream changes in the stable kernel between 4.14.112
+and 4.14.113 (tig v4.14.112..v4.14.113), I spotted a likely suspect:
+
+  https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=20afb90f730982882e65b01fb8bdfe83914339c5
+
+So, I tried reverting just that kernel change on top of the breaking
+OpenWrt commit, and my warm boot hang went away.
+
+Presumably, the warm boot hang is due to some register not getting
+cleared in the same way that a loss of power does. That is
+approximately as much as I understand about the problem.
+
+More poking/prodding and coaching from Jonas Gorski, it looks
+like this test patch fixes the problem on my board: Tested against
+v6.6.67 and v4.14.113.
+
+Fixes: 18fb053f9b82 ("x86/cpu/cyrix: Use correct macros for Cyrix calls on Geode processors")
+Debugged-by: Jonas Gorski
+Signed-off-by: Russell Senior
+Signed-off-by: Ingo Molnar
+Link: https://lore.kernel.org/r/CAHP3WfOgs3Ms4Z+L9i0-iBOE21sdMk5erAiJurPjnrL9LSsgRA@mail.gmail.com
+Cc: Matthew Whitehead
+Cc: Thomas Gleixner
+Signed-off-by: Sasha Levin
+---
+ arch/x86/kernel/cpu/cyrix.c | 4 ++--
+ 1 file changed, 2 insertions(+), 2 deletions(-)
+
+diff --git a/arch/x86/kernel/cpu/cyrix.c b/arch/x86/kernel/cpu/cyrix.c
+index 9651275aecd1b..dfec2c61e3547 100644
+--- a/arch/x86/kernel/cpu/cyrix.c
++++ b/arch/x86/kernel/cpu/cyrix.c
+@@ -153,8 +153,8 @@ static void geode_configure(void)
+ 	u8 ccr3;
+ 	local_irq_save(flags);
+ 
+-	/* Suspend on halt power saving and enable #SUSP pin */
+-	setCx86(CX86_CCR2, getCx86(CX86_CCR2) | 0x88);
++	/* Suspend on halt power saving */
++	setCx86(CX86_CCR2, getCx86(CX86_CCR2) | 0x08);
+ 
+ 	ccr3 = getCx86(CX86_CCR3);
+ 	setCx86(CX86_CCR3, (ccr3 & 0x0f) | 0x10);	/* enable MAPEN */
+-- 
+2.39.5
+