]> git.ipfire.org Git - thirdparty/kernel/stable-queue.git/commitdiff
5.4-stable patches
authorGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Mon, 26 Jun 2023 15:26:42 +0000 (17:26 +0200)
committerGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Mon, 26 Jun 2023 15:26:42 +0000 (17:26 +0200)
added patches:
mm-fix-vm_bug_on-pagetail-and-bug_on-pagewriteback.patch
mm-make-wait_on_page_writeback-wait-for-multiple-pending-writebacks.patch

queue-5.4/mm-fix-vm_bug_on-pagetail-and-bug_on-pagewriteback.patch [new file with mode: 0644]
queue-5.4/mm-make-wait_on_page_writeback-wait-for-multiple-pending-writebacks.patch [new file with mode: 0644]
queue-5.4/series

diff --git a/queue-5.4/mm-fix-vm_bug_on-pagetail-and-bug_on-pagewriteback.patch b/queue-5.4/mm-fix-vm_bug_on-pagetail-and-bug_on-pagewriteback.patch
new file mode 100644 (file)
index 0000000..ed717c9
--- /dev/null
@@ -0,0 +1,122 @@
+From 073861ed77b6b957c3c8d54a11dc503f7d986ceb Mon Sep 17 00:00:00 2001
+From: Hugh Dickins <hughd@google.com>
+Date: Tue, 24 Nov 2020 08:46:43 -0800
+Subject: mm: fix VM_BUG_ON(PageTail) and BUG_ON(PageWriteback)
+
+From: Hugh Dickins <hughd@google.com>
+
+commit 073861ed77b6b957c3c8d54a11dc503f7d986ceb upstream.
+
+Twice now, when exercising ext4 looped on shmem huge pages, I have crashed
+on the PF_ONLY_HEAD check inside PageWaiters(): ext4_finish_bio() calling
+end_page_writeback() calling wake_up_page() on tail of a shmem huge page,
+no longer an ext4 page at all.
+
+The problem is that PageWriteback is not accompanied by a page reference
+(as the NOTE at the end of test_clear_page_writeback() acknowledges): as
+soon as TestClearPageWriteback has been done, that page could be removed
+from page cache, freed, and reused for something else by the time that
+wake_up_page() is reached.
+
+https://lore.kernel.org/linux-mm/20200827122019.GC14765@casper.infradead.org/
+Matthew Wilcox suggested avoiding or weakening the PageWaiters() tail
+check; but I'm paranoid about even looking at an unreferenced struct page,
+lest its memory might itself have already been reused or hotremoved (and
+wake_up_page_bit() may modify that memory with its ClearPageWaiters()).
+
+Then on crashing a second time, realized there's a stronger reason against
+that approach.  If my testing just occasionally crashes on that check,
+when the page is reused for part of a compound page, wouldn't it be much
+more common for the page to get reused as an order-0 page before reaching
+wake_up_page()?  And on rare occasions, might that reused page already be
+marked PageWriteback by its new user, and already be waited upon?  What
+would that look like?
+
+It would look like BUG_ON(PageWriteback) after wait_on_page_writeback()
+in write_cache_pages() (though I have never seen that crash myself).
+
+Matthew Wilcox explaining this to himself:
+ "page is allocated, added to page cache, dirtied, writeback starts,
+
+  --- thread A ---
+  filesystem calls end_page_writeback()
+        test_clear_page_writeback()
+  --- context switch to thread B ---
+  truncate_inode_pages_range() finds the page, it doesn't have writeback set,
+  we delete it from the page cache.  Page gets reallocated, dirtied, writeback
+  starts again.  Then we call write_cache_pages(), see
+  PageWriteback() set, call wait_on_page_writeback()
+  --- context switch back to thread A ---
+  wake_up_page(page, PG_writeback);
+  ... thread B is woken, but because the wakeup was for the old use of
+  the page, PageWriteback is still set.
+
+  Devious"
+
+And prior to 2a9127fcf229 ("mm: rewrite wait_on_page_bit_common() logic")
+this would have been much less likely: before that, wake_page_function()'s
+non-exclusive case would stop walking and not wake if it found Writeback
+already set again; whereas now the non-exclusive case proceeds to wake.
+
+I have not thought of a fix that does not add a little overhead: the
+simplest fix is for end_page_writeback() to get_page() before calling
+test_clear_page_writeback(), then put_page() after wake_up_page().
+
+Was there a chance of missed wakeups before, since a page freed before
+reaching wake_up_page() would have PageWaiters cleared?  I think not,
+because each waiter does hold a reference on the page.  This bug comes
+when the old use of the page, the one we do TestClearPageWriteback on,
+had *no* waiters, so no additional page reference beyond the page cache
+(and whoever racily freed it).  The reuse of the page has a waiter
+holding a reference, and its own PageWriteback set; but the belated
+wake_up_page() has woken the reuse to hit that BUG_ON(PageWriteback).
+
+Reported-by: syzbot+3622cea378100f45d59f@syzkaller.appspotmail.com
+Reported-by: Qian Cai <cai@lca.pw>
+Fixes: 2a9127fcf229 ("mm: rewrite wait_on_page_bit_common() logic")
+Signed-off-by: Hugh Dickins <hughd@google.com>
+Cc: stable@vger.kernel.org # v5.8+
+Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ mm/filemap.c        |    8 ++++++++
+ mm/page-writeback.c |    6 ------
+ 2 files changed, 8 insertions(+), 6 deletions(-)
+
+--- a/mm/filemap.c
++++ b/mm/filemap.c
+@@ -1388,11 +1388,19 @@ void end_page_writeback(struct page *pag
+               rotate_reclaimable_page(page);
+       }
++      /*
++       * Writeback does not hold a page reference of its own, relying
++       * on truncation to wait for the clearing of PG_writeback.
++       * But here we must make sure that the page is not freed and
++       * reused before the wake_up_page().
++       */
++      get_page(page);
+       if (!test_clear_page_writeback(page))
+               BUG();
+       smp_mb__after_atomic();
+       wake_up_page(page, PG_writeback);
++      put_page(page);
+ }
+ EXPORT_SYMBOL(end_page_writeback);
+--- a/mm/page-writeback.c
++++ b/mm/page-writeback.c
+@@ -2746,12 +2746,6 @@ int test_clear_page_writeback(struct pag
+       } else {
+               ret = TestClearPageWriteback(page);
+       }
+-      /*
+-       * NOTE: Page might be free now! Writeback doesn't hold a page
+-       * reference on its own, it relies on truncation to wait for
+-       * the clearing of PG_writeback. The below can only access
+-       * page state that is static across allocation cycles.
+-       */
+       if (ret) {
+               dec_lruvec_state(lruvec, NR_WRITEBACK);
+               dec_zone_page_state(page, NR_ZONE_WRITE_PENDING);
diff --git a/queue-5.4/mm-make-wait_on_page_writeback-wait-for-multiple-pending-writebacks.patch b/queue-5.4/mm-make-wait_on_page_writeback-wait-for-multiple-pending-writebacks.patch
new file mode 100644 (file)
index 0000000..cf7e815
--- /dev/null
@@ -0,0 +1,102 @@
+From c2407cf7d22d0c0d94cf20342b3b8f06f1d904e7 Mon Sep 17 00:00:00 2001
+From: Linus Torvalds <torvalds@linux-foundation.org>
+Date: Tue, 5 Jan 2021 11:33:00 -0800
+Subject: mm: make wait_on_page_writeback() wait for multiple pending writebacks
+
+From: Linus Torvalds <torvalds@linux-foundation.org>
+
+commit c2407cf7d22d0c0d94cf20342b3b8f06f1d904e7 upstream.
+
+Ever since commit 2a9127fcf229 ("mm: rewrite wait_on_page_bit_common()
+logic") we've had some very occasional reports of BUG_ON(PageWriteback)
+in write_cache_pages(), which we thought we already fixed in commit
+073861ed77b6 ("mm: fix VM_BUG_ON(PageTail) and BUG_ON(PageWriteback)").
+
+But syzbot just reported another one, even with that commit in place.
+
+And it turns out that there's a simpler way to trigger the BUG_ON() than
+the one Hugh found with page re-use.  It all boils down to the fact that
+the page writeback is ostensibly serialized by the page lock, but that
+isn't actually really true.
+
+Yes, the people _setting_ writeback all do so under the page lock, but
+the actual clearing of the bit - and waking up any waiters - happens
+without any page lock.
+
+This gives us this fairly simple race condition:
+
+  CPU1 = end previous writeback
+  CPU2 = start new writeback under page lock
+  CPU3 = write_cache_pages()
+
+  CPU1          CPU2            CPU3
+  ----          ----            ----
+
+  end_page_writeback()
+    test_clear_page_writeback(page)
+    ... delayed...
+
+                lock_page();
+                set_page_writeback()
+                unlock_page()
+
+                                lock_page()
+                                wait_on_page_writeback();
+
+    wake_up_page(page, PG_writeback);
+    .. wakes up CPU3 ..
+
+                                BUG_ON(PageWriteback(page));
+
+where the BUG_ON() happens because we woke up the PG_writeback bit
+becasue of the _previous_ writeback, but a new one had already been
+started because the clearing of the bit wasn't actually atomic wrt the
+actual wakeup or serialized by the page lock.
+
+The reason this didn't use to happen was that the old logic in waiting
+on a page bit would just loop if it ever saw the bit set again.
+
+The nice proper fix would probably be to get rid of the whole "wait for
+writeback to clear, and then set it" logic in the writeback path, and
+replace it with an atomic "wait-to-set" (ie the same as we have for page
+locking: we set the page lock bit with a single "lock_page()", not with
+"wait for lock bit to clear and then set it").
+
+However, out current model for writeback is that the waiting for the
+writeback bit is done by the generic VFS code (ie write_cache_pages()),
+but the actual setting of the writeback bit is done much later by the
+filesystem ".writepages()" function.
+
+IOW, to make the writeback bit have that same kind of "wait-to-set"
+behavior as we have for page locking, we'd have to change our roughly
+~50 different writeback functions.  Painful.
+
+Instead, just make "wait_on_page_writeback()" loop on the very unlikely
+situation that the PG_writeback bit is still set, basically re-instating
+the old behavior.  This is very non-optimal in case of contention, but
+since we only ever set the bit under the page lock, that situation is
+controlled.
+
+Reported-by: syzbot+2fc0712f8f8b8b8fa0ef@syzkaller.appspotmail.com
+Fixes: 2a9127fcf229 ("mm: rewrite wait_on_page_bit_common() logic")
+Acked-by: Hugh Dickins <hughd@google.com>
+Cc: Andrew Morton <akpm@linux-foundation.org>
+Cc: Matthew Wilcox <willy@infradead.org>
+Cc: stable@kernel.org
+Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ mm/page-writeback.c |    2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/mm/page-writeback.c
++++ b/mm/page-writeback.c
+@@ -2811,7 +2811,7 @@ EXPORT_SYMBOL(__test_set_page_writeback)
+  */
+ void wait_on_page_writeback(struct page *page)
+ {
+-      if (PageWriteback(page)) {
++      while (PageWriteback(page)) {
+               trace_wait_on_page_writeback(page, page_mapping(page));
+               wait_on_page_bit(page, PG_writeback);
+       }
index b2f306348804f86f7362510ced3c483e07f6a0a0..59ff235412c5e0ae959399f0dae80ff13c2c9ea1 100644 (file)
@@ -55,3 +55,5 @@ drm-exynos-fix-race-condition-uaf-in-exynos_g2d_exec.patch
 drm-radeon-fix-race-condition-uaf-in-radeon_gem_set_.patch
 x86-apic-fix-kernel-panic-when-booting-with-intremap.patch
 i2c-imx-lpi2c-fix-type-char-overflow-issue-when-calc.patch
+mm-fix-vm_bug_on-pagetail-and-bug_on-pagewriteback.patch
+mm-make-wait_on_page_writeback-wait-for-multiple-pending-writebacks.patch