--- /dev/null
+From 073861ed77b6b957c3c8d54a11dc503f7d986ceb Mon Sep 17 00:00:00 2001
+From: Hugh Dickins <hughd@google.com>
+Date: Tue, 24 Nov 2020 08:46:43 -0800
+Subject: mm: fix VM_BUG_ON(PageTail) and BUG_ON(PageWriteback)
+
+From: Hugh Dickins <hughd@google.com>
+
+commit 073861ed77b6b957c3c8d54a11dc503f7d986ceb upstream.
+
+Twice now, when exercising ext4 looped on shmem huge pages, I have crashed
+on the PF_ONLY_HEAD check inside PageWaiters(): ext4_finish_bio() calling
+end_page_writeback() calling wake_up_page() on tail of a shmem huge page,
+no longer an ext4 page at all.
+
+The problem is that PageWriteback is not accompanied by a page reference
+(as the NOTE at the end of test_clear_page_writeback() acknowledges): as
+soon as TestClearPageWriteback has been done, that page could be removed
+from page cache, freed, and reused for something else by the time that
+wake_up_page() is reached.
+
+https://lore.kernel.org/linux-mm/20200827122019.GC14765@casper.infradead.org/
+Matthew Wilcox suggested avoiding or weakening the PageWaiters() tail
+check; but I'm paranoid about even looking at an unreferenced struct page,
+lest its memory might itself have already been reused or hotremoved (and
+wake_up_page_bit() may modify that memory with its ClearPageWaiters()).
+
+Then on crashing a second time, realized there's a stronger reason against
+that approach. If my testing just occasionally crashes on that check,
+when the page is reused for part of a compound page, wouldn't it be much
+more common for the page to get reused as an order-0 page before reaching
+wake_up_page()? And on rare occasions, might that reused page already be
+marked PageWriteback by its new user, and already be waited upon? What
+would that look like?
+
+It would look like BUG_ON(PageWriteback) after wait_on_page_writeback()
+in write_cache_pages() (though I have never seen that crash myself).
+
+Matthew Wilcox explaining this to himself:
+ "page is allocated, added to page cache, dirtied, writeback starts,
+
+ --- thread A ---
+ filesystem calls end_page_writeback()
+ test_clear_page_writeback()
+ --- context switch to thread B ---
+ truncate_inode_pages_range() finds the page, it doesn't have writeback set,
+ we delete it from the page cache. Page gets reallocated, dirtied, writeback
+ starts again. Then we call write_cache_pages(), see
+ PageWriteback() set, call wait_on_page_writeback()
+ --- context switch back to thread A ---
+ wake_up_page(page, PG_writeback);
+ ... thread B is woken, but because the wakeup was for the old use of
+ the page, PageWriteback is still set.
+
+ Devious"
+
+And prior to 2a9127fcf229 ("mm: rewrite wait_on_page_bit_common() logic")
+this would have been much less likely: before that, wake_page_function()'s
+non-exclusive case would stop walking and not wake if it found Writeback
+already set again; whereas now the non-exclusive case proceeds to wake.
+
+I have not thought of a fix that does not add a little overhead: the
+simplest fix is for end_page_writeback() to get_page() before calling
+test_clear_page_writeback(), then put_page() after wake_up_page().
+
+Was there a chance of missed wakeups before, since a page freed before
+reaching wake_up_page() would have PageWaiters cleared? I think not,
+because each waiter does hold a reference on the page. This bug comes
+when the old use of the page, the one we do TestClearPageWriteback on,
+had *no* waiters, so no additional page reference beyond the page cache
+(and whoever racily freed it). The reuse of the page has a waiter
+holding a reference, and its own PageWriteback set; but the belated
+wake_up_page() has woken the reuse to hit that BUG_ON(PageWriteback).
+
+Reported-by: syzbot+3622cea378100f45d59f@syzkaller.appspotmail.com
+Reported-by: Qian Cai <cai@lca.pw>
+Fixes: 2a9127fcf229 ("mm: rewrite wait_on_page_bit_common() logic")
+Signed-off-by: Hugh Dickins <hughd@google.com>
+Cc: stable@vger.kernel.org # v5.8+
+Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ mm/filemap.c | 8 ++++++++
+ mm/page-writeback.c | 6 ------
+ 2 files changed, 8 insertions(+), 6 deletions(-)
+
+--- a/mm/filemap.c
++++ b/mm/filemap.c
+@@ -1388,11 +1388,19 @@ void end_page_writeback(struct page *pag
+ rotate_reclaimable_page(page);
+ }
+
++ /*
++ * Writeback does not hold a page reference of its own, relying
++ * on truncation to wait for the clearing of PG_writeback.
++ * But here we must make sure that the page is not freed and
++ * reused before the wake_up_page().
++ */
++ get_page(page);
+ if (!test_clear_page_writeback(page))
+ BUG();
+
+ smp_mb__after_atomic();
+ wake_up_page(page, PG_writeback);
++ put_page(page);
+ }
+ EXPORT_SYMBOL(end_page_writeback);
+
+--- a/mm/page-writeback.c
++++ b/mm/page-writeback.c
+@@ -2746,12 +2746,6 @@ int test_clear_page_writeback(struct pag
+ } else {
+ ret = TestClearPageWriteback(page);
+ }
+- /*
+- * NOTE: Page might be free now! Writeback doesn't hold a page
+- * reference on its own, it relies on truncation to wait for
+- * the clearing of PG_writeback. The below can only access
+- * page state that is static across allocation cycles.
+- */
+ if (ret) {
+ dec_lruvec_state(lruvec, NR_WRITEBACK);
+ dec_zone_page_state(page, NR_ZONE_WRITE_PENDING);
--- /dev/null
+From c2407cf7d22d0c0d94cf20342b3b8f06f1d904e7 Mon Sep 17 00:00:00 2001
+From: Linus Torvalds <torvalds@linux-foundation.org>
+Date: Tue, 5 Jan 2021 11:33:00 -0800
+Subject: mm: make wait_on_page_writeback() wait for multiple pending writebacks
+
+From: Linus Torvalds <torvalds@linux-foundation.org>
+
+commit c2407cf7d22d0c0d94cf20342b3b8f06f1d904e7 upstream.
+
+Ever since commit 2a9127fcf229 ("mm: rewrite wait_on_page_bit_common()
+logic") we've had some very occasional reports of BUG_ON(PageWriteback)
+in write_cache_pages(), which we thought we already fixed in commit
+073861ed77b6 ("mm: fix VM_BUG_ON(PageTail) and BUG_ON(PageWriteback)").
+
+But syzbot just reported another one, even with that commit in place.
+
+And it turns out that there's a simpler way to trigger the BUG_ON() than
+the one Hugh found with page re-use. It all boils down to the fact that
+the page writeback is ostensibly serialized by the page lock, but that
+isn't actually really true.
+
+Yes, the people _setting_ writeback all do so under the page lock, but
+the actual clearing of the bit - and waking up any waiters - happens
+without any page lock.
+
+This gives us this fairly simple race condition:
+
+ CPU1 = end previous writeback
+ CPU2 = start new writeback under page lock
+ CPU3 = write_cache_pages()
+
+ CPU1 CPU2 CPU3
+ ---- ---- ----
+
+ end_page_writeback()
+ test_clear_page_writeback(page)
+ ... delayed...
+
+ lock_page();
+ set_page_writeback()
+ unlock_page()
+
+ lock_page()
+ wait_on_page_writeback();
+
+ wake_up_page(page, PG_writeback);
+ .. wakes up CPU3 ..
+
+ BUG_ON(PageWriteback(page));
+
+where the BUG_ON() happens because we woke up the PG_writeback bit
+becasue of the _previous_ writeback, but a new one had already been
+started because the clearing of the bit wasn't actually atomic wrt the
+actual wakeup or serialized by the page lock.
+
+The reason this didn't use to happen was that the old logic in waiting
+on a page bit would just loop if it ever saw the bit set again.
+
+The nice proper fix would probably be to get rid of the whole "wait for
+writeback to clear, and then set it" logic in the writeback path, and
+replace it with an atomic "wait-to-set" (ie the same as we have for page
+locking: we set the page lock bit with a single "lock_page()", not with
+"wait for lock bit to clear and then set it").
+
+However, out current model for writeback is that the waiting for the
+writeback bit is done by the generic VFS code (ie write_cache_pages()),
+but the actual setting of the writeback bit is done much later by the
+filesystem ".writepages()" function.
+
+IOW, to make the writeback bit have that same kind of "wait-to-set"
+behavior as we have for page locking, we'd have to change our roughly
+~50 different writeback functions. Painful.
+
+Instead, just make "wait_on_page_writeback()" loop on the very unlikely
+situation that the PG_writeback bit is still set, basically re-instating
+the old behavior. This is very non-optimal in case of contention, but
+since we only ever set the bit under the page lock, that situation is
+controlled.
+
+Reported-by: syzbot+2fc0712f8f8b8b8fa0ef@syzkaller.appspotmail.com
+Fixes: 2a9127fcf229 ("mm: rewrite wait_on_page_bit_common() logic")
+Acked-by: Hugh Dickins <hughd@google.com>
+Cc: Andrew Morton <akpm@linux-foundation.org>
+Cc: Matthew Wilcox <willy@infradead.org>
+Cc: stable@kernel.org
+Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ mm/page-writeback.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/mm/page-writeback.c
++++ b/mm/page-writeback.c
+@@ -2811,7 +2811,7 @@ EXPORT_SYMBOL(__test_set_page_writeback)
+ */
+ void wait_on_page_writeback(struct page *page)
+ {
+- if (PageWriteback(page)) {
++ while (PageWriteback(page)) {
+ trace_wait_on_page_writeback(page, page_mapping(page));
+ wait_on_page_bit(page, PG_writeback);
+ }