From: Qu Wenruo <wqu@suse.com>
Date: Mon, 3 Nov 2025 04:07:29 +0000 (+1030)
Subject: fs: fully sync all fses even for an emergency sync
X-Git-Url: http://git.ipfire.org/gitweb.cgi?a=commitdiff_plain;h=2706659d642eb6b2d659aab52c122d6170e59dd0;p=thirdparty%2Flinux.git

fs: fully sync all fses even for an emergency sync

[BUG]
There is a bug report that during emergency sync, btrfs only write back
all the dirty data and metadadta, but no full transaction commit,
resulting the super block still pointing to the old trees, thus the end
user can only see the old data, not the newer one.

[CAUSE]
Initially this looks like a btrfs specific bug, since ext4 doesn't get
affected by this one.

But the root problem here is, a combination of btrfs features and the no
wait nature of emergency sync.

Firstly do_sync_work() will call sync_inodes_one_sb() for every fs, to
writeback all the dirty pages for the fs.

Btrfs will properly writeback all dirty pages, including both data and
the updated metadata. So far so good.

Then sync_fs_one_sb() called with @nowait, in the case of btrfs it means
no full transaction commit, thus no super block update.

At this stage, btrfs is only one super block update away to be fully committed.
I believe it's the more or less the same for other fses too.

The problem is the next step, sync_bdevs().
Normally other fses have their super block already updated in the page
cache of the block device, but btrfs only updates the super block during
full transaction commit.

So sync_bdevs() may work for other fses, but not for btrfs, btrfs is
still using its older super block, all pointing back to the old metadata
and data.

Thus if after emergency sync, power loss happened, the end user will
only see the old data, not the newer one, despite that everything but the
super block is already written back.

[FIX]
Since the emergency sync is already executing in a workqueue, I didn't
see much need to only do a nowait sync.
Especially after the fact that sync_inodes_one_sb() always wait for the
writeback to finish.

Instead for the second iteration of sync_fs_one_sb(), pass wait == 1
into it, so fses like btrfs can properly commit its super blocks.

Reported-by: Askar Safin <safinaskar@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/20251101150429.321537-1-safinaskar@gmail.com/
Signed-off-by: Qu Wenruo <wqu@suse.com>
Link: https://patch.msgid.link/7b7fd40c5fe440b633b6c0c741d96ce93eb5a89a.1762142636.git.wqu@suse.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
---

diff --git a/fs/sync.c b/fs/sync.c
index c80c2e658b09..d4bb6d25b1de 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -117,6 +117,7 @@ SYSCALL_DEFINE0(sync)
 static void do_sync_work(struct work_struct *work)
 {
 	int nowait = 0;
+	int wait = 1;
 
 	/*
 	 * Sync twice to reduce the possibility we skipped some inodes / pages
@@ -126,7 +127,7 @@ static void do_sync_work(struct work_struct *work)
 	iterate_supers(sync_fs_one_sb, &nowait);
 	sync_bdevs(false);
 	iterate_supers(sync_inodes_one_sb, NULL);
-	iterate_supers(sync_fs_one_sb, &nowait);
+	iterate_supers(sync_fs_one_sb, &wait);
 	sync_bdevs(false);
 	printk("Emergency Sync complete\n");
 	kfree(work);