From d5258c3812e6955bc58a68f4b5529762fff683ea Mon Sep 17 00:00:00 2001
From: Greg Kroah-Hartman
Date: Mon, 21 May 2018 10:34:22 +0200
Subject: [PATCH] 4.16-stable patches

added patches:
	btrfs-fix-reading-stale-metadata-blocks-after-degraded-raid1-mounts.patch
---
 ...a-blocks-after-degraded-raid1-mounts.patch | 96 +++++++++++++++++++
 queue-4.16/series                             |  1 +
 2 files changed, 97 insertions(+)
 create mode 100644 queue-4.16/btrfs-fix-reading-stale-metadata-blocks-after-degraded-raid1-mounts.patch

diff --git a/queue-4.16/btrfs-fix-reading-stale-metadata-blocks-after-degraded-raid1-mounts.patch b/queue-4.16/btrfs-fix-reading-stale-metadata-blocks-after-degraded-raid1-mounts.patch
new file mode 100644
index 00000000000..17b48c30680
--- /dev/null
+++ b/queue-4.16/btrfs-fix-reading-stale-metadata-blocks-after-degraded-raid1-mounts.patch
@@ -0,0 +1,96 @@
+From 02a3307aa9c20b4f6626255b028f07f6cfa16feb Mon Sep 17 00:00:00 2001
+From: Liu Bo
+Date: Wed, 16 May 2018 01:37:36 +0800
+Subject: btrfs: fix reading stale metadata blocks after degraded raid1 mounts
+
+From: Liu Bo
+
+commit 02a3307aa9c20b4f6626255b028f07f6cfa16feb upstream.
+
+If a btree block, aka. extent buffer, is not available in the extent
+buffer cache, it'll be read out from the disk instead, i.e.
+
+btrfs_search_slot()
+  read_block_for_search() # hold parent and its lock, go to read child
+    btrfs_release_path()
+    read_tree_block() # read child
+
+Unfortunately, the parent lock got released before reading child, so
+commit 5bdd3536cbbe ("Btrfs: Fix block generation verification race") had
+used 0 as parent transid to read the child block. It forces
+read_tree_block() not to check if parent transid is different with the
+generation id of the child that it reads out from disk.
+
+A simple PoC is included in btrfs/124,
+
+0. A two-disk raid1 btrfs,
+
+1. Right after mkfs.btrfs, block A is allocated to be device tree's root.
+
+2. Mount this filesystem and put it in use, after a while, device tree's
+   root got COW but block A hasn't been allocated/overwritten yet.
+
+3. Umount it and reload the btrfs module to remove both disks from the
+   global @fs_devices list.
+
+4. mount -odegraded dev1 and write some data, so now block A is allocated
+   to be a leaf in checksum tree. Note that only dev1 has the latest
+   metadata of this filesystem.
+
+5. Umount it and mount it again normally (with both disks), since raid1
+   can pick up one disk by the writer task's pid, if btrfs_search_slot()
+   needs to read block A, dev2 which does NOT have the latest metadata
+   might be read for block A, then we got a stale block A.
+
+6. As parent transid is not checked, block A is marked as uptodate and
+   put into the extent buffer cache, so the future search won't bother
+   to read disk again, which means it'll make changes on this stale
+   one and make it dirty and flush it onto disk.
+
+To avoid the problem, parent transid needs to be passed to
+read_tree_block().
+
+In order to get a valid parent transid, we need to hold the parent's
+lock until finishing reading child.
+
+This patch needs to be slightly adapted for stable kernels, the
+&first_key parameter added to read_tree_block() is from 4.16+
+(581c1760415c4). The fix is to replace 0 by 'gen'.
+
+Fixes: 5bdd3536cbbe ("Btrfs: Fix block generation verification race")
+CC: stable@vger.kernel.org # 4.4+
+Signed-off-by: Liu Bo
+Reviewed-by: Filipe Manana
+Reviewed-by: Qu Wenruo
+[ update changelog ]
+Signed-off-by: David Sterba
+Signed-off-by: Nikolay Borisov
+Signed-off-by: Greg Kroah-Hartman
+
+---
+ fs/btrfs/ctree.c |    6 +++---
+ 1 file changed, 3 insertions(+), 3 deletions(-)
+
+--- a/fs/btrfs/ctree.c
++++ b/fs/btrfs/ctree.c
+@@ -2491,10 +2491,8 @@ read_block_for_search(struct btrfs_root
+ 	if (p->reada != READA_NONE)
+ 		reada_for_search(fs_info, p, level, slot, key->objectid);
+ 
+-	btrfs_release_path(p);
+-
+ 	ret = -EAGAIN;
+-	tmp = read_tree_block(fs_info, blocknr, 0);
++	tmp = read_tree_block(fs_info, blocknr, gen);
+ 	if (!IS_ERR(tmp)) {
+ 		/*
+ 		 * If the read above didn't mark this buffer up to date,
+@@ -2508,6 +2506,8 @@ read_block_for_search(struct btrfs_root
+ 	} else {
+ 		ret = PTR_ERR(tmp);
+ 	}
++
++	btrfs_release_path(p);
+ 	return ret;
+ }
+ 
diff --git a/queue-4.16/series b/queue-4.16/series
index 553aa99b00e..0e5540306d8 100644
--- a/queue-4.16/series
+++ b/queue-4.16/series
@@ -58,3 +58,4 @@ btrfs-property-set-incompat-flag-if-lzo-zstd-compression-is-set.patch
 btrfs-fix-crash-when-trying-to-resume-balance-without-the-resume-flag.patch
 btrfs-split-btrfs_del_delalloc_inode-into-2-functions.patch
 btrfs-fix-delalloc-inodes-invalidation-during-transaction-abort.patch
+btrfs-fix-reading-stale-metadata-blocks-after-degraded-raid1-mounts.patch
-- 
2.47.2