]> git.ipfire.org Git - thirdparty/kernel/stable-queue.git/blame - releases/4.4.133/btrfs-fix-reading-stale-metadata-blocks-after-degraded-raid1-mounts.patch
4.14-stable patches
[thirdparty/kernel/stable-queue.git] / releases / 4.4.133 / btrfs-fix-reading-stale-metadata-blocks-after-degraded-raid1-mounts.patch
CommitLineData
11f02f71
GKH
1From 02a3307aa9c20b4f6626255b028f07f6cfa16feb Mon Sep 17 00:00:00 2001
2From: Liu Bo <bo.liu@linux.alibaba.com>
3Date: Wed, 16 May 2018 01:37:36 +0800
4Subject: btrfs: fix reading stale metadata blocks after degraded raid1 mounts
5
6From: Liu Bo <bo.liu@linux.alibaba.com>
7
8commit 02a3307aa9c20b4f6626255b028f07f6cfa16feb upstream.
9
10If a btree block, aka. extent buffer, is not available in the extent
11buffer cache, it'll be read out from the disk instead, i.e.
12
13btrfs_search_slot()
14 read_block_for_search() # hold parent and its lock, go to read child
15 btrfs_release_path()
16 read_tree_block() # read child
17
18Unfortunately, the parent lock got released before reading child, so
19commit 5bdd3536cbbe ("Btrfs: Fix block generation verification race") had
20used 0 as parent transid to read the child block. It forces
21read_tree_block() not to check if parent transid is different with the
22generation id of the child that it reads out from disk.
23
24A simple PoC is included in btrfs/124,
25
260. A two-disk raid1 btrfs,
27
281. Right after mkfs.btrfs, block A is allocated to be device tree's root.
29
302. Mount this filesystem and put it in use, after a while, device tree's
31 root got COW but block A hasn't been allocated/overwritten yet.
32
333. Umount it and reload the btrfs module to remove both disks from the
34 global @fs_devices list.
35
364. mount -odegraded dev1 and write some data, so now block A is allocated
37 to be a leaf in checksum tree. Note that only dev1 has the latest
38 metadata of this filesystem.
39
405. Umount it and mount it again normally (with both disks), since raid1
41 can pick up one disk by the writer task's pid, if btrfs_search_slot()
42 needs to read block A, dev2 which does NOT have the latest metadata
43 might be read for block A, then we got a stale block A.
44
456. As parent transid is not checked, block A is marked as uptodate and
46 put into the extent buffer cache, so the future search won't bother
47 to read disk again, which means it'll make changes on this stale
48 one and make it dirty and flush it onto disk.
49
50To avoid the problem, parent transid needs to be passed to
51read_tree_block().
52
53In order to get a valid parent transid, we need to hold the parent's
54lock until finishing reading child.
55
56This patch needs to be slightly adapted for stable kernels, the
57&first_key parameter added to read_tree_block() is from 4.16+
58(581c1760415c4). The fix is to replace 0 by 'gen'.
59
60Fixes: 5bdd3536cbbe ("Btrfs: Fix block generation verification race")
61CC: stable@vger.kernel.org # 4.4+
62Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
63Reviewed-by: Filipe Manana <fdmanana@suse.com>
64Reviewed-by: Qu Wenruo <wqu@suse.com>
65[ update changelog ]
66Signed-off-by: David Sterba <dsterba@suse.com>
67Signed-off-by: Nikolay Borisov <nborisov@suse.com>
68Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
69
70---
71 fs/btrfs/ctree.c | 6 +++---
72 1 file changed, 3 insertions(+), 3 deletions(-)
73
74--- a/fs/btrfs/ctree.c
75+++ b/fs/btrfs/ctree.c
76@@ -2497,10 +2497,8 @@ read_block_for_search(struct btrfs_trans
77 if (p->reada)
78 reada_for_search(root, p, level, slot, key->objectid);
79
80- btrfs_release_path(p);
81-
82 ret = -EAGAIN;
83- tmp = read_tree_block(root, blocknr, 0);
84+ tmp = read_tree_block(root, blocknr, gen);
85 if (!IS_ERR(tmp)) {
86 /*
87 * If the read above didn't mark this buffer up to date,
88@@ -2512,6 +2510,8 @@ read_block_for_search(struct btrfs_trans
89 ret = -EIO;
90 free_extent_buffer(tmp);
91 }
92+
93+ btrfs_release_path(p);
94 return ret;
95 }
96