--- /dev/null
+From 02a3307aa9c20b4f6626255b028f07f6cfa16feb Mon Sep 17 00:00:00 2001
+From: Liu Bo <bo.liu@linux.alibaba.com>
+Date: Wed, 16 May 2018 01:37:36 +0800
+Subject: btrfs: fix reading stale metadata blocks after degraded raid1 mounts
+
+From: Liu Bo <bo.liu@linux.alibaba.com>
+
+commit 02a3307aa9c20b4f6626255b028f07f6cfa16feb upstream.
+
+If a btree block, a.k.a. an extent buffer, is not available in the
+extent buffer cache, it is read from disk instead, i.e.
+
+btrfs_search_slot()
+   read_block_for_search()  # hold parent and its lock, go to read child
+      btrfs_release_path()
+      read_tree_block()  # read child
+
+Unfortunately, the parent lock is released before the child is read, so
+commit 5bdd3536cbbe ("Btrfs: Fix block generation verification race")
+used 0 as the parent transid when reading the child block. This forces
+read_tree_block() to skip checking whether the parent transid differs
+from the generation of the child that it reads from disk.
+
+A simple PoC is included in btrfs/124:
+
+0. Start with a two-disk raid1 btrfs.
+
+1. Right after mkfs.btrfs, block A is allocated to be the device tree's
+ root.
+
+2. Mount this filesystem and put it in use. After a while, the device
+ tree's root gets COWed, but block A hasn't been allocated again or
+ overwritten yet.
+
+3. Umount it and reload the btrfs module to remove both disks from the
+ global @fs_devices list.
+
+4. mount -odegraded dev1 and write some data, so block A is now
+ allocated to be a leaf in the checksum tree. Note that only dev1 has
+ the latest metadata of this filesystem.
+
+5. Umount it and mount it again normally (with both disks). Since raid1
+ can pick one disk by the writer task's pid, if btrfs_search_slot()
+ needs to read block A, dev2, which does NOT have the latest metadata,
+ might be read for block A, and we then get a stale block A.
+
+6. As the parent transid is not checked, block A is marked as uptodate
+ and put into the extent buffer cache, so future searches won't bother
+ to read the disk again. They will instead make changes to this stale
+ block, mark it dirty and flush it to disk.
+
+To avoid the problem, the parent transid needs to be passed to
+read_tree_block().
+
+In order to get a valid parent transid, we need to hold the parent's
+lock until the child has been read.
+
+This patch needs to be slightly adapted for stable kernels: the
+&first_key parameter added to read_tree_block() is from 4.16+
+(581c1760415c4). The fix is to replace 0 with 'gen'.
+
+Fixes: 5bdd3536cbbe ("Btrfs: Fix block generation verification race")
+CC: stable@vger.kernel.org # 4.4+
+Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
+Reviewed-by: Filipe Manana <fdmanana@suse.com>
+Reviewed-by: Qu Wenruo <wqu@suse.com>
+[ update changelog ]
+Signed-off-by: David Sterba <dsterba@suse.com>
+Signed-off-by: Nikolay Borisov <nborisov@suse.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+---
+ fs/btrfs/ctree.c | 6 +++---
+ 1 file changed, 3 insertions(+), 3 deletions(-)
+
+--- a/fs/btrfs/ctree.c
++++ b/fs/btrfs/ctree.c
+@@ -2491,10 +2491,8 @@ read_block_for_search(struct btrfs_root
+ if (p->reada != READA_NONE)
+ reada_for_search(fs_info, p, level, slot, key->objectid);
+
+- btrfs_release_path(p);
+-
+ ret = -EAGAIN;
+- tmp = read_tree_block(fs_info, blocknr, 0);
++ tmp = read_tree_block(fs_info, blocknr, gen);
+ if (!IS_ERR(tmp)) {
+ /*
+ * If the read above didn't mark this buffer up to date,
+@@ -2508,6 +2506,8 @@ read_block_for_search(struct btrfs_root
+ } else {
+ ret = PTR_ERR(tmp);
+ }
++
++ btrfs_release_path(p);
+ return ret;
+ }
+