From 38e3eebff643db725633657d1d87a3be019d1018 Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef@toxicpanda.com>
Date: Wed, 16 Jan 2019 11:00:57 -0500
Subject: btrfs: honor path->skip_locking in backref code

From: Josef Bacik <josef@toxicpanda.com>

commit 38e3eebff643db725633657d1d87a3be019d1018 upstream.

Qgroups will do the old roots lookup at delayed ref time, which could be
while walking down the extent root while running a delayed ref.  This
should be fine, except we specifically lock eb's in the backref walking
code irrespective of path->skip_locking, which deadlocks the system.
Fix up the backref code to honor path->skip_locking; nobody will be
modifying the commit_root when we're searching, so it's completely safe
to do.

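To illustrate the idea (a sketch based on the existing commit-root
search setup in find_parent_nodes(); it is not part of this patch):
callers that only search the commit roots already run with locking
disabled on the path, so the walk just has to respect that flag instead
of unconditionally taking eb read locks:

	/* find_parent_nodes(): commit-root searches already run unlocked */
	if (!trans) {
		path->search_commit_root = 1;
		path->skip_locking = 1;
	}
	...
	/* so any eb locking further down the walk must be conditional */
	if (!path->skip_locking)
		btrfs_tree_read_lock(eb);
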
Since fb235dc06fac ("btrfs: qgroup: Move half of the qgroup accounting
time out of commit trans"), the kernel may lock up with quota enabled.

One such backref walk is triggered by dropping a snapshot while write
operations are going on in the source subvolume, and it can be reliably
reproduced:

  btrfs-cleaner   D    0  4062      2 0x80000000
  Call Trace:
   schedule+0x32/0x90
   btrfs_tree_read_lock+0x93/0x130 [btrfs]
   find_parent_nodes+0x29b/0x1170 [btrfs]
   btrfs_find_all_roots_safe+0xa8/0x120 [btrfs]
   btrfs_find_all_roots+0x57/0x70 [btrfs]
   btrfs_qgroup_trace_extent_post+0x37/0x70 [btrfs]
   btrfs_qgroup_trace_leaf_items+0x10b/0x140 [btrfs]
   btrfs_qgroup_trace_subtree+0xc8/0xe0 [btrfs]
   do_walk_down+0x541/0x5e3 [btrfs]
   walk_down_tree+0xab/0xe7 [btrfs]
   btrfs_drop_snapshot+0x356/0x71a [btrfs]
   btrfs_clean_one_deleted_snapshot+0xb8/0xf0 [btrfs]
   cleaner_kthread+0x12b/0x160 [btrfs]
   kthread+0x112/0x130
   ret_from_fork+0x27/0x50

When dropping snapshots with qgroup enabled, we will trigger a backref
walk.

However, such a backref walk at that time is quite dangerous: if one of
the parent nodes gets WRITE locked by another thread, we can end up in
a deadlock.

For example:

            FS 260      FS 261 (Dropped)
             node A         node B
            /      \       /      \
        node C     node D         node E
       /     \    /     \        /      \
   leaf F|leaf G|leaf H|leaf I|leaf J|leaf K

The lock sequence would be:

   Thread A (cleaner)                |  Thread B (other writer)
-----------------------------------------------------------------------
   write_lock(B)                     |
   write_lock(D)                     |
   ^^^ called by walk_down_tree()    |
                                     |  write_lock(A)
                                     |  write_lock(D)  << Stall
   read_lock(H) << for backref walk  |
   read_lock(D) << lock owner is     |
                   the same thread A,|
                   so read lock is OK|
   read_lock(A) << Stall             |

So thread A holds write lock D and needs read lock A to make progress,
while thread B holds write lock A and needs write lock D to make
progress.

This will cause a deadlock.

This is not limited to the snapshot dropping case.  The backref walk,
even though it only happens on commit trees, breaks the normal top-down
locking order, which makes it deadlock prone.

Fixes: fb235dc06fac ("btrfs: qgroup: Move half of the qgroup accounting time out of commit trans")
CC: stable@vger.kernel.org # 4.14+
Reported-and-tested-by: David Sterba <dsterba@suse.com>
Reported-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
[ rebase to latest branch and fix lock assert bug in btrfs/007 ]
[ solve conflicts and backport to linux-5.0.y ]
Signed-off-by: Qu Wenruo <wqu@suse.com>
[ copy logs and deadlock analysis from Qu's patch ]
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/btrfs/backref.c |   19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -712,7 +712,7 @@ out:
  * read tree blocks and add keys where required.
  */
 static int add_missing_keys(struct btrfs_fs_info *fs_info,
-			    struct preftrees *preftrees)
+			    struct preftrees *preftrees, bool lock)
 {
 	struct prelim_ref *ref;
 	struct extent_buffer *eb;
@@ -737,12 +737,14 @@ static int add_missing_keys(struct btrfs
 			free_extent_buffer(eb);
 			return -EIO;
 		}
-		btrfs_tree_read_lock(eb);
+		if (lock)
+			btrfs_tree_read_lock(eb);
 		if (btrfs_header_level(eb) == 0)
 			btrfs_item_key_to_cpu(eb, &ref->key_for_search, 0);
 		else
 			btrfs_node_key_to_cpu(eb, &ref->key_for_search, 0);
-		btrfs_tree_read_unlock(eb);
+		if (lock)
+			btrfs_tree_read_unlock(eb);
 		free_extent_buffer(eb);
 		prelim_ref_insert(fs_info, &preftrees->indirect, ref, NULL);
 		cond_resched();
@@ -1227,7 +1229,7 @@ again:

 	btrfs_release_path(path);

-	ret = add_missing_keys(fs_info, &preftrees);
+	ret = add_missing_keys(fs_info, &preftrees, path->skip_locking == 0);
 	if (ret)
 		goto out;

@@ -1288,11 +1290,14 @@ again:
 				ret = -EIO;
 				goto out;
 			}
-			btrfs_tree_read_lock(eb);
-			btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK);
+			if (!path->skip_locking) {
+				btrfs_tree_read_lock(eb);
+				btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK);
+			}
 			ret = find_extent_in_eb(eb, bytenr,
 					*extent_item_pos, &eie, ignore_offset);
-			btrfs_tree_read_unlock_blocking(eb);
+			if (!path->skip_locking)
+				btrfs_tree_read_unlock_blocking(eb);
 			free_extent_buffer(eb);
 			if (ret < 0)
 				goto out;