]> git.ipfire.org Git - thirdparty/kernel/stable-queue.git/blob - releases/3.18.3/writeback-fix-a-subtle-race-condition-in-i_dirty-clearing.patch
Fixes for 4.19
[thirdparty/kernel/stable-queue.git] / releases / 3.18.3 / writeback-fix-a-subtle-race-condition-in-i_dirty-clearing.patch
1 From 9c6ac78eb3521c5937b2dd8a7d1b300f41092f45 Mon Sep 17 00:00:00 2001
2 From: Tejun Heo <tj@kernel.org>
3 Date: Fri, 24 Oct 2014 15:38:21 -0400
4 Subject: writeback: fix a subtle race condition in I_DIRTY clearing
5
6 From: Tejun Heo <tj@kernel.org>
7
8 commit 9c6ac78eb3521c5937b2dd8a7d1b300f41092f45 upstream.
9
10 After invoking ->dirty_inode(), __mark_inode_dirty() does smp_mb() and
11 tests inode->i_state locklessly to see whether it already has all the
12 necessary I_DIRTY bits set. The comment above the barrier doesn't
13 contain any useful information - memory barriers can't ensure "changes
14 are seen by all cpus" by itself.
15
16 And it sure enough was broken. Please consider the following
17 scenario.
18
19 CPU 0 CPU 1
20 -------------------------------------------------------------------------------
21
22 enters __writeback_single_inode()
23 grabs inode->i_lock
24 tests PAGECACHE_TAG_DIRTY which is clear
25 enters __set_page_dirty()
26 grabs mapping->tree_lock
27 sets PAGECACHE_TAG_DIRTY
28 releases mapping->tree_lock
29 leaves __set_page_dirty()
30
31 enters __mark_inode_dirty()
32 smp_mb()
33 sees I_DIRTY_PAGES set
34 leaves __mark_inode_dirty()
35 clears I_DIRTY_PAGES
36 releases inode->i_lock
37
38 Now @inode has dirty pages w/ I_DIRTY_PAGES clear. This doesn't seem
39 to lead to an immediately critical problem because requeue_inode()
40 later checks PAGECACHE_TAG_DIRTY instead of I_DIRTY_PAGES when
41 deciding whether the inode needs to be requeued for IO and there are
42 enough unintentional memory barriers inbetween, so while the inode
43 ends up with inconsistent I_DIRTY_PAGES flag, it doesn't fall off the
44 IO list.
45
46 The lack of explicit barrier may also theoretically affect the other
47 I_DIRTY bits which deal with metadata dirtiness. There is no
48 guarantee that a strong enough barrier exists between
49 I_DIRTY_[DATA]SYNC clearing and write_inode() writing out the dirtied
50 inode. Filesystem inode writeout path likely has enough stuff which
51 can behave as full barrier but it's theoretically possible that the
52 writeout may not see all the updates from ->dirty_inode().
53
54 Fix it by adding an explicit smp_mb() after I_DIRTY clearing. Note
55 that I_DIRTY_PAGES needs a special treatment as it always needs to be
56 cleared to be interlocked with the lockless test on
57 __mark_inode_dirty() side. It's cleared unconditionally and
58 reinstated after smp_mb() if the mapping still has dirty pages.
59
60 Also add comments explaining how and why the barriers are paired.
61
62 Lightly tested.
63
64 Signed-off-by: Tejun Heo <tj@kernel.org>
65 Cc: Jan Kara <jack@suse.cz>
66 Cc: Mikulas Patocka <mpatocka@redhat.com>
67 Cc: Jens Axboe <axboe@kernel.dk>
68 Cc: Al Viro <viro@zeniv.linux.org.uk>
69 Reviewed-by: Jan Kara <jack@suse.cz>
70 Signed-off-by: Jens Axboe <axboe@fb.com>
71 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
72
73 ---
74 fs/fs-writeback.c | 29 ++++++++++++++++++++++-------
75 1 file changed, 22 insertions(+), 7 deletions(-)
76
77 --- a/fs/fs-writeback.c
78 +++ b/fs/fs-writeback.c
79 @@ -479,12 +479,28 @@ __writeback_single_inode(struct inode *i
80 * write_inode()
81 */
82 spin_lock(&inode->i_lock);
83 - /* Clear I_DIRTY_PAGES if we've written out all dirty pages */
84 - if (!mapping_tagged(mapping, PAGECACHE_TAG_DIRTY))
85 - inode->i_state &= ~I_DIRTY_PAGES;
86 +
87 dirty = inode->i_state & I_DIRTY;
88 - inode->i_state &= ~(I_DIRTY_SYNC | I_DIRTY_DATASYNC);
89 + inode->i_state &= ~I_DIRTY;
90 +
91 + /*
92 + * Paired with smp_mb() in __mark_inode_dirty(). This allows
93 + * __mark_inode_dirty() to test i_state without grabbing i_lock -
94 + * either they see the I_DIRTY bits cleared or we see the dirtied
95 + * inode.
96 + *
97 + * I_DIRTY_PAGES is always cleared together above even if @mapping
98 + * still has dirty pages. The flag is reinstated after smp_mb() if
99 + * necessary. This guarantees that either __mark_inode_dirty()
100 + * sees clear I_DIRTY_PAGES or we see PAGECACHE_TAG_DIRTY.
101 + */
102 + smp_mb();
103 +
104 + if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY))
105 + inode->i_state |= I_DIRTY_PAGES;
106 +
107 spin_unlock(&inode->i_lock);
108 +
109 /* Don't write the inode if only I_DIRTY_PAGES was set */
110 if (dirty & (I_DIRTY_SYNC | I_DIRTY_DATASYNC)) {
111 int err = write_inode(inode, wbc);
112 @@ -1148,12 +1164,11 @@ void __mark_inode_dirty(struct inode *in
113 }
114
115 /*
116 - * make sure that changes are seen by all cpus before we test i_state
117 - * -- mikulas
118 + * Paired with smp_mb() in __writeback_single_inode() for the
119 + * following lockless i_state test. See there for details.
120 */
121 smp_mb();
122
123 - /* avoid the locking if we can */
124 if ((inode->i_state & flags) == flags)
125 return;
126