]> git.ipfire.org Git - thirdparty/kernel/stable-queue.git/blame - releases/2.6.19.2/vm-fix-nasty-and-subtle-race-in-shared-mmap-ed-page-writeback.patch
drop a bunch of perf patches
[thirdparty/kernel/stable-queue.git] / releases / 2.6.19.2 / vm-fix-nasty-and-subtle-race-in-shared-mmap-ed-page-writeback.patch
CommitLineData
2532c0bb
CW
1From 7658cc289288b8ae7dd2c2224549a048431222b3 Mon Sep 17 00:00:00 2001
2From: Linus Torvalds <torvalds@macmini.osdl.org>
3Date: Fri, 29 Dec 2006 10:00:58 -0800
4Subject: VM: Fix nasty and subtle race in shared mmap'ed page writeback
5
6The VM layer (on the face of it, fairly reasonably) expected that when
7it does a ->writepage() call to the filesystem, it would write out the
8full page at that point in time. Especially since it had earlier marked
9the whole page dirty with "set_page_dirty()".
10
11But that isn't actually the case: ->writepage() does not actually write
12a page, it writes the parts of the page that have been explicitly marked
13dirty before, *and* that had not got written out for other reasons since
14the last time we told it they were dirty.
15
16That last caveat is the important one.
17
18Which _most_ of the time ends up being the whole page (since we had
19called "set_page_dirty()" on the page earlier), but if the filesystem
20had done any dirty flushing of its own (for example, to honor some
21internal write ordering guarantees), it might end up doing only a
22partial page IO (or none at all) when ->writepage() is actually called.
23
24That is the correct thing in general (since we actually often _want_
25only the known-dirty parts of the page to be written out), but the
26shared dirty page handling had implicitly forgotten about these details,
27and had a number of cases where it was doing just the "->writepage()"
28part, without telling the low-level filesystem that the whole page might
29have been re-dirtied as part of being mapped writably into user space.
30
31Since most of the time the FS did actually write out the full page, we
32didn't notice this for a loong time, and this needed some really odd
33patterns to trigger. But it caused occasional corruption with rtorrent
34and with the Debian "apt" database, because both use shared mmaps to
35update the end result.
36
37This fixes it. Finally. After way too much hair-pulling.
38
39Acked-by: Nick Piggin <nickpiggin@yahoo.com.au>
40Acked-by: Martin J. Bligh <mbligh@google.com>
41Acked-by: Martin Michlmayr <tbm@cyrius.com>
42Acked-by: Martin Johansson <martin@fatbob.nu>
43Acked-by: Ingo Molnar <mingo@elte.hu>
44Acked-by: Andrei Popa <andrei.popa@i-neo.ro>
45Cc: High Dickins <hugh@veritas.com>
46Cc: Andrew Morton <akpm@osdl.org>,
47Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
48Cc: Segher Boessenkool <segher@kernel.crashing.org>
49Cc: David Miller <davem@davemloft.net>
50Cc: Arjan van de Ven <arjan@infradead.org>
51Cc: Gordon Farquharson <gordonfarquharson@gmail.com>
52Cc: Guillaume Chazarain <guichaz@yahoo.fr>
53Cc: Theodore Tso <tytso@mit.edu>
54Cc: Kenneth Cheng <kenneth.w.chen@intel.com>
55Cc: Tobias Diedrich <ranma@tdiedrich.de>
56Signed-off-by: Linus Torvalds <torvalds@osdl.org>
57[chrisw: backport to 2.6.19.1]
58Signed-off-by: Chris Wright <chrisw@sous-sol.org>
59---
60 mm/page-writeback.c | 39 ++++++++++++++++++++++++++++++++++-----
61 1 file changed, 34 insertions(+), 5 deletions(-)
62
63--- linux-2.6.19.1.orig/mm/page-writeback.c
64+++ linux-2.6.19.1/mm/page-writeback.c
65@@ -893,12 +893,41 @@ int clear_page_dirty_for_io(struct page
66 {
67 struct address_space *mapping = page_mapping(page);
68
69- if (mapping) {
70+ if (mapping && mapping_cap_account_dirty(mapping)) {
71+ /*
72+ * Yes, Virginia, this is indeed insane.
73+ *
74+ * We use this sequence to make sure that
75+ * (a) we account for dirty stats properly
76+ * (b) we tell the low-level filesystem to
77+ * mark the whole page dirty if it was
78+ * dirty in a pagetable. Only to then
79+ * (c) clean the page again and return 1 to
80+ * cause the writeback.
81+ *
82+ * This way we avoid all nasty races with the
83+ * dirty bit in multiple places and clearing
84+ * them concurrently from different threads.
85+ *
86+ * Note! Normally the "set_page_dirty(page)"
87+ * has no effect on the actual dirty bit - since
88+ * that will already usually be set. But we
89+ * need the side effects, and it can help us
90+ * avoid races.
91+ *
92+ * We basically use the page "master dirty bit"
93+ * as a serialization point for all the different
94+ * threads doing their things.
95+ *
96+ * FIXME! We still have a race here: if somebody
97+ * adds the page back to the page tables in
98+ * between the "page_mkclean()" and the "TestClearPageDirty()",
99+ * we might have it mapped without the dirty bit set.
100+ */
101+ if (page_mkclean(page))
102+ set_page_dirty(page);
103 if (TestClearPageDirty(page)) {
104- if (mapping_cap_account_dirty(mapping)) {
105- page_mkclean(page);
106- dec_zone_page_state(page, NR_FILE_DIRTY);
107- }
108+ dec_zone_page_state(page, NR_FILE_DIRTY);
109 return 1;
110 }
111 return 0;