From dd18dbc2d42af75fffa60c77e0f02220bc329829 Mon Sep 17 00:00:00 2001
From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Date: Fri, 9 May 2014 15:37:00 -0700
Subject: mm, thp: close race between mremap() and split_huge_page()

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

commit dd18dbc2d42af75fffa60c77e0f02220bc329829 upstream.

It's critical for split_huge_page() (and migration) to catch and freeze
all PMDs on the rmap walk. This gets tricky if there's a concurrent fork()
or mremap(), since we usually copy/move page table entries in dup_mm() or
move_page_tables() without the rmap lock taken. To make this work we rely
on the rmap walk order to not miss any entry: we expect to see the
destination VMA after the source one.

But after switching the rmap implementation to an interval tree, it's not
always possible to preserve the expected walk order.

It works fine for dup_mm(), since the new VMA has the same
vma_start_pgoff()/vma_last_pgoff() and we explicitly insert the dst VMA
after the src one with vma_interval_tree_insert_after().

But on move_vma() the destination VMA can be merged into an adjacent one
and, as a result, shifted left in the interval tree. Fortunately, we can
detect this situation and prevent a race with the rmap walk by moving the
page table entries under the rmap lock. See commit 38a76013ad80.

The problem is that we miss the lock when we move a transhuge PMD. Most
likely this bug caused the crash[1].

[1] http://thread.gmane.org/gmane.linux.kernel.mm/96473

Fixes: 108d6642ad81 ("mm anon rmap: remove anon_vma_moveto_tail")

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Acked-by: Michel Lespinasse <walken@google.com>
Cc: Dave Jones <davej@redhat.com>
Cc: David Miller <davem@davemloft.net>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/mremap.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -175,10 +175,17 @@ unsigned long move_page_tables(struct vm
 			break;
 		if (pmd_trans_huge(*old_pmd)) {
 			int err = 0;
-			if (extent == HPAGE_PMD_SIZE)
+			if (extent == HPAGE_PMD_SIZE) {
+				VM_BUG_ON(vma->vm_file || !vma->anon_vma);
+				/* See comment in move_ptes() */
+				if (need_rmap_locks)
+					anon_vma_lock_write(vma->anon_vma);
 				err = move_huge_pmd(vma, new_vma, old_addr,
 						    new_addr, old_end,
 						    old_pmd, new_pmd);
+				if (need_rmap_locks)
+					anon_vma_unlock_write(vma->anon_vma);
+			}
 			if (err > 0) {
 				need_flush = true;
 				continue;