From 5a28fc94c9143db766d1ba5480cae82d856ad080 Mon Sep 17 00:00:00 2001
From: Dave Hansen <dave.hansen@linux.intel.com>
Date: Fri, 19 Apr 2019 12:47:47 -0700
Subject: x86/mpx, mm/core: Fix recursive munmap() corruption

From: Dave Hansen <dave.hansen@linux.intel.com>

commit 5a28fc94c9143db766d1ba5480cae82d856ad080 upstream.

This is a bit of a mess, to put it mildly. But, it's a bug
that only seems to have shown up in 4.20 but wasn't noticed
until now, because nobody uses MPX.

MPX has the arch_unmap() hook inside of munmap() because MPX
uses bounds tables that protect other areas of memory. When
memory is unmapped, there is also a need to unmap the MPX
bounds tables. Barring this, unused bounds tables can eat 80%
of the address space.

But, the recursive do_munmap() that gets called via arch_unmap()
wreaks havoc with __do_munmap()'s state. It can result in
freeing populated page tables, accessing bogus VMA state,
double-freed VMAs and more.

See the "long story" further below for the gory details.

To fix this, call arch_unmap() before __do_munmap() has a chance
to do anything meaningful. Also, remove the 'vma' argument
and force the MPX code to do its own, independent VMA lookup.

== UML / unicore32 impact ==

Remove unused 'vma' argument to arch_unmap(). No functional
change.

I compile-tested this on UML but not unicore32.

== powerpc impact ==

powerpc uses arch_unmap() to watch for munmap() on the
VDSO and zeroes out 'current->mm->context.vdso_base'. Moving
arch_unmap() makes this happen earlier in __do_munmap(). But,
'vdso_base' seems to only be used in perf and in the signal
delivery that happens near the return to userspace. I cannot
find any likely impact to powerpc, other than the zeroing
happening a little earlier.

powerpc does not use the 'vma' argument and is unaffected by
its removal.

I compile-tested a 64-bit powerpc defconfig.

== x86 impact ==

For the common success case this is functionally identical to
what was there before. For the munmap() failure case, it's
possible that some MPX tables will be zapped for memory that
continues to be in use. But, this is an extraordinarily
unlikely scenario and the harm would be that MPX provides no
protection since the bounds table got reset (zeroed).

I can't imagine anyone doing this:

        ptr = mmap();
        // use ptr
        ret = munmap(ptr);
        if (ret)
                // oh, there was an error, I'll
                // keep using ptr.

Because if you're doing munmap(), you are *done* with the
memory. There's probably no good data in there _anyway_.
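
For reference, a minimal self-contained userspace sketch of the sane
pattern (the mapping size and values here are arbitrary, purely
illustrative): once munmap() has been attempted, the mapping is
treated as gone no matter what the call returned.

        #include <stdio.h>
        #include <sys/mman.h>

        int main(void)
        {
                size_t len = 4096;      /* assume 4 KiB pages for brevity */

                /* One page of anonymous, private memory. */
                char *ptr = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
                if (ptr == MAP_FAILED) {
                        perror("mmap");
                        return 1;
                }

                ptr[0] = 42;    /* use the memory */

                /* Done with the memory: 'ptr' is not touched again,
                 * even if munmap() were to fail. */
                if (munmap(ptr, len)) {
                        perror("munmap");
                        return 1;
                }
                return 0;
        }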

This passes the original reproducer from Richard Biener as
well as the existing mpx selftests/.

The long story:

munmap() has a couple of pieces:

 1. Find the affected VMA(s)
 2. Split the start/end one(s) if necessary (see the sketch
    after this list)
 3. Pull the VMAs out of the rbtree
 4. Actually zap the memory via unmap_region(), including
    freeing page tables (or queueing them to be freed).
 5. Fix up some of the accounting (like fput()) and actually
    free the VMA itself.
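
Step 2 above also has a user-visible counterpart: unmapping the middle
of a larger mapping splits it, leaving the head and the tail mapped as
separate VMAs. A minimal userspace sketch of that behaviour (page size
queried via sysconf(); the values are arbitrary):

        #include <unistd.h>
        #include <sys/mman.h>

        int main(void)
        {
                long page = sysconf(_SC_PAGESIZE);

                /* One 3-page anonymous mapping... */
                char *p = mmap(NULL, 3 * page, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
                if (p == MAP_FAILED)
                        return 1;

                /* ...becomes two mappings once the middle page is
                 * unmapped: the kernel splits the VMA around the hole. */
                if (munmap(p + page, page))
                        return 1;

                p[0] = 1;               /* head is still mapped */
                p[2 * page] = 2;        /* tail is still mapped */
                /* p[page] would now fault: that page is gone. */

                munmap(p, page);
                munmap(p + 2 * page, page);
                return 0;
        }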

This specific ordering was actually introduced by:

 dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap")

during the 4.20 merge window. The previous __do_munmap() code
was actually safe because the only thing after arch_unmap() was
remove_vma_list(). arch_unmap() could not see 'vma' in the
rbtree because it was detached, so it is not even capable of
doing operations unsafe for remove_vma_list()'s use of 'vma'.

Richard Biener reported a test that shows this in dmesg:

 [1216548.787498] BUG: Bad rss-counter state mm:0000000017ce560b idx:1 val:551
 [1216548.787500] BUG: non-zero pgtables_bytes on freeing mm: 24576

What triggered this was the recursive do_munmap() called via
arch_unmap(). It was freeing page tables that had not been
properly zapped.

But, the problem was bigger than this. For one, arch_unmap()
can free VMAs. But, the calling __do_munmap() has variables
that *point* to VMAs and obviously can't handle them just
getting freed while the pointer is still in use.

I tried a couple of things here. First, I tried to fix the page
table freeing problem in isolation, but I then found the VMA
issue. I also tried having the MPX code return a flag if it
modified the rbtree which would force __do_munmap() to re-walk
the rbtree and restart. That spiralled out of control in
complexity pretty fast.

Just moving arch_unmap() and accepting that the bonkers failure
case might eat some bounds tables seems like the simplest viable
fix.

This was also reported in the following kernel bugzilla entry:

 https://bugzilla.kernel.org/show_bug.cgi?id=203123

There are some reports that the following commit triggered this bug:

 dd2283f2605 ("mm: mmap: zap pages with read mmap_sem in munmap")

While that commit certainly made the issues easier to hit, I believe
the fundamental issue has been with us as long as MPX itself, thus
the Fixes: tag below is for one of the original MPX commits.

[ mingo: Minor edits to the changelog and the patch. ]

Reported-by: Richard Biener <rguenther@suse.de>
Reported-by: H.J. Lu <hjl.tools@gmail.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-um@lists.infradead.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: stable@vger.kernel.org
Fixes: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap")
Link: http://lkml.kernel.org/r/20190419194747.5E1AD6DC@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/powerpc/include/asm/mmu_context.h | 1 -
 arch/um/include/asm/mmu_context.h | 1 -
 arch/unicore32/include/asm/mmu_context.h | 1 -
 arch/x86/include/asm/mmu_context.h | 6 +++---
 arch/x86/include/asm/mpx.h | 15 ++++++++-------
 arch/x86/mm/mpx.c | 10 ++++++----
 include/asm-generic/mm_hooks.h | 1 -
 mm/mmap.c | 15 ++++++++-------
 8 files changed, 25 insertions(+), 25 deletions(-)

--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -237,7 +237,6 @@ extern void arch_exit_mmap(struct mm_str
 #endif
 
 static inline void arch_unmap(struct mm_struct *mm,
- struct vm_area_struct *vma,
 unsigned long start, unsigned long end)
 {
 if (start <= mm->context.vdso_base && mm->context.vdso_base < end)
--- a/arch/um/include/asm/mmu_context.h
+++ b/arch/um/include/asm/mmu_context.h
@@ -22,7 +22,6 @@ static inline int arch_dup_mmap(struct m
 }
 extern void arch_exit_mmap(struct mm_struct *mm);
 static inline void arch_unmap(struct mm_struct *mm,
- struct vm_area_struct *vma,
 unsigned long start, unsigned long end)
 {
 }
--- a/arch/unicore32/include/asm/mmu_context.h
+++ b/arch/unicore32/include/asm/mmu_context.h
@@ -88,7 +88,6 @@ static inline int arch_dup_mmap(struct m
 }
 
 static inline void arch_unmap(struct mm_struct *mm,
- struct vm_area_struct *vma,
 unsigned long start, unsigned long end)
 {
 }
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -277,8 +277,8 @@ static inline void arch_bprm_mm_init(str
 mpx_mm_init(mm);
 }
 
-static inline void arch_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
- unsigned long start, unsigned long end)
+static inline void arch_unmap(struct mm_struct *mm, unsigned long start,
+ unsigned long end)
 {
 /*
 * mpx_notify_unmap() goes and reads a rarely-hot
@@ -298,7 +298,7 @@ static inline void arch_unmap(struct mm_
 * consistently wrong.
 */
 if (unlikely(cpu_feature_enabled(X86_FEATURE_MPX)))
- mpx_notify_unmap(mm, vma, start, end);
+ mpx_notify_unmap(mm, start, end);
 }
 
 /*
--- a/arch/x86/include/asm/mpx.h
+++ b/arch/x86/include/asm/mpx.h
@@ -64,12 +64,15 @@ struct mpx_fault_info {
 };
 
 #ifdef CONFIG_X86_INTEL_MPX
-int mpx_fault_info(struct mpx_fault_info *info, struct pt_regs *regs);
-int mpx_handle_bd_fault(void);
+
+extern int mpx_fault_info(struct mpx_fault_info *info, struct pt_regs *regs);
+extern int mpx_handle_bd_fault(void);
+
 static inline int kernel_managing_mpx_tables(struct mm_struct *mm)
 {
 return (mm->context.bd_addr != MPX_INVALID_BOUNDS_DIR);
 }
+
 static inline void mpx_mm_init(struct mm_struct *mm)
 {
 /*
@@ -78,11 +81,10 @@ static inline void mpx_mm_init(struct mm
 */
 mm->context.bd_addr = MPX_INVALID_BOUNDS_DIR;
 }
-void mpx_notify_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
- unsigned long start, unsigned long end);
 
-unsigned long mpx_unmapped_area_check(unsigned long addr, unsigned long len,
- unsigned long flags);
+extern void mpx_notify_unmap(struct mm_struct *mm, unsigned long start, unsigned long end);
+extern unsigned long mpx_unmapped_area_check(unsigned long addr, unsigned long len, unsigned long flags);
+
 #else
 static inline int mpx_fault_info(struct mpx_fault_info *info, struct pt_regs *regs)
 {
@@ -100,7 +102,6 @@ static inline void mpx_mm_init(struct mm
 {
 }
 static inline void mpx_notify_unmap(struct mm_struct *mm,
- struct vm_area_struct *vma,
 unsigned long start, unsigned long end)
 {
 }
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -881,9 +881,10 @@ static int mpx_unmap_tables(struct mm_st
 * the virtual address region start...end have already been split if
 * necessary, and the 'vma' is the first vma in this range (start -> end).
 */
-void mpx_notify_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
- unsigned long start, unsigned long end)
+void mpx_notify_unmap(struct mm_struct *mm, unsigned long start,
+ unsigned long end)
 {
+ struct vm_area_struct *vma;
 int ret;
 
 /*
@@ -902,11 +903,12 @@ void mpx_notify_unmap(struct mm_struct *
 * which should not occur normally. Being strict about it here
 * helps ensure that we do not have an exploitable stack overflow.
 */
- do {
+ vma = find_vma(mm, start);
+ while (vma && vma->vm_start < end) {
 if (vma->vm_flags & VM_MPX)
 return;
 vma = vma->vm_next;
- } while (vma && vma->vm_start < end);
+ }
 
 ret = mpx_unmap_tables(mm, start, end);
 if (ret)
--- a/include/asm-generic/mm_hooks.h
+++ b/include/asm-generic/mm_hooks.h
@@ -18,7 +18,6 @@ static inline void arch_exit_mmap(struct
 }
 
 static inline void arch_unmap(struct mm_struct *mm,
- struct vm_area_struct *vma,
 unsigned long start, unsigned long end)
 {
 }
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2736,9 +2736,17 @@ int __do_munmap(struct mm_struct *mm, un
 return -EINVAL;
 
 len = PAGE_ALIGN(len);
+ end = start + len;
 if (len == 0)
 return -EINVAL;
 
+ /*
+ * arch_unmap() might do unmaps itself. It must be called
+ * and finish any rbtree manipulation before this code
+ * runs and also starts to manipulate the rbtree.
+ */
+ arch_unmap(mm, start, end);
+
 /* Find the first overlapping VMA */
 vma = find_vma(mm, start);
 if (!vma)
@@ -2747,7 +2755,6 @@ int __do_munmap(struct mm_struct *mm, un
 /* we have start < vma->vm_end */
 
 /* if it doesn't overlap, we have nothing.. */
- end = start + len;
 if (vma->vm_start >= end)
 return 0;
 
@@ -2817,12 +2824,6 @@ int __do_munmap(struct mm_struct *mm, un
 /* Detach vmas from rbtree */
 detach_vmas_to_be_unmapped(mm, vma, prev, end);
 
- /*
- * mpx unmap needs to be called with mmap_sem held for write.
- * It is safe to call it before unmap_region().
- */
- arch_unmap(mm, vma, start, end);
-
 if (downgrade)
 downgrade_write(&mm->mmap_sem);
