From: Greg Kroah-Hartman
Date: Wed, 18 Oct 2017 16:51:07 +0000 (+0200)
Subject: 3.18-stable patches
X-Git-Tag: v3.18.77~18
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=d02dbc961927027f59390c0845696c1c8584722e;p=thirdparty%2Fkernel%2Fstable-queue.git

3.18-stable patches

added patches:
	x86-mm-disable-preemption-during-cr3-read-write.patch
---

diff --git a/queue-3.18/series b/queue-3.18/series
new file mode 100644
index 00000000000..d39de0dc438
--- /dev/null
+++ b/queue-3.18/series
@@ -0,0 +1 @@
+x86-mm-disable-preemption-during-cr3-read-write.patch
diff --git a/queue-3.18/x86-mm-disable-preemption-during-cr3-read-write.patch b/queue-3.18/x86-mm-disable-preemption-during-cr3-read-write.patch
new file mode 100644
index 00000000000..52b72c7b4d5
--- /dev/null
+++ b/queue-3.18/x86-mm-disable-preemption-during-cr3-read-write.patch
@@ -0,0 +1,111 @@
+From 5cf0791da5c162ebc14b01eb01631cfa7ed4fa6e Mon Sep 17 00:00:00 2001
+From: Sebastian Andrzej Siewior
+Date: Fri, 5 Aug 2016 15:37:39 +0200
+Subject: x86/mm: Disable preemption during CR3 read+write
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+From: Sebastian Andrzej Siewior
+
+commit 5cf0791da5c162ebc14b01eb01631cfa7ed4fa6e upstream.
+
+There's a subtle preemption race on UP kernels:
+
+Usually current->mm (and therefore mm->pgd) stays the same during the
+lifetime of a task so it does not matter if a task gets preempted during
+the read and write of the CR3.
+
+But then, there is this scenario on x86-UP:
+
+TaskA is in do_exit() and exit_mm() sets current->mm = NULL followed by:
+
+  -> mmput()
+  -> exit_mmap()
+  -> tlb_finish_mmu()
+  -> tlb_flush_mmu()
+  -> tlb_flush_mmu_tlbonly()
+  -> tlb_flush()
+  -> flush_tlb_mm_range()
+  -> __flush_tlb_up()
+  -> __flush_tlb()
+  -> __native_flush_tlb()
+
+At this point current->mm is NULL but current->active_mm still points to
+the "old" mm.
+
+Let's preempt taskA _after_ native_read_cr3() by taskB. TaskB has its
+own mm so CR3 has changed.
+
+Now preempt back to taskA. TaskA has no ->mm set so it borrows taskB's
+mm and so CR3 remains unchanged. Once taskA gets active it continues
+where it was interrupted and that means it writes its old CR3 value
+back. Everything is fine because userland won't need its memory
+anymore.
+
+Now the fun part:
+
+Let's preempt taskA one more time and get back to taskB. This
+time switch_mm() won't do a thing because oldmm (->active_mm)
+is the same as mm (as per context_switch()). So we remain
+with a bad CR3 / PGD and return to userland.
+
+The next thing that happens is handle_mm_fault() with an address for
+the execution of its code in userland. handle_mm_fault() realizes that
+it has a PTE with proper rights so it returns doing nothing. But the
+CPU looks at the wrong PGD and insists that something is wrong and
+faults again. And again. And one more time…
+
+This pagefault circle continues until the scheduler gets tired of it and
+puts another task on the CPU. It gets little difficult if the task is a
+RT task with a high priority. The system will either freeze or it gets
+fixed by the software watchdog thread which usually runs at RT-max prio.
+But waiting for the watchdog will increase the latency of the RT task
+which is no good.
+
+Fix this by disabling preemption across the critical code section.
+
+Signed-off-by: Sebastian Andrzej Siewior
+Acked-by: Peter Zijlstra (Intel)
+Acked-by: Rik van Riel
+Acked-by: Andy Lutomirski
+Cc: Borislav Petkov
+Cc: Borislav Petkov
+Cc: Brian Gerst
+Cc: Denys Vlasenko
+Cc: H. Peter Anvin
+Cc: Josh Poimboeuf
+Cc: Linus Torvalds
+Cc: Mel Gorman
+Cc: Peter Zijlstra
+Cc: Peter Zijlstra
+Cc: Thomas Gleixner
+Cc: linux-mm@kvack.org
+Cc: stable@vger.kernel.org
+Link: http://lkml.kernel.org/r/1470404259-26290-1-git-send-email-bigeasy@linutronix.de
+[ Prettified the changelog. ]
+Signed-off-by: Ingo Molnar
+Cc: Bernhard Kaindl
+Signed-off-by: Greg Kroah-Hartman
+
+---
+ arch/x86/include/asm/tlbflush.h |    7 +++++++
+ 1 file changed, 7 insertions(+)
+
+--- a/arch/x86/include/asm/tlbflush.h
++++ b/arch/x86/include/asm/tlbflush.h
+@@ -86,7 +86,14 @@ static inline void cr4_set_bits_and_upda
+ 
+ static inline void __native_flush_tlb(void)
+ {
++	/*
++	 * If current->mm == NULL then we borrow a mm which may change during a
++	 * task switch and therefore we must not be preempted while we write CR3
++	 * back:
++	 */
++	preempt_disable();
+ 	native_write_cr3(native_read_cr3());
++	preempt_enable();
+ }
+ 
+ static inline void __native_flush_tlb_global_irq_disabled(void)