--- /dev/null
+From ca72d88378b2f2444d3ec145dd442d449d3fefbc Mon Sep 17 00:00:00 2001
+From: Michael Ellerman <mpe@ellerman.id.au>
+Date: Wed, 12 Jun 2019 23:35:07 +1000
+Subject: powerpc/mm/64s/hash: Reallocate context ids on fork
+
+From: Michael Ellerman <mpe@ellerman.id.au>
+
+commit ca72d88378b2f2444d3ec145dd442d449d3fefbc upstream.
+
+When using the Hash Page Table (HPT) MMU, userspace memory mappings
+are managed at two levels. Firstly in the Linux page tables, much like
+other architectures, and secondly in the SLB (Segment Lookaside
+Buffer) and HPT. It's the SLB and HPT that are actually used by the
+hardware to do translations.
+
+As part of the series adding support for 4PB user virtual address
+space using the hash MMU, we added support for allocating multiple
+"context ids" per process, one for each 512TB chunk of address space.
+These are tracked in an array called extended_id in the mm_context_t
+of a process that has done a mapping above 512TB.
+
+If such a process forks (ie. clone(2) without CLONE_VM set) its mm is
+copied, including the mm_context_t, and then init_new_context() is
+called to reinitialise parts of the mm_context_t as appropriate to
+separate the address spaces of the two processes.
+
+The key step in ensuring the two processes have separate address
+spaces is to allocate a new context id for the process; this is done
+at the beginning of hash__init_new_context(). If we didn't allocate a
+new context id then the two processes would share mappings as far as
+the SLB and HPT are concerned, even though their Linux page tables
+would be separate.
+
+For mappings above 512TB, which use the extended_id array, we
+neglected to allocate new context ids on fork, meaning the parent and
+child use the same ids and therefore share those mappings even though
+they're supposed to be separate. This can lead to the parent seeing
+writes done by the child, which is essentially memory corruption.
+
+There is an additional exposure which is that if the child process
+exits, all its context ids are freed, including the context ids that
+are still in use by the parent for mappings above 512TB. One or more
+of those ids can then be reallocated to a third process, which
+can then read/write to the parent's mappings above 512TB. Additionally
+if the freed id is used for the third process's primary context id,
+then the parent is able to read/write to the third process's mappings
+*below* 512TB.
+
+All of these are fundamental failures to enforce separation between
+processes. The only mitigating factor is that the bug only occurs if a
+process creates mappings above 512TB, and most applications still do
+not create such mappings.
+
+Only machines using the hash page table MMU are affected, eg. PowerPC
+970 (G5), PA6T, Power5/6/7/8/9. By default Power9 bare metal machines
+(powernv) use the Radix MMU and are not affected, unless the machine
+has been explicitly booted in HPT mode (using disable_radix on the
+kernel command line). KVM guests on Power9 may be affected if the host
+or guest is configured to use the HPT MMU. LPARs under PowerVM on
+Power9 are affected as they always use the HPT MMU. Kernels built with
+PAGE_SIZE=4K are not affected.
+
+The fix is relatively simple: we need to reallocate context ids for
+all extended mappings on fork.
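+
+The reallocation described above can be sketched in userspace roughly
+as follows. This is an illustration only, not the kernel code: the
+toy_* names, NR_IDS and POOL_SIZE are hypothetical, and a plain bitmap
+stands in for the kernel's id allocator.
+
+```c
+#include <assert.h>
+#include <stdbool.h>
+
+#define NR_IDS    8	/* hypothetical: one slot per 512TB chunk */
+#define POOL_SIZE 64	/* hypothetical size of the toy id pool */
+
+struct toy_ctx {
+	int extended_id[NR_IDS];	/* slot 0 plays the role of ctx->id */
+};
+
+static bool pool_used[POOL_SIZE];
+
+/* Toy stand-in for hash__alloc_context_id(): lowest free id >= 1, or -1. */
+static int toy_alloc_id(void)
+{
+	for (int id = 1; id < POOL_SIZE; id++) {
+		if (!pool_used[id]) {
+			pool_used[id] = true;
+			return id;
+		}
+	}
+	return -1;
+}
+
+static void toy_free_id(int id)
+{
+	assert(id > 0 && id < POOL_SIZE);
+	pool_used[id] = false;
+}
+
+/* Same shape as the fix: always replace slot 0, replace others if in use. */
+static int toy_realloc_ids(struct toy_ctx *ctx)
+{
+	int i, id;
+
+	for (i = 0; i < NR_IDS; i++) {
+		if (i == 0 || ctx->extended_id[i]) {
+			id = toy_alloc_id();
+			if (id < 0)
+				goto error;
+			ctx->extended_id[i] = id;
+		}
+	}
+	return ctx->extended_id[0];
+
+error:
+	/* Skip slot i (still an inherited id); free only ids allocated here. */
+	for (i--; i >= 0; i--) {
+		if (ctx->extended_id[i])
+			toy_free_id(ctx->extended_id[i]);
+	}
+	return id;
+}
+
+/* Toy fork: copy the parent's context, then give the child its own ids. */
+static int toy_fork(const struct toy_ctx *parent, struct toy_ctx *child)
+{
+	*child = *parent;
+	return toy_realloc_ids(child);
+}
+```
+
+In this sketch a child created with toy_fork() ends up with ids that
+differ from the parent's in every slot the parent had in use, so
+freeing the child's ids on exit cannot release an id the parent still
+holds.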
+
+Fixes: f384796c40dc ("powerpc/mm: Add support for handling > 512TB address in SLB miss")
+Cc: stable@vger.kernel.org # v4.17+
+Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+---
+ arch/powerpc/mm/mmu_context_book3s64.c | 46 ++++++++++++++++++++++++++++++---
+ 1 file changed, 42 insertions(+), 4 deletions(-)
+
+--- a/arch/powerpc/mm/mmu_context_book3s64.c
++++ b/arch/powerpc/mm/mmu_context_book3s64.c
+@@ -53,14 +53,48 @@ int hash__alloc_context_id(void)
+ }
+ EXPORT_SYMBOL_GPL(hash__alloc_context_id);
+
++static int realloc_context_ids(mm_context_t *ctx)
++{
++	int i, id;
++
++	/*
++	 * id 0 (aka. ctx->id) is special, we always allocate a new one, even if
++	 * there wasn't one allocated previously (which happens in the exec
++	 * case where ctx is newly allocated).
++	 *
++	 * We have to be a bit careful here. We must keep the existing ids in
++	 * the array, so that we can test if they're non-zero to decide if we
++	 * need to allocate a new one. However in case of error we must free the
++	 * ids we've allocated but *not* any of the existing ones (or risk a
++	 * UAF). That's why we decrement i at the start of the error handling
++	 * loop, to skip the id that we just tested but couldn't reallocate.
++	 */
++	for (i = 0; i < ARRAY_SIZE(ctx->extended_id); i++) {
++		if (i == 0 || ctx->extended_id[i]) {
++			id = hash__alloc_context_id();
++			if (id < 0)
++				goto error;
++
++			ctx->extended_id[i] = id;
++		}
++	}
++
++	/* The caller expects us to return id */
++	return ctx->id;
++
++error:
++	for (i--; i >= 0; i--) {
++		if (ctx->extended_id[i])
++			ida_free(&mmu_context_ida, ctx->extended_id[i]);
++	}
++
++	return id;
++}
++
+ static int hash__init_new_context(struct mm_struct *mm)
+ {
+ 	int index;
+
+-	index = hash__alloc_context_id();
+-	if (index < 0)
+-		return index;
+-
+ 	/*
+ 	 * The old code would re-promote on fork, we don't do that when using
+ 	 * slices as it could cause problem promoting slices that have been
+@@ -78,6 +112,10 @@ static int hash__init_new_context(struct
+ 	if (mm->context.id == 0)
+ 		slice_init_new_context_exec(mm);
+
++	index = realloc_context_ids(&mm->context);
++	if (index < 0)
++		return index;
++
+ 	subpage_prot_init_new_context(mm);
+
+ 	pkey_mm_init(mm);