From: Greg Kroah-Hartman
Date: Thu, 25 Aug 2022 10:41:21 +0000 (+0200)
Subject: 5.19-stable patches
X-Git-Tag: v5.10.140~56
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=f2a68c4e078fce6cc11c108daafb81bdfb6daf70;p=thirdparty%2Fkernel%2Fstable-queue.git

5.19-stable patches

added patches:
      mm-gup-fix-foll_force-cow-security-issue-and-remove-foll_cow.patch
---

diff --git a/queue-4.14/series b/queue-4.14/series
new file mode 100644
index 00000000000..e69de29bb2d
diff --git a/queue-5.19/mm-gup-fix-foll_force-cow-security-issue-and-remove-foll_cow.patch b/queue-5.19/mm-gup-fix-foll_force-cow-security-issue-and-remove-foll_cow.patch
new file mode 100644
index 00000000000..1f8a2deb3fa
--- /dev/null
+++ b/queue-5.19/mm-gup-fix-foll_force-cow-security-issue-and-remove-foll_cow.patch
@@ -0,0 +1,310 @@
+From david@redhat.com Thu Aug 25 12:40:27 2022
+From: David Hildenbrand
+Date: Wed, 24 Aug 2022 21:23:33 +0200
+Subject: mm/gup: fix FOLL_FORCE COW security issue and remove FOLL_COW
+To: linux-kernel@vger.kernel.org
+Cc: linux-mm@kvack.org, David Hildenbrand , Greg Kroah-Hartman , Axel Rasmussen , Nadav Amit , Peter Xu , Hugh Dickins , Andrea Arcangeli , Matthew Wilcox , Vlastimil Babka , John Hubbard , Jason Gunthorpe , David Laight , stable@vger.kernel.org
+Message-ID: <20220824192333.287405-1-david@redhat.com>
+
+From: David Hildenbrand
+
+commit 5535be3099717646781ce1540cf725965d680e7b upstream.
+
+Ever since the Dirty COW (CVE-2016-5195) security issue happened, we know
+that FOLL_FORCE can be possibly dangerous, especially if there are races
+that can be exploited by user space.
+
+Right now, it would be sufficient to have some code that sets a PTE of a
+R/O-mapped shared page dirty, in order for it to erroneously become
+writable by FOLL_FORCE. The implications of setting a write-protected PTE
+dirty might not be immediately obvious to everyone.
+
+And in fact ever since commit 9ae0f87d009c ("mm/shmem: unconditionally set
+pte dirty in mfill_atomic_install_pte"), we can use UFFDIO_CONTINUE to map
+a shmem page R/O while marking the pte dirty. This can be used by
+unprivileged user space to modify tmpfs/shmem file content even if the
+user does not have write permissions to the file, and to bypass memfd
+write sealing -- Dirty COW restricted to tmpfs/shmem (CVE-2022-2590).
+
+To fix such security issues for good, the insight is that we really only
+need that fancy retry logic (FOLL_COW) for COW mappings that are not
+writable (!VM_WRITE). And in a COW mapping, we really only broke COW if
+we have an exclusive anonymous page mapped. If we have something else
+mapped, or the mapped anonymous page might be shared (!PageAnonExclusive),
+we have to trigger a write fault to break COW. If we don't find an
+exclusive anonymous page when we retry, we have to trigger COW breaking
+once again because something intervened.
+
+Let's move away from this mandatory-retry + dirty handling and rely on our
+PageAnonExclusive() flag for making a similar decision, to use the same
+COW logic as in other kernel parts here as well. In case we stumble over
+a PTE in a COW mapping that does not map an exclusive anonymous page, COW
+was not properly broken and we have to trigger a fake write-fault to break
+COW.
+
+Just like we do in can_change_pte_writable() added via commit 64fe24a3e05e
+("mm/mprotect: try avoiding write faults for exclusive anonymous pages
+when changing protection") and commit 76aefad628aa ("mm/mprotect: fix
+soft-dirty check in can_change_pte_writable()"), take care of softdirty
+and uffd-wp manually.
+
+For example, a write() via /proc/self/mem to a uffd-wp-protected range has
+to fail instead of silently granting write access and bypassing the
+userspace fault handler. Note that FOLL_FORCE is not only used for debug
+access, but also triggered by applications without debug intentions, for
+example, when pinning pages via RDMA.
+
+This fixes CVE-2022-2590. Note that only x86_64 and aarch64 are
+affected, because only those support CONFIG_HAVE_ARCH_USERFAULTFD_MINOR.
+
+Fortunately, FOLL_COW is no longer required to handle FOLL_FORCE. So
+let's just get rid of it.
+
+Thanks to Nadav Amit for pointing out that the pte_dirty() check in
+FOLL_FORCE code is problematic and might be exploitable.
+
+Note 1: We don't check for the PTE being dirty because it doesn't matter
+        for making a "was COWed" decision anymore, and whoever modifies the
+        page has to set the page dirty either way.
+
+Note 2: Kernels before extended uffd-wp support and before
+        PageAnonExclusive (< 5.19) can simply revert the problematic
+        commit instead and be safe regarding UFFDIO_CONTINUE. A backport to
+        v5.19 requires minor adjustments due to lack of
+        vma_soft_dirty_enabled().
+
+Link: https://lkml.kernel.org/r/20220809205640.70916-1-david@redhat.com
+Fixes: 9ae0f87d009c ("mm/shmem: unconditionally set pte dirty in mfill_atomic_install_pte")
+Signed-off-by: David Hildenbrand
+Cc: Greg Kroah-Hartman
+Cc: Axel Rasmussen
+Cc: Nadav Amit
+Cc: Peter Xu
+Cc: Hugh Dickins
+Cc: Andrea Arcangeli
+Cc: Matthew Wilcox
+Cc: Vlastimil Babka
+Cc: John Hubbard
+Cc: Jason Gunthorpe
+Cc: David Laight
+Cc: [5.16]
+Signed-off-by: Andrew Morton
+Signed-off-by: David Hildenbrand
+Signed-off-by: Greg Kroah-Hartman
+---
+ include/linux/mm.h |    1 
+ mm/gup.c           |   69 ++++++++++++++++++++++++++++++++++++-----------------
+ mm/huge_memory.c   |   65 +++++++++++++++++++++++++++++++++----------------
+ 3 files changed, 91 insertions(+), 44 deletions(-)
+
+--- a/include/linux/mm.h
++++ b/include/linux/mm.h
+@@ -2939,7 +2939,6 @@ struct page *follow_page(struct vm_area_
+ #define FOLL_MIGRATION	0x400	/* wait for page to replace migration entry */
+ #define FOLL_TRIED	0x800	/* a retry, previous pass started an IO */
+ #define FOLL_REMOTE	0x2000	/* we are working on non-current tsk/mm */
+-#define FOLL_COW	0x4000	/* internal GUP flag */
+ #define FOLL_ANON	0x8000	/* don't do file mappings */
+ #define FOLL_LONGTERM	0x10000	/* mapping lifetime is indefinite: see below */
+ #define FOLL_SPLIT_PMD	0x20000	/* split huge pmd before returning */
+--- a/mm/gup.c
++++ b/mm/gup.c
+@@ -478,14 +478,43 @@ static int follow_pfn_pte(struct vm_area
+ 	return -EEXIST;
+ }
+ 
+-/*
+- * FOLL_FORCE can write to even unwritable pte's, but only
+- * after we've gone through a COW cycle and they are dirty.
+- */
+-static inline bool can_follow_write_pte(pte_t pte, unsigned int flags)
++/* FOLL_FORCE can write to even unwritable PTEs in COW mappings. */
++static inline bool can_follow_write_pte(pte_t pte, struct page *page,
++					struct vm_area_struct *vma,
++					unsigned int flags)
+ {
+-	return pte_write(pte) ||
+-		((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_dirty(pte));
++	/* If the pte is writable, we can write to the page. */
++	if (pte_write(pte))
++		return true;
++
++	/* Maybe FOLL_FORCE is set to override it? */
++	if (!(flags & FOLL_FORCE))
++		return false;
++
++	/* But FOLL_FORCE has no effect on shared mappings */
++	if (vma->vm_flags & (VM_MAYSHARE | VM_SHARED))
++		return false;
++
++	/* ... or read-only private ones */
++	if (!(vma->vm_flags & VM_MAYWRITE))
++		return false;
++
++	/* ... or already writable ones that just need to take a write fault */
++	if (vma->vm_flags & VM_WRITE)
++		return false;
++
++	/*
++	 * See can_change_pte_writable(): we broke COW and could map the page
++	 * writable if we have an exclusive anonymous page ...
++	 */
++	if (!page || !PageAnon(page) || !PageAnonExclusive(page))
++		return false;
++
++	/* ... and a write-fault isn't required for other reasons. */
++	if (IS_ENABLED(CONFIG_MEM_SOFT_DIRTY) &&
++	    !(vma->vm_flags & VM_SOFTDIRTY) && !pte_soft_dirty(pte))
++		return false;
++	return !userfaultfd_pte_wp(vma, pte);
+ }
+ 
+ static struct page *follow_page_pte(struct vm_area_struct *vma,
+@@ -528,12 +557,19 @@ retry:
+ 	}
+ 	if ((flags & FOLL_NUMA) && pte_protnone(pte))
+ 		goto no_page;
+-	if ((flags & FOLL_WRITE) && !can_follow_write_pte(pte, flags)) {
+-		pte_unmap_unlock(ptep, ptl);
+-		return NULL;
+-	}
+ 
+ 	page = vm_normal_page(vma, address, pte);
++
++	/*
++	 * We only care about anon pages in can_follow_write_pte() and don't
++	 * have to worry about pte_devmap() because they are never anon.
++	 */
++	if ((flags & FOLL_WRITE) &&
++	    !can_follow_write_pte(pte, page, vma, flags)) {
++		page = NULL;
++		goto out;
++	}
++
+ 	if (!page && pte_devmap(pte) && (flags & (FOLL_GET | FOLL_PIN))) {
+ 		/*
+ 		 * Only return device mapping pages in the FOLL_GET or FOLL_PIN
+@@ -967,17 +1003,6 @@ static int faultin_page(struct vm_area_s
+ 		return -EBUSY;
+ 	}
+ 
+-	/*
+-	 * The VM_FAULT_WRITE bit tells us that do_wp_page has broken COW when
+-	 * necessary, even if maybe_mkwrite decided not to set pte_write. We
+-	 * can thus safely do subsequent page lookups as if they were reads.
+-	 * But only do so when looping for pte_write is futile: in some cases
+-	 * userspace may also be wanting to write to the gotten user page,
+-	 * which a read fault here might prevent (a readonly page might get
+-	 * reCOWed by userspace write).
+-	 */
+-	if ((ret & VM_FAULT_WRITE) && !(vma->vm_flags & VM_WRITE))
+-		*flags |= FOLL_COW;
+ 	return 0;
+ }
+ 
+--- a/mm/huge_memory.c
++++ b/mm/huge_memory.c
+@@ -978,12 +978,6 @@ struct page *follow_devmap_pmd(struct vm
+ 
+ 	assert_spin_locked(pmd_lockptr(mm, pmd));
+ 
+-	/*
+-	 * When we COW a devmap PMD entry, we split it into PTEs, so we should
+-	 * not be in this function with `flags & FOLL_COW` set.
+-	 */
+-	WARN_ONCE(flags & FOLL_COW, "mm: In follow_devmap_pmd with FOLL_COW set");
+-
+ 	/* FOLL_GET and FOLL_PIN are mutually exclusive. */
+ 	if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) ==
+ 			 (FOLL_PIN | FOLL_GET)))
+@@ -1349,14 +1343,43 @@ fallback:
+ 	return VM_FAULT_FALLBACK;
+ }
+ 
+-/*
+- * FOLL_FORCE can write to even unwritable pmd's, but only
+- * after we've gone through a COW cycle and they are dirty.
+- */
+-static inline bool can_follow_write_pmd(pmd_t pmd, unsigned int flags)
++/* FOLL_FORCE can write to even unwritable PMDs in COW mappings. */
++static inline bool can_follow_write_pmd(pmd_t pmd, struct page *page,
++					struct vm_area_struct *vma,
++					unsigned int flags)
+ {
+-	return pmd_write(pmd) ||
+-		((flags & FOLL_FORCE) && (flags & FOLL_COW) && pmd_dirty(pmd));
++	/* If the pmd is writable, we can write to the page. */
++	if (pmd_write(pmd))
++		return true;
++
++	/* Maybe FOLL_FORCE is set to override it? */
++	if (!(flags & FOLL_FORCE))
++		return false;
++
++	/* But FOLL_FORCE has no effect on shared mappings */
++	if (vma->vm_flags & (VM_MAYSHARE | VM_SHARED))
++		return false;
++
++	/* ... or read-only private ones */
++	if (!(vma->vm_flags & VM_MAYWRITE))
++		return false;
++
++	/* ... or already writable ones that just need to take a write fault */
++	if (vma->vm_flags & VM_WRITE)
++		return false;
++
++	/*
++	 * See can_change_pte_writable(): we broke COW and could map the page
++	 * writable if we have an exclusive anonymous page ...
++	 */
++	if (!page || !PageAnon(page) || !PageAnonExclusive(page))
++		return false;
++
++	/* ... and a write-fault isn't required for other reasons. */
++	if (IS_ENABLED(CONFIG_MEM_SOFT_DIRTY) &&
++	    !(vma->vm_flags & VM_SOFTDIRTY) && !pmd_soft_dirty(pmd))
++		return false;
++	return !userfaultfd_huge_pmd_wp(vma, pmd);
+ }
+ 
+ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
+@@ -1365,12 +1388,16 @@ struct page *follow_trans_huge_pmd(struc
+ 				   unsigned int flags)
+ {
+ 	struct mm_struct *mm = vma->vm_mm;
+-	struct page *page = NULL;
++	struct page *page;
+ 
+ 	assert_spin_locked(pmd_lockptr(mm, pmd));
+ 
+-	if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, flags))
+-		goto out;
++	page = pmd_page(*pmd);
++	VM_BUG_ON_PAGE(!PageHead(page) && !is_zone_device_page(page), page);
++
++	if ((flags & FOLL_WRITE) &&
++	    !can_follow_write_pmd(*pmd, page, vma, flags))
++		return NULL;
+ 
+ 	/* Avoid dumping huge zero page */
+ 	if ((flags & FOLL_DUMP) && is_huge_zero_pmd(*pmd))
+@@ -1378,10 +1405,7 @@ struct page *follow_trans_huge_pmd(struc
+ 
+ 	/* Full NUMA hinting faults to serialise migration in fault paths */
+ 	if ((flags & FOLL_NUMA) && pmd_protnone(*pmd))
+-		goto out;
+-
+-	page = pmd_page(*pmd);
+-	VM_BUG_ON_PAGE(!PageHead(page) && !is_zone_device_page(page), page);
++		return NULL;
+ 
+ 	if (!pmd_write(*pmd) && gup_must_unshare(flags, page))
+ 		return ERR_PTR(-EMLINK);
+@@ -1398,7 +1422,6 @@ struct page *follow_trans_huge_pmd(struc
+ 	page += (addr & ~HPAGE_PMD_MASK) >> PAGE_SHIFT;
+ 	VM_BUG_ON_PAGE(!PageCompound(page) && !is_zone_device_page(page), page);
+ 
+-out:
+ 	return page;
+ }
+ 
diff --git a/queue-5.19/series b/queue-5.19/series
new file mode 100644
index 00000000000..3af30c53c98
--- /dev/null
+++ b/queue-5.19/series
@@ -0,0 +1 @@
+mm-gup-fix-foll_force-cow-security-issue-and-remove-foll_cow.patch
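
For readers following along, here is a minimal userspace sketch (not part of the queued patch or the stable-queue repository, added here purely as illustration) of the legitimate FOLL_FORCE path the commit message refers to: a debugger-style write through /proc/self/mem to a read-only MAP_PRIVATE mapping. The kernel, not the caller, breaks COW; under the new logic this is exactly the case where can_follow_write_pte() later finds an exclusive anonymous page and permits the forced write, while the uffd-wp and shared-mapping cases are refused. All interfaces used are standard Linux ones; error handling is deliberately minimal.

/* build: cc -O2 foll_force_demo.c -o foll_force_demo */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	/* Read-only private anonymous mapping: VM_WRITE clear, VM_MAYWRITE set. */
	char *p = mmap(NULL, 4096, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return 1;

	/* Writes through /proc/self/mem go through GUP with FOLL_FORCE. */
	int fd = open("/proc/self/mem", O_RDWR);
	if (fd < 0)
		return 1;

	const char msg[] = "written via FOLL_FORCE";
	/* The kernel breaks COW on our behalf; the PTE itself stays write-protected. */
	if (pwrite(fd, msg, sizeof(msg), (off_t)(uintptr_t)p) != (ssize_t)sizeof(msg))
		perror("pwrite");

	printf("%s\n", p);	/* the private copy is visible through the R/O mapping */
	close(fd);
	return 0;
}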