From: David Hildenbrand Date: Mon, 10 Feb 2025 19:37:47 +0000 (+0100) Subject: mm/memory: detect writability in restore_exclusive_pte() through can_change_pte_writa... X-Git-Tag: v6.15-rc1~81^2~408 X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=9a914592140ec548ef72bfef7c8a4e036a1ac492;p=thirdparty%2Fkernel%2Flinux.git mm/memory: detect writability in restore_exclusive_pte() through can_change_pte_writable() Let's do it just like mprotect write-upgrade or during NUMA-hinting faults on PROT_NONE PTEs: detect if the PTE can be writable by using can_change_pte_writable(). Set the PTE only dirty if the folio is dirty: we might not necessarily have a write access, and setting the PTE writable doesn't require setting the PTE dirty. From a CPU perspective, these entries are clean. So only set the PTE dirty if the folios is dirty. With this change in place, there is no need to have separate readable and writable device-exclusive entry types, and we'll merge them next separately. Note that, during fork(), we first convert the device-exclusive entries back to ordinary PTEs, and we only ever allow conversion of writable PTEs to device-exclusive -- only mprotect can currently change them to readable-device-exclusive. Consequently, we always expect PageAnonExclusive(page)==true and can_change_pte_writable()==true, unless we are dealing with soft-dirty tracking or uffd-wp. But reusing can_change_pte_writable() for now is cleaner. Link: https://lkml.kernel.org/r/20250210193801.781278-6-david@redhat.com Signed-off-by: David Hildenbrand Tested-by: Alistair Popple Cc: Alex Shi Cc: Danilo Krummrich Cc: Dave Airlie Cc: Jann Horn Cc: Jason Gunthorpe Cc: Jerome Glisse Cc: John Hubbard Cc: Jonathan Corbet Cc: Karol Herbst Cc: Liam Howlett Cc: Lorenzo Stoakes Cc: Lyude Cc: "Masami Hiramatsu (Google)" Cc: Oleg Nesterov Cc: Pasha Tatashin Cc: Peter Xu Cc: Peter Zijlstra (Intel) Cc: SeongJae Park Cc: Simona Vetter Cc: Vlastimil Babka Cc: Yanteng Si Cc: Barry Song Signed-off-by: Andrew Morton --- diff --git a/mm/memory.c b/mm/memory.c index f77c10fd26b91..568b3afde8d20 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -723,18 +723,21 @@ static void restore_exclusive_pte(struct vm_area_struct *vma, struct folio *folio = page_folio(page); pte_t orig_pte; pte_t pte; - swp_entry_t entry; orig_pte = ptep_get(ptep); pte = pte_mkold(mk_pte(page, READ_ONCE(vma->vm_page_prot))); if (pte_swp_soft_dirty(orig_pte)) pte = pte_mksoft_dirty(pte); - entry = pte_to_swp_entry(orig_pte); if (pte_swp_uffd_wp(orig_pte)) pte = pte_mkuffd_wp(pte); - else if (is_writable_device_exclusive_entry(entry)) - pte = maybe_mkwrite(pte_mkdirty(pte), vma); + + if ((vma->vm_flags & VM_WRITE) && + can_change_pte_writable(vma, address, pte)) { + if (folio_test_dirty(folio)) + pte = pte_mkdirty(pte); + pte = pte_mkwrite(pte, vma); + } VM_BUG_ON_FOLIO(pte_write(pte) && (!folio_test_anon(folio) && PageAnonExclusive(page)), folio);