Since [1], constructors/destructors are expected to be called for all page
table pages, at all levels and for both user and kernel pgtables. There
is however one glaring exception: kernel PTEs are managed via separate
helpers (pte_alloc_kernel/pte_free_kernel), which do not call the [cd]tor,
at least not in the generic implementation.
The most obvious reason for this anomaly is that init_mm is special-cased
not to use split page table locks. As a result calling ptlock_init() for
PTEs associated with init_mm would be wasteful, potentially resulting in
dynamic memory allocation. However, pgtable [cd]tors perform other
actions - currently related to accounting/statistics, and potentially more
functionally significant in the future.
Now that pagetable_pte_ctor() is passed the associated mm, we can make it
skip the call to ptlock_init() for init_mm; this allows us to call the
ctor from pte_alloc_one_kernel() too. This is matched by a call to the
pgtable destructor in pte_free_kernel(); no special-casing is needed on
that path, as ptlock_free() is already called unconditionally.
(ptlock_free() is a no-op unless a ptlock was allocated for the given
PTP.)
This patch ensures that all architectures that rely on
<asm-generic/pgalloc.h> call the [cd]tor for kernel PTEs.
pte_free_kernel() cannot be overridden so changing the generic
implementation is sufficient. pte_alloc_one_kernel() can be overridden
using __HAVE_ARCH_PTE_ALLOC_ONE_KERNEL, and a few architectures implement
it by calling the page allocator directly. We amend those so that they
call the generic __pte_alloc_one_kernel() instead, if possible, ensuring
that the ctor is called.
A few architectures do not use <asm-generic/pgalloc.h>; those will be
taken care of separately.
[1] https://lore.kernel.org/linux-mm/
20250103184415.
2744423-1-kevin.brodsky@arm.com/
Link: https://lkml.kernel.org/r/20250408095222.860601-4-kevin.brodsky@arm.com
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> # s390
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Linus Waleij <linus.walleij@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: <x86@kernel.org>
Cc: Yang Shi <yang@os.amperecomputing.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
pte_t *pte;
unsigned long i;
- pte = (pte_t *) __get_free_page(GFP_KERNEL);
+ pte = __pte_alloc_one_kernel(mm);
if (!pte)
return NULL;
__ref pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
{
if (mem_init_done)
- return (pte_t *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+ return __pte_alloc_one_kernel(mm);
else
return memblock_alloc_try_nid(PAGE_SIZE, PAGE_SIZE,
MEMBLOCK_LOW_LIMIT,
pte_t *pte;
if (likely(mem_init_done)) {
- pte = (pte_t *)get_zeroed_page(GFP_KERNEL);
+ pte = __pte_alloc_one_kernel(mm);
} else {
pte = memblock_alloc_or_panic(PAGE_SIZE, PAGE_SIZE);
}
if (!ptdesc)
return NULL;
+ if (!pagetable_pte_ctor(mm, ptdesc)) {
+ pagetable_free(ptdesc);
+ return NULL;
+ }
+
return ptdesc_address(ptdesc);
}
#define __pte_alloc_one_kernel(...) alloc_hooks(__pte_alloc_one_kernel_noprof(__VA_ARGS__))
*/
static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
{
- pagetable_free(virt_to_ptdesc(pte));
+ pagetable_dtor_free(virt_to_ptdesc(pte));
}
/**
static inline bool pagetable_pte_ctor(struct mm_struct *mm,
struct ptdesc *ptdesc)
{
- if (!ptlock_init(ptdesc))
+ if (mm != &init_mm && !ptlock_init(ptdesc))
return false;
__pagetable_ctor(ptdesc);
return true;