From: Dev Jain
Date: Wed, 10 Dec 2025 12:15:18 +0000 (+0000)
Subject: malloc: Enable 2MB THP by default on Aarch64
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=321e1fc73f53081d92ba357cdd48c56b79292020;p=thirdparty%2Fglibc.git

Linux supports multi-sized Transparent Huge Pages (mTHP). For the purpose
of this patch description, we call the block size mapped by a
non-last-level pagetable entry the traditional THP size (2M for a 4K base
page, 512M for a 64K base page). Linux now also supports intermediate THP
sizes mapped by the last-level pagetable; we call these the mTHP sizes.

Support for mTHP in Linux has become better and more stable over time.
Applications can benefit from fewer page faults and reduced kernel memory
management overhead, albeit at the cost of internal fragmentation. We have
observed consistent performance boosts with mTHP, with little variance.

As a result, enable 2M THP by default on AArch64. This enables THP even if
the user has not passed glibc.malloc.hugetlb=1. If the user has passed it,
we skip the system call that reads the hugepage size from sysfs and
override it with the hardcoded 2MB.

There are two additional benefits of this patch if the transparent
hugepage sysctl is set to madvise or always:

1) The THP size is now hardcoded to 2MB on AArch64. This avoids a syscall
   for fetching the THP size from sysfs.

2) On 64K base page systems, the traditional THP size is 512M, which is
   unusable and impractical. We can instead benefit from the mTHP size of
   2M.

Apart from the usual benefits of THPs/mTHPs described above, AArch64
systems benefit from reduced TLB pressure at this mTHP size, commonly
known as the "contpte" size.
If the application takes a page fault, and either the THP sysctl setting
is "always" or the virtual memory area has been madvise(MADV_HUGEPAGE)'d
with the sysctl set to "madvise", Linux will fault in a 2M mTHP, mapping
contiguous pages into the pagetable and setting the cont-bit in the
pagetable entries. This bit hints to the hardware that the entry maps a
page belonging to a set of contiguous pages. The TLB then needs only a
single entry for this set of 2M/64K = 32 pages, because the physical
address of any other page in the set can be computed from the cached
physical address by a linear offset. Hence, what was previously possible
only with the traditional THP size is now possible with the mTHP size.

We see a 6.25% performance improvement on SPEC.

If the sysctl is set to never, the kernel creates no transparent
hugepages. However, this patch still sets thp_pagesize = 2MB. The benefit
is that each MORECORE() invocation extends the heap by 2MB instead of 4KB,
potentially reducing how often this syscall is invoked by up to 512x
(2MB / 4KB). Note that there is no difference in cost between sbrk(2M) and
sbrk(4K): the kernel only makes a virtual reservation and does not touch
user physical memory.

Reviewed-by: Wilco Dijkstra
---
diff --git a/malloc/malloc.c b/malloc/malloc.c
index 2c56c1f124..8dd7bbe4f4 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -2053,10 +2053,17 @@ free_perturb (char *p, size_t n)

 /* ----------- Routines dealing with transparent huge pages ----------- */

+static void thp_init (void);
+
 static inline void
 madvise_thp (void *p, INTERNAL_SIZE_T size)
 {
 #ifdef MADV_HUGEPAGE
+
+  /* Ensure thp_init () is invoked only once */
+  if (mp_.thp_pagesize < DEFAULT_THP_PAGESIZE)
+    thp_init ();
+
   /* Only use __madvise if the system is using 'madvise' mode.
      Otherwise the call is wasteful. */
   if (mp_.thp_mode != malloc_thp_mode_madvise)
@@ -2664,6 +2671,10 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
          previous calls. Otherwise, we correct to page-align below.
        */

+      /* Ensure thp_init () is invoked only once */
+      if (mp_.thp_pagesize < DEFAULT_THP_PAGESIZE)
+        thp_init ();
+
       if (__glibc_unlikely (mp_.thp_pagesize != 0))
         {
           uintptr_t lastbrk = (uintptr_t) MORECORE (0);
@@ -5660,6 +5671,15 @@ do_set_hugetlb (size_t value)
   return 0;
 }

+static __always_inline void
+thp_init (void)
+{
+  /* thp_pagesize is set even if thp_mode is never. This reduces frequency
+     of MORECORE () invocation.  */
+  mp_.thp_pagesize = DEFAULT_THP_PAGESIZE;
+  mp_.thp_mode = __malloc_thp_mode ();
+}
+
 int
 __libc_mallopt (int param_number, int value)
 {