Linux supports multi-sized Transparent Huge Pages (mTHP). For the purpose
of this patch description, we call the block size mapped by a
non-last-level pagetable entry the traditional THP size (2M for 4K
basepage, 512M for 64K basepage). Linux now also supports intermediate THP
sizes mapped by the last-level pagetable; we call these the mTHP sizes.
Support for mTHP in Linux has improved and stabilized over time:
applications can benefit from reduced page faults and reduced kernel
memory management overhead, albeit at the cost of internal fragmentation.
We have observed consistent performance boosts with mTHP with little
variance.
As a result, enable 2M THP by default on Aarch64. This enables THP even if
the user hasn't set glibc.malloc.hugetlb=1. If the user has set it, we
avoid making the system call to read the hugepage size from sysfs, and
override it with the hardcoded 2MB.
There are two additional benefits of this patch, if the transparent
hugepage sysctl is set to madvise or always:
1) The THP size is now hardcoded to 2MB for Aarch64. This avoids a
syscall for fetching the THP size from sysfs.
2) On 64K basepage size systems, the traditional THP size is 512M, which
is unusable and impractical. We can instead benefit from the mTHP size of
2M. Apart from the usual benefit of THPs/mTHPs as described above, Aarch64
systems benefit from reduced TLB pressure on this mTHP size, commonly
known as the "contpte" size. If the application takes a pagefault, and
either the THP sysctl setting is "always", or the virtual memory area
has been madvise(MADV_HUGEPAGE)'d while the sysctl is "madvise", then
Linux will fault in a 2M mTHP, mapping contiguous pages into the pagetable
and painting the pagetable entries with the cont-bit. This bit is a hint to
the hardware that the pagetable entry in question maps a page which is part
of a set of contiguous pages. The TLB then needs only a single entry
for this set of 2M/64K = 32 pages, because the physical address of any
other page in the set can be computed from the TLB-cached physical
address via a linear offset. Hence, what was only possible with the
traditional THP size is now possible with the mTHP size.
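The arithmetic behind the TLB saving can be sketched as follows. This is
an illustration only and not part of the patch; CONT_BLOCK_SIZE,
pages_per_block and page_phys_addr are hypothetical names, and the model
ignores everything about the hardware except the linear offset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name: the 2M block size discussed above.  */
#define CONT_BLOCK_SIZE ((size_t) 2 * 1024 * 1024)

/* Number of base pages covered by one contiguous block, i.e. how many
   per-page TLB entries a single cached entry can stand in for.  */
static size_t
pages_per_block (size_t block_size, size_t base_page_size)
{
  assert (base_page_size != 0 && block_size % base_page_size == 0);
  return block_size / base_page_size;
}

/* Physical address of the page at INDEX within the block, derived from
   the block's base physical address by a linear offset.  */
static unsigned long long
page_phys_addr (unsigned long long block_phys_base, size_t index,
                size_t base_page_size)
{
  return block_phys_base + (unsigned long long) index * base_page_size;
}
```

With a 64K basepage, pages_per_block (CONT_BLOCK_SIZE, 64 * 1024)
evaluates to 32, matching the figure above.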
We see a 6.25% performance improvement on SPEC.
If the sysctl is set to never, no transparent hugepages will be created by
the kernel. However, this patch still sets thp_pagesize = 2MB. The benefit
is that on MORECORE() invocation, we extend the heap by 2MB instead of 4KB,
potentially reducing the frequency of this syscall's invocation by 512x.
Note that there is no difference in cost between an sbrk(2M) and an
sbrk(4K); the kernel only does a virtual reservation and does not touch
user physical memory.
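To illustrate the 512x figure, the following sketch counts how many heap
extensions are needed to grow the heap by a given amount when every
extension adds exactly one granule. It is a simplified worst-case model,
not the malloc code itself; align_up and morecore_calls are hypothetical
helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Round SIZE up to a multiple of GRANULE (GRANULE must be a power of 2).  */
static size_t
align_up (size_t size, size_t granule)
{
  assert (granule != 0 && (granule & (granule - 1)) == 0);
  return (size + granule - 1) & ~(granule - 1);
}

/* Number of extensions needed to grow the heap by TOTAL bytes when each
   extension adds exactly GRANULE bytes.  */
static size_t
morecore_calls (size_t total, size_t granule)
{
  return align_up (total, granule) / granule;
}
```

Under this model, growing the heap by 1GB takes 262144 extensions at a 4KB
granule and 512 extensions at a 2MB granule.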
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
/* ----------- Routines dealing with transparent huge pages ----------- */
+static void thp_init (void);
+
static inline void
madvise_thp (void *p, INTERNAL_SIZE_T size)
{
#ifdef MADV_HUGEPAGE
+
+ /* Ensure thp_init () is invoked only once */
+ if (mp_.thp_pagesize < DEFAULT_THP_PAGESIZE)
+ thp_init ();
+
/* Only use __madvise if the system is using 'madvise' mode.
Otherwise the call is wasteful. */
if (mp_.thp_mode != malloc_thp_mode_madvise)
previous calls. Otherwise, we correct to page-align below.
*/
+ /* Ensure thp_init () is invoked only once */
+ if (mp_.thp_pagesize < DEFAULT_THP_PAGESIZE)
+ thp_init ();
+
if (__glibc_unlikely (mp_.thp_pagesize != 0))
{
uintptr_t lastbrk = (uintptr_t) MORECORE (0);
return 0;
}
+static __always_inline void
+thp_init (void)
+{
+ /* thp_pagesize is set even if thp_mode is never. This reduces frequency
+ of MORECORE () invocation. */
+ mp_.thp_pagesize = DEFAULT_THP_PAGESIZE;
+ mp_.thp_mode = __malloc_thp_mode ();
+}
+
int
__libc_mallopt (int param_number, int value)
{