From: Greg Kroah-Hartman
Date: Sat, 9 Nov 2013 05:59:22 +0000 (-0800)
Subject: 3.4-stable patches
X-Git-Tag: v3.4.69~4
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=cd304b49d2d04d1b6c2f18730ec9f162518d8138;p=thirdparty%2Fkernel%2Fstable-queue.git

3.4-stable patches

added patches:
	mm-fix-aio-performance-regression-for-database-caused-by-thp.patch
---

diff --git a/queue-3.4/mm-fix-aio-performance-regression-for-database-caused-by-thp.patch b/queue-3.4/mm-fix-aio-performance-regression-for-database-caused-by-thp.patch
new file mode 100644
index 00000000000..59b293c9e05
--- /dev/null
+++ b/queue-3.4/mm-fix-aio-performance-regression-for-database-caused-by-thp.patch
@@ -0,0 +1,138 @@
+From 7cb2ef56e6a8b7b368b2e883a0a47d02fed66911 Mon Sep 17 00:00:00 2001
+From: Khalid Aziz
+Date: Wed, 11 Sep 2013 14:22:20 -0700
+Subject: mm: fix aio performance regression for database caused by THP
+
+From: Khalid Aziz
+
+commit 7cb2ef56e6a8b7b368b2e883a0a47d02fed66911 upstream.
+
+I am working with a tool that simulates oracle database I/O workload.
+This tool (orion to be specific -
+)
+allocates hugetlbfs pages using shmget() with SHM_HUGETLB flag. It then
+does aio into these pages from flash disks using various common block
+sizes used by database. I am looking at performance with two of the most
+common block sizes - 1M and 64K. aio performance with these two block
+sizes plunged after Transparent HugePages was introduced in the kernel.
+Here are performance numbers:
+
+            pre-THP      2.6.39       3.11-rc5
+1M read     8384 MB/s    5629 MB/s    6501 MB/s
+64K read    7867 MB/s    4576 MB/s    4251 MB/s
+
+I have narrowed the performance impact down to the overheads introduced by
+THP in __get_page_tail() and put_compound_page() routines. perf top shows
+>40% of cycles being spent in these two routines. Every time direct I/O
+to hugetlbfs pages starts, kernel calls get_page() to grab a reference to
+the pages and calls put_page() when I/O completes to put the reference
+away. THP introduced significant amount of locking overhead to get_page()
+and put_page() when dealing with compound pages because hugepages can be
+split underneath get_page() and put_page(). It added this overhead
+irrespective of whether it is dealing with hugetlbfs pages or transparent
+hugepages. This resulted in 20%-45% drop in aio performance when using
+hugetlbfs pages.
+
+Since hugetlbfs pages can not be split, there is no reason to go through
+all the locking overhead for these pages from what I can see. I added
+code to __get_page_tail() and put_compound_page() to bypass all the
+locking code when working with hugetlbfs pages. This improved performance
+significantly. Performance numbers with this patch:
+
+            pre-THP      3.11-rc5     3.11-rc5 + Patch
+1M read     8384 MB/s    6501 MB/s    8371 MB/s
+64K read    7867 MB/s    4251 MB/s    6510 MB/s
+
+Performance with 64K read is still lower than what it was before THP, but
+still a 53% improvement. It does mean there is more work to be done but I
+will take a 53% improvement for now.
+
+Please take a look at the following patch and let me know if it looks
+reasonable.
+
+[akpm@linux-foundation.org: tweak comments]
+Signed-off-by: Khalid Aziz
+Cc: Pravin B Shelar
+Cc: Christoph Lameter
+Cc: Andrea Arcangeli
+Cc: Johannes Weiner
+Cc: Mel Gorman
+Cc: Rik van Riel
+Cc: Minchan Kim
+Cc: Andi Kleen
+Signed-off-by: Andrew Morton
+Signed-off-by: Linus Torvalds
+Signed-off-by: Greg Kroah-Hartman
+
+---
+ mm/swap.c | 31 +++++++++++++++++++++++++++++--
+ 1 file changed, 29 insertions(+), 2 deletions(-)
+
+--- a/mm/swap.c
++++ b/mm/swap.c
+@@ -30,6 +30,7 @@
+ #include
+ #include
+ #include
++#include
+ 
+ #include "internal.h"
+ 
+@@ -68,13 +69,26 @@ static void __put_compound_page(struct p
+ {
+ 	compound_page_dtor *dtor;
+ 
+-	__page_cache_release(page);
++	if (!PageHuge(page))
++		__page_cache_release(page);
+ 	dtor = get_compound_page_dtor(page);
+ 	(*dtor)(page);
+ }
+ 
+ static void put_compound_page(struct page *page)
+ {
++	/*
++	 * hugetlbfs pages cannot be split from under us. If this is a
++	 * hugetlbfs page, check refcount on head page and release the page if
++	 * the refcount becomes zero.
++	 */
++	if (PageHuge(page)) {
++		page = compound_head(page);
++		if (put_page_testzero(page))
++			__put_compound_page(page);
++		return;
++	}
++
+ 	if (unlikely(PageTail(page))) {
+ 		/* __split_huge_page_refcount can run under us */
+ 		struct page *page_head = compound_trans_head(page);
+@@ -159,8 +173,20 @@ bool __get_page_tail(struct page *page)
+ 	 */
+ 	unsigned long flags;
+ 	bool got = false;
+-	struct page *page_head = compound_trans_head(page);
++	struct page *page_head;
++
++	/*
++	 * If this is a hugetlbfs page it cannot be split under us. Simply
++	 * increment refcount for the head page.
++	 */
++	if (PageHuge(page)) {
++		page_head = compound_head(page);
++		atomic_inc(&page_head->_count);
++		got = true;
++		goto out;
++	}
+ 
++	page_head = compound_trans_head(page);
+ 	if (likely(page != page_head && get_page_unless_zero(page_head))) {
+ 		/*
+ 		 * page_head wasn't a dangling pointer but it
+@@ -178,6 +204,7 @@ bool __get_page_tail(struct page *page)
+ 		if (unlikely(!got))
+ 			put_page(page_head);
+ 	}
++out:
+ 	return got;
+ }
+ EXPORT_SYMBOL(__get_page_tail);
diff --git a/queue-3.4/series b/queue-3.4/series
index 5cb4bcfda74..7bc3af88ce9 100644
--- a/queue-3.4/series
+++ b/queue-3.4/series
@@ -21,3 +21,4 @@ uml-check-length-in-exitcode_proc_write.patch
 xtensa-don-t-use-alternate-signal-stack-on-threads.patch
 lib-scatterlist.c-don-t-flush_kernel_dcache_page-on-slab-page.patch
 aacraid-missing-capable-check-in-compat-ioctl.patch
+mm-fix-aio-performance-regression-for-database-caused-by-thp.patch
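
The hugetlbfs fast path that the commit message describes can be modeled in userspace as a rough sketch. It is not kernel code: struct model_page, model_head(), model_get_page_tail() and model_put_compound_page() are hypothetical names that only imitate the refcount flow. The real patch uses PageHuge(), compound_head(), put_page_testzero() and atomic_inc(&page_head->_count) as shown in the diff, and the THP slow path with its locking is deliberately left out here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct page; not the kernel layout. */
struct model_page {
	atomic_int refcount;      /* plays the role of page->_count */
	struct model_page *head;  /* NULL for a head page, else its head page */
	bool is_hugetlbfs;        /* plays the role of PageHuge() on the head */
	int released;             /* set once the last reference is dropped */
};

/* Rough analogue of compound_head(): a head page is its own head. */
static struct model_page *model_head(struct model_page *p)
{
	return p->head ? p->head : p;
}

/*
 * Analogue of the new branch in __get_page_tail(): a hugetlbfs page cannot
 * be split, so the reference is taken on the head page without any locking.
 * Returns false when the page would need the (unmodeled) THP slow path.
 */
static bool model_get_page_tail(struct model_page *p)
{
	struct model_page *h = model_head(p);

	if (!h->is_hugetlbfs)
		return false;
	atomic_fetch_add(&h->refcount, 1);
	return true;
}

/*
 * Analogue of the new branch in put_compound_page(): drop one reference on
 * the head page and release it when the count reaches zero.
 * Returns false when the page would need the (unmodeled) THP slow path.
 */
static bool model_put_compound_page(struct model_page *p)
{
	struct model_page *h = model_head(p);

	if (!h->is_hugetlbfs)
		return false;
	assert(atomic_load(&h->refcount) > 0);
	/* atomic_fetch_sub returns the old value; 1 means this was the last ref. */
	if (atomic_fetch_sub(&h->refcount, 1) == 1)
		h->released = 1;
	return true;
}
```

In this model, taking a reference through a tail page raises the head page's count, and the page is marked released only when the final reference is dropped. This matches the behavior the patch relies on: because a hugetlbfs page cannot be split, every reference can be accounted on the head page alone.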