From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date: Sat, 9 Nov 2013 05:59:22 +0000 (-0800)
Subject: 3.4-stable patches
X-Git-Tag: v3.4.69~4
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=cd304b49d2d04d1b6c2f18730ec9f162518d8138;p=thirdparty%2Fkernel%2Fstable-queue.git

3.4-stable patches

added patches:
	mm-fix-aio-performance-regression-for-database-caused-by-thp.patch
---
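Note on the fast path added by the queued patch: the sketch below is a
userspace model only, not kernel code.  The names toy_page,
toy_compound_head(), toy_get_page_tail() and toy_put_compound_page() are
hypothetical stand-ins invented for illustration; the real patch uses
PageHuge(), compound_head(), atomic_inc() and put_page_testzero() on
struct page.  The THP slow path is deliberately left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct page; not the kernel layout. */
struct toy_page {
	atomic_int count;	/* reference count, meaningful on the head */
	struct toy_page *head;	/* head of the compound page (self for head) */
	bool huge;		/* stands in for PageHuge() */
	bool freed;		/* set once the last reference is dropped */
};

static struct toy_page *toy_compound_head(struct toy_page *page)
{
	return page->head;
}

/* Models the new branch in __get_page_tail(): no locking, just pin the head. */
static bool toy_get_page_tail(struct toy_page *page)
{
	if (page->huge) {
		atomic_fetch_add(&toy_compound_head(page)->count, 1);
		return true;
	}
	return false;	/* THP slow path is not modelled here */
}

/* Models the new branch in put_compound_page(). */
static void toy_put_compound_page(struct toy_page *page)
{
	struct toy_page *head;

	if (!page->huge)
		return;	/* THP slow path is not modelled here */

	head = toy_compound_head(page);
	assert(atomic_load(&head->count) > 0);

	/* atomic_fetch_sub() returns the old value; 1 means last reference. */
	if (atomic_fetch_sub(&head->count, 1) == 1)
		head->freed = true;
}
```

In this model, taking a reference through a tail page bumps the head's
count, and the page is marked freed only when the head's count drops to
zero, which is the behaviour the patch relies on for pages that cannot be
split.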

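Note on the percentages quoted in the commit message: a small helper like
the one below (pct_change() is a hypothetical name, not part of the patch)
can be used to check them against the MB/s tables.  It uses integer
arithmetic and truncates toward zero, so results are approximate.

```c
#include <assert.h>

/* Percentage change from 'before' to 'after', truncated toward zero. */
static int pct_change(int before, int after)
{
	assert(before != 0);
	return (after - before) * 100 / before;
}
```

With the 64K figures, pct_change(4251, 6510) gives 53, matching the
quoted improvement over unpatched 3.11-rc5.  Against the pre-THP numbers,
pct_change(8384, 6501) gives -22 and pct_change(7867, 4251) gives -45,
which is in line with the quoted 20%-45% drop.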
diff --git a/queue-3.4/mm-fix-aio-performance-regression-for-database-caused-by-thp.patch b/queue-3.4/mm-fix-aio-performance-regression-for-database-caused-by-thp.patch
new file mode 100644
index 00000000000..59b293c9e05
--- /dev/null
+++ b/queue-3.4/mm-fix-aio-performance-regression-for-database-caused-by-thp.patch
@@ -0,0 +1,138 @@
+From 7cb2ef56e6a8b7b368b2e883a0a47d02fed66911 Mon Sep 17 00:00:00 2001
+From: Khalid Aziz <khalid.aziz@oracle.com>
+Date: Wed, 11 Sep 2013 14:22:20 -0700
+Subject: mm: fix aio performance regression for database caused by THP
+
+From: Khalid Aziz <khalid.aziz@oracle.com>
+
+commit 7cb2ef56e6a8b7b368b2e883a0a47d02fed66911 upstream.
+
+I am working with a tool that simulates an Oracle database I/O workload.
+This tool (orion to be specific -
+<http://docs.oracle.com/cd/E11882_01/server.112/e16638/iodesign.htm#autoId24>)
+allocates hugetlbfs pages using shmget() with the SHM_HUGETLB flag.  It then
+does aio into these pages from flash disks using various common block
+sizes used by databases.  I am looking at performance with two of the most
+common block sizes - 1M and 64K.  aio performance with these two block
+sizes plunged after Transparent HugePages was introduced in the kernel.
+Here are performance numbers:
+
+		pre-THP		2.6.39		3.11-rc5
+1M read		8384 MB/s	5629 MB/s	6501 MB/s
+64K read	7867 MB/s	4576 MB/s	4251 MB/s
+
+I have narrowed the performance impact down to the overheads introduced by
+THP in __get_page_tail() and put_compound_page() routines.  perf top shows
+>40% of cycles being spent in these two routines.  Every time direct I/O
+to hugetlbfs pages starts, the kernel calls get_page() to grab a reference
+to the pages and calls put_page() when I/O completes to drop the reference.
+THP introduced a significant amount of locking overhead to get_page()
+and put_page() when dealing with compound pages because hugepages can be
+split underneath get_page() and put_page().  It added this overhead
+irrespective of whether it is dealing with hugetlbfs pages or transparent
+hugepages.  This resulted in a 20%-45% drop in aio performance when using
+hugetlbfs pages.
+
+Since hugetlbfs pages cannot be split, there is no reason to go through
+all the locking overhead for these pages from what I can see.  I added
+code to __get_page_tail() and put_compound_page() to bypass all the
+locking code when working with hugetlbfs pages.  This improved performance
+significantly.  Performance numbers with this patch:
+
+		pre-THP		3.11-rc5	3.11-rc5 + Patch
+1M read		8384 MB/s	6501 MB/s	8371 MB/s
+64K read	7867 MB/s	4251 MB/s	6510 MB/s
+
+Performance with 64K read is still lower than what it was before THP, but
+it is a 53% improvement over unpatched 3.11-rc5.  There is more work to be
+done, but I will take a 53% improvement for now.
+
+Please take a look at the following patch and let me know if it looks
+reasonable.
+
+[akpm@linux-foundation.org: tweak comments]
+Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
+Cc: Pravin B Shelar <pshelar@nicira.com>
+Cc: Christoph Lameter <cl@linux.com>
+Cc: Andrea Arcangeli <aarcange@redhat.com>
+Cc: Johannes Weiner <hannes@cmpxchg.org>
+Cc: Mel Gorman <mel@csn.ul.ie>
+Cc: Rik van Riel <riel@redhat.com>
+Cc: Minchan Kim <minchan@kernel.org>
+Cc: Andi Kleen <andi@firstfloor.org>
+Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
+Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+---
+ mm/swap.c |   31 +++++++++++++++++++++++++++++--
+ 1 file changed, 29 insertions(+), 2 deletions(-)
+
+--- a/mm/swap.c
++++ b/mm/swap.c
+@@ -30,6 +30,7 @@
+ #include <linux/backing-dev.h>
+ #include <linux/memcontrol.h>
+ #include <linux/gfp.h>
++#include <linux/hugetlb.h>
+ 
+ #include "internal.h"
+ 
+@@ -68,13 +69,26 @@ static void __put_compound_page(struct p
+ {
+ 	compound_page_dtor *dtor;
+ 
+-	__page_cache_release(page);
++	if (!PageHuge(page))
++		__page_cache_release(page);
+ 	dtor = get_compound_page_dtor(page);
+ 	(*dtor)(page);
+ }
+ 
+ static void put_compound_page(struct page *page)
+ {
++	/*
++	 * hugetlbfs pages cannot be split from under us.  If this is a
++	 * hugetlbfs page, check refcount on head page and release the page if
++	 * the refcount becomes zero.
++	 */
++	if (PageHuge(page)) {
++		page = compound_head(page);
++		if (put_page_testzero(page))
++			__put_compound_page(page);
++		return;
++	}
++
+ 	if (unlikely(PageTail(page))) {
+ 		/* __split_huge_page_refcount can run under us */
+ 		struct page *page_head = compound_trans_head(page);
+@@ -159,8 +173,20 @@ bool __get_page_tail(struct page *page)
+ 	 */
+ 	unsigned long flags;
+ 	bool got = false;
+-	struct page *page_head = compound_trans_head(page);
++	struct page *page_head;
++
++	/*
++	 * If this is a hugetlbfs page it cannot be split under us.  Simply
++	 * increment refcount for the head page.
++	 */
++	if (PageHuge(page)) {
++		page_head = compound_head(page);
++		atomic_inc(&page_head->_count);
++		got = true;
++		goto out;
++	}
+ 
++	page_head = compound_trans_head(page);
+ 	if (likely(page != page_head && get_page_unless_zero(page_head))) {
+ 		/*
+ 		 * page_head wasn't a dangling pointer but it
+@@ -178,6 +204,7 @@ bool __get_page_tail(struct page *page)
+ 		if (unlikely(!got))
+ 			put_page(page_head);
+ 	}
++out:
+ 	return got;
+ }
+ EXPORT_SYMBOL(__get_page_tail);
diff --git a/queue-3.4/series b/queue-3.4/series
index 5cb4bcfda74..7bc3af88ce9 100644
--- a/queue-3.4/series
+++ b/queue-3.4/series
@@ -21,3 +21,4 @@ uml-check-length-in-exitcode_proc_write.patch
 xtensa-don-t-use-alternate-signal-stack-on-threads.patch
 lib-scatterlist.c-don-t-flush_kernel_dcache_page-on-slab-page.patch
 aacraid-missing-capable-check-in-compat-ioctl.patch
+mm-fix-aio-performance-regression-for-database-caused-by-thp.patch