From: Greg Kroah-Hartman
Date: Thu, 3 Oct 2013 18:34:13 +0000 (-0700)
Subject: delete queue-3.0/mm-fix-aio-performance-regression-for-database-caused-by-thp.patch
X-Git-Tag: v3.0.99~2
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=612c1f62c878888c41e33c8a86e3fce5532fe0b2;p=thirdparty%2Fkernel%2Fstable-queue.git

delete queue-3.0/mm-fix-aio-performance-regression-for-database-caused-by-thp.patch
---

diff --git a/queue-3.0/mm-fix-aio-performance-regression-for-database-caused-by-thp.patch b/queue-3.0/mm-fix-aio-performance-regression-for-database-caused-by-thp.patch
deleted file mode 100644
index 6f8860d74a6..00000000000
--- a/queue-3.0/mm-fix-aio-performance-regression-for-database-caused-by-thp.patch
+++ /dev/null
@@ -1,172 +0,0 @@
-From khalid.aziz@oracle.com Wed Oct 2 19:38:33 2013
-From: Khalid Aziz
-Date: Mon, 23 Sep 2013 13:54:09 -0600
-Subject: mm: fix aio performance regression for database caused by THP
-To: bhutchings@solarflare.com, gregkh@linuxfoundation.org
-Cc: stable@vger.kernel.org, pshelar@nicira.com, cl@linux.com, aarcange@redhat.com, hannes@cmpxchg.org, mel@csn.ul.ie, riel@redhat.com, minchan@kernel.org, andi@firstfloor.org, akpm@linux-foundation.org, torvalds@linux-foundation.org
-Message-ID: <1379966049.30551.9.camel@concerto>
-
-From: Khalid Aziz
-
-commit 7cb2ef56e6a8b7b368b2e883a0a47d02fed66911 upstream.
-
-This patch needed to be backported due to changes to mm/swap.c some time
-after 3.6 kernel.
-
-I am working with a tool that simulates oracle database I/O workload.
-This tool (orion to be specific -
-)
-allocates hugetlbfs pages using shmget() with SHM_HUGETLB flag. It then
-does aio into these pages from flash disks using various common block
-sizes used by database. I am looking at performance with two of the most
-common block sizes - 1M and 64K. aio performance with these two block
-sizes plunged after Transparent HugePages was introduced in the kernel.
-Here are performance numbers:
-
-            pre-THP      2.6.39       3.11-rc5
-1M read     8384 MB/s    5629 MB/s    6501 MB/s
-64K read    7867 MB/s    4576 MB/s    4251 MB/s
-
-I have narrowed the performance impact down to the overheads introduced by
-THP in __get_page_tail() and put_compound_page() routines. perf top shows
->40% of cycles being spent in these two routines. Every time direct I/O
-to hugetlbfs pages starts, kernel calls get_page() to grab a reference to
-the pages and calls put_page() when I/O completes to put the reference
-away. THP introduced significant amount of locking overhead to get_page()
-and put_page() when dealing with compound pages because hugepages can be
-split underneath get_page() and put_page(). It added this overhead
-irrespective of whether it is dealing with hugetlbfs pages or transparent
-hugepages. This resulted in 20%-45% drop in aio performance when using
-hugetlbfs pages.
-
-Since hugetlbfs pages can not be split, there is no reason to go through
-all the locking overhead for these pages from what I can see. I added
-code to __get_page_tail() and put_compound_page() to bypass all the
-locking code when working with hugetlbfs pages. This improved performance
-significantly. Performance numbers with this patch:
-
-            pre-THP      3.11-rc5     3.11-rc5 + Patch
-1M read     8384 MB/s    6501 MB/s    8371 MB/s
-64K read    7867 MB/s    4251 MB/s    6510 MB/s
-
-Performance with 64K read is still lower than what it was before THP, but
-still a 53% improvement. It does mean there is more work to be done but I
-will take a 53% improvement for now.
-
-Please take a look at the following patch and let me know if it looks
-reasonable.
-
-[akpm@linux-foundation.org: tweak comments]
-Signed-off-by: Khalid Aziz
-Cc: Pravin B Shelar
-Cc: Christoph Lameter
-Cc: Andrea Arcangeli
-Cc: Johannes Weiner
-Cc: Mel Gorman
-Cc: Rik van Riel
-Cc: Minchan Kim
-Cc: Andi Kleen
-Signed-off-by: Andrew Morton
-Signed-off-by: Linus Torvalds
-Signed-off-by: Greg Kroah-Hartman
----
- mm/swap.c | 65 ++++++++++++++++++++++++++++++++++++++++++++------------------
- 1 file changed, 47 insertions(+), 18 deletions(-)
-
---- a/mm/swap.c
-+++ b/mm/swap.c
-@@ -41,6 +41,8 @@ static DEFINE_PER_CPU(struct pagevec[NR_
- static DEFINE_PER_CPU(struct pagevec, lru_rotate_pvecs);
- static DEFINE_PER_CPU(struct pagevec, lru_deactivate_pvecs);
- 
-+int PageHuge(struct page *page);
-+
- /*
-  * This path almost never happens for VM activity - pages are normally
-  * freed via pagevecs. But it gets used by networking.
-@@ -69,13 +71,26 @@ static void __put_compound_page(struct p
- {
- 	compound_page_dtor *dtor;
- 
--	__page_cache_release(page);
-+	if (!PageHuge(page))
-+		__page_cache_release(page);
- 	dtor = get_compound_page_dtor(page);
- 	(*dtor)(page);
- }
- 
- static void put_compound_page(struct page *page)
- {
-+	/*
-+	 * hugetlbfs pages can not be split from under us. So if this
-+	 * is a hugetlbfs page, check refcount on head page and release
-+	 * the page if refcount is zero.
-+	 */
-+	if (PageHuge(page)) {
-+		page = compound_head(page);
-+		if (put_page_testzero(page))
-+			__put_compound_page(page);
-+		return;
-+	}
-+
- 	if (unlikely(PageTail(page))) {
- 		/* __split_huge_page_refcount can run under us */
- 		struct page *page_head = compound_trans_head(page);
-@@ -158,26 +173,40 @@ bool __get_page_tail(struct page *page)
- 	 * proper PT lock that already serializes against
- 	 * split_huge_page().
- 	 */
--	unsigned long flags;
- 	bool got = false;
--	struct page *page_head = compound_trans_head(page);
- 
--	if (likely(page != page_head && get_page_unless_zero(page_head))) {
--		/*
--		 * page_head wasn't a dangling pointer but it
--		 * may not be a head page anymore by the time
--		 * we obtain the lock. That is ok as long as it
--		 * can't be freed from under us.
--		 */
--		flags = compound_lock_irqsave(page_head);
--		/* here __split_huge_page_refcount won't run anymore */
--		if (likely(PageTail(page))) {
--			__get_page_tail_foll(page, false);
--			got = true;
-+	/*
-+	 * If this is a hugetlbfs page, it can not be split under
-+	 * us. Simply increment counts for tail page and its head page
-+	 */
-+	if (PageHuge(page)) {
-+		struct page *page_head;
-+
-+		page_head = compound_head(page);
-+		atomic_inc(&page_head->_count);
-+		got = true;
-+	} else {
-+		struct page *page_head = compound_trans_head(page);
-+		unsigned long flags;
-+
-+		if (likely(page != page_head &&
-+					get_page_unless_zero(page_head))) {
-+			/*
-+			 * page_head wasn't a dangling pointer but it
-+			 * may not be a head page anymore by the time
-+			 * we obtain the lock. That is ok as long as it
-+			 * can't be freed from under us.
-+			 */
-+			flags = compound_lock_irqsave(page_head);
-+			/* here __split_huge_page_refcount won't run anymore */
-+			if (likely(PageTail(page))) {
-+				__get_page_tail_foll(page, false);
-+				got = true;
-+			}
-+			compound_unlock_irqrestore(page_head, flags);
-+			if (unlikely(!got))
-+				put_page(page_head);
- 		}
--		compound_unlock_irqrestore(page_head, flags);
--		if (unlikely(!got))
--			put_page(page_head);
- 	}
- 	return got;
- }
diff --git a/queue-3.0/series b/queue-3.0/series
index e22b514a5eb..9c17e00687f 100644
--- a/queue-3.0/series
+++ b/queue-3.0/series
@@ -8,6 +8,5 @@ dm-snapshot-workaround-for-a-false-positive-lockdep-warning.patch
 dm-snapshot-fix-performance-degradation-due-to-small-hash-size.patch
 drm-i915-dp-increase-i2c-over-aux-retry-interval-on-aux-defer.patch
 hwmon-applesmc-check-key-count-before-proceeding.patch
-mm-fix-aio-performance-regression-for-database-caused-by-thp.patch
 hwmon-applesmc-silence-uninitialized-warnings.patch
 splice-fix-racy-pipe-buffers-uses.patch
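
Note on the dropped patch: the idea in its changelog (skip the compound lock for hugetlbfs pages because they can never be split) can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: struct model_page, model_init(), model_get_page_tail() and model_put_compound_page() are hypothetical names invented for illustration, the "lock" is just a counter, and the slow path is heavily simplified compared with the real __get_page_tail() and put_compound_page().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a compound page; NOT the kernel's struct page. */
struct model_page {
	struct model_page *head; /* head of the compound page (the head points to itself) */
	atomic_int count;        /* reference count; only the head's count is used here */
	bool huge;               /* models PageHuge(): a hugetlbfs page, never split */
	bool freed;              /* set on the head when the last reference is dropped */
	int locks_taken;         /* on the head: how often the modelled compound lock was taken */
};

/* Set up a two-page compound page whose head starts with one reference. */
static void model_init(struct model_page *head, struct model_page *tail, bool huge)
{
	head->head = head;
	tail->head = head;
	atomic_init(&head->count, 1);
	atomic_init(&tail->count, 0);
	head->huge = tail->huge = huge;
	head->freed = tail->freed = false;
	head->locks_taken = tail->locks_taken = 0;
}

/* Take a reference through a tail page. */
static bool model_get_page_tail(struct model_page *page)
{
	struct model_page *head = page->head;

	if (page->huge) {
		/* Fast path: no split can race with us, so only bump the head count. */
		atomic_fetch_add(&head->count, 1);
		return true;
	}

	/* Simplified slow path: refuse a head page or a dead head, then "lock". */
	if (page == head || atomic_load(&head->count) == 0)
		return false;
	head->locks_taken++; /* stands in for compound_lock_irqsave()/unlock */
	atomic_fetch_add(&head->count, 1);
	return true;
}

/* Drop a reference; the head is marked freed when the count reaches zero. */
static void model_put_compound_page(struct model_page *page)
{
	struct model_page *head = page->head;
	int old;

	if (!page->huge && page != head)
		head->locks_taken++; /* a splittable tail page still needs the lock */

	old = atomic_fetch_sub(&head->count, 1);
	assert(old > 0); /* catch reference count underflow in the model */
	if (old == 1)
		head->freed = true;
}
```

In this model a get/put pair on a hugetlbfs tail page never touches the lock counter, while the same pair on a splittable tail page takes it twice. That per-I/O locking is the overhead the changelog attributes to THP.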