Commit | Line | Data |
---|---|---|
14f1017b GKH |
1 | From akpm@linux-foundation.org Thu Oct 1 15:24:58 2009 |
2 | From: Lee Schermerhorn <Lee.Schermerhorn@hp.com> | |
3 | Date: Mon, 21 Sep 2009 17:01:04 -0700 | |
4 | Subject: hugetlb: restore interleaving of bootmem huge pages (2.6.31) | |
5 | To: torvalds@linux-foundation.org | |
6 | Cc: Lee.Schermerhorn@hp.com, lee.schermerhorn@hp.com, ak@linux.intel.com, eric.whitney@hp.com, mel@csn.ul.ie, rientjes@google.com, agl@us.ibm.com, apw@canonical.com, akpm@linux-foundation.org, stable@kernel.org | |
7 | Message-ID: <200909220001.n8M014vN026389@imap1.linux-foundation.org> | |
8 | ||
9 | ||
10 | From: Lee Schermerhorn <Lee.Schermerhorn@hp.com> | |
11 | ||
12 | Not upstream as it is fixed differently in .32 | |
13 | ||
14 | I noticed that alloc_bootmem_huge_page() will only advance to the next | |
15 | node on failure to allocate a huge page. I asked about this on linux-mm | |
16 | and linux-numa, cc'ing the usual huge page suspects. Mel Gorman | |
17 | responded: | |
18 | ||
19 | I strongly suspect that the same node being used until allocation | |
20 | failure instead of round-robin is an oversight and not deliberate | |
21 | at all. It appears to be a side-effect of a fix made way back in | |
22 | commit 63b4613c3f0d4b724ba259dc6c201bb68b884e1a ["hugetlb: fix | |
23 | hugepage allocation with memoryless nodes"]. Prior to that patch | |
24 | it looked like allocations would always round-robin even when | |
25 | allocation was successful. | |
26 | ||
27 | Andy Whitcroft countered that the existing behavior looked like Andi | |
28 | Kleen's original implementation and suggested that we ask him. We did and | |
29 | Andi replied that his intention was to interleave the allocations. So, | |
30 | ... | |
31 | ||
32 | This patch moves the advance of the hstate next node from which to | |
33 | allocate up before the test for success of the attempted allocation. This | |
34 | will unconditionally advance the next node from which to alloc, | |
35 | interleaving successful allocations over the nodes with sufficient | |
36 | contiguous memory, and skipping over nodes that fail the huge page | |
37 | allocation attempt. | |
38 | ||
39 | Note that alloc_bootmem_huge_page() will only be called for huge pages of | |
40 | order > MAX_ORDER. | |
41 | ||
42 | Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> | |
43 | Reviewed-by: Andi Kleen <ak@linux.intel.com> | |
44 | Cc: Mel Gorman <mel@csn.ul.ie> | |
45 | Cc: David Rientjes <rientjes@google.com> | |
46 | Cc: Adam Litke <agl@us.ibm.com> | |
47 | Cc: Andy Whitcroft <apw@canonical.com> | |
48 | Cc: Eric Whitney <eric.whitney@hp.com> | |
49 | Signed-off-by: Andrew Morton <akpm@linux-foundation.org> | |
50 | Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> | |
51 | ||
52 | --- | |
53 | mm/hugetlb.c | 2 +- | |
54 | 1 file changed, 1 insertion(+), 1 deletion(-) | |
55 | ||
56 | --- a/mm/hugetlb.c | |
57 | +++ b/mm/hugetlb.c | |
58 | @@ -983,6 +983,7 @@ __attribute__((weak)) int alloc_bootmem_ | |
59 | NODE_DATA(h->hugetlb_next_nid), | |
60 | huge_page_size(h), huge_page_size(h), 0); | |
61 | ||
62 | + hstate_next_node(h); | |
63 | if (addr) { | |
64 | /* | |
65 | * Use the beginning of the huge page to store the | |
66 | @@ -993,7 +994,6 @@ __attribute__((weak)) int alloc_bootmem_ | |
67 | if (m) | |
68 | goto found; | |
69 | } | |
70 | - hstate_next_node(h); | |
71 | nr_nodes--; | |
72 | } | |
73 | return 0; |
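
The effect of moving the hstate_next_node() call can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel code: `struct sim_hstate`, `sim_next_node()` and `sim_alloc_huge_page()` are hypothetical names, each node's available contiguous memory is reduced to a simple page counter, and the next-node advance is modelled as a plain modulo increment rather than a walk of the online node mask.

```c
#include <assert.h>

#define SIM_MAX_NODES 8

/* Hypothetical stand-in for struct hstate: only the fields the model needs. */
struct sim_hstate {
	int nr_nodes;                  /* number of simulated nodes */
	int next_nid;                  /* models h->hugetlb_next_nid */
	int free_pages[SIM_MAX_NODES]; /* huge pages each node can still supply */
};

/* Models hstate_next_node() as a simple wrap-around increment (assumption). */
static void sim_next_node(struct sim_hstate *h)
{
	h->next_nid = (h->next_nid + 1) % h->nr_nodes;
}

/*
 * Try each node at most once, starting at next_nid.
 * advance_first != 0: patched ordering, advance before testing for success.
 * advance_first == 0: original ordering, advance only after a failure.
 * Returns the node the page came from, or -1 if every node failed.
 */
static int sim_alloc_huge_page(struct sim_hstate *h, int advance_first)
{
	int nr_nodes = h->nr_nodes;

	assert(h->nr_nodes > 0 && h->nr_nodes <= SIM_MAX_NODES);

	while (nr_nodes) {
		int nid = h->next_nid;
		int ok = h->free_pages[nid] > 0; /* models a successful bootmem alloc */

		if (advance_first)
			sim_next_node(h);
		if (ok) {
			h->free_pages[nid]--;
			return nid;
		}
		if (!advance_first)
			sim_next_node(h);
		nr_nodes--;
	}
	return -1;
}
```

In this model, with three nodes holding 2, 0 and 2 pages, four calls with the patched ordering return nodes 0, 2, 0, 2, while the original ordering returns 0, 0, 2, 2: the same node is reused until it fails.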