From 252c5f94d944487e9f50ece7942b0fbf659c5c31 Mon Sep 17 00:00:00 2001
From: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Date: Mon, 21 Sep 2009 17:03:40 -0700
Subject: mmap: avoid unnecessary anon_vma lock acquisition in vma_adjust()

From: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

commit 252c5f94d944487e9f50ece7942b0fbf659c5c31 upstream.

We noticed very erratic behavior [throughput] with the AIM7 shared
workload running on recent distro [SLES11] and mainline kernels on an
8-socket, 32-core, 256GB x86_64 platform.  On the SLES11 kernel
[2.6.27.19+] with Barcelona processors, as we increased the load [10s of
thousands of tasks], the throughput would vary between two "plateaus"--one
at ~65K jobs per minute and one at ~130K jpm.  The simple patch below
causes the results to smooth out at the ~130k plateau.

But wait, there's more:

We do not see this behavior on smaller platforms--e.g., 4 socket/8 core.
This could be the result of the larger number of cpus on the larger
platform--a scalability issue--or it could be the result of the larger
number of interconnect "hops" between some nodes in this platform and how
the tasks for a given load end up distributed over the nodes' cpus and
memories--a stochastic NUMA effect.

The variability in the results is less pronounced [on the same platform]
with Shanghai processors and with mainline kernels.  With 31-rc6 on
Shanghai processors and 288 file systems on 288 fibre attached storage
volumes, the curves [jpm vs load] are both quite flat with the patched
kernel consistently producing ~3.9% better throughput [~80K jpm vs ~77K
jpm] than the unpatched kernel.

Profiling indicated that the "slow" runs were incurring high[er]
contention on an anon_vma lock in vma_adjust(), apparently called from the
sbrk() system call.

The patch:

A comment in mm/mmap.c:vma_adjust() suggests that we don't really need the
anon_vma lock when we're only adjusting the end of a vma, as is the case
for brk().  The comment questions whether it's worth while to optimize for
this case.  Apparently, on the newer, larger x86_64 platforms, with
interesting NUMA topologies, it is worth while--especially considering
that the patch [if correct!] is quite simple.

We can detect this condition--no overlap with next vma--by noting a NULL
"importer".  The anon_vma pointer will also be NULL in this case, so
simply avoid loading vma->anon_vma to avoid the lock.

However, we DO need to take the anon_vma lock when we're inserting a vma
['insert' non-NULL] even when we have no overlap [NULL "importer"], so we
need to check for 'insert', as well.  And Hugh points out that we should
also take it when adjusting vm_start (so that rmap.c can rely upon
vma_address() while it holds the anon_vma lock).
akpm: Zhang Yanmin reports a 150% throughput improvement with aim7, so it
might be -stable material even though this isn't a regression: "this
issue is not clear on dual socket Nehalem machine (2*4*2 cpu), but is
severe on large machine (4*8*2 cpu)"

[hugh.dickins@tiscali.co.uk: test vma start too]
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Eric Whitney <eric.whitney@hp.com>
Tested-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 mm/mmap.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -575,9 +575,9 @@ again:			remove_next = 1 + (end > next->
 
 	/*
 	 * When changing only vma->vm_end, we don't really need
-	 * anon_vma lock: but is that case worth optimizing out?
+	 * anon_vma lock.
 	 */
-	if (vma->anon_vma)
+	if (vma->anon_vma && (insert || importer || start != vma->vm_start))
		anon_vma = vma->anon_vma;
 	if (anon_vma) {
 		spin_lock(&anon_vma->lock);