Commit | Line | Data |
---|---|---|
a756faa3 GKH |
1 | From 778c14affaf94a9e4953179d3e13a544ccce7707 Mon Sep 17 00:00:00 2001 |
2 | From: David Rientjes <rientjes@google.com> | |
3 | Date: Thu, 30 Jan 2014 15:46:11 -0800 | |
4 | Subject: mm, oom: base root bonus on current usage | |
5 | ||
6 | From: David Rientjes <rientjes@google.com> | |
7 | ||
8 | commit 778c14affaf94a9e4953179d3e13a544ccce7707 upstream. | |
9 | ||
10 | A 3% of system memory bonus is sometimes too excessive in comparison to | |
11 | other processes. | |
12 | ||
13 | With commit a63d83f427fb ("oom: badness heuristic rewrite"), the OOM | |
14 | killer tries to avoid killing privileged tasks by subtracting 3% of | |
15 | overall memory (system or cgroup) from their per-task consumption. But | |
16 | as a result, all root tasks that consume less than 3% of overall memory | |
17 | are considered equal, and so it only takes 33+ privileged tasks pushing | |
18 | the system out of memory for the OOM killer to do something stupid and | |
19 | kill dhclient or other root-owned processes. For example, on a 32G | |
20 | machine it can't tell the difference between the 1M agetty and the 10G | |
21 | fork bomb member. | |
22 | ||
23 | The changelog describes this 3% boost as the equivalent to the global | |
24 | overcommit limit being 3% higher for privileged tasks, but this is not | |
25 | the same as discounting 3% of overall memory from _every privileged task | |
26 | individually_ during OOM selection. | |
27 | ||
28 | Replace the 3% of system memory bonus with a 3% of current memory usage | |
29 | bonus. | |
30 | ||
31 | By giving root tasks a bonus that is proportional to their actual size, | |
32 | they remain comparable even when relatively small. In the example | |
33 | above, the OOM killer will discount the 1M agetty's 256 badness points | |
34 | down to 179, and the 10G fork bomb's 262144 points down to 183500 points | |
35 | and make the right choice, instead of discounting both to 0 and killing | |
36 | agetty because it's first in the task list. | |
37 | ||
38 | Signed-off-by: David Rientjes <rientjes@google.com> | |
39 | Reported-by: Johannes Weiner <hannes@cmpxchg.org> | |
40 | Acked-by: Johannes Weiner <hannes@cmpxchg.org> | |
41 | Cc: Michal Hocko <mhocko@suse.cz> | |
42 | Signed-off-by: Andrew Morton <akpm@linux-foundation.org> | |
43 | Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> | |
44 | Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> | |
45 | ||
46 | --- | |
47 | Documentation/filesystems/proc.txt | 4 ++-- | |
48 | mm/oom_kill.c | 2 +- | |
49 | 2 files changed, 3 insertions(+), 3 deletions(-) | |
50 | ||
51 | --- a/Documentation/filesystems/proc.txt | |
52 | +++ b/Documentation/filesystems/proc.txt | |
53 | @@ -1372,8 +1372,8 @@ may allocate from based on an estimation | |
54 | For example, if a task is using all allowed memory, its badness score will be | |
55 | 1000. If it is using half of its allowed memory, its score will be 500. | |
56 | ||
57 | -There is an additional factor included in the badness score: root | |
58 | -processes are given 3% extra memory over other tasks. | |
59 | +There is an additional factor included in the badness score: the current memory | |
60 | +and swap usage is discounted by 3% for root processes. | |
61 | ||
62 | The amount of "allowed" memory depends on the context in which the oom killer | |
63 | was called. If it is due to the memory assigned to the allocating task's cpuset | |
64 | --- a/mm/oom_kill.c | |
65 | +++ b/mm/oom_kill.c | |
66 | @@ -170,7 +170,7 @@ unsigned long oom_badness(struct task_st | |
67 | * implementation used by LSMs. | |
68 | */ | |
69 | if (has_capability_noaudit(p, CAP_SYS_ADMIN)) | |
70 | - adj -= 30; | |
71 | + points -= (points * 3) / 100; | |
72 | ||
73 | /* Normalize to oom_score_adj units */ | |
74 | adj *= totalpages / 1000; |
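
As a rough illustration of the arithmetic in the `mm/oom_kill.c` hunk above, the sketch below applies the new expression `points -= (points * 3) / 100` to the two point values used in the changelog example (256 and 262144). The helper name `root_discount` and its `pct` parameter are hypothetical and exist only for this illustration; the kernel hard-codes 3 inside `oom_badness()`.

```c
/*
 * Hypothetical helper (not a kernel function): subtract `pct` percent of
 * `points` using integer arithmetic, in the same shape as the patched
 * line "points -= (points * 3) / 100".
 */
unsigned long root_discount(unsigned long points, unsigned long pct)
{
	/* Integer division truncates, so very small tasks lose slightly less than pct%. */
	return points - (points * pct) / 100;
}
```

With `pct` set to 3, as in the diff, `root_discount(256, 3)` yields 249 and `root_discount(262144, 3)` yields 254280, so the two tasks stay far apart in score. The figures quoted in the changelog (179 and 183500) are close to what `pct` set to 30 would give (180 and 183501 with this integer arithmetic), so they may reflect a different discount than the one in the final diff; that is an inference from the numbers, not something the patch states.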