From b4c46484dc3fa3721d68fdfae85c1d7b1f6b5472 Mon Sep 17 00:00:00 2001
From: Roman Gushchin <guro@fb.com>
Date: Fri, 30 Aug 2019 16:04:39 -0700
Subject: mm, memcg: partially revert "mm/memcontrol.c: keep local VM counters in sync with the hierarchical ones"

From: Roman Gushchin <guro@fb.com>

commit b4c46484dc3fa3721d68fdfae85c1d7b1f6b5472 upstream.

Commit 766a4c19d880 ("mm/memcontrol.c: keep local VM counters in sync
with the hierarchical ones") effectively decreased the precision of
per-memcg vmstats_local and per-memcg-per-node lruvec percpu counters.

That's good for displaying in memory.stat, but it introduces a serious
regression into the reclaim process.

One issue I've discovered and debugged is the following:
lruvec_lru_size() can return 0 instead of the actual number of pages on
the lru list, preventing the kernel from reclaiming the last remaining
pages. The result is yet another flood of dying memory cgroups. The
opposite also happens: scanning an lru list that is actually empty is a
waste of cpu time.
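To make the failure mode concrete, here is a minimal userspace sketch
of the pre-patch batching scheme (MEMCG_CHARGE_BATCH mirrors the kernel
constant; the counters and function are simplified stand-ins, not
kernel code):

  #include <stdio.h>
  #include <stdlib.h>

  #define MEMCG_CHARGE_BATCH 32   /* mirrors the kernel's batch size */

  static long stat_cpu;    /* per-cpu delta, flushed in batches */
  static long stat_local;  /* local counter, read by lruvec_lru_size() */
  static long stat_hier;   /* hierarchical counter */

  /* Pre-patch ordering: the local counter is only updated when the
   * per-cpu delta exceeds the batch size and gets flushed. */
  static void mod_lruvec_state_old(long val)
  {
          long x = val + stat_cpu;

          if (labs(x) > MEMCG_CHARGE_BATCH) {
                  stat_local += x;  /* batched: lags behind reality */
                  stat_hier += x;
                  x = 0;
          }
          stat_cpu = x;
  }

  int main(void)
  {
          /* Charge 10 pages: below the batch size, nothing is flushed. */
          for (int i = 0; i < 10; i++)
                  mod_lruvec_state_old(1);

          /* 10 pages are on the lru, but the local counter reads 0. */
          printf("pages added: 10, stat_local reads: %ld\n", stat_local);
          return 0;
  }

With fewer than MEMCG_CHARGE_BATCH pages charged per cpu, the local
counter reads 0 even though pages exist, which is exactly what lets
lruvec_lru_size() report an empty list.
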
Also, inactive_list_is_low() can return incorrect values, preventing
the active lru from being scanned and freed. It can fail both because
the sizes of the active and inactive lists are inaccurate, and because
the number of workingset refaults isn't precise. In other words, the
result is pretty random.

I'm not sure if using the approximate number of slab pages in
count_shadow_nodes() is acceptable, but the issues described above are
enough to partially revert the patch.

Let's keep the per-memcg vmstat_local counters batched (they are only
used for displaying stats to userspace), but make the lruvec stats
precise. This change fixes the dead memcg flooding on my setup.

Link: http://lkml.kernel.org/r/20190817004726.2530670-1-guro@fb.com
Fixes: 766a4c19d880 ("mm/memcontrol.c: keep local VM counters in sync with the hierarchical ones")
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/memcontrol.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -748,15 +748,13 @@ void __mod_lruvec_state(struct lruvec *l
 	/* Update memcg */
 	__mod_memcg_state(memcg, idx, val);
 
+	/* Update lruvec */
+	__this_cpu_add(pn->lruvec_stat_local->count[idx], val);
+
 	x = val + __this_cpu_read(pn->lruvec_stat_cpu->count[idx]);
 	if (unlikely(abs(x) > MEMCG_CHARGE_BATCH)) {
 		struct mem_cgroup_per_node *pi;
 
-		/*
-		 * Batch local counters to keep them in sync with
-		 * the hierarchical ones.
-		 */
-		__this_cpu_add(pn->lruvec_stat_local->count[idx], x);
 		for (pi = pn; pi; pi = parent_nodeinfo(pi, pgdat->node_id))
 			atomic_long_add(x, &pi->lruvec_stat[idx]);
 		x = 0;
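
After the patch, the local counter is updated on every call, while the
hierarchical counters stay batched. Here is the same userspace sketch
updated to the post-patch ordering (again a simplified stand-in, not
kernel code):

  #include <stdio.h>
  #include <stdlib.h>

  #define MEMCG_CHARGE_BATCH 32   /* mirrors the kernel's batch size */

  static long stat_cpu;    /* per-cpu delta, still batched */
  static long stat_local;  /* local counter, now precise */
  static long stat_hier;   /* hierarchical counter, still batched */

  /* Post-patch ordering: the local counter is bumped unconditionally;
   * only the hierarchical counter goes through the batched delta. */
  static void mod_lruvec_state_new(long val)
  {
          long x;

          stat_local += val;  /* precise: updated on every call */

          x = val + stat_cpu;
          if (labs(x) > MEMCG_CHARGE_BATCH) {
                  stat_hier += x;
                  x = 0;
          }
          stat_cpu = x;
  }

  int main(void)
  {
          for (int i = 0; i < 10; i++)
                  mod_lruvec_state_new(1);

          /* The local counter now matches reality exactly. */
          printf("pages added: 10, stat_local reads: %ld\n", stat_local);
          return 0;
  }

Charging the same 10 pages now leaves stat_local at exactly 10, so
lruvec_lru_size() and inactive_list_is_low() see the real list sizes.
The memcg-level vmstats_local counters, which are only read for
memory.stat, remain batched.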