From 2676d2162ad9202f8a9932e8b245ca9459679bcd Mon Sep 17 00:00:00 2001
From: Greg Kroah-Hartman
Date: Sun, 13 Aug 2017 08:56:13 -0700
Subject: [PATCH] 3.18-stable patches

added patches:
	mm-ratelimit-pfns-busy-info-message.patch
---
 .../mm-ratelimit-pfns-busy-info-message.patch | 79 +++++++++++++++++++
 queue-3.18/series                             |  1 +
 queue-4.12/series                             |  3 +
 queue-4.4/series                              |  1 +
 queue-4.9/series                              |  3 +
 5 files changed, 87 insertions(+)
 create mode 100644 queue-3.18/mm-ratelimit-pfns-busy-info-message.patch
 create mode 100644 queue-3.18/series
 create mode 100644 queue-4.12/series
 create mode 100644 queue-4.9/series

diff --git a/queue-3.18/mm-ratelimit-pfns-busy-info-message.patch b/queue-3.18/mm-ratelimit-pfns-busy-info-message.patch
new file mode 100644
index 00000000000..4c9acccd857
--- /dev/null
+++ b/queue-3.18/mm-ratelimit-pfns-busy-info-message.patch
@@ -0,0 +1,79 @@
+From 75dddef32514f7aa58930bde6a1263253bc3d4ba Mon Sep 17 00:00:00 2001
+From: Jonathan Toppins
+Date: Thu, 10 Aug 2017 15:23:35 -0700
+Subject: mm: ratelimit PFNs busy info message
+
+From: Jonathan Toppins
+
+commit 75dddef32514f7aa58930bde6a1263253bc3d4ba upstream.
+
+The RDMA subsystem can generate several thousand of these messages per
+second eventually leading to a kernel crash. Ratelimit these messages
+to prevent this crash.
+
+Doug said:
+ "I've been carrying a version of this for several kernel versions. I
+ don't remember when they started, but we have one (and only one) class
+ of machines: Dell PE R730xd, that generate these errors. When it
+ happens, without a rate limit, we get rcu timeouts and kernel oopses.
+ With the rate limit, we just get a lot of annoying kernel messages but
+ the machine continues on, recovers, and eventually the memory
+ operations all succeed"
+
+And:
+ "> Well... why are all these EBUSY's occurring? It sounds inefficient
+ > (at least) but if it is expected, normal and unavoidable then
+ > perhaps we should just remove that message altogether?
+
+ I don't have an answer to that question. To be honest, I haven't
+ looked real hard. We never had this at all, then it started out of the
+ blue, but only on our Dell 730xd machines (and it hits all of them),
+ but no other classes or brands of machines. And we have our 730xd
+ machines loaded up with different brands and models of cards (for
+ instance one dedicated to mlx4 hardware, one for qib, one for mlx5, an
+ ocrdma/cxgb4 combo, etc), so the fact that it hit all of the machines
+ meant it wasn't tied to any particular brand/model of RDMA hardware.
+ To me, it always smelled of a hardware oddity specific to maybe the
+ CPUs or mainboard chipsets in these machines, so given that I'm not an
+ mm expert anyway, I never chased it down.
+
+ A few other relevant details: it showed up somewhere around 4.8/4.9 or
+ thereabouts. It never happened before, but the prinkt has been there
+ since the 3.18 days, so possibly the test to trigger this message was
+ changed, or something else in the allocator changed such that the
+ situation started happening on these machines?
+
+ And, like I said, it is specific to our 730xd machines (but they are
+ all identical, so that could mean it's something like their specific
+ ram configuration is causing the allocator to hit this on these
+ machine but not on other machines in the cluster, I don't want to say
+ it's necessarily the model of chipset or CPU, there are other bits of
+ identicalness between these machines)"
+
+Link: http://lkml.kernel.org/r/499c0f6cc10d6eb829a67f2a4d75b4228a9b356e.1501695897.git.jtoppins@redhat.com
+Signed-off-by: Jonathan Toppins
+Reviewed-by: Doug Ledford
+Tested-by: Doug Ledford
+Cc: Michal Hocko
+Cc: Vlastimil Babka
+Cc: Mel Gorman
+Cc: Hillf Danton
+Signed-off-by: Andrew Morton
+Signed-off-by: Linus Torvalds
+Signed-off-by: Greg Kroah-Hartman
+
+---
+ mm/page_alloc.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/mm/page_alloc.c
++++ b/mm/page_alloc.c
+@@ -6424,7 +6424,7 @@ int alloc_contig_range(unsigned long sta
+ 
+ 	/* Make sure the range is really isolated. */
+ 	if (test_pages_isolated(outer_start, end, false)) {
+-		pr_info("%s: [%lx, %lx) PFNs busy\n",
++		pr_info_ratelimited("%s: [%lx, %lx) PFNs busy\n",
+ 			__func__, outer_start, end);
+ 		ret = -EBUSY;
+ 		goto done;
diff --git a/queue-3.18/series b/queue-3.18/series
new file mode 100644
index 00000000000..381ae9cf0fe
--- /dev/null
+++ b/queue-3.18/series
@@ -0,0 +1 @@
+mm-ratelimit-pfns-busy-info-message.patch
diff --git a/queue-4.12/series b/queue-4.12/series
new file mode 100644
index 00000000000..257dcd53558
--- /dev/null
+++ b/queue-4.12/series
@@ -0,0 +1,3 @@
+mm-ratelimit-pfns-busy-info-message.patch
+mm-fix-list-corruptions-on-shmem-shrinklist.patch
+futex-remove-unnecessary-warning-from-get_futex_key.patch
diff --git a/queue-4.4/series b/queue-4.4/series
index 415d3c874ee..24734539160 100644
--- a/queue-4.4/series
+++ b/queue-4.4/series
@@ -1 +1,2 @@
 cpuset-fix-a-deadlock-due-to-incomplete-patching-of-cpusets_enabled.patch
+mm-ratelimit-pfns-busy-info-message.patch
diff --git a/queue-4.9/series b/queue-4.9/series
new file mode 100644
index 00000000000..257dcd53558
--- /dev/null
+++ b/queue-4.9/series
@@ -0,0 +1,3 @@
+mm-ratelimit-pfns-busy-info-message.patch
+mm-fix-list-corruptions-on-shmem-shrinklist.patch
+futex-remove-unnecessary-warning-from-get_futex_key.patch
-- 
2.47.3