From fb7a91746af18b2ebf596778b38a709cdbc488d3 Mon Sep 17 00:00:00 2001
From: Jack Morgenstein <jackm@dev.mellanox.co.il>
Date: Tue, 21 Mar 2017 12:57:06 +0200
Subject: IB/mlx4: Reduce SRIOV multicast cleanup warning message to debug level

From: Jack Morgenstein <jackm@dev.mellanox.co.il>

commit fb7a91746af18b2ebf596778b38a709cdbc488d3 upstream.

A warning message during SRIOV multicast cleanup should have actually been
a debug level message. The condition generating the warning does no harm
and can fill the message log.

In some cases, during testing, some tests were so intense as to swamp the
message log with these warning messages, causing a stall in the console
message log output task. This stall caused an NMI to be sent to all CPUs
(so that they all dumped their stacks into the message log).
Aside from the message flood causing an NMI, the tests all passed.

Once the message flood which caused the NMI is removed (by reducing the
warning message to debug level), the NMI no longer occurs.

Sample message log (console log) output illustrating the flood and
resultant NMI (snippets with comments and modified with ... instead
of hex digits, to satisfy checkpatch.pl):

<mlx4_ib> _mlx4_ib_mcg_port_cleanup: ... WARNING: group refcount 1!!!...
*** About 4000 almost identical lines in less than one second ***
<mlx4_ib> _mlx4_ib_mcg_port_cleanup: ... WARNING: group refcount 1!!!...
INFO: rcu_sched detected stalls on CPUs/tasks: { 17} (...)
*** { 17} above indicates that CPU 17 was the one that stalled ***
sending NMI to all CPUs:
...
NMI backtrace for cpu 17
CPU: 17 PID: 45909 Comm: kworker/17:2
Hardware name: HP ProLiant DL360p Gen8, BIOS P71 09/08/2013
Workqueue: events fb_flashcursor
task: ffff880478...... ti: ffff88064e...... task.ti: ffff88064e......
RIP: 0010:[ffffffff81......] [ffffffff81......] io_serial_in+0x15/0x20
RSP: 0018:ffff88064e257cb0 EFLAGS: 00000002
RAX: 0000000000...... RBX: ffffffff81...... RCX: 0000000000......
RDX: 0000000000...... RSI: 0000000000...... RDI: ffffffff81......
RBP: ffff88064e...... R08: ffffffff81...... R09: 0000000000......
R10: 0000000000...... R11: ffff88064e...... R12: 0000000000......
R13: 0000000000...... R14: ffffffff81...... R15: 0000000000......
FS: 0000000000......(0000) GS:ffff8804af......(0000) knlGS:000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080......
CR2: 00007f2a2f...... CR3: 0000000001...... CR4: 0000000000......
DR0: 0000000000...... DR1: 0000000000...... DR2: 0000000000......
DR3: 0000000000...... DR6: 00000000ff...... DR7: 0000000000......
Stack:
ffff88064e...... ffffffff81...... ffffffff81...... 0000000000......
ffffffff81...... ffff88064e...... ffffffff81...... ffffffff81......
ffffffff81...... ffff88064e...... ffffffff81...... 0000000000......
Call Trace:
[<ffffffff813d099b>] wait_for_xmitr+0x3b/0xa0
[<ffffffff813d0b5c>] serial8250_console_putchar+0x1c/0x30
[<ffffffff813d0b40>] ? serial8250_console_write+0x140/0x140
[<ffffffff813cb5fa>] uart_console_write+0x3a/0x80
[<ffffffff813d0aae>] serial8250_console_write+0xae/0x140
[<ffffffff8107c4d1>] call_console_drivers.constprop.15+0x91/0xf0
[<ffffffff8107d6cf>] console_unlock+0x3bf/0x400
[<ffffffff813503cd>] fb_flashcursor+0x5d/0x140
[<ffffffff81355c30>] ? bit_clear+0x120/0x120
[<ffffffff8109d5fb>] process_one_work+0x17b/0x470
[<ffffffff8109e3cb>] worker_thread+0x11b/0x400
[<ffffffff8109e2b0>] ? rescuer_thread+0x400/0x400
[<ffffffff810a5aef>] kthread+0xcf/0xe0
[<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
[<ffffffff81645858>] ret_from_fork+0x58/0x90
[<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
Code: 48 89 e5 d3 e6 48 63 f6 48 03 77 10 8b 06 5d c3 66 0f 1f 44 00 00 66 66 66 6

As indicated in the stack trace above, the console output task got swamped.

Fixes: b9c5d6a64358 ("IB/mlx4: Add multicast group (MCG) paravirtualization for SR-IOV")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/infiniband/hw/mlx4/mcg.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/infiniband/hw/mlx4/mcg.c
+++ b/drivers/infiniband/hw/mlx4/mcg.c
@@ -1102,7 +1102,8 @@ static void _mlx4_ib_mcg_port_cleanup(st
 	while ((p = rb_first(&ctx->mcg_table)) != NULL) {
 		group = rb_entry(p, struct mcast_group, node);
 		if (atomic_read(&group->refcount))
-			mcg_warn_group(group, "group refcount %d!!! (pointer %p)\n", atomic_read(&group->refcount), group);
+			mcg_debug_group(group, "group refcount %d!!! (pointer %p)\n",
+					atomic_read(&group->refcount), group);
 
 		force_clean_group(group);
 	}