From foo@baz Mon Dec 3 10:10:43 CET 2018
From: Jon Maloy <donmalo99@gmail.com>
Date: Mon, 26 Nov 2018 12:26:14 -0500
Subject: tipc: fix lockdep warning during node delete

From: Jon Maloy <donmalo99@gmail.com>

[ Upstream commit ec835f891232d7763dea9da0358f31e24ca6dfb7 ]

We see the following lockdep warning:

[ 2284.078521] ======================================================
[ 2284.078604] WARNING: possible circular locking dependency detected
[ 2284.078604] 4.19.0+ #42 Tainted: G E
[ 2284.078604] ------------------------------------------------------
[ 2284.078604] rmmod/254 is trying to acquire lock:
[ 2284.078604] 00000000acd94e28 ((&n->timer)#2){+.-.}, at: del_timer_sync+0x5/0xa0
[ 2284.078604]
[ 2284.078604] but task is already holding lock:
[ 2284.078604] 00000000f997afc0 (&(&tn->node_list_lock)->rlock){+.-.}, at: tipc_node_stop+0xac/0x190 [tipc]
[ 2284.078604]
[ 2284.078604] which lock already depends on the new lock.
[ 2284.078604]
[ 2284.078604]
[ 2284.078604] the existing dependency chain (in reverse order) is:
[ 2284.078604]
[ 2284.078604] -> #1 (&(&tn->node_list_lock)->rlock){+.-.}:
[ 2284.078604]        tipc_node_timeout+0x20a/0x330 [tipc]
[ 2284.078604]        call_timer_fn+0xa1/0x280
[ 2284.078604]        run_timer_softirq+0x1f2/0x4d0
[ 2284.078604]        __do_softirq+0xfc/0x413
[ 2284.078604]        irq_exit+0xb5/0xc0
[ 2284.078604]        smp_apic_timer_interrupt+0xac/0x210
[ 2284.078604]        apic_timer_interrupt+0xf/0x20
[ 2284.078604]        default_idle+0x1c/0x140
[ 2284.078604]        do_idle+0x1bc/0x280
[ 2284.078604]        cpu_startup_entry+0x19/0x20
[ 2284.078604]        start_secondary+0x187/0x1c0
[ 2284.078604]        secondary_startup_64+0xa4/0xb0
[ 2284.078604]
[ 2284.078604] -> #0 ((&n->timer)#2){+.-.}:
[ 2284.078604]        del_timer_sync+0x34/0xa0
[ 2284.078604]        tipc_node_delete+0x1a/0x40 [tipc]
[ 2284.078604]        tipc_node_stop+0xcb/0x190 [tipc]
[ 2284.078604]        tipc_net_stop+0x154/0x170 [tipc]
[ 2284.078604]        tipc_exit_net+0x16/0x30 [tipc]
[ 2284.078604]        ops_exit_list.isra.8+0x36/0x70
[ 2284.078604]        unregister_pernet_operations+0x87/0xd0
[ 2284.078604]        unregister_pernet_subsys+0x1d/0x30
[ 2284.078604]        tipc_exit+0x11/0x6f2 [tipc]
[ 2284.078604]        __x64_sys_delete_module+0x1df/0x240
[ 2284.078604]        do_syscall_64+0x66/0x460
[ 2284.078604]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 2284.078604]
[ 2284.078604] other info that might help us debug this:
[ 2284.078604]
[ 2284.078604]  Possible unsafe locking scenario:
[ 2284.078604]
[ 2284.078604]        CPU0                    CPU1
[ 2284.078604]        ----                    ----
[ 2284.078604]   lock(&(&tn->node_list_lock)->rlock);
[ 2284.078604]                                lock((&n->timer)#2);
[ 2284.078604]                                lock(&(&tn->node_list_lock)->rlock);
[ 2284.078604]   lock((&n->timer)#2);
[ 2284.078604]
[ 2284.078604]  *** DEADLOCK ***
[ 2284.078604]
[ 2284.078604] 3 locks held by rmmod/254:
[ 2284.078604]  #0: 000000003368be9b (pernet_ops_rwsem){+.+.}, at: unregister_pernet_subsys+0x15/0x30
[ 2284.078604]  #1: 0000000046ed9c86 (rtnl_mutex){+.+.}, at: tipc_net_stop+0x144/0x170 [tipc]
[ 2284.078604]  #2: 00000000f997afc0 (&(&tn->node_list_lock)->rlock){+.-.}, at: tipc_node_stop+0xac/0x19
[...]

The reason is that the node timer handler sometimes needs to delete a
node which has been disconnected for too long. To do this, it grabs
the lock 'node_list_lock', which may at the same time be held by the
generic node cleanup function, tipc_node_stop(), during module removal.
Since the latter calls del_timer_sync() while holding that lock, and
del_timer_sync() waits for a running timer handler to finish, we have
a potential deadlock.

We fix this by letting the timer cleanup function use spin_trylock()
instead of spin_lock(). When it fails to grab the lock it simply
returns, so that the timer handler can terminate its execution.
This is safe, since tipc_node_stop() is about to delete both the
timer and the node instance anyway.

Fixes: 6a939f365bdb ("tipc: Auto removal of peer down node instance")
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/tipc/node.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -584,12 +584,15 @@ static void tipc_node_clear_links(struc
 /* tipc_node_cleanup - delete nodes that does not
  * have active links for NODE_CLEANUP_AFTER time
  */
-static int tipc_node_cleanup(struct tipc_node *peer)
+static bool tipc_node_cleanup(struct tipc_node *peer)
 {
 	struct tipc_net *tn = tipc_net(peer->net);
 	bool deleted = false;
 
-	spin_lock_bh(&tn->node_list_lock);
+	/* If lock held by tipc_node_stop() the node will be deleted anyway */
+	if (!spin_trylock_bh(&tn->node_list_lock))
+		return false;
+
 	tipc_node_write_lock(peer);
 
 	if (!node_is_up(peer) && time_after(jiffies, peer->delete_at)) {