net/mlx5: E-switch, unload IB representors when unloading ETH representors
author Chiara Meiohas <cmeiohas@nvidia.com>
Thu, 7 Nov 2024 18:35:21 +0000 (20:35 +0200)
committer Jakub Kicinski <kuba@kernel.org>
Tue, 12 Nov 2024 03:23:38 +0000 (19:23 -0800)
IB representors depend on ETH representors, so the IB representors
should not exist without the ETH ones. When unloading the ETH
representors, the corresponding IB representors should also be
unloaded.

Commit 8d159eb2117b ("RDMA/mlx5: Use IB set_netdev and get_netdev functions")
introduced the use of the ib_device_set_netdev() API in IB
representors. ib_device_set_netdev() increments the refcount of
the representor's netdev when loading an IB representor and
decrements it when unloading.
Without the unloading of the IB representor, the refcount of the
representor's netdev remains greater than 0, preventing it from
being unregistered.
That commit uncovered an underlying bug where the ETH representor is
unloaded without unloading the IB representor.
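
The refcount behaviour described above can be illustrated with a small
userspace model. This is only a sketch: every toy_* name is hypothetical
and none of it is kernel code. It assumes the netdev's owner holds one
reference of its own, so unregistering can proceed only when the count
is back to 1, which is consistent with the "Usage count = 2" line in the
call trace below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a net_device: only the reference count matters here. */
struct toy_netdev {
	int refcnt;	/* starts at 1: the owner's own reference (assumption) */
};

/* Toy stand-in for an IB device that may pin at most one netdev. */
struct toy_ibdev {
	struct toy_netdev *ndev;
};

/*
 * Models what the commit message says about ib_device_set_netdev():
 * binding a netdev takes a reference, unbinding (ndev == NULL) drops it.
 */
static void toy_ib_set_netdev(struct toy_ibdev *ib, struct toy_netdev *ndev)
{
	if (ib->ndev)
		ib->ndev->refcnt--;	/* drop the reference on the old netdev */
	ib->ndev = ndev;
	if (ndev)
		ndev->refcnt++;		/* pin the new netdev */
}

/*
 * Models the unregister precondition: the netdev can go away only when
 * no one but its owner still holds a reference. The real kernel waits
 * for that condition; this model just reports it.
 */
static bool toy_can_unregister(const struct toy_netdev *ndev)
{
	assert(ndev != NULL);
	return ndev->refcnt == 1;
}
```

In this model, loading the IB representor corresponds to
toy_ib_set_netdev(&ib, &nd) and unloading it to
toy_ib_set_netdev(&ib, NULL). If the second call is skipped,
toy_can_unregister() never becomes true, which mirrors the hang.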

This issue happened when using multiport E-switch and rebooting,
causing the shutdown to hang when unloading the ETH representor
because the refcount of the representor's netdevice was greater than 0.

Call trace:
unregister_netdevice: waiting for eth3 to become free. Usage count = 2
ref_tracker: eth%d@00000000661d60f7 has 1/1 users at
ib_device_set_netdev+0x160/0x2d0 [ib_core]
mlx5_ib_vport_rep_load+0x104/0x3f0 [mlx5_ib]
mlx5_eswitch_reload_ib_reps+0xfc/0x110 [mlx5_core]
mlx5_mpesw_work+0x236/0x330 [mlx5_core]
process_one_work+0x169/0x320
worker_thread+0x288/0x3a0
kthread+0xb8/0xe0
ret_from_fork+0x2d/0x50
ret_from_fork_asm+0x11/0x20

Fixes: 8d159eb2117b ("RDMA/mlx5: Use IB set_netdev and get_netdev functions")
Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241107183527.676877-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c

index f24f91d213f24c782dfaaa9cef00259f7261ba69..8cf61ae8b89d2485d9d0de32e2e86d8442ae1917 100644
@@ -2527,8 +2527,11 @@ static void __esw_offloads_unload_rep(struct mlx5_eswitch *esw,
                                      struct mlx5_eswitch_rep *rep, u8 rep_type)
 {
        if (atomic_cmpxchg(&rep->rep_data[rep_type].state,
-                          REP_LOADED, REP_REGISTERED) == REP_LOADED)
+                          REP_LOADED, REP_REGISTERED) == REP_LOADED) {
+               if (rep_type == REP_ETH)
+                       __esw_offloads_unload_rep(esw, rep, REP_IB);
                esw->offloads.rep_ops[rep_type]->unload(rep);
+       }
 }
 
 static void __unload_reps_all_vport(struct mlx5_eswitch *esw, u8 rep_type)
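
The state transition in the hunk above can be tried out in userspace
with C11 atomics. The following is a rough sketch rather than the driver
code: the toy_* names, the unload_calls counters and the init helper are
all hypothetical, and atomic_compare_exchange_strong() stands in for the
kernel's atomic_cmpxchg(). It shows that unloading the ETH representor
also unloads a loaded IB representor, and that a repeated unload does
nothing.

```c
#include <assert.h>
#include <stdatomic.h>

enum toy_rep_type { TOY_REP_ETH, TOY_REP_IB, TOY_NUM_REP_TYPES };
enum toy_rep_state { TOY_REP_UNREGISTERED, TOY_REP_REGISTERED, TOY_REP_LOADED };

/* Toy representor: one state per type plus a counter of unload callbacks. */
struct toy_rep {
	atomic_int state[TOY_NUM_REP_TYPES];
	int unload_calls[TOY_NUM_REP_TYPES];
};

static void toy_rep_init(struct toy_rep *rep, int eth_state, int ib_state)
{
	atomic_init(&rep->state[TOY_REP_ETH], eth_state);
	atomic_init(&rep->state[TOY_REP_IB], ib_state);
	rep->unload_calls[TOY_REP_ETH] = 0;
	rep->unload_calls[TOY_REP_IB] = 0;
}

static void toy_unload_rep(struct toy_rep *rep, enum toy_rep_type type)
{
	int expected = TOY_REP_LOADED;

	assert(type == TOY_REP_ETH || type == TOY_REP_IB);

	/* Only the caller that flips LOADED -> REGISTERED runs the unload. */
	if (!atomic_compare_exchange_strong(&rep->state[type], &expected,
					    TOY_REP_REGISTERED))
		return;

	/* The IB representor depends on the ETH one, so tear it down first. */
	if (type == TOY_REP_ETH)
		toy_unload_rep(rep, TOY_REP_IB);

	rep->unload_calls[type]++;	/* stands in for rep_ops[type]->unload(rep) */
}
```

Because the nested call goes through the same compare-and-exchange, an
IB representor that is not in the loaded state is simply skipped, so the
ETH path stays safe whether or not an IB representor was ever loaded.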