In a corner case of concurrent driver removal and driver reset,
bonding resource is first released in hns_roce_hw_v2_exit() during
driver removal, and then is allocated again in hns_roce_register_device()
during driver reset. This leads to memory leak because the release
timing has already passed. This may also lead to a kernel panic
as below because of the leaked notifier callback:
Call trace:
0xffffa20fccc04978 (P)
raw_notifier_call_chain+0x20/0x38
call_netdevice_notifiers_info+0x60/0xb8
netdev_lower_state_changed+0x4c/0xb8
As Sashiko suggested, the teardown order of bonding resources should
be inverted to make sure the resources are released when the driver
is removed.
Fixes: b37ad2e290fc ("RDMA/hns: Initialize bonding resources")
Link: https://patch.msgid.link/r/20260613102045.811623-1-huangjunxian6@hisilicon.com
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
static void __exit hns_roce_hw_v2_exit(void)
{
- hns_roce_dealloc_bond_grp();
hnae3_unregister_client(&hns_roce_hw_v2_client);
+ hns_roce_dealloc_bond_grp();
hns_roce_cleanup_debugfs();
}