git.ipfire.org Git - thirdparty/kernel/linux.git/commit
net/mlx5: Abort new commands if all command slots are stalled
author     Saeed Mahameed <saeedm@nvidia.com>
           Mon, 17 Nov 2025 21:42:08 +0000 (23:42 +0200)
committer  Jakub Kicinski <kuba@kernel.org>
           Wed, 19 Nov 2025 02:53:34 +0000 (18:53 -0800)
commit     fbb9933666e31f84c62e9620e9ec4d220ee31ab4
tree       7fe52970c3b3414095b21697d007e36e73bbd2e8
parent     ea3270351c792632db5722ea3ca83b468cebb531

In case of a FW issue, FW might not respond to commands at all, causing a
kernel lockout for a long period of time. For example, rtnl_lock may be
held while ethtool collects stats and waits for FW to respond to multiple
commands, all of which will eventually time out.

While there is no immediate indication of a FW lockout, we can safely
assume that something is wrong when all command slots are busy, all of
them are in a timeout state, and no FW completion was received on any
of them.

In such a case, start failing new commands immediately.
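The stall condition described above can be sketched in user-space C. This
is a minimal illustration under stated assumptions, not the code from
cmd.c: the slot count, struct cmd_slot, struct cmd_iface,
all_slots_stalled(), submit_cmd(), mark_timed_out() and the -EIO return
value are all hypothetical names chosen for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical slot count; the real command interface size is device-specific. */
#define NUM_CMD_SLOTS 4

/* Hypothetical per-slot state, not the actual mlx5 data structures. */
struct cmd_slot {
	bool busy;         /* a command currently occupies this slot */
	bool timed_out;    /* the command exceeded its timeout */
	bool fw_completed; /* FW posted a completion for this slot */
};

struct cmd_iface {
	struct cmd_slot slots[NUM_CMD_SLOTS];
};

/* True only if every slot is busy, timed out, and has no FW completion. */
static bool all_slots_stalled(const struct cmd_iface *cmd)
{
	for (size_t i = 0; i < NUM_CMD_SLOTS; i++) {
		const struct cmd_slot *s = &cmd->slots[i];

		if (!s->busy || !s->timed_out || s->fw_completed)
			return false;
	}
	return true;
}

/* Hypothetical submit path: fail fast instead of waiting for a free slot. */
static int submit_cmd(struct cmd_iface *cmd, size_t *slot_out)
{
	if (all_slots_stalled(cmd))
		return -EIO; /* error code chosen for illustration only */

	for (size_t i = 0; i < NUM_CMD_SLOTS; i++) {
		if (!cmd->slots[i].busy) {
			cmd->slots[i] = (struct cmd_slot){ .busy = true };
			*slot_out = i;
			return 0;
		}
	}
	/* Some slots are busy but not all stalled: a real driver would wait. */
	return -EBUSY;
}

/* Hypothetical helper: record that a slot's command timed out without a FW completion. */
static void mark_timed_out(struct cmd_iface *cmd, size_t slot)
{
	assert(slot < NUM_CMD_SLOTS);
	cmd->slots[slot].timed_out = true;
}
```

Requiring every slot to be stalled, rather than reacting to a single
timeout, keeps one slow command from failing unrelated callers; a single
FW completion on any slot is enough to clear the condition.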

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1763415729-1238421-5-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
drivers/net/ethernet/mellanox/mlx5/core/cmd.c
include/linux/mlx5/driver.h