From: Yevgeny Kliteynik
Date: Thu, 7 May 2026 17:34:41 +0000 (+0300)
Subject: net/mlx5: HWS, Check if device is down while polling for completion
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=e3ec1570895bcf81f443e8ac60059edc61dbfca3;p=thirdparty%2Flinux.git

net/mlx5: HWS, Check if device is down while polling for completion

In case the device is down for any reason (e.g. FLR), the HW will
no longer generate completions - no point polling and waiting for
timeout.

Signed-off-by: Yevgeny Kliteynik
Reviewed-by: Erez Shitrit
Reviewed-by: Shay Drori
Signed-off-by: Tariq Toukan
Link: https://patch.msgid.link/20260507173443.320465-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski
---

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c
index 6dcd9c2a78aa..eae02bc74221 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c
@@ -422,6 +422,18 @@ int mlx5hws_bwc_queue_poll(struct mlx5hws_context *ctx,
 	if (!got_comp && !drain)
 		return 0;
 
+	if (unlikely(ctx->mdev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR)) {
+		/* If the device is down for any reason (e.g. FLR), the HW will
+		 * no longer generate completions.
+		 * Note that ETIMEDOUT is returned here because the BWC layer
+		 * already has a special handling for timeouts - it breaks the
+		 * rehash / resize / shrink loops to avoid chain of timeouts.
+		 */
+		mlx5_core_warn_once(ctx->mdev,
+				    "BWC poll: device is down, polling for completion aborted\n");
+		return -ETIMEDOUT;
+	}
+
 	queue_full = mlx5hws_send_engine_full(&ctx->send_queue[queue_id]);
 	while (queue_full || ((got_comp || drain) && *pending_rules)) {
 		ret = mlx5hws_send_queue_poll(ctx, queue_id, comp, burst_th);
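The early-exit pattern in this patch can be illustrated with a user-space sketch. The patch checks the device state before entering the completion-polling loop. It then returns -ETIMEDOUT so that the caller's existing timeout handling stops a multi-rule operation after the first failure.

Everything in the sketch is hypothetical and chosen only for illustration; none of it is the mlx5 API:

- `struct fake_dev` and the `DEV_STATE_*` values stand in for the driver's device state.
- `fake_hw_poll()` stands in for the hardware queue poll.
- `queue_poll()` plays the role of `mlx5hws_bwc_queue_poll()`.
- `move_rules()` loosely models a rehash loop.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical device states; the driver itself compares against
 * MLX5_DEVICE_STATE_INTERNAL_ERROR. */
enum dev_state { DEV_STATE_UP, DEV_STATE_INTERNAL_ERROR };

struct fake_dev {
	enum dev_state state;
	int pending;	/* completions still expected */
	int polls;	/* number of times the HW queue was polled */
};

/* Poll the (simulated) HW once. A device in the error state never
 * produces a completion, so this returns 0 forever in that case. */
static int fake_hw_poll(struct fake_dev *dev)
{
	dev->polls++;
	if (dev->state == DEV_STATE_INTERNAL_ERROR)
		return 0;
	if (dev->pending > 0) {
		dev->pending--;
		return 1;
	}
	return 0;
}

/* Check the device state first. If it is down, give up immediately
 * with -ETIMEDOUT instead of spinning until the retry budget runs out. */
static int queue_poll(struct fake_dev *dev, int max_retries)
{
	int retries = 0;

	assert(max_retries > 0);

	if (dev->state == DEV_STATE_INTERNAL_ERROR)
		return -ETIMEDOUT;

	while (dev->pending > 0) {
		if (fake_hw_poll(dev) == 0 && ++retries >= max_retries)
			return -ETIMEDOUT;
	}
	return 0;
}

/* Caller loop: each rule needs one completion. The first -ETIMEDOUT
 * breaks the loop, so the remaining rules do not each time out. */
static int move_rules(struct fake_dev *dev, int nrules, int max_retries)
{
	int moved = 0;

	for (int i = 0; i < nrules; i++) {
		dev->pending = 1;
		if (queue_poll(dev, max_retries) == -ETIMEDOUT)
			break;
		moved++;
	}
	return moved;
}
```

Consider `move_rules()` called with 4 rules and a budget of 1000 retries:

- On a device in `DEV_STATE_UP`, it moves all 4 rules with 4 polls.
- On a device in `DEV_STATE_INTERNAL_ERROR`, it returns 0 without polling at all.

Without the state check in `queue_poll()`, the down device would burn the full 1000-poll budget on the first rule before the caller broke out.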