net/mlx5e: Do not update BQL of old txqs during channel reconfiguration
author Tariq Toukan <tariqt@nvidia.com>
Tue, 9 Dec 2025 12:56:16 +0000 (14:56 +0200)
committer Paolo Abeni <pabeni@redhat.com>
Thu, 18 Dec 2025 12:39:29 +0000 (13:39 +0100)
During channel reconfiguration (e.g., ethtool private flags changes),
the driver can trigger a kernel BUG_ON in dql_completed() with the error
"kernel BUG at lib/dynamic_queue_limits.c:99".

The issue occurs in the following sequence:

During mlx5e_safe_switch_params(), old channels are deactivated via
mlx5e_deactivate_txqsq(). New channels are created and activated, taking
ownership of the netdev_queues and their BQL state.

When old channels are closed via mlx5e_close_txqsq(), there may be
pending TX descriptors (sq->cc != sq->pc) that were in-flight during the
deactivation.

mlx5e_free_txqsq_descs() frees these pending descriptors and attempts to
complete them via netdev_tx_completed_queue().

However, the BQL state (dql->num_queued and dql->num_completed) has
been reset in mlx5e_activate_txqsq() and belongs to the new queue owner,
leading to dql->num_queued - dql->num_completed < nbytes.

This triggers BUG_ON(count > num_queued - num_completed) in
dql_completed().
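
The sequence above can be sketched with a toy model. This is a simplified
illustration, not the driver code: the toy_* types and helpers are
hypothetical stand-ins that keep only the two DQL counters and the
txq2sq ownership lookup, and they assume that activation resets the
counters and installs the new SQ as owner, as described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the two DQL counters named in the commit message. */
struct toy_dql {
	unsigned int num_queued;
	unsigned int num_completed;
};

/* Hypothetical, simplified SQ: only what the ownership check needs. */
struct toy_sq {
	struct toy_dql *dql;	/* BQL state of the shared netdev queue */
	int txq_ix;		/* index into the txq2sq table */
};

/* Models the condition behind BUG_ON(count > num_queued - num_completed). */
static bool toy_completion_would_bug(const struct toy_dql *dql,
				     unsigned int count)
{
	return count > dql->num_queued - dql->num_completed;
}

/* Models activation: reset the counters and take ownership of the queue. */
static void toy_activate(struct toy_sq *sq, struct toy_sq **txq2sq)
{
	assert(sq && txq2sq);
	sq->dql->num_queued = 0;
	sq->dql->num_completed = 0;
	txq2sq[sq->txq_ix] = sq;
}

/* Models the guarded completion: only the current owner updates BQL.
 * Returns true if the counters were updated.
 */
static bool toy_free_descs(struct toy_sq *sq, struct toy_sq **txq2sq,
			   unsigned int nbytes)
{
	assert(sq && txq2sq);
	if (sq != txq2sq[sq->txq_ix])
		return false;	/* replaced SQ: leave the new owner's BQL alone */
	assert(!toy_completion_would_bug(sq->dql, nbytes));
	sq->dql->num_completed += nbytes;
	return true;
}
```

In this model, an old SQ that still holds in-flight bytes when the new SQ
is activated would satisfy toy_completion_would_bug() if it tried to
complete them, while toy_free_descs() skips the update because the old SQ
is no longer the entry in txq2sq.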

Fixes: 3b88a535a8e1 ("net/mlx5e: Defer channels closure to reduce interface down time")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: William Tu <witu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Link: https://patch.msgid.link/1765284977-1363052-9-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
drivers/net/ethernet/mellanox/mlx5/core/en_tx.c

index 14884b9ea7f396069c17b778d2a65cb05608ff0e..a01ee656a1e7f3db814f9caff7c4d43ba83ca5dd 100644
@@ -939,7 +939,11 @@ void mlx5e_free_txqsq_descs(struct mlx5e_txqsq *sq)
        sq->dma_fifo_cc = dma_fifo_cc;
        sq->cc = sqcc;
 
-       netdev_tx_completed_queue(sq->txq, npkts, nbytes);
+       /* Do not update BQL for TXQs that got replaced by new active ones, as
+        * netdev_tx_reset_queue() is called for them in mlx5e_activate_txqsq().
+        */
+       if (sq == sq->priv->txq2sq[sq->txq_ix])
+               netdev_tx_completed_queue(sq->txq, npkts, nbytes);
 }
 
 #ifdef CONFIG_MLX5_CORE_IPOIB