Merge branch 'virtio-net-fix-the-deadlock-when-disabling-rx-napi'
author Jakub Kicinski <kuba@kernel.org>
Sat, 10 Jan 2026 19:13:03 +0000 (11:13 -0800)
committer Jakub Kicinski <kuba@kernel.org>
Sat, 10 Jan 2026 19:13:03 +0000 (11:13 -0800)
commit cac2c363c41c12ec9ea1caefdf90f99607a531aa
tree 81b1021c873444654896a7b850fe2e3cb5150f61
parent 7470a7a63dc162f07c26dbf960e41ee1e248d80e
parent a0c159647e6627496a85e57ca81f8cd6c685564b

Bui Quang Minh says:

====================
virtio-net: fix the deadlock when disabling rx NAPI

Calling napi_disable() on an already disabled NAPI can cause a
deadlock. Commit 4bc12818b363 ("virtio-net: disable delayed refill
when pausing rx") avoids this by disabling and cancelling the delayed
refill work when pausing RX in virtnet_rx_pause[_all](). However,
virtnet_rx_resume_all() enables the delayed refill work too early,
before all the receive queue NAPIs have been enabled.

The deadlock can be reproduced by running
selftests/drivers/net/hw/xsk_reconfig.py with a multiqueue virtio-net
device and inserting a cond_resched() inside the for loop in
virtnet_rx_resume_all() to increase the success rate. Because the worker
processing the delayed refill work runs on the same CPU as
virtnet_rx_resume_all(), a reschedule is needed to cause the deadlock.
In a real scenario, contention on netdev_lock can cause the
reschedule.
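
The ordering problem described above can be sketched with a small
userspace model. This is not driver code: every model_* name is
hypothetical, and the worker shape (disable, refill and re-enable each
queue in turn) is an assumption. model_napi_disable() returns false at
the point where the real napi_disable() would wait forever.

```c
/* Userspace model only; none of the model_* names are kernel APIs. */
#include <assert.h>
#include <stdbool.h>

struct model_napi {
	bool enabled;
};

static void model_napi_enable(struct model_napi *n)
{
	n->enabled = true;
}

/*
 * Returns false where the real napi_disable() would wait forever,
 * i.e. when the NAPI is already disabled.
 */
static bool model_napi_disable(struct model_napi *n)
{
	if (!n->enabled)
		return false;
	n->enabled = false;
	return true;
}

/* Assumed worker shape: disable, refill, re-enable each queue in turn. */
static bool model_refill_work(struct model_napi *napis, int nq)
{
	for (int i = 0; i < nq; i++) {
		if (!model_napi_disable(&napis[i]))
			return false;	/* the real worker would hang here */
		/* the refill itself is omitted from the model */
		model_napi_enable(&napis[i]);
	}
	return true;
}

/*
 * Resume with the refill work already allowed to run. preempt_after is
 * the queue index after which the worker gets the CPU (-1: never); it
 * stands in for the cond_resched() used in the reproduction.
 */
static bool model_resume_all(struct model_napi *napis, int nq,
			     int preempt_after)
{
	assert(nq > 0);
	for (int i = 0; i < nq; i++) {
		model_napi_enable(&napis[i]);
		if (i == preempt_after && !model_refill_work(napis, nq))
			return false;
	}
	return true;
}
```

With two queues, letting the worker run after only queue 0 has been
enabled makes model_resume_all() return false, because queue 1 is still
disabled. Letting it run after the last queue, or never, returns true.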

Due to the complexity of the delayed refill worker, this series removes
it. When we fail to refill the receive buffer, we retry in the next
NAPI poll instead.
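
A minimal userspace sketch of retrying in the next poll, under stated
assumptions: the model_* names are hypothetical, and reporting the full
budget to request a re-poll is one plausible mechanism, not necessarily
what the patches do. It relies only on the NAPI convention that a poll
function which returns its full budget is polled again.

```c
/* Userspace model only; none of the model_* names are kernel APIs. */
#include <assert.h>
#include <stdbool.h>

struct model_rq {
	int free_slots;		/* ring slots still lacking a buffer */
	int allocs_available;	/* buffers the allocator can hand out */
};

/* Fill as many slots as possible; true if the ring ends up full. */
static bool model_try_fill(struct model_rq *rq)
{
	while (rq->free_slots > 0 && rq->allocs_available > 0) {
		rq->free_slots--;
		rq->allocs_available--;
	}
	return rq->free_slots == 0;
}

/*
 * One poll: account for received packets, then refill. If the refill
 * is incomplete, report the full budget so that the caller polls
 * again, instead of handing the retry to a delayed worker.
 */
static int model_poll(struct model_rq *rq, int received, int budget)
{
	assert(received >= 0 && received <= budget);
	rq->free_slots += received;	/* each packet frees one slot */
	if (!model_try_fill(rq))
		return budget;		/* ask to be polled again */
	return received;
}
```

In this model the retry happens in the same context as the poll itself,
so no separate worker ever needs to disable a NAPI.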

- Patch 1: removes the scheduling of the delayed refill worker and
  retries the refill in the next NAPI poll
- Patches 2, 3: remove and clean up the unused delayed refill worker code

For testing, I've run the following tests with no issues so far:
- selftests/drivers/net/hw/xsk_reconfig.py, which sets up XDP zerocopy
  without providing any descriptors to the fill ring. As a result,
  try_fill_recv will always fail.
- Send TCP packets from host to guest while the guest is nearly OOM and
  some try_fill_recv calls fail.

v2: https://lore.kernel.org/20260102152023.10773-1-minhquangbui99@gmail.com
v1: https://lore.kernel.org/20251223152533.24364-1-minhquangbui99@gmail.com

Link to the previous approach and discussion:
https://lore.kernel.org/20251212152741.11656-1-minhquangbui99@gmail.com
====================

Link: https://patch.msgid.link/20260106150438.7425-1-minhquangbui99@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>