--- /dev/null
+From daniel@iogearbox.net Tue Feb 17 11:33:58 2026
+From: Daniel Borkmann <daniel@iogearbox.net>
+Date: Mon, 16 Feb 2026 22:31:13 +0100
+Subject: Revert "wireguard: device: enable threaded NAPI"
+To: gregkh@linuxfoundation.org
+Cc: stable@vger.kernel.org, netdev@vger.kernel.org, Jason@zx2c4.com, kuba@kernel.org
+Message-ID: <c05f3968fa63b630ce22d65aa03e6dcef4bf4e83.1771277247.git.daniel@iogearbox.net>
+
+From: Daniel Borkmann <daniel@iogearbox.net>
+
+This reverts commit 933466fc50a8e4eb167acbd0d8ec96a078462e9c which is
+commit db9ae3b6b43c79b1ba87eea849fd65efa05b4b2e upstream.
+
+We have had three independent production reports from users running
+Cilium with WireGuard as the underlying encryption, where k8s Pod E/W
+traffic to certain peer nodes fully stalled. The situation is as
+follows:
+
+ - Occurs very rarely but at random times under heavy networking load.
+ - Once the issue triggers, the decryption side stops working completely
+   for the affected WireGuard peer while other peers keep working fine.
+   The stall also affects newly initiated connections towards that
+   particular WireGuard peer.
+ - Only the decryption side is affected, never the encryption side.
+ - Once it triggers, it never recovers and remains in this state;
+   CPU/mem on that node look normal, with no leak, busy loop or crash.
+ - bpftrace on the affected system shows that wg_prev_queue_enqueue
+   fails, meaning the MAX_QUEUED_PACKETS limit (1024 skbs!) of the
+   peer's rx_queue has been reached.
+ - Also, bpftrace shows that once this state is reached,
+   wg_packet_rx_poll is never called again for that peer. For other
+   peers wg_packet_rx_poll does get called normally.
+ - Commit db9ae3b ("wireguard: device: enable threaded NAPI")
+   switched WireGuard to threaded NAPI by default. The affected
+   systems run with this default unchanged, and no CPU hotplugging
+   occurred either (i.e. 5bd8de2 ("wireguard: queueing: always
+   return valid online CPU in wg_cpumask_choose_online()") does not
+   apply here).
+ - The issue has been observed with v5.15 as well as v6.1 stable
+   kernels. It was reported to us that v5.10 stable works fine, and
+   there is no report for v6.6 stable either (though see the somewhat
+   related discussion in [0]).
+ - In the WireGuard driver the only material difference between v5.10
+ stable and v5.15 stable is the switch to threaded NAPI by default.
+
+ [0] https://lore.kernel.org/netdev/CA+wXwBTT74RErDGAnj98PqS=wvdh8eM1pi4q6tTdExtjnokKqA@mail.gmail.com/
+
+Breakdown of the problem (condensed sketches of the relevant code
+paths follow after the list):
+
+ 1) skbs arriving for decryption are enqueued to the peer->rx_queue in
+ wg_packet_consume_data via wg_queue_enqueue_per_device_and_peer.
+ 2) The latter only moves the skb into the MPSC peer queue if it does
+    not surpass MAX_QUEUED_PACKETS (1024), which is tracked in an
+    atomic counter by wg_prev_queue_enqueue.
+ 3) If enqueueing was successful, the skb is also queued up in the
+    device queue, round-robin picks the next online CPU, and the
+    decryption worker is scheduled on it.
+ 4) wg_packet_decrypt_worker, once scheduled, picks these skbs up
+    from the device queue, decrypts the packets and, once done, calls
+    into wg_queue_enqueue_per_peer_rx.
+ 5) The latter updates the state to PACKET_STATE_CRYPTED on success
+    and calls napi_schedule on the per-peer napi instance (peer->napi).
+ 6) NAPI then polls via wg_packet_rx_poll. wg_prev_queue_peek checks
+    the peer->rx_queue: it calls wg_prev_queue_dequeue if no skb has
+    been cached in queue->peeked yet, or just returns the cached one
+    otherwise. (wg_prev_queue_drop_peeked later clears the cache.)
+ 7) From an ordering perspective, the peer->rx_queue holds skbs in
+    order, while the per-CPU worker threads consuming the device
+    queue can, from a global ordering PoV, finish the decryption and
+    signal skbs PACKET_STATE_CRYPTED out of order.
+ 8) A situation can be observed where the first packet that came in
+    is stuck for a longer time waiting for its decryption worker to
+    be scheduled while the system is under pressure.
+ 9) While this is the case, the other CPUs in the meantime finish
+ decryption and call into napi_schedule.
+ 10) Now wg_packet_rx_poll picks up the first in-order skb from the
+     peer->rx_queue and sees that its state is still
+     PACKET_STATE_UNCRYPTED. The NAPI poll routine then exits early
+     with work_done = 0 and calls napi_complete_done, signalling that
+     it "finished" processing.
+ 11) The assumption in wg_packet_decrypt_worker is that when the
+     decryption has finished, the subsequent napi_schedule will
+     always lead to a later invocation of wg_packet_rx_poll which
+     picks up the finished packet.
+ 12) However, it appears that in this situation a later napi_schedule
+     does /not/ schedule a later poll, and thus wg_packet_rx_poll is
+     never invoked again.
+ 13) If this happens exactly in the corner case where the decryption
+     worker of the first packet is stuck waiting to be scheduled,
+     and the network load for WireGuard is very high, then the queue
+     can build up to MAX_QUEUED_PACKETS.
+ 14) Once that happens, no new decryption worker gets scheduled and
+     no new napi_schedule is issued to make forward progress.
+ 15) This means the peer->rx_queue stops processing packets completely;
+     they are indefinitely stuck waiting for a new NAPI poll on that
+     peer, which never happens. New packets for that peer are then
+     dropped due to the full queue, as observed on the production
+     machines.
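+
+To make this more concrete, below is a condensed sketch of the ingress
+enqueue path from steps 1)-3), roughly following
+drivers/net/wireguard/queueing.{h,c} in the affected trees; arguments,
+comments and some details are trimmed, so this is not a verbatim quote:
+
+  static inline int wg_queue_enqueue_per_device_and_peer(...)
+  {
+          /* Steps 1/2: mark the skb as not yet decrypted and do the
+           * bounded MPSC enqueue into the per-peer rx_queue.
+           */
+          atomic_set_release(&PACKET_CB(skb)->state,
+                             PACKET_STATE_UNCRYPTED);
+          if (unlikely(!wg_prev_queue_enqueue(peer_queue, skb)))
+                  return -ENOSPC;
+
+          /* Step 3: queue the skb on the device queue and kick the
+           * decryption worker on a round-robin picked online CPU.
+           */
+          if (unlikely(ptr_ring_produce_bh(&device_queue->ring, skb)))
+                  return -EPIPE;
+          queue_work_on(cpu, wq,
+                        &per_cpu_ptr(device_queue->worker, cpu)->work);
+          return 0;
+  }
+
+  bool wg_prev_queue_enqueue(struct prev_queue *queue, struct sk_buff *skb)
+  {
+          /* Refuse the skb once MAX_QUEUED_PACKETS (1024) is reached. */
+          if (!atomic_add_unless(&queue->count, 1, MAX_QUEUED_PACKETS))
+                  return false;
+          __wg_prev_queue_enqueue(queue, skb);
+          return true;
+  }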
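+
+The hand-off from the per-CPU decryption workers back to NAPI in steps
+4)-5) looks roughly as follows, condensed from
+drivers/net/wireguard/receive.c with the same caveat as above:
+
+  void wg_packet_decrypt_worker(struct work_struct *work)
+  {
+          struct crypt_queue *queue =
+                  container_of(work, struct multicore_worker, work)->ptr;
+          struct sk_buff *skb;
+
+          while ((skb = ptr_ring_consume_bh(&queue->ring)) != NULL) {
+                  bool ok = decrypt_packet(skb, PACKET_CB(skb)->keypair);
+
+                  wg_queue_enqueue_per_peer_rx(skb, ok ?
+                          PACKET_STATE_CRYPTED : PACKET_STATE_DEAD);
+          }
+  }
+
+  static void wg_queue_enqueue_per_peer_rx(struct sk_buff *skb,
+                                           enum packet_state state)
+  {
+          struct wg_peer *peer =
+                  wg_peer_get(PACKET_CB(skb)->keypair->entry.peer);
+
+          /* Publish the new state, then ask NAPI to poll this peer. */
+          atomic_set_release(&PACKET_CB(skb)->state, state);
+          napi_schedule(&peer->napi);
+          wg_peer_put(peer);
+  }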
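+
+Finally, the NAPI poll routine from steps 6) and 10), again condensed
+from drivers/net/wireguard/receive.c, shows why a still-UNCRYPTED head
+skb makes the poll complete with work_done = 0:
+
+  int wg_packet_rx_poll(struct napi_struct *napi, int budget)
+  {
+          struct wg_peer *peer = container_of(napi, struct wg_peer, napi);
+          struct sk_buff *skb;
+          int work_done = 0;
+
+          /* Consume strictly in rx_queue order and stop as soon as the
+           * head skb has not been marked as decrypted yet (step 10).
+           */
+          while ((skb = wg_prev_queue_peek(&peer->rx_queue)) != NULL &&
+                 atomic_read_acquire(&PACKET_CB(skb)->state) !=
+                         PACKET_STATE_UNCRYPTED) {
+                  wg_prev_queue_drop_peeked(&peer->rx_queue);
+                  /* ... counter validation, endpoint update, GRO ... */
+                  if (++work_done >= budget)
+                          break;
+          }
+
+          /* With an UNCRYPTED head skb, work_done stays 0 and NAPI is
+           * marked complete; forward progress then depends on a later
+           * napi_schedule() actually leading to another poll, which is
+           * exactly what does not happen in steps 11)-12).
+           */
+          if (work_done < budget)
+                  napi_complete_done(napi, work_done);
+
+          return work_done;
+  }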
+
+Technically, the backport of commit db9ae3b6b43c ("wireguard: device:
+enable threaded NAPI") to stable should not have happened since it is
+more of an optimization than a pure fix, addressing a NAPI situation
+when utilizing many WireGuard tunnel devices in parallel. Revert it
+from stable given the backport triggers a regression for the mentioned
+kernels.
+
+Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
+Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ drivers/net/wireguard/device.c | 1 -
+ 1 file changed, 1 deletion(-)
+
+--- a/drivers/net/wireguard/device.c
++++ b/drivers/net/wireguard/device.c
+@@ -364,7 +364,6 @@ static int wg_newlink(struct net *src_ne
+ if (ret < 0)
+ goto err_free_handshake_queue;
+
+- dev_set_threaded(dev, true);
+ ret = register_netdevice(dev);
+ if (ret < 0)
+ goto err_uninit_ratelimiter;