From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date: Tue, 17 Feb 2026 11:27:25 +0000 (+0100)
Subject: 6.12-stable patches
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=ae21e5be4409d5606b722684a9780bad7b896ab1;p=thirdparty%2Fkernel%2Fstable-queue.git

6.12-stable patches

added patches:
	revert-wireguard-device-enable-threaded-napi.patch
---

diff --git a/queue-6.12/revert-wireguard-device-enable-threaded-napi.patch b/queue-6.12/revert-wireguard-device-enable-threaded-napi.patch
new file mode 100644
index 0000000000..3d4dc14fb6
--- /dev/null
+++ b/queue-6.12/revert-wireguard-device-enable-threaded-napi.patch
@@ -0,0 +1,122 @@
+From daniel@iogearbox.net Tue Feb 17 11:33:58 2026
+From: Daniel Borkmann <daniel@iogearbox.net>
+Date: Mon, 16 Feb 2026 22:31:13 +0100
+Subject: Revert "wireguard: device: enable threaded NAPI"
+To: gregkh@linuxfoundation.org
+Cc: stable@vger.kernel.org, netdev@vger.kernel.org, Jason@zx2c4.com, kuba@kernel.org
+Message-ID:
+
+From: Daniel Borkmann <daniel@iogearbox.net>
+
+This reverts commit 933466fc50a8e4eb167acbd0d8ec96a078462e9c which is
+commit db9ae3b6b43c79b1ba87eea849fd65efa05b4b2e upstream.
+
+We have had three independent production user reports in combination
+with Cilium utilizing WireGuard as encryption underneath that k8s Pod
+E/W traffic to certain peer nodes fully stalled. The situation appears
+as follows:
+
+ - Occurs very rarely but at random times under heavy networking load.
+ - Once the issue triggers the decryption side stops working completely
+   for that WireGuard peer, other peers keep working fine. The stall
+   happens also for newly initiated connections towards that particular
+   WireGuard peer.
+ - Only the decryption side is affected, never the encryption side.
+ - Once it triggers, it never recovers and remains in this state,
+   the CPU/mem on that node looks normal, no leak, busy loop or crash.
+ - bpftrace on the affected system shows that wg_prev_queue_enqueue
+   fails, thus the MAX_QUEUED_PACKETS (1024 skbs!) for the peer's
+   rx_queue is reached.
+ - Also, bpftrace shows that wg_packet_rx_poll for that peer is never
+   called again after reaching this state for that peer. For other
+   peers wg_packet_rx_poll does get called normally.
+ - Commit db9ae3b ("wireguard: device: enable threaded NAPI")
+   switched WireGuard to threaded NAPI by default. The default has
+   not been changed for triggering the issue, neither did CPU
+   hotplugging occur (i.e. 5bd8de2 ("wireguard: queueing: always
+   return valid online CPU in wg_cpumask_choose_online()")).
+ - The issue has been observed with stable kernels of v5.15 as well as
+   v6.1. It was reported to us that v5.10 stable is working fine, and
+   no report on v6.6 stable either (somewhat related discussion in [0]
+   though).
+ - In the WireGuard driver the only material difference between v5.10
+   stable and v5.15 stable is the switch to threaded NAPI by default.
+
+  [0] https://lore.kernel.org/netdev/CA+wXwBTT74RErDGAnj98PqS=wvdh8eM1pi4q6tTdExtjnokKqA@mail.gmail.com/
+
+Breakdown of the problem:
+
+ 1) skbs arriving for decryption are enqueued to the peer->rx_queue in
+    wg_packet_consume_data via wg_queue_enqueue_per_device_and_peer.
+ 2) The latter only moves the skb into the MPSC peer queue if it does
+    not surpass MAX_QUEUED_PACKETS (1024) which is kept track in an
+    atomic counter via wg_prev_queue_enqueue.
+ 3) In case enqueueing was successful, the skb is also queued up
+    in the device queue, round-robin picks a next online CPU, and
+    schedules the decryption worker.
+ 4) The wg_packet_decrypt_worker, once scheduled, picks these up
+    from the queue, decrypts the packets and once done calls into
+    wg_queue_enqueue_per_peer_rx.
+ 5) The latter updates the state to PACKET_STATE_CRYPTED on success
+    and calls napi_schedule on the per peer->napi instance.
+ 6) NAPI then polls via wg_packet_rx_poll. wg_prev_queue_peek checks
+    on the peer->rx_queue. It will wg_prev_queue_dequeue if the
+    queue->peeked skb was not cached yet, or just return the latter
+    otherwise. (wg_prev_queue_drop_peeked later clears the cache.)
+ 7) From an ordering perspective, the peer->rx_queue has skbs in order
+    while the device queue with the per-CPU worker threads from a
+    global ordering PoV can finish the decryption and signal the skb
+    PACKET_STATE_CRYPTED out of order.
+ 8) A situation can be observed that the first packet coming in will
+    be stuck waiting for the decryption worker to be scheduled for
+    a longer time when the system is under pressure.
+ 9) While this is the case, the other CPUs in the meantime finish
+    decryption and call into napi_schedule.
+ 10) Now in wg_packet_rx_poll it picks up the first in-order skb
+     from the peer->rx_queue and sees that its state is still
+     PACKET_STATE_UNCRYPTED. The NAPI poll routine then exits early
+     with work_done = 0 and calls napi_complete_done, signalling
+     it "finished" processing.
+ 11) The assumption in wg_packet_decrypt_worker is that when the
+     decryption finished the subsequent napi_schedule will always
+     lead to a later invocation of wg_packet_rx_poll to pick up
+     the finished packet.
+ 12) However, it appears that a later napi_schedule does /not/
+     schedule a later poll and thus no wg_packet_rx_poll.
+ 13) If this situation happens exactly for the corner case where
+     the decryption worker of the first packet is stuck and waiting
+     to be scheduled, and the network load for WireGuard is very
+     high then the queue can build up to MAX_QUEUED_PACKETS.
+ 14) If this situation occurs, then no new decryption worker will
+     be scheduled and also no new napi_schedule to make forward
+     progress.
+ 15) This means the peer->rx_queue stops processing packets completely
+     and they are indefinitely stuck waiting for a new NAPI poll on
+     that peer which never happens. New packets for that peer are
+     then dropped due to full queue, as it has been observed on the
+     production machines.
+
+Technically, the backport of commit db9ae3b6b43c ("wireguard: device:
+enable threaded NAPI") to stable should not have happened since it is
+more of an optimization rather than a pure fix and addresses a NAPI
+situation with utilizing many WireGuard tunnel devices in parallel.
+Revert it from stable given the backport triggers a regression for
+mentioned kernels.
+
+Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
+Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ drivers/net/wireguard/device.c | 1 -
+ 1 file changed, 1 deletion(-)
+
+--- a/drivers/net/wireguard/device.c
++++ b/drivers/net/wireguard/device.c
+@@ -364,7 +364,6 @@ static int wg_newlink(struct net *src_ne
+ 	if (ret < 0)
+ 		goto err_free_handshake_queue;
+ 
+-	dev_set_threaded(dev, true);
+ 	ret = register_netdevice(dev);
+ 	if (ret < 0)
+ 		goto err_uninit_ratelimiter;
diff --git a/queue-6.12/series b/queue-6.12/series
index ca743501a0..92845ad7e7 100644
--- a/queue-6.12/series
+++ b/queue-6.12/series
@@ -28,3 +28,4 @@ mm-hugetlb-fix-hugetlb_pmd_shared.patch
 mm-hugetlb-fix-two-comments-related-to-huge_pmd_unshare.patch
 mm-hugetlb-fix-excessive-ipi-broadcasts-when-unsharing-pmd-tables-using-mmu_gather.patch
 loongarch-rework-kasan-initialization-for-ptw-enabled-systems.patch
+revert-wireguard-device-enable-threaded-napi.patch
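As a rough standalone illustration of the stall described in the breakdown
above: the minimal user-space sketch below is not the driver code; the fixed
8-slot queue, the state flags and the function names are simplified stand-ins
for the peer->rx_queue, the PACKET_STATE_* markers and MAX_QUEUED_PACKETS
(1024). It only models steps 8)-10) plus the full-queue end state of
13)-15), i.e. an in-order head whose decryption never finished forces the
poll to report zero progress while the queue fills up and new arrivals are
dropped; the missed re-poll after a later napi_schedule (steps 11)-12)) is
not modeled here.

/*
 * stall_sketch.c - simplified, single-threaded model of the rx_queue
 * ordering hazard. Build with: gcc -Wall -o stall_sketch stall_sketch.c
 */
#include <stdio.h>

#define QUEUE_LIMIT 8	/* stand-in for MAX_QUEUED_PACKETS (1024) */

enum pkt_state { PKT_UNCRYPTED, PKT_CRYPTED };

static enum pkt_state rx_queue[QUEUE_LIMIT];
static int head, tail;

/* Bounded enqueue: fails once the per-peer limit is reached. */
static int enqueue(void)
{
	if (tail - head >= QUEUE_LIMIT)
		return -1;
	rx_queue[tail++ % QUEUE_LIMIT] = PKT_UNCRYPTED;
	return 0;
}

/* Poll: only in-order packets whose decryption finished make progress. */
static int poll_rx(void)
{
	int done = 0;

	while (head < tail && rx_queue[head % QUEUE_LIMIT] == PKT_CRYPTED) {
		head++;
		done++;
	}
	return done;	/* 0 mirrors the early exit with work_done = 0 */
}

int main(void)
{
	int i;

	/* Fill the queue; the worker for packet 0 never got to run. */
	for (i = 0; i < QUEUE_LIMIT; i++)
		enqueue();

	/* Workers on other CPUs finish packets 1..7 out of order. */
	for (i = 1; i < QUEUE_LIMIT; i++)
		rx_queue[i] = PKT_CRYPTED;

	/* The poll sees the still-unfinished head and reports no progress. */
	printf("poll processed %d packets\n", poll_rx());

	/* The queue is full and nothing drains it, so new packets drop. */
	printf("next enqueue: %s\n",
	       enqueue() < 0 ? "dropped (queue full)" : "accepted");
	return 0;
}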