tcp: give up on stronger sk_rcvbuf checks (for now)
author     Jakub Kicinski <kuba@kernel.org>
           Fri, 27 Feb 2026 00:33:59 +0000 (16:33 -0800)
committer  Jakub Kicinski <kuba@kernel.org>
           Sat, 28 Feb 2026 15:55:39 +0000 (07:55 -0800)
commit     026dfef287c07f37d4d4eef7a0b5a4bfdb29b32d
tree       ece4b8cfde45267f87cb8a45b57c5f19e0771bfa
parent     6996a2d2d0a64808c19c98002aeb5d9d1b2df6a4

We hit another corner case which leads to TcpExtTCPRcvQDrop events.

Connections which send RPCs in the 20-80kB range over loopback
experience spurious drops. The exact conditions for most of
the drops I investigated are that:
 - socket exchanged >1MB of data so it's not completely fresh
 - rcvbuf is around 128kB (default, hasn't grown)
 - there is ~60kB of data in rcvq
 - skb > 64kB arrives

The sum of skb->len (!) of both skbs (the one already
in rcvq and the arriving one) is larger than the rwnd.
My suspicion is that this happens because __tcp_select_window()
rounds the rwnd up to (1 << wscale) if less than half of
the rwnd has been consumed.

Eric suggests that, given the number of Fixes we already have
pointing to 1d2fbaad7cd8, it's probably time to give up on it
until a bigger revamp of rmem management.

Also, while we could risk tweaking the rwnd math, on the workloads
I investigated there are other drops after the commit in question
that are not explained by this phenomenon.

Suggested-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/20260225122355.585fd57b@kernel.org
Fixes: 1d2fbaad7cd8 ("tcp: stronger sk_rcvbuf checks")
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260227003359.2391017-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net/ipv4/tcp_input.c