From: Greg Kroah-Hartman
Date: Tue, 16 Jul 2024 07:00:57 +0000 (+0200)
Subject: 4.19-stable patches
X-Git-Tag: v4.19.318~32
X-Git-Url: http://git.ipfire.org/gitweb.cgi?a=commitdiff_plain;h=2f6a88979d8d9b448550c53bb314504597fba29f;p=thirdparty%2Fkernel%2Fstable-queue.git

4.19-stable patches

added patches:
	net-tcp-fix-unexcepted-socket-die-when-snd_wnd-is-0.patch
	tcp-avoid-too-many-retransmit-packets.patch
	tcp-refactor-tcp_retransmit_timer.patch
	tcp-use-signed-arithmetic-in-tcp_rtx_probe0_timed_out.patch
---

diff --git a/queue-4.19/net-tcp-fix-unexcepted-socket-die-when-snd_wnd-is-0.patch b/queue-4.19/net-tcp-fix-unexcepted-socket-die-when-snd_wnd-is-0.patch
new file mode 100644
index 00000000000..db8f898b7c8
--- /dev/null
+++ b/queue-4.19/net-tcp-fix-unexcepted-socket-die-when-snd_wnd-is-0.patch
@@ -0,0 +1,81 @@
+From 3vtKVZggKBvUbarjXwbqdlldib.ZljdobdheifkruclrkaXqflk.lod@flex--edumazet.bounces.google.com Tue Jul 16 03:54:07 2024
+From: Eric Dumazet
+Date: Tue, 16 Jul 2024 01:53:58 +0000
+Subject: net: tcp: fix unexcepted socket die when snd_wnd is 0
+To: Greg Kroah-Hartman
+Cc: "David S . Miller" , Jakub Kicinski , Paolo Abeni , netdev@vger.kernel.org, eric.dumazet@gmail.com, Neal Cardwell , Jason Xing , Jon Maxwell , Kuniyuki Iwashima , Menglong Dong , Eric Dumazet
+Message-ID: <20240716015401.2365503-3-edumazet@google.com>
+
+From: Menglong Dong
+
+commit e89688e3e97868451a5d05b38a9d2633d6785cd4 upstream.
+
+In tcp_retransmit_timer(), a window shrunk connection will be regarded
+as timeout if 'tcp_jiffies32 - tp->rcv_tstamp > TCP_RTO_MAX'. This is not
+right all the time.
+
+The retransmits will become zero-window probes in tcp_retransmit_timer()
+if the 'snd_wnd==0'. Therefore, the icsk->icsk_rto will come up to
+TCP_RTO_MAX sooner or later.
+
+However, the timer can be delayed and be triggered after 122877ms, not
+TCP_RTO_MAX, as I tested.
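The timing argument in the paragraphs above can be checked with a little arithmetic. The sketch below is a hypothetical userspace model, not kernel code. It assumes HZ=1000 (so one jiffy is one millisecond and TCP_RTO_MAX, 120 seconds, is 120000 jiffies), a zero-window probe whose ACK arrives 50 ms after it is sent (an arbitrary illustrative value), and a retransmit timer scheduled TCP_RTO_MAX after the probe that actually runs at 122877 ms, the figure quoted in the commit message. The names `MODEL_TCP_RTO_MAX`, `old_check_kills()` and `new_check_suspects()` are made up for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: HZ=1000, so one jiffy is one millisecond and TCP_RTO_MAX
 * (120 seconds) is 120000 jiffies. Hypothetical constant name. */
#define MODEL_TCP_RTO_MAX 120000u

/* Hypothetical model of the pre-patch test: receive silence measured at the
 * moment the (possibly late) timer handler runs, against TCP_RTO_MAX. */
static bool old_check_kills(uint32_t now, uint32_t rcv_tstamp)
{
	return now - rcv_tstamp > MODEL_TCP_RTO_MAX;
}

/* Hypothetical model of the patch's first test: silence measured at the
 * scheduled expiry (icsk_timeout), against 2 * TCP_RTO_MAX. Passing it only
 * makes the connection a candidate; the patch also checks how long the
 * retransmissions have been going on. */
static bool new_check_suspects(uint32_t icsk_timeout, uint32_t rcv_tstamp)
{
	return icsk_timeout - rcv_tstamp > 2 * MODEL_TCP_RTO_MAX;
}
```

With these numbers the pre-patch test `122877 - 50 > 120000` is true, so the connection is killed only because the timer ran late; had the timer fired on time, `120000 - 50 > 120000` would have been false. The patched first test, `120000 - 50 > 240000`, is false.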
+
+Therefore, 'tcp_jiffies32 - tp->rcv_tstamp > TCP_RTO_MAX' is always true
+once the RTO comes up to TCP_RTO_MAX, and the socket will die.
+
+Fix this by replacing the 'tcp_jiffies32' with '(u32)icsk->icsk_timeout',
+which is exactly the timestamp of the timeout.
+
+However, "tp->rcv_tstamp" can restart from idle, then tp->rcv_tstamp
+could already be a long time (minutes or hours) in the past even on the
+first RTO. So we double check the timeout with the duration of the
+retransmission.
+
+Meanwhile, use "2 * TCP_RTO_MAX" as the timeout to avoid the socket
+dying too soon.
+
+Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
+Link: https://lore.kernel.org/netdev/CADxym3YyMiO+zMD4zj03YPM3FBi-1LHi6gSD2XT8pyAMM096pg@mail.gmail.com/
+Signed-off-by: Menglong Dong
+Reviewed-by: Eric Dumazet
+Signed-off-by: David S. Miller
+Signed-off-by: Greg Kroah-Hartman
+---
+ net/ipv4/tcp_timer.c | 18 +++++++++++++++++-
+ 1 file changed, 17 insertions(+), 1 deletion(-)
+
+--- a/net/ipv4/tcp_timer.c
++++ b/net/ipv4/tcp_timer.c
+@@ -411,6 +411,22 @@ static void tcp_fastopen_synack_timer(st
+ 			  TCP_TIMEOUT_INIT << req->num_timeout, TCP_RTO_MAX);
+ }
+ 
++static bool tcp_rtx_probe0_timed_out(const struct sock *sk,
++				     const struct sk_buff *skb)
++{
++	const struct tcp_sock *tp = tcp_sk(sk);
++	const int timeout = TCP_RTO_MAX * 2;
++	u32 rcv_delta, rtx_delta;
++
++	rcv_delta = inet_csk(sk)->icsk_timeout - tp->rcv_tstamp;
++	if (rcv_delta <= timeout)
++		return false;
++
++	rtx_delta = (u32)msecs_to_jiffies(tcp_time_stamp(tp) -
++			(tp->retrans_stamp ?: tcp_skb_timestamp(skb)));
++
++	return rtx_delta > timeout;
++}
+ 
+ /**
+  * tcp_retransmit_timer() - The TCP retransmit timeout handler
+@@ -471,7 +487,7 @@ void tcp_retransmit_timer(struct sock *s
+ 					    tp->snd_una, tp->snd_nxt);
+ 		}
+ #endif
+-		if (tcp_jiffies32 - tp->rcv_tstamp > TCP_RTO_MAX) {
++		if (tcp_rtx_probe0_timed_out(sk, skb)) {
+ 			tcp_write_err(sk);
+ 			goto out;
+ 		}
diff --git a/queue-4.19/series b/queue-4.19/series
index c0d409f3ef0..42f19315689 100644
--- a/queue-4.19/series
+++ b/queue-4.19/series
@@ -57,3 +57,7 @@ usb-gadget-configfs-prevent-oob-read-write-in-usb_string_copy.patch
 usb-core-fix-duplicate-endpoint-bug-by-clearing-reserved-bits-in-the-descriptor.patch
 hpet-support-32-bit-userspace.patch
 libceph-fix-race-between-delayed_work-and-ceph_monc_stop.patch
+tcp-refactor-tcp_retransmit_timer.patch
+net-tcp-fix-unexcepted-socket-die-when-snd_wnd-is-0.patch
+tcp-use-signed-arithmetic-in-tcp_rtx_probe0_timed_out.patch
+tcp-avoid-too-many-retransmit-packets.patch
diff --git a/queue-4.19/tcp-avoid-too-many-retransmit-packets.patch b/queue-4.19/tcp-avoid-too-many-retransmit-packets.patch
new file mode 100644
index 00000000000..85c9adca888
--- /dev/null
+++ b/queue-4.19/tcp-avoid-too-many-retransmit-packets.patch
@@ -0,0 +1,74 @@
+From 3wdKVZggKBvgjizrf4jylttlqj.htrlwjlpmqnsz2ktzsifynts.twl@flex--edumazet.bounces.google.com Tue Jul 16 03:54:11 2024
+From: Eric Dumazet
+Date: Tue, 16 Jul 2024 01:54:00 +0000
+Subject: tcp: avoid too many retransmit packets
+To: Greg Kroah-Hartman
+Cc: "David S . Miller" , Jakub Kicinski , Paolo Abeni , netdev@vger.kernel.org, eric.dumazet@gmail.com, Neal Cardwell , Jason Xing , Jon Maxwell , Kuniyuki Iwashima , Eric Dumazet
+Message-ID: <20240716015401.2365503-5-edumazet@google.com>
+
+From: Eric Dumazet
+
+commit 97a9063518f198ec0adb2ecb89789de342bb8283 upstream.
+
+If a TCP socket is using TCP_USER_TIMEOUT, and the other peer
+retracted its window to zero, tcp_retransmit_timer() can
+retransmit a packet every two jiffies (2 ms for HZ=1000),
+for about 4 minutes after TCP_USER_TIMEOUT has 'expired'.
+
+The fix is to make sure tcp_rtx_probe0_timed_out() takes
+icsk->icsk_user_timeout into account.
+
+Before the blamed commit, the socket would not time out after
+icsk->icsk_user_timeout, but would use standard exponential
+backoff for the retransmits.
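In the tcp-avoid-too-many-retransmit-packets.patch hunk, `rtx_delta` is compared with `user_timeout` before anything has assigned it: in the function as introduced by net-tcp-fix-unexcepted-socket-die-when-snd_wnd-is-0.patch, `rtx_delta` is only computed after the `rcv_delta` test. That `rtx_delta` is also produced by msecs_to_jiffies(), while `user_timeout` is in milliseconds. The following is a hypothetical userspace model of the intended decision, not the stable tree's code: it computes the retransmission duration first and assumes HZ=1000 so that milliseconds and jiffies coincide. The function name, its parameters and `MODEL_RTO_MAX_MS` are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: HZ=1000, so one jiffy == one millisecond in this model. */
#define MODEL_RTO_MAX_MS 120000u /* TCP_RTO_MAX is 120 seconds */

/* Hypothetical model of the patched tcp_rtx_probe0_timed_out().
 *   now_ms       - stands in for tcp_time_stamp(tp)
 *   first_rtx_ms - tp->retrans_stamp, or the skb timestamp if that is zero
 *   icsk_timeout - when the retransmit timer was scheduled to fire
 *   rcv_tstamp   - when the last packet was received
 *   user_timeout - TCP_USER_TIMEOUT in ms, 0 if unset
 */
static bool model_probe0_timed_out(uint32_t now_ms, uint32_t first_rtx_ms,
				   uint32_t icsk_timeout, uint32_t rcv_tstamp,
				   uint32_t user_timeout)
{
	uint32_t timeout = 2 * MODEL_RTO_MAX_MS;
	/* Computed up front, so every comparison below reads a set value. */
	uint32_t rtx_delta = now_ms - first_rtx_ms;
	int32_t rcv_delta;

	if (user_timeout) {
		/* Zero-window ACKs must not postpone a TCP_USER_TIMEOUT expiry. */
		if (rtx_delta > user_timeout)
			return true;
		if (user_timeout < timeout)
			timeout = user_timeout;
	}

	/* Signed, because rcv_tstamp may be later than icsk_timeout. */
	rcv_delta = (int32_t)(icsk_timeout - rcv_tstamp);
	if (rcv_delta <= (int32_t)timeout)
		return false;

	return rtx_delta > timeout;
}
```

For example, with TCP_USER_TIMEOUT set to 10 s and retransmissions running for 30 s, the model reports a timeout even though an ACK arrived 10 ms before the timer was due, which is the behaviour the commit message asks for. Without a user timeout, the same inputs fall back to the 2 * TCP_RTO_MAX rule and the connection survives.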
+
+Also worth noting that before commit e89688e3e978 ("net: tcp:
+fix unexcepted socket die when snd_wnd is 0"), the issue
+would last 2 minutes instead of 4.
+
+Fixes: b701a99e431d ("tcp: Add tcp_clamp_rto_to_user_timeout() helper to improve accuracy")
+Signed-off-by: Eric Dumazet
+Cc: Neal Cardwell
+Reviewed-by: Jason Xing
+Reviewed-by: Jon Maxwell
+Reviewed-by: Kuniyuki Iwashima
+Link: https://patch.msgid.link/20240710001402.2758273-1-edumazet@google.com
+Signed-off-by: Jakub Kicinski
+Signed-off-by: Greg Kroah-Hartman
+---
+ net/ipv4/tcp_timer.c | 16 ++++++++++++++--
+ 1 file changed, 14 insertions(+), 2 deletions(-)
+
+--- a/net/ipv4/tcp_timer.c
++++ b/net/ipv4/tcp_timer.c
+@@ -414,16 +414,28 @@ static void tcp_fastopen_synack_timer(st
+ static bool tcp_rtx_probe0_timed_out(const struct sock *sk,
+ 				     const struct sk_buff *skb)
+ {
++	const struct inet_connection_sock *icsk = inet_csk(sk);
++	u32 user_timeout = READ_ONCE(icsk->icsk_user_timeout);
+ 	const struct tcp_sock *tp = tcp_sk(sk);
+-	const int timeout = TCP_RTO_MAX * 2;
++	int timeout = TCP_RTO_MAX * 2;
+ 	u32 rtx_delta;
+ 	s32 rcv_delta;
+ 
++	if (user_timeout) {
++		/* If user application specified a TCP_USER_TIMEOUT,
++		 * it does not want win 0 packets to 'reset the timer'
++		 * while retransmits are not making progress.
++		 */
++		if (rtx_delta > user_timeout)
++			return true;
++		timeout = min_t(u32, timeout, msecs_to_jiffies(user_timeout));
++	}
++
+ 	/* Note: timer interrupt might have been delayed by at least one jiffy,
+ 	 * and tp->rcv_tstamp might very well have been written recently.
+ 	 * rcv_delta can thus be negative.
+ 	 */
+-	rcv_delta = inet_csk(sk)->icsk_timeout - tp->rcv_tstamp;
++	rcv_delta = icsk->icsk_timeout - tp->rcv_tstamp;
+ 	if (rcv_delta <= timeout)
+ 		return false;
+ 
diff --git a/queue-4.19/tcp-refactor-tcp_retransmit_timer.patch b/queue-4.19/tcp-refactor-tcp_retransmit_timer.patch
new file mode 100644
index 00000000000..72bf6c7d563
--- /dev/null
+++ b/queue-4.19/tcp-refactor-tcp_retransmit_timer.patch
@@ -0,0 +1,63 @@
+From 3vdKVZggKBvQaZqiWvapckkcha.YkicnacgdhejqtbkqjZWpekj.knc@flex--edumazet.bounces.google.com Tue Jul 16 03:54:05 2024
+From: Eric Dumazet
+Date: Tue, 16 Jul 2024 01:53:57 +0000
+Subject: tcp: refactor tcp_retransmit_timer()
+To: Greg Kroah-Hartman
+Cc: "David S . Miller" , Jakub Kicinski , Paolo Abeni , netdev@vger.kernel.org, eric.dumazet@gmail.com, Neal Cardwell , Jason Xing , Jon Maxwell , Kuniyuki Iwashima , Eric Dumazet , Willem de Bruijn , Soheil Hassas Yeganeh
+Message-ID: <20240716015401.2365503-2-edumazet@google.com>
+
+From: Eric Dumazet
+
+commit 0d580fbd2db084a5c96ee9c00492236a279d5e0f upstream.
+
+It appears linux-4.14 stable needs a backport of commit
+88f8598d0a30 ("tcp: exit if nothing to retransmit on RTO timeout")
+
+Since tcp_rtx_queue_empty() is not in pre 4.15 kernels,
+let's refactor tcp_retransmit_timer() to only use tcp_rtx_queue_head()
+
+I will provide to stable teams the squashed patches.
+
+Signed-off-by: Eric Dumazet
+Cc: Willem de Bruijn
+Cc: Greg Kroah-Hartman
+Acked-by: Soheil Hassas Yeganeh
+Signed-off-by: David S. Miller
+Signed-off-by: Greg Kroah-Hartman
+---
+ net/ipv4/tcp_timer.c | 10 ++++++++--
+ 1 file changed, 8 insertions(+), 2 deletions(-)
+
+--- a/net/ipv4/tcp_timer.c
++++ b/net/ipv4/tcp_timer.c
+@@ -391,6 +391,7 @@ static void tcp_fastopen_synack_timer(st
+ 	int max_retries = icsk->icsk_syn_retries ? :
+ 	    sock_net(sk)->ipv4.sysctl_tcp_synack_retries + 1; /* add one more retry for fastopen */
+ 	struct request_sock *req;
++	struct sk_buff *skb;
+ 
+ 	req = tcp_sk(sk)->fastopen_rsk;
+ 	req->rsk_ops->syn_ack_timeout(req);
+@@ -438,7 +439,12 @@ void tcp_retransmit_timer(struct sock *s
+ 		 */
+ 		return;
+ 	}
+-	if (!tp->packets_out || WARN_ON_ONCE(tcp_rtx_queue_empty(sk)))
++
++	if (!tp->packets_out)
++		return;
++
++	skb = tcp_rtx_queue_head(sk);
++	if (WARN_ON_ONCE(!skb))
+ 		return;
+ 
+ 	if (!tp->snd_wnd && !sock_flag(sk, SOCK_DEAD) &&
+@@ -470,7 +476,7 @@ void tcp_retransmit_timer(struct sock *s
+ 		goto out;
+ 	}
+ 	tcp_enter_loss(sk);
+-	tcp_retransmit_skb(sk, tcp_rtx_queue_head(sk), 1);
++	tcp_retransmit_skb(sk, skb, 1);
+ 	__sk_dst_reset(sk);
+ 	goto out_reset_timer;
+ }
diff --git a/queue-4.19/tcp-use-signed-arithmetic-in-tcp_rtx_probe0_timed_out.patch b/queue-4.19/tcp-use-signed-arithmetic-in-tcp_rtx_probe0_timed_out.patch
new file mode 100644
index 00000000000..0fce8afb9f5
--- /dev/null
+++ b/queue-4.19/tcp-use-signed-arithmetic-in-tcp_rtx_probe0_timed_out.patch
@@ -0,0 +1,54 @@
+From 3wNKVZggKBvcdctlZydsfnnfkd.bnlfqdfjgkhmtwentmcZshnm.nqf@flex--edumazet.bounces.google.com Tue Jul 16 03:54:09 2024
+From: Eric Dumazet
+Date: Tue, 16 Jul 2024 01:53:59 +0000
+Subject: tcp: use signed arithmetic in tcp_rtx_probe0_timed_out()
+To: Greg Kroah-Hartman
+Cc: "David S . Miller" , Jakub Kicinski , Paolo Abeni , netdev@vger.kernel.org, eric.dumazet@gmail.com, Neal Cardwell , Jason Xing , Jon Maxwell , Kuniyuki Iwashima , Eric Dumazet , Menglong Dong
+Message-ID: <20240716015401.2365503-4-edumazet@google.com>
+
+From: Eric Dumazet
+
+commit 36534d3c54537bf098224a32dc31397793d4594d upstream.
+
+Due to the timer wheel implementation, a timer will usually fire
+after its scheduled time.
+
+For instance, for HZ=1000, a timeout between 512ms and 4s
+has a granularity of 64ms.
+For this range of values, the extra delay could be up to 63ms.
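The granularity figures above follow from the layout of the timer wheel. Below is a simplified, hypothetical model, not kernel code, assuming 64 buckets per level with each level 8 times coarser than the previous one (the LVL_BITS and LVL_CLK_SHIFT layout of kernel/time/timer.c, treated here as an assumption) and HZ=1000. It ignores the wheel's finite depth and the exact rounding at bucket boundaries. The function names are made up for illustration.

```c
#include <stdint.h>

/* Simplified model (assumption): each wheel level has 64 buckets and is
 * 8x coarser than the level below. With HZ=1000 this gives:
 *   level 0:    0 -    63 ms, granularity   1 ms
 *   level 1:   64 -   511 ms, granularity   8 ms
 *   level 2:  512 -  4095 ms, granularity  64 ms
 *   level 3: 4096 - 32767 ms, granularity 512 ms
 */
static uint32_t model_wheel_granularity(uint32_t delta_jiffies)
{
	uint64_t gran = 1;   /* jiffies per bucket at the current level */
	uint64_t limit = 64; /* first delta that no longer fits this level */

	while (delta_jiffies >= limit) {
		gran *= 8;
		limit *= 8;
	}
	return (uint32_t)gran;
}

/* Worst-case lateness: an expiry is rounded up to the next bucket edge,
 * so the timer can run up to (granularity - 1) jiffies late. */
static uint32_t model_max_extra_delay(uint32_t delta_jiffies)
{
	return model_wheel_granularity(delta_jiffies) - 1;
}
```

For a 600 ms timeout the model gives a 64 ms granularity and up to 63 ms of extra delay, matching the figures quoted in the commit message.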
+
+For TCP, this means that tp->rcv_tstamp may be after
+inet_csk(sk)->icsk_timeout whenever the timer interrupt
+finally triggers, if one packet came during the extra delay.
+
+We need to make sure tcp_rtx_probe0_timed_out() handles this case.
+
+Fixes: e89688e3e978 ("net: tcp: fix unexcepted socket die when snd_wnd is 0")
+Signed-off-by: Eric Dumazet
+Cc: Menglong Dong
+Acked-by: Neal Cardwell
+Reviewed-by: Jason Xing
+Link: https://lore.kernel.org/r/20240607125652.1472540-1-edumazet@google.com
+Signed-off-by: Jakub Kicinski
+Signed-off-by: Greg Kroah-Hartman
+---
+ net/ipv4/tcp_timer.c | 7 ++++++-
+ 1 file changed, 6 insertions(+), 1 deletion(-)
+
+--- a/net/ipv4/tcp_timer.c
++++ b/net/ipv4/tcp_timer.c
+@@ -416,8 +416,13 @@ static bool tcp_rtx_probe0_timed_out(con
+ {
+ 	const struct tcp_sock *tp = tcp_sk(sk);
+ 	const int timeout = TCP_RTO_MAX * 2;
+-	u32 rcv_delta, rtx_delta;
++	u32 rtx_delta;
++	s32 rcv_delta;
+ 
++	/* Note: timer interrupt might have been delayed by at least one jiffy,
++	 * and tp->rcv_tstamp might very well have been written recently.
++	 * rcv_delta can thus be negative.
++	 */
+ 	rcv_delta = inet_csk(sk)->icsk_timeout - tp->rcv_tstamp;
+ 	if (rcv_delta <= timeout)
+ 		return false;
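To see why the signed type matters, here is a hypothetical model of the first test in tcp_rtx_probe0_timed_out(), not kernel code. It assumes HZ=1000, so the 2 * TCP_RTO_MAX limit is 240000 jiffies. A packet that arrives 30 jiffies after the scheduled expiry (an illustrative value within the up-to-63 ms delay described in the commit message) makes the unsigned difference wrap to a value near 2^32, so the pre-patch code treats a just-received packet as a very long silence. The signed difference is simply -30. Converting an out-of-range value to `int32_t` is implementation-defined in C; the sketch assumes the two's-complement behaviour that GCC and Clang provide. The function names and `MODEL_PROBE0_TIMEOUT` are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: HZ=1000, so 2 * TCP_RTO_MAX is 240000 jiffies. */
#define MODEL_PROBE0_TIMEOUT 240000

/* Pre-patch form (hypothetical model): u32 difference. If a packet arrived
 * after the scheduled expiry, the subtraction wraps to a value near 2^32. */
static bool rcv_silence_too_long_u32(uint32_t icsk_timeout, uint32_t rcv_tstamp)
{
	uint32_t rcv_delta = icsk_timeout - rcv_tstamp;

	return rcv_delta > (uint32_t)MODEL_PROBE0_TIMEOUT;
}

/* Patched form (hypothetical model): s32 difference, so "received after the
 * scheduled expiry" becomes a small negative number. */
static bool rcv_silence_too_long_s32(uint32_t icsk_timeout, uint32_t rcv_tstamp)
{
	int32_t rcv_delta = (int32_t)(icsk_timeout - rcv_tstamp);

	return rcv_delta > MODEL_PROBE0_TIMEOUT;
}
```

With `icsk_timeout = 100000` and `rcv_tstamp = 100030`, the unsigned check returns true and the signed check returns false; when the last packet precedes the expiry (for example `rcv_tstamp = 99000`), both forms agree.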