From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date: Tue, 16 Jul 2024 07:00:57 +0000 (+0200)
Subject: 4.19-stable patches
X-Git-Tag: v4.19.318~32
X-Git-Url: http://git.ipfire.org/gitweb.cgi?a=commitdiff_plain;h=2f6a88979d8d9b448550c53bb314504597fba29f;p=thirdparty%2Fkernel%2Fstable-queue.git

4.19-stable patches

added patches:
	net-tcp-fix-unexcepted-socket-die-when-snd_wnd-is-0.patch
	tcp-avoid-too-many-retransmit-packets.patch
	tcp-refactor-tcp_retransmit_timer.patch
	tcp-use-signed-arithmetic-in-tcp_rtx_probe0_timed_out.patch
---
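
Note: the four queued patches together shape the zero-window timeout check
in tcp_rtx_probe0_timed_out(). The following is a minimal userspace sketch
of that decision, not kernel code. The struct, its field names, and the
helper name are hypothetical. It assumes HZ=1000 (so one jiffy is one
millisecond) and TCP_RTO_MAX of 120 seconds, and it takes the elapsed
retransmit time as an explicit input instead of reading socket state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: HZ=1000, so TCP_RTO_MAX (120 s) is 120000 jiffies. */
#define RTO_MAX_JIFFIES 120000

/* Hypothetical snapshot of the values the kernel helper looks at. */
struct probe0_state {
	uint32_t timer_expiry;    /* models icsk->icsk_timeout (jiffies) */
	uint32_t last_rx;         /* models tp->rcv_tstamp (jiffies) */
	uint32_t rtx_elapsed_ms;  /* time since the first retransmit (ms) */
	uint32_t user_timeout_ms; /* TCP_USER_TIMEOUT, 0 when unset */
};

static bool probe0_timed_out(const struct probe0_state *s)
{
	int32_t timeout = 2 * RTO_MAX_JIFFIES;
	int32_t rcv_delta;

	if (s->user_timeout_ms) {
		/* Zero-window ACKs must not extend a user-supplied deadline. */
		if (s->rtx_elapsed_ms > s->user_timeout_ms)
			return true;
		/* Clamp the idle limit to the user timeout (1 ms == 1 jiffy). */
		if (s->user_timeout_ms < (uint32_t)timeout)
			timeout = (int32_t)s->user_timeout_ms;
	}

	/* Signed: the peer may have been heard from after the timer expiry. */
	rcv_delta = (int32_t)(s->timer_expiry - s->last_rx);
	if (rcv_delta <= timeout)
		return false;

	/* Long silence alone is not enough (the connection may have just
	 * restarted from idle): the retransmission must be old as well.
	 */
	return s->rtx_elapsed_ms > (uint32_t)timeout;
}
```

With no user timeout, a socket that last heard from its peer 300000 jiffies
before the timer expiry and has been retransmitting for 300000 ms is
reported as timed out. The same silence with only 100000 ms of
retransmission is not.
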

diff --git a/queue-4.19/net-tcp-fix-unexcepted-socket-die-when-snd_wnd-is-0.patch b/queue-4.19/net-tcp-fix-unexcepted-socket-die-when-snd_wnd-is-0.patch
new file mode 100644
index 00000000000..db8f898b7c8
--- /dev/null
+++ b/queue-4.19/net-tcp-fix-unexcepted-socket-die-when-snd_wnd-is-0.patch
@@ -0,0 +1,81 @@
+From 3vtKVZggKBvUbarjXwbqdlldib.ZljdobdheifkruclrkaXqflk.lod@flex--edumazet.bounces.google.com Tue Jul 16 03:54:07 2024
+From: Eric Dumazet <edumazet@google.com>
+Date: Tue, 16 Jul 2024 01:53:58 +0000
+Subject: net: tcp: fix unexcepted socket die when snd_wnd is 0
+To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+Cc: "David S . Miller" <davem@davemloft.net>, Jakub Kicinski <kuba@kernel.org>,  Paolo Abeni <pabeni@redhat.com>, netdev@vger.kernel.org, eric.dumazet@gmail.com,  Neal Cardwell <ncardwell@google.com>, Jason Xing <kerneljasonxing@gmail.com>,  Jon Maxwell <jmaxwell37@gmail.com>, Kuniyuki Iwashima <kuniyu@amazon.com>,  Menglong Dong <imagedong@tencent.com>, Eric Dumazet <edumazet@google.com>
+Message-ID: <20240716015401.2365503-3-edumazet@google.com>
+
+From: Menglong Dong <imagedong@tencent.com>
+
+commit e89688e3e97868451a5d05b38a9d2633d6785cd4 upstream.
+
+In tcp_retransmit_timer(), a connection whose window has shrunk is treated
+as timed out if 'tcp_jiffies32 - tp->rcv_tstamp > TCP_RTO_MAX'. This is not
+always correct.
+
+The retransmits become zero-window probes in tcp_retransmit_timer()
+if 'snd_wnd == 0'. Therefore, icsk->icsk_rto will reach
+TCP_RTO_MAX sooner or later.
+
+However, the timer can be delayed: in my testing it fired after 122877ms
+rather than after TCP_RTO_MAX.
+
+Therefore, 'tcp_jiffies32 - tp->rcv_tstamp > TCP_RTO_MAX' is always true
+once the RTO reaches TCP_RTO_MAX, and the socket will die.
+
+Fix this by replacing 'tcp_jiffies32' with '(u32)icsk->icsk_timeout',
+which is exactly the timestamp of the timeout.
+
+However, a connection can restart from idle, in which case tp->rcv_tstamp
+could already be a long time (minutes or hours) in the past even on the
+first RTO. So we double-check the timeout against the duration of the
+retransmission.
+
+Meanwhile, use "2 * TCP_RTO_MAX" as the timeout to avoid the socket
+dying too soon.
+
+Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
+Link: https://lore.kernel.org/netdev/CADxym3YyMiO+zMD4zj03YPM3FBi-1LHi6gSD2XT8pyAMM096pg@mail.gmail.com/
+Signed-off-by: Menglong Dong <imagedong@tencent.com>
+Reviewed-by: Eric Dumazet <edumazet@google.com>
+Signed-off-by: David S. Miller <davem@davemloft.net>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ net/ipv4/tcp_timer.c |   18 +++++++++++++++++-
+ 1 file changed, 17 insertions(+), 1 deletion(-)
+
+--- a/net/ipv4/tcp_timer.c
++++ b/net/ipv4/tcp_timer.c
+@@ -411,6 +411,22 @@ static void tcp_fastopen_synack_timer(st
+ 			  TCP_TIMEOUT_INIT << req->num_timeout, TCP_RTO_MAX);
+ }
+ 
++static bool tcp_rtx_probe0_timed_out(const struct sock *sk,
++				     const struct sk_buff *skb)
++{
++	const struct tcp_sock *tp = tcp_sk(sk);
++	const int timeout = TCP_RTO_MAX * 2;
++	u32 rcv_delta, rtx_delta;
++
++	rcv_delta = inet_csk(sk)->icsk_timeout - tp->rcv_tstamp;
++	if (rcv_delta <= timeout)
++		return false;
++
++	rtx_delta = (u32)msecs_to_jiffies(tcp_time_stamp(tp) -
++			(tp->retrans_stamp ?: tcp_skb_timestamp(skb)));
++
++	return rtx_delta > timeout;
++}
+ 
+ /**
+  *  tcp_retransmit_timer() - The TCP retransmit timeout handler
+@@ -471,7 +487,7 @@ void tcp_retransmit_timer(struct sock *s
+ 					    tp->snd_una, tp->snd_nxt);
+ 		}
+ #endif
+-		if (tcp_jiffies32 - tp->rcv_tstamp > TCP_RTO_MAX) {
++		if (tcp_rtx_probe0_timed_out(sk, skb)) {
+ 			tcp_write_err(sk);
+ 			goto out;
+ 		}
diff --git a/queue-4.19/series b/queue-4.19/series
index c0d409f3ef0..42f19315689 100644
--- a/queue-4.19/series
+++ b/queue-4.19/series
@@ -57,3 +57,7 @@ usb-gadget-configfs-prevent-oob-read-write-in-usb_string_copy.patch
 usb-core-fix-duplicate-endpoint-bug-by-clearing-reserved-bits-in-the-descriptor.patch
 hpet-support-32-bit-userspace.patch
 libceph-fix-race-between-delayed_work-and-ceph_monc_stop.patch
+tcp-refactor-tcp_retransmit_timer.patch
+net-tcp-fix-unexcepted-socket-die-when-snd_wnd-is-0.patch
+tcp-use-signed-arithmetic-in-tcp_rtx_probe0_timed_out.patch
+tcp-avoid-too-many-retransmit-packets.patch
diff --git a/queue-4.19/tcp-avoid-too-many-retransmit-packets.patch b/queue-4.19/tcp-avoid-too-many-retransmit-packets.patch
new file mode 100644
index 00000000000..85c9adca888
--- /dev/null
+++ b/queue-4.19/tcp-avoid-too-many-retransmit-packets.patch
@@ -0,0 +1,74 @@
+From 3wdKVZggKBvgjizrf4jylttlqj.htrlwjlpmqnsz2ktzsifynts.twl@flex--edumazet.bounces.google.com Tue Jul 16 03:54:11 2024
+From: Eric Dumazet <edumazet@google.com>
+Date: Tue, 16 Jul 2024 01:54:00 +0000
+Subject: tcp: avoid too many retransmit packets
+To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+Cc: "David S . Miller" <davem@davemloft.net>, Jakub Kicinski <kuba@kernel.org>,  Paolo Abeni <pabeni@redhat.com>, netdev@vger.kernel.org, eric.dumazet@gmail.com,  Neal Cardwell <ncardwell@google.com>, Jason Xing <kerneljasonxing@gmail.com>,  Jon Maxwell <jmaxwell37@gmail.com>, Kuniyuki Iwashima <kuniyu@amazon.com>,  Eric Dumazet <edumazet@google.com>
+Message-ID: <20240716015401.2365503-5-edumazet@google.com>
+
+From: Eric Dumazet <edumazet@google.com>
+
+commit 97a9063518f198ec0adb2ecb89789de342bb8283 upstream.
+
+If a TCP socket is using TCP_USER_TIMEOUT, and the other peer
+retracted its window to zero, tcp_retransmit_timer() can
+retransmit a packet every two jiffies (2 ms for HZ=1000),
+for about 4 minutes after TCP_USER_TIMEOUT has 'expired'.
+
+The fix is to make sure tcp_rtx_probe0_timed_out() takes
+icsk->icsk_user_timeout into account.
+
+Before the blamed commit, the socket would not time out after
+icsk->icsk_user_timeout, but would use standard exponential
+backoff for the retransmits.
+
+It is also worth noting that before commit e89688e3e978 ("net: tcp:
+fix unexcepted socket die when snd_wnd is 0"), the issue
+would last 2 minutes instead of 4.
+
+Fixes: b701a99e431d ("tcp: Add tcp_clamp_rto_to_user_timeout() helper to improve accuracy")
+Signed-off-by: Eric Dumazet <edumazet@google.com>
+Cc: Neal Cardwell <ncardwell@google.com>
+Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
+Reviewed-by: Jon Maxwell <jmaxwell37@gmail.com>
+Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
+Link: https://patch.msgid.link/20240710001402.2758273-1-edumazet@google.com
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ net/ipv4/tcp_timer.c |   16 ++++++++++++++--
+ 1 file changed, 14 insertions(+), 2 deletions(-)
+
+--- a/net/ipv4/tcp_timer.c
++++ b/net/ipv4/tcp_timer.c
+@@ -414,16 +414,28 @@ static void tcp_fastopen_synack_timer(st
+ static bool tcp_rtx_probe0_timed_out(const struct sock *sk,
+ 				     const struct sk_buff *skb)
+ {
++	const struct inet_connection_sock *icsk = inet_csk(sk);
++	u32 user_timeout = READ_ONCE(icsk->icsk_user_timeout);
+ 	const struct tcp_sock *tp = tcp_sk(sk);
+-	const int timeout = TCP_RTO_MAX * 2;
++	int timeout = TCP_RTO_MAX * 2;
+ 	u32 rtx_delta;
+ 	s32 rcv_delta;
+ 
++	if (user_timeout) {
++		/* With TCP_USER_TIMEOUT set, win 0 packets must not 'reset
++		 * the timer' while retransmits stall. rtx_delta is in ms here.
++		 */
++		rtx_delta = tcp_time_stamp(tp) - (tp->retrans_stamp ?: tcp_skb_timestamp(skb));
++		if (rtx_delta > user_timeout)
++			return true;
++		timeout = min_t(u32, timeout, msecs_to_jiffies(user_timeout));
++	}
++
+ 	/* Note: timer interrupt might have been delayed by at least one jiffy,
+ 	 * and tp->rcv_tstamp might very well have been written recently.
+ 	 * rcv_delta can thus be negative.
+ 	 */
+-	rcv_delta = inet_csk(sk)->icsk_timeout - tp->rcv_tstamp;
++	rcv_delta = icsk->icsk_timeout - tp->rcv_tstamp;
+ 	if (rcv_delta <= timeout)
+ 		return false;
+ 
diff --git a/queue-4.19/tcp-refactor-tcp_retransmit_timer.patch b/queue-4.19/tcp-refactor-tcp_retransmit_timer.patch
new file mode 100644
index 00000000000..72bf6c7d563
--- /dev/null
+++ b/queue-4.19/tcp-refactor-tcp_retransmit_timer.patch
@@ -0,0 +1,63 @@
+From 3vdKVZggKBvQaZqiWvapckkcha.YkicnacgdhejqtbkqjZWpekj.knc@flex--edumazet.bounces.google.com Tue Jul 16 03:54:05 2024
+From: Eric Dumazet <edumazet@google.com>
+Date: Tue, 16 Jul 2024 01:53:57 +0000
+Subject: tcp: refactor tcp_retransmit_timer()
+To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+Cc: "David S . Miller" <davem@davemloft.net>, Jakub Kicinski <kuba@kernel.org>,  Paolo Abeni <pabeni@redhat.com>, netdev@vger.kernel.org, eric.dumazet@gmail.com,  Neal Cardwell <ncardwell@google.com>, Jason Xing <kerneljasonxing@gmail.com>,  Jon Maxwell <jmaxwell37@gmail.com>, Kuniyuki Iwashima <kuniyu@amazon.com>,  Eric Dumazet <edumazet@google.com>, Willem de Bruijn <willemb@google.com>,  Soheil Hassas Yeganeh <soheil@google.com>
+Message-ID: <20240716015401.2365503-2-edumazet@google.com>
+
+From: Eric Dumazet <edumazet@google.com>
+
+commit 0d580fbd2db084a5c96ee9c00492236a279d5e0f upstream.
+
+It appears linux-4.14 stable needs a backport of commit
+88f8598d0a30 ("tcp: exit if nothing to retransmit on RTO timeout").
+
+Since tcp_rtx_queue_empty() is not in pre-4.15 kernels,
+let's refactor tcp_retransmit_timer() to only use tcp_rtx_queue_head().
+
+I will provide the squashed patches to the stable teams.
+
+Signed-off-by: Eric Dumazet <edumazet@google.com>
+Cc: Willem de Bruijn <willemb@google.com>
+Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
+Signed-off-by: David S. Miller <davem@davemloft.net>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ net/ipv4/tcp_timer.c |   10 ++++++++--
+ 1 file changed, 8 insertions(+), 2 deletions(-)
+
+--- a/net/ipv4/tcp_timer.c
++++ b/net/ipv4/tcp_timer.c
+@@ -391,6 +391,7 @@ static void tcp_fastopen_synack_timer(st
+ 	int max_retries = icsk->icsk_syn_retries ? :
+ 	    sock_net(sk)->ipv4.sysctl_tcp_synack_retries + 1; /* add one more retry for fastopen */
+ 	struct request_sock *req;
++	struct sk_buff *skb;
+ 
+ 	req = tcp_sk(sk)->fastopen_rsk;
+ 	req->rsk_ops->syn_ack_timeout(req);
+@@ -438,7 +439,12 @@ void tcp_retransmit_timer(struct sock *s
+ 		 */
+ 		return;
+ 	}
+-	if (!tp->packets_out || WARN_ON_ONCE(tcp_rtx_queue_empty(sk)))
++
++	if (!tp->packets_out)
++		return;
++
++	skb = tcp_rtx_queue_head(sk);
++	if (WARN_ON_ONCE(!skb))
+ 		return;
+ 
+ 	if (!tp->snd_wnd && !sock_flag(sk, SOCK_DEAD) &&
+@@ -470,7 +476,7 @@ void tcp_retransmit_timer(struct sock *s
+ 			goto out;
+ 		}
+ 		tcp_enter_loss(sk);
+-		tcp_retransmit_skb(sk, tcp_rtx_queue_head(sk), 1);
++		tcp_retransmit_skb(sk, skb, 1);
+ 		__sk_dst_reset(sk);
+ 		goto out_reset_timer;
+ 	}
diff --git a/queue-4.19/tcp-use-signed-arithmetic-in-tcp_rtx_probe0_timed_out.patch b/queue-4.19/tcp-use-signed-arithmetic-in-tcp_rtx_probe0_timed_out.patch
new file mode 100644
index 00000000000..0fce8afb9f5
--- /dev/null
+++ b/queue-4.19/tcp-use-signed-arithmetic-in-tcp_rtx_probe0_timed_out.patch
@@ -0,0 +1,54 @@
+From 3wNKVZggKBvcdctlZydsfnnfkd.bnlfqdfjgkhmtwentmcZshnm.nqf@flex--edumazet.bounces.google.com Tue Jul 16 03:54:09 2024
+From: Eric Dumazet <edumazet@google.com>
+Date: Tue, 16 Jul 2024 01:53:59 +0000
+Subject: tcp: use signed arithmetic in tcp_rtx_probe0_timed_out()
+To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+Cc: "David S . Miller" <davem@davemloft.net>, Jakub Kicinski <kuba@kernel.org>,  Paolo Abeni <pabeni@redhat.com>, netdev@vger.kernel.org, eric.dumazet@gmail.com,  Neal Cardwell <ncardwell@google.com>, Jason Xing <kerneljasonxing@gmail.com>,  Jon Maxwell <jmaxwell37@gmail.com>, Kuniyuki Iwashima <kuniyu@amazon.com>,  Eric Dumazet <edumazet@google.com>, Menglong Dong <imagedong@tencent.com>
+Message-ID: <20240716015401.2365503-4-edumazet@google.com>
+
+From: Eric Dumazet <edumazet@google.com>
+
+commit 36534d3c54537bf098224a32dc31397793d4594d upstream.
+
+Due to the timer wheel implementation, a timer will usually fire
+after its scheduled expiry.
+
+For instance, for HZ=1000, a timeout between 512ms and 4s
+has a granularity of 64ms.
+For this range of values, the extra delay could be up to 63ms.
+
+For TCP, this means that tp->rcv_tstamp may be after
+inet_csk(sk)->icsk_timeout whenever the timer interrupt
+finally triggers, if one packet came during the extra delay.
+
+We need to make sure tcp_rtx_probe0_timed_out() handles this case.
+
+Fixes: e89688e3e978 ("net: tcp: fix unexcepted socket die when snd_wnd is 0")
+Signed-off-by: Eric Dumazet <edumazet@google.com>
+Cc: Menglong Dong <imagedong@tencent.com>
+Acked-by: Neal Cardwell <ncardwell@google.com>
+Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
+Link: https://lore.kernel.org/r/20240607125652.1472540-1-edumazet@google.com
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ net/ipv4/tcp_timer.c |    7 ++++++-
+ 1 file changed, 6 insertions(+), 1 deletion(-)
+
+--- a/net/ipv4/tcp_timer.c
++++ b/net/ipv4/tcp_timer.c
+@@ -416,8 +416,13 @@ static bool tcp_rtx_probe0_timed_out(con
+ {
+ 	const struct tcp_sock *tp = tcp_sk(sk);
+ 	const int timeout = TCP_RTO_MAX * 2;
+-	u32 rcv_delta, rtx_delta;
++	u32 rtx_delta;
++	s32 rcv_delta;
+ 
++	/* Note: timer interrupt might have been delayed by at least one jiffy,
++	 * and tp->rcv_tstamp might very well have been written recently.
++	 * rcv_delta can thus be negative.
++	 */
+ 	rcv_delta = inet_csk(sk)->icsk_timeout - tp->rcv_tstamp;
+ 	if (rcv_delta <= timeout)
+ 		return false;
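
Note: the switch from u32 to s32 for rcv_delta can be illustrated with a
small userspace sketch. This is not kernel code. The function names and the
TIMEOUT_JIFFIES constant are hypothetical, the constant assumes HZ=1000 with
TCP_RTO_MAX * 2 equal to 240 seconds, and the conversion to int32_t relies
on the two's complement wraparound that mainstream compilers provide.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: TCP_RTO_MAX * 2 at HZ=1000, i.e. 240 s in jiffies. */
#define TIMEOUT_JIFFIES 240000

/* Models the u32 version: does the delta exceed the limit? */
static bool exceeds_unsigned(uint32_t timer_expiry, uint32_t last_rx)
{
	uint32_t rcv_delta = timer_expiry - last_rx;

	return rcv_delta > (uint32_t)TIMEOUT_JIFFIES;
}

/* Models the s32 version: the same subtraction, read as signed. */
static bool exceeds_signed(uint32_t timer_expiry, uint32_t last_rx)
{
	int32_t rcv_delta = (int32_t)(timer_expiry - last_rx);

	return rcv_delta > TIMEOUT_JIFFIES;
}
```

If the timer was due at jiffy 1000000 but a packet arrived at jiffy 1000040
before the delayed timer ran, the unsigned delta wraps to 4294967256 and
exceeds the limit, while the signed delta is -40 and does not. For ordinary
positive deltas both variants agree.
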