From: Greg Kroah-Hartman
Date: Mon, 29 Jan 2018 10:27:04 +0000 (+0100)
Subject: 4.4-stable patches
X-Git-Tag: v4.4.114~10
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=3074d1ad1f1891acb511b4f64c5e9779ed440d13;p=thirdparty%2Fkernel%2Fstable-queue.git

4.4-stable patches

added patches:
	net-tcp-close-sock-if-net-namespace-is-exiting.patch
---

diff --git a/queue-4.4/net-tcp-close-sock-if-net-namespace-is-exiting.patch b/queue-4.4/net-tcp-close-sock-if-net-namespace-is-exiting.patch
new file mode 100644
index 00000000000..074d89596ff
--- /dev/null
+++ b/queue-4.4/net-tcp-close-sock-if-net-namespace-is-exiting.patch
@@ -0,0 +1,120 @@
+From foo@baz Mon Jan 29 10:14:57 CET 2018
+From: Dan Streetman
+Date: Thu, 18 Jan 2018 16:14:26 -0500
+Subject: net: tcp: close sock if net namespace is exiting
+
+From: Dan Streetman
+
+
+[ Upstream commit 4ee806d51176ba7b8ff1efd81f271d7252e03a1d ]
+
+When a tcp socket is closed, if it detects that its net namespace is
+exiting, close immediately and do not wait for FIN sequence.
+
+For normal sockets, a reference is taken to their net namespace, so it will
+never exit while the socket is open. However, kernel sockets do not take a
+reference to their net namespace, so it may begin exiting while the kernel
+socket is still open. In this case if the kernel socket is a tcp socket,
+it will stay open trying to complete its close sequence. The sock's dst(s)
+hold a reference to their interface, which are all transferred to the
+namespace's loopback interface when the real interfaces are taken down.
+When the namespace tries to take down its loopback interface, it hangs
+waiting for all references to the loopback interface to release, which
+results in messages like:
+
+unregister_netdevice: waiting for lo to become free. Usage count = 1
+
+These messages continue until the socket finally times out and closes.
+Since the net namespace cleanup holds the net_mutex while calling its
+registered pernet callbacks, any new net namespace initialization is
+blocked until the current net namespace finishes exiting.
+
+After this change, the tcp socket notices the exiting net namespace, and
+closes immediately, releasing its dst(s) and their reference to the
+loopback interface, which lets the net namespace continue exiting.
+
+Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407
+Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=97811
+Signed-off-by: Dan Streetman
+Signed-off-by: David S. Miller
+Signed-off-by: Greg Kroah-Hartman
+---
+ include/net/net_namespace.h |   10 ++++++++++
+ net/ipv4/tcp.c              |    3 +++
+ net/ipv4/tcp_timer.c        |   15 +++++++++++++++
+ 3 files changed, 28 insertions(+)
+
+--- a/include/net/net_namespace.h
++++ b/include/net/net_namespace.h
+@@ -209,6 +209,11 @@ int net_eq(const struct net *net1, const
+ 	return net1 == net2;
+ }
+ 
++static inline int check_net(const struct net *net)
++{
++	return atomic_read(&net->count) != 0;
++}
++
+ void net_drop_ns(void *);
+ 
+ #else
+@@ -232,6 +237,11 @@ int net_eq(const struct net *net1, const
+ {
+ 	return 1;
+ }
++
++static inline int check_net(const struct net *net)
++{
++	return 1;
++}
+ 
+ #define net_drop_ns NULL
+ #endif
+--- a/net/ipv4/tcp.c
++++ b/net/ipv4/tcp.c
+@@ -2176,6 +2176,9 @@ adjudge_to_death:
+ 			tcp_send_active_reset(sk, GFP_ATOMIC);
+ 			NET_INC_STATS_BH(sock_net(sk),
+ 					LINUX_MIB_TCPABORTONMEMORY);
++		} else if (!check_net(sock_net(sk))) {
++			/* Not possible to send reset; just close */
++			tcp_set_state(sk, TCP_CLOSE);
+ 		}
+ 	}
+ 
+--- a/net/ipv4/tcp_timer.c
++++ b/net/ipv4/tcp_timer.c
+@@ -46,11 +46,19 @@ static void tcp_write_err(struct sock *s
+  * to prevent DoS attacks. It is called when a retransmission timeout
+  * or zero probe timeout occurs on orphaned socket.
+  *
++ * Also close if our net namespace is exiting; in that case there is no
++ * hope of ever communicating again since all netns interfaces are already
++ * down (or about to be down), and we need to release our dst references,
++ * which have been moved to the netns loopback interface, so the namespace
++ * can finish exiting. This condition is only possible if we are a kernel
++ * socket, as those do not hold references to the namespace.
++ *
+  * Criteria is still not confirmed experimentally and may change.
+  * We kill the socket, if:
+  * 1. If number of orphaned sockets exceeds an administratively configured
+  * limit.
+  * 2. If we have strong memory pressure.
++ * 3. If our net namespace is exiting.
+  */
+ static int tcp_out_of_resources(struct sock *sk, bool do_reset)
+ {
+@@ -79,6 +87,13 @@ static int tcp_out_of_resources(struct s
+ 		NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPABORTONMEMORY);
+ 		return 1;
+ 	}
++
++	if (!check_net(sock_net(sk))) {
++		/* Not possible to send reset; just close */
++		tcp_done(sk);
++		return 1;
++	}
++
+ 	return 0;
+ }
+ 
diff --git a/queue-4.4/series b/queue-4.4/series
index d9c5acfb819..34334d77cf5 100644
--- a/queue-4.4/series
+++ b/queue-4.4/series
@@ -71,3 +71,4 @@ vmxnet3-repair-memory-leak.patch
 net-allow-neigh-contructor-functions-ability-to-modify-the-primary_key.patch
 ipv4-make-neigh-lookup-keys-for-loopback-point-to-point-devices-be-inaddr_any.patch
 flow_dissector-properly-cap-thoff-field.patch
+net-tcp-close-sock-if-net-namespace-is-exiting.patch
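
Illustrative note, not part of the queued patch: the decision added to tcp_out_of_resources() can be modelled in user space roughly as follows. This is only a sketch under stated assumptions. The model_* types and functions are hypothetical stand-ins invented for illustration, a plain int replaces the kernel's atomic namespace reference count, and no reset handling, statistics, or locking is modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; these are NOT the kernel's structures. */
enum model_state { MODEL_FIN_WAIT, MODEL_CLOSED };

struct model_net {
	int count;	/* models net->count: references held on the namespace */
};

struct model_sock {
	const struct model_net *net;	/* namespace the socket lives in */
	enum model_state state;
};

/* Mirrors the patch's check_net(): nonzero while the namespace is alive. */
static inline int model_check_net(const struct model_net *net)
{
	return net->count != 0;
}

/*
 * Models only the branch the patch adds to tcp_out_of_resources():
 * returns 1 and closes the socket when its namespace is exiting,
 * returns 0 and leaves the socket untouched otherwise.
 */
static int model_out_of_resources(struct model_sock *sk)
{
	assert(sk != NULL && sk->net != NULL);

	if (!model_check_net(sk->net)) {
		/* Not possible to send reset; just close */
		sk->state = MODEL_CLOSED;
		return 1;
	}
	return 0;
}
```

In this model a socket whose namespace count has dropped to zero is closed on the next timer check instead of waiting out its close sequence, which corresponds to the step the commit message describes as releasing the dst references so that the namespace can finish exiting.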