When a peer is deleted because its keepalive expired, it is removed
from the OpenVPN hashtable and temporarily inserted in a "release
list" for further processing.

This happens in:

  ovpn_peer_keepalive_work()
    unlock_ovpn(release_list)

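The deferral above follows a familiar pattern: unlink entries while holding the table lock, then run the slower teardown after dropping it. The user-space sketch below assumes a pthread mutex in place of the ovpn lock and a singly linked list in place of both the hashtable and the release list; `struct peer`, `collect_expired()` and `release_peers()` are hypothetical names, not ovpn code. The gap between the unlock and `release_peers()` is the window in which the peer sits on the release list.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical peer, reduced to what the sketch needs. */
struct peer {
	bool expired;      /* keepalive timed out */
	bool released;     /* teardown has run */
	struct peer *next;
};

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Under table_lock, unlink expired peers from *table and push them onto
 * a private release list. Only cheap pointer updates happen here. */
static struct peer *collect_expired(struct peer **table)
{
	struct peer *release_list = NULL;
	struct peer **pp;

	pthread_mutex_lock(&table_lock);
	pp = table;
	while (*pp) {
		struct peer *p = *pp;

		if (p->expired) {
			*pp = p->next;          /* remove from the table */
			p->next = release_list; /* push onto release list */
			release_list = p;
		} else {
			pp = &p->next;
		}
	}
	pthread_mutex_unlock(&table_lock);
	return release_list;
}

/* Runs after the lock is dropped. Until a peer is processed here, it is
 * out of the table but its socket is still attached. */
static int release_peers(struct peer *release_list)
{
	int n = 0;

	while (release_list) {
		struct peer *p = release_list;

		release_list = p->next;
		assert(!p->released);   /* each peer is released once */
		p->next = NULL;
		p->released = true;     /* stands in for socket detach etc. */
		n++;
	}
	return n;
}
```

Keeping only the unlinking under the lock keeps the critical section short, at the price that teardown runs unsynchronized with anyone else who can still reach the peer's resources, such as userspace holding the socket.
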
This processing includes detaching the peer from the socket used to
talk to it, i.e. restoring the socket's original proto, ops and
callbacks.

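Detaching undoes what attaching did: the original callbacks were saved when the socket was taken over, and detach writes them back. A minimal sketch with a single callback, assuming hypothetical `struct fake_sock`/`struct saved_cb` types and `attach()`/`detach()` helpers as stand-ins for `struct sock`, `peer->tcp.sk_cb` and the ovpn routines; the patched function below restores `sk_data_ready`, `sk_write_space`, `sk_prot` and `sk_socket->ops` in the same way.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for struct sock. */
struct fake_sock {
	void (*data_ready)(struct fake_sock *sk);
	int last_handler; /* 1 = original handler ran, 2 = override ran */
};

/* Originals saved at attach time (cf. peer->tcp.sk_cb). */
struct saved_cb {
	void (*data_ready)(struct fake_sock *sk);
};

static void orig_data_ready(struct fake_sock *sk)
{
	sk->last_handler = 1;
}

static void override_data_ready(struct fake_sock *sk)
{
	sk->last_handler = 2;
}

/* Remember the original callback, then install the override. */
static void attach(struct fake_sock *sk, struct saved_cb *cb)
{
	cb->data_ready = sk->data_ready;
	sk->data_ready = override_data_ready;
}

/* Write the saved original back. */
static void detach(struct fake_sock *sk, const struct saved_cb *cb)
{
	assert(cb->data_ready != NULL); /* attach() must have run first */
	sk->data_ready = cb->data_ready;
}
```
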
With TCP, it can happen that userspace decides to close the socket
while the peer is still sitting in the release list. This results in
a concurrent execution of:

  tcp_close(sk)
    __tcp_close(sk)
      sock_orphan(sk)
        sk_set_socket(sk, NULL)

The last call sets sk->sk_socket to NULL.

When the release routine resumes, ovpn_tcp_socket_detach() dereferences
sk->sk_socket to restore the socket's original ops member, and crashes
because sk->sk_socket is now NULL.

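Whether the old code crashes depends only on which of the two steps reaches the socket first. The sketch below replays both orders deterministically in user space; `struct fake_sock`, `orphan()` and `detach_sees_null()` are hypothetical names, with `orphan()` standing in for `sock_orphan()` and a NULL check standing in for the crash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reduced models of struct socket and struct sock. */
struct fake_socket {
	int ops;
};

struct fake_sock {
	struct fake_socket *socket; /* stands in for sk->sk_socket */
};

/* Models sock_orphan() -> sk_set_socket(sk, NULL). */
static void orphan(struct fake_sock *sk)
{
	sk->socket = NULL;
}

/* Replays one ordering of "userspace closes the socket" and "release
 * processing reaches the unpatched detach". Returns true if detach
 * would dereference a NULL sk->socket, i.e. crash in the kernel. */
static bool detach_sees_null(bool close_first)
{
	struct fake_socket sock = { .ops = 2 };
	struct fake_sock sk = { .socket = &sock };
	bool would_crash;

	/* The peer already sits on the release list at this point. */
	if (close_first)
		orphan(&sk); /* tcp_close() -> __tcp_close() -> sock_orphan() */

	/* Unpatched detach: sk->sk_socket->ops = ..., with no NULL check. */
	would_crash = (sk.socket == NULL);
	if (!would_crash)
		sk.socket->ops = 1; /* restore the original ops */

	if (!close_first)
		orphan(&sk); /* close arrives after detach: harmless */

	assert(would_crash || sock.ops == 1);
	return would_crash;
}
```
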
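Fix this race by testing and accessing sk->sk_socket under
sk->sk_callback_lock, the same lock sock_orphan() holds while clearing
it, so the NULL check and the assignment cannot be interleaved with
the orphaning.
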
Link: https://lore.kernel.org/netdev/176996279620.3109699.15382994681575380467@eldamar.lan/
Link: https://github.com/OpenVPN/ovpn-net-next/issues/29
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
Fixes: 11851cbd60ea ("ovpn: implement TCP transport")
Link: https://patch.msgid.link/20260212213130.11497-1-antonio@openvpn.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

 	sk->sk_data_ready = peer->tcp.sk_cb.sk_data_ready;
 	sk->sk_write_space = peer->tcp.sk_cb.sk_write_space;
 	sk->sk_prot = peer->tcp.sk_cb.prot;
-	sk->sk_socket->ops = peer->tcp.sk_cb.ops;
+
+	/* tcp_close() may race this function and could set
+	 * sk->sk_socket to NULL. It does so by invoking
+	 * sock_orphan(), which acquires sk_callback_lock before
+	 * doing the assignment.
+	 *
+	 * For this reason we acquire the same lock to prevent
+	 * sk_socket from disappearing under our feet.
+	 */
+	write_lock_bh(&sk->sk_callback_lock);
+	if (sk->sk_socket)
+		sk->sk_socket->ops = peer->tcp.sk_cb.ops;
+	write_unlock_bh(&sk->sk_callback_lock);
 	rcu_assign_sk_user_data(sk, NULL);
 }