From 3aab17bd56614f05cfbec553e618b774ed07cd45 Mon Sep 17 00:00:00 2001 From: Willy Tarreau Date: Fri, 20 Nov 2020 17:22:44 +0100 Subject: [PATCH] BUG/MAJOR: connection: reset conn->owner when detaching from session list Baptiste reported a new crash affecting 2.3 which can be triggered when using H2 on the backend, with http-reuse always and with a tens of clients doing close only. There are a few combined cases which cause this to happen, but each time the issue is the same, an already freed session is dereferenced in session_unown_conn(). Two cases were identified to cause this: - a connection referencing a session as its owner, which is detached from the session's list and is destroyed after this session ends. The test on conn->owner before calling session_unown_conn() is not sufficent as the pointer is not null but is not valid anymore. - a connection that never goes idle and that gets killed form the mux, where session_free() is called first, then conn_free() calls session_unown_conn() which scans the just freed session for older connections. This one is only triggered with DEBUG_UAF The reason for this session to be present here is that it's needed during the connection setup, to be passed to conn_install_mux_be() to mux->init() as the owning session, but it's never deleted aftrewards. Furthermore, even conn_session_free() doesn't delete this pointer after freeing the session that lies there. Both do definitely result in a use-after-free that's more easily triggered under DEBUG_UAF. This patch makes sure that the owner is always deleted after detaching or killing the session. However it is currently not possible to clear the owner right after a synchronous init because the proxy protocol apparently needs it (a reg test checks this), and if we leave it past the connection setup with the session not attached anywhere, it's hard to catch the right moment to detach it. This means that the session may remain in conn->owner as long as the connection has never been added to nor removed from the session's idle list. Given that this patch needs to remain simple enough to be backported, instead it adds a workaround in session_unown_conn() to detect that the element is already not attached anywhere. This fix absolutely requires previous patch "CLEANUP: connection: do not use conn->owner when the session is known" otherwise the situation will be even worse, as some places used to rely on conn->owner instead of the session. The fix could theorically be backported as far as 1.8. However, the code in this area has significantly changed along versions and there are more risks of breaking working stuff than fixing real issues there. The issue was really woken up in two steps during 2.3-dev when slightly reworking the idle conns with commit 08016ab82 ("MEDIUM: connection: Add private connections synchronously in session server list") and when adding support for storing used H2 connections in the session and adding the necessary call to session_unown_conn() in the muxes. But the same test managed to crash 2.2 when built in DEBUG_UAF and patched like this, proving that we used to already leave dangling pointers behind us: | diff --git a/include/haproxy/connection.h b/include/haproxy/connection.h | index f8f235c1a..dd30b5f80 100644 | --- a/include/haproxy/connection.h | +++ b/include/haproxy/connection.h | @@ -458,6 +458,10 @@ static inline void conn_free(struct connection *conn) | sess->idle_conns--; | session_unown_conn(sess, conn); | } | + else { | + struct session *sess = conn->owner; | + BUG_ON(sess && sess->origin != &conn->obj_type); | + } | | sockaddr_free(&conn->src); | sockaddr_free(&conn->dst); It's uncertain whether an existing code path there can lead to dereferencing conn->owner when it's bad, though certain suspicious memory corruption bugs make one think it's a likely candidate. The patch should not be hard to adapt there. Backports to 2.1 and older are left to the appreciation of the person doing the backport. A reproducer consists in this: global nbthread 1 listen l bind :9000 mode http http-reuse always server s 127.0.0.1:8999 proto h2 frontend f bind :8999 proto h2 mode http http-request return status 200 Then this will make it crash within 2-3 seconds: $ h1load -e -r 1 -c 10 http://0:9000/ If it does not, it might be that DEBUG_UAF was not used (it's harder then) and it might be useful to restart. --- include/haproxy/session.h | 11 +++++++++++ src/session.c | 1 + 2 files changed, 12 insertions(+) diff --git a/include/haproxy/session.h b/include/haproxy/session.h index 8174a2ae79..fa21dda6cc 100644 --- a/include/haproxy/session.h +++ b/include/haproxy/session.h @@ -78,9 +78,20 @@ static inline void session_unown_conn(struct session *sess, struct connection *c { struct sess_srv_list *srv_list = NULL; + /* WT: this currently is a workaround for an inconsistency between + * the link status of the connection in the session list and the + * connection's owner. This should be removed as soon as all this + * is addressed. Right now it's possible to enter here with a non-null + * conn->owner that points to a dead session, but in this case the + * element is not linked. + */ + if (!LIST_ADDED(&conn->session_list)) + return; + if (conn->flags & CO_FL_SESS_IDLE) sess->idle_conns--; LIST_DEL_INIT(&conn->session_list); + conn->owner = NULL; list_for_each_entry(srv_list, &sess->srv_list, srv_list) { if (srv_list->target == conn->target) { if (LIST_ISEMPTY(&srv_list->conn_list)) { diff --git a/src/session.c b/src/session.c index b065ff3d22..710b6b6f61 100644 --- a/src/session.c +++ b/src/session.c @@ -99,6 +99,7 @@ void session_free(struct session *sess) void conn_session_free(struct connection *conn) { session_free(conn->owner); + conn->owner = NULL; } /* count a new session to keep frontend, listener and track stats up to date */ -- 2.39.5