From: Willy Tarreau
Date: Thu, 30 Jan 2025 15:28:33 +0000 (+0100)
Subject: DEBUG: epoll: store and compare the FD's generation count with reported event
X-Git-Tag: v3.2-dev5~70
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=5012b6c6d9e2278ae247e3e372765be3c11af86c;p=thirdparty%2Fhaproxy.git

DEBUG: epoll: store and compare the FD's generation count with reported event

There have been some reported cases where races between threads in epoll
were causing wrong reports of close or error events. Since the epoll_event
data is 64 bits, we can store the FD's generation counter in the upper
bits to verify whether we're speaking about the same instance of the FD
as the current one or a stale one.

If the generation number does not match, we classify the report into one
of 3 conditions and increment the relevant COUNT_IF() counters (stale
report for a closed FD, stale report of a harmless event on a reopened
FD, stale report of HUP/ERR on a reopened FD).

Tests have shown that with heavy concurrency, a very small maxconn
(typically 1 per thread), http-reuse always, and a server closing
connections first but randomly (httpterm with /C=2r), such events can
happen at a pace of a few per second for the closed FDs, and a few per
minute for the other ones, so there's value in leaving this accessible
for troubleshooting. E.g. after a few minutes:

  Count     Type Location function(): "condition" [comment]
   5541      CNT ev_epoll.c:296 _do_poll(): "1" [epoll report of event on a just closed fd (harmless)]
     10      CNT ev_epoll.c:294 _do_poll(): "1" [epoll report of event on a closed recycled fd (rare)]
     42      CNT ev_epoll.c:289 _do_poll(): "1" [epoll report of HUP on a stale fd reopened on the same thread (suspicious)]
    212      CNT ev_epoll.c:279 _do_poll(): "1" [epoll report of HUP/ERR on a stale fd reopened on another thread (harmless)]
      1      CNT mux_h1.c:3911 h1_send(): "b_data(&h1c->obuf)" [connection error (send) with pending output data]

This was obtained with the following setup, which abuses thread
contention by starting 64 threads on two cores:

  - config:

        global
                nbthread 64
                stats socket /tmp/sock1 level admin
                stats timeout 1h

        defaults
                timeout client 5s
                timeout server 5s
                timeout connect 5s
                mode http

        listen p2
                bind :8002
                http-reuse always
                server s1 127.0.0.1:8000 maxconn 4

  - haproxy forcefully started on 2C4T:

        $ taskset -c 0,1,4,5 ./haproxy -db -f epoll-dbg.cfg

  - httpterm on port 8000, cpus 2,3,6,7 (2C4T)

  - h1load with responses larger than a single buffer, and randomly
    closing/keeping alive:

        $ taskset -c 2,3,6,7 h1load -e -t 4 -c 256 -r 1 0:8002/?s=19k/C=2r
---

diff --git a/src/ev_epoll.c b/src/ev_epoll.c
index 11e99310a..02ac0ff95 100644
--- a/src/ev_epoll.c
+++ b/src/ev_epoll.c
@@ -153,7 +153,7 @@ static void _update_fd(int fd)
 
   done:
 	ev.events &= ~epoll_mask;
-	ev.data.fd = fd;
+	ev.data.u64 = ((u64)fdtab[fd].generation << 32) + fd;
 	epoll_ctl(epoll_fd[tid], opcode, fd, &ev);
 }
 
@@ -252,9 +252,52 @@ static void _do_poll(struct poller *p, int exp, int wake)
 
 	for (count = 0; count < status; count++) {
 		unsigned int n, e;
+		uint64_t epoll_data;
+		uint ev_gen, fd_gen;
 
 		e = epoll_events[count].events;
-		fd = epoll_events[count].data.fd;
+		epoll_data = epoll_events[count].data.u64;
+
+		/* epoll_data contains the fd's generation in the 32 upper bits
+		 * and the fd in the 32 lower ones.
+		 */
+		fd = (uint32_t)epoll_data;
+		ev_gen = epoll_data >> 32;
+		fd_gen = _HA_ATOMIC_LOAD(&fdtab[fd].generation);
+
+		if (unlikely(ev_gen != fd_gen)) {
+			/* this is a stale report for an older instance of this FD */
+
+			if (_HA_ATOMIC_LOAD(&fdtab[fd].owner)) {
+				ulong tmask = _HA_ATOMIC_LOAD(&fdtab[fd].thread_mask);
+				if (!(tmask & ti->ltid_bit)) {
+					/* thread has changed, quite common, that's already
+					 * handled by fd_update_events(), let's just report
+					 * sensitive events for statistics purposes.
+					 */
+					if (e & (EPOLLRDHUP|EPOLLHUP|EPOLLERR))
+						COUNT_IF(1, "epoll report of HUP/ERR on a stale fd reopened on another thread (harmless)");
+				} else {
+					/* same thread but different generation, this smells bad,
+					 * maybe that could be caused by crossed takeovers with a
+					 * close() in between or something like this, but this is
+					 * something fd_update_events() cannot detect. It still
+					 * remains relatively safe for HUP because we consider it
+					 * once we've read all pending data.
+					 */
+					if (e & EPOLLERR)
+						COUNT_IF(1, "epoll report of ERR on a stale fd reopened on the same thread (suspicious)");
+					else if (e & (EPOLLRDHUP|EPOLLHUP))
+						COUNT_IF(1, "epoll report of HUP on a stale fd reopened on the same thread (suspicious)");
+					else
+						COUNT_IF(1, "epoll report of a harmless event on a stale fd reopened on the same thread (suspicious)");
+				}
+			} else if (ev_gen + 1 != fd_gen) {
+				COUNT_IF(1, "epoll report of event on a closed recycled fd (rare)");
+			} else {
+				COUNT_IF(1, "epoll report of event on a just closed fd (harmless)");
+			}
+		}
 
 		if ((e & EPOLLRDHUP) && !(cur_poller.flags & HAP_POLL_F_RDHUP))
 			_HA_ATOMIC_OR(&cur_poller.flags, HAP_POLL_F_RDHUP);
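
Editor's note: for readers less familiar with the trick used above, the
sketch below is a minimal, standalone illustration of the generation-
tagging idea, not HAProxy code. The fd_slots table, register_fd(),
release_fd() and event_is_current() are hypothetical names invented for
this example, and HAProxy's fdtab[] / _HA_ATOMIC_LOAD() machinery is
replaced with plain single-threaded C for brevity. The premise is the
one stated in the commit message: epoll_wait() may hand back events that
were queued for an earlier instance of an fd number, so the 64-bit data
field is used to carry enough identity (generation + fd) to tell the
instances apart.

    /* Sketch: tag each registered fd with a generation counter so that
     * stale epoll reports for an old instance of the fd can be detected.
     */
    #include <stdint.h>
    #include <unistd.h>
    #include <sys/epoll.h>

    struct fd_slot {
            uint32_t generation;   /* bumped each time the fd is released */
    };

    static struct fd_slot fd_slots[1024];

    static void register_fd(int epfd, int fd, uint32_t events)
    {
            struct epoll_event ev = { 0 };

            ev.events = events;
            /* generation in the upper 32 bits, fd in the lower 32 */
            ev.data.u64 = ((uint64_t)fd_slots[fd].generation << 32) | (uint32_t)fd;
            epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
    }

    /* on release, bump the generation so any in-flight event queued for
     * the old instance of this fd number will be detected as stale when
     * it pops out of epoll_wait().
     */
    static void release_fd(int fd)
    {
            fd_slots[fd].generation++;
            close(fd);
    }

    /* returns non-zero if the reported event matches the fd's current
     * instance, zero if it is stale (fd closed or recycled since then).
     */
    static int event_is_current(const struct epoll_event *ev)
    {
            int fd = (uint32_t)ev->data.u64;
            uint32_t ev_gen = ev->data.u64 >> 32;

            return ev_gen == fd_slots[fd].generation;
    }

In the patch itself, the comparison is refined further than this sketch:
whether the fd currently has an owner, whether it moved to another
thread, and whether the generation gap is exactly one (just closed) or
larger (closed and recycled) select which COUNT_IF() counter gets
incremented.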