Christian Brauner <brauner@kernel.org> says:
The recent UAF series (
a6dc643c6931 and follow-ups) rode on
invariants in fs/eventpoll.c that were nowhere documented and had
to be reverse-engineered from the code: the lifetime relationships
between struct eventpoll, struct epitem, and struct file, the three
removal paths coordinating via epi_fget() pins and ep->mtx, the
ovflist sentinel-encoded scan state machine, the POLLFREE
release/acquire handshake, and the loop / path check globals
serialized by epnested_mutex. The fix was correct but the next
person to touch this code will hit the same learning curve.
This adds a bunch of documentation (a bunch of swearwords were removed
by having an llm go over it) and refactors. The end goal is hopefully a
bit more pallatable than what this is right now. No functional changes
intended yet.
This series codifies those invariants in source and tightens the
surrounding structure.
First there are a couple of pure documentation changes. A top-of-file
overview with field-protection tables for struct eventpoll and struct
epitem, a section gathering the loop-check / path-check globals next to
their declarations, labelled comments on the two sides of the POLLFREE
handshake, refreshed comments on epi_fget() and ep_remove_file() (whose
contract the UAF fix re-shaped), and a docblock on
ep_clear_and_put() that names its two-pass structure as load-bearing.
Next are a couple of mechanical naming cleanups.
ep_refcount_dec_and_test() -> ep_put() to pair with ep_get(); the unused
depth argument dropped from epoll_mutex_lock() (all three callers passed
zero); attach_epitem() -> ep_attach_file() for ep_remove_file()
symmetry; and the CONFIG_KCMP block relocated next to CONFIG_COMPAT so
the hot-path code is contiguous.
Next are a couple of changes that extract long bodies into named
helpers. ep_insert() splits into ep_alloc_epitem() and
ep_register_epitem(); ep_clear_and_put()'s two passes become
ep_drain_pollwaits() and ep_drain_tree() so the ordering invariant is
enforced by the call sequence rather than convention; the per-event
delivery loop body extracts from ep_send_events() as ep_deliver_event();
and the ep->mtx + epnested_mutex acquisition dance lifts out of
do_epoll_ctl() into ep_ctl_lock() / ep_ctl_unlock(), with a return value
that doubles as the @full_check argument to ep_insert().
Next are a couple of changes that address sentinel and predicate sprawl.
The EP_UNACTIVE_PTR overload (meaning "no scan in progress" on
ep->ovflist and "epi not on ovflist" on epi->next) is hidden behind
named helpers (ep_is_scanning, epi_on_ovflist, ...); epi->next is
renamed to epi->ovflist_next and the local txlist to scan_batch; and
is_file_epoll(), ep_is_linked(), ep_events_available() are converted to
return bool to match their already-boolean bodies.
And last we move the per-CTL_ADD scratch state (tfile_check_list,
path_count[], inserting_into) from file-scope globals into a
stack-allocated struct ep_ctl_ctx plumbed through the loop / path check
chain. loop_check_gen stays at file scope because the stamp it leaves on
ep->gen across calls must not collide with a future walk.
The load-bearing invariants the UAF series closed are preserved
verbatim: the epi_fget() pin in ep_remove(), the ordering of
ep_unregister_pollwait() before ep_remove_file() / ep_remove_epi()
in all three removal paths, kfree_rcu(epi) and kfree_rcu(ep), the
POLLFREE smp_store_release / smp_load_acquire pair on pwq->whead,
ep->lock IRQ-safety, the mutex_lock_nested() subclass arithmetic
in ep_insert (subclass 0 outer, 1 for tep) and __ep_eventpoll_poll
/ ep_loop_check_proc (depth-based), and the WARN_ON_ONCE contract
on ep_put() in ep_remove().
* patches from https://patch.msgid.link/
20260424-work-epoll-rework-v1-0-
249ed00a20f3@kernel.org:
eventpoll: hoist CTL_ADD scratch state into struct ep_ctl_ctx
eventpoll: use bool for predicate helpers
eventpoll: rename epi->next and txlist for clarity
eventpoll: wrap EP_UNACTIVE_PTR in typed sentinel helpers
eventpoll: extract lock dance from do_epoll_ctl() into ep_ctl_lock()
eventpoll: extract ep_deliver_event() from ep_send_events()
eventpoll: split ep_clear_and_put() into drain helpers
eventpoll: split ep_insert() into alloc + register stages
eventpoll: relocate KCMP helpers near compat syscalls
eventpoll: rename attach_epitem() to ep_attach_file()
eventpoll: drop unused depth argument from epoll_mutex_lock()
eventpoll: rename ep_refcount_dec_and_test() to ep_put()
eventpoll: document ep_clear_and_put() two-pass pattern
eventpoll: refresh epi_fget() / ep_remove_file() comments
eventpoll: clarify POLLFREE handshake comments
eventpoll: document loop-check / path-check globals
eventpoll: expand top-of-file overview / locking doc
Link: https://patch.msgid.link/20260424-work-epoll-rework-v1-0-249ed00a20f3@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>