Pull eventpoll updates from Christian Brauner:
- eventpoll clarity refactor
The recent eventpoll UAF fixes (a6dc643c6931 and follow-ups) depended
on invariants in fs/eventpoll.c that were nowhere documented and had
to be reverse-engineered from the code: the lifetime relationships
between struct eventpoll, struct epitem, and struct file, the three
removal paths coordinating via epi_fget() pins and ep->mtx, the
ovflist sentinel-encoded scan state machine, the POLLFREE
release/acquire handshake, and the loop / path check globals
serialized by epnested_mutex. The fixes were correct but the next
person to touch this code would hit the same learning curve.
This series codifies those invariants in source and tightens the
surrounding structure. No functional changes intended:
- Documentation: a top-of-file overview with field-protection
tables for struct eventpoll and struct epitem, a section
gathering the loop-check / path-check globals next to their
declarations, labelled comments on the two sides of the POLLFREE
handshake, refreshed comments on epi_fget() and ep_remove_file(),
and a docblock on ep_clear_and_put() that names its two-pass
structure as load-bearing.
- Mechanical renames: ep_refcount_dec_and_test() -> ep_put() to
pair with ep_get(), attach_epitem() -> ep_attach_file() for
ep_remove_file() symmetry, the unused depth argument dropped from
epoll_mutex_lock(), and the CONFIG_KCMP block relocated next to
CONFIG_COMPAT so the hot-path code is contiguous.
- Helper extraction: ep_insert() splits into ep_alloc_epitem() and
ep_register_epitem(), ep_clear_and_put()'s two passes become
ep_drain_pollwaits() and ep_drain_tree() so the ordering
invariant is enforced by the call sequence rather than
convention, the per-event delivery loop body becomes
ep_deliver_event(), and the ep->mtx + epnested_mutex acquisition
dance lifts out of do_epoll_ctl() into ep_ctl_lock() /
ep_ctl_unlock().
- Sentinel and predicate cleanup: the EP_UNACTIVE_PTR overload is
hidden behind named helpers (ep_is_scanning, epi_on_ovflist,
...), epi->next is renamed to epi->ovflist_next, and the boolean
predicates return bool.
- The per-CTL_ADD scratch state (tfile_check_list, path_count[],
inserting_into) moves from file-scope globals into a
stack-allocated struct ep_ctl_ctx plumbed through the loop / path
check chain.
Two follow-up fixes are included: missing kernel-doc for the new @ctx
parameters, and restoring the EP_UNACTIVE_PTR sentinel for
ctx->tfile_check_list - replacing it with NULL termination broke
ep_remove_file()'s "never listed" check for the list tail, causing a
syzbot-reported use-after-free.
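As an illustration of why the sentinel matters, here is a minimal
userspace sketch, not the kernel code. struct node, LIST_END, push()
and never_listed() are hypothetical stand-ins; LIST_END plays the role
of EP_UNACTIVE_PTR. It assumes, as described above, that a NULL next
pointer is what marks an entry as "never listed".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's EP_UNACTIVE_PTR sentinel. */
#define LIST_END ((struct node *)-1L)

struct node {
	struct node *next;	/* NULL: never listed; LIST_END: list tail */
};

/* Push n onto the list whose current head (or terminator) is *head. */
static void push(struct node **head, struct node *n)
{
	n->next = *head;
	*head = n;
}

/* The "never listed" check: only valid if the list is not NULL-terminated. */
static bool never_listed(const struct node *n)
{
	return n->next == NULL;
}
```

With a list that starts out as LIST_END, the tail entry has
next == LIST_END and is correctly seen as listed. If the list starts
out as NULL instead, the tail has next == NULL and is indistinguishable
from an entry that was never listed, which is the ambiguity the
follow-up fix removes.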
- io_uring related epoll cleanups
One of the nastier things about epoll is how it allows nesting
contexts inside each other, leading to the necessity of loop
detection and the issues that have come with that. There is no reason
to support nesting on the io_uring side, so contain the damage and
disallow nested contexts from there: eventpoll gains a file based
control interface and struct epoll_filefd is renamed to epoll_key.
The io_uring side proper goes on top of this through the block tree.
- Fix epoll_wait() reporting false negatives
ep_events_available() checks ep->rdllist and ep_is_scanning() without
a lock and can race with a concurrent scan such that neither check
sees the events, causing epoll_wait() with a zero timeout to wrongly
report no events even though events are available. A sequence lock
closes the race and a reproducer is added to the eventpoll selftests.
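The idea can be sketched in userspace C11 as below. This is a model
under stated assumptions, not the kernel implementation: struct
ep_model and all function names are hypothetical, "ready" stands in
for a non-empty ep->rdllist, "scanning" for ep_is_scanning(), and
writers are assumed to be serialized by some other lock.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the two unlocked checks plus a sequence counter. */
struct ep_model {
	atomic_uint seq;	/* odd while a state transition is in progress */
	atomic_bool ready;	/* models a non-empty ready list */
	atomic_bool scanning;	/* models an active scan */
};

static void ep_model_init(struct ep_model *ep)
{
	atomic_init(&ep->seq, 0);
	atomic_init(&ep->ready, false);
	atomic_init(&ep->scanning, false);
}

/* Scan start: events leave the ready list for a private list. */
static void scan_begin(struct ep_model *ep)
{
	atomic_fetch_add(&ep->seq, 1);		/* odd: transition begins */
	atomic_store(&ep->scanning, true);
	atomic_store(&ep->ready, false);
	atomic_fetch_add(&ep->seq, 1);		/* even: transition done */
}

/* Scan end: undelivered events may be put back on the ready list. */
static void scan_end(struct ep_model *ep, bool requeue)
{
	atomic_fetch_add(&ep->seq, 1);
	atomic_store(&ep->ready, requeue);
	atomic_store(&ep->scanning, false);
	atomic_fetch_add(&ep->seq, 1);
}

/* Reader: retry if a transition overlapped the two checks. */
static bool events_available(struct ep_model *ep)
{
	unsigned int start;
	bool avail;

	do {
		start = atomic_load(&ep->seq);
		avail = atomic_load(&ep->ready) ||
			atomic_load(&ep->scanning);
	} while ((start & 1) || atomic_load(&ep->seq) != start);

	return avail;
}
```

Without the counter, a reader could see "ready" as false while a scan
is running and then see "scanning" as false after scan_end() has
requeued the events, and so report nothing. With it, the reader
notices that the counter changed and repeats both checks.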
* tag 'vfs-7.2-rc1.eventpoll' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (25 commits)
eventpoll: restore EP_UNACTIVE_PTR sentinel for ctx->tfile_check_list
eventpoll: Fix epoll_wait() report false negative
selftests/eventpoll: Add test for multiple waiters
eventpoll: add missing kernel-doc for @ctx function parameters
eventpoll: rename struct epoll_filefd to epoll_key
eventpoll: add file based control interface
eventpoll: export is_file_epoll()
eventpoll: pass struct epoll_filefd through ep_find() and ep_insert()
eventpoll: hoist CTL_ADD scratch state into struct ep_ctl_ctx
eventpoll: use bool for predicate helpers
eventpoll: rename epi->next and txlist for clarity
eventpoll: wrap EP_UNACTIVE_PTR in typed sentinel helpers
eventpoll: extract lock dance from do_epoll_ctl() into ep_ctl_lock()
eventpoll: extract ep_deliver_event() from ep_send_events()
eventpoll: split ep_clear_and_put() into drain helpers
eventpoll: split ep_insert() into alloc + register stages
eventpoll: relocate KCMP helpers near compat syscalls
eventpoll: rename attach_epitem() to ep_attach_file()
eventpoll: drop unused depth argument from epoll_mutex_lock()
eventpoll: rename ep_refcount_dec_and_test() to ep_put()
...