bpf: switch task_vma iterator from mmap_lock to per-VMA locks
author Puranjay Mohan <puranjay@kernel.org>
Wed, 8 Apr 2026 15:45:36 +0000 (08:45 -0700)
committer Alexei Starovoitov <ast@kernel.org>
Fri, 10 Apr 2026 19:05:16 +0000 (12:05 -0700)
commit bee9ef4a40a277bf401be43d39ba7f7f063cf39c
tree 7b18af20121c31ba3b3266419454214e2ad9d6d0
parent d8e27d2d22b6e2df3a0125b8c08e9aace38c954c

The open-coded task_vma iterator holds mmap_lock for the entire duration
of iteration, increasing contention on this highly contended lock.

Switch to per-VMA locking. Find the next VMA via an RCU-protected maple
tree walk and lock it with lock_vma_under_rcu(). lock_next_vma() is not
used because its fallback takes mmap_read_lock(), and the iterator must
work in non-sleepable contexts.

lock_vma_under_rcu() is a point lookup (mas_walk) that finds the VMA
containing a given address but cannot iterate across gaps. An
RCU-protected vma_next() walk (mas_find) first locates the next VMA's
vm_start to pass to lock_vma_under_rcu().
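
The difference between the two lookups can be illustrated with a small
userspace model. This is only a sketch under stated assumptions: struct
range, lookup_containing(), find_next() and next_then_lookup() are
hypothetical stand-ins written for this illustration, a sorted array
replaces the maple tree, and no locking or RCU is modelled.

```c
#include <stddef.h>

/* Hypothetical stand-in for a VMA: the half-open range [start, end). */
struct range {
	unsigned long start;
	unsigned long end;
};

/*
 * Point lookup, analogous in spirit to mas_walk(): return the range
 * containing addr, or NULL when addr falls into a gap.
 */
const struct range *lookup_containing(const struct range *r, size_t n,
				      unsigned long addr)
{
	for (size_t i = 0; i < n; i++)
		if (r[i].start <= addr && addr < r[i].end)
			return &r[i];
	return NULL;
}

/*
 * Forward search, analogous in spirit to mas_find(): return the first
 * range that ends above addr, skipping any gap. Assumes r is sorted by
 * address and non-overlapping.
 */
const struct range *find_next(const struct range *r, size_t n,
			      unsigned long addr)
{
	for (size_t i = 0; i < n; i++)
		if (r[i].end > addr)
			return &r[i];
	return NULL;
}

/*
 * Two-step lookup: find the next range first, then repeat the point
 * lookup at its start address, which is guaranteed not to be in a gap
 * as long as the map has not changed in between.
 */
const struct range *next_then_lookup(const struct range *r, size_t n,
				     unsigned long addr)
{
	const struct range *next = find_next(r, n, addr);

	if (!next)
		return NULL;
	return lookup_containing(r, n, next->start);
}
```

With ranges [0x1000, 0x3000) and [0x8000, 0x9000), a point lookup at
0x4000 finds nothing, while the two-step lookup from 0x4000 returns the
second range. In the kernel the second step can still fail, because the
VMA may change between the two steps.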

Between the RCU walk and the lock, the VMA may be removed, shrunk, or
write-locked. On failure, advance past the VMA using the vm_end read
during the RCU walk. Because the VMA slab is SLAB_TYPESAFE_BY_RCU, that
vm_end may be stale; fall back to advancing by PAGE_SIZE when it would
not make forward progress. Concurrent VMA insertions at addresses the
iterator has already passed are not detected.
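
One plausible reading of the advancement rule, written as a pure
function so it can be checked in isolation. This is a sketch, not the
code from kernel/bpf/task_iter.c: next_addr_after_failure() and
PAGE_SIZE_SKETCH are hypothetical names, and the 4096-byte page size is
an assumption.

```c
/* Assumed page size for this sketch; the kernel uses PAGE_SIZE. */
#define PAGE_SIZE_SKETCH 4096UL

/*
 * Choose the address to resume from after failing to lock a VMA.
 *
 * cur:      address the iterator was searching from
 * snap_end: vm_end as read during the RCU walk; may be stale because
 *           the VMA slab is SLAB_TYPESAFE_BY_RCU
 */
unsigned long next_addr_after_failure(unsigned long cur,
				      unsigned long snap_end)
{
	/* A usable snapshot moves the iterator past the failed VMA. */
	if (snap_end > cur)
		return snap_end;

	/* A stale snapshot would stall or go backwards: step one page. */
	return cur + PAGE_SIZE_SKETCH;
}
```

For example, starting from 0x1000 with a snapshot end of 0x3000 the
iterator resumes at 0x3000, while starting from 0x5000 with a stale
snapshot end of 0x3000 it resumes at 0x6000. Either way the address
strictly increases, so the iteration terminates.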

CONFIG_PER_VMA_LOCK is required; return -EOPNOTSUPP without it.
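
A compile-time gate of this kind might look roughly as follows. The
function name task_vma_iter_init_sketch() is hypothetical; the actual
entry point and its signature are not given in this message.

```c
#include <errno.h>

/*
 * Hypothetical init routine: refuse to set up the iterator when the
 * kernel was built without per-VMA locks.
 */
int task_vma_iter_init_sketch(void)
{
#ifdef CONFIG_PER_VMA_LOCK
	/* Per-VMA locks are available; setup would continue here. */
	return 0;
#else
	/* No per-VMA locks: report that the operation is unsupported. */
	return -EOPNOTSUPP;
#endif
}
```

Built in userspace, where CONFIG_PER_VMA_LOCK is not defined, the
function returns -EOPNOTSUPP.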

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20260408154539.3832150-3-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
kernel/bpf/task_iter.c