Under heavy memcg-driven slab reclaim with many memcgs and CPUs,
shrink_slab_memcg() invokes the per-superblock count callback once per
(memcg, NUMA node) tuple. For btrfs that callback reaches
percpu_counter_sum_positive() on fs_info->evictable_extent_maps, which
takes the percpu_counter's raw spinlock with IRQs disabled and walks
every online CPU. With hundreds of memcgs driving reclaim on a host with
dozens of CPUs, this counter lock becomes a global serialization point:
profiles show CPU pinned in the spin_lock_irqsave acquire under
__percpu_counter_sum, with cross-CPU IPIs hitting csd_lock_wait_toolong
while waiting for spinning vCPUs.
The shrinker count is advisory -- super_cache_count() already notes
"counts can change between super_cache_count and super_cache_scan, so we
really don't need locks here." Use percpu_counter_read_positive(), which
is lockless. Worst-case skew is bounded by batch * num_online_cpus (a
few thousand), negligible compared to the millions of extent maps a busy
filesystem accumulates and well within the noise that the shrinker
already tolerates.
Tested-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: Ben Maurer <bmaurer@meta.com>
Signed-off-by: David Sterba <dsterba@suse.com>
static long btrfs_nr_cached_objects(struct super_block *sb, struct shrink_control *sc)
{
struct btrfs_fs_info *fs_info = btrfs_sb(sb);
- const s64 nr = percpu_counter_sum_positive(&fs_info->evictable_extent_maps);
+ const s64 nr = percpu_counter_read_positive(&fs_info->evictable_extent_maps);
trace_btrfs_extent_map_shrinker_count(fs_info, nr);