There are situations where reclaim kicks in on a system that still has free memory. One possible cause is a NUMA imbalance, where one or more nodes are under pressure while others have memory to spare. It would help if we could easily identify such nodes.
Move the pgscan, pgsteal, and pgrefill counters from vm_event_item to
node_stat_item to provide per-node reclaim visibility. With these
counters as node stats, the values are now displayed in the per-node
section of /proc/zoneinfo, which allows for quick identification of the
affected nodes.
/proc/vmstat continues to report the same counters, aggregated across all
nodes. But the ordering of these items within the readout changes as they
move from the vm events section to the node stats section.
Memcg accounting of these counters is preserved. The relocated counters
remain visible in memory.stat alongside the existing aggregate pgscan and
pgsteal counters.
However, this change affects how the global counters are accumulated.
Previously, the global event count update was gated on !cgroup_reclaim(),
excluding memcg-based reclaim from /proc/vmstat. Now that
mod_lruvec_state() is being used to update the counters, the global
counters will include all reclaim. This is consistent with how pgdemote
counters are already tracked.
Finally, the virtio_balloon driver is updated to use
global_node_page_state() to fetch the counters, as they are no longer
accessible through the vm_events array.
Link: https://lkml.kernel.org/r/20260219235846.161910-1-jp.kobryn@linux.dev
Signed-off-by: JP Kobryn <jp.kobryn@linux.dev>
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Eugenio Pérez <eperezma@redhat.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
update_stat(vb, idx++, VIRTIO_BALLOON_S_ALLOC_STALL, stall);
update_stat(vb, idx++, VIRTIO_BALLOON_S_ASYNC_SCAN,
- pages_to_bytes(events[PGSCAN_KSWAPD]));
+ pages_to_bytes(global_node_page_state(PGSCAN_KSWAPD)));
update_stat(vb, idx++, VIRTIO_BALLOON_S_DIRECT_SCAN,
- pages_to_bytes(events[PGSCAN_DIRECT]));
+ pages_to_bytes(global_node_page_state(PGSCAN_DIRECT)));
update_stat(vb, idx++, VIRTIO_BALLOON_S_ASYNC_RECLAIM,
- pages_to_bytes(events[PGSTEAL_KSWAPD]));
+ pages_to_bytes(global_node_page_state(PGSTEAL_KSWAPD)));
update_stat(vb, idx++, VIRTIO_BALLOON_S_DIRECT_RECLAIM,
- pages_to_bytes(events[PGSTEAL_DIRECT]));
+ pages_to_bytes(global_node_page_state(PGSTEAL_DIRECT)));
#ifdef CONFIG_HUGETLB_PAGE
update_stat(vb, idx++, VIRTIO_BALLOON_S_HTLB_PGALLOC,
PGDEMOTE_DIRECT,
PGDEMOTE_KHUGEPAGED,
PGDEMOTE_PROACTIVE,
+ PGSTEAL_KSWAPD,
+ PGSTEAL_DIRECT,
+ PGSTEAL_KHUGEPAGED,
+ PGSTEAL_PROACTIVE,
+ PGSTEAL_ANON,
+ PGSTEAL_FILE,
+ PGSCAN_KSWAPD,
+ PGSCAN_DIRECT,
+ PGSCAN_KHUGEPAGED,
+ PGSCAN_PROACTIVE,
+ PGSCAN_ANON,
+ PGSCAN_FILE,
+ PGREFILL,
#ifdef CONFIG_HUGETLB_PAGE
NR_HUGETLB,
#endif
PGFREE, PGACTIVATE, PGDEACTIVATE, PGLAZYFREE,
PGFAULT, PGMAJFAULT,
PGLAZYFREED,
- PGREFILL,
PGREUSE,
- PGSTEAL_KSWAPD,
- PGSTEAL_DIRECT,
- PGSTEAL_KHUGEPAGED,
- PGSTEAL_PROACTIVE,
- PGSCAN_KSWAPD,
- PGSCAN_DIRECT,
- PGSCAN_KHUGEPAGED,
- PGSCAN_PROACTIVE,
PGSCAN_DIRECT_THROTTLE,
- PGSCAN_ANON,
- PGSCAN_FILE,
- PGSTEAL_ANON,
- PGSTEAL_FILE,
#ifdef CONFIG_NUMA
PGSCAN_ZONE_RECLAIM_SUCCESS,
PGSCAN_ZONE_RECLAIM_FAILED,
PGDEMOTE_DIRECT,
PGDEMOTE_KHUGEPAGED,
PGDEMOTE_PROACTIVE,
+ PGSTEAL_KSWAPD,
+ PGSTEAL_DIRECT,
+ PGSTEAL_KHUGEPAGED,
+ PGSTEAL_PROACTIVE,
+ PGSTEAL_ANON,
+ PGSTEAL_FILE,
+ PGSCAN_KSWAPD,
+ PGSCAN_DIRECT,
+ PGSCAN_KHUGEPAGED,
+ PGSCAN_PROACTIVE,
+ PGSCAN_ANON,
+ PGSCAN_FILE,
+ PGREFILL,
#ifdef CONFIG_HUGETLB_PAGE
NR_HUGETLB,
#endif
#endif
PSWPIN,
PSWPOUT,
- PGSCAN_KSWAPD,
- PGSCAN_DIRECT,
- PGSCAN_KHUGEPAGED,
- PGSCAN_PROACTIVE,
- PGSTEAL_KSWAPD,
- PGSTEAL_DIRECT,
- PGSTEAL_KHUGEPAGED,
- PGSTEAL_PROACTIVE,
PGFAULT,
PGMAJFAULT,
- PGREFILL,
PGACTIVATE,
PGDEACTIVATE,
PGLAZYFREE,
{ "pgdemote_direct", PGDEMOTE_DIRECT },
{ "pgdemote_khugepaged", PGDEMOTE_KHUGEPAGED },
{ "pgdemote_proactive", PGDEMOTE_PROACTIVE },
+ { "pgsteal_kswapd", PGSTEAL_KSWAPD },
+ { "pgsteal_direct", PGSTEAL_DIRECT },
+ { "pgsteal_khugepaged", PGSTEAL_KHUGEPAGED },
+ { "pgsteal_proactive", PGSTEAL_PROACTIVE },
+ { "pgscan_kswapd", PGSCAN_KSWAPD },
+ { "pgscan_direct", PGSCAN_DIRECT },
+ { "pgscan_khugepaged", PGSCAN_KHUGEPAGED },
+ { "pgscan_proactive", PGSCAN_PROACTIVE },
+ { "pgrefill", PGREFILL },
#ifdef CONFIG_NUMA_BALANCING
{ "pgpromote_success", PGPROMOTE_SUCCESS },
#endif
case PGDEMOTE_DIRECT:
case PGDEMOTE_KHUGEPAGED:
case PGDEMOTE_PROACTIVE:
+ case PGSTEAL_KSWAPD:
+ case PGSTEAL_DIRECT:
+ case PGSTEAL_KHUGEPAGED:
+ case PGSTEAL_PROACTIVE:
+ case PGSCAN_KSWAPD:
+ case PGSCAN_DIRECT:
+ case PGSCAN_KHUGEPAGED:
+ case PGSCAN_PROACTIVE:
+ case PGREFILL:
#ifdef CONFIG_NUMA_BALANCING
case PGPROMOTE_SUCCESS:
#endif
/* Accumulated memory events */
seq_buf_printf(s, "pgscan %lu\n",
- memcg_events(memcg, PGSCAN_KSWAPD) +
- memcg_events(memcg, PGSCAN_DIRECT) +
- memcg_events(memcg, PGSCAN_PROACTIVE) +
- memcg_events(memcg, PGSCAN_KHUGEPAGED));
+ memcg_page_state(memcg, PGSCAN_KSWAPD) +
+ memcg_page_state(memcg, PGSCAN_DIRECT) +
+ memcg_page_state(memcg, PGSCAN_PROACTIVE) +
+ memcg_page_state(memcg, PGSCAN_KHUGEPAGED));
seq_buf_printf(s, "pgsteal %lu\n",
- memcg_events(memcg, PGSTEAL_KSWAPD) +
- memcg_events(memcg, PGSTEAL_DIRECT) +
- memcg_events(memcg, PGSTEAL_PROACTIVE) +
- memcg_events(memcg, PGSTEAL_KHUGEPAGED));
+ memcg_page_state(memcg, PGSTEAL_KSWAPD) +
+ memcg_page_state(memcg, PGSTEAL_DIRECT) +
+ memcg_page_state(memcg, PGSTEAL_PROACTIVE) +
+ memcg_page_state(memcg, PGSTEAL_KHUGEPAGED));
for (i = 0; i < ARRAY_SIZE(memcg_vm_event_stat); i++) {
#ifdef CONFIG_MEMCG_V1
unsigned long nr_taken;
struct reclaim_stat stat;
bool file = is_file_lru(lru);
- enum vm_event_item item;
+ enum node_stat_item item;
struct pglist_data *pgdat = lruvec_pgdat(lruvec);
bool stalled = false;
__mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, nr_taken);
item = PGSCAN_KSWAPD + reclaimer_offset(sc);
- if (!cgroup_reclaim(sc))
- __count_vm_events(item, nr_scanned);
- count_memcg_events(lruvec_memcg(lruvec), item, nr_scanned);
- __count_vm_events(PGSCAN_ANON + file, nr_scanned);
+ mod_lruvec_state(lruvec, item, nr_scanned);
+ mod_lruvec_state(lruvec, PGSCAN_ANON + file, nr_scanned);
spin_unlock_irq(&lruvec->lru_lock);
stat.nr_demoted);
__mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken);
item = PGSTEAL_KSWAPD + reclaimer_offset(sc);
- if (!cgroup_reclaim(sc))
- __count_vm_events(item, nr_reclaimed);
- count_memcg_events(lruvec_memcg(lruvec), item, nr_reclaimed);
- __count_vm_events(PGSTEAL_ANON + file, nr_reclaimed);
+ mod_lruvec_state(lruvec, item, nr_reclaimed);
+ mod_lruvec_state(lruvec, PGSTEAL_ANON + file, nr_reclaimed);
lru_note_cost_unlock_irq(lruvec, file, stat.nr_pageout,
nr_scanned - nr_reclaimed);
__mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, nr_taken);
- if (!cgroup_reclaim(sc))
- __count_vm_events(PGREFILL, nr_scanned);
- count_memcg_events(lruvec_memcg(lruvec), PGREFILL, nr_scanned);
+ mod_lruvec_state(lruvec, PGREFILL, nr_scanned);
spin_unlock_irq(&lruvec->lru_lock);
{
int i;
int gen;
- enum vm_event_item item;
+ enum node_stat_item item;
int sorted = 0;
int scanned = 0;
int isolated = 0;
int scan_batch = min(nr_to_scan, MAX_LRU_BATCH);
int remaining = scan_batch;
struct lru_gen_folio *lrugen = &lruvec->lrugen;
- struct mem_cgroup *memcg = lruvec_memcg(lruvec);
VM_WARN_ON_ONCE(!list_empty(list));
}
item = PGSCAN_KSWAPD + reclaimer_offset(sc);
- if (!cgroup_reclaim(sc)) {
- __count_vm_events(item, isolated);
- __count_vm_events(PGREFILL, sorted);
- }
- count_memcg_events(memcg, item, isolated);
- count_memcg_events(memcg, PGREFILL, sorted);
- __count_vm_events(PGSCAN_ANON + type, isolated);
+ mod_lruvec_state(lruvec, item, isolated);
+ mod_lruvec_state(lruvec, PGREFILL, sorted);
+ mod_lruvec_state(lruvec, PGSCAN_ANON + type, isolated);
trace_mm_vmscan_lru_isolate(sc->reclaim_idx, sc->order, scan_batch,
scanned, skipped, isolated,
type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON);
LIST_HEAD(clean);
struct folio *folio;
struct folio *next;
- enum vm_event_item item;
+ enum node_stat_item item;
struct reclaim_stat stat;
struct lru_gen_mm_walk *walk;
bool skip_retry = false;
stat.nr_demoted);
item = PGSTEAL_KSWAPD + reclaimer_offset(sc);
- if (!cgroup_reclaim(sc))
- __count_vm_events(item, reclaimed);
- count_memcg_events(memcg, item, reclaimed);
- __count_vm_events(PGSTEAL_ANON + type, reclaimed);
+ mod_lruvec_state(lruvec, item, reclaimed);
+ mod_lruvec_state(lruvec, PGSTEAL_ANON + type, reclaimed);
spin_unlock_irq(&lruvec->lru_lock);
[I(PGDEMOTE_DIRECT)] = "pgdemote_direct",
[I(PGDEMOTE_KHUGEPAGED)] = "pgdemote_khugepaged",
[I(PGDEMOTE_PROACTIVE)] = "pgdemote_proactive",
+ [I(PGSTEAL_KSWAPD)] = "pgsteal_kswapd",
+ [I(PGSTEAL_DIRECT)] = "pgsteal_direct",
+ [I(PGSTEAL_KHUGEPAGED)] = "pgsteal_khugepaged",
+ [I(PGSTEAL_PROACTIVE)] = "pgsteal_proactive",
+ [I(PGSTEAL_ANON)] = "pgsteal_anon",
+ [I(PGSTEAL_FILE)] = "pgsteal_file",
+ [I(PGSCAN_KSWAPD)] = "pgscan_kswapd",
+ [I(PGSCAN_DIRECT)] = "pgscan_direct",
+ [I(PGSCAN_KHUGEPAGED)] = "pgscan_khugepaged",
+ [I(PGSCAN_PROACTIVE)] = "pgscan_proactive",
+ [I(PGSCAN_ANON)] = "pgscan_anon",
+ [I(PGSCAN_FILE)] = "pgscan_file",
+ [I(PGREFILL)] = "pgrefill",
#ifdef CONFIG_HUGETLB_PAGE
[I(NR_HUGETLB)] = "nr_hugetlb",
#endif
[I(PGMAJFAULT)] = "pgmajfault",
[I(PGLAZYFREED)] = "pglazyfreed",
- [I(PGREFILL)] = "pgrefill",
[I(PGREUSE)] = "pgreuse",
- [I(PGSTEAL_KSWAPD)] = "pgsteal_kswapd",
- [I(PGSTEAL_DIRECT)] = "pgsteal_direct",
- [I(PGSTEAL_KHUGEPAGED)] = "pgsteal_khugepaged",
- [I(PGSTEAL_PROACTIVE)] = "pgsteal_proactive",
- [I(PGSCAN_KSWAPD)] = "pgscan_kswapd",
- [I(PGSCAN_DIRECT)] = "pgscan_direct",
- [I(PGSCAN_KHUGEPAGED)] = "pgscan_khugepaged",
- [I(PGSCAN_PROACTIVE)] = "pgscan_proactive",
[I(PGSCAN_DIRECT_THROTTLE)] = "pgscan_direct_throttle",
- [I(PGSCAN_ANON)] = "pgscan_anon",
- [I(PGSCAN_FILE)] = "pgscan_file",
- [I(PGSTEAL_ANON)] = "pgsteal_anon",
- [I(PGSTEAL_FILE)] = "pgsteal_file",
#ifdef CONFIG_NUMA
[I(PGSCAN_ZONE_RECLAIM_SUCCESS)] = "zone_reclaim_success",