kvm_gmem_get_policy() sets the interleave index (the output param that's
typically named "ilx") to the full page offset (vm_pgoff + vma offset).
But get_vma_policy() adds the page offset on top of the interleave index,
and so the offset is counted twice. This causes NUMA interleaving to skip
nodes: for order-0 pages the effective index jumps by 2 for each
consecutive page.
The vm_op.get_policy() implementation should return only a per-file bias in
the interleave index (like shmem_get_policy does with inode->i_ino),
letting get_vma_policy() add the page-offset component.
Fix by setting the output interleave index to the inode number (a la shmem)
instead of the full page offset, as the index is intended to be a constant,
semi-random value for a given file, e.g. so that interleaving doesn't start
at the same node for every file, and so that allocations are round-robined
across nodes based on the page offset (the selected node would bounce/skip
around if the index isn't constant).
Found by Sashiko (sashiko.dev) AI code review.
Fixes: ed1ffa810bd6 ("KVM: guest_memfd: Enforce NUMA mempolicy using shared policy")
Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Tested-by: Shivank Garg <shivankg@amd.com>
Fixes: 7f3779a3ac3e ("mm/filemap: Add NUMA mempolicy support to filemap_alloc_folio()")
Link: https://patch.msgid.link/0eff0a90667b900bee837d06b5db5025e1f304b5.1780501924.git.mst@redhat.com
[sean: use reverse fir-tree, massage changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
}
static struct mempolicy *kvm_gmem_get_policy(struct vm_area_struct *vma,
- unsigned long addr, pgoff_t *pgoff)
+ unsigned long addr, pgoff_t *ilx)
{
+ pgoff_t pgoff = vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT);
struct inode *inode = file_inode(vma->vm_file);
- *pgoff = vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT);
+ *ilx = inode->i_ino;
/*
* Return the memory policy for this index, or NULL if none is set.
* can then replace NULL with the default memory policy instead of the
* current task's memory policy.
*/
- return mpol_shared_policy_lookup(&GMEM_I(inode)->policy, *pgoff);
+ return mpol_shared_policy_lookup(&GMEM_I(inode)->policy, pgoff);
}
#endif /* CONFIG_NUMA */