]> git.ipfire.org Git - thirdparty/linux.git/commitdiff
KVM: Conditionally reschedule when resetting the dirty ring
authorSean Christopherson <seanjc@google.com>
Fri, 16 May 2025 21:35:37 +0000 (14:35 -0700)
committerSean Christopherson <seanjc@google.com>
Fri, 20 Jun 2025 20:39:42 +0000 (13:39 -0700)
When resetting a dirty ring, conditionally reschedule on each iteration
after the first.  The recently introduced hard limit mitigates the issue
of an endless reset, but isn't sufficient to completely prevent RCU
stalls, soft lockups, etc., nor is the hard limit intended to guard
against such badness.

Note!  Take care to check for reschedule even in the "continue" paths,
as a pathological scenario (or malicious userspace) could dirty the same
gfn over and over, i.e. always hit the continue path.

  rcu: INFO: rcu_sched self-detected stall on CPU
  rcu:  4-....: (5249 ticks this GP) idle=51e4/1/0x4000000000000000 softirq=309/309 fqs=2563
  rcu:  (t=5250 jiffies g=-319 q=608 ncpus=24)
  CPU: 4 UID: 1000 PID: 1067 Comm: dirty_log_test Tainted: G             L     6.13.0-rc3-17fa7a24ea1e-HEAD-vm #814
  Tainted: [L]=SOFTLOCKUP
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:kvm_arch_mmu_enable_log_dirty_pt_masked+0x26/0x200 [kvm]
  Call Trace:
   <TASK>
   kvm_reset_dirty_gfn.part.0+0xb4/0xe0 [kvm]
   kvm_dirty_ring_reset+0x58/0x220 [kvm]
   kvm_vm_ioctl+0x10eb/0x15d0 [kvm]
   __x64_sys_ioctl+0x8b/0xb0
   do_syscall_64+0x5b/0x160
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
   </TASK>
  Tainted: [L]=SOFTLOCKUP
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:kvm_arch_mmu_enable_log_dirty_pt_masked+0x17/0x200 [kvm]
  Call Trace:
   <TASK>
   kvm_reset_dirty_gfn.part.0+0xb4/0xe0 [kvm]
   kvm_dirty_ring_reset+0x58/0x220 [kvm]
   kvm_vm_ioctl+0x10eb/0x15d0 [kvm]
   __x64_sys_ioctl+0x8b/0xb0
   do_syscall_64+0x5b/0x160
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
   </TASK>

Fixes: fb04a1eddb1a ("KVM: X86: Implement ring-based dirty memory tracking")
Reviewed-by: James Houghton <jthoughton@google.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20250516213540.2546077-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
virt/kvm/dirty_ring.c

index e844e869e8c7f080a8d0ff35d0dbc584ace6ea33..97cca0c02fd1763de86146f3ab6c9318e8f099d0 100644 (file)
@@ -134,6 +134,16 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
 
                ring->reset_index++;
                (*nr_entries_reset)++;
+
+               /*
+                * While the size of each ring is fixed, it's possible for the
+                * ring to be constantly re-dirtied/harvested while the reset
+                * is in-progress (the hard limit exists only to guard against
+                * wrapping the count into negative space).
+                */
+               if (!first_round)
+                       cond_resched();
+
                /*
                 * Try to coalesce the reset operations when the guest is
                 * scanning pages in the same slot.