x86/virt: Silence RCU lockdep splat in emergency virt callback path
author Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Mon, 4 May 2026 23:54:35 +0000 (04:54 +0500)
committer Sean Christopherson <seanjc@google.com>
Wed, 13 May 2026 16:53:43 +0000 (09:53 -0700)

x86_virt_invoke_kvm_emergency_callback() reaches rcu_dereference()
through machine_crash_shutdown() with IRQs disabled but with RCU not
necessarily watching the crashing CPU, which triggers a suspicious
RCU usage splat on debug kernels (CONFIG_PROVE_RCU=y) during
panic/kdump:

  WARNING: suspicious RCU usage
  arch/x86/virt/hw.c:52 suspicious rcu_dereference_check() usage!

  rcu_scheduler_active = 2, debug_locks = 1
  1 lock held by tee/11119:
   #0: ffff8881fa32c440 (sb_writers#3){.+.+}-{0:0}, at: ksys_write

  Call Trace:
   <TASK>
   dump_stack_lvl+0x84/0xd0
   lockdep_rcu_suspicious.cold+0x37/0x8f
   x86_virt_invoke_kvm_emergency_callback+0x5f/0x70
   x86_svm_emergency_disable_virtualization_cpu+0x2a/0x30
   x86_virt_emergency_disable_virtualization_cpu+0x6b/0x90
   native_machine_crash_shutdown+0x72/0x170
   __crash_kexec+0x137/0x280
   panic+0xce/0xd0
   sysrq_handle_crash+0x1f/0x20
   __handle_sysrq.cold+0x192/0x335
   write_sysrq_trigger+0x8c/0xc0
   proc_reg_write+0x1c3/0x3c0
   vfs_write+0x1d0/0xf80
   ksys_write+0x116/0x250
   do_syscall_64+0x11c/0x1480
   entry_SYSCALL_64_after_hwframe+0x76/0x7e
   </TASK>

A truly correct fix is non-trivial: the RCU usage genuinely is wrong in
panic context (RCU may ignore the crashing CPU during synchronization),
and a concurrent KVM module unload could in principle race with the
callback read; see commit 2baa33a8ddd6 ("KVM: x86: Leave user-return
notifier registered on reboot/shutdown") which notes that nothing
prevents module unload during panic/reboot.

However, the alternatives are worse:

  - smp_store_release()/smp_load_acquire() handle ordering but not
    liveness; the kernel still needs to keep the module text alive
    while the callback is in flight.
  - Taking a lock in the panic path is risky: any lock could be held
    by a CPU that has already been NMI'd to a halt.
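
The first bullet can be illustrated with a small userspace model. This is only a sketch using C11 atomics, not kernel code: `emergency_cb_t`, `registered_cb`, `register_cb()`, `unregister_cb()` and `invoke_cb()` are hypothetical stand-ins for the kernel's callback type, pointer and helpers. It shows that a release/acquire pair orders the pointer publication, yet nothing keeps the callee alive between the load and the call.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's callback type. */
typedef void (*emergency_cb_t)(void);

/* Hypothetical stand-in for the published callback pointer. */
static _Atomic(emergency_cb_t) registered_cb;

/* Counts how often the example callback ran. */
static int disable_calls;

/* Example callback; in the kernel this would live in module text. */
static void example_disable_virt(void)
{
	disable_calls++;
}

/* Publisher: the release store orders earlier setup before the pointer
 * becomes visible to readers. */
static void register_cb(emergency_cb_t cb)
{
	atomic_store_explicit(&registered_cb, cb, memory_order_release);
}

/* Unpublish: later readers see NULL, but nothing here waits for a reader
 * that has already loaded the old pointer. */
static void unregister_cb(void)
{
	atomic_store_explicit(&registered_cb, (emergency_cb_t)NULL,
			      memory_order_release);
}

/* Reader: the acquire load pairs with the release store above.
 * Returns 1 if a callback was invoked, 0 otherwise. */
static int invoke_cb(void)
{
	emergency_cb_t cb = atomic_load_explicit(&registered_cb,
						 memory_order_acquire);

	if (!cb)
		return 0;

	/*
	 * Liveness gap: if the code behind 'cb' could be freed right here
	 * (as with a module unload), the ordering above would not help.
	 */
	cb();
	return 1;
}
```

Calling `register_cb(example_disable_virt)` followed by `invoke_cb()` runs the callback once; after `unregister_cb()`, `invoke_cb()` returns 0. The model is single-threaded on purpose: the comment in `invoke_cb()` marks the window that ordering alone cannot close.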

Use rcu_dereference_raw() to silence the splat and accept the
vanishingly small remaining race. Panic context inherently cannot
guarantee complete correctness; the goal here is to keep debug builds
quiet on the kdump path so the splat doesn't obscure the actual
kernel state being captured.

Reproducible on a debug kernel (CONFIG_PROVE_LOCKING=y, CONFIG_PROVE_RCU=y)
with kvm_amd or kvm_intel loaded by triggering kdump:

  echo c > /proc/sysrq-trigger

Suggested-by: Sean Christopherson <seanjc@google.com>
Fixes: 428afac5a8ea ("KVM: x86: Move bulk of emergency virtualizaton logic to virt subsystem")
Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Acked-by: Sean Christopherson <seanjc@google.com>
Link: https://patch.msgid.link/20260504235435.90957-1-mikhail.v.gavrilov@gmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
arch/x86/virt/hw.c

index f647557d38ac527ac3024de38f15c86f215b2faa..7e9091c640be0b0bf89a7f620cf988ff8b22a000 100644
@@ -49,7 +49,20 @@ static void x86_virt_invoke_kvm_emergency_callback(void)
 {
        cpu_emergency_virt_cb *kvm_callback;
 
-       kvm_callback = rcu_dereference(kvm_emergency_callback);
+       /*
+        * RCU may not be watching the crashing CPU here, so rcu_dereference()
+        * triggers a suspicious-RCU-usage splat. In principle, a concurrent
+        * KVM module unload could race with this read; see commit 2baa33a8ddd6
+        * ("KVM: x86: Leave user-return notifier registered on reboot/shutdown")
+        * which notes that nothing prevents module unload during panic/reboot.
+        *
+        * However, taking a lock here would be riskier than the current race:
+        * the system is going down via NMI shootdown, and any lock could be
+        * held by an already-stopped CPU. Use rcu_dereference_raw() to silence
+        * the lockdep splat and accept the comically small remaining race;
+        * panic context inherently cannot guarantee complete correctness.
+        */
+       kvm_callback = rcu_dereference_raw(kvm_emergency_callback);
        if (kvm_callback)
                kvm_callback();
 }
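
As a rough illustration of what the hunk above changes, the following userspace sketch models a checked and an unchecked pointer read. It is an assumption-laden toy, not the kernel implementation: `published_cb`, `read_side_ok`, `splat_count`, `deref_checked()` and `deref_raw()` are hypothetical names, and `read_side_ok` merely stands in for whatever condition lockdep evaluates before warning. The point is that both accessors return the same pointer; only the diagnostic differs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for the kernel's callback type. */
typedef void (*model_cb_t)(void);

/* Hypothetical stand-ins; none of these are kernel APIs. */
static model_cb_t published_cb;	/* models the published callback pointer */
static bool read_side_ok;	/* models "the debug check considers this read legal" */
static int splat_count;		/* number of warnings emitted by the model */

/* Example callback that does nothing. */
static void model_noop_cb(void)
{
}

/* Models a checked read: warn when the read-side condition does not hold,
 * then return the pointer anyway. */
static model_cb_t deref_checked(void)
{
	if (!read_side_ok) {
		splat_count++;
		fprintf(stderr, "WARNING: suspicious RCU usage (model)\n");
	}
	return published_cb;
}

/* Models an unchecked read: same load, no diagnostic. */
static model_cb_t deref_raw(void)
{
	return published_cb;
}
```

With `published_cb = model_noop_cb` and `read_side_ok = false`, `deref_checked()` increments `splat_count` while `deref_raw()` leaves it unchanged, and both return `model_noop_cb`. The model deliberately omits memory ordering and concurrency; it does not make the read any safer, which matches the commit message's framing of the change as silencing a warning rather than closing the race.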