During DLPAR CPU hotplug, newly added CPUs start in the RTAS stopped
(quiesced) state. If a kexec crash occurs before the guest starts these
CPUs via the start-cpu RTAS call, H_SIGNAL_SYS_RESET_ALL_OTHERS resets
them anyway, causing the kdump kernel to hang:

  [ 5.519483][ T1] Processor 0 is stuck.
  [ 11.089481][ T1] Processor 1 is stuck.

The hypervisor should only reset CPUs that the guest has started. The
cpu->env.quiesced flag tracks the RTAS stopped state: CPUs in this state
are already inactive and should not be reset.

Skip system reset for quiesced CPUs to prevent kdump hangs during CPU
hotplug operations.
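
The filtering rule can be illustrated outside QEMU with a minimal,
self-contained sketch. MockCpu and mock_reset_all_others() are
hypothetical names used only for illustration and are not part of the
QEMU code base; the sketch assumes only what is described above, namely
that the calling CPU and any quiesced CPU are skipped:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vCPU; holds only what the sketch needs. */
typedef struct MockCpu {
    bool quiesced; /* true while the CPU is in RTAS stopped state */
    int resets;    /* number of system resets delivered so far */
} MockCpu;

/*
 * Deliver a mock system reset to every CPU except the caller (index
 * 'self') and any CPU that is still quiesced. Returns the number of
 * CPUs that were reset.
 */
static int mock_reset_all_others(MockCpu *cpus, size_t n, size_t self)
{
    int count = 0;

    assert(cpus != NULL);

    for (size_t i = 0; i < n; i++) {
        if (i == self) {
            continue; /* "all others" excludes the calling CPU */
        }
        if (cpus[i].quiesced) {
            continue; /* never started by the guest: leave it alone */
        }
        cpus[i].resets++;
        count++;
    }
    return count;
}
```

With two started CPUs and two hotplugged but not yet started CPUs, a
call from CPU 0 resets only CPU 1 in this sketch.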
Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
Cc: Harsh Prateek Bora <harshpb@linux.ibm.com>
Cc: Mahesh J Salgaonkar <mahesh@linux.ibm.com>
Reported-by: Anushree Mathur <anushree.mathur@linux.vnet.ibm.com>
Suggested-by: Vishal Chourasia <vishalc@linux.ibm.com>
Reviewed-by: Vishal Chourasia <vishalc@linux.ibm.com>
Signed-off-by: Shivang Upadhyay <shivangu@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20260511095055.82495-1-shivangu@linux.ibm.com
[harshpb: expanded comment to elaborate more on the rationale]
Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
                     continue;
                 }
             }
+
+            /* Skip quiesced CPUs - they are in RTAS stopped state and
+             * should not be reset. This prevents kdump hangs when CPUs
+             * are hotplugged but not yet started by the guest.
+             */
+            if (c->env.quiesced) {
+                continue;
+            }
+
             run_on_cpu(cs, spapr_do_system_reset_on_cpu, RUN_ON_CPU_NULL);
         }
         return H_SUCCESS;