-From stable-bounces@linux.kernel.org Fri Dec 8 03:26:13 2006
-Message-Id: <200612081120.kB8BKZqK019065@shell0.pdx.osdl.net>
-To: ak@muc.de
-From: akpm@osdl.org
-Date: Fri, 08 Dec 2006 03:20:34 -0800
-Cc: akpm@osdl.org, shai@scalex86.org, stable@kernel.org, kiran@scalex86.org
-Subject: x86_64: fix boot hang due to nmi watchdog init code
-
+From 92715e282be7c7488f892703c8d39b08976a833b Mon Sep 17 00:00:00 2001
From: Ravikiran G Thirumalai <kiran@scalex86.org>
+Date: Sat, 9 Dec 2006 21:33:35 +0100
+Subject: x86: Fix boot hang due to nmi watchdog init code
-2.6.19 stopped booting (or booted based on build/config) on our x86_64
+2.6.19 stopped booting (or booted based on build/config) on our x86_64
systems due to a bug introduced in 2.6.19. check_nmi_watchdog schedules an
-IPI on all cpus to busy wait on a flag, but fails to set the busywait flag
-if NMI functionality is disabled.
-
-This causes the secondary cpus to spin in an endless loop, causing the
-kernel bootup to hang.
+IPI on all cpus to busy wait on a flag, but fails to set the busywait
+flag if NMI functionality is disabled. This causes the secondary cpus
+to spin in an endless loop, causing the kernel bootup to hang.
+Depending upon the build, the busywait flag (a stack variable) got overwritten,
+which allowed the kernel to boot up on certain builds. The following patch fixes
+the bug by setting the busywait flag before returning from check_nmi_watchdog.
+I guess using a stack variable is not good here as the calling function could
+potentially return while the busy wait loop is still spinning on the flag.
-Depending upon the build, the busywait flag (a stack variable) got
-overwritten, which allowed the kernel to boot up on certain builds. The
-following patch fixes the bug by setting the busywait flag before returning
-from check_nmi_watchdog.
+AK: I redid the patch significantly to be cleaner
-I guess using a stack variable is not good here as the calling function
-could potentially return while the busy wait loop is still spinning on the
-flag. I would think this is a good candidate for 2.6.19 stable as well.
-
-[akpm@osdl.org: cleanups]
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
-Cc: Andi Kleen <ak@muc.de>
-Cc: <stable@kernel.org>
-Signed-off-by: Andrew Morton <akpm@osdl.org>
+Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
---
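Note (illustration only, not part of the patch): the hang described in the
changelog can be sketched in user space. This is an analogy under stated
assumptions, not the kernel code: pthreads stand in for the secondary CPUs,
C11 atomics stand in for the kernel's memory barrier, and the names
spin_until_done, check_watchdog and NSPIN are hypothetical. The sketch keeps
the flag in static storage and sets it on every path, so the spinners are
always released and never read a dead stack slot.

```c
#include <pthread.h>
#include <stdatomic.h>

#define NSPIN 3 /* hypothetical number of "secondary CPUs" */

/* Static storage: the flag stays valid after check_watchdog() returns,
 * unlike a stack variable in the caller's frame. */
static atomic_int endflag;
static atomic_int finished;

/* Stand-in for nmi_cpu_busy(): spin until the flag becomes non-zero. */
static void *spin_until_done(void *unused)
{
	(void)unused;
	while (atomic_load(&endflag) == 0)
		; /* busy wait */
	atomic_fetch_add(&finished, 1);
	return NULL;
}

/* Stand-in for check_nmi_watchdog(): returns how many spinners were
 * released. The flag is set whether or not the watchdog is enabled. */
static int check_watchdog(int watchdog_enabled)
{
	pthread_t t[NSPIN];
	int i, started = 0;

	atomic_store(&endflag, 0);
	atomic_store(&finished, 0);

	for (i = 0; i < NSPIN; i++)
		if (pthread_create(&t[started], NULL, spin_until_done, NULL) == 0)
			started++;

	if (watchdog_enabled) {
		/* The real code would compare per-CPU NMI counts here. */
	}

	/* Release the spinners on every path, including "disabled". */
	atomic_store(&endflag, 1);

	for (i = 0; i < started; i++)
		pthread_join(t[i], NULL);

	return atomic_load(&finished);
}
```

Calling check_watchdog(0) models the "NMI functionality is disabled" case
from the changelog: dropping the final atomic_store() on that path would
leave every spinner looping forever, which is the boot hang in miniature.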
+ arch/i386/kernel/nmi.c | 8 ++++----
+ arch/x86_64/kernel/nmi.c | 9 +++++----
+ 2 files changed, 9 insertions(+), 8 deletions(-)
- arch/x86_64/kernel/nmi.c | 3 ++-
- 1 file changed, 2 insertions(+), 1 deletion(-)
-
+--- linux-2.6.19.orig/arch/i386/kernel/nmi.c
++++ linux-2.6.19/arch/i386/kernel/nmi.c
+@@ -192,6 +192,8 @@ static __cpuinit inline int nmi_known_cp
+ return 0;
+ }
+
++static int endflag __initdata = 0;
++
+ #ifdef CONFIG_SMP
+ /* The performance counters used by NMI_LOCAL_APIC don't trigger when
+ * the CPU is idle. To make sure the NMI watchdog really ticks on all
+@@ -199,7 +201,6 @@ static __cpuinit inline int nmi_known_cp
+ */
+ static __init void nmi_cpu_busy(void *data)
+ {
+- volatile int *endflag = data;
+ local_irq_enable_in_hardirq();
+ /* Intentionally don't use cpu_relax here. This is
+ to make sure that the performance counter really ticks,
+@@ -207,14 +208,13 @@ static __init void nmi_cpu_busy(void *da
+ pause instruction. On a real HT machine this is fine because
+ all other CPUs are busy with "useless" delay loops and don't
+ care if they get somewhat less cycles. */
+- while (*endflag == 0)
+- barrier();
++ while (endflag == 0)
++ mb();
+ }
+ #endif
+
+ static int __init check_nmi_watchdog(void)
+ {
+- volatile int endflag = 0;
+ unsigned int *prev_nmi_count;
+ int cpu;
+
--- linux-2.6.19.orig/arch/x86_64/kernel/nmi.c
+++ linux-2.6.19/arch/x86_64/kernel/nmi.c
-@@ -212,7 +212,7 @@ static __init void nmi_cpu_busy(void *da
+@@ -190,6 +190,8 @@ void nmi_watchdog_default(void)
+ nmi_watchdog = NMI_IO_APIC;
+ }
+
++static int endflag __initdata = 0;
++
+ #ifdef CONFIG_SMP
+ /* The performance counters used by NMI_LOCAL_APIC don't trigger when
+ * the CPU is idle. To make sure the NMI watchdog really ticks on all
+@@ -197,7 +199,6 @@ void nmi_watchdog_default(void)
+ */
+ static __init void nmi_cpu_busy(void *data)
+ {
+- volatile int *endflag = data;
+ local_irq_enable_in_hardirq();
+ /* Intentionally don't use cpu_relax here. This is
+ to make sure that the performance counter really ticks,
+@@ -205,14 +206,13 @@ static __init void nmi_cpu_busy(void *da
+ pause instruction. On a real HT machine this is fine because
+ all other CPUs are busy with "useless" delay loops and don't
+ care if they get somewhat less cycles. */
+- while (*endflag == 0)
+- barrier();
++ while (endflag == 0)
++ mb();
+ }
+ #endif
int __init check_nmi_watchdog (void)
{
- volatile int endflag = 0;
-+ static int __initdata endflag;
int *counts;
int cpu;