doc: watchdog: document buddy detector

author Mayank Rungta <mrungta@google.com>

Thu, 12 Mar 2026 23:22:06 +0000 (16:22 -0700)

committer Andrew Morton <akpm@linux-foundation.org>

Sat, 28 Mar 2026 04:19:47 +0000 (21:19 -0700)
diff --git a/Documentation/admin-guide/lockup-watchdogs.rst b/Documentation/admin-guide/lockup-watchdogs.rst
index 1b374053771f676d874716b3210cade55ae89b28..7ae7ce3abd2c838ff29c70f7a32ffaf58531e150 100644
--- a/Documentation/admin-guide/lockup-watchdogs.rst
+++ b/Documentation/admin-guide/lockup-watchdogs.rst
@@ -30,22 +30,23 @@ timeout is set through the confusingly named "kernel.panic" sysctl),
  to cause the system to reboot automatically after a specified amount
  of time.
  
+Configuration
+=============
+
+A kernel knob is provided that allows administrators to configure
+this period. The "watchdog_thresh" parameter (default 10 seconds)
+controls the threshold. The right value for a particular environment
+is a trade-off between fast response to lockups and detection overhead.
+
  Implementation
  ==============
  
-The soft and hard lockup detectors are built on top of the hrtimer and
-perf subsystems, respectively. A direct consequence of this is that,
-in principle, they should work in any architecture where these
-subsystems are present.
+The soft lockup detector is built on top of the hrtimer subsystem.
+The hard lockup detector is built on top of the perf subsystem
+(on architectures that support it) or uses an SMP "buddy" system.
  
-A periodic hrtimer runs to generate interrupts and kick the watchdog
-job. An NMI perf event is generated every "watchdog_thresh"
-(compile-time initialized to 10 and configurable through sysctl of the
-same name) seconds to check for hardlockups. If any CPU in the system
-does not receive any hrtimer interrupt during that time the
-'hardlockup detector' (the handler for the NMI perf event) will
-generate a kernel warning or call panic, depending on the
-configuration.
+Softlockup Detector
+-------------------
  
  The watchdog job runs in a stop scheduling thread that updates a
  timestamp every time it is scheduled. If that timestamp is not updated
@@ -55,53 +56,105 @@ will dump useful debug information to the system log, after which it
  will call panic if it was instructed to do so or resume execution of
  other kernel code.
  
-The period of the hrtimer is 2*watchdog_thresh/5, which means it has
-two or three chances to generate an interrupt before the hardlockup
-detector kicks in.
+Frequency and Heartbeats
+------------------------
+
+The hrtimer used by the softlockup detector serves a dual purpose:
+it detects softlockups, and it also generates the interrupts
+(heartbeats) that the hardlockup detectors use to verify CPU liveness.
+
+The period of this hrtimer is 2*watchdog_thresh/5. This means the
+hrtimer has two or three chances to generate an interrupt before the
+NMI hardlockup detector kicks in.
+
+Hardlockup Detector (NMI/Perf)
+------------------------------
+
+On architectures that support NMI (Non-Maskable Interrupt) perf events,
+a periodic NMI is generated every "watchdog_thresh" seconds.
+
+If any CPU in the system does not receive any hrtimer interrupt
+(heartbeat) during the "watchdog_thresh" window, the 'hardlockup
+detector' (the handler for the NMI perf event) will generate a kernel
+warning or call panic.
+
+**Detection Overhead (NMI):**
+
+The time to detect a lockup can vary depending on when the lockup
+occurs relative to the NMI check window. Examples below assume a watchdog_thresh of 10.
+
+* **Best Case:** The lockup occurs just before the first heartbeat is
+  due. The detector will notice the missing hrtimer interrupt almost
+  immediately during the next check.
+
+  ::
+
+    Time 100.0: cpu 1 heartbeat
+    Time 100.1: hardlockup_check, cpu1 stores its state
+    Time 103.9: Hard Lockup on cpu1
+    Time 104.0: cpu 1 heartbeat never comes
+    Time 110.1: hardlockup_check, cpu1 checks the state again, should be the same, declares lockup
+
+    Time to detection: ~6 seconds
+
+* **Worst Case:** The lockup occurs shortly after a valid interrupt
+  (heartbeat) which itself happened just after the NMI check. The next
+  NMI check sees that the interrupt count has changed (due to that one
+  heartbeat), assumes the CPU is healthy, and resets the baseline. The
+  lockup is only detected at the subsequent check.
+
+  ::
+
+    Time 100.0: hardlockup_check, cpu1 stores its state
+    Time 100.1: cpu 1 heartbeat
+    Time 100.2: Hard Lockup on cpu1
+    Time 110.0: hardlockup_check, cpu1 stores its state (misses lockup as state changed)
+    Time 120.0: hardlockup_check, cpu1 checks the state again, should be the same, declares lockup
  
-As explained above, a kernel knob is provided that allows
-administrators to configure the period of the hrtimer and the perf
-event. The right value for a particular environment is a trade-off
-between fast response to lockups and detection overhead.
+    Time to detection: ~20 seconds
  
-Detection Overhead
-------------------
+Hardlockup Detector (Buddy)
+---------------------------
  
-The hardlockup detector checks for lockups using a periodic NMI perf
-event. This means the time to detect a lockup can vary depending on
-when the lockup occurs relative to the NMI check window.
+On architectures or configurations where NMI perf events are not
+available (or disabled), the kernel may use the "buddy" hardlockup
+detector. This mechanism requires SMP (Symmetric Multi-Processing).
  
-**Best Case:**
-In the best case scenario, the lockup occurs just before the first
-heartbeat is due. The detector will notice the missing hrtimer
-interrupt almost immediately during the next check.
+In this mode, each CPU is assigned a "buddy" CPU to monitor. The
+monitoring CPU runs its own hrtimer (the same one used for softlockup
+detection) and checks if the buddy CPU's hrtimer interrupt count has
+increased.
  
-::
+To ensure timeliness and avoid false positives, the buddy system performs
+checks at every hrtimer interval (2*watchdog_thresh/5, which is 4 seconds
+by default). It uses a missed-interrupt threshold of 3. If the buddy's
+interrupt count has not changed for 3 consecutive checks, it is assumed
+that the buddy CPU is hardlocked (interrupts disabled). The monitoring
+CPU will then trigger the hardlockup response (warning or panic).
  
-  Time 100.0: cpu 1 heartbeat
-  Time 100.1: hardlockup_check, cpu1 stores its state
-  Time 103.9: Hard Lockup on cpu1
-  Time 104.0: cpu 1 heartbeat never comes
-  Time 110.1: hardlockup_check, cpu1 checks the state again, should be the same, declares lockup
+**Detection Overhead (Buddy):**
  
-  Time to detection: ~6 seconds
+With a default check interval of 4 seconds (watchdog_thresh = 10):
  
-**Worst Case:**
-In the worst case scenario, the lockup occurs shortly after a valid
-interrupt (heartbeat) which itself happened just after the NMI check.
-The next NMI check sees that the interrupt count has changed (due to
-that one heartbeat), assumes the CPU is healthy, and resets the
-baseline. The lockup is only detected at the subsequent check.
+* **Best case:** Lockup occurs just before a check.
+    Detected in ~8s (0s till 1st check + 4s till 2nd + 4s till 3rd).
+* **Worst case:** Lockup occurs just after a check.
+    Detected in ~12s (4s till 1st check + 4s till 2nd + 4s till 3rd).
  
-::
+**Limitations of the Buddy Detector:**
  
-  Time 100.0: hardlockup_check, cpu1 stores its state
-  Time 100.1: cpu 1 heartbeat
-  Time 100.2: Hard Lockup on cpu1
-  Time 110.0: hardlockup_check, cpu1 stores its state (misses lockup as state changed)
-  Time 120.0: hardlockup_check, cpu1 checks the state again, should be the same, declares lockup
+1.  **All-CPU Lockup:** If all CPUs lock up simultaneously, the buddy
+    detector cannot detect the condition because the monitoring CPUs
+    are also frozen.
+2.  **Stack Traces:** Unlike the NMI detector, the buddy detector
+    cannot directly interrupt the locked CPU to grab a stack trace.
+    It relies on architecture-specific mechanisms (like NMI backtrace
+    support) to try and retrieve the status of the locked CPU. If
+    such support is missing, the log may only show that a lockup
+    occurred without providing the locked CPU's stack.
  
-  Time to detection: ~20 seconds
+Watchdog Core Exclusion
+=======================
  
  By default, the watchdog runs on all online cores.  However, on a
  kernel configured with NO_HZ_FULL, by default the watchdog runs only
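
Reviewer note: the timing figures quoted in the patch can be cross-checked with a small model. The sketch below is only an illustration of the arithmetic described in the new documentation, not kernel code; the function names (`hrtimer_period`, `buddy_detection_window`, `nmi_detection_latency`) are hypothetical and were chosen for this note. It assumes the buddy detector declares a lockup on the third consecutive check with an unchanged interrupt count, and that the NMI detector compares the heartbeat count against the value saved at the previous check.

```python
def hrtimer_period(watchdog_thresh=10):
    """Heartbeat period in seconds: 2*watchdog_thresh/5, as stated in the patch."""
    return 2 * watchdog_thresh / 5


def buddy_detection_window(watchdog_thresh=10, missed_threshold=3):
    """Return (best, worst) buddy detection time in seconds.

    Best case: the lockup happens just before a check, so the first missed
    check costs ~0s and the remaining checks cost one period each.
    Worst case: the lockup happens just after a check, so every one of the
    missed checks costs a full period.
    """
    period = hrtimer_period(watchdog_thresh)
    return (missed_threshold - 1) * period, missed_threshold * period


def nmi_detection_latency(check_times, heartbeat_times, lockup_time):
    """Return seconds from lockup to detection, or None if never detected.

    At each check the detector counts the heartbeats seen so far and compares
    the count with the one saved at the previous check. Heartbeats stop at
    lockup_time. Checks before the lockup only refresh the saved count.
    """
    saved = None
    for check in sorted(check_times):
        count = sum(1 for hb in heartbeat_times
                    if hb <= check and hb < lockup_time)
        if check > lockup_time and saved is not None and count == saved:
            return check - lockup_time
        saved = count
    return None


if __name__ == "__main__":
    print(hrtimer_period())
    print(buddy_detection_window())
    # Timelines taken from the best-case and worst-case examples in the patch.
    print(nmi_detection_latency([100.1, 110.1], [100.0], 103.9))
    print(nmi_detection_latency([100.0, 110.0, 120.0], [100.1], 100.2))
```

With the default watchdog_thresh of 10, the model reproduces the 4 second heartbeat period, the 8 to 12 second buddy window, and the roughly 6 and 20 second NMI cases given in the documentation.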