From: Mayank Rungta
Date: Thu, 12 Mar 2026 23:22:06 +0000 (-0700)
Subject: doc: watchdog: document buddy detector
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=cb8615f3cb00210a27237e1c97cefe3aaf27f0cb;p=thirdparty%2Fkernel%2Fstable.git

doc: watchdog: document buddy detector

The current documentation generalizes the hardlockup detector as
primarily NMI-perf-based and lacks details on the SMP "Buddy" detector.

Update the documentation to add a detailed description of the Buddy
detector, and also restructure the "Implementation" section to
explicitly separate "Softlockup Detector", "Hardlockup Detector
(NMI/Perf)", and "Hardlockup Detector (Buddy)".

Clarify that the softlockup hrtimer acts as the heartbeat generator for
both hardlockup mechanisms and centralize the configuration details in a
"Frequency and Heartbeats" section.

Link: https://lkml.kernel.org/r/20260312-hardlockup-watchdog-fixes-v2-5-45bd8a0cc7ed@google.com
Signed-off-by: Mayank Rungta
Reviewed-by: Douglas Anderson
Cc: Ian Rogers
Cc: Jonathan Corbet
Cc: Li Huafei
Cc: Max Kellermann
Cc: Petr Mladek
Cc: Shuah Khan
Cc: Stephane Erainan
Cc: Wang Jinchao
Cc: Yunhui Cui
Signed-off-by: Andrew Morton
---
diff --git a/Documentation/admin-guide/lockup-watchdogs.rst b/Documentation/admin-guide/lockup-watchdogs.rst
index 1b374053771f..7ae7ce3abd2c 100644
--- a/Documentation/admin-guide/lockup-watchdogs.rst
+++ b/Documentation/admin-guide/lockup-watchdogs.rst
@@ -30,22 +30,23 @@ timeout is set through the confusingly named "kernel.panic" sysctl),
 to cause the system to reboot automatically after a specified amount
 of time.
 
+Configuration
+=============
+
+A kernel knob is provided that allows administrators to configure
+this period. The "watchdog_thresh" parameter (default 10 seconds)
+controls the threshold. The right value for a particular environment
+is a trade-off between fast response to lockups and detection overhead.
+
 Implementation
 ==============
 
-The soft and hard lockup detectors are built on top of the hrtimer and
-perf subsystems, respectively. A direct consequence of this is that,
-in principle, they should work in any architecture where these
-subsystems are present.
+The soft lockup detector is built on top of the hrtimer subsystem.
+The hard lockup detector is built on top of the perf subsystem
+(on architectures that support it) or uses an SMP "buddy" system.
 
-A periodic hrtimer runs to generate interrupts and kick the watchdog
-job. An NMI perf event is generated every "watchdog_thresh"
-(compile-time initialized to 10 and configurable through sysctl of the
-same name) seconds to check for hardlockups. If any CPU in the system
-does not receive any hrtimer interrupt during that time the
-'hardlockup detector' (the handler for the NMI perf event) will
-generate a kernel warning or call panic, depending on the
-configuration.
+Softlockup Detector
+-------------------
 
 The watchdog job runs in a stop scheduling thread that updates a
 timestamp every time it is scheduled. If that timestamp is not updated
@@ -55,53 +56,105 @@ will dump useful debug information to the system log, after which it
 will call panic if it was instructed to do so or resume execution of
 other kernel code.
 
-The period of the hrtimer is 2*watchdog_thresh/5, which means it has
-two or three chances to generate an interrupt before the hardlockup
-detector kicks in.
+Frequency and Heartbeats
+------------------------
+
+The hrtimer used by the softlockup detector serves a dual purpose:
+it detects softlockups, and it also generates the interrupts
+(heartbeats) that the hardlockup detectors use to verify CPU liveness.
+
+The period of this hrtimer is 2*watchdog_thresh/5. This means the
+hrtimer has two or three chances to generate an interrupt before the
+NMI hardlockup detector kicks in.
+
+Hardlockup Detector (NMI/Perf)
+------------------------------
+
+On architectures that support NMI (Non-Maskable Interrupt) perf events,
+a periodic NMI is generated every "watchdog_thresh" seconds.
+
+If any CPU in the system does not receive any hrtimer interrupt
+(heartbeat) during the "watchdog_thresh" window, the 'hardlockup
+detector' (the handler for the NMI perf event) will generate a kernel
+warning or call panic.
+
+**Detection Overhead (NMI):**
+
+The time to detect a lockup can vary depending on when the lockup
+occurs relative to the NMI check window. Examples below assume a watchdog_thresh of 10.
+
+* **Best Case:** The lockup occurs just before the first heartbeat is
+  due. The detector will notice the missing hrtimer interrupt almost
+  immediately during the next check.
+
+  ::
+
+    Time 100.0: cpu 1 heartbeat
+    Time 100.1: hardlockup_check, cpu1 stores its state
+    Time 103.9: Hard Lockup on cpu1
+    Time 104.0: cpu 1 heartbeat never comes
+    Time 110.1: hardlockup_check, cpu1 checks the state again, should be the same, declares lockup
+
+    Time to detection: ~6 seconds
+
+* **Worst Case:** The lockup occurs shortly after a valid interrupt
+  (heartbeat) which itself happened just after the NMI check. The next
+  NMI check sees that the interrupt count has changed (due to that one
+  heartbeat), assumes the CPU is healthy, and resets the baseline. The
+  lockup is only detected at the subsequent check.
+
+  ::
+
+    Time 100.0: hardlockup_check, cpu1 stores its state
+    Time 100.1: cpu 1 heartbeat
+    Time 100.2: Hard Lockup on cpu1
+    Time 110.0: hardlockup_check, cpu1 stores its state (misses lockup as state changed)
+    Time 120.0: hardlockup_check, cpu1 checks the state again, should be the same, declares lockup
 
-As explained above, a kernel knob is provided that allows
-administrators to configure the period of the hrtimer and the perf
-event. The right value for a particular environment is a trade-off
-between fast response to lockups and detection overhead.
+    Time to detection: ~20 seconds
 
-Detection Overhead
--------------------
+Hardlockup Detector (Buddy)
+---------------------------
 
-The hardlockup detector checks for lockups using a periodic NMI perf
-event. This means the time to detect a lockup can vary depending on
-when the lockup occurs relative to the NMI check window.
+On architectures or configurations where NMI perf events are not
+available (or disabled), the kernel may use the "buddy" hardlockup
+detector. This mechanism requires SMP (Symmetric Multi-Processing).
 
-**Best Case:**
-In the best case scenario, the lockup occurs just before the first
-heartbeat is due. The detector will notice the missing hrtimer
-interrupt almost immediately during the next check.
+In this mode, each CPU is assigned a "buddy" CPU to monitor. The
+monitoring CPU runs its own hrtimer (the same one used for softlockup
+detection) and checks if the buddy CPU's hrtimer interrupt count has
+increased.
 
-::
+To ensure timeliness and avoid false positives, the buddy system performs
+checks at every hrtimer interval (2*watchdog_thresh/5, which is 4 seconds
+by default). It uses a missed-interrupt threshold of 3. If the buddy's
+interrupt count has not changed for 3 consecutive checks, it is assumed
+that the buddy CPU is hardlocked (interrupts disabled). The monitoring
+CPU will then trigger the hardlockup response (warning or panic).
 
-  Time 100.0: cpu 1 heartbeat
-  Time 100.1: hardlockup_check, cpu1 stores its state
-  Time 103.9: Hard Lockup on cpu1
-  Time 104.0: cpu 1 heartbeat never comes
-  Time 110.1: hardlockup_check, cpu1 checks the state again, should be the same, declares lockup
+**Detection Overhead (Buddy):**
 
-  Time to detection: ~6 seconds
+With a default check interval of 4 seconds (watchdog_thresh = 10):
 
-**Worst Case:**
-In the worst case scenario, the lockup occurs shortly after a valid
-interrupt (heartbeat) which itself happened just after the NMI check.
-The next NMI check sees that the interrupt count has changed (due to
-that one heartbeat), assumes the CPU is healthy, and resets the
-baseline. The lockup is only detected at the subsequent check.
+* **Best case:** Lockup occurs just before a check.
+  Detected in ~8s (0s till 1st check + 4s till 2nd + 4s till 3rd).
+* **Worst case:** Lockup occurs just after a check.
+  Detected in ~12s (4s till 1st check + 4s till 2nd + 4s till 3rd).
 
-::
+**Limitations of the Buddy Detector:**
 
-  Time 100.0: hardlockup_check, cpu1 stores its state
-  Time 100.1: cpu 1 heartbeat
-  Time 100.2: Hard Lockup on cpu1
-  Time 110.0: hardlockup_check, cpu1 stores its state (misses lockup as state changed)
-  Time 120.0: hardlockup_check, cpu1 checks the state again, should be the same, declares lockup
+1. **All-CPU Lockup:** If all CPUs lock up simultaneously, the buddy
+   detector cannot detect the condition because the monitoring CPUs
+   are also frozen.
+2. **Stack Traces:** Unlike the NMI detector, the buddy detector
+   cannot directly interrupt the locked CPU to grab a stack trace.
+   It relies on architecture-specific mechanisms (like NMI backtrace
+   support) to try and retrieve the status of the locked CPU. If
+   such support is missing, the log may only show that a lockup
+   occurred without providing the locked CPU's stack.
 
-  Time to detection: ~20 seconds
+Watchdog Core Exclusion
+=======================
 
 By default, the watchdog runs on all online cores. However, on a
 kernel configured with NO_HZ_FULL, by default the watchdog runs only