git.ipfire.org Git - thirdparty/kernel/linux.git/commit
x86/mce: Handle AMD threshold interrupt storms
author Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Fri, 21 Nov 2025 19:04:05 +0000 (19:04 +0000)
committer Borislav Petkov (AMD) <bp@alien8.de>
Fri, 21 Nov 2025 19:41:10 +0000 (20:41 +0100)
commit 5c4663ed1eac01987a1421f059380db48ab7b1a3
tree e6fb4d800fc57e6f2fa8b3af4c9f58321e74413f
parent d7ac083f095d894a0b8ac0573516bfd035e6b25a

Extend the logic of handling CMCI storms to AMD threshold interrupts.

Use an approach similar to Intel's CMCI handling to mitigate storms per CPU
and per bank. Unlike CMCI, however, do not adjust thresholds to reduce the
interrupt rate during a storm. Instead, disable the interrupt on the
corresponding CPU and bank. Re-enable the interrupt once enough consecutive
polls of the bank show no corrected errors (30, as programmed by Intel).
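
The per-bank behaviour described above can be pictured with a small userspace
sketch. This is not the code from the patch: struct bank_storm,
bank_storm_init(), bank_storm_track() and STORM_BEGIN_ERRORS are hypothetical
names and values chosen for illustration, and the history here shifts once per
observation rather than by elapsed time. Only the count of 30 clean polls comes
from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: illustrative storm-begin threshold, not taken from the patch. */
#define STORM_BEGIN_ERRORS  5
/* From the commit message: 30 consecutive clean polls end a storm. */
#define CLEAN_POLLS_TO_END  30

/* Hypothetical per-CPU, per-bank tracking state. */
struct bank_storm {
	uint64_t history;         /* bit 0 = latest observation, 1 = error seen */
	bool in_storm;            /* bank is currently considered to be storming */
	bool irq_enabled;         /* threshold interrupt state for this bank */
	unsigned int clean_polls; /* consecutive error-free polls during a storm */
};

static void bank_storm_init(struct bank_storm *b)
{
	assert(b);
	b->history = 0;
	b->in_storm = false;
	b->irq_enabled = true;
	b->clean_polls = 0;
}

/* Count the set bits in a 64-bit word. */
static unsigned int popcount64(uint64_t v)
{
	unsigned int n = 0;

	while (v) {
		v &= v - 1; /* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Record one observation of the bank (an interrupt or a poll). */
static void bank_storm_track(struct bank_storm *b, bool error_seen)
{
	assert(b);
	b->history = (b->history << 1) | (error_seen ? 1u : 0u);

	if (!b->in_storm) {
		/* Storm begins: disable the interrupt, fall back to polling. */
		if (popcount64(b->history) >= STORM_BEGIN_ERRORS) {
			b->in_storm = true;
			b->irq_enabled = false;
			b->clean_polls = 0;
		}
		return;
	}

	/* Any error seen while polling restarts the clean-poll count. */
	if (error_seen) {
		b->clean_polls = 0;
		return;
	}

	/* Storm ends: enough clean polls, so re-enable the interrupt. */
	if (++b->clean_polls >= CLEAN_POLLS_TO_END) {
		b->in_storm = false;
		b->irq_enabled = true;
		b->history = 0;
		b->clean_polls = 0;
	}
}
```

With these illustrative values, five consecutive error observations put the
bank into a storm and disable its interrupt, and the interrupt comes back only
after the 30th consecutive error-free poll.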

Turning off the threshold interrupts is the better solution on AMD systems
because other error severities are still handled while the threshold
interrupts are disabled.

  [ Tony: Small tweak because mce_handle_storm() isn't a pointer now ]
  [ Yazen: Rebase and simplify ]
  [ Avadhut: Remove check to not clear bank's bit in mce_poll_banks and fix
    checkpatch warnings. ]

Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/20251121190542.2447913-3-avadhut.naik@amd.com
arch/x86/kernel/cpu/mce/amd.c
arch/x86/kernel/cpu/mce/internal.h
arch/x86/kernel/cpu/mce/threshold.c