From: Yazen Ghannam Date: Sat, 28 Feb 2026 14:08:14 +0000 (-0500) Subject: x86/mce/amd: Filter bogus hardware errors on Zen3 clients X-Git-Tag: v7.0~5^2 X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=0422b07bc4c296b736e240d95d21fbfebbfaa2ca;p=thirdparty%2Fkernel%2Flinux.git x86/mce/amd: Filter bogus hardware errors on Zen3 clients Users have been observing multiple L3 cache deferred errors after recent kernel rework of deferred error handling.¹ ⁴ The errors are bogus due to inconsistent status values. Also, user verified that bogus MCA_DESTAT values are present on the system even with an older kernel.² The errors seem to be garbage values present in the MCA_DESTAT of some L3 cache banks. These were implicitly ignored before the recent kernel rework because these do not generate a deferred error interrupt. A later revision of the rework patch was merged for v6.19. This naturally filtered out most of the bogus error logs. However, a few signatures still remain.³ Minimize the scope of the filter to the reported CPU family/model/stepping and only for errors which don't have the Enabled bit in the MCi status MSR. ¹ https://lore.kernel.org/20250915010010.3547-1-spasswolf@web.de ² https://lore.kernel.org/6e1eda7dd55f6fa30405edf7b0f75695cf55b237.camel@web.de ³ https://lore.kernel.org/21ba47fa8893b33b94370c2a42e5084cf0d2e975.camel@web.de ⁴ https://lore.kernel.org/r/CAKFB093B2k3sKsGJ_QNX1jVQsaXVFyy=wNwpzCGLOXa_vSDwXw@mail.gmail.com [ bp: Generalize the condition according to which errors are bogus. ] Fixes: 7cb735d7c0cb ("x86/mce: Unify AMD DFR handler with MCA Polling") Closes: https://lore.kernel.org/20250915010010.3547-1-spasswolf@web.de Reported-by: Bert Karwatzki Signed-off-by: Yazen Ghannam Signed-off-by: Borislav Petkov (AMD) Reviewed-by: Mario Limonciello Tested-By: Bert Karwatzki Cc: stable@vger.kernel.org Link: https://lore.kernel.org/20250915010010.3547-1-spasswolf@web.de --- diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c index a030ee4cecc2..28deaba08833 100644 --- a/arch/x86/kernel/cpu/mce/amd.c +++ b/arch/x86/kernel/cpu/mce/amd.c @@ -604,6 +604,14 @@ bool amd_filter_mce(struct mce *m) enum smca_bank_types bank_type = smca_get_bank_type(m->extcpu, m->bank); struct cpuinfo_x86 *c = &boot_cpu_data; + /* Bogus hw errors on Cezanne A0. */ + if (c->x86 == 0x19 && + c->x86_model == 0x50 && + c->x86_stepping == 0x0) { + if (!(m->status & MCI_STATUS_EN)) + return true; + } + /* See Family 17h Models 10h-2Fh Erratum #1114. */ if (c->x86 == 0x17 && c->x86_model >= 0x10 && c->x86_model <= 0x2F &&