git.ipfire.org Git - thirdparty/kernel/stable.git/commit
EDAC/{skx_common,i10nm}: Fix some missing error reports on Emerald Rapids
author Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Fri, 14 Feb 2025 00:27:28 +0000 (08:27 +0800)
committer Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thu, 10 Apr 2025 12:39:10 +0000 (14:39 +0200)
commit 2c27c9e1d18a586a7ae015cb205ae67d807e6531
tree 6fe31b0922e13889343b8f9b23520b3e5b22b10a
parent f381c92ab4ec265cbdf253b93de8c1932999e829

[ Upstream commit d9207cf7760f5f5599e9ff7eb0fedf56821a1d59 ]

When injecting errors into memory DIMMs on certain Intel Emerald Rapids
servers, the i10nm_edac missed the error reports for some of those DIMMs.

Certain BIOS configurations may hide some memory controllers, and the
i10nm_edac doesn't enumerate these hidden memory controllers. However, the
ADXL decodes memory errors using memory controller physical indices even
if there are hidden memory controllers. Therefore, the memory controller
physical indices reported by the ADXL may mismatch the logical indices
enumerated by the i10nm_edac, resulting in missed error reports for some
memory DIMMs.

Fix this issue by creating a mapping table from memory controller physical
indices (used by the ADXL) to logical indices (used by the i10nm_edac) and
using it to convert the physical indices to the logical indices during the
error handling process.
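
The mapping described above can be illustrated with a minimal userspace
sketch. It is not the driver code: the names socket_map, map_enumerate,
map_phys_to_logical, and MC_HIDDEN, and the NUM_IMC value of 4, are
assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_IMC   4      /* assumed number of physical controllers per socket */
#define MC_HIDDEN 0xff   /* hypothetical marker: no logical index assigned */

/* Hypothetical per-socket state; not the driver's actual structure. */
struct socket_map {
	uint8_t mc_mapping[NUM_IMC];   /* physical index -> logical index */
};

/*
 * Walk the physical controllers in order. Each controller that is
 * present (not hidden by the BIOS) gets the next free logical index;
 * hidden ones keep MC_HIDDEN. Returns the number of logical controllers.
 */
static int map_enumerate(struct socket_map *m, const int present[NUM_IMC])
{
	int lmc = 0;

	assert(m != NULL && present != NULL);

	for (int pmc = 0; pmc < NUM_IMC; pmc++) {
		if (present[pmc])
			m->mc_mapping[pmc] = (uint8_t)lmc++;
		else
			m->mc_mapping[pmc] = MC_HIDDEN;
	}
	return lmc;
}

/*
 * Convert a physical index (as an address decoder would report it) to
 * the logical index used after enumeration. Returns -1 if the index is
 * out of range or the controller is hidden.
 */
static int map_phys_to_logical(const struct socket_map *m, int pmc)
{
	if (pmc < 0 || pmc >= NUM_IMC || m->mc_mapping[pmc] == MC_HIDDEN)
		return -1;
	return m->mc_mapping[pmc];
}
```

In this sketch, if physical controller 1 is hidden, an error decoded to
physical controller 2 is looked up as logical controller 1. Using the
physical index 2 directly would select the wrong controller, and physical
index 3 would match no enumerated controller at all, which is the kind of
mismatch that leads to missed reports.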

Fixes: c545f5e41225 ("EDAC/i10nm: Skip the absent memory controllers")
Reported-by: Kevin Chang <kevin1.chang@intel.com>
Tested-by: Kevin Chang <kevin1.chang@intel.com>
Reported-by: Thomas Chen <Thomas.Chen@intel.com>
Tested-by: Thomas Chen <Thomas.Chen@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20250214002728.6287-1-qiuxu.zhuo@intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
drivers/edac/i10nm_base.c
drivers/edac/skx_common.c
drivers/edac/skx_common.h