git.ipfire.org Git - thirdparty/linux.git/commitdiff
ACPI: APEI: send SIGBUS to current task if synchronous memory error not recovered
author Shuai Xue <xueshuai@linux.alibaba.com>
Mon, 14 Jul 2025 11:42:11 +0000 (19:42 +0800)
committer Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Wed, 16 Jul 2025 19:08:04 +0000 (21:08 +0200)
If a synchronous error is detected as a result of a user-space process
triggering a 2-bit uncorrected error, the CPU takes a synchronous
error exception such as Synchronous External Abort (SEA) on Arm64. The
kernel queues memory_failure() work, which poisons the related
page, unmaps it, and then sends SIGBUS to the process, so that
a system-wide panic can be avoided.

However, no memory_failure() work is queued when abnormal
synchronous errors occur. These errors include situations like
invalid PA, unexpected severity, no memory failure config support,
invalid GUID section, etc. In such a case, the page is neither poisoned
nor unmapped, so the user-space process retries the access and
triggers SEA again. This loop can potentially exceed the platform
firmware threshold or even trigger a kernel hard lockup, leading to a
system reboot.

Fix it by performing a force kill if no memory_failure() work is queued
for synchronous errors.

Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Reviewed-by: Jane Chu <jane.chu@oracle.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Link: https://patch.msgid.link/20250714114212.31660-2-xueshuai@linux.alibaba.com
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
drivers/acpi/apei/ghes.c

index 3d44f926afe8e0099666798da619614141d39422..bda33a0f0a01316d639493ef08912ca41eb23b8e 100644
@@ -902,6 +902,17 @@ static bool ghes_do_proc(struct ghes *ghes,
                }
        }
 
+       /*
+        * If no memory failure work is queued for abnormal synchronous
+        * errors, do a force kill.
+        */
+       if (sync && !queued) {
+               dev_err(ghes->dev,
+                       HW_ERR GHES_PFX "%s:%d: synchronous unrecoverable error (SIGBUS)\n",
+                       current->comm, task_pid_nr(current));
+               force_sig(SIGBUS);
+       }
+
        return queued;
 }
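
The check added in this hunk can be modeled outside the kernel as a small
decision function. The sketch below is a user-space illustration only, not
the kernel implementation: ghes_tail_signal() is a hypothetical name, and its
two booleans stand in for the sync and queued variables in ghes_do_proc().

```c
#include <signal.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model of the check added at the end of
 * ghes_do_proc(). This is not kernel code.
 *
 * sync:   the error was reported synchronously (e.g. via SEA on Arm64).
 * queued: memory_failure() work was queued for at least one section.
 *
 * Returns the signal the patch would force on the current task,
 * or 0 when no signal is forced.
 */
int ghes_tail_signal(bool sync, bool queued)
{
	/* Mirrors "if (sync && !queued)" from the patch. */
	if (sync && !queued)
		return SIGBUS;

	return 0;
}
```

Of the four input combinations, only sync=true with queued=false yields
SIGBUS. The other three return 0 and leave the task alone, which matches the
condition in the patch: asynchronous errors and synchronous errors that
already have memory_failure() work queued are not force-killed here.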