From: Koby Elbaz Date: Mon, 29 May 2023 08:41:04 +0000 (+0300) Subject: accel/habanalabs: set device status 'malfunction' while in rmmod X-Git-Tag: v6.7-rc1~145^2~11^2~71 X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=e4a97d6b62599cc6c4bd37674de378b2a86225c3;p=thirdparty%2Fkernel%2Flinux.git accel/habanalabs: set device status 'malfunction' while in rmmod hl_device_status() returns the status of an acquired device. If a device is going down (following an rmmod cmd), it should be marked as an unusable/malfunctioning device, and hence should not be acquired. However, since this was not the case so far (i.e., a device going down would inaccurately return 'in reset' status allowing the user to acquire the device) it introduced a bug where as part of a reset flow, the driver could not kill processes that have not run yet, and since those processes aren't blocked from reacquiring a device, we get eventually a new flow of a driver attempting to kill all processes in a list that can't be ever really empty. Signed-off-by: Koby Elbaz Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- diff --git a/drivers/accel/habanalabs/common/device.c b/drivers/accel/habanalabs/common/device.c index 9933058712920..5973e4d64e19b 100644 --- a/drivers/accel/habanalabs/common/device.c +++ b/drivers/accel/habanalabs/common/device.c @@ -315,7 +315,9 @@ enum hl_device_status hl_device_status(struct hl_device *hdev) { enum hl_device_status status; - if (hdev->reset_info.in_reset) { + if (hdev->device_fini_pending) { + status = HL_DEVICE_STATUS_MALFUNCTION; + } else if (hdev->reset_info.in_reset) { if (hdev->reset_info.in_compute_reset) status = HL_DEVICE_STATUS_IN_RESET_AFTER_DEVICE_RELEASE; else @@ -343,9 +345,9 @@ bool hl_device_operational(struct hl_device *hdev, *status = current_status; switch (current_status) { + case HL_DEVICE_STATUS_MALFUNCTION: case HL_DEVICE_STATUS_IN_RESET: case HL_DEVICE_STATUS_IN_RESET_AFTER_DEVICE_RELEASE: - case HL_DEVICE_STATUS_MALFUNCTION: case HL_DEVICE_STATUS_NEEDS_RESET: return false; case HL_DEVICE_STATUS_OPERATIONAL: