From: Tony Battersby Date: Mon, 10 Nov 2025 15:50:05 +0000 (-0500) Subject: scsi: qla2xxx: Fix lost interrupts with qlini_mode=disabled X-Git-Tag: v6.19-rc1~95^2~15^2~13 X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=4f6aaade2a22ac428fa99ed716cf2b87e79c9837;p=thirdparty%2Fkernel%2Flinux.git scsi: qla2xxx: Fix lost interrupts with qlini_mode=disabled When qla2xxx is loaded with qlini_mode=disabled, ha->flags.disable_msix_handshake is used before it is set, resulting in the wrong interrupt handler being used on certain HBAs (qla2xxx_msix_rsp_q_hs() is used when qla2xxx_msix_rsp_q() should be used). The only difference between these two interrupt handlers is that the _hs() version writes to a register to clear the "RISC" interrupt, whereas the other version does not. So this bug results in the RISC interrupt being cleared when it should not be. This occasionally causes a different interrupt handler qla24xx_msix_default() for a different vector to see ((stat & HSRX_RISC_INT) == 0) and ignore its interrupt, which then causes problems like: qla2xxx [0000:02:00.0]-d04c:6: MBX Command timeout for cmd 20, iocontrol=8 jiffies=1090c0300 mb[0-3]=[0x4000 0x0 0x40 0xda] mb7 0x500 host_status 0x40000010 hccr 0x3f00 qla2xxx [0000:02:00.0]-101e:6: Mailbox cmd timeout occurred, cmd=0x20, mb[0]=0x20. Scheduling ISP abort (the cmd varies; sometimes it is 0x20, 0x22, 0x54, 0x5a, 0x5d, or 0x6a) This problem can be reproduced with a 16 or 32 Gbps HBA by loading qla2xxx with qlini_mode=disabled and running a high IOPS test while triggering frequent RSCN database change events. While analyzing the problem I discovered that even with disable_msix_handshake forced to 0, it is not necessary to clear the RISC interrupt from qla2xxx_msix_rsp_q_hs() (more below). So just completely remove qla2xxx_msix_rsp_q_hs() and the logic for selecting it, which also fixes the bug with qlini_mode=disabled. The test below describes the justification for not needing qla2xxx_msix_rsp_q_hs(): Force disable_msix_handshake to 0: qla24xx_config_rings(): if (0 && (ha->fw_attributes & BIT_6) && (IS_MSIX_NACK_CAPABLE(ha)) && (ha->flags.msix_enabled)) { In qla24xx_msix_rsp_q() and qla2xxx_msix_rsp_q_hs(), check: (rd_reg_dword(®->host_status) & HSRX_RISC_INT) Count the number of calls to each function with HSRX_RISC_INT set and the number with HSRX_RISC_INT not set while performing some I/O. If qla2xxx_msix_rsp_q_hs() clears the RISC interrupt (original code): qla24xx_msix_rsp_q: 50% of calls have HSRX_RISC_INT set qla2xxx_msix_rsp_q_hs: 5% of calls have HSRX_RISC_INT set (# of qla2xxx_msix_rsp_q_hs interrupts) = (# of qla24xx_msix_rsp_q interrupts) * 3 If qla2xxx_msix_rsp_q_hs() does not clear the RISC interrupt (patched code): qla24xx_msix_rsp_q: 100% of calls have HSRX_RISC_INT set qla2xxx_msix_rsp_q_hs: 9% of calls have HSRX_RISC_INT set (# of qla2xxx_msix_rsp_q_hs interrupts) = (# of qla24xx_msix_rsp_q interrupts) * 3 In the case of the original code, qla24xx_msix_rsp_q() was seeing HSRX_RISC_INT set only 50% of the time because qla2xxx_msix_rsp_q_hs() was clearing it when it shouldn't have been. In the patched code, qla24xx_msix_rsp_q() sees HSRX_RISC_INT set 100% of the time, which makes sense if that interrupt handler needs to clear the RISC interrupt (which it does). qla2xxx_msix_rsp_q_hs() sees HSRX_RISC_INT only 9% of the time, which is just overlap from the other interrupt during the high IOPS test. Tested with SCST on: QLE2742 FW:v9.08.02 (32 Gbps 2-port) QLE2694L FW:v9.10.11 (16 Gbps 4-port) QLE2694L FW:v9.08.02 (16 Gbps 4-port) QLE2672 FW:v8.07.12 (16 Gbps 2-port) both initiator and target mode Signed-off-by: Tony Battersby Link: https://patch.msgid.link/56d378eb-14ad-49c7-bae9-c649b6c7691e@cybernetics.com Signed-off-by: Martin K. Petersen --- diff --git a/drivers/scsi/qla2xxx/qla_def.h b/drivers/scsi/qla2xxx/qla_def.h index cb95b7b12051d..b3265952c4bed 100644 --- a/drivers/scsi/qla2xxx/qla_def.h +++ b/drivers/scsi/qla2xxx/qla_def.h @@ -3503,7 +3503,6 @@ struct isp_operations { #define QLA_MSIX_RSP_Q 0x01 #define QLA_ATIO_VECTOR 0x02 #define QLA_MSIX_QPAIR_MULTIQ_RSP_Q 0x03 -#define QLA_MSIX_QPAIR_MULTIQ_RSP_Q_HS 0x04 #define QLA_MIDX_DEFAULT 0 #define QLA_MIDX_RSP_Q 1 diff --git a/drivers/scsi/qla2xxx/qla_gbl.h b/drivers/scsi/qla2xxx/qla_gbl.h index 145defc420f27..55d531c19e6b2 100644 --- a/drivers/scsi/qla2xxx/qla_gbl.h +++ b/drivers/scsi/qla2xxx/qla_gbl.h @@ -766,7 +766,7 @@ extern int qla2x00_dfs_remove(scsi_qla_host_t *); /* Globa function prototypes for multi-q */ extern int qla25xx_request_irq(struct qla_hw_data *, struct qla_qpair *, - struct qla_msix_entry *, int); + struct qla_msix_entry *); extern int qla25xx_init_req_que(struct scsi_qla_host *, struct req_que *); extern int qla25xx_init_rsp_que(struct scsi_qla_host *, struct rsp_que *); extern int qla25xx_create_req_que(struct qla_hw_data *, uint16_t, uint8_t, diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c index c4c6b5c6658c0..a3971afc2dd1e 100644 --- a/drivers/scsi/qla2xxx/qla_isr.c +++ b/drivers/scsi/qla2xxx/qla_isr.c @@ -4467,32 +4467,6 @@ qla2xxx_msix_rsp_q(int irq, void *dev_id) return IRQ_HANDLED; } -irqreturn_t -qla2xxx_msix_rsp_q_hs(int irq, void *dev_id) -{ - struct qla_hw_data *ha; - struct qla_qpair *qpair; - struct device_reg_24xx __iomem *reg; - unsigned long flags; - - qpair = dev_id; - if (!qpair) { - ql_log(ql_log_info, NULL, 0x505b, - "%s: NULL response queue pointer.\n", __func__); - return IRQ_NONE; - } - ha = qpair->hw; - - reg = &ha->iobase->isp24; - spin_lock_irqsave(&ha->hardware_lock, flags); - wrt_reg_dword(®->hccr, HCCRX_CLR_RISC_INT); - spin_unlock_irqrestore(&ha->hardware_lock, flags); - - queue_work(ha->wq, &qpair->q_work); - - return IRQ_HANDLED; -} - /* Interrupt handling helpers. */ struct qla_init_msix_entry { @@ -4505,7 +4479,6 @@ static const struct qla_init_msix_entry msix_entries[] = { { "rsp_q", qla24xx_msix_rsp_q }, { "atio_q", qla83xx_msix_atio_q }, { "qpair_multiq", qla2xxx_msix_rsp_q }, - { "qpair_multiq_hs", qla2xxx_msix_rsp_q_hs }, }; static const struct qla_init_msix_entry qla82xx_msix_entries[] = { @@ -4792,9 +4765,10 @@ free_irqs: } int qla25xx_request_irq(struct qla_hw_data *ha, struct qla_qpair *qpair, - struct qla_msix_entry *msix, int vector_type) + struct qla_msix_entry *msix) { - const struct qla_init_msix_entry *intr = &msix_entries[vector_type]; + const struct qla_init_msix_entry *intr = + &msix_entries[QLA_MSIX_QPAIR_MULTIQ_RSP_Q]; scsi_qla_host_t *vha = pci_get_drvdata(ha->pdev); int ret; diff --git a/drivers/scsi/qla2xxx/qla_mid.c b/drivers/scsi/qla2xxx/qla_mid.c index 8b71ac0b1d999..0abc47e72e0bf 100644 --- a/drivers/scsi/qla2xxx/qla_mid.c +++ b/drivers/scsi/qla2xxx/qla_mid.c @@ -899,9 +899,7 @@ qla25xx_create_rsp_que(struct qla_hw_data *ha, uint16_t options, rsp->options, rsp->id, rsp->rsp_q_in, rsp->rsp_q_out); - ret = qla25xx_request_irq(ha, qpair, qpair->msix, - ha->flags.disable_msix_handshake ? - QLA_MSIX_QPAIR_MULTIQ_RSP_Q : QLA_MSIX_QPAIR_MULTIQ_RSP_Q_HS); + ret = qla25xx_request_irq(ha, qpair, qpair->msix); if (ret) goto que_failed;