When searching for the error source, the AER driver rules out devices whose
enable_cnt is zero. This was introduced in 2009 by commit 28eb27cf0839
("PCI AER: support invalid error source IDs") without providing a
rationale.

Drivers typically call pci_enable_device() on probe, hence the enable_cnt
check essentially filters out unbound devices. At the time of the commit,
drivers had to opt in to AER by calling pci_enable_pcie_error_reporting()
and so any AER-enabled device could be assumed to be bound to a driver.
The check thus made sense because it allowed skipping config space accesses
to devices which were known not to be the error source.

But since 2022, AER is universally enabled on all devices when they are
enumerated, cf. commit f26e58bf6f54 ("PCI/AER: Enable error reporting when
AER is native").

Errors may very well be reported by unbound devices, e.g. due to link
instability. By ruling them out as error source, errors reported by them
are neither logged nor cleared. When they do get bound and another error
occurs, the earlier error is reported together with the new error, which
may confuse users. Stop ruling out devices whose enable_cnt is zero.

Fixes: f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is native")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Stefan Roese <stefan.roese@mailbox.org>
Cc: stable@vger.kernel.org # v6.0+
Link: https://patch.msgid.link/734338c2e8b669db5a5a3b45d34131b55ffebfca.1774605029.git.lukas@wunner.de
* 3) There are multiple errors and prior ID comparing fails;
* We check AER status registers to find possible reporter.
*/
- if (atomic_read(&dev->enable_cnt) == 0)
- return false;
/* Check if AER is enabled */
pcie_capability_read_word(dev, PCI_EXP_DEVCTL, &reg16);