Bjorn Helgaas [Wed, 4 Jun 2025 15:50:40 +0000 (10:50 -0500)]
Merge branch 'pci/controller/imx6'
- Apply link training workaround only on IMX6Q, IMX6SX, IMX6SP (Richard
Zhu)
- Remove redundant dw_pcie_wait_for_link() from imx_pcie_start_link();
since the DWC core does this, imx6 only needs it when retraining for a
faster link speed (Richard Zhu)
- Toggle i.MX95 core reset to align with PHY powerup (Richard Zhu)
- Set SYS_AUX_PWR_DET to work around i.MX95 ERR051624 erratum: in some
cases, the controller can't exit 'L23 Ready' through Beacon or PERST#
deassertion (Richard Zhu)
- Clear GEN3_ZRXDC_NONCOMPL to work around i.MX95 ERR051586 erratum:
controller can't meet 2.5 GT/s ZRX-DC timing when operating at 8 GT/s,
causing timeouts in L1 (Richard Zhu)
- Wait for i.MX95 PLL lock before enabling controller (Richard Zhu)
- Save/restore i.MX95 LUT for suspend/resume (Richard Zhu)
* pci/controller/imx6:
PCI: imx6: Save and restore the LUT setting during suspend/resume for i.MX95 SoC
PCI: imx6: Add PLL lock check for i.MX95 SoC
PCI: imx6: Add workaround for errata ERR051586
PCI: imx6: Add workaround for errata ERR051624
PCI: imx6: Toggle the core reset for i.MX95 PCIe
PCI: imx6: Call dw_pcie_wait_for_link() from start_link() callback only when required
PCI: imx6: Skip link up workaround for newer platforms
Bjorn Helgaas [Wed, 4 Jun 2025 15:50:39 +0000 (10:50 -0500)]
Merge branch 'pci/controller/dwc'
- Set PORT_LOGIC_LINK_WIDTH to one lane to make initial link training more
robust; this will not affect the intended link width if all lanes are
functional (Wenbin Yao)
* pci/controller/dwc:
PCI: dwc: Make link training more robust by setting PORT_LOGIC_LINK_WIDTH to one lane
Bjorn Helgaas [Wed, 4 Jun 2025 15:50:38 +0000 (10:50 -0500)]
Merge branch 'pci/controller/dw-rockchip'
- Check only PCIE_LINKUP, not LTSSM status, to determine whether the link
is up (Shawn Lin)
- Increase N_FTS (used in L0s->L0 transitions) and enable ASPM L0s for Root
Complex and Endpoint modes (Shawn Lin)
- Hide the broken ATS Capability in rockchip_pcie_ep_init() instead of
rockchip_pcie_ep_pre_init() so it stays hidden after PERST# resets
non-sticky registers (Shawn Lin)
- Organize register and bitfield definitions logically (Hans Zhang)
- Use rockchip_pcie_link_up() to check link up instead of open coding, and
use GENMASK() and FIELD_GET() when possible (Hans Zhang)
- Call phy_power_off() before phy_exit() in rockchip_pcie_phy_deinit()
(Diederik de Haas)
- Return bool (not int) for link-up check in dw_pcie_ops.link_up() and
armada8k, dra7xx, dw-rockchip, exynos, histb, keembay, keystone, kirin,
meson, qcom, qcom-ep, rcar_gen4, spear13xx, tegra194, uniphier, visconti
(Hans Zhang)
- Return bool (not int) for link-up check in mobiveil_pab_ops.link_up() and
layerscape-gen4, mobiveil (Hans Zhang)
- Simplify j721e link-up check (Hans Zhang)
- Convert pci-host-common to a library so platforms that don't need native
host controller drivers don't need to include these helper functions
(Manivannan Sadhasivam)
* pci/controller/dw-rockchip:
PCI: qcom: Replace PERST# sleep time with proper macro
PCI: dw-rockchip: Replace PERST# sleep time with proper macro
PCI: host-common: Convert to library for host controller drivers
PCI: cadence: Simplify J721e link status check
PCI: mobiveil: Return bool from link up check
PCI: dwc: Return bool from link up check
PCI: dw-rockchip: Fix PHY function call sequence in rockchip_pcie_phy_deinit()
PCI: dw-rockchip: Use rockchip_pcie_link_up() to check link up instead of open coding
PCI: dw-rockchip: Reorganize register and bitfield definitions
PCI: dw-rockchip: Remove unused PCIE_CLIENT_GENERAL_DEBUG definition
PCI: dw-rockchip: Move rockchip_pcie_ep_hide_broken_ats_cap_rk3588() to dw_pcie_ep_ops::init()
PCI: dw-rockchip: Enable ASPM L0s capability for both RC and EP modes
PCI: dw-rockchip: Remove PCIE_L0S_ENTRY check from rockchip_pcie_link_up()
Bjorn Helgaas [Wed, 4 Jun 2025 15:50:04 +0000 (10:50 -0500)]
Merge branch 'pci/controller/apple'
- Skip ports disabled in DT when setting up ports (Janne Grunau)
- Add t6020 compatible string (Alyssa Rosenzweig)
- Extract ECAM bridge creation helper from pci_host_common_probe() to
separate driver-specific things like MSI from PCI things (Marc Zyngier)
- Dynamically allocate RID-to_SID bitmap to prepare for SoCs with varying
capabilities (Marc Zyngier)
- Directly set/clear INTx mask bits because T602x dropped the accessors
that could do this without locking (Marc Zyngier)
- Move port PHY registers to their own reg items to accommodate T602x,
which moves them around; retain default offsets for existing DTs that
lack phy%d entries with the reg offsets (Hector Martin)
- Stop polling for core refclk, which doesn't work on T602x and the
bootloader has already done anyway (Hector Martin)
- Use gpiod_set_value_cansleep() when asserting PERST# in probe because
we're allowed to sleep there (Hector Martin)
- Move register offsets into SoC-specific structure (Hector Martin)
- Add T602x PCIe support (Hector Martin)
* pci/controller/apple:
PCI: apple: Add T602x PCIe support
PCI: apple: Abstract register offsets via a SoC-specific structure
PCI: apple: Use gpiod_set_value_cansleep in probe flow
PCI: apple: Drop poll for CORE_RC_PHYIF_STAT_REFCLK
PCI: apple: Move port PHY registers to their own reg items
PCI: apple: Fix missing OF node reference in apple_pcie_setup_port
PCI: apple: Move away from INTMSK{SET,CLR} for INTx and private interrupts
PCI: apple: Dynamically allocate RID-to_SID bitmap
PCI: apple: Move over to standalone probing
PCI: ecam: Allow cfg->priv to be pre-populated from the root port device
PCI: host-generic: Extract an ECAM bridge creation helper from pci_host_common_probe()
dt-bindings: pci: apple,pcie: Add t6020 compatible string
PCI: apple: Set only available ports up
Bjorn Helgaas [Wed, 4 Jun 2025 15:50:03 +0000 (10:50 -0500)]
Merge branch 'pci/endpoint'
- For fixed-size BARs, retain both the actual size and the possibly larger
size allocated to accommodate iATU alignment requirements (Jerome Brunet)
- Simplify ctrl/SPAD space allocation and avoid allocating more space than
needed (Jerome Brunet)
- Correct MSI-X PBA offset calculations for DesignWare and Cadence endpoint
controllers (Niklas Cassel)
- Align the return value (number of interrupts) encoding for
pci_epc_get_msi()/pci_epc_ops::get_msi() and
pci_epc_get_msix()/pci_epc_ops::get_msix() (Niklas Cassel)
- Align the nr_irqs parameter encoding for
pci_epc_set_msi()/pci_epc_ops::set_msi() and
pci_epc_set_msix()/pci_epc_ops::set_msix() (Niklas Cassel)
* pci/endpoint:
PCI: endpoint: Align pci_epc_set_msix(), pci_epc_ops::set_msix() nr_irqs encoding
PCI: endpoint: Align pci_epc_set_msi(), pci_epc_ops::set_msi() nr_irqs encoding
PCI: endpoint: Align pci_epc_get_msix(), pci_epc_ops::get_msix() return value encoding
PCI: endpoint: Align pci_epc_get_msi(), pci_epc_ops::get_msi() return value encoding
PCI: cadence-ep: Correct PBA offset in .set_msix() callback
PCI: dwc: ep: Correct PBA offset in .set_msix() callback
PCI: endpoint: pci-epf-vntb: Simplify ctrl/SPAD space allocation
PCI: endpoint: Retain fixed-size BAR size as well as aligned size
Bjorn Helgaas [Wed, 4 Jun 2025 15:50:03 +0000 (10:50 -0500)]
Merge branch 'pci/virtualization'
- Add an ACS quirk for Loongson Root Ports that don't advertise ACS but
don't allow peer-to-peer transactions between Root Ports; the quirk
allows each Root Port to be in a separate IOMMU group (Huacai Chen)
* pci/virtualization:
PCI: Add ACS quirk for Loongson PCIe
Bjorn Helgaas [Wed, 4 Jun 2025 15:50:02 +0000 (10:50 -0500)]
Merge branch 'pci/pwrctrl'
- Rename pwrctrl Kconfig symbols from 'PWRCTL' to 'PWRCTRL' to match the
filename paths. Retain old deprecated symbols for compatibility, except
for the pwrctrl slot driver (PCI_PWRCTRL_SLOT) (Johan Hovold)
- When unregistering pwrctrl, cancel outstanding rescan work before
cleaning up data structures to avoid use-after-free issues (Brian Norris)
* pci/pwrctrl:
arm64: Kconfig: switch to HAVE_PWRCTRL
wifi: ath12k: switch to PCI_PWRCTRL_PWRSEQ
wifi: ath11k: switch to PCI_PWRCTRL_PWRSEQ
PCI/pwrctrl: Rename pwrctrl Kconfig symbols and slot module
PCI/pwrctrl: Cancel outstanding rescan work when unregistering
Bjorn Helgaas [Wed, 4 Jun 2025 15:50:01 +0000 (10:50 -0500)]
Merge branch 'pci/pm'
- Add pm_runtime_put() cleanup helper for use with __free() to
automatically drop the device usage count when a pointer goes out of
scope (Alex Williamson)
- Increment PM usage counter when probing reset methods so we don't try to
read config space of a powered-off device (Alex Williamson)
- Set all devices to D0 during enumeration to ensure ACPI opregion is
connected via _REG (Mario Limonciello)
* pci/pm:
PCI: Explicitly put devices into D0 when initializing
PCI: Increment PM usage counter when probing reset methods
PM: runtime: Define pm_runtime_put cleanup helper
Bjorn Helgaas [Wed, 4 Jun 2025 15:49:59 +0000 (10:49 -0500)]
Merge branch 'pci/hotplug'
- Ignore Presence Detect Changed caused by DPC. pciehp already ignores
Link Down/Up events caused by DPC, but on slots using in-band presence
detect, DPC causes a spurious Presence Detect Changed event (Lukas
Wunner)
- Ignore Link Down/Up caused by Secondary Bus Reset. On hotplug ports
using in-band presence detect, the reset causes a Presence Detect Changed
event, which mistakenly caused teardown and re-enumeration of the device.
Drivers may need to annotate code that resets their device (Lukas Wunner)
* pci/hotplug:
PCI: hotplug: Drop superfluous #include directives
PCI: pciehp: Ignore Link Down/Up caused by Secondary Bus Reset
PCI: pciehp: Ignore Presence Detect Changed caused by DPC
Bjorn Helgaas [Wed, 4 Jun 2025 15:49:56 +0000 (10:49 -0500)]
Merge branch 'pci/enumeration'
- Remove pci_fixup_cardbus(), which has no users left (Heiner Kallweit)
- Print the actual delay time in pci_bridge_wait_for_secondary_bus()
instead of assuming it was 1000ms (Wilfred Mallawa)
- Revert 'iommu/amd: Prevent binding other PCI drivers to IOMMU PCI
devices', which broke resume from system sleep on AMD platforms and has
been fixed by other commits (Lukas Wunner)
- Restrict visibility of pci_dev.match_driver since it's no longer used
outside the PCI core (Lukas Wunner)
* pci/enumeration:
PCI: Limit visibility of match_driver flag to PCI core
Revert "iommu/amd: Prevent binding other PCI drivers to IOMMU PCI devices"
PCI: Print the actual delay time in pci_bridge_wait_for_secondary_bus()
PCI: Use PCI_STD_NUM_BARS instead of 6
PCI: Remove pci_fixup_cardbus()
Bjorn Helgaas [Wed, 4 Jun 2025 15:49:50 +0000 (10:49 -0500)]
Merge branch 'pci/devres'
- Remove mtip32xx use of pcim_iounmap_regions(), which is deprecated and
unnecessary (Philipp Stanner)
- Remove pcim_iounmap_regions() and pcim_request_region_exclusive() and
related flags since all uses have been removed (Philipp Stanner)
- Rework devres 'request' functions so they are no longer 'hybrid', i.e.,
their behavior no longer depends on whether pcim_enable_device or
pci_enable_device() was used, and remove related code (Philipp Stanner)
* pci/devres:
PCI: Remove function pcim_intx() prototype from pci.h
PCI: Remove hybrid-devres usage warnings from kernel-doc
PCI: Remove redundant set of request functions
PCI: Remove exclusive requests flags from _pcim_request_region()
PCI: Remove pcim_request_region_exclusive()
Documentation/driver-api: Update pcim_enable_device()
PCI: Remove hybrid devres nature from request functions
PCI: Remove pcim_iounmap_regions()
mtip32xx: Remove unnecessary pcim_iounmap_regions() calls
Bjorn Helgaas [Wed, 4 Jun 2025 15:49:49 +0000 (10:49 -0500)]
Merge branch 'pci/bwctrl'
- Simplify link bandwidth controller by replacing the count of Link
Bandwidth Management Status (LBMS) events with a PCI_LINK_LBMS_SEEN flag
(Ilpo Järvinen)
- Update the Link Speed after retraining, since the Link Speed may have
changed (Ilpo Järvinen)
* pci/bwctrl:
PCI: Update Link Speed after retraining
PCI/bwctrl: Replace lbms_count with PCI_LINK_LBMS_SEEN flag
Bjorn Helgaas [Wed, 4 Jun 2025 15:49:49 +0000 (10:49 -0500)]
Merge branch 'pci/aer'
- Initialize struct aer_err_info before using it to avoid depending on
stack garbage (Bjorn Helgaas)
- Log the DPC Error Source ID only when it's actually valid (when ERR_FATAL
or ERR_NONFATAL was received from a downstream device) and decode into
bus/device/function (Bjorn Helgaas)
- Consolidate AER Error Source ID in one place for message consistency
(Bjorn Helgaas)
- Update statistics and emit trace events early in AER logging paths,
before any potential ratelimiting (Bjorn Helgaas)
- Determine AER log level once and save it so all related messages use the
same level (Karolina Stolarek)
- Use KERN_WARNING, not KERN_ERR, when logging PCIe Correctable Errors.
- Ratelimit PCIe Correctable and Non-Fatal error logging, with sysfs
controls on interval and burst count, to avoid flooding logs and RCU
stall warnings (Jon Pan-Doh)
* pci/aer:
PCI/ERR: Remove misleading TODO regarding kernel panic
PCI/AER: Add sysfs attributes for log ratelimits
PCI/AER: Add ratelimits to PCI AER Documentation
PCI/AER: Ratelimit correctable and non-fatal error logging
PCI/AER: Simplify add_error_device()
PCI/AER: Convert aer_get_device_error_info(), aer_print_error() to index
PCI/AER: Rename struct aer_stats to aer_info
PCI/AER: Reduce pci_print_aer() correctable error level to KERN_WARNING
PCI/ERR: Add printk level to pcie_print_tlp_log()
PCI/AER: Check log level once and remember it
PCI/AER: Trace error event before ratelimiting
PCI/AER: Update statistics before ratelimiting
PCI/AER: Simplify pci_print_aer()
PCI/AER: Initialize aer_err_info before using it
PCI/AER: Move aer_print_source() earlier in file
PCI/AER: Rename aer_print_port_info() to aer_print_source()
PCI/AER: Extract bus/dev/fn in aer_print_port_info() with PCI_BUS_NUM(), etc
PCI/AER: Consolidate Error Source ID logging in aer_isr_one_error_type()
PCI/AER: Factor COR/UNCOR error handling out from aer_isr_one_error()
PCI/DPC: Log Error Source ID only when valid
PCI/DPC: Initialize aer_err_info before using it
The j721e driver has a single platform driver that can be built-in or a
loadable module, but it calls two separate backend drivers depending on
whether it is a host or endpoint.
If the two modes are not the same, we can end up with a situation where the
built-in pci-j721e driver tries to call the modular host or endpoint
driver, which causes a link failure:
ld.lld-21: error: undefined symbol: cdns_pcie_ep_setup
>>> referenced by pci-j721e.c
>>> drivers/pci/controller/cadence/pci-j721e.o:(j721e_pcie_probe) in archive vmlinux.a
ld.lld-21: error: undefined symbol: cdns_pcie_host_setup
>>> referenced by pci-j721e.c
>>> drivers/pci/controller/cadence/pci-j721e.o:(j721e_pcie_probe) in archive vmlinux.a
Rework the dependencies so that the 'select' is done by the common Kconfig
symbol, based on which of the two are enabled. Effectively this means that
having one built-in makes the other either built-in or disabled, but all
configurations will now build.
PCI: j721e: Add support to build as a loadable module
The 'pci-j721e.c' driver is the application/glue/wrapper driver for the
Cadence PCIe Controllers on TI SoCs. Implement support for building it as a
loadable module.
PCI: cadence-ep: Introduce cdns_pcie_ep_disable() helper for cleanup
Introduce the helper function cdns_pcie_ep_disable() which will undo the
configuration performed by cdns_pcie_ep_setup(). Also, export it for use
by the existing callers of cdns_pcie_ep_setup(), thereby allowing them
to cleanup on their exit path.
PCI: cadence-host: Introduce cdns_pcie_host_disable() helper for cleanup
Introduce the helper function cdns_pcie_host_disable() which will undo
the configuration performed by cdns_pcie_host_setup(). Also, export it
for use by existing callers of cdns_pcie_host_setup(), thereby allowing
them to cleanup on their exit path.
PCI: cadence: Add support to build pcie-cadence library as a kernel module
Currently, the Cadence PCIe controller driver can be built as a built-in
module only. Since PCIe functionality is not a necessity for booting, add
support to build the Cadence PCIe driver as a loadable module as well.
PCI: host-common: Convert to library for host controller drivers
This common library will be used as a placeholder for helper functions
shared by the host controller drivers. This avoids placing the host
controller drivers specific helpers in drivers/pci/*.c, to avoid enlarging
the kernel image on platforms that do not use host controller drivers at
all (like x86/ACPI platforms).
PCI/ERR: Remove misleading TODO regarding kernel panic
A PCI device is just another peripheral in a system. So failure to
recover it, must not result in a kernel panic. So remove the TODO which
is quite misleading.
The Cadence PCIe controller driver defines message codes in enum
cdns_pcie_msg_code duplicating the existing PCIE_MSG_CODE_* definitions in
drivers/pci/pci.h. The driver only uses ASSERT_INTA and DEASSERT_INTA codes
from this enum.
Remove the redundant Cadence-specific enum definitions and use the ones
available in drivers/pci/pci.h. This helps in avoiding code duplication,
maintaining consistency with the spec, and simplifying the code
maintenance.
The kdoc for pci_epc_set_msix() says:
"Invoke to set the required number of MSI-X interrupts."
The kdoc for the callback pci_epc_ops->set_msix() says:
"ops to set the requested number of MSI-X interrupts in the MSI-X
capability register"
pci_epc_ops::set_msix() does however expect the parameter 'interrupts' to
be in the encoding as defined by the Table Size field. Nowhere in the
kdoc does it say that the number of interrupts should be in Table Size
encoding.
It is very confusing that the API pci_epc_set_msix() and the callback
function pci_epc_ops::set_msix() both take a parameter named 'interrupts',
but they expect completely different encodings.
Clean up the API and the callback function to have the same semantics,
i.e. the parameter represents the number of interrupts, regardless of the
internal encoding of that value.
Also rename the parameter 'interrupts' to 'nr_irqs', in both the wrapper
function and the callback function, such that the name is unambiguous.
[bhelgaas: more specific subject]
Signed-off-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable+noautosel@kernel.org # this is simply a cleanup Link: https://patch.msgid.link/20250514074313.283156-14-cassel@kernel.org
The kdoc for pci_epc_set_msi() says:
"Invoke to set the required number of MSI interrupts."
The kdoc for the callback pci_epc_ops::set_msi() says:
"ops to set the requested number of MSI interrupts in the MSI capability
register"
pci_epc_ops::set_msi() does however expect the parameter 'interrupts' to be
in the encoding as defined by the Multiple Message Capable (MMC) field of
the MSI capability structure. Nowhere in the kdoc does it say that the
number of interrupts should be in MMC encoding.
It is very confusing that the API pci_epc_set_msi() and the callback
function pci_epc_ops::set_msi() both take a parameter named 'interrupts',
but they expect completely different encodings.
Clean up the API and the callback function to have the same semantics,
i.e. the parameter represents the number of interrupts, regardless of the
internal encoding of that value.
Also rename the parameter 'interrupts' to 'nr_irqs', in both the wrapper
function and the callback function, such that the name is unambiguous.
[bhelgaas: more specific subject]
Signed-off-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable+noautosel@kernel.org # this is simply a cleanup Link: https://patch.msgid.link/20250514074313.283156-13-cassel@kernel.org
Niklas Cassel [Wed, 14 May 2025 07:43:17 +0000 (09:43 +0200)]
PCI: endpoint: Align pci_epc_get_msix(), pci_epc_ops::get_msix() return value encoding
The kdoc for pci_epc_get_msix() says:
"Invoke to get the number of MSI-X interrupts allocated by the RC"
The kdoc for the callback pci_epc_ops->get_msix() says:
"ops to get the number of MSI-X interrupts allocated by the RC from the
MSI-X capability register"
pci_epc_ops::get_msix() does however return the number of interrupts in the
encoding as defined by the Table Size field. Nowhere in the kdoc does it
say that the returned number of interrupts is in Table Size encoding.
It is very confusing that the API pci_epc_get_msix() and the callback
function pci_epc_ops::get_msix() don't return the same value.
Clean up the API and the callback function to have the same semantics,
i.e. return the number of interrupts, regardless of the internal encoding
of that value.
[bhelgaas: more specific subject]
Signed-off-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Cc: stable+noautosel@kernel.org # this is simply a cleanup Link: https://patch.msgid.link/20250514074313.283156-12-cassel@kernel.org
Niklas Cassel [Wed, 14 May 2025 07:43:16 +0000 (09:43 +0200)]
PCI: endpoint: Align pci_epc_get_msi(), pci_epc_ops::get_msi() return value encoding
The kdoc for API pci_epc_get_msi() says:
"Invoke to get the number of MSI interrupts allocated by the RC"
The kdoc for the callback pci_epc_ops::get_msi() says:
"ops to get the number of MSI interrupts allocated by the RC from
the MSI capability register"
pci_epc_ops::get_msi() does however return the number of interrupts in the
encoding as defined by the Multiple Message Enable (MME) field of the MSI
Capability structure.
Nowhere in the kdoc does it say that the returned number of interrupts is
in MME encoding. It is very confusing that the API pci_epc_get_msi() and
the callback function pci_epc_ops::get_msi() don't return the same value.
Clean up the API and the callback function to have the same semantics,
i.e. return the number of interrupts, regardless of the internal encoding
of that value.
[bhelgaas: more specific subject]
Signed-off-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Cc: stable+noautosel@kernel.org # this is simply a cleanup Link: https://patch.msgid.link/20250514074313.283156-11-cassel@kernel.org
Niklas Cassel [Wed, 14 May 2025 07:43:15 +0000 (09:43 +0200)]
PCI: cadence-ep: Correct PBA offset in .set_msix() callback
While cdns_pcie_ep_set_msix() writes the Table Size field correctly (N-1),
the calculation of the PBA offset is wrong because it calculates space for
(N-1) entries instead of N.
This results in the following QEMU error when using PCI passthrough on a
device which relies on the PCI endpoint subsystem:
failed to add PCI capability 0x11[0x50]@0xb0: table & pba overlap, or they don't fit in BARs, or don't align
Fix the calculation of PBA offset in the MSI-X capability.
[bhelgaas: more specific subject and commit log]
Fixes: 3ef5d16f50f8 ("PCI: cadence: Add MSI-X support to Endpoint driver") Signed-off-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Wilfred Mallawa <wilfred.mallawa@wdc.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20250514074313.283156-10-cassel@kernel.org
Niklas Cassel [Wed, 14 May 2025 07:43:14 +0000 (09:43 +0200)]
PCI: dwc: ep: Correct PBA offset in .set_msix() callback
While dw_pcie_ep_set_msix() writes the Table Size field correctly (N-1),
the calculation of the PBA offset is wrong because it calculates space for
(N-1) entries instead of N.
This results in the following QEMU error when using PCI passthrough on a
device which relies on the PCI endpoint subsystem:
failed to add PCI capability 0x11[0x50]@0xb0: table & pba overlap, or they don't fit in BARs, or don't align
Fix the calculation of PBA offset in the MSI-X capability.
[bhelgaas: more specific subject and commit log]
Fixes: 83153d9f36e2 ("PCI: endpoint: Fix ->set_msix() to take BIR and offset as arguments") Signed-off-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Wilfred Mallawa <wilfred.mallawa@wdc.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20250514074313.283156-9-cassel@kernel.org
PCI: endpoint: pci-epf-vntb: Simplify ctrl/SPAD space allocation
When allocating the shared ctrl/SPAD space, epf_ntb_config_spad_bar_alloc()
should not try to handle the size quirks for underlying BAR, whether it is
fixed size or alignment. This is already handled by pci_epf_alloc_space().
Also, when handling the alignment, this allocates more space than
necessary. For example, with a SPAD size of 1024B and a ctrl size of 308B,
the space necessary is 1332B. If the alignment is 1MB,
epf_ntb_config_spad_bar_alloc() tries to allocate 2MB where 1MB would have
been more than enough.
Drop the handling of the BAR size quirks and let pci_epf_alloc_space()
handle that. Just make sure the 32bits SPAD register are aligned on 32bits.
PCI: endpoint: Retain fixed-size BAR size as well as aligned size
When allocating space for an endpoint function on a BAR with a fixed size,
the size saved in 'struct pci_epf_bar.size' should be the fixed size as
expected by pci_epc_set_bar().
However, if pci_epf_alloc_space() increased the allocation size to
accommodate iATU alignment requirements, it previously saved the larger
aligned size in .size, which broke pci_epc_set_bar().
To solve this, keep the fixed BAR size in .size and save the aligned size
in a new .aligned_size for use when deallocating it.
Fixes: 2a9a801620ef ("PCI: endpoint: Add support to specify alignment for buffers allocated to BARs") Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
[mani: commit message fixup] Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
[bhelgaas: more specific subject, commit log, wrap comment to match file] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Niklas Cassel <cassel@kernel.org> Link: https://patch.msgid.link/20250424-pci-ep-size-alignment-v5-1-2d4ec2af23f5@baylibre.com
Johan Hovold [Wed, 2 Apr 2025 13:26:31 +0000 (15:26 +0200)]
PCI/pwrctrl: Rename pwrctrl Kconfig symbols and slot module
Commits b88cbaaa6fa1 ("PCI/pwrctrl: Rename pwrctl files to pwrctrl") and 3f925cd62874 ("PCI/pwrctrl: Rename pwrctrl functions and structures")
renamed the "pwrctl" framework to "pwrctrl" for consistency reasons.
Rename also the Kconfig symbols so that they reflect the new name while
adding entries for the deprecated ones. The old symbols can be removed once
everything that depends on them has been updated.
Note that no deprecated symbol is added for the new slot driver to avoid
having to add a user visible option.
Rename the new slot module to reflect the framework name and match the
other pwrctrl modules.
Jon Pan-Doh [Thu, 22 May 2025 23:21:26 +0000 (18:21 -0500)]
PCI/AER: Add sysfs attributes for log ratelimits
Allow userspace to read/write log ratelimits per device (including
enable/disable). Create aer/ sysfs directory to store them and any
future AER configs.
The default values are ratelimit_burst=10, ratelimit_interval_ms=5000, so
if we try to emit more than 10 messages in a 5 second period, some are
suppressed.
Update AER sysfs ABI filename to reflect the broader scope of AER sysfs
attributes (e.g. stats and ratelimits).
Jon Pan-Doh [Thu, 22 May 2025 23:21:24 +0000 (18:21 -0500)]
PCI/AER: Ratelimit correctable and non-fatal error logging
Spammy devices can flood kernel logs with AER errors and slow/stall
execution. Add per-device ratelimits for AER correctable and non-fatal
uncorrectable errors that use the kernel defaults (10 per 5s). Logging of
fatal errors is not ratelimited.
There are two AER logging entry points:
- aer_print_error() is used by DPC and native AER
- pci_print_aer() is used by GHES and CXL
The native AER aer_print_error() case includes a loop that may log details
from multiple devices, which are ratelimited individually. If we log
details for any device, we also log the Error Source ID from the Root Port
or RCEC.
If no such device details are found, we still log the Error Source from the
ERR_* Message, ratelimited by the Root Port or RCEC that received it.
The DPC aer_print_error() case is not ratelimited, since this only happens
for fatal errors.
The CXL pci_print_aer() case is ratelimited by the Error Source device.
The GHES pci_print_aer() case is via aer_recover_work_func(), which
searches for the Error Source device. If the device is not found, there's
no per-device ratelimit, so we use a system-wide ratelimit that covers all
error types (correctable, non-fatal, and fatal).
Sargun at Meta reported internally that a flood of AER errors causes RCU
CPU stall warnings and CSD-lock warnings.
Tested using aer-inject[1]. Sent 11 AER errors. Observed 10 errors logged
while AER stats (cat /sys/bus/pci/devices/<dev>/aer_dev_correctable) show
true count of 11.
Bjorn Helgaas [Thu, 22 May 2025 23:21:22 +0000 (18:21 -0500)]
PCI/AER: Convert aer_get_device_error_info(), aer_print_error() to index
Previously aer_get_device_error_info() and aer_print_error() took a pointer
to struct aer_err_info and a pointer to a pci_dev. Typically the pci_dev
was one of the elements of the aer_err_info.dev[] array (DPC was an
exception, where the dev[] array was unused).
Convert aer_get_device_error_info() and aer_print_error() to take an index
into the aer_err_info.dev[] array instead. A future patch will add
per-device ratelimit information, so the index makes it convenient to find
the ratelimit associated with the device.
To accommodate DPC, set info->dev[0] to the DPC port before using these
interfaces.
Update name to reflect the broader definition of structs/variables that are
stored (e.g. ratelimits). This is a preparatory patch for adding rate limit
support.
[bhelgaas: "aer_report" -> "aer_info"]
Signed-off-by: Karolina Stolarek <karolina.stolarek@oracle.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20250522232339.1525671-16-helgaas@kernel.org
Bjorn Helgaas [Thu, 22 May 2025 23:21:19 +0000 (18:21 -0500)]
PCI/ERR: Add printk level to pcie_print_tlp_log()
aer_print_error() produces output at a printk level (KERN_ERR/KERN_WARNING/
etc) that depends on the kind of error, and it calls pcie_print_tlp_log(),
which previously always produced output at KERN_ERR.
Add a "level" parameter so aer_print_error() can control the level of the
pcie_print_tlp_log() output to match.
When reporting an AER error, we check its type multiple times to determine
the log level for each message. Do this check only in the top-level
functions (aer_isr_one_error(), pci_print_aer()) and save the level in
struct aer_err_info.
[bhelgaas: save log level in struct aer_err_info instead of passing it
as a parameter]
Signed-off-by: Karolina Stolarek <karolina.stolarek@oracle.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20250522232339.1525671-13-helgaas@kernel.org
Bjorn Helgaas [Thu, 22 May 2025 23:21:16 +0000 (18:21 -0500)]
PCI/AER: Update statistics before ratelimiting
There are two AER logging entry points:
- aer_print_error() is used by DPC (dpc_process_error()) and native AER
handling (aer_process_err_devices()).
- pci_print_aer() is used by GHES (aer_recover_work_func()) and CXL
(cxl_handle_rdport_errors())
Both use __aer_print_error() to print the AER error bits. Previously
__aer_print_error() also incremented the AER statistics via
pci_dev_aer_stats_incr().
Call pci_dev_aer_stats_incr() early in the entry points instead of in
__aer_print_error() so we update the statistics even if the actual printing
of error bits is rate limited by a future change.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20250522232339.1525671-11-helgaas@kernel.org
Bjorn Helgaas [Thu, 22 May 2025 23:21:15 +0000 (18:21 -0500)]
PCI/AER: Simplify pci_print_aer()
Simplify pci_print_aer() by initializing the struct aer_err_info "info"
with a designated initializer list (it was previously initialized with
memset()) and using pci_name().
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://patch.msgid.link/20250522232339.1525671-10-helgaas@kernel.org
Bjorn Helgaas [Thu, 22 May 2025 23:21:14 +0000 (18:21 -0500)]
PCI/AER: Initialize aer_err_info before using it
Previously the struct aer_err_info "e_info" was allocated on the stack
without being initialized, so it contained junk except for the fields we
explicitly set later.
Initialize "e_info" at declaration with a designated initializer list,
which initializes the other members to zero.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20250522232339.1525671-9-helgaas@kernel.org
Jon Pan-Doh [Thu, 22 May 2025 23:21:12 +0000 (18:21 -0500)]
PCI/AER: Rename aer_print_port_info() to aer_print_source()
Rename aer_print_port_info() to aer_print_source() to be more descriptive.
This logs the Error Source ID logged by a Root Port or Root Complex Event
Collector when it receives an ERR_COR, ERR_NONFATAL, or ERR_FATAL Message.
Bjorn Helgaas [Thu, 22 May 2025 23:21:11 +0000 (18:21 -0500)]
PCI/AER: Extract bus/dev/fn in aer_print_port_info() with PCI_BUS_NUM(), etc
Use PCI_BUS_NUM(), PCI_SLOT(), PCI_FUNC() to extract the bus number,
device, and function number directly from the Error Source ID. There's no
need to shift and mask it explicitly.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20250522232339.1525671-6-helgaas@kernel.org
Bjorn Helgaas [Thu, 22 May 2025 23:21:10 +0000 (18:21 -0500)]
PCI/AER: Consolidate Error Source ID logging in aer_isr_one_error_type()
Previously we decoded the AER Error Source ID in aer_isr_one_error_type(),
then again in find_source_device() if we didn't find any devices with
errors logged in their AER Capabilities.
Consolidate this so we only decode and log the Error Source ID once in
aer_isr_one_error_type(). Add a "found" parameter so we can add a note
when we didn't find any downstream devices with errors logged in their AER
Capability.
This changes the dmesg logging when we found no devices with errors logged:
- pci 0000:00:01.0: AER: Correctable error message received from 0000:02:00.0
- pci 0000:00:01.0: AER: found no error details for 0000:02:00.0
+ pci 0000:00:01.0: AER: Correctable error message received from 0000:02:00.0 (no details found)
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://patch.msgid.link/20250522232339.1525671-5-helgaas@kernel.org
Bjorn Helgaas [Thu, 22 May 2025 23:21:09 +0000 (18:21 -0500)]
PCI/AER: Factor COR/UNCOR error handling out from aer_isr_one_error()
aer_isr_one_error() duplicates the Error Source ID logging and AER error
processing for Correctable Errors and Uncorrectable Errors. Factor out the
duplicated code to aer_isr_one_error_type().
aer_isr_one_error() doesn't need the struct aer_rpc pointer, so pass it the
Root Port or RCEC pci_dev pointer instead.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://patch.msgid.link/20250522232339.1525671-4-helgaas@kernel.org
Bjorn Helgaas [Thu, 22 May 2025 23:21:08 +0000 (18:21 -0500)]
PCI/DPC: Log Error Source ID only when valid
DPC Error Source ID is only valid when the DPC Trigger Reason indicates
that DPC was triggered due to reception of an ERR_NONFATAL or ERR_FATAL
Message (PCIe r6.0, sec 7.9.14.5).
When DPC was triggered by ERR_NONFATAL (PCI_EXP_DPC_STATUS_TRIGGER_RSN_NFE)
or ERR_FATAL (PCI_EXP_DPC_STATUS_TRIGGER_RSN_FE) from a downstream device,
log the Error Source ID (decoded into domain/bus/device/function). Don't
print the source otherwise, since it's not valid.
For DPC trigger due to reception of ERR_NONFATAL or ERR_FATAL, the dmesg
logging changes:
Previously the "containment event" message was at KERN_INFO and the
"%s detected" message was at KERN_WARNING. Now the single message is at
KERN_WARNING.
Fixes: 26e515713342 ("PCI: Add Downstream Port Containment driver") Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://patch.msgid.link/20250522232339.1525671-3-helgaas@kernel.org
Bjorn Helgaas [Thu, 22 May 2025 23:21:07 +0000 (18:21 -0500)]
PCI/DPC: Initialize aer_err_info before using it
Previously the struct aer_err_info "info" was allocated on the stack
without being initialized, so it contained junk except for the fields we
explicitly set later.
Initialize "info" at declaration so it starts as all zeros.
Fixes: 8aefa9b0d910 ("PCI/DPC: Print AER status in DPC event handling") Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20250522232339.1525671-2-helgaas@kernel.org
PCI/ACPI: Fix allocated memory release on error in pci_acpi_scan_root()
In the pci_acpi_scan_root() function, when creating a PCI bus fails,
we need to free up the previously allocated memory, which can avoid
invalid memory usage and save resources.
Fixes: 789befdfa389 ("arm64: PCI: Migrate ACPI related functions to pci-acpi.c") Signed-off-by: Zhe Qiao <qiaozhe@iscas.ac.cn>
[kwilczynski: commit log] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/20250430060603.381504-1-qiaozhe@iscas.ac.cn
Philipp Stanner [Mon, 19 May 2025 11:30:00 +0000 (13:30 +0200)]
PCI: Remove hybrid-devres usage warnings from kernel-doc
pci/iomap.c still contains warnings about those functions not behaving
in a managed manner if pcim_enable_device() was called. Since all hybrid
behavior that users could know about has been removed by now, those
explicit warnings are no longer necessary.
Remove the hybrid-devres usage warnings from the kernel-doc.
Signed-off-by: Philipp Stanner <phasta@kernel.org>
[kwilczynski: commit log] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://lore.kernel.org/r/20250519112959.25487-8-phasta@kernel.org
Philipp Stanner [Mon, 19 May 2025 11:29:59 +0000 (13:29 +0200)]
PCI: Remove redundant set of request functions
When the demangling of the hybrid devres functions within PCI was
implemented, it was necessary to implement several PCI functions a
second time to avoid cyclic calls, since the hybrid functions in pci.c
call the managed functions in devres.c, which in turn can be directly
used outside of PCI and needed request infrastructure, too.
Therefore, __pcim_request_region_range(), __pci_release_region_range()
and wrappers around them were implemented.
The hybrid nature has recently been removed from all functions in pci.c.
Therefore, the functions in devres.c can now directly use their
counterparts in pci.c without causing a call-cycle.
Remove __pcim_request_region_range(), __pcim_request_region_range() and
the wrappers. Use the corresponding request functions from pci.c in
devres.c
Signed-off-by: Philipp Stanner <phasta@kernel.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://lore.kernel.org/r/20250519112959.25487-7-phasta@kernel.org
Philipp Stanner [Mon, 19 May 2025 11:29:57 +0000 (13:29 +0200)]
PCI: Remove pcim_request_region_exclusive()
pcim_request_region_exclusive() was only needed for redirecting the
relatively exotic exclusive request functions in pci.c in case of them
operating in managed mode.
The managed nature has been removed from those functions and no one else
uses pcim_request_region_exclusive().
Remove pcim_request_region_exclusive().
Signed-off-by: Philipp Stanner <phasta@kernel.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://lore.kernel.org/r/20250519112959.25487-5-phasta@kernel.org
pcim_enable_device() is not related anymore to switching the mode of
operation of any functions. It merely sets up a devres callback for
automatically disabling the PCI device on driver detach.
Adjust the function's documentation.
Signed-off-by: Philipp Stanner <phasta@kernel.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20250519112959.25487-4-phasta@kernel.org
Philipp Stanner [Mon, 19 May 2025 11:29:55 +0000 (13:29 +0200)]
PCI: Remove hybrid devres nature from request functions
All functions based on __pci_request_region() and its release counter
part support "hybrid mode", where the functions become managed if the
PCI device was enabled with pcim_enable_device().
Removing this undesirable feature requires to remove all users who
activated their device with that function and use one of the affected
request functions.
These users were:
ASoC
alsa
cardreader
cirrus
i2c
mmc
mtd
mtd
mxser
net
spi
vdpa
vmwgfx
all of which have been ported to always-managed pcim_ functions by now.
The hybrid nature can, thus, be removed from the aforementioned PCI
functions.
Remove all function guards and documentation in pci.c related to the
hybrid redirection. Adjust the visibility of pcim_release_region().
Signed-off-by: Philipp Stanner <phasta@kernel.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://lore.kernel.org/r/20250519112959.25487-3-phasta@kernel.org
Ilpo Järvinen [Wed, 14 May 2025 13:28:21 +0000 (16:28 +0300)]
PCI: Update Link Speed after retraining
PCIe Link Retraining can alter Link Speed. pcie_retrain_link() that
performs the Link Training is called from bwctrl and ASPM driver.
While bwctrl listens for Link Bandwidth Management Status (LBMS) to
pick up changes in Link Speed, there is a race between
pcie_reset_lbms() clearing LBMS after the Link Training and
pcie_bwnotif_irq() reading the Link Status register. If LBMS is already
cleared when the irq handler reads the register, the interrupt handler
will return early with IRQ_NONE and won't update the Link Speed.
When Link Speed update originates from bwctrl,
pcie_bwctrl_change_speed() ensures Link Speed is updated after the
retraining. ASPM driver, however, calls pcie_retrain_link() but does
not update the Link Speed after retraining which can result in stale
Link Speed. Also, it is possible to have ASPM support with
CONFIG_PCIEPORTBUS=n in which case bwctrl will not be built in (and
thus won't update the Link Speed at all).
To ensure Link Speed is not left stale after Link Training, move the
call to pcie_update_link_speed() from pcie_bwctrl_change_speed() into
pcie_retrain_link().
PCI: Limit visibility of match_driver flag to PCI core
Since commit 58d9a38f6fac ("PCI: Skip attaching driver in device_add()"),
PCI enumeration is split into two steps: In the first step, all devices
are published in sysfs with device_add(). In the second step, drivers are
bound to the devices with device_attach(). To delay driver binding until
the second step, a "bool match_driver" in struct pci_dev is used.
Instead of a bool, use a bit in the "unsigned long priv_flags" to shrink
struct pci_dev a little and prevent use of the bool outside the PCI core
(as has happened with commit cbbc00be2ce3 ("iommu/amd: Prevent binding
other PCI drivers to IOMMU PCI devices")).
Revert "iommu/amd: Prevent binding other PCI drivers to IOMMU PCI devices"
Commit 991de2e59090 ("PCI, x86: Implement pcibios_alloc_irq() and
pcibios_free_irq()") changed IRQ handling on PCI driver probing.
It inadvertently broke resume from system sleep on AMD platforms:
* 8affb487d4a4 ("x86/PCI: Don't alloc pcibios-irq when MSI is enabled")
* cbbc00be2ce3 ("iommu/amd: Prevent binding other PCI drivers to IOMMU PCI devices")
The breaking change and one of these two fixes were subsequently reverted:
* fe25d078874f ("Revert "x86/PCI: Don't alloc pcibios-irq when MSI is enabled"")
* 6c777e8799a9 ("Revert "PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()"")
This rendered the second fix unnecessary, so revert it as well. It used
the match_driver flag in struct pci_dev, which is internal to the PCI core
and not supposed to be touched by arbitrary drivers.
Ilpo Järvinen [Tue, 22 Apr 2025 11:55:47 +0000 (14:55 +0300)]
PCI/bwctrl: Replace lbms_count with PCI_LINK_LBMS_SEEN flag
PCIe BW controller counted LBMS assertions for the purposes of the Target
Speed quirk (pcie_failed_link_retrain()). It was also a plan to expose the
LBMS count through sysfs to allow better diagnosing link related issues.
Lukas Wunner suggested, however, that adding a trace event would be better
for diagnostics purposes, leaving only pcie_failed_link_retrain() as a user
of the lbms_count.
The logic in pcie_failed_link_retrain() does not require keeping count of
LBMS assertions, so replace lbms_count with a simple flag in pci_dev's
priv_flags. The reduced complexity allows removing pcie_bwctrl_lbms_rwsem.
Since pcie_failed_link_retrain() runs before bwctrl is probed during boot,
the LBMS in Link Status register still has to be checked by the quirk.
The priv_flags numbering is not continuous because hotplug code added a few
flags to fill numbers 4-5 (hotplug and bwctrl changes are routed through in
different branches).
Suggested-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
[bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
[kwilczynski: squashed a fix to resolve build failures from
https://lore.kernel.org/all/20250508090036.1528-1-ilpo.jarvinen@linux.intel.com] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Lukas Wunner <lukas@wunner.de> Link: https://patch.msgid.link/20250422115548.1483-1-ilpo.jarvinen@linux.intel.com
Hans Zhang [Sat, 10 May 2025 16:07:10 +0000 (00:07 +0800)]
PCI: cadence: Simplify J721e link status check
Replace explicit if-else condition with direct return statement in
j721e_pcie_link_up(). This reduces code verbosity while maintaining
the same logic for detecting PCIe link completion.
Hans Zhang [Sat, 10 May 2025 16:07:09 +0000 (00:07 +0800)]
PCI: mobiveil: Return bool from link up check
PCIe link status check is supposed to return a boolean to indicate whether
the link is up or not. So update ls_g4_pcie_link_up() to return bool and
also simplify the LTSSM state check.
Hans Zhang [Sat, 10 May 2025 16:07:08 +0000 (00:07 +0800)]
PCI: dwc: Return bool from link up check
PCIe link status check is supposed to return a boolean to indicate whether
the link is up or not. So, modify the link_up callbacks and
dw_pcie_link_up() function to return bool instead of int.
PCI: Explicitly put devices into D0 when initializing
AMD BIOS team has root caused an issue that NVMe storage failed to come
back from suspend to a lack of a call to _REG when NVMe device was probed.
112a7f9c8edbf ("PCI/ACPI: Call _REG when transitioning D-states") added
support for calling _REG when transitioning D-states, but this only works
if the device actually "transitions" D-states.
967577b062417 ("PCI/PM: Keep runtime PM enabled for unbound PCI devices")
added support for runtime PM on PCI devices, but never actually
'explicitly' sets the device to D0.
To make sure that devices are in D0 and that platform methods such as
_REG are called, explicitly set all devices into D0 during initialization.
Fixes: 967577b062417 ("PCI/PM: Keep runtime PM enabled for unbound PCI devices") Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Denis Benato <benato.denis96@gmail.com> Tested-By: Yijun Shen <Yijun_Shen@Dell.com> Tested-By: David Perry <david.perry@amd.com> Reviewed-by: Rafael J. Wysocki <rafael@kernel.org> Link: https://patch.msgid.link/20250424043232.1848107-1-superm1@kernel.org
Ilpo Järvinen [Mon, 5 May 2025 11:54:12 +0000 (14:54 +0300)]
PCI: Fix lock symmetry in pci_slot_unlock()
The commit a4e772898f8b ("PCI: Add missing bridge lock to pci_bus_lock()")
made the lock function to call depend on dev->subordinate but left
pci_slot_unlock() unmodified creating locking asymmetry compared with
pci_slot_lock().
Because of the asymmetric lock handling, the same bridge device is unlocked
twice. First pci_bus_unlock() unlocks bus->self and then pci_slot_unlock()
will unconditionally unlock the same bridge device.
Move pci_dev_unlock() inside an else branch to match the logic in
pci_slot_lock().
Fixes: a4e772898f8b ("PCI: Add missing bridge lock to pci_bus_lock()") Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Lukas Wunner <lukas@wunner.de> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20250505115412.37628-1-ilpo.jarvinen@linux.intel.com
Wenbin Yao [Tue, 22 Apr 2025 10:36:23 +0000 (18:36 +0800)]
PCI: dwc: Make link training more robust by setting PORT_LOGIC_LINK_WIDTH to one lane
As per DWC PCIe registers description 4.30a, section 1.13.43, NUM_OF_LANES
named as PORT_LOGIC_LINK_WIDTH in PCIe DWC driver, is referred to as the
"Predetermined Number of Lanes" in PCIe r6.0, sec 4.2.7.2.1, which explains
the conditions required to enter Polling.Configuration:
Next state is Polling.Configuration after at least 1024 TS1 Ordered Sets
were transmitted, and all Lanes that detected a Receiver during Detect
receive eight consecutive training sequences ...
Otherwise, after a 24 ms timeout the next state is:
Polling.Configuration if,
(i) Any Lane, which detected a Receiver during Detect, received eight
consecutive training sequences ... and a minimum of 1024 TS1 Ordered
Sets are transmitted after receiving one TS1 or TS2 Ordered Set.
And
(ii) At least a predetermined set of Lanes that detected a Receiver
during Detect have detected an exit from Electrical Idle at least
once since entering Polling.Active.
Note: This may prevent one or more bad Receivers or Transmitters
from holding up a valid Link from being configured, and allow for
additional training in Polling.Configuration. The exact set of
predetermined Lanes is implementation specific.
Note: Any Lane that receives eight consecutive TS1 or TS2 Ordered
Sets should have detected an exit from Electrical Idle at least
once since entering Polling.Active.
In a PCIe link supporting multiple lanes, if PORT_LOGIC_LINK_WIDTH is set
to lane width the hardware supports, all lanes that detect a receiver
during the Detect phase must receive eight consecutive training sequences.
Otherwise, LTSSM will not enter Polling.Configuration and link training
will fail.
Therefore, always set PORT_LOGIC_LINK_WIDTH to 1, regardless of the number
of lanes the port actually supports, to make link up more robust. This
setting will not affect the intended link width if all lanes are
functional. Additionally, the link can still be established with at least
one lane if other lanes are faulty.
Hans Zhang [Sun, 27 Apr 2025 12:53:16 +0000 (20:53 +0800)]
PCI: dw-rockchip: Use rockchip_pcie_link_up() to check link up instead of open coding
Some of the callers of rockchip_pcie_link_up() are open coding the
rockchip_pcie_link_up() function, leading to code duplication. So switch
them to use rockchip_pcie_link_up() function.
Also, use the FIELD_GET() macro to simplify the link up check in
rockchip_pcie_link_up().
Hans Zhang [Sun, 27 Apr 2025 12:53:15 +0000 (20:53 +0800)]
PCI: dw-rockchip: Reorganize register and bitfield definitions
Register definitions were scattered with ambiguous names (e.g.,
PCIE_RDLH_LINK_UP_CHGED in PCIE_CLIENT_INTR_STATUS_MISC) and lacked
hierarchical grouping.
Group registers and their associated bitfields logically. This improves
maintainability and aligns the code with hardware documentation.
Richard Zhu [Wed, 16 Apr 2025 08:13:14 +0000 (16:13 +0800)]
PCI: imx6: Save and restore the LUT setting during suspend/resume for i.MX95 SoC
The look up table (LUT) setting would be lost during the PCIe suspend on
i.MX95 SoC. So to ensure proper functionality after resume, save it during
suspend and restore it while resuming.
Fixes: 9d6b1bd6b3c8 ("PCI: imx6: Add i.MX8MQ, i.MX8Q and i.MX95 PM support") Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
[mani: subject and description rewording] Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Frank Li <Frank.Li@nxp.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://patch.msgid.link/20250416081314.3929794-8-hongxing.zhu@nxp.com
Richard Zhu [Wed, 16 Apr 2025 08:13:12 +0000 (16:13 +0800)]
PCI: imx6: Add workaround for errata ERR051586
ERR051586: Compliance with 8GT/s Receiver Impedance ECN.
The default value of GEN3_RELATED_OFF[GEN3_ZRXDC_NONCOMPL] is 1 which
makes receiver non-compliant with the ZRX-DC parameter for 2.5 GT/s when
operating at 8 GT/s or higher. It causes unnecessary timeout in L1.
So the workaround is to set GEN3_RELATED_OFF[GEN3_ZRXDC_NONCOMPL] to 0.
Add this workaround in the dw_pcie_host_ops::post_init() callback for
i.MX95 platforms.
Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
[mani: subject and description rewording] Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Frank Li <Frank.Li@nxp.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://patch.msgid.link/20250416081314.3929794-6-hongxing.zhu@nxp.com
Richard Zhu [Wed, 16 Apr 2025 08:13:11 +0000 (16:13 +0800)]
PCI: imx6: Add workaround for errata ERR051624
ERR051624: The Controller Without Vaux Cannot Exit L23 Ready Through Beacon
or PERST# De-assertion
When the auxiliary power is not available, the controller cannot exit from
L23 Ready with beacon or PERST# de-assertion when main power is not
removed. So the workaround is to set SS_RW_REG_1[SYS_AUX_PWR_DET] to 1.
This workaround is required irrespective of whether Vaux is supplied to the
link partner or not.
Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
[mani: subject and description rewording] Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Frank Li <Frank.Li@nxp.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://patch.msgid.link/20250416081314.3929794-5-hongxing.zhu@nxp.com
Richard Zhu [Wed, 16 Apr 2025 08:13:10 +0000 (16:13 +0800)]
PCI: imx6: Toggle the core reset for i.MX95 PCIe
Add toggling core reset for i.MX95 to align with PHY's power-up sequence.
Note that the register is named as IMX95_PCIE_COLD_RST in hardware, though
it is used to reset the PCIe core.
Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
[mani: subject and description rewording] Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://patch.msgid.link/20250416081314.3929794-4-hongxing.zhu@nxp.com
Richard Zhu [Wed, 16 Apr 2025 08:13:09 +0000 (16:13 +0800)]
PCI: imx6: Call dw_pcie_wait_for_link() from start_link() callback only when required
Since the DWC driver is already calling dw_pcie_wait_for_link() after
calling the start_link() callback, remove the redundant
dw_pcie_wait_for_link() call from imx_pcie_start_link(). It is still
required to call this function for controllers supporting Gen 2 and higher
link speeds.
Suggested-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
[mani: subject and description rewording] Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://patch.msgid.link/20250416081314.3929794-3-hongxing.zhu@nxp.com
Richard Zhu [Wed, 16 Apr 2025 08:13:08 +0000 (16:13 +0800)]
PCI: imx6: Skip link up workaround for newer platforms
The current link setup procedure has one workaround to detect the devices
behind PCIe switches on some i.MX6 platforms. But this workaround is not
needed on recent i.MX7 platforms. So skip the workaround for platforms that
do not set the flag and start LTSSM directly.
Also, change the flag name from IMX_PCIE_FLAG_IMX_SPEED_CHANGE to
IMX_PCIE_FLAG_SPEED_CHANGE_WORKAROUND to match the usecase.
Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
[mani: subject and description rewording] Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Frank Li <Frank.Li@nxp.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://patch.msgid.link/20250416081314.3929794-2-hongxing.zhu@nxp.com
Shawn Lin [Mon, 14 Apr 2025 01:28:29 +0000 (09:28 +0800)]
PCI: dw-rockchip: Move rockchip_pcie_ep_hide_broken_ats_cap_rk3588() to dw_pcie_ep_ops::init()
In the case of PERST# deassert, non-sticky registers will get reset to
their hardware default state and EXT_CAP registers are one among them. But
since the broken ATS cap is hidden only in dw_pcie_ep_ops::pre_init()
callback which is not gettting called during PERST# deassert, it results in
the capability getting advertised again.
Shawn Lin [Thu, 17 Apr 2025 00:35:10 +0000 (08:35 +0800)]
PCI: dw-rockchip: Enable ASPM L0s capability for both RC and EP modes
L0s capability isn't enabled on all Rockchip SoCs by default, so enable it
in order to make ASPM L0s work on Rockchip platforms.
Testing the L0s for a long time revealed that the default N_FTS value of
210 in the hardware doesn't work stable and causes LTSSM to switch between
L0s and Recovery states. This leads to long exit latency and also causes
link down sometimes. So override the value to the max 255, which seems to
work fine under both PHYs used on Rockchip platforms.
Shawn Lin [Thu, 17 Apr 2025 00:35:09 +0000 (08:35 +0800)]
PCI: dw-rockchip: Remove PCIE_L0S_ENTRY check from rockchip_pcie_link_up()
rockchip_pcie_link_up() currently has two issues:
1. Value 0x11 of PCIE_L0S_ENTRY corresponds to L0 state, not L0S. So the
naming is wrong from the very beginning.
2. Checking for value 0x11 treats other states like L0S and L1 as link
down, which is wrong.
Hence, remove the PCIE_L0S_ENTRY check and also its definition. This allows
adding ASPM support in the successive commits.
Alex Williamson [Tue, 22 Apr 2025 23:05:32 +0000 (17:05 -0600)]
PCI: Increment PM usage counter when probing reset methods
We can get different results probing reset methods for a device depending
on its power state. For example, reading the PM control register of a
device in D3cold will always indicate NoSoftRst+ because we get ~0 data
when the config read fails on PCI, preventing us from correctly probing PM
reset support.
Increment the PM usage counter before any probes and use the cleanup __free
facility to automatically drop the usage counter out of scope.
Hector Martin [Tue, 1 Apr 2025 09:17:13 +0000 (10:17 +0100)]
PCI: apple: Add T602x PCIe support
This version of the hardware moved around a bunch of registers, so we
avoid the old compatible for these and introduce register offset
structures to handle the differences.
Signed-off-by: Hector Martin <marcan@marcan.st> Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Tested-by: Janne Grunau <j@jannau.net> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Link: https://patch.msgid.link/20250401091713.2765724-14-maz@kernel.org
Hector Martin [Tue, 1 Apr 2025 09:17:12 +0000 (10:17 +0100)]
PCI: apple: Abstract register offsets via a SoC-specific structure
Newer versions of the Apple PCIe block have a bunch of small, but
annoying differences.
In order to embrace this diversity of implementations, move the
currently hardcoded offsets into a hw_info structure. Future SoCs
will provide their own structure describing the applicable offsets.
Signed-off-by: Hector Martin <marcan@marcan.st> Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
[maz: split from original patch to only address T8103] Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Tested-by: Janne Grunau <j@jannau.net> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Link: https://patch.msgid.link/20250401091713.2765724-13-maz@kernel.org