Bjorn Helgaas [Fri, 6 Feb 2026 23:09:50 +0000 (17:09 -0600)]
Merge branch 'pci/controller/rzg3s-host'
- Use pci_generic_config_write(), not custom wrapper, since we don't need
the writability provided by the wrapper (Claudiu Beznea)
- Drop lock around RZG3S_PCI_MSIRS and RZG3S_PCI_PINTRCVIS updates since
they are RW1C registers (Claudiu Beznea)
- Fix a device node reference leak in rzg3s_pcie_host_parse_port() (Felix
Gu)
* pci/controller/rzg3s-host:
PCI: rzg3s-host: Fix device node reference leak in rzg3s_pcie_host_parse_port()
PCI: rzg3s-host: Drop the lock on RZG3S_PCI_MSIRS and RZG3S_PCI_PINTRCVIS
PCI: rzg3s-host: Use pci_generic_config_write() for the root bus
Bjorn Helgaas [Fri, 6 Feb 2026 23:09:46 +0000 (17:09 -0600)]
Merge branch 'pci/controller/dwc-qcom'
- Parse PERST# from all PCIe bridge nodes for future platforms that will
have PERST# in Switch Downstream Ports as well as in Root Ports
(Manivannan Sadhasivam)
- Rename qcom PERST# assert/deassert helpers, e.g., qcom_ep_reset_assert(),
to avoid confusion with Endpoint interfaces (Manivannan Sadhasivam)
* pci/controller/dwc-qcom:
PCI: qcom: Rename PERST# assert/deassert helpers for uniformity
PCI: qcom: Parse PERST# from all PCIe bridge nodes
Bjorn Helgaas [Fri, 6 Feb 2026 23:09:34 +0000 (17:09 -0600)]
Merge branch 'pci/controller/dwc'
- Extend PCI_FIND_NEXT_CAP() and PCI_FIND_NEXT_EXT_CAP() to return a
pointer to the preceding Capability (Qiang Yu)
- Add dw_pcie_remove_capability() and dw_pcie_remove_ext_capability() to
remove Capabilities that are advertised but not fully implemented (Qiang
Yu)
- Remove MSI and MSI-X Capabilities for DWC controllers in platforms that
can't support them, so we automatically fall back to INTx (Qiang Yu)
- Remove MSI-X and DPC Capabilities for Qualcomm platforms that advertise
but don't support them (Qiang Yu)
- Remove duplicate dw_pcie_ep_hide_ext_capability() function and replace
with dw_pcie_remove_ext_capability() (Qiang Yu)
- Add ASPM L1.1 and L1.2 Substates context to debugfs ltssm_status for
drivers that support this (Shawn Lin)
- Skip PME_Turn_Off broadcast and L2/L3 transition during suspend if link
is not up to avoid an unnecessary timeout (Manivannan Sadhasivam)
- Revert dw-rockchip, qcom, and DWC core changes that used link-up IRQs to
trigger enumeration instead of waiting for link to be up because the PCI
core doesn't allocate bus number space for hierarchies that might be
attached (Niklas Cassel)
- Make endpoint iATU entry for MSI permanent instead of programming it
dynamically, which is slow and racy with respect to other concurrent
traffic, e.g., eDMA (Koichiro Den)
- Use iMSI-RX MSI target address when possible to fix endpoints using
32-bit MSI (Shawn Lin)
- Make dw_pcie_ltssm_status_string() available and use it for logging
errors in dw_pcie_wait_for_link() (Manivannan Sadhasivam)
- Return -ENODEV when dw_pcie_wait_for_link() finds no devices, -EIO for
device present but inactive, -ETIMEDOUT for other failures, so callers
can handle these cases differently (Manivannan Sadhasivam)
- Allow DWC host controller driver probe to continue if device is not found
or found but inactive; only fail when there's an error with the link
(Manivannan Sadhasivam)
- For controllers like NXP i.MX6QP and i.MX7D, where LTSSM registers are
not accessible after PME_Turn_Off, simply wait 10ms instead of polling
for L2/L3 Ready (Richard Zhu)
- Use multiple iATU entries to map large bridge windows and DMA ranges when
necessary instead of failing (Samuel Holland)
- Rename struct dw_pcie_rp.has_msi_ctrl to .use_imsi_rx for clarity (Qiang
Yu)
- Add EPC dynamic_inbound_mapping feature bit for Endpoint Controllers that
can update BAR inbound address translation without requiring EPF driver
to clear/reset the BAR first, and advertise it for DWC-based Endpoints
(Koichiro Den)
- Add EPC subrange_mapping feature bit for Endpoint Controllers that can
map multiple independent inbound regions in a single BAR, implement
subrange mapping, advertise it for DWC-based Endpoints, and add Endpoint
selftests for it (Koichiro Den)
- Allow overriding default BAR sizes for pci-epf-test (Niklas Cassel)
- Make resizable BARs work for Endpoint multi-PF configurations; previously
it only worked for PF 0 (Aksh Garg)
- Fix Endpoint non-PF 0 support for BAR configuration, ATU mappings, and
Address Match Mode (Aksh Garg)
- Fix issues with outbound iATU index assignment that caused iATU index to
be out of bounds (Niklas Cassel)
- Clean up iATU index tracking to be consistent (Niklas Cassel)
- Set up iATU when ECAM is enabled; previously IO and MEM outbound windows
weren't programmed, and ECAM-related iATU entries weren't restored after
suspend/resume, so config accesses failed (Krishna Chaitanya Chundru)
* pci/controller/dwc:
PCI: dwc: Fix missing iATU setup when ECAM is enabled
PCI: dwc: Clean up iATU index usage in dw_pcie_iatu_setup()
PCI: dwc: Fix msg_atu_index assignment
PCI: dwc: ep: Add comment explaining controller level PTM access in multi PF setup
PCI: dwc: ep: Add per-PF BAR and inbound ATU mapping support
PCI: dwc: ep: Fix resizable BAR support for multi-PF configurations
PCI: endpoint: pci-epf-test: Allow overriding default BAR sizes
selftests: pci_endpoint: Add BAR subrange mapping test case
misc: pci_endpoint_test: Add BAR subrange mapping test case
PCI: endpoint: pci-epf-test: Add BAR subrange mapping test support
Documentation: PCI: endpoint: Clarify pci_epc_set_bar() usage
PCI: dwc: ep: Support BAR subrange inbound mapping via Address Match Mode iATU
PCI: dwc: Advertise dynamic inbound mapping support
PCI: endpoint: Add BAR subrange mapping support
PCI: endpoint: Add dynamic_inbound_mapping EPC feature
PCI: dwc: Rename dw_pcie_rp::has_msi_ctrl to dw_pcie_rp::use_imsi_rx for clarity
PCI: dwc: Fix grammar and formatting for comment in dw_pcie_remove_ext_capability()
PCI: dwc: Use multiple iATU windows for mapping large bridge windows and DMA ranges
PCI: dwc: Remove duplicate dw_pcie_ep_hide_ext_capability() function
PCI: dwc: Skip waiting for L2/L3 Ready if dw_pcie_rp::skip_l23_wait is true
PCI: dwc: Fail dw_pcie_host_init() if dw_pcie_wait_for_link() returns -ETIMEDOUT
PCI: dwc: Rework the error print of dw_pcie_wait_for_link()
PCI: dwc: Rename and move ltssm_status_string() to pcie-designware.c
PCI: dwc: Return -EIO from dw_pcie_wait_for_link() if device is not active
PCI: dwc: Return -ENODEV from dw_pcie_wait_for_link() if device is not found
PCI: dwc: Use cfg0_base as iMSI-RX target address to support 32-bit MSI devices
PCI: dwc: ep: Cache MSI outbound iATU mapping
Revert "PCI: dwc: Don't wait for link up if driver can detect Link Up event"
Revert "PCI: qcom: Enumerate endpoints based on Link up event in 'global_irq' interrupt"
Revert "PCI: qcom: Enable MSI interrupts together with Link up if 'Global IRQ' is supported"
Revert "PCI: qcom: Don't wait for link if we can detect Link Up"
Revert "PCI: dw-rockchip: Enumerate endpoints based on dll_link_up IRQ"
Revert "PCI: dw-rockchip: Don't wait for link since we can detect Link Up"
PCI: dwc: Skip PME_Turn_Off broadcast and L2/L3 transition during suspend if link is not up
PCI: dw-rockchip: Change get_ltssm() to provide L1 Substates info
PCI: dwc: Add L1 Substates context to ltssm_status of debugfs
PCI: qcom: Remove DPC Extended Capability
PCI: qcom: Remove MSI-X Capability for Root Ports
PCI: dwc: Remove MSI/MSIX capability for Root Port if iMSI-RX is used as MSI controller
PCI: dwc: Add new APIs to remove standard and extended Capability
PCI: Add preceding capability position support in PCI_FIND_NEXT_*_CAP macros
Bjorn Helgaas [Fri, 6 Feb 2026 23:09:32 +0000 (17:09 -0600)]
Merge branch 'pci/workqueue'
- Add WQ_PERCPU to alloc_workqueue() users (Marco Crivellari)
- Replace use of system_wq with system_percpu_wq (Marco Crivellari)
- Check for failure of alloc_workqueue() to avoid NULL pointer dereferences
(Haotian Zhang)
* pci/workqueue:
PCI: endpoint: Add missing NULL check for alloc_workqueue()
PCI: endpoint: Replace use of system_wq with system_percpu_wq
PCI: Add WQ_PERCPU to alloc_workqueue() users
Bjorn Helgaas [Fri, 6 Feb 2026 23:09:26 +0000 (17:09 -0600)]
Merge branch 'pci/virtualization'
- Mark ASM1164 SATA controller to avoid bus reset since it fails to train
the Link after reset (Alex Williamson)
- Mark Nvidia GB10 Root Ports to avoid bus reset since they may fail to
retrain the link after reset (Johnny-CC Chang)
- Add lockdep and other lock assertions (Ilpo Järvinen)
- Add ACS quirk for Qualcomm Hamoa & Glymur, which provides ACS-like
features but doesn't advertise an ACS Capability (Krishna Chaitanya
Chundru)
- Add ACS quirk for Pericom PI7C9X2G404 switches, which fail under load
when P2P Redirect Request is enabled (Nicolas Cavallari)
- Remove an incorrect unlock in pci_slot_trylock() error handling (Jinhui
Guo)
- Lock the bridge device for slot reset (Keith Busch)
- Enable ACS after IOMMU configuration on OF platforms so ACS is enabled an
all devices; previously the first device enumeration (typically a Root
Port) was omitted (Manivannan Sadhasivam)
- Disable ACS Source Validation for IDT 0x80b5 and 0x8090 switches to work
around hardware erratum; previously ACS SV was temporarily disabled,
which worked for enumeration but not after reset (Manivannan Sadhasivam)
* pci/virtualization:
PCI: Disable ACS SV for IDT 0x8090 switch
PCI: Disable ACS SV for IDT 0x80b5 switch
PCI: Cache ACS Capabilities register
PCI: Enable ACS after configuring IOMMU for OF platforms
PCI: Add ACS quirk for Pericom PI7C9X2G404 switches [12d8:b404]
PCI: Add ACS quirk for Qualcomm Hamoa & Glymur
PCI: Use device_lock_assert() to verify device lock is held
PCI: Use lockdep_assert_held(pci_bus_sem) to verify lock is held
PCI: Fix pci_slot_lock () device locking
PCI: Fix pci_slot_trylock() error handling
PCI: Mark Nvidia GB10 to avoid bus reset
PCI: Mark ASM1164 SATA controller to avoid bus reset
- Add more logging of resource assignment (Ilpo Järvinen)
- Add pbus_mem_size_optional() to handle sizes of optional resources
(SR-IOV VF BARs, expansion ROMs, bridge windows) (Ilpo Järvinen)
- Move CardBus code to setup-cardbus.c and only build it when
CONFIG_CARDBUS is set (Ilpo Järvinen)
- Use scnprintf() instead of sprintf() (Ilpo Järvinen)
- Add pbus_validate_busn() for Bus Number validation (Ilpo Järvinen)
- Don't claim disabled bridge windows to avoid spurious claim failures
(Ilpo Järvinen)
* pci/resource:
PCI: Don't claim disabled bridge windows
PCI: Move CardBus bridge scanning to setup-cardbus.c
PCI: Add pbus_validate_busn() for Bus Number validation
PCI: Add dword #defines for Bus Number + Secondary Latency Timer
PCI: Use scnprintf() instead of sprintf()
PCI: Handle CardBus-specific params in setup-cardbus.c
PCI: Separate CardBus setup & build it only with CONFIG_CARDBUS
PCI: Add 'pci' prefix to struct pci_dev_resource handling functions
PCI: Use resource_assigned() in setup-bus.c algorithm
resource: Mark res given to resource_assigned() as const
PCI: Add pbus_mem_size_optional() to handle optional sizes
PCI: Check invalid align earlier in pbus_size_mem()
PCI: Log reset and restore of resources
PCI: Add pci_resource_is_bridge_win()
PCI: Fetch dev_res to local var in __assign_resources_sorted()
PCI: Use res_to_dev_res() in reassign_resources_sorted()
PCI: Pass bridge window resource to pbus_size_mem()
PCI: Push realloc check into pbus_size_mem()
PCI: Remove old_size limit from bridge window sizing
resource: Increase MAX_IORES_LEVEL to 8
PCI: Stop over-estimating bridge window size
PCI: Rewrite bridge window head alignment function
PCI: Fix bridge window alignment with optional resources
PCI: Use resource_set_range() that correctly sets ->end
Bjorn Helgaas [Fri, 6 Feb 2026 23:09:24 +0000 (17:09 -0600)]
Merge branch 'pci/pwrctrl'
- Rename pwrseq, tc9563, and slot driver structs, variables, and functions
for consistency (Bjorn Helgaas)
- Add power_on/off callbacks with generic signature to pwrseq, tc9563, and
slot drivers so they can be used by pwrctrl core (Manivannan Sadhasivam)
- Add interfaces to create and destroy pwrctrl devices (Krishna Chaitanya
Chundru)
- Add interfaces to power devices on and off (Manivannan Sadhasivam)
- Switch to pwrctrl interfaces to create, destroy, and power on/off
devices, calling them from host controller drivers instead of the PCI
core (Manivannan Sadhasivam)
- Drop qcom .assert_perst() callbacks since this is now done by the
controller driver instead of the pwrctrl driver (Manivannan Sadhasivam)
- Add PCIe M.2 connector support to the slot pwrctrl driver (Manivannan
Sadhasivam)
* pci/pwrctrl:
PCI/pwrctrl: Create pwrctrl device if graph port is found
PCI/pwrctrl: Add PCIe M.2 connector support
PCI: Drop the assert_perst() callback
PCI: qcom: Drop the assert_perst() callbacks
PCI/pwrctrl: Switch to pwrctrl create, power on/off, destroy APIs
PCI/pwrctrl: Add APIs to power on/off pwrctrl devices
PCI/pwrctrl: Add APIs to create, destroy pwrctrl devices
PCI/pwrctrl: Add 'struct pci_pwrctrl::power_{on/off}' callbacks
PCI/pwrctrl: pwrseq: Factor out power on/off code to helpers
PCI/pwrctrl: slot: Factor out power on/off code to helpers
PCI/pwrctrl: tc9563: Rename private struct and pointers for consistency
PCI/pwrctrl: tc9563: Add local variables to reduce repetition
PCI/pwrctrl: tc9563: Clean up whitespace
PCI/pwrctrl: tc9563: Use put_device() instead of i2c_put_adapter()
PCI/pwrctrl: slot: Rename private struct and pointers for consistency
PCI/pwrctrl: pwrseq: Rename private struct and pointers for consistency
- Remove unnecessary bus_type check in pcie_port_bus_match() (Uwe
Kleine-König)
- Move pcie_port_bus_match() and pcie_port_bus_type to PCIe-specific
portdrv.c (Uwe Kleine-König)
- Remove unnecessary dev and dev->driver checks in portdrv .probe() and
.remove() (Uwe Kleine-König)
- Take advantage of pcie_port_bus_type.probe() and .remove() instead of
assigning them for each portdrv service driver (Uwe Kleine-König)
* pci/portdrv:
PCI/portdrv: Use bus-type functions
PCI/portdrv: Don't check for valid device and driver in bus callbacks
PCI/portdrv: Move pcie_port_bus_type to pcie source file
PCI/portdrv: Don't check for the driver's and device's bus
PCI/portdrv: Drop empty shutdown callback
PCI/portdrv: Fix potential resource leak
Bjorn Helgaas [Fri, 6 Feb 2026 23:09:15 +0000 (17:09 -0600)]
Merge branch 'pci/enumeration'
- Skip enabling ExtTag on VFs since that bit is Reserved and causes
misleading log messages (Håkon Bugge)
- Mark 3ware-9650SA Root Port Extended Tags as broken since 9650SA can't
handle 8-bit tags (Jörg Wedekind)
- Release domain number from the correct IDA when a PCI host bridge has no
parent device (Sergey Shtylyov)
- Initialize endpoint Read Completion Boundary to match Root Port,
regardless of ACPI _HPX (Håkon Bugge)
- Apply _HPX PCIe Setting Record only to AER configuration, and only when
OS owns PCIe hotplug but not AER, to avoid clobbering Extended Tag and
Relaxed Ordering settings (Håkon Bugge)
- Clear PCIe Root Status register with a write, not a read/modify/write
(Lukas Wunner)
* pci/enumeration:
PCI/PME: Replace RMW of Root Status register with direct write
PCI/ACPI: Restrict program_hpx_type2() to AER bits
PCI: Initialize RCB from pci_configure_device()
PCI: Check parent for NULL in of_pci_bus_release_domain_nr()
PCI: Mark 3ware-9650SA Root Port Extended Tags as broken
PCI: Do not attempt to set ExtTag for VFs
Ilpo Järvinen [Fri, 16 Jan 2026 13:15:12 +0000 (15:15 +0200)]
PCI/bwctrl: Disable BW controller on Intel P45 using a quirk
The commit 665745f27487 ("PCI/bwctrl: Re-add BW notification portdrv as
PCIe BW controller") was found to lead to a boot hang on a Intel P45
system. Testing without setting Link Bandwidth Management Interrupt Enable
(LBMIE) and Link Autonomous Bandwidth Interrupt Enable (LABIE) (PCIe r7.0,
sec 7.5.3.7) in bwctrl allowed system to come up.
P45 is a very old chipset and supports only up to gen2 PCIe, so not having
bwctrl does not seem a huge deficiency.
Add no_bw_notif in struct pci_dev and quirk Intel P45 Root Port with it.
The IDT switch with Device ID 0x8090 used in the ARM Juno R2 development
board incorrectly raises an ACS Source Validation error on Completions for
Config Read Requests, even though PCIe r7.0, sec 6.12.1.1, says that
Completions are never affected by ACS Source Validation.
This is already handled by the pci_disable_broken_acs_cap() quirk for the
IDT 0x80b5 switch. Extend the quirk for the 0x8090 device too.
Some IDT switches incorrectly flag an ACS Source Validation error on
completions for config read requests before they have captured the bus
number from a previous config write, even though PCIe r7.0, sec 6.12.1.1,
says that completions are never affected by ACS Source Validation.
The previous workaround, aa667c6408d2 ("PCI: Workaround IDT switch ACS
Source Validation erratum"), temporarily disabled ACS SV during
enumeration. This was effective but didn't cover the time after switch
reset, when it may lose the captured bus number.
Avoid the issue by preventing use of ACS SV altogether for these switches
by calling pci_disable_broken_acs_cap() from pci_acs_init() and remove the
previous workaround in pci_bus_read_dev_vendor_id().
Removal of ACS SV for these switches means they no longer enforce
everything in REQ_ACS_FLAGS, so downstream devices are not isolated from
each other and the iommu_group may include more devices.
PCI: Enable ACS after configuring IOMMU for OF platforms
Platform, ACPI, or IOMMU drivers call pci_request_acs(), which sets
'pci_acs_enable' to request that ACS be enabled for any devices enumerated
in the future.
OF platforms called pci_enable_acs() for the first device before
of_iommu_configure() called pci_request_acs(), so ACS was never enabled for
that device (typically a Root Port).
Call pci_enable_acs() later, from pci_dma_configure(), after
of_dma_configure() has had a chance to call pci_request_acs().
Here's the call path, showing the move of pci_enable_acs() from
pci_acs_init() to pci_dma_configure(), where it always happens after
pci_request_acs():
pci_device_add
pci_init_capabilities
pci_acs_init
- pci_enable_acs
- if (pci_acs_enable) <-- previous test
- ...
device_add
bus_notify(BUS_NOTIFY_ADD_DEVICE)
iommu_bus_notifier
iommu_probe_device
iommu_init_device
dev->bus->dma_configure
pci_dma_configure # pci_bus_type.dma_configure
of_dma_configure
of_iommu_configure
pci_request_acs
pci_acs_enable = 1 <-- set
+ pci_enable_acs
+ if (pci_acs_enable) <-- new test
+ ...
bus_probe_device
device_initial_probe
...
really_probe
dev->bus->dma_configure
pci_dma_configure # pci_bus_type.dma_configure
...
pci_enable_acs
Note that we will now call pci_enable_acs() twice for every device, first
from the iommu_probe_device() path and again from the really_probe() path.
Presumably that's not an issue since we also call dev->bus->dma_configure()
twice.
For the ACPI platforms, pci_request_acs() is called during ACPI
initialization time itself, independent of the IOMMU framework.
PCI: Add ACS quirk for Pericom PI7C9X2G404 switches [12d8:b404]
12d8:b404 is apparently another PCI ID for Pericom PI7C9X2G404 (as
identified by the chip silkscreen and lspci).
It is also affected by the PI7C9X2G errata (e.g. a network card attached
to it fails under load when P2P Redirect Request is enabled), so apply
the same quirk to this PCI ID too.
The Qualcomm Hamoa & Glymur Root Ports don't advertise an ACS capability,
but they do provide ACS-like features to disable peer transactions and
validate bus numbers in requests.
Ilpo Järvinen [Fri, 16 Jan 2026 12:57:40 +0000 (14:57 +0200)]
PCI: Use lockdep_assert_held(pci_bus_sem) to verify lock is held
The function comment for pci_bus_max_d3cold_delay() declares pci_bus_sem
must be held while calling the function which can be automatically checked.
Add lockdep_assert_held(pci_bus_sem) to confirm pci_bus_sem is held.
Jinhui Guo [Fri, 12 Dec 2025 14:55:28 +0000 (22:55 +0800)]
PCI: Fix pci_slot_trylock() error handling
Commit a4e772898f8b ("PCI: Add missing bridge lock to pci_bus_lock()")
delegates the bridge device's pci_dev_trylock() to pci_bus_trylock() in
pci_slot_trylock(), but it forgets to remove the corresponding
pci_dev_unlock() when pci_bus_trylock() fails.
Johnny-CC Chang [Thu, 13 Nov 2025 08:44:06 +0000 (16:44 +0800)]
PCI: Mark Nvidia GB10 to avoid bus reset
After asserting Secondary Bus Reset to downstream devices via a GB10 Root
Port, the link may not retrain correctly, e.g., the link may retrain with a
lower lane count or config accesses to downstream devices may fail.
Prevent use of Secondary Bus Reset for devices below GB10.
Alex Williamson [Fri, 9 Jan 2026 00:02:08 +0000 (17:02 -0700)]
PCI: Mark ASM1164 SATA controller to avoid bus reset
User forums report issues when assigning ASM1164 SATA controllers to VMs,
especially in configurations with multiple controllers. Logs show the
device fails to retrain after bus reset. Reports suggest this is an issue
across multiple platforms. The device indicates support for PM reset,
therefore the device still has a viable function level reset mechanism.
The reporting user confirms the device is well behaved in this use case
with bus reset disabled.
When pci_host_common_ecam_create() calls of_address_to_resource(), it
assumes all errors are due to a missing "reg" property in the device tree
node, when they may be due to a malformed "reg" property.
This can manifest when running the qemu "virt" board with a 32-bit kernel
and `highmem=on` and leads to the very confusing error message:
pci-host-generic 4010000000.pcie: host bridge /pcie@10000000 ranges:
pci-host-generic 4010000000.pcie: IO 0x003eff0000..0x003effffff -> 0x0000000000
pci-host-generic 4010000000.pcie: MEM 0x0010000000..0x003efeffff -> 0x0010000000
pci-host-generic 4010000000.pcie: MEM 0x8000000000..0xffffffffff -> 0x8000000000
pci-host-generic 4010000000.pcie: missing "reg" property
pci-host-generic 4010000000.pcie: probe with driver pci-host-generic failed with error -75
Lukas Wunner [Sun, 26 Oct 2025 16:57:57 +0000 (17:57 +0100)]
PCI/PME: Replace RMW of Root Status register with direct write
As of PCIe r7.0, the Root Status register contains a single writeable bit
(PME Status, type RW1C) and otherwise just read-only bits and RsvdZ bits
(which software must write as zero, PCIe r7.0 sec 7.4).
Thus, when clearing the PME Status bit, there's no need to perform a
read-modify-write of the register. Instead, the bit can be written
directly.
Lukas Wunner [Sun, 25 Jan 2026 09:25:51 +0000 (10:25 +0100)]
PCI/AER: Clear stale errors on reporting agents upon probe
Correctable and Uncorrectable Error Status Registers on reporting agents
are cleared upon PCI device enumeration in pci_aer_init() to flush past
events. They're cleared again when an error is handled by the AER driver.
If an agent reports a new error after pci_aer_init() and before the AER
driver has probed on the corresponding Root Port or Root Complex Event
Collector, that error is not handled by the AER driver: It clears the
Root Error Status Register on probe, but neglects to re-clear the
Correctable and Uncorrectable Error Status Registers on reporting agents.
The error will eventually be reported when another error occurs. Which
is irritating because to an end user it appears as if the earlier error
has just happened.
Amend the AER driver to clear stale errors on reporting agents upon probe.
Skip reporting agents which have not invoked pci_aer_init() yet to avoid
using an uninitialized pdev->aer_cap. They're recognizable by the error
bits in the Device Control register still being clear.
Reporting agents may execute pci_aer_init() after the AER driver has
probed, particularly when devices are hotplugged or removed/rescanned via
sysfs. For this reason, it continues to be necessary that pci_aer_init()
clears Correctable and Uncorrectable Error Status Registers.
Ilpo Järvinen [Tue, 3 Feb 2026 17:21:38 +0000 (19:21 +0200)]
PCI: Don't claim disabled bridge windows
The commit 8278c6914306 ("PCI: Preserve bridge window resource type flags")
changed bridge window resource behavior such that flags are no longer zero
if the bridge window is not valid or is disabled (mainly to preserve the
type flags for later use). If a bridge window has its limit smaller than
base address, pci_read_bridge_*() sets both IORESOURCE_UNSET and
IORESOURCE_DISABLED to indicate the bridge window exists but is not valid
with the current base and limit configuration.
The code in pci_claim_bridge_resources() still depends on the old behavior
of checking validity of the bridge window solely based on !r->flags,
whereas after 8278c6914306, also IORESOURCE_DISABLED may indicate bridge
window addresses are not valid.
While pci_claim_resource() does check IORESOURCE_UNSET,
pci_claim_bridge_resource() attempts to clip the resource if
pci_claim_resource() fails, which is not correct for bridge window
resources that are not valid. As pci_bus_clip_resource() performs clipping
regardless of flags and then clears IORESOURCE_UNSET, it should not be
called unless the resource is valid.
The problem is visible in this log:
pci 0000:20:00.0: PCI bridge to [bus 21]
pci 0000:20:00.0: bridge window [io size 0x0000 disabled]: can't claim; no address assigned
pci 0000:20:00.0: [io 0x0000-0xffffffffffffffff disabled] clipped to [io 0x0000-0xffff disabled]
Add IORESOURCE_DISABLED check in pci_claim_bridge_resources() to only
claim bridge windows that appear to have a valid configuration.
Felix Gu [Tue, 3 Feb 2026 16:46:24 +0000 (00:46 +0800)]
PCI: rzg3s-host: Fix device node reference leak in rzg3s_pcie_host_parse_port()
In rzg3s_pcie_host_parse_port(), of_get_next_child() returns a device node
with an incremented reference count that must be released with
of_node_put(). The current code fails to call of_node_put() which causes a
reference leak.
Use the __free(device_node) attribute to ensure automatic cleanup when the
variable goes out of scope.
PCI: dwc: Fix missing iATU setup when ECAM is enabled
When ECAM is enabled, the driver skipped calling dw_pcie_iatu_setup()
before configuring ECAM iATU entries. This left IO and MEM outbound
windows unprogrammed, resulting in broken IO transactions. Additionally,
dw_pcie_config_ecam_iatu() was only called during host initialization,
so ECAM-related iATU entries were not restored after suspend/resume,
leading to failures in configuration space access
To resolve these issues, move the ECAM iATU configuration to
dw_pcie_iatu_setup(), and invoke dw_pcie_iatu_setup() when ECAM is
enabled.
Furthermore, add error checks in dw_pcie_prog_outbound_atu() and
dw_pcie_prog_inbound_atu() such that an error is returned if the caller is
trying to program an iATU that is outside the number of iATUs supported by
the controller.
Fixes: f6fd357f7afb ("PCI: dwc: Prepare the driver for enabling ECAM mechanism using iATU 'CFG Shift Feature'") Reported-by: Maciej W. Rozycki <macro@orcam.me.uk> Closes: https://lore.kernel.org/all/alpine.DEB.2.21.2511280256260.36486@angie.orcam.me.uk/ Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com> Co-developed-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Niklas Cassel <cassel@kernel.org>
[mani: used imperative tone] Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Tested-by: Maciej W. Rozycki <macro@orcam.me.uk> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hans Zhang <zhanghuabing@ecosda.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Cc: stable+noautosel@kernel.org # depends on Clean up iATU index usage in dw_pcie_iatu_setup() Link: https://patch.msgid.link/20260127151038.1484881-8-cassel@kernel.org
Niklas Cassel [Tue, 27 Jan 2026 15:10:40 +0000 (16:10 +0100)]
PCI: dwc: Clean up iATU index usage in dw_pcie_iatu_setup()
The current iATU index usage in dw_pcie_iatu_setup() is a mess.
For outbound address translation the index is incremented before usage.
For inbound address translation the index is incremented after usage.
Incrementing the index after usage make much more sense, and make the
index usage consistent for both outbound and inbound address translation.
Most likely, the overly complicated logic for the outbound address
translation is because the iATU at index 0 is reserved for CFG IOs
(dw_pcie_other_conf_map_bus()), however, we should be able to use the
exact same logic for the indexing of the outbound and inbound iATUs.
(Only the starting index should be different.)
Create two new variables ob_iatu_index and ib_iatu_index, which makes
it more clear from the name itself that it is a zeroes based index,
and only increment the index if the iATU configuration call succeeded.
Since we always check if there is an index available immediately before
programming the iATU, we can remove the useless "ranges exceed outbound
iATU size" warnings, as the code is already unreachable. For the same
reason, we can also remove the useless breaks outside of the while loops.
No functional changes intended.
Signed-off-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Tested-by: Maciej W. Rozycki <macro@orcam.me.uk> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hans Zhang <zhanghuabing@ecosda.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://patch.msgid.link/20260127151038.1484881-7-cassel@kernel.org
Niklas Cassel [Tue, 27 Jan 2026 15:10:39 +0000 (16:10 +0100)]
PCI: dwc: Fix msg_atu_index assignment
When dw_pcie_iatu_setup() configures outbound address translation for both
type PCIE_ATU_TYPE_MEM and PCIE_ATU_TYPE_IO, the iATU index to use is
incremented before calling dw_pcie_prog_outbound_atu().
However for msg_atu_index, the index is not incremented before use,
causing the iATU index to be the same as the last configured iATU index,
which means that it will incorrectly use the same iATU index that is
already in use, breaking outbound address translation.
In total there are three problems with this code:
-It assigns msg_atu_index the same index that was used for the last
outbound address translation window, rather than incrementing the index
before assignment.
-The index should only be incremented (and msg_atu_index assigned) if the
use_atu_msg feature is actually requested/in use (pp->use_atu_msg is set).
-If the use_atu_msg feature is requested/in use, and there are no outbound
iATUs available, the code should return an error, as otherwise when this
this feature is used, it will use an iATU index that is out of bounds.
Fixes: e1a4ec1a9520 ("PCI: dwc: Add generic MSG TLP support for sending PME_Turn_Off when system suspend") Signed-off-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Tested-by: Maciej W. Rozycki <macro@orcam.me.uk> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hans Zhang <zhanghuabing@ecosda.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Reviewed-by: Shawn Lin <shawn.lin@rock-chips.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260127151038.1484881-6-cassel@kernel.org
Aksh Garg [Fri, 30 Jan 2026 11:55:16 +0000 (17:25 +0530)]
PCI: dwc: ep: Add comment explaining controller level PTM access in multi PF setup
PCIe r6.0, section 7.9.15 requires PTM capability in exactly one
function to control all PTM-capable functions. This makes PTM registers
controller level rather than per-function.
Add a comment explaining why PTM capability registers are accessed
using the standard DBI accessors instead of func_no indexed
per-function accessors.
Aksh Garg [Fri, 30 Jan 2026 11:55:15 +0000 (17:25 +0530)]
PCI: dwc: ep: Add per-PF BAR and inbound ATU mapping support
The commit 24ede430fa49 ("PCI: designware-ep: Add multiple PFs support
for DWC") added support for multiple PFs in the DWC driver, but the
implementation was incomplete. It did not properly support MSI/MSI-X,
as well as BAR and inbound ATU mapping for multiple PFs. The MSI/MSI-X
issue was later fixed by commit 47a062609a30 ("PCI: designware-ep:
Modify MSI and MSIX CAP way of finding") by introducing a per-PF
struct dw_pcie_ep_func.
However, even with both commits, the multiple PF support in the driver
remains broken because BAR configuration and ATU mappings are managed
globally in struct dw_pcie_ep, meaning all PFs share the same BAR-to-ATU
mapping table. This causes one PF's EPF to overwrite the address
translation of another PF's EPF in the internal ATU region,
creating conflicts when multiple physical functions attempt to
configure their BARs independently.
The commit cfbc98dbf44d ("PCI: dwc: ep: Support BAR subrange inbound
mapping via Address Match Mode iATU") later introduced Address Match
Mode support, which suffers from the same multi-PF conflict issue.
Fix this by moving the required members from struct dw_pcie_ep to
struct dw_pcie_ep_func, similar to what commit 47a062609a30
("PCI: designware-ep: Modify MSI and MSIX CAP way of finding") did for
MSI/MSI-X capability support, to allow proper multi-function endpoint
operation, where each PF can configure its BARs and corresponding
internal ATU region without interfering with other PFs.
Fixes: 24ede430fa49 ("PCI: designware-ep: Add multiple PFs support for DWC") Fixes: cc839bef7727 ("PCI: dwc: ep: Support BAR subrange inbound mapping via Address Match Mode iATU") Signed-off-by: Aksh Garg <a-garg7@ti.com> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Niklas Cassel <cassel@kernel.org> Link: https://patch.msgid.link/20260130115516.515082-3-a-garg7@ti.com
Vincent Guittot [Mon, 2 Feb 2026 15:10:50 +0000 (16:10 +0100)]
PCI: s32g: Skip Root Port removal during success
Currently, s32g_pcie_parse_ports() exercises the 'err_port' path even
during the success case. This results in ports getting deleted after
successful parsing of Root Ports.
Hence, skip the removal of Root Ports during success.
Niklas Schnelle [Tue, 16 Dec 2025 22:14:03 +0000 (23:14 +0100)]
PCI/IOV: Fix race between SR-IOV enable/disable and hotplug
Commit 05703271c3cd ("PCI/IOV: Add PCI rescan-remove locking when
enabling/disabling SR-IOV") tried to fix a race between the VF removal
inside sriov_del_vfs() and concurrent hot unplug by taking the PCI
rescan/remove lock in sriov_del_vfs(). Similarly the PCI rescan/remove lock
was also taken in sriov_add_vfs() to protect addition of VFs.
This approach however causes deadlock on trying to remove PFs with SR-IOV
enabled because PFs disable SR-IOV during removal and this removal happens
under the PCI rescan/remove lock. So the original fix had to be reverted.
Instead of taking the PCI rescan/remove lock in sriov_add_vfs() and
sriov_del_vfs(), fix the race that occurs with SR-IOV enable and disable vs
hotplug higher up in the callchain by taking the lock in
sriov_numvfs_store() before calling into the driver's sriov_configure()
callback.
Niklas Schnelle [Tue, 16 Dec 2025 22:14:02 +0000 (23:14 +0100)]
Revert "PCI/IOV: Add PCI rescan-remove locking when enabling/disabling SR-IOV"
This reverts commit 05703271c3cd ("PCI/IOV: Add PCI rescan-remove locking
when enabling/disabling SR-IOV"), which causes a deadlock by recursively
taking pci_rescan_remove_lock when sriov_del_vfs() is called as part of
pci_stop_and_remove_bus_device(). For example with the following sequence
of commands:
Håkon Bugge [Thu, 29 Jan 2026 17:52:33 +0000 (18:52 +0100)]
PCI/ACPI: Restrict program_hpx_type2() to AER bits
Previously program_hpx_type2() applied PCIe settings unconditionally,
which could incorrectly change bits like Extended Tag Field Enable and
Enable Relaxed Ordering.
When _HPX was added to ACPI r3.0, the intent of the PCIe Setting
Record (Type 2) in sec 6.2.7.3 was to configure AER registers when the
OS does not own the AER Capability:
The PCI Express setting record contains ... [the AER] Uncorrectable
Error Mask, Uncorrectable Error Severity, Correctable Error Mask
... to be used when configuring registers in the Advanced Error
Reporting Extended Capability Structure ...
OSPM [1] will only evaluate _HPX with Setting Record – Type 2 if
OSPM is not controlling the PCI Express Advanced Error Reporting
capability.
ACPI r3.0b, sec 6.2.7.3, added more AER registers, including registers
in the PCIe Capability with AER-related bits, and the restriction that
the OS use this only when it owns PCIe native hotplug:
... when configuring PCI Express registers in the Advanced Error
Reporting Extended Capability Structure *or PCI Express Capability
Structure* ...
An OS that has assumed ownership of native hot plug but does not
... have ownership of the AER register set must use ... the Type 2
record to program the AER registers ...
However, since the Type 2 record also includes register bits that
have functions other than AER, the OS must ignore values ... that
are not applicable.
Restrict program_hpx_type2() to only the intended purpose:
- Apply settings only when OS owns PCIe native hotplug but not AER,
- Only touch the AER-related bits (Error Reporting Enables) in Device
Control
- Don't touch Link Control at all, since nothing there seems AER-related,
but log _HPX settings for debugging purposes
Note that Read Completion Boundary is now configured elsewhere, since it is
unrelated to _HPX.
[1] Operating System-directed configuration and Power Management
Håkon Bugge [Thu, 29 Jan 2026 17:52:32 +0000 (18:52 +0100)]
PCI: Initialize RCB from pci_configure_device()
Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff Root
Port supports it (_HPX)") worked around a bogus _HPX type 2 record, which
caused program_hpx_type2() to set the RCB in an endpoint even though the
Root Port did not have the RCB bit set.
e42010d8207f fixed that by setting the RCB in the endpoint only when it was
set in the Root Port.
In retrospect, program_hpx_type2() is intended for AER-related settings,
and the RCB should be configured elsewhere so it doesn't depend on the
presence or contents of an _HPX record.
Explicitly program the RCB from pci_configure_device() so it matches the
Root Port's RCB. The Root Port may not be visible to virtualized guests;
in that case, leave RCB alone.
Aksh Garg [Fri, 30 Jan 2026 11:55:14 +0000 (17:25 +0530)]
PCI: dwc: ep: Fix resizable BAR support for multi-PF configurations
The resizable BAR support added by the commit 3a3d4cabe681 ("PCI: dwc: ep:
Allow EPF drivers to configure the size of Resizable BARs") incorrectly
configures the resizable BARs only for the first Physical Function (PF0)
in EP mode.
The resizable BAR configuration functions use generic dw_pcie_*_dbi()
operations instead of physical function specific dw_pcie_ep_*_dbi()
operations. This causes resizable BAR configuration to always target
PF0 regardless of the requested function number.
Additionally, dw_pcie_ep_init_non_sticky_registers() only initializes
resizable BAR registers for PF0, leaving other PFs unconfigured during
the execution of this function.
Fix this by using physical function specific configuration space access
operations throughout the resizable BAR code path and initializing
registers for all the physical functions that support resizable BARs.
Niklas Cassel [Fri, 30 Jan 2026 11:30:39 +0000 (12:30 +0100)]
PCI: endpoint: pci-epf-test: Allow overriding default BAR sizes
Add bar{0,1,2,3,4,5}_size attributes in configfs, so that the user is not
restricted to run pci-epf-test with the hardcoded BAR size values defined
in pci-epf-test.c.
This code is shamelessly more or less copy pasted from pci-epf-vntb.c
Signed-off-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Koichiro Den <den@valinux.co.jp> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://patch.msgid.link/20260130113038.2143947-2-cassel@kernel.org
Koichiro Den [Sat, 24 Jan 2026 14:50:12 +0000 (23:50 +0900)]
selftests: pci_endpoint: Add BAR subrange mapping test case
Add BAR_SUBRANGE_TEST to the pci_endpoint kselftest suite.
The test uses the PCITEST_BAR_SUBRANGE ioctl and will skip when the
chosen BAR is disabled (-ENODATA), when the endpoint/controller does not
support subrange mapping (-EOPNOTSUPP), or when the BAR is reserved for
the test register space (-EBUSY).
Koichiro Den [Sat, 24 Jan 2026 14:50:11 +0000 (23:50 +0900)]
misc: pci_endpoint_test: Add BAR subrange mapping test case
Add a new PCITEST_BAR_SUBRANGE ioctl to exercise EPC BAR subrange
mapping end-to-end.
The test programs a simple 2-subrange layout on the endpoint (via
pci-epf-test) and verifies that:
- the endpoint-provided per-subrange signature bytes are observed at
the expected PCIe BAR offsets, and
- writes to each subrange are routed to the correct backing region
(i.e. the submap order is applied rather than accidentally working
due to an identity mapping).
Return -EOPNOTSUPP when the endpoint does not advertise subrange
mapping, -ENODATA when the BAR is disabled, and -EBUSY when the BAR is
reserved for the test register space.
Koichiro Den [Sat, 24 Jan 2026 14:50:10 +0000 (23:50 +0900)]
PCI: endpoint: pci-epf-test: Add BAR subrange mapping test support
Extend pci-epf-test so that pci_endpoint_test can exercise BAR subrange
mapping end-to-end.
Add BAR_SUBRANGE_SETUP/CLEAR commands that program (and tear down) a
simple 2-subrange layout for a selected BAR. The endpoint deliberately
permutes the physical backing regions (swap the halves) and writes a
deterministic signature byte per subrange. This allows the RC to verify
that the submap order is actually applied, not just that reads/writes
work with an identity mapping.
Advertise CAP_SUBRANGE_MAPPING only when the underlying EPC supports
dynamic_inbound_mapping and subrange_mapping. Also bump the default BAR
sizes (BAR0-4) to 128 KiB so that split subranges are large enough to
satisfy common inbound-translation alignment constraints. E.g. for DWC
EP, the default and maximum CX_ATU_MIN_REGION_SIZE is 64 kB, so 128 KiB
is sufficient for DWC-based EP platforms for 2-subrange testing.
The current documentation implies that pci_epc_set_bar() is only used
before the host enumerates the endpoint.
In practice, some Endpoint Controllers support calling pci_epc_set_bar()
multiple times for the same BAR (without clearing it) in order to update
inbound address translations after the host has programmed the BAR base
address, which some Endpoint Functions such as vNTB already rely on.
Add document text for that.
Also document the expected call flow for BAR subrange mapping
(pci_epf_bar.num_submap / pci_epf_bar.submap), which may require a
second pci_epc_set_bar() call after the host has programmed the BAR base
address.
Signed-off-by: Koichiro Den <den@valinux.co.jp> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Niklas Cassel <cassel@kernel.org> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://patch.msgid.link/20260124145012.2794108-6-den@valinux.co.jp
Koichiro Den [Sat, 24 Jan 2026 14:50:08 +0000 (23:50 +0900)]
PCI: dwc: ep: Support BAR subrange inbound mapping via Address Match Mode iATU
Extend dw_pcie_ep_set_bar() to support inbound mappings for BAR
subranges using Address Match Mode IB iATU when pci_epf_bar.num_submap
is non-zero.
Rename the existing BAR-match helper into dw_pcie_ep_ib_atu_bar() and
introduce dw_pcie_ep_ib_atu_addr() for Address Match Mode. When
num_submap is non-zero, read the assigned BAR base address and program
one inbound iATU window per subrange. Validate the submap array before
programming:
- each subrange is aligned to pci->region_align
- subranges cover the whole BAR (no gaps and no overlaps)
Track Address Match Mode mappings and tear them down on clear_bar() and
on set_bar() error paths to avoid leaving half-programmed state or
untranslated BAR holes.
Advertise this capability by extending the common feature bit
initializer macro (DWC_EPC_COMMON_FEATURES).
This enables multiple inbound windows within a single BAR, which is
useful on platforms where usable BARs are scarce but EPFs need multiple
inbound regions.
Signed-off-by: Koichiro Den <den@valinux.co.jp> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Reviewed-by: Niklas Cassel <cassel@kernel.org> Link: https://patch.msgid.link/20260124145012.2794108-5-den@valinux.co.jp
PCI/pwrctrl: Create pwrctrl device if graph port is found
The devicetree node of the PCIe Root Port/Slot could have the graph port to
link the PCIe M.2 connector node. Since the M.2 connectors are modeled as
Power Sequencing devices, they need to be controlled by the pwrctrl driver
like the Root Port/Slot supplies.
Hence, create the pwrctrl device if the graph port is found in the node.
Sergey Shtylyov [Tue, 27 Jan 2026 20:39:42 +0000 (23:39 +0300)]
PCI: Check parent for NULL in of_pci_bus_release_domain_nr()
of_pci_bus_find_domain_nr() allows its parent parameter to be NULL but
of_pci_bus_release_domain_nr() (that undoes its effect) doesn't -- that
means it's going to blow up while calling of_get_pci_domain_nr() if the
parent parameter indeed happens to be NULL. Add the missing NULL check.
Found by Linux Verification Center (linuxtesting.org) with the Svace static
analysis tool.
Add support for handling PCIe M.2 connectors as Power Sequencing devices.
These connectors are exposed as Power Sequencing devices as they often
support multiple interfaces like PCIe/SATA, USB/UART to the host machine,
and the interfaces may be driven by different client drivers at the same
time.
This driver handles the PCIe interface of these connectors. It first checks
for the presence of the graph port in the Root Port node with the help of
of_graph_is_present() API. If present, it acquires/powers ON the
corresponding pwrseq device.
Once the pwrseq device is powered ON, the driver will skip parsing the Root
Port/Slot resources and register with the pwrctrl framework.
Koichiro Den [Sat, 24 Jan 2026 14:50:07 +0000 (23:50 +0900)]
PCI: dwc: Advertise dynamic inbound mapping support
The DesignWare EP core has supported updating the inbound iATU mapping
for an already configured BAR (i.e. allowing pci_epc_set_bar() to be
called again without a prior pci_epc_clear_bar()) since
commit 4284c88fff0e ("PCI: designware-ep: Allow pci_epc_set_bar() update
inbound map address").
Now that this capability is exposed via the dynamic_inbound_mapping EPC
feature bit, set it for DWC-based EP glue drivers using a common
initializer macro to avoid duplicating the same flag in each driver.
Note that pci-layerscape-ep.c is untouched. It currently constructs the
feature struct dynamically in ls_pcie_ep_init(). Once converted to a
static feature definition, it will use DWC_EPC_COMMON_FEATURES as well.
Koichiro Den [Sat, 24 Jan 2026 14:50:06 +0000 (23:50 +0900)]
PCI: endpoint: Add BAR subrange mapping support
Some endpoint platforms have only a small number of usable BARs. At the
same time, EPF drivers (e.g. vNTB) may need multiple independent inbound
regions (control/scratchpad, one or more memory windows, and optionally
MSI or other feature-related regions). Subrange mapping allows these to
share a single BAR without consuming additional BARs that may not be
available, or forcing a fragile layout by aggressively packing into a
single contiguous memory range.
Extend the PCI endpoint core to support mapping subranges within a BAR.
Add an optional 'submap' field in struct pci_epf_bar so an endpoint
function driver can request inbound mappings that fully cover the BAR.
Introduce a new EPC feature bit, subrange_mapping, and reject submap
requests from pci_epc_set_bar() unless the controller advertises both
subrange_mapping and dynamic_inbound_mapping features.
The submap array describes the complete BAR layout (no overlaps and no
gaps are allowed to avoid exposing untranslated address ranges). This
provides the generic infrastructure needed to map multiple logical
regions into a single BAR at different offsets, without assuming a
controller-specific inbound address translation mechanism.
Introduce a new EPC feature bit (dynamic_inbound_mapping) that indicates
whether an Endpoint Controller can update the inbound address
translation for a BAR without requiring the EPF driver to clear/reset
the BAR first.
Endpoint Function drivers (e.g. vNTB) can use this information to decide
whether it really is safe to call pci_epc_set_bar() multiple times to
update inbound mappings for the BAR.
Suggested-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Koichiro Den <den@valinux.co.jp> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Reviewed-by: Niklas Cassel <cassel@kernel.org> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://patch.msgid.link/20260124145012.2794108-2-den@valinux.co.jp
Ilpo Järvinen [Fri, 19 Dec 2025 17:40:36 +0000 (19:40 +0200)]
PCI: Move CardBus bridge scanning to setup-cardbus.c
The PCI core's pci_scan_bridge_extend() contains convoluted logic specific
to setting up bus numbers for legacy CardBus bridges. Extract the CardBus
specific part out into setup-cardbus.c to make the core code cleaner and
allow omitting CardBus bridge support from modern systems.
Ilpo Järvinen [Fri, 19 Dec 2025 17:40:35 +0000 (19:40 +0200)]
PCI: Add pbus_validate_busn() for Bus Number validation
pci_scan_bridge_extend() validates bus numbers but upcoming changes that
separate CardBus code into own function need to call that the same
validation.
Thus, add pbus_validate_busn for validating the Bus Numbers.
Ilpo Järvinen [Fri, 19 Dec 2025 17:40:33 +0000 (19:40 +0200)]
PCI: Add dword #defines for Bus Number + Secondary Latency Timer
uapi/linux/pci_regs.h defines Primary/Secondary/Subordinate Bus Numbers
and Secondary Latency Timer (PCIe r7.0, sec. 7.5.1.3) as byte register
offsets, but in practice the code may read/write the entire dword. In the
lack of #defines to handle the dword fields, the code ends up using
literals which are not as easy to read.
Add dword field masks for the Bus Number and Secondary Latency Timer
fields and use them in probe.c.
Ilpo Järvinen [Fri, 19 Dec 2025 17:40:32 +0000 (19:40 +0200)]
PCI: Use scnprintf() instead of sprintf()
Using sprintf() is deprecated as it does not do proper size checks. While
the code in pci_scan_bridge_extend() is safe with respect to overwriting
the destination buffer, use scnprintf() to not promote use of a deprecated
sprint() (and allow eventually removing it from the kernel).
Ilpo Järvinen [Fri, 19 Dec 2025 17:40:30 +0000 (19:40 +0200)]
PCI: Separate CardBus setup & build it only with CONFIG_CARDBUS
PCI bridge window setup code includes special code to handle CardBus
bridges. CardBus has long since fallen out of favor and modern systems have
no use for it.
Move CardBus setup code to its own file and use existing CONFIG_CARDBUS to
decide whether it should be built or not.
Ilpo Järvinen [Fri, 19 Dec 2025 17:40:29 +0000 (19:40 +0200)]
PCI: Add 'pci' prefix to struct pci_dev_resource handling functions
setup-bus.c has static functions for handling struct pci_dev_resource
related operation which have no prefixes. Add 'pci' prefixes to those
function names as add_to_list() will be needed in another file by an
upcoming change.
Ilpo Järvinen [Fri, 19 Dec 2025 17:40:28 +0000 (19:40 +0200)]
PCI: Use resource_assigned() in setup-bus.c algorithm
Many places in the resource fitting and assignment algorithm want to
know if the resource is assigned into the resource tree or not. Convert
open-coded ->parent checks to use resource_assigned().
Ilpo Järvinen [Fri, 19 Dec 2025 17:40:27 +0000 (19:40 +0200)]
resource: Mark res given to resource_assigned() as const
The caller may hold a const struct resource which will trigger an
unnecessary warning when calling resource_assigned() as it will not
modify res in any way.
Mark resource_assigned()'s struct resource *res parameter const to
avoid the compiler warning.
Ilpo Järvinen [Fri, 19 Dec 2025 17:40:26 +0000 (19:40 +0200)]
PCI: Add pbus_mem_size_optional() to handle optional sizes
The resource loop in pbus_size_mem() handles optional resources that are
either fully optional (SR-IOV and disabled Expansion ROMs) or bridge
windows that may be optional only for a part. The logic is a little
inconsistent when it comes to a bridge window that has only optional
children resources as it would be more natural to treat it similar to any
fully optional resource. As resource size should be zero in that case, it
shouldn't cause any bugs but it still seems useful to address the
inconsistency.
Place the optional size related code of pbus_size_mem() into
pbus_mem_size_optional() and add a check in pci_resource_is_optional()
for entirely optional bridge windows. Reorder the logic inside
pbus_mem_size_optional() such that fully optional resources are handled
the same irrespective of whether the resource is a bridge window or
not.
Additional motivation for this are the upcoming changes that add complexity
to the optional sizing logic due to Resizable BAR awareness. The extra
logic would exceed any reasonable indentation level if the optional sizing
code is kept within the loop body.
Ilpo Järvinen [Fri, 19 Dec 2025 17:40:25 +0000 (19:40 +0200)]
PCI: Check invalid align earlier in pbus_size_mem()
Check for invalid align before any bridge window sizing actions in
pbus_size_mem() to avoid need to roll back any sizing calculations.
Placing the check earlier will make it easier to add more optional size
related calculations at where the SR-IOV logic currently is in
pbus_size_mem().
Ilpo Järvinen [Fri, 19 Dec 2025 17:40:24 +0000 (19:40 +0200)]
PCI: Log reset and restore of resources
PCI resource fitting and assignment is complicated to track because it
performs many actions without any logging. One of these is resource reset
(zeroing the resource) and the restore during the multi-pass resource
fitting algorithm.
Resource reset does not play well with the other PCI code if the code later
wants to reattempt assignment of that resource. Knowing that a resource was
left in the reset state without a pairing restore is useful for
understanding issues that show up as resource assignment failures.
Add pci_dbg() to both reset and restore to be better able to track what's
going on within the resource fitting algorithm.
Ilpo Järvinen [Fri, 19 Dec 2025 17:40:22 +0000 (19:40 +0200)]
PCI: Fetch dev_res to local var in __assign_resources_sorted()
__assign_resources_sorted() calls get_res_add_size() and
get_res_add_align(), each walking through the realloc_head list to
relocate the corresponding pci_dev_resource entry.
Fetch the pci_dev_resource entry into a local variable to avoid double
walk.
In addition, reverse logic to reduce indentation level.
Ilpo Järvinen [Fri, 19 Dec 2025 17:40:21 +0000 (19:40 +0200)]
PCI: Use res_to_dev_res() in reassign_resources_sorted()
reassign_resources_sorted() contains a search loop for a particular
resource in the head list. res_to_dev_res() already implements the same
search so use it instead.
Ilpo Järvinen [Fri, 19 Dec 2025 17:40:20 +0000 (19:40 +0200)]
PCI: Pass bridge window resource to pbus_size_mem()
pbus_size_mem() inputs type and calculates bridge window resource within.
Its caller (__pci_bus_size_bridges()) also has to look up the prefetchable
window to determine if it exists or not to decide whether to call
pbus_size_mem() twice or once.
Change the interface such that the caller is responsible in providing the
bridge window resource. Passing the resource directly avoids another lookup
for the prefetchable window inside pbus_size_mem().
Ilpo Järvinen [Fri, 19 Dec 2025 17:40:19 +0000 (19:40 +0200)]
PCI: Push realloc check into pbus_size_mem()
pbus_size_mem() and calculate_memsize() input both min_size and add_size.
They are given the same value if realloc_head is NULL and min_size is 0
otherwise. Both are used in calculate_memsize() to enforce a lower bound to
the size.
The interface between __pci_bus_size_bridges() and the forementioned
functions can be simplied by pushing the realloc check into
pbus_size_mem().
There are only two possible cases:
1) when calculating size0, add_size parameter given to
calculate_memsize() is always 0 which implies only min_size
matters.
2) when calculating size1, realloc_head is not NULL which implies
min_size=0 so only add_size matters.
Drop min_size parameter from pbus_size_mem() and check realloc_head when
calling calculate_memsize(). Drop add_size from calculate_memsize() and use
only min_size within max() to enforce the lower bound.
calculate_iosize() is a bit more complicated than calculate_memsize() and
is therefore left as is, but pbus_size_io() can still input only min_size
similar to pbus_size_mem().
Ilpo Järvinen [Fri, 19 Dec 2025 17:40:18 +0000 (19:40 +0200)]
PCI: Remove old_size limit from bridge window sizing
calculate_memsize() applies lower bound to the resource size before
aligning the resource size making it impossible to shrink bridge window
resources. I've not found any justification for this lower bound and
nothing indicated it was to work around some HW issue.
Prior to the commit 3baeae36039a ("PCI: Use pci_release_resource() instead
of release_resource()"), releasing a bridge window during BAR resize
resulted in clearing start and end address of the resource. Clearing
addresses destroys the resource size as a side-effect, therefore nullifying
the effect of the old size lower bound.
After the commit 3baeae36039a ("PCI: Use pci_release_resource() instead of
release_resource()"), BAR resize uses the aligned old size, which results
in exceeding what fits into the parent window in some cases:
xe 0030:03:00.0: [drm] Attempting to resize bar from 256MiB -> 16384MiB
xe 0030:03:00.0: BAR 0 [mem 0x620c000000000-0x620c000ffffff 64bit]: releasing
xe 0030:03:00.0: BAR 2 [mem 0x6200000000000-0x620000fffffff 64bit pref]: releasing
pci 0030:02:01.0: bridge window [mem 0x6200000000000-0x620001fffffff 64bit pref]: releasing
pci 0030:01:00.0: bridge window [mem 0x6200000000000-0x6203fbff0ffff 64bit pref]: releasing
pci 0030:00:00.0: bridge window [mem 0x6200000000000-0x6203fbff0ffff 64bit pref]: was not released (still contains assigned resources)
pci 0030:00:00.0: Assigned bridge window [mem 0x6200000000000-0x6203fbff0ffff 64bit pref] to [bus 01-04] free space at [mem 0x6200400000000-0x62007ffffffff 64bit pref]
pci 0030:00:00.0: Assigned bridge window [mem 0x6200000000000-0x6203fbff0ffff 64bit pref] to [bus 01-04] cannot fit 0x4000000000 required for 0030:01:00.0 bridging to [bus 02-04]
The old size of 0x6200000000000-0x6203fbff0ffff resource was used as the
lower bound which results in 0x4000000000 size request due to alignment.
That exceeds what can fit into the parent window.
Since the lower bound never even was enforced fully because the resource
addresses were cleared when the bridge window is released, remove the
old_size lower bound entirely and trust the calculated bridge window size
is enough.
This same problem may occur on io window side but seems less likely to
cause issues due to general difference in alignment. Removing the lower
bound may have other unforeseen consequences in case of io window so it's
better to leave it as -next material if no problem is reported related to
io window sizing (BAR resize shouldn't touch io windows anyway).
Ilpo Järvinen [Fri, 19 Dec 2025 17:40:17 +0000 (19:40 +0200)]
resource: Increase MAX_IORES_LEVEL to 8
While debugging a PCI resource allocation issue, the resources for many
nested bridges and endpoints got flattened in /proc/iomem by
MAX_IORES_LEVEL that is set to 5. This made the iomem output hard to
read as the visual hierarchy cues were lost.
Increase MAX_IORES_LEVEL to 8 to avoid flattening PCI topologies with
nested bridges so aggressively (the case in the Link has the deepest
resource at level 7 so 8 looks a reasonable limit).
Ilpo Järvinen [Fri, 19 Dec 2025 17:40:16 +0000 (19:40 +0200)]
PCI: Stop over-estimating bridge window size
New way to calculate the bridge window head alignment produces tight-fit,
that is, it does not leave any gaps between the resources. Similarly,
relaxed tail alignment does not leave extra tail room.
Start to use bridge window calculation that does not over-estimate the size
of the required window.
Ilpo Järvinen [Fri, 19 Dec 2025 17:40:15 +0000 (19:40 +0200)]
PCI: Rewrite bridge window head alignment function
The calculation of bridge window head alignment is done by
calculate_mem_align() [*]. With the default bridge window alignment, it
is used for both head and tail alignment.
The selected head alignment does not always result in tight-fitting
resources (gap at d4f00000-d4ffffff):
This has not caused problems (for years) with the default bridge window
tail alignment that grossly over-estimates the required tail alignment
leaving more tail room than necessary. With the introduction of relaxed
tail alignment that leaves no extra tail room whatsoever, any gaps will
immediately turn into assignment failures.
Introduce head alignment calculation that ensures no gaps are left and
apply the new approach when using relaxed alignment. We may want to
consider using it for the normal alignment eventually, but as the first
step, solve only the problem with the relaxed tail alignment.
([*] I don't understand the algorithm in calculate_mem_align().)
Ilpo Järvinen [Fri, 19 Dec 2025 17:40:14 +0000 (19:40 +0200)]
PCI: Fix bridge window alignment with optional resources
pbus_size_mem() has two alignments, one for required resources in min_align
and another in add_align that takes account optional resources.
The add_align is applied to the bridge window through the realloc_head
list. It can happen, however, that add_align is larger than min_align but
calculated size1 and size0 are equal due to extra tailroom (e.g., hotplug
reservation, tail alignment), and therefore no entry is created to the
realloc_head list. Without the bridge appearing in the realloc head,
add_align is lost when pbus_size_mem() returns.
The problem is visible in this log for 0000:05:00.0 which lacks
add_size ... add_align ... line that would indicate it was added into
the realloc_head list:
While this bug itself seems old, it has likely become more visible after
the relaxed tail alignment that does not grossly overestimate the size
needed for the bridge window.
Make sure add_align > min_align too results in adding an entry into the
realloc head list. In addition, add handling to the cases where add_size is
zero while only alignment differs.
Fixes: d74b9027a4da ("PCI: Consider additional PF's IOV BAR alignment in sizing and assigning") Reported-by: Malte Schröder <malte+lkml@tnxip.de> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Malte Schröder <malte+lkml@tnxip.de> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20251219174036.16738-2-ilpo.jarvinen@linux.intel.com
While pci_primary_epc_epf_link() and pci_secondary_epc_epf_link() specify
the parameters in the correct order, pci_primary_epc_epf_unlink() and
pci_secondary_epc_epf_unlink() specify the parameters in the wrong order,
leading to the below kernel crash when using the unlink command in
configfs:
Unable to handle kernel paging request at virtual address 0000000300000857
Mem abort info:
...
pc : string+0x54/0x14c
lr : vsnprintf+0x280/0x6e8
...
string+0x54/0x14c
vsnprintf+0x280/0x6e8
vprintk_default+0x38/0x4c
vprintk+0xc4/0xe0
pci_epf_unbind+0xdc/0x108
configfs_unlink+0xe0/0x208+0x44/0x74
vfs_unlink+0x120/0x29c
__arm64_sys_unlinkat+0x3c/0x90
invoke_syscall+0x48/0x134
do_el0_svc+0x1c/0x30prop.0+0xd0/0xf0
Fixes: e85a2d783762 ("PCI: endpoint: Add support in configfs to associate two EPCs with EPF") Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com>
[mani: cced stable, changed commit message as per https://lore.kernel.org/linux-pci/aV9joi3jF1R6ca02@ryzen] Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Niklas Cassel <cassel@kernel.org> Reviewed-by: Frank Li <Frank.Li@nxp.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260108062747.1870669-1-mmaddireddy@nvidia.com
Qiang Yu [Thu, 22 Jan 2026 07:45:19 +0000 (23:45 -0800)]
PCI: dwc: Rename dw_pcie_rp::has_msi_ctrl to dw_pcie_rp::use_imsi_rx for clarity
The current "has_msi_ctrl" flag name is misleading because it suggests the
presence of any MSI controller, while it is specifically set for platforms
that lack .msi_init() callback and don't have "msi-parent" or "msi-map"
device tree properties, indicating they rely on the iMSI-RX module for MSI
functionality.
Rename it to "use_imsi_rx" to make the intent clear:
- When true: Platform uses the iMSI-RX module for MSI handling
- When false: Platform has other MSI controller support (ITS/MBI, external
MSI controller)
No functional changes, only improves code readability and eliminates
naming confusion.
Brian Norris [Thu, 22 Jan 2026 17:48:15 +0000 (09:48 -0800)]
PCI/PM: Prevent runtime suspend until devices are fully initialized
Previously, it was possible for a PCI device to be runtime-suspended before
it was fully initialized. When that happened, the suspend process could
save invalid device state, for example, before BAR assignment. Restoring
the invalid state during resume may leave the device non-functional.
Prevent runtime suspend for PCI devices until they are fully initialized by
deferring pm_runtime_enable().
More details on how exactly this may occur:
1. PCI device is created by pci_scan_slot() or similar
2. As part of pci_scan_slot(), pci_pm_init() puts the device in D0 and
prevents runtime suspend prevented via pm_runtime_forbid()
3. pci_device_add() adds the underlying 'struct device' via device_add(),
which means user space can allow runtime suspend, e.g.,
echo auto > /sys/bus/pci/devices/.../power/control
4. PCI device receives BAR configuration
(pci_assign_unassigned_bus_resources(), etc.)
5. pci_bus_add_device() applies final fixups, saves device state, and
tries to attach a driver
The device may potentially be suspended between #3 and #5, so this is racy
with user space (udev or similar).
Many PCI devices are enumerated at subsys_initcall time and so will not
race with user space, but devices created later by hotplug or modular
pwrctrl or host controller drivers are susceptible to this race.