- Add more logging of resource assignment (Ilpo Järvinen)
- Add pbus_mem_size_optional() to handle sizes of optional resources
(SR-IOV VF BARs, expansion ROMs, bridge windows) (Ilpo Järvinen)
- Move CardBus code to setup-cardbus.c and only build it when
CONFIG_CARDBUS is set (Ilpo Järvinen)
- Use scnprintf() instead of sprintf() (Ilpo Järvinen)
- Add pbus_validate_busn() for Bus Number validation (Ilpo Järvinen)
- Don't claim disabled bridge windows to avoid spurious claim failures
(Ilpo Järvinen)
* pci/resource:
PCI: Don't claim disabled bridge windows
PCI: Move CardBus bridge scanning to setup-cardbus.c
PCI: Add pbus_validate_busn() for Bus Number validation
PCI: Add dword #defines for Bus Number + Secondary Latency Timer
PCI: Use scnprintf() instead of sprintf()
PCI: Handle CardBus-specific params in setup-cardbus.c
PCI: Separate CardBus setup & build it only with CONFIG_CARDBUS
PCI: Add 'pci' prefix to struct pci_dev_resource handling functions
PCI: Use resource_assigned() in setup-bus.c algorithm
resource: Mark res given to resource_assigned() as const
PCI: Add pbus_mem_size_optional() to handle optional sizes
PCI: Check invalid align earlier in pbus_size_mem()
PCI: Log reset and restore of resources
PCI: Add pci_resource_is_bridge_win()
PCI: Fetch dev_res to local var in __assign_resources_sorted()
PCI: Use res_to_dev_res() in reassign_resources_sorted()
PCI: Pass bridge window resource to pbus_size_mem()
PCI: Push realloc check into pbus_size_mem()
PCI: Remove old_size limit from bridge window sizing
resource: Increase MAX_IORES_LEVEL to 8
PCI: Stop over-estimating bridge window size
PCI: Rewrite bridge window head alignment function
PCI: Fix bridge window alignment with optional resources
PCI: Use resource_set_range() that correctly sets ->end
Bjorn Helgaas [Fri, 6 Feb 2026 23:09:24 +0000 (17:09 -0600)]
Merge branch 'pci/pwrctrl'
- Rename pwrseq, tc9563, and slot driver structs, variables, and functions
for consistency (Bjorn Helgaas)
- Add power_on/off callbacks with generic signature to pwrseq, tc9563, and
slot drivers so they can be used by pwrctrl core (Manivannan Sadhasivam)
- Add interfaces to create and destroy pwrctrl devices (Krishna Chaitanya
Chundru)
- Add interfaces to power devices on and off (Manivannan Sadhasivam)
- Switch to pwrctrl interfaces to create, destroy, and power on/off
devices, calling them from host controller drivers instead of the PCI
core (Manivannan Sadhasivam)
- Drop qcom .assert_perst() callbacks since this is now done by the
controller driver instead of the pwrctrl driver (Manivannan Sadhasivam)
- Add PCIe M.2 connector support to the slot pwrctrl driver (Manivannan
Sadhasivam)
* pci/pwrctrl:
PCI/pwrctrl: Create pwrctrl device if graph port is found
PCI/pwrctrl: Add PCIe M.2 connector support
PCI: Drop the assert_perst() callback
PCI: qcom: Drop the assert_perst() callbacks
PCI/pwrctrl: Switch to pwrctrl create, power on/off, destroy APIs
PCI/pwrctrl: Add APIs to power on/off pwrctrl devices
PCI/pwrctrl: Add APIs to create, destroy pwrctrl devices
PCI/pwrctrl: Add 'struct pci_pwrctrl::power_{on/off}' callbacks
PCI/pwrctrl: pwrseq: Factor out power on/off code to helpers
PCI/pwrctrl: slot: Factor out power on/off code to helpers
PCI/pwrctrl: tc9563: Rename private struct and pointers for consistency
PCI/pwrctrl: tc9563: Add local variables to reduce repetition
PCI/pwrctrl: tc9563: Clean up whitespace
PCI/pwrctrl: tc9563: Use put_device() instead of i2c_put_adapter()
PCI/pwrctrl: slot: Rename private struct and pointers for consistency
PCI/pwrctrl: pwrseq: Rename private struct and pointers for consistency
- Remove unnecessary bus_type check in pcie_port_bus_match() (Uwe
Kleine-König)
- Move pcie_port_bus_match() and pcie_port_bus_type to PCIe-specific
portdrv.c (Uwe Kleine-König)
- Remove unnecessary dev and dev->driver checks in portdrv .probe() and
.remove() (Uwe Kleine-König)
- Take advantage of pcie_port_bus_type.probe() and .remove() instead of
assigning them for each portdrv service driver (Uwe Kleine-König)
* pci/portdrv:
PCI/portdrv: Use bus-type functions
PCI/portdrv: Don't check for valid device and driver in bus callbacks
PCI/portdrv: Move pcie_port_bus_type to pcie source file
PCI/portdrv: Don't check for the driver's and device's bus
PCI/portdrv: Drop empty shutdown callback
PCI/portdrv: Fix potential resource leak
Bjorn Helgaas [Fri, 6 Feb 2026 23:09:15 +0000 (17:09 -0600)]
Merge branch 'pci/enumeration'
- Skip enabling ExtTag on VFs since that bit is Reserved and causes
misleading log messages (Håkon Bugge)
- Mark 3ware-9650SA Root Port Extended Tags as broken since 9650SA can't
handle 8-bit tags (Jörg Wedekind)
- Release domain number from the correct IDA when a PCI host bridge has no
parent device (Sergey Shtylyov)
- Initialize endpoint Read Completion Boundary to match Root Port,
regardless of ACPI _HPX (Håkon Bugge)
- Apply _HPX PCIe Setting Record only to AER configuration, and only when
OS owns PCIe hotplug but not AER, to avoid clobbering Extended Tag and
Relaxed Ordering settings (Håkon Bugge)
- Clear PCIe Root Status register with a write, not a read/modify/write
(Lukas Wunner)
* pci/enumeration:
PCI/PME: Replace RMW of Root Status register with direct write
PCI/ACPI: Restrict program_hpx_type2() to AER bits
PCI: Initialize RCB from pci_configure_device()
PCI: Check parent for NULL in of_pci_bus_release_domain_nr()
PCI: Mark 3ware-9650SA Root Port Extended Tags as broken
PCI: Do not attempt to set ExtTag for VFs
Ilpo Järvinen [Fri, 16 Jan 2026 13:15:12 +0000 (15:15 +0200)]
PCI/bwctrl: Disable BW controller on Intel P45 using a quirk
The commit 665745f27487 ("PCI/bwctrl: Re-add BW notification portdrv as
PCIe BW controller") was found to lead to a boot hang on a Intel P45
system. Testing without setting Link Bandwidth Management Interrupt Enable
(LBMIE) and Link Autonomous Bandwidth Interrupt Enable (LABIE) (PCIe r7.0,
sec 7.5.3.7) in bwctrl allowed system to come up.
P45 is a very old chipset and supports only up to gen2 PCIe, so not having
bwctrl does not seem a huge deficiency.
Add no_bw_notif in struct pci_dev and quirk Intel P45 Root Port with it.
Lukas Wunner [Sun, 26 Oct 2025 16:57:57 +0000 (17:57 +0100)]
PCI/PME: Replace RMW of Root Status register with direct write
As of PCIe r7.0, the Root Status register contains a single writeable bit
(PME Status, type RW1C) and otherwise just read-only bits and RsvdZ bits
(which software must write as zero, PCIe r7.0 sec 7.4).
Thus, when clearing the PME Status bit, there's no need to perform a
read-modify-write of the register. Instead, the bit can be written
directly.
Lukas Wunner [Sun, 25 Jan 2026 09:25:51 +0000 (10:25 +0100)]
PCI/AER: Clear stale errors on reporting agents upon probe
Correctable and Uncorrectable Error Status Registers on reporting agents
are cleared upon PCI device enumeration in pci_aer_init() to flush past
events. They're cleared again when an error is handled by the AER driver.
If an agent reports a new error after pci_aer_init() and before the AER
driver has probed on the corresponding Root Port or Root Complex Event
Collector, that error is not handled by the AER driver: It clears the
Root Error Status Register on probe, but neglects to re-clear the
Correctable and Uncorrectable Error Status Registers on reporting agents.
The error will eventually be reported when another error occurs. Which
is irritating because to an end user it appears as if the earlier error
has just happened.
Amend the AER driver to clear stale errors on reporting agents upon probe.
Skip reporting agents which have not invoked pci_aer_init() yet to avoid
using an uninitialized pdev->aer_cap. They're recognizable by the error
bits in the Device Control register still being clear.
Reporting agents may execute pci_aer_init() after the AER driver has
probed, particularly when devices are hotplugged or removed/rescanned via
sysfs. For this reason, it continues to be necessary that pci_aer_init()
clears Correctable and Uncorrectable Error Status Registers.
Ilpo Järvinen [Tue, 3 Feb 2026 17:21:38 +0000 (19:21 +0200)]
PCI: Don't claim disabled bridge windows
The commit 8278c6914306 ("PCI: Preserve bridge window resource type flags")
changed bridge window resource behavior such that flags are no longer zero
if the bridge window is not valid or is disabled (mainly to preserve the
type flags for later use). If a bridge window has its limit smaller than
base address, pci_read_bridge_*() sets both IORESOURCE_UNSET and
IORESOURCE_DISABLED to indicate the bridge window exists but is not valid
with the current base and limit configuration.
The code in pci_claim_bridge_resources() still depends on the old behavior
of checking validity of the bridge window solely based on !r->flags,
whereas after 8278c6914306, also IORESOURCE_DISABLED may indicate bridge
window addresses are not valid.
While pci_claim_resource() does check IORESOURCE_UNSET,
pci_claim_bridge_resource() attempts to clip the resource if
pci_claim_resource() fails, which is not correct for bridge window
resources that are not valid. As pci_bus_clip_resource() performs clipping
regardless of flags and then clears IORESOURCE_UNSET, it should not be
called unless the resource is valid.
The problem is visible in this log:
pci 0000:20:00.0: PCI bridge to [bus 21]
pci 0000:20:00.0: bridge window [io size 0x0000 disabled]: can't claim; no address assigned
pci 0000:20:00.0: [io 0x0000-0xffffffffffffffff disabled] clipped to [io 0x0000-0xffff disabled]
Add IORESOURCE_DISABLED check in pci_claim_bridge_resources() to only
claim bridge windows that appear to have a valid configuration.
Niklas Schnelle [Tue, 16 Dec 2025 22:14:03 +0000 (23:14 +0100)]
PCI/IOV: Fix race between SR-IOV enable/disable and hotplug
Commit 05703271c3cd ("PCI/IOV: Add PCI rescan-remove locking when
enabling/disabling SR-IOV") tried to fix a race between the VF removal
inside sriov_del_vfs() and concurrent hot unplug by taking the PCI
rescan/remove lock in sriov_del_vfs(). Similarly the PCI rescan/remove lock
was also taken in sriov_add_vfs() to protect addition of VFs.
This approach however causes deadlock on trying to remove PFs with SR-IOV
enabled because PFs disable SR-IOV during removal and this removal happens
under the PCI rescan/remove lock. So the original fix had to be reverted.
Instead of taking the PCI rescan/remove lock in sriov_add_vfs() and
sriov_del_vfs(), fix the race that occurs with SR-IOV enable and disable vs
hotplug higher up in the callchain by taking the lock in
sriov_numvfs_store() before calling into the driver's sriov_configure()
callback.
Niklas Schnelle [Tue, 16 Dec 2025 22:14:02 +0000 (23:14 +0100)]
Revert "PCI/IOV: Add PCI rescan-remove locking when enabling/disabling SR-IOV"
This reverts commit 05703271c3cd ("PCI/IOV: Add PCI rescan-remove locking
when enabling/disabling SR-IOV"), which causes a deadlock by recursively
taking pci_rescan_remove_lock when sriov_del_vfs() is called as part of
pci_stop_and_remove_bus_device(). For example with the following sequence
of commands:
Håkon Bugge [Thu, 29 Jan 2026 17:52:33 +0000 (18:52 +0100)]
PCI/ACPI: Restrict program_hpx_type2() to AER bits
Previously program_hpx_type2() applied PCIe settings unconditionally,
which could incorrectly change bits like Extended Tag Field Enable and
Enable Relaxed Ordering.
When _HPX was added to ACPI r3.0, the intent of the PCIe Setting
Record (Type 2) in sec 6.2.7.3 was to configure AER registers when the
OS does not own the AER Capability:
The PCI Express setting record contains ... [the AER] Uncorrectable
Error Mask, Uncorrectable Error Severity, Correctable Error Mask
... to be used when configuring registers in the Advanced Error
Reporting Extended Capability Structure ...
OSPM [1] will only evaluate _HPX with Setting Record – Type 2 if
OSPM is not controlling the PCI Express Advanced Error Reporting
capability.
ACPI r3.0b, sec 6.2.7.3, added more AER registers, including registers
in the PCIe Capability with AER-related bits, and the restriction that
the OS use this only when it owns PCIe native hotplug:
... when configuring PCI Express registers in the Advanced Error
Reporting Extended Capability Structure *or PCI Express Capability
Structure* ...
An OS that has assumed ownership of native hot plug but does not
... have ownership of the AER register set must use ... the Type 2
record to program the AER registers ...
However, since the Type 2 record also includes register bits that
have functions other than AER, the OS must ignore values ... that
are not applicable.
Restrict program_hpx_type2() to only the intended purpose:
- Apply settings only when OS owns PCIe native hotplug but not AER,
- Only touch the AER-related bits (Error Reporting Enables) in Device
Control
- Don't touch Link Control at all, since nothing there seems AER-related,
but log _HPX settings for debugging purposes
Note that Read Completion Boundary is now configured elsewhere, since it is
unrelated to _HPX.
[1] Operating System-directed configuration and Power Management
Håkon Bugge [Thu, 29 Jan 2026 17:52:32 +0000 (18:52 +0100)]
PCI: Initialize RCB from pci_configure_device()
Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff Root
Port supports it (_HPX)") worked around a bogus _HPX type 2 record, which
caused program_hpx_type2() to set the RCB in an endpoint even though the
Root Port did not have the RCB bit set.
e42010d8207f fixed that by setting the RCB in the endpoint only when it was
set in the Root Port.
In retrospect, program_hpx_type2() is intended for AER-related settings,
and the RCB should be configured elsewhere so it doesn't depend on the
presence or contents of an _HPX record.
Explicitly program the RCB from pci_configure_device() so it matches the
Root Port's RCB. The Root Port may not be visible to virtualized guests;
in that case, leave RCB alone.
PCI/pwrctrl: Create pwrctrl device if graph port is found
The devicetree node of the PCIe Root Port/Slot could have the graph port to
link the PCIe M.2 connector node. Since the M.2 connectors are modeled as
Power Sequencing devices, they need to be controlled by the pwrctrl driver
like the Root Port/Slot supplies.
Hence, create the pwrctrl device if the graph port is found in the node.
Sergey Shtylyov [Tue, 27 Jan 2026 20:39:42 +0000 (23:39 +0300)]
PCI: Check parent for NULL in of_pci_bus_release_domain_nr()
of_pci_bus_find_domain_nr() allows its parent parameter to be NULL but
of_pci_bus_release_domain_nr() (that undoes its effect) doesn't -- that
means it's going to blow up while calling of_get_pci_domain_nr() if the
parent parameter indeed happens to be NULL. Add the missing NULL check.
Found by Linux Verification Center (linuxtesting.org) with the Svace static
analysis tool.
Add support for handling PCIe M.2 connectors as Power Sequencing devices.
These connectors are exposed as Power Sequencing devices as they often
support multiple interfaces like PCIe/SATA, USB/UART to the host machine,
and the interfaces may be driven by different client drivers at the same
time.
This driver handles the PCIe interface of these connectors. It first checks
for the presence of the graph port in the Root Port node with the help of
of_graph_is_present() API. If present, it acquires/powers ON the
corresponding pwrseq device.
Once the pwrseq device is powered ON, the driver will skip parsing the Root
Port/Slot resources and register with the pwrctrl framework.
Ilpo Järvinen [Fri, 19 Dec 2025 17:40:36 +0000 (19:40 +0200)]
PCI: Move CardBus bridge scanning to setup-cardbus.c
The PCI core's pci_scan_bridge_extend() contains convoluted logic specific
to setting up bus numbers for legacy CardBus bridges. Extract the CardBus
specific part out into setup-cardbus.c to make the core code cleaner and
allow omitting CardBus bridge support from modern systems.
Ilpo Järvinen [Fri, 19 Dec 2025 17:40:35 +0000 (19:40 +0200)]
PCI: Add pbus_validate_busn() for Bus Number validation
pci_scan_bridge_extend() validates bus numbers but upcoming changes that
separate CardBus code into own function need to call that the same
validation.
Thus, add pbus_validate_busn for validating the Bus Numbers.
Ilpo Järvinen [Fri, 19 Dec 2025 17:40:33 +0000 (19:40 +0200)]
PCI: Add dword #defines for Bus Number + Secondary Latency Timer
uapi/linux/pci_regs.h defines Primary/Secondary/Subordinate Bus Numbers
and Secondary Latency Timer (PCIe r7.0, sec. 7.5.1.3) as byte register
offsets, but in practice the code may read/write the entire dword. In the
lack of #defines to handle the dword fields, the code ends up using
literals which are not as easy to read.
Add dword field masks for the Bus Number and Secondary Latency Timer
fields and use them in probe.c.
Ilpo Järvinen [Fri, 19 Dec 2025 17:40:32 +0000 (19:40 +0200)]
PCI: Use scnprintf() instead of sprintf()
Using sprintf() is deprecated as it does not do proper size checks. While
the code in pci_scan_bridge_extend() is safe with respect to overwriting
the destination buffer, use scnprintf() to not promote use of a deprecated
sprint() (and allow eventually removing it from the kernel).
Ilpo Järvinen [Fri, 19 Dec 2025 17:40:30 +0000 (19:40 +0200)]
PCI: Separate CardBus setup & build it only with CONFIG_CARDBUS
PCI bridge window setup code includes special code to handle CardBus
bridges. CardBus has long since fallen out of favor and modern systems have
no use for it.
Move CardBus setup code to its own file and use existing CONFIG_CARDBUS to
decide whether it should be built or not.
Ilpo Järvinen [Fri, 19 Dec 2025 17:40:29 +0000 (19:40 +0200)]
PCI: Add 'pci' prefix to struct pci_dev_resource handling functions
setup-bus.c has static functions for handling struct pci_dev_resource
related operation which have no prefixes. Add 'pci' prefixes to those
function names as add_to_list() will be needed in another file by an
upcoming change.
Ilpo Järvinen [Fri, 19 Dec 2025 17:40:28 +0000 (19:40 +0200)]
PCI: Use resource_assigned() in setup-bus.c algorithm
Many places in the resource fitting and assignment algorithm want to
know if the resource is assigned into the resource tree or not. Convert
open-coded ->parent checks to use resource_assigned().
Ilpo Järvinen [Fri, 19 Dec 2025 17:40:27 +0000 (19:40 +0200)]
resource: Mark res given to resource_assigned() as const
The caller may hold a const struct resource which will trigger an
unnecessary warning when calling resource_assigned() as it will not
modify res in any way.
Mark resource_assigned()'s struct resource *res parameter const to
avoid the compiler warning.
Ilpo Järvinen [Fri, 19 Dec 2025 17:40:26 +0000 (19:40 +0200)]
PCI: Add pbus_mem_size_optional() to handle optional sizes
The resource loop in pbus_size_mem() handles optional resources that are
either fully optional (SR-IOV and disabled Expansion ROMs) or bridge
windows that may be optional only for a part. The logic is a little
inconsistent when it comes to a bridge window that has only optional
children resources as it would be more natural to treat it similar to any
fully optional resource. As resource size should be zero in that case, it
shouldn't cause any bugs but it still seems useful to address the
inconsistency.
Place the optional size related code of pbus_size_mem() into
pbus_mem_size_optional() and add a check in pci_resource_is_optional()
for entirely optional bridge windows. Reorder the logic inside
pbus_mem_size_optional() such that fully optional resources are handled
the same irrespective of whether the resource is a bridge window or
not.
Additional motivation for this are the upcoming changes that add complexity
to the optional sizing logic due to Resizable BAR awareness. The extra
logic would exceed any reasonable indentation level if the optional sizing
code is kept within the loop body.
Ilpo Järvinen [Fri, 19 Dec 2025 17:40:25 +0000 (19:40 +0200)]
PCI: Check invalid align earlier in pbus_size_mem()
Check for invalid align before any bridge window sizing actions in
pbus_size_mem() to avoid need to roll back any sizing calculations.
Placing the check earlier will make it easier to add more optional size
related calculations at where the SR-IOV logic currently is in
pbus_size_mem().
Ilpo Järvinen [Fri, 19 Dec 2025 17:40:24 +0000 (19:40 +0200)]
PCI: Log reset and restore of resources
PCI resource fitting and assignment is complicated to track because it
performs many actions without any logging. One of these is resource reset
(zeroing the resource) and the restore during the multi-pass resource
fitting algorithm.
Resource reset does not play well with the other PCI code if the code later
wants to reattempt assignment of that resource. Knowing that a resource was
left in the reset state without a pairing restore is useful for
understanding issues that show up as resource assignment failures.
Add pci_dbg() to both reset and restore to be better able to track what's
going on within the resource fitting algorithm.
Ilpo Järvinen [Fri, 19 Dec 2025 17:40:22 +0000 (19:40 +0200)]
PCI: Fetch dev_res to local var in __assign_resources_sorted()
__assign_resources_sorted() calls get_res_add_size() and
get_res_add_align(), each walking through the realloc_head list to
relocate the corresponding pci_dev_resource entry.
Fetch the pci_dev_resource entry into a local variable to avoid double
walk.
In addition, reverse logic to reduce indentation level.
Ilpo Järvinen [Fri, 19 Dec 2025 17:40:21 +0000 (19:40 +0200)]
PCI: Use res_to_dev_res() in reassign_resources_sorted()
reassign_resources_sorted() contains a search loop for a particular
resource in the head list. res_to_dev_res() already implements the same
search so use it instead.
Ilpo Järvinen [Fri, 19 Dec 2025 17:40:20 +0000 (19:40 +0200)]
PCI: Pass bridge window resource to pbus_size_mem()
pbus_size_mem() inputs type and calculates bridge window resource within.
Its caller (__pci_bus_size_bridges()) also has to look up the prefetchable
window to determine if it exists or not to decide whether to call
pbus_size_mem() twice or once.
Change the interface such that the caller is responsible in providing the
bridge window resource. Passing the resource directly avoids another lookup
for the prefetchable window inside pbus_size_mem().
Ilpo Järvinen [Fri, 19 Dec 2025 17:40:19 +0000 (19:40 +0200)]
PCI: Push realloc check into pbus_size_mem()
pbus_size_mem() and calculate_memsize() input both min_size and add_size.
They are given the same value if realloc_head is NULL and min_size is 0
otherwise. Both are used in calculate_memsize() to enforce a lower bound to
the size.
The interface between __pci_bus_size_bridges() and the forementioned
functions can be simplied by pushing the realloc check into
pbus_size_mem().
There are only two possible cases:
1) when calculating size0, add_size parameter given to
calculate_memsize() is always 0 which implies only min_size
matters.
2) when calculating size1, realloc_head is not NULL which implies
min_size=0 so only add_size matters.
Drop min_size parameter from pbus_size_mem() and check realloc_head when
calling calculate_memsize(). Drop add_size from calculate_memsize() and use
only min_size within max() to enforce the lower bound.
calculate_iosize() is a bit more complicated than calculate_memsize() and
is therefore left as is, but pbus_size_io() can still input only min_size
similar to pbus_size_mem().
Ilpo Järvinen [Fri, 19 Dec 2025 17:40:18 +0000 (19:40 +0200)]
PCI: Remove old_size limit from bridge window sizing
calculate_memsize() applies lower bound to the resource size before
aligning the resource size making it impossible to shrink bridge window
resources. I've not found any justification for this lower bound and
nothing indicated it was to work around some HW issue.
Prior to the commit 3baeae36039a ("PCI: Use pci_release_resource() instead
of release_resource()"), releasing a bridge window during BAR resize
resulted in clearing start and end address of the resource. Clearing
addresses destroys the resource size as a side-effect, therefore nullifying
the effect of the old size lower bound.
After the commit 3baeae36039a ("PCI: Use pci_release_resource() instead of
release_resource()"), BAR resize uses the aligned old size, which results
in exceeding what fits into the parent window in some cases:
xe 0030:03:00.0: [drm] Attempting to resize bar from 256MiB -> 16384MiB
xe 0030:03:00.0: BAR 0 [mem 0x620c000000000-0x620c000ffffff 64bit]: releasing
xe 0030:03:00.0: BAR 2 [mem 0x6200000000000-0x620000fffffff 64bit pref]: releasing
pci 0030:02:01.0: bridge window [mem 0x6200000000000-0x620001fffffff 64bit pref]: releasing
pci 0030:01:00.0: bridge window [mem 0x6200000000000-0x6203fbff0ffff 64bit pref]: releasing
pci 0030:00:00.0: bridge window [mem 0x6200000000000-0x6203fbff0ffff 64bit pref]: was not released (still contains assigned resources)
pci 0030:00:00.0: Assigned bridge window [mem 0x6200000000000-0x6203fbff0ffff 64bit pref] to [bus 01-04] free space at [mem 0x6200400000000-0x62007ffffffff 64bit pref]
pci 0030:00:00.0: Assigned bridge window [mem 0x6200000000000-0x6203fbff0ffff 64bit pref] to [bus 01-04] cannot fit 0x4000000000 required for 0030:01:00.0 bridging to [bus 02-04]
The old size of 0x6200000000000-0x6203fbff0ffff resource was used as the
lower bound which results in 0x4000000000 size request due to alignment.
That exceeds what can fit into the parent window.
Since the lower bound never even was enforced fully because the resource
addresses were cleared when the bridge window is released, remove the
old_size lower bound entirely and trust the calculated bridge window size
is enough.
This same problem may occur on io window side but seems less likely to
cause issues due to general difference in alignment. Removing the lower
bound may have other unforeseen consequences in case of io window so it's
better to leave it as -next material if no problem is reported related to
io window sizing (BAR resize shouldn't touch io windows anyway).
Ilpo Järvinen [Fri, 19 Dec 2025 17:40:17 +0000 (19:40 +0200)]
resource: Increase MAX_IORES_LEVEL to 8
While debugging a PCI resource allocation issue, the resources for many
nested bridges and endpoints got flattened in /proc/iomem by
MAX_IORES_LEVEL that is set to 5. This made the iomem output hard to
read as the visual hierarchy cues were lost.
Increase MAX_IORES_LEVEL to 8 to avoid flattening PCI topologies with
nested bridges so aggressively (the case in the Link has the deepest
resource at level 7 so 8 looks a reasonable limit).
Ilpo Järvinen [Fri, 19 Dec 2025 17:40:16 +0000 (19:40 +0200)]
PCI: Stop over-estimating bridge window size
New way to calculate the bridge window head alignment produces tight-fit,
that is, it does not leave any gaps between the resources. Similarly,
relaxed tail alignment does not leave extra tail room.
Start to use bridge window calculation that does not over-estimate the size
of the required window.
Ilpo Järvinen [Fri, 19 Dec 2025 17:40:15 +0000 (19:40 +0200)]
PCI: Rewrite bridge window head alignment function
The calculation of bridge window head alignment is done by
calculate_mem_align() [*]. With the default bridge window alignment, it
is used for both head and tail alignment.
The selected head alignment does not always result in tight-fitting
resources (gap at d4f00000-d4ffffff):
This has not caused problems (for years) with the default bridge window
tail alignment that grossly over-estimates the required tail alignment
leaving more tail room than necessary. With the introduction of relaxed
tail alignment that leaves no extra tail room whatsoever, any gaps will
immediately turn into assignment failures.
Introduce head alignment calculation that ensures no gaps are left and
apply the new approach when using relaxed alignment. We may want to
consider using it for the normal alignment eventually, but as the first
step, solve only the problem with the relaxed tail alignment.
([*] I don't understand the algorithm in calculate_mem_align().)
Ilpo Järvinen [Fri, 19 Dec 2025 17:40:14 +0000 (19:40 +0200)]
PCI: Fix bridge window alignment with optional resources
pbus_size_mem() has two alignments, one for required resources in min_align
and another in add_align that takes account optional resources.
The add_align is applied to the bridge window through the realloc_head
list. It can happen, however, that add_align is larger than min_align but
calculated size1 and size0 are equal due to extra tailroom (e.g., hotplug
reservation, tail alignment), and therefore no entry is created to the
realloc_head list. Without the bridge appearing in the realloc head,
add_align is lost when pbus_size_mem() returns.
The problem is visible in this log for 0000:05:00.0 which lacks
add_size ... add_align ... line that would indicate it was added into
the realloc_head list:
While this bug itself seems old, it has likely become more visible after
the relaxed tail alignment that does not grossly overestimate the size
needed for the bridge window.
Make sure add_align > min_align too results in adding an entry into the
realloc head list. In addition, add handling to the cases where add_size is
zero while only alignment differs.
Fixes: d74b9027a4da ("PCI: Consider additional PF's IOV BAR alignment in sizing and assigning") Reported-by: Malte Schröder <malte+lkml@tnxip.de> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Malte Schröder <malte+lkml@tnxip.de> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20251219174036.16738-2-ilpo.jarvinen@linux.intel.com
While pci_primary_epc_epf_link() and pci_secondary_epc_epf_link() specify
the parameters in the correct order, pci_primary_epc_epf_unlink() and
pci_secondary_epc_epf_unlink() specify the parameters in the wrong order,
leading to the below kernel crash when using the unlink command in
configfs:
Unable to handle kernel paging request at virtual address 0000000300000857
Mem abort info:
...
pc : string+0x54/0x14c
lr : vsnprintf+0x280/0x6e8
...
string+0x54/0x14c
vsnprintf+0x280/0x6e8
vprintk_default+0x38/0x4c
vprintk+0xc4/0xe0
pci_epf_unbind+0xdc/0x108
configfs_unlink+0xe0/0x208+0x44/0x74
vfs_unlink+0x120/0x29c
__arm64_sys_unlinkat+0x3c/0x90
invoke_syscall+0x48/0x134
do_el0_svc+0x1c/0x30prop.0+0xd0/0xf0
Fixes: e85a2d783762 ("PCI: endpoint: Add support in configfs to associate two EPCs with EPF") Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com>
[mani: cced stable, changed commit message as per https://lore.kernel.org/linux-pci/aV9joi3jF1R6ca02@ryzen] Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Niklas Cassel <cassel@kernel.org> Reviewed-by: Frank Li <Frank.Li@nxp.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260108062747.1870669-1-mmaddireddy@nvidia.com
Brian Norris [Thu, 22 Jan 2026 17:48:15 +0000 (09:48 -0800)]
PCI/PM: Prevent runtime suspend until devices are fully initialized
Previously, it was possible for a PCI device to be runtime-suspended before
it was fully initialized. When that happened, the suspend process could
save invalid device state, for example, before BAR assignment. Restoring
the invalid state during resume may leave the device non-functional.
Prevent runtime suspend for PCI devices until they are fully initialized by
deferring pm_runtime_enable().
More details on how exactly this may occur:
1. PCI device is created by pci_scan_slot() or similar
2. As part of pci_scan_slot(), pci_pm_init() puts the device in D0 and
prevents runtime suspend prevented via pm_runtime_forbid()
3. pci_device_add() adds the underlying 'struct device' via device_add(),
which means user space can allow runtime suspend, e.g.,
echo auto > /sys/bus/pci/devices/.../power/control
4. PCI device receives BAR configuration
(pci_assign_unassigned_bus_resources(), etc.)
5. pci_bus_add_device() applies final fixups, saves device state, and
tries to attach a driver
The device may potentially be suspended between #3 and #5, so this is racy
with user space (udev or similar).
Many PCI devices are enumerated at subsys_initcall time and so will not
race with user space, but devices created later by hotplug or modular
pwrctrl or host controller drivers are susceptible to this race.
Jörg Wedekind [Mon, 19 Jan 2026 14:31:10 +0000 (15:31 +0100)]
PCI: Mark 3ware-9650SA Root Port Extended Tags as broken
Per PCIe r7.0, sec 2.2.6.2.1 and 7.5.3.4, a Requester may not use 8-bit Tags
unless its Extended Tag Field Enable is set, but all Receivers/Completers
must handle 8-bit Tags correctly regardless of their Extended Tag Field
Enable.
Some devices do not handle 8-bit Tags as Completers, so add a quirk for
them. If we find such a device, we disable Extended Tags for the entire
hierarchy to make peer-to-peer DMA possible.
The 3ware 9650SA seems to have issues with handling 8-bit tags. Mark it as
broken.
Previously, the pcie-qcom driver probed first, deasserted PERST#, enabled
link training and scanned the bus. By the time the pwrctrl driver probe got
called, link training was already enabled by the controller driver.
Thus the pwrctrl drivers had to call the .assert_perst() callback, to
assert PERST#, power on the needed resources, and then call the
.assert_perst() callback to deassert PERST#.
Now since all pwrctrl drivers and this controller driver have been
converted to the new pwrctrl design where the pwrctrl drivers will first
power on the devices before this driver deasserts PERST# and scan the bus.
So there is no longer a need for .assert_perst() callback in this driver
and in DWC core driver. Hence, drop them.
PCI/pwrctrl: Switch to pwrctrl create, power on/off, destroy APIs
Adopt pwrctrl APIs to create, power on/off, and destroy pwrctrl devices.
In qcom_pcie_host_init(), call pci_pwrctrl_create_devices() to create
devices, then pci_pwrctrl_power_on_devices() to power them on, both after
controller resource initialization. Once successful, deassert PERST# for
all devices.
In qcom_pcie_host_deinit(), call pci_pwrctrl_power_off_devices() after
asserting PERST#. Note that pci_pwrctrl_destroy_devices() is not called
here, as deinit is only invoked during system suspend where device
destruction is unnecessary. If the driver becomes removable in future,
pci_pwrctrl_destroy_devices() should be called in the remove() handler.
Remove the old pwrctrl framework code from the PCI core (including
devlinks) as the new APIs are now the sole consumer of pwrctrl
functionality. And also do not power on the pwrctrl drivers during probe()
as this is now handled by the APIs.
PCI/pwrctrl: Add APIs to power on/off pwrctrl devices
To fix bridge resource allocation issues when powering PCI bridges with the
pwrctrl driver, introduce APIs to explicitly power on and off all related
devices simultaneously.
Previously, the individual pwrctrl drivers powered on/off the PCI devices
autonomously, without any control from the controller drivers. But to
enforce ordering with respect to powering on the devices, these APIs will
power on/off all the devices at the same time.
The pci_pwrctrl_power_on_devices() API recursively scans the PCI child
nodes, makes sure that pwrctrl drivers are bound to devices, and calls
their power_on() callbacks. If any pwrctrl driver is not bound, it will
return -EPROBE_DEFER.
Similarly, pci_pwrctrl_power_off_devices() API powers off devices
recursively via their power_off() callbacks.
These APIs are expected to be called during the controller probe and
suspend/resume time to power on/off the devices. But before calling these
APIs, the pwrctrl devices should be created using the
pci_pwrctrl_{create/destroy}_devices() APIs.
PCI/pwrctrl: Add APIs to create, destroy pwrctrl devices
Previously, the PCI core created pwrctrl devices during pci_scan_device()
on its own and then skipped enumeration of those devices, hoping the
pwrctrl driver would power them on and trigger a bus rescan.
This approach works for endpoint devices directly connected to Root Ports,
but it fails for PCIe switches acting as bus extenders. When the switch
requires pwrctrl support and the pwrctrl driver is not available during
the pwrctrl device creation, its enumeration will be skipped during the
initial PCI bus scan.
This premature scan leads the PCI core to allocate resources (bridge
windows, bus numbers) for the upstream bridge based on available downstream
buses at scan time. For non-hotplug capable bridges, PCI core typically
allocates resources based on the number of buses available during the
initial bus scan, which happens to be just one if the switch is not powered
on and enumerated at that time. When the switch gets enumerated later on,
it will fail due to the lack of upstream resources.
As a result, a PCIe switch powered on by the pwrctrl driver cannot be
reliably enumerated currently. Either the switch has to be enabled in the
bootloader or the switch pwrctrl driver has to be loaded during the pwrctrl
device creation time to work around these issues.
Introduce new APIs to explicitly create and destroy pwrctrl devices from
controller drivers by recursively scanning the PCI child nodes of the
controller. These APIs allow creating pwrctrl devices based on the original
criteria and are intended to be called during controller probe and removal.
These APIs, together with the upcoming APIs for power on/off will allow the
controller drivers to power on all the devices before starting the initial
bus scan, thereby solving the resource allocation issue.
Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
[mani: splitted the patch, cleaned up the code, and rewrote description] Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Chen-Yu Tsai <wenst@chromium.org> Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com> Link: https://patch.msgid.link/20260115-pci-pwrctrl-rework-v5-10-9d26da3ce903@oss.qualcomm.com
To allow the pwrctrl core to control the power on/off sequences of the
pwrctrl drivers, add the 'struct pci_pwrctrl::power_{on/off}' callbacks and
populate them in the respective pwrctrl drivers.
The pwrctrl drivers still power on the resources on their own now. So there
is no functional change.
PCI/pwrctrl: pwrseq: Factor out power on/off code to helpers
In order to allow the pwrctrl core to control the power on/off logic of the
pwrctrl pwrseq driver, move the power on/off code to
pci_pwrctrl_pwrseq_power_{off/on} helper functions.
PCI/pwrctrl: slot: Factor out power on/off code to helpers
In order to allow the pwrctrl core to control the power on/off logic of the
pwrctrl slot driver, move the power on/off code to
pci_pwrctrl_slot_power_{off/on} helper functions.
Bjorn Helgaas [Thu, 15 Jan 2026 07:28:58 +0000 (12:58 +0530)]
PCI/pwrctrl: tc9563: Rename private struct and pointers for consistency
Previously the pwrseq, tc9563, and slot pwrctrl drivers used different
naming conventions for their private data structs and pointers to them,
which makes patches hard to read.
Rename struct and variables to be shorter and more specific:
PCI/pwrctrl: tc9563: Use put_device() instead of i2c_put_adapter()
The API comment for of_find_i2c_adapter_by_node() recommends using
put_device() to drop the reference count of I2C adapter instead of using
i2c_put_adapter(). So replace i2c_put_adapter() with put_device().
Fixes: 4c9c7be47310 ("PCI: pwrctrl: Add power control driver for TC9563") Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com> Link: https://patch.msgid.link/20260115-pci-pwrctrl-rework-v5-3-9d26da3ce903@oss.qualcomm.com
Bjorn Helgaas [Thu, 15 Jan 2026 07:28:54 +0000 (12:58 +0530)]
PCI/pwrctrl: slot: Rename private struct and pointers for consistency
Previously the pwrseq, tc9563, and slot pwrctrl drivers used different
naming conventions for their private data structs and pointers to them,
which makes patches hard to read.
Rename structs, variables, and functions to reduce boiler-plate and start
with "slot":
Bjorn Helgaas [Thu, 15 Jan 2026 07:28:53 +0000 (12:58 +0530)]
PCI/pwrctrl: pwrseq: Rename private struct and pointers for consistency
Previously the pwrseq, tc9563, and slot pwrctrl drivers used different
naming conventions for their private data structs and pointers to them,
which makes patches hard to read.
Rename structs, variables, and functions to reduce boiler-plate and start
with "pwrseq":
Alistair Popple [Mon, 12 Jan 2026 00:54:40 +0000 (11:54 +1100)]
PCI/P2PDMA: Reset page reference count when page mapping fails
When mapping a p2pdma page the page reference count is initialised to 1
prior to calling vm_insert_page(). This is to avoid vm_insert_page()
warning if the page refcount is zero. Prior to setting the page count there
is a check to ensure the page is currently free (ie. has a zero reference
count).
However vm_insert_page() can fail. In this case the pages are freed back to
the genalloc pool, but that does not reset the page refcount. So a future
allocation of the same page will see the elevated page refcount from the
previous set_page_count() call triggering the VM_WARN_ON_ONCE_PAGE checking
that the page is free.
Fix this by resetting the page refcount to zero using set_page_count().
Note that put_page() is not used because that would result in freeing the
page twice due to implicitly calling p2pdma_folio_free().
In pcie_ptm_create_debugfs(), if devm_kasprintf() fails after successfully
allocating ptm_debugfs with kzalloc(), the function returns without freeing
the allocated memory, resulting in a memory leak.
Free ptm_debugfs before returning in the devm_kasprintf() error path and in
pcie_ptm_destroy_debugfs().
Instead of assigning the probe function for each driver individually, use
.probe() and .remove() from the pci_express bus. Rename the functions for
consistency.
.shutdown() is an optional callback and the core only calls it if the
pointer in struct device_driver is non-NULL. So make nothing in a bit
shorter time and remove the empty function.
pcie_port_probe_service() unconditionally calls get_device() (unless it
fails). So drop that reference also unconditionally as it's fine for a
PCIe driver to not have a remove callback.
Nirmoy Das [Wed, 17 Dec 2025 15:45:29 +0000 (07:45 -0800)]
PCI: Add PCI_BRIDGE_NO_ALIAS quirk for ASPEED AST1150
ASPEED BMC controllers have VGA and USB functions behind a PCIe-to-PCI
bridge that causes them to share the same StreamID:
[e0]---00.0-[e1-e2]----00.0-[e2]--+-00.0 ASPEED Graphics Family
\-02.0 ASPEED USB Controller
Both devices get StreamID 0x5e200 due to bridge aliasing, causing the USB
controller to be rejected with 'Aliasing StreamID unsupported'.
Per ASPEED, the AST1150 doesn't use a real PCI bus and always forwards
the original Requester ID from downstream devices rather than replacing
it with any alias.
Add a new PCI_DEV_FLAGS_PCI_BRIDGE_NO_ALIAS flag and apply it to the
AST1150.
Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://patch.msgid.link/20251217154529.377586-2-nirmoyd@nvidia.com
Håkon Bugge [Wed, 12 Nov 2025 09:54:40 +0000 (10:54 +0100)]
PCI: Do not attempt to set ExtTag for VFs
The bit for enabling extended tags is Reserved and Preserved (RsvdP) for
VFs, according to PCIe r7.0 section 7.5.3.4 table 7.21. Hence, bail out
early from pci_configure_extended_tags() if the device is a VF.
Otherwise, we may see incorrect log messages such as:
The PCI tracing system provides tracepoints to monitor critical hardware
events that can impact system performance and reliability. Add
documentation about it.
Commit b7e282378773 has already changed the initial page refcount of
p2pdma page from one to zero, however, in p2pmem_alloc_mmap() it uses
"VM_WARN_ON_ONCE_PAGE(!page_ref_count(page))" to assert the initial page
refcount should not be zero and the following will be reported when
CONFIG_DEBUG_VM is enabled:
Hou Tao [Sat, 20 Dec 2025 04:04:34 +0000 (12:04 +0800)]
PCI/P2PDMA: Release per-CPU pgmap ref when vm_insert_page() fails
When vm_insert_page() fails in p2pmem_alloc_mmap(), p2pmem_alloc_mmap()
doesn't invoke percpu_ref_put() to free the per-CPU ref of pgmap acquired
after gen_pool_alloc_owner(), and memunmap_pages() will hang forever when
trying to remove the PCI device.
Brian Norris [Fri, 3 Oct 2025 22:40:09 +0000 (15:40 -0700)]
PCI/PM: Avoid redundant delays on D3hot->D3cold
When transitioning to D3cold, __pci_set_power_state() first transitions to
D3hot. If the device was already in D3hot, this adds excess work:
(a) read/modify/write PMCSR; and
(b) excess delay (pci_dev_d3_sleep()).
For (b), we already performed the necessary delay on the previous D3hot
entry; this was extra noticeable when evaluating runtime PM transition
latency.
Check whether we're already in the target state before continuing.
Note that __pci_set_power_state() already does this same check for other
state transitions, but D3cold is special because __pci_set_power_state()
converts it to D3hot for the purposes of PMCSR.
This seems to be an oversight in commit 0aacdc957401 ("PCI/PM: Clean up
pci_set_low_power_state()").
Fixes: 0aacdc957401 ("PCI/PM: Clean up pci_set_low_power_state()") Signed-off-by: Brian Norris <briannorris@google.com> Signed-off-by: Brian Norris <briannorris@chromium.org>
[bhelgaas: reverse test to match other "dev->current_state == state" cases] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20251003154008.1.I7a21c240b30062c66471329567a96dceb6274358@changeid
Shuai Xue [Wed, 10 Dec 2025 13:29:06 +0000 (21:29 +0800)]
PCI: trace: Add RAS tracepoint to monitor link speed changes
PCIe link speed degradation directly impacts system performance and often
indicates hardware issues such as faulty devices, physical layer problems,
or configuration errors.
To this end, add a RAS tracepoint to monitor link speed changes, enabling
proactive health checks and diagnostic analysis.
The following output is generated when a device is hotplugged:
Shuai Xue [Wed, 10 Dec 2025 13:29:05 +0000 (21:29 +0800)]
PCI: trace: Add generic RAS tracepoint for hotplug event
Hotplug events are critical indicators for analyzing hardware health, and
surprise link downs can significantly impact system performance and
reliability.
Define a new TRACING_SYSTEM named "pci", add a generic RAS tracepoint
for hotplug event to help health checks. Add enum pci_hotplug_event in
include/uapi/linux/pci.h so applications like rasdaemon can register
tracepoint event handlers for it.
The following output is generated when a device is hotplugged:
Ilpo Järvinen [Mon, 8 Dec 2025 14:56:54 +0000 (16:56 +0200)]
PCI: Use resource_set_range() that correctly sets ->end
__pci_read_base() sets resource start and end addresses when resource
is larger than 4G but pci_bus_addr_t or resource_size_t are not capable
of representing 64-bit PCI addresses. This creates a problematic
resource that has non-zero flags but the start and end addresses do not
yield to resource size of 0 but 1.
Replace custom resource addresses setup with resource_set_range()
that correctly sets end address as -1 which results in resource_size()
returning 0.
For consistency, also use resource_set_range() in the other branch that
does size based resource setup.
The asynchronous creation of sub-groups by a delayed work could lead to a
NULL pointer dereference when the driver directory is removed before the
work completes.
The crash can be easily reproduced with the following commands:
# cd /sys/kernel/config/pci_ep/functions/pci_epf_test
# for i in {1..20}; do mkdir test && rmdir test; done
Fix this issue by using configfs_add_default_group() API which does not
have the deadlock problem as configfs_register_group() and does not require
the delayed work handler.
Fixes: e85a2d783762 ("PCI: endpoint: Add support in configfs to associate two EPCs with EPF") Signed-off-by: Liu Song <liu.song13@zte.com.cn>
[mani: slightly reworded the description and added stable list] Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@kernel.org Link: https://patch.msgid.link/20250710143845409gLM6JdlwPhlHG9iX3F6jK@zte.com.cn
Fix copy & paste errors by changing the references from 'ntb' to 'vntb'.
Fixes: 4ac8c8e52cd9 ("Documentation: PCI: Add specification for the PCI vNTB function device") Signed-off-by: Baruch Siach <baruch@tkos.co.il>
[mani: squashed the patches and fixed more errors] Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://patch.msgid.link/b51c2a69ffdbfa2c359f5cf33f3ad2acc3db87e4.1762154911.git.baruch@tkos.co.il
Linus Torvalds [Sun, 14 Dec 2025 03:35:35 +0000 (15:35 +1200)]
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"The only core fix is in doc; all the others are in drivers, with the
biggest impacts in libsas being the rollback on error handling and in
ufs coming from a couple of error handling fixes, one causing a crash
if it's activated before scanning and the other fixing W-LUN
resumption"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: qcom: Fix confusing cleanup.h syntax
scsi: libsas: Add rollback handling when an error occurs
scsi: device_handler: Return error pointer in scsi_dh_attached_handler_name()
scsi: ufs: core: Fix a deadlock in the frequency scaling code
scsi: ufs: core: Fix an error handler crash
scsi: Revert "scsi: libsas: Fix exp-attached device scan after probe failure scanned in again after probe failed"
scsi: ufs: core: Fix RPMB link error by reversing Kconfig dependencies
scsi: qla4xxx: Use time conversion macros
scsi: qla2xxx: Enable/disable IRQD_NO_BALANCING during reset
scsi: ipr: Enable/disable IRQD_NO_BALANCING during reset
scsi: imm: Fix use-after-free bug caused by unfinished delayed work
scsi: target: sbp: Remove KMSG_COMPONENT macro
scsi: core: Correct documentation for scsi_device_quiesce()
scsi: mpi3mr: Prevent duplicate SAS/SATA device entries in channel 1
scsi: target: Reset t_task_cdb pointer in error case
scsi: ufs: core: Fix EH failure after W-LUN resume error
Linus Torvalds [Sun, 14 Dec 2025 03:24:10 +0000 (15:24 +1200)]
Merge tag 'ceph-for-6.19-rc1' of https://github.com/ceph/ceph-client
Pull ceph updates from Ilya Dryomov:
"We have a patch that adds an initial set of tracepoints to the MDS
client from Max, a fix that hardens osdmap parsing code from myself
(marked for stable) and a few assorted fixups"
* tag 'ceph-for-6.19-rc1' of https://github.com/ceph/ceph-client:
rbd: stop selecting CRC32, CRYPTO, and CRYPTO_AES
ceph: stop selecting CRC32, CRYPTO, and CRYPTO_AES
libceph: make decode_pool() more resilient against corrupted osdmaps
libceph: Amend checking to fix `make W=1` build breakage
ceph: Amend checking to fix `make W=1` build breakage
ceph: add trace points to the MDS client
libceph: fix log output race condition in OSD client
Linus Torvalds [Sat, 13 Dec 2025 18:12:46 +0000 (06:12 +1200)]
Merge tag 'smp-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull CPU hotplug fix from Ingo Molnar:
- Fix CPU hotplug callbacks to disable interrupts on UP kernels
* tag 'smp-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
cpu: Make atomic hotplug callbacks run with interrupts disabled on UP
Linus Torvalds [Sat, 13 Dec 2025 18:10:35 +0000 (06:10 +1200)]
Merge tag 'perf-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf event fixes from Ingo Molnar:
- Fix NULL pointer dereference crash in the Intel PMU driver
- Fix missing read event generation on task exit
- Fix AMD uncore driver init error handling
- Fix whitespace noise
* tag 'perf-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Fix NULL event dereference crash in handle_pmi_common()
perf/core: Fix missing read event generation on task exit
perf/x86/amd/uncore: Fix the return value of amd_uncore_df_event_init() on error
perf/uprobes: Remove <space><Tab> whitespace noise
Linus Torvalds [Sat, 13 Dec 2025 18:07:09 +0000 (06:07 +1200)]
Merge tag 'irq-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Ingo Molnar:
- Fix error code in the irqchip/mchp-eic driver
- Fix setup_percpu_irq() affinity assumptions
- Remove the unused irq_domain_add_tree() function
* tag 'irq-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/mchp-eic: Fix error code in mchp_eic_domain_alloc()
irqdomain: Delete irq_domain_add_tree()
genirq: Allow NULL affinity for setup_percpu_irq()
Linus Torvalds [Sat, 13 Dec 2025 18:04:16 +0000 (06:04 +1200)]
Merge tag 'core-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc core fixes from Ingo Molnar:
- Improve bug reporting
- Suppress W=1 format warning
- Improve rseq scalability on Clang builds
* tag 'core-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rseq: Always inline rseq_debug_syscall_return()
bug: Hush suggest-attribute=format for __warn_printf()
bug: Let report_bug_entry() provide the correct bugaddr
Linus Torvalds [Sat, 13 Dec 2025 08:55:12 +0000 (20:55 +1200)]
Merge tag 'mm-nonmm-stable-2025-12-11-11-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc updates from Andrew Morton:
"There are no significant series in this small merge. Please see the
individual changelogs for details"
[ Editor's note: it's mainly ocfs2 and a couple of random fixes ]
* tag 'mm-nonmm-stable-2025-12-11-11-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm: memfd_luo: add CONFIG_SHMEM dependency
mm: shmem: avoid build warning for CONFIG_SHMEM=n
ocfs2: fix memory leak in ocfs2_merge_rec_left()
ocfs2: invalidate inode if i_mode is zero after block read
ocfs2: avoid -Wflex-array-member-not-at-end warning
ocfs2: convert remaining read-only checks to ocfs2_emergency_state
ocfs2: add ocfs2_emergency_state helper and apply to setattr
checkpatch: add uninitialized pointer with __free attribute check
args: fix documentation to reflect the correct numbers
ocfs2: fix kernel BUG in ocfs2_find_victim_chain
liveupdate: luo_core: fix redundant bound check in luo_ioctl()
ocfs2: validate inline xattr size and entry count in ocfs2_xattr_ibody_list
fs/fat: remove unnecessary wrapper fat_max_cache()
ocfs2: replace deprecated strcpy with strscpy
ocfs2: check tl_used after reading it from trancate log inode
liveupdate: luo_file: don't use invalid list iterator
Brown paper bag time. This is a silly oversight where I missed to drop
the error condition checking to ensure we clean up on early error
returns. I have an internal unit testset coming up for this which will
catch all such issues going forward.
Reported-by: Chris Mason <clm@fb.com> Reported-by: Jeff Layton <jlayton@kernel.org> Fixes: 011703a9acd7 ("file: add FD_{ADD,PREPARE}()") Signed-off-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 13 Dec 2025 07:57:41 +0000 (19:57 +1200)]
x86/hv: Add gitignore entry for generated header file
Commit 7bfe3b8ea6e3 ("Drivers: hv: Introduce mshv_vtl driver") added a
new generated header file for the offsets into the mshv_vtl_cpu_context
structure to be used by the low-level assembly code. But it didn't add
the .gitignore file to go with it, so 'git status' and friends will
mention it.
Let's add the gitignore file before somebody thinks that generated
header should be committed.
Linus Torvalds [Sat, 13 Dec 2025 05:39:28 +0000 (17:39 +1200)]
Merge tag 'drm-fixes-2025-12-13' of https://gitlab.freedesktop.org/drm/kernel
Pull more drm fixes from Dave Airlie:
"These are the enqueued fixes that ended up in our fixes branch,
nouveau mostly, along with some small fixes in other places.
plane:
- Handle IS_ERR vs NULL in drm_plane_create_hotspot_properties()
ttm:
- fix devcoredump for evicted bos
panel:
- Fix stack usage warning in novatek-nt35560
nouveau:
- alloc fwsec sb at boot to avoid s/r problems
- fix strcpy usage
- fix i2c encoder crash
bridge:
- Ignore spurious PLL_UNLOCK bit in ti-sn65dsi83
mgag200:
- Fix bigendian handling in mgag200
tilcdc:
- Fix probe failure in tilcdc"
* tag 'drm-fixes-2025-12-13' of https://gitlab.freedesktop.org/drm/kernel:
drm/mgag200: Fix big-endian support
drm/tilcdc: Fix removal actions in case of failed probe
drm/ttm: Avoid NULL pointer deref for evicted BOs
drm: nouveau: Replace sprintf() with sysfs_emit()
drm/nouveau: fix circular dep oops from vendored i2c encoder
drm/nouveau: refactor deprecated strcpy
drm/plane: Fix IS_ERR() vs NULL check in drm_plane_create_hotspot_properties()
drm/bridge: ti-sn65dsi83: ignore PLL_UNLOCK errors
drm/nouveau/gsp: Allocate fwsec-sb at boot
drm/panel: novatek-nt35560: avoid on-stack device structure
i915:
- Fix format string truncation warning
- FIx runtime PM reference during fbdev BO creation
panthor:
- fix UAF
renesas:
- fix sync flag handling"
* tag 'drm-next-2025-12-13' of https://gitlab.freedesktop.org/drm/kernel:
Revert "drm/amd/display: Fix pbn to kbps Conversion"
drm/amd: Fix unbind/rebind for VCN 4.0.5
drm/i915: Fix format string truncation warning
drm/i915/fbdev: Hold runtime PM ref during fbdev BO creation
drm/amd/display: Improve HDMI info retrieval
drm/amdkfd: bump minimum vgpr size for gfx1151
drm/amd/display: shrink struct members
drm/amdkfd: Export the cwsr_size and ctl_stack_size to userspace
drm/amd/display: Refactor dml_core_mode_support to reduce stack frame
drm/amdgpu: don't attach the tlb fence for SI
drm/amd/display: Use GFP_ATOMIC in dc_create_plane_state()
drm/amdkfd: Trap handler support for expert scheduling mode
drm/amdkfd: Use huge page size to check split svm range alignment
drm/rcar-du: dsi: Handle both DRM_MODE_FLAG_N.SYNC and !DRM_MODE_FLAG_P.SYNC
drm/gem-shmem: revert the 8-byte alignment constraint
drm/gem-dma: revert the 8-byte alignment constraint
drm/panthor: Prevent potential UAF in group creation
Linus Torvalds [Sat, 13 Dec 2025 05:15:16 +0000 (17:15 +1200)]
Merge tag 'i3c/for-6.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux
Pull further i3c update from Alexandre Belloni:
"We are removing a legacy API callback and having this sooner rather
than later will help ensuring no one introduces a new driver using it.
I've also added patches removing the "__free(...) = NULL" pattern
because I'm sure we won't avoid people sending those following the
mailing list discussion..."
* tag 'i3c/for-6.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux:
i3c: adi: Fix confusing cleanup.h syntax
i3c: master: Fix confusing cleanup.h syntax
i3c: master: cleanup callback .priv_xfers()
i3c: master: switch to use new callback .i3c_xfers() from .priv_xfers()
Linus Torvalds [Sat, 13 Dec 2025 05:09:06 +0000 (17:09 +1200)]
Merge tag 'rtc-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
"Subsystem:
- stop setting max_user_freq from the individual drivers as this has
not been hardware related for a while
New drivers:
- Andes ATCRTC100
- Apple SMC
- Nvidia VRS