Bjorn Helgaas [Wed, 4 Jun 2025 15:50:02 +0000 (10:50 -0500)]
Merge branch 'pci/pwrctrl'
- Rename pwrctrl Kconfig symbols from 'PWRCTL' to 'PWRCTRL' to match the
filename paths. Retain old deprecated symbols for compatibility, except
for the pwrctrl slot driver (PCI_PWRCTRL_SLOT) (Johan Hovold)
- When unregistering pwrctrl, cancel outstanding rescan work before
cleaning up data structures to avoid use-after-free issues (Brian Norris)
* pci/pwrctrl:
arm64: Kconfig: switch to HAVE_PWRCTRL
wifi: ath12k: switch to PCI_PWRCTRL_PWRSEQ
wifi: ath11k: switch to PCI_PWRCTRL_PWRSEQ
PCI/pwrctrl: Rename pwrctrl Kconfig symbols and slot module
PCI/pwrctrl: Cancel outstanding rescan work when unregistering
Bjorn Helgaas [Wed, 4 Jun 2025 15:50:01 +0000 (10:50 -0500)]
Merge branch 'pci/pm'
- Add pm_runtime_put() cleanup helper for use with __free() to
automatically drop the device usage count when a pointer goes out of
scope (Alex Williamson)
- Increment PM usage counter when probing reset methods so we don't try to
read config space of a powered-off device (Alex Williamson)
- Set all devices to D0 during enumeration to ensure ACPI opregion is
connected via _REG (Mario Limonciello)
* pci/pm:
PCI: Explicitly put devices into D0 when initializing
PCI: Increment PM usage counter when probing reset methods
PM: runtime: Define pm_runtime_put cleanup helper
Bjorn Helgaas [Wed, 4 Jun 2025 15:49:59 +0000 (10:49 -0500)]
Merge branch 'pci/hotplug'
- Ignore Presence Detect Changed caused by DPC. pciehp already ignores
Link Down/Up events caused by DPC, but on slots using in-band presence
detect, DPC causes a spurious Presence Detect Changed event (Lukas
Wunner)
- Ignore Link Down/Up caused by Secondary Bus Reset. On hotplug ports
using in-band presence detect, the reset causes a Presence Detect Changed
event, which mistakenly caused teardown and re-enumeration of the device.
Drivers may need to annotate code that resets their device (Lukas Wunner)
* pci/hotplug:
PCI: hotplug: Drop superfluous #include directives
PCI: pciehp: Ignore Link Down/Up caused by Secondary Bus Reset
PCI: pciehp: Ignore Presence Detect Changed caused by DPC
Bjorn Helgaas [Wed, 4 Jun 2025 15:49:56 +0000 (10:49 -0500)]
Merge branch 'pci/enumeration'
- Remove pci_fixup_cardbus(), which has no users left (Heiner Kallweit)
- Print the actual delay time in pci_bridge_wait_for_secondary_bus()
instead of assuming it was 1000ms (Wilfred Mallawa)
- Revert 'iommu/amd: Prevent binding other PCI drivers to IOMMU PCI
devices', which broke resume from system sleep on AMD platforms and has
been fixed by other commits (Lukas Wunner)
- Restrict visibility of pci_dev.match_driver since it's no longer used
outside the PCI core (Lukas Wunner)
* pci/enumeration:
PCI: Limit visibility of match_driver flag to PCI core
Revert "iommu/amd: Prevent binding other PCI drivers to IOMMU PCI devices"
PCI: Print the actual delay time in pci_bridge_wait_for_secondary_bus()
PCI: Use PCI_STD_NUM_BARS instead of 6
PCI: Remove pci_fixup_cardbus()
Bjorn Helgaas [Wed, 4 Jun 2025 15:49:50 +0000 (10:49 -0500)]
Merge branch 'pci/devres'
- Remove mtip32xx use of pcim_iounmap_regions(), which is deprecated and
unnecessary (Philipp Stanner)
- Remove pcim_iounmap_regions() and pcim_request_region_exclusive() and
related flags since all uses have been removed (Philipp Stanner)
- Rework devres 'request' functions so they are no longer 'hybrid', i.e.,
their behavior no longer depends on whether pcim_enable_device or
pci_enable_device() was used, and remove related code (Philipp Stanner)
* pci/devres:
PCI: Remove function pcim_intx() prototype from pci.h
PCI: Remove hybrid-devres usage warnings from kernel-doc
PCI: Remove redundant set of request functions
PCI: Remove exclusive requests flags from _pcim_request_region()
PCI: Remove pcim_request_region_exclusive()
Documentation/driver-api: Update pcim_enable_device()
PCI: Remove hybrid devres nature from request functions
PCI: Remove pcim_iounmap_regions()
mtip32xx: Remove unnecessary pcim_iounmap_regions() calls
Bjorn Helgaas [Wed, 4 Jun 2025 15:49:49 +0000 (10:49 -0500)]
Merge branch 'pci/bwctrl'
- Simplify link bandwidth controller by replacing the count of Link
Bandwidth Management Status (LBMS) events with a PCI_LINK_LBMS_SEEN flag
(Ilpo Järvinen)
- Update the Link Speed after retraining, since the Link Speed may have
changed (Ilpo Järvinen)
* pci/bwctrl:
PCI: Update Link Speed after retraining
PCI/bwctrl: Replace lbms_count with PCI_LINK_LBMS_SEEN flag
Bjorn Helgaas [Wed, 4 Jun 2025 15:49:49 +0000 (10:49 -0500)]
Merge branch 'pci/aer'
- Initialize struct aer_err_info before using it to avoid depending on
stack garbage (Bjorn Helgaas)
- Log the DPC Error Source ID only when it's actually valid (when ERR_FATAL
or ERR_NONFATAL was received from a downstream device) and decode into
bus/device/function (Bjorn Helgaas)
- Consolidate AER Error Source ID in one place for message consistency
(Bjorn Helgaas)
- Update statistics and emit trace events early in AER logging paths,
before any potential ratelimiting (Bjorn Helgaas)
- Determine AER log level once and save it so all related messages use the
same level (Karolina Stolarek)
- Use KERN_WARNING, not KERN_ERR, when logging PCIe Correctable Errors.
- Ratelimit PCIe Correctable and Non-Fatal error logging, with sysfs
controls on interval and burst count, to avoid flooding logs and RCU
stall warnings (Jon Pan-Doh)
* pci/aer:
PCI/ERR: Remove misleading TODO regarding kernel panic
PCI/AER: Add sysfs attributes for log ratelimits
PCI/AER: Add ratelimits to PCI AER Documentation
PCI/AER: Ratelimit correctable and non-fatal error logging
PCI/AER: Simplify add_error_device()
PCI/AER: Convert aer_get_device_error_info(), aer_print_error() to index
PCI/AER: Rename struct aer_stats to aer_info
PCI/AER: Reduce pci_print_aer() correctable error level to KERN_WARNING
PCI/ERR: Add printk level to pcie_print_tlp_log()
PCI/AER: Check log level once and remember it
PCI/AER: Trace error event before ratelimiting
PCI/AER: Update statistics before ratelimiting
PCI/AER: Simplify pci_print_aer()
PCI/AER: Initialize aer_err_info before using it
PCI/AER: Move aer_print_source() earlier in file
PCI/AER: Rename aer_print_port_info() to aer_print_source()
PCI/AER: Extract bus/dev/fn in aer_print_port_info() with PCI_BUS_NUM(), etc
PCI/AER: Consolidate Error Source ID logging in aer_isr_one_error_type()
PCI/AER: Factor COR/UNCOR error handling out from aer_isr_one_error()
PCI/DPC: Log Error Source ID only when valid
PCI/DPC: Initialize aer_err_info before using it
PCI/ERR: Remove misleading TODO regarding kernel panic
A PCI device is just another peripheral in a system. So failure to
recover it, must not result in a kernel panic. So remove the TODO which
is quite misleading.
Johan Hovold [Wed, 2 Apr 2025 13:26:31 +0000 (15:26 +0200)]
PCI/pwrctrl: Rename pwrctrl Kconfig symbols and slot module
Commits b88cbaaa6fa1 ("PCI/pwrctrl: Rename pwrctl files to pwrctrl") and 3f925cd62874 ("PCI/pwrctrl: Rename pwrctrl functions and structures")
renamed the "pwrctl" framework to "pwrctrl" for consistency reasons.
Rename also the Kconfig symbols so that they reflect the new name while
adding entries for the deprecated ones. The old symbols can be removed once
everything that depends on them has been updated.
Note that no deprecated symbol is added for the new slot driver to avoid
having to add a user visible option.
Rename the new slot module to reflect the framework name and match the
other pwrctrl modules.
Jon Pan-Doh [Thu, 22 May 2025 23:21:26 +0000 (18:21 -0500)]
PCI/AER: Add sysfs attributes for log ratelimits
Allow userspace to read/write log ratelimits per device (including
enable/disable). Create aer/ sysfs directory to store them and any
future AER configs.
The default values are ratelimit_burst=10, ratelimit_interval_ms=5000, so
if we try to emit more than 10 messages in a 5 second period, some are
suppressed.
Update AER sysfs ABI filename to reflect the broader scope of AER sysfs
attributes (e.g. stats and ratelimits).
Jon Pan-Doh [Thu, 22 May 2025 23:21:24 +0000 (18:21 -0500)]
PCI/AER: Ratelimit correctable and non-fatal error logging
Spammy devices can flood kernel logs with AER errors and slow/stall
execution. Add per-device ratelimits for AER correctable and non-fatal
uncorrectable errors that use the kernel defaults (10 per 5s). Logging of
fatal errors is not ratelimited.
There are two AER logging entry points:
- aer_print_error() is used by DPC and native AER
- pci_print_aer() is used by GHES and CXL
The native AER aer_print_error() case includes a loop that may log details
from multiple devices, which are ratelimited individually. If we log
details for any device, we also log the Error Source ID from the Root Port
or RCEC.
If no such device details are found, we still log the Error Source from the
ERR_* Message, ratelimited by the Root Port or RCEC that received it.
The DPC aer_print_error() case is not ratelimited, since this only happens
for fatal errors.
The CXL pci_print_aer() case is ratelimited by the Error Source device.
The GHES pci_print_aer() case is via aer_recover_work_func(), which
searches for the Error Source device. If the device is not found, there's
no per-device ratelimit, so we use a system-wide ratelimit that covers all
error types (correctable, non-fatal, and fatal).
Sargun at Meta reported internally that a flood of AER errors causes RCU
CPU stall warnings and CSD-lock warnings.
Tested using aer-inject[1]. Sent 11 AER errors. Observed 10 errors logged
while AER stats (cat /sys/bus/pci/devices/<dev>/aer_dev_correctable) show
true count of 11.
Bjorn Helgaas [Thu, 22 May 2025 23:21:22 +0000 (18:21 -0500)]
PCI/AER: Convert aer_get_device_error_info(), aer_print_error() to index
Previously aer_get_device_error_info() and aer_print_error() took a pointer
to struct aer_err_info and a pointer to a pci_dev. Typically the pci_dev
was one of the elements of the aer_err_info.dev[] array (DPC was an
exception, where the dev[] array was unused).
Convert aer_get_device_error_info() and aer_print_error() to take an index
into the aer_err_info.dev[] array instead. A future patch will add
per-device ratelimit information, so the index makes it convenient to find
the ratelimit associated with the device.
To accommodate DPC, set info->dev[0] to the DPC port before using these
interfaces.
Update name to reflect the broader definition of structs/variables that are
stored (e.g. ratelimits). This is a preparatory patch for adding rate limit
support.
[bhelgaas: "aer_report" -> "aer_info"]
Signed-off-by: Karolina Stolarek <karolina.stolarek@oracle.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20250522232339.1525671-16-helgaas@kernel.org
Bjorn Helgaas [Thu, 22 May 2025 23:21:19 +0000 (18:21 -0500)]
PCI/ERR: Add printk level to pcie_print_tlp_log()
aer_print_error() produces output at a printk level (KERN_ERR/KERN_WARNING/
etc) that depends on the kind of error, and it calls pcie_print_tlp_log(),
which previously always produced output at KERN_ERR.
Add a "level" parameter so aer_print_error() can control the level of the
pcie_print_tlp_log() output to match.
When reporting an AER error, we check its type multiple times to determine
the log level for each message. Do this check only in the top-level
functions (aer_isr_one_error(), pci_print_aer()) and save the level in
struct aer_err_info.
[bhelgaas: save log level in struct aer_err_info instead of passing it
as a parameter]
Signed-off-by: Karolina Stolarek <karolina.stolarek@oracle.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20250522232339.1525671-13-helgaas@kernel.org
Bjorn Helgaas [Thu, 22 May 2025 23:21:16 +0000 (18:21 -0500)]
PCI/AER: Update statistics before ratelimiting
There are two AER logging entry points:
- aer_print_error() is used by DPC (dpc_process_error()) and native AER
handling (aer_process_err_devices()).
- pci_print_aer() is used by GHES (aer_recover_work_func()) and CXL
(cxl_handle_rdport_errors())
Both use __aer_print_error() to print the AER error bits. Previously
__aer_print_error() also incremented the AER statistics via
pci_dev_aer_stats_incr().
Call pci_dev_aer_stats_incr() early in the entry points instead of in
__aer_print_error() so we update the statistics even if the actual printing
of error bits is rate limited by a future change.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20250522232339.1525671-11-helgaas@kernel.org
Bjorn Helgaas [Thu, 22 May 2025 23:21:15 +0000 (18:21 -0500)]
PCI/AER: Simplify pci_print_aer()
Simplify pci_print_aer() by initializing the struct aer_err_info "info"
with a designated initializer list (it was previously initialized with
memset()) and using pci_name().
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://patch.msgid.link/20250522232339.1525671-10-helgaas@kernel.org
Bjorn Helgaas [Thu, 22 May 2025 23:21:14 +0000 (18:21 -0500)]
PCI/AER: Initialize aer_err_info before using it
Previously the struct aer_err_info "e_info" was allocated on the stack
without being initialized, so it contained junk except for the fields we
explicitly set later.
Initialize "e_info" at declaration with a designated initializer list,
which initializes the other members to zero.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20250522232339.1525671-9-helgaas@kernel.org
Jon Pan-Doh [Thu, 22 May 2025 23:21:12 +0000 (18:21 -0500)]
PCI/AER: Rename aer_print_port_info() to aer_print_source()
Rename aer_print_port_info() to aer_print_source() to be more descriptive.
This logs the Error Source ID logged by a Root Port or Root Complex Event
Collector when it receives an ERR_COR, ERR_NONFATAL, or ERR_FATAL Message.
Bjorn Helgaas [Thu, 22 May 2025 23:21:11 +0000 (18:21 -0500)]
PCI/AER: Extract bus/dev/fn in aer_print_port_info() with PCI_BUS_NUM(), etc
Use PCI_BUS_NUM(), PCI_SLOT(), PCI_FUNC() to extract the bus number,
device, and function number directly from the Error Source ID. There's no
need to shift and mask it explicitly.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20250522232339.1525671-6-helgaas@kernel.org
Bjorn Helgaas [Thu, 22 May 2025 23:21:10 +0000 (18:21 -0500)]
PCI/AER: Consolidate Error Source ID logging in aer_isr_one_error_type()
Previously we decoded the AER Error Source ID in aer_isr_one_error_type(),
then again in find_source_device() if we didn't find any devices with
errors logged in their AER Capabilities.
Consolidate this so we only decode and log the Error Source ID once in
aer_isr_one_error_type(). Add a "found" parameter so we can add a note
when we didn't find any downstream devices with errors logged in their AER
Capability.
This changes the dmesg logging when we found no devices with errors logged:
- pci 0000:00:01.0: AER: Correctable error message received from 0000:02:00.0
- pci 0000:00:01.0: AER: found no error details for 0000:02:00.0
+ pci 0000:00:01.0: AER: Correctable error message received from 0000:02:00.0 (no details found)
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://patch.msgid.link/20250522232339.1525671-5-helgaas@kernel.org
Bjorn Helgaas [Thu, 22 May 2025 23:21:09 +0000 (18:21 -0500)]
PCI/AER: Factor COR/UNCOR error handling out from aer_isr_one_error()
aer_isr_one_error() duplicates the Error Source ID logging and AER error
processing for Correctable Errors and Uncorrectable Errors. Factor out the
duplicated code to aer_isr_one_error_type().
aer_isr_one_error() doesn't need the struct aer_rpc pointer, so pass it the
Root Port or RCEC pci_dev pointer instead.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://patch.msgid.link/20250522232339.1525671-4-helgaas@kernel.org
Bjorn Helgaas [Thu, 22 May 2025 23:21:08 +0000 (18:21 -0500)]
PCI/DPC: Log Error Source ID only when valid
DPC Error Source ID is only valid when the DPC Trigger Reason indicates
that DPC was triggered due to reception of an ERR_NONFATAL or ERR_FATAL
Message (PCIe r6.0, sec 7.9.14.5).
When DPC was triggered by ERR_NONFATAL (PCI_EXP_DPC_STATUS_TRIGGER_RSN_NFE)
or ERR_FATAL (PCI_EXP_DPC_STATUS_TRIGGER_RSN_FE) from a downstream device,
log the Error Source ID (decoded into domain/bus/device/function). Don't
print the source otherwise, since it's not valid.
For DPC trigger due to reception of ERR_NONFATAL or ERR_FATAL, the dmesg
logging changes:
Previously the "containment event" message was at KERN_INFO and the
"%s detected" message was at KERN_WARNING. Now the single message is at
KERN_WARNING.
Fixes: 26e515713342 ("PCI: Add Downstream Port Containment driver") Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://patch.msgid.link/20250522232339.1525671-3-helgaas@kernel.org
Bjorn Helgaas [Thu, 22 May 2025 23:21:07 +0000 (18:21 -0500)]
PCI/DPC: Initialize aer_err_info before using it
Previously the struct aer_err_info "info" was allocated on the stack
without being initialized, so it contained junk except for the fields we
explicitly set later.
Initialize "info" at declaration so it starts as all zeros.
Fixes: 8aefa9b0d910 ("PCI/DPC: Print AER status in DPC event handling") Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20250522232339.1525671-2-helgaas@kernel.org
PCI/ACPI: Fix allocated memory release on error in pci_acpi_scan_root()
In the pci_acpi_scan_root() function, when creating a PCI bus fails,
we need to free up the previously allocated memory, which can avoid
invalid memory usage and save resources.
Fixes: 789befdfa389 ("arm64: PCI: Migrate ACPI related functions to pci-acpi.c") Signed-off-by: Zhe Qiao <qiaozhe@iscas.ac.cn>
[kwilczynski: commit log] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/20250430060603.381504-1-qiaozhe@iscas.ac.cn
Philipp Stanner [Mon, 19 May 2025 11:30:00 +0000 (13:30 +0200)]
PCI: Remove hybrid-devres usage warnings from kernel-doc
pci/iomap.c still contains warnings about those functions not behaving
in a managed manner if pcim_enable_device() was called. Since all hybrid
behavior that users could know about has been removed by now, those
explicit warnings are no longer necessary.
Remove the hybrid-devres usage warnings from the kernel-doc.
Signed-off-by: Philipp Stanner <phasta@kernel.org>
[kwilczynski: commit log] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://lore.kernel.org/r/20250519112959.25487-8-phasta@kernel.org
Philipp Stanner [Mon, 19 May 2025 11:29:59 +0000 (13:29 +0200)]
PCI: Remove redundant set of request functions
When the demangling of the hybrid devres functions within PCI was
implemented, it was necessary to implement several PCI functions a
second time to avoid cyclic calls, since the hybrid functions in pci.c
call the managed functions in devres.c, which in turn can be directly
used outside of PCI and needed request infrastructure, too.
Therefore, __pcim_request_region_range(), __pci_release_region_range()
and wrappers around them were implemented.
The hybrid nature has recently been removed from all functions in pci.c.
Therefore, the functions in devres.c can now directly use their
counterparts in pci.c without causing a call-cycle.
Remove __pcim_request_region_range(), __pcim_request_region_range() and
the wrappers. Use the corresponding request functions from pci.c in
devres.c
Signed-off-by: Philipp Stanner <phasta@kernel.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://lore.kernel.org/r/20250519112959.25487-7-phasta@kernel.org
Philipp Stanner [Mon, 19 May 2025 11:29:57 +0000 (13:29 +0200)]
PCI: Remove pcim_request_region_exclusive()
pcim_request_region_exclusive() was only needed for redirecting the
relatively exotic exclusive request functions in pci.c in case of them
operating in managed mode.
The managed nature has been removed from those functions and no one else
uses pcim_request_region_exclusive().
Remove pcim_request_region_exclusive().
Signed-off-by: Philipp Stanner <phasta@kernel.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://lore.kernel.org/r/20250519112959.25487-5-phasta@kernel.org
pcim_enable_device() is not related anymore to switching the mode of
operation of any functions. It merely sets up a devres callback for
automatically disabling the PCI device on driver detach.
Adjust the function's documentation.
Signed-off-by: Philipp Stanner <phasta@kernel.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20250519112959.25487-4-phasta@kernel.org
Philipp Stanner [Mon, 19 May 2025 11:29:55 +0000 (13:29 +0200)]
PCI: Remove hybrid devres nature from request functions
All functions based on __pci_request_region() and its release counter
part support "hybrid mode", where the functions become managed if the
PCI device was enabled with pcim_enable_device().
Removing this undesirable feature requires to remove all users who
activated their device with that function and use one of the affected
request functions.
These users were:
ASoC
alsa
cardreader
cirrus
i2c
mmc
mtd
mtd
mxser
net
spi
vdpa
vmwgfx
all of which have been ported to always-managed pcim_ functions by now.
The hybrid nature can, thus, be removed from the aforementioned PCI
functions.
Remove all function guards and documentation in pci.c related to the
hybrid redirection. Adjust the visibility of pcim_release_region().
Signed-off-by: Philipp Stanner <phasta@kernel.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://lore.kernel.org/r/20250519112959.25487-3-phasta@kernel.org
Ilpo Järvinen [Wed, 14 May 2025 13:28:21 +0000 (16:28 +0300)]
PCI: Update Link Speed after retraining
PCIe Link Retraining can alter Link Speed. pcie_retrain_link() that
performs the Link Training is called from bwctrl and ASPM driver.
While bwctrl listens for Link Bandwidth Management Status (LBMS) to
pick up changes in Link Speed, there is a race between
pcie_reset_lbms() clearing LBMS after the Link Training and
pcie_bwnotif_irq() reading the Link Status register. If LBMS is already
cleared when the irq handler reads the register, the interrupt handler
will return early with IRQ_NONE and won't update the Link Speed.
When Link Speed update originates from bwctrl,
pcie_bwctrl_change_speed() ensures Link Speed is updated after the
retraining. ASPM driver, however, calls pcie_retrain_link() but does
not update the Link Speed after retraining which can result in stale
Link Speed. Also, it is possible to have ASPM support with
CONFIG_PCIEPORTBUS=n in which case bwctrl will not be built in (and
thus won't update the Link Speed at all).
To ensure Link Speed is not left stale after Link Training, move the
call to pcie_update_link_speed() from pcie_bwctrl_change_speed() into
pcie_retrain_link().
PCI: Limit visibility of match_driver flag to PCI core
Since commit 58d9a38f6fac ("PCI: Skip attaching driver in device_add()"),
PCI enumeration is split into two steps: In the first step, all devices
are published in sysfs with device_add(). In the second step, drivers are
bound to the devices with device_attach(). To delay driver binding until
the second step, a "bool match_driver" in struct pci_dev is used.
Instead of a bool, use a bit in the "unsigned long priv_flags" to shrink
struct pci_dev a little and prevent use of the bool outside the PCI core
(as has happened with commit cbbc00be2ce3 ("iommu/amd: Prevent binding
other PCI drivers to IOMMU PCI devices")).
Revert "iommu/amd: Prevent binding other PCI drivers to IOMMU PCI devices"
Commit 991de2e59090 ("PCI, x86: Implement pcibios_alloc_irq() and
pcibios_free_irq()") changed IRQ handling on PCI driver probing.
It inadvertently broke resume from system sleep on AMD platforms:
* 8affb487d4a4 ("x86/PCI: Don't alloc pcibios-irq when MSI is enabled")
* cbbc00be2ce3 ("iommu/amd: Prevent binding other PCI drivers to IOMMU PCI devices")
The breaking change and one of these two fixes were subsequently reverted:
* fe25d078874f ("Revert "x86/PCI: Don't alloc pcibios-irq when MSI is enabled"")
* 6c777e8799a9 ("Revert "PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()"")
This rendered the second fix unnecessary, so revert it as well. It used
the match_driver flag in struct pci_dev, which is internal to the PCI core
and not supposed to be touched by arbitrary drivers.
Ilpo Järvinen [Tue, 22 Apr 2025 11:55:47 +0000 (14:55 +0300)]
PCI/bwctrl: Replace lbms_count with PCI_LINK_LBMS_SEEN flag
PCIe BW controller counted LBMS assertions for the purposes of the Target
Speed quirk (pcie_failed_link_retrain()). It was also a plan to expose the
LBMS count through sysfs to allow better diagnosing link related issues.
Lukas Wunner suggested, however, that adding a trace event would be better
for diagnostics purposes, leaving only pcie_failed_link_retrain() as a user
of the lbms_count.
The logic in pcie_failed_link_retrain() does not require keeping count of
LBMS assertions, so replace lbms_count with a simple flag in pci_dev's
priv_flags. The reduced complexity allows removing pcie_bwctrl_lbms_rwsem.
Since pcie_failed_link_retrain() runs before bwctrl is probed during boot,
the LBMS in Link Status register still has to be checked by the quirk.
The priv_flags numbering is not continuous because hotplug code added a few
flags to fill numbers 4-5 (hotplug and bwctrl changes are routed through in
different branches).
Suggested-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
[bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
[kwilczynski: squashed a fix to resolve build failures from
https://lore.kernel.org/all/20250508090036.1528-1-ilpo.jarvinen@linux.intel.com] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Lukas Wunner <lukas@wunner.de> Link: https://patch.msgid.link/20250422115548.1483-1-ilpo.jarvinen@linux.intel.com
PCI: Explicitly put devices into D0 when initializing
AMD BIOS team has root caused an issue that NVMe storage failed to come
back from suspend to a lack of a call to _REG when NVMe device was probed.
112a7f9c8edbf ("PCI/ACPI: Call _REG when transitioning D-states") added
support for calling _REG when transitioning D-states, but this only works
if the device actually "transitions" D-states.
967577b062417 ("PCI/PM: Keep runtime PM enabled for unbound PCI devices")
added support for runtime PM on PCI devices, but never actually
'explicitly' sets the device to D0.
To make sure that devices are in D0 and that platform methods such as
_REG are called, explicitly set all devices into D0 during initialization.
Fixes: 967577b062417 ("PCI/PM: Keep runtime PM enabled for unbound PCI devices") Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Denis Benato <benato.denis96@gmail.com> Tested-By: Yijun Shen <Yijun_Shen@Dell.com> Tested-By: David Perry <david.perry@amd.com> Reviewed-by: Rafael J. Wysocki <rafael@kernel.org> Link: https://patch.msgid.link/20250424043232.1848107-1-superm1@kernel.org
Alex Williamson [Tue, 22 Apr 2025 23:05:32 +0000 (17:05 -0600)]
PCI: Increment PM usage counter when probing reset methods
We can get different results probing reset methods for a device depending
on its power state. For example, reading the PM control register of a
device in D3cold will always indicate NoSoftRst+ because we get ~0 data
when the config read fails on PCI, preventing us from correctly probing PM
reset support.
Increment the PM usage counter before any probes and use the cleanup __free
facility to automatically drop the usage counter out of scope.
PCI: pciehp: Ignore Link Down/Up caused by Secondary Bus Reset
When a Secondary Bus Reset is issued at a hotplug port, it causes a Data
Link Layer State Changed event as a side effect. On hotplug ports using
in-band presence detect, it additionally causes a Presence Detect Changed
event.
These spurious events should not result in teardown and re-enumeration of
the device in the slot. Hence commit 2e35afaefe64 ("PCI: pciehp: Add
reset_slot() method") masked the Presence Detect Changed Enable bit in the
Slot Control register during a Secondary Bus Reset. Commit 06a8d89af551
("PCI: pciehp: Disable link notification across slot reset") additionally
masked the Data Link Layer State Changed Enable bit.
However masking those bits only disables interrupt generation (PCIe r6.2
sec 6.7.3.1). The events are still visible in the Slot Status register
and picked up by the IRQ handler if it runs during a Secondary Bus Reset.
This can happen if the interrupt is shared or if an unmasked hotplug event
occurs, e.g. Attention Button Pressed or Power Fault Detected.
The likelihood of this happening used to be small, so it wasn't much of a
problem in practice. That has changed with the recent introduction of
bandwidth control in v6.13-rc1 with commit 665745f27487 ("PCI/bwctrl:
Re-add BW notification portdrv as PCIe BW controller"):
Bandwidth control shares the interrupt with PCIe hotplug. A Secondary Bus
Reset causes a Link Bandwidth Notification, so the hotplug IRQ handler
runs, picks up the masked events and tears down the device in the slot.
As a result, Joel reports VFIO passthrough failure of a GPU, which Ilpo
root-caused to the incorrect handling of masked hotplug events.
Clearly, a more reliable way is needed to ignore spurious hotplug events.
For Downstream Port Containment, a new ignore mechanism was introduced by
commit a97396c6eb13 ("PCI: pciehp: Ignore Link Down/Up caused by DPC").
It has been working reliably for the past four years.
Adapt it for Secondary Bus Resets.
Introduce two helpers to annotate code sections which cause spurious link
changes: pci_hp_ignore_link_change() and pci_hp_unignore_link_change()
Use those helpers in lieu of masking interrupts in the Slot Control
register.
Introduce a helper to check whether such a code section is executing
concurrently and if so, await it: pci_hp_spurious_link_change()
Invoke the helper in the hotplug IRQ thread pciehp_ist(). Re-use the
IRQ thread's existing code which ignores DPC-induced link changes unless
the link is unexpectedly down after reset recovery or the device was
replaced during the bus reset.
That code block in pciehp_ist() was previously only executed if a Data
Link Layer State Changed event has occurred. Additionally execute it for
Presence Detect Changed events. That's necessary for compatibility with
PCIe r1.0 hotplug ports because Data Link Layer State Changed didn't exist
before PCIe r1.1. DPC was added with PCIe r3.1 and thus DPC-capable
hotplug ports always support Data Link Layer State Changed events.
But the same cannot be assumed for Secondary Bus Reset, which already
existed in PCIe r1.0.
Secondary Bus Reset is only one of many causes of spurious link changes.
Others include runtime suspend to D3cold, firmware updates or FPGA
reconfiguration. The new pci_hp_{,un}ignore_link_change() helpers may be
used by all kinds of drivers to annotate such code sections, hence their
declarations are publicly visible in <linux/pci.h>. A case in point is
the Mellanox Ethernet driver which disables a firmware reset feature if
the Ethernet card is attached to a hotplug port, see commit 3d7a3f2612d7
("net/mlx5: Nack sync reset request when HotPlug is enabled"). Going
forward, PCIe hotplug will be able to cope gracefully with all such use
cases once the code sections are properly annotated.
The new helpers internally use two bits in struct pci_dev's priv_flags as
well as a wait_queue. This mirrors what was done for DPC by commit a97396c6eb13 ("PCI: pciehp: Ignore Link Down/Up caused by DPC"). That may
be insufficient if spurious link changes are caused by multiple sources
simultaneously. An example might be a Secondary Bus Reset issued by AER
during FPGA reconfiguration. If this turns out to happen in real life,
support for it can easily be added by replacing the PCI_LINK_CHANGING flag
with an atomic_t counter incremented by pci_hp_ignore_link_change() and
decremented by pci_hp_unignore_link_change(). Instead of awaiting a zero
PCI_LINK_CHANGING flag, the pci_hp_spurious_link_change() helper would
then simply await a zero counter.
Fixes: 665745f27487 ("PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller") Reported-by: Joel Mathew Thomas <proxy0@tutamail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219765 Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Joel Mathew Thomas <proxy0@tutamail.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://patch.msgid.link/d04deaf49d634a2edf42bf3c06ed81b4ca54d17b.1744298239.git.lukas@wunner.de
PCI: pciehp: Ignore Presence Detect Changed caused by DPC
Commit a97396c6eb13 ("PCI: pciehp: Ignore Link Down/Up caused by DPC")
amended PCIe hotplug to not bring down the slot upon Data Link Layer State
Changed events caused by Downstream Port Containment.
However Keith reports off-list that if the slot uses in-band presence
detect (i.e. Presence Detect State is derived from Data Link Layer Link
Active), DPC also causes a spurious Presence Detect Changed event.
This needs to be ignored as well.
Unfortunately there's no register indicating that in-band presence detect
is used. PCIe r5.0 sec 7.5.3.10 introduced the In-Band PD Disable bit in
the Slot Control Register. The PCIe hotplug driver sets this bit on
ports supporting it. But older ports may still use in-band presence
detect.
If in-band presence detect can be disabled, Presence Detect Changed events
occurring during DPC must not be ignored because they signal device
replacement. On all other ports, device replacement cannot be detected
reliably because the Presence Detect Changed event could be a side effect
of DPC. On those (older) ports, perform a best-effort device replacement
check by comparing the Vendor ID, Device ID and other data in Config Space
with the values cached in struct pci_dev. Use the existing helper
pciehp_device_replaced() to accomplish this. It is currently #ifdef'ed to
CONFIG_PM_SLEEP in pciehp_core.c, so move it to pciehp_hpc.c where most
other functions accessing config space reside.
Since 1c7f4fe86f17 ("powerpc/pci: Remove pcibios_setup_bus_devices()")
there's no architecture left setting pci_fixup_cardbus. Therefore remove
support from PCI core.
pcim_iounmap_regions() is deprecated. Moreover, it is not necessary to call
it in the driver's remove() function or if probe() fails, since it does
cleanup automatically on driver detach.
Remove all calls to pcim_iounmap_regions().
Signed-off-by: Philipp Stanner <phasta@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Jens Axboe <axboe@kernel.dk> Acked-by: Mark Brown <broonie@kernel.org> Link: https://patch.msgid.link/20250327110707.20025-3-phasta@kernel.org
of_node_to_fwnode() is irqdomain's reimplementation of the "officially"
defined of_fwnode_handle(). The former is in the process of being removed,
so use the latter instead.
tools/include: make uapi/linux/types.h usable from assembly
The "real" linux/types.h UAPI header gracefully degrades to a NOOP when
included from assembly code.
Mirror this behaviour in the tools/ variant.
Test for __ASSEMBLER__ over __ASSEMBLY__ as the former is provided by the
toolchain automatically.
Reported-by: Mark Brown <broonie@kernel.org> Closes: https://lore.kernel.org/lkml/af553c62-ca2f-4956-932c-dd6e3a126f58@sirena.org.uk/ Fixes: c9fbaa879508 ("selftests: vDSO: parse_vdso: Use UAPI headers instead of libc headers") Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Link: https://patch.msgid.link/20250321-uapi-consistency-v1-1-439070118dc0@linutronix.de Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge tag 'soundwire-6.15-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire
Pull soundwire fix from Vinod Koul:
- add missing config symbol CONFIG_SND_HDA_EXT_CORE required for asoc
driver CONFIG_SND_SOF_SOF_HDA_SDW_BPT
* tag 'soundwire-6.15-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire:
ASoC: SOF: Intel: Let SND_SOF_SOF_HDA_SDW_BPT select SND_HDA_EXT_CORE
Len Brown [Sun, 6 Apr 2025 18:49:20 +0000 (14:49 -0400)]
tools/power turbostat: v2025.05.06
Support up to 8192 processors
Add cpuidle governor debug telemetry, disabled by default
Update default output to exclude cpuidle invocation counts
Bug fixes
Merge tag 'perf-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf event fix from Ingo Molnar:
"Fix a perf events time accounting bug"
* tag 'perf-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Fix child_total_time_enabled accounting bug at task exit
Merge tag 'sched-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
- Fix a nonsensical Kconfig combination
- Remove an unnecessary rseq-notification
* tag 'sched-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rseq: Eliminate useless task_work on execve
sched/isolation: Make CONFIG_CPU_ISOLATION depend on CONFIG_SMP
... and don't error out so hard on missing module descriptions.
Before commit 6c6c1fc09de3 ("modpost: require a MODULE_DESCRIPTION()")
we used to warn about missing module descriptions, but only when
building with extra warnigns (ie 'W=1').
After that commit the warning became an unconditional hard error.
And it turns out not all modules have been converted despite the claims
to the contrary. As reported by Damian Tometzki, the slub KUnit test
didn't have a module description, and apparently nobody ever really
noticed.
The reason nobody noticed seems to be that the slub KUnit tests get
disabled by SLUB_TINY, which also ends up disabling a lot of other code,
both in tests and in slub itself. And so anybody doing full build tests
didn't actually see this failre.
So let's disable SLUB_TINY for build-only tests, since it clearly ends
up limiting build coverage. Also turn the missing module descriptions
error back into a warning, but let's keep it around for non-'W=1'
builds.
Len Brown [Sun, 6 Apr 2025 15:18:39 +0000 (11:18 -0400)]
tools/power turbostat: report CoreThr per measurement interval
The CoreThr column displays total thermal throttling events
since boot time.
Change it to report events during the measurement interval.
This is more useful for showing a user the current conditions.
Total events since boot time are still available to the user via
/sys/devices/system/cpu/cpu*/thermal_throttle/*
Document CoreThr on turbostat.8
Fixes: eae97e053fe30 ("turbostat: Support thermal throttle count print") Reported-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Cc: Chen Yu <yu.c.chen@intel.com>
Justin Ernst [Wed, 19 Mar 2025 20:27:31 +0000 (15:27 -0500)]
tools/power turbostat: Increase CPU_SUBSET_MAXCPUS to 8192
On systems with >= 1024 cpus (in my case 1152), turbostat fails with the error output:
"turbostat: /sys/fs/cgroup/cpuset.cpus.effective: cpu str malformat 0-1151"
A similar error appears with the use of turbostat --cpu when the inputted cpu
range contains a cpu number >= 1024:
# turbostat -c 1100-1151
"--cpu 1100-1151" malformed
...
Both errors are caused by parse_cpu_str() reaching its limit of CPU_SUBSET_MAXCPUS.
It's a good idea to limit the maximum cpu number being parsed, but 1024 is too low.
For a small increase in compute and allocated memory, increasing CPU_SUBSET_MAXCPUS
brings support for parsing cpu numbers >= 1024.
Increase CPU_SUBSET_MAXCPUS to 8192, a common setting for CONFIG_NR_CPUS on x86_64.
Signed-off-by: Justin Ernst <justin.ernst@hpe.com> Signed-off-by: Len Brown <len.brown@intel.com>
Merge tag 'timers-cleanups-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer cleanups from Thomas Gleixner:
"A set of final cleanups for the timer subsystem:
- Convert all del_timer[_sync]() instances over to the new
timer_delete[_sync]() API and remove the legacy wrappers.
Conversion was done with coccinelle plus some manual fixups as
coccinelle chokes on scoped_guard().
- The final cleanup of the hrtimer_init() to hrtimer_setup()
conversion.
This has been delayed to the end of the merge window, so that all
patches which have been merged through other trees are in mainline
and all new users are catched.
Doing this right before rc1 ensures that new code which is merged post
rc1 is not introducing new instances of the original functionality"
* tag 'timers-cleanups-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tracing/timers: Rename the hrtimer_init event to hrtimer_setup
hrtimers: Rename debug_init_on_stack() to debug_setup_on_stack()
hrtimers: Rename debug_init() to debug_setup()
hrtimers: Rename __hrtimer_init_sleeper() to __hrtimer_setup_sleeper()
hrtimers: Remove unnecessary NULL check in hrtimer_start_range_ns()
hrtimers: Make callback function pointer private
hrtimers: Merge __hrtimer_init() into __hrtimer_setup()
hrtimers: Switch to use __htimer_setup()
hrtimers: Delete hrtimer_init()
treewide: Convert new and leftover hrtimer_init() users
treewide: Switch/rename to timer_delete[_sync]()
Merge tag 'irq-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull more irq updates from Thomas Gleixner:
"A set of updates for the interrupt subsystem:
- A treewide cleanup for the irq_domain code, which makes the naming
consistent and gets rid of the original oddity of naming domains
'host'.
This is a trivial mechanical change and is done late to ensure that
all instances have been catched and new code merged post rc1 wont
reintroduce new instances.
- A trivial consistency fix in the migration code
The recent introduction of irq_force_complete_move() in the core
code, causes a problem for the nostalgia crowd who maintains ia64
out of tree.
The code assumes that hierarchical interrupt domains are enabled
and dereferences irq_data::parent_data unconditionally. That works
in mainline because both architectures which enable that code have
hierarchical domains enabled. Though it breaks the ia64 build,
which enables the functionality, but does not have hierarchical
domains.
While it's not really a problem for mainline today, this
unconditional dereference is inconsistent and trivially fixable by
using the existing helper function irqd_get_parent_data(), which
has the appropriate #ifdeffery in place"
* tag 'irq-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq/migration: Use irqd_get_parent_data() in irq_force_complete_move()
irqdomain: Stop using 'host' for domain
irqdomain: Rename irq_get_default_host() to irq_get_default_domain()
irqdomain: Rename irq_set_default_host() to irq_set_default_domain()
Merge tag 'timers-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
"A revert to fix a adjtimex() regression:
The recent change to prevent that time goes backwards for the coarse
time getters due to immediate multiplier adjustments via adjtimex(),
changed the way how the timekeeping core treats that.
That change result in a regression on the adjtimex() side, which is
user space visible:
1) The forwarding of the base time moves the update out of the
original period and establishes a new one. That's changing the
behaviour of the [PF]LL control, which user space expects to be
applied periodically.
2) The clearing of the accumulated NTP error due to #1, changes the
behaviour as well.
An attempt to delay the multiplier/frequency update to the next tick
did not solve the problem as userspace expects that the multiplier or
frequency updates are in effect, when the syscall returns.
There is a different solution for the coarse time problem available,
so revert the offending commit to restore the existing adjtimex()
behaviour"
* tag 'timers-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "timekeeping: Fix possible inconsistencies in _COARSE clockids"
Merge tag 'sh-for-v6.15-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux
Pull sh updates from John Paul Adrian Glaubitz:
"One important fix and one small configuration update.
The first patch by Artur Rojek fixes an issue with the J2 firmware
loader not being able to find the location of the device tree blob due
to insufficient alignment of the .bss section which rendered J2 boards
unbootable.
The second patch by Johan Korsnes updates the defconfigs on sh to drop
the CONFIG_NET_CLS_TCINDEX configuration option which became obsolete
after 8c710f75256b ("net/sched: Retire tcindex classifier").
Summary:
- sh: defconfig: Drop obsolete CONFIG_NET_CLS_TCINDEX
- sh: Align .bss section padding to 8-byte boundary"
* tag 'sh-for-v6.15-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux:
sh: defconfig: Drop obsolete CONFIG_NET_CLS_TCINDEX
sh: Align .bss section padding to 8-byte boundary
Merge tag 'kbuild-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Improve performance in gendwarfksyms
- Remove deprecated EXTRA_*FLAGS and KBUILD_ENABLE_EXTRA_GCC_CHECKS
- Support CONFIG_HEADERS_INSTALL for ARCH=um
- Use more relative paths to sources files for better reproducibility
- Support the loong64 Debian architecture
- Add Kbuild bash completion
- Introduce intermediate vmlinux.unstripped for architectures that need
static relocations to be stripped from the final vmlinux
- Fix versioning in Debian packages for -rc releases
- Treat missing MODULE_DESCRIPTION() as an error
- Convert Nios2 Makefiles to use the generic rule for built-in DTB
- Add debuginfo support to the RPM package
* tag 'kbuild-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (40 commits)
kbuild: rpm-pkg: build a debuginfo RPM
kconfig: merge_config: use an empty file as initfile
nios2: migrate to the generic rule for built-in DTB
rust: kbuild: skip `--remap-path-prefix` for `rustdoc`
kbuild: pacman-pkg: hardcode module installation path
kbuild: deb-pkg: don't set KBUILD_BUILD_VERSION unconditionally
modpost: require a MODULE_DESCRIPTION()
kbuild: make all file references relative to source root
x86: drop unnecessary prefix map configuration
kbuild: deb-pkg: add comment about future removal of KDEB_COMPRESS
kbuild: Add a help message for "headers"
kbuild: deb-pkg: remove "version" variable in mkdebian
kbuild: deb-pkg: fix versioning for -rc releases
Documentation/kbuild: Fix indentation in modules.rst example
x86: Get rid of Makefile.postlink
kbuild: Create intermediate vmlinux build with relocations preserved
kbuild: Introduce Kconfig symbol for linking vmlinux with relocations
kbuild: link-vmlinux.sh: Make output file name configurable
kbuild: do not generate .tmp_vmlinux*.map when CONFIG_VMLINUX_MAP=y
Revert "kheaders: Ignore silly-rename files"
...
Merge tag 'drm-next-2025-04-05' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Weekly fixes, mostly from the end of last week, this week was very
quiet, maybe you scared everyone away. It's mostly amdgpu, and xe,
with some i915, adp and bridge bits, since I think this is overly
quiet I'd expect rc2 to be a bit more lively.
bridge:
- tda998x: Select CONFIG_DRM_KMS_HELPER
amdgpu:
- Guard against potential division by 0 in fan code
- Zero RPM support for SMU 14.0.2
- Properly handle SI and CIK support being disabled
- PSR fixes
- DML2 fixes
- DP Link training fix
- Vblank fixes
- RAS fixes
- Partitioning fix
- SDMA fix
- SMU 13.0.x fixes
- Rom fetching fix
- MES fixes
- Queue reset fix
xe:
- Fix NULL pointer dereference on error path
- Add missing HW workaround for BMG
- Fix survivability mode not triggering
- Fix build warning when DRM_FBDEV_EMULATION is not set
i915:
- Bounds check for scalers in DSC prefill latency computation
- Fix build by adding a missing include
adp:
- Fix error handling in plane setup"
# -----BEGIN PGP SIGNATURE-----
* tag 'drm-next-2025-04-05' of https://gitlab.freedesktop.org/drm/kernel: (34 commits)
drm/i2c: tda998x: select CONFIG_DRM_KMS_HELPER
drm/amdgpu/gfx12: fix num_mec
drm/amdgpu/gfx11: fix num_mec
drm/amd/pm: Add gpu_metrics_v1_8
drm/amdgpu: Prefer shadow rom when available
drm/amd/pm: Update smu metrics table for smu_v13_0_6
drm/amd/pm: Remove host limit metrics support
Remove unnecessary firmware version check for gc v9_4_2
drm/amdgpu: stop unmapping MQD for kernel queues v3
Revert "drm/amdgpu/sdma_v4_4_2: update VM flush implementation for SDMA"
drm/amdgpu: Parse all deferred errors with UMC aca handle
drm/amdgpu: Update ta ras block
drm/amdgpu: Add NPS2 to DPX compatible mode
drm/amdgpu: Use correct gfx deferred error count
drm/amd/display: Actually do immediate vblank disable
drm/amd/display: prevent hang on link training fail
Revert "drm/amd/display: dml2 soc dscclk use DPM table clk setting"
drm/amd/display: Increase vblank offdelay for PSR panels
drm/amd: Handle being compiled without SI or CIK support better
drm/amd/pm: Add zero RPM enabled OD setting support for SMU14.0.2
...
Uday Shankar [Mon, 31 Mar 2025 22:46:32 +0000 (16:46 -0600)]
kbuild: rpm-pkg: build a debuginfo RPM
The rpm-pkg make target currently suffers from a few issues related to
debuginfo:
1. debuginfo for things built into the kernel (vmlinux) is not available
in any RPM produced by make rpm-pkg. This makes using tools like
systemtap against a make rpm-pkg kernel impossible.
2. debug source for the kernel is not available. This means that
commands like 'disas /s' in gdb, which display source intermixed with
assembly, can only print file names/line numbers which then must be
painstakingly resolved to actual source in a separate editor.
3. debuginfo for modules is available, but it remains bundled with the
.ko files that contain module code, in the main kernel RPM. This is a
waste of space for users who do not need to debug the kernel (i.e.
most users).
Address all of these issues by additionally building a debuginfo RPM
when the kernel configuration allows for it, in line with standard
patterns followed by RPM distributors. With these changes:
1. systemtap now works (when these changes are backported to 6.11, since
systemtap lags a bit behind in compatibility), as verified by the
following simple test script:
129 op_str = blk_op_name[op];
130
131 return op_str;
132 }
0xffffffff814c8768 <+40>: jmp 0xffffffff81d01360 <__x86_return_thunk>
End of assembler dump.
3. The size of the main kernel package goes down substantially,
especially if many modules are built (quite typical). Here is a
comparison of installed size of the kernel package (configured with
allmodconfig, dwarf4 debuginfo, and module compression turned off)
before and after this patch:
# rpm -qi kernel-6.13* | grep -E '^(Version|Size)'
Version : 6.13.0postpatch+
Size : 1382874089
Version : 6.13.0prepatch+
Size : 17870795887
This is a ~92% size reduction.
Note that a debuginfo package can only be produced if the following
configs are set:
- CONFIG_DEBUG_INFO=y
- CONFIG_MODULE_COMPRESS=n
- CONFIG_DEBUG_INFO_SPLIT=n
The first of these is obvious - we can't produce debuginfo if the build
does not generate it. The second two requirements can in principle be
removed, but doing so is difficult with the current approach, which uses
a generic rpmbuild script find-debuginfo.sh that processes all packaged
executables. If we want to remove those requirements the best path
forward is likely to add some debuginfo extraction/installation logic to
the modules_install target (controllable by flags). That way, it's
easier to operate on modules before they're compressed, and the logic
can be reused by all packaging targets.
Daniel Gomez [Fri, 28 Mar 2025 14:28:37 +0000 (14:28 +0000)]
kconfig: merge_config: use an empty file as initfile
The scripts/kconfig/merge_config.sh script requires an existing
$INITFILE (or the $1 argument) as a base file for merging Kconfig
fragments. However, an empty $INITFILE can serve as an initial starting
point, later referenced by the KCONFIG_ALLCONFIG Makefile variable
if -m is not used. This variable can point to any configuration file
containing preset config symbols (the merged output) as stated in
Documentation/kbuild/kconfig.rst. When -m is used $INITFILE will
contain just the merge output requiring the user to run make (i.e.
KCONFIG_ALLCONFIG=<$INITFILE> make <allnoconfig/alldefconfig> or make
olddefconfig).
Instead of failing when `$INITFILE` is missing, create an empty file and
use it as the starting point for merges.
Signed-off-by: Daniel Gomez <da.gomez@samsung.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Masahiro Yamada [Sun, 22 Dec 2024 00:30:53 +0000 (09:30 +0900)]
nios2: migrate to the generic rule for built-in DTB
Commit 654102df2ac2 ("kbuild: add generic support for built-in boot
DTBs") introduced generic support for built-in DTBs.
Select GENERIC_BUILTIN_DTB when built-in DTB support is enabled.
To keep consistency across architectures, this commit also renames
CONFIG_NIOS2_DTB_SOURCE_BOOL to CONFIG_BUILTIN_DTB, and
CONFIG_NIOS2_DTB_SOURCE to CONFIG_BUILTIN_DTB_NAME.
Johan Korsnes [Sun, 23 Mar 2025 19:13:30 +0000 (20:13 +0100)]
sh: defconfig: Drop obsolete CONFIG_NET_CLS_TCINDEX
This option was removed from Kconfig in 8c710f75256b ("net/sched:
Retire tcindex classifier") but from the defconfigs.
Fixes: 8c710f75256b ("net/sched: Retire tcindex classifier") Signed-off-by: Johan Korsnes <johan.korsnes@gmail.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Artur Rojek [Sun, 16 Feb 2025 17:55:44 +0000 (18:55 +0100)]
sh: Align .bss section padding to 8-byte boundary
J2-based devices expect to find a device tree blob at the end of the
.bss section. As of a77725a9a3c5 ("scripts/dtc: Update to upstream
version v1.6.1-19-g0a3a9d3449c8"), libfdt enforces 8-byte alignment
for the DTB, causing J2 devices to fail early in sh_fdt_init().
As the J2 loader firmware calculates the DTB location based on the kernel
image .bss section size rather than the __bss_stop symbol offset, the
required alignment can't be enforced with BSS_SECTION(0, PAGE_SIZE, 8).
To fix this, inline a modified version of the above macro which grows
.bss by the required size. While this change affects all existing SH
boards, it should be benign on platforms which don't need this alignment.
Signed-off-by: Artur Rojek <contact@artur-rojek.eu> Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Tested-by: Rob Landley <rob@landley.net> Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Merge tag 'input-for-v6.15-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:
- a brand new driver for touchpads and touchbars in newer Apple devices
- support for Berlin-A series in goodix-berlin touchscreen driver
- improvements to matrix_keypad driver to better handle GPIOs toggling
- assorted small cleanups in other input drivers
* tag 'input-for-v6.15-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: goodix_berlin - add support for Berlin-A series
dt-bindings: input: goodix,gt9916: Document gt9897 compatible
dt-bindings: input: matrix_keypad - add wakeup-source property
dt-bindings: input: matrix_keypad - add missing property
Input: pm8941-pwrkey - fix dev_dbg() output in pm8941_pwrkey_irq()
Input: synaptics - hide unused smbus_pnp_ids[] array
Input: apple_z2 - fix potential confusion in Kconfig
Input: matrix_keypad - use fsleep for delays after activating columns
Input: matrix_keypad - add settle time after enabling all columns
dt-bindings: input: matrix_keypad: add settle time after enabling all columns
dt-bindings: input: matrix_keypad: convert to YAML
dt-bindings: input: Correct indentation and style in DTS example
MAINTAINERS: Add entries for Apple Z2 touchscreen driver
Input: apple_z2 - add a driver for Apple Z2 touchscreens
dt-bindings: input: touchscreen: Add Z2 controller
Input: Switch to use hrtimer_setup()
Input: drop vb2_ops_wait_prepare/finish
Nam Cao [Wed, 5 Feb 2025 10:55:20 +0000 (11:55 +0100)]
hrtimers: Rename debug_init_on_stack() to debug_setup_on_stack()
All the hrtimer_init*() functions have been renamed to hrtimer_setup*().
Rename debug_init_on_stack() to debug_setup_on_stack() as well, to keep the
names consistent.
Nam Cao [Wed, 5 Feb 2025 10:55:18 +0000 (11:55 +0100)]
hrtimers: Rename __hrtimer_init_sleeper() to __hrtimer_setup_sleeper()
All the hrtimer_init*() functions have been renamed to hrtimer_setup*().
Rename __hrtimer_init_sleeper() to __hrtimer_setup_sleeper() as well, to
keep the names consistent.
Nam Cao [Wed, 5 Feb 2025 10:55:17 +0000 (11:55 +0100)]
hrtimers: Remove unnecessary NULL check in hrtimer_start_range_ns()
The struct hrtimer::function field can only be changed using
hrtimer_setup*() or hrtimer_update_function(), and both already null-check
'function'. Therefore, null-checking 'function' in hrtimer_start_range_ns()
is not necessary.
Nam Cao [Wed, 5 Feb 2025 10:55:16 +0000 (11:55 +0100)]
hrtimers: Make callback function pointer private
Make the struct hrtimer::function field private, to prevent users from
changing this field in an unsafe way. hrtimer_update_function() should be
used if the callback function needs to be changed.
Nam Cao [Wed, 5 Feb 2025 10:55:11 +0000 (11:55 +0100)]
hrtimers: Switch to use __htimer_setup()
__hrtimer_init_sleeper() calls __hrtimer_init() and also sets up the
callback function. But there is already __hrtimer_setup() which does both
actions.
Switch to use __hrtimer_setup() to simplify the code.
Merge tag 'v6.15-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"This fixes a race condition in the newly added eip93 driver"
* tag 'v6.15-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: inside-secure/eip93 - acquire lock on eip93_put_descriptor hash
Merge tag 's390-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Vasily Gorbik:
- Fix machine check handler _CIF_MCCK_GUEST bit setting by adding the
missing base register for relocated lowcore address
- Fix build failure on older linkers by conditionally adding the
-no-pie linker option only when it is supported
- Fix inaccurate kernel messages in vfio-ap by providing descriptive
error notifications for AP queue sharing violations
- Fix PCI isolation logic by ensuring non-VF devices correctly return
false in zpci_bus_is_isolated_vf()
- Fix PCI DMA range map setup by using dma_direct_set_offset() to add a
proper sentinel element, preventing potential overruns and
translation errors
- Cleanup header dependency problems with asm-offsets.c
- Add fault info for unexpected low-address protection faults in user
mode
- Add support for HOTPLUG_SMT, replacing the arch-specific "nosmt"
handling with common code handling
- Use bitop functions to implement CPU flag helper functions to ensure
that bits cannot get lost if modified in different contexts on a CPU
- Remove unused machine_flags for the lowcore
* tag 's390-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/vfio-ap: Fix no AP queue sharing allowed message written to kernel log
s390/pci: Fix dev.dma_range_map missing sentinel element
s390/mm: Dump fault info in case of low address protection fault
s390/smp: Add support for HOTPLUG_SMT
s390: Fix linker error when -no-pie option is unavailable
s390/processor: Use bitop functions for cpu flag helper functions
s390/asm-offsets: Remove ASM_OFFSETS_C
s390/asm-offsets: Include ftrace_regs.h instead of ftrace.h
s390/kvm: Split kvm_host header file
s390/pci: Fix zpci_bus_is_isolated_vf() for non-VFs
s390/lowcore: Remove unused machine_flags
s390/entry: Fix setting _CIF_MCCK_GUEST with lowcore relocation
Merge tag '6.15-rc-part2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull more smb client updates from Steve French:
- reconnect fixes: three for updating rsize/wsize and an SMB1 reconnect
fix
- RFC1001 fixes: fixing connections to nonstandard ports, and negprot
retries
- fix mfsymlinks to old servers
- make mapping of open flags for SMB1 more accurate
- permission fixes: adding retry on open for write, and one for stat to
workaround unexpected access denied
- add two new xattrs, one for retrieving SACL and one for retrieving
owner (without having to retrieve the whole ACL)
- fix mount parm validation for echo_interval
- minor cleanup (including removing now unneeded cifs_truncate_page)
* tag '6.15-rc-part2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: update internal version number
cifs: Implement is_network_name_deleted for SMB1
cifs: Remove cifs_truncate_page() as it should be superfluous
cifs: Do not add FILE_READ_ATTRIBUTES when using GENERIC_READ/EXECUTE/ALL
cifs: Improve SMB2+ stat() to work also without FILE_READ_ATTRIBUTES
cifs: Add fallback for SMB2 CREATE without FILE_READ_ATTRIBUTES
cifs: Fix querying and creating MF symlinks over SMB1
cifs: Fix access_flags_to_smbopen_mode
cifs: Fix negotiate retry functionality
cifs: Improve handling of NetBIOS packets
cifs: Allow to disable or force initialization of NetBIOS session
cifs: Add a new xattr system.smb3_ntsd_owner for getting or setting owner
cifs: Add a new xattr system.smb3_ntsd_sacl for getting or setting SACLs
smb: client: Update IO sizes after reconnection
smb: client: Store original IO parameters and prevent zero IO sizes
smb:client: smb: client: Add reverse mapping from tcon to superblocks
cifs: remove unreachable code in cifs_get_tcp_session()
cifs: fix integer overflow in match_server()