Zhao Mengmeng [Fri, 27 Mar 2026 06:17:57 +0000 (14:17 +0800)]
scx_central: Defer timer start to central dispatch to fix init error
scx_central currently assumes that ops.init() runs on the selected
central CPU and aborts otherwise. This is no longer true, as ops.init()
is invoked from the scx_enable_helper thread, which can run on any
CPU.
As a result, sched_setaffinity() from userspace doesn't work, causing
scx_central to fail when loading with:
Fix this by:
- Defer bpf_timer_start() to the first dispatch on the central CPU.
- Initialize the BPF timer in central_init() and kick the central CPU
to guarantee entering the dispatch path on the central CPU immediately.
- Remove the unnecessary sched_setaffinity() call in userspace.
Yeoreum Yun [Sat, 14 Mar 2026 17:51:31 +0000 (17:51 +0000)]
arm64: armv8_deprecated: Disable swp emulation when FEAT_LSUI present
The purpose of supporting LSUI is to eliminate PAN toggling. CPUs that
support LSUI are unlikely to support a 32-bit runtime. Rather than
emulating the SWP instruction using LSUI instructions in order to remove
PAN toggling, simply disable SWP emulation.
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
[catalin.marinas@arm.com: some tweaks to the in-code comment] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
dax/hmem, cxl: Defer and resolve Soft Reserved ownership
The current probe time ownership check for Soft Reserved memory based
solely on CXL window intersection is insufficient. dax_hmem probing is not
always guaranteed to run after CXL enumeration and region assembly, which
can lead to incorrect ownership decisions before the CXL stack has
finished publishing windows and assembling committed regions.
Introduce deferred ownership handling for Soft Reserved ranges that
intersect CXL windows. When such a range is encountered during the
initial dax_hmem probe, schedule deferred work to wait for the CXL stack
to complete enumeration and region assembly before deciding ownership.
Once the deferred work runs, evaluate each Soft Reserved range
individually: if a CXL region fully contains the range, skip it and let
dax_cxl bind. Otherwise, register it with dax_hmem. This per-range
ownership model avoids the need for CXL region teardown and
alloc_dax_region() resource exclusion prevents double claiming.
Introduce a boolean flag dax_hmem_initial_probe to live inside device.c
so it survives module reload. Ensure dax_cxl defers driver registration
until dax_hmem has completed ownership resolution. dax_cxl calls
dax_hmem_flush_work() before cxl_driver_register(), which both waits for
the deferred work to complete and creates a module symbol dependency that
forces dax_hmem.ko to load before dax_cxl.
Co-developed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://patch.msgid.link/20260322195343.206900-9-Smita.KoralahalliChannabasappa@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
cxl/region: Add helper to check Soft Reserved containment by CXL regions
Add a helper to determine whether a given Soft Reserved memory range is
fully contained within the committed CXL region.
This helper provides a primitive for policy decisions in subsequent
patches such as co-ordination with dax_hmem to determine whether CXL has
fully claimed ownership of Soft Reserved memory ranges.
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://patch.msgid.link/20260322195343.206900-8-Smita.KoralahalliChannabasappa@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
dax: Track all dax_region allocations under a global resource tree
Introduce a global "DAX Regions" resource root and register each
dax_region->res under it via request_resource(). Release the resource on
dax_region teardown.
By enforcing a single global namespace for dax_region allocations, this
ensures only one of dax_hmem or dax_cxl can successfully register a
dax_region for a given range.
Suggested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://patch.msgid.link/20260322195343.206900-7-Smita.KoralahalliChannabasappa@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Dan Williams [Sun, 22 Mar 2026 19:53:38 +0000 (19:53 +0000)]
dax/cxl, hmem: Initialize hmem early and defer dax_cxl binding
Move hmem/ earlier in the dax Makefile so that hmem_init() runs before
dax_cxl.
In addition, defer registration of the dax_cxl driver to a workqueue
instead of using module_cxl_driver(). This ensures that dax_hmem has
an opportunity to initialize and register its deferred callback and make
ownership decisions before dax_cxl begins probing and claiming Soft
Reserved ranges.
Mark the dax_cxl driver as PROBE_PREFER_ASYNCHRONOUS so its probe runs
out of line from other synchronous probing avoiding ordering
dependencies while coordinating ownership decisions with dax_hmem.
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Tested-by: Tomasz Wolski <tomasz.wolski@fujitsu.com> Link: https://patch.msgid.link/20260322195343.206900-6-Smita.KoralahalliChannabasappa@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Dan Williams [Sun, 22 Mar 2026 19:53:37 +0000 (19:53 +0000)]
dax/hmem: Gate Soft Reserved deferral on DEV_DAX_CXL
Replace IS_ENABLED(CONFIG_CXL_REGION) with IS_ENABLED(CONFIG_DEV_DAX_CXL)
so that HMEM only defers Soft Reserved ranges when CXL DAX support is
enabled. This makes the coordination between HMEM and the CXL stack more
precise and prevents deferral in unrelated CXL configurations.
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Link: https://patch.msgid.link/20260322195343.206900-5-Smita.KoralahalliChannabasappa@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Dan Williams [Sun, 22 Mar 2026 19:53:36 +0000 (19:53 +0000)]
dax/hmem: Request cxl_acpi and cxl_pci before walking Soft Reserved ranges
Ensure cxl_acpi has published CXL Window resources before HMEM walks Soft
Reserved ranges.
Replace MODULE_SOFTDEP("pre: cxl_acpi") with an explicit, synchronous
request_module("cxl_acpi"). MODULE_SOFTDEP() only guarantees eventual
loading, it does not enforce that the dependency has finished init
before the current module runs. This can cause HMEM to start before
cxl_acpi has populated the resource tree, breaking detection of overlaps
between Soft Reserved and CXL Windows.
Also, request cxl_pci before HMEM walks Soft Reserved ranges. Unlike
cxl_acpi, cxl_pci attach is asynchronous and creates dependent devices
that trigger further module loads. Asynchronous probe flushing
(wait_for_device_probe()) is added later in the series in a deferred
context before HMEM makes ownership decisions for Soft Reserved ranges.
Add an additional explicit Kconfig ordering so that CXL_ACPI and CXL_PCI
must be initialized before DEV_DAX_HMEM. This prevents HMEM from consuming
Soft Reserved ranges before CXL drivers have had a chance to claim them.
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Tested-by: Tomasz Wolski <tomasz.wolski@fujitsu.com> Link: https://patch.msgid.link/20260322195343.206900-4-Smita.KoralahalliChannabasappa@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
dax/hmem: Factor HMEM registration into __hmem_register_device()
Separate the CXL overlap check from the HMEM registration path and keep
the platform-device setup in a dedicated __hmem_register_device().
This makes hmem_register_device() the policy entry point for deciding
whether a range should be deferred to CXL, while __hmem_register_device()
handles the HMEM registration flow.
No functional changes.
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://patch.msgid.link/20260322195343.206900-3-Smita.KoralahalliChannabasappa@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
dax/bus: Use dax_region_put() in alloc_dax_region() error path
alloc_dax_region() calls kref_init() on the dax_region early in the
function, but the error path for sysfs_create_groups() failure uses
kfree() directly to free the dax_region. This bypasses the kref lifecycle.
Use dax_region_put() instead to handle kref lifecycle correctly.
Suggested-by: Jonathan Cameron <jonathan.cameron@huawei.com> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://patch.msgid.link/20260322195343.206900-2-Smita.KoralahalliChannabasappa@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Youssef Samir [Thu, 5 Feb 2026 12:34:14 +0000 (13:34 +0100)]
accel/qaic: Handle DBC deactivation if the owner went away
When a DBC is released, the device sends a QAIC_TRANS_DEACTIVATE_FROM_DEV
transaction to the host over the QAIC_CONTROL MHI channel. QAIC handles
this by calling decode_deactivate() to release the resources allocated for
that DBC. Since that handling is done in the qaic_manage_ioctl() context,
if the user goes away before receiving and handling the deactivation, the
host will be out-of-sync with the DBCs available for use, and the DBC
resources will not be freed unless the device is removed. If another user
loads and requests to activate a network, then the device assigns the same
DBC to that network, QAIC will "indefinitely" wait for dbc->in_use = false,
leading the user process to hang.
As a solution to this, handle QAIC_TRANS_DEACTIVATE_FROM_DEV transactions
that are received after the user has gone away.
Fixes: 129776ac2e38 ("accel/qaic: Add control path") Signed-off-by: Youssef Samir <youssef.abdulrahman@oss.qualcomm.com> Reviewed-by: Lizhi Hou <lizhi.hou@amd.com> Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Signed-off-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Link: https://patch.msgid.link/20260205123415.3870898-1-youssef.abdulrahman@oss.qualcomm.com
John Ogness [Thu, 26 Mar 2026 13:38:02 +0000 (14:44 +0106)]
printk_ringbuffer: Add sanity check for 0-size data
get_data() has a sanity check for regular data blocks to ensure at
least space for the ID exists. But a regular block should also have
at least 1 byte of data (otherwise it would be data-less instead of
regular).
Expand the get_data() block size sanity check to additionally expect
at least 1 byte of data.
Commit cc3bad11de6e ("printk_ringbuffer: Fix check of valid data
size when blk_lpos overflows") added sanity checking to get_data()
to avoid returning data of illegal sizes (too large or too small).
It uses the helper function data_check_size() for the check.
However, data_check_size() expects the size of the data, not the
size of the data block. get_data() is providing the size of the
data block. This means that if the data size (text_buf_size) is
at or near the maximum legal size:
data_check_size() will report failure because it adds
sizeof(prb_data_block) to the provided size. The sanity check in
get_data() is counting the data block header twice. The result is
that the reader fails to read the legal record.
Since get_data() subtracts the data block header size before returning,
move the sanity check to after the subtraction.
Luckily printk() is not vulnerable to this problem because
truncate_msg() limits printk-messages to 1/4 of the ringbuffer.
Indeed, by adjusting the printk_ringbuffer KUnit test, which does not
use printk() and its truncate_msg() check, it is easy to see that the
reader fails and the WARN_ON is triggered.
Fixes: cc3bad11de6e ("printk_ringbuffer: Fix check of valid data size when blk_lpos overflows") Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Tested-by: Petr Mladek <pmladek@suse.com> Link: https://patch.msgid.link/20260326133809.8045-1-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
====================
bpf: classify block device hooks and add selftests
A bunch of new hooks for managing block devices were added a while ago
but they weren't appropriately classified. Classify them and add a test
program so we catch regressions.
Note that for whatever reason building the bpf selftests locally seems
to fail for all kinds of arcane reasons for me. That might just be my
fault. I added a pr against the ci to have the selftests run but to test
this meaningfully it needs veritysetup and dmverity support. I'm not
sure if that's available already.
Signed-off-by: Christian Brauner <brauner@kernel.org>
---
Changes in v2:
- No changes.
- Link to v1: https://patch.msgid.link/20260220-work-bpf-bdev-v1-0-c53e852c4702@kernel.org
A bunch of new hooks for managing block devices were added a while ago
but they weren't actually appropriately classified.
* bpf_lsm_bdev_alloc() is called when the inode for the block
device is allocated. This happens from a sleepable context so mark the
function as sleepable. When this function is called the memory for the
block device storage embedded into the inode is zeroed. That block
device cannot be meaningfully reference or interacted with at this
point. So mark it as untrusted for now.
* bpf_lsm_bdev_free() is called when the inode for the block
device is freed. A bunch of memory associated with the block device
has already been freed and there's dangling pointers in there. So mark
it as untrusted. It cannot be meaningfully referenced or interacted
with anymore. It is also called from sb->s_op->free_inode:: which
means it runs in rcu context (most of the times). So leave it as
non-sleepable.
* bpf_lsm_bdev_setintegrity() is called when a dm-verity device
is instantiated (glossing over details for simplicity of the commit
message). The block device is very much alive so it remains a trusted
hook. It's also called with device mapper's suspend lock held and so
the hook is able to sleep so mark it sleepable.
Jan Kara [Thu, 26 Mar 2026 14:06:32 +0000 (15:06 +0100)]
udf: Fix race between file type conversion and writeback
udf_setsize() can race with udf_writepages() as follows:
udf_setsize() udf_writepages()
if (iinfo->i_alloc_type ==
ICBTAG_FLAG_AD_IN_ICB)
err = udf_expand_file_adinicb(inode);
err = udf_extend_file(inode, newsize);
udf_adinicb_writepages()
memcpy_from_file_folio() - crash
because inode size is too big.
Fix the problem by checking the file type under folio lock in
udf_handle_page_wb() handler called from __mpage_writepages() which
properly serializes with udf_expand_file_adinicb().
Jan Kara [Thu, 26 Mar 2026 14:06:31 +0000 (15:06 +0100)]
mpage: Provide variant of mpage_writepages() with own optional folio handler
Some filesystems need to treat some folios specially (for example for
inodes with inline data). Doing the handling in their .writepages method
in a race-free manner results in duplicating some of the writeback
internals. So provide generalized version of mpage_writepages() that
allows filesystem to provide a handler called for each folio which can
handle the folio in a special way.
When vNTB is used as a PCI endpoint function, the NTB device is backed
by a virtual PCI function. For DMA API allocations and mappings, NTB
clients must use the device that is associated with the IOMMU domain.
Implement ntb_dev_ops->get_dma_dev() for pci-epf-vntb and return the EPC
parent device.
Suggested-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Koichiro Den <den@valinux.co.jp> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://patch.msgid.link/20260306031443.1911860-4-den@valinux.co.jp
Koichiro Den [Fri, 6 Mar 2026 03:14:42 +0000 (12:14 +0900)]
NTB: ntb_transport: Use ntb_get_dma_dev() for DMA buffers
ntb_transport currently uses ndev->pdev->dev for coherent allocations
and frees.
Switch the coherent buffer allocation/free paths to use
ntb_get_dma_dev(), so ntb_transport can work with NTB implementations
where the NTB PCI function is not the right device to use for DMA
mappings.
Suggested-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Koichiro Den <den@valinux.co.jp> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://patch.msgid.link/20260306031443.1911860-3-den@valinux.co.jp
Koichiro Den [Fri, 6 Mar 2026 03:14:41 +0000 (12:14 +0900)]
NTB: core: Add .get_dma_dev() callback to ntb_dev_ops
Some NTB implementations are backed by a PCI function that is not the right
struct device to use with DMA API helpers (e.g. due to IOMMU topology, or
because the NTB device is virtual).
Add an optional .get_dma_dev() callback to struct ntb_dev_ops and provide a
helper, ntb_get_dma_dev(), so NTB clients can use the appropriate struct
device for DMA allocations and mappings.
If the callback is not implemented, ntb_get_dma_dev() returns the current
default (ntb->dev.parent). Drivers that implement .get_dma_dev() must
return a non-NULL device.
Suggested-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Koichiro Den <den@valinux.co.jp> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
[bhelgaas: format doc] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://patch.msgid.link/20260306031443.1911860-2-den@valinux.co.jp
Jens Axboe [Fri, 27 Mar 2026 15:51:17 +0000 (09:51 -0600)]
Merge tag 'nvme-7.1-2026-03-27' of git://git.infradead.org/nvme into for-7.1/block
Pull NVMe updates from Keith:
"- Fabrics authentication updates (Eric, Alistar)
- Enanced block queue limits support (Caleb)
- Workqueue usage updates (Marco)
- A new write zeroes device quirk (Robert)
- Tagset cleanup fix for loop device (Nilay)"
* tag 'nvme-7.1-2026-03-27' of git://git.infradead.org/nvme: (41 commits)
nvme-loop: do not cancel I/O and admin tagset during ctrl reset/shutdown
nvme: add WQ_PERCPU to alloc_workqueue users
nvmet-fc: add WQ_PERCPU to alloc_workqueue users
nvmet: replace use of system_wq with system_percpu_wq
nvme-auth: Don't propose NVME_AUTH_DHGROUP_NULL with SC_C
nvme: Add the DHCHAP maximum HD IDs
nvme-pci: add NVME_QUIRK_DISABLE_WRITE_ZEROES for Kingston OM3SGP4
nvme: respect NVME_QUIRK_DISABLE_WRITE_ZEROES when wzsl is set
nvmet: report NPDGL and NPDAL
nvmet: use NVME_NS_FEAT_OPTPERF_SHIFT
nvme: set discard_granularity from NPDG/NPDA
nvme: add from0based() helper
nvme: always issue I/O Command Set specific Identify Namespace
nvme: update nvme_id_ns OPTPERF constants
nvme: fold nvme_config_discard() into nvme_update_disk_info()
nvme: add preferred I/O size fields to struct nvme_id_ns_nvm
nvme: Allow reauth from sysfs
nvme: Expose the tls_configured sysfs for secure concat connections
nvmet-tcp: Don't free SQ on authentication success
nvmet-tcp: Don't error if TLS is enabed on a reset
...
Sohil Mehta [Wed, 25 Mar 2026 23:01:49 +0000 (16:01 -0700)]
x86/fred: Remove kernel log message when initializing exceptions
When FRED is enabled, its initialization message is printed for every CPU
during boot as well as during suspend-resume. This debug message can be noisy
and it isn't very useful unless someone is debugging FRED itself.
As FRED is enabled by default, remove the log message as mentioned in
the code comment.
James Morse [Fri, 13 Mar 2026 14:46:16 +0000 (14:46 +0000)]
arm_mpam: Quirk CMN-650's CSU NRDY behaviour
CMN-650 is afflicted with an erratum where the CSU NRDY bit never clears.
This tells us the monitor never finishes scanning the cache. The erratum
document says to wait the maximum time, then ignore the field.
Add a flag to indicate whether this is the final attempt to read the
counter, and when this quirk is applied, ignore the NRDY field.
This means accesses to this counter will always retry, even if the counter
was previously programmed to the same values.
The counter value is not expected to be stable, it drifts up and down with
each allocation and eviction. The CSU register provides the value for a
point in time.
Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Co-developed-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com>
The registers MSMON_MBWU_L and MSMON_MBWU return the number of requests
rather than the number of bytes transferred.
Bandwidth resource monitoring is performed at the last level cache, where
each request arrive in 64Byte granularity. The current implementation
returns the number of transactions received at the last level cache but
does not provide the value in bytes. Scaling by 64 gives an accurate byte
count to match the MPAM specification for the MSMON_MBWU and MSMON_MBWU_L
registers. This patch fixes the issue by reporting the actual number of
bytes instead of the number of transactions from __ris_msmon_read().
Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com>
In the T241 implementation of memory-bandwidth partitioning, in the absence
of contention for bandwidth, the minimum bandwidth setting can affect the
amount of achieved bandwidth. Specifically, the achieved bandwidth in the
absence of contention can settle to any value between the values of
MPAMCFG_MBW_MIN and MPAMCFG_MBW_MAX. Also, if MPAMCFG_MBW_MIN is set
zero (below 0.78125%), once a core enters a throttled state, it will never
leave that state.
The first issue is not a concern if the MPAM software allows to program
MPAMCFG_MBW_MIN through the sysfs interface. This patch ensures program
MBW_MIN=1 (0.78125%) whenever MPAMCFG_MBW_MIN=0 is programmed.
In the scenario where the resctrl doesn't support the MBW_MIN interface via
sysfs, to achieve bandwidth closer to MBW_MAX in the absence of contention,
software should configure a relatively narrow gap between MBW_MIN and
MBW_MAX. The recommendation is to use a 5% gap to mitigate the problem.
Clear the feature MBW_MIN feature from the class to ensure we don't
accidentally change behaviour when resctrl adds support for a MBW_MIN
interface.
Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com>
The MPAM bandwidth partitioning controls will not be correctly configured,
and hardware will retain default configuration register values, meaning
generally that bandwidth will remain unprovisioned.
To address the issue, follow the below steps after updating the MBW_MIN
and/or MBW_MAX registers.
- Perform 64b reads from all 12 bridge MPAM shadow registers at offsets
(0x360048 + slice*0x10000 + partid*8). These registers are read-only.
- Continue iterating until all 12 shadow register values match in a loop.
pr_warn_once if the values fail to match within the loop count 1000.
- Perform 64b writes with the value 0x0 to the two spare registers at
offsets 0x1b0000 and 0x1c0000.
In the hardware, writes to the MPAMCFG_MBW_MAX MPAMCFG_MBW_MIN registers
are transformed into broadcast writes to the 12 shadow registers. The
final two writes to the spare registers cause a final rank of downstream
micro-architectural MPAM registers to be updated from the shadow copies.
The intervening loop to read the 12 shadow registers helps avoid a race
condition where writes to the spare registers occur before all shadow
registers have been updated.
Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com>
The MPAM specification includes the MPAMF_IIDR, which serves to uniquely
identify the MSC implementation through a combination of implementer
details, product ID, variant, and revision. Certain hardware issues/errata
can be resolved using software workarounds.
Introduce a quirk framework to allow workarounds to be enabled based on the
MPAMF_IIDR value.
Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> Co-developed-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Co-developed-by: James Morse <james.morse@arm.com> Signed-off-by: James Morse <james.morse@arm.com>
Takashi Iwai [Fri, 27 Mar 2026 15:30:54 +0000 (16:30 +0100)]
ALSA: usb-audio: Extend max number of channels to 64
The current limitation of 16 as MAX_CHANNELS is rather historical at
the time of UAC1 definition. As there seem already devices with a
higher number of mixer channels, we should extend it too. As an ad
hoc update, let's raise it to 64 so that it can still fit in a single
long-long integer.
Takashi Iwai [Fri, 27 Mar 2026 15:30:53 +0000 (16:30 +0100)]
ALSA: usb-audio: Replace hard-coded number with MAX_CHANNELS
One place in mixer.c still used a hard-coded number 16 instead of
MAX_CHANNELS. Replace with it, so that we can extend the max number
of channels gracefully.
James Morse [Fri, 13 Mar 2026 14:46:09 +0000 (14:46 +0000)]
arm_mpam: resctrl: Add empty definitions for assorted resctrl functions
A few resctrl features and hooks need to be provided, but aren't needed or
supported on MPAM platforms.
resctrl has individual hooks to separately enable and disable the
closid/partid and rmid/pmg context switching code. For MPAM this is all the
same thing, as the value in struct task_struct is used to cache the value
that should be written to hardware. arm64's context switching code is
enabled once MPAM is usable, but doesn't touch the hardware unless the
value has changed.
For now event configuration is not supported, and can be turned off by
returning 'false' from resctrl_arch_is_evt_configurable().
The new io_alloc feature is not supported either, always return false from
the enable helper to indicate and fail the enable.
Add this, and empty definitions for the other hooks.
Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Co-developed-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com>
James Morse [Fri, 13 Mar 2026 14:46:08 +0000 (14:46 +0000)]
arm_mpam: resctrl: Update the rmid reallocation limit
resctrl's limbo code needs to be told when the data left in a cache is
small enough for the partid+pmg value to be re-allocated.
x86 uses the cache size divided by the number of rmid users the cache may
have. Do the same, but for the smallest cache, and with the number of
partid-and-pmg users.
Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Co-developed-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com>
James Morse [Fri, 13 Mar 2026 14:46:06 +0000 (14:46 +0000)]
arm_mpam: resctrl: Allow resctrl to allocate monitors
When resctrl wants to read a domain's 'QOS_L3_OCCUP', it needs to allocate
a monitor on the corresponding resource. Monitors are allocated by class
instead of component.
Add helpers to allocate a CSU monitor. These helper return an out of range
value for MBM counters.
Allocating a montitor context is expected to block until hardware resources
become available. This only makes sense for QOS_L3_OCCUP as unallocated MBM
counters are losing data.
Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Co-developed-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com>
James Morse [Fri, 13 Mar 2026 14:46:05 +0000 (14:46 +0000)]
arm_mpam: resctrl: Add support for csu counters
resctrl exposes a counter via a file named llc_occupancy. This isn't really
a counter as its value goes up and down, this is a snapshot of the cache
storage usage monitor.
Add some picking code which will only find an L3. The resctrl counter
file is called llc_occupancy but we don't check it is the last one as
it is already identified as L3.
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Co-developed-by: Dave Martin <dave.martin@arm.com> Signed-off-by: Dave Martin <dave.martin@arm.com> Co-developed-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com>
Ben Horgan [Fri, 13 Mar 2026 14:46:04 +0000 (14:46 +0000)]
arm_mpam: resctrl: Add monitor initialisation and domain boilerplate
Add the boilerplate that tells resctrl about the mpam monitors that are
available. resctrl expects all (non-telemetry) monitors to be on the L3 and
so advertise them there and invent an L3 resctrl resource if required. The
L3 cache itself has to exist as the cache ids are used as the domain
ids.
Bring the resctrl monitor domains online and offline based on the cpus
they contain.
Support for specific monitor types is left to later.
Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Signed-off-by: James Morse <james.morse@arm.com>
Dave Martin [Fri, 13 Mar 2026 14:46:03 +0000 (14:46 +0000)]
arm_mpam: resctrl: Add kunit test for control format conversions
resctrl specifies the format of the control schemes, and these don't match
the hardware.
Some of the conversions are a bit hairy - add some kunit tests.
Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
[morse: squashed enough of Dave's fixes in here that it's his patch now!] Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com>
James Morse [Fri, 13 Mar 2026 14:46:02 +0000 (14:46 +0000)]
arm_mpam: resctrl: Add support for 'MB' resource
resctrl supports 'MB', as a percentage throttling of traffic from the
L3. This is the control that mba_sc uses, so ideally the class chosen
should be as close as possible to the counters used for mbm_total. If there
is a single L3, it's the last cache, and the topology of the memory matches
then the traffic at the memory controller will be equivalent to that at
egress of the L3. If these conditions are met allow the memory class to
back MB.
MB's percentage control should be backed either with the fixed point
fraction MBW_MAX or bandwidth portion bitmaps. The bandwidth portion
bitmaps is not used as its tricky to pick which bits to use to avoid
contention, and may be possible to expose this as something other than a
percentage in the future.
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Co-developed-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Dave Martin <Dave.Martin@arm.com> Co-developed-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com>
Thierry Reding [Thu, 26 Mar 2026 11:28:31 +0000 (12:28 +0100)]
soc/tegra: bpmp: Use ENODEV instead of ENOTSUPP
ENOTSUPP is not a SUSV4 error code and checkpatch will warn about it.
It is also not very descriptive in the context of BPMP, so use the
ENODEV error code instead. For the stub implementations this is a more
accurate description of what the failure is.
Ben Horgan [Fri, 13 Mar 2026 14:46:01 +0000 (14:46 +0000)]
arm_mpam: resctrl: Wait for cacheinfo to be ready
In order to calculate the rmid realloc threshold the size of the cache
needs to be known. Cache domains will also be named after the cache id. So
that this information can be extracted from cacheinfo we need to wait for
it to be ready. The cacheinfo information is populated in device_initcall()
so we wait for that.
Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com>
Ben Horgan [Fri, 13 Mar 2026 14:46:00 +0000 (14:46 +0000)]
arm_mpam: resctrl: Add rmid index helpers
Because MPAM's pmg aren't identical to RDT's rmid, resctrl handles some
data structures by index. This allows x86 to map indexes to RMID, and MPAM
to map them to partid-and-pmg.
Add the helpers to do this.
Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Suggested-by: James Morse <james.morse@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com>
MPAM uses a fixed-point formats for some hardware controls. Resctrl
provides the bandwidth controls as a percentage. Add helpers to convert
between these.
Ensure bwa_wd is at most 16 to make it clear higher values have no meaning.
Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com>
When CDP is not enabled, the 'rmid_entry's in the limbo list,
rmid_busy_llc, map directly to a (PARTID,PMG) pair and when CDP is enabled
the mapping is to two different pairs. As the limbo list is reused between
mounts and CDP disabled on unmount this can lead to stale mapping and the
limbo handler will then make monitor reads with potentially out of range
PARTID. This may then cause an MPAM error interrupt and the driver will
disable MPAM.
No problems are expected if you just mount the resctrl file system
once with CDP enabled and never unmount it. Hide CDP emulation behind
CONFIG_EXPERT to protect the unwary.
Signed-off-by: Ben Horgan <ben.horgan@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: James Morse <james.morse@arm.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Signed-off-by: James Morse <james.morse@arm.com>
James Morse [Fri, 13 Mar 2026 14:45:57 +0000 (14:45 +0000)]
arm_mpam: resctrl: Add CDP emulation
Intel RDT's CDP feature allows the cache to use a different control value
depending on whether the accesses was for instruction fetch or a data
access. MPAM's equivalent feature is the other way up: the CPU assigns a
different partid label to traffic depending on whether it was instruction
fetch or a data access, which causes the cache to use a different control
value based solely on the partid.
MPAM can emulate CDP, with the side effect that the alternative partid is
seen by all MSC, it can't be enabled per-MSC.
Add the resctrl hooks to turn this on or off. Add the helpers that match a
closid against a task, which need to be aware that the value written to
hardware is not the same as the one resctrl is using.
Update the 'arm64_mpam_global_default' variable the arch code uses during
context switch to know when the per-cpu value should be used instead. Also,
update these per-cpu values and sync the resulting mpam partid/pmg
configuration to hardware.
resctrl can enable CDP for L2 caches, L3 caches or both. When it is enabled
by one and not the other MPAM globally enabled CDP but hides the effect
on the other cache resource. This hiding is possible as CPOR is the only
supported cache control and that uses a resource bitmap; two partids with
the same bitmap act as one.
Awkwardly, the MB controls don't implement CDP and CDP can't be hidden as
the memory bandwidth control is a maximum per partid which can't be
modelled with more partids. If the total maximum is used for both the data
and instruction partids then then the maximum may be exceeded and if it is
split in two then the one using more bandwidth will hit a lower
limit. Hence, hide the MB controls completely if CDP is enabled for any
resource.
Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Cc: Dave Martin <Dave.Martin@arm.com> Cc: Amit Singh Tomar <amitsinght@marvell.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Co-developed-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com>
James Morse [Fri, 13 Mar 2026 14:45:55 +0000 (14:45 +0000)]
arm_mpam: resctrl: Implement helpers to update configuration
resctrl has two helpers for updating the configuration.
resctrl_arch_update_one() updates a single value, and is used by the
software-controller to apply feedback to the bandwidth controls, it has to
be called on one of the CPUs in the resctrl:domain.
resctrl_arch_update_domains() copies multiple staged configurations, it can
be called from anywhere.
Both helpers should update any changes to the underlying hardware.
Implement resctrl_arch_update_domains() to use
resctrl_arch_update_one(). Neither need to be called on a specific CPU as
the mpam driver will send IPIs as needed.
Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Co-developed-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com>
Thierry Reding [Thu, 26 Mar 2026 13:58:53 +0000 (14:58 +0100)]
arm64: tegra: Add PCI controllers on Tegra264
A total of six PCIe controllers can be found on Tegra264. One of them is
used internally for the integrated GPU while the other five can go to a
variety of connectors like full PCIe slots or M.2.
James Morse [Fri, 13 Mar 2026 14:45:52 +0000 (14:45 +0000)]
arm_mpam: resctrl: Pick the caches we will use as resctrl resources
Systems with MPAM support may have a variety of control types at any point
of their system layout. We can only expose certain types of control, and
only if they exist at particular locations.
Start with the well-known caches. These have to be depth 2 or 3 and support
MPAM's cache portion bitmap controls, with a number of portions fewer than
resctrl's limit.
Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Co-developed-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com>
Jon Hunter [Thu, 5 Mar 2026 15:16:59 +0000 (15:16 +0000)]
arm64: tegra: Fix RTC aliases
The following warning is observed on the Tegra234 Jetson platforms ...
rtc-nvidia-vrs10 4-003c: /aliases ID 0 not available
This happens because the 'rtc@c2a0000' device is registered before the
vrs10 RTC and so is assigned the 'rtc0' alias. We want the vrs10 RTC to
be the default RTC because this RTC maintains time across power cycles.
Fix this by adding a 'rtc1' alias for the 'rtc@c2a0000' device.
Fixes: b1806f2b4e78 ("arm64: tegra: Add device-tree node for NVVRS RTC") Signed-off-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com>
James Morse [Fri, 13 Mar 2026 14:45:51 +0000 (14:45 +0000)]
arm_mpam: resctrl: Add boilerplate cpuhp and domain allocation
resctrl has its own data structures to describe its resources. We can't use
these directly as we play tricks with the 'MBA' resource, picking the MPAM
controls or monitors that best apply. We may export the same component as
both L3 and MBA.
Add mpam_resctrl_res[] as the array of class->resctrl mappings we are
exporting, and add the cpuhp hooks that allocated and free the resctrl
domain structures. Only the mpam control feature are considered here and
monitor support will be added later.
While we're here, plumb in a few other obvious things.
CONFIG_ARM_CPU_RESCTRL is used to allow this code to be built even though
it can't yet be linked against resctrl.
Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Co-developed-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com>
James Morse [Fri, 13 Mar 2026 14:45:50 +0000 (14:45 +0000)]
KVM: arm64: Force guest EL1 to use user-space's partid configuration
While we trap the guest's attempts to read/write the MPAM control
registers, the hardware continues to use them. Guest-EL0 uses KVM's
user-space's configuration, as the value is left in the register, and
guest-EL1 uses either the host kernel's configuration, or in the case of
VHE, the UNKNOWN reset value of MPAM1_EL1.
We want to force the guest-EL1 to use KVM's user-space's MPAM
configuration. On nVHE rely on MPAM0_EL1 and MPAM1_EL1 always being
programmed the same and on VHE copy MPAM0_EL1 into the guest's
MPAM1_EL1. There is no need to restore as this is out of context once TGE
is set.
Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Acked-by: Marc Zyngier <maz@kernel.org> Co-developed-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com>
James Morse [Fri, 13 Mar 2026 14:45:49 +0000 (14:45 +0000)]
arm64: mpam: Add helpers to change a task or cpu's MPAM PARTID/PMG values
Care must be taken when modifying the PARTID and PMG of a task in any
per-task structure as writing these values may race with the task being
scheduled in, and reading the modified values.
Add helpers to set the task properties, and the CPU default value. These
use WRITE_ONCE() that pairs with the READ_ONCE() in mpam_get_regval() to
avoid causing torn values.
Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Cc: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Co-developed-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com>
Ben Horgan [Fri, 13 Mar 2026 14:45:48 +0000 (14:45 +0000)]
arm64: mpam: Initialise and context switch the MPAMSM_EL1 register
The MPAMSM_EL1 sets the MPAM labels, PMG and PARTID, for loads and stores
generated by a shared SMCU. Disable the traps so the kernel can use it and
set it to the same configuration as the per-EL cpu MPAM configuration.
If an SMCU is not shared with other cpus then it is implementation
defined whether the configuration from MPAMSM_EL1 is used or that from
the appropriate MPAMy_ELx. As we set the same, PMG_D and PARTID_D,
configuration for MPAM0_EL1, MPAM1_EL1 and MPAMSM_EL1 the resulting
configuration is the same regardless.
The range of valid configurations for the PARTID and PMG in MPAMSM_EL1 is
not currently specified in Arm Architectural Reference Manual but the
architect has confirmed that it is intended to be the same as that for the
cpu configuration in the MPAMy_ELx registers.
Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: James Morse <james.morse@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com>
James Morse [Fri, 13 Mar 2026 14:45:47 +0000 (14:45 +0000)]
arm64: mpam: Add cpu_pm notifier to restore MPAM sysregs
The MPAM system registers will be lost if the CPU is reset during PSCI's
CPU_SUSPEND.
Add a PM notifier to restore them.
mpam_thread_switch(current) can't be used as this won't make any changes if
the in-memory copy says the register already has the correct value. In
reality the system register is UNKNOWN out of reset.
Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Co-developed-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com>
James Morse [Fri, 13 Mar 2026 14:45:46 +0000 (14:45 +0000)]
arm64: mpam: Advertise the CPUs MPAM limits to the driver
Requesters need to populate the MPAM fields for any traffic they send on
the interconnect. For the CPUs these values are taken from the
corresponding MPAMy_ELx register. Each requester may have a limit on the
largest PARTID or PMG value that can be used. The MPAM driver has to
determine the system-wide minimum supported PARTID and PMG values.
To do this, the driver needs to be told what each requestor's limit is.
CPUs are special, but this infrastructure is also needed for the SMMU and
GIC ITS. Call the helper to tell the MPAM driver what the CPUs can do.
The return value can be ignored by the arch code as it runs well before the
MPAM driver starts probing.
Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Co-developed-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
[ morse: requestor->requester as argued by ispell ] Signed-off-by: James Morse <james.morse@arm.com>
James Morse [Fri, 13 Mar 2026 14:45:44 +0000 (14:45 +0000)]
arm64: mpam: Re-initialise MPAM regs when CPU comes online
Now that the MPAM system registers are expected to have values that change,
reprogram them based on the previous value when a CPU is brought online.
Previously MPAM's 'default PARTID' of 0 was always used for MPAM in
kernel-space as this is the PARTID that hardware guarantees to
reset. Because there are a limited number of PARTID, this value is exposed
to user-space, meaning resctrl changes to the resctrl default group would
also affect kernel threads. Instead, use the task's PARTID value for
kernel work on behalf of user-space too. The default of 0 is kept for both
user-space and kernel-space when MPAM is not enabled.
Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Co-developed-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com>
James Morse [Fri, 13 Mar 2026 14:45:43 +0000 (14:45 +0000)]
arm64: mpam: Context switch the MPAM registers
MPAM allows traffic in the SoC to be labeled by the OS, these labels are
used to apply policy in caches and bandwidth regulators, and to monitor
traffic in the SoC. The label is made up of a PARTID and PMG value. The x86
equivalent calls these CLOSID and RMID, but they don't map precisely.
MPAM has two CPU system registers that is used to hold the PARTID and PMG
values that traffic generated at each exception level will use. These can
be set per-task by the resctrl file system. (resctrl is the defacto
interface for controlling this stuff).
Add a helper to switch this.
struct task_struct's separate CLOSID and RMID fields are insufficient to
implement resctrl using MPAM, as resctrl can change the PARTID (CLOSID) and
PMG (sort of like the RMID) separately. On x86, the rmid is an independent
number, so a race that writes a mismatched closid and rmid into hardware is
benign. On arm64, the pmg bits extend the partid.
(i.e. partid-5 has a pmg-0 that is not the same as partid-6's pmg-0). In
this case, mismatching the values will 'dirty' a pmg value that resctrl
believes is clean, and is not tracking with its 'limbo' code.
To avoid this, the partid and pmg are always read and written as a
pair. This requires a new u64 field. In struct task_struct there are two
u32, rmid and closid for the x86 case, but as we can't use them here do
something else. Add this new field, mpam_partid_pmg, to struct thread_info
to avoid adding more architecture specific code to struct task_struct.
Always use READ_ONCE()/WRITE_ONCE() when accessing this field.
Resctrl allows a per-cpu 'default' value to be set, this overrides the
values when scheduling a task in the default control-group, which has
PARTID 0. The way 'code data prioritisation' gets emulated means the
register value for the default group needs to be a variable.
The current system register value is kept in a per-cpu variable to avoid
writing to the system register if the value isn't going to change. Writes
to this register may reset the hardware state for regulating bandwidth.
Finally, there is no reason to context switch these registers unless there
is a driver changing the values in struct task_struct. Hide the whole thing
behind a static key. This also allows the driver to disable MPAM in
response to errors reported by hardware. Move the existing static key to
belong to the arch code, as in the future the MPAM driver may become a
loadable module.
All this should depend on whether there is an MPAM driver, hide it behind
CONFIG_ARM64_MPAM.
Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> CC: Amit Singh Tomar <amitsinght@marvell.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Co-developed-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com>
Ben Horgan [Fri, 13 Mar 2026 14:45:42 +0000 (14:45 +0000)]
KVM: arm64: Make MPAMSM_EL1 accesses UNDEF
The MPAMSM_EL1 register controls the MPAM labeling for an SMCU, Streaming
Mode Compute Unit. As there is no MPAM support in KVM, make sure MPAMSM_EL1
accesses trigger an UNDEF.
Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Acked-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com>
Ben Horgan [Fri, 13 Mar 2026 14:45:41 +0000 (14:45 +0000)]
KVM: arm64: Preserve host MPAM configuration when changing traps
When KVM enables or disables MPAM traps to EL2 it clears all other bits in
MPAM2_EL2. Notably, it clears the partition ids (PARTIDs) and performance
monitoring groups (PMGs). Avoid changing these bits in anticipation of
adding support for MPAM in the kernel. Otherwise, on a VHE system with the
host running at EL2 where MPAM2_EL2 and MPAM1_EL1 access the same register,
any attempt to use MPAM to monitor or partition resources for kernel space
would be foiled by running a KVM guest. Additionally, MPAM2_EL2.EnMPAMSM is
always set to 0 which causes MPAMSM_EL1 to always trap. Keep EnMPAMSM set
to 1 when not in a guest so that the kernel can use MPAMSM_EL1.
Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Acked-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com>
Ben Horgan [Fri, 13 Mar 2026 14:45:39 +0000 (14:45 +0000)]
arm_mpam: Reset when feature configuration bit unset
To indicate that the configuration, of the controls used by resctrl, in a
RIS need resetting to driver defaults the reset flags in mpam_config are
set. However, these flags are only ever set temporarily at RIS scope in
mpam_reset_ris() and hence mpam_cpu_online() will never reset these
controls to default. As the hardware reset is unknown this leads to unknown
configuration when the control values haven't been configured away from the
defaults.
Use the policy that an unset feature configuration bit means reset. In this
way the mpam_config in the component can encode that it should be in reset
state and mpam_reprogram_msc() will reset controls as needed.
Fixes: 09b89d2a72f3 ("arm_mpam: Allow configuration to be applied and restored during cpu online") Signed-off-by: Ben Horgan <ben.horgan@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: James Morse <james.morse@arm.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com>
[ morse: Removed unused reset flags from config structure ] Signed-off-by: James Morse <james.morse@arm.com>
Thierry Reding [Thu, 26 Mar 2026 13:58:50 +0000 (14:58 +0100)]
dt-bindings: pci: Document the NVIDIA Tegra264 PCIe controller
The six PCIe controllers found on Tegra264 are of two types: one is used
for the internal GPU and therefore is not connected to a UPHY and the
remaining five controllers are typically routed to a PCI slot and have
additional controls for the physical link.
While these controllers can be switched into endpoint mode, this binding
describes the root complex mode only.
Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Thierry Reding <treding@nvidia.com>
Thierry Reding [Mon, 23 Feb 2026 14:33:03 +0000 (15:33 +0100)]
dt-bindings: memory: tegra210: Mark EMC as cooling device
The external memory controller found on Tegra210 can use throttling of
the EMC frequency in order to reduce the memory chip temperature. Mark
the memory controller as a cooling device to take advantage of this
functionality.
Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Thierry Reding <treding@nvidia.com>
Zeng Heng [Fri, 13 Mar 2026 14:45:38 +0000 (14:45 +0000)]
arm_mpam: Ensure in_reset_state is false after applying configuration
The per-RIS flag, in_reset_state, indicates whether or not the MSC
registers are in reset state, and allows avoiding resetting when they are
already in reset state. However, when mpam_apply_config() updates the
configuration it doesn't update the in_reset_state flag and so even after
the configuration update in_reset_state can be true and mpam_reset_ris()
will skip the actual register restoration on subsequent resets.
Once resctrl has a MPAM backend it will use resctrl_arch_reset_all_ctrls()
to reset the MSC configuration on unmount and, if the in_reset_state flag
is bogusly true, fail to reset the MSC configuration. The resulting
non-reset MSC configuration can lead to persistent performance restrictions
even after resctrl is unmounted.
Fix by clearing in_reset_state to false immediately after successful
configuration application, ensuring that the next reset operation
properly restores MSC register defaults.
Fixes: 09b89d2a72f3 ("arm_mpam: Allow configuration to be applied and restored during cpu online") Signed-off-by: Zeng Heng <zengheng4@huawei.com> Acked-by: Ben Horgan <ben.horgan@arm.com>
[Horgan: rewrite commit message to not be specific to resctrl unmount] Signed-off-by: Ben Horgan <ben.horgan@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: James Morse <james.morse@arm.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Signed-off-by: James Morse <james.morse@arm.com>
Thierry Reding [Thu, 26 Mar 2026 13:58:49 +0000 (14:58 +0100)]
firmware: tegra: bpmp: Add tegra_bpmp_get_with_id() function
Some device tree bindings need to specify a parameter along with a BPMP
phandle reference to designate the ID associated with a given controller
that needs to interoperate with BPMP. Typically this is specified as an
extra cell in the nvidia,bpmp property, so add a helper to parse this ID
while resolving the phandle reference.
Ilpo Järvinen [Tue, 24 Mar 2026 16:56:33 +0000 (18:56 +0200)]
PCI: Fix alignment calculation for resource size larger than align
The commit bc75c8e50711 ("PCI: Rewrite bridge window head alignment
function") did not use if (r_size <= align) check from pbus_size_mem() for
the new head alignment bookkeeping structure (aligns2[]). In some
configurations, this can result in producing a gap into the bridge window
which the resource larger than its alignment cannot fill.
The old alignment calculation algorithm was removed by the subsequent
commit 3958bf16e2fe ("PCI: Stop over-estimating bridge window size") which
renamed the aligns2[] array leaving only aligns[] array.
Add the if (r_size <= align) check back to avoid this problem.
Ilpo Järvinen [Tue, 24 Mar 2026 16:56:32 +0000 (18:56 +0200)]
PCI: Align head space better
When a bridge window contains big and small resource(s), the small
resource(s) may not amount to the half of the size of the big resource
which would allow calculate_head_align() to shrink the head alignment.
This results in always placing the small resource(s) after the big
resource.
In general, it would be good to be able to place the small resource(s)
before the big resource to achieve better utilization of the address space.
In the cases where the large resource can only fit at the end of the
window, it is even required.
However, carrying the information over from pbus_size_mem() and
calculate_head_align() to __pci_assign_resource() and
pcibios_align_resource() is not easy with the current data structures.
A somewhat hacky way to move the non-aligning tail part to the head is
possible within pcibios_align_resource(). The free space between the start
of the free space span and the aligned start address can be compared with
the non-aligning remainder of the size. If the free space is larger than
the remainder, placing the remainder before the start address is possible.
This relocation should generally work, because PCI resources consist only
power-of-2 atoms.
Various arch requirements may still need to override the relocation, so the
relocation is only applied selectively in such cases.
Ilpo Järvinen [Tue, 24 Mar 2026 16:56:31 +0000 (18:56 +0200)]
PCI: Rename window_alignment() to pci_min_window_alignment()
window_alignment() lacks prefix. Rename it to pci_min_window_alignment() in
order to include the prefix and also add min to indicate the returned
window alignment is the minimum PCI spec and arch allows.
Also make it available in drivers/pci/pci.h as upcoming changes will need
to call it from outside of setup-bus.c.
Ilpo Järvinen [Tue, 24 Mar 2026 16:56:30 +0000 (18:56 +0200)]
parisc/PCI: Clean up align handling
Caller of pcibios_align_resource() (__find_resource_space()) already aligns
the start address by 'alignment' so aligning is only necessary if align >
alignment.
Change also to use ALIGN() instead of open-coding.
Ilpo Järvinen [Tue, 24 Mar 2026 16:56:29 +0000 (18:56 +0200)]
MIPS: PCI: Remove unnecessary second application of align
Aligning res->start by align inside pcibios_align_resource() is unnecessary
because caller of pcibios_align_resource() is __find_resource_space() that
aligns res->start with align before calling pcibios_align_resource().
Aligning by align in case of IORESOURCE_IO && start & 0x300 cannot ever
result in changing start either because 0x300 bits would have not survived
the earlier alignment if align was large enough to have an impact.
Thus, remove the duplicated aligning from pcibios_align_resource().
Ilpo Järvinen [Tue, 24 Mar 2026 16:56:28 +0000 (18:56 +0200)]
m68k/PCI: Remove unnecessary second application of align
Aligning res->start by align inside pcibios_align_resource() is unnecessary
because caller of pcibios_align_resource() is __find_resource_space() that
aligns res->start with align before calling pcibios_align_resource().
Aligning by align in case of IORESOURCE_IO && start & 0x300 cannot ever
result in changing start either because 0x300 bits would have not survived
the earlier alignment if align was large enough to have an impact.
Thus, remove the duplicated aligning from pcibios_align_resource().
Ilpo Järvinen [Tue, 24 Mar 2026 16:56:27 +0000 (18:56 +0200)]
ARM/PCI: Remove unnecessary second application of align
Aligning res->start by align inside pcibios_align_resource() is unnecessary
because caller of pcibios_align_resource() is __find_resource_space() that
aligns res->start with align before calling pcibios_align_resource().
Aligning by align in case of IORESOURCE_IO && start & 0x300 cannot ever
result in changing start either because 0x300 bits would have not survived
the earlier alignment if align was large enough to have an impact.
Thus, remove the duplicated aligning from pcibios_align_resource().
Ilpo Järvinen [Tue, 24 Mar 2026 16:56:25 +0000 (18:56 +0200)]
resource: Pass full extent of empty space to resource_alignf callback
__find_resource_space() calculates the full extent of empty space but only
passes the aligned space to resource_alignf callback. In some situations,
the callback may choose take advantage of the free space before the
requested alignment.
Pass the full extent of the calculated empty space to resource_alignf
callback as an additional parameter.
Svyatoslav Ryhel [Mon, 26 Jan 2026 19:15:32 +0000 (21:15 +0200)]
ARM: tegra: Add ACTMON node to Tegra114 device tree
Add support for ACTMON on Tegra114. This is used to monitor activity from
different components. Based on the collected statistics, the rate at which
the external memory needs to be clocked can be derived.
H. Peter Anvin [Wed, 25 Mar 2026 23:01:47 +0000 (16:01 -0700)]
x86/fred: Enable FRED by default
When FRED was added to the mainline kernel, it was set up as an explicit
opt-in due to the risk of regressions before hardware was available publicly.
Now, Panther Lake (Core Ultra 300 series) has been released, and benchmarking
by Phoronix has shown that it provides a significant performance benefit on
most workloads:
Svyatoslav Ryhel [Mon, 26 Jan 2026 10:10:17 +0000 (12:10 +0200)]
ARM: tegra: lg-x3: Add node for capacitive buttons
Both smartphones have capacitive buttons but only P895 supports RMI4
function 1A (0D touch), while P880 exposes buttons area as a region of the
touchscreen.