Merge patch series "scsi: ufs: Add TX Equalization support for UFS 5.0"
Can Guo <can.guo@oss.qualcomm.com> says:
Hi,
The UFS v5.0 and UFSHCI v5.0 standards have published, introducing support
for HS-G6 (46.6 Gbps per lane) through the new UniPro V3.0 interconnect
layer and M-PHY V6.0 physical layer specifications. To achieve reliable
operation at these higher speeds, UniPro V3.0 introduces TX Equalization
and Pre-Coding mechanisms that are essential for signal integrity.
This patch series implements TX Equalization support in the UFS core
driver as specified in UFSHCI v5.0, along with the necessary vendor
operations and a reference implementation for Qualcomm UFS host
controllers.
Background
==========
TX Equalization is a signal conditioning technique that compensates for
channel impairments at high data rates (HS-G4 through HS-G6). It works
by adjusting two key parameters:
- PreShoot: Pre-emphasis applied before the main signal transition
- DeEmphasis: De-emphasis applied after the main signal transition
UniPro V3.0 defines TX Equalization Training (EQTR) procedure to
automatically discover optimal TX Equalization settings. The EQTR
procedure:
1. Starts from the most reliable link state (HS-G1)
2. Iterates through all possible PreShoot and DeEmphasis combinations
3. Evaluates signal quality using Figure of Merit (FOM) measurements
4. Selects the best settings for both host and device TX lanes
For HS-G6, Pre-Coding is also introduced to further improve signal
quality. Pre-Coding must be enabled on both transmitter and receiver
when the RX_FOM indicates it is required.
Implementation Overview
=======================
The implementation follows the UFSHCI v5.0 specification and consists of:
Core Infrastructure (Patches 1-6):
- New vops callback negotiate_pwr_mode() to allow vendors to negotiate
power mode parameters before applying TX Equalization settings
- Support for HS-G6 gear enumeration
- Complete TX EQTR procedure implementation in ufs-txeq.c
- Debugfs interface for TX Equalization parameter inspection and manual
retraining
- Module parameters for adaptive TX Equalization control
Qualcomm Implementation (Patches 7-11):
- PHY-specific configurations for TX EQTR procedure
- Vendor-specific FOM measurement support
- TX Equalization settings application
- Enable TX Equalization for HW version 0x7 and onwards
The implementation is designed to be vendor-agnostic, with platform-
specific details handled through the vops callbacks. Other vendors can
add support by implementing the three new vops:
- tx_eqtr_notify(): Called before/after TX EQTR for vendor setup
- apply_tx_eqtr_settings(): Apply vendor-specific PHY configurations
- get_rx_fom(): Retrieve vendor-specific FOM measurements if needed
Module Parameters
=================
The implementation provides several module parameters for flexibility:
- use_adaptive_txeq: Enable/disable adaptive TX Equalization (default: false)
- adaptive_txeq_gear: Minimum gear for adaptive TX EQ (default: HS-G6)
- use_txeq_presets: Use only the 8 standaird presets (default: false)
- txeq_presets_selected[]: Select specific presets for EQTR
Testing
=======
This patch series has been tested on Qualcomm platforms with UFS 5.0
devices, validating:
- Successful TX EQTR completion for HS-G6
- Proper FOM evaluation and optimal settings selection
- Pre-Coding enablement for HS-G6
- Power mode changes with TX Equalization settings applied
- Report of TX Equalization settings via debugfs entries
- Report of TX EQTR histories via debug entries (see next section)
- Re-training TX Equalization via debugfs entry
Example of TX EQTR history
==========================
Device TX EQTR record summary -
Target Power Mode: HS-G6, Rate-B
Most recent record index: 2
Most recent record timestamp: 219573378 us
TX Lane 0 FOM - PreShoot\DeEmphasis
\ 0 1 2 3 4 5 6 7
0 50 70 65 - - - - x
1 x x x x x x x x
2 100 90 70 - - - - x
3 x x x x x x x x
4 95 90 - - - - - x
5 - - - - - - - x
6 x x x x x x x x
7 x x x x x x x x
TX Lane 1 FOM - PreShoot\DeEmphasis
\ 0 1 2 3 4 5 6 7
0 50 70 60 - - - - x
1 x x x x x x x x
2 100 80 65 - - - - x
3 x x x x x x x x
4 95 85 - - - - - x
5 - - - - - - - x
6 x x x x x x x x
7 x x x x x x x x
Patch Structure
===============
Patches 1-3: Preparatory changes for power mode negotiation and HS-G6
Patch 4: Core TX Equalization and EQTR implementation
Patches 5-7: Debugfs support for TX Equalization
Patches 8-12: Qualcomm vendor implementation
Next
====
One more series has been developed to enhance TX Equalization support,
which will be submitted for review after this series is accepted:
- Provide board specific (static) TX Equalization settings from DTS
- Parse static TX Equalization settings from DTS if provided
- Apply static TX Equalization settings if use_adaptive_txeq is disabled
- Add support for UFS v5.0 attributes qTxEQGnSettings & wTxEQGnSettingsExt
- Enable persistent storage and retrieval of optimal TX Equalization settings
Can Guo [Wed, 25 Mar 2026 15:21:54 +0000 (08:21 -0700)]
scsi: ufs: ufs-qcom: Enable TX Equalization
Enable TX Equalization for hosts with HW version 0x7 and onwards.
Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Can Guo <can.guo@oss.qualcomm.com> Reviewed-by: Peter Wang <peter.wang@mediatek.com> Link: https://patch.msgid.link/20260325152154.1604082-13-can.guo@oss.qualcomm.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
On some platforms, when Host Software triggers TX Equalization Training, HW
does not take TX EQTR settings programmed in PA_TxEQTRSetting, instead HW
takes TX EQTR settings from PA_TxEQG1Setting. Implement vops
apply_tx_eqtr_setting() to work around it by programming TX EQTR settings
to PA_TxEQG1Setting during TX EQTR procedure.
Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Can Guo <can.guo@oss.qualcomm.com> Reviewed-by: Peter Wang <peter.wang@mediatek.com> Link: https://patch.msgid.link/20260325152154.1604082-12-can.guo@oss.qualcomm.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Can Guo [Wed, 25 Mar 2026 15:21:52 +0000 (08:21 -0700)]
scsi: ufs: ufs-qcom: Implement vops get_rx_fom()
On some platforms, host's M-PHY RX_FOM Attribute always reads 0, meaning SW
cannot rely on Figure of Merit (FOM) to identify the optimal TX
Equalization settings for device's TX Lanes. Implement the vops
ufs_qcom_get_rx_fom() such that SW can utilize the UFS Eye Opening Monitor
(EOM) to evaluate the TX Equalization settings for device's TX Lanes.
Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Can Guo <can.guo@oss.qualcomm.com> Reviewed-by: Peter Wang <peter.wang@mediatek.com> Link: https://patch.msgid.link/20260325152154.1604082-11-can.guo@oss.qualcomm.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
On some platforms, HW does not support triggering TX EQTR from the most
reliable High-Speed (HS) Gear (HS Gear1), but only allows to trigger TX
EQTR for the target HS Gear from the same HS Gear. To work around the HW
limitation, implement vops tx_eqtr_notify() to change Power Mode to the
target TX EQTR HS Gear prior to TX EQTR procedure and change Power Mode
back to HS Gear1 (the most reliable gear) post TX EQTR procedure.
Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Can Guo <can.guo@oss.qualcomm.com> Reviewed-by: Peter Wang <peter.wang@mediatek.com> Link: https://patch.msgid.link/20260325152154.1604082-10-can.guo@oss.qualcomm.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If HS-G6 Power Mode change handshake is successful and outbound data Lanes
are expected to transmit ADAPT, M-TX Lanes shall be configured as
if (Adapt Type == REFRESH)
TX_HS_ADAPT_LENGTH_L0_L1_L2_L3 = PA_PeerRxHsG6AdaptRefreshL0L1L2L3.
else if (Adapt Type == INITIAL)
TX_HS_ADAPT_LENGTH_L0_L1_L2_L3 = PA_PeerRxHsG6AdaptInitialL0L1L2L3.
On some platforms, the ADAPT_L0_L1_L2_L3 duration on Host TX Lanes is only
a half of theoretical ADAPT_L0_L1_L2_L3 duration TADAPT_L0_L1_L2_L3 (in
PAM-4 UI) calculated from TX_HS_ADAPT_LENGTH_L0_L1_L2_L3.
For such platforms, the workaround is to double the ADAPT_L0_L1_L2_L3
duration by uplifting TX_HS_ADAPT_LENGTH_L0_L1_L2_L3. UniPro initializes
TX_HS_ADAPT_LENGTH_L0_L1_L2_L3 during HS-G6 Power Mode change handshake, it
would be too late for SW to update TX_HS_ADAPT_LENGTH_L0_L1_L2_L3 post
HS-G6 Power Mode change. Update PA_PeerRxHsG6AdaptRefreshL0L1L2L3 and
PA_PeerRxHsG6AdaptInitialL0L1L2L3 post Link Startup and before HS-G6 Power
Mode change, so that the UniPro would use the updated value during HS-G6
Power Mode change handshake.
Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Can Guo <can.guo@oss.qualcomm.com> Reviewed-by: Peter Wang <peter.wang@mediatek.com> Link: https://patch.msgid.link/20260325152154.1604082-9-can.guo@oss.qualcomm.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Can Guo [Wed, 25 Mar 2026 15:21:49 +0000 (08:21 -0700)]
scsi: ufs: core: Add support to retrain TX Equalization via debugfs
Drastic environmental changes, such as significant temperature shifts, can
impact link signal integrity. In such cases, retraining TX Equalization is
necessary to compensate for these environmental changes.
Add a debugfs entry, 'tx_eq_ctrl', to allow userspace to manually trigger
the TX Equalization training (EQTR) procedure and apply the identified
optimal settings on the fly. These entries are created on a per-gear basis
for High Speed Gear 4 (HS-G4) and above, as TX EQTR is not supported for
lower gears.
The 'tx_eq_ctrl' entry currently accepts the 'retrain' command to initiate
the procedure. The interface is designed to be scalable to support
additional commands in the future.
Reading the 'tx_eq_ctrl' entry provides a usage hint to the user, ensuring
the interface is self-documenting.
The ufshcd's debugfs folder structure will look like below:
Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Can Guo <can.guo@oss.qualcomm.com> Reviewed-by: Peter Wang <peter.wang@mediatek.com> Link: https://patch.msgid.link/20260325152154.1604082-8-can.guo@oss.qualcomm.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Can Guo [Wed, 25 Mar 2026 15:21:48 +0000 (08:21 -0700)]
scsi: ufs: core: Add helpers to pause and resume command processing
In preparation for supporting TX Equalization refreshing, introduce helper
functions to safely pause and resume command processing.
ufshcd_pause_command_processing() ensures the host is in a quiescent state
by stopping the block layer tagset, acquiring the necessary locks
(scan_mutex and clk_scaling_lock), and waiting for any in-flight commands
to complete within a specified timeout.
ufshcd_resume_command_processing() restores the host to its previous
operational state by reversing these steps in the correct order.
Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Can Guo <can.guo@oss.qualcomm.com> Reviewed-by: Peter Wang <peter.wang@mediatek.com> Link: https://patch.msgid.link/20260325152154.1604082-7-can.guo@oss.qualcomm.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Can Guo [Wed, 25 Mar 2026 15:21:47 +0000 (08:21 -0700)]
scsi: ufs: core: Add debugfs entries for TX Equalization params
Add debugfs support for UFS TX Equalization and UFS TX Equalization
Training (EQTR) to facilitate runtime inspection of link quality. These
entries allow developers to monitor and optimize TX Equalization parameters
and EQTR records during live operation.
The debugfs entries are organized on a per-gear basis under the HBA's
debugfs root. Since TX EQTR is only defined for High Speed Gear 4 (HS-G4)
and above, EQTR-related entries are explicitly excluded for HS-G1 through
HS-G3 to avoid exposing unsupported attributes.
The ufshcd's debugfs folder structure will look like below:
Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Can Guo <can.guo@oss.qualcomm.com> Reviewed-by: Peter Wang <peter.wang@mediatek.com> Link: https://patch.msgid.link/20260325152154.1604082-6-can.guo@oss.qualcomm.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Can Guo [Wed, 25 Mar 2026 15:21:46 +0000 (08:21 -0700)]
scsi: ufs: core: Add support for TX Equalization
MIPI Unipro3.0 introduced PA_TxEQGnSetting and PA_PreCodeEn attributes for
TX Equalization and Pre-Coding. It is Host Software's responsibility to
configure these attributes for both host and device before initiating Power
Mode Change to High-Speed Gears.
MIPI Unipro3.0 also introduced TX Equalization Training (EQTR) to identify
optimal TX Equalization settings for use by both Host's and Device's
UniPro. TX EQTR shall be initiated from the most reliable High-Speed Gear
(HS-G1) targeting High-Speed Gears (HS-G4 to HS-G6).
Implement TX Equalization configuration and TX EQTR procedure as defined
in UFSHCI v5.0 specification. The TX EQTR procedure determines the optimal
TX Equalization settings by iterating through all possible PreShoot and
DeEmphasis combinations and selecting the best combinations for both Host
and Device based on Figure of Merit (FOM) evaluation.
Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Can Guo <can.guo@oss.qualcomm.com> Reviewed-by: Peter Wang <peter.wang@mediatek.com> Link: https://patch.msgid.link/20260325152154.1604082-5-can.guo@oss.qualcomm.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Can Guo [Wed, 25 Mar 2026 15:21:45 +0000 (08:21 -0700)]
scsi: ufs: core: Add UFS_HS_G6 and UFS_HS_GEAR_MAX to enum ufs_hs_gear_tag
Add UFS_HS_G6 to enum ufs_hs_gear_tag. In addition, add UFS_HS_GEAR_MAX to
enum ufs_hs_gear_tag to facilitate iteration over valid High Speed Gears.
Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Can Guo <can.guo@oss.qualcomm.com> Reviewed-by: Peter Wang <peter.wang@mediatek.com> Link: https://patch.msgid.link/20260325152154.1604082-4-can.guo@oss.qualcomm.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Can Guo [Wed, 25 Mar 2026 15:21:44 +0000 (08:21 -0700)]
scsi: ufs: core: Pass force_pmc to ufshcd_config_pwr_mode() as a parameter
Currently, callers must manually toggle hba->force_pmc before and after
calling ufshcd_config_pwr_mode() to force a Power Mode change. Introduce
enum ufshcd_pmc_policy and refactor ufshcd_config_pwr_mode() to accept
pmc_policy as a parameter to force a Power Mode change.
Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Can Guo <can.guo@oss.qualcomm.com> Reviewed-by: Peter Wang <peter.wang@mediatek.com> Link: https://patch.msgid.link/20260325152154.1604082-3-can.guo@oss.qualcomm.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Can Guo [Wed, 25 Mar 2026 15:21:43 +0000 (08:21 -0700)]
scsi: ufs: core: Introduce a new ufshcd vops negotiate_pwr_mode()
Most vendor specific implemenations of vops pwr_change_notify(PRE_CHANGE)
are fulfilling two things at once:
- Vendor specific target power mode negotiation
- Vendor specific power mode change preparation
When TX Equalization is added into consideration, before power mode change
to a target power mode, TX Equalization Training (EQTR) needs be done for
that target power mode. In addition, UFSHCI spec requires to start TX EQTR
from HS-G1 (the most reliable High Speed Gear).
Adding TX EQTR before pwr_change_notify(PRE_CHANGE) is not applicable
because we don't know the negotiated power mode yet.
Adding TX EQTR post pwr_change_notify(PRE_CHANGE) is inappropriate because
pwr_change_notify(PRE_CHANGE) has finished preparation for a power mode
change to negotiated power mode, yet we are changing power mode to HS-G1
for TX EQTR.
Add a new vops negotiate_pwr_mode() so that vendor specific power mode
negotiation can be fulfilled in its vendor specific implementations. Later
on, TX EQTR can be added post vops negotiate_pwr_mode() and before vops
pwr_change_notify(PRE_CHANGE).
Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Can Guo <can.guo@oss.qualcomm.com> Reviewed-by: Peter Wang <peter.wang@mediatek.com> Link: https://patch.msgid.link/20260325152154.1604082-2-can.guo@oss.qualcomm.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Keith Busch [Wed, 25 Mar 2026 19:36:08 +0000 (12:36 -0700)]
dm: provide helper to set stacked limits
There are multiple device mappers that set up their stacking limits
exactly the same for the logical, physical and minimum IO queue limits.
Provide a helper for it.
Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Keith Busch [Wed, 25 Mar 2026 19:36:07 +0000 (12:36 -0700)]
dm-integrity: always set the io hints
Don't depend on the defaults to be what is desired if the integrity
device was set up with 512b sector size. Always set the queue limits to
be at least what the device mapper wants.
Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Keith Busch [Wed, 25 Mar 2026 19:36:06 +0000 (12:36 -0700)]
dm-integrity: fix mismatched queue limits
A user can integritysetup a device with a backing device using a 4k
logical block size, but request the dm device use 1k or 2k. This
mismatch creates an inconsistency such that the dm device would report
limits for IO that it can't actually execute. Fix this by using the
backing device's limits if they are larger.
Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Zilin Guan [Fri, 27 Mar 2026 08:47:42 +0000 (16:47 +0800)]
hfsplus: extract hidden directory search into a helper function
In hfsplus_fill_super(), the process of looking up the hidden directory
involves initializing a catalog search, building a search key, reading
the b-tree record, and releasing the search data.
Currently, this logic is open-coded directly within the main superblock
initialization routine. This makes hfsplus_fill_super() quite lengthy
and its error handling paths less straightforward.
Extract the hidden directory search sequence into a new helper function,
hfsplus_get_hidden_dir_entry(). This improves overall code readability,
cleanly encapsulates the hfs_find_data lifecycle, and simplifies the
error exits in hfsplus_fill_super().
Zilin Guan [Fri, 27 Mar 2026 08:47:41 +0000 (16:47 +0800)]
hfsplus: fix held lock freed on hfsplus_fill_super()
hfsplus_fill_super() calls hfs_find_init() to initialize a search
structure, which acquires tree->tree_lock. If the subsequent call to
hfsplus_cat_build_key() fails, the function jumps to the out_put_root
error label without releasing the lock. The later cleanup path then
frees the tree data structure with the lock still held, triggering a
held lock freed warning.
Fix this by adding the missing hfs_find_exit(&fd) call before jumping
to the out_put_root error label. This ensures that tree->tree_lock is
properly released on the error path.
The bug was originally detected on v6.13-rc1 using an experimental
static analysis tool we are developing, and we have verified that the
issue persists in the latest mainline kernel. The tool is specifically
designed to detect memory management issues. It is currently under active
development and not yet publicly available.
We confirmed the bug by runtime testing under QEMU with x86_64 defconfig,
lockdep enabled, and CONFIG_HFSPLUS_FS=y. To trigger the error path, we
used GDB to dynamically shrink the max_unistr_len parameter to 1 before
hfsplus_asc2uni() is called. This forces hfsplus_asc2uni() to naturally
return -ENAMETOOLONG, which propagates to hfsplus_cat_build_key() and
exercises the faulty error path. The following warning was observed
during mount:
=========================
WARNING: held lock freed! 7.0.0-rc3-00016-gb4f0dd314b39 #4 Not tainted
-------------------------
mount/174 is freeing memory ffff888103f92000-ffff888103f92fff, with a lock still held there! ffff888103f920b0 (&tree->tree_lock){+.+.}-{4:4}, at: hfsplus_find_init+0x154/0x1e0
2 locks held by mount/174:
#0: ffff888103f960e0 (&type->s_umount_key#42/1){+.+.}-{4:4}, at: alloc_super.constprop.0+0x167/0xa40
#1: ffff888103f920b0 (&tree->tree_lock){+.+.}-{4:4}, at: hfsplus_find_init+0x154/0x1e0
Ranjan Kumar [Fri, 20 Mar 2026 09:03:26 +0000 (14:33 +0530)]
scsi: mpi3mr: Add retry mechanism for IOC shutdown with timeout reset
Enhance the IOC shutdown process to handle transient failures during
controller cleanup. Add retry logic with configurable maximum retry
count (MPI3MR_MAX_SHUTDOWN_RETRY_COUNT) and proper timeout management
that resets on each retry attempt. This ensures shutdown can recover
from temporary issues without failing completely.
Ranjan Kumar [Fri, 20 Mar 2026 09:03:25 +0000 (14:33 +0530)]
scsi: mpi3mr: Add queue-full tracking for operational request queues
Track queue-full conditions on operational request queues in the driver.
Record the last host tag returned to the SCSI mid-layer and count I/Os
affected by queue-full conditions.
Ranjan Kumar [Fri, 20 Mar 2026 09:03:24 +0000 (14:33 +0530)]
scsi: mpi3mr: Reset controller on invalid I/O completion
Operational replies without a valid scsi_cmnd indicate an invalid I/O
completion and a potentially inconsistent controller state. Track this
condition and allow the watchdog to trigger a soft reset to safely
recover.
Qinxin Xia [Tue, 10 Mar 2026 04:06:07 +0000 (12:06 +0800)]
perf tools: Add --pmu-filter option for filtering PMUs
This patch adds a new --pmu-filter option to perf-stat command to allow
filtering events on specific PMUs. This is useful when there are
multiple PMUs with same type (e.g. hisi_sicl2_cpa0 and hisi_sicl0_cpa0).
[root@localhost tmp]# perf stat -M cpa_p0_avg_bw
Performance counter stats for 'system wide':
Kexin Sun [Sat, 21 Mar 2026 10:59:04 +0000 (18:59 +0800)]
scsi: iscsi_tcp: update outdated comment for renamed iscsi_conn_set_callbacks()
The function iscsi_conn_set_callbacks() was renamed to
iscsi_sw_tcp_conn_set_callbacks() by commit 38e1a8f5479d ("[SCSI]
iscsi_tcp: hook iscsi_tcp into new libiscsi_tcp module"). Update the
stale reference in iscsi_sw_tcp_conn_restore_callbacks().
Kexin Sun [Sat, 21 Mar 2026 10:59:09 +0000 (18:59 +0800)]
scsi: lpfc: Update outdated comment for renamed lpfc_freenode()
The function lpfc_freenode() was renamed to lpfc_cleanup_node() by
commit 685f0bf7afe0 ("[SCSI] lpfc 8.1.12 : Collapse discovery lists to a
single node list"), and commit a70e63eee1c1 ("scsi: lpfc: Fix NPIV
Fabric Node reference counting") later removed the lpfc_unreg_rpi() call
from lpfc_cleanup_node().
Remove the now-inaccurate "called from lpfc_freenode()" sentence and
reflow the remaining comment text for lpfc_unreg_rpi().
Danilo Krummrich [Wed, 25 Mar 2026 00:39:17 +0000 (01:39 +0100)]
gpu: nova-core: use sized array for GSP log buffers
Switch LogBuffer from Coherent<[u8]> (unsized) to
Coherent<[u8; LOG_BUFFER_SIZE]> (sized). The buffer size is a
compile-time constant (RM_LOG_BUFFER_NUM_PAGES * GSP_PAGE_SIZE), so a
fixed-size array is more precise and avoids the need for the runtime
length parameter of zeroed_slice().
Danilo Krummrich [Wed, 25 Mar 2026 00:39:16 +0000 (01:39 +0100)]
rust: dma: generalize BinaryWriter impl for Coherent<T>
Generalize the BinaryWriter implementation from Coherent<[u8]> to
Coherent<T> where T: KnownSize + AsBytes + ?Sized. The implementation
only uses size() and write_dma(), neither of which depends on the
inner type being a byte slice.
This allows any Coherent allocation with an AsBytes inner type to be
exposed as a debugfs binary file.
Danilo Krummrich [Wed, 25 Mar 2026 00:39:15 +0000 (01:39 +0100)]
rust: uaccess: generalize write_dma() to accept any Coherent<T>
Generalize write_dma() from &Coherent<[u8]> to &Coherent<T> where
T: KnownSize + AsBytes + ?Sized. The function body only uses as_ptr()
and size(), which work for any such T, so there is no reason to
restrict it to byte slices.
Acked-by: Miguel Ojeda <ojeda@kernel.org> Acked-by: Gary Guo <gary@garyguo.net> Reviewed-by: Alexandre Courbot <acourbot@nvidia.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://patch.msgid.link/20260325003921.3420-1-dakr@kernel.org Signed-off-by: Danilo Krummrich <dakr@kernel.org>
The event_list processes non-hotplug events (such as LUN capacity
changes), so remove the conditions that guard the initial kicks in
_probe() and _restore(), as well as the work cancellation in _remove().
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Joshua Daley <jdaley@linux.ibm.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://patch.msgid.link/20260325180857.3675854-3-jdaley@linux.ibm.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Joshua Daley [Wed, 25 Mar 2026 18:08:56 +0000 (19:08 +0100)]
scsi: virtio_scsi: Move INIT_WORK calls to virtscsi_probe()
The last step of virtscsi_handle_event() is to call
virtscsi_kick_event(), which calls INIT_WORK on its own work
item. INIT_WORK resets the work item's data bits to 0.
If this occurs while the work item is being flushed by
cancel_work_sync(), then kernel/workqueue.c/work_offqd_enable triggers a
kernel warning, as it expects the "disable" bit to be 1:
This warning may occur if a controller is detached immediately following
a disk detach.
Move the INIT_WORK call to prevent this. Don't re-init event list work
items in virtscsi_kick_event(), init them only once in virtscsi_probe()
instead.
The DRM shmem helper includes common code useful for drivers which
allocate GEM objects as anonymous shmem. Add a Rust abstraction for
this. Drivers can choose the raw GEM implementation or the shmem layer,
depending on their needs.
Signed-off-by: Asahi Lina <lina@asahilina.net> Signed-off-by: Daniel Almeida <daniel.almeida@collabora.com> Reviewed-by: Daniel Almeida <daniel.almeida@collabora.com> Signed-off-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Janne Grunau <j@jananu.net> Tested-by: Deborah Brouwer <deborah.brouwer@collabora.com> Link: https://patch.msgid.link/20260316211646.650074-6-lyude@redhat.com
[ * DRM_GEM_SHMEM_HELPER is a tristate; when a module driver selects it,
it becomes =m. The Rust kernel crate and its C helpers are always
built into vmlinux and can't reference symbols from a module,
causing link errors.
Thus, add RUST_DRM_GEM_SHMEM_HELPER bool Kconfig that selects
DRM_GEM_SHMEM_HELPER, forcing it built-in when Rust drivers need it;
use cfg(CONFIG_RUST_DRM_GEM_SHMEM_HELPER) for the shmem module.
* Add cfg_attr(not(CONFIG_RUST_DRM_GEM_SHMEM_HELPER), expect(unused))
on pub(crate) use impl_aref_for_gem_obj and BaseObjectPrivate, so
that unused warnings are suppressed when shmem is not enabled.
* Enable const_refs_to_static (stabilized in 1.83) to prevent build
errors with older compilers.
* Use &raw const for bindings::drm_gem_shmem_vm_ops and add
#[allow(unused_unsafe, reason = "Safe since Rust 1.82.0")].
* Fix incorrect C Header path and minor spelling and formatting
issues.
* Drop shmem::Object::sg_table() as the current implementation is
unsound.
Lyude Paul [Mon, 16 Mar 2026 21:16:10 +0000 (17:16 -0400)]
rust: drm: gem: Add raw_dma_resv() function
For retrieving a pointer to the struct dma_resv for a given GEM object. We
also introduce it in a new trait, BaseObjectPrivate, which we automatically
implement for all gem objects and don't expose to users outside of the
crate.
Signed-off-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Janne Grunau <j@jananu.net> Tested-by: Janne Grunau <j@jannau.net> Tested-by: Deborah Brouwer <deborah.brouwer@collabora.com> Link: https://patch.msgid.link/20260316211646.650074-3-lyude@redhat.com
[ Fix incorrect reference in safety comment. - Danilo ] Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Eric Biggers [Thu, 26 Mar 2026 03:29:20 +0000 (20:29 -0700)]
lib/crypto: chacha: Zeroize permuted_state before it leaves scope
Since the ChaCha permutation is invertible, the local variable
'permuted_state' is sufficient to compute the original 'state', and thus
the key, even after the permutation has been done.
While the kernel is quite inconsistent about zeroizing secrets on the
stack (and some prominent userspace crypto libraries don't bother at all
since it's not guaranteed to work anyway), the kernel does try to do it
as a best practice, especially in cases involving the RNG.
Thus, explicitly zeroize 'permuted_state' before it goes out of scope.
Kees Cook [Mon, 23 Mar 2026 17:13:15 +0000 (10:13 -0700)]
scsi: target: Replace strncpy() with strscpy() in VPD dump functions
Replace the deprecated[1] strncpy() with strscpy() in
transport_dump_vpd_proto_id(), transport_dump_vpd_assoc(),
transport_dump_vpd_ident_type(), and transport_dump_vpd_ident().
All four functions follow the same pattern: a local
buf[VPD_TMP_BUF_SIZE] (254 bytes) is zeroed with memset(), populated
via sprintf()/snprintf() (always NUL-terminated), then conditionally
copied to p_buf with strncpy(). The p_buf destination is used as a
C string by all callers in target_core_configfs.c: strlen(buf) and
sprintf(page+len, "%s", buf) to build sysfs output.
NUL-padding is not required: callers in target_core_configfs.c
pre-zero p_buf with memset() or initializer before calling these
functions, and consume p_buf only as a NUL-terminated C string via
strlen() and "%s", never exposing trailing bytes.
No behavioral change: the source buf is always NUL-terminated and
shorter than VPD_TMP_BUF_SIZE, so strscpy() produces identical output.
Linus Torvalds [Fri, 27 Mar 2026 20:30:04 +0000 (13:30 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
- Quite a few irdma bug fixes, several user triggerable
- Fix a 0 SMAC header in ionic
- Tolerate FW errors for RAAS in bng_re
- Don't UAF in efa when printing error events
- Better handle pool exhaustion in the new bvec paths
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/irdma: Harden depth calculation functions
RDMA/irdma: Return EINVAL for invalid arp index error
RDMA/irdma: Fix deadlock during netdev reset with active connections
RDMA/irdma: Remove reset check from irdma_modify_qp_to_err()
RDMA/irdma: Clean up unnecessary dereference of event->cm_node
RDMA/irdma: Remove a NOP wait_event() in irdma_modify_qp_roce()
RDMA/irdma: Update ibqp state to error if QP is already in error state
RDMA/irdma: Initialize free_qp completion before using it
RDMA/efa: Fix possible deadlock
RDMA/rw: Fix MR pool exhaustion in bvec RDMA READ path
RDMA/rw: Fall back to direct SGE on MR pool exhaustion
RDMA/efa: Fix use of completion ctx after free
RDMA/bng_re: Fix silent failure in HWRM version query
RDMA/ionic: Preserve and set Ethernet source MAC after ib_ud_header_init()
RDMA/irdma: Fix double free related to rereg_user_mr
Arnd Bergmann [Mon, 23 Mar 2026 09:57:39 +0000 (10:57 +0100)]
scsi: esas2r: Fix __printf annotation on esas2r_log_master()
clang-22 started warning about functions that take printf format
strings:
drivers/scsi/esas2r/esas2r_log.c:160:50: error: diagnostic behavior may be improved by adding the 'format(printf, 3, 0)' attribute to the declaration of 'esas2r_log_master' [-Werror,-Wmissing-format-attribute]
121 | retval = vsnprintf(buffer, buflen, format, args);
| ^
drivers/scsi/esas2r/esas2r_log.c:121:12: note: 'esas2r_log_master' declared here
121 | static int esas2r_log_master(const long level,
| ^
The warning already got silenced for gcc but not clang in the past.
Rather than modify that hack to turn it off for both, just add the
attribute as suggested and remove the pragma again.
Linus Torvalds [Fri, 27 Mar 2026 20:25:58 +0000 (13:25 -0700)]
Merge tag 'pci-v7.0-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci fixes from Bjorn Helgaas:
- Remove power-off from pwrctrl drivers since this is now done directly
by the PCI controller drivers (Chen-Yu Tsai)
- Fix pwrctrl device node leak (Felix Gu)
- Document a TLP header decoder for AER log messages (Lukas Wunner)
* tag 'pci-v7.0-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
Documentation: PCI: Document PCIe TLP Header decoder for AER messages
PCI/pwrctrl: Fix pci_pwrctrl_is_required() device node leak
PCI/pwrctrl: Do not power off on pwrctrl device removal
Linus Torvalds [Fri, 27 Mar 2026 20:16:40 +0000 (13:16 -0700)]
Merge tag 'sound-7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"This became slightly big partly due to my time off in the last week.
But all changes are about device-specific fixes, so it should be
safely applicable.
ASoC:
- Fix double free in sma1307
- Fix uninitialized variables in simple-card-utils/imx-card
- Address clock leaks and error propagation in ADAU1372
- Add DMI quirks and ACP/SDW support for ASUS
- Fix Intel CATPT DMA mask
- Fix SOF topology parsing
- Fix DT bindings for RK3576 SPDIF, STM32 SAI and WCD934x
HD-audio:
- Quirks for Lenovo, ASUS, and various HP models, as well as
a speaker pop fix on Star Labs StarFighter
- Revert MSI X870E Tomahawk denylist again
USB-Audio:
- Fix distorted audio on Focusrite Scarlett 2i2/2i4 1st Gen
- Add iface reset quirk for AB17X
- Update Qualcomm USB audio Kconfig dependencies and license
Misc:
- Fix minor compile warnings for firewire and asihpi drivers"
* tag 'sound-7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (35 commits)
Revert "ALSA: hda/intel: Add MSI X870E Tomahawk to denylist"
ALSA: usb-audio: Add iface reset and delay quirk for AB17X USB Audio
ALSA: hda/realtek: add HP Laptop 15-fd0xxx mute LED quirk
ALSA: usb-audio: Exclude Scarlett 2i4 1st Gen from SKIP_IFACE_SETUP
ALSA: hda/realtek: Add mute LED quirk for HP Pavilion 15-eg0xxx
ALSA: hda/realtek - Fixed Speaker Mute LED for HP EliteBoard G1a platform
ASoC: SOF: ipc4-topology: Allow bytes controls without initial payload
ASoC: adau1372: Fix clock leak on PLL lock failure
ASoC: adau1372: Fix unchecked clk_prepare_enable() return value
ASoC: SDCA: fix finding wrong entity
ASoC: SDCA: remove the max count of initialization table
ASoC: codecs: wcd934x: fix typo in dt parsing
ASoC: dt-bindings: stm32: Fix incorrect compatible string in stm32h7-sai match
ASoC: Intel: catpt: Fix the device initialization
ASoC: amd: acp: add ASUS HN7306EA quirk for legacy SDW machine
ASoC: SOF: topology: reject invalid vendor array size in token parser
ASoC: tas2781: Add null check for calibration data
ALSA: asihpi: avoid write overflow check warning
ASoC: fsl: imx-card: initialize playback_only and capture_only
ASoC: simple-card-utils: Check value of is_playback_only and is_capture_only
...
Linus Torvalds [Fri, 27 Mar 2026 20:10:49 +0000 (13:10 -0700)]
Merge tag 'media/v7.0-6' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
- uvcvideo may cause OOPS when out of memory
- remove a deadlock in the ccs driver
* tag 'media/v7.0-6' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: ccs: Avoid deadlock in ccs_init_state()
media: uvcvideo: Fix bug in error path of uvc_alloc_urb_buffers
Linus Torvalds [Fri, 27 Mar 2026 20:04:34 +0000 (13:04 -0700)]
Merge tag 'sysctl-7.00-fixes-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl
Pull sysctl fix from Joel Granados:
"Fix uninitialized variable error when writing to a sysctl bitmap
Removed the possibility of returning an unjustified -EINVAL when
writing to a sysctl bitmap"
* tag 'sysctl-7.00-fixes-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl:
sysctl: fix uninitialized variable in proc_do_large_bitmap
Linus Torvalds [Fri, 27 Mar 2026 19:22:45 +0000 (12:22 -0700)]
Merge tag 'xfs-fixes-7.0-rc6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Carlos Maiolino:
"This includes a few important bug fixes, and some code refactoring
that was necessary for one of the fixes"
* tag 'xfs-fixes-7.0-rc6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: remove file_path tracepoint data
xfs: don't irele after failing to iget in xfs_attri_recover_work
xfs: remove redundant validation in xlog_recover_attri_commit_pass2
xfs: fix ri_total validation in xlog_recover_attri_commit_pass2
xfs: close crash window in attr dabtree inactivation
xfs: factor out xfs_attr3_leaf_init
xfs: factor out xfs_attr3_node_entry_remove
xfs: only assert new size for datafork during truncate extents
xfs: annotate struct xfs_attr_list_context with __counted_by_ptr
xfs: cleanup buftarg handling in XFS_IOC_VERIFY_MEDIA
xfs: scrub: unlock dquot before early return in quota scrub
xfs: refactor xfsaild_push loop into helper
xfs: save ailp before dropping the AIL lock in push callbacks
xfs: avoid dereferencing log items after push callbacks
xfs: stop reclaim before pushing AIL during unmount
Shuming Fan [Fri, 27 Mar 2026 08:23:31 +0000 (16:23 +0800)]
ASoC: SDCA: fix the register to ctl value conversion for Q7.8 format
The division calculation should be implemented using signed integer format.
This patch changes mc->shift from an unsigned type to a signed integer during the calculation.
Fixes: 501efdcb3b3a ("ASoC: SDCA: Pull the Q7.8 volume helpers out of soc-ops") Signed-off-by: Shuming Fan <shumingf@realtek.com> Reviewed-by: Charles Keepax <ckeepax@opensource.cirrus.com> Link: https://patch.msgid.link/20260327082331.2277498-1-shumingf@realtek.com Signed-off-by: Mark Brown <broonie@kernel.org>
Linus Torvalds [Fri, 27 Mar 2026 19:03:39 +0000 (12:03 -0700)]
Merge tag 'v7.0-rc5-ksmbd-srv-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
- Fix out of bounds write
- Fix for better calculating max output buffers
- Fix memory leaks in SMB2/SMB3 lock
- Fix use after free
- Multichannel fix
* tag 'v7.0-rc5-ksmbd-srv-fixes' of git://git.samba.org/ksmbd:
ksmbd: fix potencial OOB in get_file_all_info() for compound requests
ksmbd: replace hardcoded hdr2_len with offsetof() in smb2_calc_max_out_buf_len()
ksmbd: fix memory leaks and NULL deref in smb2_lock()
ksmbd: fix use-after-free and NULL deref in smb_grant_oplock()
ksmbd: do not expire session on binding failure
Ethan Tidmore [Thu, 19 Mar 2026 18:26:44 +0000 (13:26 -0500)]
iommu/riscv: Fix signedness bug
The function platform_irq_count() returns negative error codes and
iommu->irqs_count is an unsigned integer, so the check
(iommu->irqs_count <= 0) is always impossible.
Make the return value of platform_irq_count() be assigned to ret, check
for error, and then assign iommu->irqs_count to ret.
Detected by Smatch:
drivers/iommu/riscv/iommu-platform.c:119 riscv_iommu_platform_probe() warn:
'iommu->irqs_count' unsigned <= 0
Gregory Price [Fri, 27 Mar 2026 02:02:02 +0000 (22:02 -0400)]
cxl/core/region: move dax region device logic into region_dax.c
core/region.c is overloaded with per-region control logic (pmem, dax,
sysram, etc). Move the CXL DAX region device infrastructure from
region.c into a new region_dax.c file.
This will also allow us to add additional dax-driver integration paths
that don't further dirty the core region.c logic.
No functional changes.
Signed-off-by: Gregory Price <gourry@gourry.net> Co-developed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Link: https://patch.msgid.link/20260327020203.876122-3-gourry@gourry.net Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Gregory Price [Fri, 27 Mar 2026 02:02:01 +0000 (22:02 -0400)]
cxl/core/region: move pmem region driver logic into region_pmem.c
core/region.c is overloaded with per-region control logic (pmem, dax,
sysram, etc). Move the pmem region driver logic from region.c into
region_pmem.c make it clear that this code only applies to pmem regions.
No functional changes.
[ dj: Fixed up some tabbing issues, may be from original code. ]
Signed-off-by: Gregory Price <gourry@gourry.net> Co-developed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Link: https://patch.msgid.link/20260327020203.876122-2-gourry@gourry.net Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Linus Walleij [Thu, 26 Mar 2026 23:26:39 +0000 (00:26 +0100)]
ASoC: nau8315: Drop unused include
The driver includes the legacy GPIO header <linux/gpio.h> but does
not use any symbols from it so drop the include. (It is already
using the consumer header as is proper.)
Cheng-Yang Chou [Fri, 27 Mar 2026 09:50:39 +0000 (17:50 +0800)]
sched_ext: Document why built-in DSQs are unsupported sources in scx_bpf_dsq_move_to_local()
Add a comment explaining the design intent behind rejecting built-in DSQs
(%SCX_DSQ_GLOBAL and %SCX_DSQ_LOCAL*) as sources. Local DSQs support
reenqueueing but the BPF scheduler cannot directly iterate or move tasks
from them. %SCX_DSQ_GLOBAL is similar but also doesn't support
reenqueueing because it maps to multiple per-node DSQs, making the scope
difficult to define.
Also annotate @dsq_id to make clear it must be a user-created DSQ.
Zhao Mengmeng [Fri, 27 Mar 2026 06:17:57 +0000 (14:17 +0800)]
scx_central: Defer timer start to central dispatch to fix init error
scx_central currently assumes that ops.init() runs on the selected
central CPU and aborts otherwise. This is no longer true, as ops.init()
is invoked from the scx_enable_helper thread, which can run on any
CPU.
As a result, sched_setaffinity() from userspace doesn't work, causing
scx_central to fail when loading with:
Fix this by:
- Defer bpf_timer_start() to the first dispatch on the central CPU.
- Initialize the BPF timer in central_init() and kick the central CPU
to guarantee entering the dispatch path on the central CPU immediately.
- Remove the unnecessary sched_setaffinity() call in userspace.
Yeoreum Yun [Sat, 14 Mar 2026 17:51:31 +0000 (17:51 +0000)]
arm64: armv8_deprecated: Disable swp emulation when FEAT_LSUI present
The purpose of supporting LSUI is to eliminate PAN toggling. CPUs that
support LSUI are unlikely to support a 32-bit runtime. Rather than
emulating the SWP instruction using LSUI instructions in order to remove
PAN toggling, simply disable SWP emulation.
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
[catalin.marinas@arm.com: some tweaks to the in-code comment] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
dax/hmem, cxl: Defer and resolve Soft Reserved ownership
The current probe time ownership check for Soft Reserved memory based
solely on CXL window intersection is insufficient. dax_hmem probing is not
always guaranteed to run after CXL enumeration and region assembly, which
can lead to incorrect ownership decisions before the CXL stack has
finished publishing windows and assembling committed regions.
Introduce deferred ownership handling for Soft Reserved ranges that
intersect CXL windows. When such a range is encountered during the
initial dax_hmem probe, schedule deferred work to wait for the CXL stack
to complete enumeration and region assembly before deciding ownership.
Once the deferred work runs, evaluate each Soft Reserved range
individually: if a CXL region fully contains the range, skip it and let
dax_cxl bind. Otherwise, register it with dax_hmem. This per-range
ownership model avoids the need for CXL region teardown and
alloc_dax_region() resource exclusion prevents double claiming.
Introduce a boolean flag dax_hmem_initial_probe to live inside device.c
so it survives module reload. Ensure dax_cxl defers driver registration
until dax_hmem has completed ownership resolution. dax_cxl calls
dax_hmem_flush_work() before cxl_driver_register(), which both waits for
the deferred work to complete and creates a module symbol dependency that
forces dax_hmem.ko to load before dax_cxl.
Co-developed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://patch.msgid.link/20260322195343.206900-9-Smita.KoralahalliChannabasappa@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
cxl/region: Add helper to check Soft Reserved containment by CXL regions
Add a helper to determine whether a given Soft Reserved memory range is
fully contained within the committed CXL region.
This helper provides a primitive for policy decisions in subsequent
patches such as co-ordination with dax_hmem to determine whether CXL has
fully claimed ownership of Soft Reserved memory ranges.
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://patch.msgid.link/20260322195343.206900-8-Smita.KoralahalliChannabasappa@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
dax: Track all dax_region allocations under a global resource tree
Introduce a global "DAX Regions" resource root and register each
dax_region->res under it via request_resource(). Release the resource on
dax_region teardown.
By enforcing a single global namespace for dax_region allocations, this
ensures only one of dax_hmem or dax_cxl can successfully register a
dax_region for a given range.
Suggested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://patch.msgid.link/20260322195343.206900-7-Smita.KoralahalliChannabasappa@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Dan Williams [Sun, 22 Mar 2026 19:53:38 +0000 (19:53 +0000)]
dax/cxl, hmem: Initialize hmem early and defer dax_cxl binding
Move hmem/ earlier in the dax Makefile so that hmem_init() runs before
dax_cxl.
In addition, defer registration of the dax_cxl driver to a workqueue
instead of using module_cxl_driver(). This ensures that dax_hmem has
an opportunity to initialize and register its deferred callback and make
ownership decisions before dax_cxl begins probing and claiming Soft
Reserved ranges.
Mark the dax_cxl driver as PROBE_PREFER_ASYNCHRONOUS so its probe runs
out of line from other synchronous probing avoiding ordering
dependencies while coordinating ownership decisions with dax_hmem.
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Tested-by: Tomasz Wolski <tomasz.wolski@fujitsu.com> Link: https://patch.msgid.link/20260322195343.206900-6-Smita.KoralahalliChannabasappa@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Dan Williams [Sun, 22 Mar 2026 19:53:37 +0000 (19:53 +0000)]
dax/hmem: Gate Soft Reserved deferral on DEV_DAX_CXL
Replace IS_ENABLED(CONFIG_CXL_REGION) with IS_ENABLED(CONFIG_DEV_DAX_CXL)
so that HMEM only defers Soft Reserved ranges when CXL DAX support is
enabled. This makes the coordination between HMEM and the CXL stack more
precise and prevents deferral in unrelated CXL configurations.
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Link: https://patch.msgid.link/20260322195343.206900-5-Smita.KoralahalliChannabasappa@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Dan Williams [Sun, 22 Mar 2026 19:53:36 +0000 (19:53 +0000)]
dax/hmem: Request cxl_acpi and cxl_pci before walking Soft Reserved ranges
Ensure cxl_acpi has published CXL Window resources before HMEM walks Soft
Reserved ranges.
Replace MODULE_SOFTDEP("pre: cxl_acpi") with an explicit, synchronous
request_module("cxl_acpi"). MODULE_SOFTDEP() only guarantees eventual
loading, it does not enforce that the dependency has finished init
before the current module runs. This can cause HMEM to start before
cxl_acpi has populated the resource tree, breaking detection of overlaps
between Soft Reserved and CXL Windows.
Also, request cxl_pci before HMEM walks Soft Reserved ranges. Unlike
cxl_acpi, cxl_pci attach is asynchronous and creates dependent devices
that trigger further module loads. Asynchronous probe flushing
(wait_for_device_probe()) is added later in the series in a deferred
context before HMEM makes ownership decisions for Soft Reserved ranges.
Add an additional explicit Kconfig ordering so that CXL_ACPI and CXL_PCI
must be initialized before DEV_DAX_HMEM. This prevents HMEM from consuming
Soft Reserved ranges before CXL drivers have had a chance to claim them.
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Tested-by: Tomasz Wolski <tomasz.wolski@fujitsu.com> Link: https://patch.msgid.link/20260322195343.206900-4-Smita.KoralahalliChannabasappa@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
dax/hmem: Factor HMEM registration into __hmem_register_device()
Separate the CXL overlap check from the HMEM registration path and keep
the platform-device setup in a dedicated __hmem_register_device().
This makes hmem_register_device() the policy entry point for deciding
whether a range should be deferred to CXL, while __hmem_register_device()
handles the HMEM registration flow.
No functional changes.
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://patch.msgid.link/20260322195343.206900-3-Smita.KoralahalliChannabasappa@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
dax/bus: Use dax_region_put() in alloc_dax_region() error path
alloc_dax_region() calls kref_init() on the dax_region early in the
function, but the error path for sysfs_create_groups() failure uses
kfree() directly to free the dax_region. This bypasses the kref lifecycle.
Use dax_region_put() instead to handle kref lifecycle correctly.
Suggested-by: Jonathan Cameron <jonathan.cameron@huawei.com> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://patch.msgid.link/20260322195343.206900-2-Smita.KoralahalliChannabasappa@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Youssef Samir [Thu, 5 Feb 2026 12:34:14 +0000 (13:34 +0100)]
accel/qaic: Handle DBC deactivation if the owner went away
When a DBC is released, the device sends a QAIC_TRANS_DEACTIVATE_FROM_DEV
transaction to the host over the QAIC_CONTROL MHI channel. QAIC handles
this by calling decode_deactivate() to release the resources allocated for
that DBC. Since that handling is done in the qaic_manage_ioctl() context,
if the user goes away before receiving and handling the deactivation, the
host will be out-of-sync with the DBCs available for use, and the DBC
resources will not be freed unless the device is removed. If another user
loads and requests to activate a network, then the device assigns the same
DBC to that network, QAIC will "indefinitely" wait for dbc->in_use = false,
leading the user process to hang.
As a solution to this, handle QAIC_TRANS_DEACTIVATE_FROM_DEV transactions
that are received after the user has gone away.
Fixes: 129776ac2e38 ("accel/qaic: Add control path") Signed-off-by: Youssef Samir <youssef.abdulrahman@oss.qualcomm.com> Reviewed-by: Lizhi Hou <lizhi.hou@amd.com> Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Signed-off-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Link: https://patch.msgid.link/20260205123415.3870898-1-youssef.abdulrahman@oss.qualcomm.com
John Ogness [Thu, 26 Mar 2026 13:38:02 +0000 (14:44 +0106)]
printk_ringbuffer: Add sanity check for 0-size data
get_data() has a sanity check for regular data blocks to ensure at
least space for the ID exists. But a regular block should also have
at least 1 byte of data (otherwise it would be data-less instead of
regular).
Expand the get_data() block size sanity check to additionally expect
at least 1 byte of data.
Commit cc3bad11de6e ("printk_ringbuffer: Fix check of valid data
size when blk_lpos overflows") added sanity checking to get_data()
to avoid returning data of illegal sizes (too large or too small).
It uses the helper function data_check_size() for the check.
However, data_check_size() expects the size of the data, not the
size of the data block. get_data() is providing the size of the
data block. This means that if the data size (text_buf_size) is
at or near the maximum legal size:
data_check_size() will report failure because it adds
sizeof(prb_data_block) to the provided size. The sanity check in
get_data() is counting the data block header twice. The result is
that the reader fails to read the legal record.
Since get_data() subtracts the data block header size before returning,
move the sanity check to after the subtraction.
Luckily printk() is not vulnerable to this problem because
truncate_msg() limits printk-messages to 1/4 of the ringbuffer.
Indeed, by adjusting the printk_ringbuffer KUnit test, which does not
use printk() and its truncate_msg() check, it is easy to see that the
reader fails and the WARN_ON is triggered.
Fixes: cc3bad11de6e ("printk_ringbuffer: Fix check of valid data size when blk_lpos overflows") Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Tested-by: Petr Mladek <pmladek@suse.com> Link: https://patch.msgid.link/20260326133809.8045-1-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
====================
bpf: classify block device hooks and add selftests
A bunch of new hooks for managing block devices were added a while ago
but they weren't appropriately classified. Classify them and add a test
program so we catch regressions.
Note that for whatever reason building the bpf selftests locally seems
to fail for all kinds of arcane reasons for me. That might just be my
fault. I added a pr against the ci to have the selftests run but to test
this meaningfully it needs veritysetup and dmverity support. I'm not
sure if that's available already.
Signed-off-by: Christian Brauner <brauner@kernel.org>
---
Changes in v2:
- No changes.
- Link to v1: https://patch.msgid.link/20260220-work-bpf-bdev-v1-0-c53e852c4702@kernel.org
A bunch of new hooks for managing block devices were added a while ago
but they weren't actually appropriately classified.
* bpf_lsm_bdev_alloc() is called when the inode for the block
device is allocated. This happens from a sleepable context so mark the
function as sleepable. When this function is called the memory for the
block device storage embedded into the inode is zeroed. That block
device cannot be meaningfully reference or interacted with at this
point. So mark it as untrusted for now.
* bpf_lsm_bdev_free() is called when the inode for the block
device is freed. A bunch of memory associated with the block device
has already been freed and there's dangling pointers in there. So mark
it as untrusted. It cannot be meaningfully referenced or interacted
with anymore. It is also called from sb->s_op->free_inode:: which
means it runs in rcu context (most of the times). So leave it as
non-sleepable.
* bpf_lsm_bdev_setintegrity() is called when a dm-verity device
is instantiated (glossing over details for simplicity of the commit
message). The block device is very much alive so it remains a trusted
hook. It's also called with device mapper's suspend lock held and so
the hook is able to sleep so mark it sleepable.
Jan Kara [Thu, 26 Mar 2026 14:06:32 +0000 (15:06 +0100)]
udf: Fix race between file type conversion and writeback
udf_setsize() can race with udf_writepages() as follows:
udf_setsize() udf_writepages()
if (iinfo->i_alloc_type ==
ICBTAG_FLAG_AD_IN_ICB)
err = udf_expand_file_adinicb(inode);
err = udf_extend_file(inode, newsize);
udf_adinicb_writepages()
memcpy_from_file_folio() - crash
because inode size is too big.
Fix the problem by checking the file type under folio lock in
udf_handle_page_wb() handler called from __mpage_writepages() which
properly serializes with udf_expand_file_adinicb().
Jan Kara [Thu, 26 Mar 2026 14:06:31 +0000 (15:06 +0100)]
mpage: Provide variant of mpage_writepages() with own optional folio handler
Some filesystems need to treat some folios specially (for example for
inodes with inline data). Doing the handling in their .writepages method
in a race-free manner results in duplicating some of the writeback
internals. So provide generalized version of mpage_writepages() that
allows filesystem to provide a handler called for each folio which can
handle the folio in a special way.
When vNTB is used as a PCI endpoint function, the NTB device is backed
by a virtual PCI function. For DMA API allocations and mappings, NTB
clients must use the device that is associated with the IOMMU domain.
Implement ntb_dev_ops->get_dma_dev() for pci-epf-vntb and return the EPC
parent device.
Suggested-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Koichiro Den <den@valinux.co.jp> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://patch.msgid.link/20260306031443.1911860-4-den@valinux.co.jp
Koichiro Den [Fri, 6 Mar 2026 03:14:42 +0000 (12:14 +0900)]
NTB: ntb_transport: Use ntb_get_dma_dev() for DMA buffers
ntb_transport currently uses ndev->pdev->dev for coherent allocations
and frees.
Switch the coherent buffer allocation/free paths to use
ntb_get_dma_dev(), so ntb_transport can work with NTB implementations
where the NTB PCI function is not the right device to use for DMA
mappings.
Suggested-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Koichiro Den <den@valinux.co.jp> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://patch.msgid.link/20260306031443.1911860-3-den@valinux.co.jp
Koichiro Den [Fri, 6 Mar 2026 03:14:41 +0000 (12:14 +0900)]
NTB: core: Add .get_dma_dev() callback to ntb_dev_ops
Some NTB implementations are backed by a PCI function that is not the right
struct device to use with DMA API helpers (e.g. due to IOMMU topology, or
because the NTB device is virtual).
Add an optional .get_dma_dev() callback to struct ntb_dev_ops and provide a
helper, ntb_get_dma_dev(), so NTB clients can use the appropriate struct
device for DMA allocations and mappings.
If the callback is not implemented, ntb_get_dma_dev() returns the current
default (ntb->dev.parent). Drivers that implement .get_dma_dev() must
return a non-NULL device.
Suggested-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Koichiro Den <den@valinux.co.jp> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
[bhelgaas: format doc] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://patch.msgid.link/20260306031443.1911860-2-den@valinux.co.jp
Jens Axboe [Fri, 27 Mar 2026 15:51:17 +0000 (09:51 -0600)]
Merge tag 'nvme-7.1-2026-03-27' of git://git.infradead.org/nvme into for-7.1/block
Pull NVMe updates from Keith:
"- Fabrics authentication updates (Eric, Alistar)
- Enanced block queue limits support (Caleb)
- Workqueue usage updates (Marco)
- A new write zeroes device quirk (Robert)
- Tagset cleanup fix for loop device (Nilay)"
* tag 'nvme-7.1-2026-03-27' of git://git.infradead.org/nvme: (41 commits)
nvme-loop: do not cancel I/O and admin tagset during ctrl reset/shutdown
nvme: add WQ_PERCPU to alloc_workqueue users
nvmet-fc: add WQ_PERCPU to alloc_workqueue users
nvmet: replace use of system_wq with system_percpu_wq
nvme-auth: Don't propose NVME_AUTH_DHGROUP_NULL with SC_C
nvme: Add the DHCHAP maximum HD IDs
nvme-pci: add NVME_QUIRK_DISABLE_WRITE_ZEROES for Kingston OM3SGP4
nvme: respect NVME_QUIRK_DISABLE_WRITE_ZEROES when wzsl is set
nvmet: report NPDGL and NPDAL
nvmet: use NVME_NS_FEAT_OPTPERF_SHIFT
nvme: set discard_granularity from NPDG/NPDA
nvme: add from0based() helper
nvme: always issue I/O Command Set specific Identify Namespace
nvme: update nvme_id_ns OPTPERF constants
nvme: fold nvme_config_discard() into nvme_update_disk_info()
nvme: add preferred I/O size fields to struct nvme_id_ns_nvm
nvme: Allow reauth from sysfs
nvme: Expose the tls_configured sysfs for secure concat connections
nvmet-tcp: Don't free SQ on authentication success
nvmet-tcp: Don't error if TLS is enabed on a reset
...
Sohil Mehta [Wed, 25 Mar 2026 23:01:49 +0000 (16:01 -0700)]
x86/fred: Remove kernel log message when initializing exceptions
When FRED is enabled, its initialization message is printed for every CPU
during boot as well as during suspend-resume. This debug message can be noisy
and it isn't very useful unless someone is debugging FRED itself.
As FRED is enabled by default, remove the log message as mentioned in
the code comment.
James Morse [Fri, 13 Mar 2026 14:46:16 +0000 (14:46 +0000)]
arm_mpam: Quirk CMN-650's CSU NRDY behaviour
CMN-650 is afflicted with an erratum where the CSU NRDY bit never clears.
This tells us the monitor never finishes scanning the cache. The erratum
document says to wait the maximum time, then ignore the field.
Add a flag to indicate whether this is the final attempt to read the
counter, and when this quirk is applied, ignore the NRDY field.
This means accesses to this counter will always retry, even if the counter
was previously programmed to the same values.
The counter value is not expected to be stable, it drifts up and down with
each allocation and eviction. The CSU register provides the value for a
point in time.
Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Co-developed-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com>
The registers MSMON_MBWU_L and MSMON_MBWU return the number of requests
rather than the number of bytes transferred.
Bandwidth resource monitoring is performed at the last level cache, where
each request arrive in 64Byte granularity. The current implementation
returns the number of transactions received at the last level cache but
does not provide the value in bytes. Scaling by 64 gives an accurate byte
count to match the MPAM specification for the MSMON_MBWU and MSMON_MBWU_L
registers. This patch fixes the issue by reporting the actual number of
bytes instead of the number of transactions from __ris_msmon_read().
Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com>
In the T241 implementation of memory-bandwidth partitioning, in the absence
of contention for bandwidth, the minimum bandwidth setting can affect the
amount of achieved bandwidth. Specifically, the achieved bandwidth in the
absence of contention can settle to any value between the values of
MPAMCFG_MBW_MIN and MPAMCFG_MBW_MAX. Also, if MPAMCFG_MBW_MIN is set
zero (below 0.78125%), once a core enters a throttled state, it will never
leave that state.
The first issue is not a concern if the MPAM software allows to program
MPAMCFG_MBW_MIN through the sysfs interface. This patch ensures program
MBW_MIN=1 (0.78125%) whenever MPAMCFG_MBW_MIN=0 is programmed.
In the scenario where the resctrl doesn't support the MBW_MIN interface via
sysfs, to achieve bandwidth closer to MBW_MAX in the absence of contention,
software should configure a relatively narrow gap between MBW_MIN and
MBW_MAX. The recommendation is to use a 5% gap to mitigate the problem.
Clear the feature MBW_MIN feature from the class to ensure we don't
accidentally change behaviour when resctrl adds support for a MBW_MIN
interface.
Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com>
The MPAM bandwidth partitioning controls will not be correctly configured,
and hardware will retain default configuration register values, meaning
generally that bandwidth will remain unprovisioned.
To address the issue, follow the below steps after updating the MBW_MIN
and/or MBW_MAX registers.
- Perform 64b reads from all 12 bridge MPAM shadow registers at offsets
(0x360048 + slice*0x10000 + partid*8). These registers are read-only.
- Continue iterating until all 12 shadow register values match in a loop.
pr_warn_once if the values fail to match within the loop count 1000.
- Perform 64b writes with the value 0x0 to the two spare registers at
offsets 0x1b0000 and 0x1c0000.
In the hardware, writes to the MPAMCFG_MBW_MAX MPAMCFG_MBW_MIN registers
are transformed into broadcast writes to the 12 shadow registers. The
final two writes to the spare registers cause a final rank of downstream
micro-architectural MPAM registers to be updated from the shadow copies.
The intervening loop to read the 12 shadow registers helps avoid a race
condition where writes to the spare registers occur before all shadow
registers have been updated.
Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com>
The MPAM specification includes the MPAMF_IIDR, which serves to uniquely
identify the MSC implementation through a combination of implementer
details, product ID, variant, and revision. Certain hardware issues/errata
can be resolved using software workarounds.
Introduce a quirk framework to allow workarounds to be enabled based on the
MPAMF_IIDR value.
Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> Co-developed-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Co-developed-by: James Morse <james.morse@arm.com> Signed-off-by: James Morse <james.morse@arm.com>
Takashi Iwai [Fri, 27 Mar 2026 15:30:54 +0000 (16:30 +0100)]
ALSA: usb-audio: Extend max number of channels to 64
The current limitation of 16 as MAX_CHANNELS is rather historical at
the time of UAC1 definition. As there seem already devices with a
higher number of mixer channels, we should extend it too. As an ad
hoc update, let's raise it to 64 so that it can still fit in a single
long-long integer.
Takashi Iwai [Fri, 27 Mar 2026 15:30:53 +0000 (16:30 +0100)]
ALSA: usb-audio: Replace hard-coded number with MAX_CHANNELS
One place in mixer.c still used a hard-coded number 16 instead of
MAX_CHANNELS. Replace with it, so that we can extend the max number
of channels gracefully.
James Morse [Fri, 13 Mar 2026 14:46:09 +0000 (14:46 +0000)]
arm_mpam: resctrl: Add empty definitions for assorted resctrl functions
A few resctrl features and hooks need to be provided, but aren't needed or
supported on MPAM platforms.
resctrl has individual hooks to separately enable and disable the
closid/partid and rmid/pmg context switching code. For MPAM this is all the
same thing, as the value in struct task_struct is used to cache the value
that should be written to hardware. arm64's context switching code is
enabled once MPAM is usable, but doesn't touch the hardware unless the
value has changed.
For now event configuration is not supported, and can be turned off by
returning 'false' from resctrl_arch_is_evt_configurable().
The new io_alloc feature is not supported either, always return false from
the enable helper to indicate and fail the enable.
Add this, and empty definitions for the other hooks.
Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Co-developed-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com>
James Morse [Fri, 13 Mar 2026 14:46:08 +0000 (14:46 +0000)]
arm_mpam: resctrl: Update the rmid reallocation limit
resctrl's limbo code needs to be told when the data left in a cache is
small enough for the partid+pmg value to be re-allocated.
x86 uses the cache size divided by the number of rmid users the cache may
have. Do the same, but for the smallest cache, and with the number of
partid-and-pmg users.
Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Co-developed-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com>
James Morse [Fri, 13 Mar 2026 14:46:06 +0000 (14:46 +0000)]
arm_mpam: resctrl: Allow resctrl to allocate monitors
When resctrl wants to read a domain's 'QOS_L3_OCCUP', it needs to allocate
a monitor on the corresponding resource. Monitors are allocated by class
instead of component.
Add helpers to allocate a CSU monitor. These helper return an out of range
value for MBM counters.
Allocating a montitor context is expected to block until hardware resources
become available. This only makes sense for QOS_L3_OCCUP as unallocated MBM
counters are losing data.
Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Co-developed-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com>
James Morse [Fri, 13 Mar 2026 14:46:05 +0000 (14:46 +0000)]
arm_mpam: resctrl: Add support for csu counters
resctrl exposes a counter via a file named llc_occupancy. This isn't really
a counter as its value goes up and down, this is a snapshot of the cache
storage usage monitor.
Add some picking code which will only find an L3. The resctrl counter
file is called llc_occupancy but we don't check it is the last one as
it is already identified as L3.
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Co-developed-by: Dave Martin <dave.martin@arm.com> Signed-off-by: Dave Martin <dave.martin@arm.com> Co-developed-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com>
Ben Horgan [Fri, 13 Mar 2026 14:46:04 +0000 (14:46 +0000)]
arm_mpam: resctrl: Add monitor initialisation and domain boilerplate
Add the boilerplate that tells resctrl about the mpam monitors that are
available. resctrl expects all (non-telemetry) monitors to be on the L3 and
so advertise them there and invent an L3 resctrl resource if required. The
L3 cache itself has to exist as the cache ids are used as the domain
ids.
Bring the resctrl monitor domains online and offline based on the cpus
they contain.
Support for specific monitor types is left to later.
Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Signed-off-by: James Morse <james.morse@arm.com>