Li RongQing [Wed, 3 Jun 2026 12:37:08 +0000 (20:37 +0800)]
dma-debug: fix physical address retrieval in debug_dma_sync_sg_for_device
In debug_dma_sync_sg_for_device(), when iterating over a scatterlist,
the debug entry population mistakenly uses the head of the scatterlist
'sg' to fetch the physical address via sg_phys(), instead of using the
current iterator variable 's'.
This causes dma-debug to track the physical address of the very first
scatterlist entry for all subsequent entries in the list.
Fix this by passing the correct loop iterator 's' to sg_phys()
Li Chen [Fri, 15 May 2026 09:18:27 +0000 (17:18 +0800)]
ext4: fast commit: export snapshot stats in fc_info
Snapshot-based fast commit can fall back when the commit-time snapshot
cannot be built (e.g. extent status cache misses). It is useful to
quantify the updates-locked window and to see why snapshotting failed.
Add best-effort snapshot counters to the ext4 superblock and extend
/proc/fs/ext4/<sb_id>/fc_info to report the number of snapshotted
inodes and ranges, snapshot failure reasons, and the average/max time
spent with journal updates locked.
Li Chen [Fri, 15 May 2026 09:18:26 +0000 (17:18 +0800)]
ext4: fast commit: add lock_updates tracepoint
Commit-time fast commit snapshots run under jbd2_journal_lock_updates(),
so it is useful to quantify the time spent with updates locked and to
understand why snapshotting can fail.
Add a new tracepoint, ext4_fc_lock_updates, reporting the time spent in
the updates-locked window along with the number of snapshotted inodes
and ranges. Record the first snapshot failure reason in a stable snap_err
field for tooling.
Li Chen [Fri, 15 May 2026 09:18:25 +0000 (17:18 +0800)]
ext4: fast commit: avoid i_data_sem by dropping ext4_map_blocks() in snapshots
Commit-time snapshots run under jbd2_journal_lock_updates(), so the work
done there must stay bounded.
The snapshot path still used ext4_map_blocks() to build data ranges. This
can take i_data_sem and pulls the mapping code into the snapshot logic.
Build inode data range snapshots from the extent status tree instead.
The extent status tree is a cache, not an authoritative source. If the
needed information is missing or unstable (e.g. delayed allocation), treat
the transaction as fast commit ineligible and fall back to full commit.
Also cap the number of inodes and ranges snapshotted per fast commit and
allocate range records from a dedicated slab cache. The inode pointer
array is allocated outside the updates-locked window.
Testing: QEMU/KVM guest, virtio-pmem + dax, ext4 -O fast_commit, mounted
dax,noatime. Ran python3 500x {4K write + fsync}, fallocate 256M, and
python3 500x {creat + fsync(dir)} without lockdep splats or errors.
Li Chen [Fri, 15 May 2026 09:18:24 +0000 (17:18 +0800)]
ext4: fast commit: avoid self-deadlock in inode snapshotting
ext4_fc_snapshot_inodes() used igrab()/iput() to pin inodes while building
commit-time snapshots. With ext4_fc_del() waiting for
EXT4_STATE_FC_COMMITTING, iput() can trigger
ext4_clear_inode()->ext4_fc_del() in the commit thread and deadlock waiting
for the fast commit to finish.
ext4_fc_del() also has to re-check EXT4_STATE_FC_COMMITTING after
waiting on EXT4_STATE_FC_FLUSHING_DATA. The commit thread clears
FLUSHING_DATA before it sets COMMITTING, so a waiter woken from the
flush wait must not delete the inode based on an old COMMITTING
check.
Avoid taking extra references. Collect inode pointers under s_fc_lock and
rely on EXT4_STATE_FC_COMMITTING to pin inodes until ext4_fc_cleanup()
clears the bit.
Also set EXT4_STATE_FC_COMMITTING for create-only inodes referenced
from the dentry update queue, and wake up waiters when ext4_fc_cleanup()
clears the bit.
Li Chen [Fri, 15 May 2026 09:18:23 +0000 (17:18 +0800)]
ext4: fast commit: avoid waiting for FC_COMMITTING
ext4_fc_track_inode() can be called while holding i_data_sem (e.g.
fallocate). Waiting for EXT4_STATE_FC_COMMITTING in that case risks an
ABBA deadlock: i_data_sem -> wait(FC_COMMITTING) vs FC_COMMITTING ->
wait(i_data_sem) in the commit task.
Now that fast commit snapshots inode state at commit time, updates during
log writing do not need to block. Drop the wait and lockdep assertion in
ext4_fc_track_inode(), and make ext4_fc_del() wait for FC_COMMITTING so an
inode cannot be removed while the commit thread is still using it.
When an inode is modified during a fast commit, mark it with
EXT4_STATE_FC_REQUEUE so cleanup keeps it queued for the next fast commit.
This is needed because jbd2_fc_end_commit() invokes the cleanup callback
with tid == 0, so tid-based requeue logic would requeue every inode.
Testing: tracepoint ext4:ext4_fc_commit_stop with two fsyncs in the same
transaction. nblks is the number of journal blocks written for that fast
commit. Before this change, the second fsync still wrote almost the same
fast commit log (nblks 10->9), because tid == 0 in jbd2_fc_end_commit()
caused the tid-based requeue logic to keep all inodes queued. After this
change, only inodes modified during the commit are requeued, and the
second fsync wrote a nearly empty fast commit (nblks 10->1).
Li Chen [Fri, 15 May 2026 09:18:22 +0000 (17:18 +0800)]
ext4: lockdep: handle i_data_sem subclassing for special inodes
Fast commit can hold s_fc_lock while writing journal blocks. Mapping the
journal inode can take its i_data_sem. Normal inode update paths can take a
data inode i_data_sem and then s_fc_lock, which makes lockdep report a
circular dependency.
lockdep treats all i_data_sem instances as one lock class and cannot
distinguish the journal inode i_data_sem from a regular inode i_data_sem.
The journal inode is not tracked by fast commit and no FC waiters ever
depend on it, so this is not a real ABBA deadlock. Assign the journal inode
a dedicated i_data_sem lockdep subclass to avoid the false positive.
Inode cache objects can be recycled, so also reset i_data_sem to
I_DATA_SEM_NORMAL when allocating an ext4 inode. Otherwise a new inode may
inherit an old subclass (journal/quota/ea) and trigger lockdep warnings.
Li Chen [Fri, 15 May 2026 09:18:21 +0000 (17:18 +0800)]
ext4: fast commit: snapshot inode state before writing log
Fast commit writes inode metadata and data range updates after unlocking
journal updates. New handles can start at that point, so the log writing
path must not look at live inode state.
Add a commit-time per-inode snapshot and populate it while journal updates
are locked and existing handles are drained. Store the snapshot behind
ext4_inode_info->i_fc_snap so ext4_inode_info only grows by one pointer.
The snapshot contains a copy of the on-disk inode plus the data range
records needed for fast commit TLVs.
Snapshotting runs under jbd2_journal_lock_updates(). Avoid triggering I/O
there by using ext4_get_inode_loc_noio() and falling back to full commit
if the inode table block is not present or not uptodate.
Log writing then only serializes the snapshot, so it no longer needs to
call ext4_map_blocks() and take i_data_sem under s_fc_lock. The snapshot
is installed and freed under s_fc_lock and is released from fast commit
cleanup and inode eviction.
Junrui Luo [Wed, 13 May 2026 09:28:40 +0000 (17:28 +0800)]
jbd2: fix integer underflow in jbd2_journal_initialize_fast_commit()
jbd2_journal_initialize_fast_commit() validates journal capacity by
checking (journal->j_last - num_fc_blks < JBD2_MIN_JOURNAL_BLOCKS).
Both j_last and num_fc_blks are unsigned, so when num_fc_blks exceeds
j_last the subtraction wraps to a large value, bypassing the bounds
check.
The resulting underflow corrupts j_last, j_fc_first, and j_free,
leading to journal abort.
Fix by checking num_fc_blks against j_last before the subtraction,
returning -EFSCORRUPTED.
Fixes: 6866d7b3f2bb ("ext4 / jbd2: add fast commit initialization") Reported-by: Yuhao Jiang <danisjiang@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Junrui Luo <moonafterrain@outlook.com> Fixes: e029c5f27987 ("ext4: make num of fast commit blocks configurable") Reviewed-by: Baokun Li <libaokun@linux.alibaba.com> Fixes: e029c5f279872 ("ext4: make num of fast commit blocks configurable") Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/SYBPR01MB7881663C927DE9D7BBF4D1DFAF062@SYBPR01MB7881.ausprd01.prod.outlook.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Li Chen [Wed, 13 May 2026 08:58:17 +0000 (16:58 +0800)]
ext4: fix fast commit wait/wake bit mapping on 64-bit
On 64-bit, ext4 dynamic inode states live in the upper half of i_flags,
and ext4_test_inode_state() applies the corresponding +32 offset.
The fast-commit wait and wake paths open-coded the wait key with the raw
EXT4_STATE_* value. Add small helpers for the state wait word and bit,
and use them for the FC_COMMITTING and FC_FLUSHING_DATA waits so the wait
key follows the same mapping as the state helpers.
Fixes: 857d32f26181 ("ext4: rework fast commit commit path") Reported-by: Sashiko AI review <sashiko-bot@kernel.org> Signed-off-by: Li Chen <chenl311@chinatelecom.cn> Reviewed-by: Baokun Li <libaokun@linux.alibaba.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260513085818.552432-1-me@linux.beauty Signed-off-by: Theodore Ts'o <tytso@mit.edu>
However, h_transaction may legitimately be NULL for an aborted handle.
The is_handle_aborted() helper in include/linux/jbd2.h explicitly
treats !h_transaction as one of the aborted states:
if (handle->h_aborted || !handle->h_transaction)
return 1;
Every other entry point in fs/jbd2/transaction.c
(jbd2_journal_get_{write,undo,create}_access, jbd2_journal_extend,
jbd2_journal_restart, jbd2_journal_stop, etc.) guards against this
with an is_handle_aborted() check before any dereference of
h_transaction. jbd2_journal_dirty_metadata() was missing this guard.
This is reachable from ocfs2's xattr code. ocfs2_xa_set() intentionally
falls through to ocfs2_xa_journal_dirty() even after
ocfs2_xa_prepare_entry() fails, on the assumption that the buffer
needs to be journaled to record any partial modifications (see the
comment above the out_dirty label in fs/ocfs2/xattr.c). If the failure
was caused by the journal being aborted -- e.g. an underlying I/O
error during a sub-operation such as __ocfs2_remove_xattr_range() --
the handle's h_transaction has been cleared by the abort path, and
the unconditional deref in jbd2_journal_dirty_metadata() becomes a
NULL deref.
Reproduced by syzbot with a crafted ocfs2 image where I/O against the
loop device backing the mount is sabotaged via LOOP_SET_STATUS64
between two setxattr() calls, causing the second setxattr (which
truncates an external xattr value) to abort the journal mid-flight:
Oops: general protection fault, probably for non-canonical
address 0xdffffc0000000000
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
RIP: jbd2_journal_dirty_metadata+0x4a/0xd30 fs/jbd2/transaction.c:1520
Call Trace:
ocfs2_journal_dirty+0x130/0x700 fs/ocfs2/journal.c:831
ocfs2_xa_journal_dirty fs/ocfs2/xattr.c:1483 [inline]
ocfs2_xa_set+0x15e3/0x2ec0 fs/ocfs2/xattr.c:2294
ocfs2_xattr_block_set+0x3e0/0x33c0 fs/ocfs2/xattr.c:3016
__ocfs2_xattr_set_handle+0x6b3/0xf50 fs/ocfs2/xattr.c:3418
ocfs2_xattr_set+0xf3f/0x13e0 fs/ocfs2/xattr.c:3681
__vfs_setxattr+0x43c/0x480 fs/xattr.c:218
...
Fix by adding the standard is_handle_aborted() guard at the top of
jbd2_journal_dirty_metadata() and returning -EROFS, matching the
pattern used by every other entry point in this file.
ocfs2_journal_dirty() already handles a non-zero return from
jbd2_journal_dirty_metadata() correctly.
Reported-by: syzbot+98f651460e558a21baae@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=98f651460e558a21baae Tested-by: syzbot+98f651460e558a21baae@syzkaller.appspotmail.com Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://patch.msgid.link/20260507050605.50081-1-kartikey406@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Avinash Bhatt [Sun, 31 May 2026 10:53:08 +0000 (13:53 +0300)]
wifi: iwlwifi: mld: add KUnit tests for link grading
Add tests for the link grading algorithm covering per-bandwidth
grading tables, channel load calculation, 6 GHz RSSI adjustments
including duplicated beacon and PSD/EIRP compensation, and
puncturing penalty.
Shahar Tzarfati [Sun, 31 May 2026 10:53:06 +0000 (13:53 +0300)]
wifi: iwlwifi: mld: drop TLC config cmd v4/v5 compat code
FW core102 bumped TLC_MNG_CONFIG_CMD_API_S from version 5 to
version 6. The v4 and v5 compatibility paths in
iwl_mld_send_tlc_cmd() are no longer reachable on any supported
firmware.
Miri Korenblit [Sun, 31 May 2026 10:53:05 +0000 (13:53 +0300)]
wifi: iwlwifi: mvm: remove __must_check annotation from command sending
We don't acually need to always check the return value. For example, if
we send a command to remove an object - we can assume success
(if it fails it is probably because the fw is dead, and then it doesn't
have the object anyway).
Miri Korenblit [Sun, 31 May 2026 10:53:04 +0000 (13:53 +0300)]
wifi: iwlwifi: trans: export the maximum supported hcmd size
Export the maximum allowed host command payload size to the op-modes.
Note that this information was available to the op-modes also before
this change, this just adds a clear macro.
Shahar Tzarfati [Sun, 31 May 2026 10:53:03 +0000 (13:53 +0300)]
wifi: iwlwifi: stop supporting core101
BZ, DR and SC no longer need to accept core101 firmware.
Raise the minimum supported firmware core from 101 to 102 so
these families only match supported core102 and newer images.
Shahar Tzarfati [Sun, 31 May 2026 10:53:02 +0000 (13:53 +0300)]
wifi: iwlwifi: remove orphaned DC2DC config enum
FW core102 removed both DC2DC_CONFIG_CMD_API_S and
DC2DC_CONFIG_CMD_RSP_API_S. The only driver-side artifact is
enum iwl_dc2dc_config_id in fw/api/config.h, which has no
callers in any .c file across all driver paths (mld/mvm/xvt).
Ever since the TFD queue size is no longer limited to 256 entries,
this code has been wrong, and might erroneously not detect a move
if it was by a multiple of 256. Not a big deal, but fix it while
I see it.
Our binding handling for P2P-Device can run into the following
scenario, as observed by our testing:
- a station interface is connected on some channel
- the P2P-Device does a remain-on-channel (ROC) on that channel
- the ROC ends, and the P2P-Device is removed from the binding,
but the phy_ctxt pointer is left around as a PHY cache so we
don't need to recalibrate to the channel again and again in
case it's not shared
- a binding update by the station interface, even a removal,
will re-add the P2P-Device to the binding
- the P2P-Device is removed, which removes the PHY context, but
it's still in the binding so the firmware crashes
Since the P2P device is removed from the binding and only re-
added by unrelated code, but we want to keep the phy_ctxt around
as a cache for future ROC usage, fix it by adding a boolean that
indicates whether or not the P2P-Device should be added to the
binding, and handle that in the binding iterator. That way, the
station interface cannot re-add the P2P-Device to the binding
when that isn't active.
Add KUnit tests to verify RSSI adjustment for 6 GHz duplicated
beacons across different operational bandwidths and validate
detection of the duplicated beacon bit.
Johannes Berg [Wed, 27 May 2026 20:05:09 +0000 (23:05 +0300)]
wifi: iwlwifi: mld: don't WARN on WoWLAN suspend w/o netdetect
Clearly, from a user perspective, it must be valid to configure
WoWLAN and then suspend while not connected to a network. Since
mac80211 doesn't distinguish these cases and simply calls the
driver to suspend whenever WoWLAN is configured, the driver has
to cleanly handle the case where it's called for WoWLAN, it's
not connected but there's also no netdetect configured.
Remove the WARN_ON() and keep returning 1 to disconnect and
then suspend.
Shahar Tzarfati [Wed, 27 May 2026 20:05:08 +0000 (23:05 +0300)]
wifi: iwlwifi: cfg: Revert "wifi: iwlwifi: cfg: move the MODULE_FIRMWARE to the per-rf file"
IWL_BZ_UCODE_CORE_MAX is undefined in cfg/rf-fm.c, this
causes __stringify(core) to turn it into the literal
token text, so MODULE_FIRMWARE entries are generated as
"iwlwifi...-cIWL_BZ_UCODE_CORE_MAX.ucode",
instead of the actual number.
wifi: iwlwifi: mld: set fast-balance scan for active EMLSR
While associated to MLD AP with active EMLSR, set all scan
operations as fast-balance scans. The only exception is when a
fragmented scan is planned (high traffic or low latency), in
which case the fragmented scan is preserved.
Miri Korenblit [Wed, 27 May 2026 20:05:05 +0000 (23:05 +0300)]
wifi: iwlwifi: mld: always allow mimo in NAN
The mimo field of the sta command is badly named. It really carries the
initial SMPS value as it is in the association request of the client
station (when we are the AP).
In NAN we don't have this information, just mark SMPS as disabled.
Given that we now use v3 rates with FW index throughout,
_to_hwrate() is confusing, since the hardware still uses
the PLCP value, the driver just doesn't see that now (as
it talks to firmware, not hardware.)
Rename this to iwl_mvm_rate_idx_to_fw_idx() to more
clearly indicate what it's doing.
Moriya Itzchaki [Wed, 27 May 2026 20:05:02 +0000 (23:05 +0300)]
wifi: iwlwifi: fix STEP_URM register address for SC devices
The CNVI_PMU_STEP_FLOW register address differs between device families.
For SC and newer devices, the register is at 0xA2D688,
while for BZ devices it's at 0xA2D588.
Miri Korenblit [Wed, 27 May 2026 20:05:01 +0000 (23:05 +0300)]
wifi: iwlwifi: mld: fix smatch warning
We dereference the mld_sta pointer before checking for NULL.
But we do check the sta pointer, and sta != NULL means mld_sta != NULL,
so there is no real issue.
Fix it anyway to silence the warning.
Miri Korenblit [Wed, 27 May 2026 20:04:59 +0000 (23:04 +0300)]
wifi: iwlwifi: remove stale comment
iwl_pcie_set_hw_ready still returns the return value of iwl_poll_bits,
but the latter one no longer returns the time elapsed until success, now it
returns either success or failure.
Remove the comment entirely.
Johannes Berg [Wed, 27 May 2026 20:04:58 +0000 (23:04 +0300)]
wifi: iwlwifi: fw: cut down NIC wakeups during dump
Currently, the dump code attempts to dump any number of
memories and register banks, as defined by the firmware.
Especially when the device is failing, this can lead to
excessive time spent attempting to acquire NIC access
over and over again.
Improve the code to only attempt to acquire NIC access
once or twice, but using the new memory dump functions
that may drop the spinlock etc. Mark all dump regions
that require NIC access, and skip them if we couldn't
obtain that.
In order to avoid CPU latency due to the increased time
holding the spinlock (and possibly disabling softirqs),
drop locks and call cond_resched() after each section
(if holding NIC access) but don't release HW NIC access.
AX231 is a device that is based on AX211 that doesn't support 6E and
its bandwidth is limited to 80 MHz.
Just reuse the radio config from AX203 which has the exact same
characteristics.
It has a specific subdevice ID to allow the driver to differentiate
between AX211 and AX231.
Maxwell Doose [Sun, 31 May 2026 23:38:28 +0000 (18:38 -0500)]
iio: chemical: scd30: Replace manual locking with RAII locking
scd30_core.c currently uses manual mutex_lock() and mutex_unlock()
calls. Replace them with the newer guard(mutex)() for cleaner RAII
patterns and to improve maintainability.
Add new helper function scd30_trigger_handler_helper() containing
the critical section for scd30_trigger_handler().
In addition, small refactor to replace "?:" operator with regular
if/else returns.
Heiko Carstens [Tue, 26 May 2026 05:57:02 +0000 (07:57 +0200)]
s390/percpu: Provide arch_this_cpu_write() implementation
Provide an s390 specific implementation of arch_this_cpu_write()
instead of the generic variant. The generic variant uses a quite
expensive raw_local_irq_save() / raw_local_irq_restore() pair.
Get rid of this by providing an own variant which makes use of the new
percpu code section infrastructure.
With this the text size of the kernel image is reduced by ~1k (defconfig).
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Heiko Carstens [Tue, 26 May 2026 05:57:01 +0000 (07:57 +0200)]
s390/percpu: Provide arch_this_cpu_read() implementation
Provide an s390 specific implementation of arch_this_cpu_read() instead
of the generic variant. The generic variant uses preempt_disable() /
preempt_enable() pair and READ_ONCE().
Get rid of the preempt_disable() / preempt_enable() pairs by providing an
own variant which makes use of the new percpu code section infrastructure.
With this the text size of the kernel image is reduced by ~1k
(defconfig). Also 87 generated preempt_schedule_notrace() function
calls within the kernel image (modules not counted) are removed.
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Heiko Carstens [Tue, 26 May 2026 05:57:00 +0000 (07:57 +0200)]
s390/percpu: Use new percpu code section for arch_this_cpu_[and|or]()
Convert arch_this_cpu_[and|or]() to make use of the new percpu code
section infrastructure.
There is no user of this_cpu_and() and only one user of this_cpu_or()
within the kernel. Therefore this conversion has hardly any effect,
and also removes only preempt_schedule_notrace() function call.
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Heiko Carstens [Tue, 26 May 2026 05:56:59 +0000 (07:56 +0200)]
s390/percpu: Use new percpu code section for arch_this_cpu_add_return()
Convert arch_this_cpu_add_return() to make use of the new percpu code
section infrastructure.
With this the text size of the kernel image is reduced by ~4k
(defconfig). Also 66 generated preempt_schedule_notrace() function
calls within the kernel image (modules not counted) are removed.
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Heiko Carstens [Tue, 26 May 2026 05:56:58 +0000 (07:56 +0200)]
s390/percpu: Use new percpu code section for arch_this_cpu_add()
Convert arch_this_cpu_add() to make use of the new percpu code section
infrastructure.
With this the text size of the kernel image is reduced by ~76kb
(defconfig). Also more than 5300 generated preempt_schedule_notrace()
function calls within the kernel image (modules not counted) are removed.
With:
DEFINE_PER_CPU(long, foo);
void bar(long a) { this_cpu_add(foo, a); }
Heiko Carstens [Tue, 26 May 2026 05:56:56 +0000 (07:56 +0200)]
s390/percpu: Infrastructure for more efficient this_cpu operations
With the intended removal of PREEMPT_NONE this_cpu operations based on
atomic instructions, guarded with preempt_disable()/preempt_enable() pairs
become more expensive: the preempt_disable() / preempt_enable() pairs are
not optimized away anymore during compile time.
In particular the conditional call to preempt_schedule_notrace() after
preempt_enable() adds additional code and register pressure.
E.g. this simple C code sequence
DEFINE_PER_CPU(long, foo);
long bar(long a) { return this_cpu_add_return(foo, a); }
Even though the above example is more or less the worst case, since the
branch to preempt_schedule_notrace() requires a stackframe, which
otherwise wouldn't be necessary, there is also the conditional jnhe branch
instruction.
Get rid of the conditional branch with the following code sequence:
The general idea is that this_cpu operations based on atomic instructions
are guarded with mviy instructions:
- The first mviy instruction writes the register number, which contains
the percpu address variable to lowcore. This also indicates that a
percpu code section is executed.
- The first instruction following the mviy instruction must be the ag
instruction which adds the percpu offset to the percpu address register.
- Afterwards the atomic percpu operation follows.
- Then a second mviy instruction writes a zero to lowcore, which indicates
the end of the percpu code section.
- In case of an interrupt/exception/nmi the register number which was
written to lowcore is copied to the exception frame (pt_regs), and a zero
is written to lowcore.
- On return to the previous context it is checked if a percpu code section
was executed (saved register number not zero), and if the process was
migrated to a different cpu. If the percpu offset was already added to
the percpu address register (instruction address does _not_ point to the
ag instruction) the content of the percpu address register is adjusted so
it points to percpu variable of the new cpu.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
s390/zcrypt: Replace get_zeroed_page() with kzalloc()
zcrypt_rng_device_add() allocates a buffer for the software random
number generator data cache.
This buffer can be allocated with kmalloc() as there's nothing special
about it to go directly to the page allocator.
kmalloc() provides a better API that does not require ugly casts and
kfree() does not need to know the size of the freed object.
Performance difference between kmalloc() and __get_free_pages() is not
measurable as both allocators take an object/page from a per-CPU list for
fast path allocations.
For the slow path the performance is anyway determined by the amount of
reclaim involved rather than by what allocator is used.
Replace use of get_zeroed_page() with kzalloc() and free_page() with
kfree().
Link: https://lore.kernel.org/all/635405e4-9423-4a25-a6e7-e03c8ea0bcbe@redhat.com Reviewed-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
s390/trng: Replace __get_free_page() with kmalloc()
trng_read() allocates a temporary staging buffer for CPACF TRNG
random data before copying it to userspace.
This buffer can be allocated with kmalloc() as there's nothing special
about it to go directly to the page allocator.
kmalloc() provides a better API that does not require ugly casts and
kfree() does not need to know the size of the freed object.
Performance difference between kmalloc() and __get_free_pages() is not
measurable as both allocators take an object/page from a per-CPU list for
fast path allocations.
For the slow path the performance is anyway determined by the amount of
reclaim involved rather than by what allocator is used.
Replace use of __get_free_page() with kmalloc() and free_page() with
kfree().
s390/qeth: Replace get_zeroed_page() with kzalloc()
qeth_get_trap_id() allocates a temporary buffer for STSI system
information queries used to build trap identification strings.
This buffer can be allocated with kmalloc() as there's nothing special
about it to go directly to the page allocator.
kmalloc() provides a better API that does not require ugly casts and
kfree() does not need to know the size of the freed object.
Performance difference between kmalloc() and __get_free_pages() is not
measurable as both allocators take an object/page from a per-CPU list for
fast path allocations.
For the slow path the performance is anyway determined by the amount of
reclaim involved rather than by what allocator is used.
Replace use of get_zeroed_page() with kzalloc() and free_page() with
kfree().
s390/hvc_iucv: Replace get_zeroed_page() with kzalloc()
hvc_iucv_alloc() allocates a send staging buffer for accumulating
outbound terminal characters before they are copied into a separate
IUCV message buffer for transmission to the hypervisor. The staging
buffer itself is never passed to any IUCV function.
This buffer can be allocated with kmalloc() as there's nothing special
about it to go directly to the page allocator.
kmalloc() provides a better API that does not require ugly casts and
kfree() does not need to know the size of the freed object.
Performance difference between kmalloc() and __get_free_pages() is not
measurable as both allocators take an object/page from a per-CPU list for
fast path allocations.
For the slow path the performance is anyway determined by the amount of
reclaim involved rather than by what allocator is used.
Replace use of get_zeroed_page() with kzalloc() and free_page() with
kfree().
s390/dasd: Replace get_zeroed_page() with kzalloc()
DASD driver uses get_zeroed_page() to allocate pages for the Extended Error
Reporting software ring buffer and for a scratch buffer for formatting
sense dump diagnostic text.
These buffers can be allocated with kmalloc() as there's nothing special
about it to go directly to the page allocator.
kmalloc() provides a better API that does not require ugly casts and
kfree() does not need to know the size of the freed object.
Performance difference between kmalloc() and __get_free_pages() is not
measurable as both allocators take an object/page from a per-CPU list for
fast path allocations.
For the slow path the performance is anyway determined by the amount of
reclaim involved rather than by what allocator is used.
Replace use of get_zeroed_page() with kzalloc() and free_page() with
kfree().
s390/con3270: Replace __get_free_page() with kmalloc()
con3270_alloc_view() allocates a staging buffer used to assemble
3270 datastream content before it is copied into channel program
requests.
This buffer can be allocated with kmalloc() as there's nothing special
about it to go directly to the page allocator.
kmalloc() provides a better API that does not require ugly casts and
kfree() does not need to know the size of the freed object.
Performance difference between kmalloc() and __get_free_pages() is not
measurable as both allocators take an object/page from a per-CPU list for
fast path allocations.
For the slow path the performance is anyway determined by the amount of
reclaim involved rather than by what allocator is used.
Replace use of __get_free_page() with kmalloc() and free_page() with
kfree().
John Hubbard [Wed, 3 Jun 2026 07:30:22 +0000 (16:30 +0900)]
gpu: nova-core: Hopper/Blackwell: select FSP Chain of Trust version
The FSP Chain of Trust handshake is versioned: Hopper speaks version 1
and Blackwell speaks version 2. Provide the version through the FSP HAL
so the boot message carries the value FSP expects, and so chipsets that
do not use FSP need not express a version at all.
FSP exchanges are request/response: the driver sends an MCTP/NVDM
message and must match the reply against the request before acting on
it. Add the synchronous send-and-wait path that validates the response
transport and message headers and confirms the reply corresponds to the
request that was sent.
John Hubbard [Wed, 3 Jun 2026 07:30:20 +0000 (16:30 +0900)]
gpu: nova-core: add MCTP/NVDM protocol types for firmware communication
Add the MCTP (Management Component Transport Protocol) and NVDM (NVIDIA
Data Model) wire-format types used for communication between the kernel
driver and GPU firmware processors.
This includes typed MCTP transport headers, NVDM message headers, and
NVDM message type identifiers. Both the FSP boot path and the upcoming
GSP RPC message queue share this protocol layer.
FSP communication uses a pair of non-circular queues in the FSP
falcon's EMEM, one for messages from the driver to FSP and one for
replies, with the driver polling for response data. Add the queue
registers and the low-level helpers used by the higher-level FSP
message layer.
Signed-off-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Eliot Courtney <ecourtney@nvidia.com> Link: https://patch.msgid.link/20260603-b4-blackwell-v13-2-d9f3a06939e0@nvidia.com
[acourbot: align register fields names with OpenRM.]
[acourbot: represent registers as arrays of 8 instances, as per OpenRM.] Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Add external memory (EMEM) read/write operations to the GPU's FSP falcon
engine. These operations use Falcon PIO (Programmed I/O) to communicate
with the FSP through indirect memory access.
Jinyu Tang [Sun, 17 May 2026 15:34:27 +0000 (23:34 +0800)]
KVM: riscv: Fast-path dirty logging write faults
With dirty logging enabled, guest writes often fault on an existing 4K
G-stage leaf that was write-protected only for dirty tracking. The slow
path still performs the full fault handling flow and takes mmu_lock for
write, even though the page-table shape does not change.
x86 handles the analogous case in its fast page fault path by atomically
making a writable SPTE writable again when the fault is only a
write-protection fault. Add the same style of fast path for RISC-V. If a
write fault hits an existing 4K leaf in a writable dirty-log memslot,
mark the page dirty and atomically set the PTE writable and dirty under
the read side of mmu_lock.
The dirty bitmap is updated before the PTE becomes writable again. The
PTE D bit is also set so systems that trap on a clear D bit do not fall
back to the slow path for a writable but clean PTE.
When a fault hits an existing G-stage leaf with the same PFN, KVM only
needs to update the PTE permissions. This path will be used by read-side
fault handling, so it must not overwrite a concurrent PTE update.
Use the cmpxchg helper when relaxing permissions on an existing leaf,
following the same concurrency model used by x86 for atomic SPTE
permission updates. Retry if another CPU changed the PTE first, and use
cpu_relax() while spinning.
Jinyu Tang [Sun, 17 May 2026 15:34:25 +0000 (23:34 +0800)]
KVM: riscv: Add a G-stage PTE cmpxchg helper
Permission-only G-stage PTE updates can run in parallel once they are
moved to the read side of mmu_lock. Plain set_pte() is not enough for
that case because another CPU may update the same PTE first.
x86 handles the same class of SPTE races with cmpxchg-based updates in
its fast page fault and TDP MMU paths. Add a small RISC-V helper for
atomic G-stage PTE updates. The helper reports contention to the caller
and flushes the target range only when the PTE value actually changes.
Jinyu Tang [Sun, 17 May 2026 15:34:24 +0000 (23:34 +0800)]
KVM: riscv: Use an rwlock for mmu_lock
RISC-V KVM currently uses a spinlock for mmu_lock. That serializes all
G-stage MMU operations, including permission-only updates that do not
allocate or free page-table pages.
Use KVM's rwlock form of mmu_lock, as x86 and arm64 already do. Keep the
existing map, unmap and teardown paths on the write side. This prepares
RISC-V for read-side handling of G-stage permission updates.
Yosry Ahmed [Thu, 28 May 2026 23:10:52 +0000 (16:10 -0700)]
KVM: selftests: Add a test for gPAT handling in L2
When KVM_X86_QUIRK_NESTED_SVM_SHARED_PAT is disabled, verify that KVM
correctly virtualizes the host PAT MSR and the guest PAT register for
nested SVM guests.
With nested NPT disabled:
* L1 and L2 share the same PAT
* The vmcb12.g_pat is ignored
With nested NPT enabled:
* An invalid g_pat in vmcb12 causes VMEXIT_INVALID
* RDMSR(IA32_PAT) from L2 returns the value of the guest PAT register
* WRMSR(IA32_PAT) from L2 is reflected in vmcb12's g_pat on VMEXIT
* RDMSR(IA32_PAT) from L1 returns the value of the host PAT MSR
Verify that save/restore with the vCPU in guest mode behaves as expected in
both cases, e.g. preserves both hPAT and gPAT when NPT is enabled.
Originally-by: Jim Mattson <jmattson@google.com> Signed-off-by: Yosry Ahmed <yosry@kernel.org>
[sean: use even fancier macro shenanigans] Link: https://patch.msgid.link/20260528231052.404737-1-seanjc@google.com
[sean: avoid use of goto, print skips] Signed-off-by: Sean Christopherson <seanjc@google.com>
KVM: selftests: Add guest_memfd regression test signed offset+size bug
Add a regression (and proof-of-bug) testcase to ensure KVM rejects an
offset+size that would result in a negative value when computed as a signed
64-bit value. KVM had a flaw where it would allow binding a memslot to a
guest_memfd instance even with a wildly out-of-range offset, if the offset
and size were both positive values, but the combined offset+size was
negative.
Use "0x7fffffffffffffffull - page_size", i.e. "INT64_MAX - page_size", for
the offset as the size of the guest_memfd file must be at least page_size
(KVM requires memslots and gmem files to be host page-size aligned). I.e.
"INT64_MAX - page_size + size" is guaranteed to generate an offset+size
that is negative when converted to a signed 64-bit value *and* honors KVM's
alignment requirements.
KVM: selftests: Expand the guest_memfd test macros to allow passing the VM
Expand the gmem test macros to allow passing the VM to testcases, without
needing to plumb the VM into _every_ testcase, as the vast majority of
testcases only need the fd and size.
KVM: guest_memfd: Treat memslot binding offset+size as unsigned values
When binding a memslot to a guest_memfd file, treat the offset and size as
unsigned values to fix a bug where the sum of the two can result in a false
negative when checking for overflow against the size of the file. Passing
unsigned values also avoids relying on somewhat obscure checks in other
flows for safety, and tracks the offset and size as they are intended to be
tracked, as unsigned values.
On 64-bit kernels, the number of pages a memslot contains and thus the size
(and offset) of its guest_memfd binding are unsigned 64-bit values. Taking
the offset+size as an loff_t instead of a uoff_t inadvertently converts
the unsigned value to a signed value if the offset and/or size is massive.
Locally storing the offset and size as signed values is benign in and of
itself (though even that is *extremely* difficult to discern), but
operating on their sum is not.
For the offset, KVM explicitly checks against a negative value, which might
seem like a bug as KVM could incorrectly reject a legitimate binding, but
that's not actually the case as KVM_CREATE_GUEST_MEMFD takes a signed value
for its size, i.e. a would-be-negative offset is also greater than the
maximum possible size of any guest_memfd file.
Regarding the size, while KVM lacks an explicit check for a negative value,
i.e. seemingly has a flawed overflow check, KVM restricts the number of
pages in a single memslot to the largest positive signed 32-bit value:
if (id < KVM_USER_MEM_SLOTS &&
(mem->memory_size >> PAGE_SHIFT) > KVM_MEM_MAX_NR_PAGES)
return -EINVAL;
and so that maximum "size" will ever be is 0x7fffffff000.
The sum of the two is, however, problematic. While the size is restricted
by KVM's memslot logic, the offset is not, i.e. the offset is completely
unchecked until the "offset + size > i_size_read(inode)" check. If the
offset is the (nearly) largest possible _positive_ value, then adding size
to the offset can result in a signed, negative 64-bit value. When compared
against the size of the file (guaranteed to be positive), the negative sum
is always smaller, and KVM incorrectly allows the absurd offset.
Opportunistically add missing includes in kvm_mm.h (instead of relying on
its parents).
Fixes: a7800aa80ea4 ("KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory") Cc: stable@vger.kernel.org Cc: Ackerley Tng <ackerleytng@google.com> Reviewed-by: Michael Roth <michael.roth@amd.com> Reviewed-by: Ackerley Tng <ackerleytng@google.com> Link: https://patch.msgid.link/20260602170921.1304394-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
Jason Gunthorpe [Mon, 1 Jun 2026 16:52:31 +0000 (13:52 -0300)]
RDMA/umem: Fix truncation for block sizes >= 4G
When the iommu is used the linearization of the mapping can give a single
block that is very large split across multiple SG entries.
When __rdma_block_iter_next() reassembles the split SG entries it is
overflowing the 32 bit stack values and computed the wrong DMA addresses
for blocks after the truncation.
KVM: x86: Move async #PF helpers to x86.h (as inlines)
Move kvm_pv_async_pf_enabled() and kvm_async_pf_hash_reset() to x86.h in
anticipation of extracting the majority of register and MSR specific code
out of x86.c.
Move update_cr8_intercept() to lapic.c so that it's globally visible
in anticipation of extracting most of the register-specific code out of
x86.c and into a new compilation unit. Opportunistically prefix the
helper kvm_lapic_ to make its role/scope more obvious.
KVM: x86: Harden is_64_bit_hypercall() against bugs on 32-bit kernels
Unconditionally return %false for is_64_bit_hypercall() on 32-bit kernels
to guard against incorrectly setting guest_state_protected, and because
in a (very) hypothetical world where 32-bit KVM supports protected guests,
assuming a hypercall was made in 64-bit mode is flat out wrong.
KVM: nSVM: Use kvm_rax_read() now that it's mode-aware
Now that kvm_rax_read() truncates the output value to 32 bits if the
vCPU isn't in 64-bit mode, use it instead of the more verbose (and very
technically slower) kvm_register_read().
Note! VMLOAD, VMSAVE, and VMRUN emulation are still technically buggy,
as they can use EAX (versus RAX) in 64-bit mode via an operand size
prefix. Don't bother trying to handle that case, as it would require
decoding the code stream, which would open an entirely different can of
worms, and in practice no sane guest would shove garbage into RAX[63:32]
and then execute VMLOAD/VMSAVE/VMRUN with just EAX.
Drop the non-raw, mode-aware kvm_<reg>_write() helpers as there is no
usage in KVM, and in all likelihood there will never be usage in KVM as
use of hardcoded registers in instructions is uncommon, and *modifying*
hardcoded registers is practically unheard of. While there are a few
instructions that modify registers in mode-aware ways, e.g. REP string
and some ENCLS varieties, the odds of KVM needing to emulate such
instructions (outside of the fully emulator) are vanishingly small.
Drop kvm_<reg>_write() to prevent incorrect usage; _if_ a new instruction
comes along that needs to modify a hardcoded register, this can be
reverted.
KVM: x86: Add mode-aware versions of kvm_<reg>_{read,write}() helpers
Make kvm_<reg>_{read,write}() mode-aware (where the value is truncated to
32 bits if the vCPU isn't in 64-bit mode), and convert all the intentional
"raw" accesses to kvm_<reg>_{read,write}_raw() versions. To avoid
confusion and bikeshedding over whether or not explicit 32-bit accesses
should use the "raw" or mode-aware variants, add and use "e" versions, e.g.
for things like RDMSR, WRMSR, and CPUID, where the instruction uses only
bits 31:0, regardless of mode.
No functional change intended (all use of "e" versions is for cases where
the value is already truncated due to bouncing through a u32).
Cc: Binbin Wu <binbin.wu@linux.intel.com> Cc: Kai Huang <kai.huang@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20260529222223.870923-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
KVM: x86: Move inlined GPR, CR, and DR helpers from x86.h to regs.h
Move inlined General Purpose Register, Control Register, and Debug
Register helpers from x86.h to the aptly named regs.h, to help trim
down x86.h (and x86.c in the future).
Move *very* select EFER functionality as well, but leave behind the bulk of
EFER handling and all other MSR handling. There is more than enough MSR
code to carve out msrs.{c,h} in the future. Give is_long_bit_mode()
special treatment as it's more along the lines of a CR4 bit check, but just
happens to be accessed through an MSR interface. And more importantly,
because giving regs.h access to is_long_bit_mode() greatly simplifies
dependency chains.
Rename kvm_cache_regs.h to simply regs.h, as the "cache" nomenclature is
already a lie (the file deals with state/registers that aren't cached per
se), and so that more code/functionality can be landed in the header
without making it a truly horrible misnomer.
Deliberately drop the kvm_ prefix/namespace to align with other "local"
headers, and to further differentiate regs.h from the public/global
arch/x86/include/asm/kvm_vcpu_regs.h, which sadly needs to stay in asm/
so that the number of registers can be referenced by kvm_vcpu_arch.
KVM: x86: Trace hypercall register *after* truncating values for 32-bit
When tracing hypercalls, invoke the tracepoint *after* truncating the
register values for 32-bit guests so as not to record unused garbage (in
the extremely unlikely scenario that the guest left garbage in a register
after transitioning from 64-bit mode to 32-bit mode).
Fixes: 229456fc34b1 ("KVM: convert custom marker based tracing to event traces") Reviewed-by: Yosry Ahmed <yosry@kernel.org> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Link: https://patch.msgid.link/20260529222223.870923-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
KVM: VMX: Read 32-bit GPR values for ENCLS instructions outside of 64-bit mode
When getting register values for ENCLS emulation, use kvm_register_read()
instead of kvm_<reg>_read() so that bits 63:32 of the register are dropped
if the guest is in 32-bit mode.
Note, the misleading/surprising behavior of kvm_<reg>_read() being "raw"
variants under the hood will be addressed once all non-benign bugs are
fixed.
Fixes: 70210c044b4e ("KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions") Fixes: b6f084ca5538 ("KVM: VMX: Add ENCLS[EINIT] handler to support SGX Launch Control (LC)") Acked-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Link: https://patch.msgid.link/20260529222223.870923-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
KVM: x86/xen: Don't truncate RAX when handling hypercall from protected guest
Don't truncate RAX when handling a Xen hypercall for a guest with protected
state, as KVM's ABI is to assume the guest is in 64-bit for such cases
(the guest leaving garbage in 63:32 after a transition to 32-bit mode is
far less likely than 63:32 being necessary to complete the hypercall).
Fixes: b5aead0064f3 ("KVM: x86: Assume a 64-bit hypercall for guests with protected state") Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Link: https://patch.msgid.link/20260529222223.870923-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
KVM: x86/xen: Bug the VM if 32-bit KVM observes a 64-bit mode hypercall
Bug the VM if 32-bit KVM attempts to handle a 64-bit hypercall, primarily
so that a future change to set "input" in mode-specific code doesn't
trigger a false positive warn=>error:
arch/x86/kvm/xen.c:1687:6: error: variable 'input' is used uninitialized
whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]
1687 | if (!longmode) {
| ^~~~~~~~~
arch/x86/kvm/xen.c:1708:31: note: uninitialized use occurs here
1708 | trace_kvm_xen_hypercall(cpl, input, params[0], params[1], params[2],
| ^~~~~
x86/kvm/xen.c:1687:2: note: remove the 'if' if its condition is always true
1687 | if (!longmode) {
| ^~~~~~~~~~~~~~
arch/x86/kvm/xen.c:1677:11: note: initialize the variable 'input' to silence this warning
1677 | u64 input, params[6], r = -ENOSYS;
| ^
1 error generated.
Note, params[] also has the same flaw, but -Wsometimes-uninitialized
doesn't seem to be enforced for arrays, presumably because it's difficult
to avoid false positives on specific entries.
KVM: SVM: Truncate INVLPGA address in compatibility mode
Check for full 64-bit mode, not just long mode, when truncating the
virtual address as part of INVLPGA emulation. Compatibility mode doesn't
support 64-bit addressing.
Note, the FIXME still applies, e.g. if the guest deliberately targeted
EAX while in 64-bit via an address size override. That flaw isn't worth
fixing as it would require decoding the code stream, which would open an
entirely different can of worms, and in practice no sane guest would shove
garbage into RAX[63:32] and execute INVLPGA.
Note #2, VMSAVE, VMLOAD, and VMRUN all suffer from the same architectural
flaw of not providing the full linear address in a VMCB exit information
field, because, quoting the APM verbatim:
the linear address is available directly from the guest rAX register
(VMSAVE, VMLOAD, and VMRUN take a physical address, but their behavior
with respect to rAX is otherwise identical).
Fixes: bc9eff67fc35 ("KVM: SVM: Use default rAX size for INVLPGA emulation") Reviewed-by: Yosry Ahmed <yosry@kernel.org> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Link: https://patch.msgid.link/20260529222223.870923-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
cfg_setup() dereferences drvdata.hdev to issue MCU command requests.
hid_go_cfg_remove() tears down sysfs and stops the HID device, but
never drains the delayed work. If the device is unbound within the
2 ms scheduling delay (a probe failure rolling back via remove, or a
fast rmmod after probe), the work fires after hid_destroy_device()
has dropped its reference and released the underlying hdev struct,
leaving cfg_setup() with a stale drvdata.hdev pointer.
Mirror the sibling driver hid-lenovo-go-s.c, whose hid_gos_cfg_remove()
already calls cancel_delayed_work_sync() on its analogous work, and
drain go_cfg_setup at the top of hid_go_cfg_remove(). The cancel
must come before guard(mutex)(&drvdata.cfg_mutex) because cfg_setup()
acquires that mutex; reversing the order would deadlock.
Linus Walleij [Wed, 3 Jun 2026 12:17:39 +0000 (14:17 +0200)]
Merge tag 'stm32-dt-for-7.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/atorgue/stm32 into soc/dt
STM32 DT for v7.2, round 1
Highlights:
----------
- MPU:
- STM32MP13:
- Enable PHY SSC (Spread Spectrum) on DHCORE DHSBC board.
- Add board pin documentation stm32mp135f-dk to help user.
- STMP32MP15:
- Protonic:
- Update MECIOR0 ans MECIOR1 boards:
- Define ADC channels and GPIO line definitions in board and
no longer in common file.
- Fix ADC sampling.
- STM32MP25:
- Fix SAI addresses.
* tag 'stm32-dt-for-7.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/atorgue/stm32:
arm64: dts: st: Fix SAI addresses on stm32mp251
ARM: dts: stm32: stm32mp15x-mecio1-io: Move expander gpio-line-names to board files
ARM: dts: stm32: stm32mp15x-mecio1-io: Fix expander gpio line typo
ARM: dts: stm32: stm32mp15x-mecio1-io: Move gpio-line-names to board files
ARM: dts: stm32: stm32mp15x-mecio1-io: Fix GPIO names typo
ARM: dts: stm32: stm32mp15x-mecio1-io: Move divergent mecio1 ADC channels to board files
ARM: dts: stm32: stm32mp15x-mecio1-io: Fix ADC sampling times
ARM: dts: stm32: stm32mp15x-mecio1-io: Enable internal ADC reference
ARM: dts: stm32: add board pin documentation stm32mp135f-dk
ARM: dts: stm32: Enable PHY SSC on DH STM32MP13xx DHCOR DHSBC board
Linus Walleij [Wed, 3 Jun 2026 12:17:06 +0000 (14:17 +0200)]
Merge tag 'samsung-dt64-7.2' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux into soc/dt
Samsung DTS ARM64 changes for v7.2
1. Exynos850: Implement proper power off to fully shutdown the board and
reduce drawn current.
2. ExynosAutov920: Add UFS storage.
3. Add Peter Griffin as a co-maintainer.
I have multiple subsystems to care of and limited time. Also, with
joining to SoC team I figured out it is good to plan my succession.
Or backup.
Peter shown both time and interest in keeping Samsung Exynos code
working. He already works on and maintains Google Tensor SoC, which
shares a lot with Samsung Exynos processors.
Considering all this, I proposed Peter to become a co-maintainer here
(same for pinctrl, which went via different tree).
I will still be the one handling patches for this and (probably) next
cycle, but in a further timeframe the roles could reverse with me
only providing acks or reviews. If this works then depending on other
duties and amount of work, I might be slowly transitioning to leave
Samsung SoC maintainership.
* tag 'samsung-dt64-7.2' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux:
MAINTAINERS: Add Peter Griffin as a co-maintainer of Samsung Exynos SoCs
arm64: dts: exynos: Add EL2 virtual timer interrupt
arm64: dts: exynosautov920: enable support for ufs controller
arm64: dts: exynosautov920: Add syscon hsi2 node
arm64: dts: exynos850: Add syscon-poweroff node
Linus Walleij [Wed, 3 Jun 2026 12:13:41 +0000 (14:13 +0200)]
Merge tag 'tegra-for-7.2-arm-dt' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into soc/dt
ARM: tegra: Device tree changes for v7.2-rc1
The bulk of this is various improvements for some of the older ASUS and
LG devices, but there's also support for interconnects on Tegra114 to
help improve memory frequency scaling.
* tag 'tegra-for-7.2-arm-dt' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
ARM: tegra: tf600t: Invert accelerometer calibration matrix
ARM: tegra: tf600t: Drop backlight regulator
ARM: tegra: tf600t: Configure panel
ARM: tegra: transformers: Add connector node for common trees
ARM: tegra: transformer: Add support for front camera
ARM: tegra: grouper: Add support for front camera
ARM: tegra: p880: Lower CPU thermal limit
ARM: tegra: lg-x3: Set PMIC's RTC address
ARM: tegra: lg-x3: Complete video device graph
ARM: tegra: Configure Tegra114 power domains
ARM: tegra: Add DC interconnections for Tegra114
ARM: tegra: Add EMC OPP and ICC properties to Tegra114 EMC and ACTMON device-tree nodes
ARM: tegra: Add #{address,size}-cells to Chromium-based /firmware
dt-bindings: memory: Document Tegra114 External Memory Controller
dt-bindings: memory: Document Tegra114 Memory Controller
Johannes Berg [Fri, 29 May 2026 08:25:08 +0000 (10:25 +0200)]
wifi: mac80211: AP: handle DBE for clients
In AP mode, track the BSS non-DBE bandwidth and apply
that to all non-DBE clients, then track OMP updates
from the clients and enable/disable DBE accordingly.
For now don't send a response, clients need to have a
timer anyway (it's up to the driver to set the right
timeout in UHR capabilities.)
Johannes Berg [Fri, 29 May 2026 08:25:07 +0000 (10:25 +0200)]
wifi: mac80211: parse and apply UHR DBE channel
When a UHR AP has DBE enabled, parse the channel and apply it
to the chandef. Apply for TX only after the OMP response (or
timeout) so that the AP doesn't receive frames with DBE width
before the station completed transition to DBE.