Fixing an integer overflow present in batadv_iv_ogm_send_to_if. The size
check is done using the int type in batadv_iv_ogm_aggr_packet whereas the
buff_pos variable uses the s16 type. This could lead to an out-of-bound
read.
Linus Torvalds [Sat, 2 May 2026 19:31:43 +0000 (12:31 -0700)]
Merge tag 'v7.1-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
- Reject algorithms with authsizes that are too short in authencesn
* tag 'v7.1-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: authencesn - reject short ahash digests during instance creation
Linus Torvalds [Sat, 2 May 2026 19:25:57 +0000 (12:25 -0700)]
Merge tag 'ntfs-for-7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/ntfs
Pull ntfs fixes from Namjae Jeon:
- Fix a NULL pointer dereference in ntfs_index_walk_down() by
validating index block allocation
- Fix a memory leak of the symlink target string in
ntfs_reparse_set_wsl_symlink() during error paths
- Prevent VCN overflow and validate lowest_vcn in
ntfs_mapping_pairs_decompress() to avoid runlist corruption
- Fix a page reference leak in ntfs_write_iomap_end_resident()
when attribute search context allocation fails
- Fix an invalid PTR_ERR() usage on a valid folio pointer in
__ntfs_bitmap_set_bits_in_run()
- Correct directory link counting by dropping nlink only when
the MFT record link count reaches zero for WIN32/DOS aliases
- Fix an uninitialized variable in ntfs_mapping_pairs_decompress()
by returning an error pointer directly
* tag 'ntfs-for-7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/ntfs:
ntfs: Use return instead of goto in ntfs_mapping_pairs_decompress()
ntfs: drop nlink once for WIN32/DOS aliases
ntfs: fix invalid PTR_ERR() usage in __ntfs_bitmap_set_bits_in_run()
ntfs: fix error handling in ntfs_write_iomap_end_resident()
ntfs: fix VCN overflow in ntfs_mapping_pairs_decompress()
ntfs: fix WSL symlink target leak on reparse failure
ntfs: fix NULL dereference in ntfs_index_walk_down()
Jason Gunthorpe [Tue, 28 Apr 2026 16:17:48 +0000 (13:17 -0300)]
RDMA/hns: Fix unlocked call to hns_roce_qp_remove()
Sashiko points out that hns_roce_qp_remove() requires the caller to hold
locks. The error flow in hns_roce_create_qp_common() doesn't hold those
locks for the error unwind so it risks corrupting memory.
Jason Gunthorpe [Tue, 28 Apr 2026 16:17:47 +0000 (13:17 -0300)]
RDMA/hns: Fix xarray race in hns_roce_create_qp_common()
Similar to the SRQ case the hr_qp is stored in the xarray before it is
fully initialized. Unlike the SRQ case the error unwinds do not wait for
the completion so keep the refcount 0 until the function succeeds.
Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver") Link: https://patch.msgid.link/r/14-v1-41f3135e5565+9d2-rdma_ai_fixes1_jgg@nvidia.com Suggested-by: Junxian Huang <huangjunxian6@hisilicon.com> Reviewed-by: Junxian Huang <huangjunxian6@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Which will fail refcount debug because the refcount is 0 and then crash:
srq->event(srq, event_type);
Because event is NULL.
Use refcount_inc_not_zero() instead to ensure a partially prepared srq is
never retrieved from the event handler and fix the ordering of the
initialization so refcount becomes 1 only after it is fully ready.
All the initialization must be done before calling free_srqc() since it
depends on the completion and refcount.
Jason Gunthorpe [Tue, 28 Apr 2026 16:17:45 +0000 (13:17 -0300)]
RDMA/mlx4: Fix mis-use of RCU in mlx4_srq_event()
Sashiko points out the radix_tree itself is RCU safe, but nothing ever
frees the mlx4_srq struct with RCU, and it isn't even accessed within the
RCU critical section. It also will crash if an event is delivered before
the srq object is finished initializing.
Use the spinlock since it isn't easy to make RCU work, use
refcount_inc_not_zero() to protect against partially initialized objects,
and order the refcount_set() to be after the srq is fully initialized.
Jason Gunthorpe [Tue, 28 Apr 2026 16:17:42 +0000 (13:17 -0300)]
RDMA/ocrdma: Don't NULL deref uctx on errors in ocrdma_copy_pd_uresp()
Sashiko points out that pd->uctx isn't initialized until late in the
function so all these error flow references are NULL and will crash. Use
the uctx that isn't NULL.
Jason Gunthorpe [Tue, 28 Apr 2026 16:17:41 +0000 (13:17 -0300)]
RDMA/ocrdma: Clarify the mm_head searching
The intention of this code is to find matching entries exactly, the driver
never creates phys_addr's with different lens so the current expression is
not a bug, but it doesn't make sense and confuses review tooling.
Jason Gunthorpe [Tue, 28 Apr 2026 16:17:38 +0000 (13:17 -0300)]
RDMA/mana: Remove user triggerable WARN_ON() in mana_ib_create_qp_rss()
Sashiko points out that the user can specify WQs sharing the same CQ as a
part of the uAPI and this will trigger the WARN_ON() then go on to corrupt
the kernel.
Jason Gunthorpe [Tue, 28 Apr 2026 16:17:37 +0000 (13:17 -0300)]
RDMA/mana: Validate rx_hash_key_len
Sashiko points out that rx_hash_key_len comes from a uAPI structure and is
blindly passed to memcpy, allowing the userspace to trash kernel
memory. Bounds check it so the memcpy cannot overflow.
Jason Gunthorpe [Tue, 28 Apr 2026 16:17:36 +0000 (13:17 -0300)]
RDMA/mlx5: Add missing store/release for lock elision pattern
mlx5 has a common pattern implementing a device-global singleton resource
where it checks the resource pointer for !NULL and then skips obtaining
the lock.
This is not ordered properly as observing !NULL doesn't mean that all the
data under that pointer is also visible on this CPU when the lock is not
taken.
Use a release/acquire pairing to explicitly manage this.
ip6_gre: Use cached t->net in ip6erspan_changelink().
After commit 5e72ce3e3980 ("net: ipv6: Use link netns in newlink() of
rtnl_link_ops"), ip6erspan_newlink() correctly resolves the per-netns
ip6gre hash via link_net. ip6erspan_changelink() was not converted in
that series and still uses dev_net(dev), which diverges from the
device's creation netns after IFLA_NET_NS_FD migration.
This re-inserts the tunnel into the wrong per-netns hash. The
original netns keeps a stale entry. When that netns is later
destroyed, ip6gre_exit_rtnl_net() walks the stale entry, producing a
slab-use-after-free reported by KASAN, followed by a kernel BUG at
net/core/dev.c (LIST_POISON1) in unregister_netdevice_many_notify().
Reachable from an unprivileged user namespace (unshare --user
--map-root-user --net).
ip6gre_changelink() earlier in the same file already uses the cached
t->net; only ip6erspan_changelink() has the wrong shape.
====================
Replace direct dequeue call with qdisc_dequeue_peeked
When sfb and red qdiscs have children (eg qfq qdisc) whose peek() callback is
qdisc_peek_dequeued(), we could get a kernel panic. When the parent of such
qdiscs (eg illustrated in patch #3 as tbf) wants to retrieve an skb from
its child (red/sfb in this case), it will do the following:
1a. do a peek() - and when sensing there's an skb the child can offer, then
- the child in this case(red/sfb) calls its child's (qfq) peek.
qfq does the right thing and will return the gso_skb queue packet.
Note: if there wasnt a gso_skb entry then qfq will store it there.
1b. invoke a dequeue() on the child (red/sfb). And herein lies the problem.
- red/sfb will call the child's dequeue() which will essentially just
try to grab something of qfq's queue.
The right thing to do in #1b is to grab the skb off gso_skb queue.
This patchset fixes that issue by changing #1b to use qdisc_dequeue_peeked()
method instead.
Patch 1 fixes the issue for red qdisc. Patch 2 fixes it for sfb.
Patch 3 adds testcases for the two setups.
====================
Victor Nogueira [Thu, 30 Apr 2026 15:29:57 +0000 (11:29 -0400)]
selftests/tc-testing: Add tests that force red and sfb to dequeue from child's gso_skb
Create 4 test cases:
- Force red to dequeue from its child's gso_skb with qfq leaf
- Force sfb to dequeue from its child's gso_skb with qfq leaf
- Force red to dequeue from its child's gso_skb with dualpi2 leaf
- Force sfb to dequeue from its child's gso_skb with dualpi2 leaf
All of them have tbf followed by red (or sfb) followed by qfq (or
dualpi2). Since tbf calls its child's peek followed by
qdisc_dequeue_peeked, it will force red/sfb to call their child's peek.
In this case, since the child (qfq/dualpi2) has qdisc_peek_dequeued as
its peek callback, the packet will be stored in its gso_skb queue. During
the subsequent call to qdisc_dequeue_peeked, red/sfb will have to dequeue
from the child's gso_skb to retrieve the packet.
Not doing so will cause a NULL ptr deref which was happening before a
recent fix.
Victor Nogueria [Thu, 30 Apr 2026 15:29:56 +0000 (11:29 -0400)]
net/sched: sch_sfb: Replace direct dequeue call with peek and qdisc_dequeue_peeked
When sfb has children (eg qfq qdisc) whose peek() callback is
qdisc_peek_dequeued(), we could get a kernel panic. When the parent of such
qdiscs (eg illustrated in patch #3 as tbf) wants to retrieve an skb from
its child (sfb in this case), it will do the following:
1a. do a peek() - and when sensing there's an skb the child can offer, then
- the child in this case(sfb) calls its child's (qfq) peek.
qfq does the right thing and will return the gso_skb queue packet.
Note: if there wasnt a gso_skb entry then qfq will store it there.
1b. invoke a dequeue() on the child (sfb). And herein lies the problem.
- sfb will call the child's dequeue() which will essentially just
try to grab something of qfq's queue.
The right thing to do in #1b is to grab the skb off gso_skb queue.
This patchset fixes that issue by changing #1b to use qdisc_dequeue_peeked()
method instead.
Fixes: e13e02a3c68d ("net_sched: SFB flow scheduler") Signed-off-by: Victor Nogueria <victor@mojatatu.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260430152957.194015-3-jhs@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net/sched: sch_red: Replace direct dequeue call with peek and qdisc_dequeue_peeked
When red qdisc has children (eg qfq qdisc) whose peek() callback is
qdisc_peek_dequeued(), we could get a kernel panic. When the parent of such
qdiscs (eg illustrated in patch #3 as tbf) wants to retrieve an skb from
its child (red in this case), it will do the following:
1a. do a peek() - and when sensing there's an skb the child can offer, then
- the child in this case(red) calls its child's (qfq) peek.
qfq does the right thing and will return the gso_skb queue packet.
Note: if there wasnt a gso_skb entry then qfq will store it there.
1b. invoke a dequeue() on the child (red). And herein lies the problem.
- red will call the child's dequeue() which will essentially just
try to grab something of qfq's queue.
The right thing to do in #1b is to grab the skb off gso_skb queue.
This patchset fixes that issue by changing #1b to use qdisc_dequeue_peeked()
method instead.
Fixes: 77be155cba4e ("pkt_sched: Add peek emulation for non-work-conserving qdiscs.") Reported-by: Manas <ghandatmanas@gmail.com> Reported-by: Rakshit Awasthi <rakshitawasthi17@gmail.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260430152957.194015-2-jhs@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
XGBE_PTP_ACT_CLK_FREQ and XGBE_V2_PTP_ACT_CLK_FREQ were 10x too
large (500MHz/1GHz instead of 50MHz/100MHz), causing the computed
addend to overflow the 32-bit tstamp_addend. In the general case
this would result in the clock advancing at the wrong rate. For v2
(PCI), ptpclk_rate is hardcoded to 125MHz, so the addend formula
(ACT_CLK_FREQ << 32) / ptpclk_rate yields exactly 8 * 2^32, and
when stored to the 32-bit tstamp_addend the value is zero. With
addend = 0 the hardware accumulator never overflows and the PTP
clock is fully stopped. For v1 (platform), ptpclk_rate is read from
ACPI/DT so the exact overflow behavior depends on the
firmware-reported frequency.
Define the constants as NSEC_PER_SEC / SSINC so the relationship is
explicit and cannot drift out of sync.
If the driver is used in a non tdm mode priv->utdm is a NULL pointer.
Therefore we need to check this pointer first before checking si_regs.
Fixes: c19b6d246a35 ("drivers/net: support hdlc function for QE-UCC") Signed-off-by: Holger Brunck <holger.brunck@hitachienergy.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Unmapping of uf_regs is done from ucc_fast_free and doesn't need to be
done explicitly. If already unmapped ucc_fast_free will crash.
Fixes: c19b6d246a35 ("drivers/net: support hdlc function for QE-UCC") Signed-off-by: Holger Brunck <holger.brunck@hitachienergy.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paul Chaignon [Sat, 2 May 2026 10:12:40 +0000 (12:12 +0200)]
tools/headers: Regenerate stddef.h to fix BPF selftests
With commit dacbfc167808 ("crypto: af_alg - Annotate struct af_alg_iv
with __counted_by"), two selftests, test_tag and crypto_sanity, now
indirectly rely on the __counted_by macro. On systems with commit dacbfc167808 in the installed UAPI headers, the selftests build fails
with:
In file included from tools/testing/selftests/bpf/prog_tests/crypto_sanity.c:7:
/usr/include/linux/if_alg.h:45:22: error: expected ‘:’, ‘,’, ‘;’, ‘}’ or ‘__attribute__’ before ‘__counted_by’
45 | __u8 iv[] __counted_by(ivlen);
| ^~~~~~~~~~~~
This patch fixes it by regenerating stddef.h in tools/include using the
instructions from commit a778f5d46b62 ("tools/headers: Pull in stddef.h
to uapi to fix BPF selftests build in CI").
riscv: mm: Fixup no5lvl failure when vaddr is invalid
Unlike no4lvl, no5lvl still continues to detect satp, which
requires va=pa mapping. When pa=0x800000000000, no5lvl
would fail in Sv48 mode due to an illegal VA value of
0x800000000000.
So, prevent detecting the satp flow for no5lvl, when
vaddr is invalid. Add the is_vaddr_valid() function for
checking.
Fixes: 26e7aacb83df ("riscv: Allow to downgrade paging mode from the command line") Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Björn Töpel <bjorn@rivosinc.com> Signed-off-by: Guo Ren (Alibaba DAMO Academy) <guoren@kernel.org> Tested-by: Fangyu Yu <fangyu.yu@linux.alibaba.com> Link: https://patch.msgid.link/20260125055212.433163-1-guoren@kernel.org
[pjw@kernel.org: cleaned up commit message] Signed-off-by: Paul Walmsley <pjw@kernel.org>
Michael Neuling [Fri, 1 May 2026 06:23:20 +0000 (06:23 +0000)]
riscv: Fix register corruption from uninitialized cregs on error
compat_riscv_gpr_set() calls cregs_to_regs() unconditionally, even when
user_regset_copyin() fails. Since cregs is an uninitialized stack
variable, a copyin failure causes uninitialized stack data to be written
into the target task's pt_regs, corrupting its register state and
potentially leaking kernel stack contents.
compat_restore_sigcontext() has the same issue: it calls cregs_to_regs()
even when __copy_from_user() fails, leading to the same corruption of
the signal-returning task's register state on error.
Only call cregs_to_regs() when the user copy succeeds.
smb_inherit_dacl() walks the parent directory DACL loaded from the
security descriptor xattr. It verifies that each ACE contains the fixed
SID header before using it, but does not verify that the variable-length
SID described by sid.num_subauth is fully contained in the ACE.
A malformed inheritable ACE can advertise more subauthorities than are
present in the ACE. compare_sids() may then read past the ACE.
smb_set_ace() also clamps the copied destination SID, but used the
unchecked source SID count to compute the inherited ACE size. That could
advance the temporary inherited ACE buffer pointer and nt_size accounting
past the allocated buffer.
Fix this by validating the parent ACE SID count and SID length before
using the SID during inheritance. Compute the inherited ACE size from the
copied SID so the size matches the bounded destination SID. Reject the
inherited DACL if size accumulation would overflow smb_acl.size or the
security descriptor allocation size.
Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3") Signed-off-by: Shota Zaizen <s@zaizen.me> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
ksmbd: fix kernel-doc warnings from ksmbd_conn_get/put()
The kernel test robot reported W=1 build warnings for ksmbd_conn_get()
and ksmbd_conn_put() due to missing parameter descriptions.
Add the @conn description to fix these warnings.
Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
Shuhao Fu [Wed, 29 Apr 2026 08:59:56 +0000 (16:59 +0800)]
ksmbd: fail share config requests when path allocation fails
Non-pipe shares must have a duplicated backing path before they can be
published. share_config_request() currently calls kstrndup() for that
path, but if the allocation fails it leaves ret unchanged. If veto list
parsing succeeds and share->name exists, the partially built share is
still inserted into the global share table with share->path left NULL.
A later share-root SMB2 create uses tree_conn->share_conf->path as the
lookup root. If the share was published with path == NULL, that request
passes a NULL pathname into do_getname_kernel()/strlen() and can crash
the ksmbd worker.
Set ret = -ENOMEM when path duplication fails so the incomplete share is
destroyed before publication.
Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3") Signed-off-by: Shuhao Fu <sfual@cse.ust.hk> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
DaeMyung Kang [Tue, 28 Apr 2026 14:08:56 +0000 (23:08 +0900)]
ksmbd: close durable scavenger races against m_fp_list lookups
ksmbd_durable_scavenger() has two related races against any walker
that iterates f_ci->m_fp_list, including ksmbd_lookup_fd_inode()
(used by ksmbd_vfs_rename) and the share-mode checks in
fs/smb/server/smb_common.c.
(1) fp->node list-head reuse. Durable-preserved handles can remain
linked on f_ci->m_fp_list after session teardown so share-mode checks
still see them while the handle is reconnectable. The scavenger
collected expired handles by adding fp->node to a local
scavenger_list after removing them from the global durable idr.
Because fp->node is the same list_head used by m_fp_list,
list_add(&fp->node, &scavenger_list) overwrites the m_fp_list links
and corrupts both lists. CONFIG_DEBUG_LIST can report this on the
share-mode walk path.
(2) Refcount race against m_fp_list walkers. The scavenger qualifies
an expired durable handle with atomic_read(&fp->refcount) > 1 and
fp->conn under global_ft.lock, removes fp from global_ft, then drops
global_ft.lock before unlinking fp from m_fp_list and freeing it.
During that gap fp is still linked on m_fp_list with f_state ==
FP_INITED. ksmbd_lookup_fd_inode() under m_lock read calls
ksmbd_fp_get() (atomic_inc_not_zero on refcount that is still 1) and
takes a live reference; the scavenger then unlinks and frees fp
while the holder owns a reference, leading to UAF on the holder's
subsequent ksmbd_fd_put() and on any field reads performed by a
concurrent share-mode walker that iterates m_fp_list without taking
ksmbd_fp_get() (smb_check_perm_dleases-like paths).
Fix both:
* Stop reusing fp->node as a scavenger-private list node. Remove
one expired handle from global_ft under global_ft.lock, take an
explicit transient reference, drop the lock, unlink fp->node
from m_fp_list under f_ci->m_lock, then drop both the durable
lifetime and transient references with atomic_sub_and_test(2,
&fp->refcount). If the scavenger is the last putter the close
runs there; otherwise an in-flight holder that already raced
through the m_fp_list lookup owns the final close via its
ksmbd_fd_put() path. The one-at-a-time disposal can rescan the
durable idr when multiple handles expire in the same pass, but
durable scavenging is a background expiration path and the final
full scan recomputes min_timeout before the next wait.
* Clear fp->persistent_id inside __ksmbd_remove_durable_fd() right
after idr_remove(), so a delayed final close from a holder that
snatched fp does not re-issue idr_remove() on a persistent id
that idr_alloc_cyclic() in ksmbd_open_durable_fd() may have
already handed out to a brand-new durable handle.
* Bypass the per-conn open_files_count decrement in
__put_fd_final() when fp is detached from any session table
(fp->conn cleared by session_fd_check() at durable preserve --
paired with the volatile_id clear at unpublish, so checking
fp->conn alone is sufficient). The walker that owns the final
close runs from an unrelated work->conn whose
stats.open_files_count never tracked this durable fp; without
this guard the holder would underflow that unrelated counter.
The two races are folded into one patch because patch (1) alone
cleans up the corrupted list but leaves a deterministic UAF window
for m_fp_list walkers that the transient-reference and
persistent_id discipline in (2) close; bisecting onto an
intermediate state would land on a UAF that pre-patch chaos merely
made less reproducible.
Validation:
* CONFIG_DEBUG_LIST coverage for the list_head reuse path.
* KASAN-enabled direct SMB2 durable-handle coverage that exercised
ksmbd_durable_scavenger() and non-NULL ksmbd_lookup_fd_inode()
returns while durable handles expired under concurrent rename
lookups, with no KASAN, UAF, list-corruption, ODEBUG, or WARNING
reports.
* checkpatch --strict
* make -j$(nproc) M=fs/smb/server
Fixes: d484d621d40f ("ksmbd: add durable scavenger timer") Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
DaeMyung Kang [Tue, 28 Apr 2026 14:08:55 +0000 (23:08 +0900)]
ksmbd: harden file lifetime during session teardown
__close_file_table_ids() is the per-session teardown that closes every
fp belonging to a session (or to one tree connect on that session) by
walking the session's volatile-id idr. The current loop has three
related problems on busy or racing workloads:
* Sleeping under ft->lock. The session-teardown skip callback,
session_fd_check(), already sleeps in ksmbd_vfs_copy_durable_owner()
-> kstrdup(GFP_KERNEL) and down_write(&fp->f_ci->m_lock) (a
rw_semaphore). Running the callback inside write_lock(&ft->lock)
trips CONFIG_DEBUG_ATOMIC_SLEEP / CONFIG_PROVE_LOCKING on a
durable-fd workload.
* Refcount accounting blind to f_state. The unconditional
atomic_dec_and_test(&fp->refcount) does not distinguish
FP_INITED (idr-owned reference still intact) from FP_CLOSED (an
earlier ksmbd_close_fd() already consumed the idr-owned reference
while leaving fp in the idr because a holder kept refcount
non-zero). When the latter races with teardown the same path
over-decrements into a holder reference and ksmbd_fd_put() later
UAFs that holder.
* FP_NEW window. Between __open_id() publishing fp into the
session idr and ksmbd_update_fstate(..., FP_INITED) committing the
transition at the end of smb2_open(), an fp is in FP_NEW and an
intervening teardown that takes a transient reference and
unpublishes the volatile id leaves the original idr-owned
reference orphaned -- the opener is unaware that fp has been
unpublished, returns success to the client, and the fp leaks at
refcount = 1.
Refactor __close_file_table_ids() to take a transient reference on fp
and unpublish fp from the session idr *under ft->lock* before calling
skip() outside the lock. A transient ref protects lifetime but not
concurrent field mutation, so the idr_remove() is what keeps
__ksmbd_lookup_fd() through this session's idr from granting a new
ksmbd_fp_get() reference to an fp whose fp->conn / fp->tcon /
fp->volatile_id / op->conn / lock_list links are about to be rewritten
by session_fd_check(). Durable reconnect is unaffected because it
reaches fp through the global durable table (ksmbd_lookup_durable_fd
-> global_ft).
Decide n_to_drop together with any FP_INITED -> FP_CLOSED transition
under ft->lock so teardown and ksmbd_close_fd() never both consume the
idr-owned reference. See ksmbd_mark_fp_closed() for the per-state
accounting. For the FP_NEW path to be safe, the opener has to learn
that fp was unpublished: ksmbd_update_fstate() now returns -ENOENT
when an FP_NEW -> FP_INITED transition finds f_state already advanced
or the volatile id cleared (both committed by teardown under
ft->lock); smb2_open() propagates that as STATUS_OBJECT_NAME_INVALID
and drops the original reference via ksmbd_fd_put().
The list removal cannot be left for a deferred final putter because
fp->volatile_id has already been cleared and __ksmbd_remove_fd() will
intentionally skip both idr_remove() and list_del_init(). Move the
m_fp_list unlink in __ksmbd_remove_fd() above the volatile-id check so
that an FP_NEW fp that happened to be added to m_fp_list (smb2_open()
adds fp->node before ksmbd_update_fstate() runs) is still cleaned up
on the deferred putter path; list_del_init() on an empty node is a
no-op and remains safe for fps that were never added.
Add a defensive guard in session_fd_check() that refuses non-FP_INITED
fps so that even if a teardown reaches an FP_NEW fp it falls into the
close branch (where the n_to_drop = 1 accounting keeps the opener's
reference alive) instead of the durable-preserve branch (which mutates
fp->conn / fp->tcon).
Validation on a debug kernel additionally built with CONFIG_DEBUG_LIST
and CONFIG_DEBUG_OBJECTS_WORK used a same-session two-tcon workload
(open/write storm on one tcon, 50 tree disconnects on the other) and
reported no list-corruption, work_struct ODEBUG, sleep-in-atomic,
lockdep or kmemleak reports. Reverting only the
__close_file_table_ids() hunk while keeping a forced-is_reconnectable()
harness produced the expected sleep-in-atomic at vfs_cache.c:1095,
confirming the ft->lock-out-of-sleepable-skip discipline.
KASAN-enabled direct SMB2 coverage with durable handles enabled
exercised ksmbd_close_tree_conn_fds(), ksmbd_close_session_fds(),
the FP_NEW failure path, tree_conn_fd_check(), and a non-zero
session_fd_check() durable-preserve return. This produced no KASAN,
DEBUG_LIST, ODEBUG, or WARNING reports.
Fixes: f44158485826 ("cifsd: add file operations") Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
end the conn with a bare kfree(), skipping
ida_destroy(&conn->async_ida) and
conn->transport->ops->free_transport(conn->transport). Whenever one
of them is the last putter, the embedded async_ida and the entire
transport struct leak -- for TCP, that is also the struct socket and
the kvec iov.
__free_opinfo() being a final putter is not theoretical. opinfo_put()
queues the callback via call_rcu(&opinfo->rcu, free_opinfo_rcu), so
ksmbd_server_terminate_conn() can deposit N opinfo releases in RCU and
have ksmbd_conn_free() run in the handler thread before any of them
fire. ksmbd_conn_free() then observes refcnt > 0 and short-circuits;
the last RCU-delivered __free_opinfo() falls onto its bare kfree(conn)
branch and the transport is lost.
A/B validation in a QEMU/virtme guest, mounting //127.0.0.1/testshare:
each iteration holds 8 files open via sleep processes, force-closes
TCP with "ss -K sport = :445", kills the holders, lazy-umounts;
repeated 10 times, then ksmbd shutdown and kmemleak scan.
Pre-patch conn_free=20 with tcp_free=10 directly demonstrates the
bare-kfree paths skipping transport cleanup; kmemleak backtraces point
into struct tcp_transport / iov. With this patch tcp_free matches
conn_free at 20/20 and kmemleak is clean.
Move the per-struct final release into __ksmbd_conn_release_work() and
route the three bare-kfree final-put sites through a new
ksmbd_conn_put(). Those sites now pair ida_destroy() and
free_transport() with kfree(conn) regardless of which holder happens
to release the last reference. stop_sessions() only triggers the
transport shutdown and does not itself drop the last conn reference,
so it is unaffected.
The centralized release reaches sock_release() -> tcp_close() ->
lock_sock_nested() (might_sleep) from every final putter, including
__free_opinfo() invoked from an RCU softirq callback, which trips
CONFIG_DEBUG_ATOMIC_SLEEP. Defer the release to a dedicated
ksmbd_conn_wq workqueue so ksmbd_conn_put() is safe from any
non-sleeping context.
Make ksmbd_file own a strong connection reference while fp->conn is
non-NULL so durable-preserve and final-close paths cannot dereference
a stale connection. ksmbd_open_fd() and ksmbd_reopen_durable_fd()
take the reference via ksmbd_conn_get() (the latter also reorders the
fp->conn / fp->tcon assignments before __open_id() so the published fp
is never observed with fp->conn == NULL); session_fd_check() and
__ksmbd_close_fd() drop it via ksmbd_conn_put(). With that invariant,
session_fd_check() can take a local conn pointer once and use it
across the m_op_list and lock_list iterations even though op->conn
puts may otherwise drop the last reference.
At module exit the workqueue is flushed and destroyed after
rcu_barrier(), so any release queued by a trailing RCU callback is
drained before the inode hash and module text go away.
Fixes: ee426bfb9d09 ("ksmbd: add refcnt to ksmbd_conn struct") Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
====================
ipv6: fix ECMP route failover on carrier loss
This patchset resolves an issue where established IPv6 connections are
unable to transition to alternative ECMP nexthops upon carrier loss.
Unlike IPv4, the IPv6 routing subsystem does not actively invalidate
cached destinations during a NETDEV_CHANGE event. Sockets persist
with dead routes, leading to stalled traffic or connection drops.
This series introduces a fix to trigger route invalidation by
updating the route serial number on link carrier loss and provides
a corresponding selftest to validate the failover behavior for IPv4
and IPv6.
====================
When using IPv6 ECMP routes, if a netdev listed as a nexthop experiences
a carrier change event (e.g., a bond device generating a NETDEV_CHANGE
event after its slaves go linkdown), established connections utilizing
that nexthop fail to fail over to other available nexthops. Instead,
these connections stall or drop.
This happens because the IPv6 FIB code does not invalidate the socket's
cached destination when a NETDEV_CHANGE event occurs. While
fib6_ifdown() correctly marks the nexthop with RTNH_F_LINKDOWN, it
leaves the route's serial number unchanged. As a result, sockets with a
previously cached dst do not realize the route is no longer viable and
continue to try using the non-functional nexthop.
This behavior contrasts with IPv4, which actively flushes cached
destinations on a NETDEV_CHANGE event (see fib_netdev_event() in
net/ipv4/fib_frontend.c).
Fix this by updating the route serial number in fib6_ifdown() when
setting RTNH_F_LINKDOWN. This invalidates stale cached destinations,
forcing sockets to perform a new route lookup and fail over to a
functioning nexthop.
Fixes: 51ebd3181572 ("ipv6: add support of equal cost multipath (ECMP)") Signed-off-by: Sagarika Sharma <sharmasagarika@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260430200909.527827-2-sharmasagarika@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Cheema [Wed, 29 Apr 2026 17:57:39 +0000 (18:57 +0100)]
net: usb: cdc_ncm: add Apple Mac USB-C direct networking quirk
Apple Silicon Macs expose two CDC NCM "private" data interfaces over
USB-C with VID:PID 0x05ac:0x1905 and product string "Mac". This is the
same protocol Apple already ships on iPhone (0x05ac:0x12a8) and iPad
(0x05ac:0x12ab) for RemoteXPC since iOS 17 -- both data interfaces lack
an interrupt status endpoint, so they rely on the FLAG_LINK_INTR-
conditional bind path introduced in commit 3ec8d7572a69 ("CDC-NCM: add
support for Apple's private interface").
The id_table currently has entries for iPhone and iPad but not for the
Mac. Without a match, cdc_ncm falls through to the generic CDC NCM
class-match entry, which uses the FLAG_LINK_INTR-having cdc_ncm_info
struct, so bind_common() fails on the missing status endpoint and no
netdev appears.
Add id_table entries for both interface numbers (0 and 2) of the Mac,
bound to the existing apple_private_interface_info driver_info.
Verified empirically on a Mac Studio M3 Ultra running macOS 26.5: when
a Mac is connected via USB-C, ioreg shows VID 0x05ac, PID 0x1905,
product string "Mac", with two NCM data interfaces at numbers 0 and 2.
The same PID is presented by all current Apple Silicon Mac models
(MacBook Pro/Air, Mac mini, Mac Studio across the M-series), mirroring
Apple's single-PID-per-family pattern from iPhone/iPad.
After this patch, plugging a Mac into a Linux host running the patched
kernel produces two enx... interfaces (one per data interface),
"ip -br link" lists them as UP, and standard userspace networking
(DHCP, NetworkManager shared mode, etc.) works without any modprobe
overrides or out-of-tree modules.
On Ethernet devices (the overwhelming majority of SR-IOV NICs)
dev->addr_len is 6, so only the first 6 bytes of broadcast[] are
written. The remaining 26 bytes retain whatever was previously on
the kernel stack. The full struct is then handed to userspace via:
leaking up to 26 bytes of uninitialised kernel stack per VF per
RTM_GETLINK request, repeatable.
The other vf_* structs in the same function are explicitly zeroed
for exactly this reason - see the memset() calls for ivi,
vf_vlan_info, node_guid and port_guid a few lines above.
vf_broadcast was simply missed when it was added.
Reachability: any unprivileged local process can open AF_NETLINK /
NETLINK_ROUTE without capabilities and send RTM_GETLINK with an
IFLA_EXT_MASK attribute carrying RTEXT_FILTER_VF. The kernel walks
each VF and emits IFLA_VF_BROADCAST, leaking 26 bytes of stack per
VF per request. Stack residue at this call site can include return
addresses and transient sensitive data; KASAN with stack
instrumentation, or KMSAN, will flag the nla_put() when reproduced.
Zero the on-stack struct before the partial memcpy, matching the
existing pattern used for the other vf_* structs in the same
function.
Eric Dumazet [Thu, 30 Apr 2026 07:06:11 +0000 (07:06 +0000)]
ipmr: prevent info-leak in pmr_cache_report()
Yiming Qian reported:
<quote>
ipmr_cache_report()` allocates a report skb with `alloc_skb(128,
GFP_ATOMIC)` and appends a `struct igmphdr` using `skb_put()`. In the
non-`IGMPMSG_WHOLEPKT` path it initializes only:
- `igmp->type`
- `igmp->code`
but does not initialize:
- `igmp->csum`
- `igmp->group`
Later, `igmpmsg_netlink_event()` copies the bytes after `sizeof(struct
igmpmsg)` into the `IPMRA_CREPORT_PKT` netlink attribute and emits
`RTM_NEWCACHEREPORT` on `RTNLGRP_IPV4_MROUTE_R`.
As a result, 6 bytes of stale heap data from the skb head are
disclosed to userspace.
</quote>
Let's use skb_put_zero() instead of skb_put() to fix this bug.
Linus Torvalds [Fri, 1 May 2026 23:56:08 +0000 (16:56 -0700)]
Merge tag 'drm-fixes-2026-05-02' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Fixes for rc2, the usual amdgpu/xe double header, I think xe had a
couple of weeks combined due to some maintainer access issues,
otherwise there's just a few misc fixes and documentation fixups.
core and helpers:
- calculate framebuffer geometry with format helpers
- fix docs
amdgpu:
- GFX12 fix for CONFIG_DRM_DEBUG_MM configs
- Fix DC analog support
- Userq fixes
- GART placement fix
- Aldebaran SMU fixes
- AMDGPU_INFO_READ_MMR_REG fix
- UVD 3.1 fix
- GC 6 TCC fix
- Fix root reservation in amdgpu_vm_handle_fault()
- RAS fix
- Module reload fix for APUs
- Fix build for CONFIG_DRM_FBDEV_EMULATION=n
- IGT DWB regression fix
- GC 11.5.4 fix
- VCN user fence fixes
- JPEG user fence fixes
- SMU 13.0.6 fix
- VCN 3/4 IB parser fixes
- NV3x+ dGPU vblank fix
- DCE6/8 fixes for LVDS/eDP panels without an EDID
amdkfd:
- Fix for when CONFIG_HSA_AMD is not set
- SVM fixes
xe:
- uapi: Add missing pad and extensions check
- uapi: Reject unsafe PAT indices for CPU cached memory
- Drop registration of guc_submit_wedged_fini from xe_guc_submit_wedge
- Xe3p tuning and workaround fixes
- USE drm mm instead of drm SA for CCS read/write
- Fix leaks and null derefs
- Fix Wa_18022495364
appletbdrm:
- allocate protocol buffers with kvzalloc()
dma-buf:
- fix docs
imagination:
- avoid segfault in debugfs
ofdrm:
- put PCI device reference on errors
udl:
- increase USB timeout"
* tag 'drm-fixes-2026-05-02' of https://gitlab.freedesktop.org/drm/kernel: (77 commits)
drm/xe/uapi: Reject coh_none PAT index for CPU_ADDR_MIRROR
drm/xe/uapi: Reject coh_none PAT index for CPU cached memory in madvise
drm/xe/xelp: Fix Wa_18022495364
drm/xe/gsc: Fix BO leak on error in query_compatibility_version()
drm/xe/eustall: Fix drm_dev_put called before stream disable in close
drm/xe: Fix error cleanup in xe_exec_queue_create_ioctl()
drm/xe: Fix dma-buf attachment leak in xe_gem_prime_import()
drm/xe: Fix bo leak in xe_dma_buf_init_obj() on allocation failure
drm/xe/bo: Fix bo leak on GGTT flag validation in xe_bo_init_locked()
drm/xe/bo: Fix bo leak on unaligned size validation in xe_bo_init_locked()
drm/xe: Fix potential NULL deref in xe_exec_queue_tlb_inval_last_fence_put_unlocked
drm/xe/vf: Use drm mm instead of drm sa for CCS read/write
drm/xe: Add memory pool with shadow support
drm/xe/debugfs: Correct printing of register whitelist ranges
drm/xe: Mark ROW_CHICKEN5 as a masked register
drm/xe/tuning: Use proper register offset for GAMSTLB_CTRL
drm/xe/xe3p_lpg: Add missing indirect ring state feature flag
drm/xe: Drop redundant rtp entries for Wa_14019988906 & Wa_14019877138
drm/xe/vm: Add missing pad and extensions check
drm/xe: Drop registration of guc_submit_wedged_fini from xe_guc_submit_wedge()
...
The TRENDnet TUC-ET2G V2.0 is an RTL8156B based 2.5G Ethernet controller.
Add the vendor and product ID values to the driver. This makes Ethernet
work with the adapter.
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Birger Koblitz <mail@birger-koblitz.de> Link: https://patch.msgid.link/20260430213435.21821-1-olek2@wp.pl Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 1 May 2026 23:45:41 +0000 (16:45 -0700)]
Merge tag 'nf-26-05-01' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following batch contains Netfilter fixes for net:
1) Replace skb_try_make_writable() by skb_ensure_writable() in
nft_fwd_netdev and the flowtable to deal with uncloned packets
having their network header in paged fragments.
2) Drop packet if output device does not exist and ensure sufficient
headroom in nft_fwd_netdev before transmitting the skb.
3) Use the existing dup recursion counter in nft_fwd_netdev for the
neigh_xmit variant, from Weiming Shi.
4) Add .check_hooks interface to x_tables to detach the control plane
hook check based on the match/target configuration. Then, update
nft_compat to use .check_hooks from .validate path, this fixes a
lack of hook validation for several match/targets.
5) Fix incorrect .usersize in xt_CT, from Florian Westphal.
6) Fix a memleak with netdev tables in dormant state,
from Florian Westphal.
7) Several patches to check if the packet is a fragment, then skip
layer 4 inspection, for x_tables and nf_tables; as well as common
nf_socket infrastructure. The xt_hashlimit match drops fragments
to stay consistent with the existing approach when failing to parse
the layer 4 protocol header.
8) Ensure sufficient headroom in the flowtable before transmitting
the skb.
9) Fix the flowtable inline vlan approach for double-tagged vlan:
Reverse the iteration over .encap[] since it represents the
encapsulation as seen from the ingress path. Postpone pushing
layer 2 header so output device is available to calculate needed
headroom. Finally, add and use nf_flow_vlan_push() to fix it.
10) Fix flowtable inline pppoe with GSO packets. Moreover, use
FLOW_OFFLOAD_XMIT_DIRECT to fill up destination hardware
address since neighbour cache does not exist in pppoe.
11) Use skb_pull_rcsum() to decapsulate vlan and pppoe headers, for
double-tagged vlan in particular this should provide some benefits
in certain scenarios.
More notes regarding 9-11):
- sashiko is also signalling to use it for IPIP headers, but that needs
more adjustments such setting skb->protocol after removing the IPIP
header, will follow up in a separated patch.
- I plan to submit selftests to cover double-tagged-vlan. As for pppoe,
it should be possible but that would mandate a few userspace dependencies.
This has been semi-automatically tested by me and reporters describing
broken double-vlan-tagged and pppoe currently in the flowtable.
* tag 'nf-26-05-01' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: flowtable: use skb_pull_rcsum() to pop vlan/pppoe header
netfilter: flowtable: fix inline pppoe encapsulation in xmit path
netfilter: flowtable: fix inline vlan encapsulation in xmit path
netfilter: flowtable: ensure sufficient headroom in xmit path
netfilter: xtables: fix L4 header parsing for non-first fragments
netfilter: nf_tables: skip L4 header parsing for non-first fragments
netfilter: nf_socket: skip socket lookup for non-first fragments
netfilter: nf_tables: fix netdev hook allocation memleak with dormant tables
netfilter: xt_CT: fix usersize for v1 and v2 revision
netfilter: nft_compat: run xt_check_hooks_{match,target}() from .validate
netfilter: x_tables: add .check_hooks to matches and targets
netfilter: nft_fwd_netdev: use recursion counter in neigh egress path
netfilter: nft_fwd_netdev: add device and headroom validate with neigh forwarding
netfilter: replace skb_try_make_writable() by skb_ensure_writable()
====================
Linus Torvalds [Fri, 1 May 2026 23:32:42 +0000 (16:32 -0700)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Avoid writing an uninitialised stack variable to POR_EL0 on sigreturn
if the poe_context record is absent
- Reserve one more page for the early 4K-page kernel mapping to cover
the extra [_text, _stext) split introduced by the non-executable
read-only mapping
- Force the arch_local_irq_*() wrappers to be __always_inline so that
noinstr entry and idle paths cannot call out-of-line, instrumentable
copies
- Fix potential sign extension in the arm64 SCS unwinder's DWARF
advance_loc4 decoding
- Tolerate arm64 ACPI platforms with only WFI and no deeper PSCI idle
states, restoring cpuidle registration on such systems
- Include the UAPI <asm/ptrace.h> header in the arm64 GCS libc test
rather than carrying a duplicate struct user_gcs definition (the
original #ifdef NT_ARM_GCS was wrong to cover the structure
definition as it would be masked out if the toolchain defined it)
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: signal: Preserve POR_EL0 if poe_context is missing
arm64: Reserve an extra page for early kernel mapping
kselftest/arm64: Include <asm/ptrace.h> for user_gcs definition
ACPI: arm64: cpuidle: Tolerate platforms with no deep PSCI idle states
arm64/irqflags: __always_inline the arch_local_irq_*() helpers
arm64/scs: Fix potential sign extension issue of advance_loc4
Yi Kuo [Wed, 29 Apr 2026 10:00:11 +0000 (18:00 +0800)]
smb: smbdirect: fix MR registration for coalesced SG lists
ib_dma_map_sg() modifies the provided scatterlist and returns the
number of mapped entries, which can be fewer than the requested
mr->sgt.nents if the DMA controller coalesces contiguous memory
segments. Passing the original, uncoalesced count to ib_map_mr_sg()
causes memory registration failures if coalescing actually occurs.
Capture the actual mapped count returned by ib_dma_map_sg() and pass it
to ib_map_mr_sg() to ensure correct MR registration.
Also update the ib_dma_map_sg() error logging to drop the error
pointer formatting, since the return value is an integer count
rather than an error code.
Ensure a proper error code (-EIO) is assigned when DMA mapping or
MR registration fails.
Fixes: de5ef8ec3c46 ("smb: smbdirect: introduce smbdirect_mr.c with client mr code") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221408 Reviewed-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Yi Kuo <yi@yikuo.dev> Signed-off-by: Steve French <stfrench@microsoft.com>
smb: smbdirect: introduce and use include/linux/smbdirect.h
This makes it easier to rebuild cifs.ko and ksmbd.ko against
a running kernel.
Suggested-by: Christoph Hellwig <hch@infradead.org> Link: https://lore.kernel.org/linux-cifs/aehrPuY60VMcYGU8@infradead.org/ Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
smb: smbdirect: make use of DEFAULT_SYMBOL_NAMESPACE and EXPORT_SYMBOL_GPL
This is a better solution than
EXPORT_SYMBOL_FOR_MODULES(__sym, "cifs,ksmbd") as it makes
it possible to rebuild smbdirect.ko against a
running kernel and then load the existing cifs.ko and ksmbd.ko
from the running kernel.
Suggested-by: Christoph Hellwig <hch@infradead.org> Link: https://lore.kernel.org/linux-cifs/aehrPuY60VMcYGU8@infradead.org/ Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
accel/qaic: fix incorrect counter check in RAS message decode
The UE and UE_NF cases check ce_count against UINT_MAX before incrementing
their respective counters. This is logically incorrect and prevents
ue_count and ue_nf_count from incrementing when ce_count reaches UINT_MAX.
Fixes: c11a50b170e7 ("accel/qaic: Add Reliability, Accessibility, Serviceability (RAS)") Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Signed-off-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Link: https://patch.msgid.link/20260410112015.592546-1-alok.a.tiwari@oracle.com
Linus Torvalds [Fri, 1 May 2026 20:19:14 +0000 (13:19 -0700)]
Merge tag 'selinux-pr-20260501' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux fixes from Paul Moore:
- Ensure SELinux is always properly accessing its own sock LSM state
- Only reserve an xattr slot for SELinux if it will be used
- Fix a SELinux auditing regression in the directory avdcache
* tag 'selinux-pr-20260501' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: fix avdcache auditing
selinux: don't reserve xattr slot when we won't fill it
selinux: use sk blob accessor in socket permission helpers
Davidlohr Bueso [Fri, 1 May 2026 19:41:23 +0000 (12:41 -0700)]
futex: Drop CLONE_THREAD requirement for private default hash alloc
Currently need_futex_hash_allocate_default() depends on strict pthread
semantics, abusing CLONE_THREAD. This breaks the non-concurrency
assumptions when doing the mm->futex_ref pcpu allocations, leading to
bugs[0] when sharing the mm in other ways; ie:
BUG: KASAN: slab-use-after-free in futex_hash_put
... where the +1 bias can end up on a percpu counter that mm->futex_ref
no longer points at.
Loosen the check to cover any CLONE_VM clone, except vfork(). Excluding
vfork keeps the existing paths untouched (no overhead), and we can't
race in the first place: either the parent is suspended and the child
runs alone, or mm->futex_ref is already allocated from an earlier
CLONE_VM.
Linus Torvalds [Fri, 1 May 2026 19:58:02 +0000 (12:58 -0700)]
Merge tag 's390-7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Alexander Gordeev:
- Reject zero-length writes from userspace that corrupt Debug Facility
buffers
- Replace one s390 PCI maintainer
- Remove SCLP_OFB Kconfig option and enable the guarded code
unconditionally
- Replace incorrect use of phys_to_folio() to virt_to_folio() in
do_secure_storage_access()
* tag 's390-7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/mm: Fix phys_to_folio() usage in do_secure_storage_access()
s390/sclp: Remove SCLP_OFB Kconfig option
MAINTAINERS: Replace one of the maintainers for s390/pci
s390/debug: Reject zero-length input in debug_input_flush_fn()
s390/debug: Reject zero-length input before trimming a newline
Mark Brown [Thu, 23 Apr 2026 19:17:45 +0000 (20:17 +0100)]
selftests/rseq: Don't run tests with runner scripts outside of the scripts
The rseq selftests include two runner scripts run_param_test.sh and
run_syscall_errors_test.sh which set up the environment for test binaries
and run them with various parameters. Currently we list these test binaries
in TEST_GEN_PROGS but this results in the kselftest framework running them
directly as well as via the runners, resulting in duplication and spurious
failures when the environment is not correctly set up (eg, if glibc tries
to use rseq).
Move the binaries the runners invoke to TEST_GEN_PROGS_EXTENDED, binaries
listed there are built but not run by the framework. The param_test
benchmarks are not moved since they are not run by run_param_test.sh.
Fixes: 830969e7821a ("selftests/rseq: Implement time slice extension test") Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260423-selftests-rseq-use-runner-v1-1-e13a133754c1@kernel.org Cc: stable@vger.kernel.org
Linus Torvalds [Fri, 1 May 2026 18:26:15 +0000 (11:26 -0700)]
Merge tag 'block-7.1-20260430' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block fixes from Jens Axboe:
- MD pull request via Yu:
- Fix a raid5 UAF on IO across the reshape position
- Avoid failing RAID1/RAID10 devices for invalid IO errors
- Fix RAID10 divide-by-zero when far_copies is zero
- Restore bitmap grow through sysfs
- Use mddev_is_dm() instead of open-coding gendisk checks
- Use ATTRIBUTE_GROUPS() for md default sysfs attributes
- Replace open-coded wait loops with wait_event helpers
- NVMe pull request via Keith:
- Target data transfer size configuation (Aurelien)
- Enable P2P for RDMA (Shivaji Kant)
- TCP target updates (Maurizio, Alistair, Chaitanya, Shivam Kumar)
- TCP host updates (Alistair, Chaitanya)
- Authentication updates (Alistair, Daniel, Chris Leech)
- Multipath fixes (John Garry)
- New quirks (Alan Cui, Tao Jiang)
- Apple driver fix (Fedor Pchelkin)
- PCI admin doorbell update fix (Keith)
- Properly propagate CDROM read-only state to the block layer
* tag 'block-7.1-20260430' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (35 commits)
md: use ATTRIBUTE_GROUPS() for md default sysfs attributes
md: use mddev_is_dm() instead of open-coding gendisk checks
md/raid1: replace wait loop with wait_event_idle() in raid1_write_request()
md/md-bitmap: add a none backend for bitmap grow
md/md-bitmap: split bitmap sysfs groups
md: factor bitmap creation away from sysfs handling
md: use mddev_lock_nointr() in mddev_suspend_and_lock_nointr()
md: replace wait loop with wait_event() in md_handle_request()
md/raid10: fix divide-by-zero in setup_geo() with zero far_copies
md/raid1,raid10: don't fail devices for invalid IO errors
MAINTAINERS: Add Xiao Ni as md/raid reviewer
md/raid5: Fix UAF on IO across the reshape position
cdrom, scsi: sr: propagate read-only status to block layer via set_disk_ro()
nvme-auth: Hash DH shared secret to create session key
nvme-pci: fix missed admin queue sq doorbell write
nvme-auth: Include SC_C in RVAL controller hash
nvme-tcp: teardown circular locking fixes
nvmet-tcp: Don't clear tls_key when freeing sq
Revert "nvmet-tcp: Don't free SQ on authentication success"
nvme: skip trace completion for host path errors
...
Linus Torvalds [Fri, 1 May 2026 18:01:31 +0000 (11:01 -0700)]
Merge tag 'io_uring-7.1-20260430' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull io_uring fixes from Jens Axboe:
- Remove dead struct io_buffer_list member
- Fix for incrementally consumed buffers with recvmsg multishot, which
requires a minimum value left in a buffer for any receive for the
headers. If there's still a bit of buffer left but it's smaller than
that value, then userspace will see a spurious -EFAULT returned in
the CQE
- Locking fix for the DEFER_TASKRUN retry list, which otherwise could
race with fallback cancelations. If the task is exiting with
task_work left in both the normal and retry list AND the exit cleanup
races with the task running task work, then entries could either be
doubly completed or lost
- Cap NAPI busy poll timeout to something sane, to avoid syzbot running
into excessive polling and triggering warnings around that
* tag 'io_uring-7.1-20260430' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
io_uring/tw: serialize ctx->retry_llist with ->uring_lock
io_uring/napi: cap busy_poll_to 10 msec
io_uring/kbuf: support min length left for incremental buffers
io_uring/kbuf: kill dead struct io_buffer_list 'nr_entries' member
Linus Torvalds [Fri, 1 May 2026 16:51:38 +0000 (09:51 -0700)]
Merge tag 'spi-fix-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"There are a couple of nasty issues fixed here in the axiado and
rockchip drivers. We've also got more of the fixes from Johan here,
this time for the two Cadence drivers, plus a couple of other similar
fixes from John and Felix"
* tag 'spi-fix-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: amlogic-spisg: initialize completion before requesting IRQ
spi: axiado: replace usleep_range() with udelay() in IRQ path
spi: cadence-quadspi: fix runtime pm and clock imbalance on unbind
spi: cadence-quadspi: fix unclocked access on unbind
spi: cadence-quadspi: fix clock imbalance on probe failure
spi: cadence-quadspi: fix runtime pm disable imbalance on probe failure
spi: cadence: fix clock imbalance on probe failure
spi: cadence: fix unclocked access on unbind
spi: rockchip: Drop unused and broken CR0 macros
spi: rockchip: Read ISR, not IMR, to detect cs-inactive IRQ
spi: rzv2h-rspi: Fix silent failure in clock setup error path
Kevin Brodsky [Mon, 27 Apr 2026 12:03:33 +0000 (13:03 +0100)]
arm64: signal: Preserve POR_EL0 if poe_context is missing
Commit 2e8a1acea859 ("arm64: signal: Improve POR_EL0 handling to
avoid uaccess failures") delayed the write to POR_EL0 in
rt_sigreturn to avoid spurious uaccess failures. This change however
relies on the poe_context frame record being present: on a system
supporting POE, calling sigreturn without a poe_context record now
results in writing arbitrary data from the kernel stack into POR_EL0.
Fix this by adding a __valid_fields member to struct
user_access_state, and zeroing the struct on allocation.
restore_poe_context() then indicates that the por_el0 field is valid
by setting the corresponding bit in __valid_fields, and
restore_user_access_state() only touches POR_EL0 if there is a valid
value to set it to. This is in line with how POR_EL0 was originally
handled; all frame records are currently optional, except
fpsimd_context.
To ensure that __valid_fields is kept in sync, fields (currently
just por_el0) are now accessed via accessors and prefixed with __ to
discourage direct access.
Fixes: 2e8a1acea859 ("arm64: signal: Improve POR_EL0 handling to avoid uaccess failures") Cc: <stable@vger.kernel.org> Reported-by: Will Deacon <will@kernel.org> Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Linus Torvalds [Fri, 1 May 2026 16:25:12 +0000 (09:25 -0700)]
Merge tag 'regulator-fix-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fix from Mark Brown:
"A fix from Arnd re-adding a dependency on gpiolib which was implicitly
pulled in via an OF specific route which got removed as part of a
cleanup"
* tag 'regulator-fix-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: rpi-panel-attiny: add back GPIOLIB dependency
Linus Torvalds [Fri, 1 May 2026 15:45:23 +0000 (08:45 -0700)]
Merge tag 'mm-hotfixes-stable-2026-04-30-15-39' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM fixes from Andrew Morton:
"20 hotfixes. All are for MM (and for MMish maintainers). 9 are
cc:stable and the remainder are for post-7.0 issues or aren't deemed
suitable for backporting.
There are two DAMON series from SeongJae Park which address races
which could lead to use-after-free errors, and avoid the possibility
of presenting stale parameter values to users"
* tag 'mm-hotfixes-stable-2026-04-30-15-39' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm: memcontrol: fix rcu unbalance in get_non_dying_memcg_end()
mm/userfaultfd: detect VMA type change after copy retry in mfill_copy_folio_retry()
MAINTAINERS: remove stale kdump project URL
mm/damon/stat: detect and use fresh enabled value
mm/damon/lru_sort: detect and use fresh enabled and kdamond_pid values
mm/damon/reclaim: detect and use fresh enabled and kdamond_pid values
selftests/mm: specify requirement for PROC_MEM_ALWAYS_FORCE=y
mm/damon/sysfs-schemes: protect path kfree() with damon_sysfs_lock
mm/damon/sysfs-schemes: protect memcg_path kfree() with damon_sysfs_lock
MAINTAINERS: update Li Wang's email address
MAINTAINERS, mailmap: update email address for Qi Zheng
MAINTAINERS: update Liam's email address
mm/hugetlb_cma: round up per_node before logging it
MAINTAINERS: fix regex pattern in CORE MM category
mm/vma: do not try to unmap a VMA if mmap_prepare() invoked from mmap()
mm: start background writeback based on per-wb threshold for strictlimit BDIs
kho: fix error handling in kho_add_subtree()
liveupdate: fix return value on session allocation failure
mailmap: update entry for Dan Carpenter
vmalloc: fix buffer overflow in vrealloc_node_align()
For 4K pages, the early kernel mapping may use 2MB block entries but the
kernel segments are only 64KB aligned. Segment boundaries that fall
within a 2MB block therefore require a PTE table so that different
attributes can be applied on either side of the boundary.
KERNEL_SEGMENT_COUNT still correctly counts the five permanent kernel
VMAs registered by declare_kernel_vmas(). However, since commit 5973a62efa34 ("arm64: map [_text, _stext) virtual address range
non-executable+read-only"), the early mapper also maps [_text, _stext)
separately from [_stext, _etext). This adds one more early-only split
and can require one more page-table page than the existing
EARLY_SEGMENT_EXTRA_PAGES allowance reserves.
Increase the 4K-page early mapping allowance by one page to cover that
additional split.
Fixes: 5973a62efa34 ("arm64: map [_text, _stext) virtual address range non-executable+read-only") Assisted-by: TRAE:GLM-5.1 Suggested-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
[catalin.marinas@arm.com: rewrote part of the commit log]
[catalin.marinas@arm.com: expanded the code comment] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Leo Yan [Wed, 29 Apr 2026 14:30:10 +0000 (15:30 +0100)]
kselftest/arm64: Include <asm/ptrace.h> for user_gcs definition
kselftest includes kernel uAPI headers with option:
-isystem $(top_srcdir)/usr/include
Include <asm/ptrace.h> in libc-gcs.c for the definition of struct
user_gcs from the uAPI headers, and remove the redundant definition in
gcs-util.h. This fixes a compilation error on systems where the
toolchain defines NT_ARM_GCS.
Fixes: a505a52b4e29 ("kselftest/arm64: Add a GCS test program built with the system libc") Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
ALSA: hda/realtek: Add codec SSID quirk for Lenovo Yoga Pro 9 16IMH9
The Yoga Pro 9 16IMH9 (codec SSID 17aa:38d6) shares PCI audio device
subsystem ID 17aa:3811 with the Legion S7 15IMH05. The existing
SND_PCI_QUIRK entry for the Legion routes both machines to
ALC287_FIXUP_LEGION_15IMHG05_SPEAKERS, which does not bind the TAS2781
smart amplifiers, resulting in near-silent built-in speakers.
Add an HDA_CODEC_QUIRK entry immediately before the conflicting PCI quirk
that matches the Yoga Pro 9's unique codec SSID and routes it to
ALC287_FIXUP_TAS2781_I2C. Codec quirks are evaluated after PCI quirks and
take precedence, leaving the Legion S7 15IMH05 entry unaffected.
This follows the same pattern used to disambiguate PCI SSID 17aa:3847
(shared between Yoga Pro 7 14IMH9 and Legion 7 16ACHG6), where a
HDA_CODEC_QUIRK for codec SSID 17aa:38cf resolves the conflict.
Jens Axboe [Fri, 1 May 2026 11:23:12 +0000 (19:23 +0800)]
ublk: don't issue uring_cmd from fallback task work
When ublk_ch_uring_cmd_cb() runs as fallback task work (e.g., because
the submitting task is exiting), the command should not be issued as
current is a kworker, not the daemon task. This can cause io->task
to capture the wrong task in __ublk_fetch(), leading to a task
mismatch warning in ublk_uring_cmd_cancel_fn().
Check tw.cancel and return -ECANCELED instead of issuing the command
from fallback context.
netfilter: flowtable: use skb_pull_rcsum() to pop vlan/pppoe header
This adjusts the checksum, if required, after pulling the layer 2
header, either the pppoe header or the inner vlan header in the
double-tagged vlan packets.
Add the VID/PIDs for the ASUS ROG RAIKIRI II controller to xpad_device
and the VID to xpad_table. The controller has a physical PC/XBOX toggle
which switches between XBOX360 and XBOXONE protocols.
Dave Airlie [Fri, 1 May 2026 02:49:22 +0000 (12:49 +1000)]
Merge tag 'drm-xe-fixes-2026-04-30' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
API Fixes:
- Add missing pad and extensions check (Jonathan)
- Reject unsafe PAT indices for CPU cached memory (Jia)
Driver Fixes:
- Drop registration of guc_submit_wedged_fini from xe_guc_submit_wedge (Brost)
- Xe3p tuning and workaround fixes (Roper, Gustavo)
- USE drm mm instead of drm SA for CCS read/write (Satya)
- Fix leaks and null derefs (Shuicheng)
- Fix Wa_18022495364 (Tvrtko)
Michael Neuling [Thu, 9 Apr 2026 09:11:39 +0000 (09:11 +0000)]
riscv: errata: Fix bitwise vs logical AND in MIPS errata patching
The condition checking whether a specific errata needs patching uses
logical AND (&&) instead of bitwise AND (&). Since logical AND only
checks that both operands are non-zero, this causes all errata patches
to be applied whenever any single errata is detected, rather than only
applying the matching one.
The SiFive errata implementation correctly uses bitwise AND for the same
check.
Fixes: 0b0ca959d206 ("riscv: errata: Fix the PAUSE Opcode for MIPS P8700") Signed-off-by: Michael Neuling <mikey@neuling.org> Assisted-by: Cursor:claude-4.6-opus-high-thinking Link: https://patch.msgid.link/20260409091143.1348853-2-mikey@neuling.org
[pjw@kernel.org: fixed checkpatch warning] Signed-off-by: Paul Walmsley <pjw@kernel.org>
This series tightens Marvell OcteonTX2 AF NPC support for CN20K silicon
around MCAM key typing, optional debugfs setup, defrag allocation rollback,
defrag entry relocation bookkeeping, logical MCAM clear and programming,
default-rule index handling with explicit teardown, and NIXLF reserved-slot
lookup when default rules are missing.
Patches 1 through 3 focus on AF error handling: propagate
npc_mcam_idx_2_key_type() failures through cn20k MCAM enable, config, copy,
and read paths; treat cn20k NPC debugfs nodes as optional so probe does not
fail when debugfs is unavailable; and fix defrag MCAM allocation rollback
so allocation errno is not overwritten during subbank index resolution.
Patch 4 fixes npc_defrag_move_vdx_to_free(): when an MCAM line is moved to
a new physical index, move entry2target_pffunc[] association to the new
slot, clear the old slot, and retarget the matching mcam_rules entry so
software state matches hardware after defrag.
Patches 5 through 7 refine cn20k MCAM programming: clear entries using the
logical MCAM index and resolved key width, fix bank/CFG sequencing in
npc_cn20k_config_mcam_entry(), and read action metadata from the correct
bank in npc_cn20k_read_mcam_entry().
Patches 8 through 10 complete default-rule lifecycle handling: initialize
default-rule index outputs eagerly, tear down reserved default MCAM rules
explicitly (coordinated with npc_mcam_free_all_entries()), and reject
USHRT_MAX sentinel indices from npc_get_nixlf_mcam_index() on cn20k.
====================
octeontx2-af: npc: cn20k: Reject missing default-rule MCAM indices
When cn20k default L2 rules are not installed,
npc_cn20k_dft_rules_idx_get() leaves broadcast, multicast, promiscuous, and
unicast slots at USHRT_MAX. npc_get_nixlf_mcam_index() previously returned
that sentinel as a valid MCAM index, so callers could program hardware with
an invalid index.
Return -EINVAL from the cn20k branches of npc_get_nixlf_mcam_index() when
the requested slot is still USHRT_MAX. Harden cn20k NPC MCAM entry helpers
to reject out-of-range indices before touching hardware.
Drop the early bounds check in npc_enable_mcam_entry() for cn20k so invalid
indices are validated inside npc_cn20k_enable_mcam_entry() instead of being
silently ignored.
In rvu_npc_update_flowkey_alg_idx(), treat negative MCAM indices like
out-of-range values, and only update RSS actions for promiscuous and
all-multi paths when the resolved index is non-negative.
octeontx2-af: npc: cn20k: Tear down default MCAM rules explicitly on free
npc_cn20k_dft_rules_free() used the NPC MCAM mbox "free all" path, which
does not match how cn20k tracks default-rule MCAM slots indexes.
Resolve the default-rule indices, then for each valid slot clear the bitmap
entry, drop the PF/VF map, disable the MCAM line, clear the target
function, and npc_cn20k_idx_free(). Remove any matching software mcam_rules
nodes. On hard failure from idx_free, WARN and stop so the box stays up for
analysis.
In npc_mcam_free_all_entries(), prefetch the same default-rule indices and,
on cn20k, skip bitmap clear and idx_free when the scanned entry is one of
those reserved defaults (they are released by npc_cn20k_dft_rules_free).
octeontx2-af: npc: cn20k: Initialize default-rule index outputs up front
npc_cn20k_dft_rules_idx_get() wrote USHRT_MAX into individual outputs only
on some error paths (lbk promisc lookup, VF ucast lookup, and the PF rule
walk), which could leave other caller slots stale across retries.
Set every non-NULL bcast/mcast/promisc/ucast pointer to USHRT_MAX once at
entry, then drop the duplicate assignments on failure. Successful lookups
still overwrite the relevant slot before returning.
npc_cn20k_read_mcam_entry() always reloaded action and vtag_action from
bank 0 after programming the CAM words. Use the bank returned by
npc_get_bank() for the ACTION reads as well, and read those registers once
up front so both X2 and X4 paths share the same metadata.
Return directly from the X2 keyword path now that the action fields are
already populated.
Cc: Suman Ghosh <sumang@marvell.com> Fixes: 6d1e70282f76 ("octeontx2-af: npc: cn20k: Use common APIs") Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com> Link: https://patch.msgid.link/20260429022722.1110289-8-rkannoth@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For X4 keys its loop reused the bank parameter as the loop counter, so bank
no longer reflected the caller's bank after the loop and the control flow
was hard to follow.
Program NPC_AF_CN20K_MCAMEX_BANKX_CFG_EXT directly in
npc_cn20k_config_mcam_entry(): one CFG write for X2 using the computed
bank, and one CFG write per bank inside the X4 action loop. Enable the
entry at the end with npc_cn20k_enable_mcam_entry(..., true) instead of
embedding the enable bit in bank_cfg via the removed helper.
Cc: Suman Ghosh <sumang@marvell.com> Fixes: 4e527f1e5c15 ("octeontx2-af: npc: cn20k: Add new mailboxes for CN20K silicon") Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com> Link: https://patch.msgid.link/20260429022722.1110289-7-rkannoth@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
octeontx2-af: npc: cn20k: Clear MCAM entries by index and key width
Replace the old four-argument CN20K MCAM clear with a per-bank static
helper and npc_cn20k_clear_mcam_entry() that takes a logical MCAM index,
resolves the key width via npc_mcam_idx_2_key_type(), and clears either one
bank (X2) or every bank (X4).
Call it from npc_clear_mcam_entry() on cn20k and log when key-type lookup
fails. Use the per-bank helper from npc_cn20k_config_mcam_entry() for
pre-program clears.
For loopback VFs, use the promisc MCAM index as ucast_idx when copying RSS
action for promisc, matching cn20k default-rule layout.
Cc: Suman Ghosh <sumang@marvell.com> Fixes: 6d1e70282f76 ("octeontx2-af: npc: cn20k: Use common APIs") Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com> Link: https://patch.msgid.link/20260429022722.1110289-6-rkannoth@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
npc_defrag_move_vdx_to_free() disables, copies, and enables the MCAM entry
at a new index but previously left entry2target_pffunc[] and the mcam_rules
list still keyed to the old index. Copy the target PF association to the
new slot, clear the old one, and retarget the rule entry so software state
matches the relocated hardware context.
octeontx2-af: npc: cn20k: Propagate errors in defrag MCAM alloc rollback
npc_defrag_alloc_free_slots() allocates MCAM indexes in up to two passes on
bank0 then bank1. On failure it rolls back by freeing entries already
placed in save[].
__npc_subbank_alloc() can return a negative errno while only part of the
indexes are valid. The rollback loop used rc for
npc_mcam_idx_2_subbank_idx() as well, so a successful lookup stored zero in
rc and a later __npc_subbank_free() failure could still end with return 0
when the allocation path had also left rc at zero (for example shortfall
after zero return values from the alloc helpers).
Jump to the rollback path immediately when either __npc_subbank_alloc()
call fails, preserving its errno. If both calls succeed but the total
allocated count is still less than cnt, set rc to -ENOSPC before rollback.
Use a separate err variable for npc_mcam_idx_2_subbank_idx() so a
successful lookup no longer clears a non-zero rc from the allocation phase.
octeontx2-af: npc: cn20k: Drop debugfs_create_file() error checks in init
debugfs is not intended to be checked for allocation failures the way other
kernel APIs are: callers should not fail probe or subsystem init because a
debugfs node could not be created, including when debugfs is disabled in
Kconfig. Replacing NULL checks with IS_ERR() checks is similarly wrong for
optional debugfs.
Remove dentry checks and -EFAULT returns from npc_cn20k_debugfs_init().
See:
https://staticthinking.wordpress.com/2023/07/24/
debugfs-functions-are-not-supposed-to-be-checked/
octeontx2-af: npc: cn20k: Propagate MCAM key-type errors on cn20k
npc_mcam_idx_2_key_type() can fail; callers used to ignore it and still
used kw_type when enabling, configuring, copying, and reading MCAM entries.
That could program or decode hardware with an undefined key type.
Return -EINVAL when key-type lookup fails. Return -EINVAL from
npc_cn20k_copy_mcam_entry() when src and dest key types differ instead of
failing silently.
Change npc_cn20k_{enable,config,copy,read}_mcam_entry() to return int on
success or error. Thread those errors through the cn20k MCAM write and read
mbox handlers, the cn20k baseline steer read path, NPC defrag move
(disable/copy/enable with dev_err and -EFAULT), and the DMAC update path in
rvu_npc_fs.c.
Make npc_copy_mcam_entry() return int so the cn20k branch can return
npc_cn20k_copy_mcam_entry() without a void/int mismatch, and fail
NPC_MCAM_SHIFT_ENTRY when copy fails.
Lorenzo Bianconi [Wed, 29 Apr 2026 12:02:31 +0000 (14:02 +0200)]
net: airoha: Move entries to queue head in case of DMA mapping failure in airoha_dev_xmit()
In order to respect the original descriptor order and avoid any
potential IOMMU fault or memory corruption, move pending queue entries
to the head of hw queue tx_list if the DMA mapping of current inflight
packet fails in airoha_dev_xmit routine.
Currently, request_threaded_irq() is used with a primary handler but a
NULL threaded handler, while also setting the IRQF_ONESHOT flag. This
specific combination triggers a WARNING since the commit aef30c8d569c
("genirq: Warn about using IRQF_ONESHOT without a threaded handler").
WARNING: kernel/irq/manage.c:1502 at __setup_irq+0x4fa/0x760
Fix the issue by switching to request_irq(), which is the appropriate
interface or a non-threaded interrupt handler, and removing the
unnecessary IRQF_ONESHOT flag.
Register WX_CFG_PORT_ST is a PF restricted register. When a VF is
initialized, attempting to read this register triggers an illegal
register access, which lead to a system hang.
When the device is VF, the bus function ID can be obtained directly from
the PCI_FUNC(pdev->devfn).
Linus Torvalds [Fri, 1 May 2026 00:36:48 +0000 (17:36 -0700)]
Merge tag 'mtd/fixes-for-7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
Pull mtd fixes from Miquel Raynal:
"Besides an out-of-bound bug, this is about properly supporting Winbond
octal SPI NAND chips which use a specific pattern for stuffing more
address bits in some operations. This uses the spi-mem flag in SPI
NAND that was added to the spi-mem layer just before the merge window
through the spi tree"
* tag 'mtd/fixes-for-7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
mtd: spinand: winbond: Fix ODTR write VCR on W35NxxJW
mtd: spinand: winbond: Set the packed page read flag to W35N02/04JW
mtd: spinand: Add support for packed read data ODTR commands
mtd: spi-nor: debugfs: fix out-of-bounds read in spi_nor_params_show()
net: enetc: fix VSI mailbox timeout handling and DMA lifecycle
In the current VSI mailbox implementation, the VSI allocates a DMA buffer
to store the message sent to the PSI. When the PSI receives the message
request from the VSI, the hardware copies the message data from this DMA
buffer to PSI's DMA buffer for processing.
When enetc_msg_vsi_send() times out, two scenarios can occur:
1) Use-after-free: If the hardware hasn't completed message copying when
the VSI frees the buffer, the hardware may subsequently copy the data
from freed memory to PSI's DMA buffer.
2) Message race: If PSI hasn't processed the previous message when the
next message is sent, the VSI may receive the previous message's
reply, leading to incorrect handling.
To address these issues, implement the following changes:
- Check the mailbox busy status before sending a new message. If the
mailbox is in busy state, it indicates the previous message is still
being processed, so return an error immediately.
- Add the 'msg' field to struct enetc_si to preserve the DMA buffer
information. The caller of enetc_msg_vsi_send() no longer frees the
DMA buffer. Instead, defer freeing until it is safe to do so (when
mailbox is not busy on next send).
- Add cleanup in enetc_vf_remove() to free the last message buffer.
This ensures the DMA buffer remains valid during message copying and
prevents message reply mismatches.
Daniel Borkmann [Wed, 29 Apr 2026 15:46:48 +0000 (17:46 +0200)]
ipv6: Implement limits on extension header parsing
ipv6_{skip_exthdr,find_hdr}() and ip6_{tnl_parse_tlv_enc_lim,
protocol_deliver_rcu}() iterate over IPv6 extension headers until they
find a non-extension-header protocol or run out of packet data. The
loops have no iteration counter, relying solely on the packet length
to bound them. For a crafted packet with 8-byte extension headers
filling a 64KB jumbogram, this means a worst case of up to ~8k
iterations with a skb_header_pointer call each. ipv6_skip_exthdr(),
for example, is used where it parses the inner quoted packet inside
an incoming ICMPv6 error:
- icmpv6_rcv
- checksum validation
- case ICMPV6_DEST_UNREACH
- icmpv6_notify
- pskb_may_pull() <- pull inner IPv6 header
- ipv6_skip_exthdr() <- iterates here
- pskb_may_pull()
- ipprot->err_handler() <- sk lookup
The per-iteration cost of ipv6_skip_exthdr itself is generally
light, but skb_header_pointer becomes more costly on reassembled
packets: the first ~1232 bytes of the inner packet are in the skb's
linear area, but the remaining ~63KB are in the frag_list where
skb_copy_bits is needed to read data.
Initially, the idea was to add a configurable limit via a new
sysctl knob with default 8, in line with knobs from commit 47d3d7ac656a ("ipv6: Implement limits on Hop-by-Hop and Destination
options"), but two reasons eventually argued against it:
- It adds to UAPI that needs to be maintained forever, and
upcoming work is restricting extension header ordering anyway,
leaving little reason for another sysctl knob
- exthdrs_core.c is always built-in even when CONFIG_IPV6=n,
where struct net has no .ipv6 member, so the read site would
need an ifdef'd fallback to a constant anyway
Therefore, just use a constant (IP6_MAX_EXT_HDRS_CNT). All four
extension header walking functions are now bound by this limit.
Note that the check in ip6_protocol_deliver_rcu() happens right
before the goto resubmit, such that we don't have to have a test
for ipv6_ext_hdr() in the fast-path.
There's an ongoing IETF draft-iurman-6man-eh-occurrences to enforce
IPv6 extension headers ordering and occurrence. The latter also
discusses security implications. As per RFC8200 section 4.1, the
occurrence rules for extension headers provide a practical upper
bound which is 8. In order to be conservative, let's define
IP6_MAX_EXT_HDRS_CNT as 12 to leave enough room for quirky setups.
In the unlikely event that this is still not enough, then we might
need to reconsider a sysctl.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Justin Iurman <justin.iurman@gmail.com> Link: https://patch.msgid.link/20260429154648.809751-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>