Adrian Hunter [Wed, 3 Jun 2026 09:07:54 +0000 (12:07 +0300)]
i3c: mipi-i3c-hci: Increase DMA transfer ring size to maximum
The DMA transfer ring is currently limited to 16 entries, despite the
MIPI I3C HCI supporting up to 32 devices. When the ring lacks space for a
new transfer list, the driver returns -EBUSY, which can be unexpected
for clients.
Increase the DMA transfer ring size to the maximum supported value of
255 entries. This effectively eliminates ring-space exhaustion in
practice and avoids the complexity of adding secondary queuing
mechanisms.
Even at the maximum size, the memory overhead remains small
(approximately 24 bytes per entry by default).
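As a rough check of the overhead claim, the arithmetic can be sketched as
follows. The 24-byte figure is the approximate per-entry cost quoted above;
the macro and function names are hypothetical and are not taken from the
driver.

```c
#include <stddef.h>

/* Approximate per-entry cost quoted in the commit message (assumption). */
#define RING_BYTES_PER_ENTRY 24u

/* Hypothetical helper: approximate memory used by a ring of `entries` entries. */
static size_t ring_overhead_bytes(unsigned int entries)
{
	return (size_t)entries * RING_BYTES_PER_ENTRY;
}
```

With these assumptions, growing the ring from 16 to 255 entries raises the
estimate from 384 bytes to 6120 bytes.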
Adrian Hunter [Wed, 3 Jun 2026 09:07:52 +0000 (12:07 +0300)]
i3c: mipi-i3c-hci: Base timeouts on actual transfer start time
Transfer timeouts are currently measured from the point where a transfer
list is queued to the controller. This can cause transfers to time out
before they have actually started, if earlier queued transfers consume
the timeout interval.
Fix this by recording when a transfer reaches the head of the queue and
adjusting the timeout calculation to start from that point. The existing
low-overhead completion-based timeout mechanism is preserved, but care is
taken to ensure the transfer start time is consistently recorded for both
PIO and DMA paths.
This prevents premature timeouts while retaining efficient timeout
handling.
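The idea can be illustrated with a minimal sketch, assuming millisecond
timestamps. The structure and function names are hypothetical and do not
come from the driver; treating a not-yet-started transfer as unable to time
out is a simplification made for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-transfer bookkeeping; times are in milliseconds. */
struct xfer_timing {
	uint64_t queued_ms;   /* when the transfer list was queued */
	uint64_t started_ms;  /* when it reached the head of the queue */
	bool started;         /* false while earlier transfers are still running */
};

/*
 * Return true if the transfer has timed out at time now_ms.
 * The interval is counted from the start time, not the queue time, so time
 * spent waiting behind earlier transfers is not charged to this transfer.
 */
static bool xfer_timed_out(const struct xfer_timing *t, uint64_t now_ms,
			   uint64_t timeout_ms)
{
	if (!t->started)
		return false;
	return now_ms - t->started_ms >= timeout_ms;
}
```

A transfer queued at time 0 that only starts at 900 ms is therefore not
reported as timed out at 1000 ms with a 1000 ms timeout, whereas measuring
from queue time would have expired it.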
Adrian Hunter [Wed, 3 Jun 2026 09:07:51 +0000 (12:07 +0300)]
i3c: mipi-i3c-hci: Wait for NoOp commands to complete
When a transfer list is only partially completed due to an error,
hci_dma_dequeue_xfer() overwrites the remaining DMA ring entries with
NoOp commands and restarts the ring to flush them out.
While NoOp commands are expected to complete successfully, they may still
fail to complete if the DMA ring is stuck. Explicitly wait for the NoOp
commands to finish, and trigger controller recovery if they do not
complete or report an error.
This ensures that partially completed transfer lists are reliably
resolved and that a stuck ring is recovered promptly.
Adrian Hunter [Wed, 3 Jun 2026 09:07:50 +0000 (12:07 +0300)]
i3c: mipi-i3c-hci: Add DMA-mode recovery for internal controller errors
Handle internal I3C HCI errors when operating in DMA mode by adding a
simple recovery mechanism.
On detection of an internal controller error, mark recovery as needed and
attempt to restore operation by performing a software reset followed by
state restore. To keep recovery straightforward on this unlikely error
path, all currently queued transfers are terminated and completed with an
error.
This allows the controller to resume operation after internal failures
rather than remaining permanently stuck.
Note that internal errors, indicated by INTR_HC_INTERNAL_ERR, cause the
controller to stop.
Adrian Hunter [Wed, 3 Jun 2026 09:07:48 +0000 (12:07 +0300)]
i3c: mipi-i3c-hci: Add DMA ring abort quirk for Intel controllers
DMA rings can be aborted either per-ring via RING_CONTROL or globally
via HC_CONTROL_ABORT. The driver currently relies on the per-ring
mechanism.
Some Intel I3C HCI controllers require HC_CONTROL_ABORT to be asserted
before a DMA ring abort is effective. This behavior is non-standard.
Introduce a controller quirk to select the required abort method and
enable it for Intel LPSS I3C controllers.
Adrian Hunter [Wed, 3 Jun 2026 09:07:46 +0000 (12:07 +0300)]
i3c: mipi-i3c-hci: Add DMA ring abort/reset quirk for Intel controllers
Some Intel I3C HCI controllers cannot reliably restart a DMA ring after an
ABORT. Additional queue resets are required to recover, and must be
performed using PIO reset bits even while operating in DMA mode.
This behavior is non-standard. Introduce a controller quirk to opt into
the required PIO queue resets after a DMA ring abort, and enable it for
Intel LPSS I3C controllers.
Adrian Hunter [Wed, 3 Jun 2026 09:07:45 +0000 (12:07 +0300)]
i3c: mipi-i3c-hci: Avoid restarting DMA ring after aborting wrong transfer
Software ABORT of the DMA ring is used to recover from transfer list
timeouts, but it is inherently racy. The intended transfer list may
complete just before the ABORT takes effect, causing the subsequent
transfer list to be aborted instead.
In this case, an incomplete transfer list that has not yet been processed
by hci_dma_dequeue_xfer() may remain in the ring. Restarting the DMA
ring at that point can lead to unpredictable results.
Detect when the next queued transfer is not the first entry of a transfer
list and does not belong to the list currently being dequeued. In that
case, skip restarting the DMA ring and defer recovery until a subsequent
call to hci_dma_dequeue_xfer(), which will safely restart the ring once
the incomplete list is handled.
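The check described above might look roughly like the following sketch. The
structure, its fields, and the function name are hypothetical stand-ins for
whatever the driver actually tracks per ring entry.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of a queued ring entry. */
struct ring_entry {
	int list_id;         /* identifies the transfer list the entry belongs to */
	bool first_in_list;  /* true for the first transfer of its list */
};

/*
 * Decide whether the ring may be restarted after dequeuing list `dequeued_id`.
 * `next` is the next queued entry, or NULL if nothing else is queued.
 */
static bool may_restart_ring(const struct ring_entry *next, int dequeued_id)
{
	if (!next)
		return true;
	/* A mid-list entry of another list means that list was aborted instead. */
	if (!next->first_in_list && next->list_id != dequeued_id)
		return false;
	return true;
}
```

In this sketch the restart is skipped only when both conditions hold, which
matches the case where the ABORT hit the following transfer list.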
Adrian Hunter [Wed, 3 Jun 2026 09:07:44 +0000 (12:07 +0300)]
i3c: mipi-i3c-hci: Complete transfer lists immediately on error
In DMA mode, transfer lists are currently completed only when the final
transfer in the list completes. If an earlier transfer fails, the list is
left incomplete and callers wait until timeout.
There is no need to wait for a timeout, as the completion path in
i3c_hci_process_xfer() already checks for error status. Complete the
transfer list as soon as any transfer in the list reports an error.
This avoids unnecessary delays and spurious timeouts on error.
Complete a transfer list immediately when there is an error.
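A minimal sketch of the completion rule, assuming a simple per-list counter.
The structure and function are hypothetical and only model the decision, not
the driver's i3c_hci_process_xfer() path.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping for one transfer list. */
struct xfer_list {
	unsigned int total;  /* transfers in the list */
	unsigned int done;   /* transfer results reported so far */
	bool completed;      /* list has been completed to the caller */
	int status;          /* 0 on success, negative on error */
};

/* Record one transfer result; return true if this call completed the list. */
static bool xfer_list_report(struct xfer_list *l, int xfer_status)
{
	if (l->completed)
		return false;
	l->done++;
	if (xfer_status < 0) {
		/* First error: complete the list now instead of waiting for a timeout. */
		l->status = xfer_status;
		l->completed = true;
		return true;
	}
	if (l->done == l->total) {
		l->completed = true;
		return true;
	}
	return false;
}
```

In the error case the caller learns the status as soon as the failing
transfer is reported, without waiting for the remaining entries.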
Adrian Hunter [Wed, 3 Jun 2026 09:07:43 +0000 (12:07 +0300)]
i3c: mipi-i3c-hci: Call hci_dma_xfer_done() from dequeue path
hci_dma_dequeue_xfer() relies on state normally updated by the DMA
interrupt handler. Ensure that state is current by explicitly invoking
hci_dma_xfer_done() from the dequeue path.
This handles cases where the interrupt handler has not (yet) run.
Adrian Hunter [Wed, 3 Jun 2026 09:07:41 +0000 (12:07 +0300)]
i3c: mipi-i3c-hci: Wait for DMA ring restart to complete
Although hci_dma_dequeue_xfer() is serialized against itself via
control_mutex, this does not guarantee that a DMA ring restart
triggered by a previous invocation has fully completed.
When the function is called again in rapid succession, the DMA ring may
still be transitioning back to the running state, which may confound or
disrupt further state changes.
Address this by waiting for the DMA ring restart to complete before
continuing.
Adrian Hunter [Wed, 3 Jun 2026 09:07:40 +0000 (12:07 +0300)]
i3c: mipi-i3c-hci: Prevent DMA enqueue while ring is aborting or in error
Block the DMA enqueue path while a Ring abort is in progress or after an
error condition has been detected.
Previously, new transfers could be enqueued while the DMA Ring was being
aborted or while error handling was underway. This allowed enqueue and
error-recovery paths to run concurrently, potentially interfering with
each other and corrupting Ring state.
Introduce explicit enqueue blocking and a wait queue to serialize access:
enqueue operations now wait until abort or error handling has completed
before proceeding. Enqueue is unblocked once the Ring is safely restarted.
Note, there is only 1 ring bundle configured, and a transfer error causes
the controller to halt ring (bundle) operation, so there is only ever 1
outstanding error at a time. Furthermore, a later patch ensures that only
the currently active transfer list can time out. Consequently, the DMA
queue will not be unblocked while there are outstanding transfer errors or
timeouts.
Adrian Hunter [Wed, 3 Jun 2026 09:07:39 +0000 (12:07 +0300)]
i3c: mipi-i3c-hci: Preserve RUN bit when aborting DMA ring
The MIPI I3C HCI specification does not require the DMA ring RUN bit
(RUN_STOP) to be cleared when issuing an ABORT. Leaving it set allows the
DMA ring to continue to receive IBIs, although an IBI is not lost in any
case: it can be received once the ring restarts, provided the I3C device
has not given up.
Note, currently ABORT is only used on a timeout error path so the change
has very little effect in practice. In the more common case of a transfer
error, the ring (bundle) operation is halted by the controller anyway.
Adjust the RING_CONTROL handling to set ABORT without clearing RUN_STOP,
bringing the driver into alignment with the specification.
Fixes: b795e68bf3073 ("i3c: mipi-i3c-hci: Correct RING_CTRL_ABORT handling in DMA dequeue")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260603090754.16252-3-adrian.hunter@intel.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
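The difference between the two ways of requesting an ABORT can be sketched
with plain bit operations. The bit positions below are placeholders chosen
for illustration and are not the real RING_CONTROL layout; the function
names are hypothetical.

```c
#include <stdint.h>

/* Placeholder bit positions for illustration only (assumption). */
#define RING_CTRL_RUN_STOP (UINT32_C(1) << 1)
#define RING_CTRL_ABORT    (UINT32_C(1) << 2)

/* Behaviour being replaced, as described: request ABORT and drop RUN_STOP. */
static uint32_t ring_ctrl_abort_clearing_run(uint32_t ctrl)
{
	return (ctrl | RING_CTRL_ABORT) & ~RING_CTRL_RUN_STOP;
}

/* Behaviour after the change: request ABORT and leave RUN_STOP untouched. */
static uint32_t ring_ctrl_abort_preserving_run(uint32_t ctrl)
{
	return ctrl | RING_CTRL_ABORT;
}
```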
Adrian Hunter [Wed, 3 Jun 2026 09:07:38 +0000 (12:07 +0300)]
i3c: mipi-i3c-hci: Fix suspend behavior when bus disable falls back to software reset
Software reset was introduced as a fallback if bus disable failed. The
change was made in 2 places: the cleanup path and the suspend path.
For the cleanup path (i3c_hci_bus_cleanup()), after software reset the
function continues to do cleanup for the current I/O mode. For the
suspend path (i3c_hci_rpm_suspend()), after software reset the function
returns early. However, software reset does not reset any Ring Headers in
the Host Controller, so returning early is not the right thing to do.
Instead, continue to call suspend for the current I/O mode, which for DMA
mode will reset any Ring Headers.
Note, although Ring Headers should not be active at this stage, performing
this reset follows the procedure defined by the specification and keeps
the suspend path consistent with the cleanup path.
Note also, i3c_hci_sync_irq_inactive() is still called via the PIO and DMA
hci->io->suspend() callbacks.
Always return 0 because the device is quiesced as much as possible and
returning a negative error code would unnecessarily prevent system suspend.
Fixes: 9a258d1336f7 ("i3c: mipi-i3c-hci: Fallback to software reset when bus disable fails")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260603090754.16252-2-adrian.hunter@intel.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Linus Torvalds [Sun, 14 Jun 2026 14:37:39 +0000 (15:37 +0100)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux
Pull ARM fixes from Russell King:
- Avoid KASAN instrumentation of half-word IO
- Use a byte load for KASAN shadow stack
- Fix kexec and hibernation with PAN
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux:
ARM: 9476/1: mm: fix kexec and hibernation with CONFIG_CPU_TTBR0_PAN
ARM: 9475/1: entry: use byte load for KASAN VMAP stack shadow
ARM: 9474/1: io: avoid KASAN instrumentation of raw halfword I/O
geneve: Fix off-by-one comparing with GRO_LEGACY_MAX_SIZE
GRO_LEGACY_MAX_SIZE = 65536; a total_len of 65536 is too big to fit
into a u16. As can be seen in skb_gro_receive, packets larger than or equal
to gro_max_size (or GRO_LEGACY_MAX_SIZE) are dropped with -E2BIG. Apply
the same boundary to geneve_post_decap_hint to avoid overflowing the
16-bit iph->tot_len field by writing 65536 to it.
Fixes: fd0dd796576e ("geneve: use GRO hint option in the RX path")
Signed-off-by: Alice Mikityanska <alice@isovalent.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260611192955.604661-3-alice.kernel@fastmail.im
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
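The boundary in the geneve fix can be illustrated with a small sketch. The
constant value is the one quoted in the commit message; the helper names are
hypothetical and this is not the geneve code itself.

```c
#include <stdbool.h>
#include <stdint.h>

#define GRO_LEGACY_MAX_SIZE 65536u  /* value quoted in the commit message */

/* True if total_len can be stored in a 16-bit field such as iph->tot_len. */
static bool total_len_fits(unsigned int total_len)
{
	/* The rejected range is total_len >= GRO_LEGACY_MAX_SIZE, not just >. */
	return total_len < GRO_LEGACY_MAX_SIZE;
}

/* What a 16-bit field would hold after an unchecked store. */
static uint16_t store_u16(unsigned int total_len)
{
	return (uint16_t)total_len;
}
```

An unchecked store of 65536 into a 16-bit field wraps to 0, which is why the
comparison has to reject the value 65536 itself.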
Similar to commit add641e7dee3 ("sched: act_csum: don't mangle TCP and
UDP GSO packets"), UDP tunnel GSO packets going through act_csum
shouldn't have their checksum calculated at this point, because it will
be done after segmentation. Setting the checksum in act_csum modifies
skb->ip_summed and prevents inner IP csum offload from kicking in,
resulting in a packet with a bad checksum.
Add UDP tunnel GSO packets to the exceptions, and also add UDP GSO
(SKB_GSO_UDP_L4), as the same logic as in the commit mentioned above
applies to UDP GSO too.
Will Deacon [Sun, 14 Jun 2026 11:18:18 +0000 (12:18 +0100)]
Merge branch 'for-next/selftests' into for-next/core
* for-next/selftests:
kselftest/arm64: Add 2025 dpISA coverage to hwcaps
kselftest/arm64: Add tests for POR_EL0 save/reset/restore
kselftest/arm64: Move/add POE helpers to test_signals_utils.h
kselftest/arm64: Add POE as a feature in the signal tests
selftests/mm: Fix resv_sz when parsing arm64 signal frame
Will Deacon [Sun, 14 Jun 2026 11:17:33 +0000 (12:17 +0100)]
Merge branch 'for-next/mm' into for-next/core
* for-next/mm: (24 commits)
Revert "arm64: mm: Unmap kernel data/bss entirely from the linear map"
Revert "arm64: mm: Defer remap of linear alias of data/bss"
arm64/mm: Rename ptdesc_t
arm64: mm: Defer remap of linear alias of data/bss
KVM: arm64: Omit tag sync on stage-2 mappings of the zero page
arm64: Avoid double evaluation of __ptep_get()
kasan: Move generic KASAN page tables out of BSS too
arm64: Rename page table BSS section to .bss..pgtbl
arm64: mm: Unmap kernel data/bss entirely from the linear map
arm64: mm: Map the kernel data/bss read-only in the linear map
mm: Make empty_zero_page[] const
sh: Drop cache flush of the zero page at boot
powerpc/code-patching: Avoid r/w mapping of the zero page
arm64: mm: Don't abuse memblock NOMAP to check for overlaps
arm64: Move fixmap and kasan page tables to end of kernel image
arm64: mm: Permit contiguous attribute for preliminary mappings
arm64: kfence: Avoid NOMAP tricks when mapping the early pool
arm64: mm: Permit contiguous descriptors to be manipulated
arm64: mm: Preserve non-contiguous descriptors when mapping DRAM
arm64: mm: Preserve existing table mappings when mapping DRAM
...
Will Deacon [Sun, 14 Jun 2026 11:17:07 +0000 (12:17 +0100)]
Merge branch 'for-next/misc' into for-next/core
* for-next/misc:
arm64: arch_timer: reuse arch_timer_read_cnt{p,v}ct_el0() helpers
arm64: patching: replace min_t with min in __text_poke
ARM64: remove unnecessary architecture-specific <asm/device.h>
arm64: Implement _THIS_IP_ using inline asm
arm64: panic from init_IRQ if IRQ handler stacks cannot be allocated
arm64: smp: Do not mark secondary CPUs possible under nosmp
arm64/daifflags: Make local_daif_*() helpers __always_inline
Will Deacon [Sun, 14 Jun 2026 11:16:59 +0000 (12:16 +0100)]
Merge branch 'for-next/fpsimd-cleanups' into for-next/core
* for-next/fpsimd-cleanups:
arm64: fpsimd: Remove <asm/fpsimdmacros.h>
arm64: fpsimd: Move SME save/restore inline
arm64: fpsimd: Move sve_flush_live() inline
arm64: fpsimd: Move SVE save/restore inline
arm64: fpsimd: Use opaque type for SME state
arm64: fpsimd: Use opaque type for SVE state
arm64: fpsimd: Move fpsimd save/restore inline
arm64: fpsimd: Split FPSR/FPCR from SVE save/restore
arm64: sysreg: Add FPCR and FPSR
arm64: fpsimd: Move sve_get_vl() and sme_get_vl() inline
arm64: fpsimd: Use assembler for baseline SME instructions
arm64: fpsimd: Use assembler for SVE instructions
arm64: fpsimd: Remove sve_set_vq() and sme_set_vq()
arm64: fpsimd: Fold sve_init_regs() into do_sve_acc()
KVM: arm64: pkvm: Remove struct cpu_sve_state
KVM: arm64: pkvm: Save host FPMR in host cpu context
KVM: arm64: Don't override FFR save/restore argument
KVM: arm64: Don't include <asm/fpsimdmacros.h>
arm64: fpsimd: Fix type mismatch in sme_{save,load}_state()
arm64: fpsimd: Fix type mismatch in sve_{save,load}_state()
netfilter: nf_dup_netdev: add nf_dev_xmit_recursion*() helpers and use them
Update nft_dup and nft_fwd to use the nf_dev_xmit_recursion() helpers.
This patch also disables BH when transmitting the skb, so that a
migration to a different CPU cannot lead to unbalanced decrements
of the recursion counters.
This is modeled after Florian Westphal's dev_xmit_recursion*() API
available since commit 97cdcf37b57e ("net: place xmit recursion in
softnet data") according to its current state in the tree.
Fixes: 1d47b55b36d2 ("netfilter: nft_fwd_netdev: use recursion counter in neigh egress path")
Fixes: f37ad9127039 ("netfilter: nf_dup_netdev: Move the recursion counter struct netdev_xmit")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Fix the docutils error reported by kernel test robot
for the new conn_max sysctl:
Documentation/networking/ipvs-sysctl.rst:76: WARNING: Block quote ends
without a blank line; unexpected unindent. [docutils]
Documentation/networking/ipvs-sysctl.rst:76: ERROR: Unexpected section
title or transition.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202606071851.Dc1H7hOO-lkp@intel.com/
Fixes: 4a15044a2b06 ("ipvs: add conn_max sysctl to limit connections")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: flowtable: bail out if forward path cannot be discovered
If forward path discovery fails for any reason or netdevice is not
registered for this flowtable, then bail out to classic forwarding path
rather than providing incomplete forwarding path.
Update the existing forward path parser functions to report an error
so the flow_offload expression gives up on setting up the flowtable
entry.
nf_ct_ext_find() used to return NULL if the extension is stale for
unconfirmed conntracks if the genid validation fails.
Skip NULL check in nf_nat_inet_fn() given this is valid to be NULL
for non-initialized ct nat extensions.
While at it, fetch ct helper area in nf_ct_expect_related_report() only
once and pass it on to other ancillary functions. Replace WARN_ON()
by WARN_ON_ONCE() in nf_ct_unlink_expect_report().
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
1) tree_gc_worker fails to wrap around after it can't find more pending
work. Update data->gc_tree unconditionally. If it's 0, start from
the first pending tree (which can be 0).
2) tree_gc_worker() iterates the rbtree without lock. This is never
safe. Move iteration under the spinlock. If this takes too long
(resched needed), save key of next node, drop lock, resched, re-lock,
then search for the key (node). In very rare cases this node might
no longer exist, in that case we can just wait for next gc.
3) use disable_work_sync(), we don't want any restarts.
4) module exit function needs rcu_barrier before we zap the kmem cache.
Fixes: 5c789e131cbb ("netfilter: nf_conncount: Add list lock and gc worker, and RCU for init tree search")
Closes: https://sashiko.dev/#/patchset/20260525182924.28456-1-fw%40strlen.de
Assisted-by: Claude:claude-sonnet-4-6
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: nf_conncount: add sequence counter to detect tree modifications
There are two issues with traversal:
1. Key lookup (tree search) cannot detect concurrent modifications and may
not find a result in case of parallel modification.
2. Worker does a lockless iteration. This is never safe.
Add a sequence counter and re-do the lookup under lock in case the
tree was modified / seqcount changed.
gc_worker bugs are addressed in the next patch.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: nf_conncount: use per nf_conncount_data spinlocks
This change replaces the rb_root with a new container structure.
Instead of an array of locks shared by all nf_conncount_data objects,
each tree gains its own dedicated lock.
Downside: nf_conncount_data increases in size. Before this change:
struct nf_conncount_data {
[..]
/* --- cacheline 33 boundary (2112 bytes) was 16 bytes ago --- */
unsigned int gc_tree; /* 2128 4 */
/* size: 2136, cachelines: 34, members: 7 */
/* padding: 4 */
netfilter: nf_conncount: callers must hold rcu read lock
rcu_dereference_raw() should not have been used here; it concealed this bug.
It is used because struct rb_node lacks __rcu annotated pointers, so plain
rcu_dereference() causes sparse warnings.
The major tradeoff is that rcu_dereference_raw() doesn't warn when the caller
isn't in an RCU read-side critical section.
Extend the RCU read lock scope accordingly and accept the resulting sparse
warnings; those warnings are the lesser evil.
Fixes: 11efd5cb04a1 ("openvswitch: Support conntrack zone limit")
Closes: https://sashiko.dev/#/patchset/20260603230610.7900-1-fw%40strlen.de
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: nf_tables: use DEBUG_NET_WARN_ON_ONCE in packet and control paths
Replace raw warning macros with DEBUG_NET_WARN_ON_ONCE across the
nf_tables API, core engine, and expression evaluations. This prevents
unnecessary system panics when panic_on_warn=1 is enabled in production
systems.
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Marco Crivellari [Wed, 27 May 2026 08:18:34 +0000 (10:18 +0200)]
ipvs: Replace use of system_unbound_wq with system_dfl_long_wq
This patch continues the effort to refactor workqueue APIs, which has
begun with the changes introducing new workqueues and a new
alloc_workqueue flag:
commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")
The point of the refactoring is to eventually alter the default behavior
of workqueues to become unbound by default so that their workload
placement is optimized by the scheduler.
Before that can happen, workqueue users must be converted to the better
named new workqueues with no intended behaviour changes:
Cen Zhang [Sun, 14 Jun 2026 00:48:01 +0000 (08:48 +0800)]
ALSA: seq: avoid stale FIFO cells during resize
snd_seq_fifo_resize() still needs to publish the replacement pool
before it waits for FIFO users. A blocking snd_seq_read() holds
f->use_lock while it sleeps, so concurrent senders must be able to
queue to the new pool and wake that reader instead of failing against a
closing old pool.
However, snd_seq_fifo_event_in() duplicates an event before it takes
f->lock, and snd_seq_read() can dequeue a cell and later call
snd_seq_fifo_cell_putback() if copy_to_user() or
snd_seq_expand_var_event() fails. If resize swaps f->pool and detaches
oldhead in between, either path can relink an old-pool cell after the
snapshot. That stale cell sits outside the drained oldhead list, keeps
oldpool->counter elevated, and can leave snd_seq_pool_delete() waiting
for the retired pool to drain.
Keep the existing swap-before-wait ordering in snd_seq_fifo_resize(),
but reject stale cells before any FIFO relink. Revalidate event-in cells
under f->lock and retry them against the published replacement pool, and
free stale putback cells instead of linking them back into the FIFO.
The buggy scenario involves two paths, with each column showing the
order within that path:
resize path: relink path:
1. Allocate newpool. 1. Take f->use_lock.
2. Swap f->pool to newpool and 2. Duplicate or dequeue an old-pool
detach oldhead. cell before oldpool closes.
3. Mark oldpool closing and 3. Reach a later relink point after
wait for FIFO users. resize published newpool.
4. Free oldhead and delete 4. Relink the old-pool cell after
oldpool. resize detached oldhead.
5. Drop f->use_lock.
The reproducer reports a resize ioctl blocked in the expected pool
teardown path:
Cen Zhang [Sun, 14 Jun 2026 00:48:00 +0000 (08:48 +0800)]
ALSA: seq: oss: Serialize readq reset state with q->lock
snd_seq_oss_readq_clear() resets qlen, head, and tail without
q->lock even though the normal reader and producer paths serialize the
same ring state under that spinlock. A reset can therefore race
snd_seq_oss_readq_free() or snd_seq_oss_readq_put_event() and leave
stale records in the queue, drop freshly queued ones, or report the
wrong readiness after wakeup. KCSAN reports a data race between
snd_seq_oss_readq_clear() and snd_seq_oss_readq_free().
Take q->lock while clearing the ring and resetting input_time. Factor
the enqueue logic into a caller-locked helper so
snd_seq_oss_readq_put_timestamp() updates its suppression state under
the same lock instead of racing the reset path.
The buggy scenario involves two paths, with each column showing the
order within that path:
reset path: locked readq updater:
1. snd_seq_oss_reset() or 1. A reader or callback producer
release reaches takes q->lock on the same queue.
snd_seq_oss_readq_clear().
2. snd_seq_oss_readq_clear() 2. The updater tests or modifies
resets qlen, head, tail, qlen, head, and tail.
and input_time.
3. snd_seq_oss_readq_clear() 3. The updater completes its
wakes sleepers on read-modify-write sequence.
q->midi_sleep.
4. Without q->lock, the reset 4. The resulting ring state drives
can overlap the locked later reads and readiness.
update.
KCSAN reports:
BUG: KCSAN: data-race in snd_seq_oss_readq_clear /
snd_seq_oss_readq_free
write to 0xffff8881069fe608 of 4 bytes by task 120516 on cpu 0:
snd_seq_oss_readq_free+0x6c/0x80
snd_seq_oss_read+0xcb/0x250
odev_read+0x38/0x60
vfs_read+0xff/0x600
ksys_read+0xb4/0x140
__x64_sys_read+0x46/0x60
do_syscall_64+0xbb/0x2f0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
read to 0xffff8881069fe608 of 4 bytes by task 120517 on cpu 1:
snd_seq_oss_readq_clear+0x1f/0x90
snd_seq_oss_reset+0xa7/0xf0
snd_seq_oss_ioctl+0x6f6/0x7e0
odev_ioctl+0x56/0xc0
__x64_sys_ioctl+0xd1/0x120
do_syscall_64+0xbb/0x2f0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Runyu Xiao [Thu, 11 Jun 2026 05:35:43 +0000 (13:35 +0800)]
kcm: use WRITE_ONCE() when changing lower socket callbacks
kcm_attach() replaces a live lower TCP socket's sk_data_ready and
sk_write_space callbacks with KCM handlers, and kcm_unattach() restores
them later. Those callback-pointer updates are still plain stores even
though the same fields can be read and invoked concurrently on other
CPUs.
If another CPU observes an older callback snapshot after the live field
has already been restored, callback execution can run with a mismatched
target and sk_user_data state, leading to stale or misdirected wakeups.
Use WRITE_ONCE() for the callback replacement and restore operations so
these shared callback fields follow the same visibility contract already
established by the earlier 4022 fixes.
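A userspace sketch of the pattern, assuming a GCC- or Clang-compatible
compiler for __typeof__. The *_SKETCH macros are simplified stand-ins for the
kernel's WRITE_ONCE()/READ_ONCE(), and the structure and handlers are
hypothetical; this is not the KCM code.

```c
/* Simplified stand-ins for the kernel macros (assumption: GNU __typeof__). */
#define WRITE_ONCE_SKETCH(x, val) (*(volatile __typeof__(x) *)&(x) = (val))
#define READ_ONCE_SKETCH(x)       (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical lower socket with a single replaceable callback. */
struct lower_sock {
	void (*data_ready)(struct lower_sock *sk);
	int wakeups;      /* calls that reached the original handler */
	int kcm_wakeups;  /* calls that reached the replacement handler */
};

static void orig_data_ready(struct lower_sock *sk) { sk->wakeups++; }
static void kcm_data_ready(struct lower_sock *sk) { sk->kcm_wakeups++; }

/* Replace and restore the callback with single, non-torn stores. */
static void attach(struct lower_sock *sk)
{
	WRITE_ONCE_SKETCH(sk->data_ready, kcm_data_ready);
}

static void unattach(struct lower_sock *sk)
{
	WRITE_ONCE_SKETCH(sk->data_ready, orig_data_ready);
}

/* Readers take one snapshot of the pointer and call through it. */
static void deliver(struct lower_sock *sk)
{
	void (*cb)(struct lower_sock *) = READ_ONCE_SKETCH(sk->data_ready);

	cb(sk);
}
```

The volatile accesses only constrain the compiler; they do not by themselves
order the pointer update against other state such as sk_user_data.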
Merge tag 'iio-fixes-for-7.1b' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jic23/iio into char-misc-next
IIO: 2nd set of fixes for the 7.1 cycle.
Usual mixed bag of ancient issues and the recently introduced.
Various drivers
- Ensure use of simple_write_to_buffer() in debugfs callbacks doesn't
result in reading off the end of intended data by checking the
position is always 0.
buffer/hw-consumer
- Ensure scan_mask is freed on buffer release.
acpi-als
- Check ACPI_COMPANION() against NULL to close corner case where a
driver is overridden.
adi,ad4062
- Add GPIOLIB dependency to avoid undefined ref to gpiochip_get_data()
adi,ad7768-1
- Add GPIOLIB dependency to avoid several undefined functions.
adi,ad2s1210
- Ensure possible recovery path if a read fails in the interrupt handler.
bosch,bmg160
- Increase sleep on startup to ensure device is ready.
bosch,bmp280
- Ensure buffer pushed to kfifo is zeroed to avoid leaking uninitialized
stack data to userspace.
dyna-image,al3010
- Fix refactor that stopped reading one of the two measurement registers.
dyna-image,al3320a
- Fix refactor that stopped reading one of the two measurement registers.
qcom,spmi-iadc
- Ensure disable_irq_wake() is called on remove path.
sensirion,scd30
- Fix a sign extension bug.
st,vl53l0x
- Ensure possible recovery path if a read fails in the interrupt handler.
ti,ads1298
- Bounds check for pga_settings index. Hardening against device returning
unexpected values.
ti,tmp006
- Ensure trigger correctly released on remove path.
vishay,veml6030
- Fix incorrect channel type in events.
vishay,veml6075
- Bounds check for veml6075_it_ms. Hardening against device returning
unexpected values.
* tag 'iio-fixes-for-7.1b' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jic23/iio: (23 commits)
iio: adc: ad_sigma_delta: fix clear_pending_event for registerless devices
iio: adc: ad_sigma_delta: fix CS held asserted and state leaks
iio: light: opt3001: fix missing state reset on timeout
iio: chemical: scd30: Cleanup initializations and fix sign-extension bug
iio: core: fix uninitialized data in debugfs
iio: backend: fix uninitialized data in debugfs
iio: dac: ad3552r-hs: fix uninitialized data ni ad3552r_hs_write_data_source()
iio: adc: qcom-spmi-iadc: balance enable_irq_wake() on driver unbind
iio: light: al3320a: read both ALS ADC registers again
iio: light: al3010: read both ALS ADC registers again
iio: temperature: tmp006: use devm_iio_trigger_register
iio: buffer: hw-consumer: free scan_mask on buffer release
iio: adc: ad7768-1: Select GPIOLIB
iio: light: veml6030: fix channel type when pushing events
iio: light: acpi-als: Check ACPI_COMPANION() against NULL
iio: resolver: ad2s1210: notify trigger and clear state on fault read error
iio: proximity: vl53l0x: notify trigger and clear IRQ on error paths
iio: gyro: bmg160: wait full startup time after mode change at probe
iio: gyro: bmg160: bail out when bandwidth/filter is not in table
iio: pressure: bmp280: zero-init bmp580 trigger handler buffer
...
Sean Chang [Mon, 8 Jun 2026 15:52:52 +0000 (23:52 +0800)]
riscv: kvm: Use endian-specific __lelong for NACL shared memory
When compiling with sparse enabled (C=2), bitwise type warnings are
triggered in the RISC-V KVM implementation. This occurs because the
user-space data unboxing macro '__get_user_asm' performs implicit
casting on restricted types without forcing the compiler's compliance.
Additionally, raw 'unsigned long *' pointers are used to access the
SBI NACL shared memory, whereas the RISC-V SBI specification mandates
that these structures must follow little-endian byte ordering.
Fix these by:
1. Adding a '__force' cast to '__get_user_asm()' to safely suppress
implicit cast warnings during user-space data fetching.
2. Introducing the '__lelong' type macro, which dynamically resolves to
'__le32' or '__le64' depending on XLEN, and replacing 'unsigned long *'
with '__lelong *' to enforce proper compile-time endianness checks.
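A userspace sketch of the width selection, using the width of long as a
stand-in for XLEN. The lelong_t typedef and lelong_to_cpu() helper are
hypothetical and are not the kernel's '__lelong' definition; sparse's
bitwise type checking is not modelled here.

```c
#include <limits.h>
#include <stdint.h>

/* Pick the little-endian storage width from the width of long (assumption). */
#if ULONG_MAX == 0xffffffffUL
typedef uint32_t lelong_t;
#else
typedef uint64_t lelong_t;
#endif

/* Decode a little-endian value of that width, independent of host endianness. */
static unsigned long lelong_to_cpu(const unsigned char *p)
{
	unsigned long v = 0;

	for (unsigned int i = 0; i < sizeof(lelong_t); i++)
		v |= (unsigned long)p[i] << (8 * i);
	return v;
}
```

Decoding byte by byte gives the same result on little- and big-endian hosts,
which is the property the little-endian shared memory layout requires.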
Linus Torvalds [Sun, 14 Jun 2026 01:21:44 +0000 (18:21 -0700)]
Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"Fixes for the Qualcomm and Google GS101 clk drivers:
- Skip parking clks on some Qualcomm platforms so that the recovery
console keeps working
- Fix Google GS101 resume by using the correct div register"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: qcom: dispcc-sc8280xp: Don't park mdp_clk_src at registration time
clk: samsung: gs101: Fix missing USI7_USI DIV clock in peric0_clk_regs
clk: qcom: x1e80100-dispcc: Stop disp_cc_mdss_mdp_clk_src from getting parked
====================
net: hns3: enhance tc flow offload support
This patchset enhances the tc flow offload support for hns3 driver:
- Patch 1: Refactor hclge_add_cls_flower() to support more actions
- Patch 2: Improve unused_tuple parameter setting for separate src/dst configuration
- Patch 3: Add support for HCLGE_FD_ACTION_SELECT_QUEUE and HCLGE_FD_ACTION_DROP_PACKET actions
- Patch 4: Add support for FLOW_DISSECTOR_KEY_IP and FLOW_DISSECTOR_KEY_ENC_KEYID dissectors
- Patch 5: Add debugfs support for dumping FD rules
- Patch 6: Move FD code to a separate file (hclge_fd.c) for better code organization
====================
Jijie Shao [Wed, 10 Jun 2026 06:06:18 +0000 (14:06 +0800)]
net: hns3: move fd code to a separate file
The hclge_main.c file has become very large,
so the fd code has been moved to a separate hclge_fd.c file.
This patch only moves the code and does not modify any functionality.
Jijie Shao [Wed, 10 Jun 2026 06:06:16 +0000 (14:06 +0800)]
net: hns3: support IP and tunnel VNI dissectors for tc flow
Currently, the driver does not support FLOW_DISSECTOR_KEY_IP and
FLOW_DISSECTOR_KEY_ENC_KEYID. But the hardware supports
ip_tos (FLOW_DISSECTOR_KEY_IP) and
outer_tun_vni (FLOW_DISSECTOR_KEY_ENC_KEYID).
This patch adds support for FLOW_DISSECTOR_KEY_IP and
FLOW_DISSECTOR_KEY_ENC_KEYID.
Additionally, since tc flow cannot effectively support
l2_user_def, l3_user_def, and l4_user_def,
this patch explicitly sets them to not be used.
Jijie Shao [Wed, 10 Jun 2026 06:06:14 +0000 (14:06 +0800)]
net: hns3: improve the unused_tuple parameter setting
Currently, when the tc tool is used to set flow table rules, the source and
destination parts of the IP address and MAC address can be configured
separately; for example, only src_xx or only dst_xx may be specified.
Therefore, the driver needs to check whether the mask is all zero in
keys, such as FLOW_DISSECTOR_KEY_IPV4_ADDRS, FLOW_DISSECTOR_KEY_IPV6_ADDRS,
and FLOW_DISSECTOR_KEY_ETH_ADDRS.
If the mask is all zero, the tuple is not configured.
In this case, the driver adds the tuple to unused_tuple.
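The mask check described above can be sketched in plain C as follows. This is only an illustration: the SKETCH_* tuple bits and both helper names are hypothetical and do not come from the hns3 driver, which defines its own tuple bits and key structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tuple bits; the real driver defines its own. */
enum {
	SKETCH_SRC_IP = 1u << 0,
	SKETCH_DST_IP = 1u << 1,
};

/* Return true when every byte of the mask is zero (field not matched). */
static bool sketch_mask_is_zero(const void *mask, size_t len)
{
	const uint8_t *p = mask;
	size_t i;

	for (i = 0; i < len; i++)
		if (p[i])
			return false;
	return true;
}

/* Mark the IPv4 source/destination tuples as unused when their mask is zero. */
static uint32_t sketch_unused_ipv4(uint32_t src_mask, uint32_t dst_mask)
{
	uint32_t unused_tuple = 0;

	if (sketch_mask_is_zero(&src_mask, sizeof(src_mask)))
		unused_tuple |= SKETCH_SRC_IP;
	if (sketch_mask_is_zero(&dst_mask, sizeof(dst_mask)))
		unused_tuple |= SKETCH_DST_IP;
	return unused_tuple;
}
```

A rule that only matches on the source address would therefore leave SKETCH_DST_IP set in the returned value.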
Jijie Shao [Wed, 10 Jun 2026 06:06:13 +0000 (14:06 +0800)]
net: hns3: refactor add_cls_flower to prepare for multiple actions
Remove the tc parameter from the add_cls_flower() ops callback and
refactor action parsing to support future extensions for SELECT_QUEUE
and DROP_PACKET actions.
Changes:
* Remove the tc parameter from the add_cls_flower() callback signature.
* Extract TC-based action parsing into hclge_get_tc_flower_action().
* Move the dissector->used_keys check from hclge_parse_cls_flower() to
hclge_check_cls_flower(), and restrict ETH_ADDRS to
HCLGE_FD_MODE_DEPTH_2K_WIDTH_400B_STAGE_1 mode since hardware only
supports MAC matching there.
* Migrate error reporting from dev_err() to netlink extended ACK (extack).
The FDB management done by the dpaa2_switch_port_set_fdb() function is
hard to follow even for trained eyes. This series tries to make it easier
to read and understand by factoring out some code blocks into helper
functions and unifying the join and leave paths in terms of FDB
management.
====================
Ioana Ciornei [Wed, 10 Jun 2026 15:09:12 +0000 (18:09 +0300)]
dpaa2-switch: unify the FDB update logic in dpaa2_switch_port_set_fdb()
For both the join and leave paths, the logic goes through the following
steps: determine which FDB should be used on a port after the current
changeupper event, populate the private port structures with the new
FDB and, if necessary, mark the old FDB as unused.
Instead of having two distinct paths inside the
dpaa2_switch_port_set_fdb() for linking=true and linking=false, unify
them. This will hopefully help in making this function easier to read.
Ioana Ciornei [Wed, 10 Jun 2026 15:09:11 +0000 (18:09 +0300)]
dpaa2-switch: move FDB selection for leave path into a helper
Move the FDB selection for when a port leaves a bridge into a new helper -
dpaa2_switch_fdb_for_leave(). This will hopefully make the
dpaa2_switch_port_set_fdb() function easier to read and follow. The new
helper only determines the FDB to be used; any updates to the private
port structure are still done in the set_fdb() function.
Ioana Ciornei [Wed, 10 Jun 2026 15:09:10 +0000 (18:09 +0300)]
dpaa2-switch: move FDB selection for join path into a helper
The dpaa2_switch_port_set_fdb() function handles the setup of the FDB
for both changeupper cases: join and leave. Move the code block which
handles the join path into a new helper - dpaa2_switch_fdb_for_join() -
with the hope that the entire function will become easier to read and
extend with other use cases in the future.
This new helper just determines and returns which FDB should be used for
a specific port; the cleanup of the old FDB and the actual setup in the
per-port structure remain in the dpaa2_switch_port_set_fdb() function.
Ioana Ciornei [Wed, 10 Jun 2026 15:09:09 +0000 (18:09 +0300)]
dpaa2-switch: factor out the FDB in-use check into a helper
The dpaa2_switch_port_set_fdb() function is hard to follow and
open-coding the in-use check into it makes it even harder to read.
Factor out that code block into a new helper -
dpaa2_switch_fdb_in_use_by_others().
Ioana Ciornei [Wed, 10 Jun 2026 15:09:08 +0000 (18:09 +0300)]
dpaa2-switch: change dpaa2_switch_port_set_fdb() function prototype
Since dpaa2_switch_port_set_fdb() never fails and its return value
was never checked, change its prototype to return void.
Also, instead of determining if the DPAA2 port is joining or leaving an
upper based on the value of the 'bridge_dev' parameter, add the
'linking' parameter to explicitly specify the action.
Victor Nogueira [Wed, 10 Jun 2026 18:37:44 +0000 (02:37 +0800)]
selftests/tc-testing: Verify IFE can handle truncated inner Ethernet header
Add a tdc test that exercises the act_ife decode path with a malformed
IFE packet whose encapsulated inner Ethernet header is truncated.
The injected frame has a valid outer Ethernet header (ethertype 0xED3E)
and a minimal IFE header (metalen 2, i.e. no metadata TLVs), but the
payload that should hold the original frame is a single byte instead of
a full Ethernet header. Once ife_decode() strips the outer header and
the IFE metadata, fewer than ETH_HLEN bytes are left, which previously
let eth_type_trans() read past the end of the linear data.
Yong Wang [Wed, 10 Jun 2026 18:37:43 +0000 (02:37 +0800)]
net: ife: require ETH_HLEN to be pullable in ife_decode()
ife_decode() may return after making only the outer IFE header and
metadata pullable. The caller then passes the decapsulated packet to
eth_type_trans(), which expects the inner Ethernet header to be
accessible from the linear data area.
With a malformed IFE frame, the inner Ethernet header may still be
shorter than ETH_HLEN in the linear area, which can lead to a crash in
the original code.
Fix this by extending the pull check in ife_decode() so that the inner
Ethernet header is also guaranteed to be pullable before returning.
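The length condition behind this fix can be modelled with plain integers. The sketch below is not the kernel code (which works on an skb and would typically rely on a helper such as pskb_may_pull()); the function name and SKETCH_ETH_HLEN are illustrative stand-ins.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_ETH_HLEN 14 /* size of an Ethernet header in bytes */

/*
 * Return true only if, after removing the outer header and the IFE
 * metadata, a full inner Ethernet header is still available.
 */
static bool sketch_inner_eth_ok(size_t frame_len, size_t outer_len,
				size_t meta_len)
{
	if (frame_len < outer_len + meta_len)
		return false;
	return frame_len - outer_len - meta_len >= SKETCH_ETH_HLEN;
}
```

With a 14-byte outer header, 2 bytes of IFE metadata and a single payload byte, the check fails, which is the malformed case the decode path has to reject.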
Fixes: ef6980b6becb ("introduce IFE action") Cc: stable@vger.kernel.org Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Yong Wang <edragain@163.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Link: https://patch.msgid.link/20260610183814.1648888-2-n05ec@lzu.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The SPI drivers read properties whose bindings use normal uint32 cells.
Using boolean or u16 helpers makes the access look like a different DT
encoding and causes the property checker to flag the call sites.
Use presence checks for unsupported properties and read numeric cell
properties through u32 helpers before assigning to driver fields.
ASoC: dt-bindings: Fix RT5677 "realtek,gpio-config" type
"realtek,gpio-config" is described as six 8-bit GPIO configuration
values, and the RT5677 driver stores and reads those values as bytes.
The binding incorrectly documented the property as a uint32 array.
Document "realtek,gpio-config" as a uint8-array so the generated
schema matches the hardware definition and the existing driver helper.
Assisted-by: Codex:gpt-5-5 Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com> Link: https://patch.msgid.link/20260612214911.1883234-1-robh@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Maximilian Pezzullo <maximilianpezzullo@gmail.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Joe Damato <joe@dama.to> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20260609213559.178657-15-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Agalakov Daniil [Tue, 9 Jun 2026 21:35:54 +0000 (14:35 -0700)]
e1000e: limit endianness conversion to boundary words
[Why]
In e1000_set_eeprom(), the eeprom_buff is allocated to hold a range of
words. However, only the boundary words (the first and the last) are
populated from the EEPROM if the write request is not word-aligned.
The words in the middle of the buffer remain uninitialized because they
are intended to be completely overwritten by the new data via memcpy().
The previous implementation had a loop that performed le16_to_cpus()
on the entire buffer. This resulted in endianness conversion being
performed on uninitialized memory for all interior words.
Fix this by converting the endianness only for the boundary words
immediately after they are successfully read from the EEPROM.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
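A minimal user-space sketch of the idea, assuming the buffer holds 16-bit little-endian words and that flags tell whether the first and last words were read back from the EEPROM. The helper names are hypothetical; the driver itself uses le16_to_cpus() on the words it has read.

```c
#include <stddef.h>
#include <stdint.h>

/* Interpret the two bytes of a little-endian 16-bit word in host order. */
static uint16_t sketch_le16_to_cpu(uint16_t le)
{
	const uint8_t *b = (const uint8_t *)&le;

	return (uint16_t)(b[0] | (b[1] << 8));
}

/*
 * Convert only the boundary words that were actually read from the EEPROM.
 * Interior words are left untouched because they will be overwritten.
 */
static void sketch_fix_boundaries(uint16_t *buf, size_t nwords,
				  int first_read, int last_read)
{
	if (nwords == 0)
		return;
	if (first_read)
		buf[0] = sketch_le16_to_cpu(buf[0]);
	/* Avoid converting the same word twice when there is only one word. */
	if (last_read && (nwords > 1 || !first_read))
		buf[nwords - 1] = sketch_le16_to_cpu(buf[nwords - 1]);
}
```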
Agalakov Daniil [Tue, 9 Jun 2026 21:35:53 +0000 (14:35 -0700)]
e1000: limit endianness conversion to boundary words
[Why]
In e1000_set_eeprom(), the eeprom_buff is allocated to hold a range of
words. However, only the boundary words (the first and the last) are
populated from the EEPROM if the write request is not word-aligned.
The words in the middle of the buffer remain uninitialized because they
are intended to be completely overwritten by the new data via memcpy().
The previous implementation had a loop that performed le16_to_cpus()
on the entire buffer. This resulted in endianness conversion being
performed on uninitialized memory for all interior words.
Fix this by converting the endianness only for the boundary words
immediately after they are successfully read from the EEPROM.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Matt Vollrath [Tue, 9 Jun 2026 21:35:52 +0000 (14:35 -0700)]
e1000e: Use __napi_schedule_irqoff()
The __napi_schedule_irqoff() helper is intended to bypass saving and
restoring IRQ state when scheduling is requested from an IRQ handler,
where hard interrupts are already disabled. Use it in all three
interrupt handlers.
This was tested on a system with an I218-V and MSI interrupts. Because
this is an optimization, I was interested in measuring the impact, so I
added ktime_get() time measurement to e1000_intr_msi and a print of the
last sample in the watchdog task. For each test case I ran a
bi-directional iperf3 to saturate the line. With some help from awk,
here are the statistics.
49 samples each, all units ns
previous: min 678 max 1265 mean 879.429 median 806 stddev 137.188
noirq: min 707 max 1165 mean 811.857 median 790 stddev 89.486
According to this informal comparison, the mean time to handle an
interrupt from start to finish is improved by about 8% under load.
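The "about 8%" figure follows from the two mean values quoted above. A small sketch of the arithmetic, with a hypothetical helper name:

```c
/* Relative reduction of `after` versus `before`, in percent. */
static double sketch_improvement_pct(double before, double after)
{
	return (before - after) / before * 100.0;
}
```

Calling it with the means 879.429 and 811.857 gives a reduction of a little under 7.7%, which the message rounds to 8%.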
Signed-off-by: Matt Vollrath <tactii@gmail.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Michal Cohen <michalx.cohen@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20260609213559.178657-12-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daiki Harada [Tue, 9 Jun 2026 21:35:51 +0000 (14:35 -0700)]
igc: use napi_schedule_irqoff() instead of napi_schedule()
Replace napi_schedule() with napi_schedule_irqoff()
in the interrupt handler path of the igc driver.
Tested on Intel Corporation Ethernet Controller I226-V.
Suggested-by: Kohei Enju <kohei@enjuk.jp> Signed-off-by: Daiki Harada <daiky0325@gmail.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Dima Ruinskiy <dima.ruinskiy@intel.com> Tested-by: Moriya Kadosh <moriyax.kadosh@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20260609213559.178657-11-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
e1000e: use ktime_get_real_ns() in e1000e_systim_reset()
Replace ktime_to_ns(ktime_get_real()) with the direct equivalent
ktime_get_real_ns() in e1000e_systim_reset(). Using the combined helper
avoids the unnecessary intermediate ktime_t variable and makes the
intent clearer.
Suggested-by: Jacob Keller <jacob.e.keller@intel.com> Suggested-by: Simon Horman <horms@kernel.org> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Avigail Dahan <avigailx.dahan@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20260609213559.178657-9-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
igb: use ktime_get_real helpers in igb_ptp_reset()
Replace ktime_to_ns(ktime_get_real()) with the direct equivalent
ktime_get_real_ns() and ktime_to_timespec64(ktime_get_real()) with
ktime_get_real_ts64() in igb_ptp_reset(). Using the combined helpers
makes the intent clearer.
Suggested-by: Jacob Keller <jacob.e.keller@intel.com> Suggested-by: Simon Horman <horms@kernel.org> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20260609213559.178657-8-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20260609213559.178657-6-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Raczynski [Tue, 9 Jun 2026 21:35:45 +0000 (14:35 -0700)]
net/intel: Replace manual array size calculation with ARRAY_SIZE
There are still places in the code where the array size is calculated
manually, but it is good to enforce usage of a single macro throughout the
code, as it makes the code a bit more readable.
While at it, tidy up the surrounding condition by reversing the check and
remove unnecessary casts.
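For illustration, the pattern looks roughly like this outside the kernel. SKETCH_ARRAY_SIZE is a simplified stand-in: the kernel's ARRAY_SIZE() additionally rejects non-array arguments at build time, and the table and function here are made up for the example.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's ARRAY_SIZE() macro. */
#define SKETCH_ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

static const int sketch_table[] = { 10, 20, 30, 40 };

/* Sum every element without hard-coding the element count. */
static int sketch_sum(void)
{
	int sum = 0;
	size_t i;

	for (i = 0; i < SKETCH_ARRAY_SIZE(sketch_table); i++)
		sum += sketch_table[i];
	return sum;
}
```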
iavf: iavf_virtchnl_completion: drop duplicate ether_addr_equal() test
This is just a simple cleanup fix. Commit 35a2443d0910f ("iavf: Add
waiting for response from PF in set mac") introduced a duplicate
ether_addr_equal() check, so the current code tests the new MAC twice
against the former MAC.
Remove the outer ether_addr_equal() test, a remnant of commit c5c922b3e09b
("iavf: fix MAC address setting for VFs when filter is rejected").
Signed-off-by: Corinna Vinschen <vinschen@redhat.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20260609213559.178657-4-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
idpf: Replace use of system_unbound_wq with system_dfl_wq
This patch continues the effort to refactor workqueue APIs, which began
with the changes introducing new workqueues and a new alloc_workqueue flag:
commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")
The point of the refactoring is to eventually alter the default behavior of
workqueues to become unbound by default so that their workload placement is
optimized by the scheduler.
Before that can happen, workqueue users must be converted to the better-named
new workqueues with no intended behaviour changes:
This series extends Marvell octeontx2-af support for CN20K NPC (MCAM
debuggability, allocation policy, default-rule lifetime, optional KPU
profiles from firmware files, X2/X4 MCAM keyword handling in flows and
defaults, and dynamic CN20K NPC private state), adds a devlink mechanism
for multi-value parameters, and moves devlink_nl_param_fill() temporaries
to the heap so stack usage stays reasonable once union devlink_param_value
grows (patch 3).
Patch 1 enforces a single RVU admin-function PCI device in the kernel.
On Octeon series SoCs, hardware resources such as NPC, NIX and related
blocks are global and coordinated by the AF driver; PFs and VFs request
them through AF mailbox messages. Firmware exposes only one AF PCI
function at boot, so two AF driver instances cannot both own that state.
rvu_probe() rejects a second bind with -EBUSY, logs a warning, clears the
probe gate on early allocation failures, and aligns the driver model with
hardware so reviewers and automation can rely on exactly one bound AF.
Patch 2 improves CN20K MCAM visibility in debugfs: mcam_layout marks
enabled entries, dstats reports per-entry hit deltas (baseline updated in
software after each read; hardware counters are not cleared), and mismatch
lists enabled entries without a PF mapping.
Patch 3 allocates the per-configuration-mode union devlink_param_value
buffers and struct devlink_param_gset_ctx used by devlink_nl_param_fill()
with kcalloc()/kzalloc_obj() and funnels failures through a single cleanup
path so the netlink reply path stays safe as the union grows.
Patch 4 (Saeed) introduces DEVLINK_PARAM_TYPE_U64_ARRAY and nested
DEVLINK_ATTR_PARAM_VALUE_DATA attributes so drivers and user space can
exchange bounded u64 arrays; YAML, uapi, and netlink validation are
updated.
Patch 5 adds a runtime devlink parameter srch_order to reorder CN20K
subbank search during MCAM allocation (the param uses the u64 array type
from patch 4).
Patch 6 ties default MCAM entries to NIX LF alloc/free on CN20K, adds
NIX_LF_DONT_FREE_DFT_IDXS for PF teardown paths that must not drop default
NPC indexes while the driver still owns state, and tightens nix_lf_alloc
error propagation.
Patch 7 allows loading a custom KPU profile from /lib/firmware/kpu via
module parameter kpu_profile, with cam2 / ptype_mask wiring and helpers
that share firmware-sourced vs filesystem-sourced profile layouts.
Patch 8 makes default-rule allocation, AF flow install, and PF-side RSS,
defaults, and ethtool flows respect the active CN20K MCAM keyword width
(X2 vs X4), including X4 reference-index masking and -EOPNOTSUPP when a
flow needs X4 keys on an X2-only profile.
Patch 9 replaces file-scope npc_priv and static dstats with allocation
sized from discovered bank/subbank geometry, threads npc_priv_get()
through CN20K NPC paths, and allocates dstats via devm_kzalloc for the
debugfs helper.
Patch 1 is ordered first so later patches assume a single bound AF.
Heap-backed devlink_nl_param_fill() sits immediately before the U64 array
param work so incremental builds stay stack-safe as the union grows; the
CN20K patches keep srch_order ahead of NIX LF coordination, optional KPU
profile load from firmware files, X2/X4 handling, and the npc_priv refactor
that touches the same files heavily.
====================
octeontx2-af: npc: cn20k: Allocate npc_priv and dstats dynamically.
Replace the file-scope static npc_priv with a kcalloc'd struct filled
from hardware bank/subbank geometry at init (num_banks is no longer a
const compile-time constant; drop init_done and use a non-NULL
npc_priv pointer for liveness). Thread npc_priv_get() / pointer access
through the CN20K NPC code paths, extend teardown to kfree the root
struct on failure and in npc_cn20k_deinit, and adjust MCAM section
setup to use the discovered subbank count.
Allocate MCAM debugfs dstats via devm_kzalloc instead of a static matrix,
and use the allocated backing store consistently when computing deltas
(including the counter rollover compare).
octeontx2: cn20k: Respect NPC MCAM X2/X4 profile in flows and DFT alloc
Default CN20K NPC rule allocation now keys off the active MCAM keyword
width: use X4 with a bank-masked reference index when the silicon uses
X4 keys, and X2 with the raw index otherwise (replacing the previous
always-X2 / eidx + 1 behaviour).
In the AF flow-install path, flows that need more than 256 key bits
query the NPC profile; if the platform is fixed to X2 entries, fail
with -EOPNOTSUPP instead of requesting X4. Otherwise select X4 for the
MCAM alloc.
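The flow-install decision above might be sketched as follows. The enum, the function name and the assumption that flows of 256 key bits or fewer stay on X2 entries are illustrative; the text only spells out the behaviour for flows that need more than 256 bits.

```c
#include <errno.h>
#include <stdbool.h>

enum sketch_kw_type { SKETCH_KW_X2, SKETCH_KW_X4 };

#define SKETCH_X2_KEY_BITS 256 /* threshold quoted in the text */

/*
 * Pick the MCAM keyword type for a flow. Returns 0 and writes *kw on
 * success, or -EOPNOTSUPP when the flow needs X4 but the profile is
 * fixed to X2 entries.
 */
static int sketch_select_kw(unsigned int key_bits, bool profile_x2_only,
			    enum sketch_kw_type *kw)
{
	if (key_bits <= SKETCH_X2_KEY_BITS) {
		*kw = SKETCH_KW_X2; /* assumption: narrow flows use X2 */
		return 0;
	}
	if (profile_x2_only)
		return -EOPNOTSUPP;
	*kw = SKETCH_KW_X4;
	return 0;
}
```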
On the PF, cache and pass the profile kw_type from npc_get_pfl_info
through otx2_mcam_pfl_info_get(), and use it when allocating MCAM
entries for RSS/defaults and when installing ethtool flows on CN20K,
including masking the reference index for X4 slot layout.
octeontx2-af: npc: Support for custom KPU profile from filesystem
Flashing updated firmware on deployed devices is cumbersome. Provide a
mechanism to load a custom KPU (Key Parse Unit) profile directly from
the filesystem at module load time.
When the rvu_af module is loaded with the kpu_profile parameter, the
specified profile is read from /lib/firmware/kpu and programmed into
the KPU registers. Add npc_kpu_profile_cam2 for the extended cam format
used by filesystem-loaded profiles and support ptype/ptype_mask in
npc_config_kpucam when profile->from_fs is set.
Usage:
1. Copy the KPU profile file to /lib/firmware/kpu.
2. Build OCTEONTX2_AF as a module.
3. Load: insmod rvu_af.ko kpu_profile=<profile_name>
octeontx2: cn20k: Coordinate default rules with NIX LF lifecycle
Add NIX_LF_DONT_FREE_DFT_IDXS so the PF can send NIX LF free during hw
reinit or teardown without the AF freeing CN20K default NPC rule indexes
while the driver still owns that state (otx2_init_hw_resources and
otx2_free_hw_resources).
On CN20K, allocate default NPC rules from NIX LF alloc before
nix_interface_init, roll back with npc_cn20k_dft_rules_free on failure,
and free from NIX LF free when the new flag is not set. Tighten
rvu_mbox_handler_nix_lf_alloc error handling: use a single rc, propagate
qmem_alloc and other errors, and set -ENOMEM only when kcalloc fails
(remove the blanket -ENOMEM at the free_mem path).
octeontx2-af: npc: cn20k: add subbank search order control
CN20K NPC MCAM is split into 32 subbanks that are searched in a
predefined order during allocation. Lower-numbered subbanks have
higher priority than higher-numbered ones.
Add a runtime devlink parameter, "srch_order", to control the order in which
subbanks are searched during MCAM allocation.
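As a rough model of what a configurable search order means, the sketch below walks a caller-supplied list of subbank indexes and returns the first one with free space. The function name, the has_free array and the return convention are assumptions made for the example, not the driver's allocator.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NUM_SUBBANKS 32 /* subbank count quoted in the text */

/*
 * Walk the subbanks in the caller-supplied order and return the first one
 * with a free entry, or -1 if none of the listed subbanks has room.
 */
static int sketch_pick_subbank(const uint64_t *order, size_t n,
			       const bool *has_free)
{
	size_t i;

	for (i = 0; i < n; i++) {
		/* Skip out-of-range indexes instead of reading past has_free. */
		if (order[i] < SKETCH_NUM_SUBBANKS && has_free[order[i]])
			return (int)order[i];
	}
	return -1;
}
```

With free space in subbanks 3 and 7, the order 0,1,2,... picks subbank 3, while an order that lists 7 first picks subbank 7.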
Saeed Mahameed [Tue, 9 Jun 2026 04:04:48 +0000 (09:34 +0530)]
devlink: Implement devlink param multi attribute nested data values
The devlink param value attribute is not defined, since devlink handles
value validation and parsing internally. This allows us to implement
multi-attribute values without breaking any policies.
Devlink param multi-attribute values are treated as dynamically sized
arrays of u64 values. By introducing a new devlink param type,
DEVLINK_PARAM_TYPE_U64_ARRAY, drivers and user space can set a variable
number of u64 values in the DEVLINK_ATTR_PARAM_VALUE_DATA attribute.
Implement get/set parsing and add to the internal value structure passed
to drivers.
This is useful for devices that need to configure a list of values for
a specific configuration.
Example:
$ devlink dev param show pci/... name multi-value-param
name multi-value-param type driver-specific
values:
cmode permanent value: 0,1,2,3,4,5,6,7
$ devlink dev param set pci/... name multi-value-param \
value 4,5,6,7,0,1,2,3 cmode permanent
devlink: heap-allocate param fill buffers in devlink_nl_param_fill
devlink_nl_param_fill() kept two per-configuration-mode copies of
union devlink_param_value plus a struct devlink_param_gset_ctx on the
stack while building the Netlink reply. Allocate those with kcalloc()
and kzalloc_obj() instead, and route failures through a single cleanup
path so temporary buffers are always freed.
Improve MCAM visibility and field debugging for CN20K NPC.
- Extend "mcam_layout" to show enabled (+) or disabled state per entry
so status can be verified without parsing the full "mcam_entry" dump.
- Add "dstats" debugfs entry: for enabled MCAM indices, print hit deltas
since the prior read by comparing hardware counters to a per-entry
software baseline and advancing that baseline after each read (hardware
counters are not cleared).
- Add "mismatch" debugfs entry: lists MCAM entries that are enabled
but not explicitly allocated, helping diagnose allocation/field issues.
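The "dstats" baseline scheme can be illustrated with a small helper. This is a sketch under the assumption of a full-width 64-bit counter, so that unsigned subtraction also covers a wrap-around; the function name is hypothetical.

```c
#include <stdint.h>

/*
 * Return the hits since the previous read and advance the software
 * baseline. The hardware counter value itself is never modified.
 */
static uint64_t sketch_hit_delta(uint64_t hw_counter, uint64_t *baseline)
{
	/* Unsigned subtraction stays correct across a 64-bit wrap-around. */
	uint64_t delta = hw_counter - *baseline;

	*baseline = hw_counter;
	return delta;
}
```

Reading the same counter value twice in a row reports a delta of zero the second time, because the baseline has already caught up.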
On Octeon series SoCs, the AF is an integrated device within the SoC, and
hardware resources such as NPC, NIX and related blocks are global and
coordinated by the AF driver. Physical and virtual functions request those
resources via AF mailbox messages, so two AF driver instances cannot both
own that global state; firmware exposes only one AF PCI function at boot
and any further octeontx2-af PCI probe returns -EBUSY so software matches
the single-AF model.
====================
net/stmmac: Fixes for maximum TX/RX queues to use by driver
While contributing other changes that prepare functions for new XGMAC hardware
https://lore.kernel.org/netdev/20260601162537.553512-1-j.raczynski@samsung.com/
several issues were reported by Sashiko AI.
All of the issues stem from wrong DTS configuration, but the kernel needs to handle it.
====================
Jakub Raczynski [Thu, 11 Jun 2026 11:33:58 +0000 (13:33 +0200)]
net/stmmac: Apply MTL_MAX queue limit if config missing
When "snps,rx-queues-to-use" or "tx-queues-to-use" is provided in the DTS,
the current code applies the U8_MAX value to queues_to_use if the input value
is higher. But the actual maximum number of supported queues is set by the
macros MTL_MAX_RX_QUEUES and MTL_MAX_TX_QUEUES, which currently have a value of 8.
The U8_MAX value is later capped to the value provided by the core in the DMA
capabilities (dma_conf), but only if the core provides it.
This is the case for XGMAC (dwxgmac2) and some GMAC (dwmac4) cores,
but not for dwmac1000. The capping happens at a later stage, in stmmac_hw_init(),
and during stmmac_mtl_setup() we might access fields outside the allocated memory
if queues_to_use exceeds the MTL_MAX_ defines;
for example, rx_queues_cfg is an array of size MTL_MAX_RX_QUEUES.
Fix this by capping the value to MTL_MAX during config parsing.
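The capping itself is a simple clamp. A minimal sketch, assuming the limit of 8 quoted above and using a hypothetical helper name rather than the driver's parsing code:

```c
#include <stdint.h>

#define SKETCH_MTL_MAX_RX_QUEUES 8 /* value quoted in the text */

/* Clamp a queue count read from the device tree to the array capacity. */
static uint8_t sketch_clamp_rx_queues(uint32_t from_dt)
{
	if (from_dt > SKETCH_MTL_MAX_RX_QUEUES)
		return SKETCH_MTL_MAX_RX_QUEUES;
	return (uint8_t)from_dt;
}
```

Any later loop bounded by the clamped value then stays within an array of SKETCH_MTL_MAX_RX_QUEUES elements.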
Jakub Raczynski [Thu, 11 Jun 2026 11:33:57 +0000 (13:33 +0200)]
net/stmmac: Apply TBS config only to used queues
When the stmmac driver is opened, the TBS (Time-Based Scheduling) option is
enabled in the DMA config. Currently this is done for all possible TX queues,
using the MTL_MAX_TX_QUEUES macro, but the actual number of queues in use might differ.
While setting this is generally harmless, since memory for MTL_MAX_TX_QUEUES
is allocated, it is incorrect, because it prepares config for unused queues.
Change this to apply the TBS config only to tx_queues_to_use.
Co-developed-by: Chang-Sub Lee <cs0617.lee@samsung.com> Signed-off-by: Chang-Sub Lee <cs0617.lee@samsung.com> Signed-off-by: Jakub Raczynski <j.raczynski@samsung.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260611113358.3379518-2-j.raczynski@samsung.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Wayen.Yan [Thu, 11 Jun 2026 23:09:56 +0000 (07:09 +0800)]
net: airoha: Fix debugfs new-tuple display for IPv4 ROUTE entries
In airoha_ppe_debugfs_foe_show(), the second switch statement falls
through from PPE_PKT_TYPE_IPV4_HNAPT/DSLITE to PPE_PKT_TYPE_IPV4_ROUTE,
accessing hwe->ipv4.new_tuple for all three types. However, IPv4 ROUTE
(3-tuple) entries do not contain a valid new_tuple — this field is only
meaningful for NATted flows (HNAPT/DSLITE). For ROUTE entries, the
memory at the new_tuple offset holds routing information, not NAT data,
so displaying "new=" produces garbage output.
Display new_tuple only for HNAPT and DSLITE, and let IPV4_ROUTE fall
through to the default case.
Wayen.Yan [Thu, 11 Jun 2026 23:09:13 +0000 (07:09 +0800)]
net: airoha: Fix register index for Tx-fwd counter configuration
In airoha_qdma_init_qos_stats(), the Tx-fwd counter configuration
register uses the same index (i << 1) as the Tx-cpu counter, which
overwrites the Tx-cpu configuration. The Tx-fwd counter value register
correctly uses (i << 1) + 1, so the configuration register should use
the same index.
Fix the REG_CNTR_CFG index from (i << 1) to ((i << 1) + 1) so that
the Tx-fwd counter is properly configured instead of clobbering the
Tx-cpu counter config.
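The indexing scheme can be shown with two tiny helpers. The names are hypothetical; the point is only that the Tx-cpu and Tx-fwd counters of the same queue must land on different slots.

```c
/* Even slot 2*i is used for the Tx-cpu counter of queue i. */
static unsigned int sketch_tx_cpu_idx(unsigned int i)
{
	return i << 1;
}

/* Odd slot 2*i + 1 is used for the Tx-fwd counter of queue i. */
static unsigned int sketch_tx_fwd_idx(unsigned int i)
{
	return (i << 1) + 1;
}
```

Using the even index for both, as the old code did for the configuration register, makes the second write overwrite the first.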
Li Xiasong [Thu, 11 Jun 2026 13:56:47 +0000 (21:56 +0800)]
tipc: restrict socket queue dumps in enqueue tracepoints
tipc_sk_enqueue() runs with sk->sk_lock.slock held while the socket is
owned by user context. The spinlock protects the backlog queue in this
path, but it does not serialize against the socket owner consuming or
purging sk_receive_queue.
The TIPC_DUMP_ALL tracepoints in tipc_sk_enqueue() also dump
sk_receive_queue and can therefore dereference skbs that the socket
owner has already dequeued or freed. Restrict these dumps to
TIPC_DUMP_SK_BKLGQ, which matches the queue protected by the held
spinlock.
Keep the change limited to the enqueue path, where the unsafe queue dump
is reachable while the socket is owned by user context.
Fixes: 01e661ebfbad ("tipc: add trace_events for tipc socket") Cc: stable@vger.kernel.org Signed-off-by: Li Xiasong <lixiasong1@huawei.com> Reviewed-by: Tung Nguyen <tung.quang.nguyen@est.tech> Link: https://patch.msgid.link/20260611135647.3666727-1-lixiasong1@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lorenzo Bianconi [Thu, 11 Jun 2026 10:43:00 +0000 (12:43 +0200)]
net: airoha: better handle MIBs for GDM ports with multiple devs attached
In the context of a GDM port that can have multiple net_devices attached
(GDM3 and GDM4), the HW counters (MIBs) are global for the GDM port.
This causes duplicated stats to be reported to the kernel for the related
net_device.
The SoC supports a split MIB feature where each counter is tracked based
on the relevant HW channel (NBQ) to account for this scenario and
provide a way to select the related counter on accessing the MIB
registers.
Enable this feature for GDM3 and GDM4 and configure the relevant HW
channel before updating the HW stats, so that the correct HW counters are
reported to the kernel for the related interface.
Move the stats struct from port to dev since the HW counters are now specific
to the network device instead of the GDM port. Refactor
airoha_update_hw_stats() to take airoha_eth and airoha_gdm_port
parameters since the function operates on the entire port.
Ratheesh Kannoth [Thu, 11 Jun 2026 08:33:30 +0000 (14:03 +0530)]
octeontx2-af: fix NPC mailbox codes in mbox.h
Several NPC mailbox command IDs in the 0x601x range were assigned out of
order. Renumber and reorder the M() definitions so each opcode matches
the stable contract expected by userspace tools and applications.
Fixes: 4e527f1e5c15 ("octeontx2-af: npc: cn20k: Add new mailboxes for CN20K silicon") Cc: Suman Ghosh <sumang@marvell.com> Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260611083330.1652181-1-rkannoth@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Frank Li [Wed, 10 Jun 2026 15:05:30 +0000 (11:05 -0400)]
dt-bindings: net: dsa: Convert lan9303.txt to yaml format
Convert lan9303.txt to yaml format to fix the CHECK_DTBS warning below:
arch/arm/boot/dts/nxp/imx/imx53-kp-hsc.dtb: /soc/bus@50000000/i2c@53fec000/switch@a: failed to match any schema with compatible: ['smsc,lan9303-i2c']
Additional changes:
- rename switch-phy to switch in example.
Ovidiu Panait [Wed, 10 Jun 2026 08:52:38 +0000 (08:52 +0000)]
net: bcmgenet: Use weighted round-robin TX DMA arbitration
Under heavy network traffic, we observed sporadic TX queue timeouts on the
Raspberry Pi 4. The timeouts can be reproduced by stress testing the TX
path with multiple concurrent iperf UDP streams:
iperf3 -c <ip> -u -b0 -P16 -t60
NETDEV WATCHDOG: CPU: 0: transmit queue 0 timed out 2044 ms
NETDEV WATCHDOG: CPU: 3: transmit queue 0 timed out 2004 ms
Investigation showed that the timeouts are caused by the priority-based
arbiter. Under heavy load the highest priority queue starves the lower
priority ones, causing timeouts. The TX strict priority arbiter is not
suitable for the default use case where all the traffic gets spread
across all the TX queues.
Therefore, to fix this, switch the TX DMA arbiter to Weighted Round-Robin,
which services all queues, so they do not stall. The weights were chosen
to follow the existing priority scheme: q0 gets the smallest weight, while
q1-4 get the bulk of the TX bandwidth.
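To show why weighted round-robin avoids starvation, here is a small software model. The real arbitration is done by the DMA hardware; the function, the five-queue layout and any weights passed in are assumptions made for illustration only.

```c
#include <stddef.h>

#define SKETCH_NUM_QUEUES 5

/*
 * Serve up to `budget` packets. In each pass queue q may send at most
 * weight[q] packets, so a busy queue cannot starve the others.
 * pending[] is decremented and sent[] incremented as packets are served.
 */
static void sketch_wrr(const unsigned int *weight, unsigned int *pending,
		       unsigned int *sent, unsigned int budget)
{
	while (budget) {
		unsigned int served = 0;
		size_t q;

		for (q = 0; q < SKETCH_NUM_QUEUES && budget; q++) {
			unsigned int n = weight[q];

			while (n && pending[q] && budget) {
				pending[q]--;
				sent[q]++;
				served++;
				n--;
				budget--;
			}
		}
		if (!served)
			break; /* nothing left to send */
	}
}
```

With example weights of 1 for q0 and 2 for q1-4, one full pass sends one packet from q0 and two from each of the other queues, even when q0 has a deep backlog.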