Florian Westphal [Fri, 12 Jun 2026 09:22:09 +0000 (11:22 +0200)]
selftests: netfilter: add phony nft_offload test
... "phony", because its not testing offloads, it tests the control
plane code. Also test error unwind via fault injection framework.
For a proper test, real hardware would be required given we'd have
check if 'previously handed off to hardware' offload commands are
properly removed again on failure or rule flush.
Florian Westphal [Fri, 12 Jun 2026 09:22:08 +0000 (11:22 +0200)]
netdevsim: tc: allow to test nf_tables offload control plane code
The actual 'offload' is phony, all commands are ignored: this is only
useful to test control plane code.
Tag the existing callback to permit error injection to test rollback/abort
code in nf_tables. This is also for fuzzers - the fault injection
framework allows probabilistic error insertion.
Wayen.Yan [Fri, 12 Jun 2026 09:37:00 +0000 (17:37 +0800)]
net: airoha: Fix error handling in airoha_ppe_flush_sram_entries()
In airoha_ppe_flush_sram_entries(), the outer "err" variable was never
updated when the inner loop variable shadowed it, causing the function
to always return 0 even when airoha_ppe_foe_commit_sram_entry() fails.
Drop the outer "err" variable and return directly on error, propagating
the error code from airoha_ppe_foe_commit_sram_entry() correctly.
Linus Torvalds [Sat, 13 Jun 2026 15:23:36 +0000 (08:23 -0700)]
Merge tag 'core-urgent-2026-06-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull debugobjects fix from Ingo Molnar:
- Fix potential debugobjects deadlock on PREEMPT_RT kernels (Waiman
Long)
* tag 'core-urgent-2026-06-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
debugobjects: Don't call fill_pool() in early boot hardirq context
Linus Torvalds [Sat, 13 Jun 2026 15:14:17 +0000 (08:14 -0700)]
Merge tag 'i2c-for-7.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"The biggest news here is that this is my last pull request as I2C
maintainer after 13.5 years. Starting with the 7.2 cycle, Andi Shyti
is taking over who helped me greatly maintaining the host drivers for
a while now. Thank you, Andi, and good luck with the subsystem. I'll
be around for help, of course.
Technically, there are two patches which might be a tad large for this
late cycle, but most of them is explaining comments, so I think they
are suitable.
- MAINTAINERS:
- hand over I2C maintainership to Andi
- minor updates
- rust: fix I2cAdapter refcount double increment
- imx: keep clock and pinctrl states consistent in runtime PM
- imx-lpi2c: fix DMA resource leaks on PIO fallback
- qcom-cci: fix NULL pointer dereference on remove
- riic: fix reset refcount leak on resume_noirq error path
- stm32f7: account for analog filter in timing computation
- tegra:
- fix suspend/resume handling in NOIRQ phase
- update Tegra410 I2C timings to match hardware specs"
* tag 'i2c-for-7.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
dt-bindings: i2c: mux-gpio: name correct maintainer
MAINTAINERS: hand over I2C to Andi Shyti
i2c: imx-lpi2c: fix resource leaks switching to devm_dma_request_chan()
MAINTAINERS: i2c: designware: Remove inactive reviewer
i2c: tegra: Fix NOIRQ suspend/resume
i2c: tegra: Update Tegra410 I2C timing parameters
i2c: qcom-cci: Fix NULL pointer dereference in cci_remove()
i2c: stm32f7: fix timing computation ignoring i2c-analog-filter
i2c: imx: fix clock and pinctrl state inconsistency in runtime PM
i2c: riic: fix refcount leak in riic_i2c_resume_noirq()
rust: i2c: fix I2cAdapter refcounts double increment
Thomas Gleixner [Sat, 13 Jun 2026 14:24:29 +0000 (16:24 +0200)]
Merge tag 'timers-v7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/daniel.lezcano/linux into timers/clocksource
Pull clocksource/driver updates from Daniel Lezcano:
- Remove the sifive,fine-ctr-bits property bindings because it is a
redundant information (Nick Hu)
- Remove the TCIU8 interrupt bindings on Renesas because it should not
be described as the documentation marked reserved and fix the
conditional reset line for the RZ/{T2H,N2H} (Cosmin Tanislav)
- Extend schema condition for interrupts to cover D1 compatible
variant an add the D1 hstimer support (Michal Piekos)
- Update the ARM architected timer support to handle the ACPI GTDT v3
format and the EL2 virtual timer, enabling Linux to use the most
appropriate timer when running with VHE, while also fixing several
Device Trees to accurately reflect the underlying hardware (Marc
Zyngier)
- Cleanup and add the clocksource and the clockevent in the TI DM
timer (Markus Schneider-Pargmann)
- Add the multiple watchdogs support in the tegra186 and
tegra234. Dedicate one as a kernel watchdog (Kartik Rajput)
- Add the NXP clocksource selection for the scheduler in the Kconfig
(Enric Balletbo i Serra)
WenTao Liang [Thu, 11 Jun 2026 16:17:38 +0000 (00:17 +0800)]
posix-cpu-timers: Fix pid refcount leak in do_cpu_nanosleep() error path
In do_cpu_nanosleep(), posix_cpu_timer_create() takes a pid reference
via get_pid() and stores it in timer.it.cpu.pid. If the subsequent
posix_cpu_timer_set() call fails, the function returns immediately
without calling posix_cpu_timer_del() to release the pid reference,
causing a leak.
Fix it by calling posix_cpu_timer_del() before the unlock-and-return
on the error path, consistent with the other exit paths in the same
function.
Thomas Gleixner [Tue, 9 Jun 2026 15:14:45 +0000 (17:14 +0200)]
time/jiffies: Register jiffies clocksource before usage
Teddy reported that a XEN HVM has a long boot delay, which was bisected to
the recent enhancements to the negative motion detection. It turned out
that the jiffies clocksource is used in early boot before it is registered,
which leaves the max_delta_raw field at zero. That causes the read out to
be clamped to the max delta of 0, which means time is not making progress.
Cure it by ensuring that it is initialized before its first usage in
timekeeping_init().
Fixes: 76031d9536a0 ("clocksource: Make negative motion detection more robust") Reported-by: Teddy Astie <teddy.astie@vates.tech> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Tested-by: Teddy Astie <teddy.astie@vates.tech> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/87y0gn3fve.ffs@fw13 Closes: https://lore.kernel.org/all/1780914594.8631fc262581453bbf619ec5b2062170.19ea6c8227b000701b@vates.tech
The "ti,n-factor" binding and examples allow negative correction
values. Reading it as u32 makes the helper type disagree with the
documented signed value and hides real schema mismatches.
Use the signed helper so the DT access matches the s32 value stored by
the driver.
Keith Busch [Fri, 12 Jun 2026 22:32:04 +0000 (15:32 -0700)]
block: check bio split for unaligned bvec
Offsets and lengths need to be validated against the dma alignment. This
check was skipped for sufficiently a small bio with a single bvec, which
may allow an invalid request dispatched to the driver. Force the
validation for an unaligned bvec by forcing the bio split path that
handles this condition.
Fixes: 7eac33186957 ("iomap: simplify direct io validity check") Fixes: 5ff3f74e145a ("block: simplify direct io validity check") Reported-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Keith Busch <kbusch@kernel.org> Link: https://patch.msgid.link/20260612223205.465913-1-kbusch@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
Eric Dumazet [Sat, 13 Jun 2026 04:26:19 +0000 (04:26 +0000)]
nbd: Reclassify sockets to avoid lockdep circular dependency
syzbot reported a possible circular locking dependency in udp_sendmsg()
where fs_reclaim can be triggered while holding sk_lock, and fs_reclaim
can eventually depend on another sk_lock (e.g., if NBD is used for swap
or writeback and NBD uses TLS/TCP which acquires sk_lock).
Since the UDP socket and the NBD TCP/TLS socket are different, this is a
false positive. Fix this by reclassifying NBD sockets to a separate lock
class when they are added to the NBD device.
This is similar to what nvme-tcp and other network block devices do.
Fixes: ffa1e7ada456 ("block: Make request_queue lockdep splats show up earlier") Reported-by: syzbot+607cdcf978b3e79da878@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/6a2cdafe.428ffe26.258b27.0161.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260613042619.1108126-1-edumazet@google.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Sat, 30 May 2026 02:03:47 +0000 (20:03 -0600)]
io_uring/net: make POLL_FIRST receive side checks consistent
io_recv() and io_recvzc() are the odd ones out, as they checks for
whether POLL_FIRST should be honored before checking if the file is a
socket. It doesn't really matter, but might as well make it consistent
across all receive and send types.
Jens Axboe [Thu, 11 Jun 2026 17:44:47 +0000 (11:44 -0600)]
io_uring: remove the per-ctx fallback task_work machinery
With the tctx fallback running its entries directly, the per-ctx
fallback work has a single user left: moving local (DEFER_TASKRUN)
task_work entries out of a ring that is going away. Both of its call
sites are process context and don't hold ->uring_lock, the same
conditions the deferred fallback work itself ran under - so run the
entries in cancel mode right there instead, and rename the helper to
io_cancel_local_task_work() to match what it now does.
With that, ->fallback_llist, ->fallback_work, io_fallback_req_func()
and __io_fallback_tw() can all go away, along with the fallback work
flushing in the ring exit and cancel paths. Requests that get
orphaned by an exiting task now run via the tctx fallback work, which
the ring exit side implicitly waits on through the ctx refs those
requests hold.
Jens Axboe [Thu, 11 Jun 2026 17:41:25 +0000 (11:41 -0600)]
io_uring: run the tctx task_work fallback directly
The fallback work drains the tctx queue only to redistribute the entries
into the per-ctx fallback lists, bouncing them through a second
(per-ctx) work item before they finally run. That made sense when the
producer side did the draining and could be in any context, but the
fallback work is a regular process context kworker: it can just run the
entries itself. Reuse the normal run loop - if run from the fallback
kernel thread, ts.cancel will get set, and the work terminated.
Jens Axboe [Thu, 11 Jun 2026 16:13:22 +0000 (10:13 -0600)]
io_uring: switch normal task_work to a mpscq
Like the local task_work list, the normal (tctx) task_work list is an
llist, and hence needs the O(n) llist_reverse_order() pass before
running entries in queue order. On top of that, capped runs - sqpoll
processing IORING_TW_CAP_ENTRIES_VALUE entries at a time - need the
claimed-but-unprocessed leftovers carried in a separate retry_list,
as they can't be pushed back to the shared list.
Switch tctx->task_list to a mpscq, like what was done for the
DEFER_TASKRUN paths as well.
Jens Axboe [Wed, 10 Jun 2026 21:19:35 +0000 (15:19 -0600)]
io_uring: switch local task_work to a mpscq
The local (DEFER_TASKRUN) task_work list is an llist, which is LIFO
ordered, and hence __io_run_local_work() has to restore the right
running order with an O(n) llist_reverse_order() pass first. On top of
that, a batch that gets capped by max_events needs the leftover entries
parked on a separate ->retry_llist, as they can't be pushed back to the
shared list.
Switch it to the FIFO mpscq. Adds are wait-free instead of a cmpxchg
retry loop, entries are popped in queue order with no reversal pass,
capping a run simply leaves the remainder on the queue, and
->retry_llist goes away entirely. The consumer cursor, ->work_head,
lives with the rest of the ->uring_lock protected state rather than
next to the queue, so that popping entries doesn't dirty the producer
side cacheline.
For low amounts of task_work, this ends up being a bit more efficient
than the existing scheme. As an example of that, doing multishot
receives for 8 clients has the following task_work overhead:
most of that overhead is gone, and performance is better as well.
Caleb Sander Mateos <csander@purestorage.com> reports that it improves
the performance of a ublk 4kb workload by 4% [1], while testing v1 of
this patchset.
Local task_work is currently using llists for managing the work,
but that's a LIFO type of list. This means that running this task_work
needs to reverse the list first, to ensure fairness in running the
queued items.
Add a lockless FIFO queued, based on Dmitry Vyukov's intrusive MPSC
node-based queue algorithm, modified with an externally held consumer
cursor and conditional stub reinsertion. See comments in the header.
Producers are wait-free: a push is a single xchg() on the queue tail,
which serializes concurrent producers and defines the FIFO order, plus
a store linking the node to its predecessor. There are no cmpxchg retry
loops, and pushing is safe from any context, including hardirq.
The cost of linked list FIFO ordering is that a push publishes the node
in two steps - the xchg() makes it visible as the new tail before the
subsequent store links it into the chain that is reachable from the
head. A consumer hitting that window gets a NULL from mpscq_pop() while
mpscq_empty() reports false, and must retry later rather than treat the
queue as empty. The window is two instructions wide, but a producer can
get preempted inside it, so the consumer must not busy wait on it.
The consumer side supports a single consumer at a time, with callers
providing their own serialization. A stub node, which also defines the
empty state (tail == stub), allows the consumer to detach the final
node without racing against producer link stores: that node is only
handed out once the stub has been cmpxchg'ed back in as the tail. This
also guarantees that the previous tail returned by mpscq_push() cannot
get freed before that push has linked it, making it always valid for
comparisons.
The consumer cursor is deliberately not part of the queue struct - the
caller owns it and passes it to mpscq_pop(). This is done to separate
the consumer and producers cacheline. The cursor is written for every
popped entry, and keeping it on the same cacheline as ->tail would have
the consumer invalidating the line that producers need for every push.
Keeping it external lets the caller place it with its own consumer side
data instead.
Jens Axboe [Fri, 12 Jun 2026 02:27:22 +0000 (20:27 -0600)]
io_uring: grab RCU read lock marking task run
Not required right now, as io_req_local_work_add() already calls this
helper with the RCU read lock held. But in preparation for that not
being the case, grab it locally.
Paolo Abeni [Sat, 13 Jun 2026 09:50:31 +0000 (11:50 +0200)]
Merge branch '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2026-06-09 (idpf, ixgbe, igc)
Przemyslaw adds needed padding to idpf PTP structures to match firmware
expectations.
Larysa bypasses XPS configuration on XDP queues for ixgbe.
Khai Wen corrects offset into packet buffer when handling for frame
preemption on igc.
* '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
igc: skip RX timestamp header for frame preemption verification
ixgbe: do not configure xps for XDP queues
idpf: add padding to PTP virtchnl structures
====================
Ratheesh Kannoth [Wed, 10 Jun 2026 02:23:44 +0000 (07:53 +0530)]
octeontx2-af: npc: Fix size of entry2cntr_map
KASAN prints below splat. This is caused by allocating counter for
reserved mcam entry for cpt 2nd pass entry. But mcam->entry2cntr_map
is not allocated for reserved entries.
BUG: KASAN: slab-out-of-bounds in npc_map_mcam_entry_and_cntr+0xb0/0x1a0
Write of size 2 at addr ffff0001033e7ffe by task kworker/0:1/14
Woojin Ji [Fri, 12 Jun 2026 05:26:55 +0000 (14:26 +0900)]
selftests/bpf: Add arena direct-value one-past-end reject test
BPF_MAP_TYPE_ARENA supports direct-value pseudo loads, but unlike array
maps its map value_size is zero and the valid direct-value range is the
arena mmap size, max_entries * PAGE_SIZE.
Commit 3ac1a467e376 ("bpf: Fix off-by-one boundary validation in arena
direct-value access") fixed arena_map_direct_value_addr() to reject an
offset exactly at the end of the arena mapping. Add a regression test
that loads a BPF_PSEUDO_MAP_VALUE with off == arena_size and verifies
that the verifier rejects it with the expected offset in the log.
This is intentionally kept as a userspace raw-instruction test. I tried
expressing the same BPF_PSEUDO_MAP_VALUE + off == arena_size case in
verifier_arena.c with inline assembly. The only form that produces the
desired instruction bytes uses __imm_addr(arena), but that emits
R_BPF_64_NODYLD32, which the libbpf/bpftool link step rejects. Other
register, immediate, and memory constraints either fail in the BPF
backend or lower to a normal R_BPF_64_64 load followed by an ALU add,
which does not exercise arena_map_direct_value_addr() with the boundary
offset in the second ldimm64 slot.
A legacy test_verifier fixture can express the raw instruction directly,
but it needs arena map creation, mmap, and fixup plumbing in the legacy
runner. That is more intrusive than the small prog_tests raw-instruction
test.
Use the userspace raw-instruction test, following the existing selftests
pattern used for direct map-value pseudo loads, so insns[1].imm can be
set to arena_size precisely.
Assisted-by: ChatGPT:gpt-5.5 Signed-off-by: Woojin Ji <random6.xyz@gmail.com> Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Cc: Emil Tsalapatis <emil@etsalapatis.com> Cc: Junyoung Jang <graypanda.inzag@gmail.com> Link: https://lore.kernel.org/r/20260612-arena-direct-value-v1-v4-1-b81b642f5277@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Gabriele Monaco [Wed, 10 Jun 2026 09:04:29 +0000 (11:04 +0200)]
rqspinlock: Fix order in raw_res_spin_(un)lock_irq to allow schedule
raw_res_spin_unlock_irqrestore() calls raw_res_spin_unlock() and then
restores interrupts, this means preemption is enabled when interrupts
are still disabled (as part of raw_res_spin_unlock()) so this cannot
trigger an actual preemption.
This is inconsistent with other spinlock implementations
(raw_spin_unlock_irqrestore() and bpf_res_spin_unlock_irqrestore()
itself).
Adjust the macro to ensure interrupts are enabled before enabling
preemption, allowing to schedule at that point. Make the same
modification in the error path of raw_res_spin_lock_irqsave().
====================
bpf: Fix setting retval to -EPERM for cgroup hooks not returning errno
This series fixes the issue reported by sashiko in [1]. The issue is that,
when a cgroup BPF program exits with 0, bpf_prog_run_array_cg() sets
the hook return value to -EPERM if it is not a valid errno. This is
correct for errno-based hooks, which return 0 on success and negative
errno on failure, but wrong for void and boolean LSM hooks. Boolean
LSM hooks should only return true or false, and void LSM hooks have
no return value at all.
Fix it by skipping setting -EPERM for hooks not returning errno.
Xu Kuohai [Wed, 10 Jun 2026 20:17:24 +0000 (20:17 +0000)]
selftests/bpf: Add retval test for bool and errno LSM cgroup hooks
Add test to check the return value when a BPF program exits with 0 for
a boolean and an errno LSM hook.
For each hook, two BPF programs are attached. The first program returns
0 without calling bpf_set_retval() to exercise the return value translation
logic, while the second program reads the retval via bpf_get_retval().
Xu Kuohai [Wed, 10 Jun 2026 20:17:23 +0000 (20:17 +0000)]
bpf: Fix setting retval to -EPERM for cgroup hooks not returning errno
When a cgroup BPF program exits with 0, bpf_prog_run_array_cg() sets
the hook return value to -EPERM if it is not a valid errno. This is
correct for errno-based hooks, which return 0 on success and negative
errno on failure, but wrong for boolean and void LSM hooks. Boolean
LSM hooks should only return true or false, and void LSM hooks have
no return value at all.
Fix it by skipping setting -EPERM for hooks not returning errno.
Kaitao Cheng [Tue, 9 Jun 2026 06:13:35 +0000 (14:13 +0800)]
firewire: core: Open-code topology list walk
A later change will make list_for_each_entry() cache the next element
before entering the loop body. for_each_fw_node() intentionally appends
newly discovered child nodes to the temporary walk list while the list is
being traversed.
Keep the loop open-coded so the next node is looked up only after
children have been appended. This preserves the current breadth-first
traversal semantics and prepares the code for the list iterator update.
net: qrtr: fix 32-bit integer overflow in qrtr_endpoint_post()
qrtr_endpoint_post() validates an incoming packet with
if (!size || len != ALIGN(size, 4) + hdrlen)
goto err;
where size comes from the wire. On 32-bit, size_t is 32 bits and
ALIGN(size, 4) wraps to 0 for size >= 0xfffffffd, so the check
passes and skb_put_data(skb, data + hdrlen, size) writes past the
hdrlen-sized skb and oopses the kernel. 64-bit is unaffected.
This is the 32-bit residual of ad9d24c9429e2 ("net: qrtr: fix OOB
Read in qrtr_endpoint_post"), which fixed only the 64-bit case.
Reject any size that cannot fit the buffer before the ALIGN.
Fixes: ad9d24c9429e2 ("net: qrtr: fix OOB Read in qrtr_endpoint_post") Cc: stable@vger.kernel.org Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260611125455.2352279-1-michael.bommarito@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dragos Tatulea [Thu, 11 Jun 2026 13:52:30 +0000 (16:52 +0300)]
net/mlx5: Check max_macs devlink param value against max capability
The max_macs devlink param is checked against the FW max value only at
param register time (driver load) and inside the validate callback
(devlink param set). The stored DRIVERINIT value persists across FW
resets and devlink reloads without any further checks against the max.
If the FW link type changes from Ethernet to IB and a FW reset happens,
the MAX cap for log_max_current_uc_list will become zero, but the
previously stored max_macs value remains and is unconditionally
programmed into the HCA caps in handle_hca_cap(). FW will then return a
syndrome during SET_HCA_CAP:
mlx5_cmd_out_err:839:(pid 3831): SET_HCA_CAP(0x109) op_mod(0x0) failed,
status bad parameter(0x3), syndrome (0x537801), err(-22)
set_hca_cap:907:(pid 3831): handle_hca_cap failed
This results in a failure to register the RDMA device.
This patch skips programming log_max_current_uc_list when the MAX
capability is 0 (in case of IB).
Fixes: 8680a60fc1fc ("net/mlx5: Let user configure max_macs generic param") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Yael Chemla <ychemla@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/20260611135230.534513-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
psp: Add support for dev-assoc/disassoc
The main purpose of this feature is to associate virtual devices like
veth or netkit with a real PSP device, so we could provide PSP
functionality to the application running with virtual devices.
A typical deployment that works with this feature is as follows:
Host Namespace:
psp_dev_local ←──physically linked──→ psp_dev_peer
(PSP device)
│
│ BPF on psp_dev_local ingress: bpf_redirect_peer() to nk_guest
│
nk_host / veth_host
│
│ BPF on nk_host ingress: bpf_redirect_neigh() to psp_dev_local
│
Guest Namespace (netns):
│
nk_guest / veth_guest
★ PSP application run here
Remote Namespace (_netns):
psp_dev_peer
★ PSP server application runs here
Note:
The general requirement for this feature to work:
For PSP to work correctly, the egress device at validate_xmit_skb()
time must have psp_dev matching the association's psd. Any device
stacking or traffic redirection that changes the egress device will
cause either:
1. TX validation failure (SKB_DROP_REASON_PSP_OUTPUT) - fail-safe
2. RX policy failure after tx-assoc - packets without PSP extension
are rejected by receiver expecting encrypted traffic
Here are a few examples that this feature would not work:
- Bonding with load balancing in round-robin, XOR, 802.3ad mode across
multiple PSP devices, or mixed PSP and non-PSP devices
- Bonding with active-backup mode might work without PSP migration for
failover case.
- ipvlan/macvlan in bridge mode would not work given packets are
loopbacked locally without going through the PSP device.
====================
Wei Wang [Mon, 8 Jun 2026 23:31:18 +0000 (16:31 -0700)]
selftests/net: psp: add dev-get, no-nsid, and cleanup tests
Add the following 3 tests:
- _psp_dev_get_check_netkit_psp_assoc: verifies dev-get output in both
host and guest namespaces, checking assoc-list, by-association flag,
and nsid values
- _dev_assoc_no_nsid: tests dev-assoc and dev-disassoc without the nsid
attribute, verifying ifindex lookup in the caller's namespace
- _psp_dev_assoc_cleanup_on_netkit_del: verifies that deleting the
associated netkit interface properly cleans up the assoc-list, using
a disposable netkit pair to avoid disturbing the shared environment
Add tests that verify PSP notifications are delivered to listeners in
associated namespaces:
- _key_rotation_notify_multi_ns_netkit: triggers key rotation and
verifies the notification is received in both main and guest namespaces
- _dev_change_notify_multi_ns_netkit: triggers dev_set and verifies the
dev_change notification is received in both namespaces
Wei Wang [Mon, 8 Jun 2026 23:31:16 +0000 (16:31 -0700)]
selftests/net: psp: add dev-assoc data path test
Add _assoc_check_list() test that associates nk_guest with the PSP
device and verifies the assoc-list is correctly populated.
Add _data_basic_send_netkit_psp_assoc() which tests PSP data send
through a netkit interface associated with a PSP device. The test
associates nk_guest with the PSP device, then sends PSP-encrypted
traffic from the guest namespace.
Wei Wang [Mon, 8 Jun 2026 23:31:15 +0000 (16:31 -0700)]
selftests/net: psp: support PSP in NetDrvContEnv infrastructure
Add infrastructure to support PSP tests across network namespaces
using NetDrvContEnv with netkit pairs. This enables testing PSP device
association, where a non-PSP-capable device (e.g. netkit) in a guest
namespace is associated with a real PSP device in the host namespace,
allowing the guest to perform PSP encryption/decryption through the
host's PSP hardware.
env.py:
- nk_guest_ifindex is queried after moving the device into the guest
namespace, so tests can use it directly for dev-assoc
psp.py:
- PSP device lookup supports container environments where the PSP
device is on the physical interface, not the test interface
- Association helpers handle dev-assoc/dev-disassoc with defer-based
cleanup to prevent state leaks on test assertion failures
- main() tries NetDrvContEnv with primary_rx_redirect and falls back
to NetDrvEpEnv, so existing tests continue to work without the
container environment
Wei Wang [Mon, 8 Jun 2026 23:31:14 +0000 (16:31 -0700)]
selftests/net: rename _nk_host_ifname to nk_host_ifname
Rename _nk_host_ifname to nk_host_ifname in NetDrvContEnv to make it
a public attribute, matching the nk_guest_ifname rename. Tests that
access the host-side netkit interface name (e.g. for cleanup after
deleting the netkit pair) no longer trigger pylint protected-access
warnings.
Wei Wang [Mon, 8 Jun 2026 23:31:13 +0000 (16:31 -0700)]
selftests/net: add _find_bpf_obj() to search hw/ for BPF objects
Add _find_bpf_obj() helper to NetDrvContEnv that searches the test
directory first, then falls back to the hw/ subdirectory. This allows
tests outside drivers/net/hw/ (e.g. psp.py in drivers/net/) to find
BPF objects built in the hw/ directory.
Update _attach_bpf() and _attach_primary_rx_redirect_bpf() to use
_find_bpf_obj() for BPF object discovery.
Wei Wang [Mon, 8 Jun 2026 23:31:12 +0000 (16:31 -0700)]
selftests/net: psp: refactor test builders to use ksft_variants
Replace the manual psp_ip_ver_test_builder() and ipver_test_builder()
functions with @ksft_variants decorators for data_basic_send and
data_mss_adjust. This is a pure refactor with no behavior change.
Wei Wang [Mon, 8 Jun 2026 23:31:10 +0000 (16:31 -0700)]
psp: add new netlink cmd for dev-assoc and dev-disassoc
The main purpose of this cmd is to be able to associate a
non-psp-capable device (e.g. veth or netkit) with a psp device.
One use case is if we create a pair of veth/netkit, and assign 1 end
inside a netns, while leaving the other end within the default netns,
with a real PSP device, e.g. netdevsim or a physical PSP-capable NIC.
With this command, we could associate the veth/netkit inside the netns
with PSP device, so the virtual device could act as PSP-capable device
to initiate PSP connections, and performs PSP encryption/decryption on
the real PSP device.
Wei Wang [Mon, 8 Jun 2026 23:31:09 +0000 (16:31 -0700)]
psp: add admin/non-admin version of psp_device_get_locked
Introduce 2 versions of psp_device_get_locked:
1. psp_device_get_locked_admin(): This version is used for operations
that would change the status of the psd, and are currently used for
dev-set and key-rotation.
2. psp_device_get_locked(): This is the non-admin version, which are
used for broader user issued operations including: dev-get, rx-assoc,
tx-assoc, get-stats.
Following commit will be implementing both of the checks.
Generic XDP devmap multi redirect can leave cloned skbs sharing packet
data. When a devmap egress program mutates packet data, another
destination sharing the same data may observe that mutation.
Fix this by making cloned skbs private before running the generic devmap
egress program. The private copy is made in dev_map_generic_redirect()
so dev_map_bpf_prog_run_skb() can keep returning the XDP action directly.
Add selftest coverage for the last-destination case, where the final
destination runs on the original skb while earlier destinations use
cloned skbs. The test records the source MAC observed by an earlier
destination and checks that it is neither the sentinel value left in the
result map nor the MAC written by the final destination.
---
v5:
- Move the skb_copy() check back to dev_map_generic_redirect() to keep
dev_map_bpf_prog_run_skb() returning only the XDP action.
- Preserve mac_len after skb_copy().
- Use __be64 temporary values when updating mac_map from userspace.
- Initialize rx_mac with a sentinel in the last-destination test instead
of relying on -ENOENT for ARRAY map lookups.
- Adjust the last-destination test topology so the checked earlier
destination is not the ingress/source veth.
- Split the last-destination check into two assertions: one for store_mac_1
updating rx_mac and one for detecting last-destination rewrite leakage.
v4: https://lore.kernel.org/bpf/20260611080850.536996-1-sun.jian.kdev@gmail.com/T/#mf830f03d362f33e0941d1b0e425169698fce76e5
- Preserve mac_len after skb_copy().
- Separate errno return from XDP action output in
dev_map_bpf_prog_run_skb().
- Zero-initialize net_config in the new selftest.
v3: https://lore.kernel.org/bpf/20260611043317.512843-1-sun.jian.kdev@gmail.com/
- Split the kernel fix and selftest into separate patches.
- Move the private-copy logic into dev_map_bpf_prog_run_skb().
- Use deterministic DEVMAP_HASH keys in the last-destination selftest.
- Fix the Fixes tag.
v2: https://lore.kernel.org/bpf/08c35c70-a59e-4e0e-91db-22b5ec30b611@linux.dev/
- Move the private-copy step into dev_map_generic_redirect() so the
last-destination path is covered as well.
- Use skb_copy() instead of skb_unshare() to keep caller ownership
unchanged on allocation failure.
- Add a generic XDP last-destination selftest case.
Strengthen xdp_veth_egress to check that each destination observes the
MAC selected for its own egress ifindex, instead of only checking that
the observed MAC differs from a single magic value.
Add a generic XDP last-destination test where an earlier destination does
not have a devmap egress program while the final destination does. This
covers the case where the final destination runs on the original skb and
could otherwise rewrite packet data still shared with an earlier cloned
skb.
Use deterministic DEVMAP_HASH keys for the egress map so the intended
last destination is stable. Initialize the result map with a sentinel
value and check that store_mac_1 overwrites it before checking that the
earlier destination did not observe the MAC written by the final
destination.
Sun Jian [Fri, 12 Jun 2026 11:40:31 +0000 (19:40 +0800)]
bpf: Run generic devmap egress prog on private skb
Generic XDP devmap multi redirect uses skb_clone() for intermediate
destinations and sends the last destination with the original skb. This
can leave multiple destinations sharing the same packet data.
This becomes visible after generic devmap egress-program support was
added: a devmap egress program may mutate packet data, and another
destination sharing the same data can observe that mutation.
Native XDP broadcast redirect does not have this issue because
xdpf_clone() copies the frame data for each destination. Generic XDP
should provide the same per-destination isolation before running a
devmap egress program.
Fix this by making cloned skbs private before running the generic devmap
egress program. Use skb_copy() instead of skb_unshare() so allocation
failure does not consume the skb and the existing caller error paths keep
their ownership semantics.
This series continues the rework of the KSZ driver initiated by two previous
series (see [1] & [2]).
The KSZ driver handles more than 20 switches split in several families.
This was previously handled through a common set of dsa_switch_ops
operations that used device-specific ksz_dev_ops callbacks. The two
previous series have split this common struct dsa_switch_ops into 5
to connect the ksz_dev_ops's implentations directly to the new
dsa_swicth ops.
This series continues in the same vein and removes the dsa_switch_ops
operations that aren't used.
On top of this on-going rework I added PTP and periodic output support for
the KSZ8463 (which was my first goal). There are still more than 20 patches
left for all this so this series will be followed by three others and if you
want to see the full picture we can check my github ([3]).
FYI, I only have a KSZ8463 so, unfortunately, I can't test other switches.
The next series is going to move out of ksz_common.c the last remaining
functions that aren't truly common to all KSZ switches. The series after
that will add PTP support for the KSZ8463 and the final one will add
periodic output support for the KSZ8463.
net: dsa: microchip: implement port_teardown only if needed
The port_teardown() operation is optional. Yet, it is implemented by all
the KSZ switches through a common function that doesn't do anything for
the switches that aren't part of the ksz9477 family
Remove the implementation from the switches that don't need it.
Implement instead a ksz9477-specific port_teardown.
All the switches use a common mdio_register() function that uses two
ksz_dev_ops callbacks (.mdio_bus_preinit() and .create_phy_addr_map())
to handle the lan937x specific case. These two callbacks are used only
at this place in the code.
Implement a new lan937x-specific MDIO registration functions that uses
these two lan937x-specific functions. The lan937x bindings don't
have any 'interrupts' property so this lan937x_mdio_register() doesn't
call ksz_irq_phy_setup().
Expose the common ksz_*_mdio_{read/write} functions so they can be used
in lan937x.c
Remove the callbacks from ksz_dev_ops.
net: dsa: microchip: implement .{get/set}_wol only if needed
All the KSZ switches use common {get/set}_wol operations while only the
ksz9477 and the ksz87xx families really support it. These operations are
optional so there is no point implementing them to return -EOPNOTSUPP.
Remove the {get/set}_wol callbacks from the switch operations for the
ksz88xx, the ksz8463 and the lan937x families.
Remove the family check from the common {get/set}_wol implementation.
Note that is_ksz9477() is only true for the KSZ9477 so this change will
also add WoL support for the other switches using the
ksz9477_switch_ops. I checked their datasheet, they implement the same
PME_WOL registers, at the same addresses, so this should go fine.
Modify the ksz_wol_pre_shutdown() initial check to ensure consistency in
the WoL handling for these non-KSZ9477 switches using ksz9477_switch_ops.
net: dsa: microchip: implement .support_eee() only if needed
The .support_eee() operation is optional. Yet, it is implemented by the
KSZ switches through a common functon that reports false for every chip
except for KSZ8563, KSZ9563 and KSZ9893 from the KSZ9477 family.
Remove the implementation from the switches that don't support EEE.
Also remove .set_mac_eee() for them as .set_mac_eee() is gated by the
`support_eee` presence in the core.
Implement instead a ksz9477-specific support_eee for these three supported
switches.
Note that comment /* KSZ879x/KSZ877x/KSZ876x Errata DS80000687C Module 2 */
is completely removed because it concerns the KSZ87xx family that doesn't
support at all EEE.
setup_rgmii_delay() operation is only used once during the common phylink
MAC configuration. Only the lan937x switch implements this
setup_rgmii_delay().
Remove the setup_rgmii_delay operation from ksz_dev_ops.
Implement a lan937x-specific phylink MAC configuration that does this
RGMII delay setup.
Export ksz_set_xmii since it's needed by the lan937x implementation.
net: dsa: microchip: wrap the MAC configuration checks in a function
The common .mac_config() implementation checks some conditions before
doing any register access. As this common implementation is about to be
split in the upcoming patch, these checks would lead to code
duplication.
Wrap all the checks in a need_config() function that returns true when
the driver really need to access the switch registers to configure the
MAC.
net: dsa: microchip: implement get_phy_flags only if needed
The common ksz_get_phy_flags() is used by all the switches to implement
the optional .get_phy_flags DSA operation. It always returns 0 except
for KSZ88X3 switches where an errata has to be handled.
Make ksz_get_phy_flags() ksz88xx-specific.
Remove the get_phy_flags implementation for the switches that don't need
it.
net: dsa: microchip: remove useless common cls_flower_{add/del} operations
All the KSZ switches share a common implementation of the
cls_flower_{add/del} operations. These common implementations return
ksz9477-specific implementations for the KSZ9477 family and -EOPNOTSUPP
for the others. -EOPNOTSUPP is already returned by the DSA core when
the operation isn't implemented.
Remove the common implementations.
Directly link the ksz9477_cls_flower_{add/del}() to the KSZ9477 callback.
Victor Nogueira [Thu, 11 Jun 2026 20:58:49 +0000 (17:58 -0300)]
net/sched: sch_dualpi2: Add missing module alias
When a qdisc is added by name, the kernel tries to autoload its module
via request_qdisc_module(), which calls:
request_module(NET_SCH_ALIAS_PREFIX "%s", name);
i.e. it asks modprobe to resolve the "net-sch-<kind>" alias (e.g.
"net-sch-dualpi2") rather than the module's file name. Since dualpi2
was shipped without this alias, the autoload fails:
tc qdisc add dev lo root handle 1: dualpi2
Error: Specified qdisc kind is unknown.
Fix this by adding the missing alias so the qdisc is autoloaded on demand
like the others.
Fixes: 320d031ad6e4 ("sched: Struct definition and parsing of dualpi2 qdisc") Signed-off-by: Victor Nogueira <victor@mojatatu.com> Reviewed-by: Pedro Tammela <pctammela@mojatatu.com> Link: https://patch.msgid.link/20260611205849.3287640-1-victor@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 11 Jun 2026 17:21:49 +0000 (10:21 -0700)]
docs: networking: add guidance on what to push via extack
Every now and then someone tries to duplicated extack
messages to dmesg. Document our guidance against this.
Also indicate that system level faults should continue
to go to system logs. The high level thinking is to try
to distinguish between what's important to the user vs
system admin.
Vadim Fedorenko [Thu, 11 Jun 2026 19:03:33 +0000 (19:03 +0000)]
ptp: ocp: add shutdown callback
The shutdown callback was never implemented for this driver, but it's
needed because .remove() callback is never called during kexec/reboot
process. That leaves HW with some interrupts enabled and may cause
spurious interrupt while booting into a new kernel during with kexec.
If it happens that I2C interrupt fires during kexec, the whole I2C bus
is disabled leaving TimeCard with no devlink communication. The same
happens if timestampers were enabled, leaving the card without
timestamper interrupts until full reboot cycle.
Implement .shutdown() callback with the same function as remove
callback.
Ido Schimmel [Thu, 11 Jun 2026 15:46:05 +0000 (18:46 +0300)]
selftests: fib_tests: Add test cases for route lookup with oif
Test that both address families respect the oif parameter when a
matching multipath route is found, regardless of the presence of a
source address.
Output without "ipv6: Select best matching nexthop object in
fib6_table_lookup()" and "ipv6: Honor oif when choosing nexthop for
locally generated traffic":
IPv4 multipath oif test
TEST: IPv4 multipath via first nexthop [ OK ]
TEST: IPv4 multipath via second nexthop [ OK ]
TEST: IPv4 multipath via first nexthop with source address [ OK ]
TEST: IPv4 multipath via second nexthop with source address [ OK ]
IPv4 multipath oif with nexthop object test
TEST: IPv4 multipath via first nexthop [ OK ]
TEST: IPv4 multipath via second nexthop [ OK ]
TEST: IPv4 multipath via first nexthop with source address [ OK ]
TEST: IPv4 multipath via second nexthop with source address [ OK ]
IPv4 multipath oif with VRF test
TEST: IPv4 multipath via first nexthop [ OK ]
TEST: IPv4 multipath via second nexthop [ OK ]
TEST: IPv4 multipath via first nexthop with source address [ OK ]
TEST: IPv4 multipath via second nexthop with source address [ OK ]
IPv6 multipath oif test
TEST: IPv6 multipath via first nexthop [ OK ]
TEST: IPv6 multipath via second nexthop [ OK ]
TEST: IPv6 multipath via first nexthop with source address [FAIL]
TEST: IPv6 multipath via second nexthop with source address [FAIL]
IPv6 multipath oif with nexthop object test
TEST: IPv6 multipath via first nexthop [FAIL]
TEST: IPv6 multipath via second nexthop [FAIL]
TEST: IPv6 multipath via first nexthop with source address [FAIL]
TEST: IPv6 multipath via second nexthop with source address [FAIL]
IPv6 multipath oif with VRF test
TEST: IPv6 multipath via first nexthop [ OK ]
TEST: IPv6 multipath via second nexthop [ OK ]
TEST: IPv6 multipath via first nexthop with source address [FAIL]
TEST: IPv6 multipath via second nexthop with source address [FAIL]
IPv4 multipath oif test
TEST: IPv4 multipath via first nexthop [ OK ]
TEST: IPv4 multipath via second nexthop [ OK ]
TEST: IPv4 multipath via first nexthop with source address [ OK ]
TEST: IPv4 multipath via second nexthop with source address [ OK ]
IPv4 multipath oif with nexthop object test
TEST: IPv4 multipath via first nexthop [ OK ]
TEST: IPv4 multipath via second nexthop [ OK ]
TEST: IPv4 multipath via first nexthop with source address [ OK ]
TEST: IPv4 multipath via second nexthop with source address [ OK ]
IPv4 multipath oif with VRF test
TEST: IPv4 multipath via first nexthop [ OK ]
TEST: IPv4 multipath via second nexthop [ OK ]
TEST: IPv4 multipath via first nexthop with source address [ OK ]
TEST: IPv4 multipath via second nexthop with source address [ OK ]
IPv6 multipath oif test
TEST: IPv6 multipath via first nexthop [ OK ]
TEST: IPv6 multipath via second nexthop [ OK ]
TEST: IPv6 multipath via first nexthop with source address [ OK ]
TEST: IPv6 multipath via second nexthop with source address [ OK ]
IPv6 multipath oif with nexthop object test
TEST: IPv6 multipath via first nexthop [ OK ]
TEST: IPv6 multipath via second nexthop [ OK ]
TEST: IPv6 multipath via first nexthop with source address [ OK ]
TEST: IPv6 multipath via second nexthop with source address [ OK ]
IPv6 multipath oif with VRF test
TEST: IPv6 multipath via first nexthop [ OK ]
TEST: IPv6 multipath via second nexthop [ OK ]
TEST: IPv6 multipath via first nexthop with source address [ OK ]
TEST: IPv6 multipath via second nexthop with source address [ OK ]
Ido Schimmel [Thu, 11 Jun 2026 15:46:04 +0000 (18:46 +0300)]
ipv6: Honor oif when choosing nexthop for locally generated traffic
Commit 741a11d9e410 ("net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is
set") made the kernel honor the oif parameter when specified as part of
output route lookup:
# ip route add 2001:db8:1::/64 dev dummy1
# ip route add ::/0 dev dummy2
# ip route get 2001:db8:1::1 oif dummy2 fibmatch
default dev dummy2 metric 1024 pref medium
Due to regression reports, the behavior was partially reverted in commit d46a9d678e4c ("net: ipv6: Dont add RT6_LOOKUP_F_IFACE flag if saddr
set") to only honor the oif if source address is not specified:
# ip route get 2001:db8:1::1 from 2001:db8:2::1 oif dummy2 fibmatch
2001:db8:1::/64 dev dummy1 metric 1024 pref medium
That is, when source address is specified, the kernel will choose the
most specific route even if its nexthop device does not match the
specified oif.
This creates a problem for multipath routes. After looking up a route,
when source address is not specified, the kernel will choose a nexthop
whose nexthop device matches the specified oif:
# sysctl -wq net.ipv6.conf.all.forwarding=1
# ip route add 2001:db8:10::/64 nexthop via fe80::1 dev dummy1 nexthop via fe80::2 dev dummy2
# for i in {1..100}; do ip route get 2001:db8:10::${i} oif dummy2; done | grep -o dummy[0-9] | sort | uniq -c
100 dummy2
But will disregard the oif when source address is specified despite the
fact that a matching nexthop exists:
# for i in {1..100}; do ip route get 2001:db8:10::${i} from 2001:db8:2::1 oif dummy2; done | grep -o dummy[0-9] | sort | uniq -c
53 dummy1
47 dummy2
This behavior differs from IPv4:
# ip address add 192.0.2.1/32 dev lo
# ip route add 198.51.100.0/24 nexthop via inet6 fe80::1 dev dummy1 nexthop via inet6 fe80::2 dev dummy2
# for i in {1..100}; do ip route get 198.51.100.${i} from 192.0.2.1 oif dummy2; done | grep -o dummy[0-9] | sort | uniq -c
100 dummy2
What happens is that fib6_table_lookup() returns a route with a matching
nexthop device (assuming it exists):
# perf record -e fib6:fib6_table_lookup -- bash -c "for i in {1..100}; do ip route get 2001:db8:10::${i} from 2001:db8:2::1 oif dummy2; done > /dev/null"
# perf script | grep -o dummy[0-9] | sort | uniq -c
100 dummy2
But it is later overwritten during path selection in fib6_select_path()
which instead chooses a nexthop according to the calculated hash.
Solve this by telling fib6_select_path() to skip path selection if we
have an oif match during output route lookup (iif being
LOOPBACK_IFINDEX).
Behavior after the change:
# sysctl -wq net.ipv6.conf.all.forwarding=1
# ip route add 2001:db8:10::/64 nexthop via fe80::1 dev dummy1 nexthop via fe80::2 dev dummy2
# for i in {1..100}; do ip route get 2001:db8:10::${i} from 2001:db8:2::1 oif dummy2; done | grep -o dummy[0-9] | sort | uniq -c
100 dummy2
Note that enabling forwarding is only needed because we did not add
neighbor entries for the gateway addresses. When forwarding is disabled
and CONFIG_IPV6_ROUTER_PREF is not enabled in kernel config, the kernel
will treat non-existing neighbor entries as errors and perform
round-robin between the nexthops:
# sysctl -wq net.ipv6.conf.all.forwarding=0
# for i in {1..100}; do ip route get 2001:db8:10::${i} from 2001:db8:2::1 oif dummy2; done | grep -o dummy[0-9] | sort | uniq -c
50 dummy1
50 dummy2
Ido Schimmel [Thu, 11 Jun 2026 15:46:03 +0000 (18:46 +0300)]
ipv6: Select best matching nexthop object in fib6_table_lookup()
Currently, when using multipath routes without nexthop objects,
fib6_table_lookup() selects the nexthop with the highest score. This
means that when both a source address and an oif are specified, the
nexthop that is chosen is the one that matches in terms of oif:
# sysctl -wq net.ipv6.conf.all.forwarding=1
# ip address add 2001:db8:2::1/64 dev lo
# ip route add 2001:db8:10::/64 nexthop via fe80::1 dev dummy1 nexthop via fe80::2 dev dummy2
# perf record -e fib6:fib6_table_lookup -- bash -c "for i in {1..100}; do ip route get 2001:db8:10::${i} from 2001:db8:2::1 oif dummy1; done > /dev/null"
# perf script | grep -o dummy[0-9] | sort | uniq -c
100 dummy1
# perf record -e fib6:fib6_table_lookup -- bash -c "for i in {1..100}; do ip route get 2001:db8:10::${i} from 2001:db8:2::1 oif dummy2; done > /dev/null"
# perf script | grep -o dummy[0-9] | sort | uniq -c
100 dummy2
When using nexthop objects, fib6_table_lookup() selects the first
matching nexthop and not necessarily the one with the highest score:
# ip nexthop add id 1 via fe80::1 dev dummy1
# ip nexthop add id 2 via fe80::2 dev dummy2
# ip nexthop add id 3 group 1/2
# ip route add 2001:db8:20::/64 nhid 3
# perf record -e fib6:fib6_table_lookup -- bash -c "for i in {1..100}; do ip route get 2001:db8:20::${i} from 2001:db8:2::1 oif dummy1; done > /dev/null"
# perf script | grep -o dummy[0-9] | sort | uniq -c
100 dummy1
# perf record -e fib6:fib6_table_lookup -- bash -c "for i in {1..100}; do ip route get 2001:db8:20::${i} from 2001:db8:2::1 oif dummy2; done > /dev/null"
# perf script | grep -o dummy[0-9] | sort | uniq -c
100 dummy1
This is not very significant right now because the nexthop is later
overwritten during path selection in fib6_select_path(). However, the
next patch is going to skip path selection when we have an oif match
during output route lookup.
As a preparation for this change, align the nexthop object behavior with
the legacy one and make sure that fib6_table_lookup() always selects the
best matching nexthop. Do that by always returning 0 from
rt6_nh_find_match() in order not to terminate the loop in
nexthop_for_each_fib6_nh() and storing in arg->nh the best matching
nexthop so far.
Behavior after the change:
# perf record -e fib6:fib6_table_lookup -- bash -c "for i in {1..100}; do ip route get 2001:db8:20::${i} from 2001:db8:2::1 oif dummy1; done > /dev/null"
# perf script | grep -o dummy[0-9] | sort | uniq -c
100 dummy1
# perf record -e fib6:fib6_table_lookup -- bash -c "for i in {1..100}; do ip route get 2001:db8:20::${i} from 2001:db8:2::1 oif dummy2; done > /dev/null"
# perf script | grep -o dummy[0-9] | sort | uniq -c
100 dummy2
Breno Leitao [Wed, 10 Jun 2026 14:26:04 +0000 (07:26 -0700)]
netconsole: clear cached dev_name on resume-window cleanup
When process_resume_target() catches a device that was unregistered
while the target was off target_list, it calls do_netpoll_cleanup() to
release the reference but leaves the cached np.dev_name in place. The
other cleanup path, netconsole_process_cleanups_core(), already wipes
dev_name for MAC-bound targets because the name was only a cache of the
device that last carried the MAC and may no longer match.
The pattern is the same in both spots, so fold it into a small helper
netcons_release_dev() and route both call sites through it. This makes
the resume-window cleanup consistent with the notifier-driven one so a
later enable does not let netpoll_setup() pick a stale interface by name
when the user bound the target by MAC.
Eric Dumazet [Thu, 11 Jun 2026 15:27:37 +0000 (15:27 +0000)]
net: watchdog: fix refcount tracking races
Blamed commit converted the untracked dev_hold()/dev_put() calls
in the watchdog code to use the tracked dev_hold_track()/dev_put_track()
(which were later renamed/interfaced to netdev_hold() and netdev_put()).
By introducing dev->watchdog_dev_tracker to store the
reference tracking information without adding synchronization
between netdev_watchdog_up() and dev_watchdog(), it enabled the
race condition where this pointer could be overwritten or freed
concurrently, leading to the list corruption crash syzbot reported:
1) Add dev->watchdog_lock and dev->watchdog_ref_held to serialize watchdog operations.
2) Remove netdev_watchdog_up() call from netif_carrier_on():
This ensures netdev_watchdog_up() is only called from process/BH context
(via linkwatch workqueue dev_activate()), allowing us to use
spin_lock_bh() for synchronization.
3) Synchronize watchdog up and watchdog timer:
Protect netdev_watchdog_up() with tx_global_lock and watchdog_lock.
Only allocate a new tracker in netdev_watchdog_up() if one is
not already present.
In dev_watchdog(), ensure we don't release the tracker if the
timer was rescheduled either by dev_watchdog() itself or concurrently
by netdev_watchdog_up().
Fixes: f12bf6f3f942 ("net: watchdog: add net device refcount tracker") Reported-by: syzbot+381d82bbf0253710b35d@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/6a26b751.c25708ab.1b19ef.0013.GAE@google.com/T/#u Tested-by: syzbot+3479efbc2821cb2a79f2@syzkaller.appspotmail.com Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260611152737.2580480-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dragos Tatulea [Thu, 11 Jun 2026 16:03:41 +0000 (19:03 +0300)]
selftests: iou-zcrx: defer listen() until after zcrx setup
The server binds the queues for zero-copy after listen(). If the client
does a connect() during this time it can fail with EHOSTUNREACH on
a cold system. This was encountered with the mlx5 driver where binding
the .ndo_queue_start() is a slow operation during which no packets
can be exchanged.
This change moves listen() after queue binding, when the test server is
fully operational.
====================
net: mana: fix error-path issues in queue setup
Two error-path fixes in MANA queue setup, both surfaced during Sashiko
AI review of a recently upstreamed patch series.
Patch 1 initializes queue->id to INVALID_QUEUE_ID in
mana_gd_create_mana_wq_cq() so that a CQ creation failure before the
firmware id is assigned does not NULL gc->cq_table[0] and silently
break whichever real CQ owns that slot. This mirrors the existing
pattern in mana_gd_create_eq().
Patch 2 guards mana_destroy_txq()'s call to mana_destroy_wq_obj() with
an INVALID_MANA_HANDLE check, mirroring mana_destroy_rxq(). Without
it, TX setup failures lead to a firmware-rejected destroy of (u64)-1
and a spurious error in dmesg.
====================
Aditya Garg [Mon, 8 Jun 2026 10:13:41 +0000 (03:13 -0700)]
net: mana: guard TX wq object destroy with INVALID_MANA_HANDLE check
mana_create_txq() has several error paths (after mana_alloc_queues() or
mana_create_wq_obj() failure) where tx_qp[i].tx_object stays as the
INVALID_MANA_HANDLE sentinel set at allocation. mana_destroy_txq() then
unconditionally calls mana_destroy_wq_obj() with (u64)-1, which firmware
rejects and logs an error.
Mirror the RX-side pattern in mana_destroy_rxq() and skip the destroy
when the handle is still INVALID_MANA_HANDLE.
Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)") Signed-off-by: Aditya Garg <gargaditya@linux.microsoft.com> Reviewed-by: Dipayaan Roy <dipayanroy@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Link: https://patch.msgid.link/20260608101345.2267320-3-gargaditya@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Aditya Garg [Mon, 8 Jun 2026 10:13:40 +0000 (03:13 -0700)]
net: mana: initialize gdma queue id to INVALID_QUEUE_ID
mana_gd_create_mana_wq_cq() leaves queue->id as 0 (from kzalloc_obj())
until mana_create_wq_obj() assigns the firmware-returned id. If creation
fails before that, cleanup calls mana_gd_destroy_cq() with id 0, NULLing
gc->cq_table[0] and silently breaking whichever real CQ owns that slot.
Initialize queue->id to INVALID_QUEUE_ID right after allocation, matching
mana_gd_create_eq(). The existing (id >= max_num_cqs) guard then
short-circuits cleanly.
Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)") Signed-off-by: Aditya Garg <gargaditya@linux.microsoft.com> Reviewed-by: Dipayaan Roy <dipayanroy@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Link: https://patch.msgid.link/20260608101345.2267320-2-gargaditya@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
net: mdio: realtek-rtl9300: Add RTL931x support
The Realtek Otto switch platform consists of four different series
- RTL838x aka maple : 28 port 1G Switches
- RTL839x aka cypress : 52 port 1G Switches
- RTL930x aka longan : 28 port 1G/2.5G/10G Switches
- RTL931x aka mango : 56 port 1G/2.5G/10G Switches
This patch series adds support for the RTL931x devices. For this
- Enhance device tree binding.
- Implement final cleanups and enhancments for the driver.
- Add RTL931x coding.
Remark: Instead of this series it was planned to bring support for
hardware polling configuration first. It turns out that more testing
is needed - especially for the RTL83xx SoCs. Instead add the lineup
of the RTL931x devices, that are known to have no obvious bus and
polling issues (at least from testing and vendor SDK perspective).
====================
net: mdio: realtek-rtl9300: Add support for RTL931x
The MDIO driver has been prepared for multiple device support. Add all
required bits for the RTL931x (aka mango) series. This is straightforward
but some things are worth to be mentioned.
- In contrast to RTL930x the I/O register has the input/output fields
swapped. Upper 16 bits are for read/outputs, and the lower 16 bits
are for write/inputs.
- The supported "pages" are 8192 and thus the raw page is 8191
- The devices support up to 56 ports. Thus the MAX_PORTS definition
is increased by this commit.
- There are multiple global SMI controller registers with a different
layout from RTL930x devices. Therefore a separate setup_controller()
callback is added.
net: mdio: realtek-rtl9300: Add registers for high port count models
The high port count models of the Realtek Otto switches have additional
registers to instrument the MDIO controller. These are:
- High port mask: A bitfield that extends the already existing low port
mask to select ports starting from 32.
- Broadcast: This takes the port number during reads on the RTL931x.
- Extended page: Some additional page info. The SDK does not give much
information about this. Basically some fixed value must be written
into it during access.
net: mdio: realtek-rtl9300: Make otto_emdio_read_cmd() generic
The otto_emdio_read_cmd() helper still uses RTL9300 specific properties.
This cannot be made generic as the I/O register has different layouts for
the different SoCs. E.g.
- RTL930x: data in bits 31-16, data out bits 15-0
- RTL931x: data in bits 15-0, data out bits 31-16
Add a mask parameter to the function signature and fill it properly
in the callers. As the masks will always have bits set from constant
defines, there is no need for a consistency check.
net: mdio: realtek-rtl9300: Add prefix to register field defines
The current Realtek Otto MDIO driver has some define leftovers without
a SoC prefix. When adding new devices there will be an overlap for some
of them. Sort this out as follows:
- PHY_CTRL_CMD/PHY_CTRL_MMD_DEVAD/PHY_CTRL_MMD_REG are common for all
series. Leave them as is but move them into a separate block.
- Add RTL9300 prefix to all other defines and adapt the callers.
dt-bindings: net: realtek,rtl9301-mdio: Add RTL931x series
The 10G Realtek Otto switches are divided into two series
- Longan: RTL930x up to 28 ports
- Mango : RTL931x up to 56 ports
The Mango based devices have 3 different SoCs RTL9311, RTL9312 and RTL9313.
The MDIO controller of these switches works like the existing RTL930x
logic but has different characteristics and different registers. Add new
compatibles in the device tree.
Linus Torvalds [Sat, 13 Jun 2026 00:23:05 +0000 (17:23 -0700)]
Merge tag 'pinctrl-v7.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
- Two fixes for the mcp23s08 driver.
- Revert an earlier fix to the AMD pin controller that was all wrong. A
proper fix is being developed.
* tag 'pinctrl-v7.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
Revert "pinctrl-amd: enable IRQ for WACF2200 touchscreen on Lenovo Yoga 7 14AGP11"
pinctrl: mcp23s08: Read spi-present-mask as u8 not u32
pinctrl: mcp23s08: Initialize mcp->dev and mcp->addr before regmap init
====================
Avoid mistaken parent class deactivation during peek
Several qdiscs (fq_codel, codel and dualpi2) may drop packets while
peeking at their queue. When that happens they call
qdisc_tree_reduce_backlog() to notify the parent of the backlog/qlen
change. The problem is that they do so *before* reincrementing the qlen
that peek had temporarily decremented.
If the qlen momentarily drops to zero while peek still has an skb to
return, qdisc_tree_reduce_backlog() ends up invoking the parent's
qlen_notify() callback even though the child is not actually empty. The
parent then deactivates the class, while the child still holds a packet.
For parents such as QFQ this desync corrupts the active class list and
leads to wild memory accesses and NULL pointer dereferences (see the
per-patch splats). For HFSC it might lead to stalls [1].
Fix all three qdiscs the same way: only call qdisc_tree_reduce_backlog()
once the qlen has been restored, so the parent never observes a
transient empty child during peek.
Patch 1 fixes this for fq_codel, patch 2 for codel, patch 3 for dualpi2
and patch 4 adds test cases for these 3 setups.
Note: Patch 1 is one of two fixes for the stall reported in [1]; the
companion fix is "net/sched: sch_hfsc: Don't make class passive twice",
sent separately.
Note2: A possible cleaner fix is to create a new helper function for peek
that only calls qdisc_tree_reduce_backlog after reincrementing the qlen.
This would be called from the 3 vulnerable qdiscs, however we thought this
might make it harder for backporting so, if people agree, we can submit
this cleaner version to net-next after this one is merged.
Victor Nogueira [Wed, 10 Jun 2026 19:28:55 +0000 (16:28 -0300)]
selftests/tc-testing: Verify child qdisc will not mistakenly deactivate QFQ parent
Create 3 test cases:
- Verify fq_codel won't mistakenly deactivate QFQ parent class during peek
- Verify codel won't mistakenly deactivate QFQ parent class during peek
- Verify dualpi2 won't mistakenly deactivate QFQ parent class during peek
Verify that these 3 qdiscs (fq_codel, codel, dualpi2) will not call
qdisc_tree_reduce_backlog with an incorrect qlen (0) during peek and
mistakenly deactivate a parent class.
Victor Nogueira [Wed, 10 Jun 2026 19:28:54 +0000 (16:28 -0300)]
net/sched: sch_dualpi2: Do not call qdisc_tree_reduce_backlog during peek before restoring qlen
Whenever dualpi2 drops packets during peek, it calls
qdisc_tree_reduce_backlog. An issue arises because it calls
qdisc_tree_reduce_backlog before it reincrements the qlen. If qlen drops
to zero, but peek returns an skb, the parent's qlen_notify callback will be
executed even though dualpi2 still has 1 packet on the queue and, thus,
mistakenly deactivates the parent's class which leads to a null-ptr-deref:
Victor Nogueira [Wed, 10 Jun 2026 19:28:53 +0000 (16:28 -0300)]
net/sched: sch_codel: Do not call qdisc_tree_reduce_backlog during peek before restoring qlen
Whenever codel drops packets during peek, it calls
qdisc_tree_reduce_backlog. An issue arises because it calls
qdisc_tree_reduce_backlog before it reincrements the qlen. If qlen drops
to zero, but peek returns an skb, the parent's qlen_notify callback will
be executed even though codel still has 1 packet on the queue and, thus,
will mistakenly deactivate the parent's class causing issues like a wild
memory access when qfq has codel as a child:
Victor Nogueira [Wed, 10 Jun 2026 19:28:52 +0000 (16:28 -0300)]
net/sched: sch_fq_codel: Do not call qdisc_tree_reduce_backlog during peek before restoring qlen
Whenever fq_codel drops packets during peek, it calls
qdisc_tree_reduce_backlog. An issue arises because it calls
qdisc_tree_reduce_backlog before it reincrements the qlen. If qlen drops
to zero, but peek returns an skb, the parent's qlen_notify callback will be
executed even though fq_codel still has 1 packet on the queue and, thus,
will mistakenly deactivate the parent's class causing issues like a recent
report [1] and a wild memory access in qfq:
====================
ipv6: mcast: annotate data races in /proc/net/igmp6
/proc/net/igmp6 walks IPv6 multicast memberships under RCU without
holding idev->mc_lock, taking a lockless snapshot of two fields that
writers update under the lock: mca_flags and mca_work.timer.expires.
Patch 1 adds WRITE_ONCE() to all mca_flags update sites and READ_ONCE()
to the procfs reader. Patch 2 does the same for the timer.expires read
in the procfs path.
====================
Yuyang Huang [Tue, 9 Jun 2026 08:11:13 +0000 (17:11 +0900)]
ipv6: mcast: annotate igmp6 timer expiry race
/proc/net/igmp6 walks IPv6 multicast memberships under RCU and reads
mca_work.timer.expires to print the remaining multicast timer. The
delayed-work timer can be updated concurrently.
Annotate the intentional lockless procfs snapshot with READ_ONCE().
Yuyang Huang [Tue, 9 Jun 2026 08:11:12 +0000 (17:11 +0900)]
ipv6: mcast: annotate data-races around mca_flags
/proc/net/igmp6 walks IPv6 multicast memberships under RCU and
prints mca_flags without holding idev->mc_lock. The multicast paths
update the field while holding idev->mc_lock.
Annotate this intentional lockless snapshot with READ_ONCE() and the
matching writers with WRITE_ONCE().
Jakub Kicinski [Fri, 12 Jun 2026 23:48:57 +0000 (16:48 -0700)]
Merge branch 'rxrpc-miscellaneous-fixes'
David Howells says:
====================
rxrpc: Miscellaneous fixes
Here are some miscellaneous AF_RXRPC fixes:
(1) Make sure rxrpc_verify_data() allocates a buffer, even if the DATA
packet being looked at is zero length to avoid potential NULL-pointer
exceptions.
(2) Don't move an OOB message (e.g. an RxGK CHALLENGE) off the receive
queue onto the pending queue in recvmsg() if MSG_PEEK is specified.
(3) Fix a potential UAF in rxgk_issue_challenge() in which a tracepoint
refers to memory just freed by a different pointer.
(4) Fix afs net namespace teardown to cancel the incoming call
preallocation charger before we disable listening (which will delete
the preallocation queue).
(5) Fix rxrpc_kernel_charge_accept() to use the socket mutex to defend
against listen(0)/shutdown simultaneously deleting the preallocation
queue.
====================
Li Daming [Tue, 9 Jun 2026 14:09:09 +0000 (15:09 +0100)]
rxrpc: serialize kernel accept preallocation with socket teardown
rxrpc_kernel_charge_accept() reads rx->backlog without any
socket/backlog synchronization and passes that raw pointer into
rxrpc_service_prealloc_one(). A concurrent rxrpc_discard_prealloc()
sets rx->backlog = NULL and frees the backlog rings, so a kernel
preallocation worker can keep using a freed struct rxrpc_backlog
while updating *_backlog_head/tail and array slots.
Serialize the state check and backlog lookup with the socket lock,
and reject kernel preallocation once teardown has disabled
listening or discarded the service backlog.
Fixes: 00e907127e6f ("rxrpc: Preallocate peers, conns and calls for incoming service requests") Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Li Daming <d4n.for.sec@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jeffrey Altman <jaltman@auristor.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org Link: https://patch.msgid.link/20260609140911.838677-6-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Howells [Tue, 9 Jun 2026 14:09:08 +0000 (15:09 +0100)]
afs: Fix netns teardown to cancel the preallocation charger
Fix the teardown of an afs network namespace to make sure it cancels the
work item that keeps the preallocated rxrpc call/conn/peer queue charged
before incoming calls are disabled (i.e. listen 0).
Also, if net->live is false because the afs netns is being deleted, make
afs_charge_preallocation() skip charging and make afs_rx_new_call() avoid
requeuing the charger.
(This was found by AI review).
Fixes: 00e907127e6f ("rxrpc: Preallocate peers, conns and calls for incoming service requests") Reported-by: Simon Horman <horms@kernel.org> Signed-off-by: David Howells <dhowells@redhat.com>
cc: Li Daming <d4n.for.sec@gmail.com>
cc: Ren Wei <n05ec@lzu.edu.cn>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jeffrey Altman <jaltman@auristor.com>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org Link: https://patch.msgid.link/20260609140911.838677-5-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Howells [Tue, 9 Jun 2026 14:09:07 +0000 (15:09 +0100)]
rxrpc: Fix UAF in rxgk_issue_challenge()
Fix rxgk_issue_challenge() to free the page containing the challenge
content after invoking the tracepoint as the whdr passed to the tracepoint
points into the page just freed.
Fixes: 9d1d2b59341f ("rxrpc: rxgk: Implement the yfs-rxgk security class (GSSAPI)") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org Link: https://patch.msgid.link/20260609140911.838677-4-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Hyunwoo Kim [Tue, 9 Jun 2026 14:09:06 +0000 (15:09 +0100)]
rxrpc: Don't move a peeked OOB message onto the pending queue
rxrpc_recvmsg_oob() takes a received oob message off recvmsg_oobq and,
if a response is needed, moves it onto the pending_oobq tree. However,
only the unlink from recvmsg_oobq is guarded by MSG_PEEK; the move onto
pending_oobq always runs.
As a result, reading a challenge with MSG_PEEK leaves the skb on
recvmsg_oobq while also adding it to pending_oobq. Since struct
sk_buff's rbnode shares storage with its next and prev pointers,
rb_insert_color() overwrites the list linkage, and the skb, which holds
a single reference, becomes reachable from both queues at once.
When the socket is closed both queues are drained in turn. While
draining recvmsg_oobq, __skb_unlink() follows the next and prev
pointers that rbnode has overwritten and writes to a bad address. Also,
as the skb holds a single reference but is freed from each queue, both
the skb and the connection reference it holds are released twice. This
leads to memory corruption and to a use-after-free caused by the
connection refcount underflow.
MSG_PEEK does not consume the message from the queue, so only unlink it
from recvmsg_oobq and then move it onto pending_oobq or free it when
the message is actually consumed.
Fixes: 5800b1cf3fd8 ("rxrpc: Allow CHALLENGEs to the passed to the app for a RESPONSE") Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org Link: https://patch.msgid.link/20260609140911.838677-3-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
rxrpc_recvmsg_data() calls rxrpc_verify_data() whenever the
rxrpc_call.rx_dec_buffer is unallocated and assumes that upon
successful return that rx_dec_buffer must be allocated.
However, rxrpc_verify_data() does not request an allocation if
the rxrpc_skb_priv.len is zero.
In addition, failure to allocate rx_dec_buffer will result in a
call to skb_copy_bits() with a NULL destination which can
trigger a NULL pointer dereference.
To prevent these issues rxrpc_verify_data() is modified to
always attempt to allocate the rxrpc_call.rx_dec_buffer if it
is NULL.
This issue was identified with assistance of a private
sashiko instance.
Fixes: d2bc90cf6c75cb ("rxrpc: Fix DATA decrypt vs splice() by copying data to buffer in recvmsg") Reported-by: Simon Horman <simon.horman@redhat.com> Signed-off-by: Jeffrey Altman <jaltman@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jiayuan Chen <jiayuan.chen@linux.dev>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org Link: https://patch.msgid.link/20260609140911.838677-2-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sabrina Dubroca [Thu, 11 Jun 2026 10:21:33 +0000 (12:21 +0200)]
tls: remove tls_toe and the related driver
The tls_toe feature and its single user (chelsio chtls) have been
unmaintained for multiple years. It also hooks into the core of the
TCP implementation, and bypasses most of the networking stack.