Jiexun Wang [Wed, 6 May 2026 11:43:30 +0000 (19:43 +0800)]
Bluetooth: serialize accept_q access
bt_sock_poll() walks the accept queue without synchronization, while
child teardown can unlink the same socket and drop its last reference.
The unsynchronized accept queue walk has existed since the initial
Bluetooth import.
Protect accept_q with a dedicated lock for queue updates and polling.
Also rework bt_accept_dequeue() to take temporary child references under
the queue lock before dropping it and locking the child socket.
Fixes: 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Reported-by: Jann Horn <jannh@google.com> Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Jiexun Wang <wangjiexun2025@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Signed-off-by: Jiexun Wang <wangjiexun2025@gmail.com> Reviewed-by: Jann Horn <jannh@google.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Sven Schuchmann [Tue, 12 May 2026 07:19:47 +0000 (09:19 +0200)]
net: phy: DP83TC811: add reading of abilities
At this time the driver is not listing any speeds
it supports. This should be ETHTOOL_LINK_MODE_100baseT1_Full_BIT
for DP83TC811. Add the missing call for phylib to read the abilities.
Fixes: b753a9faaf9a ("net: phy: DP83TC811: Introduce support for the DP83TC811 phy") Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Sven Schuchmann <schuchmann@schleissheimer.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20260512071949.6218-1-schuchmann@schleissheimer.de
[pabeni@redhat.com: dropped revision history] Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Mon, 11 May 2026 17:49:18 +0000 (10:49 -0700)]
net: tls: prevent chain-after-chain in plain text SG
Sashiko points out that if end = 0 (start != 0) the current
code will create a chain link to content type right after
the wrap link:
This would create a chain where the wrap link points directly
to another chain link. The scatterlist API sg_next iterator
does not recursively resolve consecutive chain links.
meaning this is illegal input to crypto.
The wrapping link is unnecessary if end = 0. end is the entry after
the last one used so end = 0 means there's nothing pushed after
the wrap:
end start i
v v v
[ ]...[ ][ d ][ d ][ d ][ d ][rsv for wrap]
Skip the wrapping in this case.
TLS 1.3 can use the "wrapping slot" for it's chaining if end = 0.
This avoids the chain-after-chain.
Move the wrap chaining before marking END and chaining off content
type, that feels like more logical ordering to me, but should not
matter from functional perspective.
Reported-by: Sashiko <sashiko-bot@kernel.org> Fixes: 9aaaa56845a0 ("bpf: Sockmap/tls, skmsg can have wrapped skmsg that needs extra chaining") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20260511174920.433155-3-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Mon, 11 May 2026 17:49:17 +0000 (10:49 -0700)]
net: tls: fix off-by-one in sg_chain entry count for wrapped sk_msg ring
When an sk_msg scatterlist ring wraps (sg.end < sg.start),
tls_push_record() chains the tail portion of the ring to the head
using sg_chain(). An extra entry in the sg array is reserved for
this:
struct sk_msg_sg {
[...]
/* The extra two elements:
* 1) used for chaining the front and sections when the list becomes
* partitioned (e.g. end < start). The crypto APIs require the
* chaining;
* 2) to chain tailer SG entries after the message.
*/
struct scatterlist data[MAX_MSG_FRAGS + 2];
The current code uses MAX_SKB_FRAGS + 1 as the ring size:
instead of the true last entry. This is likely due to a "race" of
the commit under Fixes landing close to
commit 031097d9e079 ("bpf: sk_msg, zap ingress queue on psock down")
Convert to ARRAY_SIZE and drop the data[start] / - start (as suggested
by Sabrina).
Reported-by: 钱一铭 <yimingqian591@gmail.com> Fixes: 9aaaa56845a0 ("bpf: Sockmap/tls, skmsg can have wrapped skmsg that needs extra chaining") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://patch.msgid.link/20260511174920.433155-2-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Xiang Mei [Mon, 11 May 2026 06:21:38 +0000 (23:21 -0700)]
net/smc: reject CHID-0 ACCEPT that matches an empty ism_dev slot
On the SMC-D client, slot 0 of ini->ism_dev[]/ini->ism_chid[] is
reserved for an SMC-Dv1 device. smc_find_ism_v2_device_clnt()
populates V2 entries starting at index 1, so when no V1 device is
selected slot 0 is left in its kzalloc()'ed state with ism_dev[0] ==
NULL and ism_chid[0] == 0.
smc_v2_determine_accepted_chid() then matches the peer's CHID against
the array starting from index 0 using the CHID alone. A malicious
peer replying to a SMC-Dv2-only proposal with d1.chid == 0 matches
the empty slot, ini->ism_selected becomes 0, and the subsequent
ism_dev[0]->lgr_lock dereference in smc_conn_create() faults at
offsetof(struct smcd_dev, lgr_lock) == 0x68:
BUG: KASAN: null-ptr-deref in _raw_spin_lock_bh+0x79/0xe0
Write of size 4 at addr 0000000000000068 by task exploit/144
Call Trace:
_raw_spin_lock_bh
smc_conn_create (net/smc/smc_core.c:1997)
__smc_connect (net/smc/af_smc.c:1447)
smc_connect (net/smc/af_smc.c:1720)
__sys_connect
__x64_sys_connect
do_syscall_64
Require ism_dev[i] to be non-NULL before accepting a CHID match.
Fixes: a7c9c5f4af7f ("net/smc: CLC accept / confirm V2") Reported-by: Weiming Shi <bestswngs@gmail.com> Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Xiang Mei <xmei5@asu.edu> Link: https://patch.msgid.link/20260511062138.2839584-1-xmei5@asu.edu Signed-off-by: Paolo Abeni <pabeni@redhat.com>
====================
macsec: use rcu_work to fix crypto cleanup in softirq context
From: Jinliang Zheng <alexjlzheng@tencent.com>
crypto_free_aead() can internally call vunmap() (e.g. via dma_free_attrs()
in hardware crypto drivers like hisi_sec2), which must not be invoked from
softirq context. Both free_rxsa() and free_txsa() are RCU callbacks that
run in softirq, causing a kernel crash on affected hardware.
This series fixes the issue by deferring the actual cleanup to a workqueue
using rcu_work, which combines the RCU grace period and workqueue dispatch
into a single primitive.
Two design decisions worth noting:
1. rcu_work instead of schedule_work() + synchronize_rcu()
An alternative would be to call schedule_work() directly from
macsec_rxsa_put()/macsec_txsa_put(), then call synchronize_rcu() at
the start of the work handler to replace the grace period previously
provided by call_rcu(). However, synchronize_rcu() blocks the worker
thread for the duration of a full RCU grace period. Under high SA
churn (e.g. tearing down an interface with many SAs), each SA would
occupy a worker thread while waiting, and multiple concurrent calls
cannot share the same grace period — leading to unnecessary latency
and resource waste.
rcu_work uses call_rcu_hurry() internally, which is fully asynchronous:
the worker thread is only dispatched after the grace period has elapsed,
and multiple concurrent queue_rcu_work() calls naturally batch under the
same grace period via the RCU subsystem's existing coalescing mechanism.
2. Dedicated workqueue instead of system_wq
Using a dedicated workqueue (macsec_wq) allows macsec_exit() to drain
exactly the work items belonging to this module — by calling
destroy_workqueue() after rcu_barrier(). If system_wq were used,
flush_scheduled_work() would drain all pending work items across the
entire system, creating unnecessary coupling with unrelated subsystems
and potentially causing unexpected delays. The dedicated workqueue
provides a clean, contained teardown path.
====================
Jinliang Zheng [Mon, 11 May 2026 15:31:00 +0000 (23:31 +0800)]
macsec: use rcu_work to defer TX SA crypto cleanup out of softirq
free_txsa() is an RCU callback running in softirq context, but calls
crypto_free_aead() which can invoke vunmap() internally on hardware
crypto drivers (e.g. hisi_sec2), triggering a kernel crash.
Use rcu_work to defer the cleanup to a workqueue, for the same reasons
as the analogous fix to free_rxsa() in the previous patch.
Jinliang Zheng [Mon, 11 May 2026 15:30:59 +0000 (23:30 +0800)]
macsec: use rcu_work to defer RX SA crypto cleanup out of softirq
crypto_free_aead() can internally invoke vunmap() (e.g. via
dma_free_attrs() in hardware crypto drivers such as hisi_sec2).
vunmap() must not be called from softirq context, but free_rxsa()
is an RCU callback that runs in softirq, leading to a kernel crash:
Use rcu_work to defer the cleanup to a workqueue. rcu_work dispatches
the worker asynchronously after the RCU grace period, so no thread
blocks waiting, and concurrent releases of multiple SAs naturally
share the same grace period.
Jinliang Zheng [Mon, 11 May 2026 15:30:58 +0000 (23:30 +0800)]
macsec: introduce dedicated workqueue for SA crypto cleanup
Introduce a dedicated ordered workqueue, macsec_wq, which will be used
by subsequent patches to defer SA crypto cleanup (crypto_free_aead and
related teardown) out of softirq context.
Using a dedicated workqueue instead of system_wq allows macsec_exit()
to drain exactly the work items belonging to this module via
destroy_workqueue(), without interfering with unrelated work items on
system_wq or causing unexpected delays elsewhere.
rcu_barrier() in macsec_exit() ensures all in-flight rcu_work callbacks
have enqueued their work items before destroy_workqueue() drains and
destroys the queue, making the two-step teardown correct and complete.
The same sequence is kept in the error path of macsec_init() as a
precaution, to mirror macsec_exit() and stay safe if work ever becomes
queueable before this point in the future.
While at it, rename the error labels in macsec_init() from the
resource-named style (rtnl:, notifier:, wq:) to the err_xxx: style
(err_rtnl:, err_notifier:, err_destroy_wq:) to align with the broader
kernel convention.
Faicker Mo [Mon, 11 May 2026 14:05:51 +0000 (22:05 +0800)]
net: net_failover: Fix the deadlock in slave register
There is netdev_lock_ops() before the NETDEV_REGISTER notifier
in register_netdevice(), so use the non-locking functions
in net_failover_slave_register().
failover_slave_register() in failover_existing_slave_register() adds lock
and unlock ops too.
Fixes: 4c975fd70002 ("net: hold instance lock during NETDEV_REGISTER/UP") Signed-off-by: Faicker Mo <faicker.mo@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Victor Nogueira [Mon, 11 May 2026 18:30:58 +0000 (14:30 -0400)]
selftests/tc-testing: Add QFQ/CBS qlen underflow test
Since CBS was not calling reset for its child qdisc, there are scenarios
where it could cause an underflow on its parent's qlen/backlog. When the
parent is QFQ, a null-ptr deref could occur.
Add a test case that reproduces the underflow followed by a null-ptr
deref scenario.
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jamal Hadi Salim [Mon, 11 May 2026 18:30:57 +0000 (14:30 -0400)]
net/sched: sch_cbs: Call qdisc_reset for child qdisc
During a reset, CBS is not calling reset on its child qdisc, which
might cause qlen/backlog accounting issues. For example, if we have CBS
with a QFQ parent and a netem child with delay, we can create a scenario
where the parent's qlen underflows. QFQ, specifically, uses qlen to
check whether it should deference a pointer, so this scenario may cause
a null-ptr deref in QFQ:
Fix this by calling qdisc_reset for CBS' child qdisc.
Sashiko caught an issue which could result in a null ptr deref if
qdisc_create_dflt() is invoked on an unitialised cbs qdisc which is exposed
by this patch. We add an early return if the qdisc is null to address this.
This is a similar approach used by two other fixes[1][2].
The proper fix for this specific issue elucidated by sashiko is to remove
the call to qdisc_reset when qdisc_create_dflt fails. Since the dflt qdisc
isn't attached anywhere yet at that point, calling the reset callback doesn't
make much sense (and as stated has been a source of two other bugs).
We plan on submitting this fix in a later patch.
[1] https://lore.kernel.org/netdev/20221018063201.306474-2-shaozhengchao@huawei.com/
[2] https://lore.kernel.org/netdev/20221018063201.306474-4-shaozhengchao@huawei.com/
Fixes: 585d763af09c ("net/sched: Introduce Credit Based Shaper (CBS) qdisc") Reported-by: Junyoung Jang <graypanda.inzag@gmail.com> Tested-by: Junyoung Jang <graypanda.inzag@gmail.com> Tested-by: Victor Nogueira <victor@mojatatu.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The reset actions of the DEFZA adapters are exceedingly slow, taking up
to 30 seconds to complete by the device spec and typically in the range
of 10 seconds in reality, as required for the device RTOS to boot, still
quite a lot. Therefore a state machine is used that's interrupt driven,
however a safety mechanism is required in case of adapter malfunction,
so that if no state change interrupt has arrived in time, then the
situation is taken care of.
The safety mechanism depends on the origin of the reset. For regular
adapter initialisation at the device probe time a sleep is requested.
However a reset is also required by the device spec when the adapter has
transitioned into the halted state, such as in response to a PC Trace
event in the course of ring fault recovery, possibly a common network
event. In that case no sleep is possible as a device halt is reported
at the hardirq level.
A timer is therefore set up to ensure progress in case no adapter state
change interrupt has arrived in time, but as from commit 168f6b6ffbee
("timers: Use del_timer_sync() even on UP") a warning is issued as the
timer is deleted in the hardirq handler upon an expected state change:
The immediate origin of the new warning is the switch away from aliasing
del_timer_sync() to del_timer() (timer_delete_sync() to timer_delete()
in terms of current function names) for UP configurations, which however
is the only choice for this driver anyway as no SMP hardware supports
the TURBOchannel bus this device interfaces to. Therefore there is a
very remote issue only this is a sign of.
Specifically if an adapter reset issued upon a transition to the halted
state times out and first triggers fza_reset_timer() for another reset
assertion, which then schedules fza_reset_timer() for reset deassertion
and then that second call is pre-empted after poking at the hardware,
but before the timer has been rearmed and owing to high system load
causing exceedingly high scheduling latency control is not handed back
before a transition to the uninitialised state has caused the timer to
be deleted even before it has been started, then fza_reset_timer() will
be called yet again and issue another reset even though by then the
adapter has already recovered.
Prevent this situation from happening by switching to timer_delete() for
the transition to the halted state and protect the code region affected
with a spinlock, also to make sure add_timer() has not been called twice
in a row due to an execution race between the interrupt handler and the
timer handler (though it could only happen on SMP, but let's keep the
driver clean). It's a very unlikely sequence of events to happen and
therefore there's no point in trying to be overly clever about it, such
as by placing printk() calls outside the protection. For the transition
to the uninitialised state switch to timer_delete_sync_try() instead, so
that a timer isn't deleted that's just been rearmed by the timer handler
and needs to watch for the device to come out of reset again (again, an
SMP scenario only).
Retain timer_delete_sync() invocations outside the hardirq context for a
stray timer not to fire once device structures have been released.
Fixes: 61414f5ec9834 ("FDDI: defza: Add support for DEC FDDIcontroller 700 TURBOchannel adapter") Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Niklas Söderlund [Sun, 10 May 2026 10:30:17 +0000 (12:30 +0200)]
net: ethernet: ravb: Do not check URAM suspension when WoL is active
When updating the driver to match latest datasheet to suspend access to
URAM when suspending DMA transfers a corner-case was missed, URAM access
will not be suspended if WoL is enabled. This lead to the error message
(correctly) being triggered as URAM access is not suspended even tho
it's requested as part of stopping DMA.
Avoid checking if URAM access is suspended and printing the error
message if WoL is enabled when we suspend the system, as we know it will
not be.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Closes: https://lore.kernel.org/all/CAMuHMdWnjV%3DHGE1o08zLhUfTgOSene5fYx1J5GG10mB%2BToq8qg@mail.gmail.com/ Fixes: 353d8e7989b6 ("net: ethernet: ravb: Suspend and resume the transmission flow") Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Reviewed-by: Sai Krishna <saikrishnag@marvell.com> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Chenguang Zhao [Mon, 11 May 2026 01:43:43 +0000 (09:43 +0800)]
ethtool: fix ethnl_bitmap32_not_zero() bit interval semantics
ethnl_bitmap32_not_zero() should return true if some bit in [start, end)
is set:
- Fix inverted memchr_inv() sense: return true when the scan finds a
non-zero byte, not when the middle words are all zero.
- Return false for an empty interval (end <= start).
- When end is 32-bit aligned, indices in [start, end) do not include any
bits from map[end_word]; return false after earlier checks found no
non-zero data.
Xiang Mei [Sun, 10 May 2026 22:26:40 +0000 (15:26 -0700)]
net/smc: avoid NULL deref of conn->lnk in smc_msg_event tracepoint
The smc_msg_event tracepoint class, shared by smc_tx_sendmsg and
smc_rx_recvmsg, unconditionally dereferences smc->conn.lnk:
__string(name, smc->conn.lnk->ibname)
conn->lnk is only set for SMC-R; for SMC-D it is NULL. Other code on
these paths already handles this (e.g. !conn->lnk in
SMC_STAT_RMB_TX_SIZE_SMALL()). With the tracepoint enabled, the first
sendmsg()/recvmsg() on an SMC-D socket crashes:
Oops: general protection fault, probably for non-canonical address
KASAN: null-ptr-deref in range [...]
RIP: 0010:strlen+0x1e/0xa0
Call Trace:
trace_event_raw_event_smc_msg_event (net/smc/smc_tracepoint.h:44)
smc_rx_recvmsg (net/smc/smc_rx.c:515)
smc_recvmsg (net/smc/af_smc.c:2859)
__sys_recvfrom (net/socket.c:2315)
__x64_sys_recvfrom (net/socket.c:2326)
do_syscall_64
The faulting address 0x3e0 is offsetof(struct smc_link, ibname),
confirming the NULL ->lnk deref. Enabling the tracepoint requires
root, but the trigger itself is unprivileged: socket(AF_SMC, ...) has
no capability check, and SMC-D negotiation needs no admin step on
s390 or on x86 with the loopback ISM device loaded.
Log an empty device name for SMC-D instead of dereferencing NULL.
Fixes: aff3083f10bf ("net/smc: Introduce tracepoints for tx and rx msg") Reported-by: Weiming Shi <bestswngs@gmail.com> Signed-off-by: Xiang Mei <xmei5@asu.edu> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Reviewed-by: Sidraya Jayagond <sidraya@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Nicolò Coccia [Sun, 10 May 2026 16:34:13 +0000 (12:34 -0400)]
net/smc: fix sleep-inside-lock in __smc_setsockopt() causing local DoS
A logic flaw in __smc_setsockopt() allows a local unprivileged user to
cause a Denial of Service (DoS) by holding the socket lock indefinitely.
The function __smc_setsockopt() calls copy_from_sockptr() while holding
lock_sock(sk). By passing a userfaultfd-monitored memory page (or
FUSE-backed memory on systems where unprivileged userfaultfd is disabled)
as the optval, an attacker can halt execution during the copy operation,
keeping the lock held.
Combined with asynchronous tear-down operations like shutdown(), this
exhausts the kernel wq (kworkers) and triggers the hung task watchdog.
[ 240.123456] INFO: task kworker/u8:2 blocked for more than 120 seconds.
[ 240.123489] Call Trace:
[ 240.123501] smc_shutdown+...
[ 240.123512] lock_sock_nested+...
This patch moves the user-space copy outside the lock_sock() critical
section to prevent the issue.
Fixes: a6a6fe27bab4 ("net/smc: Dynamic control handshake limitation by socket options") Signed-off-by: Nicolò Coccia <n.coccia96@gmail.com> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Tested-by: Dust Li <dust.li@linux.alibaba.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Wei Yang [Sat, 9 May 2026 12:23:58 +0000 (20:23 +0800)]
net: atm: fix skb leak in sigd_send() default branch
The default branch in sigd_send() calls sock_put() and returns -EINVAL
without freeing the skb, while all other exit paths do so. Add the
missing dev_kfree_skb() before sock_put() to fix the leak.
phy_remove() clears phydev->drv but doesn't call phy_detach(), so the
phy_device stays in the link topology xarray and ethnl_req_get_phydev()
still hands it back. ETHTOOL_MSG_PHY_GET then oopses on:
drvname is already treated as optional by phy_reply_size(),
phy_fill_reply() and phy_cleanup_data(), so just skip the allocation
when there is no driver bound.
Fixes: 9dd2ad5e92b9 ("net: ethtool: phy: Convert the PHY_GET command to generic phy dump") Cc: stable@vger.kernel.org # 6.13.x Signed-off-by: David Carlier <devnexen@gmail.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20260509215046.107157-1-devnexen@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Zoran Ilievski [Mon, 11 May 2026 06:40:02 +0000 (08:40 +0200)]
net: atlantic: preserve PCI wake-from-D3 on shutdown when WOL enabled
The shutdown handler aq_pci_shutdown() unconditionally calls
pci_wake_from_d3(pdev, false), clearing the PCI PME_En bit even when
wake-on-LAN has been configured. While aq_nic_shutdown() correctly
programs the NIC firmware via aq_nic_set_power() to listen for magic
packets, the PCI subsystem will not propagate the resulting PME wake
event from D3, so the system never wakes after poweroff.
WOL from suspend (S3) is unaffected because aq_suspend_common() does
not touch pci_wake_from_d3() and relies on the PM core's wake
configuration via device_may_wakeup().
This affects all atlantic-supported NICs (AQC107/108/111/112/113);
users have reported that WOL works if the atlantic driver is never
loaded, but breaks once it has run its shutdown path.
Pass the configured WOL state to pci_wake_from_d3() instead of a
literal false, so the PCI PME_En bit is preserved when the user has
armed WOL via ethtool.
Paolo Abeni [Tue, 12 May 2026 14:15:03 +0000 (16:15 +0200)]
Merge branch 'net-shaper-fix-various-minor-bugs'
Jakub Kicinski says:
====================
net: shaper: fix various minor bugs
Fix various minor bugs in the net shaper API.
First 2 patches deal with ordering issues around inserting
and publishing new shapers. Shapers are inserted "tentatively"
and marked valid only after HW op succeeded, this used to
be slightly racy.
Only other patch of note is patch 8. We want to add a Netlink
policy check on the handle ID. This necessitates patch 7.
Jakub Kicinski [Sun, 10 May 2026 19:29:04 +0000 (12:29 -0700)]
net: shaper: reject QUEUE scope handle with missing id
net_shaper_parse_handle() does not enforce that the user provides
the handle ID. For NODE the ID defaults to UNSPEC for all other
cases it defaults to 0.
For NETDEV 0 is the only option. For QUEUE defaulting to 0 makes
less intuitive sense. Specifically because the behavior should
(IMHO) be the same for all cases where there may be more than
one ID (QUEUE and NODE).
We should either document this as intentional or reject.
I picked the latter with no strong conviction.
Jakub Kicinski [Sun, 10 May 2026 19:29:03 +0000 (12:29 -0700)]
net: shaper: enforce singleton NETDEV scope with id 0
The NETDEV scope represents a singleton root shaper in the per-device
hierarchy. All code assumes NETDEV shapers have id 0:
net_shaper_default_parent() hardcodes parent->id = 0 when returning
the NETDEV parent for QUEUE/NODE children, and the UAPI documentation
describes NETDEV scope as "the main shaper" (singular, not plural).
net_shaper_parse_handle() reads the user-supplied handle ID via
nla_get_u32(), accepting the full u32 range. However, the xarray key
is built by net_shaper_handle_to_index() using
FIELD_PREP(NET_SHAPER_ID_MASK, handle->id), where NET_SHAPER_ID_MASK
is GENMASK(25, 0) - only 26 bits wide. FIELD_PREP silently masks off
the upper bits at runtime. A user-supplied NODE id like 0x04000123
becomes id 0x123.
Additionally, a user-supplied id equal to NET_SHAPER_ID_UNSPEC
(0x03FFFFFF, which is NET_SHAPER_ID_MASK itself) would collide with
the sentinel used internally by the group operation to signal
"allocate a new NODE id".
Reject user-supplied IDs >= NET_SHAPER_ID_MASK (i.e., >= 0x03FFFFFF)
in the policy.
Jakub Kicinski [Sun, 10 May 2026 19:29:01 +0000 (12:29 -0700)]
tools: ynl: add scope qualifier for definitions
Using definitions in kernel policies is awkward right now.
On one hand we want defines for max values and such.
On the other we don't have a way of adding kernel-only defines.
Adding unnecessary defines to uAPI is a bad idea, we won't
be able to delete them. And when it comes to policy user
space should just query it via the policy dump, not use
hard coded defines.
Add a "scope" property to definitions, which will let us tell
the codegen that a definition is for kernel use only. Support
following values:
- uapi: render into the uAPI header (default, today's behavior)
- kernel: render to kernel header only
- user: same as kernel but for the user-side generated header
Definitions may have a header property (definition is "external",
provided by existing header). Extend the scope to headers, too.
If definition has both scope and header properties we will only
generate the includes in the right scope.
Jakub Kicinski [Sun, 10 May 2026 19:29:00 +0000 (12:29 -0700)]
net: shaper: fix undersized reply skb allocation in GROUP command
net_shaper_group_send_reply() writes both the NET_SHAPER_A_IFINDEX
attribute (via net_shaper_fill_binding()) and the nested
NET_SHAPER_A_HANDLE attribute (via net_shaper_fill_handle()), but
the reply skb at the call site in net_shaper_nl_group_doit() is
allocated using net_shaper_handle_size(), which only accounts for
the nested handle.
The allocation is therefore short by nla_total_size(sizeof(u32))
(8 bytes) for the IFINDEX attribute. In practice the slab allocator
rounds up the small allocation so the bug is latent, but the size
accounting is wrong and could bite if the reply grew further.
Introduce net_shaper_group_reply_size() that accounts for the full
reply payload and use it both at the genlmsg_new() call site and in
the defensive WARN_ONCE message.
Jakub Kicinski [Sun, 10 May 2026 19:28:57 +0000 (12:28 -0700)]
net: shaper: reject duplicate leaves in GROUP request
net_shaper_nl_group_doit() does not deduplicate NET_SHAPER_A_LEAVES
entries. When userspace supplies the same leaf handle twice, the same
old-parent pointer lands twice in old_nodes[]. The cleanup loop double
frees the parent. Of course the same parent may still be in old_nodes[]
twice if we are moving multiple of its leaves.
Note that this patch also implicitly fixes the fact that the
i >= leaves_count path forgets to set ret.
Jakub Kicinski [Sun, 10 May 2026 19:28:55 +0000 (12:28 -0700)]
net: shaper: flip the polarity of the valid flag
The usual way of inserting entries which are not yet fully ready
into XArray is to have a VALID flag. The shaper code has a NOT_VALID
flag. Since XArray code does not let us create entries with marks
already set - the creation of entries is currently not atomic.
Flip the polarity of the VALID flag. This closes the tiny race
in net_shaper_pre_insert() of entries being created without
the NOT_VALID flag.
net: ethernet: cs89x0: remove stale CONFIG_MACH_MX31ADS reference
The legacy ARM board file for MACH_MX31ADS was removed in commit c93197b0041d ("ARM: imx: Remove i.MX31 board files"), but a reference
to it remained in the cs89x0 driver. Drop this unused code.
Linus Walleij [Fri, 8 May 2026 22:13:38 +0000 (00:13 +0200)]
net: ethernet: cortina: Carry over frag counter
The gmac_rx() NAPI poll function assembles packets in an
SKB from a ring buffer.
If the ring buffer gets completely emptied during a poll cycle,
we exit gmac_rx(), but the packet is not yet completely
assembled in the SKB, yet the fragment counter frag_nr is
reset to zero on the next invocation.
Solve this by making the RX fragment counter a part of the
port struct, and carry it over between invocations.
Reset the fragment counter only right after calling
napi_gro_frags(), on error (after calling napi_free_frags())
or if stopping the port.
Reset it in some place where not strictly necessary just to
emphasize what is going on.
This was found by Sashiko during normal patch review.
Linus Walleij [Fri, 8 May 2026 22:13:37 +0000 (00:13 +0200)]
net: ethernet: cortina: Make RX SKB per-port
The SKB used to assemble packets from fragments in gmac_rx()
is static local, but the Gemini has two ethernet ports, meaning
there can be races between the ports on a bad day if a device
is using both.
Make the RX SKB a per-port variable and carry it over between
invocations in the port struct instead.
Zero the pointer once we call napi_gro_frags(), on error (after
calling napi_free_frags()) or if the port is stopped.
Zero it in some place where not strictly necessary just to
emphasize what is going on.
This was found by Sashiko during normal patch review.
====================
vsock/virtio: fix vsockmon tap skb construction
While reviewing the patch posted by Yiqi Sun [1] to fix an issue in
virtio_transport_build_skb(), I discovered another issue related to
the offset and length of the payload to be copied in the new skb.
This was introduced when we did the skb conversion, and fixed by
patch 1.
Patch 2 fixes the issue found by Yiqi Sun in a different way: using
iov_iter_kvec() to properly initialize all the iov_iter fields and
removing the linear vs non-linear split like we alredy do in
vhost-vsock.
It could have been a single patch, but since there were two affected
commits, I decided to keep the fixes separate.
vsock/virtio: fix empty payload in tap skb for non-linear buffers
For non-linear skbs, virtio_transport_build_skb() goes through
virtio_transport_copy_nonlinear_skb() to copy the original payload
in the new skb to be delivered to the vsockmon tap device.
This manually initializes an iov_iter but does not set iov_iter.count.
Since the iov_iter is zero-initialized, the copy length is zero and no
payload is actually copied to the monitor interface, leaving data
un-initialized.
Fix this by removing the linear vs non-linear split and using
skb_copy_datagram_iter() with iov_iter_kvec() for all cases, as
vhost-vsock already does. This handles both linear and non-linear skbs,
properly initializes the iov_iter, and removes the now unused
virtio_transport_copy_nonlinear_skb().
While touching this code, let's also check the return value of
skb_copy_datagram_iter(), even though it's unlikely to fail.
Fixes: 4b0bf10eb077 ("vsock/virtio: non-linear skb handling for tap") Reported-by: Yiqi Sun <sunyiqixm@gmail.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com> Reviewed-by: Arseniy Krasnov <avkrasnov@rulkc.org> Link: https://patch.msgid.link/20260508164411.261440-3-sgarzare@redhat.com Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
vsock/virtio: fix length and offset in tap skb for split packets
virtio_transport_build_skb() builds a new skb to be delivered to the
vsockmon tap device. To build the new skb, it uses the original skb
data length as payload length, but as the comment notes, the original
packet stored in the skb may have been split in multiple packets, so we
need to use the length in the header, which is correctly updated before
the packet is delivered to the tap, and the offset for the data.
This was also similar to what we did before commit 71dc9ec9ac7d
("virtio/vsock: replace virtio_vsock_pkt with sk_buff") where we probably
missed something during the skb conversion.
Also update the comment above, which was left stale by the skb
conversion and still mentioned a buffer pointer that no longer exists.
Fixes: 71dc9ec9ac7d ("virtio/vsock: replace virtio_vsock_pkt with sk_buff") Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com> Reviewed-by: Arseniy Krasnov <avkrasnov@rulkc.org> Link: https://patch.msgid.link/20260508164411.261440-2-sgarzare@redhat.com Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Quan Sun [Fri, 8 May 2026 12:46:36 +0000 (20:46 +0800)]
net: hsr: fix NULL pointer dereference in hsr_get_node_data()
In the HSR (High-availability Seamless Redundancy) protocol, node
information is maintained in the node_db. When a supervision frame is
received, node->addr_B_port is updated to track the receiving port type
(e.g., HSR_PT_SLAVE_B).
If the underlying physical interface associated with this slave port is
removed (e.g., via `ip link del`), hsr_del_port() frees the hsr_port
object. However, the stale node->addr_B_port reference is kept in the
node_db until the node ages out.
Subsequently, if userspace queries the node status via the Netlink
command HSR_C_GET_NODE_STATUS, the kernel calls hsr_get_node_data().
This function unconditionally dereferences the pointer returned by
hsr_port_get_hsr():
if (node->addr_B_port != HSR_PT_NONE) {
port = hsr_port_get_hsr(hsr, node->addr_B_port);
*addr_b_ifindex = port->dev->ifindex; // <-- NULL deref
}
If the slave port has been deleted, hsr_port_get_hsr() returns NULL,
resulting in a kernel panic.
Oops: general protection fault, probably for non-canonical address
KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
RIP: 0010:hsr_get_node_data+0x7b6/0x9e0
Call Trace:
<TASK>
hsr_get_node_status+0x445/0xa40
Fix this by adding a proper NULL pointer check. If the port lookup fails
due to a stale port type, gracefully treat it as if no valid port exists
and assign -1 to the interface index.
Steps to reproduce:
1. Create an HSR interface with two slave devices.
2. Receive a supervision frame to populate node_db with
addr_B_port assigned to SLAVE_B.
3. Delete the underlying slave device B.
4. Send an HSR_C_GET_NODE_STATUS Netlink message.
qed: fix division by zero in qed_init_wfq_param when all vports are configured
In qed_init_wfq_param(), variable non_requested_count can become zero
when the number of vports with the configured flag set (including the
current vport being configured) equals total num_vports. This happens
when configuring the last unconfigured vport or when re-configuring
an already configured vport.
The function then calculates left_rate_per_vp = total_left_rate /
non_requested_count, which causes division by zero.
Fix this by skipping the division when non_requested_count is zero.
In that case, there is no remaining bandwidth to distribute, so just
record the configuration for the current vport and return success.
Davide Caratti [Fri, 8 May 2026 17:05:10 +0000 (19:05 +0200)]
net/sched: dualpi2: initialize timer earlier in dualpi2_init()
'pi2_timer' needs to be initialized in all error paths of dualpi2_init():
otherwise, a failure in qdisc_create_dflt() causes the following crash in
dualpi2_destroy():
net: ena: PHC: Check return code before setting timestamp output
ena_phc_gettimex64() is setting the output parameter regardless
of whether ena_com_phc_get_timestamp() succeeded or failed.
When ena_com_phc_get_timestamp() returns an error, the timestamp
parameter may contain uninitialized stack memory (e.g., when PHC is
disabled or in blocked state) or invalid hardware values. Passing
these to userspace via the PTP ioctl is both a security issue
(information leak) and a correctness bug.
Fix by checking the return code after releasing the lock and only
setting the output timestamp on success.
Fixes: e0ea34158ee8 ("net: ena: Add PHC support in the ENA driver") Cc: stable@vger.kernel.org Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20260507003518.22554-1-akiyano@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net/rds: reset op_nents when zerocopy page pin fails
When iov_iter_get_pages2() fails in rds_message_zcopy_from_user(),
the pinned pages are released with put_page(), and
rm->data.op_mmp_znotifier is cleared. But we fail to properly
clear rm->data.op_nents.
Later when rds_message_purge() is called from rds_sendmsg() the
cleanup loop iterates over the incorrectly non zero number of
op_nents and frees them again.
Fix this by properly resetting op_nents when it should be in
rds_message_zcopy_from_user().
net: xgene: fix mdio_np leak in xgene_mdiobus_register()
The for_each_child_of_node() loop captures mdio_np via break,
holding the refcount. of_mdiobus_register() does not consume the
reference, so it leaks on success.
Jakub Kicinski [Sun, 10 May 2026 17:07:37 +0000 (10:07 -0700)]
Merge tag 'batadv-net-pullrequest-20260508' of https://git.open-mesh.org/batadv
Simon Wunderlich says:
====================
Here are some batman-adv bugfixes:
- fix integer overflow on buff_pos, by Lyes Bourennani
- fix invalid tp_meter access during teardown, by Jiexun Wang (2 patches)
- stop caching unowned originator pointers in BAT IV, by Jiexun Wang
- tp_meter: fix tp_num leak on kmalloc failure, by Sven Eckelmann
- fix BLA refcounting issues, by Sven Eckelmann (3 patches)
* tag 'batadv-net-pullrequest-20260508' of https://git.open-mesh.org/batadv:
batman-adv: bla: put backbone reference on failed claim hash insert
batman-adv: bla: only purge non-released claims
batman-adv: bla: prevent use-after-free when deleting claims
batman-adv: tp_meter: fix tp_num leak on kmalloc failure
batman-adv: stop caching unowned originator pointers in BAT IV
batman-adv: stop tp_meter sessions during mesh teardown
batman-adv: reject new tp_meter sessions during teardown
batman-adv: fix integer overflow on buff_pos
====================
net: ena: PHC: Fix potential use-after-free in get_timestamp
Move the phc->active check and resp pointer assignment to after
acquiring the spinlock. Previously, phc->active was checked without
holding the lock, and resp was cached from ena_dev->phc.virt_addr
before the lock was acquired.
If ena_com_phc_destroy() runs between the lockless active check and
the lock acquisition, it sets active=false, releases the lock, frees
the DMA memory, and sets virt_addr=NULL. The get_timestamp path would
then read a NULL virt_addr and dereference it.
With both the active check and the pointer read under the lock,
destroy cannot free the memory while get_timestamp is using it.
Fixes: e0ea34158ee8 ("net: ena: Add PHC support in the ENA driver") Cc: stable@vger.kernel.org Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20260508062126.7273-1-akiyano@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
tools/ynl: add missing uapi header deps in Makefile.deps
ethtool.h includes linux/typelimits.h which is a relatively new header
not yet shipped in most distro kernel-header packages. Without the
explicit entry, the build silently falls through to -idirafter.
dev_energymodel.h is a new YNL family whose uapi header is not in
system paths at all and was missing a CFLAGS entry entirely.
Holger Brunck [Thu, 7 May 2026 15:53:32 +0000 (17:53 +0200)]
net: wan: fsl_ucc_hdlc: free tx_skbuff in uhdlc_memclean
When the device is removed all allocated resources should be freed.
In uhdlc_memclean the netdev transmit queue was already stopped. But at
this point we may have pending skb in the transmit queue which must be
freed. Therefore iterate over the tx_skbuff pointers and free all
pending skb. The issue was discovered by sashiko.
Tested on a ls1043a board running HDLC in bus mode on kernel 6.12.
Jakub Kicinski [Sat, 9 May 2026 01:28:26 +0000 (18:28 -0700)]
Merge tag 'nf-26-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following batch contains Netfilter fixes for net:
1) Allow initial x_tables table replacement without emitting an audit
log message. Delay the register message until after hooks are wired up
to avoid unnecessary unregister logs during error unwinding.
2) Fix a NULL dereference by allocating hook ops before adding the
table to the per-netns list. Use `synchronize_rcu()` during error
unwinding to ensure the table stops processing packets before
teardown. Defer audit log register message until all operations
succeed.
3) Refactor xtables to use a single `xt_unregister_table_pre_exit`
function. Eliminate code duplication by centralizing table
unregistration logic within the xtables core. ebtables cannot be
changed due to incompatibility.
4) Unregister xtables templates before module removal. This prevents
a race condition where userspace instantiates a new table after the
pernet unreg removed the current table.
5) Add `xtables_unregister_table_exit` to fully unregister netfilter
tables during module removal. Unlink the table from dying lists,
then free hook operations.
6) Implement a two-stage removal scheme for ebtables following the
x_tables pattern. Assign table->ops while holding the ebt mutex to
prevent exposing partially-filled structures.
7) Fix ebtables module initialization race. Register the template last
in table initialization functions. Prevent table instantiation before
pernet operations are available.
8) Fix a race condition in x_tables module initialization. Ensure
pernet ops are fully set up before exposing the table to userspace.
9) Fix a race condition in ebtables module initialization, similar to
previous patch.
10) Restore propagation of helper to expected connection, this is a
fix-for-recent-fix.
11) Validate that the expectation tuple and mask netlink attributes are
present when adding expectation via nfqueue, this fixes a possible
null-ptr-deref.
12) Fix possible rare memleak in the SIP helper in case helper has been
detached from conntrack entry, from Li Xiasong.
13) Fix refcount leak in nft_ct when creating custom expectation, also
from Li Xiason.
Patches 1-9 from Florian Westphal.
10) Restore propagation of helper to expected connection, this is a
fix-for-recent-fix.
11) Check that tuple and mask netlink attributes are set when creating an
expectation via nfqueue.
* tag 'nf-26-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nft_ct: fix missing expect put in obj eval
netfilter: nf_conntrack_sip: get helper before allocating expectation
netfilter: ctnetlink: check tuple and mask in expectations created via nfqueue
netfilter: nf_conntrack_expect: restore helper propagation via expectation
netfilter: bridge: eb_tables: close module init race
netfilter: x_tables: close dangling table module init race
netfilter: ebtables: close dangling table module init race
netfilter: ebtables: move to two-stage removal scheme
netfilter: x_tables: add and use xtables_unregister_table_exit
netfilter: x_tables: unregister the templates first
netfilter: x_tables: add and use xt_unregister_table_pre_exit
netfilter: x_tables: allocate hook ops while under mutex
netfilter: x_tables: allow initial table replace without emitting audit log message
====================
Ben Morris [Fri, 8 May 2026 00:14:55 +0000 (17:14 -0700)]
sctp: revalidate list cursor after sctp_sendmsg_to_asoc() in SCTP_SENDALL
The SCTP_SENDALL path in sctp_sendmsg() iterates ep->asocs with
list_for_each_entry_safe(), which caches the next entry in @tmp before
the loop body runs. The body calls sctp_sendmsg_to_asoc(), which may
drop the socket lock inside sctp_wait_for_sndbuf().
While the lock is dropped, another thread can SCTP_SOCKOPT_PEELOFF the
association cached in @tmp, migrating it to a new endpoint via
sctp_sock_migrate() (list_del_init() + list_add_tail() to
newep->asocs), and optionally close the new socket which frees the
association via kfree_rcu(). The cached @tmp can also be freed by a
network ABORT for that association, processed in softirq while the
lock is dropped.
sctp_wait_for_sndbuf() revalidates @asoc (the current entry) on re-lock
via the "sk != asoc->base.sk" and "asoc->base.dead" checks, but nothing
revalidates @tmp. After a successful return, the iterator advances to
the stale @tmp, yielding either a use-after-free (if the peeled socket
was closed) or a list-walk onto the new endpoint's list head (type
confusion of &newep->asocs as a struct sctp_association *).
Both are reachable from CapEff=0; the type-confusion path gives
controlled indirect call via the outqueue.sched->init_sid pointer.
Fix by re-deriving @tmp from @asoc after sctp_sendmsg_to_asoc()
returns. @asoc is known to still be on ep->asocs at that point: the
only callers that list_del an association from ep->asocs are
sctp_association_free() (which sets asoc->base.dead) and
sctp_assoc_migrate() (which changes asoc->base.sk), and
sctp_wait_for_sndbuf() checks both under the lock before any
successful return; a tripped check propagates as err < 0 and the loop
bails before the re-derive.
The SCTP_ABORT path in sctp_sendmsg_check_sflags() returns 0 and the
loop hits 'continue' before sctp_sendmsg_to_asoc() is ever called, so
the @tmp cached by list_for_each_entry_safe() still covers the
lock-held free that ba59fb027307 ("sctp: walk the list of asoc
safely") was added for.
Fixes: 4910280503f3 ("sctp: add support for snd flag SCTP_SENDALL process in sendmsg") Cc: stable@vger.kernel.org Signed-off-by: Ben Morris <bmorris@anthropic.com> Acked-by: Xin Long <lucien.xin@gmail.com> Link: https://patch.msgid.link/20260508001455.3137-1-joycathacker@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: ti: icssm-prueth: fix eth_ports_node leak in probe
The error path on of_property_read_u32() failure inside
icssm_prueth_probe() returns without putting eth_ports_node,
which was acquired before the for_each_child_of_node() loop.
Myeonghun Pak [Wed, 6 May 2026 12:43:11 +0000 (21:43 +0900)]
net: lan966x: avoid unregistering netdev on register failure
lan966x_probe_port() stores the newly allocated net_device in the
port before calling register_netdev(). If register_netdev() fails,
the probe error path calls lan966x_cleanup_ports(), which sees
port->dev and calls unregister_netdev() for a device that was never
registered.
Destroy the phylink instance created for this port and clear port->dev
before returning the registration error. The common cleanup path now skips
ports without port->dev before reaching the registered netdev cleanup, so
it only handles ports that reached the registered-netdev lifetime.
This also avoids treating an uninitialized FDMA netdev and the failed port
as a NULL == NULL match in the common cleanup path.
Fixes: d28d6d2e37d1 ("net: lan966x: add port module support") Co-developed-by: Ijae Kim <ae878000@gmail.com> Signed-off-by: Ijae Kim <ae878000@gmail.com> Signed-off-by: Myeonghun Pak <mhun512@gmail.com> Link: https://patch.msgid.link/20260506124331.31945-1-mhun512@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
Intel Wired LAN Driver Updates 2026-05-04 (i40e, ice, idpf)
Matt Volrath fixes two issues with the i40e driver probe routine, ensuring
that PTP is properly cleaned up if the probe fails.
Emil corrects the initialization of the read_dev_clk_lock spinlock in
idpf_ptp_init, ensuring it is initialized prior to when the
ptp_schedule_worker() is called.
Greg KH fixes a double free and use-after free in the idpf auxiliary device
error paths.
Marcin fixes ice_set_rss_hfunc() to use the correct q_opt_flags field,
correcting the assignment and preventing submission of invalid data to the
firmware.
Bart corrects the locking in ice_dcb_rebuild(), ensuring that the tc_mutex
is held over the entire operation.
Ivan fixes the rclk pin state get for E810 devices, ensuring the index is
properly offset by the base_rclk_idx value. This ensures that the correct
pin index is used to look up recovered clock state. He additionally adds
bounds checking to prevent attempting to access pins outside of the pin
state array.
Ivan also moves the CGU register macros to the top of ice_dpll.h, inside
the header guard to avoid duplicate macro definitions should the ice_dpll.h
header is included multiple times.
====================
Ivan Vecera [Wed, 6 May 2026 21:48:17 +0000 (14:48 -0700)]
ice: dpll: fix misplaced header macros
The CGU register definitions (ICE_CGU_R10, ICE_CGU_R11 and related field
masks) were placed after the #endif of the _ICE_DPLL_H_ include guard,
leaving them unprotected. Move them inside the guard.
Fixes: ad1df4f2d591 ("ice: dpll: Support E825-C SyncE and dynamic pin discovery") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260506-jk-iwl-net-2026-05-04-v2-8-a5ea4dc837a9@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ivan Vecera [Wed, 6 May 2026 21:48:16 +0000 (14:48 -0700)]
ice: dpll: fix rclk pin state get for E810
The refactoring of ice_dpll_rclk_state_on_pin_get() to use
ice_dpll_pin_get_parent_idx() omitted the base_rclk_idx adjustment that was
correctly added in the ice_dpll_rclk_state_on_pin_set() path. This breaks
E810 devices where base_rclk_idx is non-zero, causing the wrong hardware
index to be used for pin state lookup and incorrect recovered clock state
to be reported via the DPLL subsystem. E825C is unaffected as its
base_rclk_idx is 0.
While at it, add bounds check against ICE_DPLL_RCLK_NUM_MAX on hw_idx after
the base_rclk_idx subtraction in both ice_dpll_rclk_state_on_pin_{get,set}()
to prevent out-of-bounds access on the pin state array.
Fixes: ad1df4f2d591 ("ice: dpll: Support E825-C SyncE and dynamic pin discovery") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260506-jk-iwl-net-2026-05-04-v2-7-a5ea4dc837a9@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bart Van Assche [Wed, 6 May 2026 21:48:15 +0000 (14:48 -0700)]
ice: fix locking in ice_dcb_rebuild()
Move the mutex_lock() call up to prevent that DCB settings change after
the first ice_query_port_ets() call. The second ice_query_port_ets()
call in ice_dcb_rebuild() is already protected by pf->tc_mutex.
This also fixes a bug in an error path, as before taking the first
"goto dcb_error" in the function jumped over mutex_lock() to
mutex_unlock().
This bug has been detected by the clang thread-safety analyzer.
Cc: intel-wired-lan@lists.osuosl.org Fixes: 242b5e068b25 ("ice: Fix DCB rebuild after reset") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Arpana Arland <arpanax.arland@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260506-jk-iwl-net-2026-05-04-v2-6-a5ea4dc837a9@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Marcin Szycik [Wed, 6 May 2026 21:48:14 +0000 (14:48 -0700)]
ice: fix setting RSS VSI hash for E830
ice_set_rss_hfunc() performs a VSI update, in which it sets hashing
function, leaving other VSI options unchanged. However, ::q_opt_flags is
mistakenly set to the value of another field, instead of its original
value, probably due to a typo. What happens next is hardware-dependent:
On E810, only the first bit is meaningful (see
ICE_AQ_VSI_Q_OPT_PE_FLTR_EN) and can potentially end up in a different
state than before VSI update.
On E830, some of the remaining bits are not reserved. Setting them
to some unrelated values can cause the firmware to reject the update
because of invalid settings, or worse - succeed.
Reproducer:
sudo ethtool -X $PF1 equal 8
Output in dmesg:
Failed to configure RSS hash for VSI 6, error -5
Fixes: 352e9bf23813 ("ice: enable symmetric-xor RSS for Toeplitz hash function") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260506-jk-iwl-net-2026-05-04-v2-5-a5ea4dc837a9@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
idpf: fix double free and use-after-free in aux device error paths
When auxiliary_device_add() fails in idpf_plug_vport_aux_dev() or
idpf_plug_core_aux_dev(), the err_aux_dev_add label calls
auxiliary_device_uninit() and falls through to err_aux_dev_init. The
uninit call will trigger put_device(), which invokes the release
callback (idpf_vport_adev_release / idpf_core_adev_release) that frees
iadev. The fall-through then reads adev->id from the freed iadev for
ida_free() and double-frees iadev with kfree().
Free the IDA slot and clear the back-pointer before uninit, while adev
is still valid, then return immediately.
Commit 65637c3a1811 ("idpf: fix UAF in RDMA core aux dev deinitialization")
fixed the same use-after-free in the matching unplug path in this file but
missed both probe error paths.
Cc: Tony Nguyen <anthony.l.nguyen@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Andrew Lunn <andrew+netdev@lunn.ch> Cc: stable@kernel.org Fixes: be91128c579c ("idpf: implement RDMA vport auxiliary dev create, init, and destroy") Fixes: f4312e6bfa2a ("idpf: implement core RDMA auxiliary dev create, init, and destroy") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260506-jk-iwl-net-2026-05-04-v2-4-a5ea4dc837a9@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Emil Tantilov [Wed, 6 May 2026 21:48:12 +0000 (14:48 -0700)]
idpf: fix read_dev_clk_lock spinlock init in idpf_ptp_init()
In idpf_ptp_init(), read_dev_clk_lock is initialized after
ptp_schedule_worker() had already been called (and after
idpf_ptp_settime64() could reach the lock). The PTP aux worker
fires immediately upon scheduling and can call into
idpf_ptp_read_src_clk_reg_direct(), which takes
spin_lock(&ptp->read_dev_clk_lock) on an uninitialized lock, triggering
the lockdep "non-static key" warning:
Move the call to spin_lock_init() up a bit to make sure read_dev_clk_lock
is not touched before it's been initialized.
Fixes: 5cb8805d2366 ("idpf: negotiate PTP capabilities and get PTP clock") Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260506-jk-iwl-net-2026-05-04-v2-3-a5ea4dc837a9@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matt Vollrath [Wed, 6 May 2026 21:48:10 +0000 (14:48 -0700)]
i40e: Cleanup PTP registration on probe failure
Fix two conditions which would leak PTP registration on probe failure:
1. i40e_setup_pf_switch can encounter an error in
i40e_setup_pf_filter_control, call i40e_ptp_init, then return
non-zero, sending i40e_probe to err_vsis.
2. i40e_setup_misc_vector can return non-zero, sending i40e_probe to
err_vsis.
Both of these conditions have been present since PTP was introduced in
this driver.
Found with coccinelle.
Fixes: beb0dff1251db ("i40e: enable PTP") Signed-off-by: Matt Vollrath <tactii@gmail.com> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260506-jk-iwl-net-2026-05-04-v2-1-a5ea4dc837a9@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mohsin Bashir [Wed, 6 May 2026 23:37:45 +0000 (16:37 -0700)]
net: shaper: Reject reparenting of existing nodes
When an existing node-scope shaper is moved to a different parent
via the group operation, the framework fails to update the leaves
count on both the old and new parent shapers. Only newly created
nodes (handle.id == NET_SHAPER_ID_UNSPEC) trigger the parent
leaves increment at line 1039.
This causes the parent's leaves counter to diverge from the
actual number of children in the xarray. When the node is later
deleted, pre_del_node() allocates an array sized by the stale
leaves count, but the xarray iteration finds more children than
expected, hitting the WARN_ON_ONCE guard and returning -EINVAL.
Rather than adding reparenting support with complex leaves count
bookkeeping, reject group calls that attempt to change an existing
node's parent. Updates to an existing node's rate or leaves under
the same parent remain permitted. We expect that for any modification
of the topology user should always create new groups and let the
kernel garbage collect the leaf-less nodes.
Alice Ryhl [Wed, 6 May 2026 20:07:13 +0000 (20:07 +0000)]
genetlink: free the skb on 'group >= family->n_mcgrps'
These methods generally consume ownership of the provided skb, so even
if an error path is encountered, the skb is freed. This is because the
very first thing they do after some initial setup is to unconditionally
consume the skb via consume_skb(skb). Any subsequent errors lead to the
core netlink layer freeing the skb.
However, there is one check that occurs before ownership is passed,
which is the check for the group index. So if this error condition is
encountered, then the skb is leaked. This error condition is generally
considered a violation of the netlink API, so it's not expected to occur
under normal circumstances. For the same reason, no callers check for
this error condition, and no callers need to be adjusted. However, we
should still follow the same ownership semantics of the rest of the
function. Thus, free the skb in this codepath.
Ilya Maximets [Thu, 7 May 2026 12:04:26 +0000 (14:04 +0200)]
net: nsh: fix incorrect header length macros
NSH header length is a 6-bit field that encodes the total length of
the header in 4-byte words. So the maximum length is 0b111111 * 4,
which is 252 and not 256. The maximum context length is the same
number minus the length of the base header (8), so 244.
These macros are used to validate push_nsh() action in openvswitch.
Miscalculation here doesn't cause any real issues. In the worst case
the oversized context is truncated while building the header, so we'll
construct and send a broken packet, which is not a big problem, as any
receiver should validate the fields. No invalid memory accesses will
happen during the header push. But we should fix the macros to reject
the incorrect actions in the first place.
Using previously defined values and calculating the length instead
of defining numbers directly, so it's easier to understand where they
come from and harder to make a mistake.
Quan Sun [Thu, 7 May 2026 13:17:38 +0000 (21:17 +0800)]
net: ethtool: fix NULL pointer dereference in phy_reply_size
In phy_prepare_data(), several strings such as 'name', 'drvname',
'upstream_sfp_name', and 'downstream_sfp_name' are allocated using
kstrdup(). However, these allocations were not checked for failure.
If kstrdup() fails for 'name', it returns NULL while the function
continues. This leads to a kernel NULL pointer dereference and panic
later in phy_reply_size() when it unconditionally calls strlen() on
the NULL pointer.
While other strings like 'upstream_sfp_name' might be checked before
access in certain code paths, failing to handle these allocations
consistently can lead to incomplete data reporting or hidden bugs.
Fix this by adding proper NULL checks for all kstrdup() calls in
phy_prepare_data() and implement a centralized error handling path
using goto labels to ensure all previously allocated resources are
freed on failure.
Fixes: 9dd2ad5e92b9 ("net: ethtool: phy: Convert the PHY_GET command to generic phy dump") Signed-off-by: Quan Sun <2022090917019@std.uestc.edu.cn> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20260507131738.1173835-1-2022090917019@std.uestc.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Nicolas Ferre [Thu, 7 May 2026 12:04:44 +0000 (14:04 +0200)]
MAINTAINERS: change maintainers for macb Ethernet driver
I would like to hand over the macb maintenance to Théo, as I'm unable to
keep up with the recent flow of patches for this driver. After speaking
with Claudiu, he indicated that he is in the same position as me.
To help with this work, Conor has agreed to act as a reviewer.
I was given responsibility for this driver years ago, and I'm glad to
see it continue with talented developers.
Dragos Tatulea [Wed, 6 May 2026 09:08:08 +0000 (09:08 +0000)]
net: napi: Avoid gro timer misfiring at end of busypoll
When in irq deferral mode (defer-hard-irqs > 0), a short enough
gro-flush timeout can trigger before NAPI_STATE_SCHED is cleared if the
last poll in busy_poll_stop() takes too long. This can have the effect
of leaving the queue stuck with interrupts disabled and no timer armed
which results in a tx timeout if there is no subsequent busypoll cycle.
To prevent this, defer the gro-flush timer arm after the last poll.
Fixes: 7fd3253a7de6 ("net: Introduce preferred busy-polling") Co-developed-by: Martin Karsten <mkarsten@uwaterloo.ca> Signed-off-by: Martin Karsten <mkarsten@uwaterloo.ca> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20260506090808.820559-2-dtatulea@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
ipv6: flowlabel: per-netns budget for unprivileged callers
From: Maoyi Xie <maoyi.xie@ntu.edu.sg>
This series fixes the cross-tenant DoS in net/ipv6/ip6_flowlabel.c.
v1 through v6 were single-patch postings, each in its own thread.
v6 review pointed out that the existing fl_size read in
mem_check() and the corresponding write in fl_intern() are not in
the same critical section. v7 split the work into 2 patches.
Patch 1/2 is a prerequisite. It moves spin_lock_bh(&ip6_fl_lock)
and the matching unlock from fl_intern() into its only caller
ipv6_flowlabel_get(), so the mem_check() call runs under the same
critical section as the fl_intern() insert. With all writers and
the read of fl_size under the lock, fl_size is converted from
atomic_t to plain int. This is independent of the per-netns
budget. It also makes 2/2 backportable without conflicts.
Patch 2/2 is the v6 patch, rebased on 1/2.
- flowlabel_count is plain int rather than atomic_t, since the
previous patch put all writers and readers under ip6_fl_lock.
- In ip6_fl_gc(), fl_free() is now placed below the fl_size
and flowlabel_count decrements, removing the v6 cache of
fl->fl_net.
- In ip6_fl_purge(), fl_free() stays in its original position.
The function argument net is used for flowlabel_count.
- mem_check() uses spaces around the / operator on all four
expressions, addressing the checkpatch note in v6 review.
CAP_NET_ADMIN against init_user_ns still bypasses both caps.
Reproducer (KASAN VM, 4 cores, qemu): unprivileged netns A holds
3072 flowlabels via 100 procs. Fresh unprivileged netns B then
allocates 32 flowlabels (the FL_MAX_PER_SOCK ceiling for one
socket), the same as a clean baseline. Without the per-netns
ceiling, netns A could push fl_size past FL_MAX_SIZE - FL_MAX_SIZE
/ 4 and netns B would see allocations denied.
====================
Maoyi Xie [Wed, 6 May 2026 08:24:16 +0000 (16:24 +0800)]
ipv6: flowlabel: enforce per-netns limit for unprivileged callers
fl_size, fl_ht and ip6_fl_lock in net/ipv6/ip6_flowlabel.c are
file scope and shared across netns. mem_check() reads fl_size to
decide whether to deny non-CAP_NET_ADMIN callers. capable() runs
against init_user_ns, so an unprivileged user in any non-init
userns can push fl_size past FL_MAX_SIZE - FL_MAX_SIZE / 4 and
starve every other unprivileged userns on the host.
Add struct netns_ipv6::flowlabel_count, bumped and decremented
next to fl_size in fl_intern, ip6_fl_gc and ip6_fl_purge. The new
field fills the existing 4-byte hole after ipmr_seq, so struct
netns_ipv6 stays the same size on 64-bit builds.
Bump FL_MAX_SIZE from 4096 to 8192. It has been 4096 since the
file was added. Machines and connection counts have grown.
mem_check() folds an extra per-netns ceiling into the existing
non-CAP_NET_ADMIN conditional. The ceiling is half of the total
budget that unprivileged callers have ever been able to use, i.e.
(FL_MAX_SIZE - FL_MAX_SIZE / 4) / 2 = 3072 entries. With
FL_MAX_SIZE doubled, this preserves the original per-user reach
of 3K (what an unprivileged caller could already obtain before
this change), while forcing an attacker to spread allocations
across at least two netns to exhaust the global non-CAP_NET_ADMIN
budget.
CAP_NET_ADMIN against init_user_ns still bypasses both caps.
The previous patch took ip6_fl_lock across mem_check and
fl_intern, so the new flowlabel_count read in mem_check and the
new flowlabel_count++ in fl_intern run under the same critical
section. flowlabel_count is therefore plain int, like fl_size.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Suggested-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Cc: stable@vger.kernel.org # v5.15+ Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg> Link: https://patch.msgid.link/20260506082416.2259567-3-maoyixie.tju@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maoyi Xie [Wed, 6 May 2026 08:24:15 +0000 (16:24 +0800)]
ipv6: flowlabel: take ip6_fl_lock across mem_check and fl_intern
mem_check() in net/ipv6/ip6_flowlabel.c reads fl_size without
holding ip6_fl_lock. fl_intern() takes the lock immediately
afterwards. The two checks therefore race against concurrent
fl_intern, ip6_fl_gc and ip6_fl_purge writers, which makes the
mem_check budget check approximate.
Move spin_lock_bh(&ip6_fl_lock) and the matching unlock from
fl_intern() into its only caller ipv6_flowlabel_get(). The
mem_check() call now runs under the same critical section as the
fl_intern() insert, so the budget check is exact.
With all writers and the read of fl_size under ip6_fl_lock,
convert fl_size from atomic_t to plain int. The four sites that
update or read fl_size are fl_intern (insert path), ip6_fl_gc
(garbage collector, the !sched check and the per-entry decrement),
ip6_fl_purge (per-netns purge), and mem_check (budget check), and
all four now run under ip6_fl_lock.
This is a prerequisite for adding a per-netns budget alongside
fl_size. The follow-up patch adds netns_ipv6::flowlabel_count and
folds it into mem_check().
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Suggested-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg> Link: https://patch.msgid.link/20260506082416.2259567-2-maoyixie.tju@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
MAINTAINERS: Add self for the 3c509 network driver
It appears there's a need for a maintainer for the 3Com EtherLink III
family of Ethernet network adapters. There is documentation available
and the driver is very mature so the task ought to be of little hassle,
so I think I should be able to squeeze in any issues to be addressed.
When TCP socket migration fails at inet_ehash_insert() in
reqsk_timer_handler(), we jump to the no_ownership: label
and free the new reqsk immediately with __reqsk_free().
Thus, we must stop the new reqsk's timer before jumping to the
label, but the timer might be missed since the cited commit,
resulting in UAF.
As we are in the original reqsk's timer context, we can safely
call timer_delete_sync() for the new reqsk.
Let's pass false to __inet_csk_reqsk_queue_drop() to stop
the new reqsk's timer.
Sven Eckelmann [Wed, 6 May 2026 20:20:52 +0000 (22:20 +0200)]
batman-adv: bla: put backbone reference on failed claim hash insert
When batadv_bla_add_claim() fails to insert a new claim into the hash, it
leaked a reference to the backbone_gw for which the claim was intended.
Call batadv_backbone_gw_put() on the error path to release the reference
and avoid leaking the backbone_gw object.
Sven Eckelmann [Wed, 6 May 2026 20:20:51 +0000 (22:20 +0200)]
batman-adv: bla: only purge non-released claims
When batadv_bla_purge_claims() goes through the list of claims, it is only
traversing the hash list with an rcu_read_lock(). Due to a potential
parallel batadv_claim_put(), it can happen that it encounters a claim which
was actually in the process of being released+freed by
batadv_claim_release(). In this case, backbone_gw is set to NULL before the
delayed RCU kfree is started. Calling batadv_bla_claim_get_backbone_gw() is
then no longer allowed because it would cause a NULL-ptr derefence.
To avoid this, only claims with a valid reference counter must be purged.
All others are already taken care of.
Sven Eckelmann [Wed, 6 May 2026 20:20:50 +0000 (22:20 +0200)]
batman-adv: bla: prevent use-after-free when deleting claims
When batadv_bla_del_backbone_claims() removes all claims for a backbone, it
does this by dropping the link entry in the hash list. This list entry
itself was one of the references which need to be dropped at the same time
via batadv_claim_put().
But the batadv_claim_put() must not be done before the last access to the
claim object in this function. Otherwise the claim might be freed already
by the batadv_claim_release() function before the list entry was dropped.
Sven Eckelmann [Wed, 6 May 2026 20:20:49 +0000 (22:20 +0200)]
batman-adv: tp_meter: fix tp_num leak on kmalloc failure
When batadv_tp_start() or batadv_tp_init_recv() fail to allocate a new
tp_vars object, the previously incremented bat_priv->tp_num counter is
never decremented. This causes tp_num to drift upward on each allocation
failure. Since only BATADV_TP_MAX_NUM sessions can be started and the count
is never reduced for these failed allocations, it causes to an exhaustion
of throughput meter sessions. In worst case, no new throughput meter
session can be started until the mesh interface is removed.
The error handling must decrement tp_num releasing the lock and aborting
the creation of an throughput meter session
Cc: stable@kernel.org Fixes: 33a3bb4a3345 ("batman-adv: throughput meter implementation") Signed-off-by: Sven Eckelmann <sven@narfation.org>
Jiexun Wang [Sun, 3 May 2026 04:28:58 +0000 (12:28 +0800)]
batman-adv: stop caching unowned originator pointers in BAT IV
BAT IV keeps the last-hop neighbor address in each neigh_node, but some
paths also cache an originator pointer derived from a temporary lookup.
That pointer is not owned by the neigh_node and may no longer refer to a
live originator entry after purge handling runs.
Stop storing the auxiliary originator pointer in the BAT IV neighbor
state. When BAT IV needs the neighbor originator data, resolve it from
the stored neighbor address and drop the reference again after use.
Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol") Cc: stable@kernel.org Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Jiexun Wang <wangjiexun2025@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
[sven: avoid bonding logic for outgoing OGM] Signed-off-by: Sven Eckelmann <sven@narfation.org>
Li Xiasong [Thu, 7 May 2026 14:04:22 +0000 (22:04 +0800)]
netfilter: nf_conntrack_sip: get helper before allocating expectation
process_register_request() allocates an expectation and then checks
whether a conntrack helper is available. If helper lookup fails, the
function returns early and the allocated expectation is left behind.
Reorder the code to fetch and validate helper before calling
nf_ct_expect_alloc(). This keeps the logic simpler and removes the leak
path while preserving existing behavior.
Fixes: e14575fa7529 ("netfilter: nf_conntrack: use rcu accessors where needed") Cc: stable@vger.kernel.org Signed-off-by: Li Xiasong <lixiasong1@huawei.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: nf_conntrack_expect: restore helper propagation via expectation
A recent series to fix expectations broke helper propagation via
expectation, this mechanism is used by the sip and h323 helper. This
also propagates the conntrack helper to expected connections. I changed
semantics of exp->helper which now tells us the actual helper that
created the expectation.
Add an explicit assign_helper field to expectations for this purpose
and update helpers to use it.
Restore this feature for userspace conntrack helper via ctnetlink
nfqueue integration so it is again possible to attach a helper to an
expectation, where it makes sense. This is not restored via ctnetlink
expectation creation as there is no client for such feature. Use the
expectation layer 4 protocol number for the helper lookup for
consistency.
Make sure the expectation using this helper propagation mechanism also
go away when the helper is unregistered.
netfilter: bridge: eb_tables: close module init race
sashiko reports for unrelated patch:
Does the core ebtables initialization in ebtables.c suffer from a similar race?
Once nf_register_sockopt() completes, the sockopts are exposed globally.
sockopt has to be registered last, just like in ip/ip6/arptables.
Fixes: 5b53951cfc85 ("netfilter: ebtables: use net_generic infra") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: x_tables: close dangling table module init race
Similar to the previous ebtables patch:
template add exposes the table to userspace, we must do this last to
rnsure the pernet ops are set up (contain the destructors).
Fixes: fdacd57c79b7 ("netfilter: x_tables: never register tables by default") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: ebtables: close dangling table module init race
sashiko reported for a related patch:
In modules like iptable_raw.c, [..], if register_pernet_subsys() fails,
the rollback might call kfree(rawtable_ops) before [..]
During this window, could a concurrent userspace process find the globally
visible template, trigger table_init(), [..]
The table init functions must always register the template last.
Otherwise, set/getsockopt can instantiate a table in a namespace
while the required pernet ops (contain the destructor) isn't available.
This change is also required in x_tables, handled in followup change.
Fixes: 87663c39f898 ("netfilter: ebtables: do not hook tables by default") Reviewed-by: Tristan Madani <tristan@talencesecurity.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: x_tables: add and use xtables_unregister_table_exit
Previous change added xtables_unregister_table_pre_exit to detach the
table from the packetpath and to unlink it from the active table list.
In case of rmmod, userspace that is doing set/getsockopt for this table
will not be able to re-instantiate the table:
1. The larval table has been removed already
2. existing instantiated table is no longer on the xt pernet table list.
This adds the second stage helper:
unlink the table from the dying list, free the hook ops (if any) and do
the audit notification. It replaces xt_unregister_table().
Fixes: fdacd57c79b7 ("netfilter: x_tables: never register tables by default") Reported-by: Tristan Madani <tristan@talencesecurity.com> Reviewed-by: Tristan Madani <tristan@talencesecurity.com> Closes: https://lore.kernel.org/netfilter-devel/20260429175613.1459342-1-tristmd@gmail.com/ Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: x_tables: unregister the templates first
When the module is going away we need to zap the template
first. Else there is a small race window where userspace
could instantiate a new table after the pernet exit function
has removed the current table.
Fixes: fdacd57c79b7 ("netfilter: x_tables: never register tables by default") Reported-by: Tristan Madani <tristan@talencesecurity.com> Reviewed-by: Tristan Madani <tristan@talencesecurity.com> Closes: https://lore.kernel.org/netfilter-devel/20260429175613.1459342-1-tristmd@gmail.com/ Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: x_tables: add and use xt_unregister_table_pre_exit
Remove the copypasted variants of _pre_exit and add one single
function in the xtables core. ebtables is not compatible with
x_tables and therefore unchanged.
This is a preparation patch to reduce noise in the followup
bug fixes.
netfilter: x_tables: allocate hook ops while under mutex
arp/ip(6)t_register_table() add the table to the per-netns list via
xt_register_table() before allocating the per-netns hook ops copy
via kmemdup_array(). This leaves a window where the table is
visible in the list with ops=NULL.
If the pernet exit happens runs concurrently the pre_exit callback finds
the table via xt_find_table() and passes the NULL ops pointer to
nf_unregister_net_hooks(), causing a NULL dereference:
general protection fault in nf_unregister_net_hooks+0xbc/0x150
RIP: nf_unregister_net_hooks (net/netfilter/core.c:613)
Call Trace:
ipt_unregister_table_pre_exit
iptable_mangle_net_pre_exit
ops_pre_exit_list
cleanup_net
Fix by moving the ops allocation into the xtables core so the table is
never in the list without valid ops. Also ensure the table is no longer
processing packets before its torn down on error unwind.
nf_register_net_hooks might have published at least one hook; call
synchronize_rcu() if there was an error.
audit log register message gets deferred until all operations have
passed, this avoids need to emit another ureg message in case of
error unwinding.
At the moment we emit the audit log a bit too early, which makes it
necessary to also emit an unregister log in case we have to unwind
errors after possible hook register failure.
Followup patch will be slightly simpler if we can delay the
register message until after the hooks have been wired up.
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Daniel Machon [Wed, 6 May 2026 07:25:39 +0000 (09:25 +0200)]
net: sparx5: configure serdes for 1000BASE-X in sparx5_port_init()
sparx5_port_init() only invokes sparx5_serdes_set() and the associated
shadow-device enable and low-speed device switch for SGMII and QSGMII.
On any port with a high-speed primary device (DEV5G/DEV10G/DEV25G)
configured for 1000BASE-X the serdes is therefore left uninitialized,
the DEV2G5 shadow is never enabled, and the port stays pointed at its
high-speed device rather than the DEV2G5. The PCS1G block looks
healthy in isolation, but no frames reach the link partner.
Add 1000BASE-X to the check so the same three steps run.
Note: the same issue might apply to 2500BASE-X, but that will,
eventually, be addressed in a separate commit.
The value read back from the chip is GCB_CHIP_ID_PART_ID, which is a
GENMASK(27, 12) field, i.e. at most 16 bits wide. It can never match
these IDs, so probing a TSN part fails with a "Target not supported"
error.
Fix the enum to use the actual 16-bit part IDs returned by the
hardware: 0x0546, 0x0549, 0x0552, 0x0556 and 0x0558.
Linus Torvalds [Thu, 7 May 2026 15:55:15 +0000 (08:55 -0700)]
Merge tag 'sound-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Again a collection of small fixes, mostly for device-specific ones.
The only big LOC is about the removal of pretty old dead code in
ab8500 codec driver, while the rest all nice small changes.
Core / API:
- Fix race in deferred fasync state checks
- Fix UMP group filtering in sequencer
ASoC:
- cs35l56: fixes for driver cleanup and error paths
- tas2764/2770: workaround for bogus temperature readings
- wm_adsp: fixes for firmware unit tests
- amd-yc: more DMI quirks for laptops
- Minor fixes for fsl_xcvr and spacemit
HD-Audio:
- Mute LED and speaker quirks for HP, Lenovo, and Xiaomi laptops
USB-audio:
- New device-specific quirks (Motu, JBL, AlphaTheta, Razer)
- Fix of MIDI2 playback on resume
Others:
- Firewire-tascam control event fix
- Minor cleanups and fixes for sparc/dbri and pcmtest"
* tag 'sound-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (28 commits)
ASoC: cs35l56: Destroy workqueue in probe error path
ASoC: cs35l56: Don't use devres to unregister component
ALSA: sparc/dbri: add missing fallthrough
ALSA: core: Serialize deferred fasync state checks
ALSA: hda/realtek: Add mute LED fixup for HP Pavilion 15-cs1xxx
ALSA: seq: Fix UMP group 16 filtering
ASoC: wm_adsp_fw_find_test: Clear searched_fw_files in find-by-index test
ASoC: wm_adsp_fw_find_test: Redirect wm_adsp_release_firmware_files()
ASoC: tas2770: Deal with bogus initial temperature value
ASoC: tas2764: Deal with bogus initial temperature register value
ALSA: usb-audio: add clock quirk for Motu 1248
ALSA: usb-audio: midi2: Restart output URBs on resume
ALSA: hda/realtek: Fix mute and mic-mute LEDs for HP Envy X360 15-fh0xxx
ALSA: usb-audio: Add quirk flags for JBL Pebbles
ALSA: firewire-tascam: Do not drop unread control events
ALSA: usb-audio: Add quirk flags for AlphaTheta EUPHONIA
ASoC: fsl_xcvr: Fix event generation for cached controls
ASoC: sdw_utils: avoid the SDCA companion function not supported failure
ASoC: amd: yc: Add HP OMEN Gaming Laptop 16-ap0xxx product line in quirk table
ASoC: cs35l56: Fix out-of-bounds in dev_err() in cs35l56_read_onchip_spkid()
...