]> git.ipfire.org Git - thirdparty/kernel/stable.git/log
thirdparty/kernel/stable.git
5 weeks agobonding: prevent potential infinite loop in bond_header_parse()
Eric Dumazet [Sun, 15 Mar 2026 10:41:52 +0000 (10:41 +0000)] 
bonding: prevent potential infinite loop in bond_header_parse()

bond_header_parse() can loop if a stack of two bonding devices is setup,
because skb->dev always points to the hierarchy top.

Add new "const struct net_device *dev" parameter to
(struct header_ops)->parse() method to make sure the recursion
is bounded, and that the final leaf parse method is called.

Fixes: 950803f72547 ("bonding: fix type confusion in bond_setup_by_slave()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jiayuan Chen <jiayuan.chen@shopee.com>
Tested-by: Jiayuan Chen <jiayuan.chen@shopee.com>
Cc: Jay Vosburgh <jv@jvosburgh.net>
Cc: Andrew Lunn <andrew+netdev@lunn.ch>
Link: https://patch.msgid.link/20260315104152.1436867-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoMerge branch 'net-macb-fix-ethernet-malfunction-on-amd-versal-board-after-suspend'
Jakub Kicinski [Sat, 14 Mar 2026 19:19:48 +0000 (12:19 -0700)] 
Merge branch 'net-macb-fix-ethernet-malfunction-on-amd-versal-board-after-suspend'

Kevin Hao says:

====================
net: macb: Fix Ethernet malfunction on AMD Versal board after suspend

On Versal boards, the tx/rx queue pointer registers are cleared after suspend,
which causes Ethernet malfunction. This patch series addresses this issue by
reinitializing the tx/rx queue pointer registers and the rx ring.
====================

Link: https://patch.msgid.link/20260312-macb-versal-v1-0-467647173fa4@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: macb: Reinitialize tx/rx queue pointer registers and rx ring during resume
Kevin Hao [Thu, 12 Mar 2026 08:13:59 +0000 (16:13 +0800)] 
net: macb: Reinitialize tx/rx queue pointer registers and rx ring during resume

On certain platforms, such as AMD Versal boards, the tx/rx queue pointer
registers are cleared after suspend, and the rx queue pointer register
is also disabled during suspend if WOL is enabled. Previously, we assumed
that these registers would be restored by macb_mac_link_up(). However,
in commit bf9cf80cab81, macb_init_buffers() was moved from
macb_mac_link_up() to macb_open(). Therefore, we should call
macb_init_buffers() to reinitialize the tx/rx queue pointer registers
during resume.

Due to the reset of these two registers, we also need to adjust the
tx/rx rings accordingly. The tx ring will be handled by
gem_shuffle_tx_rings() in macb_mac_link_up(), so we only need to
initialize the rx ring here.

Fixes: bf9cf80cab81 ("net: macb: Fix tx/rx malfunction after phy link down and up")
Reported-by: Quanyang Wang <quanyang.wang@windriver.com>
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Tested-by: Quanyang Wang <quanyang.wang@windriver.com>
Cc: stable@vger.kernel.org
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260312-macb-versal-v1-2-467647173fa4@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: macb: Introduce gem_init_rx_ring()
Kevin Hao [Thu, 12 Mar 2026 08:13:58 +0000 (16:13 +0800)] 
net: macb: Introduce gem_init_rx_ring()

Extract the initialization code for the GEM RX ring into a new function.
This change will be utilized in a subsequent patch. No functional changes
are introduced.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260312-macb-versal-v1-1-467647173fa4@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: ti: icssg-prueth: Fix memory leak in XDP_DROP for non-zero-copy mode
Meghana Malladi [Wed, 11 Mar 2026 09:54:41 +0000 (15:24 +0530)] 
net: ti: icssg-prueth: Fix memory leak in XDP_DROP for non-zero-copy mode

Page recycling was removed from the XDP_DROP path in emac_run_xdp() to
avoid conflicts with AF_XDP zero-copy mode, which uses xsk_buff_free()
instead.

However, this causes a memory leak when running XDP programs that drop
packets in non-zero-copy mode (standard page pool mode). The pages are
never returned to the page pool, leading to OOM conditions.

Fix this by handling cleanup in the caller, emac_rx_packet().
When emac_run_xdp() returns ICSSG_XDP_CONSUMED for XDP_DROP, the
caller now recycles the page back to the page pool. The zero-copy
path, emac_rx_packet_zc() already handles cleanup correctly with
xsk_buff_free().

Fixes: 7a64bb388df3 ("net: ti: icssg-prueth: Add AF_XDP zero copy for RX")
Signed-off-by: Meghana Malladi <m-malladi@ti.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260311095441.1691636-1-m-malladi@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: mana: fix use-after-free in mana_hwc_destroy_channel() by reordering teardown
Dipayaan Roy [Wed, 11 Mar 2026 19:22:04 +0000 (12:22 -0700)] 
net: mana: fix use-after-free in mana_hwc_destroy_channel() by reordering teardown

A potential race condition exists in mana_hwc_destroy_channel() where
hwc->caller_ctx is freed before the HWC's Completion Queue (CQ) and
Event Queue (EQ) are destroyed. This allows an in-flight CQ interrupt
handler to dereference freed memory, leading to a use-after-free or
NULL pointer dereference in mana_hwc_handle_resp().

mana_smc_teardown_hwc() signals the hardware to stop but does not
synchronize against IRQ handlers already executing on other CPUs. The
IRQ synchronization only happens in mana_hwc_destroy_cq() via
mana_gd_destroy_eq() -> mana_gd_deregister_irq(). Since this runs
after kfree(hwc->caller_ctx), a concurrent mana_hwc_rx_event_handler()
can dereference freed caller_ctx (and rxq->msg_buf) in
mana_hwc_handle_resp().

Fix this by reordering teardown to reverse-of-creation order: destroy
the TX/RX work queues and CQ/EQ before freeing hwc->caller_ctx. This
ensures all in-flight interrupt handlers complete before the memory they
access is freed.

Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/abHA3AjNtqa1nx9k@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: bcmgenet: increase WoL poll timeout
Justin Chen [Thu, 12 Mar 2026 19:18:52 +0000 (12:18 -0700)] 
net: bcmgenet: increase WoL poll timeout

Some systems require more than 5ms to get into WoL mode. Increase the
timeout value to 50ms.

Fixes: c51de7f3976b ("net: bcmgenet: add Wake-on-LAN support code")
Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20260312191852.3904571-1-justin.chen@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoMerge tag 'nf-26-03-13' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Jakub Kicinski [Sat, 14 Mar 2026 16:13:58 +0000 (09:13 -0700)] 
Merge tag 'nf-26-03-13' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Florian Westphal says:

====================
netfilter: updates for net

This is a much earlier pull request than usual, due to the large
backlog.  We are aware of several unfixed issues, in particular
in ctnetlink, patches are being worked on.

The following patchset contains Netfilter fixes for *net*:

1) fix a use-after-free in ctnetlink, from Hyunwoo Kim, broken
   since v3.10.
2) add missing netlink range checks in ctnetlink, broken since v2.6
   days.
3) fix content length truncation in sip conntrack helper,
   from Lukas Johannes Möller.  Broken since 2.6.34.
4) Revert a recent patch to add stronger checks for overlapping ranges
   in nf_tables rbtree set type.
   Patch is correct, but several nftables version have a bug (now fixed)
   that trigger the checks incorrectly.
5) Reset mac header before the vlan push to avoid warning splat (and
   make things functional). From Eric Woudstra.
6) Add missing bounds check in H323 conntrack helper, broken since this
   helper was added 20 years ago, from Jenny Guanni Qu.
7) Fix a memory leak in the dynamic set infrastructure, from Pablo Neira
   Ayuso.  Broken since v5.11.
8+9) a few spots failed to purge skbs queued to userspace via nfqueue,
   this causes RCU escape / use-after-free. Also from Pablo. broken
   since v3.4 added the CT target to xtables.
10) Fix undefined behaviour in xt_time, use u32 for a shift-by-31
    operation, not s32, from Jenny Guanni Qu.
11) H323 conntrack helper lacks a check for length variable becoming
    negative after decrement, causes major out-of-bounds read due to
    cast to unsigned size later, also from Jenny.
    Both issues exist since 2.6 days.

* tag 'nf-26-03-13' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_conntrack_h323: check for zero length in DecodeQ931()
  netfilter: xt_time: use unsigned int for monthday bit shift
  netfilter: xt_CT: drop pending enqueued packets on template removal
  netfilter: nft_ct: drop pending enqueued packets on removal
  nf_tables: nft_dynset: fix possible stateful expression memleak in error path
  netfilter: nf_conntrack_h323: fix OOB read in decode_int() CONS case
  netfilter: nf_flow_table_ip: reset mac header before vlan push
  netfilter: revert nft_set_rbtree: validate open interval overlap
  netfilter: nf_conntrack_sip: fix Content-Length u32 truncation in sip_help_tcp()
  netfilter: conntrack: add missing netlink policy validations
  netfilter: ctnetlink: fix use-after-free in ctnetlink_dump_exp_ct()
====================

Link: https://patch.msgid.link/20260313150614.21177-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoMerge tag 'for-net-2026-03-12' of git://git.kernel.org/pub/scm/linux/kernel/git/bluet...
Jakub Kicinski [Sat, 14 Mar 2026 15:39:28 +0000 (08:39 -0700)] 
Merge tag 'for-net-2026-03-12' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth

Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

 - hci_sync: Fix hci_le_create_conn_sync
 - MGMT: Fix list corruption and UAF in command complete handlers
 - L2CAP: Disconnect if received packet's SDU exceeds IMTU
 - L2CAP: Disconnect if sum of payload sizes exceed SDU
 - L2CAP: Fix accepting multiple L2CAP_ECRED_CONN_REQ
 - L2CAP: Fix type confusion in l2cap_ecred_reconf_rsp()
 - L2CAP: Validate L2CAP_INFO_RSP payload length before access
 - L2CAP: Fix use-after-free in l2cap_unregister_user
 - ISO: Fix defer tests being unstable
 - HIDP: Fix possible UAF
 - SMP: make SM/PER/KDU/BI-04-C happy
 - qca: fix ROM version reading on WCN3998 chips

* tag 'for-net-2026-03-12' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: qca: fix ROM version reading on WCN3998 chips
  Bluetooth: L2CAP: Validate L2CAP_INFO_RSP payload length before access
  Bluetooth: L2CAP: Fix type confusion in l2cap_ecred_reconf_rsp()
  Bluetooth: L2CAP: Fix accepting multiple L2CAP_ECRED_CONN_REQ
  Bluetooth: L2CAP: Fix use-after-free in l2cap_unregister_user
  Bluetooth: HIDP: Fix possible UAF
  Bluetooth: MGMT: Fix list corruption and UAF in command complete handlers
  Bluetooth: hci_sync: Fix hci_le_create_conn_sync
  Bluetooth: ISO: Fix defer tests being unstable
  Bluetooth: SMP: make SM/PER/KDU/BI-04-C happy
  Bluetooth: LE L2CAP: Disconnect if sum of payload sizes exceed SDU
  Bluetooth: LE L2CAP: Disconnect if received packet's SDU exceeds IMTU
====================

Link: https://patch.msgid.link/20260312200655.1215688-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoatm: lec: fix use-after-free in sock_def_readable()
Deepanshu Kartikey [Mon, 9 Mar 2026 15:59:08 +0000 (21:29 +0530)] 
atm: lec: fix use-after-free in sock_def_readable()

A race condition exists between lec_atm_close() setting priv->lecd
to NULL and concurrent access to priv->lecd in send_to_lecd(),
lec_handle_bridge(), and lec_atm_send(). When the socket is freed
via RCU while another thread is still using it, a use-after-free
occurs in sock_def_readable() when accessing the socket's wait queue.

The root cause is that lec_atm_close() clears priv->lecd without
any synchronization, while callers dereference priv->lecd without
any protection against concurrent teardown.

Fix this by converting priv->lecd to an RCU-protected pointer:
- Mark priv->lecd as __rcu in lec.h
- Use rcu_assign_pointer() in lec_atm_close() and lecd_attach()
  for safe pointer assignment
- Use rcu_access_pointer() for NULL checks that do not dereference
  the pointer in lec_start_xmit(), lec_push(), send_to_lecd() and
  lecd_attach()
- Use rcu_read_lock/rcu_dereference/rcu_read_unlock in send_to_lecd(),
  lec_handle_bridge() and lec_atm_send() to safely access lecd
- Use rcu_assign_pointer() followed by synchronize_rcu() in
  lec_atm_close() to ensure all readers have completed before
  proceeding. This is safe since lec_atm_close() is called from
  vcc_release() which holds lock_sock(), a sleeping lock.
- Remove the manual sk_receive_queue drain from lec_atm_close()
  since vcc_destroy_socket() already drains it after lec_atm_close()
  returns.

v2: Switch from spinlock + sock_hold/put approach to RCU to properly
    fix the race. The v1 spinlock approach had two issues pointed out
    by Eric Dumazet:
    1. priv->lecd was still accessed directly after releasing the
       lock instead of using a local copy.
    2. The spinlock did not prevent packets being queued after
       lec_atm_close() drains sk_receive_queue since timer and
       workqueue paths bypass netif_stop_queue().

Note: Syzbot patch testing was attempted but the test VM terminated
    unexpectedly with "Connection to localhost closed by remote host",
    likely due to a QEMU AHCI emulation issue unrelated to this fix.
    Compile testing with "make W=1 net/atm/lec.o" passes cleanly.

Reported-by: syzbot+f50072212ab792c86925@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=f50072212ab792c86925
Link: https://lore.kernel.org/all/20260309093614.502094-1-kartikey406@gmail.com/T/
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260309155908.508768-1-kartikey406@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonetfilter: nf_conntrack_h323: check for zero length in DecodeQ931()
Jenny Guanni Qu [Thu, 12 Mar 2026 14:49:50 +0000 (14:49 +0000)] 
netfilter: nf_conntrack_h323: check for zero length in DecodeQ931()

In DecodeQ931(), the UserUserIE code path reads a 16-bit length from
the packet, then decrements it by 1 to skip the protocol discriminator
byte before passing it to DecodeH323_UserInformation(). If the encoded
length is 0, the decrement wraps to -1, which is then passed as a
large value to the decoder, leading to an out-of-bounds read.

Add a check to ensure len is positive after the decrement.

Fixes: 5e35941d9901 ("[NETFILTER]: Add H.323 conntrack/NAT helper")
Reported-by: Klaudia Kloc <klaudia@vidocsecurity.com>
Reported-by: Dawid Moczadło <dawid@vidocsecurity.com>
Tested-by: Jenny Guanni Qu <qguanni@gmail.com>
Signed-off-by: Jenny Guanni Qu <qguanni@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
5 weeks agonetfilter: xt_time: use unsigned int for monthday bit shift
Jenny Guanni Qu [Thu, 12 Mar 2026 14:59:49 +0000 (14:59 +0000)] 
netfilter: xt_time: use unsigned int for monthday bit shift

The monthday field can be up to 31, and shifting a signed integer 1
by 31 positions (1 << 31) is undefined behavior in C, as the result
overflows a 32-bit signed int. Use 1U to ensure well-defined behavior
for all valid monthday values.

Change the weekday shift to 1U as well for consistency.

Fixes: ee4411a1b1e0 ("[NETFILTER]: x_tables: add xt_time match")
Reported-by: Klaudia Kloc <klaudia@vidocsecurity.com>
Reported-by: Dawid Moczadło <dawid@vidocsecurity.com>
Tested-by: Jenny Guanni Qu <qguanni@gmail.com>
Signed-off-by: Jenny Guanni Qu <qguanni@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
5 weeks agonetfilter: xt_CT: drop pending enqueued packets on template removal
Pablo Neira Ayuso [Thu, 12 Mar 2026 12:48:48 +0000 (13:48 +0100)] 
netfilter: xt_CT: drop pending enqueued packets on template removal

Templates refer to objects that can go away while packets are sitting in
nfqueue refer to:

- helper, this can be an issue on module removal.
- timeout policy, nfnetlink_cttimeout might remove it.

The use of templates with zone and event cache filter are safe, since
this just copies values.

Flush these enqueued packets in case the template rule gets removed.

Fixes: 24de58f46516 ("netfilter: xt_CT: allow to attach timeout policy + glue code")
Reported-by: Yiming Qian <yimingqian591@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
5 weeks agonetfilter: nft_ct: drop pending enqueued packets on removal
Pablo Neira Ayuso [Thu, 12 Mar 2026 12:48:47 +0000 (13:48 +0100)] 
netfilter: nft_ct: drop pending enqueued packets on removal

Packets sitting in nfqueue might hold a reference to:

- templates that specify the conntrack zone, because a percpu area is
  used and module removal is possible.
- conntrack timeout policies and helper, where object removal leave
  a stale reference.

Since these objects can just go away, drop enqueued packets to avoid
stale reference to them.

If there is a need for finer grain removal, this logic can be revisited
to make selective packet drop upon dependencies.

Fixes: 7e0b2b57f01d ("netfilter: nft_ct: add ct timeout support")
Reported-by: Yiming Qian <yimingqian591@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
5 weeks agonf_tables: nft_dynset: fix possible stateful expression memleak in error path
Pablo Neira Ayuso [Thu, 12 Mar 2026 11:38:59 +0000 (12:38 +0100)] 
nf_tables: nft_dynset: fix possible stateful expression memleak in error path

If cloning the second stateful expression in the element via GFP_ATOMIC
fails, then the first stateful expression remains in place without being
released.

   unreferenced object (percpu) 0x607b97e9cab8 (size 16):
     comm "softirq", pid 0, jiffies 4294931867
     hex dump (first 16 bytes on cpu 3):
       00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
     backtrace (crc 0):
       pcpu_alloc_noprof+0x453/0xd80
       nft_counter_clone+0x9c/0x190 [nf_tables]
       nft_expr_clone+0x8f/0x1b0 [nf_tables]
       nft_dynset_new+0x2cb/0x5f0 [nf_tables]
       nft_rhash_update+0x236/0x11c0 [nf_tables]
       nft_dynset_eval+0x11f/0x670 [nf_tables]
       nft_do_chain+0x253/0x1700 [nf_tables]
       nft_do_chain_ipv4+0x18d/0x270 [nf_tables]
       nf_hook_slow+0xaa/0x1e0
       ip_local_deliver+0x209/0x330

Fixes: 563125a73ac3 ("netfilter: nftables: generalize set extension to support for several expressions")
Reported-by: Gurpreet Shergill <giki.shergill@proton.me>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
5 weeks agonetfilter: nf_conntrack_h323: fix OOB read in decode_int() CONS case
Jenny Guanni Qu [Thu, 12 Mar 2026 02:29:32 +0000 (02:29 +0000)] 
netfilter: nf_conntrack_h323: fix OOB read in decode_int() CONS case

In decode_int(), the CONS case calls get_bits(bs, 2) to read a length
value, then calls get_uint(bs, len) without checking that len bytes
remain in the buffer. The existing boundary check only validates the
2 bits for get_bits(), not the subsequent 1-4 bytes that get_uint()
reads. This allows a malformed H.323/RAS packet to cause a 1-4 byte
slab-out-of-bounds read.

Add a boundary check for len bytes after get_bits() and before
get_uint().

Fixes: 5e35941d9901 ("[NETFILTER]: Add H.323 conntrack/NAT helper")
Reported-by: Klaudia Kloc <klaudia@vidocsecurity.com>
Reported-by: Dawid Moczadło <dawid@vidocsecurity.com>
Signed-off-by: Jenny Guanni Qu <qguanni@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
5 weeks agonetfilter: nf_flow_table_ip: reset mac header before vlan push
Eric Woudstra [Tue, 10 Mar 2026 14:39:33 +0000 (15:39 +0100)] 
netfilter: nf_flow_table_ip: reset mac header before vlan push

With double vlan tagged packets in the fastpath, getting the error:

skb_vlan_push got skb with skb->data not at mac header (offset 18)

Call skb_reset_mac_header() before calling skb_vlan_push().

Fixes: c653d5a78f34 ("netfilter: flowtable: inline vlan encapsulation in xmit path")
Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
5 weeks agonetfilter: revert nft_set_rbtree: validate open interval overlap
Florian Westphal [Wed, 11 Mar 2026 15:24:02 +0000 (16:24 +0100)] 
netfilter: revert nft_set_rbtree: validate open interval overlap

This reverts commit 648946966a08 ("netfilter: nft_set_rbtree: validate
open interval overlap").

There have been reports of nft failing to laod valid rulesets after this
patch was merged into -stable.

I can reproduce several such problem with recent nft versions, including
nft 1.1.6 which is widely shipped by distributions.

We currently have little choice here.
This commit can be resurrected at some point once the nftables fix that
triggers the false overlap positive has appeared in common distros
(see e83e32c8d1cd ("mnl: restore create element command with large batches" in
 nftables.git).

Fixes: 648946966a08 ("netfilter: nft_set_rbtree: validate open interval overlap")
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
5 weeks agonetfilter: nf_conntrack_sip: fix Content-Length u32 truncation in sip_help_tcp()
Lukas Johannes Möller [Tue, 10 Mar 2026 21:49:01 +0000 (21:49 +0000)] 
netfilter: nf_conntrack_sip: fix Content-Length u32 truncation in sip_help_tcp()

sip_help_tcp() parses the SIP Content-Length header with
simple_strtoul(), which returns unsigned long, but stores the result in
unsigned int clen.  On 64-bit systems, values exceeding UINT_MAX are
silently truncated before computing the SIP message boundary.

For example, Content-Length 4294967328 (2^32 + 32) is truncated to 32,
causing the parser to miscalculate where the current message ends.  The
loop then treats trailing data in the TCP segment as a second SIP
message and processes it through the SDP parser.

Fix this by changing clen to unsigned long to match the return type of
simple_strtoul(), and reject Content-Length values that exceed the
remaining TCP payload length.

Fixes: f5b321bd37fb ("netfilter: nf_conntrack_sip: add TCP support")
Signed-off-by: Lukas Johannes Möller <research@johannes-moeller.dev>
Signed-off-by: Florian Westphal <fw@strlen.de>
5 weeks agonetfilter: conntrack: add missing netlink policy validations
Florian Westphal [Mon, 9 Mar 2026 23:28:29 +0000 (00:28 +0100)] 
netfilter: conntrack: add missing netlink policy validations

Hyunwoo Kim reports out-of-bounds access in sctp and ctnetlink.

These attributes are used by the kernel without any validation.
Extend the netlink policies accordingly.

Quoting the reporter:
  nlattr_to_sctp() assigns the user-supplied CTA_PROTOINFO_SCTP_STATE
  value directly to ct->proto.sctp.state without checking that it is
  within the valid range. [..]

  and: ... with exp->dir = 100, the access at
  ct->master->tuplehash[100] reads 5600 bytes past the start of a
  320-byte nf_conn object, causing a slab-out-of-bounds read confirmed by
  UBSAN.

Fixes: 076a0ca02644 ("netfilter: ctnetlink: add NAT support for expectations")
Fixes: a258860e01b8 ("netfilter: ctnetlink: add full support for SCTP to ctnetlink")
Reported-by: Hyunwoo Kim <imv4bel@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
5 weeks agonetfilter: ctnetlink: fix use-after-free in ctnetlink_dump_exp_ct()
Hyunwoo Kim [Sat, 7 Mar 2026 17:21:37 +0000 (02:21 +0900)] 
netfilter: ctnetlink: fix use-after-free in ctnetlink_dump_exp_ct()

ctnetlink_dump_exp_ct() stores a conntrack pointer in cb->data for the
netlink dump callback ctnetlink_exp_ct_dump_table(), but drops the
conntrack reference immediately after netlink_dump_start().  When the
dump spans multiple rounds, the second recvmsg() triggers the dump
callback which dereferences the now-freed conntrack via nfct_help(ct),
leading to a use-after-free on ct->ext.

The bug is that the netlink_dump_control has no .start or .done
callbacks to manage the conntrack reference across dump rounds.  Other
dump functions in the same file (e.g. ctnetlink_get_conntrack) properly
use .start/.done callbacks for this purpose.

Fix this by adding .start and .done callbacks that hold and release the
conntrack reference for the duration of the dump, and move the
nfct_help() call after the cb->args[0] early-return check in the dump
callback to avoid dereferencing ct->ext unnecessarily.

 BUG: KASAN: slab-use-after-free in ctnetlink_exp_ct_dump_table+0x4f/0x2e0
 Read of size 8 at addr ffff88810597ebf0 by task ctnetlink_poc/133

 CPU: 1 UID: 0 PID: 133 Comm: ctnetlink_poc Not tainted 7.0.0-rc2+ #3 PREEMPTLAZY
 Call Trace:
  <TASK>
  ctnetlink_exp_ct_dump_table+0x4f/0x2e0
  netlink_dump+0x333/0x880
  netlink_recvmsg+0x3e2/0x4b0
  ? aa_sk_perm+0x184/0x450
  sock_recvmsg+0xde/0xf0

 Allocated by task 133:
  kmem_cache_alloc_noprof+0x134/0x440
  __nf_conntrack_alloc+0xa8/0x2b0
  ctnetlink_create_conntrack+0xa1/0x900
  ctnetlink_new_conntrack+0x3cf/0x7d0
  nfnetlink_rcv_msg+0x48e/0x510
  netlink_rcv_skb+0xc9/0x1f0
  nfnetlink_rcv+0xdb/0x220
  netlink_unicast+0x3ec/0x590
  netlink_sendmsg+0x397/0x690
  __sys_sendmsg+0xf4/0x180

 Freed by task 0:
  slab_free_after_rcu_debug+0xad/0x1e0
  rcu_core+0x5c3/0x9c0

Fixes: e844a928431f ("netfilter: ctnetlink: allow to dump expectation per master conntrack")
Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
5 weeks agompls: add missing unregister_netdevice_notifier to mpls_init
Sabrina Dubroca [Wed, 11 Mar 2026 22:35:09 +0000 (23:35 +0100)] 
mpls: add missing unregister_netdevice_notifier to mpls_init

If mpls_init() fails after registering mpls_dev_notifier, it never
gets removed. Add the missing unregister_netdevice_notifier() call to
the error handling path.

Fixes: 5be2062e3080 ("mpls: Handle error of rtnl_register_module().")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/7c55363c4f743d19e2306204a134407c90a69bbb.1773228081.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoip_tunnel: adapt iptunnel_xmit_stats() to NETDEV_PCPU_STAT_DSTATS
Eric Dumazet [Wed, 11 Mar 2026 12:31:10 +0000 (12:31 +0000)] 
ip_tunnel: adapt iptunnel_xmit_stats() to NETDEV_PCPU_STAT_DSTATS

Blamed commits forgot that vxlan/geneve use udp_tunnel[6]_xmit_skb() which
call iptunnel_xmit_stats().

iptunnel_xmit_stats() was assuming tunnels were only using
NETDEV_PCPU_STAT_TSTATS.

@syncp offset in pcpu_sw_netstats and pcpu_dstats is different.

32bit kernels would either have corruptions or freezes if the syncp
sequence was overwritten.

This patch also moves pcpu_stat_type closer to dev->{t,d}stats to avoid
a potential cache line miss since iptunnel_xmit_stats() needs to read it.

Fixes: 6fa6de302246 ("geneve: Handle stats using NETDEV_PCPU_STAT_DSTATS.")
Fixes: be226352e8dc ("vxlan: Handle stats using NETDEV_PCPU_STAT_DSTATS.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/20260311123110.1471930-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet/rose: fix NULL pointer dereference in rose_transmit_link on reconnect
Jiayuan Chen [Wed, 11 Mar 2026 07:06:02 +0000 (15:06 +0800)] 
net/rose: fix NULL pointer dereference in rose_transmit_link on reconnect

syzkaller reported a bug [1], and the reproducer is available at [2].

ROSE sockets use four sk->sk_state values: TCP_CLOSE, TCP_LISTEN,
TCP_SYN_SENT, and TCP_ESTABLISHED. rose_connect() already rejects
calls for TCP_ESTABLISHED (-EISCONN) and TCP_CLOSE with SS_CONNECTING
(-ECONNREFUSED), but lacks a check for TCP_SYN_SENT.

When rose_connect() is called a second time while the first connection
attempt is still in progress (TCP_SYN_SENT), it overwrites
rose->neighbour via rose_get_neigh(). If that returns NULL, the socket
is left with rose->state == ROSE_STATE_1 but rose->neighbour == NULL.
When the socket is subsequently closed, rose_release() sees
ROSE_STATE_1 and calls rose_write_internal() ->
rose_transmit_link(skb, NULL), causing a NULL pointer dereference.

Per connect(2), a second connect() while a connection is already in
progress should return -EALREADY. Add this missing check for
TCP_SYN_SENT to complete the state validation in rose_connect().

[1] https://syzkaller.appspot.com/bug?extid=d00f90e0af54102fb271
[2] https://gist.github.com/mrpre/9e6779e0d13e2c66779b1653fef80516

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+d00f90e0af54102fb271@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/69694d6f.050a0220.58bed.0027.GAE@google.com/T/
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260311070611.76913-1-jiayuan.chen@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agobridge: cfm: Fix race condition in peer_mep deletion
Hyunwoo Kim [Tue, 10 Mar 2026 18:18:09 +0000 (03:18 +0900)] 
bridge: cfm: Fix race condition in peer_mep deletion

When a peer MEP is being deleted, cancel_delayed_work_sync() is called
on ccm_rx_dwork before freeing. However, br_cfm_frame_rx() runs in
softirq context under rcu_read_lock (without RTNL) and can re-schedule
ccm_rx_dwork via ccm_rx_timer_start() between cancel_delayed_work_sync()
returning and kfree_rcu() being called.

The following is a simple race scenario:

           cpu0                                     cpu1

mep_delete_implementation()
  cancel_delayed_work_sync(ccm_rx_dwork);
                                           br_cfm_frame_rx()
                                             // peer_mep still in hlist
                                             if (peer_mep->ccm_defect)
                                               ccm_rx_timer_start()
                                                 queue_delayed_work(ccm_rx_dwork)
  hlist_del_rcu(&peer_mep->head);
  kfree_rcu(peer_mep, rcu);
                                           ccm_rx_work_expired()
                                             // on freed peer_mep

To prevent this, cancel_delayed_work_sync() is replaced with
disable_delayed_work_sync() in both peer MEP deletion paths, so
that subsequent queue_delayed_work() calls from br_cfm_frame_rx()
are silently rejected.

The cc_peer_disable() helper retains cancel_delayed_work_sync()
because it is also used for the CC enable/disable toggle path where
the work must remain re-schedulable.

Fixes: dc32cbb3dbd7 ("bridge: cfm: Kernel space implementation of CFM. CCM frame RX added.")
Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/abBgYT5K_FI9rD1a@v4bel
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoaf_unix: Give up GC if MSG_PEEK intervened.
Kuniyuki Iwashima [Wed, 11 Mar 2026 05:40:40 +0000 (05:40 +0000)] 
af_unix: Give up GC if MSG_PEEK intervened.

Igor Ushakov reported that GC purged the receive queue of
an alive socket due to a race with MSG_PEEK with a nice repro.

This is the exact same issue previously fixed by commit
cbcf01128d0a ("af_unix: fix garbage collect vs MSG_PEEK").

After GC was replaced with the current algorithm, the cited
commit removed the locking dance in unix_peek_fds() and
reintroduced the same issue.

The problem is that MSG_PEEK bumps a file refcount without
interacting with GC.

Consider an SCC containing sk-A and sk-B, where sk-A is
close()d but can be recv()ed via sk-B.

The bad thing happens if sk-A is recv()ed with MSG_PEEK from
sk-B and sk-B is close()d while GC is checking unix_vertex_dead()
for sk-A and sk-B.

  GC thread                    User thread
  ---------                    -----------
  unix_vertex_dead(sk-A)
  -> true   <------.
                    \
                     `------   recv(sk-B, MSG_PEEK)
              invalidate !!    -> sk-A's file refcount : 1 -> 2

                               close(sk-B)
                               -> sk-B's file refcount : 2 -> 1
  unix_vertex_dead(sk-B)
  -> true

Initially, sk-A's file refcount is 1 by the inflight fd in sk-B
recvq.  GC thinks sk-A is dead because the file refcount is the
same as the number of its inflight fds.

However, sk-A's file refcount is bumped silently by MSG_PEEK,
which invalidates the previous evaluation.

At this moment, sk-B's file refcount is 2; one by the open fd,
and one by the inflight fd in sk-A.  The subsequent close()
releases one refcount by the former.

Finally, GC incorrectly concludes that both sk-A and sk-B are dead.

One option is to restore the locking dance in unix_peek_fds(),
but we can resolve this more elegantly thanks to the new algorithm.

The point is that the issue does not occur without the subsequent
close() and we actually do not need to synchronise MSG_PEEK with
the dead SCC detection.

When the issue occurs, close() and GC touch the same file refcount.
If GC sees the refcount being decremented by close(), it can just
give up garbage-collecting the SCC.

Therefore, we only need to signal the race during MSG_PEEK with
a proper memory barrier to make it visible to the GC.

Let's use seqcount_t to notify GC when MSG_PEEK occurs and let
it defer the SCC to the next run.

This way no locking is needed on the MSG_PEEK side, and we can
avoid imposing a penalty on every MSG_PEEK unnecessarily.

Note that we can retry within unix_scc_dead() if MSG_PEEK is
detected, but we do not do so to avoid hung task splat from
abusive MSG_PEEK calls.

Fixes: 118f457da9ed ("af_unix: Remove lock dance in unix_peek_fds().")
Reported-by: Igor Ushakov <sysroot314@gmail.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260311054043.1231316-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoBluetooth: qca: fix ROM version reading on WCN3998 chips
Dmitry Baryshkov [Tue, 10 Mar 2026 23:02:57 +0000 (01:02 +0200)] 
Bluetooth: qca: fix ROM version reading on WCN3998 chips

WCN3998 uses a bit different format for rom version:

[    5.479978] Bluetooth: hci0: setting up wcn399x
[    5.633763] Bluetooth: hci0: QCA Product ID   :0x0000000a
[    5.645350] Bluetooth: hci0: QCA SOC Version  :0x40010224
[    5.650906] Bluetooth: hci0: QCA ROM Version  :0x00001001
[    5.665173] Bluetooth: hci0: QCA Patch Version:0x00006699
[    5.679356] Bluetooth: hci0: QCA controller version 0x02241001
[    5.691109] Bluetooth: hci0: QCA Downloading qca/crbtfw21.tlv
[    6.680102] Bluetooth: hci0: QCA Downloading qca/crnv21.bin
[    6.842948] Bluetooth: hci0: QCA setup on UART is completed

Fixes: 523760b7ff88 ("Bluetooth: hci_qca: Added support for WCN3998")
Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
6 weeks agoBluetooth: L2CAP: Validate L2CAP_INFO_RSP payload length before access
Lukas Johannes Möller [Tue, 10 Mar 2026 21:59:47 +0000 (21:59 +0000)] 
Bluetooth: L2CAP: Validate L2CAP_INFO_RSP payload length before access

l2cap_information_rsp() checks that cmd_len covers the fixed
l2cap_info_rsp header (type + result, 4 bytes) but then reads
rsp->data without verifying that the payload is present:

 - L2CAP_IT_FEAT_MASK calls get_unaligned_le32(rsp->data), which reads
   4 bytes past the header (needs cmd_len >= 8).

 - L2CAP_IT_FIXED_CHAN reads rsp->data[0], 1 byte past the header
   (needs cmd_len >= 5).

A truncated L2CAP_INFO_RSP with result == L2CAP_IR_SUCCESS triggers an
out-of-bounds read of adjacent skb data.

Guard each data access with the required payload length check.  If the
payload is too short, skip the read and let the state machine complete
with safe defaults (feat_mask and remote_fixed_chan remain zero from
kzalloc), so the info timer cleanup and l2cap_conn_start() still run
and the connection is not stalled.

Fixes: 4e8402a3f884 ("[Bluetooth] Retrieve L2CAP features mask on connection setup")
Cc: stable@vger.kernel.org
Signed-off-by: Lukas Johannes Möller <research@johannes-moeller.dev>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
6 weeks agoBluetooth: L2CAP: Fix type confusion in l2cap_ecred_reconf_rsp()
Lukas Johannes Möller [Tue, 10 Mar 2026 21:59:46 +0000 (21:59 +0000)] 
Bluetooth: L2CAP: Fix type confusion in l2cap_ecred_reconf_rsp()

l2cap_ecred_reconf_rsp() casts the incoming data to struct
l2cap_ecred_conn_rsp (the ECRED *connection* response, 8 bytes with
result at offset 6) instead of struct l2cap_ecred_reconf_rsp (2 bytes
with result at offset 0).

This causes two problems:

 - The sizeof(*rsp) length check requires 8 bytes instead of the
   correct 2, so valid L2CAP_ECRED_RECONF_RSP packets are rejected
   with -EPROTO.

 - rsp->result reads from offset 6 instead of offset 0, returning
   wrong data when the packet is large enough to pass the check.

Fix by using the correct type.  Also pass the already byte-swapped
result variable to BT_DBG instead of the raw __le16 field.

Fixes: 15f02b910562 ("Bluetooth: L2CAP: Add initial code for Enhanced Credit Based Mode")
Cc: stable@vger.kernel.org
Signed-off-by: Lukas Johannes Möller <research@johannes-moeller.dev>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
6 weeks agoBluetooth: L2CAP: Fix accepting multiple L2CAP_ECRED_CONN_REQ
Luiz Augusto von Dentz [Tue, 3 Mar 2026 18:29:53 +0000 (13:29 -0500)] 
Bluetooth: L2CAP: Fix accepting multiple L2CAP_ECRED_CONN_REQ

Currently the code attempts to accept requests regardless of the
command identifier which may cause multiple requests to be marked
as pending (FLAG_DEFER_SETUP) which can cause more than
L2CAP_ECRED_MAX_CID(5) to be allocated in l2cap_ecred_rsp_defer
causing an overflow.

The spec is quite clear that the same identifier shall not be used on
subsequent requests:

'Within each signaling channel a different Identifier shall be used
for each successive request or indication.'
https://www.bluetooth.com/wp-content/uploads/Files/Specification/HTML/Core-62/out/en/host/logical-link-control-and-adaptation-protocol-specification.html#UUID-32a25a06-4aa4-c6c7-77c5-dcfe3682355d

So this attempts to check if there are any channels pending with the
same identifier and rejects if any are found.

Fixes: 15f02b910562 ("Bluetooth: L2CAP: Add initial code for Enhanced Credit Based Mode")
Reported-by: Yiming Qian <yimingqian591@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
6 weeks agoBluetooth: L2CAP: Fix use-after-free in l2cap_unregister_user
Shaurya Rane [Thu, 6 Nov 2025 18:20:16 +0000 (23:50 +0530)] 
Bluetooth: L2CAP: Fix use-after-free in l2cap_unregister_user

After commit ab4eedb790ca ("Bluetooth: L2CAP: Fix corrupted list in
hci_chan_del"), l2cap_conn_del() uses conn->lock to protect access to
conn->users. However, l2cap_register_user() and l2cap_unregister_user()
don't use conn->lock, creating a race condition where these functions can
access conn->users and conn->hchan concurrently with l2cap_conn_del().

This can lead to use-after-free and list corruption bugs, as reported
by syzbot.

Fix this by changing l2cap_register_user() and l2cap_unregister_user()
to use conn->lock instead of hci_dev_lock(), ensuring consistent locking
for the l2cap_conn structure.

Reported-by: syzbot+14b6d57fb728e27ce23c@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=14b6d57fb728e27ce23c
Fixes: ab4eedb790ca ("Bluetooth: L2CAP: Fix corrupted list in hci_chan_del")
Signed-off-by: Shaurya Rane <ssrane_b23@ee.vjti.ac.in>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
6 weeks agoBluetooth: HIDP: Fix possible UAF
Luiz Augusto von Dentz [Thu, 5 Mar 2026 15:17:47 +0000 (10:17 -0500)] 
Bluetooth: HIDP: Fix possible UAF

This fixes the following trace caused by not dropping l2cap_conn
reference when user->remove callback is called:

[   97.809249] l2cap_conn_free: freeing conn ffff88810a171c00
[   97.809907] CPU: 1 UID: 0 PID: 1419 Comm: repro_standalon Not tainted 7.0.0-rc1-dirty #14 PREEMPT(lazy)
[   97.809935] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-debian-1.17.0-1 04/01/2014
[   97.809947] Call Trace:
[   97.809954]  <TASK>
[   97.809961]  dump_stack_lvl (lib/dump_stack.c:122)
[   97.809990]  l2cap_conn_free (net/bluetooth/l2cap_core.c:1808)
[   97.810017]  l2cap_conn_del (./include/linux/kref.h:66 net/bluetooth/l2cap_core.c:1821 net/bluetooth/l2cap_core.c:1798)
[   97.810055]  l2cap_disconn_cfm (net/bluetooth/l2cap_core.c:7347 (discriminator 1) net/bluetooth/l2cap_core.c:7340 (discriminator 1))
[   97.810086]  ? __pfx_l2cap_disconn_cfm (net/bluetooth/l2cap_core.c:7341)
[   97.810117]  hci_conn_hash_flush (./include/net/bluetooth/hci_core.h:2152 (discriminator 2) net/bluetooth/hci_conn.c:2644 (discriminator 2))
[   97.810148]  hci_dev_close_sync (net/bluetooth/hci_sync.c:5360)
[   97.810180]  ? __pfx_hci_dev_close_sync (net/bluetooth/hci_sync.c:5285)
[   97.810212]  ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221)
[   97.810242]  ? up_write (./arch/x86/include/asm/atomic64_64.h:87 (discriminator 5) ./include/linux/atomic/atomic-arch-fallback.h:2852 (discriminator 5) ./include/linux/atomic/atomic-long.h:268 (discriminator 5) ./include/linux/atomic/atomic-instrumented.h:3391 (discriminator 5) kernel/locking/rwsem.c:1385 (discriminator 5) kernel/locking/rwsem.c:1643 (discriminator 5))
[   97.810267]  ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221)
[   97.810290]  ? rcu_is_watching (./arch/x86/include/asm/atomic.h:23 ./include/linux/atomic/atomic-arch-fallback.h:457 ./include/linux/context_tracking.h:128 kernel/rcu/tree.c:752)
[   97.810320]  hci_unregister_dev (net/bluetooth/hci_core.c:504 net/bluetooth/hci_core.c:2716)
[   97.810346]  vhci_release (drivers/bluetooth/hci_vhci.c:691)
[   97.810375]  ? __pfx_vhci_release (drivers/bluetooth/hci_vhci.c:678)
[   97.810404]  __fput (fs/file_table.c:470)
[   97.810430]  task_work_run (kernel/task_work.c:235)
[   97.810451]  ? __pfx_task_work_run (kernel/task_work.c:201)
[   97.810472]  ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221)
[   97.810495]  ? do_raw_spin_unlock (./include/asm-generic/qspinlock.h:128 (discriminator 5) kernel/locking/spinlock_debug.c:142 (discriminator 5))
[   97.810527]  do_exit (kernel/exit.c:972)
[   97.810547]  ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221)
[   97.810574]  ? __pfx_do_exit (kernel/exit.c:897)
[   97.810594]  ? lock_acquire (kernel/locking/lockdep.c:470 (discriminator 6) kernel/locking/lockdep.c:5870 (discriminator 6) kernel/locking/lockdep.c:5825 (discriminator 6))
[   97.810616]  ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221)
[   97.810639]  ? do_raw_spin_lock (kernel/locking/spinlock_debug.c:95 (discriminator 4) kernel/locking/spinlock_debug.c:118 (discriminator 4))
[   97.810664]  ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221)
[   97.810688]  ? find_held_lock (kernel/locking/lockdep.c:5350 (discriminator 1))
[   97.810721]  do_group_exit (kernel/exit.c:1093)
[   97.810745]  get_signal (kernel/signal.c:3007 (discriminator 1))
[   97.810772]  ? security_file_permission (./arch/x86/include/asm/jump_label.h:37 security/security.c:2366)
[   97.810803]  ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221)
[   97.810826]  ? vfs_read (fs/read_write.c:555)
[   97.810854]  ? __pfx_get_signal (kernel/signal.c:2800)
[   97.810880]  ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221)
[   97.810905]  ? __pfx_vfs_read (fs/read_write.c:555)
[   97.810932]  ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221)
[   97.810960]  arch_do_signal_or_restart (arch/x86/kernel/signal.c:337 (discriminator 1))
[   97.810990]  ? __pfx_arch_do_signal_or_restart (arch/x86/kernel/signal.c:334)
[   97.811021]  ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221)
[   97.811055]  ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221)
[   97.811078]  ? ksys_read (fs/read_write.c:707)
[   97.811106]  ? __pfx_ksys_read (fs/read_write.c:707)
[   97.811137]  exit_to_user_mode_loop (kernel/entry/common.c:66 kernel/entry/common.c:98)
[   97.811169]  ? rcu_is_watching (./arch/x86/include/asm/atomic.h:23 ./include/linux/atomic/atomic-arch-fallback.h:457 ./include/linux/context_tracking.h:128 kernel/rcu/tree.c:752)
[   97.811192]  ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221)
[   97.811215]  ? trace_hardirqs_off (./include/trace/events/preemptirq.h:36 (discriminator 33) kernel/trace/trace_preemptirq.c:95 (discriminator 33) kernel/trace/trace_preemptirq.c:90 (discriminator 33))
[   97.811240]  do_syscall_64 (./include/linux/irq-entry-common.h:226 ./include/linux/irq-entry-common.h:256 ./include/linux/entry-common.h:325 arch/x86/entry/syscall_64.c:100)
[   97.811268]  ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221)
[   97.811292]  ? exc_page_fault (arch/x86/mm/fault.c:1480 (discriminator 3) arch/x86/mm/fault.c:1527 (discriminator 3))
[   97.811318]  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
[   97.811338] RIP: 0033:0x445cfe
[   97.811352] Code: Unable to access opcode bytes at 0x445cd4.

Code starting with the faulting instruction
===========================================
[   97.811360] RSP: 002b:00007f65c41c6dc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[   97.811378] RAX: fffffffffffffe00 RBX: 00007f65c41c76c0 RCX: 0000000000445cfe
[   97.811391] RDX: 0000000000000400 RSI: 00007f65c41c6e40 RDI: 0000000000000004
[   97.811403] RBP: 00007f65c41c7250 R08: 0000000000000000 R09: 0000000000000000
[   97.811415] R10: 0000000000000000 R11: 0000000000000246 R12: ffffffffffffffe8
[   97.811428] R13: 0000000000000000 R14: 00007fff780a8c00 R15: 00007f65c41c76c0
[   97.811453]  </TASK>
[   98.402453] ==================================================================
[   98.403560] BUG: KASAN: use-after-free in __mutex_lock (kernel/locking/mutex.c:199 kernel/locking/mutex.c:694 kernel/locking/mutex.c:776)
[   98.404541] Read of size 8 at addr ffff888113ee40a8 by task khidpd_00050004/1430
[   98.405361]
[   98.405563] CPU: 1 UID: 0 PID: 1430 Comm: khidpd_00050004 Not tainted 7.0.0-rc1-dirty #14 PREEMPT(lazy)
[   98.405588] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-debian-1.17.0-1 04/01/2014
[   98.405600] Call Trace:
[   98.405607]  <TASK>
[   98.405614]  dump_stack_lvl (lib/dump_stack.c:122)
[   98.405641]  print_report (mm/kasan/report.c:379 mm/kasan/report.c:482)
[   98.405667]  ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221)
[   98.405691]  ? __virt_addr_valid (arch/x86/mm/physaddr.c:55)
[   98.405724]  ? __mutex_lock (kernel/locking/mutex.c:199 kernel/locking/mutex.c:694 kernel/locking/mutex.c:776)
[   98.405748]  kasan_report (mm/kasan/report.c:221 mm/kasan/report.c:597)
[   98.405778]  ? __mutex_lock (kernel/locking/mutex.c:199 kernel/locking/mutex.c:694 kernel/locking/mutex.c:776)
[   98.405807]  __mutex_lock (kernel/locking/mutex.c:199 kernel/locking/mutex.c:694 kernel/locking/mutex.c:776)
[   98.405832]  ? do_raw_spin_lock (kernel/locking/spinlock_debug.c:95 (discriminator 4) kernel/locking/spinlock_debug.c:118 (discriminator 4))
[   98.405859]  ? l2cap_unregister_user (./include/linux/list.h:381 (discriminator 2) net/bluetooth/l2cap_core.c:1723 (discriminator 2))
[   98.405888]  ? __pfx_do_raw_spin_lock (kernel/locking/spinlock_debug.c:114)
[   98.405915]  ? __pfx___mutex_lock (kernel/locking/mutex.c:775)
[   98.405939]  ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221)
[   98.405963]  ? lock_acquire (kernel/locking/lockdep.c:470 (discriminator 6) kernel/locking/lockdep.c:5870 (discriminator 6) kernel/locking/lockdep.c:5825 (discriminator 6))
[   98.405984]  ? find_held_lock (kernel/locking/lockdep.c:5350 (discriminator 1))
[   98.406015]  ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221)
[   98.406038]  ? lock_release (kernel/locking/lockdep.c:5536 kernel/locking/lockdep.c:5889 kernel/locking/lockdep.c:5875)
[   98.406061]  ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221)
[   98.406085]  ? _raw_spin_unlock_irqrestore (./arch/x86/include/asm/irqflags.h:42 ./arch/x86/include/asm/irqflags.h:119 ./arch/x86/include/asm/irqflags.h:159 ./include/linux/spinlock_api_smp.h:178 kernel/locking/spinlock.c:194)
[   98.406107]  ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221)
[   98.406130]  ? __timer_delete_sync (kernel/time/timer.c:1592)
[   98.406158]  ? l2cap_unregister_user (./include/linux/list.h:381 (discriminator 2) net/bluetooth/l2cap_core.c:1723 (discriminator 2))
[   98.406186]  ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221)
[   98.406210]  l2cap_unregister_user (./include/linux/list.h:381 (discriminator 2) net/bluetooth/l2cap_core.c:1723 (discriminator 2))
[   98.406263]  hidp_session_thread (./include/linux/instrumented.h:112 ./include/linux/atomic/atomic-instrumented.h:400 ./include/linux/refcount.h:389 ./include/linux/refcount.h:432 ./include/linux/refcount.h:450 ./include/linux/kref.h:64 net/bluetooth/hidp/core.c:996 net/bluetooth/hidp/core.c:1305)
[   98.406293]  ? __pfx_hidp_session_thread (net/bluetooth/hidp/core.c:1264)
[   98.406323]  ? kthread (kernel/kthread.c:433)
[   98.406340]  ? __pfx_hidp_session_wake_function (net/bluetooth/hidp/core.c:1251)
[   98.406370]  ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221)
[   98.406393]  ? find_held_lock (kernel/locking/lockdep.c:5350 (discriminator 1))
[   98.406424]  ? __pfx_hidp_session_wake_function (net/bluetooth/hidp/core.c:1251)
[   98.406453]  ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221)
[   98.406476]  ? trace_hardirqs_on (kernel/trace/trace_preemptirq.c:79 (discriminator 1))
[   98.406499]  ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221)
[   98.406523]  ? kthread (kernel/kthread.c:433)
[   98.406539]  ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221)
[   98.406565]  ? kthread (kernel/kthread.c:433)
[   98.406581]  ? __pfx_hidp_session_thread (net/bluetooth/hidp/core.c:1264)
[   98.406610]  kthread (kernel/kthread.c:467)
[   98.406627]  ? __pfx_kthread (kernel/kthread.c:412)
[   98.406645]  ret_from_fork (arch/x86/kernel/process.c:164)
[   98.406674]  ? __pfx_ret_from_fork (arch/x86/kernel/process.c:153)
[   98.406704]  ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221)
[   98.406728]  ? __pfx_kthread (kernel/kthread.c:412)
[   98.406747]  ret_from_fork_asm (arch/x86/entry/entry_64.S:258)
[   98.406774]  </TASK>
[   98.406780]
[   98.433693] The buggy address belongs to the physical page:
[   98.434405] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff888113ee7c40 pfn:0x113ee4
[   98.435557] flags: 0x200000000000000(node=0|zone=2)
[   98.436198] raw: 0200000000000000 ffffea0004244308 ffff8881f6f3ebc0 0000000000000000
[   98.437195] raw: ffff888113ee7c40 0000000000000000 00000000ffffffff 0000000000000000
[   98.438115] page dumped because: kasan: bad access detected
[   98.438951]
[   98.439211] Memory state around the buggy address:
[   98.439871]  ffff888113ee3f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   98.440714]  ffff888113ee4000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   98.441580] >ffff888113ee4080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   98.442458]                                   ^
[   98.443011]  ffff888113ee4100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   98.443889]  ffff888113ee4180: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   98.444768] ==================================================================
[   98.445719] Disabling lock debugging due to kernel taint
[   98.448074] l2cap_conn_free: freeing conn ffff88810c22b400
[   98.450012] CPU: 1 UID: 0 PID: 1430 Comm: khidpd_00050004 Tainted: G    B               7.0.0-rc1-dirty #14 PREEMPT(lazy)
[   98.450040] Tainted: [B]=BAD_PAGE
[   98.450047] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-debian-1.17.0-1 04/01/2014
[   98.450059] Call Trace:
[   98.450065]  <TASK>
[   98.450071]  dump_stack_lvl (lib/dump_stack.c:122)
[   98.450099]  l2cap_conn_free (net/bluetooth/l2cap_core.c:1808)
[   98.450125]  l2cap_conn_put (net/bluetooth/l2cap_core.c:1822)
[   98.450154]  session_free (net/bluetooth/hidp/core.c:990)
[   98.450181]  hidp_session_thread (net/bluetooth/hidp/core.c:1307)
[   98.450213]  ? __pfx_hidp_session_thread (net/bluetooth/hidp/core.c:1264)
[   98.450271]  ? kthread (kernel/kthread.c:433)
[   98.450293]  ? __pfx_hidp_session_wake_function (net/bluetooth/hidp/core.c:1251)
[   98.450339]  ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221)
[   98.450368]  ? find_held_lock (kernel/locking/lockdep.c:5350 (discriminator 1))
[   98.450406]  ? __pfx_hidp_session_wake_function (net/bluetooth/hidp/core.c:1251)
[   98.450442]  ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221)
[   98.450471]  ? trace_hardirqs_on (kernel/trace/trace_preemptirq.c:79 (discriminator 1))
[   98.450499]  ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221)
[   98.450528]  ? kthread (kernel/kthread.c:433)
[   98.450547]  ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221)
[   98.450578]  ? kthread (kernel/kthread.c:433)
[   98.450598]  ? __pfx_hidp_session_thread (net/bluetooth/hidp/core.c:1264)
[   98.450637]  kthread (kernel/kthread.c:467)
[   98.450657]  ? __pfx_kthread (kernel/kthread.c:412)
[   98.450680]  ret_from_fork (arch/x86/kernel/process.c:164)
[   98.450715]  ? __pfx_ret_from_fork (arch/x86/kernel/process.c:153)
[   98.450752]  ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221)
[   98.450782]  ? __pfx_kthread (kernel/kthread.c:412)
[   98.450804]  ret_from_fork_asm (arch/x86/entry/entry_64.S:258)
[   98.450836]  </TASK>

Fixes: b4f34d8d9d26 ("Bluetooth: hidp: add new session-management helpers")
Reported-by: soufiane el hachmi <kilwa10@gmail.com>
Tested-by: soufiane el hachmi <kilwa10@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
6 weeks agoBluetooth: MGMT: Fix list corruption and UAF in command complete handlers
Wang Tao [Fri, 27 Feb 2026 11:03:39 +0000 (11:03 +0000)] 
Bluetooth: MGMT: Fix list corruption and UAF in command complete handlers

Commit 302a1f674c00 ("Bluetooth: MGMT: Fix possible UAFs") introduced
mgmt_pending_valid(), which not only validates the pending command but
also unlinks it from the pending list if it is valid. This change in
semantics requires updates to several completion handlers to avoid list
corruption and memory safety issues.

This patch addresses two left-over issues from the aforementioned rework:

1. In mgmt_add_adv_patterns_monitor_complete(), mgmt_pending_remove()
is replaced with mgmt_pending_free() in the success path. Since
mgmt_pending_valid() already unlinks the command at the beginning of
the function, calling mgmt_pending_remove() leads to a double list_del()
and subsequent list corruption/kernel panic.

2. In set_mesh_complete(), the use of mgmt_pending_foreach() in the error
path is removed. Since the current command is already unlinked by
mgmt_pending_valid(), this foreach loop would incorrectly target other
pending mesh commands, potentially freeing them while they are still being
processed concurrently (leading to UAFs). The redundant mgmt_cmd_status()
is also simplified to use cmd->opcode directly.

Fixes: 302a1f674c00 ("Bluetooth: MGMT: Fix possible UAFs")
Signed-off-by: Wang Tao <wangtao554@huawei.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
6 weeks agoBluetooth: hci_sync: Fix hci_le_create_conn_sync
Michael Grzeschik [Thu, 5 Mar 2026 13:50:52 +0000 (14:50 +0100)] 
Bluetooth: hci_sync: Fix hci_le_create_conn_sync

While introducing hci_le_create_conn_sync the functionality
of hci_connect_le was ported to hci_le_create_conn_sync including
the disable of the scan before starting the connection.

When this code was run non synchronously the immediate call that was
setting the flag HCI_LE_SCAN_INTERRUPTED had an impact. Since the
completion handler for the LE_SCAN_DISABLE was not immediately called.
In the completion handler of the LE_SCAN_DISABLE event, this flag is
checked to set the state of the hdev to DISCOVERY_STOPPED.

With the synchronised approach the later setting of the
HCI_LE_SCAN_INTERRUPTED flag has not the same effect. The completion
handler would immediately fire in the LE_SCAN_DISABLE call, check for
the flag, which is then not yet set and do nothing.

To fix this issue and make the function call work as before, we move the
setting of the flag HCI_LE_SCAN_INTERRUPTED before disabling the scan.

Fixes: 8e8b92ee60de ("Bluetooth: hci_sync: Add hci_le_create_conn_sync")
Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
6 weeks agoBluetooth: ISO: Fix defer tests being unstable
Luiz Augusto von Dentz [Fri, 27 Feb 2026 20:23:01 +0000 (15:23 -0500)] 
Bluetooth: ISO: Fix defer tests being unstable

iso-tester defer tests seem to fail with hci_conn_hash_lookup_cig
being unable to resolve a cig in set_cig_params_sync due a race
where it is run immediatelly before hci_bind_cis is able to set
the QoS settings into the hci_conn object.

So this moves the assigning of the QoS settings to be done directly
by hci_le_set_cig_params to prevent that from happening again.

Fixes: 26afbd826ee3 ("Bluetooth: Add initial implementation of CIS connections")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
6 weeks agoBluetooth: SMP: make SM/PER/KDU/BI-04-C happy
Christian Eggers [Wed, 25 Feb 2026 17:07:28 +0000 (18:07 +0100)] 
Bluetooth: SMP: make SM/PER/KDU/BI-04-C happy

The last test step ("Test with Invalid public key X and Y, all set to
0") expects to get an "DHKEY check failed" instead of "unspecified".

Fixes: 6d19628f539f ("Bluetooth: SMP: Fail if remote and local public keys are identical")
Signed-off-by: Christian Eggers <ceggers@arri.de>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
6 weeks agoBluetooth: LE L2CAP: Disconnect if sum of payload sizes exceed SDU
Christian Eggers [Wed, 25 Feb 2026 17:07:27 +0000 (18:07 +0100)] 
Bluetooth: LE L2CAP: Disconnect if sum of payload sizes exceed SDU

Core 6.0, Vol 3, Part A, 3.4.3:
"... If the sum of the payload sizes for the K-frames exceeds the
specified SDU length, the receiver shall disconnect the channel."

This fixes L2CAP/LE/CFC/BV-27-C (running together with 'l2test -r -P
0x0027 -V le_public').

Fixes: aac23bf63659 ("Bluetooth: Implement LE L2CAP reassembly")
Signed-off-by: Christian Eggers <ceggers@arri.de>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
6 weeks agoBluetooth: LE L2CAP: Disconnect if received packet's SDU exceeds IMTU
Christian Eggers [Wed, 25 Feb 2026 17:07:25 +0000 (18:07 +0100)] 
Bluetooth: LE L2CAP: Disconnect if received packet's SDU exceeds IMTU

Core 6.0, Vol 3, Part A, 3.4.3:
"If the SDU length field value exceeds the receiver's MTU, the receiver
shall disconnect the channel..."

This fixes L2CAP/LE/CFC/BV-26-C (running together with 'l2test -r -P
0x0027 -V le_public -I 100').

Fixes: aac23bf63659 ("Bluetooth: Implement LE L2CAP reassembly")
Signed-off-by: Christian Eggers <ceggers@arri.de>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
6 weeks agoMerge tag 'net-7.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 12 Mar 2026 18:33:35 +0000 (11:33 -0700)] 
Merge tag 'net-7.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from CAN and netfilter.

  Current release - regressions:

   - eth: mana: Null service_wq on setup error to prevent double destroy

  Previous releases - regressions:

   - nexthop: fix percpu use-after-free in remove_nh_grp_entry

   - sched: teql: fix NULL pointer dereference in iptunnel_xmit on TEQL slave xmit

   - bpf: fix nd_tbl NULL dereference when IPv6 is disabled

   - neighbour: restore protocol != 0 check in pneigh update

   - tipc: fix divide-by-zero in tipc_sk_filter_connect()

   - eth:
      - mlx5:
         - fix crash when moving to switchdev mode
         - fix DMA FIFO desync on error CQE SQ recovery
      - iavf: fix PTP use-after-free during reset
      - bonding: fix type confusion in bond_setup_by_slave()
      - lan78xx: fix WARN in __netif_napi_del_locked on disconnect

  Previous releases - always broken:

   - core: add xmit recursion limit to tunnel xmit functions

   - net-shapers: don't free reply skb after genlmsg_reply()

   - netfilter:
      - fix stack out-of-bounds read in pipapo_drop()
      - fix OOB read in nfnl_cthelper_dump_table()

   - mctp:
      - fix device leak on probe failure
      - i2c: fix skb memory leak in receive path

   - can: keep the max bitrate error at 5%

   - eth:
      - bonding: fix nd_tbl NULL dereference when IPv6 is disabled
      - bnxt_en: fix RSS table size check when changing ethtool channels
      - amd-xgbe: prevent CRC errors during RX adaptation with AN disabled
      - octeontx2-af: devlink: fix NIX RAS reporter recovery condition"

* tag 'net-7.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (71 commits)
  net: prevent NULL deref in ip[6]tunnel_xmit()
  octeontx2-af: devlink: fix NIX RAS reporter to use RAS interrupt status
  octeontx2-af: devlink: fix NIX RAS reporter recovery condition
  net: ethernet: ti: am65-cpsw-nuss: Fix rx_filter value for PTP support
  net/mana: Null service_wq on setup error to prevent double destroy
  selftests: rtnetlink: add neighbour update test
  neighbour: restore protocol != 0 check in pneigh update
  net: dsa: realtek: Fix LED group port bit for non-zero LED group
  tipc: fix divide-by-zero in tipc_sk_filter_connect()
  net: dsa: microchip: Fix error path in PTP IRQ setup
  bpf: bpf_out_neigh_v6: Fix nd_tbl NULL dereference when IPv6 is disabled
  bpf: bpf_out_neigh_v4: Fix nd_tbl NULL dereference when IPv6 is disabled
  net: bonding: Fix nd_tbl NULL dereference when IPv6 is disabled
  ipv6: move the disable_ipv6_mod knob to core code
  net: bcmgenet: fix broken EEE by converting to phylib-managed state
  net-shapers: don't free reply skb after genlmsg_reply()
  net: dsa: mxl862xx: don't set user_mii_bus
  net: ethernet: arc: emac: quiesce interrupts before requesting IRQ
  page_pool: store detach_time as ktime_t to avoid false-negatives
  net: macb: Shuffle the tx ring before enabling tx
  ...

6 weeks agoMerge tag 'apparmor-pr-mainline-2026-03-09' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Thu, 12 Mar 2026 17:58:02 +0000 (10:58 -0700)] 
Merge tag 'apparmor-pr-mainline-2026-03-09' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor

Pull AppArmor fixes from John Johansen:
 - fix race between freeing data and fs accessing it
 - fix race on unreferenced rawdata dereference
 - fix differential encoding verification
 - fix unconfined unprivileged local user can do privileged policy management
 - Fix double free of ns_name in aa_replace_profiles()
 - fix missing bounds check on DEFAULT table in verify_dfa()
 - fix side-effect bug in match_char() macro usage
 - fix: limit the number of levels of policy namespaces
 - replace recursive profile removal with iterative approach
 - fix memory leak in verify_header
 - validate DFA start states are in bounds in unpack_pdb

* tag 'apparmor-pr-mainline-2026-03-09' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor:
  apparmor: fix race between freeing data and fs accessing it
  apparmor: fix race on rawdata dereference
  apparmor: fix differential encoding verification
  apparmor: fix unprivileged local user can do privileged policy management
  apparmor: Fix double free of ns_name in aa_replace_profiles()
  apparmor: fix missing bounds check on DEFAULT table in verify_dfa()
  apparmor: fix side-effect bug in match_char() macro usage
  apparmor: fix: limit the number of levels of policy namespaces
  apparmor: replace recursive profile removal with iterative approach
  apparmor: fix memory leak in verify_header
  apparmor: validate DFA start states are in bounds in unpack_pdb

6 weeks agonet: prevent NULL deref in ip[6]tunnel_xmit()
Eric Dumazet [Thu, 12 Mar 2026 04:39:08 +0000 (04:39 +0000)] 
net: prevent NULL deref in ip[6]tunnel_xmit()

Blamed commit missed that both functions can be called with dev == NULL.

Also add unlikely() hints for these conditions that only fuzzers can hit.

Fixes: 6f1a9140ecda ("net: add xmit recursion limit to tunnel xmit functions")
Signed-off-by: Eric Dumazet <edumazet@google.com>
CC: Weiming Shi <bestswngs@gmail.com>
Link: https://patch.msgid.link/20260312043908.2790803-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks agoocteontx2-af: devlink: fix NIX RAS reporter to use RAS interrupt status
Alok Tiwari [Tue, 10 Mar 2026 18:48:17 +0000 (11:48 -0700)] 
octeontx2-af: devlink: fix NIX RAS reporter to use RAS interrupt status

The NIX RAS health report path uses nix_af_rvu_err when handling the
NIX_AF_RVU_RAS case, so the report prints the ERR interrupt status rather
than the RAS interrupt status.

Use nix_af_rvu_ras for the NIX_AF_RVU_RAS report.

Fixes: 5ed66306eab6 ("octeontx2-af: Add devlink health reporters for NIX")
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Link: https://patch.msgid.link/20260310184824.1183651-2-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoocteontx2-af: devlink: fix NIX RAS reporter recovery condition
Alok Tiwari [Tue, 10 Mar 2026 18:48:16 +0000 (11:48 -0700)] 
octeontx2-af: devlink: fix NIX RAS reporter recovery condition

The NIX RAS health reporter recovery routine checks nix_af_rvu_int to
decide whether to re-enable NIX_AF_RAS interrupts. This is the RVU
interrupt status field and is unrelated to RAS events, so the recovery
flow may incorrectly skip re-enabling NIX_AF_RAS interrupts.

Check nix_af_rvu_ras instead before writing NIX_AF_RAS_ENA_W1S.

Fixes: 5ed66306eab6 ("octeontx2-af: Add devlink health reporters for NIX")
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Link: https://patch.msgid.link/20260310184824.1183651-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: ethernet: ti: am65-cpsw-nuss: Fix rx_filter value for PTP support
Chintan Vankar [Tue, 10 Mar 2026 16:09:40 +0000 (21:39 +0530)] 
net: ethernet: ti: am65-cpsw-nuss: Fix rx_filter value for PTP support

The "rx_filter" member of "hwtstamp_config" structure is an enum field and
does not support bitwise OR combination of multiple filter values. It
causes error while linuxptp application tries to match rx filter version.
Fix this by storing the requested filter type in a new port field.

Fixes: 97248adb5a3b ("net: ti: am65-cpsw: Update hw timestamping filter for PTPv1 RX packets")
Signed-off-by: Chintan Vankar <c-vankar@ti.com>
Link: https://patch.msgid.link/20260310160940.109822-1-c-vankar@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet/mana: Null service_wq on setup error to prevent double destroy
Shiraz Saleem [Mon, 9 Mar 2026 17:24:43 +0000 (10:24 -0700)] 
net/mana: Null service_wq on setup error to prevent double destroy

In mana_gd_setup() error path, set gc->service_wq to NULL after
destroy_workqueue() to match the cleanup in mana_gd_cleanup().
This prevents a use-after-free if the workqueue pointer is checked
after a failed setup.

Fixes: f975a0955276 ("net: mana: Fix double destroy_workqueue on service rescan PCI path")
Signed-off-by: Shiraz Saleem <shirazsaleem@microsoft.com>
Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260309172443.688392-1-kotaranov@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoMerge tag 'nf-26-03-10' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Jakub Kicinski [Thu, 12 Mar 2026 02:12:59 +0000 (19:12 -0700)] 
Merge tag 'nf-26-03-10' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Florian Westphal says:

====================
netfilter: updates for net

Due to large volume of backlogged patches its unlikely I will make the
2nd planned PR this week, so several legit fixes will be pushed back
to next week.  Sorry for the inconvenience but I am out of ideas and
alternatives.

1) syzbot managed to add/remove devices to a flowtable, due to a bug in
   the flowtable netdevice notifier this gets us a double-add and
   eventually UaF when device is removed again (we only expect one
   entry, duplicate remains past net_device end-of-life).
   From Phil Sutter, bug added in 6.16.

2) Yiming Qian reports another nf_tables transaction handling bug:
   in some cases error unwind misses to undo certain set elements,
   resulting in refcount underflow and use-after-free, bug added in 6.4.

3) Jenny Guanni Qu found out-of-bounds read in pipapo set type.
   While the value is never used, it still rightfully triggers KASAN
   splats.  Bug exists since this set type was added in 5.6.

4) a few x_tables modules contain copypastry tcp option parsing code which
    can read 1 byte past the option area.  This bug is ancient, fix from
    David Dull.

5) nfnetlink_queue leaks kernel memory if userspace provides bad
   NFQA_VLAN/NFQA_L2HDR attributes.  From Hyunwoo Kim, bug stems from
   from 4.7 days.

6) nfnetlink_cthelper has incorrect loop restart logic which may result
   in reading one pointer past end of array. From 3.6 days, fix also from
   Hyunwoo Kim.

7) xt_IDLETIMER v0 extension must reject working with timers added
   by revision v1, else we get list corruption. Bug added in v5.7.
   From Yifan Wu, Juefei Pu and Yuan Tan via Xin Lu.

* tag 'nf-26-03-10' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: xt_IDLETIMER: reject rev0 reuse of ALARM timer labels
  netfilter: nfnetlink_cthelper: fix OOB read in nfnl_cthelper_dump_table()
  netfilter: nfnetlink_queue: fix entry leak in bridge verdict error path
  netfilter: x_tables: guard option walkers against 1-byte tail reads
  netfilter: nft_set_pipapo: fix stack out-of-bounds read in pipapo_drop()
  netfilter: nf_tables: always walk all pending catchall elements
  netfilter: nf_tables: Fix for duplicate device in netdev hooks
====================

Link: https://patch.msgid.link/20260310132050.630-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net...
Jakub Kicinski [Thu, 12 Mar 2026 02:08:15 +0000 (19:08 -0700)] 
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2026-03-10 (ice, iavf, i40e, e1000e, e1000)

Nikolay Aleksandrov changes return code of RDMA related ice devlink get
parameters when irdma is not enabled to -EOPNOTSUPP as current return
of -ENODEV causes issues with devlink output.

Petr Oros resolves a couple of issues in iavf; freeing PTP resources
before reset and disable. Fixing contention issues with the netdev lock
between reset and some ethtool operations.

Alok Tiwari corrects an incorrect comparison of cloud filter values and
adjust some passed arguments to sizeof() for consistency on i40e.

Matt Vollrath removes an incorrect decrement for DMA error on e1000 and
e1000e drivers.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  e1000/e1000e: Fix leak in DMA error cleanup
  i40e: fix src IP mask checks and memcpy argument names in cloud filter
  iavf: fix incorrect reset handling in callbacks
  iavf: fix PTP use-after-free during reset
  drivers: net: ice: fix devlink parameters get without irdma
====================

Link: https://patch.msgid.link/20260310205654.4109072-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoMerge branch 'neighbour-fix-update-of-proxy-neighbour'
Jakub Kicinski [Thu, 12 Mar 2026 02:04:58 +0000 (19:04 -0700)] 
Merge branch 'neighbour-fix-update-of-proxy-neighbour'

Sabrina Dubroca says:

====================
neighbour: fix update of proxy neighbour

While re-reading some "old" patches I ran into a small change of
behavior in commit dc2a27e524ac ("neighbour: Update pneigh_entry in
pneigh_create().").

The old behavior was not consistent between ->protocol and ->flags,
and didn't offer a way to clear protocol, so maybe it's better to
change that (7-years-old [1]) behavior. But then we should change
non-proxy neighbours as well to keep neigh/pneigh consistent.

[1] df9b0e30d44c ("neighbor: Add protocol attribute")
====================

Link: https://patch.msgid.link/cover.1772894876.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoselftests: rtnetlink: add neighbour update test
Sabrina Dubroca [Tue, 10 Mar 2026 21:59:17 +0000 (22:59 +0100)] 
selftests: rtnetlink: add neighbour update test

Check that protocol and flags are updated correctly for
neighbour and pneigh entries.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/d28f72b5b4ff4c9ecbbbde06146a938dcc4c264a.1772894876.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoneighbour: restore protocol != 0 check in pneigh update
Sabrina Dubroca [Tue, 10 Mar 2026 21:59:16 +0000 (22:59 +0100)] 
neighbour: restore protocol != 0 check in pneigh update

Prior to commit dc2a27e524ac ("neighbour: Update pneigh_entry in
pneigh_create()."), a pneigh's protocol was updated only when the
value of the NDA_PROTOCOL attribute was non-0. While moving the code,
that check was removed. This is a small change of user-visible
behavior, and inconsistent with the (non-proxy) neighbour behavior.

Fixes: dc2a27e524ac ("neighbour: Update pneigh_entry in pneigh_create().")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/38c61de1bb032871a886aff9b9b52fe1cdd4cada.1772894876.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: dsa: realtek: Fix LED group port bit for non-zero LED group
Marek Behún [Wed, 11 Mar 2026 11:12:37 +0000 (12:12 +0100)] 
net: dsa: realtek: Fix LED group port bit for non-zero LED group

The rtl8366rb_led_group_port_mask() function always returns LED port
bit in LED group 0; the switch statement returns the same thing in all
non-default cases.

This means that the driver does not currently support configuring LEDs
in non-zero LED groups.

Fix this.

Fixes: 32d617005475a71e ("net: dsa: realtek: add LED drivers for rtl8366rb")
Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260311111237.29002-1-kabel@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agotipc: fix divide-by-zero in tipc_sk_filter_connect()
Mehul Rao [Tue, 10 Mar 2026 17:07:30 +0000 (13:07 -0400)] 
tipc: fix divide-by-zero in tipc_sk_filter_connect()

A user can set conn_timeout to any value via
setsockopt(TIPC_CONN_TIMEOUT), including values less than 4.  When a
SYN is rejected with TIPC_ERR_OVERLOAD and the retry path in
tipc_sk_filter_connect() executes:

    delay %= (tsk->conn_timeout / 4);

If conn_timeout is in the range [0, 3], the integer division yields 0,
and the modulo operation triggers a divide-by-zero exception, causing a
kernel oops/panic.

Fix this by clamping conn_timeout to a minimum of 4 at the point of use
in tipc_sk_filter_connect().

Oops: divide error: 0000 [#1] SMP KASAN NOPTI
CPU: 0 UID: 0 PID: 119 Comm: poc-F144 Not tainted 7.0.0-rc2+
RIP: 0010:tipc_sk_filter_rcv (net/tipc/socket.c:2236 net/tipc/socket.c:2362)
Call Trace:
 tipc_sk_backlog_rcv (include/linux/instrumented.h:82 include/linux/atomic/atomic-instrumented.h:32 include/net/sock.h:2357 net/tipc/socket.c:2406)
 __release_sock (include/net/sock.h:1185 net/core/sock.c:3213)
 release_sock (net/core/sock.c:3797)
 tipc_connect (net/tipc/socket.c:2570)
 __sys_connect (include/linux/file.h:62 include/linux/file.h:83 net/socket.c:2098)

Fixes: 6787927475e5 ("tipc: buffer overflow handling in listener socket")
Cc: stable@vger.kernel.org
Signed-off-by: Mehul Rao <mehulrao@gmail.com>
Reviewed-by: Tung Nguyen <tung.quang.nguyen@est.tech>
Link: https://patch.msgid.link/20260310170730.28841-1-mehulrao@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: dsa: microchip: Fix error path in PTP IRQ setup
Bastien Curutchet (Schneider Electric) [Mon, 9 Mar 2026 13:15:43 +0000 (14:15 +0100)] 
net: dsa: microchip: Fix error path in PTP IRQ setup

If request_threaded_irq() fails during the PTP message IRQ setup, the
newly created IRQ mapping is never disposed. Indeed, the
ksz_ptp_irq_setup()'s error path only frees the mappings that were
successfully set up.

Dispose the newly created mapping if the associated
request_threaded_irq() fails at setup.

Cc: stable@vger.kernel.org
Fixes: d0b8fec8ae505 ("net: dsa: microchip: Fix symetry in ksz_ptp_msg_irq_{setup/free}()")
Signed-off-by: Bastien Curutchet (Schneider Electric) <bastien.curutchet@bootlin.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://patch.msgid.link/20260309-ksz-ptp-irq-fix-v1-1-757b3b985955@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoMerge branch 'net-bpf-nd_tbl-fixes-for-when-ipv6-disable-1'
Jakub Kicinski [Thu, 12 Mar 2026 00:53:40 +0000 (17:53 -0700)] 
Merge branch 'net-bpf-nd_tbl-fixes-for-when-ipv6-disable-1'

Ricardo B. Marlière says:

====================
{net,bpf}: nd_tbl fixes for when ipv6.disable=1

Please consider merging these four patches to fix three crashes that were
found after this report:

https://lore.kernel.org/all/CAHXs0ORzd62QOG-Fttqa2Cx_A_VFp=utE2H2VTX5nqfgs7LDxQ@mail.gmail.com

The first patch from Jakub Kicinski is a preparation in order to enable
the use ipv6_mod_enabled() even when CONFIG_IPV6=n.
====================

Link: https://patch.msgid.link/20260307-net-nd_tbl_fixes-v4-0-e2677e85628c@suse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agobpf: bpf_out_neigh_v6: Fix nd_tbl NULL dereference when IPv6 is disabled
Ricardo B. Marlière [Sat, 7 Mar 2026 20:50:56 +0000 (17:50 -0300)] 
bpf: bpf_out_neigh_v6: Fix nd_tbl NULL dereference when IPv6 is disabled

When booting with the 'ipv6.disable=1' parameter, the nd_tbl is never
initialized because inet6_init() exits before ndisc_init() is called which
initializes it. If bpf_redirect_neigh() is called with explicit AF_INET6
nexthop parameters, __bpf_redirect_neigh_v6() can skip the IPv6 FIB lookup
and call bpf_out_neigh_v6() directly. bpf_out_neigh_v6() then calls
ip_neigh_gw6(), which uses ipv6_stub->nd_tbl.

 BUG: kernel NULL pointer dereference, address: 0000000000000248
 Oops: Oops: 0000 [#1] SMP NOPTI
 RIP: 0010:skb_do_redirect+0x44f/0xf40
 Call Trace:
  <TASK>
  ? srso_alias_return_thunk+0x5/0xfbef5
  ? __tcf_classify.constprop.0+0x83/0x160
  ? srso_alias_return_thunk+0x5/0xfbef5
  ? tcf_classify+0x2b/0x50
  ? srso_alias_return_thunk+0x5/0xfbef5
  ? tc_run+0xb8/0x120
  ? srso_alias_return_thunk+0x5/0xfbef5
  __dev_queue_xmit+0x6fa/0x1000
  ? srso_alias_return_thunk+0x5/0xfbef5
  packet_sendmsg+0x10da/0x1700
  ? srso_alias_return_thunk+0x5/0xfbef5
  __sys_sendto+0x1f3/0x220
  __x64_sys_sendto+0x24/0x30
  do_syscall_64+0x101/0xf80
  ? exc_page_fault+0x6e/0x170
  ? srso_alias_return_thunk+0x5/0xfbef5
  entry_SYSCALL_64_after_hwframe+0x77/0x7f
  </TASK>

Fix this by adding an early check in bpf_out_neigh_v6(). If IPv6 is
disabled, drop the packet before neighbor lookup.

Suggested-by: Fernando Fernandez Mancera <fmancera@suse.de>
Fixes: ba452c9e996d ("bpf: Fix bpf_redirect_neigh helper api to support supplying nexthop")
Signed-off-by: Ricardo B. Marlière <rbm@suse.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://patch.msgid.link/20260307-net-nd_tbl_fixes-v4-4-e2677e85628c@suse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agobpf: bpf_out_neigh_v4: Fix nd_tbl NULL dereference when IPv6 is disabled
Ricardo B. Marlière [Sat, 7 Mar 2026 20:50:55 +0000 (17:50 -0300)] 
bpf: bpf_out_neigh_v4: Fix nd_tbl NULL dereference when IPv6 is disabled

When booting with the 'ipv6.disable=1' parameter, the nd_tbl is never
initialized because inet6_init() exits before ndisc_init() is called which
initializes it. If bpf_redirect_neigh() is called from tc with an explicit
nexthop of nh_family == AF_INET6, bpf_out_neigh_v4() takes the AF_INET6
branch and calls ip_neigh_gw6(), which relies on ipv6_stub->nd_tbl.

 BUG: kernel NULL pointer dereference, address: 0000000000000248
 Oops: Oops: 0000 [#1] SMP NOPTI
 RIP: 0010:skb_do_redirect+0xb93/0xf00
 Call Trace:
  <TASK>
  ? srso_alias_return_thunk+0x5/0xfbef5
  ? __tcf_classify.constprop.0+0x83/0x160
  ? srso_alias_return_thunk+0x5/0xfbef5
  ? tcf_classify+0x2b/0x50
  ? srso_alias_return_thunk+0x5/0xfbef5
  ? tc_run+0xb8/0x120
  ? srso_alias_return_thunk+0x5/0xfbef5
  __dev_queue_xmit+0x6fa/0x1000
  ? srso_alias_return_thunk+0x5/0xfbef5
  ? srso_alias_return_thunk+0x5/0xfbef5
  ? alloc_skb_with_frags+0x58/0x200
  packet_sendmsg+0x10da/0x1700
  ? srso_alias_return_thunk+0x5/0xfbef5
  __sys_sendto+0x1f3/0x220
  __x64_sys_sendto+0x24/0x30
  do_syscall_64+0x101/0xf80
  ? exc_page_fault+0x6e/0x170
  ? srso_alias_return_thunk+0x5/0xfbef5
  entry_SYSCALL_64_after_hwframe+0x77/0x7f
  </TASK>

Fix this by adding an early check in the AF_INET6 branch of
bpf_out_neigh_v4(). If IPv6 is disabled, unlock RCU and drop the packet.

Suggested-by: Fernando Fernandez Mancera <fmancera@suse.de>
Fixes: ba452c9e996d ("bpf: Fix bpf_redirect_neigh helper api to support supplying nexthop")
Signed-off-by: Ricardo B. Marlière <rbm@suse.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://patch.msgid.link/20260307-net-nd_tbl_fixes-v4-3-e2677e85628c@suse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: bonding: Fix nd_tbl NULL dereference when IPv6 is disabled
Ricardo B. Marlière [Sat, 7 Mar 2026 20:50:54 +0000 (17:50 -0300)] 
net: bonding: Fix nd_tbl NULL dereference when IPv6 is disabled

When booting with the 'ipv6.disable=1' parameter, the nd_tbl is never
initialized because inet6_init() exits before ndisc_init() is called
which initializes it. If bonding ARP/NS validation is enabled, an IPv6
NS/NA packet received on a slave can reach bond_validate_na(), which
calls bond_has_this_ip6(). That path calls ipv6_chk_addr() and can
crash in __ipv6_chk_addr_and_flags().

 BUG: kernel NULL pointer dereference, address: 00000000000005d8
 Oops: Oops: 0000 [#1] SMP NOPTI
 RIP: 0010:__ipv6_chk_addr_and_flags+0x69/0x170
 Call Trace:
  <IRQ>
  ipv6_chk_addr+0x1f/0x30
  bond_validate_na+0x12e/0x1d0 [bonding]
  ? __pfx_bond_handle_frame+0x10/0x10 [bonding]
  bond_rcv_validate+0x1a0/0x450 [bonding]
  bond_handle_frame+0x5e/0x290 [bonding]
  ? srso_alias_return_thunk+0x5/0xfbef5
  __netif_receive_skb_core.constprop.0+0x3e8/0xe50
  ? srso_alias_return_thunk+0x5/0xfbef5
  ? update_cfs_rq_load_avg+0x1a/0x240
  ? srso_alias_return_thunk+0x5/0xfbef5
  ? __enqueue_entity+0x5e/0x240
  __netif_receive_skb_one_core+0x39/0xa0
  process_backlog+0x9c/0x150
  __napi_poll+0x30/0x200
  ? srso_alias_return_thunk+0x5/0xfbef5
  net_rx_action+0x338/0x3b0
  handle_softirqs+0xc9/0x2a0
  do_softirq+0x42/0x60
  </IRQ>
  <TASK>
  __local_bh_enable_ip+0x62/0x70
  __dev_queue_xmit+0x2d3/0x1000
  ? srso_alias_return_thunk+0x5/0xfbef5
  ? srso_alias_return_thunk+0x5/0xfbef5
  ? packet_parse_headers+0x10a/0x1a0
  packet_sendmsg+0x10da/0x1700
  ? kick_pool+0x5f/0x140
  ? srso_alias_return_thunk+0x5/0xfbef5
  ? __queue_work+0x12d/0x4f0
  __sys_sendto+0x1f3/0x220
  __x64_sys_sendto+0x24/0x30
  do_syscall_64+0x101/0xf80
  ? exc_page_fault+0x6e/0x170
  ? srso_alias_return_thunk+0x5/0xfbef5
  entry_SYSCALL_64_after_hwframe+0x77/0x7f
  </TASK>

Fix this by checking ipv6_mod_enabled() before dispatching IPv6 packets to
bond_na_rcv(). If IPv6 is disabled, return early from bond_rcv_validate()
and avoid the path to ipv6_chk_addr().

Suggested-by: Fernando Fernandez Mancera <fmancera@suse.de>
Fixes: 4e24be018eb9 ("bonding: add new parameter ns_targets")
Signed-off-by: Ricardo B. Marlière <rbm@suse.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20260307-net-nd_tbl_fixes-v4-2-e2677e85628c@suse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoipv6: move the disable_ipv6_mod knob to core code
Jakub Kicinski [Sat, 7 Mar 2026 20:50:53 +0000 (17:50 -0300)] 
ipv6: move the disable_ipv6_mod knob to core code

From: Jakub Kicinski <kuba@kernel.org>

Make sure disable_ipv6_mod itself is not part of the IPv6 module,
in case core code wants to refer to it. We will remove support
for IPv6=m soon, this change helps make fixes we commit before
that less messy.

Link: https://patch.msgid.link/20260307-net-nd_tbl_fixes-v4-1-e2677e85628c@suse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoMerge tag 'rproc-v7.0-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/remotep...
Linus Torvalds [Wed, 11 Mar 2026 16:30:20 +0000 (09:30 -0700)] 
Merge tag 'rproc-v7.0-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux

Pull remoteproc fixes from Bjorn Andersson:

 - Correct the early return from the i.MX remoteproc prepare
   operation, which prevented the platform-specific prepare
   function from being reached

 - Ensure that the Mediatek SCP clock is released during system
   suspend after the recent refactoring to avoid issues with the
   clock framework's prepare lock.

 - Correct the type of the subsys_name_len field in the sysmon
   event QMI message, as the recent introduction of big endian
   support in the QMI encoder highlighted the type mismatch and
   resulted in a failure to encode the message

 - Roll back the devm_ioremap_resource_wc() to a devm_ioremap_wc()
   in the Qualcomm WCNSS remoteproc driver, after reports that
   requesting this resource fails on some platforms

* tag 'rproc-v7.0-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux:
  remoteproc: imx_rproc: Fix unreachable platform prepare_ops
  remoteproc: mediatek: Unprepare SCP clock during system suspend
  remoteproc: sysmon: Correct subsys_name_len type in QMI request
  remoteproc: qcom_wcnss: Fix reserved region mapping failure

6 weeks agoMerge tag 'powerpc-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Wed, 11 Mar 2026 15:35:31 +0000 (08:35 -0700)] 
Merge tag 'powerpc-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Madhavan Srinivasan:
 - Correct MSI allocation tracking
 - Always use 64 bits PTE for powerpc/e500
 - Fix inline assembly for clang build on PPC32
 - Fixes for clang build issues in powerpc64/ftrace
 - Fixes for powerpc64/bpf JIT and tailcall support
 - Cleanup MPC83XX devicetrees
 - Fix keymile vendor prefix
 - Fix to use big-endian types for crash variables

Thanks to Abhishek Dubey, Christophe Leroy (CS GROUP), Hari Bathini,
Heiko Schocher, J. Neuschäfer, Mahesh Salgaonkar, Nam Cao, Nilay Shroff,
Rob Herring (Arm), Saket Kumar Bhaskar, Sourabh Jain, Stan Johnson, and
Venkat Rao Bagalkote.

* tag 'powerpc-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (23 commits)
  powerpc/pseries: Correct MSI allocation tracking
  powerpc: dts: mpc83xx: Add unit addresses to /memory
  powerpc: dts: mpc8315erdb: Add missing #cells properties to SPI bus
  powerpc: dts: mpc8315erdb: Rename LED nodes to comply with schema
  powerpc: dts: mpc8315erdb: Use IRQ_TYPE_* macros
  powerpc: dts: mpc8313erdb: Use IRQ_TYPE_* macros
  powerpc: 83xx: km83xx: Fix keymile vendor prefix
  dt-bindings: powerpc: Add Freescale/NXP MPC83xx SoCs
  powerpc64/bpf: fix kfunc call support
  powerpc64/bpf: fix handling of BPF stack in exception callback
  powerpc64/bpf: remove BPF redzone protection in trampoline stack
  powerpc64/bpf: use consistent tailcall offset in trampoline
  powerpc64/bpf: fix the address returned by bpf_get_func_ip
  powerpc64/bpf: do not increment tailcall count when prog is NULL
  powerpc64/ftrace: workaround clang recording GEP in __patchable_function_entries
  powerpc64/ftrace: fix OOL stub count with clang
  powerpc64: make clang cross-build friendly
  powerpc/crash: adjust the elfcorehdr size
  powerpc/kexec/core: use big-endian types for crash variables
  powerpc/prom_init: Fixup missing #size-cells on PowerMac media-bay nodes
  ...

6 weeks agoMerge tag 'v7.0-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Linus Torvalds [Wed, 11 Mar 2026 03:30:52 +0000 (20:30 -0700)] 
Merge tag 'v7.0-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:

 - Fix potential use after free errors

 - Fix refcount leak in smb2 open error path

 - Prevent allowing logging signing or encryption keys

* tag 'v7.0-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: Don't log keys in SMB3 signing and encryption key generation
  smb: server: fix use-after-free in smb2_open()
  ksmbd: fix use-after-free in smb_lazy_parent_lease_break_close()
  ksmbd: fix use-after-free by using call_rcu() for oplock_info
  ksmbd: fix use-after-free in proc_show_files due to early rcu_read_unlock
  smb/server: Fix another refcount leak in smb2_open()

6 weeks agonet: bcmgenet: fix broken EEE by converting to phylib-managed state
Nicolai Buchwitz [Tue, 10 Mar 2026 05:49:35 +0000 (06:49 +0100)] 
net: bcmgenet: fix broken EEE by converting to phylib-managed state

The bcmgenet EEE implementation is broken in several ways.
phy_support_eee() is never called, so the PHY never advertises EEE
and phylib never sets phydev->enable_tx_lpi.  bcmgenet_mac_config()
checks priv->eee.eee_enabled to decide whether to enable the MAC
LPI logic, but that field is never initialised to true, so the MAC
never enters Low Power Idle even when EEE is negotiated - wasting
the power savings EEE is designed to provide.  The only way to get
EEE working at all is a manual 'ethtool --set-eee eth0 eee on' after
every link-up, and even then bcmgenet_get_eee() immediately clobbers
the reported state because phy_ethtool_get_eee() overwrites
eee_enabled and tx_lpi_enabled with the uninitialised PHY eee_cfg
values.  Finally, bcmgenet_mac_config() is only called on link-up,
so EEE is never disabled in hardware on link-down.

Fix all of this by removing the MAC-side EEE state tracking
(priv->eee) and aligning with the pattern used by other non-phylink
MAC drivers such as FEC.

Call phy_support_eee() in bcmgenet_mii_probe() so the PHY advertises
EEE link modes and phylib tracks negotiation state.  Move the EEE
hardware control to bcmgenet_mii_setup(), which is called on every
link event, and drive it directly from phydev->enable_tx_lpi - the
flag phylib sets when EEE is negotiated and the user has not disabled
it.  This enables EEE automatically once the link partner agrees and
disables it cleanly on link-down.

Make bcmgenet_get_eee() and bcmgenet_set_eee() pure passthroughs to
phy_ethtool_get_eee() and phy_ethtool_set_eee(), with the MAC
hardware register read/written for tx_lpi_timer.  Drop struct
ethtool_keee eee from struct bcmgenet_priv.

Fixes: fe0d4fd9285e ("net: phy: Keep track of EEE configuration")
Link: https://lore.kernel.org/netdev/d352039f-4cbb-41e6-9aeb-0b4f3941b54c@lunn.ch/
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20260310054935.1238594-1-nb@tipi-net.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet-shapers: don't free reply skb after genlmsg_reply()
Paul Moses [Mon, 9 Mar 2026 17:35:10 +0000 (17:35 +0000)] 
net-shapers: don't free reply skb after genlmsg_reply()

genlmsg_reply() hands the reply skb to netlink, and
netlink_unicast() consumes it on all return paths, whether the
skb is queued successfully or freed on an error path.

net_shaper_nl_get_doit() and net_shaper_nl_cap_get_doit()
currently jump to free_msg after genlmsg_reply() fails and call
nlmsg_free(msg), which can hit the same skb twice.

Return the genlmsg_reply() error directly and keep free_msg
only for pre-reply failures.

Fixes: 4b623f9f0f59 ("net-shapers: implement NL get operation")
Fixes: 553ea9f1efd6 ("net: shaper: implement introspection support")
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moses <p@1g4.org>
Link: https://patch.msgid.link/20260309173450.538026-2-p@1g4.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: dsa: mxl862xx: don't set user_mii_bus
Daniel Golle [Tue, 10 Mar 2026 00:41:56 +0000 (00:41 +0000)] 
net: dsa: mxl862xx: don't set user_mii_bus

The PHY addresses in the MII bus are not equal to the port addresses,
so the bus cannot be assigned as user_mii_bus. Falling back on the
user_mii_bus in case a PHY isn't declared in device tree will result in
using the wrong (in this case: off-by-+1) PHY.
Remove the wrong assignment.

Fixes: 23794bec1cb60 ("net: dsa: add basic initial driver for MxL862xx switches")
Suggested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://patch.msgid.link/0f0df310fd8cab57e0e5e3d0831dd057fd05bcd5.1773103271.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: ethernet: arc: emac: quiesce interrupts before requesting IRQ
Fan Wu [Mon, 9 Mar 2026 13:24:09 +0000 (13:24 +0000)] 
net: ethernet: arc: emac: quiesce interrupts before requesting IRQ

Normal RX/TX interrupts are enabled later, in arc_emac_open(), so probe
should not see interrupt delivery in the usual case. However, hardware may
still present stale or latched interrupt status left by firmware or the
bootloader.

If probe later unwinds after devm_request_irq() has installed the handler,
such a stale interrupt can still reach arc_emac_intr() during teardown and
race with release of the associated net_device.

Avoid that window by putting the device into a known quiescent state before
requesting the IRQ: disable all EMAC interrupt sources and clear any
pending EMAC interrupt status bits. This keeps the change hardware-focused
and minimal, while preventing spurious IRQ delivery from leftover state.

Fixes: e4f2379db6c6 ("ethernet/arc/arc_emac - Add new driver")
Cc: stable@vger.kernel.org
Signed-off-by: Fan Wu <fanwu01@zju.edu.cn>
Link: https://patch.msgid.link/20260309132409.584966-1-fanwu01@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agopage_pool: store detach_time as ktime_t to avoid false-negatives
Jakub Kicinski [Tue, 10 Mar 2026 00:39:07 +0000 (17:39 -0700)] 
page_pool: store detach_time as ktime_t to avoid false-negatives

While testing other changes in vng I noticed that
nl_netdev.page_pool_check flakes. This never happens in real CI.

Turns out vng may boot and get to that test in less than a second.
page_pool_detached() records the detach time in seconds, so if
vng is fast enough detach time is set to 0. Other code treats
0 as "not detached". detach_time is only used to report the state
to the user, so it's not a huge deal in practice but let's fix it.
Store the raw ktime_t (nanoseconds) instead. A nanosecond value
of 0 is practically impossible.

Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Fixes: 69cb4952b6f6 ("net: page_pool: report when page pool was destroyed")
Link: https://patch.msgid.link/20260310003907.3540019-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: macb: Shuffle the tx ring before enabling tx
Kevin Hao [Sat, 7 Mar 2026 07:08:54 +0000 (15:08 +0800)] 
net: macb: Shuffle the tx ring before enabling tx

Quanyang observed that when using an NFS rootfs on an AMD ZynqMp board,
the rootfs may take an extended time to recover after a suspend.
Upon investigation, it was determined that the issue originates from a
problem in the macb driver.

According to the Zynq UltraScale TRM [1], when transmit is disabled,
the transmit buffer queue pointer resets to point to the address
specified by the transmit buffer queue base address register.

In the current implementation, the code merely resets `queue->tx_head`
and `queue->tx_tail` to '0'. This approach presents several issues:

- Packets already queued in the tx ring are silently lost,
  leading to memory leaks since the associated skbs cannot be released.

- Concurrent write access to `queue->tx_head` and `queue->tx_tail` may
  occur from `macb_tx_poll()` or `macb_start_xmit()` when these values
  are reset to '0'.

- The transmission may become stuck on a packet that has already been sent
  out, with its 'TX_USED' bit set, but has not yet been processed. However,
  due to the manipulation of 'queue->tx_head' and 'queue->tx_tail',
  `macb_tx_poll()` incorrectly assumes there are no packets to handle
  because `queue->tx_head == queue->tx_tail`. This issue is only resolved
  when a new packet is placed at this position. This is the root cause of
  the prolonged recovery time observed for the NFS root filesystem.

To resolve this issue, shuffle the tx ring and tx skb array so that
the first unsent packet is positioned at the start of the tx ring.
Additionally, ensure that updates to `queue->tx_head` and
`queue->tx_tail` are properly protected with the appropriate lock.

[1] https://docs.amd.com/v/u/en-US/ug1085-zynq-ultrascale-trm

Fixes: bf9cf80cab81 ("net: macb: Fix tx/rx malfunction after phy link down and up")
Reported-by: Quanyang Wang <quanyang.wang@windriver.com>
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Cc: stable@vger.kernel.org
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260307-zynqmp-v2-1-6ef98a70e1d0@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoe1000/e1000e: Fix leak in DMA error cleanup
Matt Vollrath [Tue, 24 Feb 2026 23:28:33 +0000 (18:28 -0500)] 
e1000/e1000e: Fix leak in DMA error cleanup

If an error is encountered while mapping TX buffers, the driver should
unmap any buffers already mapped for that skb.

Because count is incremented after a successful mapping, it will always
match the correct number of unmappings needed when dma_error is reached.
Decrementing count before the while loop in dma_error causes an
off-by-one error. If any mapping was successful before an unsuccessful
mapping, exactly one DMA mapping would leak.

In these commits, a faulty while condition caused an infinite loop in
dma_error:
Commit 03b1320dfcee ("e1000e: remove use of skb_dma_map from e1000e
driver")
Commit 602c0554d7b0 ("e1000: remove use of skb_dma_map from e1000 driver")

Commit c1fa347f20f1 ("e1000/e1000e/igb/igbvf/ixgb/ixgbe: Fix tests of
unsigned in *_tx_map()") fixed the infinite loop, but introduced the
off-by-one error.

This issue may still exist in the igbvf driver, but I did not address it
in this patch.

Fixes: c1fa347f20f1 ("e1000/e1000e/igb/igbvf/ixgb/ixgbe: Fix tests of unsigned in *_tx_map()")
Assisted-by: Claude:claude-4.6-opus
Signed-off-by: Matt Vollrath <tactii@gmail.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
6 weeks agoi40e: fix src IP mask checks and memcpy argument names in cloud filter
Alok Tiwari [Mon, 10 Nov 2025 19:13:38 +0000 (11:13 -0800)] 
i40e: fix src IP mask checks and memcpy argument names in cloud filter

Fix following issues in the IPv4 and IPv6 cloud filter handling logic in
both the add and delete paths:

- The source-IP mask check incorrectly compares mask.src_ip[0] against
  tcf.dst_ip[0]. Update it to compare against tcf.src_ip[0]. This likely
  goes unnoticed because the check is in an "else if" path that only
  executes when dst_ip is not set, most cloud filter use cases focus on
  destination-IP matching, and the buggy condition can accidentally
  evaluate true in some cases.

- memcpy() for the IPv4 source address incorrectly uses
  ARRAY_SIZE(tcf.dst_ip) instead of ARRAY_SIZE(tcf.src_ip), although
  both arrays are the same size.

- The IPv4 memcpy operations used ARRAY_SIZE(tcf.dst_ip) and ARRAY_SIZE
  (tcf.src_ip), Update these to use sizeof(cfilter->ip.v4.dst_ip) and
  sizeof(cfilter->ip.v4.src_ip) to ensure correct and explicit copy size.

- In the IPv6 delete path, memcmp() uses sizeof(src_ip6) when comparing
  dst_ip6 fields. Replace this with sizeof(dst_ip6) to make the intent
  explicit, even though both fields are struct in6_addr.

Fixes: e284fc280473 ("i40e: Add and delete cloud filter")
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
6 weeks agoMerge tag 'mm-hotfixes-stable-2026-03-09-16-36' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Tue, 10 Mar 2026 19:47:56 +0000 (12:47 -0700)] 
Merge tag 'mm-hotfixes-stable-2026-03-09-16-36' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "15 hotfixes. 6 are cc:stable. 14 are for MM.

  Singletons, with one doubleton - please see the changelogs for details"

* tag 'mm-hotfixes-stable-2026-03-09-16-36' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  MAINTAINERS, mailmap: update email address for Lorenzo Stoakes
  mm/mmu_notifier: clean up mmu_notifier.h kernel-doc
  uaccess: correct kernel-doc parameter format
  mm/huge_memory: fix a folio_split() race condition with folio_try_get()
  MAINTAINERS: add co-maintainer and reviewer for SLAB ALLOCATOR
  MAINTAINERS: add RELAY entry
  memcg: fix slab accounting in refill_obj_stock() trylock path
  mm/hugetlb.c: use __pa() instead of virt_to_phys() in early bootmem alloc code
  zram: rename writeback_compressed device attr
  tools/testing: fix testing/vma and testing/radix-tree build
  Revert "ptdesc: remove references to folios from __pagetable_ctor() and pagetable_dtor()"
  mm/cma: move put_page_testzero() out of VM_WARN_ON in cma_release()
  mm/damon/core: clear walk_control on inactive context in damos_walk()
  mm: memfd_luo: always dirty all folios
  mm: memfd_luo: always make all folios uptodate

6 weeks agoiavf: fix incorrect reset handling in callbacks
Petr Oros [Wed, 11 Feb 2026 19:18:55 +0000 (20:18 +0100)] 
iavf: fix incorrect reset handling in callbacks

Three driver callbacks schedule a reset and wait for its completion:
ndo_change_mtu(), ethtool set_ringparam(), and ethtool set_channels().

Waiting for reset in ndo_change_mtu() and set_ringparam() was added by
commit c2ed2403f12c ("iavf: Wait for reset in callbacks which trigger
it") to fix a race condition where adding an interface to bonding
immediately after MTU or ring parameter change failed because the
interface was still in __RESETTING state. The same commit also added
waiting in iavf_set_priv_flags(), which was later removed by commit
53844673d555 ("iavf: kill "legacy-rx" for good").

Waiting in set_channels() was introduced earlier by commit 4e5e6b5d9d13
("iavf: Fix return of set the new channel count") to ensure the PF has
enough time to complete the VF reset when changing channel count, and to
return correct error codes to userspace.

Commit ef490bbb2267 ("iavf: Add net_shaper_ops support") added
net_shaper_ops to iavf, which required reset_task to use _locked NAPI
variants (napi_enable_locked, napi_disable_locked) that need the netdev
instance lock.

Later, commit 7e4d784f5810 ("net: hold netdev instance lock during
rtnetlink operations") and commit 2bcf4772e45a ("net: ethtool: try to
protect all callback with netdev instance lock") started holding the
netdev instance lock during ndo and ethtool callbacks for drivers with
net_shaper_ops.

Finally, commit 120f28a6f314 ("iavf: get rid of the crit lock")
replaced the driver's crit_lock with netdev_lock in reset_task, causing
incorrect behavior: the callback holds netdev_lock and waits for
reset_task, but reset_task needs the same lock:

  Thread 1 (callback)               Thread 2 (reset_task)
  -------------------               ---------------------
  netdev_lock()                     [blocked on workqueue]
  ndo_change_mtu() or ethtool op
    iavf_schedule_reset()
    iavf_wait_for_reset()           iavf_reset_task()
      waiting...                      netdev_lock() <- blocked

This does not strictly deadlock because iavf_wait_for_reset() uses
wait_event_interruptible_timeout() with a 5-second timeout. The wait
eventually times out, the callback returns an error to userspace, and
after the lock is released reset_task completes the reset. This leads to
incorrect behavior: userspace sees an error even though the configuration
change silently takes effect after the timeout.

Fix this by extracting the reset logic from iavf_reset_task() into a new
iavf_reset_step() function that expects netdev_lock to be already held.
The three callbacks now call iavf_reset_step() directly instead of
scheduling the work and waiting, performing the reset synchronously in
the caller's context which already holds netdev_lock. This eliminates
both the incorrect error reporting and the need for
iavf_wait_for_reset(), which is removed along with the now-unused
reset_waitqueue.

The workqueue-based iavf_reset_task() becomes a thin wrapper that
acquires netdev_lock and calls iavf_reset_step(), preserving its use
for PF-initiated resets.

The callbacks may block for several seconds while iavf_reset_step()
polls hardware registers, but this is acceptable since netdev_lock is a
per-device mutex and only serializes operations on the same interface.

v3:
- Remove netif_running() guard from iavf_set_channels(). Unlike
  set_ringparam where descriptor counts are picked up by iavf_open()
  directly, num_req_queues is only consumed during
  iavf_reinit_interrupt_scheme() in the reset path. Skipping the reset
  on a down device would silently discard the channel count change.
- Remove dead reset_waitqueue code (struct field, init, and all
  wake_up calls) since iavf_wait_for_reset() was the only consumer.

Fixes: 120f28a6f314 ("iavf: get rid of the crit lock")
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
6 weeks agoiavf: fix PTP use-after-free during reset
Petr Oros [Thu, 29 Jan 2026 09:57:23 +0000 (10:57 +0100)] 
iavf: fix PTP use-after-free during reset

Commit 7c01dbfc8a1c5f ("iavf: periodically cache PHC time") introduced a
worker to cache PHC time, but failed to stop it during reset or disable.

This creates a race condition where `iavf_reset_task()` or
`iavf_disable_vf()` free adapter resources (AQ) while the worker is still
running. If the worker triggers `iavf_queue_ptp_cmd()` during teardown, it
accesses freed memory/locks, leading to a crash.

Fix this by calling `iavf_ptp_release()` before tearing down the adapter.
This ensures `ptp_clock_unregister()` synchronously cancels the worker and
cleans up the chardev before the backing resources are destroyed.

Fixes: 7c01dbfc8a1c5f ("iavf: periodically cache PHC time")
Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Ivan Vecera <ivecera@redhat.com>
Acked-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
6 weeks agodrivers: net: ice: fix devlink parameters get without irdma
Nikolay Aleksandrov [Fri, 13 Feb 2026 08:48:41 +0000 (10:48 +0200)] 
drivers: net: ice: fix devlink parameters get without irdma

If CONFIG_IRDMA isn't enabled but there are ice NICs in the system, the
driver will prevent full devlink dev param show dump because its rdma get
callbacks return ENODEV and stop the dump. For example:
 $ devlink dev param show
 pci/0000:82:00.0:
   name msix_vec_per_pf_max type generic
     values:
       cmode driverinit value 2
   name msix_vec_per_pf_min type generic
     values:
       cmode driverinit value 2
 kernel answers: No such device

Returning EOPNOTSUPP allows the dump to continue so we can see all devices'
devlink parameters.

Fixes: c24a65b6a27c ("iidc/ice/irdma: Update IDC to support multiple consumers")
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
6 weeks agoMerge tag 'linux-can-fixes-for-7.0-20260310' of git://git.kernel.org/pub/scm/linux...
Paolo Abeni [Tue, 10 Mar 2026 14:13:55 +0000 (15:13 +0100)] 
Merge tag 'linux-can-fixes-for-7.0-20260310' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2026-03-10

this is a pull request of 2 patches for net/main.

Haibo Chen's patch fixes the maximum allowed bit rate error, which was
broken in v6.19.

Wenyuan Li contributes a patch for the hi311x driver that adds missing
error checking in the caller of the hi3110_power_enable() function,
hi3110_open().

linux-can-fixes-for-7.0-20260310

* tag 'linux-can-fixes-for-7.0-20260310' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
  can: hi311x: hi3110_open(): add check for hi3110_power_enable() return value
  can: dev: keep the max bitrate error at 5%
====================

Link: https://patch.msgid.link/20260310103547.2299403-1-mkl@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks agonetfilter: xt_IDLETIMER: reject rev0 reuse of ALARM timer labels
Yuan Tan [Mon, 9 Mar 2026 10:41:46 +0000 (03:41 -0700)] 
netfilter: xt_IDLETIMER: reject rev0 reuse of ALARM timer labels

IDLETIMER revision 0 rules reuse existing timers by label and always call
mod_timer() on timer->timer.

If the label was created first by revision 1 with XT_IDLETIMER_ALARM,
the object uses alarm timer semantics and timer->timer is never initialized.
Reusing that object from revision 0 causes mod_timer() on an uninitialized
timer_list, triggering debugobjects warnings and possible panic when
panic_on_warn=1.

Fix this by rejecting revision 0 rule insertion when an existing timer with
the same label is of ALARM type.

Fixes: 68983a354a65 ("netfilter: xtables: Add snapshot of hardidletimer target")
Co-developed-by: Yifan Wu <yifanwucs@gmail.com>
Signed-off-by: Yifan Wu <yifanwucs@gmail.com>
Co-developed-by: Juefei Pu <tomapufckgml@gmail.com>
Signed-off-by: Juefei Pu <tomapufckgml@gmail.com>
Signed-off-by: Yuan Tan <tanyuan98@outlook.com>
Signed-off-by: Xin Liu <dstsmallbird@foxmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
6 weeks agonetfilter: nfnetlink_cthelper: fix OOB read in nfnl_cthelper_dump_table()
Hyunwoo Kim [Sat, 7 Mar 2026 17:23:34 +0000 (02:23 +0900)] 
netfilter: nfnetlink_cthelper: fix OOB read in nfnl_cthelper_dump_table()

nfnl_cthelper_dump_table() has a 'goto restart' that jumps to a label
inside the for loop body.  When the "last" helper saved in cb->args[1]
is deleted between dump rounds, every entry fails the (cur != last)
check, so cb->args[1] is never cleared.  The for loop finishes with
cb->args[0] == nf_ct_helper_hsize, and the 'goto restart' jumps back
into the loop body bypassing the bounds check, causing an 8-byte
out-of-bounds read on nf_ct_helper_hash[nf_ct_helper_hsize].

The 'goto restart' block was meant to re-traverse the current bucket
when "last" is no longer found, but it was placed after the for loop
instead of inside it.  Move the block into the for loop body so that
the restart only occurs while cb->args[0] is still within bounds.

 BUG: KASAN: slab-out-of-bounds in nfnl_cthelper_dump_table+0x9f/0x1b0
 Read of size 8 at addr ffff888104ca3000 by task poc_cthelper/131
 Call Trace:
  nfnl_cthelper_dump_table+0x9f/0x1b0
  netlink_dump+0x333/0x880
  netlink_recvmsg+0x3e2/0x4b0
  sock_recvmsg+0xde/0xf0
  __sys_recvfrom+0x150/0x200
  __x64_sys_recvfrom+0x76/0x90
  do_syscall_64+0xc3/0x6e0

 Allocated by task 1:
  __kvmalloc_node_noprof+0x21b/0x700
  nf_ct_alloc_hashtable+0x65/0xd0
  nf_conntrack_helper_init+0x21/0x60
  nf_conntrack_init_start+0x18d/0x300
  nf_conntrack_standalone_init+0x12/0xc0

Fixes: 12f7a505331e ("netfilter: add user-space connection tracking helper infrastructure")
Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
6 weeks agonetfilter: nfnetlink_queue: fix entry leak in bridge verdict error path
Hyunwoo Kim [Sat, 7 Mar 2026 17:24:06 +0000 (02:24 +0900)] 
netfilter: nfnetlink_queue: fix entry leak in bridge verdict error path

nfqnl_recv_verdict() calls find_dequeue_entry() to remove the queue
entry from the queue data structures, taking ownership of the entry.
For PF_BRIDGE packets, it then calls nfqa_parse_bridge() to parse VLAN
attributes.  If nfqa_parse_bridge() returns an error (e.g. NFQA_VLAN
present but NFQA_VLAN_TCI missing), the function returns immediately
without freeing the dequeued entry or its sk_buff.

This leaks the nf_queue_entry, its associated sk_buff, and all held
references (net_device refcounts, struct net refcount).  Repeated
triggering exhausts kernel memory.

Fix this by dropping the entry via nfqnl_reinject() with NF_DROP verdict
on the error path, consistent with other error handling in this file.

Fixes: 8d45ff22f1b4 ("netfilter: bridge: nf queue verdict to use NFQA_VLAN and NFQA_L2HDR")
Reviewed-by: David Dull <monderasdor@gmail.com>
Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
6 weeks agonetfilter: x_tables: guard option walkers against 1-byte tail reads
David Dull [Sat, 7 Mar 2026 18:26:21 +0000 (20:26 +0200)] 
netfilter: x_tables: guard option walkers against 1-byte tail reads

When the last byte of options is a non-single-byte option kind, walkers
that advance with i += op[i + 1] ? : 1 can read op[i + 1] past the end
of the option area.

Add an explicit i == optlen - 1 check before dereferencing op[i + 1]
in xt_tcpudp and xt_dccp option walkers.

Fixes: 2e4e6a17af35 ("[NETFILTER] x_tables: Abstraction layer for {ip,ip6,arp}_tables")
Signed-off-by: David Dull <monderasdor@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
6 weeks agonetfilter: nft_set_pipapo: fix stack out-of-bounds read in pipapo_drop()
Jenny Guanni Qu [Fri, 6 Mar 2026 19:12:38 +0000 (19:12 +0000)] 
netfilter: nft_set_pipapo: fix stack out-of-bounds read in pipapo_drop()

pipapo_drop() passes rulemap[i + 1].n to pipapo_unmap() as the
to_offset argument on every iteration, including the last one where
i == m->field_count - 1. This reads one element past the end of the
stack-allocated rulemap array (declared as rulemap[NFT_PIPAPO_MAX_FIELDS]
with NFT_PIPAPO_MAX_FIELDS == 16).

Although pipapo_unmap() returns early when is_last is true without
using the to_offset value, the argument is evaluated at the call site
before the function body executes, making this a genuine out-of-bounds
stack read confirmed by KASAN:

  BUG: KASAN: stack-out-of-bounds in pipapo_drop+0x50c/0x57c [nf_tables]
  Read of size 4 at addr ffff8000810e71a4

  This frame has 1 object:
   [32, 160) 'rulemap'

  The buggy address is at offset 164 -- exactly 4 bytes past the end
  of the rulemap array.

Pass 0 instead of rulemap[i + 1].n on the last iteration to avoid
the out-of-bounds read.

Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Signed-off-by: Jenny Guanni Qu <qguanni@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
6 weeks agonetfilter: nf_tables: always walk all pending catchall elements
Florian Westphal [Thu, 5 Mar 2026 20:32:00 +0000 (21:32 +0100)] 
netfilter: nf_tables: always walk all pending catchall elements

During transaction processing we might have more than one catchall element:
1 live catchall element and 1 pending element that is coming as part of the
new batch.

If the map holding the catchall elements is also going away, its
required to toggle all catchall elements and not just the first viable
candidate.

Otherwise, we get:
 WARNING: ./include/net/netfilter/nf_tables.h:1281 at nft_data_release+0xb7/0xe0 [nf_tables], CPU#2: nft/1404
 RIP: 0010:nft_data_release+0xb7/0xe0 [nf_tables]
 [..]
 __nft_set_elem_destroy+0x106/0x380 [nf_tables]
 nf_tables_abort_release+0x348/0x8d0 [nf_tables]
 nf_tables_abort+0xcf2/0x3ac0 [nf_tables]
 nfnetlink_rcv_batch+0x9c9/0x20e0 [..]

Fixes: 628bd3e49cba ("netfilter: nf_tables: drop map element references from preparation phase")
Reported-by: Yiming Qian <yimingqian591@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
6 weeks agonetfilter: nf_tables: Fix for duplicate device in netdev hooks
Phil Sutter [Thu, 5 Mar 2026 12:01:44 +0000 (13:01 +0100)] 
netfilter: nf_tables: Fix for duplicate device in netdev hooks

When handling NETDEV_REGISTER notification, duplicate device
registration must be avoided since the device may have been added by
nft_netdev_hook_alloc() already when creating the hook.

Suggested-by: Florian Westphal <fw@strlen.de>
Reported-by: syzbot+bb9127e278fa198e110c@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=bb9127e278fa198e110c
Fixes: a331b78a5525 ("netfilter: nf_tables: Respect NETDEV_REGISTER events")
Tested-by: Helen Koike <koike@igalia.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>
6 weeks agonet: add xmit recursion limit to tunnel xmit functions
Weiming Shi [Fri, 6 Mar 2026 16:01:34 +0000 (00:01 +0800)] 
net: add xmit recursion limit to tunnel xmit functions

Tunnel xmit functions (iptunnel_xmit, ip6tunnel_xmit) lack their own
recursion limit. When a bond device in broadcast mode has GRE tap
interfaces as slaves, and those GRE tunnels route back through the
bond, multicast/broadcast traffic triggers infinite recursion between
bond_xmit_broadcast() and ip_tunnel_xmit()/ip6_tnl_xmit(), causing
kernel stack overflow.

The existing XMIT_RECURSION_LIMIT (8) in the no-qdisc path is not
sufficient because tunnel recursion involves route lookups and full IP
output, consuming much more stack per level. Use a lower limit of 4
(IP_TUNNEL_RECURSION_LIMIT) to prevent overflow.

Add recursion detection using dev_xmit_recursion helpers directly in
iptunnel_xmit() and ip6tunnel_xmit() to cover all IPv4/IPv6 tunnel
paths including UDP encapsulated tunnels (VXLAN, Geneve, etc.).

Move dev_xmit_recursion helpers from net/core/dev.h to public header
include/linux/netdevice.h so they can be used by tunnel code.

 BUG: KASAN: stack-out-of-bounds in blake2s.constprop.0+0xe7/0x160
 Write of size 32 at addr ffff88810033fed0 by task kworker/0:1/11
 Workqueue: mld mld_ifc_work
 Call Trace:
  <TASK>
  __build_flow_key.constprop.0 (net/ipv4/route.c:515)
  ip_rt_update_pmtu (net/ipv4/route.c:1073)
  iptunnel_xmit (net/ipv4/ip_tunnel_core.c:84)
  ip_tunnel_xmit (net/ipv4/ip_tunnel.c:847)
  gre_tap_xmit (net/ipv4/ip_gre.c:779)
  dev_hard_start_xmit (net/core/dev.c:3887)
  sch_direct_xmit (net/sched/sch_generic.c:347)
  __dev_queue_xmit (net/core/dev.c:4802)
  bond_dev_queue_xmit (drivers/net/bonding/bond_main.c:312)
  bond_xmit_broadcast (drivers/net/bonding/bond_main.c:5279)
  bond_start_xmit (drivers/net/bonding/bond_main.c:5530)
  dev_hard_start_xmit (net/core/dev.c:3887)
  __dev_queue_xmit (net/core/dev.c:4841)
  ip_finish_output2 (net/ipv4/ip_output.c:237)
  ip_output (net/ipv4/ip_output.c:438)
  iptunnel_xmit (net/ipv4/ip_tunnel_core.c:86)
  gre_tap_xmit (net/ipv4/ip_gre.c:779)
  dev_hard_start_xmit (net/core/dev.c:3887)
  sch_direct_xmit (net/sched/sch_generic.c:347)
  __dev_queue_xmit (net/core/dev.c:4802)
  bond_dev_queue_xmit (drivers/net/bonding/bond_main.c:312)
  bond_xmit_broadcast (drivers/net/bonding/bond_main.c:5279)
  bond_start_xmit (drivers/net/bonding/bond_main.c:5530)
  dev_hard_start_xmit (net/core/dev.c:3887)
  __dev_queue_xmit (net/core/dev.c:4841)
  ip_finish_output2 (net/ipv4/ip_output.c:237)
  ip_output (net/ipv4/ip_output.c:438)
  iptunnel_xmit (net/ipv4/ip_tunnel_core.c:86)
  ip_tunnel_xmit (net/ipv4/ip_tunnel.c:847)
  gre_tap_xmit (net/ipv4/ip_gre.c:779)
  dev_hard_start_xmit (net/core/dev.c:3887)
  sch_direct_xmit (net/sched/sch_generic.c:347)
  __dev_queue_xmit (net/core/dev.c:4802)
  bond_dev_queue_xmit (drivers/net/bonding/bond_main.c:312)
  bond_xmit_broadcast (drivers/net/bonding/bond_main.c:5279)
  bond_start_xmit (drivers/net/bonding/bond_main.c:5530)
  dev_hard_start_xmit (net/core/dev.c:3887)
  __dev_queue_xmit (net/core/dev.c:4841)
  mld_sendpack
  mld_ifc_work
  process_one_work
  worker_thread
  </TASK>

Fixes: 745e20f1b626 ("net: add a recursion limit in xmit path")
Reported-by: Xiang Mei <xmei5@asu.edu>
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
Link: https://patch.msgid.link/20260306160133.3852900-2-bestswngs@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks agoMerge branch 'amd-xgbe-rx-adaptation-and-phy-handling-fixes'
Paolo Abeni [Tue, 10 Mar 2026 11:07:08 +0000 (12:07 +0100)] 
Merge branch 'amd-xgbe-rx-adaptation-and-phy-handling-fixes'

Raju Rangoju says:

====================
amd-xgbe: RX adaptation and PHY handling fixes

This series fixes several issues in the amd-xgbe driver related to RX
adaptation and PHY handling in 10GBASE-KR mode, particularly when
auto-negotiation is disabled.

Patch 1 fixes link status handling during RX adaptation by correctly
reading the latched link status bit so transient link drops are
detected without losing the current state.

Patch 2 prevents CRC errors that can occur when performing RX
adaptation with auto-negotiation turned off. The driver now stops
TX/RX before re-triggering RX adaptation and only re-enables traffic
once adaptation completes and the link is confirmed up, ensuring
packets are not corrupted during the adaptation window.

Patch 3 restores the intended ordering of PHY reset relative to
phy_start(), making sure PHY settings are reset before the PHY is
started instead of afterwards.
====================

Link: https://patch.msgid.link/20260306111629.1515676-1-Raju.Rangoju@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks agoamd-xgbe: reset PHY settings before starting PHY
Raju Rangoju [Fri, 6 Mar 2026 11:16:29 +0000 (16:46 +0530)] 
amd-xgbe: reset PHY settings before starting PHY

commit f93505f35745 ("amd-xgbe: let the MAC manage PHY PM") moved
xgbe_phy_reset() from xgbe_open() to xgbe_start(), placing it after
phy_start(). As a result, the PHY settings were being reset after the
PHY had already started.

Reorder the calls so that the PHY settings are reset before
phy_start() is invoked.

Fixes: f93505f35745 ("amd-xgbe: let the MAC manage PHY PM")
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Link: https://patch.msgid.link/20260306111629.1515676-4-Raju.Rangoju@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks agoamd-xgbe: prevent CRC errors during RX adaptation with AN disabled
Raju Rangoju [Fri, 6 Mar 2026 11:16:28 +0000 (16:46 +0530)] 
amd-xgbe: prevent CRC errors during RX adaptation with AN disabled

When operating in 10GBASE-KR mode with auto-negotiation disabled and RX
adaptation enabled, CRC errors can occur during the RX adaptation
process. This happens because the driver continues transmitting and
receiving packets while adaptation is in progress.

Fix this by stopping TX/RX immediately when the link goes down and RX
adaptation needs to be re-triggered, and only re-enabling TX/RX after
adaptation completes and the link is confirmed up. Introduce a flag to
track whether TX/RX was disabled for adaptation so it can be restored
correctly.

This prevents packets from being transmitted or received during the RX
adaptation window and avoids CRC errors from corrupted frames.

The flag tracking the data path state is synchronized with hardware
state in xgbe_start() to prevent stale state after device restarts.
This ensures that after a restart cycle (where xgbe_stop disables
TX/RX and xgbe_start re-enables them), the flag correctly reflects
that the data path is active.

Fixes: 4f3b20bfbb75 ("amd-xgbe: add support for rx-adaptation")
Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Link: https://patch.msgid.link/20260306111629.1515676-3-Raju.Rangoju@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks agoamd-xgbe: fix link status handling in xgbe_rx_adaptation
Raju Rangoju [Fri, 6 Mar 2026 11:16:27 +0000 (16:46 +0530)] 
amd-xgbe: fix link status handling in xgbe_rx_adaptation

The link status bit is latched low to allow detection of momentary
link drops. If the status indicates that the link is already down,
read it again to obtain the current state.

Fixes: 4f3b20bfbb75 ("amd-xgbe: add support for rx-adaptation")
Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Link: https://patch.msgid.link/20260306111629.1515676-2-Raju.Rangoju@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks agomctp: route: hold key->lock in mctp_flow_prepare_output()
Chengfeng Ye [Fri, 6 Mar 2026 03:14:02 +0000 (03:14 +0000)] 
mctp: route: hold key->lock in mctp_flow_prepare_output()

mctp_flow_prepare_output() checks key->dev and may call
mctp_dev_set_key(), but it does not hold key->lock while doing so.

mctp_dev_set_key() and mctp_dev_release_key() are annotated with
__must_hold(&key->lock), so key->dev access is intended to be
serialized by key->lock. The mctp_sendmsg() transmit path reaches
mctp_flow_prepare_output() via mctp_local_output() -> mctp_dst_output()
without holding key->lock, so the check-and-set sequence is racy.

Example interleaving:

  CPU0                                  CPU1
  ----                                  ----
  mctp_flow_prepare_output(key, devA)
    if (!key->dev)  // sees NULL
                                        mctp_flow_prepare_output(
                                            key, devB)
                                          if (!key->dev)  // still NULL
                                          mctp_dev_set_key(devB, key)
                                            mctp_dev_hold(devB)
                                            key->dev = devB
    mctp_dev_set_key(devA, key)
      mctp_dev_hold(devA)
      key->dev = devA   // overwrites devB

Now both devA and devB references were acquired, but only the final
key->dev value is tracked for release. One reference can be lost,
causing a resource leak as mctp_dev_release_key() would only decrease
the reference on one dev.

Fix by taking key->lock around the key->dev check and
mctp_dev_set_key() call.

Fixes: 67737c457281 ("mctp: Pass flow data & flow release events to drivers")
Signed-off-by: Chengfeng Ye <dg573847474@gmail.com>
Link: https://patch.msgid.link/20260306031402.857224-1-dg573847474@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks agobonding: fix type confusion in bond_setup_by_slave()
Jiayuan Chen [Fri, 6 Mar 2026 02:15:07 +0000 (10:15 +0800)] 
bonding: fix type confusion in bond_setup_by_slave()

kernel BUG at net/core/skbuff.c:2306!
Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
RIP: 0010:pskb_expand_head+0xa08/0xfe0 net/core/skbuff.c:2306
RSP: 0018:ffffc90004aff760 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff88807e3c8780 RCX: ffffffff89593e0e
RDX: ffff88807b7c4900 RSI: ffffffff89594747 RDI: ffff88807b7c4900
RBP: 0000000000000820 R08: 0000000000000005 R09: 0000000000000000
R10: 00000000961a63e0 R11: 0000000000000000 R12: ffff88807e3c8780
R13: 00000000961a6560 R14: dffffc0000000000 R15: 00000000961a63e0
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe1a0ed8df0 CR3: 000000002d816000 CR4: 00000000003526f0
Call Trace:
 <TASK>
 ipgre_header+0xdd/0x540 net/ipv4/ip_gre.c:900
 dev_hard_header include/linux/netdevice.h:3439 [inline]
 packet_snd net/packet/af_packet.c:3028 [inline]
 packet_sendmsg+0x3ae5/0x53c0 net/packet/af_packet.c:3108
 sock_sendmsg_nosec net/socket.c:727 [inline]
 __sock_sendmsg net/socket.c:742 [inline]
 ____sys_sendmsg+0xa54/0xc30 net/socket.c:2592
 ___sys_sendmsg+0x190/0x1e0 net/socket.c:2646
 __sys_sendmsg+0x170/0x220 net/socket.c:2678
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x106/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fe1a0e6c1a9

When a non-Ethernet device (e.g. GRE tunnel) is enslaved to a bond,
bond_setup_by_slave() directly copies the slave's header_ops to the
bond device:

    bond_dev->header_ops = slave_dev->header_ops;

This causes a type confusion when dev_hard_header() is later called
on the bond device. Functions like ipgre_header(), ip6gre_header(),all use
netdev_priv(dev) to access their device-specific private data. When
called with the bond device, netdev_priv() returns the bond's private
data (struct bonding) instead of the expected type (e.g. struct
ip_tunnel), leading to garbage values being read and kernel crashes.

Fix this by introducing bond_header_ops with wrapper functions that
delegate to the active slave's header_ops using the slave's own
device. This ensures netdev_priv() in the slave's header functions
always receives the correct device.

The fix is placed in the bonding driver rather than individual device
drivers, as the root cause is bond blindly inheriting header_ops from
the slave without considering that these callbacks expect a specific
netdev_priv() layout.

The type confusion can be observed by adding a printk in
ipgre_header() and running the following commands:

    ip link add dummy0 type dummy
    ip addr add 10.0.0.1/24 dev dummy0
    ip link set dummy0 up
    ip link add gre1 type gre local 10.0.0.1
    ip link add bond1 type bond mode active-backup
    ip link set gre1 master bond1
    ip link set gre1 up
    ip link set bond1 up
    ip addr add fe80::1/64 dev bond1

Fixes: 1284cd3a2b74 ("bonding: two small fixes for IPoIB support")
Suggested-by: Jay Vosburgh <jv@jvosburgh.net>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
Link: https://patch.msgid.link/20260306021508.222062-1-jiayuan.chen@linux.dev
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks agocan: hi311x: hi3110_open(): add check for hi3110_power_enable() return value
Wenyuan Li [Tue, 10 Mar 2026 05:08:44 +0000 (13:08 +0800)] 
can: hi311x: hi3110_open(): add check for hi3110_power_enable() return value

In hi3110_open(), the return value of hi3110_power_enable() is not checked.
If power enable fails, the device may not function correctly, while the
driver still returns success.

Add a check for the return value and propagate the error accordingly.

Signed-off-by: Wenyuan Li <2063309626@qq.com>
Link: https://patch.msgid.link/tencent_B5E2E7528BB28AA8A2A56E16C49BD58B8B07@qq.com
Fixes: 57e83fb9b746 ("can: hi311x: Add Holt HI-311x CAN driver")
[mkl: adjust subject, commit message and jump label]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
6 weeks agocan: dev: keep the max bitrate error at 5%
Haibo Chen [Fri, 6 Mar 2026 09:04:48 +0000 (17:04 +0800)] 
can: dev: keep the max bitrate error at 5%

Commit b360a13d44db ("can: dev: print bitrate error with two decimal
digits") changed calculation of the bit rate error from on-tenth of a
percent to on-hundredth of a percent, but forgot to adjust the scale of the
CAN_CALC_MAX_ERROR constant.

Keeping the existing logic unchanged: Only when the bitrate error exceeds
5% should an error be returned. Otherwise, simply output a warning log.

Fixes: b360a13d44db ("can: dev: print bitrate error with two decimal digits")
Signed-off-by: Haibo Chen <haibo.chen@nxp.com>
Link: https://patch.msgid.link/20260306-can-fix-v1-1-ac526cec6777@nxp.com
Cc: stable@kernel.org
[mkl: improve commit message]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
6 weeks agomctp: i2c: fix skb memory leak in receive path
Haiyue Wang [Thu, 5 Mar 2026 14:32:34 +0000 (22:32 +0800)] 
mctp: i2c: fix skb memory leak in receive path

When 'midev->allow_rx' is false, the newly allocated skb isn't consumed
by netif_rx(), it needs to free the skb directly.

Fixes: f5b8abf9fc3d ("mctp i2c: MCTP I2C binding driver")
Signed-off-by: Haiyue Wang <haiyuewa@163.com>
Link: https://patch.msgid.link/20260305143240.97592-1-haiyuewa@163.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks agoMerge branch 'net-enetc-fix-fallback-phy-address-handling-and-do-not-skip-setting...
Paolo Abeni [Tue, 10 Mar 2026 09:36:48 +0000 (10:36 +0100)] 
Merge branch 'net-enetc-fix-fallback-phy-address-handling-and-do-not-skip-setting-for-addr-0'

Wei Fang says:

====================
net: enetc: fix fallback PHY address handling and do not skip setting for addr 0

There are two potential issues when PHY address 0 is used on the board,
see the commit messages of the patches for more details.

v1: https://lore.kernel.org/imx/20260303103047.228005-1-wei.fang@nxp.com/
====================

Link: https://patch.msgid.link/20260305031211.904812-1-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks agonet: enetc: do not skip setting LaBCR[MDIO_PHYAD_PRTAD] for addr 0
Wei Fang [Thu, 5 Mar 2026 03:12:11 +0000 (11:12 +0800)] 
net: enetc: do not skip setting LaBCR[MDIO_PHYAD_PRTAD] for addr 0

Given that some platforms may use PHY address 0 (I suppose the PHY may
not treat address 0 as a broadcast address or default response address).
It is possible for some boards to connect multiple PHYs to the same
ENETC MAC, for example:

  - a PHY with a non-zero address connects to ENETC MAC through SGMII
    interface (selected via DTS_A)
  - a PHY with address 0 connects to ENETC MAC through RGMII interface
    (selected via DTS_B)

For the case where the ENETC port MDIO is used to manage the PHY, when
switching from DTS_A to DTS_B via soft reboot, LaBCR[MDIO_PHYAD_PRTAD]
must be updated to 0 because the NETCMIX block is not reset during soft
reboot. However, the current driver explicitly skips configuring address
0, causing LaBCR[MDIO_PHYAD_PRTAD] to retain its old value.

Therefore, remove the special-case skip of PHY address 0 so that valid
configurations using address 0 are properly supported.

Fixes: 6633df05f3ad ("net: enetc: set the external PHY address in IERB for port MDIO usage")
Fixes: 50bfd9c06f0f ("net: enetc: set external PHY address in IERB for i.MX94 ENETC")
Reviewed-by: Clark Wang <xiaoning.wang@nxp.com>
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260305031211.904812-3-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks agonet: enetc: fix incorrect fallback PHY address handling
Wei Fang [Thu, 5 Mar 2026 03:12:10 +0000 (11:12 +0800)] 
net: enetc: fix incorrect fallback PHY address handling

The current netc_get_phy_addr() implementation falls back to PHY address
0 when the "mdio" node or the PHY child node is missing. On i.MX95, this
causes failures when a real PHY is actually assigned address 0 and is
managed through the EMDIO interface. Because the bit 0 of phy_mask will
be set, leading imx95_enetc_mdio_phyaddr_config() to return an error, and
the netc_blk_ctrl driver probe subsequently fails. Fix this by returning
-ENODEV when neither an "mdio" node nor any PHY node is present, it means
that ENETC port MDIO is not used to manage the PHY, so there is no need
to configure LaBCR[MDIO_PHYAD_PRTAD].

Reported-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Closes: https://lore.kernel.org/all/7825188.GXAFRqVoOG@steina-w
Fixes: 6633df05f3ad ("net: enetc: set the external PHY address in IERB for port MDIO usage")
Reviewed-by: Clark Wang <xiaoning.wang@nxp.com>
Tested-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260305031211.904812-2-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks agobnxt_en: Fix RSS table size check when changing ethtool channels
Pavan Chebbi [Fri, 6 Mar 2026 22:58:54 +0000 (14:58 -0800)] 
bnxt_en: Fix RSS table size check when changing ethtool channels

When changing channels, the current check in bnxt_set_channels()
is not checking for non-default RSS contexts when the RSS table size
changes. The current check for IFF_RXFH_CONFIGURED is only sufficient
for the default RSS context. Expand the check to include the presence
of any non-default RSS contexts.

Allowing such change will result in incorrect configuration of the
context's RSS table when the table size changes.

Fixes: b3d0083caf9a ("bnxt_en: Support RSS contexts in ethtool .{get|set}_rxfh()")
Reported-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/netdev/20260303181535.2671734-1-bjorn@kernel.org/
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20260306225854.3575672-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoMerge branch 'net-usb-lan78xx-accumulated-bug-fixes'
Jakub Kicinski [Tue, 10 Mar 2026 02:48:39 +0000 (19:48 -0700)] 
Merge branch 'net-usb-lan78xx-accumulated-bug-fixes'

Oleksij Rempel says:

====================
net: usb: lan78xx: accumulated bug fixes

This series contains a collection of standalone bug fixes for the
Microchip LAN78xx driver, addressing packet handling, TX statistics,
invalid register accesses, and a kernel warning during disconnect.
====================

Link: https://patch.msgid.link/20260305143429.530909-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: usb: lan78xx: fix WARN in __netif_napi_del_locked on disconnect
Oleksij Rempel [Thu, 5 Mar 2026 14:34:29 +0000 (15:34 +0100)] 
net: usb: lan78xx: fix WARN in __netif_napi_del_locked on disconnect

Remove redundant netif_napi_del() call from disconnect path.

A WARN may be triggered in __netif_napi_del_locked() during USB device
disconnect:

  WARNING: CPU: 0 PID: 11 at net/core/dev.c:7417 __netif_napi_del_locked+0x2b4/0x350

This happens because netif_napi_del() is called in the disconnect path while
NAPI is still enabled. However, it is not necessary to call netif_napi_del()
explicitly, since unregister_netdev() will handle NAPI teardown automatically
and safely. Removing the redundant call avoids triggering the warning.

Full trace:
 lan78xx 1-1:1.0 enu1: Failed to read register index 0x000000c4. ret = -ENODEV
 lan78xx 1-1:1.0 enu1: Failed to set MAC down with error -ENODEV
 lan78xx 1-1:1.0 enu1: Link is Down
 lan78xx 1-1:1.0 enu1: Failed to read register index 0x00000120. ret = -ENODEV
 ------------[ cut here ]------------
 WARNING: CPU: 0 PID: 11 at net/core/dev.c:7417 __netif_napi_del_locked+0x2b4/0x350
 Modules linked in: flexcan can_dev fuse
 CPU: 0 UID: 0 PID: 11 Comm: kworker/0:1 Not tainted 6.16.0-rc2-00624-ge926949dab03 #9 PREEMPT
 Hardware name: SKOV IMX8MP CPU revC - bd500 (DT)
 Workqueue: usb_hub_wq hub_event
 pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : __netif_napi_del_locked+0x2b4/0x350
 lr : __netif_napi_del_locked+0x7c/0x350
 sp : ffffffc085b673c0
 x29: ffffffc085b673c0 x28: ffffff800b7f2000 x27: ffffff800b7f20d8
 x26: ffffff80110bcf58 x25: ffffff80110bd978 x24: 1ffffff0022179eb
 x23: ffffff80110bc000 x22: ffffff800b7f5000 x21: ffffff80110bc000
 x20: ffffff80110bcf38 x19: ffffff80110bcf28 x18: dfffffc000000000
 x17: ffffffc081578940 x16: ffffffc08284cee0 x15: 0000000000000028
 x14: 0000000000000006 x13: 0000000000040000 x12: ffffffb0022179e8
 x11: 1ffffff0022179e7 x10: ffffffb0022179e7 x9 : dfffffc000000000
 x8 : 0000004ffdde8619 x7 : ffffff80110bcf3f x6 : 0000000000000001
 x5 : ffffff80110bcf38 x4 : ffffff80110bcf38 x3 : 0000000000000000
 x2 : 0000000000000000 x1 : 1ffffff0022179e7 x0 : 0000000000000000
 Call trace:
  __netif_napi_del_locked+0x2b4/0x350 (P)
  lan78xx_disconnect+0xf4/0x360
  usb_unbind_interface+0x158/0x718
  device_remove+0x100/0x150
  device_release_driver_internal+0x308/0x478
  device_release_driver+0x1c/0x30
  bus_remove_device+0x1a8/0x368
  device_del+0x2e0/0x7b0
  usb_disable_device+0x244/0x540
  usb_disconnect+0x220/0x758
  hub_event+0x105c/0x35e0
  process_one_work+0x760/0x17b0
  worker_thread+0x768/0xce8
  kthread+0x3bc/0x690
  ret_from_fork+0x10/0x20
 irq event stamp: 211604
 hardirqs last  enabled at (211603): [<ffffffc0828cc9ec>] _raw_spin_unlock_irqrestore+0x84/0x98
 hardirqs last disabled at (211604): [<ffffffc0828a9a84>] el1_dbg+0x24/0x80
 softirqs last  enabled at (211296): [<ffffffc080095f10>] handle_softirqs+0x820/0xbc8
 softirqs last disabled at (210993): [<ffffffc080010288>] __do_softirq+0x18/0x20
 ---[ end trace 0000000000000000 ]---
 lan78xx 1-1:1.0 enu1: failed to kill vid 0081/0

Fixes: e110bc825897 ("net: usb: lan78xx: Convert to PHYLINK for improved PHY and MAC management")
Cc: stable@vger.kernel.org
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20260305143429.530909-5-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: usb: lan78xx: skip LTM configuration for LAN7850
Oleksij Rempel [Thu, 5 Mar 2026 14:34:28 +0000 (15:34 +0100)] 
net: usb: lan78xx: skip LTM configuration for LAN7850

Do not configure Latency Tolerance Messaging (LTM) on USB 2.0 hardware.

The LAN7850 is a High-Speed (USB 2.0) only device and does not support
SuperSpeed features like LTM. Currently, the driver unconditionally
attempts to configure LTM registers during initialization. On the
LAN7850, these registers do not exist, resulting in writes to invalid
or undocumented memory space.

This issue was identified during a port to the regmap API with strict
register validation enabled. While no functional issues or crashes have
been observed from these invalid writes, bypassing LTM initialization
on the LAN7850 ensures the driver strictly adheres to the hardware's
valid register map.

Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
Cc: stable@vger.kernel.org
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20260305143429.530909-4-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: usb: lan78xx: fix TX byte statistics for small packets
Oleksij Rempel [Thu, 5 Mar 2026 14:34:27 +0000 (15:34 +0100)] 
net: usb: lan78xx: fix TX byte statistics for small packets

Account for hardware auto-padding in TX byte counters to reflect actual
wire traffic.

The LAN7850 hardware automatically pads undersized frames to the minimum
Ethernet frame length (ETH_ZLEN, 60 bytes). However, the driver tracks
the network statistics based on the unpadded socket buffer length. This
results in the tx_bytes counter under-reporting the actual physical
bytes placed on the Ethernet wire for small packets (like short ARP or
ICMP requests).

Use max_t() to ensure the transmission statistics accurately account for
the hardware-generated padding.

Fixes: d383216a7efe ("lan78xx: Introduce Tx URB processing improvements")
Cc: stable@vger.kernel.org
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20260305143429.530909-3-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet: usb: lan78xx: fix silent drop of packets with checksum errors
Oleksij Rempel [Thu, 5 Mar 2026 14:34:26 +0000 (15:34 +0100)] 
net: usb: lan78xx: fix silent drop of packets with checksum errors

Do not drop packets with checksum errors at the USB driver level;
pass them to the network stack.

Previously, the driver dropped all packets where the 'Receive Error
Detected' (RED) bit was set, regardless of the specific error type. This
caused packets with only IP or TCP/UDP checksum errors to be dropped
before reaching the kernel, preventing the network stack from accounting
for them or performing software fallback.

Add a mask for hard hardware errors to safely drop genuinely corrupt
frames, while allowing checksum-errored frames to pass with their
ip_summed field explicitly set to CHECKSUM_NONE.

Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
Cc: stable@vger.kernel.org
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20260305143429.530909-2-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>