git.ipfire.org Git - thirdparty/kernel/linux.git/log

net: macb: Relocate mog_init_rings() callback from macb_mac_link_up() to macb_open()

In the non-RT kernel, local_bh_disable() merely disables preemption,
whereas it maps to an actual spin lock in the RT kernel. Consequently,
when attempting to refill RX buffers via netdev_alloc_skb() in
macb_mac_link_up(), a deadlock scenario arises as follows:

   WARNING: possible circular locking dependency detected
   6.18.0-08691-g2061f18ad76e #39 Not tainted
   ------------------------------------------------------
   kworker/0:0/8 is trying to acquire lock:
   ffff00080369bbe0 (&bp->lock){+.+.}-{3:3}, at: macb_start_xmit+0x808/0xb7c

   but task is already holding lock:
   ffff000803698e58 (&queue->tx_ptr_lock){+...}-{3:3}, at: macb_start_xmit
   +0x148/0xb7c

   which lock already depends on the new lock.

   the existing dependency chain (in reverse order) is:

   -> #3 (&queue->tx_ptr_lock){+...}-{3:3}:
          rt_spin_lock+0x50/0x1f0
          macb_start_xmit+0x148/0xb7c
          dev_hard_start_xmit+0x94/0x284
          sch_direct_xmit+0x8c/0x37c
          __dev_queue_xmit+0x708/0x1120
          neigh_resolve_output+0x148/0x28c
          ip6_finish_output2+0x2c0/0xb2c
          __ip6_finish_output+0x114/0x308
          ip6_output+0xc4/0x4a4
          mld_sendpack+0x220/0x68c
          mld_ifc_work+0x2a8/0x4f4
          process_one_work+0x20c/0x5f8
          worker_thread+0x1b0/0x35c
          kthread+0x144/0x200
          ret_from_fork+0x10/0x20

   -> #2 (_xmit_ETHER#2){+...}-{3:3}:
          rt_spin_lock+0x50/0x1f0
          sch_direct_xmit+0x11c/0x37c
          __dev_queue_xmit+0x708/0x1120
          neigh_resolve_output+0x148/0x28c
          ip6_finish_output2+0x2c0/0xb2c
          __ip6_finish_output+0x114/0x308
          ip6_output+0xc4/0x4a4
          mld_sendpack+0x220/0x68c
          mld_ifc_work+0x2a8/0x4f4
          process_one_work+0x20c/0x5f8
          worker_thread+0x1b0/0x35c
          kthread+0x144/0x200
          ret_from_fork+0x10/0x20

   -> #1 ((softirq_ctrl.lock)){+.+.}-{3:3}:
          lock_release+0x250/0x348
          __local_bh_enable_ip+0x7c/0x240
          __netdev_alloc_skb+0x1b4/0x1d8
          gem_rx_refill+0xdc/0x240
          gem_init_rings+0xb4/0x108
          macb_mac_link_up+0x9c/0x2b4
          phylink_resolve+0x170/0x614
          process_one_work+0x20c/0x5f8
          worker_thread+0x1b0/0x35c
          kthread+0x144/0x200
          ret_from_fork+0x10/0x20

   -> #0 (&bp->lock){+.+.}-{3:3}:
          __lock_acquire+0x15a8/0x2084
          lock_acquire+0x1cc/0x350
          rt_spin_lock+0x50/0x1f0
          macb_start_xmit+0x808/0xb7c
          dev_hard_start_xmit+0x94/0x284
          sch_direct_xmit+0x8c/0x37c
          __dev_queue_xmit+0x708/0x1120
          neigh_resolve_output+0x148/0x28c
          ip6_finish_output2+0x2c0/0xb2c
          __ip6_finish_output+0x114/0x308
          ip6_output+0xc4/0x4a4
          mld_sendpack+0x220/0x68c
          mld_ifc_work+0x2a8/0x4f4
          process_one_work+0x20c/0x5f8
          worker_thread+0x1b0/0x35c
          kthread+0x144/0x200
          ret_from_fork+0x10/0x20

   other info that might help us debug this:

   Chain exists of:
     &bp->lock --> _xmit_ETHER#2 --> &queue->tx_ptr_lock

    Possible unsafe locking scenario:

          CPU0                    CPU1
          ----                    ----
     lock(&queue->tx_ptr_lock);
                                  lock(_xmit_ETHER#2);
                                  lock(&queue->tx_ptr_lock);
     lock(&bp->lock);

    *** DEADLOCK ***

   Call trace:
    show_stack+0x18/0x24 (C)
    dump_stack_lvl+0xa0/0xf0
    dump_stack+0x18/0x24
    print_circular_bug+0x28c/0x370
    check_noncircular+0x198/0x1ac
    __lock_acquire+0x15a8/0x2084
    lock_acquire+0x1cc/0x350
    rt_spin_lock+0x50/0x1f0
    macb_start_xmit+0x808/0xb7c
    dev_hard_start_xmit+0x94/0x284
    sch_direct_xmit+0x8c/0x37c
    __dev_queue_xmit+0x708/0x1120
    neigh_resolve_output+0x148/0x28c
    ip6_finish_output2+0x2c0/0xb2c
    __ip6_finish_output+0x114/0x308
    ip6_output+0xc4/0x4a4
    mld_sendpack+0x220/0x68c
    mld_ifc_work+0x2a8/0x4f4
    process_one_work+0x20c/0x5f8
    worker_thread+0x1b0/0x35c
    kthread+0x144/0x200
    ret_from_fork+0x10/0x20

Notably, invoking the mog_init_rings() callback upon link establishment
is unnecessary. Instead, we can exclusively call mog_init_rings() within
the ndo_open() callback. This adjustment resolves the deadlock issue.
Furthermore, since MACB_CAPS_MACB_IS_EMAC cases do not use mog_init_rings()
when opening the network interface via at91ether_open(), moving
mog_init_rings() to macb_open() also eliminates the MACB_CAPS_MACB_IS_EMAC
check.

Fixes: 633e98a711ac ("net: macb: use resolved link config in mac_link_up()")
Cc: stable@vger.kernel.org
Suggested-by: Kevin Hao <kexin.hao@windriver.com>
Signed-off-by: Xiaolei Wang <xiaolei.wang@windriver.com>
Link: https://patch.msgid.link/20251222015624.1994551-1-xiaolei.wang@windriver.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: fib_test: Add test case for ipv4 multi nexthops

The test checks that with multi nexthops route the preferred route is the
one which matches source ip. In case when source ip is on dummy
interface, it checks that the routes are balanced.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20251221192639.3911901-2-vadim.fedorenko@linux.dev
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: fib: restore ECMP balance from loopback

Preference of nexthop with source address broke ECMP for packets with
source addresses which are not in the broadcast domain, but rather added
to loopback/dummy interfaces. Original behaviour was to balance over
nexthops while now it uses the latest nexthop from the group. To fix the
issue introduce next hop scoring system where next hops with source
address equal to requested will always have higher priority.

For the case with 198.51.100.1/32 assigned to dummy0 and routed using
192.0.2.0/24 and 203.0.113.0/24 networks:

2: dummy0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether d6:54:8a:ff:78:f5 brd ff:ff:ff:ff:ff:ff
    inet 198.51.100.1/32 scope global dummy0
       valid_lft forever preferred_lft forever
7: veth1@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 06:ed:98:87:6d:8a brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.0.2.2/24 scope global veth1
       valid_lft forever preferred_lft forever
    inet6 fe80::4ed:98ff:fe87:6d8a/64 scope link proto kernel_ll
       valid_lft forever preferred_lft forever
9: veth3@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether ae:75:23:38:a0:d2 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 203.0.113.2/24 scope global veth3
       valid_lft forever preferred_lft forever
    inet6 fe80::ac75:23ff:fe38:a0d2/64 scope link proto kernel_ll
       valid_lft forever preferred_lft forever

~ ip ro list:
default
nexthop via 192.0.2.1 dev veth1 weight 1
nexthop via 203.0.113.1 dev veth3 weight 1
192.0.2.0/24 dev veth1 proto kernel scope link src 192.0.2.2
203.0.113.0/24 dev veth3 proto kernel scope link src 203.0.113.2

before:
   for i in {1..255} ; do ip ro get 10.0.0.$i; done | grep veth | awk ' {print $(NF-2)}' | sort | uniq -c:
    255 veth3

after:
   for i in {1..255} ; do ip ro get 10.0.0.$i; done | grep veth | awk ' {print $(NF-2)}' | sort | uniq -c:
    122 veth1
    133 veth3

Fixes: 32607a332cfe ("ipv4: prefer multipath nexthop that matches source address")
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20251221192639.3911901-1-vadim.fedorenko@linux.dev
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: fib_nexthops: Add test cases for error routes deletion

Add test cases that check that error routes (e.g., blackhole) are
deleted when their nexthop is deleted.

Output without "ipv4: Fix reference count leak when using error routes
with nexthop objects":

# ./fib_nexthops.sh -t "ipv4_fcnal ipv6_fcnal"

IPv4 functional
----------------------
[...]
       WARNING: Unexpected route entry
TEST: Error route removed on nexthop deletion                       [FAIL]

IPv6
----------------------
[...]
TEST: Error route removed on nexthop deletion                       [ OK ]

Tests passed:  20
Tests failed:   1
Tests skipped:  0

Output with "ipv4: Fix reference count leak when using error routes
with nexthop objects":

# ./fib_nexthops.sh -t "ipv4_fcnal ipv6_fcnal"

IPv4 functional
----------------------
[...]
TEST: Error route removed on nexthop deletion                       [ OK ]

IPv6
----------------------
[...]
TEST: Error route removed on nexthop deletion                       [ OK ]

Tests passed:  21
Tests failed:   0
Tests skipped:  0

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20251221144829.197694-2-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

ipv4: Fix reference count leak when using error routes with nexthop objects

When a nexthop object is deleted, it is marked as dead and then
fib_table_flush() is called to flush all the routes that are using the
dead nexthop.

The current logic in fib_table_flush() is to only flush error routes
(e.g., blackhole) when it is called as part of network namespace
dismantle (i.e., with flush_all=true). Therefore, error routes are not
flushed when their nexthop object is deleted:

# ip link add name dummy1 up type dummy
# ip nexthop add id 1 dev dummy1
# ip route add 198.51.100.1/32 nhid 1
# ip route add blackhole 198.51.100.2/32 nhid 1
# ip nexthop del id 1
# ip route show
blackhole 198.51.100.2 nhid 1 dev dummy1

As such, they keep holding a reference on the nexthop object which in
turn holds a reference on the nexthop device, resulting in a reference
count leak:

# ip link del dev dummy1
[ 70.516258] unregister_netdevice: waiting for dummy1 to become free. Usage count = 2

Fix by flushing error routes when their nexthop is marked as dead.

IPv6 does not suffer from this problem.

Fixes: 493ced1ac47c ("ipv4: Allow routes to use nexthop objects")
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Closes: https://lore.kernel.org/netdev/d943f806-4da6-4970-ac28-b9373b0e63ac@I-love.SAKURA.ne.jp/
Reported-by: syzbot+881d65229ca4f9ae8c84@syzkaller.appspotmail.com
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20251221144829.197694-1-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: usb: sr9700: fix incorrect command used to write single register

This fixes the device failing to initialize with "error reading MAC
address" for me, probably because the incorrect write of NCR_RST to
SR_NCR is not actually resetting the device.

Fixes: c9b37458e95629b1d1171457afdcc1bf1eb7881d ("USB2NET : SR9700 : One chip USB 1.1 USB2NET SR9700Device Driver Support")
Cc: stable@vger.kernel.org
Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Link: https://patch.msgid.link/20251221082400.50688-1-enelsonmoore@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

ipv6: BUG() in pskb_expand_head() as part of calipso_skbuff_setattr()

There exists a kernel oops caused by a BUG_ON(nhead < 0) at
net/core/skbuff.c:2232 in pskb_expand_head().
This bug is triggered as part of the calipso_skbuff_setattr()
routine when skb_cow() is passed headroom > INT_MAX
(i.e. (int)(skb_headroom(skb) + len_delta) < 0).

The root cause of the bug is due to an implicit integer cast in
__skb_cow(). The check (headroom > skb_headroom(skb)) is meant to ensure
that delta = headroom - skb_headroom(skb) is never negative, otherwise
we will trigger a BUG_ON in pskb_expand_head(). However, if
headroom > INT_MAX and delta <= -NET_SKB_PAD, the check passes, delta
becomes negative, and pskb_expand_head() is passed a negative value for
nhead.

Fix the trigger condition in calipso_skbuff_setattr(). Avoid passing
"negative" headroom sizes to skb_cow() within calipso_skbuff_setattr()
by only using skb_cow() to grow headroom.

PoC:
Using `netlabelctl` tool:

        netlabelctl map del default
        netlabelctl calipso add pass doi:7
        netlabelctl map add default address:0::1/128 protocol:calipso,7

        Then run the following PoC:

        int fd = socket(AF_INET6, SOCK_DGRAM, IPPROTO_UDP);

        // setup msghdr
        int cmsg_size = 2;
        int cmsg_len = 0x60;
        struct msghdr msg;
        struct sockaddr_in6 dest_addr;
        struct cmsghdr * cmsg = (struct cmsghdr *) calloc(1,
                        sizeof(struct cmsghdr) + cmsg_len);
        msg.msg_name = &dest_addr;
        msg.msg_namelen = sizeof(dest_addr);
        msg.msg_iov = NULL;
        msg.msg_iovlen = 0;
        msg.msg_control = cmsg;
        msg.msg_controllen = cmsg_len;
        msg.msg_flags = 0;

        // setup sockaddr
        dest_addr.sin6_family = AF_INET6;
        dest_addr.sin6_port = htons(31337);
        dest_addr.sin6_flowinfo = htonl(31337);
        dest_addr.sin6_addr = in6addr_loopback;
        dest_addr.sin6_scope_id = 31337;

        // setup cmsghdr
        cmsg->cmsg_len = cmsg_len;
        cmsg->cmsg_level = IPPROTO_IPV6;
        cmsg->cmsg_type = IPV6_HOPOPTS;
        char * hop_hdr = (char *)cmsg + sizeof(struct cmsghdr);
        hop_hdr[1] = 0x9; //set hop size - (0x9 + 1) * 8 = 80

        sendmsg(fd, &msg, 0);

Fixes: 2917f57b6bc1 ("calipso: Allow the lsm to label the skbuff directly.")
Suggested-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Will Rosenberg <whrosenb@asu.edu>
Acked-by: Paul Moore <paul@paul-moore.com>
Link: https://patch.msgid.link/20251219173637.797418-1-whrosenb@asu.edu
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

usbnet: avoid a possible crash in dql_completed()

syzbot reported a crash [1] in dql_completed() after recent usbnet
BQL adoption.

The reason for the crash is that netdev_reset_queue() is called too soon.

It should be called after cancel_work_sync(&dev->bh_work) to make
sure no more TX completion can happen.

[1]
kernel BUG at lib/dynamic_queue_limits.c:99 !
Oops: invalid opcode: 0000 [#1] SMP KASAN PTI
CPU: 1 UID: 0 PID: 5197 Comm: udevd Tainted: G             L      syzkaller #0 PREEMPT(full)
Tainted: [L]=SOFTLOCKUP
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025
RIP: 0010:dql_completed+0xbe1/0xbf0 lib/dynamic_queue_limits.c:99
Call Trace:
<IRQ>
  netdev_tx_completed_queue include/linux/netdevice.h:3864 [inline]
  netdev_completed_queue include/linux/netdevice.h:3894 [inline]
  usbnet_bh+0x793/0x1020 drivers/net/usb/usbnet.c:1601
  process_one_work kernel/workqueue.c:3257 [inline]
  process_scheduled_works+0xad1/0x1770 kernel/workqueue.c:3340
  bh_worker+0x2b1/0x600 kernel/workqueue.c:3611
  tasklet_action+0xc/0x70 kernel/softirq.c:952
  handle_softirqs+0x27d/0x850 kernel/softirq.c:622
  __do_softirq kernel/softirq.c:656 [inline]
  invoke_softirq kernel/softirq.c:496 [inline]
  __irq_exit_rcu+0xca/0x1f0 kernel/softirq.c:723
  irq_exit_rcu+0x9/0x30 kernel/softirq.c:739

Fixes: 7ff14c52049e ("usbnet: Add support for Byte Queue Limits (BQL)")
Reported-by: syzbot+5b55e49f8bbd84631a9c@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6945644f.a70a0220.207337.0113.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Simon Schippers <simon.schippers@tu-dortmund.de>
Link: https://patch.msgid.link/20251219144459.692715-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

gve: defer interrupt enabling until NAPI registration

Currently, interrupts are automatically enabled immediately upon
request. This allows interrupt to fire before the associated NAPI
context is fully initialized and cause failures like below:

[    0.946369] Call Trace:
[    0.946369]  <IRQ>
[    0.946369]  __napi_poll+0x2a/0x1e0
[    0.946369]  net_rx_action+0x2f9/0x3f0
[    0.946369]  handle_softirqs+0xd6/0x2c0
[    0.946369]  ? handle_edge_irq+0xc1/0x1b0
[    0.946369]  __irq_exit_rcu+0xc3/0xe0
[    0.946369]  common_interrupt+0x81/0xa0
[    0.946369]  </IRQ>
[    0.946369]  <TASK>
[    0.946369]  asm_common_interrupt+0x22/0x40
[    0.946369] RIP: 0010:pv_native_safe_halt+0xb/0x10

Use the `IRQF_NO_AUTOEN` flag when requesting interrupts to prevent auto
enablement and explicitly enable the interrupt in NAPI initialization
path (and disable it during NAPI teardown).

This ensures that interrupt lifecycle is strictly coupled with
readiness of NAPI context.

Cc: stable@vger.kernel.org
Fixes: 1dfc2e46117e ("gve: Refactor napi add and remove functions")
Signed-off-by: Ankit Garg <nktgrg@google.com>
Reviewed-by: Jordan Rhee <jordanrhee@google.com>
Reviewed-by: Joshua Washington <joshwash@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20251219102945.2193617-1-hramamurthy@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: stmmac: fix the crash issue for zero copy XDP_TX action

There is a crash issue when running zero copy XDP_TX action, the crash
log is shown below.

[  216.122464] Unable to handle kernel paging request at virtual address fffeffff80000000
[  216.187524] Internal error: Oops: 0000000096000144 [#1]  SMP
[  216.301694] Call trace:
[  216.304130]  dcache_clean_poc+0x20/0x38 (P)
[  216.308308]  __dma_sync_single_for_device+0x1bc/0x1e0
[  216.313351]  stmmac_xdp_xmit_xdpf+0x354/0x400
[  216.317701]  __stmmac_xdp_run_prog+0x164/0x368
[  216.322139]  stmmac_napi_poll_rxtx+0xba8/0xf00
[  216.326576]  __napi_poll+0x40/0x218
[  216.408054] Kernel panic - not syncing: Oops: Fatal exception in interrupt

For XDP_TX action, the xdp_buff is converted to xdp_frame by
xdp_convert_buff_to_frame(). The memory type of the resulting xdp_frame
depends on the memory type of the xdp_buff. For page pool based xdp_buff
it produces xdp_frame with memory type MEM_TYPE_PAGE_POOL. For zero copy
XSK pool based xdp_buff it produces xdp_frame with memory type
MEM_TYPE_PAGE_ORDER0. However, stmmac_xdp_xmit_back() does not check the
memory type and always uses the page pool type, this leads to invalid
mappings and causes the crash. Therefore, check the xdp_buff memory type
in stmmac_xdp_xmit_back() to fix this issue.

Fixes: bba2556efad6 ("net: stmmac: Enable RX via AF_XDP zero-copy")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Hariprasad Kelam <hkelam@marvell.com>
Link: https://patch.msgid.link/20251204071332.1907111-1-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge tag 'wireless-2025-12-17' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Johannes Berg says:

====================
Various fixes all over, most are recent regressions but
also some long-standing issues:

- cfg80211:
    - fix an issue with overly long SSIDs

- mac80211:
    - long-standing beacon protection issue on some devices
    - for for a multi-BSSID AP-side issue
    - fix a syzbot warning on OCB (not really used in practice)
    - remove WARN on connections using disabled channels,
      as that can happen due to changes in the disable flag
    - fix monitor mode list iteration

- iwlwifi:
    - fix firmware loading on certain (really old) devices
    - add settime64 to PTP clock to avoid a warning and clock
      registration failure, but it's not actually supported

- rtw88:
    - remove WQ_UNBOUND since it broke USB adapters
      (because it can't be used with WQ_BH)
    - fix SDIO issues with certain devices

- rtl8192cu: fix TID array out-of-bounds (since 6.9)

- wlcore (TI): add missing skb push headroom increase

* tag 'wireless-2025-12-17' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  wifi: iwlwifi: Implement settime64 as stub for MVM/MLD PTP
  wifi: iwlwifi: Fix firmware version handling
  wifi: mac80211: ocb: skip rx_no_sta when interface is not joined
  wifi: mac80211: do not use old MBSSID elements
  wifi: mac80211: don't WARN for connections on invalid channels
  wifi: wlcore: ensure skb headroom before skb_push
  wifi: cfg80211: sme: store capped length in __cfg80211_connect_result()
  wifi: mac80211: fix list iteration in ieee80211_add_virtual_monitor()
  wifi: mac80211: Discard Beacon frames to non-broadcast address
  Revert "wifi: rtw88: add WQ_UNBOUND to alloc_workqueue users"
  wifi: rtlwifi: 8192cu: fix tid out of range in rtl92cu_tx_fill_desc()
  wifi: rtw88: limit indirect IO under powered off for RTL8822CS
====================

Link: https://patch.msgid.link/20251217201441.59876-3-johannes@sipsolutions.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

octeontx2-pf: fix "UBSAN: shift-out-of-bounds error"

This patch ensures that the RX ring size (rx_pending) is not
set below the permitted length. This avoids UBSAN
shift-out-of-bounds errors when users passes small or zero
ring sizes via ethtool -G.

Fixes: d45d8979840d ("octeontx2-pf: Add basic ethtool support")
Signed-off-by: Anshumali Gaur <agaur@marvell.com>
Link: https://patch.msgid.link/20251219062226.524844-1-agaur@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

af_unix: don't post cmsg for SO_INQ unless explicitly asked for

A previous commit added SO_INQ support for AF_UNIX (SOCK_STREAM), but it
posts a SCM_INQ cmsg even if just msg->msg_get_inq is set. This is
incorrect, as ->msg_get_inq is just the caller asking for the remainder
to be passed back in msg->msg_inq, it has nothing to do with cmsg. The
original commit states that this is done to make sockets
io_uring-friendly", but it's actually incorrect as io_uring doesn't use
cmsg headers internally at all, and it's actively wrong as this means
that cmsg's are always posted if someone does recvmsg via io_uring.

Fix that up by only posting a cmsg if u->recvmsg_inq is set.

Additionally, mirror how TCP handles inquiry handling in that it should
only be done for a successful return. This makes the logic for the two
identical.

Cc: stable@vger.kernel.org
Fixes: df30285b3670 ("af_unix: Introduce SO_INQ.")
Reported-by: Julian Orth <ju.orth@gmail.com>
Link: https://github.com/axboe/liburing/issues/1509
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/07adc0c2-2c3b-4d08-8af1-1c466a40b6a8@kernel.dk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: mana: Fix use-after-free in reset service rescan path

When mana_serv_reset() encounters -ETIMEDOUT or -EPROTO from
mana_gd_resume(), it performs a PCI rescan via mana_serv_rescan().

mana_serv_rescan() calls pci_stop_and_remove_bus_device(), which can
invoke the driver's remove path and free the gdma_context associated
with the device. After returning, mana_serv_reset() currently jumps to
the out label and attempts to clear gc->in_service, dereferencing a
freed gdma_context.

The issue was observed with the following call logs:
[  698.942636] BUG: unable to handle page fault for address: ff6c2b638088508d
[  698.943121] #PF: supervisor write access in kernel mode
[  698.943423] #PF: error_code(0x0002) - not-present page
[S[  698.943793] Pat Dec  6 07:GD5 100000067 P4D 1002f7067 PUD 1002f8067 PMD 101bef067 PTE 0
0:56 2025] hv_[n e 698.944283] Oops: Oops: 0002 [#1] SMP NOPTI
tvsc f8615163-00[  698.944611] CPU: 28 UID: 0 PID: 249 Comm: kworker/28:1
...
[Sat Dec  6 07:50:56 2025] R10: [  699.121594] mana 7870:00:00.0 enP30832s1: Configured vPort 0 PD 18 DB 16
000000000000001b R11: 0000000000000000 R12: ff44cf3f40270000
[Sat Dec  6 07:50:56 2025] R13: 0000000000000001 R14: ff44cf3f402700c8 R15: ff44cf3f4021b405
[Sat Dec  6 07:50:56 2025] FS:  0000000000000000(0000) GS:ff44cf7e9fcf9000(0000) knlGS:0000000000000000
[Sat Dec  6 07:50:56 2025] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Sat Dec  6 07:50:56 2025] CR2: ff6c2b638088508d CR3: 000000011fe43001 CR4: 0000000000b73ef0
[Sat Dec  6 07:50:56 2025] Call Trace:
[Sat Dec  6 07:50:56 2025]  <TASK>
[Sat Dec  6 07:50:56 2025]  mana_serv_func+0x24/0x50 [mana]
[Sat Dec  6 07:50:56 2025]  process_one_work+0x190/0x350
[Sat Dec  6 07:50:56 2025]  worker_thread+0x2b7/0x3d0
[Sat Dec  6 07:50:56 2025]  kthread+0xf3/0x200
[Sat Dec  6 07:50:56 2025]  ? __pfx_worker_thread+0x10/0x10
[Sat Dec  6 07:50:56 2025]  ? __pfx_kthread+0x10/0x10
[Sat Dec  6 07:50:56 2025]  ret_from_fork+0x21a/0x250
[Sat Dec  6 07:50:56 2025]  ? __pfx_kthread+0x10/0x10
[Sat Dec  6 07:50:56 2025]  ret_from_fork_asm+0x1a/0x30
[Sat Dec  6 07:50:56 2025]  </TASK>

Fix this by returning immediately after mana_serv_rescan() to avoid
accessing GC state that may no longer be valid.

Fixes: 9bf66036d686 ("net: mana: Handle hardware recovery events when probing the device")
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Long Li <longli@microsoft.com>
Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
Link: https://patch.msgid.link/20251218131054.GA3173@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: avoid prefetching NULL pointers

Aditya Gupta reported PowerPC crashes bisected to the blamed commit.

Apparently some platforms do not allow prefetch() on arbitrary pointers.

prefetch(next);
prefetch(&next->priority); // CRASH when next == NULL

Only NULL seems to be supported, with specific handling in prefetch().

Add a conditional to avoid the two prefetches and the skb->next clearing
for the last skb in the list.

Fixes: b2e9821cff6c ("net: prefech skb->priority in __dev_xmit_skb()")
Reported-by: Aditya Gupta <adityag@linux.ibm.com>
Closes: https://lore.kernel.org/netdev/e9f4abee-b132-440f-a50e-bced0868b5a7@linux.ibm.com/T/#mddc372b64ec5a3b181acc9ee3909110c391cc18a
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Aditya Gupta <adityag@linux.ibm.com>
Link: https://patch.msgid.link/20251218081844.809008-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: bridge: Describe @tunnel_hash member in net_bridge_vlan_group struct

Sphinx reports kernel-doc warning:

WARNING: ./net/bridge/br_private.h:267 struct member 'tunnel_hash' not described in 'net_bridge_vlan_group'

Fix it by describing @tunnel_hash member.

Fixes: efa5356b0d9753 ("bridge: per vlan dst_metadata netlink support")
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20251218042936.24175-2-bagasdotme@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: nfc: fix deadlock between nfc_unregister_device and rfkill_fop_write

A deadlock can occur between nfc_unregister_device() and rfkill_fop_write()
due to lock ordering inversion between device_lock and rfkill_global_mutex.

The problematic lock order is:

Thread A (rfkill_fop_write):
  rfkill_fop_write()
    mutex_lock(&rfkill_global_mutex)
      rfkill_set_block()
        nfc_rfkill_set_block()
          nfc_dev_down()
            device_lock(&dev->dev)    <- waits for device_lock

Thread B (nfc_unregister_device):
  nfc_unregister_device()
    device_lock(&dev->dev)
      rfkill_unregister()
        mutex_lock(&rfkill_global_mutex)  <- waits for rfkill_global_mutex

This creates a classic ABBA deadlock scenario.

Fix this by moving rfkill_unregister() and rfkill_destroy() outside the
device_lock critical section. Store the rfkill pointer in a local variable
before releasing the lock, then call rfkill_unregister() after releasing
device_lock.

This change is safe because rfkill_fop_write() holds rfkill_global_mutex
while calling the rfkill callbacks, and rfkill_unregister() also acquires
rfkill_global_mutex before cleanup. Therefore, rfkill_unregister() will
wait for any ongoing callback to complete before proceeding, and
device_del() is only called after rfkill_unregister() returns, preventing
any use-after-free.

The similar lock ordering in nfc_register_device() (device_lock ->
rfkill_global_mutex via rfkill_register) is safe because during
registration the device is not yet in rfkill_list, so no concurrent
rfkill operations can occur on this device.

Fixes: 3e3b5dfcd16a ("NFC: reorder the logic in nfc_{un,}register_device")
Cc: stable@vger.kernel.org
Reported-by: syzbot+4ef89409a235d804c6c2@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=4ef89409a235d804c6c2
Link: https://lore.kernel.org/all/20251217054908.178907-1-kartikey406@gmail.com/T/
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Link: https://patch.msgid.link/20251218012355.279940-1-kartikey406@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: usb: asix: validate PHY address before use

The ASIX driver reads the PHY address from the USB device via
asix_read_phy_addr(). A malicious or faulty device can return an
invalid address (>= PHY_MAX_ADDR), which causes a warning in
mdiobus_get_phy():

addr 207 out of range
WARNING: drivers/net/phy/mdio_bus.c:76

Validate the PHY address in asix_read_phy_addr() and remove the
now-redundant check in ax88172a.c.

Reported-by: syzbot+3d43c9066a5b54902232@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=3d43c9066a5b54902232
Tested-by: syzbot+3d43c9066a5b54902232@syzkaller.appspotmail.com
Fixes: 7e88b11a862a ("net: usb: asix: refactor asix_read_phy_addr() and handle errors on return")
Link: https://lore.kernel.org/all/20251217085057.270704-1-kartikey406@gmail.com/T/
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20251218011156.276824-1-kartikey406@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: mdio: rtl9300: use scoped for loops

Currently in the return path, fwnode_handle_put calls are missing. Just use
_scoped to avoid the issue.

Fixes: 24e31e474769 ("net: mdio: Add RTL9300 MDIO driver")
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Link: https://patch.msgid.link/20251217210153.14641-1-rosenp@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: dsa: b53: skip multicast entries for fdb_dump()

port_fdb_dump() is supposed to only add fdb entries, but we iterate over
the full ARL table, which also includes multicast entries.

So check if the entry is a multicast entry before passing it on to the
callback().

Additionally, the port of those entries is a bitmask, not a port number,
so any included entries would have even be for the wrong port.

Fixes: 1da6df85c6fb ("net: dsa: b53: Implement ARL add/del/dump operations")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20251217205756.172123-1-jonas.gorski@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2025-12-17 (i40e, iavf, idpf, e1000)

For i40e:
Przemyslaw immediately schedules service task following changes to
filters to ensure timely setup for PTP.

Gregory Herrero adjusts VF descriptor size checks to be device specific.

For iavf:
Kohei Enju corrects a couple of condition checks which caused off-by-one
issues.

For idpf:
Larysa fixes LAN memory region call to follow expected requirements.

Brian Vazquez reduces mailbox wait time during init to avoid lengthy
delays.

For e1000:
Guangshuo Li adds validation of data length to prevent out-of-bounds
access.

* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  e1000: fix OOB in e1000_tbi_should_accept()
  idpf: reduce mbx_task schedule delay to 300us
  idpf: fix LAN memory regions command on some NVMs
  iavf: fix off-by-one issues in iavf_config_rss_reg()
  i40e: validate ring_len parameter against hardware-specific values
  i40e: fix scheduling in set_rx_mode
====================

Link: https://patch.msgid.link/
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

bng_en: update module description

The Broadcom BCM57708/800G NIC family is branded as ThorUltra.
Update the driver description accordingly.

Fixes: 74715c4ab0fa0 ("bng_en: Add PCI interface")
Signed-off-by: Rajashekar Hudumula <rajashekar.hudumula@broadcom.com>
Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Link: https://patch.msgid.link/20251217104748.3004706-1-rajashekar.hudumula@broadcom.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

smc91x: fix broken irq-context in PREEMPT_RT

When smc91x.c is built with PREEMPT_RT, the following splat occurs
in FVP_RevC:

[   13.055000] smc91x LNRO0003:00 eth0: link up, 10Mbps, half-duplex, lpa 0x0000
[   13.062137] BUG: workqueue leaked atomic, lock or RCU: kworker/2:1[106]
[   13.062137]      preempt=0x00000000 lock=0->0 RCU=0->1 workfn=mld_ifc_work
[   13.062266] C
** replaying previous printk message **
[   13.062266] CPU: 2 UID: 0 PID: 106 Comm: kworker/2:1 Not tainted 6.18.0-dirty #179 PREEMPT_{RT,(full)}
[   13.062353] Hardware name:  , BIOS
[   13.062382] Workqueue: mld mld_ifc_work
[   13.062469] Call trace:
[   13.062494]  show_stack+0x24/0x40 (C)
[   13.062602]  __dump_stack+0x28/0x48
[   13.062710]  dump_stack_lvl+0x7c/0xb0
[   13.062818]  dump_stack+0x18/0x34
[   13.062926]  process_scheduled_works+0x294/0x450
[   13.063043]  worker_thread+0x260/0x3d8
[   13.063124]  kthread+0x1c4/0x228
[   13.063235]  ret_from_fork+0x10/0x20

This happens because smc_special_trylock() disables IRQs even on PREEMPT_RT,
but smc_special_unlock() does not restore IRQs on PREEMPT_RT.
The reason is that smc_special_unlock() calls spin_unlock_irqrestore(),
and rcu_read_unlock_bh() in __dev_queue_xmit() cannot invoke
rcu_read_unlock() through __local_bh_enable_ip() when current->softirq_disable_cnt becomes zero.

To address this issue, replace smc_special_trylock() with spin_trylock_irqsave().

Fixes: 342a93247e08 ("locking/spinlock: Provide RT variant header: <linux/spinlock_rt.h>")
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251217085115.1730036-1-yeoreum.yun@arm.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge tag 'for-net-2025-12-19' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth

Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

- MGMT: report BIS capability flags in supported settings
- btusb: revert use of devm_kzalloc in btusb

* tag 'for-net-2025-12-19' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: btusb: revert use of devm_kzalloc in btusb
Bluetooth: MGMT: report BIS capability flags in supported settings
====================

Link: https://patch.msgid.link/20251219223118.90141-1-luiz.dentz@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: wangxun: move PHYLINK dependency

The LIBWX library code is what calls into phylink, so any user of
it has to select CONFIG_PHYLINK at the moment, with NGBEVF missing this:

x86_64-linux-ld: drivers/net/ethernet/wangxun/libwx/wx_ethtool.o: in function `wx_nway_reset':
wx_ethtool.c:(.text+0x613): undefined reference to `phylink_ethtool_nway_reset'
x86_64-linux-ld: drivers/net/ethernet/wangxun/libwx/wx_ethtool.o: in function `wx_get_link_ksettings':
wx_ethtool.c:(.text+0x62b): undefined reference to `phylink_ethtool_ksettings_get'
x86_64-linux-ld: drivers/net/ethernet/wangxun/libwx/wx_ethtool.o: in function `wx_set_link_ksettings':
wx_ethtool.c:(.text+0x643): undefined reference to `phylink_ethtool_ksettings_set'
x86_64-linux-ld: drivers/net/ethernet/wangxun/libwx/wx_ethtool.o: in function `wx_get_pauseparam':
wx_ethtool.c:(.text+0x65b): undefined reference to `phylink_ethtool_get_pauseparam'
x86_64-linux-ld: drivers/net/ethernet/wangxun/libwx/wx_ethtool.o: in function `wx_set_pauseparam':
wx_ethtool.c:(.text+0x677): undefined reference to `phylink_ethtool_set_pauseparam'

Add the 'select PHYLINK' line in the libwx option directly so this will
always be enabled for all current and future wangxun drivers, and remove
the now duplicate lines.

Fixes: a0008a3658a3 ("net: wangxun: add ngbevf build")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20251216213547.115026-1-arnd@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: net: fix "buffer overflow detected" for tap.c

When the selftest 'tap.c' is compiled with '-D_FORTIFY_SOURCE=3',
the strcpy() in rtattr_add_strsz() is replaced with a checked
version which causes the test to consistently fail when compiled
with toolchains for which this option is enabled by default.

TAP version 13
1..3
# Starting 3 tests from 1 test cases.
#  RUN           tap.test_packet_valid_udp_gso ...
*** buffer overflow detected ***: terminated
# test_packet_valid_udp_gso: Test terminated by assertion
#          FAIL  tap.test_packet_valid_udp_gso
not ok 1 tap.test_packet_valid_udp_gso
#  RUN           tap.test_packet_valid_udp_csum ...
*** buffer overflow detected ***: terminated
# test_packet_valid_udp_csum: Test terminated by assertion
#          FAIL  tap.test_packet_valid_udp_csum
not ok 2 tap.test_packet_valid_udp_csum
#  RUN           tap.test_packet_crash_tap_invalid_eth_proto ...
*** buffer overflow detected ***: terminated
# test_packet_crash_tap_invalid_eth_proto: Test terminated by assertion
#          FAIL  tap.test_packet_crash_tap_invalid_eth_proto
not ok 3 tap.test_packet_crash_tap_invalid_eth_proto
# FAILED: 0 / 3 tests passed.
# Totals: pass:0 fail:3 xfail:0 xpass:0 skip:0 error:0

A buffer overflow is detected by the fortified glibc __strcpy_chk()
since the __builtin_object_size() of `RTA_DATA(rta)` is incorrectly
reported as 1, even though there is ample space in its bounding
buffer `req`.

Additionally, given that IFLA_IFNAME also expects a null-terminated
string, callers of rtaddr_add_str{,sz}() could simply use the
rtaddr_add_strsz() variant. (which has been renamed to remove the
trailing `sz`) memset() has been used for this function since it
is unchecked and thus circumvents the issue discussed in the
previous paragraph.

Fixes: 2e64fe4624d1 ("selftests: add few test cases for tap driver")
Signed-off-by: Alice C. Munduruca <alice.munduruca@canonical.com>
Reviewed-by: Cengiz Can <cengiz.can@canonical.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20251216170641.250494-1-alice.munduruca@canonical.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: usb: rtl8150: fix memory leak on usb_submit_urb() failure

In async_set_registers(), when usb_submit_urb() fails, the allocated
  async_req structure and URB are not freed, causing a memory leak.

  The completion callback async_set_reg_cb() is responsible for freeing
  these allocations, but it is only called after the URB is successfully
  submitted and completes (successfully or with error). If submission
  fails, the callback never runs and the memory is leaked.

  Fix this by freeing both the URB and the request structure in the error
  path when usb_submit_urb() fails.

Reported-by: syzbot+8dd915c7cb0490fc8c52@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=8dd915c7cb0490fc8c52
Fixes: 4d12997a9bb3 ("drivers: net: usb: rtl8150: concurrent URB bugfix")
Signed-off-by: Deepakkumar Karn <dkarn@redhat.com>
Link: https://patch.msgid.link/20251216151304.59865-2-dkarn@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'selftests-drv-net-psp-fix-templated-test-names-in-psp-py'

Daniel Zahka says:

====================
selftests: drv-net: psp: fix templated test names in psp.py

The templated test names in psp.py had a bug that was not exposed
until 80970e0fc07e ("selftests: net: py: extract the case generation
logic") changed the order of test case evaluation and test case name
extraction.

The test cases created in psp_ip_ver_test_builder() and
ipver_test_builder() were only assigning formatted names to the test
cases they returned, when the test itself was run. This series moves
the test case naming to the point where the test function is created.

Using netdevsim psp:
Before:
./tools/testing/selftests/drivers/net/psp.py
  TAP version 13
  1..28
  ok 1 psp.test_case
  ok 2 psp.test_case
  ok 3 psp.test_case
  ok 4 psp.test_case
  ok 5 psp.test_case
  ok 6 psp.test_case
  ok 7 psp.test_case
  ok 8 psp.test_case
  ok 9 psp.test_case
  ok 10 psp.test_case
  ok 11 psp.dev_list_devices
  ...
  ok 28 psp.removal_device_bi
  # Totals: pass:28 fail:0 xfail:0 xpass:0 skip:0 error:0
  #
  # Responder logs (0):
  # STDERR:
  #  Set PSP enable on device 3 to 0xf
  #  Set PSP enable on device 3 to 0x0

After:
./tools/testing/selftests/drivers/net/psp.py
  TAP version 13
  1..28
  ok 1 psp.data_basic_send_v0_ip4
  ok 2 psp.data_basic_send_v0_ip6
  ok 3 psp.data_basic_send_v1_ip4
  ok 4 psp.data_basic_send_v1_ip6
  ok 5 psp.data_basic_send_v2_ip4
  ok 6 psp.data_basic_send_v2_ip6
  ok 7 psp.data_basic_send_v3_ip4
  ok 8 psp.data_basic_send_v3_ip6
  ok 9 psp.data_mss_adjust_ip4
  ok 10 psp.data_mss_adjust_ip6
  ok 11 psp.dev_list_devices
  ...
  ok 28 psp.removal_device_bi
  # Totals: pass:28 fail:0 xfail:0 xpass:0 skip:0 error:0
  #
  # Responder logs (0):
  # STDERR:
  #  Set PSP enable on device 3 to 0xf
  #  Set PSP enable on device 3 to 0x0

Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
====================

Link: https://patch.msgid.link/20251216-psp-test-fix-v1-0-3b5a6dde186f@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: drv-net: psp: fix test names in ipver_test_builder()

test_case will only take on the formatted name after being
called. This does not work with the way ksft_run() currently
works. Assign the name after the test_case is created.

Fixes: 81236c74dba6 ("selftests: drv-net: psp: add test for auto-adjusting TCP MSS")
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20251216-psp-test-fix-v1-2-3b5a6dde186f@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: drv-net: psp: fix templated test names in psp_ip_ver_test_builder()

test_case will only take on its formatted name after it is called by
the test runner. Move the assignment to test_case.__name__ to when the
test_case is constructed, not called.

Fixes: 8f90dc6e417a ("selftests: drv-net: psp: add basic data transfer and key rotation tests")
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20251216-psp-test-fix-v1-1-3b5a6dde186f@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

amd-xgbe: reset retries and mode on RX adapt failures

During the stress tests, early RX adaptation handshakes can fail, such
as missing the RX_ADAPT ACK or not receiving a coefficient update before
block lock is established. Continuing to retry RX adaptation in this
state is often ineffective if the current mode selection is not viable.

Resetting the RX adaptation retry counter when an RX_ADAPT request fails
to receive ACK or a coefficient update prior to block lock, and clearing
mode_set so the next bring-up performs a fresh mode selection rather
than looping on a likely invalid configuration.

Fixes: 4f3b20bfbb75 ("amd-xgbe: add support for rx-adaptation")
Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Link: https://patch.msgid.link/20251215151728.311713-1-Raju.Rangoju@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: dsa: fix missing put_device() in dsa_tree_find_first_conduit()

of_find_net_device_by_node() searches net devices by their /sys/class/net/,
entry. It is documented in its kernel-doc that:

* If successful, returns a pointer to the net_device with the embedded
* struct device refcount incremented by one, or NULL on failure. The
* refcount must be dropped when done with the net_device.

We are missing a put_device(&conduit->dev) which we could place at the
end of dsa_tree_find_first_conduit(). But to explain why calling
put_device() right away is safe is the same as to explain why the chosen
solution is different.

The code is very poorly split: dsa_tree_find_first_conduit() was first
introduced in commit 95f510d0b792 ("net: dsa: allow the DSA master to be
seen and changed through rtnetlink") but was first used several commits
later, in commit acc43b7bf52a ("net: dsa: allow masters to join a LAG").

Assume there is a switch with 2 CPU ports and 2 conduits, eno2 and eno3.
When we create a LAG (bonding or team device) and place eno2 and eno3
beneath it, we create a 3rd conduit (the LAG device itself), but this is
slightly different than the first two.

Namely, the cpu_dp->conduit pointer of the CPU ports does not change,
and remains pointing towards the physical Ethernet controllers which are
now LAG ports. Only 2 things change:
- the LAG device has a dev->dsa_ptr which marks it as a DSA conduit
- dsa_port_to_conduit(user port) finds the LAG and not the physical
conduit, because of the dp->cpu_port_in_lag bit being set.

When the LAG device is destroyed, dsa_tree_migrate_ports_from_lag_conduit()
is called and this is where dsa_tree_find_first_conduit() kicks in.

This is the logical mistake and the reason why introducing code in one
patch and using it from another is bad practice. I didn't realize that I
don't have to call of_find_net_device_by_node() again; the cpu_dp->conduit
association was never undone, and is still available for direct (re)use.
There's only one concern - maybe the conduit disappeared in the
meantime, but the netdev_hold() call we made during dsa_port_parse_cpu()
(see previous change) ensures that this was not the case.

Therefore, fixing the code means reimplementing it in the simplest way.

I am blaming the time of use, since this is what "git blame" would show
if we were to monitor for the conduit's kobject's refcount remaining
elevated instead of being freed.

Tested on the NXP LS1028A, using the steps from
Documentation/networking/dsa/configuration.rst section "Affinity of user
ports to CPU ports", followed by (extra prints added by me):

$ ip link del bond0
mscc_felix 0000:00:00.5 swp3: Link is Down
bond0 (unregistering): (slave eno2): Releasing backup interface
fsl_enetc 0000:00:00.2 eno2: Link is Down
mscc_felix 0000:00:00.5 swp0: bond0 disappeared, migrating to eno2
mscc_felix 0000:00:00.5 swp1: bond0 disappeared, migrating to eno2
mscc_felix 0000:00:00.5 swp2: bond0 disappeared, migrating to eno2
mscc_felix 0000:00:00.5 swp3: bond0 disappeared, migrating to eno2

Fixes: acc43b7bf52a ("net: dsa: allow masters to join a LAG")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20251215150236.3931670-2-vladimir.oltean@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: dsa: properly keep track of conduit reference

Problem description
-------------------

DSA has a mumbo-jumbo of reference handling of the conduit net device
and its kobject which, sadly, is just wrong and doesn't make sense.

There are two distinct problems.

1. The OF path, which uses of_find_net_device_by_node(), never releases
   the elevated refcount on the conduit's kobject. Nominally, the OF and
   non-OF paths should result in objects having identical reference
   counts taken, and it is already suspicious that
   dsa_dev_to_net_device() has a put_device() call which is missing in
   dsa_port_parse_of(), but we can actually even verify that an issue
   exists. With CONFIG_DEBUG_KOBJECT_RELEASE=y, if we run this command
   "before" and "after" applying this patch:

(unbind the conduit driver for net device eno2)
echo 0000:00:00.2 > /sys/bus/pci/drivers/fsl_enetc/unbind

we see these lines in the output diff which appear only with the patch
applied:

kobject: 'eno2' (ffff002009a3a6b8): kobject_release, parent 0000000000000000 (delayed 1000)
kobject: '109' (ffff0020099d59a0): kobject_release, parent 0000000000000000 (delayed 1000)

2. After we find the conduit interface one way (OF) or another (non-OF),
   it can get unregistered at any time, and DSA remains with a long-lived,
   but in this case stale, cpu_dp->conduit pointer. Holding the net
   device's underlying kobject isn't actually of much help, it just
   prevents it from being freed (but we never need that kobject
   directly). What helps us to prevent the net device from being
   unregistered is the parallel netdev reference mechanism (dev_hold()
   and dev_put()).

Actually we actually use that netdev tracker mechanism implicitly on
user ports since commit 2f1e8ea726e9 ("net: dsa: link interfaces with
the DSA master to get rid of lockdep warnings"), via netdev_upper_dev_link().
But time still passes at DSA switch probe time between the initial
of_find_net_device_by_node() code and the user port creation time, time
during which the conduit could unregister itself and DSA wouldn't know
about it.

So we have to run of_find_net_device_by_node() under rtnl_lock() to
prevent that from happening, and release the lock only with the netdev
tracker having acquired the reference.

Do we need to keep the reference until dsa_unregister_switch() /
dsa_switch_shutdown()?
1: Maybe yes. A switch device will still be registered even if all user
   ports failed to probe, see commit 86f8b1c01a0a ("net: dsa: Do not
   make user port errors fatal"), and the cpu_dp->conduit pointers
   remain valid.  I haven't audited all call paths to see whether they
   will actually use the conduit in lack of any user port, but if they
   do, it seems safer to not rely on user ports for that reference.
2. Definitely yes. We support changing the conduit which a user port is
   associated to, and we can get into a situation where we've moved all
   user ports away from a conduit, thus no longer hold any reference to
   it via the net device tracker. But we shouldn't let it go nonetheless
   - see the next change in relation to dsa_tree_find_first_conduit()
   and LAG conduits which disappear.
   We have to be prepared to return to the physical conduit, so the CPU
   port must explicitly keep another reference to it. This is also to
   say: the user ports and their CPU ports may not always keep a
   reference to the same conduit net device, and both are needed.

As for the conduit's kobject for the /sys/class/net/ entry, we don't
care about it, we can release it as soon as we hold the net device
object itself.

History and blame attribution
-----------------------------

The code has been refactored so many times, it is very difficult to
follow and properly attribute a blame, but I'll try to make a short
history which I hope to be correct.

We have two distinct probing paths:
- one for OF, introduced in 2016 in commit 83c0afaec7b7 ("net: dsa: Add
  new binding implementation")
- one for non-OF, introduced in 2017 in commit 71e0bbde0d88 ("net: dsa:
  Add support for platform data")

These are both complete rewrites of the original probing paths (which
used struct dsa_switch_driver and other weird stuff, instead of regular
devices on their respective buses for register access, like MDIO, SPI,
I2C etc):
- one for OF, introduced in 2013 in commit 5e95329b701c ("dsa: add
  device tree bindings to register DSA switches")
- one for non-OF, introduced in 2008 in commit 91da11f870f0 ("net:
  Distributed Switch Architecture protocol support")

except for tiny bits and pieces like dsa_dev_to_net_device() which were
seemingly carried over since the original commit, and used to this day.

The point is that the original probing paths received a fix in 2015 in
the form of commit 679fb46c5785 ("net: dsa: Add missing master netdev
dev_put() calls"), but the fix never made it into the "new" (dsa2)
probing paths that can still be traced to today, and the fixed probing
path was later deleted in 2019 in commit 93e86b3bc842 ("net: dsa: Remove
legacy probing support").

That is to say, the new probing paths were never quite correct in this
area.

The existence of the legacy probing support which was deleted in 2019
explains why dsa_dev_to_net_device() returns a conduit with elevated
refcount (because it was supposed to be released during
dsa_remove_dst()). After the removal of the legacy code, the only user
of dsa_dev_to_net_device() calls dev_put(conduit) immediately after this
function returns. This pattern makes no sense today, and can only be
interpreted historically to understand why dev_hold() was there in the
first place.

Change details
--------------

Today we have a better netdev tracking infrastructure which we should
use. Logically netdev_hold() belongs in common code
(dsa_port_parse_cpu(), where dp->conduit is assigned), but there is a
tradeoff to be made with the rtnl_lock() section which would become a
bit too long if we did that - dsa_port_parse_cpu() also calls
request_module(). So we duplicate a bit of logic in order for the
callers of dsa_port_parse_cpu() to be the ones responsible of holding
the conduit reference and releasing it on error. This shortens the
rtnl_lock() section significantly.

In the dsa_switch_probe() error path, dsa_switch_release_ports() will be
called in a number of situations, one being where dsa_port_parse_cpu()
maybe didn't get the chance to run at all (a different port failed
earlier, etc). So we have to test for the conduit being NULL prior to
calling netdev_put().

There have still been so many transformations to the code since the
blamed commits (rename master -> conduit, commit 0650bf52b31f ("net:
dsa: be compatible with masters which unregister on shutdown")), that it
only makes sense to fix the code using the best methods available today
and see how it can be backported to stable later. I suspect the fix
cannot even be backported to kernels which lack dsa_switch_shutdown(),
and I suspect this is also maybe why the long-lived conduit reference
didn't make it into the new DSA probing paths at the time (problems
during shutdown).

Because dsa_dev_to_net_device() has a single call site and has to be
changed anyway, the logic was just absorbed into the non-OF
dsa_port_parse().

Tested on the ocelot/felix switch and on dsa_loop, both on the NXP
LS1028A with CONFIG_DEBUG_KOBJECT_RELEASE=y.

Reported-by: Ma Ke <make24@iscas.ac.cn>
Closes: https://lore.kernel.org/netdev/20251214131204.4684-1-make24@iscas.ac.cn/
Fixes: 83c0afaec7b7 ("net: dsa: Add new binding implementation")
Fixes: 71e0bbde0d88 ("net: dsa: Add support for platform data")
Reviewed-by: Jonas Gorski <jonas.gorski@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20251215150236.3931670-1-vladimir.oltean@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: airoha: Move net_devs registration in a dedicated routine

Since airoha_probe() is not executed under rtnl lock, there is small race
where a given device is configured by user-space while the remaining ones
are not completely loaded from the dts yet. This condition will allow a
hw device misconfiguration since there are some conditions (e.g. GDM2 check
in airoha_dev_init()) that require all device are properly loaded from the
device tree. Fix the issue moving net_devices registration at the end of
the airoha_probe routine.

Fixes: 9cd451d414f6e ("net: airoha: Add loopback support for GDM2")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251214-airoha-fix-dev-registration-v1-1-860e027ad4c6@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

erspan: Initialize options_len before referencing options.

The struct ip_tunnel_info has a flexible array member named
options that is protected by a counted_by(options_len)
attribute.

The compiler will use this information to enforce runtime bounds
checking deployed by FORTIFY_SOURCE string helpers.

As laid out in the GCC documentation, the counter must be
initialized before the first reference to the flexible array
member.

After scanning through the files that use struct ip_tunnel_info
and also refer to options or options_len, it appears the normal
case is to use the ip_tunnel_info_opts_set() helper.

Said helper would initialize options_len properly before copying
data into options, however in the GRE ERSPAN code a partial
update is done, preventing the use of the helper function.

Before this change the handling of ERSPAN traffic in GRE tunnels
would cause a kernel panic when the kernel is compiled with
GCC 15+ and having FORTIFY_SOURCE configured:

memcpy: detected buffer overflow: 4 byte write of buffer size 0

Call Trace:
<IRQ>
__fortify_panic+0xd/0xf
erspan_rcv.cold+0x68/0x83
? ip_route_input_slow+0x816/0x9d0
gre_rcv+0x1b2/0x1c0
gre_rcv+0x8e/0x100
? raw_v4_input+0x2a0/0x2b0
ip_protocol_deliver_rcu+0x1ea/0x210
ip_local_deliver_finish+0x86/0x110
ip_local_deliver+0x65/0x110
? ip_rcv_finish_core+0xd6/0x360
ip_rcv+0x186/0x1a0

Cc: stable@vger.kernel.org
Link: https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attributes.html#index-counted_005fby-variable-attribute
Reported-at: https://launchpad.net/bugs/2129580
Fixes: bb5e62f2d547 ("net: Add options as a flexible array to struct ip_tunnel_info")
Signed-off-by: Frode Nordahl <fnordahl@ubuntu.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251213101338.4693-1-fnordahl@ubuntu.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'mptcp-fix-warn-on-bad-status'

Matthieu Baerts says:

====================
mptcp: fix warn on bad status

Two somewhat related fixes addressing different issues found by
syzkaller, and producing the exact same splat: a WARNING in
subflow_data_ready().

- Patch 1: fallback earlier on simultaneous connections to avoid a
warning. A fix for v5.19.

- Patch 2: ensure context reset on disconnect, also to avoid a similar
warning. A fix for v6.2.

Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
====================

Link: https://patch.msgid.link/20251212-net-mptcp-subflow_data_ready-warn-v1-0-d1f9fd1c36c8@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

mptcp: ensure context reset on disconnect()

After the blamed commit below, if the MPC subflow is already in TCP_CLOSE
status or has fallback to TCP at mptcp_disconnect() time,
mptcp_do_fastclose() skips setting the `send_fastclose flag` and the later
__mptcp_close_ssk() does not reset anymore the related subflow context.

Any later connection will be created with both the `request_mptcp` flag
and the msk-level fallback status off (it is unconditionally cleared at
MPTCP disconnect time), leading to a warning in subflow_data_ready():

  WARNING: CPU: 26 PID: 8996 at net/mptcp/subflow.c:1519 subflow_data_ready (net/mptcp/subflow.c:1519 (discriminator 13))
  Modules linked in:
  CPU: 26 UID: 0 PID: 8996 Comm: syz.22.39 Not tainted 6.18.0-rc7-05427-g11fc074f6c36 #1 PREEMPT(voluntary)
  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  RIP: 0010:subflow_data_ready (net/mptcp/subflow.c:1519 (discriminator 13))
  Code: 90 0f 0b 90 90 e9 04 fe ff ff e8 b7 1e f5 fe 89 ee bf 07 00 00 00 e8 db 19 f5 fe 83 fd 07 0f 84 35 ff ff ff e8 9d 1e f5 fe 90 <0f> 0b 90 e9 27 ff ff ff e8 8f 1e f5 fe 4c 89 e7 48 89 de e8 14 09
  RSP: 0018:ffffc9002646fb30 EFLAGS: 00010293
  RAX: 0000000000000000 RBX: ffff88813b218000 RCX: ffffffff825c8435
  RDX: ffff8881300b3580 RSI: ffffffff825c8443 RDI: 0000000000000005
  RBP: 000000000000000b R08: ffffffff825c8435 R09: 000000000000000b
  R10: 0000000000000005 R11: 0000000000000007 R12: ffff888131ac0000
  R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
  FS:  00007f88330af6c0(0000) GS:ffff888a93dd2000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f88330aefe8 CR3: 000000010ff59000 CR4: 0000000000350ef0
  Call Trace:
   <TASK>
   tcp_data_ready (net/ipv4/tcp_input.c:5356)
   tcp_data_queue (net/ipv4/tcp_input.c:5445)
   tcp_rcv_state_process (net/ipv4/tcp_input.c:7165)
   tcp_v4_do_rcv (net/ipv4/tcp_ipv4.c:1955)
   __release_sock (include/net/sock.h:1158 (discriminator 6) net/core/sock.c:3180 (discriminator 6))
   release_sock (net/core/sock.c:3737)
   mptcp_sendmsg (net/mptcp/protocol.c:1763 net/mptcp/protocol.c:1857)
   inet_sendmsg (net/ipv4/af_inet.c:853 (discriminator 7))
   __sys_sendto (net/socket.c:727 (discriminator 15) net/socket.c:742 (discriminator 15) net/socket.c:2244 (discriminator 15))
   __x64_sys_sendto (net/socket.c:2247)
   do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1))
   entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
  RIP: 0033:0x7f883326702d

Address the issue setting an explicit `fastclosing` flag at fastclose
time, and checking such flag after mptcp_do_fastclose().

Fixes: ae155060247b ("mptcp: fix duplicate reset on fastclose")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251212-net-mptcp-subflow_data_ready-warn-v1-2-d1f9fd1c36c8@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

mptcp: fallback earlier on simult connection

Syzkaller reports a simult-connect race leading to inconsistent fallback
status:

  WARNING: CPU: 3 PID: 33 at net/mptcp/subflow.c:1515 subflow_data_ready+0x40b/0x7c0 net/mptcp/subflow.c:1515
  Modules linked in:
  CPU: 3 UID: 0 PID: 33 Comm: ksoftirqd/3 Not tainted syzkaller #0 PREEMPT(full)
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
  RIP: 0010:subflow_data_ready+0x40b/0x7c0 net/mptcp/subflow.c:1515
  Code: 89 ee e8 78 61 3c f6 40 84 ed 75 21 e8 8e 66 3c f6 44 89 fe bf 07 00 00 00 e8 c1 61 3c f6 41 83 ff 07 74 09 e8 76 66 3c f6 90 <0f> 0b 90 e8 6d 66 3c f6 48 89 df e8 e5 ad ff ff 31 ff 89 c5 89 c6
  RSP: 0018:ffffc900006cf338 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ffff888031acd100 RCX: ffffffff8b7f2abf
  RDX: ffff88801e6ea440 RSI: ffffffff8b7f2aca RDI: 0000000000000005
  RBP: 0000000000000000 R08: 0000000000000005 R09: 0000000000000007
  R10: 0000000000000004 R11: 0000000000002c10 R12: ffff88802ba69900
  R13: 1ffff920000d9e67 R14: ffff888046f81800 R15: 0000000000000004
  FS:  0000000000000000(0000) GS:ffff8880d69bc000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000560fc0ca1670 CR3: 0000000032c3a000 CR4: 0000000000352ef0
  Call Trace:
   <TASK>
   tcp_data_queue+0x13b0/0x4f90 net/ipv4/tcp_input.c:5197
   tcp_rcv_state_process+0xfdf/0x4ec0 net/ipv4/tcp_input.c:6922
   tcp_v6_do_rcv+0x492/0x1740 net/ipv6/tcp_ipv6.c:1672
   tcp_v6_rcv+0x2976/0x41e0 net/ipv6/tcp_ipv6.c:1918
   ip6_protocol_deliver_rcu+0x188/0x1520 net/ipv6/ip6_input.c:438
   ip6_input_finish+0x1e4/0x4b0 net/ipv6/ip6_input.c:489
   NF_HOOK include/linux/netfilter.h:318 [inline]
   NF_HOOK include/linux/netfilter.h:312 [inline]
   ip6_input+0x105/0x2f0 net/ipv6/ip6_input.c:500
   dst_input include/net/dst.h:471 [inline]
   ip6_rcv_finish net/ipv6/ip6_input.c:79 [inline]
   NF_HOOK include/linux/netfilter.h:318 [inline]
   NF_HOOK include/linux/netfilter.h:312 [inline]
   ipv6_rcv+0x264/0x650 net/ipv6/ip6_input.c:311
   __netif_receive_skb_one_core+0x12d/0x1e0 net/core/dev.c:5979
   __netif_receive_skb+0x1d/0x160 net/core/dev.c:6092
   process_backlog+0x442/0x15e0 net/core/dev.c:6444
   __napi_poll.constprop.0+0xba/0x550 net/core/dev.c:7494
   napi_poll net/core/dev.c:7557 [inline]
   net_rx_action+0xa9f/0xfe0 net/core/dev.c:7684
   handle_softirqs+0x216/0x8e0 kernel/softirq.c:579
   run_ksoftirqd kernel/softirq.c:968 [inline]
   run_ksoftirqd+0x3a/0x60 kernel/softirq.c:960
   smpboot_thread_fn+0x3f7/0xae0 kernel/smpboot.c:160
   kthread+0x3c2/0x780 kernel/kthread.c:463
   ret_from_fork+0x5d7/0x6f0 arch/x86/kernel/process.c:148
   ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
   </TASK>

The TCP subflow can process the simult-connect syn-ack packet after
transitioning to TCP_FIN1 state, bypassing the MPTCP fallback check,
as the sk_state_change() callback is not invoked for * -> FIN_WAIT1
transitions.

That will move the msk socket to an inconsistent status and the next
incoming data will hit the reported splat.

Close the race moving the simult-fallback check at the earliest possible
stage - that is at syn-ack generation time.

About the fixes tags: [2] was supposed to also fix this issue introduced
by [3]. [1] is required as a dependence: it was not explicitly marked as
a fix, but it is one and it has already been backported before [3]. In
other words, this commit should be backported up to [3], including [2]
and [1] if that's not already there.

Fixes: 23e89e8ee7be ("tcp: Don't drop SYN+ACK for simultaneous connect().") [1]
Fixes: 4fd19a307016 ("mptcp: fix inconsistent state on fastopen race") [2]
Fixes: 1e777f39b4d7 ("mptcp: add MSG_FASTOPEN sendmsg flag support") [3]
Cc: stable@vger.kernel.org
Reported-by: syzbot+0ff6b771b4f7a5bce83b@syzkaller.appspotmail.com
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/586
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251212-net-mptcp-subflow_data_ready-warn-v1-1-d1f9fd1c36c8@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

team: fix check for port enabled in team_queue_override_port_prio_changed()

There has been a syzkaller bug reported recently with the following
trace:

list_del corruption, ffff888058bea080->prev is LIST_POISON2 (dead000000000122)
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:59!
Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
CPU: 3 UID: 0 PID: 21246 Comm: syz.0.2928 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:__list_del_entry_valid_or_report+0x13e/0x200 lib/list_debug.c:59
Code: 48 c7 c7 e0 71 f0 8b e8 30 08 ef fc 90 0f 0b 48 89 ef e8 a5 02 55 fd 48 89 ea 48 89 de 48 c7 c7 40 72 f0 8b e8 13 08 ef fc 90 <0f> 0b 48 89 ef e8 88 02 55 fd 48 89 ea 48 b8 00 00 00 00 00 fc ff
RSP: 0018:ffffc9000d49f370 EFLAGS: 00010286
RAX: 000000000000004e RBX: ffff888058bea080 RCX: ffffc9002817d000
RDX: 0000000000000000 RSI: ffffffff819becc6 RDI: 0000000000000005
RBP: dead000000000122 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000080000000 R11: 0000000000000001 R12: ffff888039e9c230
R13: ffff888058bea088 R14: ffff888058bea080 R15: ffff888055461480
FS:  00007fbbcfe6f6c0(0000) GS:ffff8880d6d0a000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000110c3afcb0 CR3: 00000000382c7000 CR4: 0000000000352ef0
Call Trace:
<TASK>
__list_del_entry_valid include/linux/list.h:132 [inline]
__list_del_entry include/linux/list.h:223 [inline]
list_del_rcu include/linux/rculist.h:178 [inline]
__team_queue_override_port_del drivers/net/team/team_core.c:826 [inline]
__team_queue_override_port_del drivers/net/team/team_core.c:821 [inline]
team_queue_override_port_prio_changed drivers/net/team/team_core.c:883 [inline]
team_priority_option_set+0x171/0x2f0 drivers/net/team/team_core.c:1534
team_option_set drivers/net/team/team_core.c:376 [inline]
team_nl_options_set_doit+0x8ae/0xe60 drivers/net/team/team_core.c:2653
genl_family_rcv_msg_doit+0x209/0x2f0 net/netlink/genetlink.c:1115
genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
genl_rcv_msg+0x55c/0x800 net/netlink/genetlink.c:1210
netlink_rcv_skb+0x158/0x420 net/netlink/af_netlink.c:2552
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219
netlink_unicast_kernel net/netlink/af_netlink.c:1320 [inline]
netlink_unicast+0x5aa/0x870 net/netlink/af_netlink.c:1346
netlink_sendmsg+0x8c8/0xdd0 net/netlink/af_netlink.c:1896
sock_sendmsg_nosec net/socket.c:727 [inline]
__sock_sendmsg net/socket.c:742 [inline]
____sys_sendmsg+0xa98/0xc70 net/socket.c:2630
___sys_sendmsg+0x134/0x1d0 net/socket.c:2684
__sys_sendmsg+0x16d/0x220 net/socket.c:2716
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xcd/0xfa0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f

The problem is in this flow:
1) Port is enabled, queue_id != 0, in qom_list
2) Port gets disabled
        -> team_port_disable()
        -> team_queue_override_port_del()
        -> del (removed from list)
3) Port is disabled, queue_id != 0, not in any list
4) Priority changes
        -> team_queue_override_port_prio_changed()
        -> checks: port disabled && queue_id != 0
        -> calls del - hits the BUG as it is removed already

To fix this, change the check in team_queue_override_port_prio_changed()
so it returns early if port is not enabled.

Reported-by: syzbot+422806e5f4cce722a71f@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=422806e5f4cce722a71f
Fixes: 6c31ff366c11 ("team: remove synchronize_rcu() called during queue override change")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251212102953.167287-1-jiri@resnulli.us
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/handshake: Fix null-ptr-deref in handshake_complete()

A null pointer dereference in handshake_complete() was observed [1].

When handshake_req_next() return NULL in handshake_nl_accept_doit(),
function handshake_complete() will be called unexpectedly which triggers
this crash. Fix it by goto out_status when req is NULL.

[1]
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] SMP KASAN PTI
RIP: 0010:handshake_complete+0x36/0x2b0 net/handshake/request.c:288
Call Trace:
<TASK>
handshake_nl_accept_doit+0x32d/0x7e0 net/handshake/netlink.c:129
genl_family_rcv_msg_doit+0x204/0x300 net/netlink/genetlink.c:1115
genl_family_rcv_msg+0x436/0x670 net/netlink/genetlink.c:1195
genl_rcv_msg+0xcc/0x170 net/netlink/genetlink.c:1210
netlink_rcv_skb+0x14c/0x430 net/netlink/af_netlink.c:2550
genl_rcv+0x2d/0x40 net/netlink/genetlink.c:1219
netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
netlink_unicast+0x878/0xb20 net/netlink/af_netlink.c:1344
netlink_sendmsg+0x897/0xd70 net/netlink/af_netlink.c:1894
sock_sendmsg_nosec net/socket.c:727 [inline]
__sock_sendmsg net/socket.c:742 [inline]
____sys_sendmsg+0xa39/0xbf0 net/socket.c:2592
___sys_sendmsg+0x121/0x1c0 net/socket.c:2646
__sys_sendmsg+0x155/0x200 net/socket.c:2678
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x5f/0x350 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x76/0x7e
</TASK>

Fixes: fe67b063f687 ("net/handshake: convert handshake_nl_accept_doit() to FD_PREPARE()")
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/kernel-tls-handshake/aScekpuOYHRM9uOd@morisot.1015granger.net/T/#m7cfa5c11efc626d77622b2981591197a2acdd65e
Signed-off-by: Wang Liang <wangliang74@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251212012723.4111831-1-wangliang74@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

ip6_gre: make ip6gre_header() robust

Over the years, syzbot found many ways to crash the kernel
in ip6gre_header() [1].

This involves team or bonding drivers ability to dynamically
change their dev->needed_headroom and/or dev->hard_header_len

In this particular crash mld_newpack() allocated an skb
with a too small reserve/headroom, and by the time mld_sendpack()
was called, syzbot managed to attach an ip6gre device.

[1]
skbuff: skb_under_panic: text:ffffffff8a1d69a8 len:136 put:40 head:ffff888059bc7000 data:ffff888059bc6fe8 tail:0x70 end:0x6c0 dev:team0
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:213 !
<TASK>
  skb_under_panic net/core/skbuff.c:223 [inline]
  skb_push+0xc3/0xe0 net/core/skbuff.c:2641
  ip6gre_header+0xc8/0x790 net/ipv6/ip6_gre.c:1371
  dev_hard_header include/linux/netdevice.h:3436 [inline]
  neigh_connected_output+0x286/0x460 net/core/neighbour.c:1618
  neigh_output include/net/neighbour.h:556 [inline]
  ip6_finish_output2+0xfb3/0x1480 net/ipv6/ip6_output.c:136
__ip6_finish_output net/ipv6/ip6_output.c:-1 [inline]
  ip6_finish_output+0x234/0x7d0 net/ipv6/ip6_output.c:220
  NF_HOOK_COND include/linux/netfilter.h:307 [inline]
  ip6_output+0x340/0x550 net/ipv6/ip6_output.c:247
  NF_HOOK+0x9e/0x380 include/linux/netfilter.h:318
  mld_sendpack+0x8d4/0xe60 net/ipv6/mcast.c:1855
  mld_send_cr net/ipv6/mcast.c:2154 [inline]
  mld_ifc_work+0x83e/0xd60 net/ipv6/mcast.c:2693

Fixes: c12b395a4664 ("gre: Support GRE over IPv6")
Reported-by: syzbot+43a2ebcf2a64b1102d64@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/693b002c.a70a0220.33cd7b.0033.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20251211173550.2032674-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: openvswitch: Avoid needlessly taking the RTNL on vport destroy

The openvswitch teardown code will immediately call
ovs_netdev_detach_dev() in response to a NETDEV_UNREGISTER notification.
It will then start the dp_notify_work workqueue, which will later end up
calling the vport destroy() callback. This callback takes the RTNL to do
another ovs_netdev_detach_port(), which in this case is unnecessary.
This causes extra pressure on the RTNL, in some cases leading to
"unregister_netdevice: waiting for XX to become free" warnings on
teardown.

We can straight-forwardly avoid the extra RTNL lock acquisition by
checking the device flags before taking the lock, and skip the locking
altogether if the IFF_OVS_DATAPATH flag has already been unset.

Fixes: b07c26511e94 ("openvswitch: fix vport-netdev unregister")
Tested-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Link: https://patch.msgid.link/20251211115006.228876-1-toke@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: phy: mediatek: fix nvmem cell reference leak in mt798x_phy_calibration

When nvmem_cell_read() fails in mt798x_phy_calibration(), the function
returns without calling nvmem_cell_put(), leaking the cell reference.

Move nvmem_cell_put() right after nvmem_cell_read() to ensure the cell
reference is always released regardless of the read result.

Found via static analysis and code review.

Fixes: 98c485eaf509 ("net: phy: add driver for MediaTek SoC built-in GE PHYs")
Cc: stable@vger.kernel.org
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Reviewed-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20251211081313.2368460-1-linmq006@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

fjes: Add missing iounmap in fjes_hw_init()

In error paths, add fjes_hw_iounmap() to release the
resource acquired by fjes_hw_iomap(). Add a goto label
to do so.

Fixes: 8cdc3f6c5d22 ("fjes: Hardware initialization routine")
Cc: stable@vger.kernel.org
Signed-off-by: Haoxiang Li <lihaoxiang@isrc.iscas.ac.cn>
Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251211073756.101824-1-lihaoxiang@isrc.iscas.ac.cn
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: mdio: aspeed: add dummy read to avoid read-after-write issue

The Aspeed MDIO controller may return incorrect data when a read operation
follows immediately after a write. Due to a controller bug, the subsequent
read can latch stale data, causing the polling logic to terminate earlier
than expected.

To work around this hardware issue, insert a dummy read after each write
operation. This ensures that the next actual read returns the correct
data and prevents premature polling exit.

This workaround has been verified to stabilize MDIO transactions on
affected Aspeed platforms.

Fixes: f160e99462c6 ("net: phy: Add mdio-aspeed")
Signed-off-by: Jacky Chou <jacky_chou@aspeedtech.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20251211-aspeed_mdio_add_dummy_read-v3-1-382868869004@aspeedtech.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: usb: sr9700: support devices with virtual driver CD

Some SR9700 devices have an SPI flash chip containing a virtual driver
CD, in which case they appear as a device with two interfaces and
product ID 0x9702. Interface 0 is the driver CD and interface 1 is the
Ethernet device.

Link: https://github.com/name-kurniawan/usb-lan
Link: https://www.draisberghof.de/usb_modeswitch/bb/viewtopic.php?t=2185
Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Link: https://patch.msgid.link/20251211062451.139036-1-enelsonmoore@gmail.com
[pabeni@redhat.com: fixes link tags]
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Bluetooth: btusb: revert use of devm_kzalloc in btusb

This reverts commit 98921dbd00c4e ("Bluetooth: Use devm_kzalloc in
btusb.c file").

In btusb_probe(), we use devm_kzalloc() to allocate the btusb data. This
ties the lifetime of all the btusb data to the binding of a driver to
one interface, INTF. In a driver that binds to other interfaces, ISOC
and DIAG, this is an accident waiting to happen.

The issue is revealed in btusb_disconnect(), where calling
usb_driver_release_interface(&btusb_driver, data->intf) will have devm
free the data that is also being used by the other interfaces of the
driver that may not be released yet.

To fix this, revert the use of devm and go back to freeing memory
explicitly.

Fixes: 98921dbd00c4e ("Bluetooth: Use devm_kzalloc in btusb.c file")
Signed-off-by: Raphael Pinsonneault-Thibeault <rpthibeault@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>

Bluetooth: MGMT: report BIS capability flags in supported settings

MGMT_SETTING_ISO_BROADCASTER and MGMT_SETTING_ISO_RECEIVER flags are
missing from supported_settings although they are in current_settings.

Report them also in supported_settings to be consistent.

Fixes: ae7533613133 ("Bluetooth: Check for ISO support in controller")
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>

Merge tag 'net-6.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
"Including fixes from netfilter and CAN.

  Current release - regressions:

   - netfilter: nf_conncount: fix leaked ct in error paths

   - sched: act_mirred: fix loop detection

   - sctp: fix potential deadlock in sctp_clone_sock()

   - can: fix build dependency

   - eth: mlx5e: do not update BQL of old txqs during channel
     reconfiguration

  Previous releases - regressions:

   - sched: ets: always remove class from active list before deleting it

   - inet: frags: flush pending skbs in fqdir_pre_exit()

   - netfilter: nf_nat: remove bogus direction check

   - mptcp:
      - schedule rtx timer only after pushing data
      - avoid deadlock on fallback while reinjecting

   - can: gs_usb: fix error handling

   - eth:
      - mlx5e:
         - avoid unregistering PSP twice
         - fix double unregister of HCA_PORTS component
      - bnxt_en: fix XDP_TX path
      - mlxsw: fix use-after-free when updating multicast route stats

  Previous releases - always broken:

   - ethtool: avoid overflowing userspace buffer on stats query

   - openvswitch: fix middle attribute validation in push_nsh() action

   - eth:
      - mlx5: fw_tracer, validate format string parameters
      - mlxsw: spectrum_router: fix neighbour use-after-free
      - ipvlan: ignore PACKET_LOOPBACK in handle_mode_l2()

  Misc:

   - Jozsef Kadlecsik retires from maintaining netfilter

   - tools: ynl: fix build on systems with old kernel headers"

* tag 'net-6.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (83 commits)
  net: hns3: add VLAN id validation before using
  net: hns3: using the num_tqps to check whether tqp_index is out of range when vf get ring info from mbx
  net: hns3: using the num_tqps in the vf driver to apply for resources
  net: enetc: do not transmit redirected XDP frames when the link is down
  selftests/tc-testing: Test case exercising potential mirred redirect deadlock
  net/sched: act_mirred: fix loop detection
  sctp: Clear inet_opt in sctp_v6_copy_ip_options().
  sctp: Fetch inet6_sk() after setting ->pinet6 in sctp_clone_sock().
  net/handshake: duplicate handshake cancellations leak socket
  net/mlx5e: Don't include PSP in the hard MTU calculations
  net/mlx5e: Do not update BQL of old txqs during channel reconfiguration
  net/mlx5e: Trigger neighbor resolution for unresolved destinations
  net/mlx5e: Use ip6_dst_lookup instead of ipv6_dst_lookup_flow for MAC init
  net/mlx5: Serialize firmware reset with devlink
  net/mlx5: fw_tracer, Handle escaped percent properly
  net/mlx5: fw_tracer, Validate format string parameters
  net/mlx5: Drain firmware reset in shutdown callback
  net/mlx5: fw reset, clear reset requested on drain_fw_reset
  net: dsa: mxl-gsw1xx: manually clear RANEG bit
  net: dsa: mxl-gsw1xx: fix .shutdown driver operation
  ...

Merge tag 'v6.19-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

- important fix for reconnect problem

- minor cleanup

* tag 'v6.19-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: update internal module version number
  smb: move some SMB1 definitions into common/smb1pdu.h
  smb: align durable reconnect v2 context to 8 byte boundary

Merge tag 'fsnotify_for_v6.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull fsnotify fixes from Jan Kara:
"Two fsnotify fixes.

  The fix from Ahelenia makes sure we generate event when modifying
  inode flags, the fix from Amir disables sending of events from device
  inodes to their parent directory as it could concievably create a
  usable side channel attack in case of some devices and so far we
  aren't aware of anybody depending on the functionality"

* tag 'fsnotify_for_v6.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  fs: send fsnotify_xattr()/IN_ATTRIB from vfs_fileattr_set()/chattr(1)
  fsnotify: do not generate ACCESS/MODIFY events on child for special files

Merge tag 'linux-can-fixes-for-6.19-20251218' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2025-12-18

this is a pull request of 3 patches for net/main.

Tetsuo Handa contributes 2 patches to fix race windows in the j1939
protocol to properly handle disappearing network devices.

The last patch is by me, it fixes a build dependency with the CAN
drivers, that got introduced while fixing a dependency between the CAN
protocol and CAN device code.

linux-can-fixes-for-6.19-20251218

* tag 'linux-can-fixes-for-6.19-20251218' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
  can: fix build dependency
  can: j1939: make j1939_sk_bind() fail if device is no longer registered
  can: j1939: make j1939_session_activate() fail if device is no longer registered
====================

Link: https://patch.msgid.link/20251218123132.664533-1-mkl@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'there-are-some-bugfix-for-the-hns3-ethernet-driver'

Jijie Shao says:

====================
There are some bugfix for the HNS3 ethernet driver
====================

Link: https://patch.msgid.link/20251211023737.2327018-1-shaojijie@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: hns3: add VLAN id validation before using

Currently, the VLAN id may be used without validation when
receive a VLAN configuration mailbox from VF. The length of
vlan_del_fail_bmap is BITS_TO_LONGS(VLAN_N_VID). It may cause
out-of-bounds memory access once the VLAN id is bigger than
or equal to VLAN_N_VID.

Therefore, VLAN id needs to be checked to ensure it is within
the range of VLAN_N_VID.

Fixes: fe4144d47eef ("net: hns3: sync VLAN filter entries when kill VLAN ID failed")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251211023737.2327018-4-shaojijie@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: hns3: using the num_tqps to check whether tqp_index is out of range when vf get ring info from mbx

Currently, rss_size = num_tqps / tc_num. If tc_num is 1, then num_tqps
equals rss_size. However, if the tc_num is greater than 1, then rss_size
will be less than num_tqps, causing the tqp_index check for subsequent TCs
using rss_size to always fail.

This patch uses the num_tqps to check whether tqp_index is out of range,
instead of rss_size.

Fixes: 326334aad024 ("net: hns3: add a check for tqp_index in hclge_get_ring_chain_from_mbx()")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251211023737.2327018-3-shaojijie@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: hns3: using the num_tqps in the vf driver to apply for resources

Currently, hdev->htqp is allocated using hdev->num_tqps, and kinfo->tqp
is allocated using kinfo->num_tqps. However, kinfo->num_tqps is set to
min(new_tqps, hdev->num_tqps); Therefore, kinfo->num_tqps may be smaller
than hdev->num_tqps, which causes some hdev->htqp[i] to remain
uninitialized in hclgevf_knic_setup().

Thus, this patch allocates hdev->htqp and kinfo->tqp using hdev->num_tqps,
ensuring that the lengths of hdev->htqp and kinfo->tqp are consistent
and that all elements are properly initialized.

Fixes: e2cb1dec9779 ("net: hns3: Add HNS3 VF HCL(Hardware Compatibility Layer) Support")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251211023737.2327018-2-shaojijie@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: enetc: do not transmit redirected XDP frames when the link is down

In the current implementation, the enetc_xdp_xmit() always transmits
redirected XDP frames even if the link is down, but the frames cannot
be transmitted from TX BD rings when the link is down, so the frames
are still kept in the TX BD rings. If the XDP program is uninstalled,
users will see the following warning logs.

fsl_enetc 0000:00:00.0 eno0: timeout for tx ring #6 clear

More worse, the TX BD ring cannot work properly anymore, because the
HW PIR and CIR are not equal after the re-initialization of the TX
BD ring. At this point, the BDs between CIR and PIR are invalid,
which will cause a hardware malfunction.

Another reason is that there is internal context in the ring prefetch
logic that will retain the state from the first incarnation of the ring
and continue prefetching from the stale location when we re-initialize
the ring. The internal context is only reset by an FLR. That is to say,
for LS1028A ENETC, software cannot set the HW CIR and PIR when
initializing the TX BD ring.

It does not make sense to transmit redirected XDP frames when the link is
down. Add a link status check to prevent transmission in this condition.
This fixes part of the issue, but more complex cases remain. For example,
the TX BD ring may still contain unsent frames when the link goes down.
Those situations require additional patches, which will build on this
one.

Fixes: 9d2b68cc108d ("net: enetc: add support for XDP_REDIRECT")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Hariprasad Kelam <hkelam@marvell.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20251211020919.121113-1-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests/tc-testing: Test case exercising potential mirred redirect deadlock

Add a test case that reproduces deadlock scenario where the user has
a drr qdisc attached to root and has a mirred action that redirects to
self on egress

Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20251210162255.1057663-2-jhs@mojatatu.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/sched: act_mirred: fix loop detection

Fix a loop scenario of ethx:egress->ethx:egress

Example setup to reproduce:
tc qdisc add dev ethx root handle 1: drr
tc filter add dev ethx parent 1: protocol ip prio 1 matchall \
         action mirred egress redirect dev ethx

Now ping out of ethx and you get a deadlock:

[  116.892898][  T307] ============================================
[  116.893182][  T307] WARNING: possible recursive locking detected
[  116.893418][  T307] 6.18.0-rc6-01205-ge05021a829b8-dirty #204 Not tainted
[  116.893682][  T307] --------------------------------------------
[  116.893926][  T307] ping/307 is trying to acquire lock:
[  116.894133][  T307] ffff88800c122908 (&sch->root_lock_key){+...}-{3:3}, at: __dev_queue_xmit+0x2210/0x3b50
[  116.894517][  T307]
[  116.894517][  T307] but task is already holding lock:
[  116.894836][  T307] ffff88800c122908 (&sch->root_lock_key){+...}-{3:3}, at: __dev_queue_xmit+0x2210/0x3b50
[  116.895252][  T307]
[  116.895252][  T307] other info that might help us debug this:
[  116.895608][  T307]  Possible unsafe locking scenario:
[  116.895608][  T307]
[  116.895901][  T307]        CPU0
[  116.896057][  T307]        ----
[  116.896200][  T307]   lock(&sch->root_lock_key);
[  116.896392][  T307]   lock(&sch->root_lock_key);
[  116.896605][  T307]
[  116.896605][  T307]  *** DEADLOCK ***
[  116.896605][  T307]
[  116.896864][  T307]  May be due to missing lock nesting notation
[  116.896864][  T307]
[  116.897123][  T307] 6 locks held by ping/307:
[  116.897302][  T307]  #0: ffff88800b4b0250 (sk_lock-AF_INET){+.+.}-{0:0}, at: raw_sendmsg+0xb20/0x2cf0
[  116.897808][  T307]  #1: ffffffff88c839c0 (rcu_read_lock){....}-{1:3}, at: ip_output+0xa9/0x600
[  116.898138][  T307]  #2: ffffffff88c839c0 (rcu_read_lock){....}-{1:3}, at: ip_finish_output2+0x2c6/0x1ee0
[  116.898459][  T307]  #3: ffffffff88c83960 (rcu_read_lock_bh){....}-{1:3}, at: __dev_queue_xmit+0x200/0x3b50
[  116.898782][  T307]  #4: ffff88800c122908 (&sch->root_lock_key){+...}-{3:3}, at: __dev_queue_xmit+0x2210/0x3b50
[  116.899132][  T307]  #5: ffffffff88c83960 (rcu_read_lock_bh){....}-{1:3}, at: __dev_queue_xmit+0x200/0x3b50
[  116.899442][  T307]
[  116.899442][  T307] stack backtrace:
[  116.899667][  T307] CPU: 2 UID: 0 PID: 307 Comm: ping Not tainted 6.18.0-rc6-01205-ge05021a829b8-dirty #204 PREEMPT(voluntary)
[  116.899672][  T307] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  116.899675][  T307] Call Trace:
[  116.899678][  T307]  <TASK>
[  116.899680][  T307]  dump_stack_lvl+0x6f/0xb0
[  116.899688][  T307]  print_deadlock_bug.cold+0xc0/0xdc
[  116.899695][  T307]  __lock_acquire+0x11f7/0x1be0
[  116.899704][  T307]  lock_acquire+0x162/0x300
[  116.899707][  T307]  ? __dev_queue_xmit+0x2210/0x3b50
[  116.899713][  T307]  ? srso_alias_return_thunk+0x5/0xfbef5
[  116.899717][  T307]  ? stack_trace_save+0x93/0xd0
[  116.899723][  T307]  _raw_spin_lock+0x30/0x40
[  116.899728][  T307]  ? __dev_queue_xmit+0x2210/0x3b50
[  116.899731][  T307]  __dev_queue_xmit+0x2210/0x3b50

Fixes: 178ca30889a1 ("Revert "net/sched: Fix mirred deadlock on device recursion"")
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20251210162255.1057663-1-jhs@mojatatu.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'sctp-fix-two-issues-in-sctp_clone_sock'

Kuniyuki Iwashima says:

====================
sctp: Fix two issues in sctp_clone_sock().

syzbot reported two issues in sctp_clone_sock().

This series fixes the issues.

v1: https://lore.kernel.org/netdev/20251208133728.157648-1-kuniyu@google.com/
====================

Link: https://patch.msgid.link/20251210081206.1141086-1-kuniyu@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

sctp: Clear inet_opt in sctp_v6_copy_ip_options().

syzbot reported the splat below. [0]

Since the cited commit, the child socket inherits all fields
of its parent socket unless explicitly cleared.

syzbot set IP_OPTIONS to AF_INET6 socket and created a child
socket inheriting inet_sk(sk)->inet_opt.

sctp_v6_copy_ip_options() only clones np->opt, and leaving
inet_opt results in double-free.

Let's clear inet_opt in sctp_v6_copy_ip_options().

[0]:
BUG: KASAN: double-free in inet_sock_destruct+0x538/0x740 net/ipv4/af_inet.c:159
Free of addr ffff8880304b6d40 by task ksoftirqd/0/15

CPU: 0 UID: 0 PID: 15 Comm: ksoftirqd/0 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/02/2025
Call Trace:
<TASK>
dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:378 [inline]
print_report+0xca/0x240 mm/kasan/report.c:482
kasan_report_invalid_free+0xea/0x110 mm/kasan/report.c:557
check_slab_allocation+0xe1/0x130 include/linux/page-flags.h:-1
kasan_slab_pre_free include/linux/kasan.h:198 [inline]
slab_free_hook mm/slub.c:2484 [inline]
slab_free mm/slub.c:6630 [inline]
kfree+0x148/0x6d0 mm/slub.c:6837
inet_sock_destruct+0x538/0x740 net/ipv4/af_inet.c:159
__sk_destruct+0x89/0x660 net/core/sock.c:2350
sock_put include/net/sock.h:1991 [inline]
sctp_endpoint_destroy_rcu+0xa1/0xf0 net/sctp/endpointola.c:197
rcu_do_batch kernel/rcu/tree.c:2605 [inline]
rcu_core+0xcab/0x1770 kernel/rcu/tree.c:2861
handle_softirqs+0x286/0x870 kernel/softirq.c:622
run_ksoftirqd+0x9b/0x100 kernel/softirq.c:1063
smpboot_thread_fn+0x542/0xa60 kernel/smpboot.c:160
kthread+0x711/0x8a0 kernel/kthread.c:463
ret_from_fork+0x4bc/0x870 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
</TASK>

Allocated by task 6003:
kasan_save_stack mm/kasan/common.c:56 [inline]
kasan_save_track+0x3e/0x80 mm/kasan/common.c:77
poison_kmalloc_redzone mm/kasan/common.c:400 [inline]
__kasan_kmalloc+0x93/0xb0 mm/kasan/common.c:417
kasan_kmalloc include/linux/kasan.h:262 [inline]
__do_kmalloc_node mm/slub.c:5642 [inline]
__kmalloc_noprof+0x411/0x7f0 mm/slub.c:5654
kmalloc_noprof include/linux/slab.h:961 [inline]
kzalloc_noprof include/linux/slab.h:1094 [inline]
ip_options_get+0x51/0x4c0 net/ipv4/ip_options.c:517
do_ip_setsockopt+0x1d9b/0x2d00 net/ipv4/ip_sockglue.c:1087
ip_setsockopt+0x66/0x110 net/ipv4/ip_sockglue.c:1417
do_sock_setsockopt+0x17c/0x1b0 net/socket.c:2360
__sys_setsockopt net/socket.c:2385 [inline]
__do_sys_setsockopt net/socket.c:2391 [inline]
__se_sys_setsockopt net/socket.c:2388 [inline]
__x64_sys_setsockopt+0x13f/0x1b0 net/socket.c:2388
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f

Freed by task 15:
kasan_save_stack mm/kasan/common.c:56 [inline]
kasan_save_track+0x3e/0x80 mm/kasan/common.c:77
__kasan_save_free_info+0x46/0x50 mm/kasan/generic.c:587
kasan_save_free_info mm/kasan/kasan.h:406 [inline]
poison_slab_object mm/kasan/common.c:252 [inline]
__kasan_slab_free+0x5c/0x80 mm/kasan/common.c:284
kasan_slab_free include/linux/kasan.h:234 [inline]
slab_free_hook mm/slub.c:2539 [inline]
slab_free mm/slub.c:6630 [inline]
kfree+0x19a/0x6d0 mm/slub.c:6837
inet_sock_destruct+0x538/0x740 net/ipv4/af_inet.c:159
__sk_destruct+0x89/0x660 net/core/sock.c:2350
sock_put include/net/sock.h:1991 [inline]
sctp_endpoint_destroy_rcu+0xa1/0xf0 net/sctp/endpointola.c:197
rcu_do_batch kernel/rcu/tree.c:2605 [inline]
rcu_core+0xcab/0x1770 kernel/rcu/tree.c:2861
handle_softirqs+0x286/0x870 kernel/softirq.c:622
run_ksoftirqd+0x9b/0x100 kernel/softirq.c:1063
smpboot_thread_fn+0x542/0xa60 kernel/smpboot.c:160
kthread+0x711/0x8a0 kernel/kthread.c:463
ret_from_fork+0x4bc/0x870 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

Fixes: 16942cf4d3e31 ("sctp: Use sk_clone() in sctp_accept().")
Reported-by: syzbot+ec33a1a006ed5abe7309@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6936d112.a70a0220.38f243.00a8.GAE@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251210081206.1141086-3-kuniyu@google.com
Acked-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

sctp: Fetch inet6_sk() after setting ->pinet6 in sctp_clone_sock().

syzbot reported the lockdep splat below. [0]

sctp_clone_sock() sets the child socket's ipv6_mc_list to NULL,
but somehow sock_release() in an error path finally acquires
lock_sock() in ipv6_sock_mc_close().

The root cause is that sctp_clone_sock() fetches inet6_sk(newsk)
before setting newinet->pinet6, meaning that the parent's
ipv6_mc_list was actually cleared.

Also, sctp_v6_copy_ip_options() uses inet6_sk() but is called
before newinet->pinet6 is set.

Let's use inet6_sk() only after setting newinet->pinet6.

[0]:
WARNING: possible recursive locking detected
syzkaller #0 Not tainted

syz.0.17/5996 is trying to acquire lock:
ffff888031af4c60 (sk_lock-AF_INET6){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1700 [inline]
ffff888031af4c60 (sk_lock-AF_INET6){+.+.}-{0:0}, at: ipv6_sock_mc_close+0xd3/0x140 net/ipv6/mcast.c:348

but task is already holding lock:
ffff888031af4320 (sk_lock-AF_INET6){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1700 [inline]
ffff888031af4320 (sk_lock-AF_INET6){+.+.}-{0:0}, at: sctp_getsockopt+0x135/0xb60 net/sctp/socket.c:8131

other info that might help us debug this:
Possible unsafe locking scenario:

       CPU0
       ----
  lock(sk_lock-AF_INET6);
  lock(sk_lock-AF_INET6);

*** DEADLOCK ***

May be due to missing lock nesting notation

1 lock held by syz.0.17/5996:
#0: ffff888031af4320 (sk_lock-AF_INET6){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1700 [inline]
#0: ffff888031af4320 (sk_lock-AF_INET6){+.+.}-{0:0}, at: sctp_getsockopt+0x135/0xb60 net/sctp/socket.c:8131

stack backtrace:
CPU: 0 UID: 0 PID: 5996 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025
Call Trace:
<TASK>
dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120
print_deadlock_bug+0x279/0x290 kernel/locking/lockdep.c:3041
check_deadlock kernel/locking/lockdep.c:3093 [inline]
validate_chain kernel/locking/lockdep.c:3895 [inline]
__lock_acquire+0x2540/0x2cf0 kernel/locking/lockdep.c:5237
lock_acquire+0x117/0x340 kernel/locking/lockdep.c:5868
lock_sock_nested+0x48/0x100 net/core/sock.c:3780
lock_sock include/net/sock.h:1700 [inline]
ipv6_sock_mc_close+0xd3/0x140 net/ipv6/mcast.c:348
inet6_release+0x47/0x70 net/ipv6/af_inet6.c:482
__sock_release net/socket.c:653 [inline]
sock_release+0x85/0x150 net/socket.c:681
sctp_getsockopt_peeloff_common+0x56b/0x770 net/sctp/socket.c:5732
sctp_getsockopt_peeloff_flags+0x13b/0x230 net/sctp/socket.c:5801
sctp_getsockopt+0x3ab/0xb60 net/sctp/socket.c:8151
do_sock_getsockopt+0x2b4/0x3d0 net/socket.c:2399
__sys_getsockopt net/socket.c:2428 [inline]
__do_sys_getsockopt net/socket.c:2435 [inline]
__se_sys_getsockopt net/socket.c:2432 [inline]
__x64_sys_getsockopt+0x1a5/0x250 net/socket.c:2432
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f8f8c38f749
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffcfdade018 EFLAGS: 00000246 ORIG_RAX: 0000000000000037
RAX: ffffffffffffffda RBX: 00007f8f8c5e5fa0 RCX: 00007f8f8c38f749
RDX: 000000000000007a RSI: 0000000000000084 RDI: 0000000000000003
RBP: 00007f8f8c413f91 R08: 0000200000000040 R09: 0000000000000000
R10: 0000200000000340 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f8f8c5e5fa0 R14: 00007f8f8c5e5fa0 R15: 0000000000000005
</TASK>

Fixes: 16942cf4d3e31 ("sctp: Use sk_clone() in sctp_accept().")
Reported-by: syzbot+c59e6bb54e7620495725@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6936d112.a70a0220.38f243.00a7.GAE@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251210081206.1141086-2-kuniyu@google.com
Acked-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/handshake: duplicate handshake cancellations leak socket

When a handshake request is cancelled it is removed from the
handshake_net->hn_requests list, but it is still present in the
handshake_rhashtbl until it is destroyed.

If a second cancellation request arrives for the same handshake request,
then remove_pending() will return false... and assuming
HANDSHAKE_F_REQ_COMPLETED isn't set in req->hr_flags, we'll continue
processing through the out_true label, where we put another reference on
the sock and a refcount underflow occurs.

This can happen for example if a handshake times out - particularly if
the SUNRPC client sends the AUTH_TLS probe to the server but doesn't
follow it up with the ClientHello due to a problem with tlshd. When the
timeout is hit on the server, the server will send a FIN, which triggers
a cancellation request via xs_reset_transport(). When the timeout is
hit on the client, another cancellation request happens via
xs_tls_handshake_sync().

Add a test_and_set_bit(HANDSHAKE_F_REQ_COMPLETED) in the pending cancel
path so duplicate cancels can be detected.

Fixes: 3b3009ea8abb ("net/handshake: Create a NETLINK service for handling handshake requests")
Suggested-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://patch.msgid.link/20251209193015.3032058-1-smayhew@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge tag 'nf-25-12-16' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Florian Westphal says:

====================
netfilter: updates for net

The following patchset contains Netfilter fixes for *net*:

1)  Jozsef Kadlecsik is retiring.  Fortunately Jozsef will still keep an
    eye on ipset patches.

2)  remove a bogus direction check from nat core, this caused spurious
    flakes in the 'reverse clash' selftest, from myself.

3) nf_tables doesn't need to do chain validation on register store,
   from Pablo Neira Ayuso.

4) nf_tables shouldn't revisit chains during ruleset (graph) validation
   if possible.  Both 3 and 4 were slated for -next initially but there
   are now two independent reports of people hitting soft lockup errors
   during ruleset validation, so it makes no sense anymore to route
   this via -next given this is -stable material. From myself.

5) call cond_resched() in a more frequently visited place during nf_tables
   chain validation, this wasn't possible earlier due to rcu read lock,
   but nowadays its not held anymore during set walks.

6) Don't fail conntrack packetdrill test with HZ=100 kernels.

netfilter pull request nf-25-12-16

* tag 'nf-25-12-16' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  selftests: netfilter: packetdrill: avoid failure on HZ=100 kernel
  netfilter: nf_tables: avoid softlockup warnings in nft_chain_validate
  netfilter: nf_tables: avoid chain re-validation if possible
  netfilter: nf_tables: remove redundant chain validation on register store
  netfilter: nf_nat: remove bogus direction check
  MAINTAINERS: Remove Jozsef Kadlecsik from MAINTAINERS file
====================

Link: https://patch.msgid.link/20251216190904.14507-1-fw@strlen.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'mlx5-misc-fixes-2025-12-09'

Tariq Toukan says:

====================
mlx5 misc fixes 2025-12-09

This patchset provides misc bug fixes from the team to the mlx5 core and
Eth drivers.
====================

Link: https://patch.msgid.link/1765284977-1363052-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/mlx5e: Don't include PSP in the hard MTU calculations

Commit [1] added the 40 bytes required by the PSP header+trailer and the
UDP header to MLX5E_ETH_HARD_MTU, which limits the device-wide max
software MTU that could be set. This is not okay, because most packets
are not PSP packets and it doesn't make sense to always reserve space
for headers which won't get added in most cases.

As it turns out, for TCP connections, PSP overhead is already taken into
account in the TCP MSS calculations via inet_csk(sk)->icsk_ext_hdr_len.
This was added in commit [2]. This means that the extra space reserved
in the hard MTU for mlx5 ends up unused and wasted.

Remove the unnecessary 40 byte reservation from hard MTU.

[1] commit e5a1861a298e ("net/mlx5e: Implement PSP Tx data path")
[2] commit e97269257fe4 ("net: psp: update the TCP MSS to reflect PSP
packet overhead")

Fixes: e5a1861a298e ("net/mlx5e: Implement PSP Tx data path")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Shahar Shitrit <shshitrit@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1765284977-1363052-10-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/mlx5e: Do not update BQL of old txqs during channel reconfiguration

During channel reconfiguration (e.g., ethtool private flags changes),
the driver can trigger a kernel BUG_ON in dql_completed() with the error
"kernel BUG at lib/dynamic_queue_limits.c:99".

The issue occurs in the following sequence:

During mlx5e_safe_switch_params(), old channels are deactivated via
mlx5e_deactivate_txqsq(). New channels are created and activated, taking
ownership of the netdev_queues and their BQL state.

When old channels are closed via mlx5e_close_txqsq(), there may be
pending TX descriptors (sq->cc != sq->pc) that were in-flight during the
deactivation.

mlx5e_free_txqsq_descs() frees these pending descriptors and attempts to
complete them via netdev_tx_completed_queue().

However, the BQL state (dql->num_queued and dql->num_completed) have
been reset in mlx5e_activate_txqsq and belong to the new queue owner,
leading to dql->num_queued - dql->num_completed < nbytes.

This triggers BUG_ON(count > num_queued - num_completed) in
dql_completed().

Fixes: 3b88a535a8e1 ("net/mlx5e: Defer channels closure to reduce interface down time")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: William Tu <witu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Link: https://patch.msgid.link/1765284977-1363052-9-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/mlx5e: Trigger neighbor resolution for unresolved destinations

When initializing the MAC addresses for an outbound IPsec packet offload
rule in mlx5e_ipsec_init_macs, the call to dst_neigh_lookup is used to
find the next-hop neighbor (typically the gateway in tunnel mode).
This call might create a new neighbor entry if one doesn't already
exist. This newly created entry starts in the INCOMPLETE state, as the
kernel hasn't yet sent an ARP or NDISC probe to resolve the MAC
address. In this case, neigh_ha_snapshot will correctly return an
all-zero MAC address.

IPsec packet offload requires the actual next-hop MAC address to
program the rule correctly. If the neighbor state is INCOMPLETE when
the rule is created, the hardware rule is programmed with an all-zero
destination MAC address. Packets sent using this rule will be
subsequently dropped by the receiving network infrastructure or host.

This patch adds a check specifically for the outbound offload path. If
neigh_ha_snapshot returns an all-zero MAC address, it proactively
calls neigh_event_send(n, NULL). This ensures the kernel immediately
sends the initial ARP or NDISC probe if one isn't already pending,
accelerating the resolution process. This helps prevent the hardware
rule from being programmed with an invalid MAC address and avoids
packet drops due to unresolved neighbors.

Fixes: 71670f766b8f ("net/mlx5e: Support routed networks during IPsec MACs initialization")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1765284977-1363052-8-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/mlx5e: Use ip6_dst_lookup instead of ipv6_dst_lookup_flow for MAC init

Replace ipv6_stub->ipv6_dst_lookup_flow() with ip6_dst_lookup() in
mlx5e_ipsec_init_macs() since IPsec transformations are not needed
during Security Association setup - only basic routing information is
required for nexthop MAC address resolution.

This resolves an issue where XfrmOutNoStates error counter would be
incremented when xfrm policy is configured before xfrm state, as the
IPsec-aware routing function would attempt policy checks during SA
initialization.

Fixes: 71670f766b8f ("net/mlx5e: Support routed networks during IPsec MACs initialization")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1765284977-1363052-7-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/mlx5: Serialize firmware reset with devlink

The firmware reset mechanism can be triggered by asynchronous events,
which may race with other devlink operations like devlink reload or
devlink dev eswitch set, potentially leading to inconsistent states.

This patch addresses the race by using the devl_lock to serialize the
firmware reset against other devlink operations. When a reset is
requested, the driver attempts to acquire the lock. If successful, it
sets a flag to block devlink reload or eswitch changes, ACKs the reset
to firmware and then releases the lock. If the lock is already held by
another operation, the driver NACKs the firmware reset request,
indicating that the reset cannot proceed.

Firmware reset does not keep the devl_lock and instead uses an internal
firmware reset bit. This is because firmware resets can be triggered by
asynchronous events, and processed in different threads. It is illegal
and unsafe to acquire a lock in one thread and attempt to release it in
another, as lock ownership is intrinsically thread-specific.

This change ensures that firmware resets and other devlink operations
are mutually exclusive during the critical reset request phase,
preventing race conditions.

Fixes: 38b9f903f22b ("net/mlx5: Handle sync reset request event")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mateusz Berezecki <mberezecki@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1765284977-1363052-6-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/mlx5: fw_tracer, Handle escaped percent properly

The firmware tracer's format string validation and parameter counting
did not properly handle escaped percent signs (%%). This caused
fw_tracer to count more parameters when trace format strings contained
literal percent characters.

To fix it, allow %% to pass string validation and skip %% sequences when
counting parameters since they represent literal percent signs rather
than format specifiers.

Fixes: 70dd6fdb8987 ("net/mlx5: FW tracer, parse traces and kernel tracing support")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reported-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Closes: https://lore.kernel.org/netdev/hanz6rzrb2bqbplryjrakvkbmv4y5jlmtthnvi3thg5slqvelp@t3s3erottr6s/
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1765284977-1363052-5-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/mlx5: fw_tracer, Validate format string parameters

Add validation for format string parameters in the firmware tracer to
prevent potential security vulnerabilities and crashes from malformed
format strings received from firmware.

The firmware tracer receives format strings from the device firmware and
uses them to format trace messages. Without proper validation, bad
firmware could provide format strings with invalid format specifiers
(e.g., %s, %p, %n) that could lead to crashes, or other undefined
behavior.

Add mlx5_tracer_validate_params() to validate that all format specifiers
in trace strings are limited to safe integer/hex formats (%x, %d, %i,
%u, %llx, %lx, etc.). Reject strings containing other format types that
could be used to access arbitrary memory or cause crashes.
Invalid format strings are added to the trace output for visibility with
"BAD_FORMAT: " prefix.

Fixes: 70dd6fdb8987 ("net/mlx5: FW tracer, parse traces and kernel tracing support")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reported-by: Breno Leitao <leitao@debian.org>
Closes: https://lore.kernel.org/netdev/hanz6rzrb2bqbplryjrakvkbmv4y5jlmtthnvi3thg5slqvelp@t3s3erottr6s/
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1765284977-1363052-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/mlx5: Drain firmware reset in shutdown callback

Invoke drain_fw_reset() in the shutdown callback to ensure all
firmware reset handling is completed before shutdown proceeds.

Fixes: 16d42d313350 ("net/mlx5: Drain fw_reset when removing device")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drori <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1765284977-1363052-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/mlx5: fw reset, clear reset requested on drain_fw_reset

drain_fw_reset() waits for ongoing firmware reset events and blocks new
event handling, but does not clear the reset requested flag, and may
keep sync reset polling.

To fix it, call mlx5_sync_reset_clear_reset_requested() to clear the
flag, stop sync reset polling, and resume health polling, ensuring
health issues are still detected after the firmware reset drain.

Fixes: 16d42d313350 ("net/mlx5: Drain fw_reset when removing device")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drori <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1765284977-1363052-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'net-dsa-lantiq-a-bunch-of-fixes'

Daniel Golle says:

====================
net: dsa: lantiq: a bunch of fixes

This series is the continuation and result of comments received for a fix
for the SGMII restart-an bit not actually being self-clearing, which was
reported by by Rasmus Villemoes.

A closer investigation and testing the .remove and the .shutdown paths
of the mxl-gsw1xx.c and lantiq_gswip.c drivers has revealed a couple of
existing problems, which are also addressed in this series.
====================

Link: https://patch.msgid.link/cover.1765241054.git.daniel@makrotopia.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: dsa: mxl-gsw1xx: manually clear RANEG bit

Despite being documented as self-clearing, the RANEG bit sometimes
remains set, preventing auto-negotiation from happening.

Manually clear the RANEG bit after 10ms as advised by MaxLinear.
In order to not hold RTNL during the 10ms of waiting schedule
delayed work to take care of clearing the bit asynchronously, which
is similar to the self-clearing behavior.

Fixes: 22335939ec90 ("net: dsa: add driver for MaxLinear GSW1xx switch family")
Reported-by: Rasmus Villemoes <ravi@prevas.dk>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/76745fceb5a3f53088110fb7a96acf88434088ca.1765241054.git.daniel@makrotopia.org
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: dsa: mxl-gsw1xx: fix .shutdown driver operation

The .shutdown operation should call dsa_switch_shutdown() just like
it is done also by the sibling lantiq_gswip driver. Not doing that
results in shutdown or reboot hanging and waiting for the CPU port
becoming free, which introduces a longer delay and a WARNING before
shutdown or reboot in case the driver is built-into the kernel.
Fix this by calling dsa_switch_shutdown() in the driver's shutdown
operation, harmonizing it with what is done in the lantiq_gswip
driver. As a side-effect this now allows to remove the previously
exported gswip_disable_switch() function which no longer got any
users.

Fixes: 22335939ec907 ("net: dsa: add driver for MaxLinear GSW1xx switch family")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/77ed91a5206e5dbf5d3e83d7e364ebfda90d31fd.1765241054.git.daniel@makrotopia.org
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: dsa: mxl-gsw1xx: fix order in .remove operation

The driver's .remove operation was calling gswip_disable_switch() which
clears the GSWIP_MDIO_GLOB_ENABLE bit before calling
dsa_unregister_switch() and thereby violating a Golden Rule of driver
development to always unpublish userspace interfaces before disabling
hardware, as pointed out by Russell King.

Fix this by relying in GSWIP_MDIO_GLOB_ENABLE being cleared by the
.teardown operation introduced by the previous commit
("net: dsa: lantiq_gswip: fix teardown order").

Fixes: 22335939ec907 ("net: dsa: add driver for MaxLinear GSW1xx switch family")
Suggested-by: "Russell King (Oracle)" <linux@armlinux.org.uk>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/63f882eeb910cf24503c35a443b541cc54a930f2.1765241054.git.daniel@makrotopia.org
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: dsa: lantiq_gswip: fix order in .remove operation

Russell King pointed out that disabling the switch by clearing
GSWIP_MDIO_GLOB_ENABLE before calling dsa_unregister_switch() is
problematic, as it violates a Golden Rule of driver development to
always first unpublish userspace interfaces and then disable the
hardware.

Fix this, and also simplify the probe() function, by introducing a
dsa_switch_ops teardown() operation which takes care of clearing the
GSWIP_MDIO_GLOB_ENABLE bit.

Fixes: 14fceff4771e5 ("net: dsa: Add Lantiq / Intel DSA driver for vrx200")
Suggested-by: "Russell King (Oracle)" <linux@armlinux.org.uk>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/4ebd72a29edc1e4059b9666a26a0bb5d906a829a.1765241054.git.daniel@makrotopia.org
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

ethtool: Avoid overflowing userspace buffer on stats query

The ethtool -S command operates across three ioctl calls:
ETHTOOL_GSSET_INFO for the size, ETHTOOL_GSTRINGS for the names, and
ETHTOOL_GSTATS for the values.

If the number of stats changes between these calls (e.g., due to device
reconfiguration), userspace's buffer allocation will be incorrect,
potentially leading to buffer overflow.

Drivers are generally expected to maintain stable stat counts, but some
drivers (e.g., mlx5, bnx2x, bna, ksz884x) use dynamic counters, making
this scenario possible.

Some drivers try to handle this internally:
- bnad_get_ethtool_stats() returns early in case stats.n_stats is not
  equal to the driver's stats count.
- micrel/ksz884x also makes sure not to write anything beyond
  stats.n_stats and overflow the buffer.

However, both use stats.n_stats which is already assigned with the value
returned from get_sset_count(), hence won't solve the issue described
here.

Change ethtool_get_strings(), ethtool_get_stats(),
ethtool_get_phy_stats() to not return anything in case of a mismatch
between userspace's size and get_sset_size(), to prevent buffer
overflow.
The returned n_stats value will be equal to zero, to reflect that
nothing has been returned.

This could result in one of two cases when using upstream ethtool,
depending on when the size change is detected:
1. When detected in ethtool_get_strings():
    # ethtool -S eth2
    no stats available

2. When detected in get stats, all stats will be reported as zero.

Both cases are presumably transient, and a subsequent ethtool call
should succeed.

Other than the overflow avoidance, these two cases are very evident (no
output/cleared stats), which is arguably better than presenting
incorrect/shifted stats.
I also considered returning an error instead of a "silent" response, but
that seems more destructive towards userspace apps.

Notes:
- This patch does not claim to fix the inherent race, it only makes sure
  that we do not overflow the userspace buffer, and makes for a more
  predictable behavior.

- RTNL lock is held during each ioctl, the race window exists between
  the separate ioctl calls when the lock is released.

- Userspace ethtool always fills stats.n_stats, but it is likely that
  these stats ioctls are implemented in other userspace applications
  which might not fill it. The added code checks that it's not zero,
  to prevent any regressions.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20251208121901.3203692-1-gal@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

can: fix build dependency

Arnd Bergmann's patch [1] fixed the build dependency problem introduced by
bugfix commit cb2dc6d2869a ("can: Kconfig: select CAN driver infrastructure
by default"). This ended up as commit 6abd4577bccc ("can: fix build
dependency"), but I broke Arnd's fix by removing a dependency that we
thought was superfluous.

[1] https://lore.kernel.org/all/20251204100015.1033688-1-arnd@kernel.org/

Meanwhile the problem was also found by intel's kernel test robot,
complaining about undefined symbols:

| ERROR: modpost: "m_can_class_unregister" [drivers/net/can/m_can/m_can_platform.ko] undefined!
| ERROR: modpost: "m_can_class_free_dev" [drivers/net/can/m_can/m_can_platform.ko] undefined!
| ERROR: modpost: "m_can_class_allocate_dev" [drivers/net/can/m_can/m_can_platform.ko] undefined!
| ERROR: modpost: "m_can_class_get_clocks" [drivers/net/can/m_can/m_can_platform.ko] undefined!
| ERROR: modpost: "m_can_class_register" [drivers/net/can/m_can/m_can_platform.ko] undefined!

To fix this problem, add the missing dependency again.

Cc: Vincent Mailhol <mailhol@kernel.org>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202512132253.vO9WFDJK-lkp@intel.com/
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202512180808.fTAUQ2XN-lkp@intel.com/
Reported-by: Arnd Bergmann <arnd@arndb.de>
Closes: https://lore.kernel.org/all/7427949a-ea7d-4854-9fe4-e01db7d878c7@app.fastmail.com/
Fixes: 6abd4577bccc ("can: fix build dependency")
Fixes: cb2dc6d2869a ("can: Kconfig: select CAN driver infrastructure by default")
Acked-by: Vincent Mailhol <mailhol@kernel.org>
Link: https://patch.msgid.link/20251217-can-fix-dependency-v1-1-fd2d4f2a2bf5@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

e1000: fix OOB in e1000_tbi_should_accept()

In e1000_tbi_should_accept() we read the last byte of the frame via
'data[length - 1]' to evaluate the TBI workaround. If the descriptor-
reported length is zero or larger than the actual RX buffer size, this
read goes out of bounds and can hit unrelated slab objects. The issue
is observed from the NAPI receive path (e1000_clean_rx_irq):

==================================================================
BUG: KASAN: slab-out-of-bounds in e1000_tbi_should_accept+0x610/0x790
Read of size 1 at addr ffff888014114e54 by task sshd/363

CPU: 0 PID: 363 Comm: sshd Not tainted 5.18.0-rc1 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
Call Trace:
<IRQ>
dump_stack_lvl+0x5a/0x74
print_address_description+0x7b/0x440
print_report+0x101/0x200
kasan_report+0xc1/0xf0
e1000_tbi_should_accept+0x610/0x790
e1000_clean_rx_irq+0xa8c/0x1110
e1000_clean+0xde2/0x3c10
__napi_poll+0x98/0x380
net_rx_action+0x491/0xa20
__do_softirq+0x2c9/0x61d
do_softirq+0xd1/0x120
</IRQ>
<TASK>
__local_bh_enable_ip+0xfe/0x130
ip_finish_output2+0x7d5/0xb00
__ip_queue_xmit+0xe24/0x1ab0
__tcp_transmit_skb+0x1bcb/0x3340
tcp_write_xmit+0x175d/0x6bd0
__tcp_push_pending_frames+0x7b/0x280
tcp_sendmsg_locked+0x2e4f/0x32d0
tcp_sendmsg+0x24/0x40
sock_write_iter+0x322/0x430
vfs_write+0x56c/0xa60
ksys_write+0xd1/0x190
do_syscall_64+0x43/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f511b476b10
Code: 73 01 c3 48 8b 0d 88 d3 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d f9 2b 2c 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 8e 9b 01 00 48 89 04 24
RSP: 002b:00007ffc9211d4e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000004024 RCX: 00007f511b476b10
RDX: 0000000000004024 RSI: 0000559a9385962c RDI: 0000000000000003
RBP: 0000559a9383a400 R08: fffffffffffffff0 R09: 0000000000004f00
R10: 0000000000000070 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffc9211d57f R14: 0000559a9347bde7 R15: 0000000000000003
</TASK>
Allocated by task 1:
__kasan_krealloc+0x131/0x1c0
krealloc+0x90/0xc0
add_sysfs_param+0xcb/0x8a0
kernel_add_sysfs_param+0x81/0xd4
param_sysfs_builtin+0x138/0x1a6
param_sysfs_init+0x57/0x5b
do_one_initcall+0x104/0x250
do_initcall_level+0x102/0x132
do_initcalls+0x46/0x74
kernel_init_freeable+0x28f/0x393
kernel_init+0x14/0x1a0
ret_from_fork+0x22/0x30
The buggy address belongs to the object at ffff888014114000
which belongs to the cache kmalloc-2k of size 2048
The buggy address is located 1620 bytes to the right of
2048-byte region [ffff888014114000, ffff888014114800]
The buggy address belongs to the physical page:
page:ffffea0000504400 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x14110
head:ffffea0000504400 order:3 compound_mapcount:0 compound_pincount:0
flags: 0x100000000010200(slab|head|node=0|zone=1)
raw: 0100000000010200 0000000000000000 dead000000000001 ffff888013442000
raw: 0000000000000000 0000000000080008 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
==================================================================

This happens because the TBI check unconditionally dereferences the last
byte without validating the reported length first:

u8 last_byte = *(data + length - 1);

Fix by rejecting the frame early if the length is zero, or if it exceeds
adapter->rx_buffer_len. This preserves the TBI workaround semantics for
valid frames and prevents touching memory beyond the RX buffer.

Fixes: 2037110c96d5 ("e1000: move tbi workaround code into helper function")
Cc: stable@vger.kernel.org
Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

idpf: reduce mbx_task schedule delay to 300us

During the IDPF init phase, the mailbox runs in poll mode until it is
configured to properly handle interrupts. The previous delay of 300ms is
excessively long for the mailbox polling mechanism, which causes a slow
initialization of ~2s:

echo 0000:06:12.4 > /sys/bus/pci/drivers/idpf/bind

[   52.444239] idpf 0000:06:12.4: enabling device (0000 -> 0002)
[   52.485005] idpf 0000:06:12.4: Device HW Reset initiated
[   54.177181] idpf 0000:06:12.4: PTP init failed, err=-EOPNOTSUPP
[   54.206177] idpf 0000:06:12.4: Minimum RX descriptor support not provided, using the default
[   54.206182] idpf 0000:06:12.4: Minimum TX descriptor support not provided, using the default

Changing the delay to 300us avoids the delays during the initial mailbox
transactions, making the init phase much faster:

[   83.342590] idpf 0000:06:12.4: enabling device (0000 -> 0002)
[   83.384402] idpf 0000:06:12.4: Device HW Reset initiated
[   83.518323] idpf 0000:06:12.4: PTP init failed, err=-EOPNOTSUPP
[   83.547430] idpf 0000:06:12.4: Minimum RX descriptor support not provided, using the default
[   83.547435] idpf 0000:06:12.4: Minimum TX descriptor support not provided, using the default

Fixes: 4930fbf419a7 ("idpf: add core init and interrupt request")
Signed-off-by: Brian Vazquez <brianvv@google.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

idpf: fix LAN memory regions command on some NVMs

IPU SDK versions 1.9 through 2.0.5 require send buffer to contain a single
empty memory region. Set number of regions to 1 and use appropriate send
buffer size to satisfy this requirement.

Fixes: 6aa53e861c1a ("idpf: implement get LAN MMIO memory regions")
Suggested-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

iavf: fix off-by-one issues in iavf_config_rss_reg()

There are off-by-one bugs when configuring RSS hash key and lookup
table, causing out-of-bounds reads to memory [1] and out-of-bounds
writes to device registers.

Before commit 43a3d9ba34c9 ("i40evf: Allow PF driver to configure RSS"),
the loop upper bounds were:
    i <= I40E_VFQF_{HKEY,HLUT}_MAX_INDEX
which is safe since the value is the last valid index.

That commit changed the bounds to:
    i <= adapter->rss_{key,lut}_size / 4
where `rss_{key,lut}_size / 4` is the number of dwords, so the last
valid index is `(rss_{key,lut}_size / 4) - 1`. Therefore, using `<=`
accesses one element past the end.

Fix the issues by using `<` instead of `<=`, ensuring we do not exceed
the bounds.

[1] KASAN splat about rss_key_size off-by-one
  BUG: KASAN: slab-out-of-bounds in iavf_config_rss+0x619/0x800
  Read of size 4 at addr ffff888102c50134 by task kworker/u8:6/63

  CPU: 0 UID: 0 PID: 63 Comm: kworker/u8:6 Not tainted 6.18.0-rc2-enjuk-tnguy-00378-g3005f5b77652-dirty #156 PREEMPT(voluntary)
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
  Workqueue: iavf iavf_watchdog_task
  Call Trace:
   <TASK>
   dump_stack_lvl+0x6f/0xb0
   print_report+0x170/0x4f3
   kasan_report+0xe1/0x1a0
   iavf_config_rss+0x619/0x800
   iavf_watchdog_task+0x2be7/0x3230
   process_one_work+0x7fd/0x1420
   worker_thread+0x4d1/0xd40
   kthread+0x344/0x660
   ret_from_fork+0x249/0x320
   ret_from_fork_asm+0x1a/0x30
   </TASK>

  Allocated by task 63:
   kasan_save_stack+0x30/0x50
   kasan_save_track+0x14/0x30
   __kasan_kmalloc+0x7f/0x90
   __kmalloc_noprof+0x246/0x6f0
   iavf_watchdog_task+0x28fc/0x3230
   process_one_work+0x7fd/0x1420
   worker_thread+0x4d1/0xd40
   kthread+0x344/0x660
   ret_from_fork+0x249/0x320
   ret_from_fork_asm+0x1a/0x30

  The buggy address belongs to the object at ffff888102c50100
   which belongs to the cache kmalloc-64 of size 64
  The buggy address is located 0 bytes to the right of
   allocated 52-byte region [ffff888102c50100, ffff888102c50134)

  The buggy address belongs to the physical page:
  page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x102c50
  flags: 0x200000000000000(node=0|zone=2)
  page_type: f5(slab)
  raw: 0200000000000000 ffff8881000418c0 dead000000000122 0000000000000000
  raw: 0000000000000000 0000000080200020 00000000f5000000 0000000000000000
  page dumped because: kasan: bad access detected

  Memory state around the buggy address:
   ffff888102c50000: 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc
   ffff888102c50080: 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc
  >ffff888102c50100: 00 00 00 00 00 00 04 fc fc fc fc fc fc fc fc fc
                                       ^
   ffff888102c50180: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
   ffff888102c50200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc

Fixes: 43a3d9ba34c9 ("i40evf: Allow PF driver to configure RSS")
Signed-off-by: Kohei Enju <enjuk@amazon.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

i40e: validate ring_len parameter against hardware-specific values

The maximum number of descriptors supported by the hardware is
hardware-dependent and can be retrieved using
i40e_get_max_num_descriptors(). Move this function to a shared header
and use it when checking for valid ring_len parameter rather than using
hardcoded value.

By fixing an over-acceptance issue, behavior change could be seen where
ring_len could now be rejected while configuring rx and tx queues if its
size is larger than the hardware-dependent maximum number of
descriptors.

Fixes: 55d225670def ("i40e: add validation for ring_len param")
Signed-off-by: Gregory Herrero <gregory.herrero@oracle.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

i40e: fix scheduling in set_rx_mode

Add service task schedule to set_rx_mode.
In some cases there are error messages printed out in PTP application
(ptp4l):

ptp4l[13848.762]: port 1 (ens2f3np3): received SYNC without timestamp
ptp4l[13848.825]: port 1 (ens2f3np3): received SYNC without timestamp
ptp4l[13848.887]: port 1 (ens2f3np3): received SYNC without timestamp

This happens when service task would not run immediately after
set_rx_mode, and we need it for setup tasks. This service task checks, if
PTP RX packets are hung in firmware, and propagate correct settings such
as multicast address for IEEE 1588 Precision Time Protocol.
RX timestamping depends on some of these filters set. Bug happens only
with high PTP packets frequency incoming, and not every run since
sometimes service task is being ran from a different place immediately
after starting ptp4l.

Fixes: 0e4425ed641f ("i40e: fix: do not sleep in netdev_ops")
Reviewed-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Przemyslaw Korba <przemyslaw.korba@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

can: j1939: make j1939_sk_bind() fail if device is no longer registered

There is a theoretical race window in j1939_sk_netdev_event_unregister()
where two j1939_sk_bind() calls jump in between read_unlock_bh() and
lock_sock().

The assumption jsk->priv == priv can fail if the first j1939_sk_bind()
call once made jsk->priv == NULL due to failed j1939_local_ecu_get() call
and the second j1939_sk_bind() call again made jsk->priv != NULL due to
successful j1939_local_ecu_get() call.

Since the socket lock is held by both j1939_sk_netdev_event_unregister()
and j1939_sk_bind(), checking ndev->reg_state with the socket lock held can
reliably make the second j1939_sk_bind() call fail (and close this race
window).

Fixes: 7fcbe5b2c6a4 ("can: j1939: implement NETDEV_UNREGISTER notification handler")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/5732921e-247e-4957-a364-da74bd7031d7@I-love.SAKURA.ne.jp
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: j1939: make j1939_session_activate() fail if device is no longer registered

syzbot is still reporting

unregister_netdevice: waiting for vcan0 to become free. Usage count = 2

even after commit 93a27b5891b8 ("can: j1939: add missing calls in
NETDEV_UNREGISTER notification handler") was added. A debug printk() patch
found that j1939_session_activate() can succeed even after
j1939_cancel_active_session() from j1939_netdev_notify(NETDEV_UNREGISTER)
has completed.

Since j1939_cancel_active_session() is processed with the session list lock
held, checking ndev->reg_state in j1939_session_activate() with the session
list lock held can reliably close the race window.

Reported-by: syzbot <syzbot+881d65229ca4f9ae8c84@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=881d65229ca4f9ae8c84
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/b9653191-d479-4c8b-8536-1326d028db5c@I-love.SAKURA.ne.jp
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull bpf fixes from Alexei Starovoitov:

- Fix BPF builds due to -fms-extensions. selftests (Alexei
   Starovoitov), bpftool (Quentin Monnet).

- Fix build of net/smc when CONFIG_BPF_SYSCALL=y, but CONFIG_BPF_JIT=n
   (Geert Uytterhoeven)

- Fix livepatch/BPF interaction and support reliable unwinding through
   BPF stack frames (Josh Poimboeuf)

- Do not audit capability check in arm64 JIT (Ondrej Mosnacek)

- Fix truncated dmabuf BPF iterator reads (T.J. Mercier)

- Fix verifier assumptions of bpf_d_path's output buffer (Shuran Liu)

- Fix warnings in libbpf when built with -Wdiscarded-qualifiers under
   C23 (Mikhail Gavrilov)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: add regression test for bpf_d_path()
  bpf: Fix verifier assumptions of bpf_d_path's output buffer
  selftests/bpf: Add test for truncated dmabuf_iter reads
  bpf: Fix truncated dmabuf iterator reads
  x86/unwind/orc: Support reliable unwinding through BPF stack frames
  bpf: Add bpf_has_frame_pointer()
  bpf, arm64: Do not audit capability check in do_jit()
  libbpf: Fix -Wdiscarded-qualifiers under C23
  bpftool: Fix build warnings due to MS extensions
  net: smc: SMC_HS_CTRL_BPF should depend on BPF_JIT
  selftests/bpf: Add -fms-extensions to bpf build flags

Merge tag 's390-6.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fixes from Alexander Gordeev:

- clear 'Search boot program' flag when 'bootprog' sysfs file is
   written to override a value set from Hardware Management Console

- fix cyclic dead-lock in zpci_zdev_put() and zpci_scan_devices()
   functions when triggering PCI device recovery using sysfs

- annotate the expected lock context imbalance in zpci_release_device()
   function to fix a sparse complaint

- fix the logic to fallback to the return address register value in the
   topmost frame when stack tracing uses a back chain

* tag 's390-6.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/stacktrace: Do not fallback to RA register
  s390/pci: Annotate lock context imbalance in zpci_release_device()
  s390/pci: Fix cyclic dead-lock in zpci_zdev_put() and zpci_scan_devices()
  s390/ipl: Clear SBP flag when bootprog is set

cifs: update internal module version number

to 2.58

Signed-off-by: Steve French <stfrench@microsoft.com>

smb: move some SMB1 definitions into common/smb1pdu.h

These definitions are only used by SMB1, so move them into the new
common/smb1pdu.h.

KSMBD only implements SMB_COM_NEGOTIATE, see MS-SMB2 3.3.5.2.

Co-developed-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: ZhangGuoDong <zhangguodong@kylinos.cn>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb: align durable reconnect v2 context to 8 byte boundary

Add a 4-byte Pad to create_durable_handle_reconnect_v2 so the DH2C
create context is 8 byte aligned.
This avoids malformed CREATE contexts on reconnect.
Recent change removed this Padding, adding it back.

Fixes: 81a45de432c6 ("smb: move create_durable_handle_reconnect_v2 to common/smb2pdu.h")
Signed-off-by: Bharath SM <bharathsm@microsoft.com>
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

wifi: iwlwifi: Implement settime64 as stub for MVM/MLD PTP

Since commit dfb073d32cac ("ptp: Return -EINVAL on ptp_clock_register if
required ops are NULL"), PTP clock registered through ptp_clock_register
is required to have ptp_clock_info.settime64 set, however, neither MVM
nor MLD's PTP clock implementation sets it, resulting in warnings when
the interface starts up, like

WARNING: drivers/ptp/ptp_clock.c:325 at ptp_clock_register+0x2c8/0x6b8, CPU#1: wpa_supplicant/469
CPU: 1 UID: 0 PID: 469 Comm: wpa_supplicant Not tainted 6.18.0+ #101 PREEMPT(full)
ra: ffff800002732cd4 iwl_mvm_ptp_init+0x114/0x188 [iwlmvm]
ERA: 9000000002fdc468 ptp_clock_register+0x2c8/0x6b8
iwlwifi 0000:01:00.0: Failed to register PHC clock (-22)

I don't find an appropriate firmware interface to implement settime64()
for iwlwifi MLD/MVM, thus instead create a stub that returns
-EOPTNOTSUPP only, suppressing the warning and allowing the PTP clock to
be registered.

Reported-by: Nathan Chancellor <nathan@kernel.org>
Closes: https://lore.kernel.org/all/20251108044822.GA3262936@ax162/
Signed-off-by: Yao Zi <ziyao@disroot.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
tested-by: damian Tometzki damian@riscv-rocks.de
Tested-by: Oliver Hartkopp <socketcan@hartkopp.net>
Acked-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20251204123204.9316-1-ziyao@disroot.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: iwlwifi: Fix firmware version handling

On my system the arithmetic done on the firmware numbers
results in a negative number, but since the types are
unsigned it gets interpreted as a large positive number.

The end result is that the firmware gets rejected and wifi
is defunct.

Switch to signed types to handle this case correctly.

iwlwifi 0000:0c:00.0: Driver unable to support your firmware API. Driver supports FW core 4294967294..2, firmware is 2.
iwlwifi 0000:0c:00.0: Direct firmware load for iwlwifi-5000-4.ucode failed with error -2
iwlwifi 0000:0c:00.0: Direct firmware load for iwlwifi-5000-3.ucode failed with error -2
iwlwifi 0000:0c:00.0: Direct firmware load for iwlwifi-5000-2.ucode failed with error -2
iwlwifi 0000:0c:00.0: Direct firmware load for iwlwifi-5000-1.ucode failed with error -2
iwlwifi 0000:0c:00.0: no suitable firmware found!
iwlwifi 0000:0c:00.0: minimum version required: iwlwifi-5000-1
iwlwifi 0000:0c:00.0: maximum version supported: iwlwifi-5000-5
iwlwifi 0000:0c:00.0: check git://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git

Cc: stable@vger.kernel.org
Fixes: 5f708cccde9d ("wifi: iwlwifi: add a new FW file numbering scheme")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220805
Link: https://patch.msgid.link/20251113222852.15896-1-ville.syrjala@linux.intel.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211: ocb: skip rx_no_sta when interface is not joined

ieee80211_ocb_rx_no_sta() assumes a valid channel context, which is only
present after JOIN_OCB.

RX may run before JOIN_OCB is executed, in which case the OCB interface
is not operational. Skip RX peer handling when the interface is not
joined to avoid warnings in the RX path.

Reported-by: syzbot+b364457b2d1d4e4a3054@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=b364457b2d1d4e4a3054
Tested-by: syzbot+b364457b2d1d4e4a3054@syzkaller.appspotmail.com
Signed-off-by: Moon Hee Lee <moonhee.lee.ca@gmail.com>
Link: https://patch.msgid.link/20251216035932.18332-1-moonhee.lee.ca@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211: do not use old MBSSID elements

When userspace brings down and deletes a non-transmitted profile,
it is expected to send a new updated Beacon template for the
transmitted profile of that multiple BSSID (MBSSID) group which
does not include the removed profile in MBSSID element. This
update comes via NL80211_CMD_SET_BEACON.

Such updates work well as long as the group continues to have at
least one non-transmitted profile as NL80211_ATTR_MBSSID_ELEMS
is included in the new Beacon template.

But when the last non-trasmitted profile is removed, it still
gets included in Beacon templates sent to driver. This happens
because when no MBSSID elements are sent by the userspace,
ieee80211_assign_beacon() ends up using the element stored from
earlier Beacon template.

Do not copy old MBSSID elements, instead userspace should always
include these when applicable.

Fixes: 2b3171c6fe0a ("mac80211: MBSSID beacon handling in AP mode")
Signed-off-by: Aloka Dixit <aloka.dixit@oss.qualcomm.com>
Link: https://patch.msgid.link/20251215174656.2866319-2-aloka.dixit@oss.qualcomm.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211: don't WARN for connections on invalid channels

It's not clear (to me) how exactly syzbot managed to hit this,
but it seems conceivable that e.g. regulatory changed and has
disabled a channel between scanning (channel is checked to be
usable by cfg80211_get_ies_channel_number) and connecting on
the channel later.

With one scenario that isn't covered elsewhere described above,
the warning isn't good, replace it with a (more informative)
error message.

Reported-by: syzbot+639af5aa411f2581ad38@syzkaller.appspotmail.com
Link: https://patch.msgid.link/20251202102511.5a8fb5184fa3.I961ee41b8f10538a54b8565dbf03ec1696e80e03@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: wlcore: ensure skb headroom before skb_push

This avoids occasional skb_under_panic Oops from wl1271_tx_work. In this case, headroom is
less than needed (typically 110 - 94 = 16 bytes).

Signed-off-by: Peter Astrand <astrand@lysator.liu.se>
Link: https://patch.msgid.link/097bd417-e1d7-acd4-be05-47b199075013@lysator.liu.se
Signed-off-by: Johannes Berg <johannes.berg@intel.com>