git.ipfire.org Git - thirdparty/linux.git/log
3 weeks ago net: hns3: use seq_file for files in queue/ in debugfs
Jian Shen [Mon, 14 Jul 2025 06:10:30 +0000 (14:10 +0800)] 
net: hns3: use seq_file for files in queue/ in debugfs

This patch uses seq_file for the following nodes:
rx_queue_info/queue_map

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20250714061037.2616413-4-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks ago net: hns3: clean up the build warning in debugfs by use seq file
Jian Shen [Mon, 14 Jul 2025 06:10:29 +0000 (14:10 +0800)] 
net: hns3: clean up the build warning in debugfs by use seq file

Arnd reported two build warnings about oversized on-stack buffers.
As Arnd suggested, use seq_file to avoid allocating a stack buffer
or a kmalloc buffer.

Reported-by: Arnd Bergmann <arnd@kernel.org>
Closes: https://lore.kernel.org/all/20250610092113.2639248-1-arnd@kernel.org/
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20250714061037.2616413-3-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks ago net: hns3: remove tx spare info from debugfs.
Jijie Shao [Mon, 14 Jul 2025 06:10:28 +0000 (14:10 +0800)] 
net: hns3: remove tx spare info from debugfs.

The tx spare info in debugfs is not very useful,
and there are related statistics available for troubleshooting.

This patch removes the tx spare info from debugfs.

Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20250714061037.2616413-2-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks ago ipv6: mcast: Remove unnecessary null check in ip6_mc_find_dev()
Yue Haibing [Mon, 14 Jul 2025 08:17:32 +0000 (16:17 +0800)] 
ipv6: mcast: Remove unnecessary null check in ip6_mc_find_dev()

There is no need to check idev for NULL before returning NULL.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250714081732.3109764-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks ago don't open-code kernel_accept() in rds_tcp_accept_one()
Al Viro [Sun, 13 Jul 2025 18:01:34 +0000 (19:01 +0100)] 
don't open-code kernel_accept() in rds_tcp_accept_one()

rds_tcp_accept_one() starts with a pretty much verbatim
copy of kernel_accept().  Might as well use the real thing...

That code went into mainline in 2009, kernel_accept()
had been added in Aug 2006, the copyright on rds/tcp_listen.c
is "Copyright (c) 2006 Oracle", so it's entirely possible
that it predates the introduction of kernel_accept().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://patch.msgid.link/20250713180134.GC1880847@ZenIV
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks ago bnxt: move bnxt_hsi.h to include/linux/bnxt/hsi.h
Andy Gospodarek [Mon, 14 Jul 2025 17:02:02 +0000 (13:02 -0400)] 
bnxt: move bnxt_hsi.h to include/linux/bnxt/hsi.h

This moves bnxt_hsi.h contents to a common location so it can be
properly referenced by bnxt_en, bnxt_re, and bnge.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250714170202.39688-1-gospo@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks ago Merge branch 'net-mctp-improved-bind-handling'
Paolo Abeni [Tue, 15 Jul 2025 10:08:41 +0000 (12:08 +0200)] 
Merge branch 'net-mctp-improved-bind-handling'

Matt Johnston says:

====================
net: mctp: Improved bind handling

This series improves a couple of aspects of MCTP bind() handling.

MCTP wasn't checking whether the same MCTP type was bound by multiple
sockets. That would result in messages being received by an arbitrary
socket, which isn't useful behaviour. Instead it makes more sense to
have the duplicate binds fail, the same as other network protocols.
An exception is made for more-specific binds to particular MCTP
addresses.

It is also useful to be able to limit a bind to only receive incoming
request messages (MCTP TO bit set) from a specific peer+type, so that
individual processes can communicate with separate MCTP peers. One
example is a PLDM firmware update requester, which will initiate
communication with a device, and then the device will connect back to the
requester process.

These limited binds are implemented by a connect() call on the socket
prior to bind. connect() isn't used in the general case for MCTP, since
a plain send() wouldn't provide the required MCTP tag argument for
addressing.

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
====================

Link: https://patch.msgid.link/20250710-mctp-bind-v4-0-8ec2f6460c56@codeconstruct.com.au
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 weeks ago net: mctp: Add bind lookup test
Matt Johnston [Thu, 10 Jul 2025 08:56:01 +0000 (16:56 +0800)] 
net: mctp: Add bind lookup test

Test the preference order of bound socket matches with a series of test
packets.

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Link: https://patch.msgid.link/20250710-mctp-bind-v4-8-8ec2f6460c56@codeconstruct.com.au
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 weeks ago net: mctp: Test conflicts of connect() with bind()
Matt Johnston [Thu, 10 Jul 2025 08:56:00 +0000 (16:56 +0800)] 
net: mctp: Test conflicts of connect() with bind()

The addition of connect() adds new conflict cases to test.

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Link: https://patch.msgid.link/20250710-mctp-bind-v4-7-8ec2f6460c56@codeconstruct.com.au
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 weeks ago net: mctp: Allow limiting binds to a peer address
Matt Johnston [Thu, 10 Jul 2025 08:55:59 +0000 (16:55 +0800)] 
net: mctp: Allow limiting binds to a peer address

Prior to calling bind() a program may call connect() on a socket to
restrict to a remote peer address.

Using connect() is the normal mechanism to specify a remote network
peer, so we use that here. In MCTP connect() is only used for bound
sockets - send() is not available for MCTP since a tag must be provided
for each message.

The smctp_type must match between connect() and bind() calls.

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Link: https://patch.msgid.link/20250710-mctp-bind-v4-6-8ec2f6460c56@codeconstruct.com.au
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 weeks ago net: mctp: Use hashtable for binds
Matt Johnston [Thu, 10 Jul 2025 08:55:58 +0000 (16:55 +0800)] 
net: mctp: Use hashtable for binds

Ensure that a specific EID (remote or local) bind will match in
preference to a MCTP_ADDR_ANY bind.

This adds infrastructure for binding a socket to receive messages from a
specific remote peer address, a future commit will expose an API for
this.
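
The lookup preference described above can be pictured with a small toy
model. This is only a sketch under stated assumptions: the toy_* names,
the fixed-size table and the linear scan are all hypothetical (the commit
itself uses a hashtable), and TOY_ADDR_ANY merely stands in for
MCTP_ADDR_ANY. It also folds in the duplicate-bind rejection from earlier
in the series.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_ADDR_ANY  0xff /* stand-in for MCTP_ADDR_ANY (assumption) */
#define TOY_MAX_BINDS 8    /* arbitrary size for this sketch */

/* One bound socket: message type, local address, and an id for the socket. */
struct toy_bind {
	uint8_t type;
	uint8_t local_addr;
	int sock_id;
};

struct toy_bind_table {
	struct toy_bind binds[TOY_MAX_BINDS];
	size_t count;
};

/* Add a bind. Returns 0 on success, -1 if an identical (type, address)
 * bind already exists or the table is full. */
static int toy_bind(struct toy_bind_table *t, uint8_t type, uint8_t addr,
		    int sock_id)
{
	for (size_t i = 0; i < t->count; i++) {
		if (t->binds[i].type == type && t->binds[i].local_addr == addr)
			return -1;
	}
	if (t->count == TOY_MAX_BINDS)
		return -1;
	t->binds[t->count++] = (struct toy_bind){ type, addr, sock_id };
	return 0;
}

/* Find the socket for an incoming message. A bind to the exact destination
 * address wins over a TOY_ADDR_ANY bind of the same type. Returns the
 * socket id, or -1 if nothing matches. */
static int toy_lookup(const struct toy_bind_table *t, uint8_t type,
		      uint8_t dest)
{
	int fallback = -1;

	for (size_t i = 0; i < t->count; i++) {
		const struct toy_bind *b = &t->binds[i];

		if (b->type != type)
			continue;
		if (b->local_addr == dest)
			return b->sock_id;
		if (b->local_addr == TOY_ADDR_ANY)
			fallback = b->sock_id;
	}
	return fallback;
}
```

With one TOY_ADDR_ANY bind and one bind to address 8 of the same type,
a message for address 8 reaches the specific socket, while a message for
any other address falls back to the wildcard socket.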

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Link: https://patch.msgid.link/20250710-mctp-bind-v4-5-8ec2f6460c56@codeconstruct.com.au
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 weeks ago net: mctp: Add test for conflicting bind()s
Matt Johnston [Thu, 10 Jul 2025 08:55:57 +0000 (16:55 +0800)] 
net: mctp: Add test for conflicting bind()s

Test pairwise combinations of bind addresses and types.

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Link: https://patch.msgid.link/20250710-mctp-bind-v4-4-8ec2f6460c56@codeconstruct.com.au
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 weeks ago net: mctp: Treat MCTP_NET_ANY specially in bind()
Matt Johnston [Thu, 10 Jul 2025 08:55:56 +0000 (16:55 +0800)] 
net: mctp: Treat MCTP_NET_ANY specially in bind()

When a specific EID is passed as a bind address, it only makes sense to
interpret with an actual network ID, so resolve that to the default
network at bind time.

For bind address of MCTP_ADDR_ANY, we want to be able to capture traffic
to any network and address, so keep the current behaviour of matching
traffic from any network interface (don't interpret MCTP_NET_ANY as
the default network ID).

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Link: https://patch.msgid.link/20250710-mctp-bind-v4-3-8ec2f6460c56@codeconstruct.com.au
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 weeks ago net: mctp: Prevent duplicate binds
Matt Johnston [Thu, 10 Jul 2025 08:55:55 +0000 (16:55 +0800)] 
net: mctp: Prevent duplicate binds

Disallow bind() calls that have the same arguments as existing bound
sockets.  Previously multiple sockets could bind() to the same
type/local address, with an arbitrary socket receiving matched messages.

This is only a partial fix, a future commit will define precedence order
for MCTP_ADDR_ANY versus specific EID bind(), which are allowed to exist
together.

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Link: https://patch.msgid.link/20250710-mctp-bind-v4-2-8ec2f6460c56@codeconstruct.com.au
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 weeks ago net: mctp: mctp_test_route_extaddr_input cleanup
Matt Johnston [Thu, 10 Jul 2025 08:55:54 +0000 (16:55 +0800)] 
net: mctp: mctp_test_route_extaddr_input cleanup

The sock was not being released. Other than leaking, the stale socket
will conflict with subsequent bind() calls in unrelated MCTP tests.

Fixes: 46ee16462fed ("net: mctp: test: Add extaddr routing output test")
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Link: https://patch.msgid.link/20250710-mctp-bind-v4-1-8ec2f6460c56@codeconstruct.com.au
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 weeks ago ipv6: mcast: Avoid a duplicate pointer check in mld_del_delrec()
Yue Haibing [Mon, 14 Jul 2025 08:19:49 +0000 (16:19 +0800)] 
ipv6: mcast: Avoid a duplicate pointer check in mld_del_delrec()

Avoid duplicate non-null pointer check for pmc in mld_del_delrec().
No functional changes.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250714081949.3109947-1-yuehaibing@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
3 weeks ago Merge branch 'tcp-receiver-changes'
Jakub Kicinski [Tue, 15 Jul 2025 01:40:51 +0000 (18:40 -0700)] 
Merge branch 'tcp-receiver-changes'

Eric Dumazet says:

====================
tcp: receiver changes

Before accepting an incoming packet:

- Make sure to not accept a packet beyond advertized RWIN.
  If not, increment a new SNMP counter (LINUX_MIB_BEYOND_WINDOW)

- ooo packets should update rcv_mss and tp->scaling_ratio.

- Make sure to not accept packet beyond sk_rcvbuf limit.

This series includes three associated packetdrill tests.
====================

Link: https://patch.msgid.link/20250711114006.480026-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks ago selftests/net: packetdrill: add tcp_rcv_toobig.pkt
Eric Dumazet [Fri, 11 Jul 2025 11:40:06 +0000 (11:40 +0000)] 
selftests/net: packetdrill: add tcp_rcv_toobig.pkt

Check the TCP receiver behavior after "tcp: stronger sk_rcvbuf checks".

A too-fat packet is dropped unless the receive queue is empty.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250711114006.480026-9-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks ago tcp: stronger sk_rcvbuf checks
Eric Dumazet [Fri, 11 Jul 2025 11:40:05 +0000 (11:40 +0000)] 
tcp: stronger sk_rcvbuf checks

Currently, the TCP stack accepts an incoming packet if the sizes of the
receive queues are below the sk->sk_rcvbuf limit.

This can cause memory overshoot if the packet is big, like a 1/2 MB
BIG TCP one.

Refine the check to take into account the incoming skb truesize.

Note that we still accept the packet if the receive queue is empty,
to not completely freeze TCP flows in pathological conditions.
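
As a rough illustration of the refined check, here is a minimal sketch,
assuming a simplified socket model. The struct and function names are
hypothetical and are not the kernel's; only the idea (count the incoming
skb truesize, but always accept into an empty queue) comes from the text
above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the socket's receive accounting. */
struct toy_sock {
	size_t rmem_alloc; /* bytes already charged to the receive queues */
	size_t rcvbuf;     /* limit, playing the role of sk->sk_rcvbuf */
	bool queue_empty;  /* true when the receive queue holds nothing */
};

/* Old behaviour as described: only the current queue size is checked,
 * so the size of the incoming packet is ignored. */
static bool toy_accept_old(const struct toy_sock *sk, size_t truesize)
{
	(void)truesize;
	return sk->rmem_alloc <= sk->rcvbuf;
}

/* Refined behaviour: the incoming packet's truesize is counted against
 * the limit, except that an empty receive queue always accepts so the
 * flow cannot freeze completely. */
static bool toy_accept_new(const struct toy_sock *sk, size_t truesize)
{
	if (sk->queue_empty)
		return true;
	return sk->rmem_alloc + truesize <= sk->rcvbuf;
}
```

For a socket with 100000 bytes queued and a 131072-byte limit, a 524288-byte
packet passes the old check but fails the new one, unless the queue is empty.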

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250711114006.480026-8-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks ago tcp: add const to tcp_try_rmem_schedule() and sk_rmem_schedule() skb
Eric Dumazet [Fri, 11 Jul 2025 11:40:04 +0000 (11:40 +0000)] 
tcp: add const to tcp_try_rmem_schedule() and sk_rmem_schedule() skb

These functions do not modify the skb, so add a const qualifier.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250711114006.480026-7-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks ago selftests/net: packetdrill: add tcp_ooo_rcv_mss.pkt
Eric Dumazet [Fri, 11 Jul 2025 11:40:03 +0000 (11:40 +0000)] 
selftests/net: packetdrill: add tcp_ooo_rcv_mss.pkt

We make sure tcpi_rcv_mss and tp->scaling_ratio
are correctly updated if no in-order packet has been received yet.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250711114006.480026-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks ago tcp: call tcp_measure_rcv_mss() for ooo packets
Eric Dumazet [Fri, 11 Jul 2025 11:40:02 +0000 (11:40 +0000)] 
tcp: call tcp_measure_rcv_mss() for ooo packets

tcp_measure_rcv_mss() is used to update icsk->icsk_ack.rcv_mss
(tcpi_rcv_mss in tcp_info) and tp->scaling_ratio.

Calling it from tcp_data_queue_ofo() makes sure these
fields are updated, and permits a better tuning
of sk->sk_rcvbuf, in the case a new flow receives many ooo
packets.

Fixes: dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250711114006.480026-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks ago selftests/net: packetdrill: add tcp_rcv_big_endseq.pkt
Eric Dumazet [Fri, 11 Jul 2025 11:40:01 +0000 (11:40 +0000)] 
selftests/net: packetdrill: add tcp_rcv_big_endseq.pkt

This test checks TCP behavior when receiving a packet beyond the window.

It checks the new TcpExtBeyondWindow SNMP counter.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250711114006.480026-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks ago tcp: add LINUX_MIB_BEYOND_WINDOW
Eric Dumazet [Fri, 11 Jul 2025 11:40:00 +0000 (11:40 +0000)] 
tcp: add LINUX_MIB_BEYOND_WINDOW

Add a new SNMP MIB : LINUX_MIB_BEYOND_WINDOW

Incremented when an incoming packet is received beyond the
receiver window.

nstat -az | grep TcpExtBeyondWindow

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250711114006.480026-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks ago tcp: do not accept packets beyond window
Eric Dumazet [Fri, 11 Jul 2025 11:39:59 +0000 (11:39 +0000)] 
tcp: do not accept packets beyond window

Currently, TCP accepts incoming packets which might go beyond the
offered RWIN.

Add to tcp_sequence() the validation of packet end sequence.

Add the corresponding check in the fast path.

We relax this new constraint if the receive queue is empty,
to not freeze flows from buggy peers.

Add a new drop reason : SKB_DROP_REASON_TCP_INVALID_END_SEQUENCE.
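
The end-sequence validation can be sketched roughly as follows. This is
a simplified illustration, not the kernel's tcp_sequence(): the toy_*
helpers are hypothetical, and the only parts taken from the text above
are the comparison of the packet end sequence against the offered window
and the relaxation when the receive queue is empty.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe comparison of 32-bit sequence numbers:
 * returns true when a is later than b in sequence space. */
static bool toy_seq_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/* Returns true if a packet ending at end_seq fits within the offered
 * window [rcv_nxt, rcv_nxt + rcv_wnd]. An empty receive queue relaxes
 * the check so that flows from buggy peers do not freeze. */
static bool toy_end_seq_ok(uint32_t end_seq, uint32_t rcv_nxt,
			   uint32_t rcv_wnd, bool queue_empty)
{
	if (queue_empty)
		return true;
	return !toy_seq_after(end_seq, rcv_nxt + rcv_wnd);
}
```

Because the comparison works on the signed difference, it keeps giving
the right answer when rcv_nxt + rcv_wnd wraps past 2^32.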

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250711114006.480026-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks ago net: wangxun: fix LIBWX dependencies again
Arnd Bergmann [Fri, 11 Jul 2025 08:23:34 +0000 (10:23 +0200)] 
net: wangxun: fix LIBWX dependencies again

Two more drivers got added that use LIBWX and cause a build warning:

WARNING: unmet direct dependencies detected for LIBWX
  Depends on [m]: NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_WANGXUN [=y] && PTP_1588_CLOCK_OPTIONAL [=m]
  Selected by [y]:
  - NGBEVF [=y] && NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_WANGXUN [=y] && PCI_MSI [=y]
  Selected by [m]:
  - NGBE [=m] && NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_WANGXUN [=y] && PCI [=y] && PTP_1588_CLOCK_OPTIONAL [=m]

ld: drivers/net/ethernet/wangxun/libwx/wx_lib.o: in function `wx_clean_tx_irq':
wx_lib.c:(.text+0x5a68): undefined reference to `ptp_schedule_worker'
ld: drivers/net/ethernet/wangxun/libwx/wx_ethtool.o: in function `wx_nway_reset':
wx_ethtool.c:(.text+0x880): undefined reference to `phylink_ethtool_nway_reset'

Add the same dependency on PTP_1588_CLOCK_OPTIONAL to the two drivers
using this library module, following the pattern from commit
8fa19c2c69fb ("net: wangxun: fix LIBWX dependencies").

Fixes: 377d180bd71c ("net: wangxun: add txgbevf build")
Fixes: a0008a3658a3 ("net: wangxun: add ngbevf build")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org> # build-tested
Link: https://patch.msgid.link/20250711082339.1372821-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks ago Add support to set NAPI threaded for individual NAPI
Samiullah Khawaja [Thu, 10 Jul 2025 21:12:03 +0000 (21:12 +0000)] 
Add support to set NAPI threaded for individual NAPI

A net device has a threaded sysfs attribute that can be used to enable threaded
NAPI polling on all of the NAPI contexts under that device. Allow
enabling threaded NAPI polling at individual NAPI level using netlink.

Extend the netlink operation `napi-set` and allow setting the threaded
attribute of a NAPI. This will enable the threaded polling on a NAPI
context.

Add a test in `nl_netdev.py` that verifies various cases of threaded
NAPI being set at NAPI and at device level.

Tested
 ./tools/testing/selftests/net/nl_netdev.py
 TAP version 13
 1..7
 ok 1 nl_netdev.empty_check
 ok 2 nl_netdev.lo_check
 ok 3 nl_netdev.page_pool_check
 ok 4 nl_netdev.napi_list_check
 ok 5 nl_netdev.dev_set_threaded
 ok 6 nl_netdev.napi_set_threaded
 ok 7 nl_netdev.nsim_rxq_reset_down
 # Totals: pass:7 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250710211203.3979655-1-skhawaja@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks ago net: phy: Don't register LEDs for genphy
Sean Anderson [Thu, 10 Jul 2025 20:14:53 +0000 (16:14 -0400)] 
net: phy: Don't register LEDs for genphy

If a PHY has no driver, the genphy driver is probed/removed directly in
phy_attach/detach. If the PHY's ofnode has an "leds" subnode, then the
LEDs will be (un)registered when probing/removing the genphy driver.
This could occur if the leds are for a non-generic driver that isn't
loaded for whatever reason. Synchronously removing the PHY device in
phy_detach leads to the following deadlock:

rtnl_lock()
ndo_close()
    ...
    phy_detach()
        phy_remove()
            phy_leds_unregister()
                led_classdev_unregister()
                    led_trigger_set()
                        netdev_trigger_deactivate()
                            unregister_netdevice_notifier()
                                rtnl_lock()

There is a corresponding deadlock on the open/register side of things
(and that one is reported by lockdep), but it requires a race while this
one is deterministic. Regular drivers do not have this problem since
they are probed asynchronously (without RTNL held).

Generic PHYs do not support LEDs anyway, so don't bother registering
them.

[Jakub: this is a net-next version of
 commit f0f2b992d818 ("net: phy: Don't register LEDs for genphy"),
 which uses APIs removed in -next.]

Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Link: https://patch.msgid.link/20250710201454.1280277-1-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks ago netdevsim: implement peer queue flow control
Breno Leitao [Fri, 11 Jul 2025 17:06:59 +0000 (10:06 -0700)] 
netdevsim: implement peer queue flow control

Add flow control mechanism between paired netdevsim devices to stop the
TX queue during high traffic scenarios. When a receive queue becomes
congested (approaching NSIM_RING_SIZE limit), the corresponding transmit
queue on the peer device is stopped using netif_subqueue_try_stop().

Once the receive queue has sufficient capacity again, the peer's
transmit queue is resumed with netif_tx_wake_queue().

Key changes:
  * Add nsim_stop_peer_tx_queue() to pause peer TX when RX queue is full
  * Add nsim_start_peer_tx_queue() to resume peer TX when RX queue drains
  * Implement queue mapping validation to ensure TX/RX queue count match
  * Wake all queues during device unlinking to prevent stuck queues
  * Use RCU protection when accessing peer device references
  * Wake the queues when changing the queue numbers
  * Remove IFF_NO_QUEUE given it will enqueue packets now

The flow control only activates when devices have matching TX/RX queue
counts to ensure proper queue mapping.
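
The stop/wake behaviour can be modelled with a toy queue pair. This is
a sketch under assumptions, not the netdevsim code: the toy_* names are
hypothetical, TOY_RING_SIZE only stands in for NSIM_RING_SIZE (its value
here is assumed), and the exact thresholds for "congested" and
"sufficient capacity" are simplified to "full" and "not full".

```c
#include <stdbool.h>

#define TOY_RING_SIZE 256 /* stand-in for NSIM_RING_SIZE; value assumed */

/* One RX queue on the receiver paired with one TX queue on the peer. */
struct toy_queue_pair {
	unsigned int rx_len;  /* packets waiting in the RX queue */
	bool peer_tx_stopped; /* true while the peer's TX queue is stopped */
};

/* The peer transmits one packet. Returns false if its TX queue is
 * stopped. Stops the TX queue once the RX queue reaches the ring size. */
static bool toy_xmit(struct toy_queue_pair *q)
{
	if (q->peer_tx_stopped)
		return false;
	q->rx_len++;
	if (q->rx_len >= TOY_RING_SIZE)
		q->peer_tx_stopped = true;
	return true;
}

/* The receiver consumes up to budget packets and wakes the peer's TX
 * queue once there is room again. Returns the number consumed. */
static unsigned int toy_poll(struct toy_queue_pair *q, unsigned int budget)
{
	unsigned int done = budget < q->rx_len ? budget : q->rx_len;

	q->rx_len -= done;
	if (q->peer_tx_stopped && q->rx_len < TOY_RING_SIZE)
		q->peer_tx_stopped = false;
	return done;
}
```

In this model the sender is refused after TOY_RING_SIZE unconsumed
packets and can transmit again as soon as the receiver has polled.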

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250711-netdev_flow_control-v3-1-aa1d5a155762@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks ago selftests: net: add test for variable PMTU in broadcast routes
Oscar Maes [Thu, 10 Jul 2025 14:27:14 +0000 (16:27 +0200)] 
selftests: net: add test for variable PMTU in broadcast routes

Added a test for variable PMTU in broadcast routes.

This test uses iputils' ping and attempts to send a ping between
two peers, which should result in a regular echo reply.

This test will fail when the receiving peer does not receive the echo
request due to a lack of packet fragmentation.

Signed-off-by: Oscar Maes <oscmaes92@gmail.com>
Link: https://patch.msgid.link/20250710142714.12986-2-oscmaes92@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks ago net: ipv4: fix incorrect MTU in broadcast routes
Oscar Maes [Thu, 10 Jul 2025 14:27:13 +0000 (16:27 +0200)] 
net: ipv4: fix incorrect MTU in broadcast routes

Currently, __mkroute_output overrules the MTU value configured for
broadcast routes.

This buggy behaviour can be reproduced with:

ip link set dev eth1 mtu 9000
ip route del broadcast 192.168.0.255 dev eth1 proto kernel scope link src 192.168.0.2
ip route add broadcast 192.168.0.255 dev eth1 proto kernel scope link src 192.168.0.2 mtu 1500

The maximum packet size should be 1500, but it is actually 8000:

ping -b 192.168.0.255 -s 8000

Fix __mkroute_output to allow MTU values to be configured for
broadcast routes (to support a mixed-MTU local-area-network).

Signed-off-by: Oscar Maes <oscmaes92@gmail.com>
Link: https://patch.msgid.link/20250710142714.12986-1-oscmaes92@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks ago Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox...
Jakub Kicinski [Tue, 15 Jul 2025 00:26:57 +0000 (17:26 -0700)] 
Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

Tariq Toukan says:

====================
mlx5-next updates 2025-07-14

* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5: IFC updates for disabled host PF
  net/mlx5: Expose disciplined_fr_counter through HCA capabilities in mlx5_ifc
  RDMA/mlx5: Fix UMR modifying of mkey page size
  net/mlx5: Expose HCA capability bits for mkey max page size
====================

Link: https://patch.msgid.link/1752481357-34780-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks ago Merge tag 'linux-can-next-for-6.17-20250711' of git://git.kernel.org/pub/scm/linux...
Jakub Kicinski [Tue, 15 Jul 2025 00:26:20 +0000 (17:26 -0700)] 
Merge tag 'linux-can-next-for-6.17-20250711' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2025-07-11

The first patch is by Geert Uytterhoeven and converts the rcar_can
driver to DEFINE_SIMPLE_DEV_PM_OPS.

The last patch is by Biju Das and removes unused macros from the
rcar_canfd driver.

* tag 'linux-can-next-for-6.17-20250711' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next:
  can: rcar_canfd: Drop unused macros
  can: rcar_can: Convert to DEFINE_SIMPLE_DEV_PM_OPS()
====================

Link: https://patch.msgid.link/20250711101706.2822687-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks ago net/x25: Remove unused x25_terminate_link()
Dr. David Alan Gilbert [Sat, 12 Jul 2025 20:57:59 +0000 (21:57 +0100)] 
net/x25: Remove unused x25_terminate_link()

x25_terminate_link() has been unused since the last use was removed
in 2020 by:
commit 7eed751b3b2a ("net/x25: handle additional netdev events")

Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Acked-by: Martin Schiller <ms@dev.tdt.de>
Link: https://patch.msgid.link/20250712205759.278777-1-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks ago selftests: drv-net: add rss_api to the Makefile
Jakub Kicinski [Sat, 12 Jul 2025 01:20:05 +0000 (18:20 -0700)] 
selftests: drv-net: add rss_api to the Makefile

I missed adding rss_api.py to the Makefile. The NIPA Makefile
checking script was scanning for shell scripts only, so it
didn't flag it either.

Fixes: 4d13c6c449af ("selftests: drv-net: test RSS Netlink notifications")
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250712012005.4010263-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks ago net: thunderx: Fix format-truncation warning in bgx_acpi_match_id()
Alok Tiwari [Fri, 11 Jul 2025 14:05:30 +0000 (07:05 -0700)] 
net: thunderx: Fix format-truncation warning in bgx_acpi_match_id()

The buffer bgx_sel used in snprintf() was too small to safely hold
the formatted string "BGX%d" for all valid bgx_id values. This caused
a -Wformat-truncation warning with -Werror enabled during build.

Increase the buffer size from 5 to 7 and use `sizeof(bgx_sel)` in
snprintf() to ensure safety and suppress the warning.

Build warning:
  CC      drivers/net/ethernet/cavium/thunder/thunder_bgx.o
  drivers/net/ethernet/cavium/thunder/thunder_bgx.c: In function
‘bgx_acpi_match_id’:
  drivers/net/ethernet/cavium/thunder/thunder_bgx.c:1434:27: error: ‘%d’
directive output may be truncated writing between 1 and 3 bytes into a
region of size 2 [-Werror=format-truncation=]
    snprintf(bgx_sel, 5, "BGX%d", bgx->bgx_id);
                             ^~
  drivers/net/ethernet/cavium/thunder/thunder_bgx.c:1434:23: note:
directive argument in the range [0, 255]
    snprintf(bgx_sel, 5, "BGX%d", bgx->bgx_id);
                         ^~~~~~~
  drivers/net/ethernet/cavium/thunder/thunder_bgx.c:1434:2: note:
‘snprintf’ output between 5 and 7 bytes into a destination of size 5
    snprintf(bgx_sel, 5, "BGX%d", bgx->bgx_id);

compiler warning due to insufficient snprintf buffer size.
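
The sizing argument can be checked with a small standalone sketch. The
helper name toy_format_bgx() is hypothetical and this is not the driver
code; it only reproduces the "BGX%d" format and the [0, 255] argument
range quoted in the warning above.

```c
#include <stdio.h>

/* "BGX" (3 bytes) + up to 3 digits for an id in [0, 255] + NUL = 7. */
#define TOY_BGX_SEL_LEN 7

/* Format the selector string into buf. Like snprintf(), returns the
 * number of characters the full string needs, excluding the NUL, so a
 * return value >= len means the output was truncated. */
static int toy_format_bgx(char *buf, size_t len, unsigned char bgx_id)
{
	return snprintf(buf, len, "BGX%d", bgx_id);
}
```

With a 5-byte buffer and id 255 the result is cut to "BGX2", while a
buffer of TOY_BGX_SEL_LEN bytes holds the full "BGX255". Passing
sizeof(buf) at the call site, rather than a literal, keeps the length
in step with the array if it is resized again.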

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250711140532.2463602-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks ago Merge branch 'net-fec-add-some-optimizations'
Jakub Kicinski [Tue, 15 Jul 2025 00:14:34 +0000 (17:14 -0700)] 
Merge branch 'net-fec-add-some-optimizations'

Wei Fang says:

====================
net: fec: add some optimizations

Add some optimizations to the fec driver, see each patch for details.

v1: https://lore.kernel.org/20250710090902.1171180-1-wei.fang@nxp.com
====================

Link: https://patch.msgid.link/20250711091639.1374411-1-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks ago net: fec: add fec_set_hw_mac_addr() helper function
Wei Fang [Fri, 11 Jul 2025 09:16:39 +0000 (17:16 +0800)] 
net: fec: add fec_set_hw_mac_addr() helper function

In the current driver, the MAC address is set in both fec_restart() and
fec_set_mac_address(), so a generic helper function fec_set_hw_mac_addr()
is added to set the hardware MAC address to make the code more compact.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250711091639.1374411-4-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks ago net: fec: add more macros for bits of FEC_ECR
Wei Fang [Fri, 11 Jul 2025 09:16:38 +0000 (17:16 +0800)] 
net: fec: add more macros for bits of FEC_ECR

There are also some RCR bits that are not defined but are used by the
driver, so add macro definitions for these bits to improve readability
and maintainability.

In addition, although FEC_RCR_HALFDPX has been defined, it is not used
in the driver. According to the description of FEC_RCR[1] in RM, it is
used to disable receive on transmit. Therefore, it is more appropriate
to redefine FEC_RCR[1] as FEC_RCR_DRT.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250711091639.1374411-3-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks ago net: fec: use phy_interface_mode_is_rgmii() to check RGMII mode
Wei Fang [Fri, 11 Jul 2025 09:16:37 +0000 (17:16 +0800)] 
net: fec: use phy_interface_mode_is_rgmii() to check RGMII mode

Use the generic helper function phy_interface_mode_is_rgmii() to check
RGMII mode.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250711091639.1374411-2-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks ago dev: Pass netdevice_tracker to dev_get_by_flags_rcu().
Kuniyuki Iwashima [Fri, 11 Jul 2025 05:10:59 +0000 (05:10 +0000)] 
dev: Pass netdevice_tracker to dev_get_by_flags_rcu().

This is a follow-up for commit eb1ac9ff6c4a5 ("ipv6: anycast: Don't
hold RTNL for IPV6_JOIN_ANYCAST.").

We should not add a new device lookup API without netdevice_tracker.

Let's pass netdevice_tracker to dev_get_by_flags_rcu() and rename it
with netdev_ prefix to match other newer APIs.

Note that we always use GFP_ATOMIC for netdev_hold() as it's expected
to be called under RCU.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/netdev/20250708184053.102109f6@kernel.org/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250711051120.2866855-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks ago net: phy: micrel: Add ksz9131_resume()
Biju Das [Fri, 11 Jul 2025 05:40:21 +0000 (06:40 +0100)] 
net: phy: micrel: Add ksz9131_resume()

The Renesas RZ/G3E SMARC EVK uses a KSZ9131RNXC PHY. In the deep power
state the PHY loses power, and on wakeup the RGMII delays are not
reconfigured, causing it to fail.

Replace the kszphy_resume() callback with ksz9131_resume() so that the
rgmii_delay is reconfigured when the PHY exits the PM suspend state.

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250711054029.48536-1-biju.das.jz@bp.renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks ago tools: ynl: default to --process-unknown in installed mode
Jakub Kicinski [Thu, 10 Jul 2025 17:51:15 +0000 (10:51 -0700)] 
tools: ynl: default to --process-unknown in installed mode

We default to raising an exception when unknown attrs are found
to make sure those are noticed during development.
When YNL CLI is "installed" and used by sysadmins, erroring out
is not going to be helpful. In that case it is far more likely
that user space is older than the kernel than that some attr is
misdefined or missing.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 weeks ago net: dsa: mt7530: Constify struct regmap_config
Christophe JAILLET [Sun, 13 Jul 2025 15:09:24 +0000 (17:09 +0200)] 
net: dsa: mt7530: Constify struct regmap_config

'struct regmap_config' are not modified in these drivers. They can be
statically defined instead of allocated and populated at run-time.

The main benefits are:
  - it saves some memory at runtime
  - the structures can be declared as 'const', which is always better for
    structures that hold some function pointers
  - the code is less verbose

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 weeks ago tools: ynl: process unknown for enum values
Donald Hunter [Fri, 11 Jul 2025 17:04:56 +0000 (18:04 +0100)] 
tools: ynl: process unknown for enum values

Extend the process_unknown handling to enum values and flags.

Tested by removing entries from rt-link.yaml and rt-neigh.yaml:

./tools/net/ynl/pyynl/cli.py --family rt-link --dump getlink \
    --process-unknown --output-json | jq '.[0] | ."ifi-flags"'
[
  "up",
  "Unknown(6)",
  "loopback",
  "Unknown(16)"
]

./tools/net/ynl/pyynl/cli.py --family rt-neigh --dump getneigh \
    --process-unknown --output-json | jq '.[] | ."ndm-type"'
"unicast"
"Unknown(5)"
"Unknown(5)"
"unicast"
"Unknown(5)"
"unicast"
"broadcast"
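
The decoding behaviour exercised above can be sketched roughly as follows.
This is a minimal illustration under stated assumptions, not the pyynl
implementation: decode_enum(), decode_flags() and the trimmed IFI_FLAGS and
NDM_TYPE tables are hypothetical, with the entries for flag bits 6 and 16 and
for type 5 left out to mimic removing them from the YAML specs.

```python
# Hypothetical, trimmed tables: bit/value -> name. Entries for flag bits 6
# and 16 and for type 5 are deliberately absent, as if removed from the spec.
IFI_FLAGS = {0: "up", 1: "broadcast", 3: "loopback"}
NDM_TYPE = {1: "unicast", 3: "broadcast"}


def decode_enum(value, names, process_unknown=False):
    """Map one enum value to its name, or to 'Unknown(N)' when allowed."""
    if value in names:
        return names[value]
    if process_unknown:
        return f"Unknown({value})"
    raise KeyError(f"enum value {value} is not defined in the spec")


def decode_flags(raw, names, process_unknown=False):
    """Decode a bitmask into a list of flag names, lowest bit first."""
    decoded = []
    bit = 0
    while raw:
        if raw & 1:
            decoded.append(decode_enum(bit, names, process_unknown))
        raw >>= 1
        bit += 1
    return decoded


flags = (1 << 0) | (1 << 3) | (1 << 6) | (1 << 16)
print(decode_flags(flags, IFI_FLAGS, process_unknown=True))
print(decode_enum(5, NDM_TYPE, process_unknown=True))
```

Without process_unknown the sketch raises on the first undefined value, which
corresponds to the strict default used during development. The sketch returns
flags in bit order; the ordering in the real CLI output may differ.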

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 weeks ago net/mlx5: IFC updates for disabled host PF
Daniel Jurgens [Wed, 9 Jul 2025 12:41:07 +0000 (15:41 +0300)] 
net/mlx5: IFC updates for disabled host PF

The port 2 host PF can be disabled; this bit reflects that setting.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1752064867-16874-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
3 weeks ago net/mlx5: Expose disciplined_fr_counter through HCA capabilities in mlx5_ifc
Carolina Jubran [Wed, 9 Jul 2025 12:41:06 +0000 (15:41 +0300)] 
net/mlx5: Expose disciplined_fr_counter through HCA capabilities in mlx5_ifc

Introduce the `disciplined_fr_counter` capability bit to indicate that
the device’s free-running cycle counter is disciplined to real-time.

Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1752064867-16874-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
3 weeks ago RDMA/mlx5: Fix UMR modifying of mkey page size
Edward Srouji [Wed, 9 Jul 2025 06:42:09 +0000 (09:42 +0300)] 
RDMA/mlx5: Fix UMR modifying of mkey page size

When changing the page size on an mkey, the driver needs to set the
appropriate bits in the mkey mask to indicate which fields are being
modified.
The 6th bit of a page size in mlx5 driver is considered an extension,
and this bit has a dedicated capability and mask bits.

Previously, the driver was not setting this mask in the mkey mask when
performing page size changes, regardless of its hardware support,
potentially leading to incorrect page size updates.

This fixes the issue by setting the relevant bit in the mkey mask when
performing page size changes on an mkey and the 6th bit of this field is
supported by the hardware.
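
The mask selection described above can be modelled as follows. This is only
a sketch: page_size_mkey_mask() and both mask constants are hypothetical
names with made-up bit positions, not the driver's definitions.

```python
# Hypothetical mask bit positions, chosen only for illustration; the real
# values are defined by the device interface, not by this sketch.
MKEY_MASK_PAGE_SIZE = 1 << 0      # base page-size field
MKEY_MASK_PAGE_SIZE_EXT = 1 << 1  # 6th (extension) bit of the page size


def page_size_mkey_mask(hw_supports_6bit_page_size):
    """Return the mask bits to set when changing the page size of an mkey."""
    mask = MKEY_MASK_PAGE_SIZE
    if hw_supports_6bit_page_size:
        # Mark the extension bit as modified too, so that the 6th bit of
        # the new page size is part of the update.
        mask |= MKEY_MASK_PAGE_SIZE_EXT
    return mask
```

The point of the fix is the conditional OR: when the hardware supports the
6th bit, leaving the extension mask out means that bit is not flagged as
being modified.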

Fixes: cef7dde8836a ("net/mlx5: Expand mkey page size to support 6 bits")
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://patch.msgid.link/9f43a9c73bf2db6085a99dc836f7137e76579f09.1751979184.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
3 weeks ago net/mlx5: Expose HCA capability bits for mkey max page size
Michael Guralnik [Wed, 9 Jul 2025 06:42:08 +0000 (09:42 +0300)] 
net/mlx5: Expose HCA capability bits for mkey max page size

Expose the HCA capability for maximal page size that can be configured
for an mkey. Used for enforcing capabilities when working with highly
contiguous memory and using large page sizes.

Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://patch.msgid.link/3e4d3fda37934430f65f72601519e22bf396fd05.1751979184.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
4 weeks ago Merge tag 'batadv-next-pullrequest-20250710' of git://git.open-mesh.org/linux-merge
Jakub Kicinski [Sat, 12 Jul 2025 00:50:26 +0000 (17:50 -0700)] 
Merge tag 'batadv-next-pullrequest-20250710' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
This cleanup patchset includes the following patches:

 - bump version strings, by Simon Wunderlich

 - batman-adv: store hard_iface as iflink private data,
   by Matthias Schiffer

* tag 'batadv-next-pullrequest-20250710' of git://git.open-mesh.org/linux-merge:
  batman-adv: store hard_iface as iflink private data
  batman-adv: Start new development cycle
====================

Link: https://patch.msgid.link/20250710164501.153872-1-sw@simonwunderlich.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks ago Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next...
Jakub Kicinski [Sat, 12 Jul 2025 00:33:06 +0000 (17:33 -0700)] 
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
ice: cleanups and preparation for live migration

Jake Keller says:

Various cleanups and preparation to the ice driver code for supporting
SR-IOV live migration.

The logic for unpacking Rx queue context data is added. This is the inverse
of the existing packing logic. Thanks to <linux/packing.h> this is trivial
to add.

Code to enable both reading and writing the Tx queue context for a queue
over a shared hardware register interface is added. Thanks to ice_adapter,
this is locked across all PFs that need to use it, preventing concurrency
issues with multiple PFs.

The RSS hash configuration requested by a VF is cached within the VF
structure. This will be used to track and restore the same configuration
during migration load.

ice_sriov_set_msix_vec_count() is updated to use pci_iov_vf_id() instead of
open-coding a worse equivalent, and checks to avoid rebuilding MSI-X if the
current request is for the existing amount of vectors.

A new ice_get_vf_by_dev() helper function is added to simplify accessing a
VF from its PCI device structure. This will be used more heavily within the
live migration code itself.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  ice: introduce ice_get_vf_by_dev() wrapper
  ice: avoid rebuilding if MSI-X vector count is unchanged
  ice: use pci_iov_vf_id() to get VF ID
  ice: expose VF functions used by live migration
  ice: move ice_vsi_update_l2tsel to ice_lib.c
  ice: save RSS hash configuration for migration
  ice: add functions to get and set Tx queue context
  ice: add support for reading and unpacking Rx queue context
====================

Link: https://patch.msgid.link/20250710214518.1824208-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks ago net: ll_temac: Fix incorrect PHY node reference in debug message
Alok Tiwari [Thu, 10 Jul 2025 18:37:34 +0000 (11:37 -0700)] 
net: ll_temac: Fix incorrect PHY node reference in debug message

In temac_probe(), the debug message intended to print the resolved
PHY node was mistakenly using the controller node temac_np
instead of the actual PHY node lp->phy_node. This patch corrects
the log to reference the correct device tree node.

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Link: https://patch.msgid.link/20250710183737.2385156-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks ago selftests/net: packetdrill: add --mss option to three tests
Eric Dumazet [Thu, 10 Jul 2025 15:56:41 +0000 (15:56 +0000)] 
selftests/net: packetdrill: add --mss option to three tests

Three tests are cooking GSO packets but do not provide
gso_size information to the kernel, triggering this message:

TCP: tun0: Driver has suspect GRO implementation, TCP performance may be compromised.

Add --mss option to avoid this warning.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250710155641.3028726-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks ago Merge branch 'netdevsim-support-setting-a-permanent-address'
Jakub Kicinski [Sat, 12 Jul 2025 00:00:19 +0000 (17:00 -0700)] 
Merge branch 'netdevsim-support-setting-a-permanent-address'

Toke Høiland-Jørgensen says:

====================
netdevsim: support setting a permanent address

Network management daemons that match on the device permanent address
currently have no virtual interface types to test against.
NetworkManager, in particular, has carried an out of tree patch to set
the permanent address on netdevsim devices to use in its CI for this
purpose.

This series adds support to netdevsim to set a permanent address on port
creation, and adds a test script to test setting and getting of the
different L2 address types.

v3: https://lore.kernel.org/20250706-netdevsim-perm_addr-v3-0-88123e2b2027@redhat.com
v2: https://lore.kernel.org/20250702-netdevsim-perm_addr-v2-0-66359a6288f0@redhat.com
v1: https://lore.kernel.org/20250203-netdevsim-perm_addr-v1-1-10084bc93044@redhat.com
====================

Link: https://patch.msgid.link/20250710-netdevsim-perm_addr-v4-0-c9db2fecf3bf@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks ago selftests: net: add netdev-l2addr.sh for testing L2 address functionality
Toke Høiland-Jørgensen [Thu, 10 Jul 2025 11:18:34 +0000 (13:18 +0200)] 
selftests: net: add netdev-l2addr.sh for testing L2 address functionality

Add a new test script to the network selftests which tests getting and
setting of layer 2 addresses through netlink, including the newly added
support for setting a permaddr on netdevsim devices.

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20250710-netdevsim-perm_addr-v4-2-c9db2fecf3bf@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks ago net: netdevsim: Support setting dev->perm_addr on port creation
Toke Høiland-Jørgensen [Thu, 10 Jul 2025 11:18:33 +0000 (13:18 +0200)] 
net: netdevsim: Support setting dev->perm_addr on port creation

Network management daemons that match on the device permanent address
currently have no virtual interface types to test against.
NetworkManager, in particular, has carried an out of tree patch to set
the permanent address on netdevsim devices to use in its CI for this
purpose.

To support this use case, support setting netdev->perm_addr when
creating a netdevsim port.

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20250710-netdevsim-perm_addr-v4-1-c9db2fecf3bf@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks ago selftests: flip local/remote endpoints in iou-zcrx.py
Vishwanath Seshagiri [Thu, 10 Jul 2025 16:53:37 +0000 (09:53 -0700)] 
selftests: flip local/remote endpoints in iou-zcrx.py

The iou-zcrx selftest currently runs the server on the remote host
and the client on the local host. This commit flips the endpoints
such that server runs on localhost and client on remote.
This change brings the iou-zcrx selftest in line with the convention
used by other selftests.

Drive-by fix for a missing import exception that happens when the
network interface has fewer than 2 combined channels.

Test plan: ran iou-zcrx.py selftest between 2 physical machines

Signed-off-by: Vishwanath Seshagiri <vishs@fb.com>
Link: https://patch.msgid.link/20250710165337.614159-1-vishs@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks ago sfc: falcon: refactor and document ef4_ethtool_get_rxfh_fields
Edward Cree [Thu, 10 Jul 2025 17:32:13 +0000 (18:32 +0100)] 
sfc: falcon: refactor and document ef4_ethtool_get_rxfh_fields

The code had some rather odd control flow inherited from when it was
 shared with siena and ef10 before this driver was split out.
Simplify that for easier reading.
Also add a comment explaining why we return the values we do, since
 some Falcon documents and datasheets confusingly mention the part
 supporting 4-tuple UDP hashing.
(I couldn't find any record of exactly what was "broken" about the
 original Falcon A hash, I'm just trusting that falcon_init_rx_cfg()
 had a good reason for not using it.)

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/20250710173213.1638397-1-edward.cree@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks ago Merge branch 'net_sched-act-extend-rcu-use-in-dump-methods'
Jakub Kicinski [Fri, 11 Jul 2025 23:01:23 +0000 (16:01 -0700)] 
Merge branch 'net_sched-act-extend-rcu-use-in-dump-methods'

Eric Dumazet says:

====================
net_sched: act: extend RCU use in dump() methods

We are trying to get away from central RTNL in favor of fine-grained
mutexes. While looking at net/sched, I found that act already uses
RCU in the fast path in most cases, and that RCU could also be used
in dump() methods.

This series is not complete and will be followed by a second one.

v1: https://lore.kernel.org/20250707130110.619822-1-edumazet@google.com
====================

Link: https://patch.msgid.link/20250709090204.797558-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks ago net_sched: act_skbedit: use RCU in tcf_skbedit_dump()
Eric Dumazet [Wed, 9 Jul 2025 09:02:03 +0000 (09:02 +0000)] 
net_sched: act_skbedit: use RCU in tcf_skbedit_dump()

Also storing tcf_action into struct tcf_skbedit_params
makes sure there is no discrepancy in tcf_skbedit_act().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250709090204.797558-12-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks ago net_sched: act_police: use RCU in tcf_police_dump()
Eric Dumazet [Wed, 9 Jul 2025 09:02:02 +0000 (09:02 +0000)] 
net_sched: act_police: use RCU in tcf_police_dump()

Also storing tcf_action into struct tcf_police_params
makes sure there is no discrepancy in tcf_police_act().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250709090204.797558-11-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks ago net_sched: act_pedit: use RCU in tcf_pedit_dump()
Eric Dumazet [Wed, 9 Jul 2025 09:02:01 +0000 (09:02 +0000)] 
net_sched: act_pedit: use RCU in tcf_pedit_dump()

Also storing tcf_action into struct tcf_pedit_params
makes sure there is no discrepancy in tcf_pedit_act().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250709090204.797558-10-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks ago net_sched: act_nat: use RCU in tcf_nat_dump()
Eric Dumazet [Wed, 9 Jul 2025 09:02:00 +0000 (09:02 +0000)] 
net_sched: act_nat: use RCU in tcf_nat_dump()

Also storing tcf_action into struct tcf_nat_params
makes sure there is no discrepancy in tcf_nat_act().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250709090204.797558-9-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks ago net_sched: act_mpls: use RCU in tcf_mpls_dump()
Eric Dumazet [Wed, 9 Jul 2025 09:01:59 +0000 (09:01 +0000)] 
net_sched: act_mpls: use RCU in tcf_mpls_dump()

Also storing tcf_action into struct tcf_mpls_params
makes sure there is no discrepancy in tcf_mpls_act().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250709090204.797558-8-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks ago net_sched: act_ctinfo: use RCU in tcf_ctinfo_dump()
Eric Dumazet [Wed, 9 Jul 2025 09:01:58 +0000 (09:01 +0000)] 
net_sched: act_ctinfo: use RCU in tcf_ctinfo_dump()

Also storing tcf_action into struct tcf_ctinfo_params
makes sure there is no discrepancy in tcf_ctinfo_act().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250709090204.797558-7-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks ago net_sched: act_ctinfo: use atomic64_t for three counters
Eric Dumazet [Wed, 9 Jul 2025 09:01:57 +0000 (09:01 +0000)] 
net_sched: act_ctinfo: use atomic64_t for three counters

Commit 21c167aa0ba9 ("net/sched: act_ctinfo: use percpu stats")
missed that stats_dscp_set, stats_dscp_error and stats_cpmark_set
might be written (and read) locklessly.

Use atomic64_t for these three fields; I doubt act_ctinfo is used
heavily on big SMP hosts anyway.

Fixes: 24ec483cec98 ("net: sched: Introduce act_ctinfo action")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Pedro Tammela <pctammela@mojatatu.com>
Link: https://patch.msgid.link/20250709090204.797558-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks ago net_sched: act_ct: use RCU in tcf_ct_dump()
Eric Dumazet [Wed, 9 Jul 2025 09:01:56 +0000 (09:01 +0000)] 
net_sched: act_ct: use RCU in tcf_ct_dump()

Also storing tcf_action into struct tcf_ct_params
makes sure there is no discrepancy in tcf_ct_act().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250709090204.797558-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks ago net_sched: act_csum: use RCU in tcf_csum_dump()
Eric Dumazet [Wed, 9 Jul 2025 09:01:55 +0000 (09:01 +0000)] 
net_sched: act_csum: use RCU in tcf_csum_dump()

Also storing tcf_action into struct tcf_csum_params
makes sure there is no discrepancy in tcf_csum_act().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250709090204.797558-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks ago net_sched: act_connmark: use RCU in tcf_connmark_dump()
Eric Dumazet [Wed, 9 Jul 2025 09:01:54 +0000 (09:01 +0000)] 
net_sched: act_connmark: use RCU in tcf_connmark_dump()

Also storing tcf_action into struct tcf_connmark_parms
makes sure there is no discrepancy in tcf_connmark_act().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250709090204.797558-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks ago net_sched: act: annotate data-races in tcf_lastuse_update() and tcf_tm_dump()
Eric Dumazet [Wed, 9 Jul 2025 09:01:53 +0000 (09:01 +0000)] 
net_sched: act: annotate data-races in tcf_lastuse_update() and tcf_tm_dump()

tcf_tm_dump() reads fields that can be changed concurrently,
and tcf_lastuse_update() might race against itself.

Add READ_ONCE() and WRITE_ONCE() annotations.

Fetch jiffies once in tcf_tm_dump().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250709090204.797558-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks ago eth: fbnic: fix ubsan complaints about OOB accesses
Jakub Kicinski [Wed, 9 Jul 2025 20:59:10 +0000 (13:59 -0700)] 
eth: fbnic: fix ubsan complaints about OOB accesses

UBSAN complains that we reach beyond the end of the log entry:

   UBSAN: array-index-out-of-bounds in drivers/net/ethernet/meta/fbnic/fbnic_fw_log.c:94:50
   index 71 is out of range for type 'char [*]'
   Call Trace:
    <TASK>
    ubsan_epilogue+0x5/0x2b
    fbnic_fw_log_write+0x120/0x960
    fbnic_fw_parse_logs+0x161/0x210

We're just taking the address of the character after the array,
so this really seems like something that should be legal.
But whatever, easy enough to silence by doing direct pointer math.

Fixes: c2b93d6beca8 ("eth: fbnic: Create ring buffer for firmware logs")
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250709205910.3107691-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks ago virtio_net: simplify tx queue wake condition check
Liming Wu [Thu, 10 Jul 2025 02:32:08 +0000 (10:32 +0800)] 
virtio_net: simplify tx queue wake condition check

Consolidate the two nested if conditions that check whether to wake the
tx queue into a single combined condition. This improves code
readability without changing functionality. Also move
netif_tx_wake_queue() into the if block to avoid unnecessary checks for
stopped queues.

Signed-off-by: Liming Wu <liming.wu@jaguarmicro.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20250710023208.846-1-liming.wu@jaguarmicro.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks ago selftests/tc-testing: Add tests for restrictions on netem duplication
William Liu [Tue, 8 Jul 2025 16:44:05 +0000 (16:44 +0000)] 
selftests/tc-testing: Add tests for restrictions on netem duplication

Ensure that a duplicating netem cannot exist in a tree with other netems
in both qdisc addition and change. This is meant to prevent the soft
lockup and OOM loop scenario discussed in [1]. Also adjust an HFSC
re-entrancy test case with netem for this new restriction - KASAN
still triggers upon its failure.

[1] https://lore.kernel.org/netdev/8DuRWwfqjoRDLDmBMlIfbrsZg9Gx50DHJc1ilxsEBNe2D6NMoigR_eIRIG0LOjMc3r10nUUZtArXx4oZBIdUfZQrwjcQhdinnMis_0G7VEk=@willsroot.io/

Signed-off-by: William Liu <will@willsroot.io>
Reviewed-by: Savino Dicanosa <savy@syst3mfailure.io>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20250708164219.875521-1-will@willsroot.io
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks ago net/sched: Restrict conditions for adding duplicating netems to qdisc tree
William Liu [Tue, 8 Jul 2025 16:43:26 +0000 (16:43 +0000)] 
net/sched: Restrict conditions for adding duplicating netems to qdisc tree

netem_enqueue's duplication prevention logic breaks when a netem
resides in a qdisc tree with other netems - this can lead to a
soft lockup and OOM loop in netem_dequeue, as seen in [1].
Ensure that a duplicating netem cannot exist in a tree with other
netems.

Previous approaches suggested in discussions in chronological order:

1) Track duplication status or ttl in the sk_buff struct. Considered
too specific a use case to extend such a struct, though this would
be a resilient fix and address other previous and potential future
DoS bugs like the one described in loopy fun [2].

2) Restrict netem_enqueue recursion depth like in act_mirred with a
per cpu variable. However, netem_dequeue can call enqueue on its
child, and the depth restriction could be bypassed if the child is a
netem.

3) Use the same approach as in 2, but add metadata in netem_skb_cb
to handle the netem_dequeue case and track a packet's involvement
in duplication. This is an overly complex approach, and Jamal
notes that the skb cb can be overwritten to circumvent this
safeguard.

4) Prevent the addition of a netem to a qdisc tree if its ancestral
path contains a netem. However, filters and actions can cause a
packet to change paths when re-enqueued to the root from netem
duplication, leading us to the current solution: prevent a
duplicating netem from inhabiting the same tree as other netems.

[1] https://lore.kernel.org/netdev/8DuRWwfqjoRDLDmBMlIfbrsZg9Gx50DHJc1ilxsEBNe2D6NMoigR_eIRIG0LOjMc3r10nUUZtArXx4oZBIdUfZQrwjcQhdinnMis_0G7VEk=@willsroot.io/
[2] https://lwn.net/Articles/719297/
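
The resulting rule can be modelled in a few lines. This is a simplified
sketch, not the kernel check: the Qdisc class, walk() and tree_is_allowed()
are hypothetical, and a tree is represented as plain nested objects.

```python
from dataclasses import dataclass, field


@dataclass
class Qdisc:
    """Hypothetical stand-in for one node of a qdisc tree."""
    kind: str
    duplicate: int = 0  # netem duplication setting; 0 means no duplication
    children: list = field(default_factory=list)


def walk(qdisc):
    """Yield the qdisc and all of its descendants."""
    yield qdisc
    for child in qdisc.children:
        yield from walk(child)


def tree_is_allowed(root):
    """A duplicating netem may not share a tree with any other netem."""
    netems = [q for q in walk(root) if q.kind == "netem"]
    duplicating = any(q.duplicate for q in netems)
    return not (duplicating and len(netems) > 1)
```

Under this model a lone duplicating netem is fine, several non-duplicating
netems are fine, and any tree mixing a duplicating netem with another netem
is rejected, wherever the two sit relative to each other.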

Fixes: 0afb51e72855 ("[PKT_SCHED]: netem: reinsert for duplication")
Reported-by: William Liu <will@willsroot.io>
Reported-by: Savino Dicanosa <savy@syst3mfailure.io>
Signed-off-by: William Liu <will@willsroot.io>
Signed-off-by: Savino Dicanosa <savy@syst3mfailure.io>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20250708164141.875402-1-will@willsroot.io
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks ago Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Thu, 10 Jul 2025 17:08:47 +0000 (10:08 -0700)] 
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR (net-6.16-rc6-2).

No conflicts.

Adjacent changes:

drivers/net/wireless/mediatek/mt76/mt7925/mcu.c
  c701574c5412 ("wifi: mt76: mt7925: fix invalid array index in ssid assignment during hw scan")
  b3a431fe2e39 ("wifi: mt76: mt7925: fix off by one in mt7925_mcu_hw_scan()")

drivers/net/wireless/mediatek/mt76/mt7996/mac.c
  62da647a2b20 ("wifi: mt76: mt7996: Add MLO support to mt7996_tx_check_aggr()")
  dc66a129adf1 ("wifi: mt76: add a wrapper for wcid access with validation")

drivers/net/wireless/mediatek/mt76/mt7996/main.c
  3dd6f67c669c ("wifi: mt76: Move RCU section in mt7996_mcu_add_rate_ctrl()")
  8989d8e90f5f ("wifi: mt76: mt7996: Do not set wcid.sta to 1 in mt7996_mac_sta_event()")

net/mac80211/cfg.c
  58fcb1b4287c ("wifi: mac80211: reject VHT opmode for unsupported channel widths")
  037dc18ac3fb ("wifi: mac80211: add support for storing station S1G capabilities")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks ago Merge tag 'net-6.16-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Fri, 11 Jul 2025 17:18:51 +0000 (10:18 -0700)] 
Merge tag 'net-6.16-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull more networking fixes from Jakub Kicinski
 "Big chunk of fixes for WiFi, Johannes says probably the last for the
  release.

  The Netlink fixes (on top of the tree) restore operation of iw (WiFi
  CLI) which uses sillily small recv buffer, and is the reason for this
  'emergency PR'.

  The GRE multicast fix also stands out among the user-visible
  regressions.

  Current release - fix to a fix:

   - netlink: make sure we always allow at least one skb to be queued,
     even if the recvbuf is (mis)configured to be tiny

  Previous releases - regressions:

   - gre: fix IPv6 multicast route creation

  Previous releases - always broken:

   - wifi: prevent A-MSDU attacks in mesh networks

   - wifi: cfg80211: fix S1G beacon head validation and detection

   - wifi: mac80211:
       - always clear frame buffer to prevent stack leak in cases which
         hit a WARN()
       - fix monitor interface in device restart

   - wifi: mwifiex: discard erroneous disassoc frames on STA interface

   - wifi: mt76:
       - prevent null-deref in mt7925_sta_set_decap_offload()
       - add missing RCU annotations, and fix sleep in atomic
       - fix decapsulation offload
       - fixes for scanning

   - phy: microchip: improve link establishment and reset handling

   - eth: mlx5e: fix race between DIM disable and net_dim()

   - bnxt_en: correct DMA unmap len for XDP_REDIRECT"

* tag 'net-6.16-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (44 commits)
  netlink: make sure we allow at least one dump skb
  netlink: Fix rmem check in netlink_broadcast_deliver().
  bnxt_en: Set DMA unmap len correctly for XDP_REDIRECT
  bnxt_en: Flush FW trace before copying to the coredump
  bnxt_en: Fix DCB ETS validation
  net: ll_temac: Fix missing tx_pending check in ethtools_set_ringparam()
  net/mlx5e: Add new prio for promiscuous mode
  net/mlx5e: Fix race between DIM disable and net_dim()
  net/mlx5: Reset bw_share field when changing a node's parent
  can: m_can: m_can_handle_lost_msg(): downgrade msg lost in rx message to debug level
  selftests: net: lib: fix shift count out of range
  selftests: Add IPv6 multicast route generation tests for GRE devices.
  gre: Fix IPv6 multicast route creation.
  net: phy: microchip: limit 100M workaround to link-down events on LAN88xx
  net: phy: microchip: Use genphy_soft_reset() to purge stale LPA bits
  ibmvnic: Fix hardcoded NUM_RX_STATS/NUM_TX_STATS with dynamic sizeof
  net: appletalk: Fix device refcount leak in atrtr_create()
  netfilter: flowtable: account for Ethernet header in nf_flow_pppoe_proto()
  wifi: mac80211: add the virtual monitor after reconfig complete
  wifi: mac80211: always initialize sdata::key_list
  ...

4 weeks ago Merge tag 'gpio-fixes-for-v6.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 11 Jul 2025 17:15:50 +0000 (10:15 -0700)] 
Merge tag 'gpio-fixes-for-v6.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

Pull gpio fixes from Bartosz Golaszewski:

 - fix performance regression when setting values of multiple GPIO lines
   at once

 - make sure the GPIO OF xlate code doesn't end up passing an
   uninitialized local variable to GPIO core

 - update MAINTAINERS

* tag 'gpio-fixes-for-v6.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  MAINTAINERS: remove bouncing address for Nandor Han
  gpio: of: initialize local variable passed to the .of_xlate() callback
  gpiolib: fix performance regression when using gpio_chip_get_multiple()

4 weeks ago selftests: drv-net: Add bpftool util
Mohsin Bashir [Thu, 10 Jul 2025 18:43:47 +0000 (11:43 -0700)] 
selftests: drv-net: Add bpftool util

Add bpf utility to simplify the use of bpftool for XDP tests included in
this series.

Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Link: https://patch.msgid.link/20250710184351.63797-2-mohsin.bashr@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks ago Merge tag 'pm-6.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Linus Torvalds [Fri, 11 Jul 2025 16:19:33 +0000 (09:19 -0700)] 
Merge tag 'pm-6.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fix from Rafael Wysocki:
 "Fix a coding mistake in a previous fix related to system suspend and
  hibernation merged recently"

* tag 'pm-6.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM: sleep: Call pm_restore_gfp_mask() after dpm_resume()

4 weeks ago Merge tag 'dma-mapping-6.16-2025-07-11' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 11 Jul 2025 15:49:25 +0000 (08:49 -0700)] 
Merge tag 'dma-mapping-6.16-2025-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux

Pull dma-mapping fix from Marek Szyprowski:

 - small fix relevant to arm64 server and custom CMA configuration (Feng
   Tang)

* tag 'dma-mapping-6.16-2025-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux:
  dma-contiguous: hornor the cma address limit setup by user

4 weeks ago netlink: make sure we allow at least one dump skb
Jakub Kicinski [Fri, 11 Jul 2025 00:11:21 +0000 (17:11 -0700)] 
netlink: make sure we allow at least one dump skb

Commit under Fixes tightened up the memory accounting for Netlink
sockets. Looks like the accounting is too strict for some existing
use cases; Marek reported issues with nl80211 / WiFi iw CLI.

To reduce the number of iterations, Netlink dumps try to allocate
messages based on the size of the buffer passed to previous
recvmsg() calls. If user space uses a larger buffer in recvmsg()
than sk_rcvbuf, we will allocate an skb we won't be able to queue.

Make sure we always allow at least one skb to be queued.
Same workaround is already present in netlink_attachskb().
Alternative would be to cap the allocation size to
  rcvbuf - rmem_alloc
but as I said, the workaround is already present in other places.
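
The allow-one-skb rule can be modelled as below. This is a simplified sketch
and not the kernel code: may_queue() and its parameters are hypothetical, all
sizes are plain byte counts, and the exact comparison used in the kernel may
differ.

```python
def may_queue(rmem_alloc, truesize, rcvbuf):
    """Model of the allow-one-skb rule (all arguments in bytes).

    rmem_alloc: memory already charged to the socket's receive queue
    truesize:   memory the new skb would add
    rcvbuf:     the socket's receive buffer limit
    """
    rmem = rmem_alloc + truesize
    # rmem == truesize means nothing else is charged to the socket, so the
    # skb is accepted even if it is larger than the receive buffer.
    return rmem == truesize or rmem <= rcvbuf
```

For instance, a 32768-byte dump skb on an idle socket with a 4096-byte
receive buffer is accepted, while the same skb is refused once anything is
already queued, since the total then exceeds the limit.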

Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/9794af18-4905-46c6-b12c-365ea2f05858@samsung.com
Fixes: ae8f160e7eb2 ("netlink: Fix wraparounds of sk->sk_rmem_alloc.")
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250711001121.3649033-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks ago netlink: Fix rmem check in netlink_broadcast_deliver().
Kuniyuki Iwashima [Fri, 11 Jul 2025 05:32:07 +0000 (05:32 +0000)] 
netlink: Fix rmem check in netlink_broadcast_deliver().

We need to allow queuing at least one skb even when the skb is
larger than sk->sk_rcvbuf.

The cited commit made a mistake while converting a condition
in netlink_broadcast_deliver().

Let's correct the rmem check for the allow-one-skb rule.

Fixes: ae8f160e7eb24 ("netlink: Fix wraparounds of sk->sk_rmem_alloc.")
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250711053208.2965945-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks ago Merge branch 'bnxt_en-3-bug-fixes'
Jakub Kicinski [Fri, 11 Jul 2025 14:28:36 +0000 (07:28 -0700)] 
Merge branch 'bnxt_en-3-bug-fixes'

Michael Chan says:

====================
bnxt_en: 3 bug fixes

The first one fixes a possible failure when setting DCB ETS.
The second one fixes the ethtool coredump (-W 2) not containing
all the FW traces.  The third one fixes the DMA unmap length when
transmitting XDP_REDIRECT packets.
====================

Link: https://patch.msgid.link/20250710213938.1959625-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks ago bnxt_en: Set DMA unmap len correctly for XDP_REDIRECT
Somnath Kotur [Thu, 10 Jul 2025 21:39:38 +0000 (14:39 -0700)] 
bnxt_en: Set DMA unmap len correctly for XDP_REDIRECT

When transmitting an XDP_REDIRECT packet, call dma_unmap_len_set()
with the proper length instead of 0.  This bug triggers this warning
on a system with IOMMU enabled:

WARNING: CPU: 36 PID: 0 at drivers/iommu/dma-iommu.c:842 __iommu_dma_unmap+0x159/0x170
RIP: 0010:__iommu_dma_unmap+0x159/0x170
Code: a8 00 00 00 00 48 c7 45 b0 00 00 00 00 48 c7 45 c8 00 00 00 00 48 c7 45 a0 ff ff ff ff 4c 89 45 b8 4c 89 45 c0 e9 77 ff ff ff <0f> 0b e9 60 ff ff ff e8 8b bf 6a 00 66 66 2e 0f 1f 84 00 00 00 00
RSP: 0018:ff22d31181150c88 EFLAGS: 00010206
RAX: 0000000000002000 RBX: 00000000e13a0000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ff22d31181150cf0 R08: ff22d31181150ca8 R09: 0000000000000000
R10: 0000000000000000 R11: ff22d311d36c9d80 R12: 0000000000001000
R13: ff13544d10645010 R14: ff22d31181150c90 R15: ff13544d0b2bac00
FS: 0000000000000000(0000) GS:ff13550908a00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005be909dacff8 CR3: 0008000173408003 CR4: 0000000000f71ef0
PKRU: 55555554
Call Trace:
<IRQ>
? show_regs+0x6d/0x80
? __warn+0x89/0x160
? __iommu_dma_unmap+0x159/0x170
? report_bug+0x17e/0x1b0
? handle_bug+0x46/0x90
? exc_invalid_op+0x18/0x80
? asm_exc_invalid_op+0x1b/0x20
? __iommu_dma_unmap+0x159/0x170
? __iommu_dma_unmap+0xb3/0x170
iommu_dma_unmap_page+0x4f/0x100
dma_unmap_page_attrs+0x52/0x220
? srso_alias_return_thunk+0x5/0xfbef5
? xdp_return_frame+0x2e/0xd0
bnxt_tx_int_xdp+0xdf/0x440 [bnxt_en]
__bnxt_poll_work_done+0x81/0x1e0 [bnxt_en]
bnxt_poll+0xd3/0x1e0 [bnxt_en]

Fixes: f18c2b77b2e4 ("bnxt_en: optimized XDP_REDIRECT support")
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250710213938.1959625-4-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks ago bnxt_en: Flush FW trace before copying to the coredump
Shruti Parab [Thu, 10 Jul 2025 21:39:37 +0000 (14:39 -0700)] 
bnxt_en: Flush FW trace before copying to the coredump

bnxt_fill_drv_seg_record() calls bnxt_dbg_hwrm_log_buffer_flush()
to flush the FW trace buffer.  This needs to be done before we
call bnxt_copy_ctx_mem() to copy the trace data.

Without this fix, the coredump may not contain all the FW
traces.

Fixes: 3c2179e66355 ("bnxt_en: Add FW trace coredump segments to the coredump")
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250710213938.1959625-3-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks ago bnxt_en: Fix DCB ETS validation
Shravya KN [Thu, 10 Jul 2025 21:39:36 +0000 (14:39 -0700)] 
bnxt_en: Fix DCB ETS validation

In bnxt_ets_validate(), the code incorrectly loops over all possible
traffic classes to check and add the ETS settings.  Fix it to loop
over the configured traffic classes only.

The unconfigured traffic classes will default to TSA_ETS with 0
bandwidth.  Looping over these unconfigured traffic classes may
cause the validation to fail and trigger this error message:

"rejecting ETS config starving a TC\n"

The .ieee_setets() will then fail.
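
The effect of the loop bound can be illustrated with a simplified
userspace model. It is not the driver's bnxt_ets_validate(): the
model_* names, the 8-class limit and the exact rejection rule (an ETS
class with 0 bandwidth is treated as starved) are assumptions made for
this sketch.

```c
#include <stdint.h>

#define MODEL_MAX_TCS 8	/* hypothetical upper bound on traffic classes */

/* ETS is 0 so that zero-initialised entries model unconfigured TCs. */
enum model_tsa { MODEL_TSA_ETS = 0, MODEL_TSA_STRICT = 1 };

struct model_ets {
	enum model_tsa tc_tsa[MODEL_MAX_TCS];
	uint8_t tc_tx_bw[MODEL_MAX_TCS];	/* bandwidth share in percent */
};

/*
 * Validate only the first `num_tc` traffic classes.
 * Returns 0 if the configuration is accepted, -1 otherwise.
 */
static int model_ets_validate(const struct model_ets *ets, int num_tc)
{
	unsigned int total_bw = 0;
	int i;

	if (num_tc < 0 || num_tc > MODEL_MAX_TCS)
		return -1;

	for (i = 0; i < num_tc; i++) {
		if (ets->tc_tsa[i] != MODEL_TSA_ETS)
			continue;
		if (ets->tc_tx_bw[i] == 0)
			return -1;	/* this ETS class would be starved */
		total_bw += ets->tc_tx_bw[i];
	}

	return total_bw > 100 ? -1 : 0;
}
```

With two configured classes at 60% and 40%, validating 2 classes
succeeds in this model, while validating all MODEL_MAX_TCS classes
fails on the first unconfigured entry.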

Fixes: 7df4ae9fe855 ("bnxt_en: Implement DCBNL to support host-based DCBX.")
Reviewed-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Shravya KN <shravya.k-n@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250710213938.1959625-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks ago net: ll_temac: Fix missing tx_pending check in ethtools_set_ringparam()
Alok Tiwari [Thu, 10 Jul 2025 18:06:17 +0000 (11:06 -0700)] 
net: ll_temac: Fix missing tx_pending check in ethtools_set_ringparam()

The function ll_temac_ethtools_set_ringparam() incorrectly checked
rx_pending twice, once correctly for RX and once mistakenly in place
of tx_pending. This caused tx_pending to be left unchecked against
TX_BD_NUM_MAX.
As a result, invalid TX ring sizes may have been accepted or valid
ones wrongly rejected based on the RX limit, leading to potential
misconfiguration or unexpected results.

This patch corrects the condition to properly validate tx_pending.
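
A minimal sketch of the corrected validation, written as a standalone
userspace model rather than the driver code. The struct, the function
name and the two limit values are hypothetical; only the idea that each
ring size is checked against its own maximum comes from the
description above.

```c
#include <errno.h>

/* Hypothetical limits; the driver's RX_BD_NUM_MAX/TX_BD_NUM_MAX may differ. */
#define MODEL_RX_BD_NUM_MAX 4096
#define MODEL_TX_BD_NUM_MAX 2048

struct model_ringparam {
	unsigned int rx_pending;
	unsigned int tx_pending;
};

/* Check each ring size against its own limit: RX vs RX max, TX vs TX max. */
static int model_check_ringparam(const struct model_ringparam *ring)
{
	if (ring->rx_pending > MODEL_RX_BD_NUM_MAX)
		return -EINVAL;
	if (ring->tx_pending > MODEL_TX_BD_NUM_MAX)
		return -EINVAL;
	return 0;
}
```

Checking rx_pending twice, as in the buggy version, would let a
tx_pending of 4096 through in this model whenever rx_pending is valid.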

Fixes: f7b261bfc35e ("net: ll_temac: Make RX/TX ring sizes configurable")
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Link: https://patch.msgid.link/20250710180621.2383000-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks ago Merge branch 'mlx5-misc-fixes-2025-07-10'
Jakub Kicinski [Fri, 11 Jul 2025 14:26:49 +0000 (07:26 -0700)] 
Merge branch 'mlx5-misc-fixes-2025-07-10'

Tariq Toukan says:

====================
mlx5 misc fixes 2025-07-10

This small patchset provides misc bug fixes from the team to the mlx5
core and EN drivers.
====================

Link: https://patch.msgid.link/1752155624-24095-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks ago net/mlx5e: Add new prio for promiscuous mode
Jianbo Liu [Thu, 10 Jul 2025 13:53:44 +0000 (16:53 +0300)] 
net/mlx5e: Add new prio for promiscuous mode

An optimization for promiscuous mode adds a high-priority steering
table with a single catch-all rule to steer all traffic directly to
the TTC table.

However, a gap exists between the creation of this table and the
insertion of the catch-all rule. Packets arriving in this brief window
would miss, as no rule had been inserted yet, and would be dropped,
unnecessarily incrementing the 'rx_steer_missed_packets' counter.

This patch resolves the issue by introducing a new prio for this
table, placing it between MLX5E_TC_PRIO and MLX5E_NIC_PRIO. By doing
so, packets arriving during the window now fall through to the next
prio (at MLX5E_NIC_PRIO) instead of being dropped.

Fixes: 1c46d7409f30 ("net/mlx5e: Optimize promiscuous mode")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/1752155624-24095-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks ago net/mlx5e: Fix race between DIM disable and net_dim()
Carolina Jubran [Thu, 10 Jul 2025 13:53:43 +0000 (16:53 +0300)] 
net/mlx5e: Fix race between DIM disable and net_dim()

There's a race between disabling DIM and NAPI callbacks using the dim
pointer on the RQ or SQ.

If NAPI checks the DIM state bit and sees it still set, it assumes
`rq->dim` or `sq->dim` is valid. But if DIM gets disabled right after
that check, the pointer might already be set to NULL, leading to a NULL
pointer dereference in net_dim().

Fix this by calling `synchronize_net()` before freeing the DIM context.
This ensures all in-progress NAPI callbacks are finished before the
pointer is cleared.

Kernel log:

BUG: kernel NULL pointer dereference, address: 0000000000000000
...
RIP: 0010:net_dim+0x23/0x190
...
Call Trace:
 <TASK>
 ? __die+0x20/0x60
 ? page_fault_oops+0x150/0x3e0
 ? common_interrupt+0xf/0xa0
 ? sysvec_call_function_single+0xb/0x90
 ? exc_page_fault+0x74/0x130
 ? asm_exc_page_fault+0x22/0x30
 ? net_dim+0x23/0x190
 ? mlx5e_poll_ico_cq+0x41/0x6f0 [mlx5_core]
 ? sysvec_apic_timer_interrupt+0xb/0x90
 mlx5e_handle_rx_dim+0x92/0xd0 [mlx5_core]
 mlx5e_napi_poll+0x2cd/0xac0 [mlx5_core]
 ? mlx5e_poll_ico_cq+0xe5/0x6f0 [mlx5_core]
 busy_poll_stop+0xa2/0x200
 ? mlx5e_napi_poll+0x1d9/0xac0 [mlx5_core]
 ? mlx5e_trigger_irq+0x130/0x130 [mlx5_core]
 __napi_busy_loop+0x345/0x3b0
 ? sysvec_call_function_single+0xb/0x90
 ? asm_sysvec_call_function_single+0x16/0x20
 ? sysvec_apic_timer_interrupt+0xb/0x90
 ? pcpu_free_area+0x1e4/0x2e0
 napi_busy_loop+0x11/0x20
 xsk_recvmsg+0x10c/0x130
 sock_recvmsg+0x44/0x70
 __sys_recvfrom+0xbc/0x130
 ? __schedule+0x398/0x890
 __x64_sys_recvfrom+0x20/0x30
 do_syscall_64+0x4c/0x100
 entry_SYSCALL_64_after_hwframe+0x4b/0x53
...
---[ end trace 0000000000000000 ]---
...
---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

Fixes: 445a25f6e1a2 ("net/mlx5e: Support updating coalescing configuration without resetting channels")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/1752155624-24095-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks ago net/mlx5: Reset bw_share field when changing a node's parent
Carolina Jubran [Thu, 10 Jul 2025 13:53:42 +0000 (16:53 +0300)] 
net/mlx5: Reset bw_share field when changing a node's parent

When changing a node's parent, its scheduling element is destroyed and
re-created with bw_share 0. However, the node's bw_share field was not
updated accordingly.

Set the node's bw_share to 0 after re-creation to keep the software
state in sync with the firmware configuration.

Fixes: 9c7bbf4c3304 ("net/mlx5: Add support for setting parent of nodes")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/1752155624-24095-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks ago Merge tag 'linux-can-fixes-for-6.16-20250711' of git://git.kernel.org/pub/scm/linux...
Jakub Kicinski [Fri, 11 Jul 2025 14:07:56 +0000 (07:07 -0700)] 
Merge tag 'linux-can-fixes-for-6.16-20250711' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2025-07-11

Sean Nyekjaer's patch targets the m_can driver and demotes the "msg
lost in rx" message to debug level to prevent flooding the kernel log
with error messages.

* tag 'linux-can-fixes-for-6.16-20250711' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
  can: m_can: m_can_handle_lost_msg(): downgrade msg lost in rx message to debug level
====================

Link: https://patch.msgid.link/20250711102451.2828802-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks ago Merge branch 'hv-msi-parent-domain' into main
David S. Miller [Fri, 11 Jul 2025 11:57:02 +0000 (12:57 +0100)] 
Merge branch 'hv-msi-parent-domain' into main

Nam Cao says:

====================
Subject: [PATCH for-netdev v2 0/2] PCI: hv: MSI parent domain conversion

This series originally belongs to a bigger series sent to PCI tree:
https://lore.kernel.org/linux-pci/024f0122314198fe0a42fef01af53e8953a687ec.1750858083.git.namcao@linutronix.de/

However, during review, we noticed that the patch conflicts with another
patch in netdev tree:
https://lore.kernel.org/netdev/1749651015-9668-1-git-send-email-shradhagupta@linux.microsoft.com/

As this series has no dependency on the rest of the series, we think it
is best to split out this one and send it to netdev, to avoid conflict
resolution headaches later on.

Can netdev maintainers please pick it up?
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 weeks ago PCI: hv: Switch to msi_create_parent_irq_domain()
Nam Cao [Mon, 7 Jul 2025 08:20:16 +0000 (10:20 +0200)] 
PCI: hv: Switch to msi_create_parent_irq_domain()

Move away from the legacy MSI domain setup and switch to
msi_create_parent_irq_domain().

While doing the conversion, I noticed that hv_compose_msi_msg() is doing
more than it is supposed to (composing the message). This function also
allocates and populates struct tran_int_desc, which should be done in
hv_pcie_domain_alloc() instead. It works, but it is not the correct design.
However, I have no hardware to test such a change, so I have left a TODO
note.

Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Nam Cao <namcao@linutronix.de>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 weeks ago irqdomain: Export irq_domain_free_irqs_top()
Nam Cao [Mon, 7 Jul 2025 08:20:15 +0000 (10:20 +0200)] 
irqdomain: Export irq_domain_free_irqs_top()

Export irq_domain_free_irqs_top(), making it usable for drivers compiled as
modules.

Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 weeks ago can: m_can: m_can_handle_lost_msg(): downgrade msg lost in rx message to debug level
Sean Nyekjaer [Fri, 11 Jul 2025 10:12:02 +0000 (12:12 +0200)] 
can: m_can: m_can_handle_lost_msg(): downgrade msg lost in rx message to debug level

Downgrade the "msg lost in rx" message to debug level, to prevent
flooding the kernel log with error messages.

Fixes: e0d1f4816f2a ("can: m_can: add Bosch M_CAN controller support")
Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Sean Nyekjaer <sean@geanix.com>
Link: https://patch.msgid.link/20250711-mcan_ratelimit-v3-1-7413e8e21b84@geanix.com
[mkl: enhance commit message]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
4 weeks ago can: rcar_canfd: Drop unused macros
Biju Das [Wed, 2 Jul 2025 12:05:29 +0000 (13:05 +0100)] 
can: rcar_canfd: Drop unused macros

Drop unused macros from rcar_canfd.c.

Reported-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Closes: https://lore.kernel.org/all/7ff93ff9-f578-4be2-bdc6-5b09eab64fe6@wanadoo.fr/
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://patch.msgid.link/20250702120539.98490-1-biju.das.jz@bp.renesas.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
4 weeks ago can: rcar_can: Convert to DEFINE_SIMPLE_DEV_PM_OPS()
Geert Uytterhoeven [Wed, 9 Jul 2025 18:36:21 +0000 (20:36 +0200)] 
can: rcar_can: Convert to DEFINE_SIMPLE_DEV_PM_OPS()

Convert the Renesas R-Car CAN driver from SIMPLE_DEV_PM_OPS() to
DEFINE_SIMPLE_DEV_PM_OPS() and pm_sleep_ptr().  This lets us drop the
__maybe_unused annotations from its suspend and resume callbacks, and
reduces kernel size in case CONFIG_PM or CONFIG_PM_SLEEP is disabled.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/6ffe085f6e2548f53674dd11704b388cf4b303e9.1752086078.git.geert+renesas@glider.be
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
4 weeks ago MAINTAINERS: remove bouncing address for Nandor Han
Bartosz Golaszewski [Wed, 9 Jul 2025 07:18:24 +0000 (09:18 +0200)] 
MAINTAINERS: remove bouncing address for Nandor Han

Nandor's address has been bouncing for some time now. Remove it from
MAINTAINERS. The affected driver falls under the wider umbrella of GPIO
modules.

Link: https://lore.kernel.org/r/20250709071825.16212-1-brgl@bgdev.pl
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
4 weeks ago Merge tag 'nf-next-25-07-10' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilt...
Jakub Kicinski [Fri, 11 Jul 2025 01:18:40 +0000 (18:18 -0700)] 
Merge tag 'nf-next-25-07-10' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next (v2)

The following series contains an initial small batch of Netfilter
updates for net-next:

1) Remove DCCP conntrack support, keep DCCP matches around in order to
   avoid breakage when loading a ruleset, add Kconfig to wrap the code
   so it can be disabled by distributors.

2) Remove buggy code aiming at shrinking netlink deletion events, then
   re-add it correctly in another patch. This is to prevent -stable from
   picking up a fix that breaks old userspace. From Phil Sutter.

3) Add missing WARN_ON_ONCE() to check for lockdep_commit_lock_is_held()
   to uncover bugs. From Fedor Pchelkin.

* tag 'nf-next-25-07-10' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: nf_tables: adjust lockdep assertions handling
  netfilter: nf_tables: Reintroduce shortened deletion notifications
  netfilter: nf_tables: Drop dead code from fill_*_info routines
  netfilter: conntrack: remove DCCP protocol support
====================

Link: https://patch.msgid.link/20250710010706.2861281-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>