]> git.ipfire.org Git - thirdparty/linux.git/log
thirdparty/linux.git
2 months agonet: dsa: microchip: add ETS scheduler support for KSZ88x3 switches
Oleksij Rempel [Thu, 10 Apr 2025 12:42:49 +0000 (14:42 +0200)] 
net: dsa: microchip: add ETS scheduler support for KSZ88x3 switches

Implement Enhanced Transmission Selection scheduler (ETS) support for
KSZ88x3 devices, which support two fixed egress scheduling modes:
Strict Priority and Weighted Fair Queuing (WFQ).

Since the switch does not allow remapping priorities to queues or
adjusting weights, this implementation only supports enabling
strict priority mode. If strict mode is not explicitly requested,
the switch falls back to its default WFQ mode.

This patch introduces KSZ88x3-specific handlers for ETS add and
delete operations and uses TXQ Split Control registers to toggle
the WFQ enable bit per queue. Corresponding macros are also added
for register access.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20250410124249.2728568-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoMerge branch 'net-stmmac-remove-unnecessary-initialisation-of-1-s-tic-counter'
Jakub Kicinski [Tue, 15 Apr 2025 00:12:44 +0000 (17:12 -0700)] 
Merge branch 'net-stmmac-remove-unnecessary-initialisation-of-1-s-tic-counter'

Russell King says:

====================
net: stmmac: remove unnecessary initialisation of 1µs TIC counter

In commit 8efbdbfa9938 ("net: stmmac: Initialize MAC_ONEUS_TIC_COUNTER
register"), code to initialise the LPI 1us counter in dwmac4's
initialisation was added, making the initialisation in glue drivers
unnecessary. This series cleans up the now redundant initialisation.
====================

Link: https://patch.msgid.link/Z_oe0U5E0i3uZbop@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: stmmac: remove GMAC_1US_TIC_COUNTER definition
Russell King (Oracle) [Sat, 12 Apr 2025 08:08:50 +0000 (09:08 +0100)] 
net: stmmac: remove GMAC_1US_TIC_COUNTER definition

GMAC_1US_TIC_COUNTER is now no longer used, so remove the definition.
This was duplicated by GMAC4_MAC_ONEUS_TIC_COUNTER further down in the
same file.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1u3Vv0-000E87-DQ@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: stmmac: remove eee_usecs_rate
Russell King (Oracle) [Sat, 12 Apr 2025 08:08:45 +0000 (09:08 +0100)] 
net: stmmac: remove eee_usecs_rate

plat_dat->eee_users_rate is now unused, so remove this member.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1u3Vuv-000E7y-9k@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: stmmac: intel-plat: remove eee_usecs_rate and hardware write
Russell King (Oracle) [Sat, 12 Apr 2025 08:08:40 +0000 (09:08 +0100)] 
net: stmmac: intel-plat: remove eee_usecs_rate and hardware write

Remove the write to GMAC_1US_TIC_COUNTER for two reasons:

1. during initialisation or reinitialisation of the DWMAC core, the
   core is reset, which sets this register back to its default value.
   Writing it prior to stmmac_dvr_probe() has no effect.

2. Since commit 8efbdbfa9938 ("net: stmmac: Initialize
   MAC_ONEUS_TIC_COUNTER register"), GMAC4/5 core code will set
   this register based on the rate of plat->stmmac_clk. This clock
   is fetched by devm_stmmac_probe_config_dt(), and plat->clk_ptp_rate
   will be set to its rate profided a "ptp_ref" clock is not provided.
   In any case, Marek's commit will set the effectual value of this
   register.

Therefore, dwmac-intel-plat.c writing GMAC_1US_TIC_COUNTER serves no
useful purpose and can be removed.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1u3Vuq-000E7s-5Y@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: stmmac: intel: remove eee_usecs_rate and hardware write
Russell King (Oracle) [Sat, 12 Apr 2025 08:08:35 +0000 (09:08 +0100)] 
net: stmmac: intel: remove eee_usecs_rate and hardware write

Remove the write to GMAC_1US_TIC_COUNTER for two reasons:

1. during initialisation or reinitialisation of the DWMAC core, the
   core is reset, which sets this register back to its default value.
   Writing it prior to stmmac_dvr_probe() has no effect.

2. Since commit 8efbdbfa9938 ("net: stmmac: Initialize
   MAC_ONEUS_TIC_COUNTER register"), GMAC4/5 core code will set
   this register based on the rate of plat->stmmac_clk. This clock
   is created by the same code which initialises plat->eee_usecs_rate,
   which is also created to run at this same rate. Since Marek's
   commit, this will set this register appropriately using the
   rate of this clock.

Therefore, dwmac-intel.c writing GMAC_1US_TIC_COUNTER serves no
useful purpose and can be removed.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1u3Vul-000E7m-1j@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: stmmac: dwc-qos: remove tegra_eqos_init()
Russell King (Oracle) [Sat, 12 Apr 2025 08:08:29 +0000 (09:08 +0100)] 
net: stmmac: dwc-qos: remove tegra_eqos_init()

tegra_eqos_init() initialises the 1US TIC counter for the EEE timers.
However, the DWGMAC core is reset after this write, which clears
this register to its default.

However, dwmac4_core_init() configures this register using the same
clock, which happens after reset - thus this is the write which
ensures that the register is correctly configured.

Therefore, tegra_eqos_init() is not required and is removed. This also
means eqos->clk_slave can also be removed.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1u3Vuf-000E7g-U4@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoMerge branch 'net-convert-exit_batch_rtnl-to-exit_rtnl'
Jakub Kicinski [Tue, 15 Apr 2025 00:09:13 +0000 (17:09 -0700)] 
Merge branch 'net-convert-exit_batch_rtnl-to-exit_rtnl'

Kuniyuki Iwashima says:

====================
net: Convert ->exit_batch_rtnl() to ->exit_rtnl().

While converting nexthop to per-netns RTNL, there are two blockers
to using rtnl_net_dereference(), flush_all_nexthops() and
__unregister_nexthop_notifier(), both of which are called from
->exit_batch_rtnl().

Instead of spreading __rtnl_net_lock() over each ->exit_batch_rtnl(),
we should convert all ->exit_batch_rtnl() to per-net ->exit_rtnl() and
run it under __rtnl_net_lock() because all ->exit_batch_rtnl() functions
do not have anything to factor out for batching.

Patch 1 & 2 factorise the undo mechanism against ->init() into a single
function, and Patch 3 adds ->exit_batch_rtnl().

Patch 4 ~ 13 convert all ->exit_batch_rtnl() users.

Patch 14 removes ->exit_batch_rtnl().

Later, we can convert pfcp and ppp to use ->exit_rtnl().

v1: https://lore.kernel.org/all/20250410022004.8668-1-kuniyu@amazon.com/
====================

Link: https://patch.msgid.link/20250411205258.63164-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: Remove ->exit_batch_rtnl().
Kuniyuki Iwashima [Fri, 11 Apr 2025 20:52:43 +0000 (13:52 -0700)] 
net: Remove ->exit_batch_rtnl().

There are no ->exit_batch_rtnl() users remaining.

Let's remove the hook.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20250411205258.63164-15-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agogeneve: Convert geneve_exit_batch_rtnl() to ->exit_rtnl().
Kuniyuki Iwashima [Fri, 11 Apr 2025 20:52:42 +0000 (13:52 -0700)] 
geneve: Convert geneve_exit_batch_rtnl() to ->exit_rtnl().

geneve_exit_batch_rtnl() iterates the dying netns list and
performs the same operation for each.

Let's use ->exit_rtnl().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20250411205258.63164-14-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agobareudp: Convert bareudp_exit_batch_rtnl() to ->exit_rtnl().
Kuniyuki Iwashima [Fri, 11 Apr 2025 20:52:41 +0000 (13:52 -0700)] 
bareudp: Convert bareudp_exit_batch_rtnl() to ->exit_rtnl().

bareudp_exit_batch_rtnl() iterates the dying netns list and performs the
same operation for each.

Let's use ->exit_rtnl().

While at it, we replace unregister_netdevice_queue() with
bareudp_dellink() for better cleanup.  It unlinks the device
from net_generic(net, bareudp_net_id)->bareudp_list, but there
is no real issue as both the dev and the list are freed later.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20250411205258.63164-13-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agogtp: Convert gtp_net_exit_batch_rtnl() to ->exit_rtnl().
Kuniyuki Iwashima [Fri, 11 Apr 2025 20:52:40 +0000 (13:52 -0700)] 
gtp: Convert gtp_net_exit_batch_rtnl() to ->exit_rtnl().

gtp_net_exit_batch_rtnl() iterates the dying netns list
and performs the same operations for each.

Let's use ->exit_rtnl().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20250411205258.63164-12-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agobonding: Convert bond_net_exit_batch_rtnl() to ->exit_rtnl().
Kuniyuki Iwashima [Fri, 11 Apr 2025 20:52:39 +0000 (13:52 -0700)] 
bonding: Convert bond_net_exit_batch_rtnl() to ->exit_rtnl().

bond_net_exit_batch_rtnl() iterates the dying netns list and
performs the same operation for each.

Let's use ->exit_rtnl().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20250411205258.63164-11-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agobridge: Convert br_net_exit_batch_rtnl() to ->exit_rtnl().
Kuniyuki Iwashima [Fri, 11 Apr 2025 20:52:38 +0000 (13:52 -0700)] 
bridge: Convert br_net_exit_batch_rtnl() to ->exit_rtnl().

br_net_exit_batch_rtnl() iterates the dying netns list and
performs the same operation for each.

Let's use ->exit_rtnl().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20250411205258.63164-10-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoxfrm: Convert xfrmi_exit_batch_rtnl() to ->exit_rtnl().
Kuniyuki Iwashima [Fri, 11 Apr 2025 20:52:37 +0000 (13:52 -0700)] 
xfrm: Convert xfrmi_exit_batch_rtnl() to ->exit_rtnl().

xfrmi_exit_batch_rtnl() iterates the dying netns list and
performs the same operations for each.

Let's use ->exit_rtnl().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20250411205258.63164-9-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoipv6: Convert tunnel devices' ->exit_batch_rtnl() to ->exit_rtnl().
Kuniyuki Iwashima [Fri, 11 Apr 2025 20:52:36 +0000 (13:52 -0700)] 
ipv6: Convert tunnel devices' ->exit_batch_rtnl() to ->exit_rtnl().

The following functions iterates the dying netns list and performs
the same operations for each netns.

  * ip6gre_exit_batch_rtnl()
  * ip6_tnl_exit_batch_rtnl()
  * vti6_exit_batch_rtnl()
  * sit_exit_batch_rtnl()

Let's use ->exit_rtnl().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20250411205258.63164-8-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoipv4: ip_tunnel: Convert ip_tunnel_delete_nets() callers to ->exit_rtnl().
Kuniyuki Iwashima [Fri, 11 Apr 2025 20:52:35 +0000 (13:52 -0700)] 
ipv4: ip_tunnel: Convert ip_tunnel_delete_nets() callers to ->exit_rtnl().

ip_tunnel_delete_nets() iterates the dying netns list and performs the
same operations for each.

Let's export ip_tunnel_destroy() as ip_tunnel_delete_net() and call it
from ->exit_rtnl().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20250411205258.63164-7-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agovxlan: Convert vxlan_exit_batch_rtnl() to ->exit_rtnl().
Kuniyuki Iwashima [Fri, 11 Apr 2025 20:52:34 +0000 (13:52 -0700)] 
vxlan: Convert vxlan_exit_batch_rtnl() to ->exit_rtnl().

vxlan_exit_batch_rtnl() iterates the dying netns list and
performs the same operations for each.

Let's use ->exit_rtnl().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20250411205258.63164-6-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonexthop: Convert nexthop_net_exit_batch_rtnl() to ->exit_rtnl().
Kuniyuki Iwashima [Fri, 11 Apr 2025 20:52:33 +0000 (13:52 -0700)] 
nexthop: Convert nexthop_net_exit_batch_rtnl() to ->exit_rtnl().

nexthop_net_exit_batch_rtnl() iterates the dying netns list and
performs the same operation for each.

Let's use ->exit_rtnl().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20250411205258.63164-5-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: Add ->exit_rtnl() hook to struct pernet_operations.
Kuniyuki Iwashima [Fri, 11 Apr 2025 20:52:32 +0000 (13:52 -0700)] 
net: Add ->exit_rtnl() hook to struct pernet_operations.

struct pernet_operations provides two batching hooks; ->exit_batch()
and ->exit_batch_rtnl().

The batching variant is beneficial if ->exit() meets any of the
following conditions:

  1) ->exit() repeatedly acquires a global lock for each netns

  2) ->exit() has a time-consuming operation that can be factored
     out (e.g. synchronize_rcu(), smp_mb(), etc)

  3) ->exit() does not need to repeat the same iterations for each
     netns (e.g. inet_twsk_purge())

Currently, none of the ->exit_batch_rtnl() functions satisfy any of
the above conditions because RTNL is factored out and held by the
caller and all of these functions iterate over the dying netns list.

Also, we want to hold per-netns RTNL there but avoid spreading
__rtnl_net_lock() across multiple locations.

Let's add ->exit_rtnl() hook and run it under __rtnl_net_lock().

The following patches will convert all ->exit_batch_rtnl() users
to ->exit_rtnl().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20250411205258.63164-4-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: Add ops_undo_single for module load/unload.
Kuniyuki Iwashima [Fri, 11 Apr 2025 20:52:31 +0000 (13:52 -0700)] 
net: Add ops_undo_single for module load/unload.

If ops_init() fails while loading a module or we unload the
module, free_exit_list() rolls back the changes.

The rollback sequence is the same as ops_undo_list().

The ops is already removed from pernet_list before calling
free_exit_list().  If we link the ops to a temporary list,
we can reuse ops_undo_list().

Let's add a wrapper of ops_undo_list() and use it instead
of free_exit_list().

Now, we have the central place to roll back ops_init().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20250411205258.63164-3-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: Factorise setup_net() and cleanup_net().
Kuniyuki Iwashima [Fri, 11 Apr 2025 20:52:30 +0000 (13:52 -0700)] 
net: Factorise setup_net() and cleanup_net().

When we roll back the changes made by struct pernet_operations.init(),
we execute mostly identical sequences in three places.

  * setup_net()
  * cleanup_net()
  * free_exit_list()

The only difference between the first two is which list and RCU helpers
to use.

In setup_net(), an ops could fail on the way, so we need to perform a
reverse walk from its previous ops in pernet_list.  OTOH, in cleanup_net(),
we iterate the full list from tail to head.

The former passes the failed ops to list_for_each_entry_continue_reverse().
It's tricky, but we can reuse it for the latter if we pass list_entry() of
the head node.

Also, synchronize_rcu() and synchronize_rcu_expedited() can be easily
switched by an argument.

Let's factorise the rollback part in setup_net() and cleanup_net().

In the next patch, ops_undo_list() will be reused for free_exit_list(),
and then two arguments (ops_list and hold_rtnl) will differ.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20250411205258.63164-2-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: ethtool: fix get_ts_stats() documentation
Russell King (Oracle) [Fri, 11 Apr 2025 22:04:19 +0000 (23:04 +0100)] 
net: ethtool: fix get_ts_stats() documentation

Commit 0e9c127729be ("ethtool: add interface to read Tx hardware
timestamping statistics") added documentation for timestamping
statistics, but added the detailed explanation for this method to
the get_ts_info() rather than get_ts_stats(). Move it to the correct
entry.

Cc: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1u3MTz-000Crx-IW@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoMerge branch 'fix-late-dma-unmap-crash-for-page-pool'
Jakub Kicinski [Mon, 14 Apr 2025 23:30:35 +0000 (16:30 -0700)] 
Merge branch 'fix-late-dma-unmap-crash-for-page-pool'

Toke Høiland-Jørgensen says:

====================
Fix late DMA unmap crash for page pool

This series fixes the late dma_unmap crash for page pool first reported
by Yonglong Liu in [0]. It is an alternative approach to the one
submitted by Yunsheng Lin, most recently in [1]. The first commit just
wraps some tests in a helper function, in preparation of the main change
in patch 2. See the commit message of patch 2 for the details.

[0] https://lore.kernel.org/8067f204-1380-4d37-8ffd-007fc6f26738@kernel.org
[1] https://lore.kernel.org/20250307092356.638242-1-linyunsheng@huawei.com

v8: https://lore.kernel.org/20250407-page-pool-track-dma-v8-0-da9500d4ba21@redhat.com
v7: https://lore.kernel.org/20250404-page-pool-track-dma-v7-0-ad34f069bc18@redhat.com
v6: https://lore.kernel.org/20250401-page-pool-track-dma-v6-0-8b83474870d4@redhat.com
v5: https://lore.kernel.org/20250328-page-pool-track-dma-v5-0-55002af683ad@redhat.com
v4: https://lore.kernel.org/20250327-page-pool-track-dma-v4-0-b380dc6706d0@redhat.com
v3: https://lore.kernel.org/20250326-page-pool-track-dma-v3-0-8e464016e0ac@redhat.com
v2: https://lore.kernel.org/20250325-page-pool-track-dma-v2-0-113ebc1946f3@redhat.com
v1: https://lore.kernel.org/20250314-page-pool-track-dma-v1-0-c212e57a74c2@redhat.com
====================

Link: https://patch.msgid.link/20250409-page-pool-track-dma-v9-0-6a9ef2e0cba8@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agopage_pool: Track DMA-mapped pages and unmap them when destroying the pool
Toke Høiland-Jørgensen [Wed, 9 Apr 2025 10:41:37 +0000 (12:41 +0200)] 
page_pool: Track DMA-mapped pages and unmap them when destroying the pool

When enabling DMA mapping in page_pool, pages are kept DMA mapped until
they are released from the pool, to avoid the overhead of re-mapping the
pages every time they are used. This causes resource leaks and/or
crashes when there are pages still outstanding while the device is torn
down, because page_pool will attempt an unmap through a non-existent DMA
device on the subsequent page return.

To fix this, implement a simple tracking of outstanding DMA-mapped pages
in page pool using an xarray. This was first suggested by Mina[0], and
turns out to be fairly straight forward: We simply store pointers to
pages directly in the xarray with xa_alloc() when they are first DMA
mapped, and remove them from the array on unmap. Then, when a page pool
is torn down, it can simply walk the xarray and unmap all pages still
present there before returning, which also allows us to get rid of the
get/put_device() calls in page_pool. Using xa_cmpxchg(), no additional
synchronisation is needed, as a page will only ever be unmapped once.

To avoid having to walk the entire xarray on unmap to find the page
reference, we stash the ID assigned by xa_alloc() into the page
structure itself, using the upper bits of the pp_magic field. This
requires a couple of defines to avoid conflicting with the
POINTER_POISON_DELTA define, but this is all evaluated at compile-time,
so does not affect run-time performance. The bitmap calculations in this
patch gives the following number of bits for different architectures:

- 23 bits on 32-bit architectures
- 21 bits on PPC64 (because of the definition of ILLEGAL_POINTER_VALUE)
- 32 bits on other 64-bit architectures

Stashing a value into the unused bits of pp_magic does have the effect
that it can make the value stored there lie outside the unmappable
range (as governed by the mmap_min_addr sysctl), for architectures that
don't define ILLEGAL_POINTER_VALUE. This means that if one of the
pointers that is aliased to the pp_magic field (such as page->lru.next)
is dereferenced while the page is owned by page_pool, that could lead to
a dereference into userspace, which is a security concern. The risk of
this is mitigated by the fact that (a) we always clear pp_magic before
releasing a page from page_pool, and (b) this would need a
use-after-free bug for struct page, which can have many other risks
since page->lru.next is used as a generic list pointer in multiple
places in the kernel. As such, with this patch we take the position that
this risk is negligible in practice. For more discussion, see[1].

Since all the tracking added in this patch is performed on DMA
map/unmap, no additional code is needed in the fast path, meaning the
performance overhead of this tracking is negligible there. A
micro-benchmark shows that the total overhead of the tracking itself is
about 400 ns (39 cycles(tsc) 395.218 ns; sum for both map and unmap[2]).
Since this cost is only paid on DMA map and unmap, it seems like an
acceptable cost to fix the late unmap issue. Further optimisation can
narrow the cases where this cost is paid (for instance by eliding the
tracking when DMA map/unmap is a no-op).

The extra memory needed to track the pages is neatly encapsulated inside
xarray, which uses the 'struct xa_node' structure to track items. This
structure is 576 bytes long, with slots for 64 items, meaning that a
full node occurs only 9 bytes of overhead per slot it tracks (in
practice, it probably won't be this efficient, but in any case it should
be an acceptable overhead).

[0] https://lore.kernel.org/all/CAHS8izPg7B5DwKfSuzz-iOop_YRbk3Sd6Y4rX7KBG9DcVJcyWg@mail.gmail.com/
[1] https://lore.kernel.org/r/20250320023202.GA25514@openwall.com
[2] https://lore.kernel.org/r/ae07144c-9295-4c9d-a400-153bb689fe9e@huawei.com

Reported-by: Yonglong Liu <liuyonglong@huawei.com>
Closes: https://lore.kernel.org/r/8743264a-9700-4227-a556-5f931c720211@huawei.com
Fixes: ff7d6b27f894 ("page_pool: refurbish version of page_pool code")
Suggested-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Jesper Dangaard Brouer <hawk@kernel.org>
Tested-by: Jesper Dangaard Brouer <hawk@kernel.org>
Tested-by: Qiuling Ren <qren@redhat.com>
Tested-by: Yuying Ma <yuma@redhat.com>
Tested-by: Yonglong Liu <liuyonglong@huawei.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20250409-page-pool-track-dma-v9-2-6a9ef2e0cba8@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agopage_pool: Move pp_magic check into helper functions
Toke Høiland-Jørgensen [Wed, 9 Apr 2025 10:41:36 +0000 (12:41 +0200)] 
page_pool: Move pp_magic check into helper functions

Since we are about to stash some more information into the pp_magic
field, let's move the magic signature checks into a pair of helper
functions so it can be changed in one place.

Reviewed-by: Mina Almasry <almasrymina@google.com>
Tested-by: Yonglong Liu <liuyonglong@huawei.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20250409-page-pool-track-dma-v9-1-6a9ef2e0cba8@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next...
Jakub Kicinski [Mon, 14 Apr 2025 23:05:43 +0000 (16:05 -0700)] 
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2025-04-11 (ice, i40e, ixgbe, igc, e1000e)

For ice:
Mateusz and Larysa add support for LLDP packets to be received on a VF
and transmitted by a VF in switchdev mode. Additional information:
https://lore.kernel.org/intel-wired-lan/20250214085215.2846063-1-larysa.zaremba@intel.com/

Karol adds timesync support for E825C devices using 2xNAC (Network
Acceleration Complex) configuration. 2xNAC mode is the mode in which
IO die is housing two complexes and each of them has its own PHY
connected to it.

Martyna adds messaging to clarify filter errors when recipe space is
exhausted.

Colin Ian King adds static modifier to a const array to avoid stack
usage.

For i40e:
Kyungwook Boo changes variable declaration types to prevent possible
underflow.

For ixgbe:
Rand Deeb adjusts retry values so that retries are attempted.

For igc:
Rui Salvaterra sets VLAN offloads to be enabled as default.

For e1000e:
Piotr Wejman converts driver to use newer hardware timestamping API.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  net: e1000e: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set()
  igc: enable HW vlan tag insertion/stripping by default
  ixgbe: Fix unreachable retry logic in combined and byte I2C write functions
  i40e: fix MMIO write access to an invalid page in i40e_clear_hw
  ice: make const read-only array dflt_rules static
  ice: improve error message for insufficient filter space
  ice: enable timesync operation on 2xNAC E825 devices
  ice: refactor ice_sbq_msg_dev enum
  ice: remove SW side band access workaround for E825
  ice: enable LLDP TX for VFs through tc
  ice: support egress drop rules on PF
  ice: remove headers argument from ice_tc_count_lkups
  ice: receive LLDP on trusted VFs
  ice: do not add LLDP-specific filter if not necessary
  ice: fix check for existing switch rule
====================

Link: https://patch.msgid.link/20250411204401.3271306-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoMerge branch 'cpsw-bindings-for-5000m-fixed-link'
Jakub Kicinski [Mon, 14 Apr 2025 22:57:12 +0000 (15:57 -0700)] 
Merge branch 'cpsw-bindings-for-5000m-fixed-link'

Siddharth Vadapalli says:

====================
CPSW Bindings for 5000M Fixed-Link

This series adds 5000M as a valid speed for fixed-link mode of operation
and also updates the CPSW bindings to evaluate fixed-link property. This
series is in the context of the following device-tree overlay which
enables USXGMII 5000M Fixed-link mode of operation with CPSW on TI's
J784S4 SoC:
https://github.com/torvalds/linux/blob/v6.15-rc1/arch/arm64/boot/dts/ti/k3-j784s4-evm-usxgmii-exp1-exp2.dtso
====================

Link: https://patch.msgid.link/20250411060917.633769-1-s-vadapalli@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agodt-bindings: net: ti: k3-am654-cpsw-nuss: evaluate fixed-link property
Siddharth Vadapalli [Fri, 11 Apr 2025 06:09:17 +0000 (11:39 +0530)] 
dt-bindings: net: ti: k3-am654-cpsw-nuss: evaluate fixed-link property

Since the fixed-link (phyless) mode of operation is supported by the
CPSW MAC, include "fixed-link" in the set of properties to be evaluated.

Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Link: https://patch.msgid.link/20250411060917.633769-3-s-vadapalli@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agodt-bindings: net: ethernet-controller: add 5000M speed to fixed-link
Siddharth Vadapalli [Fri, 11 Apr 2025 06:09:16 +0000 (11:39 +0530)] 
dt-bindings: net: ethernet-controller: add 5000M speed to fixed-link

A link speed of 5000 Mbps is a valid speed for a fixed-link mode of
operation. Hence, update the bindings to include the same.

Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250411060917.633769-2-s-vadapalli@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoMerge branch 'add-support-for-mdb-offload-failure-notification'
Jakub Kicinski [Mon, 14 Apr 2025 22:56:55 +0000 (15:56 -0700)] 
Merge branch 'add-support-for-mdb-offload-failure-notification'

Joseph Huang says:

====================
Add support for mdb offload failure notification

Currently the bridge does not provide real-time feedback to user space
on whether or not an attempt to offload an mdb entry was successful.

This patch set adds support to notify user space about failed offload
attempts, and is controlled by a new knob mdb_offload_fail_notification.

A break-down of the patches in the series:

Patch 1 adds offload failed flag to indicate that the offload attempt
has failed. The flag is reflected in netlink mdb entry flags.

Patch 2 adds the new bridge bool option mdb_offload_fail_notification.

Patch 3 notifies user space when the result is known, controlled by
mdb_offload_fail_notification setting.
====================

Link: https://patch.msgid.link/20250411150323.1117797-1-Joseph.Huang@garmin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: bridge: mcast: Notify on mdb offload failure
Joseph Huang [Fri, 11 Apr 2025 15:03:18 +0000 (11:03 -0400)] 
net: bridge: mcast: Notify on mdb offload failure

Notify user space on mdb offload failure if
mdb_offload_fail_notification is enabled.

Signed-off-by: Joseph Huang <Joseph.Huang@garmin.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250411150323.1117797-4-Joseph.Huang@garmin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: bridge: Add offload_fail_notification bopt
Joseph Huang [Fri, 11 Apr 2025 15:03:17 +0000 (11:03 -0400)] 
net: bridge: Add offload_fail_notification bopt

Add BR_BOOLOPT_MDB_OFFLOAD_FAIL_NOTIFICATION bool option.

Signed-off-by: Joseph Huang <Joseph.Huang@garmin.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250411150323.1117797-3-Joseph.Huang@garmin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: bridge: mcast: Add offload failed mdb flag
Joseph Huang [Fri, 11 Apr 2025 15:03:16 +0000 (11:03 -0400)] 
net: bridge: mcast: Add offload failed mdb flag

Add MDB_FLAGS_OFFLOAD_FAILED and MDB_PG_FLAGS_OFFLOAD_FAILED to indicate
that an attempt to offload the MDB entry to switchdev has failed.

Signed-off-by: Joseph Huang <Joseph.Huang@garmin.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250411150323.1117797-2-Joseph.Huang@garmin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agobna: bnad_dim_timeout: Rename del_timer_sync in comment
WangYuli [Fri, 11 Apr 2025 10:17:36 +0000 (18:17 +0800)] 
bna: bnad_dim_timeout: Rename del_timer_sync in comment

Commit 8fa7292fee5c ("treewide: Switch/rename to timer_delete[_sync]()")
switched del_timer_sync to timer_delete_sync, but did not modify the
comment for bnad_dim_timeout(). Now fix it.

Signed-off-by: WangYuli <wangyuli@uniontech.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/61DDCE7AB5B6CE82+20250411101736.160981-1-wangyuli@uniontech.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoMerge branch 'pktgen-code-cleanup'
Jakub Kicinski [Mon, 14 Apr 2025 21:53:17 +0000 (14:53 -0700)] 
Merge branch 'pktgen-code-cleanup'

Peter Seiderer says:

====================
net: pktgen: fix checkpatch code style errors/warnings

Fix checkpatch detected code style errors/warnings detected in
the file net/core/pktgen.c (remaining checkpatch checks will be addressed
in a follow up patch set).
====================

Link: https://patch.msgid.link/20250410071749.30505-1-ps.report@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: pktgen: fix code style (WARNING: quoted string split across lines)
Peter Seiderer [Thu, 10 Apr 2025 07:17:47 +0000 (09:17 +0200)] 
net: pktgen: fix code style (WARNING: quoted string split across lines)

Fix checkpatch code style warnings:

  WARNING: quoted string split across lines
  #480: FILE: net/core/pktgen.c:480:
  +       "Packet Generator for packet performance testing. "
  +       "Version: " VERSION "\n";

  WARNING: quoted string split across lines
  #632: FILE: net/core/pktgen.c:632:
  +                  "     udp_src_min: %d  udp_src_max: %d"
  +                  "  udp_dst_min: %d  udp_dst_max: %d\n",

Signed-off-by: Peter Seiderer <ps.report@gmx.net>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: pktgen: fix code style (WARNING: macros should not use a trailing semicolon)
Peter Seiderer [Thu, 10 Apr 2025 07:17:45 +0000 (09:17 +0200)] 
net: pktgen: fix code style (WARNING: macros should not use a trailing semicolon)

Fix checkpatch code style warnings:

  WARNING: macros should not use a trailing semicolon
  #180: FILE: net/core/pktgen.c:180:
  +#define func_enter() pr_debug("entering %s\n", __func__);

  WARNING: macros should not use a trailing semicolon
  #234: FILE: net/core/pktgen.c:234:
  +#define   if_lock(t)           mutex_lock(&(t->if_lock));

  CHECK: Unnecessary parentheses around t->if_lock
  #235: FILE: net/core/pktgen.c:235:
  +#define   if_unlock(t)           mutex_unlock(&(t->if_lock));

Signed-off-by: Peter Seiderer <ps.report@gmx.net>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: pktgen: fix code style (WARNING: Missing a blank line after declarations)
Peter Seiderer [Thu, 10 Apr 2025 07:17:44 +0000 (09:17 +0200)] 
net: pktgen: fix code style (WARNING: Missing a blank line after declarations)

Fix checkpatch code style warnings:

  WARNING: Missing a blank line after declarations
  #761: FILE: net/core/pktgen.c:761:
  +               char c;
  +               if (get_user(c, &user_buffer[i]))

  WARNING: Missing a blank line after declarations
  #780: FILE: net/core/pktgen.c:780:
  +               char c;
  +               if (get_user(c, &user_buffer[i]))

  WARNING: Missing a blank line after declarations
  #806: FILE: net/core/pktgen.c:806:
  +               char c;
  +               if (get_user(c, &user_buffer[i]))

  WARNING: Missing a blank line after declarations
  #823: FILE: net/core/pktgen.c:823:
  +               char c;
  +               if (get_user(c, &user_buffer[i]))

  WARNING: Missing a blank line after declarations
  #1968: FILE: net/core/pktgen.c:1968:
  +               char f[32];
  +               memset(f, 0, 32);

  WARNING: Missing a blank line after declarations
  #2410: FILE: net/core/pktgen.c:2410:
  +       struct pktgen_net *pn = net_generic(dev_net(pkt_dev->odev), pg_net_id);
  +       if (!x) {

  WARNING: Missing a blank line after declarations
  #2442: FILE: net/core/pktgen.c:2442:
  +               __u16 t;
  +               if (pkt_dev->flags & F_QUEUE_MAP_RND) {

  WARNING: Missing a blank line after declarations
  #2523: FILE: net/core/pktgen.c:2523:
  +               unsigned int i;
  +               for (i = 0; i < pkt_dev->nr_labels; i++)

  WARNING: Missing a blank line after declarations
  #2567: FILE: net/core/pktgen.c:2567:
  +                       __u32 t;
  +                       if (pkt_dev->flags & F_IPSRC_RND)

  WARNING: Missing a blank line after declarations
  #2587: FILE: net/core/pktgen.c:2587:
  +                               __be32 s;
  +                               if (pkt_dev->flags & F_IPDST_RND) {

  WARNING: Missing a blank line after declarations
  #2634: FILE: net/core/pktgen.c:2634:
  +               __u32 t;
  +               if (pkt_dev->flags & F_TXSIZE_RND) {

  WARNING: Missing a blank line after declarations
  #2736: FILE: net/core/pktgen.c:2736:
  +               int i;
  +               for (i = 0; i < pkt_dev->cflows; i++) {

  WARNING: Missing a blank line after declarations
  #2738: FILE: net/core/pktgen.c:2738:
  +                       struct xfrm_state *x = pkt_dev->flows[i].x;
  +                       if (x) {

  WARNING: Missing a blank line after declarations
  #2752: FILE: net/core/pktgen.c:2752:
  +               int nhead = 0;
  +               if (x) {

  WARNING: Missing a blank line after declarations
  #2795: FILE: net/core/pktgen.c:2795:
  +       unsigned int i;
  +       for (i = 0; i < pkt_dev->nr_labels; i++)

  WARNING: Missing a blank line after declarations
  #3480: FILE: net/core/pktgen.c:3480:
  +       ktime_t idle_start = ktime_get();
  +       schedule();

Signed-off-by: Peter Seiderer <ps.report@gmx.net>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: pktgen: fix code style (WARNING: Block comments)
Peter Seiderer [Thu, 10 Apr 2025 07:17:43 +0000 (09:17 +0200)] 
net: pktgen: fix code style (WARNING: Block comments)

Fix checkpatch code style warnings:

  WARNING: Block comments use a trailing */ on a separate line
  +                                * removal by worker thread */

  WARNING: Block comments use * on subsequent lines
  +       __u8 tos;            /* six MSB of (former) IPv4 TOS
  +                               are for dscp codepoint */

  WARNING: Block comments use a trailing */ on a separate line
  +                               are for dscp codepoint */

  WARNING: Block comments use * on subsequent lines
  +       __u8 traffic_class;  /* ditto for the (former) Traffic Class in IPv6
  +                               (see RFC 3260, sec. 4) */

  WARNING: Block comments use a trailing */ on a separate line
  +                               (see RFC 3260, sec. 4) */

  WARNING: Block comments use * on subsequent lines
  +       /* = {
  +          0x00, 0x80, 0xC8, 0x79, 0xB3, 0xCB,

  WARNING: Block comments use * on subsequent lines
  +       /* Field for thread to receive "posted" events terminate,
  +          stop ifs etc. */

  WARNING: Block comments use a trailing */ on a separate line
  +          stop ifs etc. */

  WARNING: Block comments should align the * on each line
  + * we go look for it ...
  +*/

  WARNING: Block comments use a trailing */ on a separate line
  +        * we resolve the dst issue */

  WARNING: Block comments use a trailing */ on a separate line
  +        * with proc_create_data() */

Signed-off-by: Peter Seiderer <ps.report@gmx.net>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: pktgen: fix code style (WARNING: suspect code indent for conditional statements)
Peter Seiderer [Thu, 10 Apr 2025 07:17:42 +0000 (09:17 +0200)] 
net: pktgen: fix code style (WARNING: suspect code indent for conditional statements)

Fix checkpatch code style warnings:

  WARNING: suspect code indent for conditional statements (8, 17)
  #2901: FILE: net/core/pktgen.c:2901:
  +       } else {
  +                skb = __netdev_alloc_skb(dev, size, GFP_NOWAIT);

Signed-off-by: Peter Seiderer <ps.report@gmx.net>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: pktgen: fix code style (ERROR: space prohibited after that '&')
Peter Seiderer [Thu, 10 Apr 2025 07:17:39 +0000 (09:17 +0200)] 
net: pktgen: fix code style (ERROR: space prohibited after that '&')

Fix checkpatch code style errors/checks:

  CHECK: No space is necessary after a cast
  #2984: FILE: net/core/pktgen.c:2984:
  +       *(__be16 *) & eth[12] = protocol;

  ERROR: space prohibited after that '&' (ctx:WxW)
  #2984: FILE: net/core/pktgen.c:2984:
  +       *(__be16 *) & eth[12] = protocol;

Signed-off-by: Peter Seiderer <ps.report@gmx.net>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: pktgen: fix code style (ERROR: "foo * bar" should be "foo *bar")
Peter Seiderer [Thu, 10 Apr 2025 07:17:38 +0000 (09:17 +0200)] 
net: pktgen: fix code style (ERROR: "foo * bar" should be "foo *bar")

Fix checkpatch code style errors:

  ERROR: "foo * bar" should be "foo *bar"
  #977: FILE: net/core/pktgen.c:977:
  +                              const char __user * user_buffer, size_t count,

  ERROR: "foo * bar" should be "foo *bar"
  #978: FILE: net/core/pktgen.c:978:
  +                              loff_t * offset)

  ERROR: "foo * bar" should be "foo *bar"
  #1912: FILE: net/core/pktgen.c:1912:
  +                                  const char __user * user_buffer,

  ERROR: "foo * bar" should be "foo *bar"
  #1913: FILE: net/core/pktgen.c:1913:
  +                                  size_t count, loff_t * offset)

Signed-off-by: Peter Seiderer <ps.report@gmx.net>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: convert dev->rtnl_link_state to a bool
Jakub Kicinski [Thu, 10 Apr 2025 01:42:46 +0000 (18:42 -0700)] 
net: convert dev->rtnl_link_state to a bool

netdevice reg_state was split into two 16 bit enums back in 2010
in commit a2835763e130 ("rtnetlink: handle rtnl_link netlink
notifications manually"). Since the split the fields have been
moved apart, and last year we converted reg_state to a normal
u8 in commit 4d42b37def70 ("net: convert dev->reg_state to u8").

rtnl_link_state being a 16 bitfield makes no sense. Convert it
to a single bool, it seems very unlikely after 15 years that
we'll need more values in it.

We could drop dev->rtnl_link_ops from the conditions but feels
like having it there more clearly points at the reason for this
hack.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250410014246.780885-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoudp: properly deal with xfrm encap and ADDRFORM
Paolo Abeni [Wed, 9 Apr 2025 20:00:56 +0000 (22:00 +0200)] 
udp: properly deal with xfrm encap and ADDRFORM

UDP GRO accounting assumes that the GRO receive callback is always
set when the UDP tunnel is enabled, but syzkaller proved otherwise,
leading tot the following splat:

WARNING: CPU: 0 PID: 5837 at net/ipv4/udp_offload.c:123 udp_tunnel_update_gro_rcv+0x28d/0x4c0 net/ipv4/udp_offload.c:123
Modules linked in:
CPU: 0 UID: 0 PID: 5837 Comm: syz-executor850 Not tainted 6.14.0-syzkaller-13320-g420aabef3ab5 #0 PREEMPT(full)
Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2025
RIP: 0010:udp_tunnel_update_gro_rcv+0x28d/0x4c0 net/ipv4/udp_offload.c:123
Code: 00 00 e8 c6 5a 2f f7 48 c1 e5 04 48 8d b5 20 53 c7 9a ba 10
      00 00 00 4c 89 ff e8 ce 87 99 f7 e9 ce 00 00 00 e8 a4 5a 2f
      f7 90 <0f> 0b 90 e9 de fd ff ff bf 01 00 00 00 89 ee e8 cf
      5e 2f f7 85 ed
RSP: 0018:ffffc90003effa88 EFLAGS: 00010293
RAX: ffffffff8a93fc9c RBX: 0000000000000000 RCX: ffff8880306f9e00
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: ffffffff8a93fabe R09: 1ffffffff20bfb2e
R10: dffffc0000000000 R11: fffffbfff20bfb2f R12: ffff88814ef21738
R13: dffffc0000000000 R14: ffff88814ef21778 R15: 1ffff11029de42ef
FS:  0000000000000000(0000) GS:ffff888124f96000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f04eec760d0 CR3: 000000000eb38000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 udp_tunnel_cleanup_gro include/net/udp_tunnel.h:205 [inline]
 udpv6_destroy_sock+0x212/0x270 net/ipv6/udp.c:1829
 sk_common_release+0x71/0x2e0 net/core/sock.c:3896
 inet_release+0x17d/0x200 net/ipv4/af_inet.c:435
 __sock_release net/socket.c:647 [inline]
 sock_close+0xbc/0x240 net/socket.c:1391
 __fput+0x3e9/0x9f0 fs/file_table.c:465
 task_work_run+0x251/0x310 kernel/task_work.c:227
 exit_task_work include/linux/task_work.h:40 [inline]
 do_exit+0xa11/0x27f0 kernel/exit.c:953
 do_group_exit+0x207/0x2c0 kernel/exit.c:1102
 __do_sys_exit_group kernel/exit.c:1113 [inline]
 __se_sys_exit_group kernel/exit.c:1111 [inline]
 __x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1111
 x64_sys_call+0x26c3/0x26d0 arch/x86/include/generated/asm/syscalls_64.h:232
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f04eebfac79
Code: Unable to access opcode bytes at 0x7f04eebfac4f.
RSP: 002b:00007fffdcaa34a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f04eebfac79
RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
RBP: 00007f04eec75270 R08: ffffffffffffffb8 R09: 00007fffdcaa36c8
R10: 0000200000000000 R11: 0000000000000246 R12: 00007f04eec75270
R13: 0000000000000000 R14: 00007f04eec75cc0 R15: 00007f04eebcca70

Address the issue moving the accounting hook into
setup_udp_tunnel_sock() and set_xfrm_gro_udp_encap_rcv(), where
the GRO callback is actually set.

set_xfrm_gro_udp_encap_rcv() is prone to races with IPV6_ADDRFORM,
run the relevant setsockopt under the socket lock to ensure using
consistent values of sk_family and up->encap_type.

Refactor the GRO callback selection code, to make it clear that
the function pointer is always initialized.

Reported-by: syzbot+8c469a2260132cd095c1@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=8c469a2260132cd095c1
Fixes: 172bf009c18d ("xfrm: Support GRO for IPv4 ESP in UDP encapsulation")
Fixes: 5d7f5b2f6b935 ("udp_tunnel: use static call for GRO hooks when possible")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/92bcdb6899145a9a387c8fa9e3ca656642a43634.1744228733.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: hsr: sync hw addr of slave2 according to slave1 hw addr on PRP
Fernando Fernandez Mancera [Wed, 9 Apr 2025 10:19:11 +0000 (12:19 +0200)] 
net: hsr: sync hw addr of slave2 according to slave1 hw addr on PRP

In order to work properly PRP requires slave1 and slave2 to share the
same MAC address. To ease the configuration process on userspace tools,
sync the slave2 MAC address with slave1. In addition, when deleting the
port from the list, restore the original MAC address.

Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 months agonet: phy: mediatek: add Airoha PHY ID to SoC driver
Christian Marangi [Thu, 10 Apr 2025 10:04:04 +0000 (12:04 +0200)] 
net: phy: mediatek: add Airoha PHY ID to SoC driver

Airoha AN7581 SoC ship with a Switch based on the MT753x Switch embedded
in other SoC like the MT7581 and the MT7988. Similar to these they
require configuring some pin to enable LED PHYs.

Add support for the PHY ID for the Airoha embedded Switch and define a
simple probe function to toggle these pins. Also fill the LED functions
and add dedicated function to define LED polarity.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Link: https://patch.msgid.link/20250410100410.348-2-ansuelsmth@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: phy: mediatek: permit to compile test GE SOC PHY driver
Christian Marangi [Thu, 10 Apr 2025 10:04:03 +0000 (12:04 +0200)] 
net: phy: mediatek: permit to compile test GE SOC PHY driver

When commit 462a3daad679 ("net: phy: mediatek: fix compile-test
dependencies") fixed the dependency, it should have also introduced
an or on COMPILE_TEST to permit this driver to be compile-tested even if
NVMEM_MTK_EFUSE wasn't selected. The driver makes use of NVMEM API that
are always compiled (return error) so the driver can actually be
compiled even without that config.

Fix and simplify the dependency condition of this kernel config.

Fixes: 462a3daad679 ("net: phy: mediatek: fix compile-test dependencies")
Acked-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20250410100410.348-1-ansuelsmth@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoMerge branch 'add-l2-hw-acceleration-for-airoha_eth-driver'
Jakub Kicinski [Sat, 12 Apr 2025 03:15:38 +0000 (20:15 -0700)] 
Merge branch 'add-l2-hw-acceleration-for-airoha_eth-driver'

Lorenzo Bianconi says:

====================
Add L2 hw acceleration for airoha_eth driver

Introduce the capability to offload L2 traffic defining flower rules in
the PSE/PPE engine available on EN7581 SoC.
Since the hw always reports L2/L3/L4 flower rules, link all L2 rules
sharing the same L2 info (with different L3/L4 info) in the L2 subflows
list of a given L2 PPE entry.

v1: https://lore.kernel.org/20250407-airoha-flowtable-l2b-v1-0-18777778e568@kernel.org
====================

Link: https://patch.msgid.link/20250409-airoha-flowtable-l2b-v2-0-4a1e3935ea92@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: airoha: Add L2 hw acceleration support
Lorenzo Bianconi [Wed, 9 Apr 2025 09:47:15 +0000 (11:47 +0200)] 
net: airoha: Add L2 hw acceleration support

Similar to mtk driver, introduce the capability to offload L2 traffic
defining flower rules in the PSE/PPE engine available on EN7581 SoC.
Since the hw always reports L2/L3/L4 flower rules, link all L2 rules
sharing the same L2 info (with different L3/L4 info) in the L2 subflows
list of a given L2 PPE entry.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Link: https://patch.msgid.link/20250409-airoha-flowtable-l2b-v2-2-4a1e3935ea92@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: airoha: Add l2_flows rhashtable
Lorenzo Bianconi [Wed, 9 Apr 2025 09:47:14 +0000 (11:47 +0200)] 
net: airoha: Add l2_flows rhashtable

Introduce l2_flows rhashtable in airoha_ppe struct in order to
store L2 flows committed by upper layers of the kernel. This is a
preliminary patch in order to offload L2 traffic rules.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Link: https://patch.msgid.link/20250409-airoha-flowtable-l2b-v2-1-4a1e3935ea92@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoMerge branch 'net-retire-dccp-socket'
Jakub Kicinski [Sat, 12 Apr 2025 01:58:13 +0000 (18:58 -0700)] 
Merge branch 'net-retire-dccp-socket'

Kuniyuki Iwashima says:

====================
net: Retire DCCP socket.

As announced by commit b144fcaf46d4 ("dccp: Print deprecation
notice."), it's time to remove DCCP socket.

The patch 2 removes net/dccp, LSM code, doc, and etc, leaving
DCCP netfilter modules.

The patch 3 unexports shared functions for DCCP, and the patch 4
renames tcp_or_dccp_get_hashinfo() to tcp_get_hashinfo().

We can do more cleanup; for example, remove IPPROTO_TCP checks in
__inet6?_check_established(), remove __module_get() for twsk,
remove timewait_sock_ops.twsk_destructor(), etc, but it will be
more of TCP stuff, so I'll defer to a later series.

v2: https://lore.kernel.org/20250409003014.19697-1-kuniyu@amazon.com
v1: https://lore.kernel.org/20250407231823.95927-1-kuniyu@amazon.com
====================

Link: https://patch.msgid.link/20250410023921.11307-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agotcp: Rename tcp_or_dccp_get_hashinfo().
Kuniyuki Iwashima [Thu, 10 Apr 2025 02:36:47 +0000 (19:36 -0700)] 
tcp: Rename tcp_or_dccp_get_hashinfo().

DCCP was removed, so tcp_or_dccp_get_hashinfo() should be renamed.

Let's rename it to tcp_get_hashinfo().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250410023921.11307-5-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: Unexport shared functions for DCCP.
Kuniyuki Iwashima [Thu, 10 Apr 2025 02:36:46 +0000 (19:36 -0700)] 
net: Unexport shared functions for DCCP.

DCCP was removed, so many inet functions no longer need to
be exported.

Let's unexport or use EXPORT_IPV6_MOD() for such functions.

sk_free_unlock_clone() is inlined in sk_clone_lock() as it's
the only caller.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250410023921.11307-4-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: Retire DCCP socket.
Kuniyuki Iwashima [Thu, 10 Apr 2025 02:36:45 +0000 (19:36 -0700)] 
net: Retire DCCP socket.

DCCP was orphaned in 2021 by commit 054c4610bd05 ("MAINTAINERS: dccp:
move Gerrit Renker to CREDITS"), which noted that the last maintainer
had been inactive for five years.

In recent years, it has become a playground for syzbot, and most changes
to DCCP have been odd bug fixes triggered by syzbot.  Apart from that,
the only changes have been driven by treewide or networking API updates
or adjustments related to TCP.

Thus, in 2023, we announced we would remove DCCP in 2025 via commit
b144fcaf46d4 ("dccp: Print deprecation notice.").

Since then, only one individual has contacted the netdev mailing list. [0]

There is ongoing research for Multipath DCCP.  The repository is hosted
on GitHub [1], and development is not taking place through the upstream
community.  While the repository is published under the GPLv2 license,
the scheduling part remains proprietary, with a LICENSE file [2] stating:

  "This is not Open Source software."

The researcher mentioned a plan to address the licensing issue, upstream
the patches, and step up as a maintainer, but there has been no further
communication since then.

Maintaining DCCP for a decade without any real users has become a burden.

Therefore, it's time to remove it.

Removing DCCP will also provide significant benefits to TCP.  It allows
us to freely reorganize the layout of struct inet_connection_sock, which
is currently shared with DCCP, and optimize it to reduce the number of
cachelines accessed in the TCP fast path.

Note that we keep DCCP netfilter modules as requested.  [3]

Link: https://lore.kernel.org/netdev/20230710182253.81446-1-kuniyu@amazon.com/T/#u
Link: https://github.com/telekom/mp-dccp
Link: https://github.com/telekom/mp-dccp/blob/mpdccp_v03_k5.10/net/dccp/non_gpl_scheduler/LICENSE
Link: https://lore.kernel.org/netdev/Z_VQ0KlCRkqYWXa-@calendula/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Paul Moore <paul@paul-moore.com> (LSM and SELinux)
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Link: https://patch.msgid.link/20250410023921.11307-3-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoselftest: net: Remove DCCP bits.
Kuniyuki Iwashima [Thu, 10 Apr 2025 02:36:44 +0000 (19:36 -0700)] 
selftest: net: Remove DCCP bits.

We will remove DCCP.

Let's remove DCCP bits from selftest.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250410023921.11307-2-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: phy: air_en8811h: Add clk provider for CKO pin
Lucien.Jheng [Wed, 9 Apr 2025 15:09:02 +0000 (23:09 +0800)] 
net: phy: air_en8811h: Add clk provider for CKO pin

EN8811H outputs 25MHz or 50MHz clocks on CKO, selected by GPIO3.
CKO clock operates continuously from power-up through md32 loading.
Implement clk provider driver so we can disable the clock output in case
it isn't needed, which also helps to reduce EMF noise

Signed-off-by: Lucien.Jheng <lucienx123@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250409150902.3596-1-lucienx123@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agosock: Correct error checking condition for (assign|release)_proto_idx()
Zijun Hu [Thu, 10 Apr 2025 01:01:27 +0000 (09:01 +0800)] 
sock: Correct error checking condition for (assign|release)_proto_idx()

(assign|release)_proto_idx() wrongly check find_first_zero_bit() failure
by condition '(prot->inuse_idx == PROTO_INUSE_NR - 1)' obviously.

Fix by correcting the condition to '(prot->inuse_idx == PROTO_INUSE_NR)'

Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250410-fix_net-v2-1-d69e7c5739a4@quicinc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: e1000e: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set()
Piotr Wejman [Sat, 22 Feb 2025 12:46:29 +0000 (13:46 +0100)] 
net: e1000e: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set()

Update the driver to use the new hardware timestamping API added in commit
66f7223039c0 ("net: add NDOs for configuring hardware timestamping").
Use Netlink extack for error reporting in e1000e_config_hwtstamp.
Align the indentation of net_device_ops.

Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Signed-off-by: Piotr Wejman <wejmanpm@gmail.com>
Tested-by: Avigail Dahan <avigailx.dahan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 months agoigc: enable HW vlan tag insertion/stripping by default
Rui Salvaterra [Thu, 13 Mar 2025 09:35:22 +0000 (09:35 +0000)] 
igc: enable HW vlan tag insertion/stripping by default

This is enabled by default in other Intel drivers I've checked (e1000, e1000e,
iavf, igb and ice). Fixes an out-of-the-box performance issue when running
OpenWrt on typical mini-PCs with igc-supported Ethernet controllers and 802.1Q
VLAN configurations, as ethtool isn't part of the default packages and sane
defaults are expected.

In my specific case, with an Intel N100-based machine with four I226-V Ethernet
controllers, my upload performance increased from under 30 Mb/s to the expected
~1 Gb/s.

Signed-off-by: Rui Salvaterra <rsalvaterra@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 months agoixgbe: Fix unreachable retry logic in combined and byte I2C write functions
Rand Deeb [Thu, 6 Mar 2025 10:12:00 +0000 (13:12 +0300)] 
ixgbe: Fix unreachable retry logic in combined and byte I2C write functions

The current implementation of `ixgbe_write_i2c_combined_generic_int` and
`ixgbe_write_i2c_byte_generic_int` sets `max_retry` to `1`, which makes
the condition `retry < max_retry` always evaluate to `false`. This renders
the retry mechanism ineffective, as the debug message and retry logic are
never executed.

This patch increases `max_retry` to `3` in both functions, aligning them
with the retry logic in `ixgbe_read_i2c_combined_generic_int`. This
ensures that the retry mechanism functions as intended, improving
robustness in case of I2C write failures.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Signed-off-by: Rand Deeb <rand.sec96@gmail.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 months agoi40e: fix MMIO write access to an invalid page in i40e_clear_hw
Kyungwook Boo [Tue, 11 Mar 2025 05:16:02 +0000 (14:16 +0900)] 
i40e: fix MMIO write access to an invalid page in i40e_clear_hw

When the device sends a specific input, an integer underflow can occur, leading
to MMIO write access to an invalid page.

Prevent the integer underflow by changing the type of related variables.

Signed-off-by: Kyungwook Boo <bookyungwook@gmail.com>
Link: https://lore.kernel.org/lkml/ffc91764-1142-4ba2-91b6-8c773f6f7095@gmail.com/T/
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 months agoice: make const read-only array dflt_rules static
Colin Ian King [Mon, 17 Mar 2025 17:20:24 +0000 (17:20 +0000)] 
ice: make const read-only array dflt_rules static

Don't populate the const read-only array dflt_rules on the stack at run
time, instead make it static.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 months agoice: improve error message for insufficient filter space
Martyna Szapar-Mudlaw [Fri, 14 Mar 2025 08:11:11 +0000 (09:11 +0100)] 
ice: improve error message for insufficient filter space

When adding a rule to switch through tc, if the operation fails
due to not enough free recipes (-ENOSPC), provide a clearer
error message: "Unable to add filter: insufficient space available."

This improves user feedback by distinguishing space limitations from
other generic failures.

Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Martyna Szapar-Mudlaw <martyna.szapar-mudlaw@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 months agoice: enable timesync operation on 2xNAC E825 devices
Karol Kolacinski [Thu, 20 Mar 2025 13:15:38 +0000 (14:15 +0100)] 
ice: enable timesync operation on 2xNAC E825 devices

According to the E825C specification, SBQ address for ports on a single
complex is device 2 for PHY 0 and device 13 for PHY1.
For accessing ports on a dual complex E825C (so called 2xNAC mode),
the driver should use destination device 2 (referred as phy_0) for
the current complex PHY and device 13 (referred as phy_0_peer) for
peer complex PHY.

Differentiate SBQ destination device by checking if current PF port
number is on the same PHY as target port number.

Adjust 'ice_get_lane_number' function to provide unique port number for
ports from PHY1 in 'dual' mode config (by adding fixed offset for PHY1
ports). Cache this value in ice_hw struct.

Introduce ice_get_primary_hw wrapper to get access to timesync register
not available from second NAC.

Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Co-developed-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 months agoice: refactor ice_sbq_msg_dev enum
Karol Kolacinski [Thu, 20 Mar 2025 13:15:37 +0000 (14:15 +0100)] 
ice: refactor ice_sbq_msg_dev enum

Rename ice_sbq_msg_dev to ice_sbq_dev_id to reflect the meaning of this
type more precisely. This enum type describes RDA (Remote Device Access)
client ids, accessible over SB (Side Band) interface.
Rename enum elements to make a driver namespace more cleaner and
consistent with other definitions within SB
Remove unused 'rmn_x' entries, specific to unsupported E824 device.
Adjust clients '2' and '13' names (phy_0 and phy_0_peer respectively) to
be compliant with EAS doc. According to the specification, regardless of
the complex entity (single or dual), when accessing its own ports,
they're accessed always as 'phy_0' client. And referred as 'phy_0_peer'
when handling ports connected to the other complex.

Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Co-developed-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 months agoice: remove SW side band access workaround for E825
Karol Kolacinski [Thu, 20 Mar 2025 13:15:36 +0000 (14:15 +0100)] 
ice: remove SW side band access workaround for E825

Due to the bug in FW/NVM autoload mechanism (wrong default
SB_REM_DEV_CTL register settings), the access to peer PHY and CGU
clients was disabled by default.

As the workaround solution, the register value was overwritten by the
driver at the probe or reset handling.
Remove workaround as it's not needed anymore. The fix in autoload
procedure has been provided with NVM 3.80 version.

NOTE: at the time the fix was provided in NVM, the E825C product was
not officially available on the market, so it's not expected this change
will cause regression when running with older driver/kernel versions.

Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 months agoice: enable LLDP TX for VFs through tc
Larysa Zaremba [Fri, 14 Feb 2025 08:50:40 +0000 (09:50 +0100)] 
ice: enable LLDP TX for VFs through tc

Only a single VSI can be in charge of sending LLDP frames, sometimes it is
beneficial to assign this function to a VF, that is possible to do with tc
capabilities in the switchdev mode. It requires first blocking the PF from
sending the LLDP frames with a following command:

tc filter add dev <ifname> egress protocol lldp flower skip_sw action drop

Then it becomes possible to configure a forward rule from a VF port
representor to uplink instead.

tc filter add dev <vf_ifname> ingress protocol lldp flower skip_sw
action mirred egress redirect dev <ifname>

How LLDP exclusivity was done previously is LLDP traffic was blocked for a
whole port by a single rule and PF was bypassing that. Now at least in the
switchdev mode, every separate VSI has to have its own drop rule. Another
complication is the fact that tc does not respect when the driver refuses
to delete a rule, so returning an error results in a HW rule still present
with no way to reference it through tc. This is addressed by allowing the
PF rule to be deleted at any time, but making the VF forward rule "dormant"
in such case, this means it is deleted from HW but stays in tc and driver's
bookkeeping to be restored when drop rule is added back to the PF.

Implement tc configuration handling which enables the user to transmit LLDP
packets from VF instead of PF.

Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 months agoice: support egress drop rules on PF
Larysa Zaremba [Fri, 14 Feb 2025 08:50:39 +0000 (09:50 +0100)] 
ice: support egress drop rules on PF

tc clsact qdisc allows us to add offloaded egress rules with commands such
as the following one:

tc filter add dev <ifname> egress protocol lldp flower skip_sw action drop

Support the egress rule drop action when added to PF, with a few caveats:
* in switchdev mode, all PF traffic has to go uplink with an exception for
  LLDP that can be delegated to a single VSI at a time
* in legacy mode, we cannot delegate LLDP functionality to another VSI, so
  such packets from PF should not be blocked.

Also, simplify the rule direction logic, it was previously derived from
actions, but actually can be inherited from the tc block (and flipped in
case of port representors).

Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 months agoice: remove headers argument from ice_tc_count_lkups
Larysa Zaremba [Fri, 14 Feb 2025 08:50:38 +0000 (09:50 +0100)] 
ice: remove headers argument from ice_tc_count_lkups

Remove the headers argument from the ice_tc_count_lkups() function, because
it is not used anywhere.

Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 months agoice: receive LLDP on trusted VFs
Mateusz Pacuszka [Fri, 14 Feb 2025 08:50:37 +0000 (09:50 +0100)] 
ice: receive LLDP on trusted VFs

When a trusted VF tries to configure an LLDP multicast address, configure a
rule that would mirror the traffic to this VF, untrusted VFs are not
allowed to receive LLDP at all, so the request to add LLDP MAC address will
always fail for them.

Add a forwarding LLDP filter to a trusted VF when it tries to add an LLDP
multicast MAC address. The MAC address has to be added after enabling
trust (through restarting the LLDP service).

Signed-off-by: Mateusz Pacuszka <mateuszx.pacuszka@intel.com>
Co-developed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 months agoice: do not add LLDP-specific filter if not necessary
Larysa Zaremba [Fri, 14 Feb 2025 08:50:36 +0000 (09:50 +0100)] 
ice: do not add LLDP-specific filter if not necessary

Commit 34295a3696fb ("ice: implement new LLDP filter command")
introduced the ability to use LLDP-specific filter that directs all
LLDP traffic to a single VSI. However, current goal is for all trusted VFs
to be able to see LLDP neighbors, which is impossible to do with the
special filter.

Make using the generic filter the default choice and fall back to special
one only if a generic filter cannot be added. That way setups with "NVMs
where an already existent LLDP filter is blocking the creation of a filter
to allow LLDP packets" will still be able to configure software Rx LLDP on
PF only, while all other setups would be able to forward them to VFs too.

Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 months agoice: fix check for existing switch rule
Mateusz Pacuszka [Fri, 14 Feb 2025 08:50:35 +0000 (09:50 +0100)] 
ice: fix check for existing switch rule

In case the rule already exists and another VSI wants to subscribe to it
new VSI list is being created and both VSIs are moved to it.
Currently, the check for already existing VSI with the same rule is done
based on fdw_id.hw_vsi_id, which applies only to LOOKUP_RX flag.
Change it to vsi_handle. This is software VSI ID, but it can be applied
here, because vsi_map itself is also based on it.

Additionally change return status in case the VSI already exists in the
VSI map to "Already exists". Such case should be handled by the caller.

Signed-off-by: Mateusz Pacuszka <mateuszx.pacuszka@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 months agonet: stmmac: stm32: simplify clock handling
Russell King (Oracle) [Mon, 7 Apr 2025 19:15:35 +0000 (20:15 +0100)] 
net: stmmac: stm32: simplify clock handling

Some stm32 implementations need the receive clock running in suspend,
as indicated by dwmac->ops->clk_rx_enable_in_suspend. The existing
code achieved this in a rather complex way, by passing a flag around.

However, the clk API prepare/enables are counted - which means that a
clock won't be stopped as long as there are more prepare and enables
than disables and unprepares, just like a reference count.

Therefore, we can simplify this logic by calling clk_prepare_enable()
an additional time in the probe function if this flag is set, and then
balancing that at remove time.

With this, we can avoid passing a "are we suspending" and "are we
resuming" flag to various functions in the driver.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 months agor8169: add helper rtl8125_phy_param
Heiner Kallweit [Wed, 9 Apr 2025 19:14:47 +0000 (21:14 +0200)] 
r8169: add helper rtl8125_phy_param

The integrated PHY's of RTL8125/8126 have an own mechanism to access
PHY parameters, similar to what r8168g_phy_param does on earlier PHY
versions. Add helper rtl8125_phy_param to simplify the code.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/847b7356-12d6-441b-ade9-4b6e1539b84a@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agor8169: add helper rtl_csi_mod for accessing extended config space
Heiner Kallweit [Wed, 9 Apr 2025 19:05:37 +0000 (21:05 +0200)] 
r8169: add helper rtl_csi_mod for accessing extended config space

Add a helper for the Realtek-specific mechanism for accessing extended
config space if native access isn't possible.
This avoids code duplication.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/b368fd91-57d7-4cb5-9342-98b4d8fe9aea@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoipv4: remove unnecessary judgment in ip_route_output_key_hash_rcu
Zhengchao Shao [Wed, 9 Apr 2025 03:33:21 +0000 (11:33 +0800)] 
ipv4: remove unnecessary judgment in ip_route_output_key_hash_rcu

In the ip_route_output_key_cash_rcu function, the input fl4 member saddr is
first checked to be non-zero before entering multicast, broadcast and
arbitrary IP address checks. However, the fact that the IP address is not
0 has already ruled out the possibility of any address, so remove
unnecessary judgment.

Signed-off-by: Zhengchao Shao <shaozhengchao@163.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250409033321.108244-1-shaozhengchao@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoMerge branch 'tools-ynl-c-basic-netlink-raw-support'
Jakub Kicinski [Fri, 11 Apr 2025 03:14:43 +0000 (20:14 -0700)] 
Merge branch 'tools-ynl-c-basic-netlink-raw-support'

Jakub Kicinski says:

====================
tools: ynl: c: basic netlink-raw support

Basic support for netlink-raw AKA classic netlink in user space C codegen.
This series is enough to read routes and addresses from the kernel
(see the samples in patches 12 and 13).

Specs need to be slightly adjusted and decorated with the c naming info.

In terms of codegen this series includes just the basic plumbing required
to skip genlmsghdr and handle request types which may technically also
be legal in genetlink-legacy but are very uncommon there.

Subsequent series will add support for:
 - handling CRUD-style notifications
 - code gen for array types classic netlink uses
 - sub-message support

v1: https://lore.kernel.org/20250409000400.492371-1-kuba@kernel.org
====================

Link: https://patch.msgid.link/20250410014658.782120-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agotools: ynl: generate code for rt-route and add a sample
Jakub Kicinski [Thu, 10 Apr 2025 01:46:58 +0000 (18:46 -0700)] 
tools: ynl: generate code for rt-route and add a sample

YNL C can now generate code for simple classic netlink families.
Include rt-route in the Makefile for generation and add a sample.

    $ ./tools/net/ynl/samples/rt-route
    oif: wlp0s20f3        gateway: 192.168.1.1
    oif: wlp0s20f3        dst: 192.168.1.0/24
    oif: vpn0             dst: fe80::/64
    oif: wlp0s20f3        dst: fe80::/64
    oif: wlp0s20f3        gateway: fe80::200:5eff:fe00:201

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250410014658.782120-14-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agotools: ynl: generate code for rt-addr and add a sample
Jakub Kicinski [Thu, 10 Apr 2025 01:46:57 +0000 (18:46 -0700)] 
tools: ynl: generate code for rt-addr and add a sample

YNL C can now generate code for simple classic netlink families.
Include rt-addr in the Makefile for generation and add a sample.

  $ ./tools/net/ynl/samples/rt-addr
              lo: 127.0.0.1
       wlp0s20f3: 192.168.1.101
              lo: ::
       wlp0s20f3: fe80::6385:be6:746e:8116
            vpn0: fe80::3597:d353:b5a7:66dd

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250410014658.782120-13-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agotools: ynl-gen: use family c-name in notifications
Jakub Kicinski [Thu, 10 Apr 2025 01:46:56 +0000 (18:46 -0700)] 
tools: ynl-gen: use family c-name in notifications

Family names may include dashes. Fix notification handling
code gen to the c-compatible name.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250410014658.782120-12-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agotools: ynl-gen: consider dump ops without a do "type-consistent"
Jakub Kicinski [Thu, 10 Apr 2025 01:46:55 +0000 (18:46 -0700)] 
tools: ynl-gen: consider dump ops without a do "type-consistent"

If the type for the response to do and dump are the same we don't
generate it twice. This is called "type_consistent" in the generator.
Consider operations which only have dump to also be consistent.
This removes unnecessary "_dump" from the names. There's a number
of GET ops in classic Netlink which only have dump handlers.

Make sure we output the "onesided" types, normally if the type
is consistent we only output it when we render the do structures.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250410014658.782120-11-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agotools: ynl: don't use genlmsghdr in classic netlink
Jakub Kicinski [Thu, 10 Apr 2025 01:46:54 +0000 (18:46 -0700)] 
tools: ynl: don't use genlmsghdr in classic netlink

Make sure the codegen calls the right YNL lib helper to start
the request based on family type. Classic netlink request must
not include the genl header.

Conversely don't expect genl headers in the responses.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250410014658.782120-10-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agotools: ynl-gen: don't consider requests with fixed hdr empty
Jakub Kicinski [Thu, 10 Apr 2025 01:46:53 +0000 (18:46 -0700)] 
tools: ynl-gen: don't consider requests with fixed hdr empty

C codegen skips generating the structs if request/reply has no attrs.
In such cases the request op takes no argument and return int
(rather than response struct). In case of classic netlink a lot of
information gets passed using the fixed struct, however, so adjust
the logic to consider a request empty only if it has no attrs _and_
no fixed struct.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250410014658.782120-9-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agotools: ynl: support creating non-genl sockets
Jakub Kicinski [Thu, 10 Apr 2025 01:46:52 +0000 (18:46 -0700)] 
tools: ynl: support creating non-genl sockets

Classic netlink has static family IDs specified in YAML,
there is no family name -> ID lookup. Support providing
the ID info to the library via the generated struct and
make library use it. Since NETLINK_ROUTE is ID 0 we need
an extra boolean to indicate classic_id is to be used.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250410014658.782120-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonetlink: specs: rt-route: add C naming info
Jakub Kicinski [Thu, 10 Apr 2025 01:46:51 +0000 (18:46 -0700)] 
netlink: specs: rt-route: add C naming info

Add properties needed for C codegen to match names with uAPI headers.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250410014658.782120-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonetlink: specs: rt-addr: add C naming info
Jakub Kicinski [Thu, 10 Apr 2025 01:46:50 +0000 (18:46 -0700)] 
netlink: specs: rt-addr: add C naming info

Add properties needed for C codegen to match names with uAPI headers.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250410014658.782120-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonetlink: specs: rt-route: remove the fixed members from attrs
Jakub Kicinski [Thu, 10 Apr 2025 01:46:49 +0000 (18:46 -0700)] 
netlink: specs: rt-route: remove the fixed members from attrs

The purpose of the attribute list is to list the attributes
which will be included in a given message to shrink the objects
for families with huge attr spaces. Fixed headers are always
present in their entirety (between netlink header and the attrs)
so there's no point in listing their members. Current C codegen
doesn't expect them and tries to look them up in the attribute space.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250410014658.782120-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonetlink: specs: rt-addr: remove the fixed members from attrs
Jakub Kicinski [Thu, 10 Apr 2025 01:46:48 +0000 (18:46 -0700)] 
netlink: specs: rt-addr: remove the fixed members from attrs

The purpose of the attribute list is to list the attributes
which will be included in a given message to shrink the objects
for families with huge attr spaces. Fixed headers are always
present in their entirety (between netlink header and the attrs)
so there's no point in listing their members. Current C codegen
doesn't expect them and tries to look them up in the attribute space.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250410014658.782120-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonetlink: specs: rt-route: specify fixed-header at operations level
Jakub Kicinski [Thu, 10 Apr 2025 01:46:47 +0000 (18:46 -0700)] 
netlink: specs: rt-route: specify fixed-header at operations level

The C codegen currently stores the fixed-header as part of family
info, so it only supports one fixed-header type per spec. Luckily
all rtm route message have the same fixed header so just move it up
to the higher level.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250410014658.782120-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonetlink: specs: rename rtnetlink specs in accordance with family name
Jakub Kicinski [Thu, 10 Apr 2025 01:46:46 +0000 (18:46 -0700)] 
netlink: specs: rename rtnetlink specs in accordance with family name

The rtnetlink family names are set to rt-$name within the YAML
but the files are called rt_$name. C codegen assumes that the
generated file name will match the family. The use of dashes
is in line with our general expectation that name properties
in the spec use dashes not underscores (even tho, as Donald
points out most genl families use underscores in the name).

We have 3 un-ideal options to choose from:

 - accept the slight inconsistency with old families using _, or
 - accept the slight annoyance with all languages having to do s/-/_/
   when looking up family ID, or
 - accept the inconsistency with all name properties in new YAML spec
   being separated with - and just the family name always using _.

Pick option 1 and rename the rtnl spec files.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250410014658.782120-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agousbnet: asix AX88772: leave the carrier control to phylink
Krzysztof Hałasa [Tue, 8 Apr 2025 11:59:41 +0000 (13:59 +0200)] 
usbnet: asix AX88772: leave the carrier control to phylink

ASIX AX88772B based USB 10/100 Ethernet adapter doesn't come
up ("carrier off"), despite the built-in 100BASE-FX PHY positive link
indication. The internal PHY is configured (using EEPROM) in fixed
100 Mbps full duplex mode.

The primary problem appears to be using carrier_netif_{on,off}() while,
at the same time, delegating carrier management to phylink. Use only the
latter and remove "manual control" in the asix driver.

I don't have any other AX88772 board here, but the problem doesn't seem
specific to a particular board or settings - it's probably
timing-dependent.

Remove unused asix_adjust_link() as well.

Signed-off-by: Krzysztof Hałasa <khalasa@piap.pl>
Tested-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/m3plhmdfte.fsf_-_@t19.piap.pl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoMerge branch 'trace-add-tracepoint-for-tcp_sendmsg_locked'
Jakub Kicinski [Fri, 11 Apr 2025 01:34:08 +0000 (18:34 -0700)] 
Merge branch 'trace-add-tracepoint-for-tcp_sendmsg_locked'

Breno Leitao says:

====================
trace: add tracepoint for tcp_sendmsg_locked()

Meta has been using BPF programs to monitor tcp_sendmsg() for years,
indicating significant interest in observing this important
functionality. Adding a proper tracepoint provides a stable API for all
users who need visibility into TCP message transmission.

David Ahern is using a similar functionality with a custom patch[1]. So,
this means we have more than a single use case for this request, and it
might be a good idea to have such feature upstream.

Link: https://lore.kernel.org/all/70168c8f-bf52-4279-b4c4-be64527aa1ac@kernel.org/
v2: https://lore.kernel.org/20250407-tcpsendmsg-v2-0-9f0ea843ef99@debian.org
v1: https://lore.kernel.org/20250224-tcpsendmsg-v1-1-bac043c59cc8@debian.org
====================

Link: https://patch.msgid.link/20250408-tcpsendmsg-v3-0-208b87064c28@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agotrace: tcp: Add tracepoint for tcp_sendmsg_locked()
Breno Leitao [Tue, 8 Apr 2025 18:32:02 +0000 (11:32 -0700)] 
trace: tcp: Add tracepoint for tcp_sendmsg_locked()

Add a tracepoint to monitor TCP send operations, enabling detailed
visibility into TCP message transmission.

Create a new tracepoint within the tcp_sendmsg_locked function,
capturing traditional fields along with size_goal, which indicates the
optimal data size for a single TCP segment. Additionally, a reference to
the struct sock sk is passed, allowing direct access for BPF programs.
The implementation is largely based on David's patch[1] and suggestions.

Link: https://lore.kernel.org/all/70168c8f-bf52-4279-b4c4-be64527aa1ac@kernel.org/
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250408-tcpsendmsg-v3-2-208b87064c28@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: pass const to msg_data_left()
Breno Leitao [Tue, 8 Apr 2025 18:32:01 +0000 (11:32 -0700)] 
net: pass const to msg_data_left()

The msg_data_left() function doesn't modify the struct msghdr parameter,
so mark it as const. This allows the function to be used with const
references, improving type safety and making the API more flexible.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250408-tcpsendmsg-v3-1-208b87064c28@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoMerge branch 'net-stmmac-stmmac_pltfr_find_clk'
Jakub Kicinski [Fri, 11 Apr 2025 01:31:56 +0000 (18:31 -0700)] 
Merge branch 'net-stmmac-stmmac_pltfr_find_clk'

Russell King says:

====================
net: stmmac: stmmac_pltfr_find_clk()

The GBETH glue driver that is being proposed duplicates the clock
finding from the bulk clock data in the stmmac platform data structure.
iLet's provide a generic implementation that glue drivers can use, and
convert dwc-qos-eth to use it.
====================

Link: https://patch.msgid.link/Z_Yn3dJjzcOi32uU@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: stmmac: dwc-qos: use stmmac_pltfr_find_clk()
Russell King (Oracle) [Wed, 9 Apr 2025 08:02:25 +0000 (09:02 +0100)] 
net: stmmac: dwc-qos: use stmmac_pltfr_find_clk()

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1u2QO9-001Rp8-Ii@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: stmmac: provide stmmac_pltfr_find_clk()
Russell King (Oracle) [Wed, 9 Apr 2025 08:02:20 +0000 (09:02 +0100)] 
net: stmmac: provide stmmac_pltfr_find_clk()

Provide a generic way to find a clock in the bulk data.

Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1u2QO4-001Rp2-Dy@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoMerge branch 'tcp-add-a-new-tw_paws-drop-reason'
Jakub Kicinski [Fri, 11 Apr 2025 01:29:27 +0000 (18:29 -0700)] 
Merge branch 'tcp-add-a-new-tw_paws-drop-reason'

Jiayuan Chen says:

====================
tcp: add a new TW_PAWS drop reason

Devices in the networking path, such as firewalls, NATs, or routers, which
can perform SNAT or DNAT, use addresses from their own limited address
pools to masquerade the source address during forwarding, causing PAWS
verification to fail more easily under TW status.

Currently, packet loss statistics for PAWS can only be viewed through MIB,
which is a global metric and cannot be precisely obtained through tracing
to get the specific 4-tuple of the dropped packet. In the past, we had to
use kprobe ret to retrieve relevant skb information from
tcp_timewait_state_process().

We add a drop_reason pointer and a new counter.

I didn't provide a packetdrill script.
I struggled for a long time to get packetdrill to fix the client port, but
ultimately failed to do so...

Instead, I wrote my own program to trigger PAWS, which can be found at
https://github.com/mrpre/nettrigger/tree/main
'''
//assume nginx running on 172.31.75.114:9999, current host is 172.31.75.115
iptables -t filter -I OUTPUT -p tcp --sport 12345 --tcp-flags RST RST -j DROP
./nettrigger -i eth0 -s 172.31.75.115:12345 -d 172.31.75.114:9999 -action paws
'''

v2: https://lore.kernel.org/5cdc1bdd9caee92a6ae932638a862fd5c67630e8@linux.dev
v3: https://lore.kernel.org/20250407140001.13886-1-jiayuan.chen@linux.dev
====================

Link: https://patch.msgid.link/20250409112614.16153-1-jiayuan.chen@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agotcp: add LINUX_MIB_PAWS_TW_REJECTED counter
Jiayuan Chen [Wed, 9 Apr 2025 11:26:05 +0000 (19:26 +0800)] 
tcp: add LINUX_MIB_PAWS_TW_REJECTED counter

When TCP is in TIME_WAIT state, PAWS verification uses
LINUX_PAWSESTABREJECTED, which is ambiguous and cannot be distinguished
from other PAWS verification processes.

We added a new counter, like the existing PAWS_OLD_ACK one.

Also we update the doc with previously missing PAWS_OLD_ACK.

usage:
'''
nstat -az | grep PAWSTimewait
TcpExtPAWSTimewait              1                  0.0
'''

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250409112614.16153-3-jiayuan.chen@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>