git.ipfire.org Git - thirdparty/linux.git/log


Eric Dumazet [Mon, 30 Jun 2025 12:19:32 +0000 (12:19 +0000)]

ipv6: adopt dst_dev() helper

Use the new helper as a step to deal with potential dst->dev races.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250630121934.3399505-9-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


Eric Dumazet [Mon, 30 Jun 2025 12:19:31 +0000 (12:19 +0000)]

ipv4: adopt dst_dev, skb_dst_dev and skb_dst_dev_net[_rcu]

Use the new helpers as a first step to deal with
potential dst->dev races.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250630121934.3399505-8-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


Eric Dumazet [Mon, 30 Jun 2025 12:19:30 +0000 (12:19 +0000)]

net: dst: add four helpers to annotate data-races around dst->dev

dst->dev is read locklessly in many contexts,
and written in dst_dev_put().

Fixing all the races is going to need many changes.

We probably will have to add full RCU protection.

Add four helpers to ease this painful process.

static inline struct net_device *dst_dev(const struct dst_entry *dst)
{
       return READ_ONCE(dst->dev);
}

static inline struct net_device *skb_dst_dev(const struct sk_buff *skb)
{
       return dst_dev(skb_dst(skb));
}

static inline struct net *skb_dst_dev_net(const struct sk_buff *skb)
{
       return dev_net(skb_dst_dev(skb));
}

static inline struct net *skb_dst_dev_net_rcu(const struct sk_buff *skb)
{
       return dev_net_rcu(skb_dst_dev(skb));
}

Fixes: 4a6ce2b6f2ec ("net: introduce a new function dst_dev_put()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250630121934.3399505-7-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


Eric Dumazet [Mon, 30 Jun 2025 12:19:29 +0000 (12:19 +0000)]

net: dst: annotate data-races around dst->output

dst_dev_put() can overwrite dst->output while other
cpus might read this field (for instance from dst_output())

Add READ_ONCE()/WRITE_ONCE() annotations to suppress
potential issues.
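
A minimal userspace sketch of the annotation pattern follows. The
READ_ONCE()/WRITE_ONCE() macros here are simplified stand-ins (they rely on
the GNU __typeof__ extension), and struct demo_dst and the demo_* functions
are hypothetical names used only for illustration, not the kernel's code.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel macros: force a single volatile access. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical, trimmed-down dst entry holding only the racy field. */
struct demo_dst {
	int (*output)(int pkt);
};

static int demo_xmit(int pkt)    { return pkt; }
static int demo_discard(int pkt) { (void)pkt; return -1; }

/* Reader: load the pointer exactly once, then call through the local copy. */
static int demo_dst_output(struct demo_dst *dst, int pkt)
{
	int (*fn)(int) = READ_ONCE(dst->output);

	return fn(pkt);
}

/* Writer: overwrite the handler, as the message says dst_dev_put() can do. */
static void demo_dst_dev_put(struct demo_dst *dst)
{
	WRITE_ONCE(dst->output, demo_discard);
}
```

The annotations only prevent load/store tearing and compiler re-reads; they
do not order the access against other fields, which is why the message
expects RCU protection later.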

We will likely need RCU protection in the future.

Fixes: 4a6ce2b6f2ec ("net: introduce a new function dst_dev_put()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250630121934.3399505-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


Eric Dumazet [Mon, 30 Jun 2025 12:19:28 +0000 (12:19 +0000)]

net: dst: annotate data-races around dst->input

dst_dev_put() can overwrite dst->input while other
cpus might read this field (for instance from dst_input())

Add READ_ONCE()/WRITE_ONCE() annotations to suppress
potential issues.

We will likely need full RCU protection later.

Fixes: 4a6ce2b6f2ec ("net: introduce a new function dst_dev_put()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250630121934.3399505-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


Eric Dumazet [Mon, 30 Jun 2025 12:19:27 +0000 (12:19 +0000)]

net: dst: annotate data-races around dst->lastuse

(dst_entry)->lastuse is read and written locklessly,
add corresponding annotations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250630121934.3399505-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


Eric Dumazet [Mon, 30 Jun 2025 12:19:26 +0000 (12:19 +0000)]

net: dst: annotate data-races around dst->expires

(dst_entry)->expires is read and written locklessly,
add corresponding annotations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250630121934.3399505-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


Eric Dumazet [Mon, 30 Jun 2025 12:19:25 +0000 (12:19 +0000)]

net: dst: annotate data-races around dst->obsolete

(dst_entry)->obsolete is read locklessly, add corresponding
annotations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250630121934.3399505-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


Jakub Kicinski [Wed, 2 Jul 2025 21:22:04 +0000 (14:22 -0700)]

Merge branch 'net-introduce-net_aligned_data'

Eric Dumazet says:

====================
net: introduce net_aligned_data

____cacheline_aligned_in_smp on small fields like
tcp_memory_allocated and udp_memory_allocated is not good enough.

It makes sure to put these fields at the start of a cache line,
but does not prevent the linker from using the cache line for other
fields, with potential performance impact.

nm -v vmlinux|egrep -5 "tcp_memory_allocated|udp_memory_allocated"

...
ffffffff83e35cc0 B tcp_memory_allocated
ffffffff83e35cc8 b __key.0
ffffffff83e35cc8 b __tcp_tx_delay_enabled.1
ffffffff83e35ce0 b tcp_orphan_timer
...
ffffffff849dddc0 B udp_memory_allocated
ffffffff849dddc8 B udp_encap_needed_key
ffffffff849dddd8 B udpv6_encap_needed_key
ffffffff849dddf0 b inetsw_lock

One solution is to move these sensitive fields to a structure,
so that the compiler is forced to add empty holes between each member.

nm -v vmlinux|egrep -2 "tcp_memory_allocated|udp_memory_allocated|net_aligned_data"

ffffffff885af970 b mem_id_init
ffffffff885af980 b __key.0
ffffffff885af9c0 B net_aligned_data
ffffffff885afa80 B page_pool_mem_providers
ffffffff885afa90 b __key.2

v1: https://lore.kernel.org/netdev/20250627200551.348096-1-edumazet@google.com/
====================

Link: https://patch.msgid.link/20250630093540.3052835-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


Eric Dumazet [Mon, 30 Jun 2025 09:35:40 +0000 (09:35 +0000)]

udp: move udp_memory_allocated into net_aligned_data

The ____cacheline_aligned_in_smp attribute only makes sure a field is
aligned to a cache line. It does not prevent the linker from using
the remainder of the cache line for other variables, causing
potential false sharing.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250630093540.3052835-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


Eric Dumazet [Mon, 30 Jun 2025 09:35:39 +0000 (09:35 +0000)]

tcp: move tcp_memory_allocated into net_aligned_data

The ____cacheline_aligned_in_smp attribute only makes sure a field is
aligned to a cache line. It does not prevent the linker from using
the remainder of the cache line for other variables, causing
potential false sharing.

Move tcp_memory_allocated into a dedicated cache line.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250630093540.3052835-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


Eric Dumazet [Mon, 30 Jun 2025 09:35:38 +0000 (09:35 +0000)]

net: move net_cookie into net_aligned_data

Using per-cpu data for net->net_cookie generation is overkill,
because even busy hosts do not create hundreds of netns per second.

Make sure to put net_cookie in a private cache line to avoid
potential false sharing.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250630093540.3052835-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


Eric Dumazet [Mon, 30 Jun 2025 09:35:37 +0000 (09:35 +0000)]

net: add struct net_aligned_data

This structure will hold networking data that must
consume a full cache line to avoid accidental false sharing.
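
As a rough userspace illustration of the idea (not the kernel definition),
aligning every member of a structure to the cache line size forces the
compiler to pad between members and after the last one. The 64-byte line
size, the struct name and the member types below are assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_CACHELINE 64	/* assumed cache line size */

/* Every member starts on its own cache line, so nothing else can share it. */
struct demo_aligned_data {
	_Alignas(DEMO_CACHELINE) long cookie;
	_Alignas(DEMO_CACHELINE) long tcp_memory_allocated;
	_Alignas(DEMO_CACHELINE) long udp_memory_allocated;
};

/* Three members, three full cache lines: the tail padding is part of the type. */
static_assert(sizeof(struct demo_aligned_data) == 3 * DEMO_CACHELINE,
	      "each member must consume a full cache line");
```

Unlike a lone aligned variable, the padding belongs to the structure itself,
so the linker cannot place unrelated symbols into the unused bytes.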

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250630093540.3052835-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


zhangjianrong [Sat, 28 Jun 2025 09:38:13 +0000 (17:38 +0800)]

net: thunderbolt: Enable end-to-end flow control also in transmit

According to USB4 specification, if E2E flow control is disabled for
the Transmit Descriptor Ring, the Host Interface Adapter Layer shall
not require any credits to be available before transmitting a Tunneled
Packet from this Transmit Descriptor Ring, so e2e flow control should
be enabled in both directions.

Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Link: https://lore.kernel.org/20250624153805.GC2824380@black.fi.intel.com
Signed-off-by: zhangjianrong <zhangjianrong5@huawei.com>
Link: https://patch.msgid.link/20250628093813.647005-1-zhangjianrong5@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


zhangjianrong [Sat, 28 Jun 2025 09:49:20 +0000 (17:49 +0800)]

net: thunderbolt: Fix the parameter passing of tb_xdomain_enable_paths()/tb_xdomain_disable_paths()

According to the description of tb_xdomain_enable_paths(), the third
parameter represents the transmit ring and the fifth parameter represents
the receive ring. tb_xdomain_disable_paths() is the same case.

[Jakub] Mika says: it works now because both rings ->hop is the same

Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Link: https://lore.kernel.org/20250625051149.GD2824380@black.fi.intel.com
Signed-off-by: zhangjianrong <zhangjianrong5@huawei.com>
Link: https://patch.msgid.link/20250628094920.656658-1-zhangjianrong5@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


Uwe Kleine-König [Fri, 27 Jun 2025 10:22:20 +0000 (12:22 +0200)]

net: tulip: Rename PCI driver struct to end in _driver

This is not only a cosmetic change because the section mismatch checks
also depend on the name and for drivers the checks are stricter than for
ops.

However xircom_driver also passes the stricter checks just fine, so no
further changes needed.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Link: https://patch.msgid.link/20250627102220.1937649-2-u.kleine-koenig@baylibre.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


Uwe Kleine-König [Mon, 30 Jun 2025 16:44:07 +0000 (18:44 +0200)]

net: atlantic: Rename PCI driver struct to end in _driver

This is not only a cosmetic change because the section mismatch checks
(implemented in scripts/mod/modpost.c) also depend on the object's name
and for drivers the checks are stricter than for ops.

However aq_pci_driver also passes the stricter checks just fine, so no
further changes needed.

The cheating^Wmisleading name was introduced in commit 97bde5c4f909
("net: ethernet: aquantia: Support for NIC-specific code")

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250630164406.57589-2-u.kleine-koenig@baylibre.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


Lucien.Jheng [Mon, 30 Jun 2025 15:41:47 +0000 (23:41 +0800)]

net: phy: air_en8811h: Introduce resume/suspend and clk_restore_context to ensure correct CKO settings after network interface reinitialization.

If the user reinitializes the network interface, the PHY reinitializes
and the CKO settings revert to their initial configuration (enabled).
To prevent CKO from being re-enabled, add en8811h_clk_restore_context()
and en8811h_resume() so that the CKO settings remain correct.

Signed-off-by: Lucien.Jheng <lucienzx159@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250630154147.80388-1-lucienzx159@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


Christophe JAILLET [Sun, 29 Jun 2025 21:06:38 +0000 (23:06 +0200)]

net: dsa: hellcreek: Constify struct devlink_region_ops and struct hellcreek_fdb_entry

'struct devlink_region_ops' and 'struct hellcreek_fdb_entry' are not
modified in this driver.

Constifying these structures moves some data to a read-only section, so
increases overall security, especially when the structure holds some
function pointers.
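
A small sketch of the pattern, using hypothetical demo_* names rather than
the driver's real structures: an ops table that is never written after
initialization can be declared const, and its users take a const pointer.

```c
#include <assert.h>

/* Hypothetical ops table with a function pointer, similar in shape to a
 * devlink region ops structure. */
struct demo_region_ops {
	const char *name;
	int (*snapshot)(int port);
};

static int demo_snapshot(int port)
{
	return port * 2;
}

/* const: the table, including its function pointer, can be placed in a
 * read-only section instead of writable data. */
static const struct demo_region_ops demo_ops = {
	.name = "demo",
	.snapshot = demo_snapshot,
};

/* Callers only read the table, so they accept a pointer to const. */
static int demo_take_snapshot(const struct demo_region_ops *ops, int port)
{
	return ops->snapshot(port);
}
```

This matches the size output below: bytes move from the data column to the
text column while the total stays the same.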

On x86_64, with allmodconfig:
Before:
======
   text    data     bss     dec     hex filename
  55320   19216     320   74856   12468 drivers/net/dsa/hirschmann/hellcreek.o

After:
=====
   text    data     bss     dec     hex filename
  55960   18576     320   74856   12468 drivers/net/dsa/hirschmann/hellcreek.o

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Link: https://patch.msgid.link/2f7e8dc30db18bade94999ac7ce79f333342e979.1751231174.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


Jakub Kicinski [Wed, 2 Jul 2025 02:32:46 +0000 (19:32 -0700)]

Merge branch 'seg6-fix-typos-in-comments-within-the-srv6-subsystem'

Andrea Mayer says:

====================
seg6: fix typos in comments within the SRv6 subsystem

In this patchset, we correct some typos found both in the SRv6 Endpoints
implementation (i.e., seg6local) and in some SRv6 selftests, using
codespell.
====================

Link: https://patch.msgid.link/20250629171226.4988-1-andrea.mayer@uniroma2.it
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


Andrea Mayer [Sun, 29 Jun 2025 17:12:26 +0000 (19:12 +0200)]

selftests: seg6: fix instaces typo in comments

Fix a typo:
instaces -> instances

The typo has been identified using codespell, and the tool does not
report any additional issues in the selftests considered.

Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250629171226.4988-3-andrea.mayer@uniroma2.it
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


Andrea Mayer [Sun, 29 Jun 2025 17:12:25 +0000 (19:12 +0200)]

seg6: fix lenghts typo in a comment

Fix a typo:
lenghts -> lengths

The typo has been identified using codespell, and the tool currently
does not report any additional issues in comments.

Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250629171226.4988-2-andrea.mayer@uniroma2.it
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


Christophe JAILLET [Sun, 29 Jun 2025 12:35:50 +0000 (14:35 +0200)]

net: dsa: mv88e6xxx: Use kcalloc()

Use kcalloc() instead of hand writing it. This is less verbose.

Also move the initialization of 'count' to save some LoC.
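
For readers unfamiliar with the pattern, here is a userspace analogue with
calloc() standing in for kcalloc(). The open-coded variant is an assumption
about what "hand writing it" looks like in general, not a copy of the
driver's previous code, and the demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Open-coded form: multiply, allocate, zero. The multiplication can wrap
 * around silently when n is very large. */
static void *demo_alloc_open_coded(size_t n, size_t size)
{
	void *p = malloc(n * size);

	if (p)
		memset(p, 0, n * size);
	return p;
}

/* Array-allocator form: one call that rejects an overflowing n * size and
 * returns zeroed memory. The explicit check only makes the rule visible. */
static void *demo_alloc_array(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;
	return calloc(n, size);
}
```

The second form is shorter at the call site and keeps the overflow handling
in one place.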

On x86_64, with allmodconfig, as an example:
Before:
======
   text    data     bss     dec     hex filename
  18652    5920      64   24636    603c drivers/net/dsa/mv88e6xxx/devlink.o

After:
=====
   text    data     bss     dec     hex filename
  18498    5920      64   24482    5fa2 drivers/net/dsa/mv88e6xxx/devlink.o

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/2f4fca4ff84950da71e007c9169f18a0272476f3.1751200453.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


Christophe JAILLET [Sun, 29 Jun 2025 12:35:49 +0000 (14:35 +0200)]

net: dsa: mv88e6xxx: Constify struct devlink_region_ops and struct mv88e6xxx_region

'struct devlink_region_ops' and 'struct mv88e6xxx_region' are not modified
in this driver.

Constifying these structures moves some data to a read-only section, so
increases overall security, especially when the structure holds some
function pointers.

On x86_64, with allmodconfig, as an example:
Before:
======
   text    data     bss     dec     hex filename
  18076    6496      64   24636    603c drivers/net/dsa/mv88e6xxx/devlink.o

After:
=====
   text    data     bss     dec     hex filename
  18652    5920      64   24636    603c drivers/net/dsa/mv88e6xxx/devlink.o

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/46040062161dda211580002f950a6d60433243dc.1751200453.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


Eric Work [Sun, 29 Jun 2025 05:15:28 +0000 (22:15 -0700)]

net: atlantic: add set_power to fw_ops for atl2 to fix wol

Aquantia AQC113(C) using ATL2FW doesn't properly prepare the NIC for
enabling wake-on-lan. The FW operation `set_power` was only implemented
for `hw_atl` and not `hw_atl2`. Implement the `set_power` functionality
for `hw_atl2`.

Tested with both AQC113 and AQC113C devices. Confirmed you can shut down
the system and wake from S5 using magic packets. NIC was previously
powered off when entering S5. If the NIC was configured for WOL by the
Windows driver, loading the atlantic driver would disable WOL.

Partially cherry-picks changes from commit,
https://github.com/Aquantia/AQtion/commit/37bd5cc

Attributing original authors from Marvell for the referenced commit.

Closes: https://github.com/Aquantia/AQtion/issues/70
Co-developed-by: Igor Russkikh <irusskikh@marvell.com>
Co-developed-by: Mark Starovoitov <mstarovoitov@marvell.com>
Co-developed-by: Dmitry Bogdanov <dbogdanov@marvell.com>
Co-developed-by: Pavel Belous <pbelous@marvell.com>
Co-developed-by: Nikita Danilov <ndanilov@marvell.com>
Signed-off-by: Eric Work <work.eric@gmail.com>
Reviewed-by: Igor Russkikh <irusskikh@marvell.com>
Link: https://patch.msgid.link/20250629051535.5172-1-work.eric@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


Haiyang Zhang [Fri, 27 Jun 2025 20:26:23 +0000 (13:26 -0700)]

net: mana: Handle Reset Request from MANA NIC

Upon receiving the Reset Request, pause the connection and clean up
queues, wait for the specified period, then resume the NIC.
In the cleanup phase, the HWC is no longer responding, so set hwc_timeout
to zero to skip waiting on the response.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://patch.msgid.link/1751055983-29760-1-git-send-email-haiyangz@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


Oleksij Rempel [Fri, 27 Jun 2025 11:25:39 +0000 (13:25 +0200)]

phy: micrel: add Signal Quality Indicator (SQI) support for KSZ9477 switch PHYs

Add support for the Signal Quality Indicator (SQI) feature on KSZ9477
family switches, providing a relative measure of receive signal quality.

The hardware exposes separate SQI readings per channel. For 1000BASE-T,
all four channels are read. For 100BASE-TX, only one channel is reported,
but which receive pair is active depends on Auto MDI-X negotiation, which
is not exposed by the hardware. Therefore, it is not possible to reliably
map the measured channel to a specific wire pair.

This resolves an earlier discussion about how to handle multi-channel
SQI. Originally, the plan was to expose all channels individually.
However, since pair mapping is sometimes unavailable, this
implementation treats SQI as a per-link metric instead. This fallback
avoids ambiguity and ensures consistent behavior. The existing get_sqi()
UAPI was designed for single-pair Ethernet (SPE), where per-pair and
per-link are effectively equivalent. Restricting its use to per-link
metrics does not introduce regressions for existing users.

The raw 7-bit SQI value (0–127, lower is better) is converted to the
standard 0–7 scale (higher is better). Empirical testing showed that the
link becomes unstable around a raw value of 8.

The SQI raw value remains zero if no data is received, even if noise is
present. This confirms that the measurement reflects the "quality" during
active data reception rather than the passive line state. User space
must ensure that traffic is present on the link to obtain valid SQI
readings.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20250627112539.895255-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


Mina Almasry [Fri, 27 Jun 2025 20:04:52 +0000 (20:04 +0000)]

selftests: pp-bench: remove page_pool_put_page wrapper

Minor cleanup: remove the pointless looking _ wrapper around
page_pool_put_page, and just do the call directly.

Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://patch.msgid.link/20250627200501.1712389-2-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


Mina Almasry [Fri, 27 Jun 2025 20:04:51 +0000 (20:04 +0000)]

selftests: pp-bench: remove unneeded linux/version.h

linux/version.h was used by the out-of-tree version, but not needed in
the upstream one anymore.

While I'm at it, sort the includes.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202506271434.Gk0epC9H-lkp@intel.com/
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://patch.msgid.link/20250627200501.1712389-1-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


Nicolas Dichtel [Mon, 30 Jun 2025 14:54:54 +0000 (16:54 +0200)]

ip6_tunnel: enable to change proto of fb tunnels

This is possible via the ioctl API:
> ip -6 tunnel change ip6tnl0 mode any

Let's align the netlink API:
> ip link set ip6tnl0 type ip6tnl mode any

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://patch.msgid.link/20250630145602.1027220-1-nicolas.dichtel@6wind.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


Sebastian Andrzej Siewior [Mon, 30 Jun 2025 15:33:41 +0000 (17:33 +0200)]

selftests/tc-testing: Enable CONFIG_IP_SET

The config snippet specifies CONFIG_NET_EMATCH_IPSET. This option
depends on CONFIG_IP_SET.

Enable CONFIG_IP_SET as part of the tc-testing config.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://patch.msgid.link/20250630153341.Wgh3SzGi@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


Frank Li [Mon, 30 Jun 2025 16:16:12 +0000 (12:16 -0400)]

dt-bindings: net: convert nxp,lpc1850-dwmac.txt to yaml format

Convert nxp,lpc1850-dwmac.txt to yaml format.

Additional changes:
- add fallbacks to the compatible string: "nxp,lpc1850-dwmac", "snps,dwmac-3.611",
"snps,dwmac".
- add common interrupts, interrupt-names, clocks, clock-names, resets and
reset-names properties.
- add ref snps,dwmac.yaml.
- add phy-mode in example to avoid dt_binding_check warning.
- update examples to align with lpc18xx.dtsi.

Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20250630161613.2838039-1-Frank.Li@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


Dave Marquardt [Mon, 30 Jun 2025 16:23:53 +0000 (11:23 -0500)]

docs: netdevsim: fix typo in netdevsim documentation

Fix a typographical error in the "Rate objects" section.

Reviewed-by: Joe Damato <joe@dama.to>
Reviewed-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Dave Marquardt <davemarq@linux.ibm.com>
Link: https://patch.msgid.link/20250630-netdevsim-typo-fix-v3-1-e1eae3a5f018@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


Jakub Kicinski [Mon, 30 Jun 2025 15:40:53 +0000 (08:40 -0700)]

net: ethtool: fix leaking netdev ref if ethnl_default_parse() failed

Ido spotted that I made a mistake in commit under Fixes,
ethnl_default_parse() may acquire a dev reference even when it returns
an error. This may have been driven by the code structure in dumps
(which unconditionally release dev before handling errors), but it's
too much of a trap. Functions should undo what they did before returning
an error, rather than expecting caller to clean up.
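
The rule can be sketched in userspace as follows. This is not the ethtool
code: struct demo_dev, struct demo_req and demo_parse() are hypothetical, and
a plain integer stands in for the netdev reference count.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct demo_dev { int refcnt; };
struct demo_req { struct demo_dev *dev; };

static void demo_dev_hold(struct demo_dev *dev) { dev->refcnt++; }
static void demo_dev_put(struct demo_dev *dev)  { dev->refcnt--; }

/* Takes a reference, then validates the request. On failure the function
 * drops the reference it took, so the caller has nothing to undo. */
static int demo_parse(struct demo_req *req, struct demo_dev *dev, int valid)
{
	demo_dev_hold(dev);
	req->dev = dev;

	if (!valid) {
		demo_dev_put(dev);
		req->dev = NULL;
		return -EINVAL;
	}
	return 0;
}
```

With this contract a caller releases the device only after a successful
parse, which removes the trap described above.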

Rather than fixing ethnl_default_set_doit() directly, make
ethnl_default_parse() clean up after itself on error.

Reported-by: Ido Schimmel <idosch@idosch.org>
Link: https://lore.kernel.org/aGEPszpq9eojNF4Y@shredder
Fixes: 963781bdfe20 ("net: ethtool: call .parse_request for SET handlers")
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250630154053.1074664-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


Fushuai Wang [Sat, 28 Jun 2025 05:10:33 +0000 (13:10 +0800)]

sfc: siena: eliminate xdp_rxq_info_valid using XDP base API

Commit d48523cb88e0 ("sfc: Copy shared files needed for Siena (part 2)")
uses xdp_rxq_info_valid to track failures of xdp_rxq_info_reg().
However, this driver-maintained state becomes redundant since the XDP
framework already provides xdp_rxq_info_is_reg() for checking registration
status.

Signed-off-by: Fushuai Wang <wangfushuai@baidu.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://patch.msgid.link/20250628051033.51133-1-wangfushuai@baidu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


Fushuai Wang [Sat, 28 Jun 2025 05:10:16 +0000 (13:10 +0800)]

sfc: eliminate xdp_rxq_info_valid using XDP base API

Commit eb9a36be7f3e ("sfc: perform XDP processing on received packets")
uses xdp_rxq_info_valid to track failures of xdp_rxq_info_reg().
However, this driver-maintained state becomes redundant since the XDP
framework already provides xdp_rxq_info_is_reg() for checking registration
status.

Signed-off-by: Fushuai Wang <wangfushuai@baidu.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://patch.msgid.link/20250628051016.51022-1-wangfushuai@baidu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


RubenKelevra [Thu, 26 Jun 2025 20:59:07 +0000 (22:59 +0200)]

net: ieee8021q: fix insufficient table-size assertion

_Static_assert(ARRAY_SIZE(map) != IEEE8021Q_TT_MAX - 1) rejects only a
length of 7 and allows any other mismatch. Replace it with a strict
equality test via a helper macro so that every mapping table must have
exactly IEEE8021Q_TT_MAX (8) entries.
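
A standalone sketch of the difference, with DEMO_TT_MAX standing in for
IEEE8021Q_TT_MAX and DEMO_ASSERT_MAP_SIZE() as a hypothetical helper (the
macro actually added by the patch may be named and shaped differently):

```c
#include <assert.h>

#define DEMO_TT_MAX 8	/* stands in for IEEE8021Q_TT_MAX */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Strict check: the build fails unless the table has exactly DEMO_TT_MAX
 * entries. The old "!= DEMO_TT_MAX - 1" test accepted 5, 6, 9, ... entries. */
#define DEMO_ASSERT_MAP_SIZE(map) \
	static_assert(ARRAY_SIZE(map) == DEMO_TT_MAX, \
		      #map " must have exactly DEMO_TT_MAX entries")

static const int demo_map[] = { 0, 1, 2, 3, 4, 5, 6, 7 };
DEMO_ASSERT_MAP_SIZE(demo_map);
```

Removing one initializer from demo_map turns the mistake into a compile
error instead of an out-of-bounds lookup at run time.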

Signed-off-by: RubenKelevra <rubenkelevra@gmail.com>
Link: https://patch.msgid.link/20250626205907.1566384-1-rubenkelevra@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


Jakub Kicinski [Thu, 26 Jun 2025 19:15:53 +0000 (12:15 -0700)]

docs: fbnic: explain the ring config

fbnic takes 4 parameters to configure the Rx queues. The semantics
are similar to other existing NICs but confusing to newcomers.
Document it.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250626191554.32343-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


Oleksij Rempel [Thu, 26 Jun 2025 10:37:31 +0000 (12:37 +0200)]

net: usb: lan78xx: fix possible NULL pointer dereference in lan78xx_phy_init()

If no PHY device is found (e.g., for LAN7801 in fixed-link mode),
lan78xx_phy_init() may proceed to dereference a NULL phydev pointer,
leading to a crash.

Update the logic to perform MAC configuration first, then check for the presence
of a PHY. For the fixed-link case, set up the fixed link and return early,
bypassing any code that assumes a valid phydev pointer.

It is safe to move lan78xx_mac_prepare_for_phy() earlier because this function
only uses information from dev->interface, which is configured by
lan78xx_get_phy() beforehand. The function does not access phydev or any data
set up by later steps.

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Fixes: e110bc825897 ("net: usb: lan78xx: Convert to PHYLINK for improved PHY and MAC management")
Link: https://patch.msgid.link/20250626103731.3986545-1-o.rempel@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


Paolo Abeni [Tue, 1 Jul 2025 07:49:21 +0000 (09:49 +0200)]

Merge branch 'clean-up-usage-of-ffi-types'

Tamir Duberstein says:

====================
Clean up usage of ffi types

Remove qualification of ffi types which are included in the prelude and
change `as` casts to target the proper ffi type alias rather than the
underlying primitive.

Signed-off-by: Tamir Duberstein <tamird@gmail.com>
====================

Link: https://patch.msgid.link/20250625-correct-type-cast-v2-0-6f2c29729e69@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


Tamir Duberstein [Wed, 25 Jun 2025 12:25:39 +0000 (05:25 -0700)]

Cast to the proper type

Use the ffi type rather than the resolved underlying type.

Acked-by: FUJITA Tomonori <fujita.tomonori@gmail.com>
Signed-off-by: Tamir Duberstein <tamird@gmail.com>
Link: https://patch.msgid.link/20250625-correct-type-cast-v2-2-6f2c29729e69@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


Tamir Duberstein [Wed, 25 Jun 2025 12:25:38 +0000 (05:25 -0700)]

Use unqualified references to ffi types

Remove unnecessary qualifications; `kernel::ffi::*` is included in
`kernel::prelude`.

Signed-off-by: Tamir Duberstein <tamird@gmail.com>
Reviewed-by: FUJITA Tomonori <fujita.tomonori@gmail.com>
Link: https://patch.msgid.link/20250625-correct-type-cast-v2-1-6f2c29729e69@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


Eric Dumazet [Fri, 27 Jun 2025 16:32:42 +0000 (16:32 +0000)]

net: net->nsid_lock does not need BH safety

At the time of commit bc51dddf98c9 ("netns: avoid disabling irq
for netns id") peernet2id() was not yet using RCU.

Commit 2dce224f469f ("netns: protect netns
ID lookups with RCU") changed peernet2id() to no longer
acquire net->nsid_lock (potentially from BH context).

We do not need to block soft interrupts when acquiring
net->nsid_lock anymore.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Guillaume Nault <gnault@redhat.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250627163242.230866-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


Jakub Kicinski [Tue, 1 Jul 2025 01:23:57 +0000 (18:23 -0700)]

Merge branch 'net-enetc-change-some-statistics-to-64-bit'

Wei Fang says:

====================
net: enetc: change some statistics to 64-bit

The port MAC counters of ENETC are 64-bit registers and the statistics
of ethtool are also u64 type, so add enetc_port_rd64() helper function
to read 64-bit statistics from these registers, and also change the
statistics of ring to unsigned long type to be consistent with the
statistics type in struct net_device_stats.

v1: https://lore.kernel.org/20250620102140.2020008-1-wei.fang@nxp.com
v2: https://lore.kernel.org/20250624101548.2669522-1-wei.fang@nxp.com
====================

Link: https://patch.msgid.link/20250627021108.3359642-1-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


Wei Fang [Fri, 27 Jun 2025 02:11:08 +0000 (10:11 +0800)]

net: enetc: read 64-bit statistics from port MAC counters

The counters of port MAC are all 64-bit registers, and the statistics of
ethtool are u64 type, so replace enetc_port_rd() with enetc_port_rd64()
to read 64-bit statistics.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250627021108.3359642-4-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Wei Fang [Fri, 27 Jun 2025 02:11:07 +0000 (10:11 +0800)]

net: enetc: separate 64-bit counters from enetc_port_counters

Some counters in enetc_port_counters are 32-bit registers, and some are
64-bit registers. But in the current driver, they are all read through
enetc_port_rd(), which can only read a 32-bit value. Therefore, separate
64-bit counters (enetc_pm_counters) from enetc_port_counters and use
enetc_port_rd64() to read the 64-bit statistics.
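
The effect of reading a 64-bit counter through a 32-bit accessor can be sketched in user space. This is only an illustration: fake_counter, fake_rd32() and fake_rd64() are hypothetical stand-ins, not the ENETC driver's registers or accessors.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit hardware counter; not a real ENETC register. */
static uint64_t fake_counter;

/* Models a 32-bit accessor: only the low 32 bits of the counter come back. */
static uint32_t fake_rd32(void)
{
	return (uint32_t)fake_counter;
}

/* Models a 64-bit accessor: the whole counter comes back. */
static uint64_t fake_rd64(void)
{
	return fake_counter;
}

/* Usage: once the counter exceeds 32 bits, the two reads disagree. */
void rd_demo(void)
{
	fake_counter = 0x100000005ULL;	/* 2^32 + 5 */
	assert(fake_rd32() == 5);
	assert(fake_rd64() == 0x100000005ULL);
}
```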

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250627021108.3359642-3-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Wei Fang [Fri, 27 Jun 2025 02:11:06 +0000 (10:11 +0800)]

net: enetc: change the statistics of ring to unsigned long type

The statistics of the ring are all unsigned int type, so the statistics
will overflow quickly under heavy traffic. In addition, the statistics
of struct net_device_stats are obtained from struct enetc_ring_stats,
but the statistics of net_device_stats are unsigned long type. So it is
better to keep the statistics types consistent in these two structures.
Considering these two factors, and the fact that both LS1028A and i.MX95
are arm64 architecture, the statistics of enetc_ring_stats are changed
to unsigned long type. Note that unsigned int and unsigned long are the
same thing on some systems, and on such systems there is no overflow
advantage of one over the other.
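
As a rough worked example of the overflow concern, the helper below computes how long a counter of a given width lasts at a constant increment rate. The function name and the rate of one million increments per second are assumptions chosen for illustration, not figures from the driver.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Seconds until a counter of the given width wraps when it is
 * incremented 'rate' times per second. Illustrative arithmetic only.
 */
static uint64_t seconds_to_wrap(unsigned int bits, uint64_t rate)
{
	assert(bits >= 1 && bits <= 64 && rate > 0);

	/* UINT64_MAX >> (64 - bits) is the largest value the counter can hold. */
	uint64_t max = UINT64_MAX >> (64 - bits);

	return max / rate;
}
```

At that assumed rate, seconds_to_wrap(32, 1000000) gives 4294 seconds, a little over an hour, while a 64-bit counter lasts several hundred thousand years.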

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250627021108.3359642-2-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Jonas Rebmann [Thu, 26 Jun 2025 13:44:02 +0000 (15:44 +0200)]

net: fec: allow disable coalescing

In the current implementation, interrupt coalescing is always enabled and
cannot be disabled.

As setting maximum frames to 0 or 1, or setting delay to zero implies
immediate delivery of single packets/IRQs, disable coalescing in
hardware in these cases.

This also guarantees that coalescing is never enabled with ICFT or ICTT
set to zero, a configuration that could lead to unpredictable behaviour
according to i.MX8MP reference manual.
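
The rule described above can be sketched as a small predicate. coalescing_wanted() and its parameters are hypothetical names for illustration; this is not the FEC driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: decide whether coalescing should be enabled in
 * hardware for the requested frame count and delay.
 */
static bool coalescing_wanted(uint32_t max_frames, uint32_t usecs)
{
	/* 0 or 1 frames, or a zero delay, means immediate delivery per packet. */
	if (max_frames <= 1 || usecs == 0)
		return false;

	return true;
}

/* Usage: only a frame count above 1 with a non-zero delay enables coalescing. */
void coalesce_demo(void)
{
	assert(!coalescing_wanted(0, 100));
	assert(!coalescing_wanted(1, 100));
	assert(!coalescing_wanted(8, 0));
	assert(coalescing_wanted(8, 100));
}
```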

Signed-off-by: Jonas Rebmann <jre@pengutronix.de>
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20250626-fec_deactivate_coalescing-v2-1-0b217f2e80da@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Jakub Kicinski [Tue, 1 Jul 2025 01:14:25 +0000 (18:14 -0700)]

Merge branch 'add-support-for-externally-validated-neighbor-entries'

Ido Schimmel says:

====================
Add support for externally validated neighbor entries

Patch #1 adds a new neighbor flag ("extern_valid") that prevents the
kernel from invalidating or removing a neighbor entry, while allowing
the kernel to notify user space when the entry becomes reachable. See
motivation and implementation details in the commit message.

Patch #2 adds a selftest.

v1: https://lore.kernel.org/20250611141551.462569-1-idosch@nvidia.com
====================

Link: https://patch.msgid.link/20250626073111.244534-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Ido Schimmel [Thu, 26 Jun 2025 07:31:11 +0000 (10:31 +0300)]

selftests: net: Add a selftest for externally validated neighbor entries

Add test cases for externally validated neighbor entries, testing both
IPv4 and IPv6. Name the file "test_neigh.sh" so that it could be
possibly extended in the future with more neighbor test cases.

Example output:

# ./test_neigh.sh
TEST: IPv4 "extern_valid" flag: Add entry                           [ OK ]
TEST: IPv4 "extern_valid" flag: Add with an invalid state           [ OK ]
TEST: IPv4 "extern_valid" flag: Add with "use" flag                 [ OK ]
TEST: IPv4 "extern_valid" flag: Replace entry                       [ OK ]
TEST: IPv4 "extern_valid" flag: Replace entry with "managed" flag   [ OK ]
TEST: IPv4 "extern_valid" flag: Replace with an invalid state       [ OK ]
TEST: IPv4 "extern_valid" flag: Interface down                      [ OK ]
TEST: IPv4 "extern_valid" flag: Carrier down                        [ OK ]
TEST: IPv4 "extern_valid" flag: Transition to "reachable" state     [ OK ]
TEST: IPv4 "extern_valid" flag: Transition back to "stale" state    [ OK ]
TEST: IPv4 "extern_valid" flag: Forced garbage collection           [ OK ]
TEST: IPv4 "extern_valid" flag: Periodic garbage collection         [ OK ]
TEST: IPv6 "extern_valid" flag: Add entry                           [ OK ]
TEST: IPv6 "extern_valid" flag: Add with an invalid state           [ OK ]
TEST: IPv6 "extern_valid" flag: Add with "use" flag                 [ OK ]
TEST: IPv6 "extern_valid" flag: Replace entry                       [ OK ]
TEST: IPv6 "extern_valid" flag: Replace entry with "managed" flag   [ OK ]
TEST: IPv6 "extern_valid" flag: Replace with an invalid state       [ OK ]
TEST: IPv6 "extern_valid" flag: Interface down                      [ OK ]
TEST: IPv6 "extern_valid" flag: Carrier down                        [ OK ]
TEST: IPv6 "extern_valid" flag: Transition to "reachable" state     [ OK ]
TEST: IPv6 "extern_valid" flag: Transition back to "stale" state    [ OK ]
TEST: IPv6 "extern_valid" flag: Forced garbage collection           [ OK ]
TEST: IPv6 "extern_valid" flag: Periodic garbage collection         [ OK ]

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250626073111.244534-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Ido Schimmel [Thu, 26 Jun 2025 07:31:10 +0000 (10:31 +0300)]

neighbor: Add NTF_EXT_VALIDATED flag for externally validated entries

tl;dr
=====

Add a new neighbor flag ("extern_valid") that can be used to indicate to
the kernel that a neighbor entry was learned and determined to be valid
externally. The kernel will not try to remove or invalidate such an
entry, leaving these decisions to the user space control plane. This is
needed for EVPN multi-homing where a neighbor entry for a multi-homed
host needs to be synced across all the VTEPs among which the host is
multi-homed.

Background
==========

In a typical EVPN multi-homing setup each host is multi-homed using a
set of links called ES (Ethernet Segment, i.e., LAG) to multiple leaf
switches (VTEPs). VTEPs that are connected to the same ES are called ES
peers.

When a neighbor entry is learned on a VTEP, it is distributed to both ES
peers and remote VTEPs using EVPN MAC/IP advertisement routes. ES peers
use the neighbor entry when routing traffic towards the multi-homed host
and remote VTEPs use it for ARP/NS suppression.

Motivation
==========

If the ES link between a host and the VTEP on which the neighbor entry
was locally learned goes down, the EVPN MAC/IP advertisement route will
be withdrawn and the neighbor entries will be removed from both ES peers
and remote VTEPs. Routing towards the multi-homed host and ARP/NS
suppression can fail until another ES peer locally learns the neighbor
entry and distributes it via an EVPN MAC/IP advertisement route.

"draft-rbickhart-evpn-ip-mac-proxy-adv-03" [1] suggests avoiding these
intermittent failures by having the ES peers install the neighbor
entries as before, but also injecting EVPN MAC/IP advertisement routes
with a proxy indication. When the previously mentioned ES link goes down
and the original EVPN MAC/IP advertisement route is withdrawn, the ES
peers will not withdraw their neighbor entries, but instead start aging
timers for the proxy indication.

If an ES peer locally learns the neighbor entry (i.e., it becomes
"reachable"), it will restart its aging timer for the entry and emit an
EVPN MAC/IP advertisement route without a proxy indication. An ES peer
will stop its aging timer for the proxy indication if it observes the
removal of the proxy indication from at least one of the ES peers
advertising the entry.

In the event that the aging timer for the proxy indication expired, an
ES peer will withdraw its EVPN MAC/IP advertisement route. If the timer
expired on all ES peers and they all withdrew their proxy
advertisements, the neighbor entry will be completely removed from the
EVPN fabric.

Implementation
==============

In the above scheme, when the control plane (e.g., FRR) advertises a
neighbor entry with a proxy indication, it expects the corresponding
entry in the data plane (i.e., the kernel) to remain valid and not be
removed due to garbage collection or loss of carrier. The control plane
also expects the kernel to notify it if the entry was learned locally
(i.e., became "reachable") so that it will remove the proxy indication
from the EVPN MAC/IP advertisement route. That is why these entries
cannot be programmed with dummy states such as "permanent" or "noarp".

Instead, add a new neighbor flag ("extern_valid") which indicates that
the entry was learned and determined to be valid externally and should
not be removed or invalidated by the kernel. The kernel can probe the
entry and notify user space when it becomes "reachable" (it is initially
installed as "stale"). However, if the kernel does not receive a
confirmation, have it return the entry to the "stale" state instead of
the "failed" state.

In other words, an entry marked with the "extern_valid" flag behaves
like any other dynamically learned entry other than the fact that the
kernel cannot remove or invalidate it.
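
A simplified model of this behavior, written as a user-space sketch. The state names and helpers are hypothetical and deliberately do not reuse the kernel's NUD_* constants; other special cases (such as permanent entries) are ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state names for illustration; not the kernel's NUD_* constants. */
enum neigh_state { ST_STALE, ST_PROBE, ST_REACHABLE, ST_FAILED };

/* State a probed entry moves to once probing ends. */
static enum neigh_state after_probe(bool extern_valid, bool confirmed)
{
	if (confirmed)
		return ST_REACHABLE;	/* user space is notified in this case */

	/* No confirmation: an externally validated entry is never invalidated. */
	return extern_valid ? ST_STALE : ST_FAILED;
}

/* Whether the kernel may remove the entry (garbage collection, carrier loss). */
static bool kernel_may_remove(bool extern_valid)
{
	return !extern_valid;
}

/* Usage: the flag only changes the unconfirmed and removal cases. */
void neigh_demo(void)
{
	assert(after_probe(true, true) == ST_REACHABLE);
	assert(after_probe(false, true) == ST_REACHABLE);
	assert(after_probe(true, false) == ST_STALE);
	assert(after_probe(false, false) == ST_FAILED);
	assert(!kernel_may_remove(true));
}
```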

One can argue that the "extern_valid" flag should not prevent garbage
collection and that instead a neighbor entry should be programmed with
both the "extern_valid" and "extern_learn" flags. There are two reasons
for not doing that:

1. Unclear why a control plane would like to program an entry that the
   kernel cannot invalidate but can completely remove.

2. The "extern_learn" flag is used by FRR for neighbor entries learned
   on remote VTEPs (for ARP/NS suppression) whereas here we are
   concerned with local entries. This distinction is currently irrelevant
   for the kernel, but might be relevant in the future.

Given that the flag only makes sense when the neighbor has a valid
state, reject attempts to add a neighbor with an invalid state and with
this flag set. For example:

# ip neigh add 192.0.2.1 nud none dev br0.10 extern_valid
Error: Cannot create externally validated neighbor with an invalid state.
# ip neigh add 192.0.2.1 lladdr 00:11:22:33:44:55 nud stale dev br0.10 extern_valid
# ip neigh replace 192.0.2.1 nud failed dev br0.10 extern_valid
Error: Cannot mark neighbor as externally validated with an invalid state.

The above means that a neighbor cannot be created with the
"extern_valid" flag and flags such as "use" or "managed" as they result
in a neighbor being created with an invalid state ("none") and
immediately getting probed:

# ip neigh add 192.0.2.1 lladdr 00:11:22:33:44:55 nud stale dev br0.10 extern_valid use
Error: Cannot create externally validated neighbor with an invalid state.

However, these flags can be used together with "extern_valid" after the
neighbor was created with a valid state:

# ip neigh add 192.0.2.1 lladdr 00:11:22:33:44:55 nud stale dev br0.10 extern_valid
# ip neigh replace 192.0.2.1 lladdr 00:11:22:33:44:55 nud stale dev br0.10 extern_valid use

One consequence of preventing the kernel from invalidating a neighbor
entry is that by default it will only try to determine reachability
using unicast probes. This can be changed using the "mcast_resolicit"
sysctl:

# sysctl net.ipv4.neigh.br0/10.mcast_resolicit
0
# tcpdump -nn -e -i br0.10 -Q out arp &
# ip neigh replace 192.0.2.1 lladdr 00:11:22:33:44:55 nud stale dev br0.10 extern_valid use
62:50:1d:11:93:6f > 00:11:22:33:44:55, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28
62:50:1d:11:93:6f > 00:11:22:33:44:55, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28
62:50:1d:11:93:6f > 00:11:22:33:44:55, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28
# sysctl -wq net.ipv4.neigh.br0/10.mcast_resolicit=3
# ip neigh replace 192.0.2.1 lladdr 00:11:22:33:44:55 nud stale dev br0.10 extern_valid use
62:50:1d:11:93:6f > 00:11:22:33:44:55, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28
62:50:1d:11:93:6f > 00:11:22:33:44:55, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28
62:50:1d:11:93:6f > 00:11:22:33:44:55, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28
62:50:1d:11:93:6f > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28
62:50:1d:11:93:6f > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28
62:50:1d:11:93:6f > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.1 tell 192.0.2.2, length 28
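
The probe order seen in the capture can be modeled as follows. This is a sketch under assumptions: the unicast probe count of 3 is inferred from the capture above, and probe_kind_at() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>

enum probe_kind { PROBE_NONE, PROBE_UNICAST, PROBE_BROADCAST };

/*
 * Kind of the i-th probe (counting from 0): unicast probes come first,
 * followed by 'mcast_resolicit' broadcast/multicast probes, then nothing.
 */
static enum probe_kind probe_kind_at(unsigned int i, unsigned int ucast,
				     unsigned int mcast_resolicit)
{
	if (i < ucast)
		return PROBE_UNICAST;
	if (i < ucast + mcast_resolicit)
		return PROBE_BROADCAST;
	return PROBE_NONE;
}

/* Usage: mirrors the two captures, with mcast_resolicit set to 0 and then 3. */
void probe_demo(void)
{
	assert(probe_kind_at(0, 3, 0) == PROBE_UNICAST);
	assert(probe_kind_at(2, 3, 0) == PROBE_UNICAST);
	assert(probe_kind_at(3, 3, 0) == PROBE_NONE);
	assert(probe_kind_at(3, 3, 3) == PROBE_BROADCAST);
	assert(probe_kind_at(5, 3, 3) == PROBE_BROADCAST);
	assert(probe_kind_at(6, 3, 3) == PROBE_NONE);
}
```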

iproute2 patches can be found here [2].

[1] https://datatracker.ietf.org/doc/html/draft-rbickhart-evpn-ip-mac-proxy-adv-03
[2] https://github.com/idosch/iproute2/tree/submit/extern_valid_v1

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://patch.msgid.link/20250626073111.244534-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Eric Dumazet [Fri, 27 Jun 2025 11:58:21 +0000 (11:58 +0000)]

ipv6: guard ip6_mr_output() with rcu

syzbot found at least one path that leads to ip6_mr_output()
without RCU being held.

Add guard(rcu)() to fix this in a concise way.
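
For readers unfamiliar with scope-based guards, here is a user-space sketch of the idea using the GCC/Clang cleanup attribute. FAKE_GUARD_RCU(), fake_rcu_depth and output_path() are hypothetical names; the code only models "unlock automatically on every exit from the scope" and is not the kernel's implementation.

```c
#include <assert.h>

/* Hypothetical stand-in for the RCU read-side nesting depth. */
static int fake_rcu_depth;

/* Called automatically when the guard variable goes out of scope. */
static void fake_rcu_unlock(int *unused)
{
	(void)unused;
	fake_rcu_depth--;
}

/* Enter the fake read-side section; leave it automatically at scope exit. */
#define FAKE_GUARD_RCU() \
	int fake_guard __attribute__((cleanup(fake_rcu_unlock), unused)) = \
		(fake_rcu_depth++, 0)

static int output_path(int fail_early)
{
	FAKE_GUARD_RCU();

	if (fail_early)
		return -1;	/* the unlock still runs on this early return */

	assert(fake_rcu_depth > 0);	/* we are inside the guarded section */
	return 0;
}
```

The point of the pattern is that no return path needs an explicit unlock call, which keeps a one-line fix like the one above small.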

WARNING: net/ipv6/ip6mr.c:2376 at ip6_mr_output+0xe0b/0x1040 net/ipv6/ip6mr.c:2376, CPU#1: kworker/1:2/121
Call Trace:
<TASK>
  ip6tunnel_xmit include/net/ip6_tunnel.h:162 [inline]
  udp_tunnel6_xmit_skb+0x640/0xad0 net/ipv6/ip6_udp_tunnel.c:112
  send6+0x5ac/0x8d0 drivers/net/wireguard/socket.c:152
  wg_socket_send_skb_to_peer+0x111/0x1d0 drivers/net/wireguard/socket.c:178
  wg_packet_create_data_done drivers/net/wireguard/send.c:251 [inline]
  wg_packet_tx_worker+0x1c8/0x7c0 drivers/net/wireguard/send.c:276
  process_one_work kernel/workqueue.c:3239 [inline]
  process_scheduled_works+0xae1/0x17b0 kernel/workqueue.c:3322
  worker_thread+0x8a0/0xda0 kernel/workqueue.c:3403
  kthread+0x70e/0x8a0 kernel/kthread.c:464
  ret_from_fork+0x3fc/0x770 arch/x86/kernel/process.c:148
  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
</TASK>

Fixes: 96e8f5a9fe2d ("net: ipv6: Add ip6_mr_output()")
Reported-by: syzbot+0141c834e47059395621@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/685e86b3.a00a0220.129264.0003.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Roopa Prabhu <roopa@nvidia.com>
Cc: Benjamin Poirier <bpoirier@nvidia.com>
Cc: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20250627115822.3741390-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Jakub Kicinski [Mon, 30 Jun 2025 15:41:32 +0000 (08:41 -0700)]

Merge branch 'net-ethtool-consistently-take-rss_lock-for-all-rxfh-ops'

Jakub Kicinski says:

====================
net: ethtool: consistently take rss_lock for all rxfh ops

I'd like to bring RXFH and RXFHINDIR ioctls under a single set of
Netlink ops. It appears that while core takes the ethtool->rss_lock
around some of the RXFHINDIR ops, drivers (sfc) take it internally
for the RXFH.

Consistently take the lock around all ops and accesses to the XArray
within the core. This should hopefully make the rss_lock a lot less
confusing.
====================

Link: https://patch.msgid.link/20250626202848.104457-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Jakub Kicinski [Thu, 26 Jun 2025 20:28:48 +0000 (13:28 -0700)]

net: ethtool: move get_rxfh callback under the rss_lock

We already call get_rxfh under the rss_lock when we read back
context state after changes. Let's be consistent and always
hold the lock. The existing callers are all under rtnl_lock
so this should make no difference in practice, but it makes
the locking rules far less confusing IMHO. Any RSS callback
and any access to the RSS XArray should hold the lock.

Link: https://patch.msgid.link/20250626202848.104457-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Jakub Kicinski [Thu, 26 Jun 2025 20:28:47 +0000 (13:28 -0700)]

net: ethtool: move rxfh_fields callbacks under the rss_lock

Netlink code will want to perform the RSS_SET operation atomically
under the rss_lock. sfc wants to hold the rss_lock in rxfh_fields_get,
which makes that difficult. Let's move the locking up to the core
so that for all driver-facing callbacks rss_lock is taken consistently
by the core.

Link: https://patch.msgid.link/20250626202848.104457-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Jakub Kicinski [Thu, 26 Jun 2025 20:28:46 +0000 (13:28 -0700)]

net: ethtool: take rss_lock for all rxfh changes

Always take the rss_lock in ethtool_set_rxfh(). We will want to
make a similar change in ethtool_set_rxfh_fields() and some
drivers lock that callback regardless of rss context ID being set.
Having some callbacks locked unconditionally and some only if
context ID is set would be very confusing.

ethtool handling is under rtnl_lock, so rss_lock is very unlikely
to ever be congested.

Link: https://patch.msgid.link/20250626202848.104457-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Jakub Kicinski [Thu, 26 Jun 2025 23:39:26 +0000 (16:39 -0700)]

net: ethtool: avoid OOB accesses in PAUSE_SET

We now reuse .parse_request() from GET on SET, so we need to make sure
that the policies for both cover the attributes used for .parse_request().
genetlink will only allocate space in info->attrs for ARRAY_SIZE(policy).
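
The hazard can be illustrated with two tables of different lengths. The attribute ids and policy arrays below are hypothetical and much smaller than the real ethtool netlink ones; the sketch only shows why a parser shared between GET and SET must not index past the shorter policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical attribute ids; not the real ethtool netlink attributes. */
enum { ATTR_UNSPEC, ATTR_HEADER, ATTR_VALUE, ATTR_SRC };

/* The GET policy knows about ATTR_SRC; this SET policy stops one short. */
static const unsigned char get_policy[] = { [ATTR_SRC] = 1 };
static const unsigned char set_policy[] = { [ATTR_VALUE] = 1 };

/*
 * A shared parser that wants to look at attrs[ATTR_SRC] is only safe when
 * the attrs table, sized from the policy, actually has that slot.
 */
static bool slot_exists(size_t policy_len, size_t attr)
{
	return attr < policy_len;
}

/* Usage: the same attribute is in bounds for GET but not for SET. */
void policy_demo(void)
{
	assert(slot_exists(ARRAY_SIZE(get_policy), ATTR_SRC));
	assert(!slot_exists(ARRAY_SIZE(set_policy), ATTR_SRC));	/* would be OOB */
}
```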

Reported-by: syzbot+430f9f76633641a62217@syzkaller.appspotmail.com
Fixes: 963781bdfe20 ("net: ethtool: call .parse_request for SET handlers")
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250626233926.199801-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Fushuai Wang [Thu, 26 Jun 2025 05:30:03 +0000 (13:30 +0800)]

net/mlx5e: Fix error handling in RQ memory model registration

Currently when xdp_rxq_info_reg_mem_model() fails in the XSK path, the
error handling incorrectly jumps to err_destroy_page_pool. While this
may not cause errors, we should make it jump to the correct location.

Signed-off-by: Fushuai Wang <wangfushuai@baidu.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Acked-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Dan Carpenter [Wed, 25 Jun 2025 15:23:05 +0000 (10:23 -0500)]

octeontx2-af: Fix error code in rvu_mbox_init()

The error code was intended to be -EINVAL here, but it was accidentally
changed to returning success. Set the error code.

Fixes: e53ee4acb220 ("octeontx2-af: CN20k basic mbox operations and structures")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Eric Dumazet [Fri, 27 Jun 2025 11:46:41 +0000 (11:46 +0000)]

net: ipv4: guard ip_mr_output() with rcu

syzbot found at least one path that leads to ip_mr_output()
without RCU being held.

Add guard(rcu)() to fix this in a concise way.

WARNING: CPU: 0 PID: 0 at net/ipv4/ipmr.c:2302 ip_mr_output+0xbb1/0xe70 net/ipv4/ipmr.c:2302
Call Trace:
<IRQ>
  igmp_send_report+0x89e/0xdb0 net/ipv4/igmp.c:799
  igmp_timer_expire+0x204/0x510 net/ipv4/igmp.c:-1
  call_timer_fn+0x17e/0x5f0 kernel/time/timer.c:1747
  expire_timers kernel/time/timer.c:1798 [inline]
  __run_timers kernel/time/timer.c:2372 [inline]
  __run_timer_base+0x61a/0x860 kernel/time/timer.c:2384
  run_timer_base kernel/time/timer.c:2393 [inline]
  run_timer_softirq+0xb7/0x180 kernel/time/timer.c:2403
  handle_softirqs+0x286/0x870 kernel/softirq.c:579
  __do_softirq kernel/softirq.c:613 [inline]
  invoke_softirq kernel/softirq.c:453 [inline]
  __irq_exit_rcu+0xca/0x1f0 kernel/softirq.c:680
  irq_exit_rcu+0x9/0x30 kernel/softirq.c:696
  instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1050 [inline]
  sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1050

Fixes: 35bec72a24ac ("net: ipv4: Add ip_mr_output()")
Reported-by: syzbot+f02fb9e43bd85c6c66ae@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/685e841a.a00a0220.129264.0002.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Petr Machata <petrm@nvidia.com>
Cc: Roopa Prabhu <roopa@nvidia.com>
Cc: Nikolay Aleksandrov <razor@blackwall.org>
Cc: Benjamin Poirier <bpoirier@nvidia.com>
Cc: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Jakub Kicinski [Fri, 27 Jun 2025 23:56:02 +0000 (16:56 -0700)]

Merge branch 'octeontx2-pf-extend-link-modes-support'

Hariprasad Kelam says:

====================
Octeontx2-pf: extend link modes support

This series of patches adds multi advertise mode support along with
other improvements in link mode management code flow.

Patch1: Currently all SGMII modes 10/100/1000baseT are mapped to a
        single firmware mode. This patch maps these link modes
        to their corresponding firmware modes.

Patch2: Due to a limitation in the current kernel <-> firmware
        communication, link modes are divided into multiple groups
        and identified by their group index.

Patch3: Adds support for multi advertise mode.
====================

Link: https://patch.msgid.link/20250625092107.9746-1-hkelam@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Hariprasad Kelam [Wed, 25 Jun 2025 09:21:07 +0000 (14:51 +0530)]

Octeontx2-pf: ethtool: support multi advertise mode

The current implementation considers only the first advertised
mode and passes it to firmware for processing.
This patch extends the code so that the user can advertise
multiple modes on a given interface.

Below are the high-level changes:

1. Remove unnecessary speed/duplex/autoneg validation, as it is
already verified as part of "set_link_ksettings".

2. Since the scratch CSR framework is designed to support a single mode
at a time, use "shared firmware data" for multi-mode support.

Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Link: https://patch.msgid.link/20250625092107.9746-4-hkelam@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Hariprasad Kelam [Wed, 25 Jun 2025 09:21:06 +0000 (14:51 +0530)]

Octeontx2-af: Introduce mode group index

The kernel and firmware communicate via a scratch register which is
64 bits in size.

[MODE_ID   PORT    AUTONEG  DUPLEX  SPEED   CMD_ID   OWNERSHIP ]
 63-22     21-14   13       12      11-8    7-2      1-0

The existing MODE_ID bitmap (bits 63-22) can only support up to 42 modes.
To resolve the issue, the unused port field is modified as below:
            uint64_t reserved2:6;
            uint64_t mode_group_idx:2;

'mode_group_idx' categorizes the mode ID range to accommodate more modes.

To specify mode ID range of 0 - 41, this field will be 0.

To specify mode ID range of 42 - 83, this field will be 1.

The mode ID will still be expressed as 1 << (0 - 41), but mode_group_idx
decides the actual mode range.
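
The mapping from an absolute mode number to the (group index, bit) pair can be sketched as below. struct mode_req and encode_mode() are hypothetical names, the bit value is relative to the MODE_ID field rather than shifted into the register, and extending the same rule to group indexes 2 and 3 is an assumption beyond the two ranges described above.

```c
#include <assert.h>
#include <stdint.h>

#define MODES_PER_GROUP 42	/* MODE_ID occupies bits 63-22, i.e. 42 bits */

/* Hypothetical container for the two values sent to firmware. */
struct mode_req {
	uint64_t mode_id_bit;	/* 1 << (0 - 41), relative to the MODE_ID field */
	unsigned int group_idx;	/* value for the 2-bit mode_group_idx field */
};

static struct mode_req encode_mode(unsigned int mode)
{
	struct mode_req r;

	/* A 2-bit group index can address at most 4 groups. */
	assert(mode < 4 * MODES_PER_GROUP);

	r.group_idx = mode / MODES_PER_GROUP;
	r.mode_id_bit = 1ULL << (mode % MODES_PER_GROUP);
	return r;
}
```

For example, mode 42 encodes as group index 1 with bit 0 set, and mode 83 as group index 1 with bit 41 set.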

Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Link: https://patch.msgid.link/20250625092107.9746-3-hkelam@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Hariprasad Kelam [Wed, 25 Jun 2025 09:21:05 +0000 (14:51 +0530)]

Octeontx-pf: Update SGMII mode mapping

The current implementation maps ethtool link modes 10baseT/100baseT/1000baseT
to a single firmware mode, SGMII. This creates a problem for end users who want
to advertise only one of these speeds.

This patch addresses the issue by mapping each ethtool link mode
to a corresponding firmware mode. It also updates the mapping with the
new modes supported by firmware.

Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Link: https://patch.msgid.link/20250625092107.9746-2-hkelam@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Jakub Kicinski [Fri, 27 Jun 2025 23:38:05 +0000 (16:38 -0700)]

Merge branch 'dpll-add-reference-sync-feature'

Arkadiusz Kubalewski says:

====================
dpll: add Reference SYNC feature

The device may support the Reference SYNC feature, which allows the
combination of two inputs into an input pair. In this configuration,
clock signals from both inputs are used to synchronize the DPLL device.
The higher frequency signal is utilized for the loop bandwidth of the DPLL,
while the lower frequency signal is used to syntonize the output signal of
the DPLL device. This feature enables the provision of a high-quality loop
bandwidth signal from an external source.

A capable input provides a list of inputs that it can be bound with to
create a Reference SYNC pair. To control this feature, the user must request a
desired state for a target pin: use ``DPLL_PIN_STATE_CONNECTED`` to
enable or ``DPLL_PIN_STATE_DISCONNECTED`` to disable the feature. An input
pin can be bound to only one other pin at any given time.

Verify pins bind state/capabilities:
$ ./tools/net/ynl/pyynl/cli.py \
--spec Documentation/netlink/specs/dpll.yaml \
--do pin-get \
--json '{"id":0}'
{'board-label': 'CVL-SDP22',
'id': 0,
[...]
'reference-sync': [{'id': 1, 'state': 'disconnected'}],
[...]}

Bind the pins by setting connected state between them:
$ ./tools/net/ynl/pyynl/cli.py \
--spec Documentation/netlink/specs/dpll.yaml \
--do pin-set \
--json '{"id":0, "reference-sync":{"id":1, "state":"connected"}}'

Verify pins bind state:
$ ./tools/net/ynl/pyynl/cli.py \
--spec Documentation/netlink/specs/dpll.yaml \
--do pin-get \
--json '{"id":0}'
{'board-label': 'CVL-SDP22',
'id': 0,
[...]
'reference-sync': [{'id': 1, 'state': 'connected'}],
[...]}

Unbind the pins by setting disconnected state between them:
$ ./tools/net/ynl/pyynl/cli.py \
--spec Documentation/netlink/specs/dpll.yaml \
--do pin-set \
--json '{"id":0, "reference-sync":{"id":1, "state":"disconnected"}}'
====================

Link: https://patch.msgid.link/20250626135219.1769350-1-arkadiusz.kubalewski@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Arkadiusz Kubalewski [Thu, 26 Jun 2025 13:52:19 +0000 (15:52 +0200)]

ice: add ref-sync dpll pins

Implement reference sync input pin get/set callbacks, allow user space
control over dpll pin pairs capable of reference sync support.

Reviewed-by: Milena Olech <milena.olech@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Link: https://patch.msgid.link/20250626135219.1769350-4-arkadiusz.kubalewski@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Arkadiusz Kubalewski [Thu, 26 Jun 2025 13:52:18 +0000 (15:52 +0200)]

dpll: add reference sync get/set

Define function for reference sync pin registration and callback ops to
set/get current feature state.

Implement netlink handler to fill netlink messages with reference sync
pin configuration of capable pins (pin-get).

Implement netlink handler to call proper ops and configure reference
sync pin state (pin-set).

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Milena Olech <milena.olech@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Link: https://patch.msgid.link/20250626135219.1769350-3-arkadiusz.kubalewski@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Arkadiusz Kubalewski [Thu, 26 Jun 2025 13:52:17 +0000 (15:52 +0200)]

dpll: add reference-sync netlink attribute

Add new netlink attribute to allow user space configuration of reference
sync pin pairs, where both pins are used to provide one clock signal
consisting of both: base frequency and sync signal.

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Milena Olech <milena.olech@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Link: https://patch.msgid.link/20250626135219.1769350-2-arkadiusz.kubalewski@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Jakub Kicinski [Fri, 27 Jun 2025 23:27:08 +0000 (16:27 -0700)]

Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
ice: remaining TSPLL cleanups

These are the remaining patches from the "ice: Separate TSPLL from PTP
and cleanup" series [1] with control flow macros removed. What remains
are cleanups and some minor improvements.

[1] https://lore.kernel.org/netdev/20250618174231.3100231-1-anthony.l.nguyen@intel.com/

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  ice: default to TIME_REF instead of TXCO on E825-C
  ice: move TSPLL init calls to ice_ptp.c
  ice: fall back to TCXO on TSPLL lock fail
  ice: wait before enabling TSPLL
  ice: add multiple TSPLL helpers
  ice: use bitfields instead of unions for CGU regs
  ice: read TSPLL registers again before reporting status
  ice: clear time_sync_en field for E825-C during reprogramming
====================

Link: https://patch.msgid.link/20250626162921.1173068-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Jakub Kicinski [Thu, 26 Jun 2025 16:54:41 +0000 (09:54 -0700)]

eth: bnxt: take page size into account for page pool recycling rings

The Rx rings are filled with Rx buffers, which are supposed to fit
packet headers (or MTU if HW-GRO is disabled). The aggregation buffers
are filled with "device pages". Adjust the sizes of the page pool
recycling ring appropriately, based on ratio of the size of the
buffer on given ring vs system page size. Otherwise on a system
with 64kB pages we end up with >700MB of memory sitting in every
single page pool cache.

Correct the size calculation for the head_pool. Since the buffers
there are always small I'm pretty sure I meant to cap the size
at 1k, rather than make it the lowest possible size. With 64k pages
1k cache with a 1k ring is 64x larger than we need.
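
The scaling idea can be sketched as a small calculation: size the recycling ring by the number of system pages needed to back the buffers, optionally capped. pool_entries() and the figures in the usage note are hypothetical; the driver's actual computation may differ.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of page pool entries for a ring of 'ring_entries' buffers of
 * 'buf_size' bytes on a system with 'page_size'-byte pages, rounded up.
 * A non-zero 'cap' limits the result. Illustrative arithmetic only.
 */
static unsigned int pool_entries(unsigned int ring_entries, unsigned int buf_size,
				 unsigned int page_size, unsigned int cap)
{
	assert(page_size > 0);

	uint64_t bytes = (uint64_t)ring_entries * buf_size;
	uint64_t pages = (bytes + page_size - 1) / page_size;

	if (cap && pages > cap)
		pages = cap;

	return (unsigned int)pages;
}
```

With an assumed ring of 2048 buffers of 4096 bytes each, this yields 2048 entries on a 4 kB page system but only 128 entries with 64 kB pages, since one large page backs 16 buffers.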

Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250626165441.4125047-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Jakub Kicinski [Fri, 27 Jun 2025 22:35:08 +0000 (15:35 -0700)]

Merge branch 'tcp-fix-dsack-bug-with-non-contiguous-ranges'

Eric Dumazet says:

====================
tcp: fix DSACK bug with non contiguous ranges

This series combines a fix from xin.guo and a new packetdrill test.
====================

Link: https://patch.msgid.link/20250626123420.1933835-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Eric Dumazet [Thu, 26 Jun 2025 12:34:20 +0000 (12:34 +0000)]

selftests/net: packetdrill: add tcp_dsack_mult.pkt

Test DSACK behavior with non contiguous ranges.

Without the prior fix (tcp: fix tcp_ofo_queue() to avoid including
too much DUP SACK range) this would fail with:

tcp_dsack_mult.pkt:37: error handling packet: bad value outbound TCP option 5
script packet: 0.100682 . 1:1(0) ack 6001 <nop,nop,sack 1001:3001 7001:8001>
actual packet: 0.100679 . 1:1(0) ack 6001 win 1097 <nop,nop,sack 1001:6001 7001:8001>

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: xin.guo <guoxin0309@gmail.com>
Link: https://patch.msgid.link/20250626123420.1933835-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

xin.guo [Thu, 26 Jun 2025 12:34:19 +0000 (12:34 +0000)]

tcp: fix tcp_ofo_queue() to avoid including too much DUP SACK range

If an incoming segment whose seq is equal to rcv_nxt covers more than
one skb in the ofo queue, the duplicated sequence range is sent as
DUP SACK. In the trace below, the {501:2001} range in step 6 clearly
includes too much DUP SACK range, in violation of RFC 2883 rules.

1. client > server: Flags [.], seq 501:1001, ack 1325288529, win 20000, length 500
2. server > client: Flags [.], ack 1, [nop,nop,sack 1 {501:1001}], length 0
3. client > server: Flags [.], seq 1501:2001, ack 1325288529, win 20000, length 500
4. server > client: Flags [.], ack 1, [nop,nop,sack 2 {1501:2001} {501:1001}], length 0
5. client > server: Flags [.], seq 1:2001, ack 1325288529, win 20000, length 2000
6. server > client: Flags [.], ack 2001, [nop,nop,sack 1 {501:2001}], length 0

After this fix, the final ACK is as below:

6. server > client: Flags [.], ack 2001, options [nop,nop,sack 1 {501:1001}], length 0
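
The expected outcome in the trace can be modelled with a small
userspace sketch. It is not the tcp_ofo_queue() logic: struct seq_range
and first_dup_block() are hypothetical names, the out-of-order ranges
are assumed to be sorted and non-overlapping, and sequence number
wraparound is ignored. The sketch reports the first duplicated range
and extends it only across directly adjacent ranges, never across a
gap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open sequence range [start, end). Hypothetical type. */
struct seq_range {
    uint32_t start;
    uint32_t end;
};

/* Returns 1 and fills *out with the first block of seg that duplicates
 * already queued data, or returns 0 if nothing is duplicated.
 * ofo[] must be sorted by start and non-overlapping. */
static int first_dup_block(struct seq_range seg,
                           const struct seq_range *ofo, size_t n,
                           struct seq_range *out)
{
    assert(out != NULL);

    for (size_t i = 0; i < n; i++) {
        if (ofo[i].end <= seg.start || ofo[i].start >= seg.end)
            continue; /* no overlap with the new segment */

        out->start = ofo[i].start > seg.start ? ofo[i].start : seg.start;
        out->end = ofo[i].end < seg.end ? ofo[i].end : seg.end;

        /* Extend across directly adjacent ranges only; stop at a gap. */
        for (size_t j = i + 1;
             j < n && ofo[j].start == out->end && out->end < seg.end;
             j++)
            out->end = ofo[j].end < seg.end ? ofo[j].end : seg.end;

        return 1;
    }
    return 0;
}
```

With the queue holding 501:1001 and 1501:2001, a new segment 1:2001
yields 501:1001 in this model, matching the final ACK shown above.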

[edumazet] added a new packetdrill test in the following patch.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: xin.guo <guoxin0309@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250626123420.1933835-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Jakub Kicinski [Fri, 27 Jun 2025 22:34:20 +0000 (15:34 -0700)]

Merge branch 'tcp-remove-rtx_syn_ack-and-inet_rtx_syn_ack'

Eric Dumazet says:

====================
tcp: remove rtx_syn_ack and inet_rtx_syn_ack()

After DCCP removal, we can clean up SYNACK retransmits a bit.
====================

Link: https://patch.msgid.link/20250626153017.2156274-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Eric Dumazet [Thu, 26 Jun 2025 15:30:17 +0000 (15:30 +0000)]

tcp: remove inet_rtx_syn_ack()

inet_rtx_syn_ack() becomes a simple wrapper around tcp_rtx_synack()
if we move the req->num_retrans update.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250626153017.2156274-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Eric Dumazet [Thu, 26 Jun 2025 15:30:16 +0000 (15:30 +0000)]

tcp: remove rtx_syn_ack field

Now that inet_rtx_syn_ack() is only used by TCP, it can directly
call tcp_rtx_synack() instead of using an indirect call
to req->rsk_ops->rtx_syn_ack().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250626153017.2156274-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Jakub Kicinski [Fri, 27 Jun 2025 22:14:58 +0000 (15:14 -0700)]

Merge branch 'net-dsa-ks8995-fix-up-bindings'

Linus Walleij says:

====================
net: dsa: ks8995: Fix up bindings

After looking at the datasheets for KS8995 I realized this is
a DSA switch and needs to have DT bindings as such and be implemented
as such.

This series just fixes up the bindings and the offending device tree.

The existing kernel driver which is in drivers/net/phy/spi_ks8995.c
does not implement DSA. It can be forgiven for this because it was
merged in 2011 and the DSA framework was not widely established
back then. It continues to probe fine but needs to be rewritten
to use the special DSA tag and moved to drivers/net/dsa as time
permits. (I hope I can do this.)

It's fine for the networking tree to merge both patches, I maintain
ixp4xx as well. But I can also carry the second patch through the
SoC tree if so desired.

v1: https://lore.kernel.org/20250624-ks8995-dsa-bindings-v1-0-71a8b4f63315@linaro.org
====================

Link: https://patch.msgid.link/20250625-ks8995-dsa-bindings-v2-0-ce71dce9be0b@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Linus Walleij [Wed, 25 Jun 2025 06:51:25 +0000 (08:51 +0200)]

ARM: dts: Fix up wrv54g device tree

Fix up the KS8995 switch and PHYs the way that is most likely:

- PHYs 1-4 are certainly the PHYs of the KS8995 (mask 0x1e in
  the out-of-tree code masks PHYs 1,2,3,4).
- Phy 5 is the MII-P5 separate WAN phy of the KS8995 directly
  connected to EthC.
- The EthB MII is probably connected as CPU interface to the
  KS8995.

Properly integrate the KS8995 switch using the new bindings.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250625-ks8995-dsa-bindings-v2-2-ce71dce9be0b@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Linus Walleij [Wed, 25 Jun 2025 06:51:24 +0000 (08:51 +0200)]

dt-bindings: dsa: Rewrite Micrel KS8995 in schema

After studying the datasheets for some of the KS8995 variants
it becomes pretty obvious that this is a straightforward
and simple MII DSA switch with one port in (CPU) and four outgoing
ports, and it even supports custom tags by setting a bit in
a special register, and elaborate VLAN handling as all DSA
switches do.

What is a bit odd with KS8995 is that it uses an extra MII-P5
port to access one of the PHYs separately, on the side of the
switch fabric, such as when using a WAN port separately from
a LAN switch in a home router.

Rewrite the terse bindings to YAML, and move to the proper
subdirectory. Include a verbose example to make things clear.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20250625-ks8995-dsa-bindings-v2-1-ce71dce9be0b@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Paul Kocialkowski [Thu, 26 Jun 2025 08:09:21 +0000 (10:09 +0200)]

dt-bindings: net: sun8i-emac: Add A100 EMAC compatible

The Allwinner A100/A133 has an Ethernet MAC (EMAC) controller that is
compatible with the A64 one. It features the same syscon registers for
control of the top-level integration of the unit.

Signed-off-by: Paul Kocialkowski <paulk@sys-base.io>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20250626080923.632789-4-paulk@sys-base.io
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Jakub Kicinski [Fri, 27 Jun 2025 22:09:04 +0000 (15:09 -0700)]

Merge branch 'nfc-trf7970a-add-option-to-reduce-antenna-gain'

Paul Geurts says:

====================
NFC: trf7970a: Add option to reduce antenna gain

The TRF7970a device is sensitive to RF disturbances, which can make it
hard to pass some EMC immunity tests. By reducing the RX antenna gain,
the device becomes less sensitive to EMC disturbances, as a trade-off
against antenna performance.
====================

Link: https://patch.msgid.link/20250626141242.3749958-1-paul.geurts@prodrive-technologies.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Paul Geurts [Thu, 26 Jun 2025 14:12:42 +0000 (16:12 +0200)]

NFC: trf7970a: Create device-tree parameter for RX gain reduction

The TRF7970a device is sensitive to RF disturbances, which can make it
hard to pass some EMC immunity tests. By reducing the RX antenna gain,
the device becomes less sensitive to EMC disturbances, as a trade-off
against antenna performance.

Add a device tree option to select RX gain reduction to improve EMC
performance.

Selecting a communication standard in the ISO control register resets
the RX antenna gain settings. Therefore set the RX gain reduction
every time the ISO control register changes, when the option is used.

Signed-off-by: Paul Geurts <paul.geurts@prodrive-technologies.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20250626141242.3749958-3-paul.geurts@prodrive-technologies.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Paul Geurts [Thu, 26 Jun 2025 14:12:41 +0000 (16:12 +0200)]

dt-bindings: net/nfc: ti,trf7970a: Add ti,rx-gain-reduction-db option

Add option to reduce the RX antenna gain to be able to reduce the
sensitivity.

Signed-off-by: Paul Geurts <paul.geurts@prodrive-technologies.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20250626141242.3749958-2-paul.geurts@prodrive-technologies.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Frank Li [Tue, 24 Jun 2025 20:20:27 +0000 (16:20 -0400)]

dt-bindings: net: convert lpc-eth.txt yaml format

Convert lpc-eth.txt to YAML format.

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20250624202028.2516257-1-Frank.Li@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Jakub Kicinski [Fri, 27 Jun 2025 20:22:59 +0000 (13:22 -0700)]

Merge branch 'ref_tracker-fix'

Merge a fix from Jeff from a stable commit ID:

* ref_tracker: do xarray and workqueue job initializations earlier

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Jeff Layton [Thu, 26 Jun 2025 12:52:14 +0000 (08:52 -0400)]

ref_tracker: do xarray and workqueue job initializations earlier

The kernel test robot reported an oops that occurred when attempting to
deregister a dentry from the xarray during subsys_initcall().

The ref_tracker xarrays and workqueue job are being initialized in
late_initcall() which is too late. Move those to postcore_initcall()
instead.

Fixes: 65b584f53611 ("ref_tracker: automatically register a file in debugfs for a ref_tracker_dir")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202506251406.c28f2adb-lkp@intel.com
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250626-reftrack-dbgfs-v1-1-812102e2a394@kernel.org

Simon Horman [Wed, 25 Jun 2025 12:52:10 +0000 (13:52 +0100)]

tg3: spelling corrections

Correct spelling as flagged by codespell.

Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Christian Marangi [Tue, 17 Jun 2025 09:16:53 +0000 (11:16 +0200)]

net: mdio: Add MDIO bus controller for Airoha AN7583

The Airoha AN7583 SoC has 2 dedicated MDIO bus controllers in the SCU
register map. The driver registers an MDIO controller based on the DT
reg property and accesses the registers through the parent syscon.

The MDIO bus logic is similar to the MT7530 internal MDIO bus but
deviates in some settings and has a HW bug.

On Airoha AN7583 the MDIO clock is set to 25MHz by default and needs to
be set to 2.5MHz to work correctly (by setting the divisor
to 10).

There seems to be a hardware bug where AN7583_MII_RWDATA
is not cleared when an unconnected PHY is read, so the
previous read value is returned.

Example (only one PHY on the bus, at 0x1f):
- a read at 0x1f reports 0x7500 at 0x2
- a read at 0x0 then reports 0x7500 on every address

To work around this, we reset the MDIO bus on every read
to get consistent values from read operations.
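
The stale-value problem and the reset-before-read workaround can be
shown with a toy simulation. Nothing here is driver code: struct
sim_mdio_bus and the sim_*() helpers are hypothetical, and the value
left in the data register after a reset (0xffff) is an assumption made
for the sketch.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_PHY_ADDR  0x1f   /* the only PHY on the simulated bus */
#define SIM_NO_DEVICE 0xffff /* assumed data register value after reset */

struct sim_mdio_bus {
    uint16_t rwdata;       /* stands in for AN7583_MII_RWDATA */
    uint16_t phy_regs[32]; /* registers of the single PHY */
};

/* Assumption: a bus reset clears the latched data register. */
static void sim_bus_reset(struct sim_mdio_bus *bus)
{
    bus->rwdata = SIM_NO_DEVICE;
}

/* Models the bug: the data register is only updated when a PHY
 * answers, so a read of an empty address returns the old value. */
static uint16_t sim_read_raw(struct sim_mdio_bus *bus,
                             unsigned int addr, unsigned int reg)
{
    assert(reg < 32);
    if (addr == SIM_PHY_ADDR)
        bus->rwdata = bus->phy_regs[reg];
    return bus->rwdata;
}

/* Workaround: reset the bus before every read. */
static uint16_t sim_read(struct sim_mdio_bus *bus,
                         unsigned int addr, unsigned int reg)
{
    sim_bus_reset(bus);
    return sim_read_raw(bus, addr, reg);
}
```

In this model, a raw read at 0x0 issued right after a read at 0x1f
returns the PHY's value, while sim_read() returns the reset value for
the empty address.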

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

Christian Marangi [Tue, 17 Jun 2025 09:16:52 +0000 (11:16 +0200)]

dt-bindings: net: Document support for Airoha AN7583 MDIO Controller

The Airoha AN7583 SoC has 3 different MDIO controllers. One comes from
the integrated switch based on MT7530. The other 2 live under the SCU
register map as 2 dedicated MDIO controllers.

Document the schema for the 2 dedicated MDIO controllers.
Each MDIO controller can be independently reset with the SoC reset line.
Each MDIO controller has a dedicated clock configured to 2.5MHz by
default to follow the MDIO bus IEEE 802.3 standard.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Jakub Kicinski [Fri, 27 Jun 2025 00:54:10 +0000 (17:54 -0700)]

Merge branch 'ptp-belated-spring-cleaning-of-the-chardev-driver'

Thomas Gleixner says:

====================
ptp: Belated spring cleaning of the chardev driver

When looking into supporting auxiliary clocks in the PTP ioctl, the
impenetrable ptp_ioctl() letter soup bothered me enough to clean it up.

The code (~400 lines!) is really hard to follow due to a gazillion of
local variables, which are only used in certain case scopes, and a
mixture of gotos, breaks and direct error return paths.

Clean it up by splitting out the IOCTL functionality into separate
functions, which contain only the required local variables and are trivial
to follow. Complete the cleanup by converting the code to lock guards and
get rid of all gotos.

That reduces the code size by 48 lines and also the binary text size is
80 bytes smaller than the current maze.

The series is split up into one patch per IOCTL command group for easy
review.

v1: https://lore.kernel.org/20250620130144.351492917@linutronix.de
====================

Link: https://patch.msgid.link/20250625114404.102196103@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Thomas Gleixner [Wed, 25 Jun 2025 11:52:39 +0000 (13:52 +0200)]

ptp: Simplify ptp_read()

The mixture of gotos and direct return codes is inconsistent and just makes
the code harder to read. Let it consistently return error codes directly and
tidy the code flow up accordingly.

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20250625115133.486953538@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Thomas Gleixner [Wed, 25 Jun 2025 11:52:38 +0000 (13:52 +0200)]

ptp: Convert chardev code to lock guards

Convert the various spin_lock_irqsave() protected critical regions to
scoped guards. Use spinlock_irq instead of spinlock_irqsave as all the
functions are invoked in thread context with interrupts enabled.

No functional change intended.
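
The idea behind a scoped guard can be sketched in userspace C with the
compiler's cleanup attribute (GCC and Clang), which the kernel's guards
also build on. This is not the ptp code: struct toy_lock, the
SCOPED_TOY_LOCK() macro and counter_inc_below() are hypothetical
stand-ins, and the toy lock is a plain flag rather than a spinlock.

```c
#include <assert.h>
#include <stddef.h>

/* Toy lock: a flag that records whether the lock is held. */
struct toy_lock {
    int held;
};

static struct toy_lock *toy_lock_acquire(struct toy_lock *l)
{
    assert(!l->held);
    l->held = 1;
    return l;
}

/* Cleanup handler: receives a pointer to the guard variable. */
static void toy_lock_release(struct toy_lock **l)
{
    (*l)->held = 0;
}

/* Hypothetical macro: take the lock now and drop it automatically
 * when the enclosing scope is left, on any return path. */
#define SCOPED_TOY_LOCK(l)                                          \
    struct toy_lock *scope_guard_                                   \
        __attribute__((cleanup(toy_lock_release))) = toy_lock_acquire(l)

struct counter {
    struct toy_lock lock;
    int value;
};

/* Increments the counter unless it has reached limit. */
static int counter_inc_below(struct counter *c, int limit)
{
    assert(c != NULL);
    SCOPED_TOY_LOCK(&c->lock);

    if (c->value >= limit)
        return -1; /* early return, the lock is still released */

    c->value++;
    return 0;
}
```

Both return paths in counter_inc_below() leave the lock released
without an explicit unlock call or a goto label.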

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20250625115133.425029269@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Thomas Gleixner [Wed, 25 Jun 2025 11:52:36 +0000 (13:52 +0200)]

ptp: Split out PTP_MASK_EN_SINGLE ioctl code

Finish the ptp_ioctl() cleanup by splitting out the PTP_MASK_EN_SINGLE
ioctl code and removing the remaining local variables and return
statements.

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250625115133.364422719@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Thomas Gleixner [Wed, 25 Jun 2025 11:52:35 +0000 (13:52 +0200)]

ptp: Split out PTP_MASK_CLEAR_ALL ioctl code

Continue the ptp_ioctl() cleanup by splitting out the PTP_MASK_CLEAR_ALL ioctl
code into a helper function.

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250625115133.302755618@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Thomas Gleixner [Wed, 25 Jun 2025 11:52:34 +0000 (13:52 +0200)]

ptp: Split out PTP_PIN_SETFUNC ioctl code

Continue the ptp_ioctl() cleanup by splitting out the PTP_PIN_SETFUNC ioctl
code into a helper function. Convert to lock guard while at it and remove
the pointless memset of the pd::rsv because nothing uses it.

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250625115133.241503804@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Thomas Gleixner [Wed, 25 Jun 2025 11:52:33 +0000 (13:52 +0200)]

ptp: Split out PTP_PIN_GETFUNC ioctl code

Continue the ptp_ioctl() cleanup by splitting out the PTP_PIN_GETFUNC ioctl
code into a helper function. Convert to lock guard while at it and remove
the pointless memset of the pd::rsv because nothing uses it.

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250625115133.177265865@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Thomas Gleixner [Wed, 25 Jun 2025 11:52:31 +0000 (13:52 +0200)]

ptp: Split out PTP_SYS_OFFSET ioctl code

Continue the ptp_ioctl() cleanup by splitting out the PTP_SYS_OFFSET ioctl
code into a helper function.

Convert it to __free() to avoid gotos.

No functional change intended.
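
How automatic freeing removes the need for goto-based error paths can
be sketched in userspace C with the compiler's cleanup attribute (GCC
and Clang). This is not the ptp code: AUTO_FREE, free_ptr() and
sum_scaled() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Cleanup handler: receives a pointer to the annotated variable. */
static void free_ptr(void *p)
{
    free(*(void **)p);
}

/* Hypothetical annotation: free the pointer when it leaves scope. */
#define AUTO_FREE __attribute__((cleanup(free_ptr)))

/* Scales src[] into a temporary buffer and sums the result.
 * Returns 0 on success, -1 on allocation failure and -2 if a
 * negative input is found. */
static int sum_scaled(const int *src, size_t n, int factor, long *out)
{
    assert(src != NULL && out != NULL && n > 0);

    AUTO_FREE int *tmp = malloc(n * sizeof(*tmp));
    long total = 0;

    if (!tmp)
        return -1;

    for (size_t i = 0; i < n; i++) {
        if (src[i] < 0)
            return -2; /* early return, tmp is freed automatically */
        tmp[i] = src[i] * factor;
        total += tmp[i];
    }

    *out = total;
    return 0; /* tmp is freed here as well */
}
```

Every exit from sum_scaled() releases the buffer, so no unwind label
is needed.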

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20250625115133.113841216@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Thomas Gleixner [Wed, 25 Jun 2025 11:52:30 +0000 (13:52 +0200)]

ptp: Split out PTP_SYS_OFFSET_EXTENDED ioctl code

Continue the ptp_ioctl() cleanup by splitting out the
PTP_SYS_OFFSET_EXTENDED ioctl code into a helper function.

Convert it to __free() to avoid gotos.

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250625115133.050445505@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Thomas Gleixner [Wed, 25 Jun 2025 11:52:29 +0000 (13:52 +0200)]

ptp: Split out PTP_SYS_OFFSET_PRECISE ioctl code

Continue the ptp_ioctl() cleanup by splitting out the PTP_SYS_OFFSET_PRECISE
ioctl code into a helper function.

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250625115132.986897454@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Thomas Gleixner [Wed, 25 Jun 2025 11:52:28 +0000 (13:52 +0200)]

ptp: Split out PTP_ENABLE_PPS ioctl code

Continue the ptp_ioctl() cleanup by splitting out the PTP_ENABLE_PPS
ioctl code into a helper function. Convert to a lock guard while at it.

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250625115132.923803136@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

A mirror of Linus' kernel repository
