git.ipfire.org Git - thirdparty/kernel/linux.git/log

af_unix: Move internal definitions to net/unix/.

net/af_unix.h is included by core and some LSMs, but most definitions
need not be.

Let's move struct unix_{vertex,edge} to net/unix/garbage.c and other
definitions to net/unix/af_unix.h.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250318034934.86708-3-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

af_unix: Sort headers.

This is a prep patch to make the following changes cleaner.

No functional change intended.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250318034934.86708-2-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'support-tcp_rto_min_us-and-tcp_delack_max_us-for-set-getsockopt'

Jason Xing says:

====================
support TCP_RTO_MIN_US and TCP_DELACK_MAX_US for set/getsockopt

Add set/getsockopt supports for TCP_RTO_MIN_US and TCP_DELACK_MAX_US.
====================

Link: https://patch.msgid.link/20250317120314.41404-1-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: support TCP_DELACK_MAX_US for set/getsockopt use

Support adjusting/reading delayed ack max for socket level by using
set/getsockopt().

This option aligns with TCP_BPF_DELACK_MAX usage. Considering that bpf
option was implemented before this patch, so we need to use a standalone
new option for pure tcp set/getsockopt() use.

Add WRITE_ONCE/READ_ONCE() to prevent data-race if setsockopt()
happens to write one value to icsk_delack_max while icsk_delack_max is
being read.

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250317120314.41404-3-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: support TCP_RTO_MIN_US for set/getsockopt use

Support adjusting/reading RTO MIN for socket level by using set/getsockopt().

This new option has the same effect as TCP_BPF_RTO_MIN, which means it
doesn't affect RTAX_RTO_MIN usage (by using ip route...). Considering that
bpf option was implemented before this patch, so we need to use a standalone
new option for pure tcp set/getsockopt() use.

When the socket is created, its icsk_rto_min is set to the default
value that is controlled by sysctl_tcp_rto_min_us. Then if application
calls setsockopt() with TCP_RTO_MIN_US flag to pass a valid value, then
icsk_rto_min will be overridden in jiffies unit.

This patch adds WRITE_ONCE/READ_ONCE to avoid data-race around
icsk_rto_min.

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250317120314.41404-2-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'mlxsw-add-vxlan-to-the-same-hardware-domain-as-physical-bridge-ports'

Petr Machata says:

====================
mlxsw: Add VXLAN to the same hardware domain as physical bridge ports

Amit Cohen writes:

Packets which are trapped to CPU for forwarding in software data path
are handled according to driver marking of skb->offload_{,l3}_fwd_mark.
Packets which are marked as L2-forwarded in hardware, will not be flooded
by the bridge to bridge ports which are in the same hardware domain as the
ingress port.

Currently, mlxsw does not add VXLAN bridge ports to the same hardware
domain as physical bridge ports despite the fact that the device is able
to forward packets to and from VXLAN tunnels in hardware. In some
scenarios this can result in remote VTEPs receiving duplicate packets.

To solve such packets duplication, add VXLAN bridge ports to the same
hardware domain as other bridge ports.

One complication is ARP suppression which requires the local VTEP to avoid
flooding ARP packets to remote VTEPs if the local VTEP is able to reply on
behalf of remote hosts. This is currently implemented by having the device
flood ARP packets in hardware and trapping them during VXLAN encapsulation,
but marking them with skb->offload_fwd_mark=1 so that the bridge will not
re-flood them to physical bridge ports.

The above scheme will break when VXLAN bridge ports are added to the same
hardware domain as physical bridge ports as ARP packets that cannot be
suppressed by the bridge will not be able to egress the VXLAN bridge ports
due to hardware domain filtering. This is solved by trapping ARP packets
when they enter the device and not marking them as being forwarded in
hardware.

Patch set overview:
Patch #1 sets hardware to trap ARP packets at layer 2
Patches #2-#4 are preparations for setting hardwarwe domain of VXLAN
Patch #5 sets hardware domain of VXLAN
Patch #6 extends VXLAN flood test to verify that this set solves the
packets duplication
====================

Link: https://patch.msgid.link/cover.1742224300.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: vxlan_bridge: Test flood with unresolved FDB entry

Extend flood test to configure FDB entry with unresolved destination IP,
check that packets are not sent twice.

Without the previous patch which handles such scenario in mlxsw, the
tests fail:

$ TESTS='test_flood' ./vxlan_bridge_1d.sh
Running tests with UDP port 4789
TEST: VXLAN: flood                                                  [ OK ]
TEST: VXLAN: flood, unresolved FDB entry                            [FAIL]
        vx2 ns2: Expected to capture 10 packets, got 20.

$ TESTS='test_flood' ./vxlan_bridge_1q.sh
INFO: Running tests with UDP port 4789
TEST: VXLAN: flood vlan 10                                          [ OK ]
TEST: VXLAN: flood vlan 20                                          [ OK ]
TEST: VXLAN: flood vlan 10, unresolved FDB entry                    [FAIL]
        vx10 ns2: Expected to capture 10 packets, got 20.
TEST: VXLAN: flood vlan 20, unresolved FDB entry                    [FAIL]
        vx20 ns2: Expected to capture 10 packets, got 20.

With the previous patch, the tests pass.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/7bc96e317531f3bf06319fb2ea447bd8666f29fa.1742224300.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: Add VXLAN bridge ports to same hardware domain as physical bridge ports

When hardware floods packets to bridge ports, but flooding to VXLAN bridge
port fails during encapsulation to one of the remote VTEPs, the packets are
trapped to CPU. In such case, the packets are marked with
skb->offload_fwd_mark, which means that packet was L2-forwarded in
hardware. Software data path repeats flooding, but packets which are
marked with skb->offload_fwd_mark will not be flooded by the bridge to
bridge ports which are in the same hardware domain as the ingress port.

Currently, mlxsw does not add VXLAN bridge ports to the same hardware
domain as physical bridge ports despite the fact that the device is able
to forward packets to and from VXLAN tunnels in hardware. In some scenarios
(as mentioned above) this can result in remote VTEPs receiving duplicate
packets. The packets are first flooded by hardware and after an
encapsulation failure, they are flooded again to all remote VTEPs by
software.

Solve this by adding VXLAN bridge ports to the same hardware domain as
physical bridge ports, so then nbp_switchdev_allowed_egress() will return
false also for VXLAN, and packets will not be sent twice from VXLAN device.

switchdev_bridge_port_offload() should get vxlan_dev not as const, so
some changes are required. Call switchdev API from
mlxsw_sp_bridge_vxlan_{join,leave}() which handle offload configurations.

Reported-by: Vladimir Oltean <olteanv@gmail.com>
Closes: https://lore.kernel.org/all/20250210152246.4ajumdchwhvbarik@skbuf/
Reported-by: Vladyslav Mykhaliuk <vmykhaliuk@nvidia.com>
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/7279056843140fae3a72c2d204c7886b79d03899.1742224300.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: spectrum_switchdev: Move mlxsw_sp_bridge_vxlan_join()

Next patch will call __mlxsw_sp_bridge_vxlan_leave() from
mlxsw_sp_bridge_vxlan_join() as part of error flow, move the function to
be able to call the second one.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/64750a0965536530482318578bada30fac372b8a.1742224300.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: spectrum_switchdev: Add an internal API for VXLAN leave

There is asymmetry in how the VXLAN join and leave functions are used.
The join function (mlxsw_sp_bridge_vxlan_join()) is only called in
response to netdev events (e.g., VXLAN device joining a bridge), but the
leave function is also called in response to switchdev events (e.g.,
VLAN configuration on top of the VXLAN device) in order to invalidate
VNI to FID mappings.

This asymmetry will cause problems when the functions will be later
extended to mark VXLAN bridge ports as offloaded or not.

Therefore, create an internal function (__mlxsw_sp_bridge_vxlan_leave())
that is used to invalidate VNI to FID mappings and call it from
mlxsw_sp_bridge_vxlan_leave() which will only be invoked in response to
netdev events, like mlxsw_sp_bridge_vxlan_join().

No functional changes intended.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/f3a32bd2d87a0b7ac4d2bb98a427dc6d95a01cd0.1742224300.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: spectrum: Call mlxsw_sp_bridge_vxlan_{join, leave}() for VLAN-aware bridge

mlxsw_sp_bridge_vxlan_{join,leave}() are not called when a VXLAN device
joins or leaves a VLAN-aware bridge. As mentioned in the comment - when the
bridge is VLAN-aware, the VNI of the VXLAN device needs to be mapped to a
VLAN, but at this point no VLANs are configured on the VxLAN device. This
means that we can call the APIs, but there is no point to do that, as they
do not configure anything in such cases.

Next patch will extend mlxsw_sp_bridge_vxlan_{join,leave}() to set hardware
domain for VXLAN, this should be done also when a VXLAN device joins or
leaves a VLAN-aware bridge. Call the APIs, which for now do not do anything
in these flows.

Align the call to mlxsw_sp_bridge_vxlan_leave() to be called like
mlxsw_sp_bridge_vxlan_join(), only in case that the VXLAN device is up,
so move the check to be done before calling
mlxsw_sp_bridge_vxlan_{join,leave}(). This does not change the existing
behavior, as there is a similar check inside mlxsw_sp_bridge_vxlan_leave().

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/994c1ea93520f9ea55d1011cd47dc2180d526484.1742224300.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: Trap ARP packets at layer 2 instead of layer 3

Next patch will set the same hardware domain for all bridge ports,
including VXLAN, to prevent packets from being forwarded by software when
they were already forwarded by hardware.

ARP packets are not flooded by hardware to VXLAN, so software should handle
such flooding. When hardware domain of VXLAN device will be changed, ARP
packets which are trapped and marked with offload_fwd_mark will not be
flooded to VXLAN also in software, which will break VXLAN traffic.

To prevent such breaking, trap ARP packets at layer 2 and don't mark them
as L2-forwarded in hardware, then flooding ARP packets will be done only
in software, and VXLAN will send ARP packets.

Remove NVE_ENCAP_ARP which is no longer needed, as now ARP packets are
trapped when they enter the device.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/b2a2cc607a1f4cb96c10bd3b0b0244ba3117fd2e.1742224300.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: introduce per netns packet chains

Currently network taps unbound to any interface are linked in the
global ptype_all list, affecting the performance in all the network
namespaces.

Add per netns ptypes chains, so that in the mentioned case only
the netns owning the packet socket(s) is affected.

While at that drop the global ptype_all list: no in kernel user
registers a tap on "any" type without specifying either the target
device or the target namespace (and IMHO doing that would not make
any sense).

Note that this adds a conditional in the fast path (to check for
per netns ptype_specific list) and increases the dataset size by
a cacheline (owing the per netns lists).

Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumaze@google.com>
Link: https://patch.msgid.link/ae405f98875ee87f8150c460ad162de7e466f8a7.1742494826.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tty: caif: removed unused function debugfs_tx()

Remove debugfs_tx() which was added when the caif driver was added in
commit 9b27105b4a44 ("net-caif-driver: add CAIF serial driver (ldisc)")
but it has never been used.

Flagged by LLVM 19.1.7 W=1 builds.

Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250320-caif-debugfs-tx-v1-1-be5654770088@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: Drop unused of_gpio.h

of_gpio.h is deprecated. Since there is no of_gpio_x API, drop
unused of_gpio.h. While at here, drop gpio.h and gpio/consumer.h if
no user in driver.

Signed-off-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250320031542.3960381-1-peng.fan@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: fixed_phy: transition to the faux device interface

The net fixed phy driver does not require the creation of a platform
device. Originally, this approach was chosen for simplicity when the
driver was first implemented.

With the introduction of the lightweight faux device interface, we now
have a more appropriate alternative. Migrate the device to utilize the
faux bus, given that the platform device it previously created was not
a real one anyway. This will get rid of the fake platform device.

Cc: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://patch.msgid.link/20250319135209.2734594-1-sudeep.holla@arm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'mlx5-cleanups-2025-03-19'

Tariq Toukan says:

====================
mlx5 cleanups 2025-03-19

This series contains small cleanups to the mlx5 core and Eth drivers.
====================

Link: https://patch.msgid.link/1742412199-159596-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Always select CONFIG_PAGE_POOL_STATS

Always set PAGE_POOL_STATS in mlx5 Eth driver.
Cleanup the corresponding #ifdefs.

Page pool stats are essential to monitor and analyze RX performance.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/1742412199-159596-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Use right API to free bitmap memory

Use bitmap_free() to free memory allocated with bitmap_zalloc_node().
This fixes memtrack error:
mtl rsc inconsistency: memtrack_free: .../drivers/net/ethernet/mellanox/mlx5/core/en_main.c::466: kfree for unknown address=0xFFFF0000CA3619E8, device=0x0

Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/1742412199-159596-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Remove NULL check before dev_{put, hold}

Fix coccinelle warnings:
WARNING: NULL check before dev_{put, hold} functions is not needed.

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/1742412199-159596-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phylink: Remove unused function pointer from phylink structure

From what I can tell the .get_fixed_state pointer in the phylink structure
hasn't been used since commit 5c05c1dbb177 ("net: phylink, dsa: eliminate
phylink_fixed_state_cb()") . Since I can't find any users for it we might
as well just drop the pointer.

Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/174240634772.1745174.5690351737682751849.stgit@ahduyck-xeon-server.home.arpa
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netpoll: Eliminate redundant assignment

The assignment of zero to udph->check is unnecessary as it is
immediately overwritten in the subsequent line. Remove the redundant
assignment.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250319-netpoll_nit-v1-1-a7faac5cbd92@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: Call xpcs_config_eee_mult_fact() only when xpcs is present

Some dwmac variants such as dwmac_socfpga don't use xpcs but lynx_pcs.

Don't call xpcs_config_eee_mult_fact() in this case, as this causes a
crash at init :

Unable to handle kernel NULL pointer dereference at virtual address 00000039 when write

[...]

Call trace:
  xpcs_config_eee_mult_fact from stmmac_pcs_setup+0x40/0x10c
  stmmac_pcs_setup from stmmac_dvr_probe+0xc0c/0x1244
  stmmac_dvr_probe from socfpga_dwmac_probe+0x130/0x1bc
  socfpga_dwmac_probe from platform_probe+0x5c/0xb0

Fixes: 060fb27060e8 ("net: stmmac: call xpcs_config_eee_mult_fact()")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20250321103502.1303539-1-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: drv-net: rss_ctx: Don't assume indirection table is present

The test_rss_context_dump() test assumes the indirection table is always
supported, which is not true for all drivers, e.g., virtio_net when
VIRTIO_NET_F_RSS is disabled.

Skip the check if 'indir' is not present.

Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250318112426.386651-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

docs/kcm: Fix typo "BFP"

'BFP' should be 'BPF'.

Signed-off-by: Ryohei Kinugawa <ryohei.kinugawa@gmail.com>
Link: https://patch.msgid.link/20250318095154.4187952-1-ryohei.kinugawa@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'r8169-enable-more-devices-aspm-support'

ChunHao Lin says:

====================
r8169: enable more devices ASPM support

This series of patches will enable more devices ASPM support.
It also fix a RTL8126 cannot enter L1 substate issue when ASPM is
enabled.
====================

Link: https://patch.msgid.link/20250318083721.4127-1-hau@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

r8169: disable RTL8126 ZRX-DC timeout

Disable it due to it dose not meet ZRX-DC specification. If it is enabled,
device will exit L1 substate every 100ms. Disable it for saving more power
in L1 substate.

Signed-off-by: ChunHao Lin <hau@realtek.com>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/20250318083721.4127-3-hau@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

r8169: enable RTL8168H/RTL8168EP/RTL8168FP ASPM support

This patch will enable RTL8168H/RTL8168EP/RTL8168FP ASPM support on
the platforms that have tested with ASPM enabled.

Signed-off-by: ChunHao Lin <hau@realtek.com>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/20250318083721.4127-2-hau@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

docs: networking: strparser: Fix a typo

The context indicates that 'than' is the correct word instead of 'then',
as a comparison is being performed.

Given that 'then' is also a valid English word, checkpatch.pl wouldn't
have picked up on this spelling error.

This typo was caught by AI during code review.

Suggested-by: Wentao Guan <guanwentao@uniontech.com>
Signed-off-by: WangYuli <wangyuli@uniontech.com>
Reviewed-by: Yanteng Si <si.yanteng@linux.dev>
Link: https://patch.msgid.link/A43BEA49ED5CC6E5+20250318074656.644391-1-wangyuli@uniontech.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

docs: fix the path of example code and example commands for device memory TCP

This updates the old path and fixes the description of unavailable options.

Signed-off-by: Yui Washizu <yui.washidu@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250318061251.775191-1-yui.washidu@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp/dccp: Remove inet_connection_sock_af_ops.addr2sockaddr().

inet_connection_sock_af_ops.addr2sockaddr() hasn't been used at all
in the git era.

$ git grep addr2sockaddr $(git rev-list HEAD | tail -n 1)

Let's remove it.

Note that there was a 4 bytes hole after sockaddr_len and now it's
6 bytes, so the binary layout is not changed.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250318060112.3729-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftest: net: update proc_net_pktgen (add more imix_weights test cases)

Add more imix_weights test cases (for incomplete input).

Signed-off-by: Peter Seiderer <ps.report@gmx.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250317090401.1240704-2-ps.report@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: pktgen: add strict buffer parsing index check

Add strict buffer parsing index check to avoid the following Smatch
warning:

net/core/pktgen.c:877 get_imix_entries()
warn: check that incremented offset 'i' is capped

Checking the buffer index i after every get_user/i++ step and returning
with error code immediately avoids the current indirect (but correct)
error handling.

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/netdev/36cf3ee2-38b1-47e5-a42a-363efeb0ace3@stanley.mountain/
Signed-off-by: Peter Seiderer <ps.report@gmx.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250317090401.1240704-1-ps.report@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

MAINTAINERS: adjust the file entry in INTEL PMC CORE DRIVER

Commit 7e2f7e25f6ff ("arch: x86: add IPC mailbox accessor function and add
SoC register access") adds a new file entry referring to the non-existent
file linux/platform_data/x86/intel_pmc_ipc.h in section INTEL PMC CORE
DRIVER rather than referring to the file
include/linux/platform_data/x86/intel_pmc_ipc.h added with this commit.
Note that it was missing 'include' in the beginning.

Adjust the file reference to the intended file.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@redhat.com>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://patch.msgid.link/20250317092717.322862-1-lukas.bulwahn@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: move icsk_clean_acked to a better location

As a followup of my presentation in Zagreb for netdev 0x19:

icsk_clean_acked is only used by TCP when/if CONFIG_TLS_DEVICE
is enabled from tcp_ack().

Rename it to tcp_clean_acked, move it to tcp_sock structure
in the tcp_sock_read_rx for better cache locality in TCP
fast path.

Define this field only when CONFIG_TLS_DEVICE is enabled
saving 8 bytes on configs not using it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250317085313.2023214-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: dp83822: fix transmit amplitude if CONFIG_OF_MDIO not defined

When CONFIG_OF_MDIO is not defined the index for selecting the transmit
amplitude voltage for 100BASE-TX is set to 0, but it should be -1, if there
is no need to modify the transmit amplitude voltage. Move initialization of
the index from dp83822_of_init to dp8382x_probe.

Fixes: 4f3735e82d8a ("net: phy: dp83822: Add support for changing the transmit amplitude voltage")
Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Signed-off-by: Dimitri Fedrau <dimitri.fedrau@liebherr.com>
Link: https://patch.msgid.link/20250317-dp83822-fix-transceiver-mdio-v2-1-fb09454099a4@liebherr.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ena: resolve WARN_ON when freeing IRQs

When IRQs are freed, a WARN_ON is triggered as the
affinity notifier is not released.
This results in the below stack trace:

[  484.544586]  ? __warn+0x84/0x130
[  484.544843]  ? free_irq+0x5c/0x70
[  484.545105]  ? report_bug+0x18a/0x1a0
[  484.545390]  ? handle_bug+0x53/0x90
[  484.545664]  ? exc_invalid_op+0x14/0x70
[  484.545959]  ? asm_exc_invalid_op+0x16/0x20
[  484.546279]  ? free_irq+0x5c/0x70
[  484.546545]  ? free_irq+0x10/0x70
[  484.546807]  ena_free_io_irq+0x5f/0x70 [ena]
[  484.547138]  ena_down+0x250/0x3e0 [ena]
[  484.547435]  ena_destroy_device+0x118/0x150 [ena]
[  484.547796]  __ena_shutoff+0x5a/0xe0 [ena]
[  484.548110]  pci_device_remove+0x3b/0xb0
[  484.548412]  device_release_driver_internal+0x193/0x200
[  484.548804]  driver_detach+0x44/0x90
[  484.549084]  bus_remove_driver+0x69/0xf0
[  484.549386]  pci_unregister_driver+0x2a/0xb0
[  484.549717]  ena_cleanup+0xc/0x130 [ena]
[  484.550021]  __do_sys_delete_module.constprop.0+0x176/0x310
[  484.550438]  ? syscall_trace_enter+0xfb/0x1c0
[  484.550782]  do_syscall_64+0x5b/0x170
[  484.551067]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

Adding a call to `netif_napi_set_irq` with -1 as the IRQ index,
which frees the notifier.

Fixes: de340d8206bf ("net: ena: use napi's aRFS rmap notifers")
Signed-off-by: David Arinzon <darinzon@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Ahmed Zaki <ahmed.zaki@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250317071147.1105-1-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: openvswitch: fix kernel-doc warnings in internal headers

Some field descriptions were missing, some were not very accurate.
Not touching the uAPI header or .c files for now.

Formatting of those comments isn't great in general, but at least
they are not missing anything now.

Before:
  $ ./scripts/kernel-doc -none -Wall net/openvswitch/*.h 2>&1 | wc -l
  16

After:
  $ ./scripts/kernel-doc -none -Wall net/openvswitch/*.h 2>&1 | wc -l
  0

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Link: https://patch.msgid.link/20250320224431.252489-1-i.maximets@ovn.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: phy_interface_t: Fix RGMII_TXID code comment

Fix copy-paste error in the code comment for Interface Mode definitions.
The code refers to Internal TX delay, not Internal RX delay. It was likely
copied from the line above this one.

Signed-off-by: Ihor Matushchak <ihor.matushchak@foobox.net>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20250316071551.9794-1-ihor.matushchak@foobox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: realtek: disable PHY-mode EEE

Realtek RTL8211F has a "PHY-mode" EEE support which interferes with an
IEEE 802.3 compliant implementation. This mode defaults to enabled, and
results in the MAC receive path not seeing the link transition to LPI
state.

Fix this by disabling PHY-mode EEE.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1ttnHW-00785s-Uq@rmk-PC.armlinux.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: phy: fix genphy_c45_eee_is_active() for disabled EEE

Commit 809265fe96fe ("net: phy: c45: remove local advertisement
parameter from genphy_c45_eee_is_active") stopped reading the local
advertisement from the PHY earlier in this development cycle, which
broke "ethtool --set-eee ethX eee off".

When ethtool is used to set EEE off, genphy_c45_eee_is_active()
indicates that EEE was active if the link partner reported an
advertisement, which causes phylib to set phydev->enable_tx_lpi on
link up, despite our local advertisement in hardware being empty.
However, phydev->advertising_eee is preserved while EEE is turned off,
which leads to genphy_c45_eee_is_active() incorrectly reporting that
EEE is active.

Fix it by checking phydev->eee_cfg.eee_enabled, and if clear,
immediately indicate that EEE is not active.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1ttmWN-0077Mb-Q6@rmk-PC.armlinux.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'mlx5e-support-recovery-counter-in-reset'

Tariq Toukan says:

====================
mlx5e: Support recovery counter in reset

This series by Yael adds a recovery counter in ethtool, for any recovery
type during port reset cycle.
Series starts with some cleanup and refactoring patches.
New counter is added and exposed to ethtool stats in patch #4.
====================

Link: https://patch.msgid.link/1742112876-2890-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/mlx5e: Expose port reset cycle recovery counter via ethtool

Display recovery event of PPCNT recovery counters group. Counts (per
link) the number of total successful recovery events of any recovery
types during port reset cycle.

Signed-off-by: Yael Chemla <ychemla@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/1742112876-2890-5-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/mlx5e: Get counter group size by FW capability

Retrieve the number of fields supported by each PPCNT counter group
based on the FW capability for this group.

Signed-off-by: Yael Chemla <ychemla@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/1742112876-2890-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/mlx5e: Access PHY layer counter group as other counter groups

Adjust the way physical layer counters group is accessed to match the
generic method used for accessing other PPCNT counter groups.

Signed-off-by: Yael Chemla <ychemla@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/1742112876-2890-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/mlx5e: Ensure each counter group uses its PCAM bit

The code was incorrectly relying on PCAM bit of ppcnt_statistical_group
for accessing per_lane_error_counters.
If ppcnt_statistical_group PCAM bit was not set, we would not read
per_lane_error_counters, even when its PCAM bit is set.
Given the existing device capabilities, it seems to cause no harm, so
this change primarily serves as cleanup.

Signed-off-by: Yael Chemla <ychemla@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/1742112876-2890-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

mptcp: sockopt: fix getting freebind & transparent

When adding a socket option support in MPTCP, both the get and set parts
are supposed to be implemented.

IP(V6)_FREEBIND and IP(V6)_TRANSPARENT support for the setsockopt part
has been added a while ago, but it looks like the get part got
forgotten. It should have been present as a way to verify a setting has
been set as expected, and not to act differently from TCP or any other
socket types.

Everything was in place to expose it, just the last step was missing.
Only new code is added to cover these specific getsockopt(), that seems
safe.

Fixes: c9406a23c116 ("mptcp: sockopt: add SOL_IP freebind & transparent options")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250314-net-mptcp-fix-data-stream-corr-sockopt-v1-3-122dbb249db3@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

mptcp: sockopt: fix getting IPV6_V6ONLY

When adding a socket option support in MPTCP, both the get and set parts
are supposed to be implemented.

IPV6_V6ONLY support for the setsockopt part has been added a while ago,
but it looks like the get part got forgotten. It should have been
present as a way to verify a setting has been set as expected, and not
to act differently from TCP or any other socket types.

Not supporting this getsockopt(IPV6_V6ONLY) blocks some apps which want
to check the default value, before doing extra actions. On Linux, the
default value is 0, but this can be changed with the net.ipv6.bindv6only
sysctl knob. On Windows, it is set to 1 by default. So supporting the
get part, like for all other socket options, is important.

Everything was in place to expose it, just the last step was missing.
Only new code is added to cover this specific getsockopt(), that seems
safe.

Fixes: c9b95a135987 ("mptcp: support IPV6_V6ONLY setsockopt")
Cc: stable@vger.kernel.org
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/550
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250314-net-mptcp-fix-data-stream-corr-sockopt-v1-2-122dbb249db3@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'netconsole-add-support-for-userdata-release'

Breno Leitao says:

====================
netconsole: Add support for userdata release

I am submitting a series of patches that introduce a new feature for the
netconsole subsystem, specifically the addition of the 'release' field
to the sysdata structure. This feature allows the kernel release/version
to be appended to the userdata dictionary in every message sent,
enhancing the information available for debugging and monitoring
purposes.

This complements the already supported release prepend feature, which
was added some time ago. The release prepend appends the release
information at the message header, which is not ideal for two reasons:

1) It is difficult to determine if a message includes this information,
making it hard and resource-intensive to parse.

2) When a message is fragmented, the release information is appended to
every message fragment, consuming valuable space in the packet.

The "release prepend" feature was created before the concept of userdata
and sysdata. Now that this format has proven successful, we are
implementing the release feature as part of this enhanced structure.

This patch series aims to improve the netconsole subsystem by providing
a more efficient and user-friendly way to include kernel release
information in messages. I believe these changes will significantly aid
in system analysis and troubleshooting.

Suggested-by: Manu Bretelle <chantr4@gmail.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
====================

Link: https://patch.msgid.link/20250314-netcons_release-v1-0-07979c4b86af@debian.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

docs: netconsole: document release feature

Add documentation explaining the kernel release auto-population feature
in netconsole.

This feature appends kernel version information to the userdata
dictionary in every message sent when enabled via the `release_enabled`
file in the configfs hierarchy.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250314-netcons_release-v1-6-07979c4b86af@debian.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: netconsole: Add tests for 'release' feature in sysdata

Expands the self-tests to include the 'release' feature in
sysdata.

Verifies that enabling the 'release' feature appends the
correct data and ensures that disabling it functions as expected.

When enabled, the message should have an item similar to in the
userdata: `release=$(uname -r)`

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250314-netcons_release-v1-5-07979c4b86af@debian.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

netconsole: append release to sysdata

Append the init_utsname()->release to sysdata buffer before sending the
message in case the feature is set.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250314-netcons_release-v1-4-07979c4b86af@debian.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

netconsole: add 'sysdata' suffix to related functions

This commit appends a common "sysdata" suffix to functions responsible
for appending data to sysdata.

This change enhances code clarity and prevents naming conflicts with
other "append" functions, particularly in anticipation of the upcoming
inclusion of the `release` field in the next patch.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250314-netcons_release-v1-3-07979c4b86af@debian.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

netconsole: implement configfs for release_enabled

Implement the configfs helpers to show and set release_enabled configfs
directories under userdata.

When enabled, set the feature bit in netconsole_target->sysdata_fields.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250314-netcons_release-v1-2-07979c4b86af@debian.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

netconsole: introduce 'release' as a new sysdata field

This commit adds a new feature to the sysdata structure, allowing the
kernel release/version to be appended as part of sysdata. Additionally,
it updates the logic to count this new field as a used entry when
enabled.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250314-netcons_release-v1-1-07979c4b86af@debian.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: airoha: fix CONFIG_DEBUG_FS check

The #if check causes a build failure when CONFIG_DEBUG_FS is turned
off:

In file included from drivers/net/ethernet/airoha/airoha_eth.c:17:
drivers/net/ethernet/airoha/airoha_eth.h:543:5: error: "CONFIG_DEBUG_FS" is not defined, evaluates to 0 [-Werror=undef]
543 | #if CONFIG_DEBUG_FS
| ^~~~~~~~~~~~~~~

Replace it with the correct #ifdef.

Fixes: 3fe15c640f38 ("net: airoha: Introduce PPE debugfs support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250314155009.4114308-1-arnd@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: mctp: Remove unnecessary cast in mctp_cb

The void * cast in mctp_cb is unnecessary as it's already been done
at the start of the function.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Jeremy Kerr <jk@codeconstruct.com.au>
Link: https://patch.msgid.link/Z9PwOQeBSYlgZlHq@gondor.apana.org.au
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'net-phy-remove-calls-to-devm_hwmon_sanitize_name'

Heiner Kallweit says:

====================
net: phy: remove calls to devm_hwmon_sanitize_name

Since c909e68f8127 ("hwmon: (core) Use device name as a fallback in
devm_hwmon_device_register_with_info") we can simply provide NULL
as name argument.
====================

Link: https://patch.msgid.link/198f3cd0-6c39-4783-afe7-95576a4b8539@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: phy: marvell-88q2xxx: remove call to devm_hwmon_sanitize_name

Since c909e68f8127 ("hwmon: (core) Use device name as a fallback in
devm_hwmon_device_register_with_info") we can simply provide NULL
as name argument.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/59c485e4-983c-42f6-9114-916703a62e3f@gmail.com
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: phy: mxl-gpy: remove call to devm_hwmon_sanitize_name

Since c909e68f8127 ("hwmon: (core) Use device name as a fallback in
devm_hwmon_device_register_with_info") we can simply provide NULL
as name argument.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/e34c4802-20ce-4556-a47c-812e602e8526@gmail.com
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: phy: tja11xx: remove call to devm_hwmon_sanitize_name

Since c909e68f8127 ("hwmon: (core) Use device name as a fallback in
devm_hwmon_device_register_with_info") we can simply provide NULL
as name argument.

Note that neither priv->hwmon_name nor priv->hwmon_dev are used
outside tja11xx_hwmon_register.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/4452cb7e-1a2f-4213-b49f-9de196be9204@gmail.com
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: phy: realtek: remove call to devm_hwmon_sanitize_name

Since c909e68f8127 ("hwmon: (core) Use device name as a fallback in
devm_hwmon_device_register_with_info") we can simply provide NULL
as name argument.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/6e8d26f4-8d0a-4c83-aec3-378847a377eb@gmail.com
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: remove sb1000 cable modem driver

This one is hilariously outdated, it provided a faster downlink over
TV cable for users of analog modems in the 1990s, through an ISA card.

The web page for the userspace tools has been broken for 25 years, and
the driver has only ever seen mechanical updates.

Link: http://web.archive.org/web/20000611165545/http://home.adelphia.net:80/~siglercm/sb1000.html
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250312085236.2531870-1-arnd@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Martin KaFai Lau says:

====================
pull-request: bpf-next 2025-03-13

The following pull-request contains BPF updates for your *net-next* tree.

We've added 4 non-merge commits during the last 3 day(s) which contain
a total of 2 files changed, 35 insertions(+), 12 deletions(-).

The main changes are:

1) bpf_getsockopt support for TCP_BPF_RTO_MIN and TCP_BPF_DELACK_MAX,
   from Jason Xing

bpf-next-for-netdev

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
  selftests/bpf: Add bpf_getsockopt() for TCP_BPF_DELACK_MAX and TCP_BPF_RTO_MIN
  tcp: bpf: Support bpf_getsockopt for TCP_BPF_DELACK_MAX
  tcp: bpf: Support bpf_getsockopt for TCP_BPF_RTO_MIN
  tcp: bpf: Introduce bpf_sol_tcp_getsockopt to support TCP_BPF flags
====================

Link: https://patch.msgid.link/20250313221620.2512684-1-martin.lau@linux.dev
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR (net-6.14-rc8).

Conflict:

tools/testing/selftests/net/Makefile
  03544faad761 ("selftest: net: add proc_net_pktgen")
  3ed61b8938c6 ("selftests: net: test for lwtunnel dst ref loops")

tools/testing/selftests/net/config:
  85cb3711acb8 ("selftests: net: Add test cases for link and peer netns")
  3ed61b8938c6 ("selftests: net: test for lwtunnel dst ref loops")

Adjacent commits:

tools/testing/selftests/net/Makefile
  c935af429ec2 ("selftests: net: add support for testing SO_RCVMARK and SO_RCVPRIORITY")
  355d940f4d5a ("Revert "selftests: Add IPv6 link-local address generation tests for GRE devices."")

Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge tag 'net-6.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
"Including fixes from can, bluetooth and ipsec.

  This contains a last minute revert of a recent GRE patch, mostly to
  allow me stating there are no known regressions outstanding.

  Current release - regressions:

   - revert "gre: Fix IPv6 link-local address generation."

   - eth: ti: am65-cpsw: fix NAPI registration sequence

  Previous releases - regressions:

   - ipv6: fix memleak of nhc_pcpu_rth_output in fib_check_nh_v6_gw().

   - mptcp: fix data stream corruption in the address announcement

   - bluetooth: fix connection regression between LE and non-LE adapters

   - can:
       - flexcan: only change CAN state when link up in system PM
       - ucan: fix out of bound read in strscpy() source

  Previous releases - always broken:

   - lwtunnel: fix reentry loops

   - ipv6: fix TCP GSO segmentation with NAT

   - xfrm: force software GSO only in tunnel mode

   - eth: ti: icssg-prueth: add lock to stats

  Misc:

   - add Andrea Mayer as a maintainer of SRv6"

* tag 'net-6.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (33 commits)
  MAINTAINERS: Add Andrea Mayer as a maintainer of SRv6
  Revert "gre: Fix IPv6 link-local address generation."
  Revert "selftests: Add IPv6 link-local address generation tests for GRE devices."
  net/neighbor: add missing policy for NDTPA_QUEUE_LENBYTES
  tools headers: Sync uapi/asm-generic/socket.h with the kernel sources
  mptcp: Fix data stream corruption in the address announcement
  selftests: net: test for lwtunnel dst ref loops
  net: ipv6: ioam6: fix lwtunnel_output() loop
  net: lwtunnel: fix recursion loops
  net: ti: icssg-prueth: Add lock to stats
  net: atm: fix use after free in lec_send()
  xsk: fix an integer overflow in xp_create_and_assign_umem()
  net: stmmac: dwc-qos-eth: use devm_kzalloc() for AXI data
  selftests: drv-net: use defer in the ping test
  phy: fix xa_alloc_cyclic() error handling
  dpll: fix xa_alloc_cyclic() error handling
  devlink: fix xa_alloc_cyclic() error handling
  ipv6: Set errno after ip_fib_metrics_init() in ip6_route_info_create().
  ipv6: Fix memleak of nhc_pcpu_rth_output in fib_check_nh_v6_gw().
  net: ipv6: fix TCP GSO segmentation with NAT
  ...

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
"Collected driver fixes from the last few weeks, I was surprised how
  significant many of them seemed to be.

   - Fix rdma-core test failures due to wrong startup ordering in rxe

   - Don't crash in bnxt_re if the FW supports more than 64k QPs

   - Fix wrong QP table indexing math in bnxt_re

   - Calculate the max SRQs for userspace properly in bnxt_re

   - Don't try to do math on errno for mlx5's rate calculation

   - Properly allow userspace to control the VLAN in the QP state during
     INIT->RTR for bnxt_re

   - 6 bug fixes for HNS:
      - Soft lockup when processing huge MRs, add a cond_resched()
      - Fix missed error unwind for doorbell allocation
      - Prevent bad send queue parameters from userspace
      - Wrong error unwind in qp creation
      - Missed xa_destroy during driver shutdown
      - Fix reporting to userspace of max_sge_rd, hns doesn't have a
        read/write difference"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/hns: Fix wrong value of max_sge_rd
  RDMA/hns: Fix missing xa_destroy()
  RDMA/hns: Fix a missing rollback in error path of hns_roce_create_qp_common()
  RDMA/hns: Fix invalid sq params not being blocked
  RDMA/hns: Fix unmatched condition in error path of alloc_user_qp_db()
  RDMA/hns: Fix soft lockup during bt pages loop
  RDMA/bnxt_re: Avoid clearing VLAN_ID mask in modify qp path
  RDMA/mlx5: Handle errors returned from mlx5r_ib_rate()
  RDMA/bnxt_re: Fix reporting maximum SRQs on P7 chips
  RDMA/bnxt_re: Add missing paranthesis in map_qp_id_to_tbl_indx
  RDMA/bnxt_re: Fix allocation of QP table
  RDMA/rxe: Fix the failure of ibv_query_device() and ibv_query_device_ex() tests

Merge tag 'mmc-v6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc

Pull MMC host fixes from Ulf Hansson:

- sdhci-brcmstb: Fix CQE suspend/resume support

- atmel-mci: Add a missing clk_disable_unprepare() in ->probe()

* tag 'mmc-v6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci-brcmstb: add cqhci suspend/resume to PM ops
mmc: atmel-mci: Add missing clk_disable_unprepare()

Merge tag 'efi-fixes-for-v6.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI fixes from Ard Biesheuvel:
"Here's a final batch of EFI fixes for v6.14.

  The efivarfs ones are fixes for changes that were made this cycle.
  James's fix is somewhat of a band-aid, but it was blessed by the VFS
  folks, who are working with James to come up with something better for
  the next cycle.

   - Avoid physical address 0x0 for random page allocations

   - Add correct lockdep annotation when traversing efivarfs on resume

   - Avoid NULL mount in kernel_file_open() when traversing efivarfs on
     resume"

* tag 'efi-fixes-for-v6.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  efivarfs: fix NULL dereference on resume
  efivarfs: use I_MUTEX_CHILD nested lock to traverse variables on resume
  efi/libstub: Avoid physical address 0x0 when doing random allocation

MAINTAINERS: Add Andrea Mayer as a maintainer of SRv6

Andrea has made significant contributions to SRv6 support in Linux.
Acknowledge the work and on-going interest in Srv6 support with a
maintainers entry for these files so hopefully he is included
on patches going forward.

Signed-off-by: David Ahern <dsahern@kernel.org>
Acked-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Link: https://patch.msgid.link/20250312092212.46299-1-dsahern@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'gre-revert-ipv6-link-local-address-fix'

Guillaume Nault says:

====================
gre: Revert IPv6 link-local address fix.

Following Paolo's suggestion, let's revert the IPv6 link-local address
generation fix for GRE devices. The patch introduced regressions in the
upstream CI, which are still under investigation.

Start by reverting the kselftest that depend on that fix (patch 1), then
revert the kernel code itself (patch 2).
====================

Link: https://patch.msgid.link/cover.1742418408.git.gnault@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Revert "gre: Fix IPv6 link-local address generation."

This reverts commit 183185a18ff96751db52a46ccf93fff3a1f42815.

This patch broke net/forwarding/ip6gre_custom_multipath_hash.sh in some
circumstances (https://lore.kernel.org/netdev/Z9RIyKZDNoka53EO@mini-arch/).
Let's revert it while the problem is being investigated.

Fixes: 183185a18ff9 ("gre: Fix IPv6 link-local address generation.")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/8b1ce738eb15dd841aab9ef888640cab4f6ccfea.1742418408.git.gnault@redhat.com
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Revert "selftests: Add IPv6 link-local address generation tests for GRE devices."

This reverts commit 6f50175ccad4278ed3a9394c00b797b75441bd6e.

Commit 183185a18ff9 ("gre: Fix IPv6 link-local address generation.") is
going to be reverted. So let's revert the corresponding kselftest
first.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/259a9e98f7f1be7ce02b53d0b4afb7c18a8ff747.1742418408.git.gnault@redhat.com
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge tag 'ipsec-2025-03-19' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec

Steffen Klassert says:

====================
pull request (net): ipsec 2025-03-19

1) Fix tunnel mode TX datapath in packet offload mode
   by directly putting it to the xmit path.
   From Alexandre Cassen.

2) Force software GSO only in tunnel mode in favor
   of potential HW GSO. From Cosmin Ratiu.

ipsec-2025-03-19

* tag 'ipsec-2025-03-19' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
  xfrm_output: Force software GSO only in tunnel mode
  xfrm: fix tunnel mode TX datapath in packet offload mode
====================

Link: https://patch.msgid.link/20250319065513.987135-1-steffen.klassert@secunet.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge tag 'batadv-net-pullrequest-20250318' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
Here is batman-adv bugfix:

- Ignore own maximum aggregation size during RX, Sven Eckelmann

* tag 'batadv-net-pullrequest-20250318' of git://git.open-mesh.org/linux-merge:
batman-adv: Ignore own maximum aggregation size during RX
====================

Link: https://patch.msgid.link/20250318150035.35356-1-sw@simonwunderlich.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/neighbor: add missing policy for NDTPA_QUEUE_LENBYTES

Previous commit 8b5c171bb3dc ("neigh: new unresolved queue limits")
introduces new netlink attribute NDTPA_QUEUE_LENBYTES to represent
approximative value for deprecated QUEUE_LEN. However, it forgot to add
the associated nla_policy in nl_ntbl_parm_policy array. Fix it with one
simple NLA_U32 type policy.

Fixes: 8b5c171bb3dc ("neigh: new unresolved queue limits")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Link: https://patch.msgid.link/20250315165113.37600-1-linma@zju.edu.cn
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

tools headers: Sync uapi/asm-generic/socket.h with the kernel sources

This also fixes a wrong definitions for SCM_TS_OPT_ID & SO_RCVPRIORITY.

Accidentally found while working on another patchset.

Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jason Xing <kerneljasonxing@gmail.com>
Cc: Anna Emese Nyiri <annaemesenyiri@gmail.com>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Fixes: a89568e9be75 ("selftests: txtimestamp: add SCM_TS_OPT_ID test")
Fixes: e45469e594b2 ("sock: Introduce SO_RCVPRIORITY socket option")
Link: https://lore.kernel.org/netdev/20250314195257.34854-1-kuniyu@amazon.com/
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://patch.msgid.link/20250314214155.16046-1-aleksandr.mikhalitsyn@canonical.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

mptcp: Fix data stream corruption in the address announcement

Because of the size restriction in the TCP options space, the MPTCP
ADD_ADDR option is exclusive and cannot be sent with other MPTCP ones.
For this reason, in the linked mptcp_out_options structure, group of
fields linked to different options are part of the same union.

There is a case where the mptcp_pm_add_addr_signal() function can modify
opts->addr, but not ended up sending an ADD_ADDR. Later on, back in
mptcp_established_options, other options will be sent, but with
unexpected data written in other fields due to the union, e.g. in
opts->ext_copy. This could lead to a data stream corruption in the next
packet.

Using an intermediate variable, prevents from corrupting previously
established DSS option. The assignment of the ADD_ADDR option
parameters is now done once we are sure this ADD_ADDR option can be set
in the packet, e.g. after having dropped other suboptions.

Fixes: 1bff1e43a30e ("mptcp: optimize out option generation")
Cc: stable@vger.kernel.org
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Arthur Mongodin <amongodin@randorisec.fr>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
[ Matt: the commit message has been updated: long lines splits and some
clarifications. ]
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250314-net-mptcp-fix-data-stream-corr-sockopt-v1-1-122dbb249db3@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'net-fix-lwtunnel-reentry-loops'

Justin Iurman says:

====================
net: fix lwtunnel reentry loops

When the destination is the same after the transformation, we enter a
lwtunnel loop. This is true for most of lwt users: ioam6, rpl, seg6,
seg6_local, ila_lwt, and lwt_bpf. It can happen in their input() and
output() handlers respectively, where either dst_input() or dst_output()
is called at the end. It can also happen in xmit() handlers.

Here is an example for rpl_input():

dump_stack_lvl+0x60/0x80
rpl_input+0x9d/0x320
lwtunnel_input+0x64/0xa0
lwtunnel_input+0x64/0xa0
lwtunnel_input+0x64/0xa0
lwtunnel_input+0x64/0xa0
lwtunnel_input+0x64/0xa0
[...]
lwtunnel_input+0x64/0xa0
lwtunnel_input+0x64/0xa0
lwtunnel_input+0x64/0xa0
lwtunnel_input+0x64/0xa0
lwtunnel_input+0x64/0xa0
ip6_sublist_rcv_finish+0x85/0x90
ip6_sublist_rcv+0x236/0x2f0

... until rpl_do_srh() fails, which means skb_cow_head() failed.

This series provides a fix at the core level of lwtunnel to catch such
loops when they're not caught by the respective lwtunnel users, and
handle the loop case in ioam6 which is one of the users. This series
also comes with a new selftest to detect some dst cache reference loops
in lwtunnel users.
====================

Link: https://patch.msgid.link/20250314120048.12569-1-justin.iurman@uliege.be
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: net: test for lwtunnel dst ref loops

As recently specified by commit 0ea09cbf8350 ("docs: netdev: add a note
on selftest posting") in net-next, the selftest is therefore shipped in
this series. However, this selftest does not really test this series. It
needs this series to avoid crashing the kernel. What it really tests,
thanks to kmemleak, is what was fixed by the following commits:
- commit c71a192976de ("net: ipv6: fix dst refleaks in rpl, seg6 and
ioam6 lwtunnels")
- commit 92191dd10730 ("net: ipv6: fix dst ref loops in rpl, seg6 and
ioam6 lwtunnels")
- commit c64a0727f9b1 ("net: ipv6: fix dst ref loop on input in seg6
lwt")
- commit 13e55fbaec17 ("net: ipv6: fix dst ref loop on input in rpl
lwt")
- commit 0e7633d7b95b ("net: ipv6: fix dst ref loop in ila lwtunnel")
- commit 5da15a9c11c1 ("net: ipv6: fix missing dst ref drop in ila
lwtunnel")

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Link: https://patch.msgid.link/20250314120048.12569-4-justin.iurman@uliege.be
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: ipv6: ioam6: fix lwtunnel_output() loop

Fix the lwtunnel_output() reentry loop in ioam6_iptunnel when the
destination is the same after transformation. Note that a check on the
destination address was already performed, but it was not enough. This
is the example of a lwtunnel user taking care of loops without relying
only on the last resort detection offered by lwtunnel.

Fixes: 8cb3bf8bff3c ("ipv6: ioam: Add support for the ip6ip6 encapsulation")
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Link: https://patch.msgid.link/20250314120048.12569-3-justin.iurman@uliege.be
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: lwtunnel: fix recursion loops

This patch acts as a parachute, catch all solution, by detecting
recursion loops in lwtunnel users and taking care of them (e.g., a loop
between routes, a loop within the same route, etc). In general, such
loops are the consequence of pathological configurations. Each lwtunnel
user is still free to catch such loops early and do whatever they want
with them. It will be the case in a separate patch for, e.g., seg6 and
seg6_local, in order to provide drop reasons and update statistics.
Another example of a lwtunnel user taking care of loops is ioam6, which
has valid use cases that include loops (e.g., inline mode), and which is
addressed by the next patch in this series. Overall, this patch acts as
a last resort to catch loops and drop packets, since we don't want to
leak something unintentionally because of a pathological configuration
in lwtunnels.

The solution in this patch reuses dev_xmit_recursion(),
dev_xmit_recursion_inc(), and dev_xmit_recursion_dec(), which seems fine
considering the context.

Closes: https://lore.kernel.org/netdev/2bc9e2079e864a9290561894d2a602d6@akamai.com/
Closes: https://lore.kernel.org/netdev/Z7NKYMY7fJT5cYWu@shredder/
Fixes: ffce41962ef6 ("lwtunnel: support dst output redirect function")
Fixes: 2536862311d2 ("lwt: Add support to redirect dst.input")
Fixes: 14972cbd34ff ("net: lwtunnel: Handle fragmentation")
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Link: https://patch.msgid.link/20250314120048.12569-2-justin.iurman@uliege.be
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: ti: icssg-prueth: Add lock to stats

Currently the API emac_update_hardware_stats() reads different ICSSG
stats without any lock protection.

This API gets called by .ndo_get_stats64() which is only under RCU
protection and nothing else. Add lock to this API so that the reading of
statistics happens during lock.

Fixes: c1e10d5dc7a1 ("net: ti: icssg-prueth: Add ICSSG Stats")
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250314102721.1394366-1-danishanwar@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: atm: fix use after free in lec_send()

The ->send() operation frees skb so save the length before calling
->send() to avoid a use after free.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/c751531d-4af4-42fe-affe-6104b34b791d@stanley.mountain
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'mptcp-pm-prep-work-for-new-ops-and-sysctl-knobs'

Matthieu Baerts says:

====================
mptcp: pm: prep work for new ops and sysctl knobs

Here are a few cleanups, preparation work for the new PM ops, and sysctl
knobs.

- Patch 1: reorg: move generic NL code used by all PMs to pm_netlink.c.

- Patch 2: use kmemdup() instead of kmalloc + copy.

- Patch 3: small cleanup to use pm var instead of msk->pm.

- Patch 4: reorg: id_avail_bitmap is only used by the in-kernel PM.

- Patch 5: use struct_group to easily reset a subset of PM data vars.

- Patch 6: introduce the minimal skeleton for the new PM ops.

- Patch 7: register in-kernel and userspace PM ops.

- Patch 8: new net.mptcp.path_manager sysctl knob, deprecating pm_type.

- Patch 9: map the new path_manager sysctl knob with pm_type.

- Patch 10: map the old pm_type sysctl knob with path_manager.

- Patch 11: new net.mptcp.available_path_managers sysctl knob.

- Patch 12: new test to validate path_manager and pm_type mapping.

Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
====================

Link: https://patch.msgid.link/20250313-net-next-mptcp-pm-ops-intro-v1-0-f4e4a88efc50@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: mptcp: add pm sysctl mapping tests

This patch checks if the newly added net.mptcp.path_manager is mapped
successfully from or to the old net.mptcp.pm_type in userspace_pm.sh.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250313-net-next-mptcp-pm-ops-intro-v1-12-f4e4a88efc50@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

mptcp: sysctl: add available_path_managers

Similarly to net.mptcp.available_schedulers, this patch adds a new one
net.mptcp.available_path_managers to list the available path managers.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250313-net-next-mptcp-pm-ops-intro-v1-11-f4e4a88efc50@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

mptcp: sysctl: map pm_type to path_manager

This patch adds a new proc_handler "proc_pm_type" for "pm_type" to
map old path manager sysctl "pm_type" to the newly added "path_manager".

path_manager pm_type

MPTCP_PM_TYPE_KERNEL -> "kernel"
MPTCP_PM_TYPE_USERSPACE -> "userspace"

It is important to add this to keep a compatibility with the now
deprecated pm_type sysctl knob.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250313-net-next-mptcp-pm-ops-intro-v1-10-f4e4a88efc50@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

mptcp: sysctl: map path_manager to pm_type

This patch maps the newly added path manager sysctl "path_manager"
to the old one "pm_type".

path_manager   pm_type

"kernel"    -> MPTCP_PM_TYPE_KERNEL
"userspace" -> MPTCP_PM_TYPE_USERSPACE
others      -> __MPTCP_PM_TYPE_NR

It is important to add this to keep a compatibility with the now
deprecated pm_type sysctl knob.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250313-net-next-mptcp-pm-ops-intro-v1-9-f4e4a88efc50@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

mptcp: sysctl: set path manager by name

Similar to net.mptcp.scheduler, a new net.mptcp.path_manager sysctl knob
is added to determine which path manager will be used by each newly
created MPTCP socket by setting the name of it.

Dealing with an explicit name is easier than with a number, especially
when more PMs will be introduced.

This sysctl knob makes the old one "pm_type" deprecated.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250313-net-next-mptcp-pm-ops-intro-v1-8-f4e4a88efc50@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

mptcp: pm: register in-kernel and userspace PM

This patch defines the original in-kernel netlink path manager as a
new struct mptcp_pm_ops named "mptcp_pm_kernel", and register it in
mptcp_pm_kernel_register(). And define the userspace path manager as
a new struct mptcp_pm_ops named "mptcp_pm_userspace", and register it
in mptcp_pm_init().

To ensure that there's always a valid path manager available, the default
path manager "mptcp_pm_kernel" will be skipped in mptcp_pm_unregister().

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250313-net-next-mptcp-pm-ops-intro-v1-7-f4e4a88efc50@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

mptcp: pm: define struct mptcp_pm_ops

In order to allow users to develop their own BPF-based path manager,
this patch defines a struct ops "mptcp_pm_ops" for an MPTCP path
manager, which contains a set of interfaces. Currently only init()
and release() interfaces are included, subsequent patches will add
others step by step.

Add a set of functions to register, unregister, find and validate a
given path manager struct ops.

"list" is used to add this path manager to mptcp_pm_list list when
it is registered. "name" is used to identify this path manager.
mptcp_pm_find() uses "name" to find a path manager on the list.

mptcp_pm_unregister is not used in this set, but will be invoked in
.unreg of struct bpf_struct_ops. mptcp_pm_validate() will be invoked
in .validate of struct bpf_struct_ops. That's why they are exported.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250313-net-next-mptcp-pm-ops-intro-v1-6-f4e4a88efc50@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

mptcp: pm: add struct_group in mptcp_pm_data

This patch adds a "struct_group(reset, ...)" in struct mptcp_pm_data to
simplify the reset, and make sure we don't miss any.

Suggested-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250313-net-next-mptcp-pm-ops-intro-v1-5-f4e4a88efc50@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

mptcp: pm: only fill id_avail_bitmap for in-kernel pm

id_avail_bitmap of struct mptcp_pm_data is currently only used by the
in-kernel PM, so this patch moves its initialization operation under
the "if (pm_type == MPTCP_PM_TYPE_KERNEL)" condition.

Suggested-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250313-net-next-mptcp-pm-ops-intro-v1-4-f4e4a88efc50@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

mptcp: pm: use pm variable instead of msk->pm

The variable "pm" has been defined in mptcp_pm_fully_established()
and mptcp_pm_data_reset() as "msk->pm", so use "pm" directly instead
of using "msk->pm".

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250313-net-next-mptcp-pm-ops-intro-v1-3-f4e4a88efc50@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

mptcp: pm: in-kernel: use kmemdup helper

Instead of using kmalloc() or kzalloc() to allocate an entry and
then immediately duplicate another entry to the newly allocated
one, kmemdup() helper can be used to simplify the code.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250313-net-next-mptcp-pm-ops-intro-v1-2-f4e4a88efc50@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

mptcp: pm: split netlink and in-kernel init

The registration of mptcp_genl_family is useful for both the in-kernel
and the userspace PM. It should then be done in pm_netlink.c.

On the other hand, the registration of the in-kernel pernet subsystem is
specific to the in-kernel PM, and should stay there in pm_kernel.c.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250313-net-next-mptcp-pm-ops-intro-v1-1-f4e4a88efc50@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: vlan: don't propagate flags on open

With the device instance lock, there is now a possibility of a deadlock:

[    1.211455] ============================================
[    1.211571] WARNING: possible recursive locking detected
[    1.211687] 6.14.0-rc5-01215-g032756b4ca7a-dirty #5 Not tainted
[    1.211823] --------------------------------------------
[    1.211936] ip/184 is trying to acquire lock:
[    1.212032] ffff8881024a4c30 (&dev->lock){+.+.}-{4:4}, at: dev_set_allmulti+0x4e/0xb0
[    1.212207]
[    1.212207] but task is already holding lock:
[    1.212332] ffff8881024a4c30 (&dev->lock){+.+.}-{4:4}, at: dev_open+0x50/0xb0
[    1.212487]
[    1.212487] other info that might help us debug this:
[    1.212626]  Possible unsafe locking scenario:
[    1.212626]
[    1.212751]        CPU0
[    1.212815]        ----
[    1.212871]   lock(&dev->lock);
[    1.212944]   lock(&dev->lock);
[    1.213016]
[    1.213016]  *** DEADLOCK ***
[    1.213016]
[    1.213143]  May be due to missing lock nesting notation
[    1.213143]
[    1.213294] 3 locks held by ip/184:
[    1.213371]  #0: ffffffff838b53e0 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_nets_lock+0x1b/0xa0
[    1.213543]  #1: ffffffff84e5fc70 (&net->rtnl_mutex){+.+.}-{4:4}, at: rtnl_nets_lock+0x37/0xa0
[    1.213727]  #2: ffff8881024a4c30 (&dev->lock){+.+.}-{4:4}, at: dev_open+0x50/0xb0
[    1.213895]
[    1.213895] stack backtrace:
[    1.213991] CPU: 0 UID: 0 PID: 184 Comm: ip Not tainted 6.14.0-rc5-01215-g032756b4ca7a-dirty #5
[    1.213993] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
[    1.213994] Call Trace:
[    1.213995]  <TASK>
[    1.213996]  dump_stack_lvl+0x8e/0xd0
[    1.214000]  print_deadlock_bug+0x28b/0x2a0
[    1.214020]  lock_acquire+0xea/0x2a0
[    1.214027]  __mutex_lock+0xbf/0xd40
[    1.214038]  dev_set_allmulti+0x4e/0xb0 # real_dev->flags & IFF_ALLMULTI
[    1.214040]  vlan_dev_open+0xa5/0x170 # ndo_open on vlandev
[    1.214042]  __dev_open+0x145/0x270
[    1.214046]  __dev_change_flags+0xb0/0x1e0
[    1.214051]  netif_change_flags+0x22/0x60 # IFF_UP vlandev
[    1.214053]  dev_change_flags+0x61/0xb0 # for each device in group from dev->vlan_info
[    1.214055]  vlan_device_event+0x766/0x7c0 # on netdevsim0
[    1.214058]  notifier_call_chain+0x78/0x120
[    1.214062]  netif_open+0x6d/0x90
[    1.214064]  dev_open+0x5b/0xb0 # locks netdevsim0
[    1.214066]  bond_enslave+0x64c/0x1230
[    1.214075]  do_set_master+0x175/0x1e0 # on netdevsim0
[    1.214077]  do_setlink+0x516/0x13b0
[    1.214094]  rtnl_newlink+0xaba/0xb80
[    1.214132]  rtnetlink_rcv_msg+0x440/0x490
[    1.214144]  netlink_rcv_skb+0xeb/0x120
[    1.214150]  netlink_unicast+0x1f9/0x320
[    1.214153]  netlink_sendmsg+0x346/0x3f0
[    1.214157]  __sock_sendmsg+0x86/0xb0
[    1.214160]  ____sys_sendmsg+0x1c8/0x220
[    1.214164]  ___sys_sendmsg+0x28f/0x2d0
[    1.214179]  __x64_sys_sendmsg+0xef/0x140
[    1.214184]  do_syscall_64+0xec/0x1d0
[    1.214190]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[    1.214191] RIP: 0033:0x7f2d1b4a7e56

Device setup:

     netdevsim0 (down)
     ^        ^
  bond        netdevsim1.100@netdevsim1 allmulticast=on (down)

When we enslave the lower device (netdevsim0) which has a vlan, we
propagate vlan's allmuti/promisc flags during ndo_open. This causes
(re)locking on of the real_dev.

Propagate allmulti/promisc on flags change, not on the open. There
is a slight semantics change that vlans that are down now propagate
the flags, but this seems unlikely to result in the real issues.

Reproducer:

  echo 0 1 > /sys/bus/netdevsim/new_device

  dev_path=$(ls -d /sys/bus/netdevsim/devices/netdevsim0/net/*)
  dev=$(echo $dev_path | rev | cut -d/ -f1 | rev)

  ip link set dev $dev name netdevsim0
  ip link set dev netdevsim0 up

  ip link add link netdevsim0 name netdevsim0.100 type vlan id 100
  ip link set dev netdevsim0.100 allmulticast on down
  ip link add name bond1 type bond mode 802.3ad
  ip link set dev netdevsim0 down
  ip link set dev netdevsim0 master bond1
  ip link set dev bond1 up
  ip link show

Reported-by: syzbot+b0c03d76056ef6cd12a6@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/Z9CfXjLMKn6VLG5d@mini-arch/T/#m15ba130f53227c883e79fb969687d69d670337a0
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250313100657.2287455-1-sdf@fomichev.me
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'net-ptp-fix-egregious-supported-flag-checks'

Jacob Keller says:

====================
net: ptp: fix egregious supported flag checks

In preparation for adding .supported_extts_flags and
.supported_perout_flags to the ptp_clock_info structure, fix a couple of
places where drivers get existing flag gets grossly incorrect.

The igb driver claims 82580 supports strictly validating PTP_RISING_EDGE
and PTP_FALLING_EDGE, but doesn't actually check the flags. Fix the driver
to require that the request match both edges, as this is implied by the
datasheet description.

The renesas driver also claims to support strict flag checking, but does
not actually check the flags either. I do not have the data sheet for this
device, so I do not know what edge it timestamps. For simplicity, just
reject all requests with PTP_STRICT_FLAGS. This essentially prevents the
PTP_EXTTS_REQUEST2 ioctl from working. Updating to correctly validate the
flags will require someone who has the hardware to confirm the behavior.

The lan743x driver supports (and strictly validates) that the request is
either PTP_RISING_EDGE or PTP_FALLING_EDGE but not both. However, it does
not check the flags are one of the known valid flags. Thus, requests for
PTP_EXT_OFF (and any future flag) will be accepted and misinterpreted. Add
the appropriate check to reject unsupported PTP_EXT_OFF requests and future
proof against new flags.

The broadcom PHY driver checks that PTP_PEROUT_PHASE is not set. This
appears to be an attempt at rejecting unsupported flags. It is not robust
against flag additions such as the PTP_PEROUT_ONE_SHOT, or anything added
in the future. Fix this by instead checking against the negation of the
supported PTP_PEROUT_DUTY_CYCLE instead.

The ptp_ocp driver supports PTP_PEROUT_PHASE and PTP_PEROUT_DUTY_CYCLE, but
does not check unsupported flags. Add the appropriate check to ensure
PTP_PEROUT_ONE_SHOT and any future flags are rejected as unsupported.

These are changes compile-tested, but I do not have hardware to validate the
behavior.

There are a number of other drivers which enable periodic output or
external timestamp requests, but which do not check flags at all. We could
go through each of these drivers one-by-one and meticulously add a flag
check. Instead, these drivers will be covered only by the upcoming
.supported_extts_flags and .supported_perout_flags checks in a net-next
series.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
====================

Link: https://patch.msgid.link/20250312-jk-net-fixes-supported-extts-flags-v2-0-ea930ba82459@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

ptp: ocp: reject unsupported periodic output flags

The ptp_ocp_signal_from_perout() function supports PTP_PEROUT_DUTY_CYCLE
and PTP_PEROUT_PHASE. It does not support PTP_PEROUT_ONE_SHOT, but does not
reject a request with such an unsupported flag.

Add the appropriate check to ensure that unsupported requests are rejected
both for PTP_PEROUT_ONE_SHOT as well as any future flags.

Fixes: 1aa66a3a135a ("ptp: ocp: Program the signal generators via PTP_CLK_REQ_PEROUT")
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250312-jk-net-fixes-supported-extts-flags-v2-5-ea930ba82459@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>