git.ipfire.org Git - thirdparty/linux.git/log
5 weeks ago net/mlx5: Relax capability check for eswitch query paths
Moshe Shemesh [Wed, 6 May 2026 13:32:37 +0000 (16:32 +0300)] 
net/mlx5: Relax capability check for eswitch query paths

Several eswitch functions that only query other functions' HCA
capabilities or read cached vport state are guarded by the
vhca_resource_manager capability. This capability is required for
set_hca_cap operations but query_hca_cap of other functions only
requires the vport_group_manager capability.

Relax the capability check from vhca_resource_manager to
vport_group_manager in the following query-only paths:
- mlx5_esw_vport_caps_get() - queries other function general caps
- esw_ipsec_vf_query_generic() - queries other function ipsec cap
- mlx5_devlink_port_fn_migratable_get() - reads cached vport state
- mlx5_devlink_port_fn_roce_get() - reads cached vport state
- mlx5_devlink_port_fn_max_io_eqs_get() - queries other function caps
- mlx5_esw_vport_enable/disable() - vhca_id map/unmap

Functions that also perform set_hca_cap (migratable_set, roce_set,
max_io_eqs_set, esw_ipsec_vf_set_generic, esw_ipsec_vf_set_bytype)
retain the vhca_resource_manager requirement.
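
The split between query and set paths can be sketched in user space as
follows. This is only an illustration: struct fake_caps and both helper
names are hypothetical stand-ins, not the mlx5 driver's API.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two HCA capability bits named above. */
struct fake_caps {
	bool vport_group_manager;	/* sufficient for query_hca_cap of other functions */
	bool vhca_resource_manager;	/* required for set_hca_cap */
};

/* Query-only paths: the check is relaxed to vport_group_manager. */
static bool fake_can_query_other_func(const struct fake_caps *caps)
{
	return caps->vport_group_manager;
}

/* Paths that also perform set_hca_cap keep the stricter requirement. */
static bool fake_can_set_other_func(const struct fake_caps *caps)
{
	return caps->vhca_resource_manager;
}
```

With only vport_group_manager set, the query helper returns true while
the set helper still returns false.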

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Akiva Goldberger <agoldberger@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260506133239.276237-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago ixgbe: E610: do not fill EEE lp_advertised from local PHY caps
David Carlier [Thu, 7 May 2026 23:42:42 +0000 (16:42 -0700)] 
ixgbe: E610: do not fill EEE lp_advertised from local PHY caps

ixgbe_get_eee_e610() fills kedata->lp_advertised from pcaps.eee_cap
returned by ixgbe_aci_get_phy_caps() with IXGBE_ACI_REPORT_ACTIVE_CFG.
That report mode (and the other IXGBE_ACI_REPORT_* modes) describes the
local PHY only, not the link partner. The X550 path uses a separate
FW_PHY_ACT_UD_2 activity for partner data; the E610 ACI has no
equivalent.

Leave lp_advertised zeroed via the existing linkmode_zero() and drop
the now-unused ixgbe_eee_cap_map[]. eee_active/eee_enabled are
unaffected (sourced from link.eee_status).

Fixes: b61dbdeff3a9 ("ixgbe: E610: add EEE support")
Signed-off-by: David Carlier <devnexen@gmail.com>
Reviewed-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260507-jk-iwl-next-fix-eee-ixgbe-v1-1-62bc1d197d1d@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago sctp: Fix typo in comment
Md Shofiqul Islam [Thu, 7 May 2026 10:57:58 +0000 (13:57 +0300)] 
sctp: Fix typo in comment

Fix a typo in a comment in sctp_endpoint_destroy(): "releated" should
be "related".

Signed-off-by: Md Shofiqul Islam <shofiqtest@gmail.com>
Link: https://patch.msgid.link/20260507105758.25728-1-shofiqtest@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago Merge branch 'net-fix-protodown-with-macvlan'
Jakub Kicinski [Fri, 8 May 2026 23:50:29 +0000 (16:50 -0700)] 
Merge branch 'net-fix-protodown-with-macvlan'

Ido Schimmel says:

====================
net: Fix protodown with macvlan

When protodown is enabled on a macvlan, two bugs cause the macvlan to
incorrectly gain carrier:

1. Toggling the lower device's carrier while protodown is enabled on the
macvlan causes the macvlan to gain carrier, effectively bypassing the
protodown mechanism.

2. Toggling protodown on and then off on the macvlan while the lower
device has no carrier causes the macvlan to gain carrier, since
netif_change_proto_down() unconditionally turns the carrier on.

Patch #1 is a preparation.

Patch #2 solves the first problem by making netif_carrier_on() return
early when protodown is on.

Patch #3 solves the second problem by only calling netif_carrier_on()
when protodown is turned off if there is no linked net device or if the
linked net device has a carrier.

Patch #4 adds a selftest covering both bugs and the basic protodown
functionality.

Targeting net-next since these are not regressions (i.e., this never
worked).

Note that while these changes are in the core, they should only affect
macvlan as protodown is only supported by macvlan and vxlan and only
the former has a linked net device.
====================

Link: https://patch.msgid.link/20260507105906.891817-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago selftests: net: Add protodown tests
Ido Schimmel [Thu, 7 May 2026 10:59:06 +0000 (13:59 +0300)] 
selftests: net: Add protodown tests

Add a selftest for the protodown mechanism.

Five test cases are included:

1. Basic protodown toggling: Verify that setting protodown on macvlan
   results in DOWN operational state and clearing it restores UP.

2. Same as the previous test case, but with vxlan.

3. Protodown reasons: Verify that protodown cannot be cleared while
   there are active protodown reasons, but can be cleared once all
   reasons are removed.

4. Protodown with lower device being toggled: Verify that toggling the
   lower device's carrier while protodown is on does not cause the
   macvlan to gain carrier.

5. Protodown with lower device down: Verify that toggling protodown
   while the lower device has no carrier does not cause the macvlan to
   gain carrier.

Note that the last two test cases fail without "net: Do not turn on
carrier when protodown is on" and "net: Do not unconditionally turn on
carrier when turning off protodown":

 # ./protodown.sh
 TEST: Basic protodown on/off with macvlan                           [ OK ]
 TEST: Basic protodown on/off with vxlan                             [ OK ]
 TEST: Protodown reasons                                             [ OK ]
 TEST: Protodown with lower device toggled                           [FAIL]
         Macvlan operational state is not DOWN despite protodown
 TEST: Protodown with lower device down                              [FAIL]
         Macvlan is not LOWERLAYERDOWN after clearing protodown

Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260507105906.891817-5-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago net: Do not unconditionally turn on carrier when turning off protodown
Ido Schimmel [Thu, 7 May 2026 10:59:05 +0000 (13:59 +0300)] 
net: Do not unconditionally turn on carrier when turning off protodown

The protodown functionality allows user space to turn off the carrier of
a net device:

 # ip link add name dummy1 up type dummy
 # ip link add name macvlan1 up link dummy1 type macvlan mode bridge
 # ip link set dev macvlan1 protodown on
 $ ip -br link show dev macvlan1
 macvlan1@dummy1  DOWN           0a:5c:a3:05:c7:86 <NO-CARRIER,BROADCAST,MULTICAST,UP>

When protodown is turned off, the core unconditionally turns on the
carrier of the net device:

 # ip link set dev macvlan1 protodown off
 $ ip -br link show dev macvlan1
 macvlan1@dummy1  UP             0a:5c:a3:05:c7:86 <BROADCAST,MULTICAST,UP,LOWER_UP>

This is wrong as it means that a macvlan can end up with a carrier when
its lower device does not have a carrier:

 # ip link set dev dummy1 carrier off
 $ ip -br link show dev macvlan1
 macvlan1@dummy1  LOWERLAYERDOWN 0a:5c:a3:05:c7:86 <NO-CARRIER,BROADCAST,MULTICAST,UP>
 # ip link set dev macvlan1 protodown on
 # ip link set dev macvlan1 protodown off
 $ ip -br link show dev macvlan1
 macvlan1@dummy1  UP             0a:5c:a3:05:c7:86 <BROADCAST,MULTICAST,UP,LOWER_UP>

Solve this by resolving the linked net device and if one exists, inherit
its carrier state when protodown is turned off. Otherwise, if no linked
net device exists, as before, simply turn on the carrier.

Resolve the linked net device using a new helper and have it return the
device itself (in a similar fashion to dev_get_iflink()) if the device
does not implement both ndo_get_iflink() and get_link_net(). If the
latter is not implemented, it is unclear in which network namespace we
should look up the linked net device. Currently, this helper is only
used for net devices that support protodown (macvlan and vxlan) and for
both it returns the correct result.
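
The carrier decision described above can be modeled in user space
roughly as follows. This is a simplified sketch: struct linked_dev and
linked_change_proto_down() are hypothetical names, and a NULL link
pointer stands in for "no linked net device" (the real helper returns
the device itself in that case).

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a net device; not the kernel's struct net_device. */
struct linked_dev {
	bool proto_down;
	bool carrier;
	struct linked_dev *link;	/* linked (lower) device, NULL if none */
};

static void linked_change_proto_down(struct linked_dev *dev, bool proto_down)
{
	dev->proto_down = proto_down;
	if (proto_down) {
		/* Turning protodown on always drops the carrier. */
		dev->carrier = false;
		return;
	}
	/*
	 * Turning protodown off: inherit the linked device's carrier if
	 * one exists, otherwise turn the carrier on as before.
	 */
	dev->carrier = dev->link ? dev->link->carrier : true;
}
```

In this model, toggling protodown on and off over a lower device without
carrier leaves the upper device without carrier as well.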

Output with the patch:

 # ip link add name dummy1 up type dummy
 # ip link add name macvlan1 up link dummy1 type macvlan mode bridge
 # ip link set dev dummy1 carrier off
 $ ip -br link show dev macvlan1
 macvlan1@dummy1  LOWERLAYERDOWN 0a:5c:a3:05:c7:86 <NO-CARRIER,BROADCAST,MULTICAST,UP>
 # ip link set dev macvlan1 protodown on
 # ip link set dev macvlan1 protodown off
 $ ip -br link show dev macvlan1
 macvlan1@dummy1  LOWERLAYERDOWN 0a:5c:a3:05:c7:86 <NO-CARRIER,BROADCAST,MULTICAST,UP>
 # ip link set dev dummy1 carrier on
 $ ip -br link show dev macvlan1
 macvlan1@dummy1  UP             0a:5c:a3:05:c7:86 <BROADCAST,MULTICAST,UP,LOWER_UP>
 # ip link set dev macvlan1 protodown on
 # ip link set dev macvlan1 protodown off
 $ ip -br link show dev macvlan1
 macvlan1@dummy1  UP             0a:5c:a3:05:c7:86 <BROADCAST,MULTICAST,UP,LOWER_UP>

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260507105906.891817-4-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago net: Do not turn on carrier when protodown is on
Ido Schimmel [Thu, 7 May 2026 10:59:04 +0000 (13:59 +0300)] 
net: Do not turn on carrier when protodown is on

The protodown functionality allows user space to turn off the carrier of
a net device:

 # ip link add name dummy1 up type dummy
 # ip link add name macvlan1 up link dummy1 type macvlan mode bridge
 # ip link set dev macvlan1 protodown on
 $ ip -br link show dev macvlan1
 macvlan1@dummy1  DOWN           0a:5c:a3:05:c7:86 <NO-CARRIER,BROADCAST,MULTICAST,UP>

Different applications can set different protodown reasons, which
prevents an application from turning on the carrier of a net device as
long as others want it down:

 # ip link set dev macvlan1 protodown_reason 1 on
 # ip link set dev macvlan1 protodown_reason 2 on
 # ip link set dev macvlan1 protodown off
 Error: Cannot clear protodown, active reasons.
 # ip link set dev macvlan1 protodown_reason 2 off
 # ip link set dev macvlan1 protodown off
 Error: Cannot clear protodown, active reasons.
 # ip link set dev macvlan1 protodown_reason 1 off
 # ip link set dev macvlan1 protodown off
 $ ip -br link show dev macvlan1
 macvlan1@dummy1  UP             0a:5c:a3:05:c7:86 <BROADCAST,MULTICAST,UP,LOWER_UP>

Unfortunately, this mechanism is not very useful when the carrier of a
net device can be toggled by toggling the carrier of its lower device:

 # ip link set dev macvlan1 protodown on
 $ ip -br link show dev macvlan1
 macvlan1@dummy1  DOWN           0a:5c:a3:05:c7:86 <NO-CARRIER,BROADCAST,MULTICAST,UP>
 # ip link set dev dummy1 carrier off
 # ip link set dev dummy1 carrier on
 $ ip -br link show dev macvlan1
 macvlan1@dummy1  UP             0a:5c:a3:05:c7:86 <BROADCAST,MULTICAST,UP,LOWER_UP>

Obviously, this is not the intended behavior and it is unlikely to be
relied on by anyone. In fact, it is a problem for applications like FRR
that use protodown with macvlan on top of a bridge as part of Virtual
Router Redundancy Protocol (VRRP).

Solve this by preventing a net device configured with protodown on from
gaining carrier by making netif_carrier_on() a NOP when protodown is
turned on.
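
A minimal user-space model of this behavior, assuming a hypothetical
struct carrier_dev with just the two relevant flags (none of these names
are kernel symbols):

```c
#include <stdbool.h>

/* Toy model of a net device; not the kernel's struct net_device. */
struct carrier_dev {
	bool proto_down;
	bool carrier;
};

/* Mirrors the described behavior: a NOP while protodown is on. */
static void carrier_dev_on(struct carrier_dev *dev)
{
	if (dev->proto_down)
		return;
	dev->carrier = true;
}

static void carrier_dev_off(struct carrier_dev *dev)
{
	dev->carrier = false;
}

/* A lower device carrier change is propagated to the upper device. */
static void carrier_dev_lower_changed(struct carrier_dev *upper, bool lower_carrier)
{
	if (lower_carrier)
		carrier_dev_on(upper);
	else
		carrier_dev_off(upper);
}
```

Toggling the lower device's carrier off and on while proto_down is set
leaves the upper device's carrier off in this model.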

Output with the patch:

 # ip link add name dummy1 up type dummy
 # ip link add name macvlan1 up link dummy1 type macvlan mode bridge
 # ip link set dev macvlan1 protodown on
 $ ip -br link show dev macvlan1
 macvlan1@dummy1  DOWN           0a:5c:a3:05:c7:86 <NO-CARRIER,BROADCAST,MULTICAST,UP>
 # ip link set dev dummy1 carrier off
 # ip link set dev dummy1 carrier on
 $ ip -br link show dev macvlan1
 macvlan1@dummy1  DOWN           0a:5c:a3:05:c7:86 <NO-CARRIER,BROADCAST,MULTICAST,UP>
 # ip link set dev macvlan1 protodown off
 $ ip -br link show dev macvlan1
 macvlan1@dummy1  UP             0a:5c:a3:05:c7:86 <BROADCAST,MULTICAST,UP,LOWER_UP>

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260507105906.891817-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago net: Set dev->proto_down before changing carrier state
Ido Schimmel [Thu, 7 May 2026 10:59:03 +0000 (13:59 +0300)] 
net: Set dev->proto_down before changing carrier state

A subsequent patch will make netif_carrier_on() a NOP for net devices
that have protodown turned on so that they will not accidentally gain
carrier. As a preparation, set dev->proto_down before calling
netif_carrier_{off,on}().

Note that the only driver that supports protodown and has a notion of a
carrier is macvlan and it is calling netif_carrier_{off,on}() with RTNL
held.

No functional changes intended.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260507105906.891817-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago Merge branch 'keep-phy-link-during-wol-sleep-cycle'
Jakub Kicinski [Fri, 8 May 2026 23:39:00 +0000 (16:39 -0700)] 
Merge branch 'keep-phy-link-during-wol-sleep-cycle'

Justin Chen says:

====================
Keep PHY link during WoL sleep cycle

First we divide the init/deinit path to allow for a partial init/deinit
during a sleep cycle. We also remove some unnecessary small functions at
the same time.

Then we modify the suspend and resume path to allow for a partial bring
down and bring up. This allows us to keep the PHY link up and to resume
network traffic much quicker. Note we only do this when WoL is enabled
since the PHY is already powered. In the non-WoL case we want to follow
the same flow.
====================

Link: https://patch.msgid.link/20260506213114.2002886-1-justin.chen@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago net: bcmasp: Keep phy link during WoL sleep cycle
Justin Chen [Wed, 6 May 2026 21:31:14 +0000 (14:31 -0700)] 
net: bcmasp: Keep phy link during WoL sleep cycle

We currently more or less restart all the HW on resume. Since we also
stop the PHY, it takes a while for the PHY link to be re-negotiated on
resume. Instead of doing a full restart, we keep the HW state and the
PHY link, that way we can resume network traffic with a much smaller
delay.

Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20260506213114.2002886-3-justin.chen@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago net: bcmasp: Divide init to allow partial bring up
Justin Chen [Wed, 6 May 2026 21:31:13 +0000 (14:31 -0700)] 
net: bcmasp: Divide init to allow partial bring up

To prepare for a partial bring up of the interface during resume,
we break apart the bcmasp_netif_init() function into smaller chunks
that can be called as necessary. Also consolidate some functions that
do not need to be standalone.

Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20260506213114.2002886-2-justin.chen@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago dt-bindings: net: microchip: Add LAN7500 and LAN7505 devices
Thomas Richard [Wed, 6 May 2026 12:13:03 +0000 (14:13 +0200)] 
dt-bindings: net: microchip: Add LAN7500 and LAN7505 devices

Add bindings for LAN7500 and LAN7505 USB Ethernet Devices which are similar
to LAN9500.

Signed-off-by: Thomas Richard <thomas.richard@bootlin.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Link: https://patch.msgid.link/20260506-b4-var-som-om44-lan7500-v2-1-b8af59ab877c@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago net: phy: dp83867: add MDI-X management
Luca Ellero [Wed, 6 May 2026 14:19:06 +0000 (16:19 +0200)] 
net: phy: dp83867: add MDI-X management

ethtool on this phy device always reports "MDI-X: Unknown" and doesn't
support forcing it to on or off.
This patch adds support for reading/forcing MDI-X mode from ethtool
properly.

Signed-off-by: Luca Ellero <l.ellero@asem.it>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260506141918.13136-1-l.ellero@asem.it
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago gve: Use generic power management
Vaibhav Gupta [Wed, 6 May 2026 16:50:06 +0000 (16:50 +0000)] 
gve: Use generic power management

Switch to the generic power management and remove the usage of legacy
(pci_driver) hooks.

Signed-off-by: Vaibhav Gupta <vaibhavgupta40@gmail.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20260506165015.641738-1-vaibhavgupta40@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago bnxt_en: Drop pci_save_state() after pci_restore_state()
Lukas Wunner [Thu, 7 May 2026 13:04:59 +0000 (15:04 +0200)] 
bnxt_en: Drop pci_save_state() after pci_restore_state()

Commit 383d89699c50 ("treewide: Drop pci_save_state() after
pci_restore_state()") sought to purge all superfluous invocations of
pci_save_state() from the tree.

Unfortunately the commit missed one invocation in the Broadcom
NetXtreme-C/E driver.  Drop it.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://patch.msgid.link/39de1b025928d9a457976010b2324e7e99baa92a.1778158755.git.lukas@wunner.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago dt-bindings: net: lan966x: Accept standard ethernet prefixes
Linus Walleij [Thu, 7 May 2026 09:26:01 +0000 (11:26 +0200)] 
dt-bindings: net: lan966x: Accept standard ethernet prefixes

The dsa.yaml and ethernet-switch.yaml bindings recommend
prefixing ethernet switches and ports with "ethernet-" so
make the LAN966x do the same.

Reported-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Linus Walleij <linusw@kernel.org>
Reviewed-by: Herve Codina <herve.codina@bootlin.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20260507-lan966-binding-v1-1-e99293d2a4ec@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago net: ethernet: atheros: atl2: remove kernel backward-compatibility code
Ethan Nelson-Moore [Wed, 6 May 2026 05:40:27 +0000 (22:40 -0700)] 
net: ethernet: atheros: atl2: remove kernel backward-compatibility code

The atl2 driver contains code for compatibility with old kernels that
do not support module_param_array. Backward compatibility is
irrelevant because this driver is in-tree. Remove this unreachable
code to simplify the driver's handling of module parameters.

Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20260506054035.23710-1-enelsonmoore@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago selftests: net: add tests for filtered dumps of page pool
Jakub Kicinski [Wed, 6 May 2026 03:48:21 +0000 (20:48 -0700)] 
selftests: net: add tests for filtered dumps of page pool

Add tests for page pool dumps of a specific ifindex.

Link: https://patch.msgid.link/20260506034821.1710113-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago net: page_pool: support dumping pps of a specific ifindex via Netlink
Jakub Kicinski [Wed, 6 May 2026 03:48:20 +0000 (20:48 -0700)] 
net: page_pool: support dumping pps of a specific ifindex via Netlink

NIPA tries to make sure that HW tests don't modify system state.
It saves the state of page pools, too. Now that I write this commit
message I realize that this is impractical since page pool IDs and
state will get legitimately changed by the tests. But I already
spent a couple of hours implementing the filtering, so..

Link: https://patch.msgid.link/20260506034821.1710113-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Thu, 30 Apr 2026 19:49:56 +0000 (12:49 -0700)] 
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR (net-7.1-rc3).

Conflicts:

net/ipv4/igmp.c
  726fa7da2d8c ("ipv4: igmp: get rid of IGMPV3_{QQIC,MRC} and simplify calculation")
  c6bebaa744f7 ("ipv4: igmp: annotate data-races in igmp_heard_query()")
https://lore.kernel.org/a7365e4873340f7a5e30411207de3bf9@kernel.org

Adjacent changes:

net/psp/psp_main.c
  30cb24f97d44 ("psp: strip variable-length PSP header in psp_dev_rcv()")
  c2b22277ad89 ("psp: validate IPv4 header fields in psp_dev_rcv()")

net/sched/sch_fq_codel.c
  f83e07b29246 ("net/sched: sch_fq_codel: annotate data-races from fq_codel_dump_class_stats()")
  3f3aa77ff1c8 ("net/sched: add qstats_cpu_drop_inc() helper")

net/wireless/pmsr.c
  0f3c0a197309 ("wifi: nl80211: fix NL80211_PMSR_FTM_REQ_ATTR_FTMS_PER_BURST usage")
  410aa47fd9d3 ("wifi: cfg80211: allow suppressing FTM result reporting for PD requests")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago Merge tag 'net-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 7 May 2026 17:32:03 +0000 (10:32 -0700)] 
Merge tag 'net-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from Netfilter, IPsec, Bluetooth and WiFi.

  Current release - fix to a fix:

   - ipmr: add __rcu to netns_ipv4.mrt, make sure we hold the RCU lock
     in all relevant places

  Current release - new code bugs:

   - fixes for the recently added resizable hash tables

   - ipv6: make sure we default IPv6 tunnel drivers to =m now that IPv6
     itself is built in

   - drv: octeontx2-af: fixes for parser/CAM fixes

  Previous releases - regressions:

   - phy: micrel: fix LAN8814 QSGMII soft reset

   - wifi:
       - cw1200: revert "Fix locking in error paths"
       - ath12k: fix crash on WCN7850, due to adding the same queue
         buffer to a list multiple times

  Previous releases - always broken:

   - number of info leak fixes

   - ipv6: implement limits on extension header parsing

   - wifi: number of fixes for missing bound checks in the drivers

   - Bluetooth: fixes for races and locking issues

   - af_unix:
       - fix an issue between garbage collection and PEEK
       - fix yet another issue with OOB data

   - xfrm: esp: avoid in-place decrypt on shared skb frags

   - netfilter: replace skb_try_make_writable() by skb_ensure_writable()

   - openvswitch: vport: fix race between tunnel creation and linking
     leading to invalid memory accesses (type confusion)

   - drv: amd-xgbe: fix PTP addend overflow causing frozen clock

  Misc:

   - sched/isolation: make HK_TYPE_KTHREAD an alias of HK_TYPE_DOMAIN
     (for relevant IPVS change)"

* tag 'net-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (190 commits)
  net: sparx5: configure serdes for 1000BASE-X in sparx5_port_init()
  net: sparx5: fix wrong chip ids for TSN SKUs
  net: stmmac: dwmac-nuvoton: fix NULL pointer dereference in nvt_set_phy_intf_sel()
  tcp: Fix dst leak in tcp_v6_connect().
  ipmr: Call ipmr_fib_lookup() under RCU.
  net: phy: broadcom: Save PHY counters during suspend
  net/smc: fix missing sk_err when TCP handshake fails
  af_unix: Reject SIOCATMARK on non-stream sockets
  veth: fix OOB txq access in veth_poll() with asymmetric queue counts
  eth: fbnic: fix double-free of PCS on phylink creation failure
  net: ethernet: cortina: Drop half-assembled SKB
  selftests: mptcp: pm: restrict 'unknown' check to pm_nl_ctl
  selftests: mptcp: check output: catch cmd errors
  mptcp: pm: prio: skip closed subflows
  mptcp: pm: ADD_ADDR rtx: return early if no retrans
  mptcp: pm: ADD_ADDR rtx: skip inactive subflows
  mptcp: pm: ADD_ADDR rtx: resched blocked ADD_ADDR quicker
  mptcp: pm: ADD_ADDR rtx: free sk if last
  mptcp: pm: ADD_ADDR rtx: always decrease sk refcount
  mptcp: pm: ADD_ADDR rtx: fix potential data-race
  ...

5 weeks ago net: sparx5: configure serdes for 1000BASE-X in sparx5_port_init()
Daniel Machon [Wed, 6 May 2026 07:25:39 +0000 (09:25 +0200)] 
net: sparx5: configure serdes for 1000BASE-X in sparx5_port_init()

sparx5_port_init() only invokes sparx5_serdes_set() and the associated
shadow-device enable and low-speed device switch for SGMII and QSGMII.
On any port with a high-speed primary device (DEV5G/DEV10G/DEV25G)
configured for 1000BASE-X the serdes is therefore left uninitialized,
the DEV2G5 shadow is never enabled, and the port stays pointed at its
high-speed device rather than the DEV2G5. The PCS1G block looks
healthy in isolation, but no frames reach the link partner.

Add 1000BASE-X to the check so the same three steps run.
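
The widened check can be pictured as a predicate over the interface
mode. This is a sketch only: enum toy_if_mode and
toy_needs_low_speed_setup() are hypothetical and do not reproduce the
driver's actual code.

```c
#include <stdbool.h>

/* Illustrative subset of interface modes; not the kernel's phy_interface_t. */
enum toy_if_mode {
	TOY_IF_SGMII,
	TOY_IF_QSGMII,
	TOY_IF_1000BASEX,
	TOY_IF_10GBASER,
};

/*
 * True when the port must configure the serdes, enable the DEV2G5
 * shadow device and switch to the low-speed device.
 */
static bool toy_needs_low_speed_setup(enum toy_if_mode mode)
{
	switch (mode) {
	case TOY_IF_SGMII:
	case TOY_IF_QSGMII:
	case TOY_IF_1000BASEX:	/* the case the fix adds */
		return true;
	default:
		return false;
	}
}
```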

Note: the same issue might apply to 2500BASE-X, but that will,
eventually, be addressed in a separate commit.

Reported-by: Andrew Lunn <andrew@lunn.ch>
Fixes: 946e7fd5053a ("net: sparx5: add port module support")
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20260506-misc-fixes-sparx5-lan969x-v2-4-fb236aa96908@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago net: sparx5: fix wrong chip ids for TSN SKUs
Daniel Machon [Wed, 6 May 2026 07:25:38 +0000 (09:25 +0200)] 
net: sparx5: fix wrong chip ids for TSN SKUs

The TSN SKUs in enum spx5_target_chiptype have incorrect IDs:

  SPX5_TARGET_CT_7546TSN    = 0x47546,
  SPX5_TARGET_CT_7549TSN    = 0x47549,
  SPX5_TARGET_CT_7552TSN    = 0x47552,
  SPX5_TARGET_CT_7556TSN    = 0x47556,
  SPX5_TARGET_CT_7558TSN    = 0x47558,

The value read back from the chip is GCB_CHIP_ID_PART_ID, which is a
GENMASK(27, 12) field, i.e. at most 16 bits wide. It can never match
these IDs, so probing a TSN part fails with a "Target not supported"
error.

Fix the enum to use the actual 16-bit part IDs returned by the
hardware: 0x0546, 0x0549, 0x0552, 0x0556 and 0x0558.
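
The width argument can be checked with a small user-space sketch. The
macro and function names below are illustrative; only the bit range
(27 down to 12) comes from the description above.

```c
#include <stdint.h>

/* User-space equivalent of the kernel's GENMASK(27, 12): bits 12..27. */
#define TOY_PART_ID_SHIFT	12
#define TOY_PART_ID_MASK	(((UINT32_C(1) << 16) - 1) << TOY_PART_ID_SHIFT)

/* Extract the 16-bit part ID field from a raw register value. */
static uint32_t toy_part_id(uint32_t chip_id_reg)
{
	return (chip_id_reg & TOY_PART_ID_MASK) >> TOY_PART_ID_SHIFT;
}
```

Even with every register bit set, toy_part_id() cannot exceed 0xFFFF, so
a 20-bit constant such as 0x47546 can never compare equal to it, while
0x0546 fits in the field.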

Reported-by: Andrew Lunn <andrew@lunn.ch>
Fixes: 3cfa11bac9bb ("net: sparx5: add the basic sparx5 driver")
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20260506-misc-fixes-sparx5-lan969x-v2-3-fb236aa96908@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago Merge tag 'sound-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Thu, 7 May 2026 15:55:15 +0000 (08:55 -0700)] 
Merge tag 'sound-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "Again a collection of small fixes, mostly device-specific ones.

  The only big LOC change is the removal of pretty old dead code in the
  ab8500 codec driver, while the rest are all nice small changes.

  Core / API:
   - Fix race in deferred fasync state checks
   - Fix UMP group filtering in sequencer

  ASoC:
   - cs35l56: fixes for driver cleanup and error paths
   - tas2764/2770: workaround for bogus temperature readings
   - wm_adsp: fixes for firmware unit tests
   - amd-yc: more DMI quirks for laptops
   - Minor fixes for fsl_xcvr and spacemit

  HD-Audio:
   - Mute LED and speaker quirks for HP, Lenovo, and Xiaomi laptops

  USB-audio:
   - New device-specific quirks (Motu, JBL, AlphaTheta, Razer)
   - Fix of MIDI2 playback on resume

  Others:
   - Firewire-tascam control event fix
   - Minor cleanups and fixes for sparc/dbri and pcmtest"

* tag 'sound-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (28 commits)
  ASoC: cs35l56: Destroy workqueue in probe error path
  ASoC: cs35l56: Don't use devres to unregister component
  ALSA: sparc/dbri: add missing fallthrough
  ALSA: core: Serialize deferred fasync state checks
  ALSA: hda/realtek: Add mute LED fixup for HP Pavilion 15-cs1xxx
  ALSA: seq: Fix UMP group 16 filtering
  ASoC: wm_adsp_fw_find_test: Clear searched_fw_files in find-by-index test
  ASoC: wm_adsp_fw_find_test: Redirect wm_adsp_release_firmware_files()
  ASoC: tas2770: Deal with bogus initial temperature value
  ASoC: tas2764: Deal with bogus initial temperature register value
  ALSA: usb-audio: add clock quirk for Motu 1248
  ALSA: usb-audio: midi2: Restart output URBs on resume
  ALSA: hda/realtek: Fix mute and mic-mute LEDs for HP Envy X360 15-fh0xxx
  ALSA: usb-audio: Add quirk flags for JBL Pebbles
  ALSA: firewire-tascam: Do not drop unread control events
  ALSA: usb-audio: Add quirk flags for AlphaTheta EUPHONIA
  ASoC: fsl_xcvr: Fix event generation for cached controls
  ASoC: sdw_utils: avoid the SDCA companion function not supported failure
  ASoC: amd: yc: Add HP OMEN Gaming Laptop 16-ap0xxx product line in quirk table
  ASoC: cs35l56: Fix out-of-bounds in dev_err() in cs35l56_read_onchip_spkid()
  ...

5 weeks ago Merge tag 'platform-drivers-x86-v7.1-2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 7 May 2026 15:46:27 +0000 (08:46 -0700)] 
Merge tag 'platform-drivers-x86-v7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Ilpo Järvinen:

 - Silence unknown board warning for 8D41 (hp-wmi)

 - Fix uninitialized variable in fan RPM handling (lenovo/wmi-other)

 - Check min_size also when ACPI does not return an out object (wmi)

* tag 'platform-drivers-x86-v7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: lenovo: wmi-other: Fix uninitialized variable in lwmi_om_hwmon_write()
  platform/x86: hp-wmi: silence unknown board warning for 8D41
  platform/wmi: Fix unchecked min_size in wmidev_invoke_method()

5 weeks ago Merge tag 'pmdomain-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh...
Linus Torvalds [Thu, 7 May 2026 15:43:25 +0000 (08:43 -0700)] 
Merge tag 'pmdomain-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm

Pull pmdomain fixes from Ulf Hansson:

 - Fix detach procedure for virtual devices in genpd

 - mediatek: Fix use-after-free in scpsys_get_bus_protection_legacy()

* tag 'pmdomain-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm:
  pmdomain: mediatek: fix use-after-free in scpsys_get_bus_protection_legacy()
  pmdomain: core: Fix detach procedure for virtual devices in genpd

5 weeks ago net: stmmac: dwmac-nuvoton: fix NULL pointer dereference in nvt_set_phy_intf_sel()
Joey Lu [Wed, 6 May 2026 08:46:13 +0000 (16:46 +0800)] 
net: stmmac: dwmac-nuvoton: fix NULL pointer dereference in nvt_set_phy_intf_sel()

priv->dev was never initialized after devm_kzalloc() allocates the
private data structure. When nvt_set_phy_intf_sel() is later invoked
via the phylink interface_select callback, it calls
nvt_gmac_get_delay(priv->dev, ...) which dereferences the NULL pointer.

Fix this by assigning priv->dev = dev immediately after allocation.
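
The bug pattern and its fix can be reproduced in a small user-space
sketch. All names here (toy_device, toy_priv, toy_probe, toy_get_delay)
are hypothetical, and calloc() stands in for devm_kzalloc().

```c
#include <stdlib.h>

struct toy_device { int delay; };		/* stand-in for struct device */
struct toy_priv { struct toy_device *dev; };	/* stand-in for the glue's private data */

/* Zeroed allocation, like devm_kzalloc(): priv->dev starts out NULL. */
static struct toy_priv *toy_probe(struct toy_device *dev)
{
	struct toy_priv *priv = calloc(1, sizeof(*priv));

	if (!priv)
		return NULL;
	priv->dev = dev;	/* the assignment the fix adds */
	return priv;
}

/* A later callback that dereferences priv->dev. */
static int toy_get_delay(const struct toy_priv *priv)
{
	return priv->dev->delay;
}
```

Without the assignment in toy_probe(), toy_get_delay() would dereference
a NULL pointer, which is the failure described above.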

Fixes: 4d7c557f58ef ("net: stmmac: dwmac-nuvoton: Add dwmac glue for Nuvoton MA35 family")
Signed-off-by: Joey Lu <a0987203069@gmail.com>
Link: https://patch.msgid.link/20260506084614.192894-2-a0987203069@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago tcp: Fix dst leak in tcp_v6_connect().
Kuniyuki Iwashima [Wed, 6 May 2026 07:04:42 +0000 (07:04 +0000)] 
tcp: Fix dst leak in tcp_v6_connect().

If a socket is bound to a wildcard address, tcp_v[46]_connect()
updates it with a non-wildcard address based on the route lookup.

After bhash2 was introduced in the cited commit, we must call
inet_bhash2_update_saddr() to update the bhash2 entry as well.

If inet_bhash2_update_saddr() fails, we must release the dst refcount
taken by ip_route_connect() or ip6_dst_lookup_flow().

While tcp_v4_connect() calls ip_rt_put() in the error path,
tcp_v6_connect() does not call dst_release().

Let's call dst_release() when inet_bhash2_update_saddr() fails
in tcp_v6_connect().
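
The leak and its fix follow the usual reference-counting pattern, shown
here as a user-space sketch. struct toy_dst and the toy_* helpers are
hypothetical, and the update_fails flag simulates
inet_bhash2_update_saddr() returning an error.

```c
#include <stdbool.h>

struct toy_dst { int refcnt; };

/* Route lookup hands back the dst with an extra reference held. */
static struct toy_dst *toy_dst_lookup(struct toy_dst *dst)
{
	dst->refcnt++;
	return dst;
}

static void toy_dst_release(struct toy_dst *dst)
{
	dst->refcnt--;
}

/* update_fails simulates the source address update failing. */
static int toy_connect(struct toy_dst *route, bool update_fails)
{
	struct toy_dst *dst = toy_dst_lookup(route);

	if (update_fails) {
		toy_dst_release(dst);	/* without this, the reference leaks */
		return -1;
	}
	/* Success: the socket keeps the reference. */
	return 0;
}
```

After a failed toy_connect() the reference count returns to its previous
value; on success one reference stays held.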

Fixes: 28044fc1d495 ("net: Add a bhash2 table hashed by port and address")
Reported-by: Damiano Melotti <melotti@google.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260506070443.1699879-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago ipmr: Call ipmr_fib_lookup() under RCU.
Kuniyuki Iwashima [Wed, 6 May 2026 06:59:53 +0000 (06:59 +0000)] 
ipmr: Call ipmr_fib_lookup() under RCU.

Yi Lai reported an RCU splat in reg_vif_xmit() below. [0]

When CONFIG_IP_MROUTE_MULTIPLE_TABLES=n, ipmr_fib_lookup()
uses rcu_dereference() without explicit rcu_read_lock().

Although rcu_read_lock_bh() is already held by the caller
__dev_queue_xmit(), lockdep requires explicit rcu_read_lock()
for rcu_dereference().

Let's move up rcu_read_lock() in reg_vif_xmit() to
cover ipmr_fib_lookup().

[0]:
WARNING: suspicious RCU usage
7.1.0-rc2-next-20260504-9d0d467c3572 #1 Not tainted
 -----------------------------
net/ipv4/ipmr.c:329 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
2 locks held by syz.2.17/1779:
 #0: ffffffff87896440 (rcu_read_lock_bh){....}-{1:3}, at: local_bh_disable include/linux/bottom_half.h:20 [inline]
 #0: ffffffff87896440 (rcu_read_lock_bh){....}-{1:3}, at: rcu_read_lock_bh include/linux/rcupdate.h:891 [inline]
 #0: ffffffff87896440 (rcu_read_lock_bh){....}-{1:3}, at: __dev_queue_xmit+0x239/0x4140 net/core/dev.c:4792
 #1: ffff88801a199d18 (_xmit_PIMREG#2){+...}-{3:3}, at: spin_lock include/linux/spinlock.h:342 [inline]
 #1: ffff88801a199d18 (_xmit_PIMREG#2){+...}-{3:3}, at: __netif_tx_lock include/linux/netdevice.h:4795 [inline]
 #1: ffff88801a199d18 (_xmit_PIMREG#2){+...}-{3:3}, at: __dev_queue_xmit+0x1d5d/0x4140 net/core/dev.c:4865

stack backtrace:
CPU: 1 UID: 0 PID: 1779 Comm: syz.2.17 Not tainted 7.1.0-rc2-next-20260504-9d0d467c3572 #1 PREEMPT(lazy)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x121/0x150 lib/dump_stack.c:120
 dump_stack+0x19/0x20 lib/dump_stack.c:129
 lockdep_rcu_suspicious+0x15b/0x1f0 kernel/locking/lockdep.c:6878
 ipmr_fib_lookup net/ipv4/ipmr.c:329 [inline]
 reg_vif_xmit+0x2ee/0x3c0 net/ipv4/ipmr.c:540
 __netdev_start_xmit include/linux/netdevice.h:5382 [inline]
 netdev_start_xmit include/linux/netdevice.h:5391 [inline]
 xmit_one net/core/dev.c:3889 [inline]
 dev_hard_start_xmit+0x170/0x700 net/core/dev.c:3905
 __dev_queue_xmit+0x1df1/0x4140 net/core/dev.c:4871
 dev_queue_xmit include/linux/netdevice.h:3423 [inline]
 packet_xmit+0x252/0x370 net/packet/af_packet.c:276
 packet_snd net/packet/af_packet.c:3082 [inline]
 packet_sendmsg+0x39ad/0x5650 net/packet/af_packet.c:3114
 sock_sendmsg_nosec net/socket.c:797 [inline]
 __sock_sendmsg net/socket.c:812 [inline]
 ____sys_sendmsg+0xa21/0xba0 net/socket.c:2716
 ___sys_sendmsg+0x121/0x1c0 net/socket.c:2770
 __sys_sendmsg+0x177/0x220 net/socket.c:2802
 __do_sys_sendmsg net/socket.c:2807 [inline]
 __se_sys_sendmsg net/socket.c:2805 [inline]
 __x64_sys_sendmsg+0x80/0xc0 net/socket.c:2805
 x64_sys_call+0x1d9c/0x21c0 arch/x86/include/generated/asm/syscalls_64.h:47
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xc1/0x1020 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f37e563ee5d
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 93 af 1b 00 f7 d8 64 89 01 48
RSP: 002b:00007ffe5caa7fa8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00000000005c5fa0 RCX: 00007f37e563ee5d
RDX: 0000000000000000 RSI: 00002000000012c0 RDI: 0000000000000004
RBP: 00000000005c5fa0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 00000000005c5fac R15: 00000000005c5fa0
 </TASK>

Fixes: b3b6babf4751 ("ipmr: Free mr_table after RCU grace period.")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Reported-by: Yi Lai <yi1.lai@intel.com>
Closes: https://lore.kernel.org/netdev/afrY34dLXNUboevf@ly-workstation/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260506065955.1695753-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago net: phy: broadcom: Save PHY counters during suspend
Justin Chen [Tue, 5 May 2026 17:39:26 +0000 (10:39 -0700)] 
net: phy: broadcom: Save PHY counters during suspend

The PHY counters can be lost if the PHY is reset during suspend. We
need to save the values into the shadow counters or the accounting
will be incorrect over multiple suspend and resume cycles.

Fixes: 820ee17b8d3b ("net: phy: broadcom: Add support code for reading PHY counters")
Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20260505173926.2870069-1-justin.chen@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago net/smc: fix missing sk_err when TCP handshake fails
D. Wythe [Wed, 6 May 2026 01:41:05 +0000 (09:41 +0800)] 
net/smc: fix missing sk_err when TCP handshake fails

In smc_connect_work(), when the underlying TCP handshake fails, the error
code (rc) must be propagated to sk_err to ensure userspace can correctly
retrieve the error status via SO_ERROR. Currently, the code only handles
a restricted set of error codes (e.g., EPIPE, ECONNREFUSED). If other
errors occur, such as EHOSTUNREACH, sk_err remains unset (zero).

This affects applications that rely on SO_ERROR to determine connect
outcome. For example, newer versions of Go's netpoller treat
SO_ERROR == 0 combined with a failed getpeername() as a spurious wakeup
and re-enters epoll_wait(). Under ET mode, no further edge will be
generated since the socket is already in a terminal state, causing the
connect to hang indefinitely or until a user-specified timeout, if one
is set.
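
For reference, this is roughly how a userspace application reads the
connect outcome that the fix makes visible. It is a minimal sketch using
the standard getsockopt() interface; the helper name
socket_pending_error is an assumption made for illustration.

```c
#include <assert.h>
#include <sys/socket.h>

/*
 * Return the pending error on a socket (0 if none), or -1 if
 * getsockopt() itself fails. Reading SO_ERROR also clears it.
 */
static int socket_pending_error(int fd)
{
    int err = 0;
    socklen_t len = sizeof(err);

    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return -1;
    return err;
}
```

An application typically calls such a helper once the socket becomes
writable after a non-blocking connect(). A result of 0 is taken to mean
that no error is pending, which is why an unset sk_err after a failed
handshake is misleading.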

Fixes: 50717a37db03 ("net/smc: nonblocking connect rework")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Link: https://patch.msgid.link/20260506014105.27093-1-alibuda@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago af_unix: Reject SIOCATMARK on non-stream sockets
Jiexun Wang [Wed, 6 May 2026 14:08:23 +0000 (22:08 +0800)] 
af_unix: Reject SIOCATMARK on non-stream sockets

SIOCATMARK reports whether the receive queue is at the urgent mark for
MSG_OOB.

In AF_UNIX, MSG_OOB is supported only for SOCK_STREAM sockets.
SOCK_DGRAM and SOCK_SEQPACKET reject MSG_OOB in sendmsg() and recvmsg(),
so they should not support SIOCATMARK either.

Return -EOPNOTSUPP for non-stream sockets before checking the receive
queue.

Fixes: 314001f0bf92 ("af_unix: Add OOB support")
Cc: stable@kernel.org
Reported-by: Yuan Tan <yuantan098@gmail.com>
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Reported-by: Xin Liu <bird@lzu.edu.cn>
Suggested-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Jiexun Wang <wangjiexun2025@gmail.com>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260506140825.2987635-1-n05ec@lzu.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago veth: fix OOB txq access in veth_poll() with asymmetric queue counts
Jesper Dangaard Brouer [Tue, 5 May 2026 13:21:53 +0000 (15:21 +0200)] 
veth: fix OOB txq access in veth_poll() with asymmetric queue counts

XDP redirect into a veth device (via bpf_redirect()) calls
veth_xdp_xmit(), which enqueues frames into the peer's ptr_ring using
  smp_processor_id() % peer->real_num_rx_queues
as the ring index.  With an asymmetric veth pair where the peer has
fewer TX queues than RX queues, that index can exceed
peer->real_num_tx_queues.

veth_poll() then resolves peer_txq for the ring via:

  peer_txq = peer_dev ? netdev_get_tx_queue(peer_dev, queue_idx) : NULL;

where queue_idx = rq->xdp_rxq.queue_index.  When queue_idx exceeds
peer_dev->real_num_tx_queues this is an out-of-bounds (OOB) access
into the peer's netdev_queue array, triggering DEBUG_NET_WARN_ON_ONCE
in netdev_get_tx_queue().

The normal ndo_start_xmit path is not affected: the stack clamps
skb->queue_mapping via netdev_cap_txqueue() before invoking
ndo_start_xmit, so rxq in veth_xmit() never exceeds real_num_tx_queues.

Fix veth_poll() by clamping: only dereference peer_txq when queue_idx is
within bounds, otherwise set it to NULL.  The out-of-range rings are fed
exclusively via XDP redirect (veth_xdp_xmit), never via ndo_start_xmit
(veth_xmit), so the peer txq was never stopped and there is nothing to
wake; NULL is the correct fallback.
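
The bounds check can be sketched in userspace like this. The types
txq_sketch and netdev_sketch and the helper peer_txq_for are hypothetical
stand-ins for struct netdev_queue, struct net_device and the lookup done
in veth_poll(); only the clamping logic is meant to match the description
above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct netdev_queue and struct net_device. */
struct txq_sketch { int stopped; };

struct netdev_sketch {
    unsigned int real_num_tx_queues;
    struct txq_sketch *tx;   /* array of real_num_tx_queues entries */
};

/*
 * Resolve the peer TX queue for an RX ring index. Returns NULL when there
 * is no peer or when the index has no matching TX queue, instead of
 * indexing past the end of the array.
 */
static struct txq_sketch *peer_txq_for(struct netdev_sketch *peer,
                                       unsigned int queue_idx)
{
    if (!peer || queue_idx >= peer->real_num_tx_queues)
        return NULL;
    return &peer->tx[queue_idx];
}
```

Callers then treat a NULL result as "no queue to wake", which matches the
reasoning that out-of-range rings are only fed by XDP redirect.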

Reported-by: Sashiko <sashiko-bot@kernel.org>
Closes: https://lore.kernel.org/all/20260502071828.616C3C19425@smtp.kernel.org/
Fixes: dc82a33297fc ("veth: apply qdisc backpressure on full ptr_ring to reduce TX drops")
Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://patch.msgid.link/20260505132159.241305-2-hawk@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 weeks ago eth: fbnic: fix double-free of PCS on phylink creation failure
Bobby Eshleman [Tue, 5 May 2026 01:42:11 +0000 (18:42 -0700)] 
eth: fbnic: fix double-free of PCS on phylink creation failure

fbnic_phylink_create() stores the newly allocated PCS in fbn->pcs and
then calls phylink_create(). When phylink_create() fails, the error path
correctly destroys the PCS via xpcs_destroy_pcs(), but the caller,
fbnic_netdev_alloc(), responds by invoking fbnic_netdev_free() which
calls fbnic_phylink_destroy(). That function finds fbn->pcs non-NULL and
calls xpcs_destroy_pcs() a second time on the already-freed object,
triggering a refcount underflow (use-after-free) warning:

[   1.934973] fbnic 0000:01:00.0: Failed to create Phylink interface, err: -22
[   1.935103] ------------[ cut here ]------------
[   1.935179] refcount_t: underflow; use-after-free.
[   1.935252] WARNING: lib/refcount.c:28 at refcount_warn_saturate+0x59/0x90, CPU#0: swapper/0/1
[   1.935389] Modules linked in:
[   1.935484] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 7.0.0-virtme-04244-g1f5ffc672165-dirty #1 PREEMPT(lazy)
[   1.935661] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[   1.935826] RIP: 0010:refcount_warn_saturate+0x59/0x90
[   1.935931] Code: 44 48 8d 3d 49 f9 a7 01 67 48 0f b9 3a e9 bf 1e 96 00 48 8d 3d 48 f9 a7 01 67 48 0f b9 3a c3 cc cc cc cc 48 8d 3d 47 f9 a7 01 <67> 48 0f b9 3a c3 cc cc cc cc 48 8d 3d 46 f9 a7 01 67 48 0f b9 3a
[   1.936274] RSP: 0000:ffffd0d440013c58 EFLAGS: 00010246
[   1.936376] RAX: 0000000000000000 RBX: ffff8f39c188c278 RCX: 000000000000002b
[   1.936524] RDX: ffff8f39c004f000 RSI: 0000000000000003 RDI: ffffffff96abab00
[   1.936692] RBP: ffff8f39c188c240 R08: ffffffff96988e88 R09: 00000000ffffdfff
[   1.936835] R10: ffffffff96878ea0 R11: 0000000000000187 R12: 0000000000000000
[   1.936970] R13: ffff8f39c0cef0c8 R14: ffff8f39c1ac01c0 R15: 0000000000000000
[   1.937114] FS:  0000000000000000(0000) GS:ffff8f3ba08b4000(0000) knlGS:0000000000000000
[   1.937273] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   1.937382] CR2: ffff8f3b3ffff000 CR3: 0000000172642001 CR4: 0000000000372ef0
[   1.937540] Call Trace:
[   1.937619]  <TASK>
[   1.937698]  xpcs_destroy_pcs+0x25/0x40
[   1.937783]  fbnic_netdev_alloc+0x1e5/0x200
[   1.937859]  fbnic_probe+0x230/0x370
[   1.937939]  local_pci_probe+0x3e/0x90
[   1.938013]  pci_device_probe+0xbb/0x1e0
[   1.938091]  ? sysfs_do_create_link_sd+0x6d/0xe0
[   1.938188]  really_probe+0xc1/0x2b0
[   1.938282]  __driver_probe_device+0x73/0x120
[   1.938371]  driver_probe_device+0x1e/0xe0
[   1.938466]  __driver_attach+0x8d/0x190
[   1.938560]  ? __pfx___driver_attach+0x10/0x10
[   1.938663]  bus_for_each_dev+0x7b/0xd0
[   1.938758]  bus_add_driver+0xe8/0x210
[   1.938854]  driver_register+0x60/0x120
[   1.938929]  ? __pfx_fbnic_init_module+0x10/0x10
[   1.939026]  fbnic_init_module+0x25/0x60
[   1.939109]  do_one_initcall+0x49/0x220
[   1.939202]  ? rdinit_setup+0x20/0x40
[   1.939304]  kernel_init_freeable+0x1b0/0x310
[   1.939449]  ? __pfx_kernel_init+0x10/0x10
[   1.939560]  kernel_init+0x1a/0x1c0
[   1.939640]  ret_from_fork+0x1ed/0x240
[   1.939730]  ? __pfx_kernel_init+0x10/0x10
[   1.939805]  ret_from_fork_asm+0x1a/0x30
[   1.939886]  </TASK>
[   1.939927] ---[ end trace 0000000000000000 ]---
[   1.940184] fbnic 0000:01:00.0: Netdev allocation failed

Instead of calling fbnic_phylink_destroy(), the prior initialization of
netdev should just be unrolled with free_netdev() and clearing
fbd->netdev.

Clearing fbd->netdev to NULL avoids UAF in init_failure_mode where
callers guard by checking !fbd->netdev, such as fbnic_mdio_read_pmd().
These callers remain active even after a failed probe, so fbd->netdev
still needs to be cleared.

Fixes: d0fe7104c795 ("fbnic: Replace use of internal PCS w/ Designware XPCS")
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260504-fbnic-pcs-fix-v2-1-de45192821d9@meta.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 weeks ago Merge tag 'v7.1-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Linus Torvalds [Thu, 7 May 2026 05:02:28 +0000 (22:02 -0700)] 
Merge tag 'v7.1-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:

 - Fix memory leak in connection free

 - Fix inherited ACL ACE validation

 - Minor cleanup

 - Fix for share config

 - Fix durable handle cleanup race

 - Fix close_file_table_ids in session teardown

 - smbdirect fixes:
    - Fix memory region registration
    - Two fixes for out-of-tree builds

* tag 'v7.1-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: validate inherited ACE SID length
  ksmbd: fix kernel-doc warnings from ksmbd_conn_get/put()
  ksmbd: fail share config requests when path allocation fails
  ksmbd: close durable scavenger races against m_fp_list lookups
  ksmbd: harden file lifetime during session teardown
  ksmbd: centralize ksmbd_conn final release to plug transport leak
  smb: smbdirect: fix MR registration for coalesced SG lists
  smb: smbdirect: introduce and use include/linux/smbdirect.h
  smb: smbdirect: make use of DEFAULT_SYMBOL_NAMESPACE and EXPORT_SYMBOL_GPL

5 weeks ago Merge tag 'chrome-platform-fixes-v7.1-rc3' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Thu, 7 May 2026 03:44:03 +0000 (20:44 -0700)] 
Merge tag 'chrome-platform-fixes-v7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux

Pull chrome-platform fix from Tzung-Bi Shih:

 - Fix a NULL dereference in cros_ec_typec

* tag 'chrome-platform-fixes-v7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux:
  platform/chrome: cros_ec_typec: Init mutex in Thunderbolt registration

5 weeks ago w5100: remove unused gpio link detection
Arnd Bergmann [Tue, 5 May 2026 18:04:59 +0000 (20:04 +0200)] 
w5100: remove unused gpio link detection

Since the platform_device support is now gone, nothing ever passes a
valid gpio number, and all the link state handling can go away.

An earlier version of my patch changed this to look up the GPIO descriptor
from devicetree and convert it all to the modern interface, but there
are no users of that binding at the moment.

Remove the gpio handling, which is now one of the last users of the
legacy gpio interface in platform-independent code.

Cc: Rob Herring <robh@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/all/20230127095839.3266452-1-arnd@kernel.org/
Link: https://patch.msgid.link/20260505180459.1247690-3-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago w5300: remove unused driver
Arnd Bergmann [Tue, 5 May 2026 18:04:58 +0000 (20:04 +0200)] 
w5300: remove unused driver

Unlike w5100, this driver does not support SPI mode or devicetree
bindings, and is hence entirely unusable without third-party board
support patches that likely haven't existed for any recent kernel
version.

Remove the entire driver.

If anyone is in fact using it with their custom board files, they
can bring it back and include an earlier patch I sent to add
DT based probing for the GPIO lines.

Link: https://lore.kernel.org/all/20260427142924.2702598-1-arnd@kernel.org/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20260505180459.1247690-2-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago w5100: remove MMIO support
Arnd Bergmann [Tue, 5 May 2026 18:04:57 +0000 (20:04 +0200)] 
w5100: remove MMIO support

This driver supports both SPI and MMIO based register access, but only
the former has devicetree support. While MMIO mode would have worked
with old-style board files, those have never defined such a device
upstream.

Remove the MMIO mode, leaving SPI as the only way to use this driver,
but leave it in two loadable modules. More cleanups can be done by
combining the two into one file.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20260505180459.1247690-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago Merge branch 'net-mlx5-improve-representor-lifecycle-and-late-ib-representor-loading'
Jakub Kicinski [Thu, 7 May 2026 02:03:39 +0000 (19:03 -0700)] 
Merge branch 'net-mlx5-improve-representor-lifecycle-and-late-ib-representor-loading'

Tariq Toukan says:

====================
net/mlx5: Improve representor lifecycle and late IB representor loading

This series addresses two problems that have been present for years, and
fixes one representor reload error-unwind case exposed while making the
reload path reusable.

First, there is no coordination between E-Switch reconfiguration and
representor registration. The E-Switch can be mid-way through a mode
change or VF count update while mlx5_ib walks in and registers or
unregisters representors. Nothing stops them. The race window is small
and there is no field report, but it is clearly wrong.

Second, loading mlx5_ib while the device is already in switchdev mode
does not bring up the IB representors. mlx5_eswitch_register_vport_reps()
only stores callbacks; nobody triggers the actual load after registration.

The series fixes the registration race with a per-E-Switch representor
mutex. The lock is introduced first, then LAG shared-FDB and multiport
E-Switch transitions are adjusted so auxiliary device rescans and IB
representor reloads do not hold ldev->lock while taking the representor
lock. This keeps the intermediate commits bisectable before the stricter
E-Switch serialization and lock assertions are enabled.

After the LAG ordering is fixed, all E-Switch reconfiguration paths that
create, destroy, load, or unload representors take the representor mutex.
esw_mode_change() deliberately drops the mutex around
mlx5_rescan_drivers_locked(), because auxiliary probe and remove paths
re-enter mlx5_eswitch_register_vport_reps() and
mlx5_eswitch_unregister_vport_reps() on the same thread.

The shared-FDB peer IB registration path can hold one E-Switch
representor mutex and then register peer representor ops on another
E-Switch. The series annotates that case as nested locking so lockdep can
distinguish it from recursive locking on the same E-Switch.

For the missing IB representors, mlx5_eswitch_register_vport_reps() queues
a work item that acquires the devlink lock and loads all relevant
representors. This is the change that actually fixes the long-standing
bug.

The reload path also learns to track which representor types were loaded by
the current attempt, so an error does not unload representors that were
already active before the retry.

Patch 1 is cleanup. LAG and MPESW had the same representor reload
sequence duplicated in several places and the copies had started to
drift. This consolidates them into one helper.

Patch 2 lets E-Switch workqueue callers choose GFP allocation flags.

Patch 3 adds the per-E-Switch representor lifecycle lock and helper APIs.

Patch 4 adjusts the LAG shared-FDB and multiport E-Switch transitions so
auxiliary device rescans and IB representor reloads run without
ldev->lock held while taking the representor lock.

Patch 5 protects the E-Switch reconfiguration, representor registration
and peer IB representor paths with the representor lock.

Patch 6 fixes representor load error unwind so only representor types
loaded by the current attempt are unloaded on failure.

Patch 7 moves the representor load triggered by
mlx5_eswitch_register_vport_reps() onto the work queue. This is the patch
that fixes IB representors not coming up when mlx5_ib is loaded while the
device is already in switchdev mode.
====================

Link: https://patch.msgid.link/20260503202726.266415-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago net/mlx5: E-Switch, load reps via work queue after registration
Mark Bloch [Sun, 3 May 2026 20:27:26 +0000 (23:27 +0300)] 
net/mlx5: E-Switch, load reps via work queue after registration

mlx5_eswitch_register_vport_reps() only installs representor callbacks and
marks the rep type as registered. If the E-Switch is already in switchdev
mode, the newly registered rep type must then be loaded for already enabled
vports.

That load path needs to run under the devlink lock, which is not held by
the auxiliary driver registration context. Queue the reload to the E-Switch
workqueue, whose handler acquires the devlink lock, and load the relevant
representors from there.

Since representor registration runs from sleepable auxiliary-driver
context, queue the late reload with GFP_KERNEL. The functions-change
notifier path remains the GFP_ATOMIC user of mlx5_esw_add_work().

The unregister path is unchanged and still unloads representors
synchronously while tearing down the registered callbacks.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260503202726.266415-8-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago net/mlx5: E-Switch, unwind only newly loaded representor types
Mark Bloch [Sun, 3 May 2026 20:27:25 +0000 (23:27 +0300)] 
net/mlx5: E-Switch, unwind only newly loaded representor types

__esw_offloads_load_rep() may return success without invoking the
representor load callback when the representor type is already loaded.

On a later load failure, mlx5_esw_offloads_rep_load() unconditionally
unloaded all previously iterated representor types. This could unload
representor types that were already loaded before this load attempt.

Track which representor types were actually loaded by the current call and
unwind only those on error. Also restore the representor state back to
REP_REGISTERED when the load callback itself fails.
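
The unwind rule can be sketched in userspace as follows. This is a
simplified model under stated assumptions, not the mlx5 implementation:
rep_sketch, load_one, load_all and the fail_type field are hypothetical,
and a plain flag array replaces the real per-representor state machine.

```c
#include <assert.h>

enum { REP_ETH, REP_IB, NUM_REP_TYPES };

struct rep_sketch {
    int loaded[NUM_REP_TYPES];   /* 1 if that type is currently loaded */
    int fail_type;               /* type whose load callback fails, or -1 */
};

/* Load one type; *did_load is set only if this call actually loaded it. */
static int load_one(struct rep_sketch *rep, int type, int *did_load)
{
    *did_load = 0;
    if (rep->loaded[type])
        return 0;                /* already loaded: success, nothing done */
    if (type == rep->fail_type)
        return -1;               /* callback failed: state stays unloaded */
    rep->loaded[type] = 1;
    *did_load = 1;
    return 0;
}

static int load_all(struct rep_sketch *rep)
{
    unsigned int newly = 0;      /* bitmask of types loaded by this call */
    int type, did_load, err;

    for (type = 0; type < NUM_REP_TYPES; type++) {
        err = load_one(rep, type, &did_load);
        if (err)
            goto unwind;
        if (did_load)
            newly |= 1u << type;
    }
    return 0;

unwind:
    while (--type >= 0)
        if (newly & (1u << type))
            rep->loaded[type] = 0;   /* unload only what this call loaded */
    return err;
}
```

With REP_ETH already loaded and REP_IB failing, load_all() returns the
error and leaves REP_ETH loaded, whereas an unconditional unwind would
have unloaded it.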

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260503202726.266415-7-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago net/mlx5: E-Switch, serialize representor lifecycle
Mark Bloch [Sun, 3 May 2026 20:27:24 +0000 (23:27 +0300)] 
net/mlx5: E-Switch, serialize representor lifecycle

Representor callbacks can be registered and unregistered while the
E-Switch is already in switchdev mode, and the same E-Switch may also be
reconfigured by devlink, VF changes and SF changes. Serialize these paths
with the per-E-Switch representor mutex instead of relying on ad-hoc bit
state and wait queues.

Take the representor lock around the mode transition, VF/SF representor
changes and representor ops registration. Keep mode_lock and the
representor lock unnested by using the operation flag while the mode lock
is dropped. During mode changes, drop the representor lock around the
auxiliary bus rescan because driver bind/unbind may register or unregister
representor ops.

Split representor ops registration into locked public wrappers and blocked
internal helpers, clear the ops pointer on unregister, and add nested
wrappers for the shared-FDB master IB path that registers peer
representor ops while another E-Switch representor lock is already held.

On unregister, always call __unload_reps_all_vport() before marking reps
unregistered and clearing rep_ops. The per-representor state check makes
this a no-op for types that were not loaded, so unregister no longer has
to infer load state from esw->mode.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260503202726.266415-6-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago net/mlx5: Lag, avoid LAG and representor lock cycles
Mark Bloch [Sun, 3 May 2026 20:27:23 +0000 (23:27 +0300)] 
net/mlx5: Lag, avoid LAG and representor lock cycles

The LAG shared-FDB and multiport E-Switch transitions rescan auxiliary
devices and reload IB representors while holding ldev->lock. Driver
bind/unbind paths may register or unregister E-Switch representor ops, and
representor load paths may enter LAG code, so holding ldev->lock across
those calls creates lock-order cycles with the E-Switch representor lock.

Keep the devcom component locked for the transition, but drop ldev->lock
before rescanning auxiliary devices or reloading IB representors. Mark the
LAG transition as in progress while the lock is dropped and assert the
devcom lock where the helper relies on it. This preserves LAG serialization
while avoiding ldev->lock nesting under E-Switch representor registration.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260503202726.266415-5-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago net/mlx5: E-Switch, add representor lifecycle lock
Mark Bloch [Sun, 3 May 2026 20:27:22 +0000 (23:27 +0300)] 
net/mlx5: E-Switch, add representor lifecycle lock

Add a per-E-Switch mutex for serializing representor lifecycle work and
provide small helpers for taking and dropping it. Initialize and destroy
the mutex with the E-Switch offloads state.

Add the lock and helper API first. Follow-up patches will take the lock in
the individual representor lifecycle components. This keeps the functional
changes split by component and leaves this patch without intended behavior
change, making the series easier to review and bisectable.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260503202726.266415-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago net/mlx5: E-Switch, let esw work callers choose GFP flags
Mark Bloch [Sun, 3 May 2026 20:27:21 +0000 (23:27 +0300)] 
net/mlx5: E-Switch, let esw work callers choose GFP flags

mlx5_esw_add_work() always allocates the queued work item with
GFP_ATOMIC. That is required for the E-Switch functions-change notifier,
but not every caller of this helper will run from atomic context.

Pass an allocation flag to mlx5_esw_add_work() and keep the notifier
caller using GFP_ATOMIC. This allows sleepable callers to use GFP_KERNEL
instead of unnecessarily relying on atomic reserves.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260503202726.266415-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago net/mlx5: Lag: refactor representor reload handling
Mark Bloch [Sun, 3 May 2026 20:27:20 +0000 (23:27 +0300)] 
net/mlx5: Lag: refactor representor reload handling

Representor reload during LAG/MPESW transitions has to be repeated in
several flows, and each open-coded loop was easy to get out of sync
when adding new flags or tweaking error handling. Move the sequencing
into a single helper so that all call sites share the same ordering
and checks.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Shay Drori <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260503202726.266415-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago Merge branch 'r8152-add-support-for-the-rtl8159-10gbit-usb-ethernet-chip'
Jakub Kicinski [Thu, 7 May 2026 01:54:17 +0000 (18:54 -0700)] 
Merge branch 'r8152-add-support-for-the-rtl8159-10gbit-usb-ethernet-chip'

Birger Koblitz says:

====================
r8152: Add support for the RTL8159 10Gbit USB Ethernet chip

Add support for the RTL8159, which is a 10GBit USB-Ethernet adapter
chip in the RTL815x family of chips.

The RTL8159 re-uses the frame descriptor format and SRAM2 access introduced
with the RTL8157 as well as most of the setup and PM logic of the RTL8157.

The module was tested with a Lekuo DR59R11 USB-C 10GbE Ethernet Adapter:
[ 2502.906947] usb 2-1: new SuperSpeed USB device number 3 using xhci_hcd
[ 2502.927859] usb 2-1: New USB device found, idVendor=0bda, idProduct=815a, bcdDevice=30.00
[ 2502.927867] usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=7
[ 2502.927871] usb 2-1: Product: USB 10/100/1G/2.5G/5G/10G LAN
[ 2502.927873] usb 2-1: Manufacturer: Realtek
[ 2502.927875] usb 2-1: SerialNumber: 000388C9B3B5XXXX
[ 2503.063745] r8152-cfgselector 2-1: reset SuperSpeed USB device number 3 using xhci_hcd
[ 2503.123876] r8152 2-1:1.0: Requesting firmware: rtl_nic/rtl8159-1.fw
[ 2503.126267] r8152 2-1:1.0: PHY firmware installed 0 to be loaded: 20
[ 2503.156265] r8152 2-1:1.0: load rtl8159-1 v1 2026/01/01 successfully
[ 2503.270729] r8152 2-1:1.0 eth0: v1.12.13
[ 2503.289349] r8152 2-1:1.0 enx88c9b3b5xxxx: renamed from eth0
[ 2507.777055] r8152 2-1:1.0 enx88c9b3b5xxxx: carrier on

The RTL8159 adapter was tested against an AQC107 PCIe-card supporting
10GBit/s and an RTL8157 5Gbit USB-Ethernet adapter supporting 5GBit/s for
performance, link speed and EEE negotiation. Using USB3.2 Gen 2 (20GBit) with
the RTL8159 USB adapter and running iperf3 against the AQC107 PCIe
card resulted in 8.96 Gbits/sec transfer speed.

The code is based on the out-of-tree r8152 driver published by Realtek under
the GPL.

The RTL8159 requires firmware for the PHY in order to achieve a 10GBit link
speed. Without firmware, only 5GBit were achieved. The firmware can be
extracted from the out-of-tree r8152 driver-code where it is stored in the
ram17 u8-array. Code is added to use the existing firmware upload mechanism
of the driver for the RTL8157/9 PHY firmware code. The firmware will be
submitted separately to linux-firmware.
====================

Link: https://patch.msgid.link/20260505-rtl8159_net_next-v4-0-1a648a9c4d8d@birger-koblitz.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks ago r8152: Add firmware upload capability for RTL8157/RTL8159
Birger Koblitz [Tue, 5 May 2026 15:56:35 +0000 (17:56 +0200)] 
r8152: Add firmware upload capability for RTL8157/RTL8159

The RTL8159 (RTL_VER_17) requires firmware for its PHY in order to work
at connection speeds > 5GBit. Add support for uploading firmware for
the PHY using the existing rtl8152_apply_firmware() function
in r8157_hw_phy_cfg() and set up the correct names for the firmware
files.

This also adds support for uploading firmware for the RTL8157
(RTL_VER_16) PHY, for which firmware is however not strictly necessary
to work. Still, this allows uploading newer versions of the firmware used
by this chip, e.g. to improve interoperability.

If no firmware is found, both the RTL8157 and the RTL8159 will continue
to work.

Signed-off-by: Birger Koblitz <mail@birger-koblitz.de>
Tested-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Link: https://patch.msgid.link/20260505-rtl8159_net_next-v4-3-1a648a9c4d8d@birger-koblitz.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agor8152: Add support for the RTL8159 chip
Birger Koblitz [Tue, 5 May 2026 15:56:34 +0000 (17:56 +0200)] 
r8152: Add support for the RTL8159 chip

The RTL8159 re-uses the packet descriptor format introduced with the
RTL8157 and other hardware features of the RTL8157 (RTL_VER_16) such
as the SRAM access. Support therefore consists of extending the
existing RTL8157 code for initialization and USB power management
so that it is also used for the RTL8159 (RTL_VER_17).

Most of the additional code is added in r8157_hw_phy_cfg() to configure
the RTL8159 PHY.

Add support for the USB device ID of Realtek RTL8159-based adapters,
for which the product ID is 0x815a. Detect the RTL8159 as RTL_VER_17
and set it up.

Signed-off-by: Birger Koblitz <mail@birger-koblitz.de>
Tested-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Link: https://patch.msgid.link/20260505-rtl8159_net_next-v4-2-1a648a9c4d8d@birger-koblitz.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agor8152: Add support for 10Gbit Link Speeds and EEE
Birger Koblitz [Tue, 5 May 2026 15:56:33 +0000 (17:56 +0200)] 
r8152: Add support for 10Gbit Link Speeds and EEE

The RTL8159 supports 10GBit Link speeds. Add support for this speed
in the setup and setting/getting through ethtool. Also add 10GBit EEE.
Add functionality for setup and ethtool get/set methods.

Signed-off-by: Birger Koblitz <mail@birger-koblitz.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Link: https://patch.msgid.link/20260505-rtl8159_net_next-v4-1-1a648a9c4d8d@birger-koblitz.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: ethernet: cortina: Drop half-assembled SKB
Andreas Haarmann-Thiemann [Tue, 5 May 2026 21:52:17 +0000 (23:52 +0200)] 
net: ethernet: cortina: Drop half-assembled SKB

In gmac_rx() (drivers/net/ethernet/cortina/gemini.c), when
gmac_get_queue_page() returns NULL for the second page of a multi-page
fragment, the driver logs an error and continues, but does not free the
partially assembled skb that was being built via napi_build_skb() /
napi_get_frags().

Free the partially assembled skb via napi_free_frags(), increase the
number of dropped frames accordingly, and set the skb pointer to NULL
so that it does not linger, matching the pattern already used elsewhere
in the driver.

Fixes: 4d5ae32f5e1e ("net: ethernet: Add a driver for Gemini gigabit ethernet")
Signed-off-by: Andreas Haarmann-Thiemann <eitschman@nebelreich.de>
Signed-off-by: Linus Walleij <linusw@kernel.org>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20260505-gemini-ethernet-fix-v2-1-997c31d06079@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoMerge branch 'net-mlx5e-report-more-netdev-stats'
Jakub Kicinski [Thu, 7 May 2026 01:39:00 +0000 (18:39 -0700)] 
Merge branch 'net-mlx5e-report-more-netdev-stats'

Tariq Toukan says:

====================
net/mlx5e: Report more netdev stats

This series by Gal extends the set of counters reported in netdev stats,
by adding:
- hw_gso_packets/bytes
- RX HW-GRO stats
- TX csum_none
- TX queue stop/wake

It also aligns the tso_bytes/tso_inner_bytes counters with the netdev
stats API and virtio spec definition.
====================

Link: https://patch.msgid.link/20260504183704.272322-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet/mlx5e: Report stop and wake TX queue stats
Gal Pressman [Mon, 4 May 2026 18:37:04 +0000 (21:37 +0300)] 
net/mlx5e: Report stop and wake TX queue stats

Report TX queue stop and wake statistics via the netdev queue stats API
by mapping the existing stopped and wake counters to the stop and wake
fields.

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260504183704.272322-6-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet/mlx5e: Report TX csum_none netdev stat
Gal Pressman [Mon, 4 May 2026 18:37:03 +0000 (21:37 +0300)] 
net/mlx5e: Report TX csum_none netdev stat

Report TX csum_none statistic via the netdev queue stats API by mapping
the existing csum_none counter to the csum_none field.

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260504183704.272322-5-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet/mlx5e: Report RX HW-GRO netdev stats
Gal Pressman [Mon, 4 May 2026 18:37:02 +0000 (21:37 +0300)] 
net/mlx5e: Report RX HW-GRO netdev stats

Report RX hardware GRO statistics via the netdev queue stats API by
mapping the existing gro_packets, gro_bytes and gro_skbs counters to the
hw_gro_wire_packets, hw_gro_wire_bytes and hw_gro_packets fields.

Signed-off-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260504183704.272322-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet/mlx5e: Report hw_gso_packets and hw_gso_bytes netdev stats
Gal Pressman [Mon, 4 May 2026 18:37:01 +0000 (21:37 +0300)] 
net/mlx5e: Report hw_gso_packets and hw_gso_bytes netdev stats

Report hardware GSO statistics via the netdev queue stats API by mapping
the existing TSO counters to hw_gso_packets and hw_gso_bytes fields.

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260504183704.272322-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet/mlx5e: Count full skb length in TSO byte counters
Gal Pressman [Mon, 4 May 2026 18:37:00 +0000 (21:37 +0300)] 
net/mlx5e: Count full skb length in TSO byte counters

The tso_bytes and tso_inner_bytes counters currently subtract the header
length from skb->len, counting only the payload. This is confusing and
doesn't align with the behavior of other _bytes counters in the driver.

Report the full skb length to align with this expectation.

This also makes our behavior consistent with the netdev stats API and
virtio spec definition.
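
The difference between the two accounting schemes can be sketched in
plain user-space C. This is only an illustration: struct pkt_model and
both helpers are hypothetical stand-ins, not the driver's code, and the
header length is whatever the caller supplies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two skb properties that matter here. */
struct pkt_model {
	size_t len;	/* total length, headers included (like skb->len) */
	size_t hdr_len;	/* length of the protocol headers */
};

/* Old accounting as described above: payload only. */
static size_t tso_bytes_payload_only(const struct pkt_model *p)
{
	assert(p->hdr_len <= p->len);
	return p->len - p->hdr_len;
}

/* New accounting as described above: the full length. */
static size_t tso_bytes_full(const struct pkt_model *p)
{
	assert(p->hdr_len <= p->len);
	return p->len;
}
```

For a packet with len 65226 and hdr_len 66, the first helper yields
65160 and the second 65226, so the two schemes differ by exactly the
header length per counted skb.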

Signed-off-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260504183704.272322-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoMerge branch 'mptcp-pm-misc-fixes-for-v7-1-rc3'
Jakub Kicinski [Thu, 7 May 2026 01:16:49 +0000 (18:16 -0700)] 
Merge branch 'mptcp-pm-misc-fixes-for-v7-1-rc3'

Matthieu Baerts says:

====================
mptcp: pm: misc. fixes for v7.1-rc3

Here are various fixes, mainly related to ADD_ADDRs:

- Patch 1: save ADD_ADDR for rtx with ID0 when needed. A fix for v6.1.

- Patch 2: remove unneeded exception for ID 0. A fix for v5.10.

- Patches 3-5: fix potential data-race and leaks during ADD_ADDR rtx. A
  fix for v5.10.

- Patch 6: resched blocked ADD_ADDR rtx after a more appropriate
  timeout, not after 15 seconds. A fix for v5.10.

- Patch 7: skip inactive subflows when looking at the max RTO. A
  fix for v6.18.

- Patch 8: avoid iterating over all subflows when there is no need to. A
  fix for v6.18.

- Patch 9: skip closed subflows when looking at sending MP_PRIO. A fix
  for v5.17.

- Patch 10: properly catch errors when using check_output() in the
  selftests. A fix for v6.9.

- Patch 11: skip the 'unknown' flag test when 'ip mptcp' is used. A fix
  for v6.10.
====================

Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-0-fca8091060a4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoselftests: mptcp: pm: restrict 'unknown' check to pm_nl_ctl
Matthieu Baerts (NGI0) [Tue, 5 May 2026 15:00:59 +0000 (17:00 +0200)] 
selftests: mptcp: pm: restrict 'unknown' check to pm_nl_ctl

When pm_netlink.sh is executed with '-i', 'ip mptcp' is used instead of
'pm_nl_ctl'. IPRoute2 doesn't support the 'unknown' flag, which has only
been added to 'pm_nl_ctl' for this specific check: to ensure that the
kernel ignores such unsupported flag.

There is no reason to add this flag to 'ip mptcp', so this check should
be skipped when 'ip mptcp' is used.

Fixes: 0cef6fcac24d ("selftests: mptcp: ip_mptcp option for more scripts")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-11-fca8091060a4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoselftests: mptcp: check output: catch cmd errors
Matthieu Baerts (NGI0) [Tue, 5 May 2026 15:00:58 +0000 (17:00 +0200)] 
selftests: mptcp: check output: catch cmd errors

Using '${?}' inside the if-statement to check the returned value from
the command that was evaluated as part of the if-statement is not
correct: here, '${?}' will be linked to the previous instruction, not
the one that is expected here (${cmd}).

Instead, simply mark the error, except if an error is expected. If
that's the case, 1 can be passed as the 4th argument of this helper.
Three checks from pm_netlink.sh expect an error.

While at it, improve the error message when the command unexpectedly
fails or succeeds.

Note that we could expect a specific returned value, but the checks
currently expecting an error can be used with 'ip mptcp' or 'pm_nl_ctl',
and these two tools don't return the same error code.

Fixes: 2d0c1d27ea4e ("selftests: mptcp: add mptcp_lib_check_output helper")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-10-fca8091060a4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agomptcp: pm: prio: skip closed subflows
Matthieu Baerts (NGI0) [Tue, 5 May 2026 15:00:57 +0000 (17:00 +0200)] 
mptcp: pm: prio: skip closed subflows

When sending an MP_PRIO, closed subflows need to be skipped.

This fixes the case where the initial subflow got closed, re-opened
later, then an MP_PRIO is needed for the same local address.

Note that explicit MP_PRIO cannot be sent during the 3WHS, so it is fine
to use __mptcp_subflow_active().

Fixes: 067065422fcd ("mptcp: add the outgoing MP_PRIO support")
Cc: stable@vger.kernel.org
Fixes: b29fcfb54cd7 ("mptcp: full disconnect implementation")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-9-fca8091060a4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agomptcp: pm: ADD_ADDR rtx: return early if no retrans
Matthieu Baerts (NGI0) [Tue, 5 May 2026 15:00:56 +0000 (17:00 +0200)] 
mptcp: pm: ADD_ADDR rtx: return early if no retrans

No need to iterate over all subflows if there is no retransmission
needed.

Exit early in this case then.

Fixes: 30549eebc4d8 ("mptcp: make ADD_ADDR retransmission timeout adaptive")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-8-fca8091060a4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agomptcp: pm: ADD_ADDR rtx: skip inactive subflows
Matthieu Baerts (NGI0) [Tue, 5 May 2026 15:00:55 +0000 (17:00 +0200)] 
mptcp: pm: ADD_ADDR rtx: skip inactive subflows

When looking at the maximum RTO amongst the subflows, inactive subflows
were taken into account: that includes stale ones, and the initial one
if it has already been closed.

Unusable subflows are now simply skipped. Stale ones are used as a
fallback: if there are only stale ones, their maximum RTO is taken, to
avoid eventually falling back to net.mptcp.add_addr_timeout, which is
set to 2 minutes by default.
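
The selection rule can be sketched as a small user-space model. The
enum, the struct and max_rto() are hypothetical names chosen for this
illustration, and the model assumes a usable RTO is never zero; the
real code works on MPTCP subflow contexts instead.

```c
#include <assert.h>
#include <stddef.h>

enum subflow_state { SF_ACTIVE, SF_STALE, SF_CLOSED };

struct subflow_model {
	enum subflow_state state;
	unsigned int rto;	/* retransmission timeout, arbitrary units */
};

/*
 * Return the largest RTO among active subflows. If there is no active
 * subflow, fall back to the largest RTO among stale ones. Closed
 * subflows are always skipped. Returns 0 when nothing is usable, which
 * leaves the caller to apply its default timeout.
 */
static unsigned int max_rto(const struct subflow_model *sf, size_t n)
{
	unsigned int active_max = 0, stale_max = 0;
	size_t i;

	assert(sf != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (sf[i].state == SF_ACTIVE && sf[i].rto > active_max)
			active_max = sf[i].rto;
		else if (sf[i].state == SF_STALE && sf[i].rto > stale_max)
			stale_max = sf[i].rto;
	}
	return active_max ? active_max : stale_max;
}
```

Keeping two running maxima avoids a second pass over the list when only
stale subflows exist.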

Fixes: 30549eebc4d8 ("mptcp: make ADD_ADDR retransmission timeout adaptive")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-7-fca8091060a4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agomptcp: pm: ADD_ADDR rtx: resched blocked ADD_ADDR quicker
Matthieu Baerts (NGI0) [Tue, 5 May 2026 15:00:54 +0000 (17:00 +0200)] 
mptcp: pm: ADD_ADDR rtx: resched blocked ADD_ADDR quicker

When an ADD_ADDR needs to be retransmitted and another one has already
been prepared -- e.g. multiple ADD_ADDRs have been sent in a row and
need to be retransmitted later -- this additional retransmission will
need to wait.

In this case, the timer was reset to TCP_RTO_MAX / 8, which is ~15
seconds. This delay is unnecessarily long: it should just be rescheduled
at the next opportunity, e.g. after the retransmission timeout.

Without this modification, some issues can be seen from time to time in
the selftests when multiple ADD_ADDRs are sent, and the host takes time
to process them, e.g. the "signal addresses, ADD_ADDR timeout" MPTCP
Join selftest, especially with a debug kernel config.

Note that on older kernels, 'timeout' is not available. It should be
enough to replace it by one second (HZ).

Fixes: 00cfd77b9063 ("mptcp: retransmit ADD_ADDR when timeout")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-6-fca8091060a4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agomptcp: pm: ADD_ADDR rtx: free sk if last
Matthieu Baerts (NGI0) [Tue, 5 May 2026 15:00:53 +0000 (17:00 +0200)] 
mptcp: pm: ADD_ADDR rtx: free sk if last

When an ADD_ADDR is retransmitted, the sk is held in sk_reset_timer(),
and released at the end.

If at that moment, it was the last reference being held, the sk would
not be freed. sock_put() should then be called instead of __sock_put().

But that's not enough: if it is the last reference, sock_put() will call
sk_free(), which will end up calling sk_stop_timer_sync() on the same
timer and waiting indefinitely for it to finish. The timer therefore
needs to be marked as done at the end of the timer handler when it has
not been rescheduled, so that sk_stop_timer_sync() is not called on
"itself".
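
The difference between the two put helpers can be modelled in a few
lines of user-space C. The struct and function names below are
hypothetical; the model only mirrors the fact that __sock_put()
decrements the reference count while sock_put() also frees the socket
when the count reaches zero.

```c
#include <assert.h>
#include <stdbool.h>

struct sock_model {
	int refcnt;
	bool freed;	/* stands in for sk_free() having run */
};

/* Like __sock_put(): drops a reference but never frees. */
static void model_put_nofree(struct sock_model *sk)
{
	assert(sk->refcnt > 0);
	sk->refcnt--;
}

/* Like sock_put(): frees the object when the last reference goes. */
static void model_put(struct sock_model *sk)
{
	assert(sk->refcnt > 0);
	if (--sk->refcnt == 0)
		sk->freed = true;
}
```

Starting from a count of 1, model_put_nofree() leaves an object with a
zero count that is never freed, which is the leak described above, while
model_put() frees it.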

Fixes: 00cfd77b9063 ("mptcp: retransmit ADD_ADDR when timeout")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-5-fca8091060a4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agomptcp: pm: ADD_ADDR rtx: always decrease sk refcount
Matthieu Baerts (NGI0) [Tue, 5 May 2026 15:00:52 +0000 (17:00 +0200)] 
mptcp: pm: ADD_ADDR rtx: always decrease sk refcount

When an ADD_ADDR is retransmitted, the sk is held in sk_reset_timer().
It should then be released in all cases at the end.

Some (unlikely) checks were returning directly instead of calling
sock_put() to decrease the refcount. Jump to a new 'exit' label to call
__sock_put() (which will become sock_put() in the next commit) to fix
this potential leak.

While at it, drop the '!msk' check which cannot happen because it is
never reset, and explicitly mark the remaining one as "unlikely".

Fixes: 00cfd77b9063 ("mptcp: retransmit ADD_ADDR when timeout")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-4-fca8091060a4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agomptcp: pm: ADD_ADDR rtx: fix potential data-race
Matthieu Baerts (NGI0) [Tue, 5 May 2026 15:00:51 +0000 (17:00 +0200)] 
mptcp: pm: ADD_ADDR rtx: fix potential data-race

This mptcp_pm_add_timer() helper is executed as a timer callback in
softirq context. To avoid any data races, the socket lock needs to be
held with bh_lock_sock().

If the socket is in use, retry again soon after, similar to what is done
with the keepalive timer.

Fixes: 00cfd77b9063 ("mptcp: retransmit ADD_ADDR when timeout")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-3-fca8091060a4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agomptcp: pm: ADD_ADDR rtx: allow ID 0
Matthieu Baerts (NGI0) [Tue, 5 May 2026 15:00:50 +0000 (17:00 +0200)] 
mptcp: pm: ADD_ADDR rtx: allow ID 0

ADD_ADDR can be sent for the ID 0, which corresponds to the local
address and port linked to the initial subflow.

Indeed, this address could be removed and re-added later on, e.g. as
is done in the "delete re-add signal" MPTCP Join selftests. So there is
no reason to ignore it.

Fixes: 00cfd77b9063 ("mptcp: retransmit ADD_ADDR when timeout")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-2-fca8091060a4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agomptcp: pm: kernel: correctly retransmit ADD_ADDR ID 0
Matthieu Baerts (NGI0) [Tue, 5 May 2026 15:00:49 +0000 (17:00 +0200)] 
mptcp: pm: kernel: correctly retransmit ADD_ADDR ID 0

When adding the ADD_ADDR to the list, the address including the IP, port
and ID are copied. On the other hand, when the endpoint corresponds to
the one from the initial subflow, the ID is set to 0, as specified by
the MPTCP protocol.

The issue is that the ID was reset after having copied the ID in the
ADD_ADDR entry. So the retransmission was done, but using a different ID
than the initial one.
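
The ordering problem can be illustrated with a user-space sketch. All
names here are hypothetical, and the "fixed" variant is only one reading
of the description above (reset the ID first, then copy), not the
actual patch.

```c
#include <assert.h>
#include <stddef.h>

struct addr_model {
	unsigned int ip;
	unsigned short port;
	unsigned char id;
};

struct add_addr_entry_model {
	struct addr_model addr;	/* snapshot used for retransmissions */
};

/* Order described as buggy: the snapshot is taken before the ID reset. */
static void announce_copy_then_reset(struct add_addr_entry_model *entry,
				     struct addr_model *addr, int is_initial)
{
	assert(entry != NULL && addr != NULL);
	entry->addr = *addr;
	if (is_initial)
		addr->id = 0;
}

/* Assumed fix: reset the ID first so the snapshot carries the same ID. */
static void announce_reset_then_copy(struct add_addr_entry_model *entry,
				     struct addr_model *addr, int is_initial)
{
	assert(entry != NULL && addr != NULL);
	if (is_initial)
		addr->id = 0;
	entry->addr = *addr;
}
```

With the first order, the initial announcement uses ID 0 while the
stored entry, and therefore the retransmission, still carries the old
endpoint ID.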

Fixes: 8b8ed1b429f8 ("mptcp: pm: reuse ID 0 after delete and re-add")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-1-fca8091060a4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agotcp: tcp_child_process() related UAF
Eric Dumazet [Tue, 5 May 2026 15:39:27 +0000 (15:39 +0000)] 
tcp: tcp_child_process() related UAF

tcp_child_process( .. child ...) currently calls sock_put(child).

Unfortunately @child (named @nsk in callers) can be used after
this point to send a RST packet.

To fix this UAF, I remove the sock_put() from tcp_child_process()
and let the callers handle this after it is safe.

Remove @rsk variable in tcp_v4_do_rcv() and change tcp_v6_do_rcv()
so that both functions look the same.

Fixes: cfb6eeb4c860 ("[TCP]: MD5 Signature Option (RFC2385) support.")
Reported-by: Damiano Melotti <melotti@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260505153927.3435532-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet/sched: sch_sfq: annotate data-races from sfq_dump_class_stats()
Eric Dumazet [Tue, 5 May 2026 09:11:33 +0000 (09:11 +0000)] 
net/sched: sch_sfq: annotate data-races from sfq_dump_class_stats()

sfq_dump_class_stats() runs locklessly; add the needed READ_ONCE()
and WRITE_ONCE() annotations.

Fixes: edb09eb17ed8 ("net: sched: do not acquire qdisc spinlock in qdisc/class stats dump")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260505091133.2452510-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoinetpeer: add a missing read_seqretry() in inet_getpeer()
Eric Dumazet [Tue, 5 May 2026 13:32:33 +0000 (13:32 +0000)] 
inetpeer: add a missing read_seqretry() in inet_getpeer()

When performing a lockless lookup over the inet_peer rbtree,
if a matching node is found, inet_getpeer() returns it immediately
without validating the seqlock sequence.

This missing check introduces a race condition:

Trigger Path: When a host receives an incoming fragmented IPv4 packet,
ip4_frag_init() (in net/ipv4/ip_fragment.c) calls inet_getpeer_v4()
to track the peer.

The Race: If the packet is from a new source IP, CPU A acquires the
write_seqlock, allocates a new inet_peer node (p), sets its IP address
(daddr), and links it to the rbtree (rb_link_node).

Uninitialized Access: Due to the lack of memory barriers between
rb_link_node and the initialization of the rest of the struct
(like refcount_set(&p->refcnt, 1)), CPU A can make the node visible
to readers before its refcnt is initialized.
This is especially true on weakly-ordered architectures like ARM64
where the CPU can reorder the memory stores.

Lockless Reader: Concurrently, CPU B processes a second fragmented packet
from the same source IP. CPU B does a lockless lookup, finds the newly
inserted node, and returns it immediately.

Use-After-Free (UAF): CPU B reads p->refcnt as uninitialized garbage
(left over from previous kmalloc-128/192 allocations).
If the garbage is > 0, refcount_inc_not_zero(&p->refcnt) succeeds.
CPU A then executes refcount_set(&p->refcnt, 1), overwriting CPU B's increment.
When CPU B finishes with the fragment queue, it calls inet_putpeer(),
which drops the refcount to 0 and frees the node via RCU.
The node is now freed but remains linked in the rbtree,
resulting in a Use-After-Free in the rbtree.
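
The reader-side check can be sketched with a single-threaded user-space
model. This is a simplification under stated assumptions: all names are
hypothetical, the table is a flat array instead of an rbtree, and the
memory barriers and real concurrency of the kernel seqlock API are left
out. It only shows the rule that a hit is trusted when the sequence has
not changed since the snapshot.

```c
#include <assert.h>
#include <stddef.h>

struct seq_model {
	unsigned int seq;	/* odd while a writer is active */
};

struct peer_model {
	int key;
};

static unsigned int model_read_begin(const struct seq_model *s)
{
	return s->seq;
}

/* Nonzero if a writer was active at the snapshot or has run since. */
static int model_read_retry(const struct seq_model *s, unsigned int start)
{
	return (start & 1u) || s->seq != start;
}

static void model_write_begin(struct seq_model *s) { s->seq++; }
static void model_write_end(struct seq_model *s) { s->seq++; }

/*
 * Lockless lookup: a matching node is returned only if the sequence is
 * unchanged. NULL tells the caller to retry, e.g. under the write lock.
 */
static struct peer_model *lookup_lockless(const struct seq_model *s,
					  unsigned int start,
					  struct peer_model *tbl, size_t n,
					  int key)
{
	size_t i;

	assert(tbl != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (tbl[i].key == key)
			return model_read_retry(s, start) ? NULL : &tbl[i];
	}
	return NULL;
}
```

A lookup whose snapshot predates a writer's model_write_begin() returns
NULL even though the key is present, so a node that may still be under
initialization is never handed out.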

Fixes: b145425f269a ("inetpeer: remove AVL implementation in favor of RB tree")
Reported-by: Damiano Melotti <melotti@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260505133233.3039575-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: rtsn: fix mdio_node leak in rtsn_mdio_alloc()
Shitalkumar Gandhi [Tue, 5 May 2026 12:32:36 +0000 (18:02 +0530)] 
net: rtsn: fix mdio_node leak in rtsn_mdio_alloc()

of_get_child_by_name() takes a reference. The rtsn_reset() and
rtsn_change_mode() failure paths jump to out_free_bus and leak
mdio_node.

Add out_put_node to drop it before falling through.
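
The label layout can be sketched in user-space C. Everything below is a
hypothetical model: the node is a bare reference counter, the failing
step is selected by a parameter, and the success path is assumed to drop
the reference once it is no longer needed.

```c
#include <assert.h>

struct node_model {
	int refcnt;
};

/* Like of_get_child_by_name(): returns the node with a reference held. */
static struct node_model *model_get_child(struct node_model *n)
{
	n->refcnt++;
	return n;
}

static void model_node_put(struct node_model *n)
{
	assert(n->refcnt > 0);
	n->refcnt--;
}

/* step_fails: 0 = no failure, 1 = reset fails, 2 = mode change fails. */
static int model_mdio_alloc(struct node_model *mdio, int step_fails)
{
	struct node_model *child = model_get_child(mdio);
	int ret;

	if (step_fails == 1) {
		ret = -1;
		goto out_put_node;
	}
	if (step_fails == 2) {
		ret = -2;
		goto out_put_node;
	}

	model_node_put(child);	/* assumed: success path also drops it */
	return 0;

out_put_node:
	model_node_put(child);	/* drop the reference first ... */
	/* ... then fall through to what out_free_bus would clean up */
	return ret;
}
```

Whichever step fails, the reference count returns to its starting value,
which is the property the new label restores.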

Fixes: b0d3969d2b4d ("net: ethernet: rtsn: Add support for Renesas Ethernet-TSN")
Signed-off-by: Shitalkumar Gandhi <shitalkumar.gandhi@cambiumnetworks.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Link: https://patch.msgid.link/20260505123236.406000-1-shitalkumar.gandhi@cambiumnetworks.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoMerge branch 'netdevsim-psp-fix-init-and-uninit-bugs'
Jakub Kicinski [Thu, 7 May 2026 00:39:22 +0000 (17:39 -0700)] 
Merge branch 'netdevsim-psp-fix-init-and-uninit-bugs'

Daniel Zahka says:

====================
netdevsim: psp: fix init and uninit bugs

This series has three fixes. The first is a straightforward NULL
pointer dereference that is reachable by creating and destroying some
vfs on a kernel with INET_PSP enabled.

The last two patches deal with nsim_psp_rereg_write(), which is a
debugfs handler that reregisters netdevsim's psp_dev without
quiescing and disabling tx/rx processing. This was added to enable
some tests in psp.py where a psp device is unregistered while it is
still referenced by tcp socket state.

There are two issues with this code:
1. Calls to nsim_psp_uninit() are not properly serialized
2. netdevsim's psp_dev refcount can be released while nsim_do_psp() is
   reading from it.
====================

Link: https://patch.msgid.link/20260505-psd-rcu-v1-0-a8f69ec1ab96@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonetdevsim: psp: rcu protect psp_dev reference
Daniel Zahka [Tue, 5 May 2026 10:42:25 +0000 (03:42 -0700)] 
netdevsim: psp: rcu protect psp_dev reference

There are two issues with the way psp_dev is used in nsim_do_psp():

1. There is no check for IS_ERR() on the peer's psp_dev before
   dereferencing it.
2. The refcount on this psp_dev can be dropped by
   nsim_psp_rereg_write().

To fix this, we can make netdevsim's reference to its psp_dev an rcu
reference, and then nsim_do_psp() can read the fields it needs from an
rcu critical section.

Fixes: f857478d6206 ("netdevsim: a basic test PSP implementation")
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260505-psd-rcu-v1-3-a8f69ec1ab96@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonetdevsim: psp: serialize calls to nsim_psp_uninit()
Daniel Zahka [Tue, 5 May 2026 10:42:24 +0000 (03:42 -0700)] 
netdevsim: psp: serialize calls to nsim_psp_uninit()

The debugfs write handler, nsim_psp_rereg_write(), can race against
nsim_destroy() and against itself, causing nsim_psp_uninit() to run
more than once concurrently. Two complementary changes serialize all
callers:

1. Delete the psp_rereg debugfs file from nsim_psp_uninit() before
   doing the actual teardown. debugfs_remove() drains any in-flight
   writers and prevents new ones from starting.

2. Add a mutex around the body of nsim_psp_rereg_write() so that two
   concurrent userspace writers cannot both enter the teardown path
   at once.

The teardown work itself is moved into a new __nsim_psp_uninit() that
the rereg handler calls under the mutex, while the public
nsim_psp_uninit() wraps it with the debugfs_remove()/mutex_destroy()
pair so nsim_destroy() doesn't have to know about the psp internals.

Fixes: f857478d6206 ("netdevsim: a basic test PSP implementation")
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260505-psd-rcu-v1-2-a8f69ec1ab96@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonetdevsim: psp: only call nsim_psp_uninit() on PFs
Daniel Zahka [Tue, 5 May 2026 10:42:23 +0000 (03:42 -0700)] 
netdevsim: psp: only call nsim_psp_uninit() on PFs

VFs go through nsim_init_netdevsim_vf() which never calls
nsim_psp_init(), so ns->psp.dev stays NULL. nsim_psp_uninit() guards
with !IS_ERR(ns->psp.dev), so destroying a VF reaches
psp_dev_unregister(NULL) and dereferences NULL on the first
mutex_lock(&psd->lock):

  BUG: kernel NULL pointer dereference, address: 0000000000000020
  RIP: 0010:mutex_lock+0x1c/0x30
  Call Trace:
   psp_dev_unregister+0x2a/0x1a0
   nsim_psp_uninit+0x1f/0x40 [netdevsim]
   nsim_destroy+0x61/0x1e0 [netdevsim]
   __nsim_dev_port_del+0x47/0x90 [netdevsim]
   nsim_drv_configure_vfs+0xc9/0x130 [netdevsim]
   nsim_bus_dev_numvfs_store+0x79/0xb0 [netdevsim]

Gate nsim_psp_uninit() on nsim_dev_port_is_pf(), matching the pattern
already used for nsim_exit_netdevsim() and the bpf/ipsec/macsec/queue
teardowns.
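
Why a NULL pointer passes an !IS_ERR() guard can be shown with a
user-space model of the error-pointer convention. The helper names are
hypothetical; the model assumes the usual encoding in which error
pointers occupy the top 4095 values of the address space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095

/* True only for pointers that encode a negative errno value. */
static bool model_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Encode a negative errno value as a pointer. */
static void *model_err_ptr(long err)
{
	assert(err < 0 && err >= -MODEL_MAX_ERRNO);
	return (void *)(intptr_t)err;
}
```

NULL is address 0, far below the error range, so model_is_err(NULL) is
false and a guard of the form !IS_ERR(ptr) lets a NULL pointer through
to the code that dereferences it.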

Reproducer:
  modprobe netdevsim
  echo "10 1" > /sys/bus/netdevsim/new_device
  echo 1 > /sys/bus/netdevsim/devices/netdevsim10/sriov_numvfs
  devlink dev eswitch set netdevsim/netdevsim10 mode switchdev
  echo 0 > /sys/bus/netdevsim/devices/netdevsim10/sriov_numvfs

Fixes: f857478d6206 ("netdevsim: a basic test PSP implementation")
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260505-psd-rcu-v1-1-a8f69ec1ab96@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoipv6: fix potential UAF caused by ip6_forward_proxy_check()
Eric Dumazet [Tue, 5 May 2026 13:00:56 +0000 (13:00 +0000)] 
ipv6: fix potential UAF caused by ip6_forward_proxy_check()

ip6_forward_proxy_check() calls pskb_may_pull() which might re-allocate
skb->head.

Reload ipv6_hdr() after the pskb_may_pull() call to avoid using
the freed memory.
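
The stale-pointer hazard can be modelled in user-space C. The names are
hypothetical, and model_may_pull() always moves the data to a fresh
allocation in order to make the hazard deterministic, whereas the real
pskb_may_pull() reallocates only when it has to.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct buf_model {
	unsigned char *head;
	size_t len;
};

static unsigned char *model_hdr(const struct buf_model *b)
{
	return b->head;
}

/* Ensure at least 'need' bytes; the old head is freed in the process. */
static int model_may_pull(struct buf_model *b, size_t need)
{
	size_t cap;
	unsigned char *fresh;

	assert(b != NULL && b->head != NULL);
	cap = need > b->len ? need : b->len;
	fresh = calloc(cap ? cap : 1, 1);
	if (!fresh)
		return 0;
	memcpy(fresh, b->head, b->len);
	free(b->head);
	b->head = fresh;
	b->len = cap;
	return 1;
}

/* Correct pattern: reload the header pointer after the pull. */
static unsigned char first_byte_after_pull(struct buf_model *b, size_t need)
{
	unsigned char *hdr = model_hdr(b);	/* stale once the pull runs */

	if (!model_may_pull(b, need))
		return 0;
	hdr = model_hdr(b);			/* reload before any use */
	return hdr[0];
}
```

Dropping the second model_hdr() call would make hdr[0] read memory that
model_may_pull() has already freed.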

Fixes: e21e0b5f19ac ("[IPV6] NDISC: Handle NDP messages to proxied addresses.")
Reported-by: Damiano Melotti <melotti@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260505130056.2927197-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agortase: Fix flow control configuration
Justin Lai [Tue, 5 May 2026 06:41:21 +0000 (14:41 +0800)] 
rtase: Fix flow control configuration

The hardware has two sets of registers controlling TX/RX flow control.
The effective flow control state is determined by the logical OR of
these two sets of bits.

RTASE_FORCE_TXFLOW_EN and RTASE_FORCE_RXFLOW_EN in RTASE_CPLUS_CMD are
the bits used by the driver to control TX/RX flow control according to
the ethtool pause configuration.

RTASE_TXFLOW_EN and RTASE_RXFLOW_EN in RTASE_GPHY_STD_00 are another
set of TX/RX flow control enable bits. Clear them by default so they do
not keep flow control enabled independently of the driver setting.

With the RTASE_GPHY_STD_00 bits cleared, the effective flow control
state is controlled through RTASE_CPLUS_CMD, so the ethtool setting can
take effect correctly.

Signed-off-by: Justin Lai <justinlai0215@realtek.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260505064121.31286-1-justinlai0215@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoselftests: drv-net: fix sort order of makefile and config
Jakub Kicinski [Thu, 7 May 2026 00:22:05 +0000 (17:22 -0700)] 
selftests: drv-net: fix sort order of makefile and config

Recent changes added configs and tests in the wrong spot.

Link: https://lore.kernel.org/20260506170435.34984dfc@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoMerge tag 'ipsec-2026-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/klasser...
Jakub Kicinski [Wed, 6 May 2026 23:49:41 +0000 (16:49 -0700)] 
Merge tag 'ipsec-2026-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec

Steffen Klassert says:

====================
pull request (net): ipsec 2026-05-05

1. Fix an IPv6 encapsulation error path that leaked route references
   when UDPv6 ESP decapsulation resolved to an error route.
   From Yilin Zhu.

2. Fix AH with ESN on async crypto paths by accounting for the extra
   high-order sequence number when reconstructing the temporary
   authentication layout in the completion callbacks.
   From Michael Bomarito.

3. Fix XFRM output so it does not overwrite already-correct inner header
   pointers when a tunnel layer such as VXLAN has already saved them.
   The fix comes with new selftests. From Cosmin Ratiu.

4. Add the missing native payload size entry for XFRM_MSG_MAPPING in the
   compat translation path. From Ruijie Li.

5. Harden __xfrm_state_delete() against repeated or inconsistent unhashing
   of state list nodes by keying the removal on actual list membership and
   using delete-and-init helpers. From Michal Kosiorek.

6. Prevent ESP from decrypting shared splice-backed skb fragments in place
   by marking UDP splice frags as shared and forcing copy-on-write in ESP
   input when needed. From Kuan-Ting Chen.

* tag 'ipsec-2026-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
  xfrm: esp: avoid in-place decrypt on shared skb frags
  xfrm: defensively unhash xfrm_state lists in __xfrm_state_delete
  xfrm: provide message size for XFRM_MSG_MAPPING
  xfrm: Don't clobber inner headers when already set
  tools/selftests: Add a VXLAN+IPsec traffic test
  tools/selftests: Use a sensible timeout value for iperf3 client
  xfrm: ah: account for ESN high bits in async callbacks
  ipv6: xfrm6: release dst on error in xfrm6_rcv_encap()
====================

Link: https://patch.msgid.link/20260505132326.1362733-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoMerge tag 'ovpn-net-20260504' of https://github.com/OpenVPN/ovpn-net-next
Jakub Kicinski [Wed, 6 May 2026 23:10:02 +0000 (16:10 -0700)] 
Merge tag 'ovpn-net-20260504' of https://github.com/OpenVPN/ovpn-net-next

Antonio Quartulli says:

====================
Includes changes:

* ensure MAC header offset is reset before delivering packet
* ensure gro_cells_receive() and dstats_dev_add() are called
  with BH disabled
* reduce ping count in selftest to ensure it completes within
  timeout

* tag 'ovpn-net-20260504' of https://github.com/OpenVPN/ovpn-net-next:
  selftests: ovpn: reduce ping count in test.sh
  ovpn: ensure packet delivery happens with BH disabled
  ovpn: reset MAC header before passing skb up
====================

Link: https://patch.msgid.link/20260504230305.2681646-1-antonio@openvpn.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago Merge tag 'for-net-2026-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/bluet...
Jakub Kicinski [Wed, 6 May 2026 22:43:33 +0000 (15:43 -0700)] 
Merge tag 'for-net-2026-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth

Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

 - hci_conn: fix potential UAF in create_big_sync
 - hci_event: fix memset typo
 - hci_event: Fix OOB read and infinite loop in hci_le_create_big_complete_evt
 - L2CAP: fix MPS check in l2cap_ecred_reconf_req
 - L2CAP: defer conn param update to avoid conn->lock/hdev->lock inversion
 - L2CAP: Fix null-ptr-deref in l2cap_sock_state_change_cb()
 - L2CAP: Fix null-ptr-deref in l2cap_sock_get_sndtimeo_cb()
 - L2CAP: Fix null-ptr-deref in l2cap_sock_new_connection_cb()
 - RFCOMM: pull credit byte with skb_pull_data()
 - SCO: fix sleeping under spinlock in sco_conn_ready
 - SCO: hold sk properly in sco_conn_ready
 - ISO: Fix data-race on dst in iso_sock_connect()
 - ISO: Fix data-race on iso_pi(sk) in socket and HCI event paths
 - bnep: fix incorrect length parsing in bnep_rx_frame() extension handling
 - hci_uart: Fix NULL deref in recv callbacks when priv is uninitialized
 - virtio_bt: clamp rx length before skb_put
 - virtio_bt: validate rx pkt_type header length
 - HIDP: serialise l2cap_unregister_user via hidp_session_sem
 - btintel_pcie: treat boot stage bit 12 as warning
 - btmtk: validate WMT event SKB length before struct access

* tag 'for-net-2026-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: HIDP: serialise l2cap_unregister_user via hidp_session_sem
  Bluetooth: hci_event: fix memset typo
  Bluetooth: RFCOMM: pull credit byte with skb_pull_data()
  Bluetooth: virtio_bt: validate rx pkt_type header length
  Bluetooth: virtio_bt: clamp rx length before skb_put
  Bluetooth: btmtk: validate WMT event SKB length before struct access
  Bluetooth: ISO: Fix data-race on iso_pi(sk) in socket and HCI event paths
  Bluetooth: ISO: Fix data-race on dst in iso_sock_connect()
  Bluetooth: hci_uart: Fix NULL deref in recv callbacks when priv is uninitialized
  Bluetooth: btintel_pcie: treat boot stage bit 12 as warning
  Bluetooth: SCO: hold sk properly in sco_conn_ready
  Bluetooth: L2CAP: Fix null-ptr-deref in l2cap_sock_new_connection_cb()
  Bluetooth: L2CAP: Fix null-ptr-deref in l2cap_sock_get_sndtimeo_cb()
  Bluetooth: L2CAP: Fix null-ptr-deref in l2cap_sock_state_change_cb()
  Bluetooth: l2cap: defer conn param update to avoid conn->lock/hdev->lock inversion
  Bluetooth: l2cap: fix MPS check in l2cap_ecred_reconf_req
  Bluetooth: bnep: fix incorrect length parsing in bnep_rx_frame() extension handling
  Bluetooth: hci_event: Fix OOB read and infinite loop in hci_le_create_big_complete_evt
  Bluetooth: hci_conn: fix potential UAF in create_big_sync
  Bluetooth: SCO: fix sleeping under spinlock in sco_conn_ready
====================

Link: https://patch.msgid.link/20260506204553.58686-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks ago Bluetooth: HIDP: serialise l2cap_unregister_user via hidp_session_sem
Michael Bommarito [Sat, 2 May 2026 16:43:03 +0000 (12:43 -0400)] 
Bluetooth: HIDP: serialise l2cap_unregister_user via hidp_session_sem

Commit dbf666e4fc9b ("Bluetooth: HIDP: Fix possible UAF") made
hidp_session_remove() drop the L2CAP reference and set
session->conn = NULL once the session is considered removed, and
added a bare if (session->conn) guard around the kthread-exit
l2cap_unregister_user() call in hidp_session_thread().  The sibling
ioctl site in hidp_connection_del() still reads session->conn
unlocked and unguarded, and the kthread-exit guard itself is a
lockless double-read.

hidp_session_find() drops hidp_session_sem before returning, so
hidp_session_remove() can null session->conn between the lookup and
the call in hidp_connection_del().  Worse, since commit 752a6c9596dd
("Bluetooth: L2CAP: Fix use-after-free in l2cap_unregister_user")
takes mutex_lock(&conn->lock) inside l2cap_unregister_user(), a
stale non-NULL snapshot also UAFs on conn->lock.  v1 only added an
if (session->conn) guard at the ioctl site, which doesn't address
either race; Luiz suggested snapshotting session->conn under the
sem and clearing it before the call.

Taking hidp_session_sem across l2cap_unregister_user() would be
wrong: l2cap_conn_del() already establishes the lock order

  conn->lock -> hidp_session_sem

via l2cap_unregister_all_users() -> user->remove ==
hidp_session_remove(), so taking hidp_session_sem before conn->lock
would AB/BA deadlock.

Factor a helper hidp_session_unregister_conn() that under
down_write(&hidp_session_sem) snapshots session->conn and clears
the member, then outside the sem calls l2cap_unregister_user() and
l2cap_conn_put() on the snapshot.  Call it from both
hidp_connection_del() and hidp_session_thread()'s exit path.  At
most one consumer wins the write-sem; later callers observe
session->conn == NULL and skip the unregister and put, so the
reference hidp_session_new() took via l2cap_conn_get() is consumed
exactly once.  session_free() already tolerates a NULL session->conn.
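
The snapshot-and-clear pattern described above can be sketched in userspace C. This is only an illustration under stated assumptions: a pthread mutex stands in for the write side of hidp_session_sem, and every demo_* name is hypothetical rather than taken from the kernel sources.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the L2CAP connection and the HIDP session. */
struct demo_l2cap_conn {
    int unregistered;   /* times the "unregister" step ran */
    int puts;           /* times the reference was dropped */
};

struct demo_session {
    struct demo_l2cap_conn *conn;
};

/* Assumption: a mutex models down_write()/up_write() on the session sem. */
static pthread_mutex_t demo_session_lock = PTHREAD_MUTEX_INITIALIZER;

static void demo_unregister_user(struct demo_l2cap_conn *c) { c->unregistered++; }
static void demo_conn_put(struct demo_l2cap_conn *c) { c->puts++; }

/* Snapshot and clear the pointer under the lock, then act on the
 * snapshot outside it, so the lock is never held across the callee. */
static void demo_session_unregister_conn(struct demo_session *s)
{
    struct demo_l2cap_conn *conn;

    pthread_mutex_lock(&demo_session_lock);
    conn = s->conn;
    s->conn = NULL;
    pthread_mutex_unlock(&demo_session_lock);

    if (!conn)
        return;     /* another caller already consumed the reference */

    demo_unregister_user(conn);
    demo_conn_put(conn);
}
```

Calling demo_session_unregister_conn() twice on the same session runs the unregister and put steps once, mirroring the "consumed exactly once" property claimed for the helper.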

Fixes: dbf666e4fc9b ("Bluetooth: HIDP: Fix possible UAF")
Suggested-by: Luiz Augusto von Dentz <luiz.dentz@gmail.com>
Link: https://lore.kernel.org/all/20260422011437.176643-1-michael.bommarito@gmail.com/
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
6 weeks ago Bluetooth: hci_event: fix memset typo
Jann Horn [Wed, 29 Apr 2026 13:40:46 +0000 (15:40 +0200)] 
Bluetooth: hci_event: fix memset typo

hci_le_big_sync_established_evt() currently does:

    conn->num_bis = 0;
    memset(conn->bis, 0, sizeof(conn->num_bis));

sizeof(conn->num_bis) is wrong - it would make sense to either use
conn->num_bis (before setting that to 0) or sizeof(conn->bis).
Fix it by using sizeof(conn->bis), the least intrusive change.
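
A small userspace sketch of the difference, assuming a hypothetical struct with a one-byte counter and a 31-element byte array (the real field types and array length in struct hci_conn are not shown in this message):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in; the array length is an assumption. */
struct demo_big_conn {
    uint8_t num_bis;
    uint8_t bis[31];
};

/* Buggy form: sizeof(conn->num_bis) is 1, so only bis[0] is cleared. */
static void demo_reset_buggy(struct demo_big_conn *conn)
{
    conn->num_bis = 0;
    memset(conn->bis, 0, sizeof(conn->num_bis));
}

/* Fixed form: sizeof(conn->bis) covers the whole array. */
static void demo_reset_fixed(struct demo_big_conn *conn)
{
    conn->num_bis = 0;
    memset(conn->bis, 0, sizeof(conn->bis));
}
```

With the buggy form, every element after bis[0] keeps its previous value.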

Luckily, nothing actually depends on this memset() working properly:
Nothing seems to ever read from conn->bis beyond conn->num_bis, and when
conn->num_bis is increased, the corresponding elements of conn->bis are
initialized. So I think this line could also just be removed.

This is a purely theoretical fix and should have no impact on actual
behavior.

Fixes: 42ecf1947135 ("Bluetooth: ISO: Do not emit LE BIG Create Sync if previous is pending")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
6 weeks ago Bluetooth: RFCOMM: pull credit byte with skb_pull_data()
Pengpeng Hou [Thu, 23 Apr 2026 15:31:00 +0000 (23:31 +0800)] 
Bluetooth: RFCOMM: pull credit byte with skb_pull_data()

rfcomm_recv_data() treats the first payload byte as a credit field when
the UIH frame carries PF and credit-based flow control is enabled.

After the header has been stripped, the PF/CFC path consumes that byte
with a direct skb->data dereference followed by skb_pull(). A malformed
short frame can reach this path without a byte available.

Use skb_pull_data() so the length check and pull happen together before
the returned credit byte is consumed.
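
The idea of checking the length and pulling in one step can be modelled in userspace as follows. This is a sketch, not the kernel code: struct demo_rfcomm_buf and demo_rfcomm_pull() are hypothetical stand-ins that imitate the behaviour of skb_pull_data() on a plain byte buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, minimal model of an skb: a cursor and a remaining length. */
struct demo_rfcomm_buf {
    const uint8_t *data;
    size_t len;
};

/* Return the current data pointer and advance by n bytes, or return
 * NULL without touching the buffer when fewer than n bytes remain. */
static const uint8_t *demo_rfcomm_pull(struct demo_rfcomm_buf *b, size_t n)
{
    const uint8_t *p = b->data;

    if (b->len < n)
        return NULL;
    b->data += n;
    b->len -= n;
    return p;
}

/* Read the one-byte credit field; -1 means the frame was too short. */
static int demo_read_credits(struct demo_rfcomm_buf *b, uint8_t *credits)
{
    const uint8_t *p = demo_rfcomm_pull(b, 1);

    if (!p)
        return -1;
    *credits = *p;
    return 0;
}
```

Because the byte is only dereferenced after a successful pull, an empty payload is rejected instead of being read.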

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
6 weeks ago Bluetooth: virtio_bt: validate rx pkt_type header length
Michael Bommarito [Tue, 21 Apr 2026 17:08:45 +0000 (13:08 -0400)] 
Bluetooth: virtio_bt: validate rx pkt_type header length

virtbt_rx_handle() reads the leading pkt_type byte from the RX skb
and forwards the remainder to hci_recv_frame() for every
event/ACL/SCO/ISO type, without checking that the remaining payload
is at least the fixed HCI header for that type.

After the preceding patch bounds the backend-supplied used.len to
[1, VIRTBT_RX_BUF_SIZE], a one-byte completion still reaches
hci_recv_frame() with skb->len already pulled to 0. If the byte
happened to be HCI_ACLDATA_PKT, the ACL-vs-ISO classification
fast-path in hci_dev_classify_pkt_type() dereferences
hci_acl_hdr(skb)->handle whenever the HCI device has an active
CIS_LINK, BIS_LINK, or PA_LINK connection, reading two bytes of
uninitialized RX-buffer data. The same hazard exists for every
packet type the driver accepts because none of the switch cases in
virtbt_rx_handle() check skb->len against the per-type minimum HCI
header size before handing the frame to the core.

After stripping pkt_type, require skb->len to cover the fixed
header size for the selected type (event 2, ACL 4, SCO 3, ISO 4)
before calling hci_recv_frame(); drop ratelimited otherwise.
Unknown pkt_type values still take the original kfree_skb() default
path.

Use bt_dev_err_ratelimited() because both the length and pkt_type
values come from an untrusted backend that can otherwise flood the
kernel log.
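
A userspace sketch of the per-type minimum-length check. The header sizes are the ones listed above; the packet type values are the standard HCI UART indicators, and all demo_* and DEMO_* names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* HCI packet type indicators, with hypothetical names. */
enum {
    DEMO_ACL_PKT   = 0x02,
    DEMO_SCO_PKT   = 0x03,
    DEMO_EVENT_PKT = 0x04,
    DEMO_ISO_PKT   = 0x05,
};

/* Fixed header size required after the pkt_type byte, or -1 if unknown. */
static int demo_min_hdr_len(uint8_t pkt_type)
{
    switch (pkt_type) {
    case DEMO_EVENT_PKT: return 2;
    case DEMO_ACL_PKT:   return 4;
    case DEMO_SCO_PKT:   return 3;
    case DEMO_ISO_PKT:   return 4;
    default:             return -1;
    }
}

/* Return 1 if the frame may be forwarded to the core, 0 if it must be dropped. */
static int demo_rx_accept(const uint8_t *buf, size_t len)
{
    int need;

    if (len < 1)
        return 0;                   /* no pkt_type byte at all */
    need = demo_min_hdr_len(buf[0]);
    if (need < 0)
        return 0;                   /* unknown type */
    return len - 1 >= (size_t)need; /* payload must cover the header */
}
```

A one-byte completion carrying only the ACL type indicator is rejected here, which is the case the commit message singles out.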

Fixes: 160fbcf3bfb9 ("Bluetooth: virtio_bt: Use skb_put to set length")
Cc: stable@vger.kernel.org
Cc: Soenke Huster <soenke.huster@eknoes.de>
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
6 weeks ago Bluetooth: virtio_bt: clamp rx length before skb_put
Michael Bommarito [Tue, 21 Apr 2026 17:08:44 +0000 (13:08 -0400)] 
Bluetooth: virtio_bt: clamp rx length before skb_put

virtbt_rx_work() calls skb_put(skb, len) where len comes directly
from virtqueue_get_buf() with no validation against the buffer we
posted to the device. The RX skb is allocated in virtbt_add_inbuf()
and exposed to virtio as exactly 1000 bytes via sg_init_one().

Checking len against skb_tailroom(skb) is not sufficient because
alloc_skb() can leave more tailroom than the 1000 bytes actually
handed to the device. A malicious or buggy backend can therefore
report used.len between 1001 and skb_tailroom(skb), causing skb_put()
to include uninitialized kernel heap bytes that were never written by
the device.

The same path also accepts len == 0, in which case skb_put(skb, 0)
leaves the skb empty but virtbt_rx_handle() still reads the pkt_type
byte from skb->data, consuming uninitialized memory.

Define VIRTBT_RX_BUF_SIZE once and reuse it in alloc_skb() and
sg_init_one(), and gate virtbt_rx_work() on that same constant so
the bound checked matches the buffer actually exposed to the device.
Reject used.len == 0 in the same gate so an empty completion can
no longer reach virtbt_rx_handle().

Use bt_dev_err_ratelimited() because the length value comes from an
untrusted backend that can otherwise flood the kernel log.
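
The gap between a tailroom check and a check against the posted buffer size can be illustrated with a short sketch. The function names and DEMO_RX_BUF_SIZE are hypothetical; the value 1000 is the buffer size mentioned above, and the 1024-byte tailroom in the usage note is only an example figure.

```c
#include <stdbool.h>

#define DEMO_RX_BUF_SIZE 1000u  /* bytes actually exposed to the device */

/* Insufficient check: tailroom can exceed what the device was given. */
static bool demo_len_fits_tailroom(unsigned int len, unsigned int tailroom)
{
    return len <= tailroom;
}

/* Check against the posted size, also rejecting an empty completion. */
static bool demo_len_valid(unsigned int len)
{
    return len >= 1 && len <= DEMO_RX_BUF_SIZE;
}
```

With an example tailroom of 1024 bytes, a reported length of 1001 passes the first check but fails the second.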

Same class of bug as commit c04db81cd028 ("net/9p: Fix buffer
overflow in USB transport layer"), which hardened the USB 9p
transport against unchecked device-reported length.

Fixes: 160fbcf3bfb9 ("Bluetooth: virtio_bt: Use skb_put to set length")
Cc: stable@vger.kernel.org
Cc: Soenke Huster <soenke.huster@eknoes.de>
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
6 weeks ago Bluetooth: btmtk: validate WMT event SKB length before struct access
Tristan Madani [Tue, 21 Apr 2026 11:14:54 +0000 (11:14 +0000)] 
Bluetooth: btmtk: validate WMT event SKB length before struct access

btmtk_usb_hci_wmt_sync() casts the WMT event response SKB data to
struct btmtk_hci_wmt_evt (7 bytes) and struct btmtk_hci_wmt_evt_funcc
(9 bytes) without first checking that the SKB contains enough data.
A short firmware response causes out-of-bounds reads from SKB tailroom.

Use skb_pull_data() to validate and advance past the base WMT event
header. For the FUNC_CTRL case, pull the additional status field bytes
before accessing them.
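
A two-stage version of that validation might look like the sketch below. It works on a plain byte buffer rather than an SKB, all names are hypothetical, and the 2-byte status length is inferred from the 9-byte and 7-byte struct sizes quoted above.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_WMT_EVT_LEN    7   /* base WMT event size from the text above */
#define DEMO_WMT_STATUS_LEN 2   /* assumption: 9-byte FUNC_CTRL event minus base */

/* Hypothetical, minimal model of an skb. */
struct demo_wmt_buf {
    const uint8_t *data;
    size_t len;
};

/* Length-checked pull: NULL when fewer than n bytes remain. */
static const uint8_t *demo_wmt_pull(struct demo_wmt_buf *b, size_t n)
{
    const uint8_t *p = b->data;

    if (b->len < n)
        return NULL;
    b->data += n;
    b->len -= n;
    return p;
}

/* Return 0 on success, -1 on a short response. The status bytes are
 * copied out only for the FUNC_CTRL case and only after validation. */
static int demo_parse_wmt_evt(struct demo_wmt_buf *b, int is_func_ctrl,
                              uint8_t status[2])
{
    const uint8_t *p;

    if (!demo_wmt_pull(b, DEMO_WMT_EVT_LEN))
        return -1;
    if (!is_func_ctrl)
        return 0;

    p = demo_wmt_pull(b, DEMO_WMT_STATUS_LEN);
    if (!p)
        return -1;
    status[0] = p[0];
    status[1] = p[1];
    return 0;
}
```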

Fixes: d019930b0049 ("Bluetooth: btmtk: move btusb_mtk_hci_wmt_sync to btmtk.c")
Cc: stable@vger.kernel.org
Signed-off-by: Tristan Madani <tristan@talencesecurity.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
6 weeks ago Bluetooth: ISO: Fix data-race on iso_pi(sk) in socket and HCI event paths
SeungJu Cheon [Tue, 21 Apr 2026 02:51:22 +0000 (11:51 +0900)] 
Bluetooth: ISO: Fix data-race on iso_pi(sk) in socket and HCI event paths

Several iso_pi(sk) fields (qos, qos_user_set, bc_sid, base, base_len,
sync_handle, bc_num_bis) are written under lock_sock in
iso_sock_setsockopt() and iso_sock_bind(), but two other paths read and
write them while holding only hci_dev_lock:

  - iso_connect_bis() / iso_connect_cis(), invoked from connect(2),
    read qos/base/bc_sid and reset qos to default_qos on the
    qos_user_set validation failure -- all without lock_sock.

  - iso_connect_ind(), invoked from hci_rx_work, writes sync_handle,
    bc_sid, qos.bcast.encryption, bc_num_bis, base and base_len on
    PA_SYNC_ESTABLISHED / PAST_RECEIVED / BIG_INFO_ADV_REPORT /
    PER_ADV_REPORT events. The BIG_INFO handler additionally passes
    &iso_pi(sk)->qos together with sync_handle / bc_num_bis / bc_bis
    to hci_conn_big_create_sync() while setsockopt may be mutating
    them.

Acquire lock_sock around the affected accesses in both paths.

The locking order hci_dev_lock -> lock_sock matches the existing
iso_conn_big_sync() precedent, whose comment documents the same
requirement for hci_conn_big_create_sync(). The HCI connect/bind
helpers do not wait for command completion -- they enqueue work via
hci_cmd_sync_queue{,_once}() / hci_le_create_cis_pending() and
return -- so the added hold time is comparable to iso_conn_big_sync().

KCSAN report:

BUG: KCSAN: data-race in iso_connect_cis / iso_sock_setsockopt

read to 0xffffa3ae8ce3cdc8 of 1 bytes by task 335 on cpu 0:
 iso_connect_cis+0x49f/0xa20
 iso_sock_connect+0x60e/0xb40
 __sys_connect_file+0xbd/0xe0
 __sys_connect+0xe0/0x110
 __x64_sys_connect+0x40/0x50
 x64_sys_call+0xcad/0x1c60
 do_syscall_64+0x133/0x590
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

write to 0xffffa3ae8ce3cdc8 of 60 bytes by task 334 on cpu 1:
 iso_sock_setsockopt+0x69a/0x930
 do_sock_setsockopt+0xc3/0x170
 __sys_setsockopt+0xd1/0x130
 __x64_sys_setsockopt+0x64/0x80
 x64_sys_call+0x1547/0x1c60
 do_syscall_64+0x133/0x590
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 UID: 0 PID: 334 Comm: iso_setup_race Not tainted 7.0.0-10949-g8541d8f725c6 #44 PREEMPT(lazy)

The iso_connect_ind() races were found by inspection.

Fixes: ccf74f2390d6 ("Bluetooth: Add BTPROTO_ISO socket type")
Signed-off-by: SeungJu Cheon <suunj1331@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
6 weeks ago Bluetooth: ISO: Fix data-race on dst in iso_sock_connect()
SeungJu Cheon [Tue, 21 Apr 2026 02:51:21 +0000 (11:51 +0900)] 
Bluetooth: ISO: Fix data-race on dst in iso_sock_connect()

iso_sock_connect() copies the destination address into
iso_pi(sk)->dst under lock_sock, then releases the lock and reads
it back with bacmp() to decide between the CIS and BIS connect
paths:

    lock_sock(sk);
    bacpy(&iso_pi(sk)->dst, &sa->iso_bdaddr);
    iso_pi(sk)->dst_type = sa->iso_bdaddr_type;
    release_sock(sk);

    if (bacmp(&iso_pi(sk)->dst, BDADDR_ANY))  // <- no lock held

This read after release_sock() races with any concurrent write to
iso_pi(sk)->dst on the same socket.

Fix by reading the destination address directly from the local
sockaddr argument (sa->iso_bdaddr) instead of iso_pi(sk)->dst.
Since sa is a function-local argument, reading it requires no
locking and avoids the race.

This patch addresses only the bacmp() race in iso_sock_connect();
other unprotected iso_pi(sk) accesses are fixed separately in the
next patch.

KCSAN report:

BUG: KCSAN: data-race in memcmp+0x39/0xb0

race at unknown origin, with read to 0xffff8f96ea66dde3 of 1 bytes by task 549 on cpu 1:
 memcmp+0x39/0xb0
 iso_sock_connect+0x275/0xb40
 __sys_connect_file+0xbd/0xe0
 __sys_connect+0xe0/0x110
 __x64_sys_connect+0x40/0x50
 x64_sys_call+0xcad/0x1c60
 do_syscall_64+0x133/0x590
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

value changed: 0x00 -> 0xee

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 UID: 0 PID: 549 Comm: iso_race_combin Not tainted 7.0.0-08391-g1d51b370a0f8 #40 PREEMPT(lazy)

Fixes: ccf74f2390d6 ("Bluetooth: Add BTPROTO_ISO socket type")
Signed-off-by: SeungJu Cheon <suunj1331@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
6 weeks ago Bluetooth: hci_uart: Fix NULL deref in recv callbacks when priv is uninitialized
Aurelien DESBRIERES [Tue, 21 Apr 2026 13:53:31 +0000 (15:53 +0200)] 
Bluetooth: hci_uart: Fix NULL deref in recv callbacks when priv is uninitialized

When a fault is injected during hci_uart line discipline setup, the
proto open() callback may fail, leaving hu->priv NULL. A subsequent
TIOCSTI ioctl can then trigger the recv() callback before priv is
initialized, causing a NULL pointer dereference.

Fix all four affected HCI UART protocol drivers by adding a NULL check
on hu->priv at the start of their recv() callbacks: h4, h5, ath and
bcsp.

Reported-by: syzbot+ff30eeab8e07b37d524e@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=ff30eeab8e07b37d524e
Signed-off-by: Aurelien DESBRIERES <aurelien@hackers.camp>
Assisted-by: Claude:claude-sonnet-4-6
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
6 weeks ago Bluetooth: btintel_pcie: treat boot stage bit 12 as warning
Sai Teja Aluvala [Mon, 20 Apr 2026 17:37:35 +0000 (23:07 +0530)] 
Bluetooth: btintel_pcie: treat boot stage bit 12 as warning

CSR boot stage register bit 12 is documented as a device warning,
not a fatal error. Rename the bit definition accordingly and stop
including it in btintel_pcie_in_error().

This keeps warning-only boot stage values from being classified as
errors while preserving abort-handler state as the actual error
condition.

Fixes: 190377500fde ("Bluetooth: btintel_pcie: Dump debug registers on error")
Signed-off-by: Kiran K <kiran.k@intel.com>
Signed-off-by: Sai Teja Aluvala <aluvala.sai.teja@intel.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
6 weeks ago Bluetooth: SCO: hold sk properly in sco_conn_ready
Pauli Virtanen [Sat, 18 Apr 2026 15:41:12 +0000 (18:41 +0300)] 
Bluetooth: SCO: hold sk properly in sco_conn_ready

sk deref in sco_conn_ready must be done either under conn->lock, or
holding a refcount, to avoid concurrent close. conn->sk and the parent sk
are currently accessed without either, and without checking
parent->sk_state:

    [Task 1]            [Task 2]
                        sco_sock_release
    sco_conn_ready
      sk = conn->sk
                          lock_sock(sk)
                            conn->sk = NULL
      lock_sock(sk)
                          release_sock(sk)
                          sco_sock_kill(sk)
       UAF on sk deref

and similarly for access to sco_get_sock_listen() return value.

Fix the possible UAF by holding an sk refcount in sco_conn_ready() and
making sco_get_sock_listen() take a refcount as well. Also recheck after
lock_sock that the socket is still valid. Adjust conn->sk locking so it
is also protected by lock_sock() of the associated socket, if any.

Fixes: 27c24fda62b60 ("Bluetooth: switch to lock_sock in SCO")
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
6 weeks ago Bluetooth: L2CAP: Fix null-ptr-deref in l2cap_sock_new_connection_cb()
Siwei Zhang [Wed, 15 Apr 2026 20:49:59 +0000 (16:49 -0400)] 
Bluetooth: L2CAP: Fix null-ptr-deref in l2cap_sock_new_connection_cb()

Add the same NULL guard already present in
l2cap_sock_resume_cb() and l2cap_sock_ready_cb().

Fixes: 80808e431e1e ("Bluetooth: Add l2cap_chan_ops abstraction")
Cc: stable@kernel.org
Signed-off-by: Siwei Zhang <oss@fourdim.xyz>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
6 weeks ago Bluetooth: L2CAP: Fix null-ptr-deref in l2cap_sock_get_sndtimeo_cb()
Siwei Zhang [Wed, 15 Apr 2026 20:53:36 +0000 (16:53 -0400)] 
Bluetooth: L2CAP: Fix null-ptr-deref in l2cap_sock_get_sndtimeo_cb()

Add the same NULL guard already present in
l2cap_sock_resume_cb() and l2cap_sock_ready_cb().

Fixes: 8d836d71e222 ("Bluetooth: Access sk_sndtimeo indirectly in l2cap_core.c")
Cc: stable@kernel.org
Signed-off-by: Siwei Zhang <oss@fourdim.xyz>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
6 weeks ago Bluetooth: L2CAP: Fix null-ptr-deref in l2cap_sock_state_change_cb()
Siwei Zhang [Wed, 15 Apr 2026 20:51:36 +0000 (16:51 -0400)] 
Bluetooth: L2CAP: Fix null-ptr-deref in l2cap_sock_state_change_cb()

Add the same NULL guard already present in
l2cap_sock_resume_cb() and l2cap_sock_ready_cb().

Fixes: 89bc500e41fc ("Bluetooth: Add state tracking to struct l2cap_chan")
Cc: stable@kernel.org
Signed-off-by: Siwei Zhang <oss@fourdim.xyz>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
6 weeks ago Bluetooth: l2cap: defer conn param update to avoid conn->lock/hdev->lock inversion
Mikhail Gavrilov [Tue, 14 Apr 2026 21:52:37 +0000 (02:52 +0500)] 
Bluetooth: l2cap: defer conn param update to avoid conn->lock/hdev->lock inversion

When a BLE peripheral sends an L2CAP Connection Parameter Update Request
the processing path is:

  process_pending_rx()          [takes conn->lock]
    l2cap_le_sig_channel()
      l2cap_conn_param_update_req()
        hci_le_conn_update()    [takes hdev->lock]

Meanwhile other code paths take the locks in the opposite order:

  l2cap_chan_connect()          [takes hdev->lock]
    ...
      mutex_lock(&conn->lock)

  l2cap_conn_ready()            [hdev->lock via hci_cb_list_lock]
    ...
      mutex_lock(&conn->lock)

This is a classic AB/BA deadlock which lockdep reports as a circular
locking dependency when connecting a BLE MIDI keyboard (Carry-On FC-49).

Fix this by making hci_le_conn_update() defer the HCI command through
hci_cmd_sync_queue() so it no longer needs to take hdev->lock in the
caller context.  The sync callback uses __hci_cmd_sync_status_sk() to
wait for the HCI_EV_LE_CONN_UPDATE_COMPLETE event, then updates the
stored connection parameters (hci_conn_params) and notifies userspace
(mgmt_new_conn_param) only after the controller has confirmed the update.

A reference on hci_conn is held via hci_conn_get()/hci_conn_put() for
the lifetime of the queued work to prevent use-after-free, and
hci_conn_valid() is checked before proceeding in case the connection was
removed while the work was pending.  The hci_dev_lock is held across
hci_conn_valid() and all conn field accesses to prevent a concurrent
disconnect from invalidating the connection mid-use.

Fixes: f044eb0524a0 ("Bluetooth: Store latency and supervision timeout in connection params")
Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
6 weeks ago Bluetooth: l2cap: fix MPS check in l2cap_ecred_reconf_req
Dudu Lu [Wed, 15 Apr 2026 10:43:55 +0000 (18:43 +0800)] 
Bluetooth: l2cap: fix MPS check in l2cap_ecred_reconf_req

The L2CAP specification states that if more than one channel is being
reconfigured, the MPS shall not be decreased. The current check has
two issues:

1) The comparison uses >= (greater-than-or-equal), which incorrectly
   rejects reconfiguration requests where the MPS stays the same.
   Since the spec says MPS "shall be greater than or equal to the
   current MPS", only a strict decrease (remote_mps > mps) should be
   rejected. Keeping the same MPS is valid.

2) The multi-channel guard uses `&& i` (loop index) to approximate
   "more than one channel", but this incorrectly allows MPS decrease
   for the first channel (i==0) even when multiple channels are being
   reconfigured. Replace with `&& num_scid > 1` which correctly
   checks whether the request covers more than one channel.
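
The two predicates can be compared side by side in a small sketch. The function and parameter names are hypothetical; cur_mps plays the role of the channel's current remote_mps and req_mps the MPS carried in the reconfigure request.

```c
#include <stdbool.h>
#include <stdint.h>

/* Old check as described above: ">=" combined with the loop index i. */
static bool demo_mps_rejected_old(uint16_t cur_mps, uint16_t req_mps, int i)
{
    return cur_mps >= req_mps && i;
}

/* New check: reject only a strict decrease, and only when the request
 * covers more than one channel. */
static bool demo_mps_rejected_new(uint16_t cur_mps, uint16_t req_mps,
                                  int num_scid)
{
    return cur_mps > req_mps && num_scid > 1;
}
```

For an unchanged MPS of 64 on the second channel (i == 1), the old predicate rejects and the new one accepts. For a decrease from 64 to 32 on the first channel (i == 0) of a two-channel request, the old predicate accepts and the new one rejects.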

Fixes: 7accb1c4321a ("Bluetooth: L2CAP: Fix invalid response to L2CAP_ECRED_RECONF_REQ")
Signed-off-by: Dudu Lu <phx0fer@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>