Jun Miao [Wed, 18 Jun 2025 17:39:23 +0000 (13:39 -0400)]
net: usb: Convert tasklet API to new bottom half workqueue mechanism
Migrate tasklet APIs to the new bottom half workqueue mechanism. It
replaces all occurrences of tasklet usage with the appropriate workqueue
APIs throughout the usbnet driver. This transition ensures compatibility
with the latest design and enhances performance.
Colin Ian King [Wed, 18 Jun 2025 13:54:08 +0000 (14:54 +0100)]
igc: Make the const read-only array supported_sizes static
Don't populate the const read-only array supported_sizes on the
stack at run time, instead make it static.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Vitaly Lifshits <vitaly.lifshits@intel.com>> Link: https://patch.msgid.link/20250618135408.1784120-1-colin.i.king@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
convert lan78xx driver to the PHYLINK
This series converts the lan78xx driver to use the PHYLINK framework,
which enhances PHY and MAC management. The changes also streamline the
driver by removing unused elements and improving link status reporting.
This is the final part of the previously split conversion series:
https://lore.kernel.org/all/20250428130542.3879769-1-o.rempel@pengutronix.de/
Oleksij Rempel [Wed, 18 Jun 2025 12:26:01 +0000 (14:26 +0200)]
net: usb: lan78xx: Integrate EEE support with phylink LPI API
Refactor Energy-Efficient Ethernet (EEE) support in the LAN78xx driver to
fully integrate with the phylink Low Power Idle (LPI) API. This includes:
- Replacing direct calls to `phy_ethtool_get_eee` and `phy_ethtool_set_eee`
with `phylink_ethtool_get_eee` and `phylink_ethtool_set_eee`.
- Implementing `.mac_enable_tx_lpi` and `.mac_disable_tx_lpi` to control
LPI transitions via phylink.
- Configuring `lpi_timer_default` to align with recommended values from
LAN7800 documentation.
- ensure EEE is disabled on controller reset
Oleksij Rempel [Wed, 18 Jun 2025 12:26:00 +0000 (14:26 +0200)]
net: usb: lan78xx: port link settings to phylink API
Refactor lan78xx_get_link_ksettings and lan78xx_set_link_ksettings to
use the phylink API (phylink_ethtool_ksettings_get and
phylink_ethtool_ksettings_set) instead of directly interfacing with the
PHY. This change simplifies the code and ensures better integration with
the phylink framework for link management.
Additionally, the explicit calls to usb_autopm_get_interface() and
usb_autopm_put_interface() have been removed. These were originally
needed to manage USB power management during register accesses. However,
lan78xx_mdiobus_read() and lan78xx_mdiobus_write() already handle USB
auto power management internally, ensuring that the interface remains
active when necessary. Since there are no other direct register accesses
in these functions that require explicit power management handling, the
extra calls have become redundant and are no longer needed.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/20250618122602.3156678-5-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Oleksij Rempel [Wed, 18 Jun 2025 12:25:59 +0000 (14:25 +0200)]
net: usb: lan78xx: Use ethtool_op_get_link to reflect current link status
Replace the custom lan78xx_get_link implementation with the standard
ethtool_op_get_link helper, which uses netif_carrier_ok to reflect
the current link status accurately.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/20250618122602.3156678-4-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Oleksij Rempel [Wed, 18 Jun 2025 12:25:58 +0000 (14:25 +0200)]
net: usb: lan78xx: Rename EVENT_LINK_RESET to EVENT_PHY_INT_ACK
The EVENT_LINK_RESET macro currently triggers deferred work after a PHY
interrupt. Prior to PHYLINK conversion, this work included reconfiguring
the MAC and PHY, effectively performing a 'link reset'.
However, after porting the driver to the PHYLINK framework, the logic
associated with this event now solely handles the acknowledgment of
the PHY interrupt. The MAC and PHY reconfiguration is now managed by
PHYLINK's dedicated callbacks.
To accurately reflect its current, narrowed functionality, rename
EVENT_LINK_RESET to EVENT_PHY_INT_ACK.
Oleksij Rempel [Wed, 18 Jun 2025 12:25:57 +0000 (14:25 +0200)]
net: usb: lan78xx: Convert to PHYLINK for improved PHY and MAC management
Convert the LAN78xx USB Ethernet driver to use the PHYLINK framework for
managing PHY and MAC interactions. This improves consistency with other
network drivers, simplifies pause frame handling, and enables cleaner
suspend/resume support.
Key changes:
- Replace all PHYLIB-based logic with PHYLINK equivalents:
- Replace phy_connect()/phy_disconnect() with phylink_connect_phy()
- Replace phy_start()/phy_stop() with phylink_start()/phylink_stop()
- Replace pauseparam handling with phylink_ethtool_get/set_pauseparam()
- Introduce lan78xx_phylink_setup() to configure PHYLINK
- Add phylink MAC operations:
- lan78xx_mac_config()
- lan78xx_mac_link_up()
- lan78xx_mac_link_down()
- Remove legacy link state handling:
- lan78xx_link_status_change()
- lan78xx_link_reset()
- Handle fixed-link fallback for LAN7801 using phylink_set_fixed_link()
- Replace deprecated flow control handling with phylink-managed logic
Power management:
- Switch suspend/resume paths to use phylink_suspend()/phylink_resume()
- Ensure proper use of rtnl_lock() where required
- Note: full runtime testing of power management is currently limited
due to hardware setup constraints
Note: Conversion of EEE (Energy Efficient Ethernet) handling to the
PHYLINK-managed API will be done in a follow-up patch. For now, the
legacy EEE enable logic is preserved in mac_link_up().
Matti Vaittinen [Wed, 18 Jun 2025 12:22:02 +0000 (15:22 +0300)]
net: gianfar: Use device_get_named_child_node_count()
We can avoid open-coding the loop construct which counts firmware child
nodes with a specific name by using the newly added
device_get_named_child_node_count().
The gianfar driver has such open-coded loop. Replace it with the
device_get_child_node_count_named().
net: fec: fec_enet_rx_queue(): move_call to _vlan_hwaccel_put_tag()
Move __vlan_hwaccel_put_tag() into the if statement that sets
vlan_packet_rcvd = true. This change eliminates the unnecessary
vlan_packet_rcvd variable, simplifying the code and improving clarity.
net: fec: fec_enet_rx_queue(): replace manual VLAN header calculation with skb_vlan_eth_hdr()
For better readability and maintainability, use the provided helper function
skb_vlan_eth_hdr() to replace manual the VLAN header calculation, and change
the type of vlan_header to struct vlan_ethhdr to take into account that the
Ethernet header plus VLAN header is returned.
In da722186f654 ("net: fec: set GPR bit on suspend by DT
configuration.") the platform_device_id fec_devtype::driver_data was
converted from holding the quirks to a pointing to struct fec_devinfo.
The struct fec_devinfo holding the information for the i.MX6SX was
named fec_imx6x_info.
Rename fec_imx6x_info to fec_imx6sx_info to align with the SoC's name.
In commit 4d494cdc92b3 ("net: fec: change data structure to support
multiqueue") the data structures were changed, so that the comment about
the sent-in-place skb doesn't apply any more. Remove it.
A couple of patches to cleanup loongson1. First, introducing a match
data struct to allow the per-match data to be extended beyond the init
function pointer, and then adding a setup method to allow the resource
base address to be translated to the MAC index at probe time rather
than repeatedly in the setup function.
====================
net: stmmac: loongson1: get ls1b resource only once
ls1b_dwmac_syscon_init() was getting the stmmac iomem resource to detect
which GMAC block is being used. Move this to a separate setup() function
that only runs at probe time, so it can sensibly behave with an
unrecognised resource adress. Use this to set a MAC index (id) which is
then used in place of testing the base address.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Keguang Zhang <keguang.zhang@gmail.com> Tested-by: Keguang Zhang <keguang.zhang@gmail.com> # on LS1B & LS1C Link: https://patch.msgid.link/E1uRqEE-004c7M-Go@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Provide a structure for match data rather than using the function
pointer as match data. This allows stronger type-checking for the
function itself, and allows extensions to the match data.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Keguang Zhang <keguang.zhang@gmail.com> Tested-by: Keguang Zhang <keguang.zhang@gmail.com> # on LS1B & LS1C Link: https://patch.msgid.link/E1uRqE9-004c7G-CB@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 19 Jun 2025 17:21:32 +0000 (10:21 -0700)]
Merge tag 'net-6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from wireless.
The ath12k fix to avoid FW crashes requires adding support for a
number of new FW commands so it's quite large in terms of LoC. The
rest is relatively small.
Current release - fix to a fix:
- ptp: fix breakage after ptp_vclock_in_use() rework
Current release - regressions:
- openvswitch: allocate struct ovs_pcpu_storage dynamically, static
allocation may exhaust module loader limit on smaller systems
Previous releases - regressions:
- tcp: fix tcp_packet_delayed() for peers with no selective ACK
support
Previous releases - always broken:
- wifi: ath12k: don't activate more links than firmware supports
- tcp: make sure sockets open via passive TFO have valid NAPI ID
- eth: bnxt_en: update MRU and RSS table of RSS contexts on queue
reset, prevent Rx queues from silently hanging after queue reset
- NFC: uart: set tty->disc_data only in success path"
* tag 'net-6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (59 commits)
net: airoha: Differentiate hwfd buffer size for QDMA0 and QDMA1
net: airoha: Compute number of descriptors according to reserved memory size
tools: ynl: fix mixing ops and notifications on one socket
net: atm: fix /proc/net/atm/lec handling
net: atm: add lec_mutex
mlxbf_gige: return EPROBE_DEFER if PHY IRQ is not available
net: airoha: Always check return value from airoha_ppe_foe_get_entry()
NFC: nci: uart: Set tty->disc_data only in success path
calipso: Fix null-ptr-deref in calipso_req_{set,del}attr().
MAINTAINERS: Remove Shannon Nelson from MAINTAINERS file
net: lan743x: fix potential out-of-bounds write in lan743x_ptp_io_event_clock_get()
eth: fbnic: avoid double free when failing to DMA-map FW msg
tcp: fix passive TFO socket having invalid NAPI ID
selftests: net: add test for passive TFO socket NAPI ID
selftests: net: add passive TFO test binary
selftests: netdevsim: improve lib.sh include in peer.sh
tipc: fix null-ptr-deref when acquiring remote ip of ethernet bearer
Octeontx2-pf: Fix Backpresure configuration
net: ftgmac100: select FIXED_PHY
net: ethtool: remove duplicate defines for family info
...
Lorenzo Bianconi [Thu, 19 Jun 2025 07:07:25 +0000 (09:07 +0200)]
net: airoha: Differentiate hwfd buffer size for QDMA0 and QDMA1
EN7581 SoC allows configuring the size and the number of buffers in
hwfd payload queue for both QDMA0 and QDMA1.
In order to reduce the required DRAM used for hwfd buffers queues and
decrease the memory footprint, differentiate hwfd buffer size for QDMA0
and QDMA1 and reduce hwfd buffer size to 1KB for QDMA1 (WAN) while
maintaining 2KB for QDMA0 (LAN).
Lorenzo Bianconi [Thu, 19 Jun 2025 07:07:24 +0000 (09:07 +0200)]
net: airoha: Compute number of descriptors according to reserved memory size
In order to not exceed the reserved memory size for hwfd buffers,
compute the number of hwfd buffers/descriptors according to the
reserved memory size and the size of each hwfd buffer (2KB).
Fixes: 3a1ce9e3d01b ("net: airoha: Add the capability to allocate hwfd buffers via reserved-memory") Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20250619-airoha-hw-num-desc-v4-1-49600a9b319a@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 19 Jun 2025 15:38:40 +0000 (08:38 -0700)]
Merge tag 'wireless-2025-06-18' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Johannes Berg says:
====================
More fixes:
- ath12k
- avoid busy-waiting
- activate correct number of links
- iwlwifi
- iwldvm regression (lots of warnings)
- iwlmld merge damage regression (crash)
- fix build with some old gcc versions
- carl9170: don't talk to device w/o FW [syzbot]
- ath6kl: remove bad FW WARN [syzbot]
- ieee80211: use variable-length arrays [syzbot]
- mac80211
- remove WARN on delayed beacon update [syzbot]
- drop OCB frames with invalid source [syzbot]
* tag 'wireless-2025-06-18' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: iwlwifi: Fix incorrect logic on cmd_ver range checking
wifi: iwlwifi: dvm: restore n_no_reclaim_cmds setting
wifi: iwlwifi: cfg: Limit cb_size to valid range
wifi: iwlwifi: restore missing initialization of async_handlers_list (again)
wifi: ath6kl: remove WARN on bad firmware input
wifi: carl9170: do not ping device which has failed to load firmware
wifi: ath12k: don't wait when there is no vdev started
wifi: ath12k: don't use static variables in ath12k_wmi_fw_stats_process()
wifi: ath12k: avoid burning CPU while waiting for firmware stats
wifi: ath12k: fix documentation on firmware stats
wifi: ath12k: don't activate more links than firmware supports
wifi: ath12k: update link active in case two links fall on the same MAC
wifi: ath12k: support WMI_MLO_LINK_SET_ACTIVE_CMDID command
wifi: ath12k: update freq range for each hardware mode
wifi: ath12k: parse and save sbs_lower_band_end_freq from WMI_SERVICE_READY_EXT2_EVENTID event
wifi: ath12k: parse and save hardware mode info from WMI_SERVICE_READY_EXT_EVENTID event for later use
wifi: ath12k: Avoid CPU busy-wait by handling VDEV_STAT and BCN_STAT
wifi: mac80211: don't WARN for late channel/color switch
wifi: mac80211: drop invalid source address OCB frames
wifi: remove zero-length arrays
====================
Jakub Kicinski [Wed, 18 Jun 2025 17:17:46 +0000 (10:17 -0700)]
tools: ynl: fix mixing ops and notifications on one socket
The multi message support loosened the connection between the request
and response handling, as we can now submit multiple requests before
we start processing responses. Passing the attr set to NlMsgs decoding
no longer makes sense (if it ever did), attr set may differ message
by messsage. Isolate the part of decoding responsible for attr-set
specific interpretation and call it once we identified the correct op.
Without this fix performing SET operation on an ethtool socket, while
being subscribed to notifications causes:
# File "tools/net/ynl/pyynl/lib/ynl.py", line 1096, in _op
# Exception| return self._ops(ops)[0]
# Exception| ~~~~~~~~~^^^^^
# File "tools/net/ynl/pyynl/lib/ynl.py", line 1040, in _ops
# Exception| nms = NlMsgs(reply, attr_space=op.attr_set)
# Exception| ^^^^^^^^^^^
The value of op we use on line 1040 is stale, it comes form the previous
loop. If a notification comes before a response we will update op to None
and the next iteration thru the loop will break with the trace above.
Fixes: 6fda63c45fe8 ("tools/net/ynl: fix cli.py --subscribe feature") Fixes: ba8be00f68f5 ("tools/net/ynl: Add multi message support to ynl") Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250618171746.1201403-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Wed, 18 Jun 2025 14:08:43 +0000 (14:08 +0000)]
net: atm: add lec_mutex
syzbot found its way in net/atm/lec.c, and found an error path
in lecd_attach() could leave a dangling pointer in dev_lec[].
Add a mutex to protect dev_lecp[] uses from lecd_attach(),
lec_vcc_attach() and lec_mcast_attach().
Following patch will use this mutex for /proc/net/atm/lec.
BUG: KASAN: slab-use-after-free in lecd_attach net/atm/lec.c:751 [inline]
BUG: KASAN: slab-use-after-free in lane_ioctl+0x2224/0x23e0 net/atm/lec.c:1008
Read of size 8 at addr ffff88807c7b8e68 by task syz.1.17/6142
David Thompson [Wed, 18 Jun 2025 13:59:02 +0000 (13:59 +0000)]
mlxbf_gige: return EPROBE_DEFER if PHY IRQ is not available
The message "Error getting PHY irq. Use polling instead"
is emitted when the mlxbf_gige driver is loaded by the
kernel before the associated gpio-mlxbf driver, and thus
the call to get the PHY IRQ fails since it is not yet
available. The driver probe() must return -EPROBE_DEFER
if acpi_dev_gpio_irq_get_by() returns the same.
Fixes: 6c2a6ddca763 ("net: mellanox: mlxbf_gige: Replace non-standard interrupt handling") Signed-off-by: David Thompson <davthompson@nvidia.com> Reviewed-by: Asmaa Mnebhi <asmaa@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250618135902.346-1-davthompson@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
NFC: nci: uart: Set tty->disc_data only in success path
Setting tty->disc_data before opening the NCI device means we need to
clean it up on error paths. This also opens some short window if device
starts sending data, even before NCIUARTSETDRIVER IOCTL succeeded
(broken hardware?). Close the window by exposing tty->disc_data only on
the success path, when opening of the NCI device and try_module_get()
succeeds.
The code differs in error path in one aspect: tty->disc_data won't be
ever assigned thus NULL-ified. This however should not be relevant
difference, because of "tty->disc_data=NULL" in nci_uart_tty_open().
calipso: Fix null-ptr-deref in calipso_req_{set,del}attr().
syzkaller reported a null-ptr-deref in sock_omalloc() while allocating
a CALIPSO option. [0]
The NULL is of struct sock, which was fetched by sk_to_full_sk() in
calipso_req_setattr().
Since commit a1a5344ddbe8 ("tcp: avoid two atomic ops for syncookies"),
reqsk->rsk_listener could be NULL when SYN Cookie is returned to its
client, as hinted by the leading SYN Cookie log.
Here are 3 options to fix the bug:
1) Return 0 in calipso_req_setattr()
2) Return an error in calipso_req_setattr()
3) Alaways set rsk_listener
1) is no go as it bypasses LSM, but 2) effectively disables SYN Cookie
for CALIPSO. 3) is also no go as there have been many efforts to reduce
atomic ops and make TCP robust against DDoS. See also commit 3b24d854cb35
("tcp/dccp: do not touch listener sk_refcnt under synflood").
As of the blamed commit, SYN Cookie already did not need refcounting,
and no one has stumbled on the bug for 9 years, so no CALIPSO user will
care about SYN Cookie.
Let's return an error in calipso_req_setattr() and calipso_req_delattr()
in the SYN Cookie case.
This can be reproduced by [1] on Fedora and now connect() of nc times out.
Fixes: e1adea927080 ("calipso: Allow request sockets to be relabelled by the lsm.") Reported-by: syzkaller <syzkaller@googlegroups.com> Reported-by: John Cheung <john.cs.hey@gmail.com> Closes: https://lore.kernel.org/netdev/CAP=Rh=MvfhrGADy+-WJiftV2_WzMH4VEhEFmeT28qY+4yxNu4w@mail.gmail.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Acked-by: Paul Moore <paul@paul-moore.com> Link: https://patch.msgid.link/20250617224125.17299-1-kuni1840@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shannon Nelson [Mon, 16 Jun 2025 22:44:37 +0000 (15:44 -0700)]
MAINTAINERS: Remove Shannon Nelson from MAINTAINERS file
Brett Creeley is taking ownership of AMD/Pensando drivers while I wander
off into the sunset with my retirement this month. I'll still keep an
eye out on a few topics for awhile, and maybe do some free-lance work in
the future.
Meanwhile, thank you all for the fun and support and the many learning
opportunities :-).
Special thanks go to DaveM for merging my first patch long ago, the big
ionic patchset a few years ago, and my last patchset last week.
All 10 patches are by Geert Uytterhoeven, target the rcar_canfd
driver, first cleanup/refactor the driver and then add support for
Transceiver Delay Compensation.
* tag 'linux-can-next-for-6.17-20250618' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next:
can: rcar_canfd: Add support for Transceiver Delay Compensation
can: rcar_canfd: Return early in rcar_canfd_set_bittiming() when not FD
can: rcar_canfd: Share config code in rcar_canfd_set_bittiming()
can: rcar_canfd: Rename rcar_canfd_setrnc() to rcar_canfd_set_rnc()
can: rcar_canfd: Repurpose f_dcfg base for other registers
can: rcar_canfd: Simplify data access in rcar_canfd_{ge,pu}t_data()
can: rcar_canfd: Add helper variable dev to rcar_canfd_reset_controller()
can: rcar_canfd: Add helper variable ndev to rcar_canfd_rx_pkt()
can: rcar_canfd: Remove bittiming debug prints
can: rcar_canfd: Consistently use ndev for net_device pointers
====================
Alexey Kodanev [Mon, 16 Jun 2025 11:37:43 +0000 (11:37 +0000)]
net: lan743x: fix potential out-of-bounds write in lan743x_ptp_io_event_clock_get()
Before calling lan743x_ptp_io_event_clock_get(), the 'channel' value
is checked against the maximum value of PCI11X1X_PTP_IO_MAX_CHANNELS(8).
This seems correct and aligns with the PTP interrupt status register
(PTP_INT_STS) specifications.
However, lan743x_ptp_io_event_clock_get() writes to ptp->extts[] with
only LAN743X_PTP_N_EXTTS(4) elements, using channel as an index:
====================
selftests: net: use slowwait to make sure setup finished
The two updated tests sometimes failed because the network setup hadn't
completed. Used slowwait to ensure the setup finished and the tests
always passed. I ran both tests 50 times, and all of them passed.
====================
Hangbin Liu [Tue, 17 Jun 2025 10:50:59 +0000 (10:50 +0000)]
selftests: net: use slowwait to stabilize vrf_route_leaking test
The vrf_route_leaking test occasionally fails due to connectivity issues
in our testing environment. A sample failure message shows that the ping
check fails intermittently
PING 2001:db8:16:2::2 (2001:db8:16:2::2) 56 data bytes
This is likely due to insufficient wait time on slower machines. To address
this, switch to using slowwait, which provides a longer and more reliable
wait for setup completion.
Before this change, the test failed 3 out of 10 times. After applying this
fix, the test was run 30 times without any failure.
====================
Support bandwidth clamping in mana using net shapers
This patchset introduces hardware-backed bandwidth rate limiting
for MANA NICs via the net_shaper_ops interface, enabling efficient and
fine-grained traffic shaping directly on the device.
Previously, MANA lacked a mechanism for user-configurable bandwidth
control. With this addition, users can now configure shaping parameters,
allowing better traffic management and performance isolation.
The implementation includes the net_shaper_ops callbacks in the MANA
driver and supports one shaper per vport. Add shaping support via
mana_set_bw_clamp(), allowing the configuration of bandwidth rates
in 100 Mbps increments (minimum 100 Mbps). The driver validates input
and rejects unsupported values. On failure, it restores the previous
configuration which is queried using mana_query_link_cfg() or
retains the current state.
To prevent potential deadlocks introduced by net_shaper_ops, switch to
_locked variants of NAPI APIs when netdevops_lock is held during
VF setup and teardown.
Also, Add support for ethtool get_link_ksettings to report the maximum
link speed supported by the SKU in mbps.
These APIs when invoked on hardware that are older or that do
not support these APIs, the speed would be reported as UNKNOWN and
the net-shaper calls to set speed would fail.
====================
If any of the HWC commands are not recognized by the
underlying hardware, the hardware returns the response
header status of -1. Log the information using
netdev_info_once to avoid multiple error logs in dmesg.
Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Reviewed-by: Saurabh Singh Sengar <ssengar@linux.microsoft.com> Reviewed-by: Dipayaan Roy <dipayanroy@linux.microsoft.com> Link: https://patch.msgid.link/1750144656-2021-5-git-send-email-ernis@linux.microsoft.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
net: mana: Add speed support in mana_get_link_ksettings
Allow mana ethtool get_link_ksettings operation to report
the maximum speed supported by the SKU in mbps.
The driver retrieves this information by issuing a
HWC command to the hardware via mana_query_link_cfg(),
which retrieves the SKU's maximum supported speed.
These APIs when invoked on hardware that are older/do
not support these APIs, the speed would be reported as UNKNOWN.
Before:
$ethtool enP30832s1
> Settings for enP30832s1:
Supported ports: [ ]
Supported link modes: Not reported
Supported pause frame use: No
Supports auto-negotiation: No
Supported FEC modes: Not reported
Advertised link modes: Not reported
Advertised pause frame use: No
Advertised auto-negotiation: No
Advertised FEC modes: Not reported
Speed: Unknown!
Duplex: Full
Auto-negotiation: off
Port: Other
PHYAD: 0
Transceiver: internal
Link detected: yes
After:
$ethtool enP30832s1
> Settings for enP30832s1:
Supported ports: [ ]
Supported link modes: Not reported
Supported pause frame use: No
Supports auto-negotiation: No
Supported FEC modes: Not reported
Advertised link modes: Not reported
Advertised pause frame use: No
Advertised auto-negotiation: No
Advertised FEC modes: Not reported
Speed: 16000Mb/s
Duplex: Full
Auto-negotiation: off
Port: Other
PHYAD: 0
Transceiver: internal
Link detected: yes
Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Reviewed-by: Saurabh Singh Sengar <ssengar@linux.microsoft.com> Reviewed-by: Long Li <longli@microsoft.com> Link: https://patch.msgid.link/1750144656-2021-4-git-send-email-ernis@linux.microsoft.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Introduce support for net_shaper_ops in the MANA driver,
enabling configuration of rate limiting on the MANA NIC.
To apply rate limiting, the driver issues a HWC command via
mana_set_bw_clamp() and updates the corresponding shaper object
in the net_shaper cache. If an error occurs during this process,
the driver restores the previous speed by querying the current link
configuration using mana_query_link_cfg().
The minimum supported bandwidth is 100 Mbps, and only values that are
exact multiples of 100 Mbps are allowed. Any other values are rejected.
To remove a shaper, the driver resets the bandwidth to the maximum
supported by the SKU using mana_set_bw_clamp() and clears the
associated cache entry. If an error occurs during this process,
the shaper details are retained.
On the hardware that does not support these APIs, the net-shaper
calls to set speed would fail.
Set the speed:
./tools/net/ynl/pyynl/cli.py \
--spec Documentation/netlink/specs/net_shaper.yaml \
--do set --json '{"ifindex":'$IFINDEX',
"handle":{"scope": "netdev", "id":'$ID' },
"bw-max": 200000000 }'
Get the shaper details:
./tools/net/ynl/pyynl/cli.py \
--spec Documentation/netlink/specs/net_shaper.yaml \
--do get --json '{"ifindex":'$IFINDEX',
"handle":{"scope": "netdev", "id":'$ID' }}'
net: mana: Fix potential deadlocks in mana napi ops
When net_shaper_ops are enabled for MANA, netdev_ops_lock
becomes active.
MANA VF setup/teardown by netvsc follows this call chain:
netvsc_vf_setup()
dev_change_flags()
...
__dev_open() OR __dev_close()
dev_change_flags() holds the netdev mutex via netdev_lock_ops.
Meanwhile, mana_create_txq() and mana_create_rxq() in mana_open()
path call NAPI APIs (netif_napi_add_tx(), netif_napi_add_weight(),
napi_enable()), which also try to acquire the same lock, risking
deadlock.
Similarly in the teardown path (mana_close()), netif_napi_disable()
and netif_napi_del(), contend for the same lock.
Switch to the _locked variants of these APIs to avoid deadlocks
when the netdev_ops_lock is held.
Fixes: d4c22ec680c8 ("net: hold netdev instance lock during ndo_open/ndo_stop") Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Reviewed-by: Saurabh Singh Sengar <ssengar@linux.microsoft.com> Reviewed-by: Long Li <longli@microsoft.com> Link: https://patch.msgid.link/1750144656-2021-2-git-send-email-ernis@linux.microsoft.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Guillaume Nault [Mon, 16 Jun 2025 11:58:29 +0000 (13:58 +0200)]
ipv6: Simplify link-local address generation for IPv6 GRE.
Since commit 3e6a0243ff00 ("gre: Fix again IPv6 link-local address
generation."), addrconf_gre_config() has stopped handling IP6GRE
devices specially and just calls the regular addrconf_addr_gen()
function to create their link-local IPv6 addresses.
We can thus avoid using addrconf_gre_config() for IP6GRE devices and
use the normal IPv6 initialisation path instead (that is, jump directly
to addrconf_dev_config() in addrconf_init_auto_addrs()).
See commit 3e6a0243ff00 ("gre: Fix again IPv6 link-local address
generation.") for a deeper explanation on how and why GRE devices
started handling their IPv6 link-local address generation specially,
why it was a problem, and why this is not even necessary in most cases
(especially for GRE over IPv6).
====================
Add support for PSE budget evaluation strategy
This series brings support for budget evaluation strategy in the PSE
subsystem. PSE controllers can set priorities to decide which ports should
be turned off in case of special events like over-current.
This patch series adds support for two budget evaluation strategy.
1. Static Method:
This method involves distributing power based on PD classification.
It’s straightforward and stable, the PSE core keeping track of the
budget and subtracting the power requested by each PD’s class.
Advantages: Every PD gets its promised power at any time, which
guarantees reliability.
Disadvantages: PD classification steps are large, meaning devices
request much more power than they actually need. As a result, the power
supply may only operate at, say, 50% capacity, which is inefficient and
wastes money.
2. Dynamic Method:
To address the inefficiencies of the static method, vendors like
Microchip have introduced dynamic power budgeting, as seen in the
PD692x0 firmware. This method monitors the current consumption per port
and subtracts it from the available power budget. When the budget is
exceeded, lower-priority ports are shut down.
Advantages: This method optimizes resource utilization, saving costs.
Disadvantages: Low-priority devices may experience instability.
The UAPI allows adding support for software port priority mode managed from
userspace later if needed.
Patches 1-2: Add support for interrupt event report in PSE core, ethtool
and ethtool specs.
Patch 3: Adds support for interrupt and event report in TPS23881 driver.
Patches 4,5: Add support for PSE power domain in PSE core and ethtool.
Patches 6-8: Add support for budget evaluation strategy in PSE core,
ethtool and ethtool specs.
Patches 9-11: Add support for port priority and power supplies in PD692x0
drivers.
Patches 12,13: Add support for port priority in TPS23881 drivers.
====================
Add an interrupt property to the device tree bindings for the TI TPS23881
PSE controller. The interrupt is primarily used to detect classification
and disconnection events, which are essential for managing the PSE
controller in compliance with the PoE standard.
Interrupt support is essential for the proper functioning of the TPS23881
controller. Without it, after a power-on (PWON), the controller will
no longer perform detection and classification. This could lead to
potential hazards, such as connecting a non-PoE device after a PoE device,
which might result in magic smoke.
net: pse-pd: tps23881: Add support for static port priority feature
This patch enhances PSE callbacks by introducing support for the static
port priority feature. It extends interrupt management to handle and report
detection, classification, and disconnection events. Additionally, it
introduces the pi_get_pw_req() callback, which provides information about
the power requested by the Powered Devices.
Interrupt support is essential for the proper functioning of the TPS23881
controller. Without it, after a power-on (PWON), the controller will
no longer perform detection and classification. This could lead to
potential hazards, such as connecting a non-PoE device after a PoE device,
which might result in magic smoke.
Adds the regulator supply parameter of the managers.
Update also the example as the regulator supply of the PSE PIs
should be the managers itself and not an external regulator.
net: pse-pd: pd692x0: Add support for controller and manager power supplies
Add support for managing the VDD and VDDA power supplies for the PD692x0
PSE controller, as well as the VAUX5 and VAUX3P3 power supplies for the
PD6920x PSE managers.
net: pse-pd: pd692x0: Add support for PSE PI priority feature
This patch extends the PSE callbacks by adding support for the newly
introduced pi_set_prio() callback, enabling the configuration of PSE PI
priorities. The current port priority is now also included in the status
information returned to users.
net: ethtool: Add PSE port priority support feature
This patch expands the status information provided by ethtool for PSE c33
with current port priority and max port priority. It also adds a call to
pse_ethtool_set_prio() to configure the PSE port priority.
net: pse-pd: Add support for budget evaluation strategies
This patch introduces the ability to configure the PSE PI budget evaluation
strategies. Budget evaluation strategies is utilized by PSE controllers to
determine which ports to turn off first in scenarios such as power budget
exceedance.
The pis_prio_max value is used to define the maximum priority level
supported by the controller. Both the current priority and the maximum
priority are exposed to the user through the pse_ethtool_get_status call.
This patch add support for two mode of budget evaluation strategies.
1. Static Method:
This method involves distributing power based on PD classification.
It’s straightforward and stable, the PSE core keeping track of the
budget and subtracting the power requested by each PD’s class.
Advantages: Every PD gets its promised power at any time, which
guarantees reliability.
Disadvantages: PD classification steps are large, meaning devices
request much more power than they actually need. As a result, the power
supply may only operate at, say, 50% capacity, which is inefficient and
wastes money.
Priority max value is matching the number of PSE PIs within the PSE.
2. Dynamic Method:
To address the inefficiencies of the static method, vendors like
Microchip have introduced dynamic power budgeting, as seen in the
PD692x0 firmware. This method monitors the current consumption per port
and subtracts it from the available power budget. When the budget is
exceeded, lower-priority ports are shut down.
Advantages: This method optimizes resource utilization, saving costs.
Disadvantages: Low-priority devices may experience instability.
Priority max value is set by the PSE controller driver.
For now, budget evaluation methods are not configurable and cannot be
mixed. They are hardcoded in the PSE driver itself, as no current PSE
controller supports both methods.
Introduce PSE power domain support as groundwork for upcoming port
priority features. Multiple PSE PIs can now be grouped under a single
PSE power domain, enabling future enhancements like defining available
power budgets, port priority modes, and disconnection policies. This
setup will allow the system to assess whether activating a port would
exceed the available power budget, preventing over-budget states
proactively.
net: pse-pd: tps23881: Add support for PSE events and interrupts
Add support for PSE event reporting through interrupts. Set up the newly
introduced devm_pse_irq_helper helper to register the interrupt. Events are
reported for over-current and over-temperature conditions.
Add support for devm_pse_irq_helper() to register PSE interrupts and report
events such as over-current or over-temperature conditions. This follows a
similar approach to the regulator API but also sends notifications using a
dedicated PSE ethtool netlink socket.
net: pse-pd: Introduce attached_phydev to pse control
In preparation for reporting PSE events via ethtool notifications,
introduce an attached_phydev field in the pse_control structure.
This field stores the phy_device associated with the PSE PI,
ensuring that notifications are sent to the correct network
interface.
The attached_phydev pointer is directly tied to the PHY lifecycle. It
is set when the PHY is registered and cleared when the PHY is removed.
There is no need to use a refcount, as doing so could interfere with
the PHY removal process.
Jakub Kicinski [Thu, 19 Jun 2025 01:57:32 +0000 (18:57 -0700)]
Merge branch 'phc-support-in-ena-driver'
David Arinzon says:
====================
PHC support in ENA driver
This patchset adds the support for PHC (PTP Hardware Clock)
in the ENA driver. The documentation part of the patchset
includes additional information, including statistics,
utilization and invocation examples through the testptp
utility.
====================
David Arinzon [Tue, 17 Jun 2025 11:05:42 +0000 (14:05 +0300)]
net: ena: Control PHC enable through devlink
Add the capability to set parameters through the devlink framework.
The parameter used for controlling PHC (enable/disable) details
are as follows:
- Name: enable_phc
- Type: Boolean (true - enable/false - disable)
- Mode: DEVLINK_PARAM_CMODE_DRIVERINIT
- Effect: Changes take place during driver initialization,
any changes require a devlink reload to take effect.
David Arinzon [Tue, 17 Jun 2025 11:05:39 +0000 (14:05 +0300)]
net: ena: Add device reload capability through devlink
Adding basic devlink capability support of reloading the driver.
This capability is required to support driver init type
devlink params (DEVLINK_PARAM_CMODE_DRIVERINIT). Such params
require reloading of the driver (destroy/restore sequence).
The reloading is done by the devlink framework using the
hooks provided by the driver.
David Arinzon [Tue, 17 Jun 2025 11:05:38 +0000 (14:05 +0300)]
net: ena: PHC silent reset
Each PHC device kernel registration receives a unique kernel index,
which is associated with a new PHC device file located at
"/dev/ptp<index>".
This device file serves as an interface for obtaining PHC timestamps.
Examples of tools that use "/dev/ptp" include testptp [1]
and chrony [2].
A reset flow may occur in the ENA driver while PHC is active.
During a reset, the driver will unregister and then re-register the
PHC device with the kernel.
Under race conditions, particularly during heavy PHC loads,
the driver’s reset flow might complete faster than the kernel’s PHC
unregister/register process.
This can result in the PHC index being different from what it was prior
to the reset, as the PHC index is selected using kernel ID
allocation [3].
While driver rmmod/insmod are done by the user, a reset may occur
at anytime, without the user awareness, consequently, the driver
might receive a new PHC index after the reset, potentially compromising
the user experience.
To prevent this issue, the PHC flow will detect the reset during PHC
destruction and will skip the PHC unregister/register calls to preserve
the kernel PHC index.
During the reset flow, any attempt to get a PHC timestamp will fail as
expected, but the kernel PHC index will remain unchanged.
David Arinzon [Tue, 17 Jun 2025 11:05:37 +0000 (14:05 +0300)]
net: ena: Add PHC support in the ENA driver
The ENA driver will be extended to support the new PHC feature using
ptp_clock interface [1]. this will provide timestamp reference for user
space to allow measuring time offset between the PHC and the system
clock in order to achieve nanosecond accuracy.
Recently bnxt had to grow back a bunch of rtnl dependencies because
of udp_tunnel's infra. Add separate (global) mutext to protect
udp_tunnel state.
====================
udp_tunnel infra doesn't need RTNL, should be safe to get back
to only netdev instance lock.
Cc: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com> Link: https://patch.msgid.link/20250616162117.287806-7-stfomichev@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that there is only one path in udp_tunnel, there is no need to
have udp_ports_sleep knob. Remove it and adjust the test.
Cc: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com> Link: https://patch.msgid.link/20250616162117.287806-6-stfomichev@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: remove redundant ASSERT_RTNL() in queue setup functions
The existing netdev_ops_assert_locked() already asserts that either
the RTNL lock or the per-device lock is held, making the explicit
ASSERT_RTNL() redundant.
Drivers that are using ops lock and don't depend on RTNL lock
still need to manage it because udp_tunnel's RTNL dependency.
Introduce new udp_tunnel_nic_lock and use it instead of
rtnl_lock. Drop non-UDP_TUNNEL_NIC_INFO_MAY_SLEEP mode from
udp_tunnel infra (udp_tunnel_nic_device_sync_work needs to
grab udp_tunnel_nic_lock mutex and might sleep).
We won't be able to sleep soon in vxlan_offload_rx_ports and won't be
able to grab sock_lock. Instead of having separate spinlock to
manage sockets, rely on rtnl lock. This is similar to how geneve
manages its sockets.
Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20250616162117.287806-3-stfomichev@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
geneve: rely on rtnl lock in geneve_offload_rx_ports
udp_tunnel_push_rx_port will grab mutex in the next patch so
we can't use rcu. geneve_offload_rx_ports is called
from geneve_netdevice_event for NETDEV_UDP_TUNNEL_PUSH_INFO and
NETDEV_UDP_TUNNEL_DROP_INFO which both have ASSERT_RTNL.
Entries are added to and removed from the sock_list under rtnl
lock as well (when adding or removing a tunneling device).
====================
net: fix passive TFO socket having invalid NAPI ID
Found a bug where an accepted passive TFO socket returns an invalid NAPI
ID (i.e. 0) from SO_INCOMING_NAPI_ID. Add a selftest for this using
netdevsim and fix the bug.
Patch 1 is a drive-by fix for the lib.sh include in an existing
drivers/net/netdevsim/peer.sh selftest.
Patch 2 adds a test binary for a simple TFO server/client.
Patch 3 adds a selftest for checking that the NAPI ID of a passive TFO
socket is valid. This will currently fail.
Patch 4 adds a fix for the bug.
====================
David Wei [Tue, 17 Jun 2025 21:21:02 +0000 (14:21 -0700)]
tcp: fix passive TFO socket having invalid NAPI ID
There is a bug with passive TFO sockets returning an invalid NAPI ID 0
from SO_INCOMING_NAPI_ID. Normally this is not an issue, but zero copy
receive relies on a correct NAPI ID to process sockets on the right
queue.
Fix by adding a sk_mark_napi_id_set().
Fixes: e5907459ce7e ("tcp: Record Rx hash and NAPI ID in tcp_child_process") Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250617212102.175711-5-dw@davidwei.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Heiner Kallweit [Mon, 16 Jun 2025 21:24:05 +0000 (23:24 +0200)]
dpaa_eth: don't use fixed_phy_change_carrier
This effectively reverts 6e8b0ff1ba4c ("dpaa_eth: Add change_carrier()
for Fixed PHYs"). Usage of fixed_phy_change_carrier() requires that
fixed_phy_register() has been called before, directly or indirectly.
And that's not the case in this driver.
Linus Torvalds [Thu, 19 Jun 2025 00:47:27 +0000 (17:47 -0700)]
Merge tag '6.16-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
- Fix alternate data streams bug
- Important fix for null pointer deref with Kerberos authentication
- Fix oops in smbdirect (RDMA) in free_transport
* tag '6.16-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: handle set/get info file for streamed file
ksmbd: fix null pointer dereference in destroy_previous_session
ksmbd: add free_transport ops in ksmbd connection
Linus Torvalds [Wed, 18 Jun 2025 21:31:16 +0000 (14:31 -0700)]
Merge tag 'driver-core-6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core
Pull driver core fixes from Danilo Krummrich:
- Fix a race condition in Devres::drop(). This depends on two other
patches:
- (Minimal) Rust abstractions for struct completion
- Let Revocable indicate whether its data is already being revoked
- Fix Devres to avoid exposing the internal Revocable
- Add .mailmap entry for Danilo Krummrich
- Add Madhavan Srinivasan to embargoed-hardware-issues.rst
* tag 'driver-core-6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core:
Documentation: embargoed-hardware-issues.rst: Add myself for Power
mailmap: add entry for Danilo Krummrich
rust: devres: do not dereference to the internal Revocable
rust: devres: fix race in Devres::drop()
rust: revocable: indicate whether `data` has been revoked already
rust: completion: implement initial abstraction
Linus Torvalds [Wed, 18 Jun 2025 21:25:50 +0000 (14:25 -0700)]
Merge tag 'cgroup-for-6.16-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fix from Tejun Heo:
- In cgroup1 freezer, a task migrating into a frozen cgroup might not
get frozen immediately due to the wrong operation order. Fix it.
* tag 'cgroup-for-6.16-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup,freezer: fix incomplete freezing when attaching tasks
Linus Torvalds [Wed, 18 Jun 2025 21:22:31 +0000 (14:22 -0700)]
Merge tag 'wq-for-6.16-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fix from Tejun Heo:
- Fix missed early init of wq_isolated_cpumask
* tag 'wq-for-6.16-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Initialize wq_isolated_cpumask in workqueue_init_early()
Each link supports multiple channels for example RPM link supports
16 channels. In case of link oversubsribe, NIX will assert backpressure
on receive channels.
The previous patch considered a single channel per link, resulting in
backpressure not being enabled on the remaining channels
Fixes: a7ef63dbd588 ("octeontx2-af: Disable backpressure between CPT and NIX") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250617063403.3582210-1-hkelam@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>