tcp: Remove hashinfo test for inet6?_lookup_run_sk_lookup().
Commit 6c886db2e78c ("net: remove duplicate sk_lookup helpers")
started to check if hashinfo == net->ipv4.tcp_death_row.hashinfo
in __inet_lookup_listener() and inet6_lookup_listener() and
stopped invoking BPF sk_lookup prog for DCCP.
tcp: Remove sk_protocol test for tcp_twsk_unique().
Commit 383eed2de529 ("tcp: get rid of twsk_unique()") added
sk->sk_protocol test in __inet_check_established() and
__inet6_check_established() to remove twsk_unique() and call
tcp_twsk_unique() directly.
====================
net: airoha: Add PPE support for RX wlan offload
Introduce the missing bits to airoha ppe driver to offload traffic received
by the MT76 driver (wireless NIC) and forwarded by the Packet Processor
Engine (PPE) to the ethernet interface.
Lorenzo Bianconi [Sat, 23 Aug 2025 07:56:04 +0000 (09:56 +0200)]
net: airoha: Introduce check_skb callback in ppe_dev ops
Export airoha_ppe_check_skb routine in ppe_dev ops. check_skb callback
will be used by the MT76 driver in order to offload the traffic received
by the wlan NIC and forwarded to the ethernet one.
Add rx_wlan parameter to airoha_ppe_check_skb routine signature.
Lorenzo Bianconi [Sat, 23 Aug 2025 07:56:03 +0000 (09:56 +0200)]
net: airoha: Add airoha_ppe_dev struct definition
Introduce airoha_ppe_dev struct as container for PPE offload callbacks
consumed by the MT76 driver during flowtable offload for traffic
received by the wlan NIC and forwarded to the wired one.
Add airoha_ppe_setup_tc_block_cb routine to PPE offload ops for MT76
driver.
Rely on airoha_ppe_dev pointer in airoha_ppe_setup_tc_block_cb
signature.
Lorenzo Bianconi [Sat, 23 Aug 2025 07:56:02 +0000 (09:56 +0200)]
net: airoha: Rely on airoha_eth struct in airoha_ppe_flow_offload_cmd signature
Rely on airoha_eth struct in airoha_ppe_flow_offload_cmd routine
signature and in all the called subroutines.
This is a preliminary patch to introduce flowtable offload for traffic
received by the wlan NIC and forwarded to the ethernet one.
Daniel Golle [Fri, 22 Aug 2025 17:38:49 +0000 (18:38 +0100)]
net: phy: mxl-86110: add basic support for MxL86111 PHY
Add basic support for the MxL86111 PHY which in addition to the features
of the MxL86110 also comes with an SGMII interface.
Setup the interface mode and take care of in-band-an.
Currently only RGMII-to-UTP and SGMII-to-UTP modes are supported while the
PHY would also support RGMII-to-1000Base-X, including automatic selection
of the Fiber or UTP link depending on the presence of a link partner.
Daniel Golle [Fri, 22 Aug 2025 17:38:39 +0000 (18:38 +0100)]
net: phy: mxl-86110: fix indentation in struct phy_driver
The .led_hw_control_get and .led_hw_control_set ops are indented with
spaces instead of tabs, unlike the rest of the values of the PHY's
struct phy_driver instance.
Use tabs instead of spaces resulting in a uniform indentation style.
Daniel Golle [Fri, 22 Aug 2025 17:38:27 +0000 (18:38 +0100)]
net: phy: mxl-86110: add basic support for led_brightness_set op
Add support for forcing each connected LED to be always on or always off
by implementing the led_brightness_set() op.
This is done by modifying the COM_EXT_LED_GEN_CFG register to enable
force-mode and forcing the LED either on or off.
When calling the led_hw_control_set() force-mode is again disabled for
that LED.
Implement mxl86110_modify_extended_reg() locked helper instead of
manually acquiring and releasing the MDIO bus lock for single
__mxl86110_modify_extended_reg() calls.
Qingfang Deng [Fri, 22 Aug 2025 01:25:47 +0000 (09:25 +0800)]
ppp: remove rwlock usage
In struct channel, the upl lock is implemented using rwlock_t,
protecting access to pch->ppp and pch->bridge.
As previously discussed on the list, using rwlock in the network fast
path is not recommended.
This patch replaces the rwlock with a spinlock for writers, and uses RCU
for readers.
- pch->ppp and pch->bridge are now declared as __rcu pointers.
- Readers use rcu_dereference_bh() under rcu_read_lock_bh().
- Writers use spin_lock() to update.
Yue Haibing [Fri, 22 Aug 2025 06:40:51 +0000 (14:40 +0800)]
ipv6: mcast: Add ip6_mc_find_idev() helper
Extract the same code logic from __ipv6_sock_mc_join() and
ip6_mc_find_dev(), also add new helper ip6_mc_find_idev() to
reduce redundancy and enhance readability.
====================
net: ipv4: allow directed broadcast routes to use dst hint
Currently, ip_extract_route_hint uses RTN_BROADCAST to decide
whether to use the route dst hint mechanism.
This check is too strict, as it prevents directed broadcast
routes from using the hint, resulting in poor performance
during bursts of directed broadcast traffic.
This series fixes this, and adds a new selftest to ensure
this does not regress.
Link to v2: https://lore.kernel.org/20250814140309.3742-1-oscmaes92@gmail.com
====================
Oscar Maes [Tue, 19 Aug 2025 17:46:42 +0000 (19:46 +0200)]
selftests: net: add test for dst hint mechanism with directed broadcast addresses
Add a test for ensuring that the dst hint mechanism is used for
directed broadcast addresses.
This test relies on mausezahn for sending directed broadcast packets.
Additionally, a high GRO flush timeout is set to ensure that packets
will be received as lists.
The test determines if the hint mechanism was used by checking
the in_brd statistic using lnstat.
Oscar Maes [Tue, 19 Aug 2025 17:46:41 +0000 (19:46 +0200)]
net: ipv4: allow directed broadcast routes to use dst hint
Currently, ip_extract_route_hint uses RTN_BROADCAST to decide
whether to use the route dst hint mechanism.
This check is too strict, as it prevents directed broadcast
routes from using the hint, resulting in poor performance
during bursts of directed broadcast traffic.
Fix this in ip_extract_route_hint and modify ip_route_use_hint
to preserve the intended behaviour.
Alessandro Ratti [Fri, 22 Aug 2025 14:03:40 +0000 (16:03 +0200)]
selftests: rtnetlink: skip tests if tools or feats are missing
Some rtnetlink selftests assume the presence of ifconfig and iproute2
support for the `proto` keyword in `ip address` commands. These
assumptions can cause test failures on modern systems (e.g. Debian
Bookworm) where:
- ifconfig is not installed by default
- The iproute2 version lacks support for address protocol
This patch improves test robustness by:
- Skipping kci_test_promote_secondaries if ifconfig is missing
- Skipping do_test_address_proto if ip address help does not mention
proto
These changes ensure the tests degrade gracefully by reporting SKIP
instead of FAIL when prerequisites are not met, improving portability
across systems.
====================
net: dsa: lantiq_gswip: prepare for supporting new features
Prepare for supporting the newer standalone MaxLinear GSW1xx switch
family by refactoring the existing lantiq_gswip driver.
This is the first of a total of 3 series and doesn't yet introduce
any functional changes, but rather just makes the driver more
flexible, so new hardware and features can be supported in future.
This series has been preceded by an RFC series which covers everything
needed to support the MaxLinear GSW1xx family of switches. Andrew Lunn
had suggested to start with the 8 patches now submitted as they prepare
but don't yet introduce any functional changes.
Everything has been compile and runtime tested on AVM Fritz!Box 7490
(GSWIP version 2.1, VR9 v1.2)
Daniel Golle [Fri, 22 Aug 2025 16:12:21 +0000 (17:12 +0100)]
net: dsa: lantiq_gswip: store switch API version in priv
Store the switch API version in struct gswip_priv. As the hardware has
the 'major/minor' version bytes in the wrong order preventing numerical
comparisons the version to be stored in gswip_priv is constructed in
such a way that the REV field is the most significant byte and the MOD
field the least significant byte. Also provide a conveniance macro to
allow comparing the stored version of the hardware against the already
defined GSWIP_VERSION_* macros.
This is done in order to prepare supporting newer features such as 4096
VLANs and per-port configurable learning which are only available
starting from specific hardware versions.
Daniel Golle [Fri, 22 Aug 2025 16:12:13 +0000 (17:12 +0100)]
net: dsa: lantiq_gswip: make DSA tag protocol model-specific
While the older Lantiq / Intel which are currently supported all use
the DSA_TAG_GSWIP tagging protocol, newer MaxLinear GSW1xx modules use
another 8-byte tagging protocol. Move the tag protocol information to
struct gswip_hw_info to make it possible for new models to specify
a different tagging protocol.
Load microcode as specified in struct hw_info instead of relying on
a single array of instructions. This is done in preparation to allow
loading different microcode for the MaxLinear GSW1xx family.
Daniel Golle [Fri, 22 Aug 2025 16:11:57 +0000 (17:11 +0100)]
net: dsa: lantiq_gswip: introduce bitmap for MII ports
Instead of relying on hard-coded numbers for MII ports, introduce
a bitmap for MII ports.
This is done in order to prepare for supporting MaxLinear GSW1xx ICs
which got a different layout of ports.
Daniel Golle [Fri, 22 Aug 2025 16:11:39 +0000 (17:11 +0100)]
net: dsa: lantiq_gswip: prepare for more CPU port options
The MaxLinear GSW1xx series of switches support using either the
(R)(G)MII interface on port 5 or the SGMII interface on port 4 to be
used as CPU port. Prepare for supporting them by defining a mask of
allowed CPU ports instead of a single port.
The two instances of struct dsa_switch_ops differ only by their
.phylink_get_caps op. Instead of having two instances of dsa_switch_ops,
rather just have a pointer to the phylink_get_caps function in
struct gswip_hw_info.
Jakub Kicinski [Fri, 22 Aug 2025 19:56:45 +0000 (12:56 -0700)]
selftests: drv-net: xdp: make sure we're actually testing native XDP
Kernel tries to be helpful and attach the XDP program in generic
mode if the driver has no BPF ndo at all. Since the xdp.py tests
all have "native" in their names this can be quite confusing.
Force native / "drv" attachment. Note that netdevsim re-uses
the generic handler as its "native" handler, so we'll maintain
the test coverage of the generic mode that way. No need to test
both explicitly, I reckon.
====================
Aquantia PHY driver consolidation - part 1
This started out as an effort to add some new features hinging on the
VEND1_GLOBAL_CFG_* registers, but I quickly started to notice that the
Aquantia PHY driver has a large code base, but individual PHYs only
implement arbitrary subsets of it.
The table below lists the PHYs known to me to have the
VEND1_GLOBAL_CFG_* registers.
PHY Access from Access from
aqr107_read_rate() aqr113c_fill_interface_modes()
------------------------------------------------------------------
AQR107 y n
AQCS109 y n
AQR111 y n
AQR111B0 y n
AQR112 y n
AQR412 y n
AQR113 y y
AQR113C y y
AQR813 y n
AQR114C y n
AQR115C y y
Maybe you're wondering, after reading this, why don't more Aquantia PHYs
populate phydev->possible_interfaces based on the registers that they
are known to have? And why do AQR114C and AQR115C, PHYs from the same
generation, just having different max speeds, differ in this behaviour?
And why does AQR813, the 8-port variant of AQR113, not call
aqr113c_config_init(), but aqr107_config_init()?
I did wonder, and I don't know either, but I suspect it has to do with
developers not wanting to break what they can't test, and only touching
what they are interested in. Multiplied at a large enough scale, this
tends to result in unmaintainable code.
The tendency might also be encouraged by the slightly strange and
inconsistent naming scheme in this driver.
The set proposes a naming scheme based on generations, and feature
inheritance from Gen X to Gen X+1. This helps fill in missing
software functionalities where the hardware feature should be present.
I had to put a hard stop at 15 patches, so I've picked the more
meaningful functions to consolidate, rather than going through the
entire driver. Depending on review feedback, I can do more or I can
stop.
Furthermore, the set adds generation-appropriate support for two more
PHY IDs: AQR412 and AQR115, and fixes the improper reporting of AQR412C
as AQR412.
The changes were tested on AQR107, AQR112, AQR412C and AQR115.
====================
Camelia Groza [Thu, 21 Aug 2025 15:20:22 +0000 (18:20 +0300)]
net: phy: aquantia: add support for AQR115
AQR115 is similar to the already supported AQR115C, having speeds up to
2.5Gbps. In fact, the two differ only in the FCBGA package size (7x11mm
vs 7x7mm for the Compact variant). So it makes sense that the feature
set is identical for the 2 drivers.
This PHY is present on the newest PCB revision E (v4.0) of the NXP
LS1046A-RDB, having replaced the RTL8211FS SGMII PHY going to fm1-mac5.
Vladimir Oltean [Thu, 21 Aug 2025 15:20:21 +0000 (18:20 +0300)]
net: phy: aquantia: promote AQR813 and AQR114C to aqr_gen4_config_init()
I'm not sure whether there is any similar real-life problem on AQR813
and AQR114C as were seen on the PHYs that these commit were written for:
- a7f3abcf6357 ("net: phy: aquantia: only poll GLOBAL_CFG regs on
aqr113, aqr113c and aqr115c")
- bed90b06b681 ("net: phy: aquantia: clear PMD Global Transmit Disable
bit during init")
but the inconsistency in handling between PHYs of the same generation is
striking. Apart from different firmware builds with different
provisioning, the only difference between these PHYs should be the max
link speed and/or the number of ports.
Let's try and see if there's any problem if all PHYs from the same
generation use the same config_init() method.
Cc: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Cc: Robert Marko <robimarko@gmail.com> Cc: Paweł Owoc <frut3k7@gmail.com> Cc: Christian Marangi <ansuelsmth@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20250821152022.1065237-15-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Thu, 21 Aug 2025 15:20:20 +0000 (18:20 +0300)]
net: phy: aquantia: rename aqr113c_config_init() to aqr_gen4_config_init()
aqr113c_config_init() is called by AQR113, AQR113C, AQR115C, all Gen4
PHYs. Thus, rename this to aqr_gen4_config_init().
Currently, aqr113c_config_init() calls aqr_gen2_config_init(). Since
we've established that these are Gen4 PHYs, it makes sense to inherit
the Gen3 feature set as well. Currently, aqr_gen3_config_init() just
calls aqr_gen2_config_init(), so we can safely make this extra
modification and expect no functional change.
Vladimir Oltean [Thu, 21 Aug 2025 15:20:19 +0000 (18:20 +0300)]
net: phy: aquantia: reimplement aqcs109_config_init() as aqr_gen2_config_init()
I lack documentation for AQCS109, but from commit 99c864667c9f ("net:
phy: aquantia: add support for AQCS109"), it is known that "From
software point of view, it should be almost equivalent to AQR107."
Based on further conjecture of the device numbering scheme, I am
treating it as similar to AQR109 (a Gen2 PHY capable of to 2.5G).
Its current instructions are also present in other init sequences as
below:
- aqr_wait_reset_complete() ... aqr107_chip_info() as well as
aqr107_set_downshift() are in aqr_gen1_config_init()
- aqr_gen2_fill_interface_modes() is in aqr_gen2_config_init()
So it would be good to centralize this implementation by just calling
aqr_gen2_config_init().
In practice this completes support for the following features, which are
present on AQR109 already:
- Potentially reverse MDI lane order via "marvell,mdi-cfg-order"
- Restore polarity of active-high and active-low LEDs after reset.
Vladimir Oltean [Thu, 21 Aug 2025 15:20:18 +0000 (18:20 +0300)]
net: phy: aquantia: call aqr_gen3_config_init() for AQR112 and AQR412(C)
The AQrate Gen3 PHYs are AQR111(C), AQR112(C), and their multi-port
variants, like AQR411(C), AQR412(C).
Currently, AQR112, AQR412 and AQR412C are Gen3 PHYs supported by the
driver which have no config_init() implementation. I have hardware and
documentation that confirms they are compatible with the operations done
in aqr_gen2_config_init(), a Gen2-level function.
This is needed as a preparation for reading cached registers in
aqr_gen2_read_status(), which is a function that these PHYs already call.
The initial reading is done from:
thus the need for them to also call aqr_gen2_config_init(), in order for
the cached register values to be available.
In expectation of Gen3-specific features, introduce aqr_gen3_config_init()
which calls aqr_gen2_config_init(). Also modify the AQR111 silicon
variants to call their generation-appropriate init function. No
functional change for these, hence the minor mention.
Vladimir Oltean [Thu, 21 Aug 2025 15:20:17 +0000 (18:20 +0300)]
net: phy: aquantia: call aqr_gen2_fill_interface_modes() for AQCS109
I don't have documentation or hardware to test, but according to commit 99c864667c9f ("net: phy: aquantia: add support for AQCS109"), "From
software point of view, it should be almost equivalent to AQR107."
I am relatively confident that the GLOBAL_CFG registers read by
aqr_gen2_fill_interface_modes() are supported, because
aqr_gen2_read_status(), currently used by AQCS109, also reads them, and
I'm unaware of any reported problem.
The change is necessary because a future patch will introduce a
requirement for all aqr_gen2_read_status() callers to have previously
called aqr_gen2_read_global_syscfg(). This is done through
aqr_gen2_fill_interface_modes().
Vladimir Oltean [Thu, 21 Aug 2025 15:20:16 +0000 (18:20 +0300)]
net: phy: aquantia: merge and rename aqr105_read_status() and aqr107_read_status()
aqr105_read_status() and aqr107_read_status() are very similar.
In fact, they are identical, save from a code snippet accessing a Gen2
feature (rate adaptation), placed at the end of aqr107_read_rate(), and
absent from aqr105_read_rate().
After the recent change "net: phy: aquantia: use cached GLOBAL_CFG
registers in aqr107_read_rate()", it is absolutely trivial to
restructure the code as follows:
aqr_gen2_read_status()
-> aqr_gen1_read_status()
-> Gen2-specific stuff (read GLOBAL_CFG registers to set rate_matching)
Vladimir Oltean [Thu, 21 Aug 2025 15:20:15 +0000 (18:20 +0300)]
net: phy: aquantia: use cached GLOBAL_CFG registers in aqr107_read_rate()
aqr107_read_rate() - called from aqr107_read_status() even periodically
if there is no PHY IRQ - currently reads GLOBAL_CFG registers to
determine what kind of rate adaptation is in use for the current
phydev->speed. However, GLOBAL_CFG registers are runtime invariants, so
accessing the slow MDIO bus is unnecessary.
Reimplement aqr107_read_rate() by reading from the
priv->global_cfg[i].rade_adapt variables (where i is the entry
corresponding to the current phydev->speed).
Making this change also helps disentangle the code delta between
aqr105_read_rate() and aqr107_read_rate(). They are now identical up to
the code snippet which iterates over priv->global_cfg[]. This will help
eliminate the duplicate code in the upcoming patch.
Vladimir Oltean [Thu, 21 Aug 2025 15:20:14 +0000 (18:20 +0300)]
net: phy: aquantia: remove handling for get_rate_matching(PHY_INTERFACE_MODE_NA)
After commit 7642cc28fd37 ("net: phylink: fix PHY validation with rate
adaption"), the API contract changed and PHY drivers are no longer
required to respond to the .get_rate_matching() method for
PHY_INTERFACE_MODE_NA. This was later followed up by documentation
commit 6d4cfcf97986 ("net: phy: Update documentation for
get_rate_matching").
As such, handling PHY_INTERFACE_MODE_NA in the Aquantia PHY driver
implementation of this method is unnecessary and confusing. Remove it.
Vladimir Oltean [Thu, 21 Aug 2025 15:20:13 +0000 (18:20 +0300)]
net: phy: aquantia: save a local shadow of GLOBAL_CFG register values
Currently, aqr_gen2_fill_interface_modes() reads VEND1_GLOBAL_CFG_*
registers to populate phydev->supported_interfaces. But this is not
the only place which needs to read these registers. There is also
aqr107_read_rate().
Based on the premise that these values are statically set by firmware
and the driver only needs to read them, the proposal is to read them
only once, at config_init() time, and use the cached values also in
aqr107_read_rate().
This patch only refactors the aqr_gen2_fill_interface_modes() code to
save the registers to driver memory, and to populate supported_interfaces
based on that.
Vladimir Oltean [Thu, 21 Aug 2025 15:20:12 +0000 (18:20 +0300)]
net: phy: aquantia: fill supported_interfaces for all aqr_gen2_config_init() callers
Since aqr_gen2_config_init() and aqr_gen2_fill_interface_modes() refer to
the feature set common to the same generation, it means all callers of
aqr_gen2_config_init() also support the Global System Configuration
registers at addresses 1E.31B -> 1E.31F, and these should be read by the
driver to figure out the list of supported interfaces for phylink.
This affects the following PHYs supported by this driver:
- Gen2: AQR107
- Gen3: AQR111, AQR111B0
- Gen4: AQR114C, AQR813.
AQR113C, a Gen4 PHY, has unmodified logic after this change, because
currently, the aqr_gen2_fill_interface_modes() call is chained after
aqr_gen2_config_init(), and after this patch, it is tail-called from the
latter function, leading to the same code flow.
At the same time, move aqr_gen2_fill_interface_modes() upwards of its
new caller, aqr_gen2_config_init(), to avoid a forward declaration.
Vladimir Oltean [Thu, 21 Aug 2025 15:20:11 +0000 (18:20 +0300)]
net: phy: aquantia: rename some aqr107 functions according to generation
Establish a more intuitive function naming convention in this driver.
A GenX PHY must only call aqr_genY_ functions, where Y <= X.
Loosely speaking, aqr107_ is representative of Gen2 and above, except for:
- aqr107_config_init()
- aqr107_suspend()
- aqr107_resume()
- aqr107_wait_processor_intensive_op()
which are also called by AQR105, so these are renamed to Gen1.
Actually aqr107_config_init() is renamed to aqr_gen1_config_init() when
called by AQR105, and aqr_gen2_config_init() when called by all other
PHYs. The Gen2 function calls the Gen1 function, so there is no
functional change. This prefaces further Gen2-specific initialization
steps which must be omitted for AQR105. These will be added to
aqr_gen2_config_init().
In fact, many PHY drivers call an aqr*_config_init() beneath their
generation's feature set: AQR114C is a Gen4 PHY which calls
aqr_gen2_config_init(), even though AQR113C, also a Gen4 PHY which
differs only in maximum link speed, calls the richer
aqr113c_config_init() which also sets phydev->possible_interfaces.
Many of the more subtle inconsistencies of this kind will be fixed up in
later changes.
Vladimir Oltean [Thu, 21 Aug 2025 15:20:10 +0000 (18:20 +0300)]
net: phy: aquantia: reorder AQR113C PMD Global Transmit Disable bit clearing with supported_interfaces
Introduced in commit bed90b06b681 ("net: phy: aquantia: clear PMD Global
Transmit Disable bit during init"), the clearing of MDIO_PMA_TXDIS plus
the call to aqr107_wait_processor_intensive_op() are only by chance
placed between aqr107_config_init() and aqr107_fill_interface_modes().
In other words, aqr107_fill_interface_modes() does not depend in any way
on these 2 operations.
I am only 90% sure of that, and I intend to move aqr107_fill_interface_modes()
to be a part of aqr107_config_init() in the future. So to isolate the
issue for blame attribution purposes, make these 2 functions adjacent to
each other again.
Vladimir Oltean [Thu, 21 Aug 2025 15:20:09 +0000 (18:20 +0300)]
net: phy: aquantia: merge aqr113c_fill_interface_modes() into aqr107_fill_interface_modes()
I'm unsure whether intentionate or not, but I think the (partially
observed) naming convention in this driver is that function prefixes
denote the earliest generation when a feature is available. In case of
aqr107_fill_interface_modes(), that means that the GLOBAL_CFG registers
are a Gen2 feature. Supporting evidence: the AQR105, a Gen1 PHY, does
not have these registers, thus the function is not named aqr105_*.
Based on this inferred naming scheme, I am proposing a refinement of
commit a7f3abcf6357 ("net: phy: aquantia: only poll GLOBAL_CFG regs on
aqr113, aqr113c and aqr115c") which introduced aqr113c_fill_interface_modes(),
suggesting this may be a Gen4 PHY feature.
The long-term goal is for aqr107_config_init() to tail-call
aqr107_fill_interface_modes(), such that the latter function is also
called by AQR107 itself, and many other PHY drivers. Currently it can't,
because aqr113c_config_init() calls aqr107_config_init() and then
aqr113c_fill_interface_modes(). So this would lead to a duplicate call
to aqr107_fill_interface_modes() for AQR113C.
Centralize the reading of GLOBAL_CFG registers in the AQR107 method, and
create a boolean, set to true by AQR113C, which tests whether waiting
for a non-zero value in the GLOBAL_CFG_100M register is necessary.
Vladimir Oltean [Thu, 21 Aug 2025 15:20:08 +0000 (18:20 +0300)]
net: phy: aquantia: rename AQR412 to AQR412C and add real AQR412
I have noticed from schematics and firmware images that the PHY for
which I've previously added support in commit 973fbe68df39 ("net: phy:
aquantia: add AQR112 and AQR412 PHY IDs") is actually an AQR412C, not
AQR412.
These are actually PHYs from the same generation, and Marvell documents
them as differing only in the size of the FCCSP package: 19x19 mm for
the AQR412, vs 14x12mm for the Compact AQR412C.
I don't think there is any point in backporting this to stable kernels,
since the PHYs are identical in capabilities, and no functional
difference is expected regardless of how the PHY is identified.
Support to use adaptive RX coalescing. Change the default RX coalesce
usecs and limit the range of parameters for various types of devices,
according to their hardware design.
====================
Jiawen Wu [Thu, 21 Aug 2025 02:34:05 +0000 (10:34 +0800)]
net: ngbe: change the default ITR setting
Change the default RX/TX ITR for wx_mac_em devices from 20K to 7K, which
is an experience value from out-of-tree ngbe driver, to get higher
performance on some platforms with weak single-core performance.
TCP_SRTEAM test on Phytium 2000+ shows that the throughput of 64-Byte
packets is increased from 350.53Mbits/s to 395.92Mbits/s.
====================
net: hinic3: Add a driver for Huawei 3rd gen NIC - management interfaces
This is the 2/3 patch of the patch-set described below.
The patch-set contains driver for Huawei's 3rd generation HiNIC
Ethernet device that will be available in the future.
This is an SRIOV device, designed for data centers.
Initially, the driver only supports VFs.
Following the discussion over RFC01, the code will be submitted in
separate smaller patches where until the last patch the driver is
non-functional. The RFC02 submission contains overall view of the entire
driver but every patch will be posted as a standalone submission.
====================
Ujwal Kundur [Wed, 20 Aug 2025 17:55:50 +0000 (23:25 +0530)]
rds: Fix endianness annotations for RDS extension headers
Per the RDS 3.1 spec [1], RDS extension headers EXTHDR_NPATHS and
EXTHDR_GEN_NUM are be16 and be32 values respectively, exchanged during
normal operations over-the-wire (RDS Ping/Pong). This contrasts their
declarations as host endian unsigned ints.
Fix the annotations across occurrences. Flagged by Sparse.
Ujwal Kundur [Wed, 20 Aug 2025 17:55:48 +0000 (23:25 +0530)]
rds: Fix endianness annotation of jhash wrappers
__ipv6_addr_jhash (wrapper around jhash2()) and __inet_ehashfn (wrapper
around jhash_3words()) work with u32 (host endian) values but accept big
endian inputs. Declare the local variables as big endian to avoid
unnecessary casts.
====================
selftests: Test XDP_TX for single-buffer
Ensure single buffer XDP functions correctly by covering the following cases:
1) Zero size payload
2) Full MTU
3) Single buffer packets through a multi-buffer XDP program
These changes were tested with netdevsim and fbnic.
# ./ksft-net-drv/drivers/net/xdp.py
TAP version 13
1..10
ok 1 xdp.test_xdp_native_pass_sb
ok 2 xdp.test_xdp_native_pass_mb
ok 3 xdp.test_xdp_native_drop_sb
ok 4 xdp.test_xdp_native_drop_mb
ok 5 xdp.test_xdp_native_tx_sb
ok 6 xdp.test_xdp_native_tx_mb
# Failed run: pkt_sz 2048, offset 1. Last successful run: pkt_sz 1024, offset 256. Reason: Adjustment failed
ok 7 xdp.test_xdp_native_adjst_tail_grow_data
ok 8 xdp.test_xdp_native_adjst_tail_shrnk_data
# Failed run: pkt_sz 512, offset -256. Last successful run: pkt_sz 512, offset -128. Reason: Adjustment failed
ok 9 xdp.test_xdp_native_adjst_head_grow_data
# Failed run: pkt_sz (2048) > HDS threshold (1536) and offset 64 > 48
ok 10 xdp.test_xdp_native_adjst_head_shrnk_data
# Totals: pass:10 fail:0 xfail:0 xpass:0 skip:0 error:0
====================
selftests: drv-net: xdp: Extract common XDP_TX setup/validation.
In preparation of single-buffer XDP_TX tests, refactor common test code
into the _test_xdp_native_tx method. Add support for multiple payload
sizes, and additional validation for RX packet count. Pass the -n flag
to echo to avoid adding an extra byte into the TX packet.
Due to a hardware errata, CN10k and earlier Octeon silicon series,
the hardware may incorrectly assert XOFF on certain channels during
reset. As a workaround, a write operation to the NIX_AF_RX_CHANX_CFG
register can be performed to broadcast XON signals on the affected
channels
Jakub Kicinski [Fri, 22 Aug 2025 00:23:26 +0000 (17:23 -0700)]
Merge tag 'nf-next-25-08-20' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Florian Westphal says:
====================
netfilter: updates for net-next
First patch gets rid of refcounting for dying list dumping,
use a cookie value instead of keeping the object around.
Remaining patches extend nftables pipapo (concatenated ranges) set type.
Make the AVX2 optimized version available from the control plane as
well, then use it during insert. This gives a nice speedup for large
sets. All from myself.
On PREEMPT_RT, we can't rely on local_bh_disable to protect the
access to the percpu scratch maps. Use nested-BH locking for this,
From Sebastian Siewior.
* tag 'nf-next-25-08-20' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
netfilter: nft_set_pipapo: Use nested-BH locking for nft_pipapo_scratch
netfilter: nft_set_pipapo: Store real pointer, adjust later.
netfilter: nft_set_pipapo: use avx2 algorithm for insertions too
netfilter: nft_set_pipapo_avx2: split lookup function in two parts
netfilter: nft_set_pipapo_avx2: Drop the comment regarding protection
netfilter: ctnetlink: remove refcounting in dying list dumping
====================
The kernel test robot reports that various drivers have an undefined
reference to stmmac_simple_pm_ops. This is caused by
EXPORT_SYMBOL_GPL_SIMPLE_DEV_PM_OPS() defining the struct as static
and omitting the export when CONFIG_PM=n, unlike DEFINE_SIMPLE_PM_OPS()
which still defines the struct non-static.
Switch to using DEFINE_SIMPLE_PM_OPS() + EXPORT_SYMBOL_GPL(), which
means we always define stmmac_simple_pm_ops, and it will always be
visible for dwmac-* to reference whether modular or built-in.
Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202508132051.a7hJXkrd-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202508132158.dEwQdick-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202508140029.V6tDuUxc-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202508161406.RwQuZBkA-lkp@intel.com/ Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/E1uojpo-00BMoL-4W@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
net: Remove the use of dev_err_probe()
The dev_err_probe() doesn't do anything when error is '-ENOMEM'.
Therefore, remove the useless call to dev_err_probe(), and just
return the value instead.
====================
Xichao Zhao [Wed, 20 Aug 2025 08:57:49 +0000 (16:57 +0800)]
net: dsa: Remove the use of dev_err_probe()
The dev_err_probe() doesn't do anything when error is '-ENOMEM'.
Therefore, remove the useless call to dev_err_probe(), and just
return the value instead.
Xichao Zhao [Wed, 20 Aug 2025 08:57:48 +0000 (16:57 +0800)]
net: hibmcge: Remove the use of dev_err_probe()
The dev_err_probe() doesn't do anything when error is '-ENOMEM'.
Therefore, remove the useless call to dev_err_probe(), and just
return the value instead.
We've added 9 non-merge commits during the last 3 day(s) which contain
a total of 13 files changed, 1027 insertions(+), 27 deletions(-).
The main changes are:
1) Added bpf dynptr support for accessing the metadata of a skb,
from Jakub Sitnicki.
The patches are merged from a stable branch bpf-next/skb-meta-dynptr.
The same patches have also been merged into bpf-next/master.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
selftests/bpf: Cover metadata access from a modified skb clone
selftests/bpf: Cover read/write to skb metadata at an offset
selftests/bpf: Cover write access to skb metadata via dynptr
selftests/bpf: Cover read access to skb metadata via dynptr
selftests/bpf: Parametrize test_xdp_context_tuntap
selftests/bpf: Pass just bpf_map to xdp_context_test helper
selftests/bpf: Cover verifier checks for skb_meta dynptr type
bpf: Enable read/write access to skb metadata through a dynptr
bpf: Add dynptr type for skb metadata
====================
Linus Torvalds [Thu, 21 Aug 2025 17:51:15 +0000 (13:51 -0400)]
Merge tag 'net-6.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from Bluetooth.
Current release - fix to a fix:
- usb: asix_devices: fix PHY address mask in MDIO bus initialization
Current release - regressions:
- Bluetooth: fixes for the split between BIS_LINK and PA_LINK
- Revert "net: cadence: macb: sama7g5_emac: Remove USARIO CLKEN
flag", breaks compatibility with some existing device tree blobs
- dsa: b53: fix reserved register access in b53_fdb_dump()
Current release - new code bugs:
- sched: dualpi2: run probability update timer in BH to avoid
deadlock
- eth: libwx: fix the size in RSS hash key population
- pse-pd: pd692x0: improve power budget error paths and handling
Previous releases - regressions:
- tls: fix handling of zero-length records on the rx_list
- hsr: reject HSR frame if skb can't hold tag
- bonding: fix negotiation flapping in 802.3ad passive mode
Previous releases - always broken:
- gso: forbid IPv6 TSO with extensions on devices with only IPV6_CSUM
- sched: make cake_enqueue return NET_XMIT_CN when past buffer_limit,
avoid packet drops with low buffer_limit, remove unnecessary WARN()
- sched: fix backlog accounting after modifying config of a qdisc in
the middle of the hierarchy
- mptcp: improve handling of skb extension allocation failures
- eth: mlx5:
- fixes for the "HW Steering" flow management method
- fixes for QoS and device buffer management"
* tag 'net-6.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (81 commits)
netfilter: nf_reject: don't leak dst refcount for loopback packets
net/mlx5e: Preserve shared buffer capacity during headroom updates
net/mlx5e: Query FW for buffer ownership
net/mlx5: Restore missing scheduling node cleanup on vport enable failure
net/mlx5: Fix QoS reference leak in vport enable error path
net/mlx5: Destroy vport QoS element when no configuration remains
net/mlx5e: Preserve tc-bw during parent changes
net/mlx5: Remove default QoS group and attach vports directly to root TSAR
net/mlx5: Base ECVF devlink port attrs from 0
net: pse-pd: pd692x0: Skip power budget configuration when undefined
net: pse-pd: pd692x0: Fix power budget leak in manager setup error path
Octeontx2-af: Skip overlap check for SPI field
selftests: tls: add tests for zero-length records
tls: fix handling of zero-length records on the rx_list
net: airoha: ppe: Do not invalid PPE entries in case of SW hash collision
selftests: bonding: add test for passive LACP mode
bonding: send LACPDUs periodically in passive mode after receiving partner's LACPDU
bonding: update LACP activity flag after setting lacp_active
Revert "net: cadence: macb: sama7g5_emac: Remove USARIO CLKEN flag"
ipv6: sr: Fix MAC comparison to be constant-time
...
This is because blamed commit forgot about loopback packets.
Such packets already have a dst_entry attached, even at PRE_ROUTING stage.
Instead of checking hook just check if the skb already has a route
attached to it.
Fixes: f53b9b0bdc59 ("netfilter: introduce support for reject at prerouting stage") Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://patch.msgid.link/20250820123707.10671-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 20 Aug 2025 02:56:50 +0000 (19:56 -0700)]
net: page_pool: add page_pool_get()
There is a page_pool_put() function but no get equivalent.
Having multiple references to a page pool is quite useful.
It avoids branching in create / destroy paths in drivers
which support memory providers.
Use the new helper in bnxt.
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Link: https://patch.msgid.link/20250820025704.166248-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Armen Ratner [Wed, 20 Aug 2025 13:32:09 +0000 (16:32 +0300)]
net/mlx5e: Preserve shared buffer capacity during headroom updates
When port buffer headroom changes, port_update_shared_buffer()
recalculates the shared buffer size and splits it in a 3:1 ratio
(lossy:lossless) - Currently, the calculation is:
lossless = shared / 4;
lossy = (shared / 4) * 3;
Meaning, the calculation dropped the remainder of shared % 4 due to
integer division, unintentionally reducing the total shared buffer
by up to three cells on each update. Over time, this could shrink
the buffer below usable size.
Fix it by changing the calculation to:
lossless = shared / 4;
lossy = shared - lossless;
This retains all buffer cells while still approximating the
intended 3:1 split, preventing capacity loss over time.
While at it, perform headroom calculations in units of cells rather than
in bytes for more accurate calculations avoiding extra divisions.
Alexei Lazar [Wed, 20 Aug 2025 13:32:08 +0000 (16:32 +0300)]
net/mlx5e: Query FW for buffer ownership
The SW currently saves local buffer ownership when setting
the buffer.
This means that the SW assumes it has ownership of the buffer
after the command is set.
If setting the buffer fails and we remain in FW ownership,
the local buffer ownership state incorrectly remains as SW-owned.
This leads to incorrect behavior in subsequent PFC commands,
causing failures.
Instead of saving local buffer ownership in SW,
query the FW for buffer ownership when setting the buffer.
This ensures that the buffer ownership state is accurately
reflected, avoiding the issues caused by incorrect ownership
states.
Fixes: ecdf2dadee8e ("net/mlx5e: Receive buffer support for DCBX") Signed-off-by: Alexei Lazar <alazar@nvidia.com> Reviewed-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250820133209.389065-8-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Carolina Jubran [Wed, 20 Aug 2025 13:32:05 +0000 (16:32 +0300)]
net/mlx5: Destroy vport QoS element when no configuration remains
If a VF has been configured and the user later clears all QoS settings,
the vport element remains in the firmware QoS tree. This leads to
inconsistent behavior compared to VFs that were never configured, since
the FW assumes that unconfigured VFs are outside the QoS hierarchy.
As a result, the bandwidth share across VFs may differ, even though
none of them appear to have any configuration.
Align the driver behavior with the FW expectation by destroying the
vport QoS element when all configurations are removed.
Fixes: c9497c98901c ("net/mlx5: Add support for setting VF min rate") Fixes: cf7e73770d1b ("net/mlx5: Manage TC arbiter nodes and implement full support for tc-bw") Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://patch.msgid.link/20250820133209.389065-5-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Carolina Jubran [Wed, 20 Aug 2025 13:32:04 +0000 (16:32 +0300)]
net/mlx5e: Preserve tc-bw during parent changes
When changing parent of a node/leaf with tc-bw configured, the code
saves and restores tc-bw values. However, it was reading the converted
hardware bw_share values (where 0 becomes 1) instead of the original
user values, causing incorrect tc-bw calculations after parent change.
Store original tc-bw values in the node structure and use them directly
for save/restore operations.
Fixes: cf7e73770d1b ("net/mlx5: Manage TC arbiter nodes and implement full support for tc-bw") Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250820133209.389065-4-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Carolina Jubran [Wed, 20 Aug 2025 13:32:03 +0000 (16:32 +0300)]
net/mlx5: Remove default QoS group and attach vports directly to root TSAR
Currently, the driver creates a default group (`node0`) and attaches
all vports to it unless the user explicitly sets a parent group. As a
result, when a user configures tx_share on a group and tx_share on
a VF, the expectation is for the group and the VF to share bandwidth
relatively. However, since the VF is not connected to the same parent
(but to the default node), the proportional share logic is not applied
correctly.
To fix this, remove the default group (`node0`) and instead connect
vports directly to the root TSAR when no parent is specified. This
ensures that vports and groups share the same root scheduler and their
tx_share values are compared directly under the same hierarchy.
Fixes: 0fe132eac38c ("net/mlx5: E-switch, Allow to add vports to rate groups") Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250820133209.389065-3-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Jurgens [Wed, 20 Aug 2025 13:32:02 +0000 (16:32 +0300)]
net/mlx5: Base ECVF devlink port attrs from 0
Adjust the vport number by the base ECVF vport number so the port
attributes start at 0. Previously the port attributes would start 1
after the maximum number of host VFs.
Fixes: dc13180824b7 ("net/mlx5: Enable devlink port for embedded cpu VF vports") Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250820133209.389065-2-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kory Maincent [Wed, 20 Aug 2025 13:33:21 +0000 (15:33 +0200)]
net: pse-pd: pd692x0: Skip power budget configuration when undefined
If the power supply's power budget is not defined in the device tree,
the current code still requests power and configures the PSE manager
with a 0W power limit, which is undesirable behavior.
Skip power budget configuration entirely when the budget is zero,
avoiding unnecessary power requests and preventing invalid 0W limits
from being set on the PSE manager.
Fixes: 359754013e6a ("net: pse-pd: pd692x0: Add support for PSE PI priority feature") Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20250820133321.841054-1-kory.maincent@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>