Lorenzo Bianconi [Fri, 17 Oct 2025 09:06:20 +0000 (11:06 +0200)]
net: airoha: Refactor src port configuration in airhoha_set_gdm2_loopback
AN7583 chipset relies on different definitions for source-port
identifier used for hw offloading. In order to support hw offloading
in AN7583 controller, refactor src port configuration in
airhoha_set_gdm2_loopback routine and introduce get_src_port_id
callback.
Lorenzo Bianconi [Fri, 17 Oct 2025 09:06:19 +0000 (11:06 +0200)]
net: airoha: Select default ppe cpu port in airoha_dev_init()
Select the PPE default cpu port in airoha_dev_init routine.
This patch allows to distribute the load between the two available cpu
ports (FE_PSE_PORT_CDM1 and FE_PSE_PORT_CDM2) if the device is running a
single PPE module (e.g. 7583) selecting the cpu port based on the use
QDMA device. For multi-PPE device (e.g. 7581) assign FE_PSE_PORT_CDM1 to
PPE1 and FE_PSE_PORT_CDM2 to PPE2.
Lorenzo Bianconi [Fri, 17 Oct 2025 09:06:18 +0000 (11:06 +0200)]
net: airoha: ppe: Flush PPE SRAM table during PPE setup
Rely on airoha_ppe_foe_commit_sram_entry routine to flush SRAM PPE table
entries. This patch allow moving PPE SRAM flush during PPE setup and
avoid dumping uninitialized values via the debugfs if no entries are
offloaded yet.
Lorenzo Bianconi [Fri, 17 Oct 2025 09:06:17 +0000 (11:06 +0200)]
net: airoha: ppe: Configure SRAM PPE entries via the cpu
Introduce airoha_ppe_foe_commit_sram_entry routine in order to configure
the SRAM PPE entries directly via the CPU instead of using the NPU APIs.
This is a preliminary patch to enable netfilter flowtable hw offload for
AN7583 SoC.
Lorenzo Bianconi [Fri, 17 Oct 2025 09:06:16 +0000 (11:06 +0200)]
net: airoha: ppe: Remove airoha_ppe_is_enabled() where not necessary
Now each PPE has always PPE_STATS_NUM_ENTRIES entries so we do not need
to run airoha_ppe_is_enabled routine to check if the hash refers to
PPE1 or PPE2.
Lorenzo Bianconi [Fri, 17 Oct 2025 09:06:15 +0000 (11:06 +0200)]
net: airoha: ppe: Move PPE memory info in airoha_eth_soc_data struct
AN7583 SoC runs a single PPE device while EN7581 runs two of them.
Moreover PPE SRAM in AN7583 SoC is reduced to 8K (while SRAM is 16K on
EN7581). Take into account PPE memory layout during PPE configuration.
Rename airoha_ppe2_is_enabled() in airoha_ppe_is_enabled() and
generalize it in order to check if each PPE module is enabled.
Rely on airoha_ppe_is_enabled routine to properly initialize PPE for
AN7583 SoC since AN7583 does not support PPE2.
Lorenzo Bianconi [Fri, 17 Oct 2025 09:06:13 +0000 (11:06 +0200)]
net: airoha: Add airoha_eth_soc_data struct
Introduce airoha_eth_soc_data struct to contain differences between
various SoC. Move XSI reset names in airoha_eth_soc_data. This is a
preliminary patch to enable AN7583 ethernet controller support in
airoha-eth driver.
Co-developed-by: Christian Marangi <ansuelsmth@gmail.com> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20251017-an7583-eth-support-v3-4-f28319666667@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Lorenzo Bianconi [Fri, 17 Oct 2025 09:06:12 +0000 (11:06 +0200)]
net: airoha: Add airoha_ppe_get_num_stats_entries() and airoha_ppe_get_num_total_stats_entries()
Introduce airoha_ppe_get_num_stats_entries and
airoha_ppe_get_num_total_stats_entries routines in order to make the
code more readable controlling if CONFIG_NET_AIROHA_FLOW_STATS is
enabled or disabled.
Modify airoha_ppe_foe_get_flow_stats_index routine signature relying on
airoha_ppe_get_num_total_stats_entries().
Lorenzo Bianconi [Fri, 17 Oct 2025 09:06:10 +0000 (11:06 +0200)]
dt-bindings: net: airoha: Add AN7583 support
Introduce AN7583 ethernet controller support to Airoha EN7581
device-tree bindings. The main difference between EN7581 and AN7583 is
the number of reset lines required by the controller (AN7583 does not
require hsi-mac).
Bagas Sanjaya [Fri, 17 Oct 2025 06:45:26 +0000 (13:45 +0700)]
net: 6pack: Demote "How to turn on 6pack support" section heading
"How to turn on 6pack support" is a subsection of "Building and
installing the 6pack driver". Yet, the former is in the same heading
level as the latter as sections, making it listed in networking docs
toctree.
The first patch is by me and adds support for an optional reset to the
m_can drivers.
Vincent Mailhol's patch targets all drivers and removes the
can_change_mtu() function, since the netdev's min and max MTU are
populated.
Markus Schneider-Pargmann contributes 4 patches to the m_can driver to
add am62 wakeup support.
The last 7 patches are by me and provide various cleanups to the m_can
driver.
* tag 'linux-can-next-for-6.19-20251017' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next:
can: m_can: m_can_get_berr_counter(): don't wake up controller if interface is down
can: m_can: m_can_tx_submit(): remove unneeded sanity checks
can: m_can: m_can_class_register(): remove error message in case devm_kzalloc() fails
can: m_can: m_can_interrupt_enable(): use m_can_write() instead of open coding it
net: m_can: convert dev_{dbg,info,err} -> netdev_{dbg,info,err}
can: m_can: hrtimer_callback(): rename to m_can_polling_timer()
can: m_can: m_can_init_ram(): make static
can: m_can: Support pinctrl wakeup state
can: m_can: Return ERR_PTR on error in allocation
can: m_can: Map WoL to device_set_wakeup_enable
dt-bindings: can: m_can: Add wakeup properties
can: treewide: remove can_change_mtu()
can: m_can: add support for optional reset
====================
Jacob revives one-year-old work from Jesse Brandeburg to implement the
standardized statistics interfaces from ethtool in the ice driver.
Vitaly introduces a new private flag to control the K1 power state of ICH
network controllers supported by the e1000e driver. This flag has been
extensively discussed on the list and deemed the best available option to
provide a field workaround without impacting the many configurations that
have no issues with the K1 power state.
====================
Vitaly Lifshits [Fri, 17 Oct 2025 06:08:43 +0000 (23:08 -0700)]
e1000e: Introduce private flag to disable K1
The K1 state reduces power consumption on ICH family network controllers
during idle periods, similarly to L1 state on PCI Express NICs. Therefore,
it is recommended and enabled by default.
However, on some systems it has been observed to have adverse side
effects, such as packet loss. It has been established through debug that
the problem may be due to firmware misconfiguration of specific systems,
interoperability with certain link partners, or marginal electrical
conditions of specific units.
These problems typically cannot be fixed in the field, and generic
workarounds to resolve the side effects on all systems, while keeping K1
enabled, were found infeasible.
Therefore, add the option for users to globally disable K1 idle state on
the adapter.
Additionally, disable K1 by default for MTL and later platforms, due to
issues reported with the current configuration.
Jesse Brandeburg [Fri, 17 Oct 2025 06:08:41 +0000 (23:08 -0700)]
ice: refactor to use helpers
Use the ice_netdev_to_pf() helper in more places and remove a bunch of
boilerplate code. Not every instance could be replaced due to use of the
netdev_priv() output or the vsi variable within a bunch of functions.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20251016-jk-iwl-next-2025-10-15-v2-12-ff3a390d9fc6@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jesse Brandeburg [Fri, 17 Oct 2025 06:08:39 +0000 (23:08 -0700)]
ice: add tracking of good transmit timestamps
As a pre-requisite to implementing timestamp statistics, start tracking
successful PTP timestamps. There already existed a trace event, but
add a counter as well so it can be displayed by the next patch.
Good count is a u64 as it is much more likely to be incremented. The
existing error stats are all u32 as before, and are less likely so will
wrap less.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20251016-jk-iwl-next-2025-10-15-v2-10-ff3a390d9fc6@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jesse Brandeburg [Fri, 17 Oct 2025 06:08:38 +0000 (23:08 -0700)]
ice: implement ethtool standard stats
Add support for MAC/pause/RMON stats. This enables reporting hardware
statistics in a common way via:
ethtool -S eth0 --all-groups
and
ethtool --include-statistics --show-pause eth0
While doing so, add support for one new stat, receive length error
(RLEC), which is extremely unlikely to happen since most L2 frames have
a type/length field specifying a "type", and raw ethernet frames aren't
used much any longer.
NOTE: I didn't implement Ctrl aka control frame stats because the
hardware doesn't seem to implement support.
Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20251016-jk-iwl-next-2025-10-15-v2-9-ff3a390d9fc6@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jesse Brandeburg [Fri, 17 Oct 2025 06:08:37 +0000 (23:08 -0700)]
net: docs: add missing features that can have stats
While trying to figure out ethtool -I | --include-statistics, I noticed
some docs got missed when implementing commit 0e9c127729be ("ethtool:
add interface to read Tx hardware timestamping statistics").
Fix up the docs to match the kernel code, and while there, sort them in
alphabetical order.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20251016-jk-iwl-next-2025-10-15-v2-8-ff3a390d9fc6@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ankit Garg [Fri, 17 Oct 2025 01:25:42 +0000 (18:25 -0700)]
gve: Consolidate and persist ethtool ring changes
Refactor the ethtool ring parameter configuration logic to address two
issues: unnecessary queue resets and lost configuration changes when
the interface is down.
Previously, `gve_set_ringparam` could trigger multiple queue
destructions and recreations for a single command, as different settings
(e.g., header split, ring sizes) were applied one by one. Furthermore,
if the interface was down, any changes made via ethtool were discarded
instead of being saved for the next time the interface was brought up.
This patch centralizes the configuration logic. Individual functions
like `gve_set_hsplit_config` are modified to only validate and stage
changes in a temporary config struct.
The main `gve_set_ringparam` function now gathers all staged changes
and applies them as a single, combined configuration:
1. If the interface is up, it calls `gve_adjust_config` once.
2. If the interface is down, it saves the settings directly to the
driver's private struct, ensuring they persist and are used when
the interface is brought back up.
Signed-off-by: Ankit Garg <nktgrg@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Jordan Rhee <jordanrhee@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Link: https://patch.msgid.link/20251017012614.3631351-1-joshwash@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Thu, 16 Oct 2025 18:29:11 +0000 (18:29 +0000)]
net: shrink napi_skb_cache_{put,get}() and napi_skb_cache_get_bulk()
Following loop in napi_skb_cache_put() is unrolled by the compiler
even if CONFIG_KASAN is not enabled:
for (i = NAPI_SKB_CACHE_HALF; i < NAPI_SKB_CACHE_SIZE; i++)
kasan_mempool_unpoison_object(nc->skb_cache[i],
kmem_cache_size(net_hotdata.skbuff_cache));
We have 32 times this sequence, for a total of 384 bytes.
This is because kmem_cache_size() is not an inline and not const,
and kasan_unpoison_object_data() is an inline function.
Cache kmem_cache_size() result in a variable, so that
the compiler can remove dead code (and variable) when/if
CONFIG_KASAN is unset.
After this patch, napi_skb_cache_put() is inlined in its callers,
and we avoid one kmem_cache_size() call in napi_skb_cache_get()
and napi_skb_cache_get_bulk().
Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/20251016182911.1132792-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
convert net drivers to ndo_hwtstamp API part 1
This is part 1 of patchset to convert drivers which support HW
timestamping to use .ndo_hwtstamp_get()/.ndo_hwtstamp_set() callbacks.
The new API uses netlink to communicate with user-space and have some
test coverage. Part 2 will contain another 6 patches from v1 of the
series.
There are some drivers left with old ioctl interface after this series:
- mlx5 driver be shortly converted by nVidia folks
- TI netcp ethss driver which needs separate series which I'll post
after this one.
====================
Vadim Fedorenko [Thu, 16 Oct 2025 15:25:14 +0000 (15:25 +0000)]
tsnep: convert to ndo_hwtstatmp API
Convert to .ndo_hwtstamp_get()/.ndo_hwtstamp_set() callbacks.
After conversions the rest of tsnep_netdev_ioctl() becomes pure
phy_do_ioctl_running(), so remove tsnep_netdev_ioctl() and replace
it with phy_do_ioctl_running() in .ndo_eth_ioctl.
Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com> Link: https://patch.msgid.link/20251016152515.3510991-7-vadim.fedorenko@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vadim Fedorenko [Thu, 16 Oct 2025 15:25:13 +0000 (15:25 +0000)]
cxgb4: convert to ndo_hwtstamp API
Convert to use .ndo_hwtstamp_get()/.ndo_hwtstamp_set() callbacks.
There is some change in the logic as well. Previously, the driver was
storing newly requested configuration regardless of whether it was
applied or not. In case of request validation error, inconsistent
configuration would be returned by the driver. New logic stores
configuration only if was properly validated and applied. It brings the
consistency between reported and actual configuration.
Vadim Fedorenko [Thu, 16 Oct 2025 15:25:12 +0000 (15:25 +0000)]
net: atlantic: convert to ndo_hwtstamp API
Convert driver to .ndo_hwtstamp_get()/.ndo_hwtstamp_set() callbacks.
.ndo_eth_ioctl() becomes empty so remove it. Also simplify code with no
functional changes.
Vadim Fedorenko [Thu, 16 Oct 2025 15:25:10 +0000 (15:25 +0000)]
ti: icssg: convert to ndo_hwtstamp API
Convert driver to use .ndo_hwtstamp_get()/.ndo_hwtstamp_set() API.
.ndo_eth_ioctl() implementation becomes pure phy_do_ioctl(), remove
it from common module, remove exported symbol and replace ndo callback.
This series is radical - it takes the brave step of ripping out much of
the existing PCS support code and throwing it all away.
I have discussed the introduction of the STMMAC_FLAG_HAS_INTEGRATED_PCS
flag with Bartosz Golaszewski, and the conclusion I came to is that
this is to workaround the breakage that I've been going on about
concerning the phylink conversion for the last five or six years.
The problem is that the stmmac PCS code manipulates the netif carrier
state, which confuses phylink.
There is a way of testing this out on the Jetson Xavier NX platform as
the "PCS" code paths can be exercised while in RGMII mode - because
RGMII also has in-band status and the status register is shared with
SGMII. Testing this out confirms my long held theory: the interrupt
handler manipulates the netif carrier state before phylink gets a
look-in, which means that the mac_link_up() and mac_link_down() methods
are never called, resulting in the device being non-functional.
Moreover, on dwmac4 cores, ethtool reports incorrect information -
despite having a full-duplex link, ethtool reports that it is
half-dupex.
Thus, this code is completely broken - anyone using it will not have
a functional platform, and thus it doesn't deserve to live any longer,
especially as it's a thorn in phylink.
Rip all this out, leaving just the bare bones initialisation in place.
However, this is not the last of what's broken. We have this hw->ps
integer which is really not descriptive, and the DT property from
which it comes from does little to help understand what's going on.
Putting all the clues together:
- early configuration of the GMAC configuration register for the
speed.
- setting the SGMII rate adapter layer to take its speed from the
GMAC configuration register.
Lastly, setting the transmit enable (TE) bit, which is a typo that puts
the nail in the coffin of this code. It should be the transmit
configuration (TC) bit. Given that when the link comes up, phylink
will call mac_link_up() which will overwrite the speed in the GMAC
configuration register, the only part of this that is functional is
changing where the SGMII rate adapter layer gets its speed from,
which is a boolean.
From what I've found so far, everyone who sets the snps,ps-speed
property which configures this mode also configures a fixed link,
so the pre-configuration is unnecessary - the link will come up
anyway.
So, this series rips that out the preconfiguration as well, and
replaces hw->ps with a boolean hw->reverse_sgmii_enable flag.
We then move the sole PCS configuration into a phylink_pcs instance,
which configures the PCS control register in the same way as is done
during the probe function.
Thus, we end up with much easier and simpler conversion to phylink PCS
than previous attempts.
Even so, this still results in inband mode always being enabled at
the moment in the new .pcs_config() method to reflect what the probe
function was doing. The next stage will be to change that to allow
phylink to correctly configure the PCS. This needs fixing to allow
platform glue maintainers who are currently blocked to progress.
Please note, however, that this has not been tested with any SGMII
platform.
I've tried to get as many people into the Cc list with get_maintainers,
I hope that's sufficient to get enough eyeballs on this.
Changes since RFC:
- new patch (7) to remove RGMII "pcs" mode
- new patch (8) to move reverse "pcs" mode to stmmac_check_pcs_mode()
- new patch (9) to simplify the code moved in the previous patch
- new patch (10) to rename the confusing hw->ps to something more
understandable.
- new patch (11) to shut up inappropriate complaints about
"snps,ps-speed" being invalid.
- new patch (13) to add a MAC .pcs_init method, which will only be
called when core has PCS present.
- modify patch 14 to use this new pcs_init method.
Despite getting a couple of responses to the RFC series posted in
September, I have had nothing testing this on hardware. I have tested
this on the Jetson Xavier NX, which included trial runs with enabling
the RGMII "pcs" mode, hence the new patches that rip out this mode. I
have come to the conclusion that the only way to get stmmac changes
tested is to get them merged into net-next, thereby forcing people to
have to run with them... and we'll deal with any fallout later.
====================
Now that stmmac's PCS support is much more simple - just a matter of
configuring the control register - the basic conversion to phylink PCS
support becomes straight forward.
Create the infrastructure to setup a phylink_pcs instance for the
integrated PCS:
- add a struct stmmac_pcs to encapsulate the phylink_pcs structure,
pointer to stmmac_priv, and the core-specific base address of the PCS
registers.
- modify stmmac_priv and stmmac_mac_select_pcs() to return the embedded
phylink_pcs structure when setup and STMMAC_PCS_SGMII is in use, and
move the comment from stmmac_hw_setup() to here.
- create stmmac_pcs.c, which contains the phylink_pcs_ops structure, a
dummy .pcs_get_state() method which always reports link-down, and
.pcs_config() method, moving the call to stmmac_pcs_ctrl_ane() here,
but without indirecting through the dwmac specific core code. The
link-down behaviour mentioned above maintains the current behaviour
when phylink is used with inband but without a PCS.
This will ensure that the PCS control register is configured to the
same settings as before, but will now happen when the netdev is opened
or reusmed rather than only during probe time. However, this will be
before the .fix_mac_speed() method is called, which is critical for the
behaviour in dwmac-qcom-ethqos's ethqos_configure_sgmii() function to
be maintained.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Link: https://patch.msgid.link/E1v9P72-0000000AomR-3ro4@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
dwmac cores provide a feature bit to indicate when the PCS block is
present, but features are only read after the core's setup() function
has been called, meaning we can't decide whether to initialise the
integrated PCS in the setup function. Provide a new MAC core hook
for PCS initialisation, which will be called after the feature
registers have been read.
Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Link: https://patch.msgid.link/E1v9P6x-0000000AomL-3OKd@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: stmmac: only call stmmac_pcs_ctrl_ane() for integrated SGMII PCS
The internal PCS registers only exist if the core is synthesized with
SGMII, TBI or RTBI support. They have no relevance for RGMII.
However, priv->hw->pcs contains a STMMAC_PCS_RGMII flag, which is set
if a PCS has been synthesized but we are operating in RGMII mode. As
the register has no effect for RGMII, there is no point calling
stmmac_pcs_ctrl_ane() in this case. Add a comment describing this
and make it conditional on STMMAC_PCS_SGMII.
Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Link: https://patch.msgid.link/E1v9P6s-0000000AomE-2pAa@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: stmmac: do not require snps,ps-speed for SGMII
SGMII mode does not require port-speed to be specified; this only
switches SGMII to use the MAC configuration register speed settings
and the actual value is irrelevant when the link comes up.
As it seems the intention was to support "reverse SGMII" with this
setting, but the code didn't actually configure that due to a typo,
the warning and bad DT binding documentation has led people to
specify snps,ps-speed in their DT files inappropriately.
If mac_port_sel_speed is zero, then don't complain that the speed
is invalid, as this means we're using "normal" SGMII.
This does _not_ obsolete snps,ps-speed, nor does it change the
behaviour of that property, with the exception of not making people
mistakenly believe that they need to specify this option to use
normal SGMII. There is no need to modify the binding.
Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Link: https://patch.msgid.link/E1v9P6n-0000000Aom9-2LuZ@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After a lot of digging, it seems that the oddly named hw->ps member
is all about setting the core into reverse SGMII speed. When set to
a non-zero value, it:
1. Configures the MAC at initialisation time to operate at a specific
speed.
2. It _incorrectly_ enables the transmitter (GMAC_CONFIG_TE) which
makes no sense, rather than enabling the "transmit configuration"
bit (GMAC_CONFIG_TC).
3. It configures the SGMII rate adapter layer to retrieve its speed
setting from the MAC configuration register rather than the PHY.
In the previous commit, we removed (1) and (2) as phylink overwrites
the configuration set at that step.
Thus, the only functional aspect is (3), which is a boolean operation.
This means there is no need to store the actual speed, and just have a
boolean flag.
Convert the priv->ps member to a boolean, and rename it to
priv->reverse_sgmii_enable to make it more understandable.
Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Link: https://patch.msgid.link/E1v9P6i-0000000Aom3-1y2y@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that we only support one mode, simplify stmmac_check_pcs_mode().
Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Link: https://patch.msgid.link/E1v9P6d-0000000Aolw-1T7d@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: stmmac: move reverse-"pcs" mode setup to stmmac_check_pcs_mode()
The broken reverse-mode, selected by snps,ps-speed, is configured when
the platform provides a valid port speed and a PCS is being used.
Both these remain constant after the driver has probed, so the software
state doesn't need to be re-initialised each time stmmac_hw_setup() is
called (which is called at open and resume time.)
Move the software setup of reverse-mode to stmmac_check_pcs_mode()
which is called from the driver probe function.
Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Link: https://patch.msgid.link/E1v9P6Y-0000000Aolr-0vLH@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Remove the RGMII "pcs" code in stmmac_check_pcs_mode() due to:
1) This should never have been conditional on a PCS being present, as
when a core is synthesised using only RGMII, the PCS won't be present
and priv->dma_cap.pcs will be false. Only multi-interface cores which
have a PCS present would have detected RGMII.
2) STMMAC_PCS_RGMII has no effect since the broken netif_carrier and
ethtool code was removed.
Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Link: https://patch.msgid.link/E1v9P6T-0000000Aoll-0Ify@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After a lot of digging, it seems that the oddly named hw->ps member is
all about configuring the core for reverse SGMII. This member is set to
one of 0, SPEED_10, SPEED_100 or SPEED_1000 depending on
priv->plat->mac_port_sel_speed. On DT systems, this comes from the
"snps,ps-speed" DT property.
When set to a non-zero value, it:
1. Configures the MAC at initialisation time to operate at a specific
speed. However, this will be overwritten by mac_link_up() when the
link comes up (e.g. with the fixed-link parameters.)
Note that dwxgmac2 wants to also support SPEED_2500 and SPEED_10000,
but both these values are impossible.
2. It _incorrectly_ enables the transmitter (GMAC_CONFIG_TE) which
makes no sense, rather than enabling the "transmit configuration"
bit (GMAC_CONFIG_TC). Likely a typo.
3. It configures the SGMII rate adapter layer to retrieve its speed
setting from the MAC configuration register rather than the PHY.
There are two ways forward here:
a) fixing (2) so that we set GMAC_CONFIG_TC. However, we have platform
that set the "snps,ps-speed" property and that work today. Fixing
this will cause the RGMII, SGMII or SMII inband configuration to be
transmitted, which will be a functional change which could cause a
regression.
b) ripping out (1) and (2) as they are ineffective. This also has the
possibility of regressions, but the patch author believes this risk
is much lower than (a).
Therefore, this commit takes the approach in (b).
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Link: https://patch.msgid.link/E1v9P6N-0000000Aolg-3y0a@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Nothing calls stmmac_pcs_ctrl_ane() with the "loopback" argument set to
anything except zero, so this serves no useful purpose. Remove the
argument to reduce the code complexity.
Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Link: https://patch.msgid.link/E1v9P6I-0000000Aola-3Sih@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Remove the "we always autoneg pause" forcing when the stmmac driver
decides that a "PCS" is present, which blocks passing the ethtool
pause calls to phylink when using SGMII mode.
This prevents the pause results being reported when a PHY is attached
using SGMII mode, or the pause settings being changed in SGMII mode.
There is no reason to prevent this.
Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Link: https://patch.msgid.link/E1v9P6D-0000000AolU-2zjv@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that the only use for the interrupt is to clear it and increment a
statistic counter (which is not that relevant anymore) remove all this
code and ensure that the interrupt remains disabled to avoid a stuck
interrupt.
dwmac-sun8i still uses this statistic counter, so it is inappropriate
for this patch to remove it.
Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Link: https://patch.msgid.link/E1v9P68-0000000AolO-2W5s@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As a result of the previous commit, the pcs_link, pcs_duplex and
pcs_speed members are not used outside of the interrupt handling code,
and are only used to print their status using the misleading "Link is"
messages that bear no relation to the actual status of the link.
Remove the printing of these messages, these members, and the code
that decodes them from the hardware.
Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Link: https://patch.msgid.link/E1v9P63-0000000AolI-23Kf@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Changing the netif_carrier_*() state behind phylink's back has always
been prohibited because it messes up with phylinks state tracking, and
means that phylink no longer guarantees to call the mac_link_down()
and mac_link_up() methods at the appropriate times. This was later
documented in the sfp-phylink network driver conversion guide.
stmmac was converted to phylink in 2019, but nothing was done with the
"PCS" code. Since then, apart from the updates as part of phylink
development, nothing has happened with stmmac to improve its use of
phylink, or even to address this point.
A couple of years ago, a has_integrated_pcs boolean was added by Bart,
which later became the STMMAC_FLAG_HAS_INTEGRATED_PCS flag, to avoid
manipulating the netif_carrier_*() state. This flag is mis-named,
because whenever the stmmac is synthesized for its native SGMII, TBI
or RTBI interfaces, it has an "integrated PCS". This boolean/flag
actually means "ignore the status from the integrated PCS".
Discussing with Bart, the reasons for this are lost to the winds of
time (which is why we should always document the reasons in the commit
message.)
RGMII also has in-band status, and the dwmac cores and stmmac code
supports this but with one bug that saves the day.
When dwmac cores are synthesised for RGMII only, they do not contain
an integrated PCS, and so priv->dma_cap.pcs is clear, which prevents
(incorrectly) the "RGMII PCS" being used, meaning we don't read the
in-band status. However, a core synthesised for RGMII and also SGMII,
TBI or RTBI will have this capability bit set, thus making these
code paths reachable.
The Jetson Xavier NX uses RGMII mode to talk to its PHY, and removing
the incorrect check for priv->dma_cap.pcs reveals the theortical issue
with netif_carrier_*() manipulation is real:
dwc-eth-dwmac 2490000.ethernet eth0: Register MEM_TYPE_PAGE_POOL RxQ-0
dwc-eth-dwmac 2490000.ethernet eth0: PHY [stmmac-0:00] driver [RTL8211F Gigabit Ethernet] (irq=141)
dwc-eth-dwmac 2490000.ethernet eth0: No Safety Features support found
dwc-eth-dwmac 2490000.ethernet eth0: IEEE 1588-2008 Advanced Timestamp supported
dwc-eth-dwmac 2490000.ethernet eth0: registered PTP clock
dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii-id link mode
8021q: adding VLAN 0 to HW filter on device eth0
dwc-eth-dwmac 2490000.ethernet eth0: Adding VLAN ID 0 is not supported
Link is Up - 1000/Full
Link is Down
Link is Up - 1000/Full
This looks good until one realises that the phylink "Link" status
messages are missing, even when the RJ45 cable is reconnected. Nothing
one can do results in the interface working. The interrupt handler
(which prints those "Link is" messages) always wins over phylink's
resolve worker, meaning phylink never calls the mac_link_up() nor
mac_link_down() methods.
eth0 also sees no traffic received, and is unable to obtain a DHCP
address:
3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group defa
ult qlen 1000
link/ether e6:d3:6a:e6:92:de brd ff:ff:ff:ff:ff:ff
RX: bytes packets errors dropped overrun mcast
0 0 0 0 0 0
TX: bytes packets errors dropped carrier collsns
27686 149 0 0 0 0
With the STMMAC_FLAG_HAS_INTEGRATED_PCS flag set, which disables the
netif_carrier_*() manipulation then stmmac works normally:
dwc-eth-dwmac 2490000.ethernet eth0: Register MEM_TYPE_PAGE_POOL RxQ-0
dwc-eth-dwmac 2490000.ethernet eth0: PHY [stmmac-0:00] driver [RTL8211F Gigabit Ethernet] (irq=141)
dwc-eth-dwmac 2490000.ethernet eth0: No Safety Features support found
dwc-eth-dwmac 2490000.ethernet eth0: IEEE 1588-2008 Advanced Timestamp supported
dwc-eth-dwmac 2490000.ethernet eth0: registered PTP clock
dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii-id link mode
8021q: adding VLAN 0 to HW filter on device eth0
dwc-eth-dwmac 2490000.ethernet eth0: Adding VLAN ID 0 is not supported
Link is Up - 1000/Full
dwc-eth-dwmac 2490000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
and packets can be transferred.
This clearly shows that when priv->hw->pcs is set, but
STMMAC_FLAG_HAS_INTEGRATED_PCS is clear, the driver reliably fails.
Discovering whether a platform falls into this is impossible as
parsing all the dtsi and dts files to find out which use the stmmac
driver, whether any of them use RGMII or SGMII and also depends
whether an external interface is being used. The kernel likely
doesn't contain all dts files either.
The only driver that sets this flag uses the qcom,sa8775p-ethqos
compatible, and uses SGMII or 2500BASE-X.
but these are saved from this problem by the incorrect check for
priv->dma_cap.pcs.
So, we have to assume that for every other platform that uses SGMII
with stmmac is using an external PCS.
Moreover, ethtool output can be incorrect. With the full-duplex link
negotiated, ethtool reports:
Speed: 1000Mb/s
Duplex: Half
because with dwmac4, the full-duplex bit is in bit 16 of the status,
priv->xstats.pcs_duplex becomes BIT(16) for full duplex, but the
ethtool ksettings duplex member is u8 - so becomes zero. Moreover,
the supported, advertised and link partner modes are all "not
reported".
Finally, ksettings_set() won't be able to set the advertisement on
a PHY if this PCS code is activated, which is incorrect when SGMII
is used with a PHY.
Thus, remove:
1. the incorrect netif_carrier_*() manipulation.
2. the broken ethtool ksettings code.
Given that all uses of STMMAC_FLAG_HAS_INTEGRATED_PCS are now gone,
remove the flag from stmmac.h and dwmac-qcom-ethqos.c.
Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Link: https://patch.msgid.link/E1v9P5y-0000000AolC-1QWH@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Randy Dunlap [Thu, 16 Oct 2025 03:59:17 +0000 (20:59 -0700)]
nl802154: fix some kernel-doc warnings
Correct multiple kernel-doc warnings in nl802154.h:
- Fix a typo on one enum name to avoid a kernel-doc warning.
- Drop 2 enum descriptions that are no longer needed.
- Mark 2 internal enums as "private:" so that kernel-doc is not needed
for them.
Warning: nl802154.h:239 Enum value 'NL802154_CAP_ATTR_MAX_MAXBE' not described in enum 'nl802154_wpan_phy_capability_attr'
Warning: nl802154.h:239 Excess enum value '%NL802154_CAP_ATTR_MIN_CCA_ED_LEVEL' description in 'nl802154_wpan_phy_capability_attr'
Warning: nl802154.h:239 Excess enum value '%NL802154_CAP_ATTR_MAX_CCA_ED_LEVEL' description in 'nl802154_wpan_phy_capability_attr'
Warning: nl802154.h:369 Enum value '__NL802154_CCA_OPT_ATTR_AFTER_LAST' not described in enum 'nl802154_cca_opts'
Warning: nl802154.h:369 Enum value 'NL802154_CCA_OPT_ATTR_MAX' not described in enum 'nl802154_cca_opts'
Eric Dumazet [Fri, 17 Oct 2025 14:53:34 +0000 (14:53 +0000)]
net: add a fast path in __netif_schedule()
Cpus serving NIC interrupts and specifically TX completions are often
trapped in also restarting a busy qdisc (because qdisc was stopped by BQL
or the driver's own flow control).
When they call netdev_tx_completed_queue() or netif_tx_wake_queue(),
they call __netif_schedule() so that the queue can be run
later from net_tx_action() (involving NET_TX_SOFTIRQ)
Quite often, by the time the cpu reaches net_tx_action(), another cpu
grabbed the qdisc spinlock from __dev_xmit_skb(), and we spend too much
time spinning on this lock.
We can detect in __netif_schedule() if a cpu is already at a specific
point in __dev_xmit_skb() where we have the guarantee the queue will
be run.
This patch gives a 13 % increase of throughput on an IDPF NIC (200Gbit),
32 TX qeues, sending UDP packets of 120 bytes.
This also helps __qdisc_run() to not force a NET_TX_SOFTIRQ
if another thread is waiting in __dev_xmit_skb()
Before:
sar -n DEV 5 5|grep eth1|grep Average
Average: eth1 1496.44 52191462.56 210.00 13369396.90 0.00 0.00 0.00 54.76
After:
sar -n DEV 5 5|grep eth1|grep Average
Average: eth1 1457.88 59363099.96 205.08 15206384.35 0.00 0.00 0.00 62.29
====================
net: dsa: lantiq_gswip: clean up and improve VLAN handling
This series was developed by Vladimir Oltean to improve and clean up the
VLAN handling logic in the Lantiq GSWIP DSA driver.
As Vladimir currently doesn't have the availability to take care of the
submission process, we agreed that I would send the patches on his
behalf.
The series focuses on consolidating the VLAN management paths for both
VLAN-unaware and VLAN-aware bridges, simplifying internal logic, and
removing legacy or redundant code. It also fixes a number of subtle
inconsistencies regarding VLAN ID 0 handling, bridge FDB entries, and
brings the driver into shape to permit dynamic changes to the VLAN
filtering state.
Notable changes include:
- Support for bridge FDB entries on the CPU port
- Consolidation of gswip_vlan_add_unaware() and gswip_vlan_add_aware()
into a unified implementation
- Removal of legacy VLAN configuration options and redundant
assignments
- Improved handling of VLAN ID 0 and PVID behavior
- Better validation and error reporting in VLAN removal paths
- Support for dynamic VLAN filtering configuration changes
Overall, this refactor improves readability and maintainability of the
Lantiq GSWIP DSA driver. It also results in all local-termination.sh
tests now passing, and slightly improves the results of
bridge-vlan-{un,}aware.sh.
All patches have been authored by Vladimir Oltean; a small unintended
functional change in patch "net: dsa: lantiq_gswip: merge
gswip_vlan_add_unaware() and gswip_vlan_add_aware()" has been ironed out
and some of the commit descriptions were improved by me, apart from that
I'm only handling the submission and will help with follow-up
discussions or review feedback as needed.
Despite the fact that some changes here do actually fix things (in the
sense that selftests which would previously FAIL now PASS) we decided
that it would be the best for this series of patches to go via net-next.
If requested some of it can still be ported to stable kernels later on.
====================
Vladimir Oltean [Wed, 15 Oct 2025 22:34:01 +0000 (23:34 +0100)]
net: dsa: lantiq_gswip: treat VID 0 like the PVID
Documentation/networking/switchdev.rst says that VLAN-aware bridges must
treat packets tagged with VID 0 the same as untagged. It appears from
the documentation that setting the GSWIP_PCE_VCTRL_VID0 flag (which this
driver already had defined) might achieve this.
Vladimir Oltean [Wed, 15 Oct 2025 22:33:50 +0000 (23:33 +0100)]
net: dsa: lantiq_gswip: drop untagged on VLAN-aware bridge ports with no PVID
Implement the required functionality, as written in
Documentation/networking/switchdev.rst section "Bridge VLAN filtering",
by using the "VLAN Ingress Tag Rule" feature of the switch.
The bit field definitions for this were found while browsing the Intel
dual BSD/GPLv2 licensed drivers for this switch IP.
Vladimir Oltean [Wed, 15 Oct 2025 22:33:33 +0000 (23:33 +0100)]
net: dsa: lantiq_gswip: remove vlan_aware and pvid arguments from gswip_vlan_remove()
"bool pvid" is unused since commit "net: dsa: lantiq_gswip: remove
legacy configure_vlan_while_not_filtering option".
"bool vlan_aware" shouldn't have a role in finding the bridge VLAN.
It should be identified by VID regardless of VLAN-aware or VLAN-unaware.
The driver sets up VID 0 for the VLAN-unaware PVID.
Vladimir Oltean [Wed, 15 Oct 2025 22:33:25 +0000 (23:33 +0100)]
net: dsa: lantiq_gswip: disallow changes to privately set up VID 0
User space can force the altering of VID 0 as it was privately set up by
this driver.
For example, when the port joins a VLAN-aware bridge,
dsa_user_manage_vlan_filtering() will set NETIF_F_HW_VLAN_CTAG_FILTER.
If the port is subsequently brought up and CONFIG_VLAN_8021Q is enabled,
the vlan_vid0_add() function will want to make sure we are capable of
accepting packets tagged with VID 0.
Generally, DSA/switchdev drivers want to suppress that bit of help from
the 8021q layer, and handle VID 0 filters themselves. The 8021q layer
might actually be even detrimential, because VLANs added through
vlan_vid_add() pass through dsa_user_vlan_rx_add_vid(), which is
documented as this:
/* This API only allows programming tagged, non-PVID VIDs */
.flags = 0,
so it will force VID 0 to be reconfigured as egress-tagged, non-PVID.
Whereas the driver configures it as PVID and egress-untagged, the exact
opposite.
Vladimir Oltean [Wed, 15 Oct 2025 22:32:58 +0000 (23:32 +0100)]
net: dsa: lantiq_gswip: permit dynamic changes to VLAN filtering state
The driver should now tolerate these changes, now that the PVID is
automatically recalculated on a VLAN awareness state change.
The VLAN-unaware PVID must be installed to hardware even if the
joined bridge is currently VLAN-aware. Otherwise, when the bridge VLAN
filtering state dynamically changes to VLAN-unaware later, this PVID
will be missing.
This driver doesn't support dynamic VLAN filtering changes, for simplicity.
It expects that on a port, either gswip_vlan_add_unaware() or
gswip_vlan_add_aware() is called, but not both.
When !br_vlan_enabled(), the configure_vlan_while_not_filtering = false
option is exactly what will prevent calls to gswip_port_vlan_add() from
being issued by DSA.
In fact, at the time these features were submitted:
https://patchwork.ozlabs.org/project/netdev/patch/20190501204506.21579-3-hauke@hauke-m.de/
"configure_vlan_while_not_filtering = false" did not even have a name,
it was implicit behaviour. It only became legacy in commit 54a0ed0df496
("net: dsa: provide an option for drivers to always receive bridge
VLANs").
Section "Bridge VLAN filtering" of Documentation/networking/switchdev.rst
describes the exact set of rules. Notably, the PVID of the port must
follow the VLAN awareness state of the bridge port. A VLAN-unaware
bridge port should not respond to the addition of a bridge VLAN with the
PVID flag. In fact, the pvid_change() test in
tools/testing/selftests/net/forwarding/bridge_vlan_unaware.sh tests
exactly this.
The lantiq_gswip driver indeed does not respond to the addition of PVID
VLANs while VLAN-unaware in the way described above, but only because of
configure_vlan_while_not_filtering. Our purpose here is to get rid of
configure_vlan_while_not_filtering, so we must add more complex logic
which follows the VLAN awareness state and walks through the Active VLAN
table entries, to find the index of the PVID register that should be
committed to hardware on each port.
As a side-effect of now having a proper implementation to assign the
PVID all the "VLAN upper: ..." tests of the local_termination.sh self-
tests which would previously all FAIL now all PASS (or XFAIL, but
that's ok).
Vladimir Oltean [Wed, 15 Oct 2025 22:32:41 +0000 (23:32 +0100)]
net: dsa: lantiq_gswip: merge gswip_vlan_add_unaware() and gswip_vlan_add_aware()
The two functions largely duplicate functionality. The differences
consist in:
- the "fid" passed to gswip_vlan_active_create(). The unaware variant
always passes -1, the aware variant passes fid = priv->vlans[i].fid,
where i is an index into priv->vlans[] for which priv->vlans[i].bridge
is equal to the given bridge.
- the "vid" is not passed to gswip_vlan_add_unaware(). It is implicitly
GSWIP_VLAN_UNAWARE_PVID (zero).
- The "untagged" is not passed to gswip_vlan_add_unaware(). It is
implicitly true. Also, the CPU port must not be a tag member of the
PVID used for VLAN-unaware bridging.
- The "pvid" is not passed to gswip_vlan_add_unaware(). It is implicitly
true.
- The GSWIP_PCE_DEFPVID(port) register is written by the aware variant
with an "idx", but with a hardcoded 0 by the unaware variant.
Merge the two functions into a single unified function without any
functional changes.
Vladimir Oltean [Wed, 15 Oct 2025 22:32:05 +0000 (23:32 +0100)]
net: dsa: lantiq_gswip: support bridge FDB entries on the CPU port
Currently, the driver takes the bridge from dsa_port_bridge_dev_get(),
which only works for user ports. This is why it has to ignore FDB
entries installed on the CPU port.
Commit c26933639b54 ("net: dsa: request drivers to perform FDB
isolation") introduced the possibility of getting the originating bridge
from the passed dsa_db argument, so let's do that instead. This way, we
can act on the local FDB entries coming from the bridge.
Note that we do not expect FDB events for the DSA_DB_PORT database,
because this driver doesn't fulfill the dsa_switch_supports_uc_filtering()
requirements. So we can just return -EOPNOTSUPP and expect it will never
be triggered.
We've added 6 non-merge commits during the last 1 day(s) which contain
a total of 18 files changed, 577 insertions(+), 38 deletions(-).
The main changes are:
1) Bypass the global per-protocol memory accounting either by setting
a netns sysctl or using bpf_setsockopt in a bpf program,
from Kuniyuki Iwashima.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
selftests/bpf: Add test for sk->sk_bypass_prot_mem.
bpf: Introduce SK_BPF_BYPASS_PROT_MEM.
bpf: Support bpf_setsockopt() for BPF_CGROUP_INET_SOCK_CREATE.
net: Introduce net.core.bypass_prot_mem sysctl.
net: Allow opt-out from global protocol memory accounting.
tcp: Save lock_sock() for memcg in inet_csk_accept().
====================
Eric Biggers [Tue, 14 Oct 2025 21:58:36 +0000 (14:58 -0700)]
tcp: Convert tcp-md5 to use MD5 library instead of crypto_ahash
Make tcp-md5 use the MD5 library API (added in 6.18) instead of the
crypto_ahash API. This is much simpler and also more efficient:
- The library API just operates on struct md5_ctx. Just allocate this
struct on the stack instead of using a pool of pre-allocated
crypto_ahash and ahash_request objects.
- The library API accepts standard pointers and doesn't require
scatterlists. So, for hashing the headers just use an on-stack buffer
instead of a pool of pre-allocated kmalloc'ed scratch buffers.
- The library API never fails. Therefore, checking for MD5 hashing
errors is no longer necessary. Update tcp_v4_md5_hash_skb(),
tcp_v6_md5_hash_skb(), tcp_v4_md5_hash_hdr(), tcp_v6_md5_hash_hdr(),
tcp_md5_hash_key(), tcp_sock_af_ops::calc_md5_hash, and
tcp_request_sock_ops::calc_md5_hash to return void instead of int.
- The library API provides direct access to the MD5 code, eliminating
unnecessary overhead such as indirect function calls and scatterlist
management. Microbenchmarks of tcp_v4_md5_hash_skb() on x86_64 show a
speedup from 7518 to 7041 cycles (6% fewer) with skb->len == 1440, or
from 1020 to 678 cycles (33% fewer) with skb->len == 140.
Since tcp_sigpool_hash_skb_data() can no longer be used, add a function
tcp_md5_hash_skb_data() which is specialized to MD5. Of course, to the
extent that this duplicates any code, it's well worth it.
To preserve the existing behavior of TCP-MD5 support being disabled when
the kernel is booted with "fips=1", make tcp_md5_do_add() check
fips_enabled itself. Previously it relied on the error from
crypto_alloc_ahash("md5") being bubbled up. I don't know for sure that
this is actually needed, but this preserves the existing behavior.
Tested with bidirectional TCP-MD5, both IPv4 and IPv6, between a kernel
that includes this commit and a kernel that doesn't include this commit.
(Side note: please don't use TCP-MD5! It's cryptographically weak. But
as long as Linux supports it, it might as well be implemented properly.)
The io_uring functions return negative error values, but error() expects
these to be positive to properly match them to an errno string. Fix this
to make sure the correct error descriptions are displayed upon failure.
Heiner Kallweit [Thu, 16 Oct 2025 19:25:28 +0000 (21:25 +0200)]
r8169: reconfigure rx unconditionally before chip reset when resuming
There's a good chance that more chip versions suffer from the same
hw issue. So let's reconfigure rx unconditionally before the chip reset
when resuming. This shouldn't have any side effect on unaffected chip
versions.
Florian Westphal [Thu, 16 Oct 2025 11:51:47 +0000 (13:51 +0200)]
net: Kconfig: discourage drop_monitor enablement
Quoting Eric Dumazet:
"I do not understand the fascination with net/core/drop_monitor.c [..]
misses all the features, flexibility, scalability that 'perf',
eBPF tracing, bpftrace, .... have today."
Reword DROP_MONITOR kconfig help text to clearly state that its not
related to perf-based drop monitoring and that its safe to disable
this unless support for the older netlink-based tools is needed.
Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251016115147.18503-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After replacing R/W locks with RCU in commit 3ab5aee7fe84 ("net: Convert
TCP & DCCP hash tables to use RCU / hlist_nulls"), a race window emerged
during the switch from reqsk/sk to sk/tw.
Now that both timewait sock (tw) and full sock (sk) reside on the same
ehash chain, it is appropriate to introduce hlist_nulls replace
operations, to eliminate the race conditions caused by this window.
Before this series of patches, I previously sent another version of the
patch, attempting to avoid the issue using a lock mechanism. However, it
seems there are some problems with that approach now, so I've switched to
the "replace" method in the current patches to resolve the issue.
For details, refer to:
https://lore.kernel.org/netdev/20250903024406.2418362-1-xuanqiang.luo@linux.dev/
Before I encountered this type of issue recently, I found there had been
several historical discussions about it. Therefore, I'm adding this
background information for those interested to reference:
1. https://lore.kernel.org/lkml/20230118015941.1313-1-kerneljasonxing@gmail.com/
2. https://lore.kernel.org/netdev/20230606064306.9192-1-duanmuquan@baidu.com/
====================
Xuanqiang Luo [Wed, 15 Oct 2025 02:02:36 +0000 (10:02 +0800)]
inet: Avoid ehash lookup race in inet_twsk_hashdance_schedule()
Since ehash lookups are lockless, if another CPU is converting sk to tw
concurrently, fetching the newly inserted tw with tw->tw_refcnt == 0 cause
lookup failure.
The call trace map is drawn as follows:
CPU 0 CPU 1
----- -----
inet_twsk_hashdance_schedule()
spin_lock()
inet_twsk_add_node_rcu(tw, ...)
__inet_lookup_established()
(find tw, failure due to tw_refcnt = 0)
__sk_nulls_del_node_init_rcu(sk)
refcount_set(&tw->tw_refcnt, 3)
spin_unlock()
By replacing sk with tw atomically via hlist_nulls_replace_init_rcu() after
setting tw_refcnt, we ensure that tw is either fully initialized or not
visible to other CPUs, eliminating the race.
It's worth noting that we held lock_sock() before the replacement, so
there's no need to check if sk is hashed. Thanks to Kuniyuki Iwashima!
Fixes: 3ab5aee7fe84 ("net: Convert TCP & DCCP hash tables to use RCU / hlist_nulls") Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev> Signed-off-by: Xuanqiang Luo <luoxuanqiang@kylinos.cn> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251015020236.431822-4-xuanqiang.luo@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Xuanqiang Luo [Wed, 15 Oct 2025 02:02:35 +0000 (10:02 +0800)]
inet: Avoid ehash lookup race in inet_ehash_insert()
Since ehash lookups are lockless, if one CPU performs a lookup while
another concurrently deletes and inserts (removing reqsk and inserting sk),
the lookup may fail to find the socket, an RST may be sent.
The call trace map is drawn as follows:
CPU 0 CPU 1
----- -----
inet_ehash_insert()
spin_lock()
sk_nulls_del_node_init_rcu(osk)
__inet_lookup_established()
(lookup failed)
__sk_nulls_add_node_rcu(sk, list)
spin_unlock()
As both deletion and insertion operate on the same ehash chain, this patch
introduces a new sk_nulls_replace_node_init_rcu() helper functions to
implement atomic replacement.
Fixes: 5e0724d027f0 ("tcp/dccp: fix hashdance race for passive sessions") Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev> Signed-off-by: Xuanqiang Luo <luoxuanqiang@kylinos.cn> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251015020236.431822-3-xuanqiang.luo@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Xuanqiang Luo [Wed, 15 Oct 2025 02:02:34 +0000 (10:02 +0800)]
rculist: Add hlist_nulls_replace_rcu() and hlist_nulls_replace_init_rcu()
Add two functions to atomically replace RCU-protected hlist_nulls entries.
Keep using WRITE_ONCE() to assign values to ->next and ->pprev, as
mentioned in the patch below:
commit efd04f8a8b45 ("rcu: Use WRITE_ONCE() for assignments to ->next for
rculist_nulls")
commit 860c8802ace1 ("rcu: Use WRITE_ONCE() for assignments to ->pprev for
hlist_nulls")
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Xuanqiang Luo <luoxuanqiang@kylinos.cn> Link: https://patch.msgid.link/20251015020236.431822-2-xuanqiang.luo@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
While working on the m_can driver, I created several cleanups commits, make
m_can_init_ram() static, rename hrtimer function, convert debugging and
error output to netdev_(), replace open coded register write by
m_can_write(), remove not needed error messages and sanity checks and don't
wake up hte controller during m_can_get_berr_counter() if the interface is
down.
can: m_can: hrtimer_callback(): rename to m_can_polling_timer()
The original use of struct m_can_classdev::hrtimer was to support polling
for devices without IRQ, with the timer function called hrtimer_callback().
Commit 07f25091ca02 ("can: m_can: Implement receive coalescing") uses the
hrtimer for software-supported IRQ coalescence, with the timer function
called m_can_coalescing_timer().
To improve the readability of the driver, rename hrtimer_callback() to
m_can_polling_timer(), which better describes the functionality.
Merge patch series "can: m_can: Add am62 wakeup support"
Markus Schneider-Pargmann (TI.com) <msp@baylibre.com> says:
This series adds support for wakeup capabilities to the m_can driver,
which is necessary for enabling Partial-IO functionality on am62, am62a,
and am62p SoCs. It implements the wake-on-lan interface for m_can
devices and handles the pinctrl states needed for wakeup functionality.
am62, am62a and am62p support Partial-IO, a low power system state in
which nearly everything is turned off except the pins of the CANUART
group. This group contains mcu_mcan0, mcu_mcan1, wkup_uart0 and
mcu_uart0 devices.
To support mcu_mcan0 and mcu_mcan1 wakeup for the mentioned SoCs, the
series introduces a notion of wake-on-lan for m_can. If the user decides
to enable wake-on-lan for a m_can device, the device is set to wakeup
enabled. A 'wakeup' pinctrl state is selected to enable wakeup flags for
the relevant pins. If wake-on-lan is disabled the default pinctrl is
selected.
Partial-IO Overview
------------------
Partial-IO is a low power system state in which nearly everything is
turned off except the pins of the CANUART group (mcu_mcan0, mcu_mcan1,
wkup_uart0 and mcu_uart0). These devices can trigger a wakeup of the
system on pin activity. Note that this does not resume the system as the
DDR is off as well. So this state can be considered a power-off state
with wakeup capabilities.
A documentation can also be found in section 6.2.4 in the TRM:
https://www.ti.com/lit/pdf/spruiv7
Implementation Details
----------------------
The complete Partial-IO feature requires three coordinated series, each
handling a different aspect of the implementation:
1. This series (m_can driver): Implements device-specific wakeup
functionality for m_can devices, allowing them to be set as wakeup
sources.
2. Devicetree series: Defines system states and wakeup sources in the
devicetree for am62, am62a and am62p.
https://gitlab.baylibre.com/msp8/linux/-/tree/topic/am62-dt-partialio/v6.17?ref_type=heads
3. TI-SCI firmware series: Implements the firmware interface to enter
Partial-IO mode when appropriate wakeup sources are enabled.
https://gitlab.baylibre.com/msp8/linux/-/tree/topic/tisci-partialio/v6.17?ref_type=heads
Devicetree Bindings
-------------------
The wakeup-source property is used with references to
system-idle-states. This depends on the dt-schema pull request that adds
bindings for system-idle-states and updates the binding for
wakeup-source:
https://github.com/devicetree-org/dt-schema/pull/150
This is merged now and upstream in dt-schema.
Testing
-------
A test branch is available here that includes all patches required to
test Partial-IO:
After enabling Wake-on-LAN the system can be powered off and will enter
the Partial-IO state in which it can be woken up by activity on the
specific pins:
ethtool -s can0 wol p
ethtool -s can1 wol p
poweroff
Changes in v10:
- Change dt-binding to be able to set pinctrl-names = "default", "wakeup";
- Fix wording in the dt-binging
- Fix mcan commit message to have correct naming of the SoC
- Change function name from m_can_class_setup_optional_pinctrl() to
m_can_class_parse_pinctrl()
Changes in v9:
- Update the binding to accept the sleep pinctrl state which is
already in use by other devicetrees
- Modify suspend/resume to not set the sleep state if wakeup is enabled
and a wakeup pinctrl state is present. If wakeup pinctrl is active
this should be kept enabled even after suspend
- Modify m_can_set_wol() to use pinctrl_pm_select_default_state() to
get rid of the manually managed default pinctrl.
Changes in v8:
- Rebase to v6.17-rc1
Changes in v7:
- Separate this series from "firmware: ti_sci: Partial-IO support"
again as was requested internally
- All DT changes are now in their own series to avoid conflicts
- wakeup-source definition in the m_can binding is now only an
extension to the dt-schema binding and a pull request was created
Changes in v6:
- Rebased to v6.13-rc1
- After feedback of the other Partial-IO series, I updated this series
and removed all use of regulator-related patches.
- wakeup-source is now not only a boolean property but can also be a
list of power states in which the device is wakeup capable.
Changes in v5:
- Make the check of wol options nicer to read
Changes in v4:
- Remove leftover testing code that always returned -EIO in a specific
- Redesign pincontrol setup to be easier understandable and less nested
- Fix missing parantheses around wol_enable expression
- Remove | from binding description
Changes in v3:
- Rebase to v6.12-rc1
- Change 'wakeup-source' to only 'true'
- Simplify m_can_set_wol by returning early on error
- Add vio-suuply binding and handling of this optional property.
vio-supply is used to reflect the SoC architecture and which power
line powers the m_can unit. This is important as some units are
powered in special low power modes.
Changes in v2:
- Rebase to v6.11-rc1
- Squash these two patches for the binding into one:
dt-bindings: can: m_can: Add wakeup-source property
dt-bindings: can: m_can: Add wakeup pinctrl state
- Add error handling to multiple patches of the m_can driver
- Add error handling in m_can_class_allocate_dev(). This also required
to add a new patch to return error pointers from
m_can_class_allocate_dev().
TI AM62x SoC requires a wakeup flag being set in pinctrl when mcan pins
act as a wakeup source. Add support to select the wakeup state if WOL
is enabled.
In some devices the pins of the m_can module can act as a wakeup source.
This patch helps do that by connecting the PHY_WAKE WoL option to
device_set_wakeup_enable. By marking this device as being wakeup
enabled, this setting can be used by platform code to decide which
sleep or poweroff mode to use.
Also this prepares the driver for the next patch in which the pinctrl
settings are changed depending on the desired wakeup source.
The pins associated with m_can have to have a special configuration to
be able to wakeup the SoC from some system states. This configuration is
described in the wakeup pinctrl state while the default state describes
the default configuration. Also add the sleep state which is already in
use by some devicetrees.
Also m_can can be a wakeup-source if capable of wakeup.
Vincent Mailhol [Fri, 3 Oct 2025 03:16:38 +0000 (12:16 +0900)]
can: treewide: remove can_change_mtu()
can_change_mtu() became obsolete by commit 23049938605b ("can: populate the
minimum and maximum MTU values"). Now that net_device->min_mtu and
net_device->max_mtu are populated, all the checks are already done by
dev_validate_mtu() in net/core/dev.c.
Remove the net_device_ops->ndo_change_mtu() callback of all the physical
interfaces, then remove can_change_mtu(). Only keep the vcan_change_mtu()
and vxcan_change_mtu() because the virtual interfaces use their own
different MTU logic.
The only functional change this patch introduces is that now the user will
be able to change the MTU even if the interface is up. This does not matter
for Classical CAN and CAN FD because their MTU range is composed of only
one value, respectively CAN_MTU and CANFD_MTU. For the upcoming CAN XL, the
MTU will be configurable within the CANXL_MIN_MTU to CANXL_MAX_MTU range at
any time, even if the interface is up. This is consistent with the other
net protocols and does not contradict ISO 11898-1:2024 as having a
modifiable MTU is a kernel extension.
This patch has been split from the original series [1].
In some SoCs (observed on the STM32MP15) the M_CAN IP core keeps the CAN
state and CAN error counters over an internal reset cycle. The STM32MP15
SoC provides an external reset, which is shared between both M_CAN cores.
Add support for an optional external reset. Take care of shared resets,
de-assert reset during the probe phase in m_can_class_register() and while
the interface is up, assert the reset otherwise.
Jakub Kicinski [Thu, 16 Oct 2025 23:59:31 +0000 (16:59 -0700)]
Merge branch 'net-macb-various-cleanups'
Théo Lebrun says:
====================
net: macb: various cleanups
Fix many oddities inside the MACB driver. They accumulated in my
work-in-progress branch while working on MACB/GEM EyeQ5 support.
Part of this series has been seen on the lkml in March then June.
See below for a semblance of a changelog.
The initial goal was to post them alongside EyeQ5 support, but that
makes for too big of a series. It'll come afterwards, with new
features (interrupt coalescing, ethtool .set_channels() and XDP mostly).
Théo Lebrun [Tue, 14 Oct 2025 15:25:14 +0000 (17:25 +0200)]
net: macb: drop `count` local variable in macb_tx_map()
Local variable `count` is useless: it counts number of DMA descriptors
used and returns it. But the return value is only checked for error.
Drop counting the number of DMA descriptors and return a usual
negative-if-error integer.
Théo Lebrun [Tue, 14 Oct 2025 15:25:12 +0000 (17:25 +0200)]
net: macb: replace min() with umin() calls
Whenever min(a, b) is used with a and b unsigned variables or literals,
`make W=2` complains. Change four min() calls into umin().
stderr extract (GCC 11.2.0, MIPS Codescape):
./include/linux/minmax.h:68:57: warning: comparison is always true due
to limited range of data type [-Wtype-limits]
68 | #define __is_nonneg(ux) statically_true((long long)(ux) >= 0)
| ^~
drivers/net/ethernet/cadence/macb_main.c:2299:26: note: in expansion of
macro ‘min’
2299 | hdrlen = min(skb_headlen(skb), bp->max_tx_length);
| ^~~
Théo Lebrun [Tue, 14 Oct 2025 15:25:11 +0000 (17:25 +0200)]
net: macb: remove bp->queue_mask
The low 16 bits of GEM_DCFG6 tell us which queues are enabled in HW. In
theory, there could be holes in the bitfield. In practice, the macb
driver would fail if there were holes as most loops iterate upon
bp->num_queues. Only macb_init() iterated correctly.
- Drop bp->queue_mask field.
- Error out at probe if a hole is in the queue mask.
- Rely upon bp->num_queues for iteration.
- As we drop the queue_mask probe local variable, fix RCT.
- Compute queue_mask on the fly for TAPRIO using bp->num_queues.
Théo Lebrun [Tue, 14 Oct 2025 15:25:10 +0000 (17:25 +0200)]
net: macb: introduce DMA descriptor helpers (is 64bit? is PTP?)
Introduce macb_dma64() and macb_dma_ptp() helper functions.
Many codepaths are made simpler by dropping conditional compilation.
This implies two additional changes:
- Always compile related structure definitions inside <macb.h>.
- MACB_EXT_DESC can be dropped as it is useless now.