Heiner Kallweit [Thu, 30 Oct 2025 21:44:35 +0000 (22:44 +0100)]
net: b44: register a fixed phy using fixed_phy_register_100fd if needed
In case of bcm47xx a fixed phy is used, which so far is created
by platform code, using fixed_phy_add(). This function has a number of
problems; therefore, create a potentially needed fixed phy here, using
fixed_phy_register_100fd.
Due to lack of hardware, this is compile-tested only.
Heiner Kallweit [Thu, 30 Oct 2025 21:42:30 +0000 (22:42 +0100)]
net: fec: register a fixed phy using fixed_phy_register_100fd if needed
In case of coldfire/5272 a fixed phy is used, which so far is created
by platform code, using fixed_phy_add(). This function has a number of
problems; therefore, create a potentially needed fixed phy here, using
fixed_phy_register_100fd.
Note 1: This includes a small functional change, as coldfire/5272
created a fixed phy in half-duplex mode. Likely this was by mistake,
because the fec MAC is 100FD-capable, and connection is to a switch.
Note 2: Usage of phy_find_next() makes use of the fact that dev_id can
only be 0 or 1.
Due to lack of hardware, this is compile-tested only.
Altera TSE cleanup to make sure everything is properly initialized
before registering the netdev.
When Altera TSE was converted to phylink, the PCS and phylink creation
were added after register_netdev(), which is wrong as this may race
with .ndo_open() once the netdev is registered.
This series makes sure that we register the netdev once all resources are
cleanly initialised, that includes PCS and phylink creation as well as a
few other operations such as reading the IP version.
No errors were found in the wild, so this series doesn't target net, but
given that it fixes some raciness, a case could be made for sending it
to net.
This series doesn't introduce functional changes, however the internal
mii_bus for PCS configuration is renamed.
net: altera-tse: Init PCS and phylink before registering netdev
register_netdev() must be done only once all resources are ready, as
they may be used in .ndo_open() immediately upon registration.
Move the lynx PCS and phylink initialisation before registering the
netdevice. We also remove the call to netif_carrier_off(), as phylink
takes care of that.
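The ordering rule can be sketched as a userspace model of a probe function. All function names below are hypothetical stand-ins, not the altera_tse code; the sketch only shows that registration comes last, so every resource .ndo_open() may touch already exists, and that a failure unwinds in reverse order.

```c
#include <stdbool.h>

/* Hypothetical probe steps, recorded in order so the sequence can be checked. */
enum step { STEP_PCS, STEP_PHYLINK, STEP_REGISTER, STEP_PHYLINK_DESTROY, STEP_PCS_DESTROY };

struct probe_log {
	enum step steps[8];
	int n;
	bool fail_phylink; /* simulate phylink creation failing */
};

static void record(struct probe_log *log, enum step s)
{
	log->steps[log->n++] = s;
}

static int create_pcs(struct probe_log *log)
{
	record(log, STEP_PCS);
	return 0;
}

static int create_phylink(struct probe_log *log)
{
	if (log->fail_phylink)
		return -1;
	record(log, STEP_PHYLINK);
	return 0;
}

static int register_netdev_model(struct probe_log *log)
{
	record(log, STEP_REGISTER);
	return 0;
}

static void destroy_phylink(struct probe_log *log) { record(log, STEP_PHYLINK_DESTROY); }
static void destroy_pcs(struct probe_log *log) { record(log, STEP_PCS_DESTROY); }

static int probe_model(struct probe_log *log)
{
	int ret;

	ret = create_pcs(log);
	if (ret)
		return ret;
	ret = create_phylink(log);
	if (ret)
		goto err_pcs;
	/* Last step: once registered, the netdev may be opened immediately. */
	ret = register_netdev_model(log);
	if (ret)
		goto err_phylink;
	return 0;

err_phylink:
	destroy_phylink(log);
err_pcs:
	destroy_pcs(log);
	return ret;
}
```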
net: altera-tse: Don't use netdev name for the PCS mdio bus
The PCS mdio bus must be created before registering the net_device. To
do that, we mustn't depend on the netdev name to create the mdio bus
name. Let's use the device's name instead.
Note that this changes the bus name in /sys/bus/mdiobus.
net: altera-tse: Warn on bad revision at probe time
Instead of reading the core revision at probe time and printing a warning
for an unexpected version at .ndo_open() time, let's print that
warning directly in .probe().
This allows getting rid of the "revision" private field, and also
prevents a potential race between reading the revision in .probe() after
netdev registration, and accessing that revision in .ndo_open().
By printing the warning after register_netdev(), we are sure that we
have a netdev name, and that we try to print the revision after having
read it from the internal registers.
Heiner Kallweit [Mon, 3 Nov 2025 22:26:49 +0000 (23:26 +0100)]
net: phy: make phy_device members pause and asym_pause bitfield bits
We can reduce the size of struct phy_device a little by switching
the type of members pause and asym_pause from int to a single bit.
As C99 is supported now, we can use type bool for the bitfield members,
which provides us with the benefit of the usual implicit bool conversions.
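The benefit of bool bitfields can be shown in a small sketch. The two structs below are hypothetical stand-ins for the relevant part of struct phy_device, and the register masks are arbitrary: assigning a masked register value to a bool member stores 1 for any non-zero result, whereas an "unsigned int x:1" member would keep only bit 0 of that result.

```c
#include <stdbool.h>

/* Hypothetical stand-ins: the old int members vs. the new bool bitfields. */
struct pause_as_int {
	int pause;
	int asym_pause;
};

struct pause_as_bits {
	bool pause:1;       /* C99 allows bool (_Bool) bitfields */
	bool asym_pause:1;
};

/*
 * Derive the flags from a register value. Because the members are bool,
 * any non-zero mask result is converted to 1 before being stored.
 */
static struct pause_as_bits decode_pause(unsigned int reg,
					 unsigned int pause_mask,
					 unsigned int asym_mask)
{
	struct pause_as_bits p = { false, false };

	p.pause = reg & pause_mask;
	p.asym_pause = reg & asym_mask;
	return p;
}
```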
====================
Add driver for 1Gbe network chips from MUCSE
This patch series adds support for MUCSE RNPGBE 1Gbps PCIe Ethernet controllers
(N500/N210 series), including build infrastructure, hardware initialization,
mailbox (MBX) communication with firmware, and basic netdev registration
(the MAC address obtained from firmware can be shown; TX/RX will be added later).
Series breakdown (5 patches):
01/05: net: ethernet/mucse: Add build support for rnpgbe
- Kconfig/Makefile for MUCSE vendor, basic PCI probe (no netdev)
02/05: net: ethernet/mucse: Add N500/N210 chip support
- netdev allocation, BAR mapping
03/05: net: ethernet/mucse: Add basic MBX ops for PF-FW communication
- base read/write, write with poll ack, poll and read data
04/05: net: ethernet/mucse: Add FW commands (sync, reset, MAC query)
- FW sync retry logic, MAC address retrieval, reset hw with
base mbx ops in patch4
05/05: net: ethernet/mucse: Complete netdev registration
- HW reset, MAC setup, netdev_ops registration
====================
Dong Yibo [Sat, 1 Nov 2025 01:38:49 +0000 (09:38 +0800)]
net: rnpgbe: Add register_netdev
Complete the network device (netdev) registration flow for Mucse Gbe
Ethernet chips, including:
1. Hardware state initialization:
- Send powerup notification to firmware (via echo_fw_status)
- Sync with firmware
- Reset hardware
2. MAC address handling:
- Retrieve permanent MAC from firmware (via mucse_mbx_get_macaddr)
- Fall back to a random valid MAC (eth_random_addr) if the MAC
from FW is not valid
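The MAC selection step can be sketched in userspace C. mac_is_valid() applies the same rule as the kernel's is_valid_ether_addr() (not multicast, not all-zero), and mac_random() does what eth_random_addr() does (random bytes, multicast bit cleared, locally administered bit set), with rand() standing in for the kernel's random source. pick_mac() and the other helper names are hypothetical, and the firmware query via mucse_mbx_get_macaddr is not modelled.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ETH_ALEN 6

/* Valid unicast address: multicast bit clear and not all-zero. */
static bool mac_is_valid(const uint8_t mac[ETH_ALEN])
{
	static const uint8_t zero[ETH_ALEN];

	return !(mac[0] & 0x01) && memcmp(mac, zero, ETH_ALEN) != 0;
}

/* Random unicast, locally administered address (always valid). */
static void mac_random(uint8_t mac[ETH_ALEN])
{
	for (int i = 0; i < ETH_ALEN; i++)
		mac[i] = (uint8_t)rand();
	mac[0] &= 0xfe; /* clear multicast bit */
	mac[0] |= 0x02; /* set locally administered bit */
}

/* Use the firmware MAC if valid; returns true when the fallback was taken. */
static bool pick_mac(const uint8_t fw_mac[ETH_ALEN], uint8_t out[ETH_ALEN])
{
	if (mac_is_valid(fw_mac)) {
		memcpy(out, fw_mac, ETH_ALEN);
		return false;
	}
	mac_random(out);
	return true;
}
```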
Dong Yibo [Sat, 1 Nov 2025 01:38:48 +0000 (09:38 +0800)]
net: rnpgbe: Add basic mbx_fw support
Add fundamental firmware (FW) communication operations via PF-FW
mailbox, including:
- FW sync (via HW info query with retries)
- HW reset (post FW command to reset hardware)
- MAC address retrieval (request FW for port-specific MAC)
- Power management (powerup/powerdown notification to FW)
Vadim Fedorenko [Mon, 3 Nov 2025 17:29:02 +0000 (17:29 +0000)]
ti: netcp: convert to ndo_hwtstamp callbacks
Convert TI NetCP driver to use ndo_hwtstamp_get()/ndo_hwtstamp_set()
callbacks. The logic is slightly changed, because I believe the original
logic was not really correct. The config reading path now uses the very
first module to get the configuration, instead of iterating over all of
them and keeping the last one, as the configuration is supposed to be
identical for all modules. The HW timestamp config set path now tries to
configure all modules, but in case of an error from one module it adds an
extack message. This way the configuration will be as synchronized as possible.
There are only 2 modules using netcp core infrastructure, and both use
the very same function to configure HW timestamping, so no actual
difference in behavior is expected.
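The get/set policy described above can be sketched as follows. The module struct, hook names and message format are hypothetical; snprintf() stands in for adding an extack message. Whether the real driver also returns the error is not stated in the commit, so returning the first error here is an assumption.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-module hook: 0 on success, negative errno on failure. */
struct ts_module {
	const char *name;
	int (*hwtstamp_set)(int rx_filter);
};

/* Demo hooks used to exercise the sketch. */
static int hook_ok(int rx_filter) { (void)rx_filter; return 0; }
static int hook_fail(int rx_filter) { (void)rx_filter; return -EINVAL; }

/* Try every module; on failure keep going, but remember the first error
 * and a message (standing in for an extack message). */
static int set_all_modules(const struct ts_module *mods, size_t n,
			   int rx_filter, char *msg, size_t msglen)
{
	int first_err = 0;

	for (size_t i = 0; i < n; i++) {
		int err = mods[i].hwtstamp_set(rx_filter);

		if (err && !first_err) {
			first_err = err;
			snprintf(msg, msglen, "module %s failed: %d",
				 mods[i].name, err);
		}
	}
	return first_err;
}

/* Configs are expected to be identical, so the first module is enough. */
static const struct ts_module *config_source(const struct ts_module *mods,
					     size_t n)
{
	return n ? &mods[0] : NULL;
}
```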
====================
convert drivers to use ndo_hwtstamp callbacks part 3 [part]
This patchset converts the rest of the ethernet drivers to use ndo
callbacks instead of ioctl to configure and report time stamping. The
drivers in part 3 originally implemented only the SIOCSHWTSTAMP command,
but are converted to also provide the configuration back to users.
====================
Vadim Fedorenko [Mon, 3 Nov 2025 15:09:51 +0000 (15:09 +0000)]
net: pch_gbe: convert to use ndo_hwtstamp callbacks
The driver implemented SIOCSHWTSTAMP ioctl command only, but it stores
configuration in the private data, so it is possible to report it back
to users. Implement both ndo_hwtstamp_set and ndo_hwtstamp_get
callbacks. To properly report RX filter type, store it in hwts_rx_en
instead of using this field as a simple flag. The logic doesn't change,
because the receive path uses this field as a boolean flag.
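Why storing the filter value is safe can be shown with a small model. Only FILTER_NONE = 0 mirrors the uapi (HWTSTAMP_FILTER_NONE is 0); the other enum value and all names are hypothetical placeholders. The get path can now report exactly what was configured, while a receive path that only tests for non-zero behaves as before.

```c
#include <stdbool.h>

/* FILTER_NONE mirrors HWTSTAMP_FILTER_NONE (0); the other value is a
 * placeholder for "some non-NONE filter". */
enum rx_filter_model { FILTER_NONE = 0, FILTER_SOME_PTP = 42 };

struct priv_model {
	int hwts_rx_en; /* now stores the filter value, not just 0/1 */
};

static void hwtstamp_set_model(struct priv_model *p, enum rx_filter_model f)
{
	p->hwts_rx_en = f;
}

/* The get path can report the configured filter back to users. */
static enum rx_filter_model hwtstamp_get_model(const struct priv_model *p)
{
	return (enum rx_filter_model)p->hwts_rx_en;
}

/* The receive path only tests for non-zero, so its behaviour is unchanged. */
static bool rx_timestamp_enabled(const struct priv_model *p)
{
	return p->hwts_rx_en != 0;
}
```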
Vadim Fedorenko [Mon, 3 Nov 2025 15:09:50 +0000 (15:09 +0000)]
net: thunderx: convert to use ndo_hwtstamp callbacks
The driver implemented SIOCSHWTSTAMP ioctl command only, but it also
stores configuration in private data, so it's possible to report it back
to users. Implement both ndo_hwtstamp_set and ndo_hwtstamp_get
callbacks.
Vadim Fedorenko [Mon, 3 Nov 2025 15:09:49 +0000 (15:09 +0000)]
net: octeon: mgmt: convert to use ndo_hwtstamp callbacks
The driver implemented SIOCSHWTSTAMP ioctl command only. But it stores
timestamping configuration, so it is possible to report it to users.
Implement both ndo_hwtstamp_set and ndo_hwtstamp_get callbacks. After
this the ndo_eth_ioctl effectively becomes phy_do_ioctl - adjust
callback accordingly.
Vadim Fedorenko [Mon, 3 Nov 2025 15:09:48 +0000 (15:09 +0000)]
net: liquidio_vf: convert to use ndo_hwtstamp callbacks
The driver implemented SIOCSHWTSTAMP ioctl command only, but there is a
way to get configuration back. Implement both ndo_hwtstamp_set and
ndo_hwtstamp_get callbacks.
Vadim Fedorenko [Mon, 3 Nov 2025 15:09:47 +0000 (15:09 +0000)]
net: liquidio: convert to use ndo_hwtstamp callbacks
The driver implemented SIOCSHWTSTAMP ioctl command only, but there is a
way to get configured status. Implement both ndo_hwtstamp_set and
ndo_hwtstamp_get callbacks.
Buday Csaba [Mon, 3 Nov 2025 08:13:42 +0000 (09:13 +0100)]
dt-bindings: net: ethernet-phy: clarify when compatible must specify PHY ID
Change PHY ID description in ethernet-phy.yaml to clarify that a
PHY ID is required (may -> must) when the PHY requires a special
initialization sequence.
====================
mptcp: pm: in-kernel: fullmesh endp nb + bind cases
Here is a small optimisation for the in-kernel PM, joined by a small
behavioural change to avoid confusions, and followed by a few more
tests.
- Patch 1: record the number of fullmesh endpoints, to avoid iterating
over all endpoints to check if one is marked as fullmesh.
- Patch 2: when at least one endpoint is marked as fullmesh, only use
these endpoints when reacting to an ADD_ADDR, even if there are no
endpoints for this IP family: this is less confusing.
- Patch 3: reduce duplicated code to prepare the next patch.
- Patch 4: extra "bind" cases: the listening socket restricts the bind to
one IP address, not allowing MP_JOIN to extra IP addresses, unless
another listening socket accepts them.
====================
By design, an MPTCP connection will not accept extra subflows where no
MPTCP listening sockets can accept such requests.
In other words, if the 'server' listens on a specific address / device,
it cannot accept MP_JOIN sent to a different address / device, unless
there is another MPTCP listening socket accepting them.
This is what the new tests are validating:
- Forcing a bind on the main v4/v6 address, and checking that MP_JOIN
requests to announced addresses are not accepted.
- Also forcing a bind on the main v4/v6 address, but with another
listening socket created beforehand to accept additional subflows. Note
that 'mptcpize run nc -l' -- or something else only doing:
socket(MPTCP), bind(<IP>), listen(0) -- would be enough, but here
mptcp_connect is reused to avoid depending on another tool just for that.
- Same as the previous one, but using v6 link-local addresses: this is
a bit particular because it is required to specify the outgoing
network interface when connecting to a link-local address announced
by the other peer. When using the routing rules, this doesn't work
(the outgoing interface is not known); but it does work with a
'laminar' endpoint having a specified interface.
Note that extra small modifications are needed for these tests to work:
- mptcp_connect's check_getpeername_connect() check should strip the
specified interface when comparing addresses.
- With IPv6 link-local addresses, it is required to wait for them to
be ready (no longer in 'tentative' mode) before using them, otherwise
the bind() will not be allowed.
mptcp: pm: in kernel: only use fullmesh endp if any
Our documentation says that the in-kernel PM only uses fullmesh
endpoints to establish subflows to announced addresses when at least one
endpoint has a fullmesh flag. But this was not totally correct: only
fullmesh endpoints were used if at least one endpoint *from the same
address family as the received ADD_ADDR* has the fullmesh flag.
This is confusing, and it seems clearer not to have differences
depending on the address family.
So, now, when at least one MPTCP endpoint has a fullmesh flag, the local
addresses are picked from all fullmesh endpoints, which might be 0 if
there are no endpoints for the correct address family.
One selftest needs to be adapted for this behaviour change.
Instead of iterating over all endpoints, under RCU read lock, just to
check if one of them has the fullmesh flag, we can keep a counter of
fullmesh endpoints, similar to what is done with the other flags.
This counter is now checked before iterating over all endpoints.
Similar to the other counters, this new one is also exposed. A userspace
app can then know when it is being used in a fullmesh mode, with
potentially (too) many subflows.
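The counter and the resulting selection rule from the two commits above can be sketched together. Everything here is a hypothetical model (struct layout, flag bit, function names); endpoint removal and locking are not modelled. The counter lets the common "no fullmesh endpoint" case skip the scan entirely, and once any endpoint is fullmesh, only fullmesh endpoints of the matching family are candidates, possibly none.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_ENDP 8
#define FLAG_FULLMESH 0x1u /* hypothetical flag bit */

struct endp {
	int family;          /* 4 or 6 */
	unsigned int flags;
};

struct pm_model {
	struct endp endps[MAX_ENDP];
	size_t n;
	unsigned int fullmesh_count; /* kept in sync on add and flag changes */
};

static bool endp_add(struct pm_model *pm, int family, unsigned int flags)
{
	if (pm->n == MAX_ENDP)
		return false;
	pm->endps[pm->n].family = family;
	pm->endps[pm->n].flags = flags;
	pm->n++;
	if (flags & FLAG_FULLMESH)
		pm->fullmesh_count++;
	return true;
}

static void endp_set_flags(struct pm_model *pm, size_t i, unsigned int flags)
{
	bool was = pm->endps[i].flags & FLAG_FULLMESH;
	bool now = flags & FLAG_FULLMESH;

	pm->endps[i].flags = flags;
	if (was && !now)
		pm->fullmesh_count--;
	else if (!was && now)
		pm->fullmesh_count++;
}

/*
 * Reacting to an ADD_ADDR of the given family. Returns -1 when no endpoint
 * is fullmesh (default selection applies, not modelled), otherwise the
 * number of fullmesh endpoints of that family, which may be 0.
 */
static int fullmesh_candidates(const struct pm_model *pm, int family)
{
	int count = 0;

	if (!pm->fullmesh_count) /* cheap check, no iteration needed */
		return -1;
	for (size_t i = 0; i < pm->n; i++)
		if ((pm->endps[i].flags & FLAG_FULLMESH) &&
		    pm->endps[i].family == family)
			count++;
	return count;
}
```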
====================
net/mlx5e: Reduce interface downtime on configuration change
This series significantly reduces the interface downtime while swapping
channels during a configuration change, on capable devices.
Here we remove an old requirement on operations ordering that became
obsolete on recent capable devices. This cuts the downtime
significantly, by ~80% in our example.
Perf numbers:
Measured the number of dropped packets in a simple ping flood test,
during a configuration change operation, that switches the number of
channels from 247 to 248.
Tariq Toukan [Thu, 30 Oct 2025 13:32:39 +0000 (15:32 +0200)]
net/mlx5e: Defer channels closure to reduce interface down time
Cap bit tis_tir_td_order=1 indicates that an old firmware requirement /
limitation no longer exists. When unset, the latency of several firmware
commands significantly increases in the presence of a high number of
co-existing channels (both old and new sets). Hence, we used to close
unneeded old channels before invoking those firmware commands.
Today, on capable devices, this is no longer the case. Minimize the
interface down time by deferring the closure of the old channels until
after the activation of the new ones.
Perf numbers:
Measured the number of dropped packets in a simple ping flood test,
during a configuration change operation, that switches the number of
channels from 247 to 248.
Before: 71 packets lost
After: 15 packets lost, ~80% saving.
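The reordering can be illustrated with a heavily simplified model. The step names are hypothetical and the real mlx5e flow has more steps; the model only shows that, with tis_tir_td_order=1, closing the old channels moves out of the window between deactivating the old set and activating the new one.

```c
#include <stdbool.h>

/* Hypothetical, simplified steps of a channel swap. */
enum swap_step { OPEN_NEW, DEACTIVATE_OLD, CLOSE_OLD, ACTIVATE_NEW };

struct swap_log {
	enum swap_step steps[4];
	int n;
};

static void log_step(struct swap_log *log, enum swap_step s)
{
	log->steps[log->n++] = s;
}

static void swap_channels(struct swap_log *log, bool td_order_cap)
{
	log_step(log, OPEN_NEW);
	log_step(log, DEACTIVATE_OLD);  /* traffic stops here */
	if (!td_order_cap)
		log_step(log, CLOSE_OLD);  /* old FW: close before activation */
	log_step(log, ACTIVATE_NEW);    /* traffic resumes here */
	if (td_order_cap)
		log_step(log, CLOSE_OLD);  /* capable FW: close afterwards */
}

/* Number of steps executed while no channel set is active. */
static int steps_while_down(const struct swap_log *log)
{
	int down = 0, inside = 0;

	for (int i = 0; i < log->n; i++) {
		if (log->steps[i] == DEACTIVATE_OLD) {
			inside = 1;
			continue;
		}
		if (log->steps[i] == ACTIVATE_NEW)
			break;
		down += inside;
	}
	return down;
}
```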
Tariq Toukan [Thu, 30 Oct 2025 13:32:37 +0000 (15:32 +0200)]
net/mlx5e: Do not re-apply TIR loopback configuration if not necessary
On old firmware (tis_tir_td_order=0), a TIR of a transport domain should
either be created after all SQs of the same domain, or TIR.self_lb_en
should be reapplied using MODIFY_TIR, for self loopback filtering to
function correctly.
This is no longer necessary on new FW (tis_tir_td_order=1), so
there's no need to call modify_tir operations after creating a new
set of SQs to keep self loopback prevention functional.
Skip these operations.
This saves O(max_num_channels) MODIFY_TIR firmware commands in
operations like interface up or channels configuration change.
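The O(max_num_channels) saving can be made concrete with a counting sketch. It assumes, for illustration only, one TIR to refresh per channel; the struct and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical firmware model counting MODIFY_TIR commands. */
struct fw_model {
	bool tis_tir_td_order;
	unsigned int modify_tir_cmds;
};

/* Called after a new set of SQs has been created. */
static void reapply_self_lb(struct fw_model *fw, unsigned int num_tirs)
{
	if (fw->tis_tir_td_order)
		return; /* new FW: self_lb_en set at TIR creation stays effective */
	for (unsigned int i = 0; i < num_tirs; i++)
		fw->modify_tir_cmds++; /* one MODIFY_TIR per TIR */
}
```

With 248 channels, the old-firmware path issues 248 commands on every such reconfiguration, and the new path issues none.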
Tariq Toukan [Thu, 30 Oct 2025 13:32:36 +0000 (15:32 +0200)]
net/mlx5: IPoIB, set self loopback prevention in TIR init
In IPoIB, applying the self loopback prevention configuration in the
activation stage has two roles: fulfilling a firmware requirement of old
firmware (tis_tir_td_order=0), and applying the proper configuration, as
it was not set in init.
Here we set the proper configuration in init, to allow skipping the
modify_tirs commands on new firmware in a downstream patch.
Tariq Toukan [Thu, 30 Oct 2025 13:32:33 +0000 (15:32 +0200)]
net/mlx5e: Enhance function structures for self loopback prevention application
The re-application of self loopback prevention attributes in TIRs is
necessary with old firmware (where the tis_tir_td_order cap is cleared) after
recreation of SQs.
However, this is not needed in new firmware with tis_tir_td_order=1.
As a preparation patch, enhance the function structures to differentiate
between explicitly applying the loopback prevention configuration and
the re-apply operation required by old firmware.
Loopback selftests should now call mlx5e_modify_tirs_lb() directly, as
their use case is not related to the firmware limitation.
Chu Guangqing [Mon, 3 Nov 2025 03:22:12 +0000 (11:22 +0800)]
xen/netfront: Comment Correction: Fix Spelling Error and Description of Queue Quantity Rules
The original comments contained spelling errors and incomplete logical
descriptions, which could easily lead to misunderstandings of the code
logic. The specific modifications are as follows:
Correct the spelling error by changing "inut max" to "but not exceed the
maximum limit";
Add the note "If the user has not specified a value, the default maximum
limit is 8" to clarify the default value logic;
Improve the coherence of the statement to make the queue quantity rules
clearer.
After the modification, the comments can accurately reflect the code
behavior of "taking the smaller value between the number of CPUs and the
default maximum limit of 8 for the number of queues", enhancing code
maintainability.
This series adds a callback for platform glue to configure the stmmac
core interface mode depending on the PHY interface mode that is being
used. This is currently only called just before the dwmac core is reset
since these signals are latched on reset.
Included in this series are changes to s32 to move its PHY_INTF_SEL_x
definitions out of the way of the dwmac core's signals, which have a
better claim to this name. We convert dwmac-imx as an example.
Including other platform glue would make this series excessively large,
but once this core code is merged, the individual platform glue updates
can be posted one after another as they will be independent of each
other.
It is hoped that this callback can be used in future to reconfigure the
dwmac core when the interface mode changes to support PHYs that change
their interface mode, but we're nowhere near being able to do that yet.
====================
i.MX implementations other than IMX8DXL involve setting the dwmac core
phy_intf_sel input. Use stmmac_get_phy_intf_sel() to decode the PHY
interface mode to the phy_intf_sel value, validating the result, and
passing it into the implementation specific .set_intf_mode() method
rather than each .set_intf_mode() method doing this.
Convert dwmac-imx to use the PHY_INTF_SEL_xxx definitions rather than
constants via:
- ensuring that the prefix for the MASK and value definitions is the
same.
- using FIELD_PREP() to shift the PHY_INTF_SEL_xxx definition to the
appropriate bitfield.
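What FIELD_PREP() contributes here can be sketched with userspace stand-ins for GENMASK()/FIELD_PREP() (32-bit only). The register layout, a 3-bit field at bits 3:1, is hypothetical. Unlike this macro, the kernel's FIELD_PREP() also performs compile-time checks on constant values; this version silently masks out-of-range values.

```c
#include <stdint.h>

/* Userspace stand-ins for the kernel's GENMASK()/FIELD_PREP(). */
#define GENMASK32(h, l) (((~0u) >> (31 - (h))) & ((~0u) << (l)))
#define FIELD_PREP32(mask, val) (((uint32_t)(val) << __builtin_ctz(mask)) & (mask))

/* Hypothetical register layout: phy_intf_sel field at bits 3:1. */
#define EXAMPLE_GPR_INTF_SEL_MASK GENMASK32(3, 1)

/* Replace the field, leaving all other bits of the register untouched. */
static uint32_t set_intf_sel(uint32_t reg, uint32_t phy_intf_sel)
{
	reg &= ~EXAMPLE_GPR_INTF_SEL_MASK;
	return reg | FIELD_PREP32(EXAMPLE_GPR_INTF_SEL_MASK, phy_intf_sel);
}
```

Keeping the raw PHY_INTF_SEL_xxx value unshifted and letting the macro derive the shift from the mask is what allows one set of definitions to serve differently laid-out glue registers.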
net: stmmac: add support for configuring the phy_intf_sel inputs
When dwmac is synthesised with support for multiple PHY interfaces, the
core provides phy_intf_sel inputs, sampled on reset, to configure the
PHY facing interface. Use stmmac_get_phy_intf_sel() in core code to
determine the dwmac phy_intf_sel input value, and provide a new
platform method called with this value just before we issue a soft
reset to the dwmac core.
Provide a function to translate the PHY interface mode to the
phy_intf_sel pin configuration for dwmac1000 and dwmac4 cores that
support multiple interfaces. We currently handle MII, GMII, RGMII,
SGMII, RMII and REVMII, but not TBI, RTBI or SMII, as drivers do not
appear to use these three and the driver doesn't currently support
them.
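The translation can be sketched as a switch over interface modes. The enum names are hypothetical (the kernel uses phy_interface_t), and the numeric phy_intf_sel values are an assumption based on the common Synopsys encoding (GMII/MII 0, RGMII 1, SGMII 2, RMII 4, RevMII 7); check them against the PHY_INTF_SEL_xxx definitions added by the series. Treating the RGMII delay variants as plain RGMII is likewise an assumption.

```c
#include <errno.h>

/* Hypothetical local interface-mode enum. */
enum if_mode { MODE_MII, MODE_GMII, MODE_RGMII, MODE_RGMII_ID, MODE_SGMII,
	       MODE_RMII, MODE_REVMII, MODE_TBI, MODE_RTBI, MODE_SMII };

/* Assumed phy_intf_sel encoding; verify against PHY_INTF_SEL_xxx. */
enum { SEL_GMII_MII = 0, SEL_RGMII = 1, SEL_SGMII = 2, SEL_RMII = 4, SEL_REVMII = 7 };

/* Returns the phy_intf_sel value, or -EINVAL for modes not handled. */
static int get_phy_intf_sel(enum if_mode mode)
{
	switch (mode) {
	case MODE_MII:
	case MODE_GMII:
		return SEL_GMII_MII;
	case MODE_RGMII:
	case MODE_RGMII_ID:     /* delay variants share the RGMII setting */
		return SEL_RGMII;
	case MODE_SGMII:
		return SEL_SGMII;
	case MODE_RMII:
		return SEL_RMII;
	case MODE_REVMII:
		return SEL_REVMII;
	default:                /* TBI, RTBI, SMII: not handled */
		return -EINVAL;
	}
}
```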
net: stmmac: add phy_intf_sel and ACTPHYIF definitions
Add definitions for the active PHY interface found in DMA hardware
feature register 0, and also used to configure the core in multi-
interface designs via phy_intf_sel.
net: stmmac: s32: move PHY_INTF_SEL_x definitions out of the way
S32's PHY_INTF_SEL_x definitions conflict with those for the dwmac
cores as they use a different bit mapping. Add an S32 prefix so that
they are unique.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Jan Petrous (OSS) <jan.petrous@oss.nxp.com>
Link: https://patch.msgid.link/E1vFt4S-0000000ChoS-2Ahi@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: stmmac: imx: use phylink's interface mode for set_clk_tx_rate()
imx_dwmac_set_clk_tx_rate() is passed the interface mode from phylink
which will be the same as plat_dat->phy_interface. Use the passed-in
interface mode rather than plat_dat->phy_interface.
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vFt4N-0000000ChoM-1llp@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Adrian Moreno [Mon, 3 Nov 2025 15:40:04 +0000 (16:40 +0100)]
rtnetlink: honor RTEXT_FILTER_SKIP_STATS in IFLA_STATS
Gathering interface statistics can be a relatively expensive operation
on certain systems as it requires iterating over all the CPUs.
RTEXT_FILTER_SKIP_STATS was first introduced [1] to skip AF_INET6
statistics from interface dumps and it was then extended [2] to
also exclude IFLA_VF_INFO.
The semantics of the flag do not seem to be limited to AF_INET6
or VF statistics, and having a way to query the interface status
(e.g. carrier, address) without retrieving its statistics seems
reasonable. So this patch extends the use of RTEXT_FILTER_SKIP_STATS
to also affect IFLA_STATS.
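From userspace, the flag is passed in an IFLA_EXT_MASK attribute of an RTM_GETLINK request. The sketch below only builds the request (the helper and struct names are illustrative); it would then be sent on an AF_NETLINK/NETLINK_ROUTE socket. On kernels with this change the reply should also omit IFLA_STATS; older kernels still include it.

```c
#include <linux/if_link.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

struct getlink_req {
	struct nlmsghdr nh;
	struct ifinfomsg ifi;
	char attrbuf[RTA_SPACE(sizeof(uint32_t))];
};

/* Build an RTM_GETLINK request for one ifindex that asks the kernel to
 * skip statistics via IFLA_EXT_MASK. */
static void build_getlink_skip_stats(struct getlink_req *req, int ifindex)
{
	struct rtattr *rta;
	uint32_t mask = RTEXT_FILTER_SKIP_STATS;

	memset(req, 0, sizeof(*req));
	req->nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
	req->nh.nlmsg_type = RTM_GETLINK;
	req->nh.nlmsg_flags = NLM_F_REQUEST;
	req->ifi.ifi_family = AF_UNSPEC;
	req->ifi.ifi_index = ifindex;

	/* Append the attribute right after the aligned fixed header. */
	rta = (struct rtattr *)((char *)&req->nh + NLMSG_ALIGN(req->nh.nlmsg_len));
	rta->rta_type = IFLA_EXT_MASK;
	rta->rta_len = RTA_LENGTH(sizeof(mask));
	memcpy(RTA_DATA(rta), &mask, sizeof(mask));
	req->nh.nlmsg_len = NLMSG_ALIGN(req->nh.nlmsg_len) + RTA_LENGTH(sizeof(mask));
}
```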
====================
xsk: minor optimizations around locks
Two optimizations regarding xsk_tx_list_lock and cq_lock can yield a
performance increase by avoiding frequently disabling and enabling
interrupts.
====================
Jason Xing [Thu, 30 Oct 2025 00:06:46 +0000 (08:06 +0800)]
xsk: use a smaller new lock for shared pool case
- Split cq_lock into two smaller locks: cq_prod_lock and
cq_cached_prod_lock
- Avoid disabling/enabling interrupts in the hot xmit path
In both xsk_cq_cancel_locked() and xsk_cq_reserve_locked(), the race
condition is only between multiple xsks sharing the same pool. They all
run in process context rather than interrupt context, so the new small
lock named cq_cached_prod_lock can be used without disabling
interrupts.
While cq_cached_prod_lock ensures exclusive modification of
@cached_prod, cq_prod_lock in xsk_cq_submit_addr_locked() only cares
about @producer and the corresponding @desc. Neither of them needs to
be consistent with @cached_prod, which is protected by
cq_cached_prod_lock. That's why the previous big lock can be split into
two smaller ones. Please note that the SPSC rule is about the global
state of producer and consumer, which can affect both layers, rather
than the local or cached ones.
Frequently disabling and enabling interrupts is very time-consuming
in some cases, especially at per-descriptor granularity; this can now
be avoided after this optimization, even when the pool is shared by
multiple xsks.
With this patch, the performance number[1] could go from 1,872,565 pps
to 1,961,009 pps, a modest rise of about 4.7%.
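The split can be illustrated with a userspace model in which pthread mutexes stand in for the kernel spinlocks, taken without touching interrupt state. The struct and function names are hypothetical; the consumer index would be read with proper memory ordering in a real ring, which this single-threaded-testable sketch ignores. Reserve/cancel only touch cached_prod under one lock, while submit only touches the producer index and ring slots under the other.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define CQ_ENTRIES 4u

/* Hypothetical model of a completion queue shared by several sockets. */
struct cq_model {
	pthread_mutex_t cached_prod_lock; /* only protects cached_prod */
	pthread_mutex_t prod_lock;        /* protects producer and ring slots */
	uint32_t cached_prod;             /* local reservation counter */
	uint32_t producer;                /* globally visible producer */
	uint32_t consumer;
	uint64_t ring[CQ_ENTRIES];
};

static void cq_init(struct cq_model *q)
{
	pthread_mutex_init(&q->cached_prod_lock, NULL);
	pthread_mutex_init(&q->prod_lock, NULL);
	q->cached_prod = q->producer = q->consumer = 0;
}

/* Reserve one slot; fails (-1) when the ring is full. */
static int cq_reserve(struct cq_model *q)
{
	int ret = -1;

	pthread_mutex_lock(&q->cached_prod_lock);
	if (q->cached_prod - q->consumer < CQ_ENTRIES) {
		q->cached_prod++;
		ret = 0;
	}
	pthread_mutex_unlock(&q->cached_prod_lock);
	return ret;
}

/* Give back n reservations that will not be used. */
static void cq_cancel(struct cq_model *q, uint32_t n)
{
	pthread_mutex_lock(&q->cached_prod_lock);
	q->cached_prod -= n;
	pthread_mutex_unlock(&q->cached_prod_lock);
}

/* Publish a completed address; only the producer-side lock is needed. */
static void cq_submit(struct cq_model *q, uint64_t addr)
{
	pthread_mutex_lock(&q->prod_lock);
	q->ring[q->producer % CQ_ENTRIES] = addr;
	q->producer++;
	pthread_mutex_unlock(&q->prod_lock);
}
```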
Jason Xing [Thu, 30 Oct 2025 00:06:45 +0000 (08:06 +0800)]
xsk: do not enable/disable irq when grabbing/releasing xsk_tx_list_lock
The commit ac98d8aab61b ("xsk: wire upp Tx zero-copy functions"),
which originally introduced this lock, put the deletion process in
sk_destruct, which can run in IRQ context, so the
xxx_irqsave()/xxx_irqrestore() pair was used. But later, commit
541d7fdd7694 ("xsk: proper AF_XDP socket teardown ordering") moved the
deletion into xsk_release(), which only runs in process context. Since
that commit, the pair is no longer needed.
Now, the two places that use xsk_tx_list_lock both run only in process
context, so stop disabling/enabling IRQs there.
Tonghao Zhang [Tue, 28 Oct 2025 04:32:44 +0000 (12:32 +0800)]
net: add net cookie for net device trace events
In a multi-network card or container environment, this is needed in order
to differentiate between trace events relating to net devices that exist
in different network namespaces and share the same name.
====================
ethtool: introduce PHY MSE diagnostics UAPI and drivers
This series introduces a generic kernel-userspace API for retrieving PHY
Mean Square Error (MSE) diagnostics, together with netlink integration,
a fast-path reporting hook in LINKSTATE_GET, and initial driver
implementations for the KSZ9477 and DP83TD510E PHYs.
MSE is defined by the OPEN Alliance "Advanced diagnostic features for
100BASE-T1 automotive Ethernet PHYs" specification [1] as a measure of
slicer error rate, typically used internally to derive the Signal
Quality Indicator (SQI). While SQI is useful as a normalized quality
index, it hides raw measurement data, varies in scaling and thresholds
between vendors, and may not indicate certain failure modes - for
example, cases where autonegotiation would fail even though SQI reports
a good link. In practice, such scenarios can only be investigated in
fixed-link mode; here, MSE can provide an empirically estimated value
indicating conditions under which autonegotiation would not succeed.
Example output with current implementation:
  root@DistroKit:~ ethtool lan1
  Settings for lan1:
          ...
          Speed: 1000Mb/s
          Duplex: Full
          ...
          Link detected: yes
          SQI: 5/7
          MSE: 3/127 (channel: worst)
  root@DistroKit:~ ethtool --show-mse lan1
  MSE diagnostics for lan1:
  MSE Configuration:
          Max Average MSE: 127
          Refresh Rate: 2000000 ps
          Symbols per Sample: 250
          Supported capabilities: average channel-a channel-b channel-c
                  channel-d worst
Oleksij Rempel [Mon, 27 Oct 2025 12:28:01 +0000 (13:28 +0100)]
net: phy: dp83td510: add MSE interface support for 10BASE-T1L
Implement get_mse_capability() and get_mse_snapshot() for the DP83TD510E
to expose its Mean Square Error (MSE) register via the new PHY MSE
UAPI.
The DP83TD510E does not document any peak MSE values; it only exposes
a single average MSE register used internally to derive SQI. This
implementation therefore advertises only PHY_MSE_CAP_AVG, along with
LINK and channel-A selectors. Scaling is fixed to 0xFFFF, and the
refresh interval/number of symbols are estimated from 10BASE-T1L
symbol rate (7.5 MBd) and typical diagnostic intervals (~1 ms).
For 10BASE-T1L deployments, SQI is a reliable indicator of link
modulation quality once the link is established, but it does not
indicate whether autonegotiation pulses will be correctly received
in marginal conditions. MSE provides a direct measurement of slicer
error rate that can be used to evaluate if autonegotiation is likely
to succeed under a given cable length and condition. In practice,
testing such scenarios often requires forcing a fixed-link setup to
isolate MSE behaviour from the autonegotiation process.
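The relation between symbol rate, refresh interval and symbols per sample is simple arithmetic; the helper name is hypothetical. With the refresh interval in picoseconds (as in the MSE API), 7.5 MBd over ~1 ms gives about 7500 symbols, and 125 MBd (the per-pair symbol rate of 1000BASE-T) over 2 µs gives 250, which matches the "Symbols per Sample: 250" in the KSZ9477 example output, although whether that driver derives its value this way is not stated.

```c
#include <stdint.h>

#define PS_PER_SEC 1000000000000ull

/* Symbols in one refresh interval: baud [symbols/s] * interval [ps] / 1e12.
 * The intermediate product must fit in 64 bits (true for these inputs). */
static uint64_t symbols_per_interval(uint64_t baud, uint64_t interval_ps)
{
	return baud * interval_ps / PS_PER_SEC;
}
```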
Oleksij Rempel [Mon, 27 Oct 2025 12:28:00 +0000 (13:28 +0100)]
net: phy: micrel: add MSE interface support for KSZ9477 family
Implement the get_mse_capability() and get_mse_snapshot() PHY driver ops
for KSZ9477-series integrated PHYs to demonstrate the new PHY MSE
UAPI.
These PHYs do not expose a documented direct MSE register, but the
Signal Quality Indicator (SQI) registers are derived from the
internal MSE computation. This hook maps SQI readings into the MSE
interface so that tooling can retrieve the raw value together with
metadata for correct interpretation in userspace.
Behaviour:
- For 1000BASE-T, report per-channel (A–D) values and support a
WORST channel selector.
- For 100BASE-TX, only LINK-wide measurements are available.
- Report average MSE only, with a max scale based on
KSZ9477_MMD_SQI_MASK and a fixed refresh rate of 2 µs.
This mapping differs from the OPEN Alliance SQI definition, which
assigns thresholds such as pre-fail indices; the MSE interface
instead provides the raw measurement, leaving interpretation to
userspace.
Oleksij Rempel [Mon, 27 Oct 2025 12:27:59 +0000 (13:27 +0100)]
ethtool: netlink: add ETHTOOL_MSG_MSE_GET and wire up PHY MSE access
Introduce the userspace entry point for PHY MSE diagnostics via
ethtool netlink. This exposes the core API added previously and
returns both capability information and one or more snapshots.
Userspace sends ETHTOOL_MSG_MSE_GET. The reply carries:
- ETHTOOL_A_MSE_CAPABILITIES: scale limits and timing information
- ETHTOOL_A_MSE_CHANNEL_* nests: one or more snapshots (per-channel
if available, otherwise WORST, otherwise LINK)
If the link is down, -ENETDOWN is returned.
Changes:
- YAML: add attribute sets (mse, mse-capabilities, mse-snapshot)
and the mse-get operation
- UAPI (generated): add ETHTOOL_A_MSE_* enums and message IDs,
ETHTOOL_MSG_MSE_GET/REPLY
- ethtool core: add net/ethtool/mse.c implementing the request,
register genl op, and hook into ethnl dispatch
- docs: document MSE_GET in ethtool-netlink.rst
The include/uapi/linux/ethtool_netlink_generated.h is generated
from Documentation/netlink/specs/ethtool.yaml.
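The reply's snapshot preference (per-channel if available, otherwise WORST, otherwise LINK, and -ENETDOWN when the link is down) can be sketched as a small decision function. The capability bits and names are hypothetical, not the generated UAPI enums, and returning -EOPNOTSUPP when no selector is supported is an assumption.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical capability bits, not the real UAPI names. */
#define CAP_CH_A   (1u << 0)
#define CAP_CH_B   (1u << 1)
#define CAP_CH_C   (1u << 2)
#define CAP_CH_D   (1u << 3)
#define CAP_WORST  (1u << 4)
#define CAP_LINK   (1u << 5)
#define CAP_ANY_CH (CAP_CH_A | CAP_CH_B | CAP_CH_C | CAP_CH_D)

enum snap_sel { SEL_PER_CHANNEL = 1, SEL_WORST, SEL_LINK };

/* Returns the kind of snapshot(s) to report, or a negative errno. */
static int pick_snapshot_sel(uint32_t caps, int link_up)
{
	if (!link_up)
		return -ENETDOWN;
	if (caps & CAP_ANY_CH)
		return SEL_PER_CHANNEL; /* one snapshot per supported channel */
	if (caps & CAP_WORST)
		return SEL_WORST;
	if (caps & CAP_LINK)
		return SEL_LINK;
	return -EOPNOTSUPP;             /* assumption: nothing to report */
}
```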
Oleksij Rempel [Mon, 27 Oct 2025 12:27:58 +0000 (13:27 +0100)]
net: phy: introduce internal API for PHY MSE diagnostics
Add the base infrastructure for Mean Square Error (MSE) diagnostics,
as proposed by the OPEN Alliance "Advanced diagnostic features for
100BASE-T1 automotive Ethernet PHYs" [1] specification.
The OPEN Alliance spec defines only average MSE and average peak MSE
over a fixed number of symbols. However, other PHYs, such as the
KSZ9131, additionally expose a worst-peak MSE value latched since the
last channel capture. This API accounts for such vendor extensions by
adding a distinct capability bit and snapshot field.
Channel-to-pair mapping is normally straightforward, but in some cases
(e.g. 100BASE-TX with MDI-X resolution unknown) the mapping is ambiguous.
If hardware does not expose MDI-X status, the exact pair cannot be
determined. To avoid returning misleading per-channel data in this case,
a LINK selector is defined for aggregate MSE measurements.
All investigated devices differ in MSE capabilities, such
as sample rate, number of analyzed symbols, and scaling factors.
For example, the KSZ9131 uses different scaling for MSE and pMSE.
To make this visible to callers, scale limits and timing information
are returned via get_mse_capability().
Some PHYs sample very few symbols at high frequency (e.g. 2 us update
rate). To cover such cases and allow for future high-speed PHYs with
even shorter intervals, the refresh rate is reported as u64 in
picoseconds.
This patch introduces the internal PHY API for Mean Square Error
diagnostics. It defines new kernel-side data types and driver hooks:
- struct phy_mse_capability: describes supported metrics, scale
limits, update interval, and sampling length.
- struct phy_mse_snapshot: holds one correlated measurement set.
- New phy_driver ops: `get_mse_capability()` and `get_mse_snapshot()`.
These definitions form the core kernel API. No user-visible interfaces
are added in this commit.
Standardization notes:
OPEN Alliance defines presence and interpretation of some metrics but does
not fix numeric scales or sampling internals:
- SQI (3-bit, 0..7) is mandatory; correlation to SNR/BER is informative
(OA 100BASE-T1 TC1 v1.0 6.1.2; OA 1000BASE-T1 TC12 v2.2 6.1.2).
- MSE is optional; OA recommends 2^16 symbols and scaling to 0..511,
with a worst-case latch since last read (OA 100BASE-T1 TC1 v1.0 6.1.1; OA
1000BASE-T1 TC12 v2.2 6.1.1). Refresh is recommended (~0.8-2.0 ms for
100BASE-T1; ~80-200 us for 1000BASE-T1). Exact scaling/time windows
are vendor-specific.
- Peak MSE (pMSE) is defined only for 100BASE-T1 as optional, e.g.
128-symbol sliding window with 8-bit range and worst-case latch (OA
100BASE-T1 TC1 v1.0 6.1.3).
Therefore this API exposes which measures and selectors a PHY supports,
and documents where behavior is standard-referenced vs vendor-specific.
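Because scales differ per PHY, a raw MSE value is only meaningful together with the capability data. The structs below are hypothetical, loosely following the description above (their fields are not those of the kernel's struct phy_mse_capability/phy_mse_snapshot), and normalising to parts-per-thousand is just one possible userspace interpretation. Lower MSE means less slicer error; the example output's "MSE: 3/127" corresponds to about 23 per mille.

```c
#include <stdint.h>

/* Hypothetical shapes; field names are illustrative only. */
struct mse_cap_model {
	uint32_t max_average_mse;   /* scale limit for average MSE */
	uint64_t refresh_rate_ps;   /* u64 picoseconds, e.g. 2000000 = 2 us */
	uint32_t num_symbols;       /* symbols per sample */
};

struct mse_snapshot_model {
	uint32_t average_mse;
};

/* Normalise a raw reading to parts-per-thousand of the PHY's own scale.
 * Returns -1 if the scale is unknown. */
static int64_t mse_permille(const struct mse_cap_model *cap,
			    const struct mse_snapshot_model *snap)
{
	if (!cap->max_average_mse)
		return -1;
	return (int64_t)snap->average_mse * 1000 / cap->max_average_mse;
}
```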
====================
Add support to do threaded napi busy poll
Extend the already existing support of threaded napi poll to do continuous
busy polling.
This is used for doing continuous polling of napi to fetch descriptors
from backing RX/TX queues for low latency applications. Allow enabling
of threaded busypoll using netlink so this can be enabled on a set of
dedicated napis for low latency applications.
Once enabled, the user can fetch the PID of the kthread doing NAPI
polling and set its affinity, priority and scheduler depending on the
low-latency requirements.
Extend the netlink interface to allow enabling/disabling threaded
busypolling at individual napi level.
We use this for our AF_XDP-based hard low-latency use case with a
microsecond-level latency requirement. For our use case we want low
jitter and stable latency at P99.
Following is an analysis and comparison of available (and compatible)
busy poll interfaces for a low latency usecase with stable P99. This can
be suitable for applications that want very low latency at the expense
of cpu usage and efficiency.
Already existing APIs (SO_BUSYPOLL and epoll) allow busy polling a NAPI
backing a socket, but the missing piece is a mechanism to busy poll a
NAPI instance in a dedicated thread while ignoring available events or
packets, regardless of the userspace API. Most existing mechanisms are
designed to work in a pattern where you poll until new packets or events
are received, after which userspace is expected to handle them.
As a result, one has to hack together a solution using a mechanism
intended to receive packets or events, not to simply NAPI poll. NAPI
threaded busy polling, on the other hand, provides this capability
natively, independent of any userspace API. This makes it really easy to
set up and manage.
For analysis we use an AF_XDP based benchmarking tool, `xsk_rr`. The tool
and the way it simulates the real workload are described as follows:
- It sends UDP packets between 2 machines.
- The client machine sends packets at a fixed frequency. To maintain the
frequency of the packets being sent, we use open-loop sampling, that is,
the packets are sent from a separate thread.
- The server replies to each packet inline: it reads the packet from the
recv ring and replies using the tx ring.
- To simulate application processing time, we use a configurable
delay in usecs on the client side after a reply is received from the
server.
The xsk_rr tool is posted separately as an RFC for tools/testing/selftests.
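Open-loop sampling means send times are fixed in advance from the target
frequency rather than derived from reply times, so slow replies do not slow
the sender down. A minimal sketch of such a schedule, assuming nanosecond
timestamps; `period_ns()` and `send_deadline_ns()` are hypothetical helpers,
not code taken from `xsk_rr`.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical helper: gap between sends for a given packet rate. */
static uint64_t period_ns(uint64_t pkts_per_sec)
{
	return NSEC_PER_SEC / pkts_per_sec;
}

/* Open loop: the k-th send time is fixed up front as start + k * period,
 * independent of when (or whether) replies arrive. A closed-loop sender
 * would instead schedule the next send from the previous reply. */
static uint64_t send_deadline_ns(uint64_t start_ns, uint64_t period,
				 uint64_t k)
{
	return start_ns + k * period;
}
```

At 32K packets per second, for example, the period works out to 31250 ns.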
We use this tool with the following NAPI polling configurations:
- Interrupts only
- SO_BUSYPOLL (inline in the same thread where the client receives the
packet).
- SO_BUSYPOLL (separate thread and separate core)
- Threaded NAPI busypoll
The system is configured using the following script in all 4 cases:
```
echo 0 | sudo tee /sys/class/net/eth0/threaded
echo 0 | sudo tee /proc/sys/kernel/timer_migration
echo off | sudo tee /sys/devices/system/cpu/smt/control
echo 0 | sudo tee /proc/sys/net/core/rps_sock_flow_entries
echo 0 | sudo tee /sys/class/net/eth0/queues/rx-0/rps_cpus
# pin IRQs on CPU 2
IRQS="$(gawk '/eth0-(TxRx-)?1/ {match($1, /([0-9]+)/, arr); \
print arr[0]}' < /proc/interrupts)"
for irq in ${IRQS}; \
do echo 2 | sudo tee /proc/irq/$irq/smp_affinity_list; done
echo -1 | sudo tee /proc/sys/kernel/sched_rt_runtime_us
for i in /sys/devices/virtual/workqueue/*/cpumask; \
do echo $i; echo 1,2,3,4,5,6 | sudo tee $i; done
if [[ -z "$1" ]]; then
echo 400 | sudo tee /proc/sys/net/core/busy_read
echo 100 | sudo tee /sys/class/net/eth0/napi_defer_hard_irqs
echo 15000 | sudo tee /sys/class/net/eth0/gro_flush_timeout
fi
sudo ethtool -C eth0 adaptive-rx off adaptive-tx off rx-usecs 0 tx-usecs 0
if [[ "$1" == "enable_threaded" ]]; then
echo 0 | sudo tee /proc/sys/net/core/busy_poll
echo 0 | sudo tee /proc/sys/net/core/busy_read
echo 100 | sudo tee /sys/class/net/eth0/napi_defer_hard_irqs
echo 15000 | sudo tee /sys/class/net/eth0/gro_flush_timeout
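IFINDEX=$(cat /sys/class/net/eth0/ifindex)
NAPI_ID=$(ynl --family netdev --output-json --do queue-get \
--json '{"ifindex": '${IFINDEX}', "id": '0', "type": "rx"}' | jq '."napi-id"')
# enable threaded busy polling for this NAPI and fetch its kthread PID
ynl --family netdev --do napi-set \
--json '{"id": '${NAPI_ID}', "threaded": "busy-poll"}'
NAPI_T=$(ynl --family netdev --output-json --do napi-get \
--json '{"id": '${NAPI_ID}'}' | jq '."pid"')
# pin threaded poll thread to CPU 2
sudo taskset -pc 2 $NAPI_T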
fi
if [[ "$1" == "enable_interrupt" ]]; then
echo 0 | sudo tee /proc/sys/net/core/busy_read
echo 0 | sudo tee /sys/class/net/eth0/napi_defer_hard_irqs
echo 15000 | sudo tee /sys/class/net/eth0/gro_flush_timeout
fi
```
To enable the various configurations, the script can be run as follows:
- Interrupt Only
```
<script> enable_interrupt
```
- SO_BUSYPOLL (no arguments to script)
```
<script>
```
- NAPI threaded busypoll
```
<script> enable_threaded
```
Once configured, the workload is run with various configurations using
the following commands. Set the period (1/frequency) and delay in usecs
to produce results for each packet frequency and application processing
delay.
- Without application processing, all the approaches give the same
latency within a 1 usec range, and NAPI threaded gives the lowest latency.
- With application processing, the latency increases by 3-4 usecs when
doing inline polling.
- Using a dedicated core to drive NAPI polling keeps the latency the same
even with application processing. This is observed both in userspace
and with threaded NAPI (in the kernel).
- Using NAPI threaded polling in the kernel gives 1-2 usecs lower latency
compared to userspace-driven polling on a separate core.
- Even on a dedicated core, SO_BUSYPOLL adds around 1-2usecs of latency.
This is because it doesn't continuously busy poll until events are
ready. Instead, it returns after polling only once, requiring the
process to re-invoke the syscall for each poll, which requires a new
enter/leave kernel cycle and the setup/teardown of the busy poll for
every single poll attempt.
- With application processing, userspace gets the packet from the recv
ring, spends some time on application processing and then does NAPI
polling. While application processing is happening, a dedicated core
doing NAPI polling can pull packets off the NAPI RX queue and
populate the AF_XDP recv ring. This means that when the application
thread is done with application processing, it has new packets ready to
receive and process in the recv ring.
- NAPI threaded busy polling in the kernel with a dedicated core gives
consistent P5-P99 latency.
Note well that threaded napi busy-polling has not been shown to yield
efficiency or throughput benefits. In contrast, dedicating an entire
core to busy-polling one NAPI (NIC queue) is rather inefficient.
However, in certain specific use cases, this mechanism results in lower
packet processing latency. The experimental testing reported here only
covers those use cases and does not present a comprehensive evaluation
of threaded napi busy-polling.
The following histogram is generated to measure the time spent in
recvfrom() while using an inline thread with SO_BUSYPOLL. The histogram is
generated using the following bpftrace command. In this experiment there
are 32K packets per second and the application processing delay is
30 usecs. This is to measure whether pulling packets from the descriptor
queue takes enough time to affect the overall latency if done inline.
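bpftrace's `hist()` groups values into power-of-two buckets. A minimal C
sketch of that style of bucketing, assuming recvfrom() durations are
collected in nanoseconds; `log2_bucket()` and `hist_add()` are hypothetical
names, and the bucket boundaries are illustrative rather than a copy of
bpftrace's implementation.

```c
#include <stdint.h>

#define NBUCKETS 65 /* bucket 0 for 0, buckets 1..64 for 64-bit values */

/* Hypothetical bucketing: 0 -> 0, 1 -> 1, [2,4) -> 2, [4,8) -> 3, ... */
static unsigned int log2_bucket(uint64_t v)
{
	unsigned int b = 0;

	while (v) {
		b++;
		v >>= 1;
	}
	return b;
}

/* Count one sample in its power-of-two bucket. */
static void hist_add(uint64_t hist[NBUCKETS], uint64_t v)
{
	hist[log2_bucket(v)]++;
}
```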
selftests: Add napi threaded busy poll test in `busy_poller`
Add testcase to run busy poll test with threaded napi busy poll enabled.
Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Martin Karsten <mkarsten@uwaterloo.ca>
Tested-by: Martin Karsten <mkarsten@uwaterloo.ca>
Link: https://patch.msgid.link/20251028203007.575686-3-skhawaja@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: Extend NAPI threaded polling to allow kthread based busy polling
Add a new state NAPI_STATE_THREADED_BUSY_POLL to the NAPI state enum to
enable and disable threaded busy polling.
When threaded busy polling is enabled for a NAPI, enable
NAPI_STATE_THREADED also.
When the threaded NAPI is scheduled, set NAPI_STATE_IN_BUSY_POLL to
signal napi_complete_done not to rearm interrupts.
Whenever NAPI_STATE_THREADED_BUSY_POLL is unset, NAPI_STATE_IN_BUSY_POLL
is unset as well; napi_complete_done() then also clears the
NAPI_STATE_SCHED_THREADED bit, which in turn makes the kthread go to
sleep.
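These state transitions can be modeled as plain bit operations. This is a
minimal sketch under stated assumptions: the `ST_*` bits are hypothetical
stand-ins for the NAPI_STATE_* flags (not their real values), there is no
locking or atomicity, and `complete_done()` only mimics the rearm decision
of napi_complete_done().

```c
#include <stdbool.h>

/* Hypothetical bits; NOT the kernel's NAPI_STATE_* values. */
enum {
	ST_SCHED_THREADED     = 1 << 0,
	ST_THREADED           = 1 << 1,
	ST_THREADED_BUSY_POLL = 1 << 2,
	ST_IN_BUSY_POLL       = 1 << 3,
};

/* Enabling busy poll also enables threaded NAPI; disabling it also
 * clears the in-busy-poll marker. */
static unsigned int set_threaded_busy_poll(unsigned int st, bool enable)
{
	if (enable)
		return st | ST_THREADED_BUSY_POLL | ST_THREADED;
	return st & ~(unsigned int)(ST_THREADED_BUSY_POLL | ST_IN_BUSY_POLL);
}

/* Kthread scheduled: mark busy poll so interrupts are not rearmed. */
static unsigned int on_thread_scheduled(unsigned int st)
{
	st |= ST_SCHED_THREADED;
	if (st & ST_THREADED_BUSY_POLL)
		st |= ST_IN_BUSY_POLL;
	return st;
}

/* Returns true if interrupts would be rearmed. */
static bool complete_done(unsigned int *st)
{
	if (*st & ST_IN_BUSY_POLL)
		return false; /* stay scheduled; the kthread keeps polling */
	*st &= ~(unsigned int)ST_SCHED_THREADED; /* kthread may sleep */
	return true;
}
```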
Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Martin Karsten <mkarsten@uwaterloo.ca>
Tested-by: Martin Karsten <mkarsten@uwaterloo.ca>
Link: https://patch.msgid.link/20251028203007.575686-2-skhawaja@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
mpls: Drop RTNL for RTM_NEWROUTE, RTM_DELROUTE, and RTM_GETROUTE.
RTM_NEWROUTE looks up dev under RCU (ip_route_output(),
ipv6_stub->ipv6_dst_lookup_flow(), netdev_get_by_index()),
and each neighbour holds the refcnt of its dev.
Also, net->mpls.platform_label is protected by a dedicated
per-netns mutex.
Now, no MPLS code depends on RTNL.
Let's drop RTNL for RTM_NEWROUTE, RTM_DELROUTE, and RTM_GETROUTE.
MPLS has used RTNL 1) to guarantee the lifetime of struct mpls_nh.nh_dev
and 2) to protect net->mpls.platform_label, but neither actually
requires RTNL.
If we skip dev_put() in find_outdev() and instead call it just before
freeing struct mpls_route, we can drop RTNL for 1).
Let's hold the refcnt of mpls_nh.nh_dev and track it with
netdevice_tracker.
Two notable changes:
If mpls_nh_build_multi() fails to set up a neighbour, we need
to call netdev_put() for successfully created neighbours in
mpls_rt_free_rcu(), so the number of neighbours (rt->rt_nhn)
is now updated in each iteration.
When a dev is unregistered, mpls_ifdown() clones mpls_route
and replaces it with the clone, so the clone requires extra
netdev_hold().
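The first change follows a common error-unwinding pattern: count each
neighbour as soon as it holds its device reference, so the free path drops
exactly as many references as were taken. A minimal sketch with
hypothetical types (`struct fake_dev`, `struct route`); plain counters stand
in for netdev_hold()/netdev_put() and their netdevice_tracker.

```c
#include <stdbool.h>

#define MAX_NH 8

/* Hypothetical stand-ins for struct net_device and struct mpls_route. */
struct fake_dev { int refcnt; };
struct nh { struct fake_dev *dev; };
struct route { int nhn; struct nh nh[MAX_NH]; };

static void fake_hold(struct fake_dev *d) { d->refcnt++; }
static void fake_put(struct fake_dev *d) { d->refcnt--; }

/* Set up nexthops; nhn is bumped in each iteration, so on failure it
 * counts exactly the neighbours that hold a reference. */
static bool build_multi(struct route *rt, struct fake_dev **devs, int n)
{
	rt->nhn = 0;
	if (n > MAX_NH)
		return false;
	for (int i = 0; i < n; i++) {
		if (!devs[i])
			return false; /* e.g. device lookup failed */
		rt->nh[i].dev = devs[i];
		fake_hold(devs[i]);
		rt->nhn++;
	}
	return true;
}

/* Free path: drop one reference per successfully set-up neighbour. */
static void route_free(struct route *rt)
{
	for (int i = 0; i < rt->nhn; i++)
		fake_put(rt->nh[i].dev);
	rt->nhn = 0;
}

/* A clone (as made on device unregistration) takes its own references. */
static void route_clone(struct route *dst, const struct route *src)
{
	*dst = *src;
	for (int i = 0; i < dst->nhn; i++)
		fake_hold(dst->nh[i].dev);
}
```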
net: stmmac: rename devlink parameter ts_coarse into phc_coarse_adj
The devlink param "ts_coarse" doesn't indicate that we get coarse
timestamps, but rather that the PHC clock adjustments are coarse, as the
frequency won't be continuously adjusted. Rename the devlink parameter
to reflect that.
The "coarse" terminology comes from the dwmac register naming; update the
documentation to better explain what the parameter is about.
With this change, the parameter can now be adjusted using:
devlink dev param set <dev> name phc_coarse_adj value true cmode runtime
Jianhui Zhao [Sun, 2 Nov 2025 15:26:37 +0000 (16:26 +0100)]
net: phy: realtek: add interrupt support for RTL8221B
This commit introduces interrupt support for RTL8221B (C45 mode).
Interrupts are mapped on the VEND2 page. VEND2 registers are only
accessible via C45 reads and cannot be accessed by C45 over C22.
Signed-off-by: Jianhui Zhao <zhaojh329@gmail.com>
[Enable only link state change interrupts]
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20251102152644.1676482-1-olek2@wp.pl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alok Tiwari [Fri, 31 Oct 2025 11:26:44 +0000 (04:26 -0700)]
hinic3: fix misleading error message in hinic3_open_channel()
The error message printed when hinic3_configure() fails incorrectly
reports "Failed to init txrxq irq", which does not match the actual
operation performed. The hinic3_configure() function sets up various
device resources such as MTU and RSS parameters; it does not initialize IRQs.
Update the log to "Failed to configure device resources" to make the
message accurate and clearer for debugging.
Bagas Sanjaya [Tue, 28 Oct 2025 01:44:52 +0000 (08:44 +0700)]
Documentation: ARCnet: Update obsolete contact info
The ARCnet docs state that inquiries on the subsystem should be emailed to
Avery Pennarun <apenwarr@worldvisions.ca>, who has been listed in CREDITS
since the beginning of kernel git history and whose email address is
unreachable (it bounces). The subsystem has been maintained by Michael
Grzeschik since c38f6ac74c9980 ("MAINTAINERS: add arcnet and take
maintainership").
In addition, there used to be a dedicated ARCnet mailing list, but its
archive at epistolary.org has been shut down. ARCnet discussions nowadays
take place on the netdev list. The arcnet.com domain mentioned there now
serves an AIoT (Artificial Intelligence of Things) related Typeform page,
and ARCnet info now resides on arcnet.cc (ARCnet Resource Center) instead.
Update the contact information.
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://patch.msgid.link/20251028014451.10521-2-bagasdotme@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
dpll: Add support for phase adjustment granularity
Phase-adjust values are currently limited only by a min-max range. Some
hardware requires, for certain pin types, that values be multiples of
a specific granularity, as in the zl3073x driver.
Patch 1: Adds the 'phase-adjust-gran' pin attribute and appropriate
handling
Patch 2: Adds support for this attribute to the zl3073x driver
====================
Ivan Vecera [Wed, 29 Oct 2025 15:32:07 +0000 (16:32 +0100)]
dpll: zl3073x: Specify phase adjustment granularity for pins
Output pins' phase adjustment values in the device are expressed
in half synth clock cycles. Use this number of cycles as the output
pins' phase adjust granularity and simplify both get/set callbacks.
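Assuming phase-adjust values are given in picoseconds, one half synth clock
cycle lasts 10^12 / (2 * f_synth) ps; with that as the granularity, get/set
reduce to a multiply or a divide. A minimal sketch; the helper names and the
500 MHz example frequency are hypothetical, not taken from the zl3073x
driver.

```c
#include <stdint.h>

#define PSEC_PER_SEC 1000000000000ULL

/* Hypothetical helper: length of half a synth clock cycle in ps
 * (integer division, so exact only when 2 * synth_hz divides 10^12). */
static uint64_t half_cycle_ps(uint64_t synth_hz)
{
	return PSEC_PER_SEC / (2 * synth_hz);
}

/* With gran_ps as the granularity, set/get become a divide/multiply;
 * the set path assumes the value was already validated as a multiple. */
static int64_t phase_ps_to_reg(int64_t ps, uint64_t gran_ps)
{
	return ps / (int64_t)gran_ps;
}

static int64_t phase_reg_to_ps(int64_t reg, uint64_t gran_ps)
{
	return reg * (int64_t)gran_ps;
}
```

For a 500 MHz synth, the granularity would be 1000 ps.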
Reviewed-by: Michal Schmidt <mschmidt@redhat.com>
Reviewed-by: Petr Oros <poros@redhat.com>
Tested-by: Prathosh Satish <Prathosh.Satish@microchip.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Link: https://patch.msgid.link/20251029153207.178448-3-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ivan Vecera [Wed, 29 Oct 2025 15:32:06 +0000 (16:32 +0100)]
dpll: add phase-adjust-gran pin attribute
Phase-adjust values are currently limited by a min-max range. Some
hardware requires, for certain pin types, that values be multiples of
a specific granularity, as in the zl3073x driver.
Add a `phase-adjust-gran` pin attribute and an appropriate field in
dpll_pin_properties. If set by the driver, use its value to validate
user-provided phase-adjust values.
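A minimal sketch of this validation, assuming the granularity is an
unsigned value where 0 means "not set by the driver";
`phase_adjust_valid()` is a hypothetical helper, not the dpll core's actual
function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: accept val only if it lies in [min, max] and,
 * when a granularity is set (gran != 0), is an exact multiple of it. */
static bool phase_adjust_valid(int32_t val, int32_t min, int32_t max,
			       uint32_t gran)
{
	if (val < min || val > max)
		return false;
	/* widen to int64_t so a negative val is not converted to unsigned */
	if (gran && (int64_t)val % (int64_t)gran != 0)
		return false;
	return true;
}
```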
Reviewed-by: Michal Schmidt <mschmidt@redhat.com>
Reviewed-by: Petr Oros <poros@redhat.com>
Tested-by: Prathosh Satish <Prathosh.Satish@microchip.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Link: https://patch.msgid.link/20251029153207.178448-2-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Thomas Wismer [Wed, 29 Oct 2025 21:23:10 +0000 (22:23 +0100)]
dt-bindings: pse-pd: ti,tps23881: Add TPS23881B
Add the TPS23881B I2C power sourcing equipment controller to the list of
supported devices.
Falling back to the TPS23881 predecessor device is not suitable, as firmware
loading needs to be handled differently by the driver. The TPS23881 and
TPS23881B devices require different firmware. Trying to load the TPS23881
firmware on a TPS23881B device fails and must therefore be omitted.
Signed-off-by: Thomas Wismer <thomas.wismer@scs.ch>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20251029212312.108749-3-thomas@wismer.xyz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Thomas Wismer [Wed, 29 Oct 2025 21:23:09 +0000 (22:23 +0100)]
net: pse-pd: tps23881: Add support for TPS23881B
The TPS23881B uses different firmware than the TPS23881. Trying to load the
TPS23881 firmware on a TPS23881B device fails and must be omitted.
The TPS23881B ships with a more recent ROM firmware. Moreover, no updated
firmware has been released yet and so the firmware loading step must be
skipped. As of today, the TPS23881B is intended to use its ROM firmware.
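One way to express this is per-variant match data that says whether a
firmware load is attempted at all. A minimal sketch; the struct, the table,
the helper and the compatible strings (in particular "ti,tps23881b") are
assumptions for illustration, not the driver's actual code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-variant match data. */
struct tps_variant {
	const char *compatible;
	bool load_fw; /* false: keep running the ROM firmware */
};

static const struct tps_variant tps_variants[] = {
	{ "ti,tps23881",  true  },
	{ "ti,tps23881b", false }, /* no updated firmware released yet */
};

/* Decide whether to attempt a firmware load for a matched device. */
static bool tps_should_load_fw(const char *compatible)
{
	size_t n = sizeof(tps_variants) / sizeof(tps_variants[0]);

	for (size_t i = 0; i < n; i++)
		if (strcmp(tps_variants[i].compatible, compatible) == 0)
			return tps_variants[i].load_fw;
	return false; /* unknown device: do not attempt a firmware load */
}
```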
Signed-off-by: Thomas Wismer <thomas.wismer@scs.ch>
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20251029212312.108749-2-thomas@wismer.xyz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bagas Sanjaya [Thu, 30 Oct 2025 07:50:13 +0000 (14:50 +0700)]
Documentation: netconsole: Separate literal code blocks for full and short netcat command name versions
Both the full and short (abbreviated) command name versions of the netcat
example are combined in a single literal code block because the 'or::'
paragraph (before the short version example) is indented one space more
than the preceding paragraph.
Unindent it to separate the versions.
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://patch.msgid.link/20251030075013.40418-1-bagasdotme@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
net: phy: microchip_t1s: Add support for LAN867x Rev.D0 PHY
This patch series adds support for the latest Microchip LAN8670/1/2 Rev.D0
10BASE-T1S PHYs to the microchip_t1s driver.
The new Rev.D0 silicon introduces updated initialization requirements and
link status handling behavior compared to earlier revisions (Rev.C2 and
below). These updates are necessary for full compliance with the OPEN
Alliance 10BASE-T1S specification and are documented in Microchip
Application Note AN1699 Revision G (DS60001699G – October 2025).
Summary of changes:
- Implements Rev.D0-specific configuration sequence as described in AN1699
Rev.G.
- Introduces link status control configuration for LAN867x Rev.D0.
====================
net: phy: microchip_t1s: configure link status control for LAN867x Rev.D0
Configure the link status in the Link Status Control register for
LAN8670/1/2 Rev.D0 PHYs, depending on whether PLCA or CSMA/CD mode
is enabled. When PLCA is enabled, the link status reflects the PLCA
status. When PLCA is disabled (CSMA/CD mode), the PHY does not support
autonegotiation, so the link status is forced active by setting
the LINK_STATUS_SEMAPHORE bit.
The link status control is configured:
- During PHY initialization, for default CSMA/CD mode.
- Whenever PLCA configuration is updated.
This ensures correct link reporting and consistent behavior for
LAN867x Rev.D0 devices.
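The decision can be sketched as a pure function applied both at init (where
CSMA/CD is the default) and on every PLCA update. A minimal sketch, assuming
that clearing LINK_STATUS_SEMAPHORE is what lets the link follow the PLCA
status; the bit position and `link_status_ctrl()` are hypothetical, and the
real register layout is defined by the Rev.D0 documentation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position, for illustration only. */
#define LINK_STATUS_SEMAPHORE (1u << 0)

/* Compute the new Link Status Control value for the current mode. */
static uint16_t link_status_ctrl(uint16_t reg, bool plca_enabled)
{
	if (plca_enabled) /* link status follows PLCA status */
		return (uint16_t)(reg & ~LINK_STATUS_SEMAPHORE);
	/* CSMA/CD: no autonegotiation, so force the link active */
	return (uint16_t)(reg | LINK_STATUS_SEMAPHORE);
}
```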
net: phy: microchip_t1s: add support for Microchip LAN867X Rev.D0 PHY
Add support for the LAN8670/1/2 Rev.D0 10BASE-T1S PHYs from Microchip.
The new Rev.D0 silicon requires a specific set of initialization
settings to be configured for optimal performance and compliance with
OPEN Alliance specifications, as described in Microchip Application Note
AN1699 (Revision G, DS60001699G – October 2025).
https://www.microchip.com/en-us/application-notes/an1699
When operating in "SGMII" mode (Cisco SGMII or 2500BASE-X), qcom-ethqos
modifies the MAC control register in its ethqos_configure_sgmii()
function, which is only called from one path:
stmmac_mac_link_up()
+- reads MAC_CTRL_REG
+- masks out priv->hw->link.speed_mask
+- sets bits according to speed (2500, 1000, 100, 10) from priv->hw.link.speed*
+- ethqos_fix_mac_speed()
| +- qcom_ethqos_set_sgmii_loopback(false)
| +- ethqos_update_link_clk(speed)
| `- ethqos_configure(speed)
| `- ethqos_configure_sgmii(speed)
| +- reads MAC_CTRL_REG,
| +- configures PS/FES bits according to speed
| `- writes MAC_CTRL_REG as the last operation
+- sets duplex bit(s)
+- stmmac_mac_flow_ctrl()
+- writes MAC_CTRL_REG if changed from original read
...
As can be seen, the modification of the control register done by
stmmac_mac_link_up() overwrites the changes that ethqos_fix_mac_speed()
makes to the register. This makes ethqos_configure_sgmii()'s
modification questionable at best.
Also, the two appear to be doing very similar things, with the exception
of the FES bit (bit 14) for 1G and 2.5G speeds.
Given that stmmac_mac_link_up() will write the MAC_CTRL_REG after
ethqos_configure_sgmii(), remove the unnecessary update in the
glue driver's ethqos_configure_sgmii() method, simplifying the code.
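The ordering problem is a lost update on a read-modify-write register. A
minimal sketch with hypothetical names, a plain variable standing in for
MAC_CTRL_REG, and an assumed position for the PS bit; the core path is
simplified to OR in its speed bits rather than masking them first.

```c
#include <stdint.h>

#define FES_BIT (1u << 14) /* FES is bit 14, as stated above */
#define PS_BIT  (1u << 15) /* assumed position, for illustration only */

static uint32_t mac_ctrl_reg; /* stand-in for the MMIO MAC_CTRL_REG */

/* Hypothetical glue hook doing its own read-modify-write. */
static void glue_configure(void)
{
	uint32_t v = mac_ctrl_reg;

	mac_ctrl_reg = v | PS_BIT;
}

/* Hypothetical core path: read first, call the hook, then write back
 * its own copy if it differs from the value it originally read. */
static void core_link_up(uint32_t speed_bits)
{
	uint32_t old = mac_ctrl_reg;
	uint32_t ctrl = old | speed_bits;

	glue_configure();
	if (ctrl != old)
		mac_ctrl_reg = ctrl; /* discards the hook's change */
}
```

Whenever the core changes any bit, the hook's write is lost, which is why
the glue driver's update serves no purpose.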
Konrad states:
Without any additional knowledge, the register description says:
Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vEPlg-0000000CFHY-282A@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>