Jakub Kicinski [Tue, 17 Jun 2025 21:43:58 +0000 (14:43 -0700)]
Merge branch 'shradha_v6.16-rc1' of https://github.com/shradhagupta6/linux
Shradha Gupta says:
====================
Allow dyn MSI-X vector allocation of MANA
In this patchset we want to enable the MANA driver to be able to
allocate MSI-X vectors in PCI dynamically.
The first patch exports pci_msix_prepare_desc() in PCI to be able to
correctly prepare descriptors for dynamically added MSI-X vectors.
The second patch adds the support of dynamic vector allocation in
pci-hyperv PCI controller by enabling the MSI_FLAG_PCI_MSIX_ALLOC_DYN
flag and using the pci_msix_prepare_desc() exported in first patch.
The third patch adds a detailed description of the irq_setup(), to
help understand the function design better.
The fourth patch is a preparation patch for mana changes to support
dynamic IRQ allocation. It contains changes in irq_setup() to allow
skipping first sibling CPU sets, in case certain IRQs are already
affinitized to them.
The fifth patch has the changes in MANA driver to be able to allocate
MSI-X vectors dynamically. If the support does not exist it defaults to
older behavior.
* 'shradha_v6.16-rc1' of https://github.com/shradhagupta6/linux:
net: mana: Allocate MSI-X vectors dynamically
net: mana: Allow irq_setup() to skip cpus for affinity
net: mana: explain irq_setup() algorithm
PCI: hv: Allow dynamic MSI-X vector allocation
PCI/MSI: Export pci_msix_prepare_desc() for dynamic MSI-X allocations
====================
Paolo Abeni [Tue, 17 Jun 2025 12:58:47 +0000 (14:58 +0200)]
Merge branch 'intel-next-queue-1GbE'
Tony Nguyen says:
====================
Faizal Rahim says:
MAC Merge support for frame preemption was previously added for igc:
https://lore.kernel.org/netdev/20250418163822.3519810-1-anthony.l.nguyen@intel.com/
This series builds on that work and adds support for:
- Harmonizing taprio and mqprio queue priority behavior, based on past
discussions and suggestions:
https://lore.kernel.org/all/20250214102206.25dqgut5tbak2rkz@skbuf/
- Enabling preemptible queue support for both taprio and mqprio, with
priority harmonization as a prerequisite.
Patch organization:
- Patches 1-3: Preparation work for patches 6 and 7
- Patches 4-5: Queue priority harmonization
- Patches 6-7: Add preemptible queue support
====================
Wei Fang [Fri, 13 Jun 2025 09:36:05 +0000 (17:36 +0800)]
net: enetc: replace PCVLANR1/2 with SICVLANR1/2 and remove dead branch
Both PF and VF have rx-vlan-offload enabled, however, the PCVLANR1/2
registers are resources controlled by PF, so VF cannot access these
two registers. Fortunately, the hardware provides SICVLANR1/2 registers
for each SI to reflect the value of PCVLANR1/2 registers. Therefore,
use SICVLANR1/2 instead of PCVLANR1/2. Note that this is not an issue
in actual use, because the current driver does not support custom TPID,
the driver will not access these two registers in actual use, so this
modification is just an optimization.
In addition, since ENETC_RXBD_FLAG_TPID is defined as GENMASK(1, 0),
the possible values are only 0, 1, 2, 3, so the default branch will
never be true, so remove the default branch.
Shradha Gupta [Wed, 11 Jun 2025 14:11:13 +0000 (07:11 -0700)]
net: mana: Allocate MSI-X vectors dynamically
Currently, the MANA driver allocates MSI-X vectors statically based on
MANA_MAX_NUM_QUEUES and num_online_cpus() values and in some cases ends
up allocating more vectors than it needs. This is because, by this time
we do not have a HW channel and do not know how many IRQs should be
allocated.
To avoid this, we allocate 1 MSI-X vector during the creation of HWC and
after getting the value supported by hardware, dynamically add the
remaining MSI-X vectors.
Shradha Gupta [Wed, 11 Jun 2025 14:10:42 +0000 (07:10 -0700)]
net: mana: Allow irq_setup() to skip cpus for affinity
In order to prepare the MANA driver to allocate the MSI-X IRQs
dynamically, we need to enhance irq_setup() to allow skipping
affinitizing IRQs to the first CPU sibling group.
This would be for cases when the number of IRQs is less than or equal
to the number of online CPUs. In such cases for dynamically added IRQs
the first CPU sibling group would already be affinitized with HWC IRQ.
Yury Norov [Wed, 11 Jun 2025 14:10:29 +0000 (07:10 -0700)]
net: mana: explain irq_setup() algorithm
Commit 91bfe210e196 ("net: mana: add a function to spread IRQs per CPUs")
added the irq_setup() function that distributes IRQs on CPUs according
to a tricky heuristic. The corresponding commit message explains the
heuristic.
Duplicate it in the source code to make available for readers without
digging git in history. Also, add more detailed explanation about how
the heuristics is implemented.
Shradha Gupta [Wed, 11 Jun 2025 14:10:15 +0000 (07:10 -0700)]
PCI: hv: Allow dynamic MSI-X vector allocation
Allow dynamic MSI-X vector allocation for pci_hyperv PCI controller
by adding support for the flag MSI_FLAG_PCI_MSIX_ALLOC_DYN and using
pci_msix_prepare_desc() to prepare the MSI-X descriptors.
Shradha Gupta [Wed, 11 Jun 2025 14:10:01 +0000 (07:10 -0700)]
PCI/MSI: Export pci_msix_prepare_desc() for dynamic MSI-X allocations
For supporting dynamic MSI-X vector allocation by PCI controllers, enabling
the flag MSI_FLAG_PCI_MSIX_ALLOC_DYN is not enough, msix_prepare_msi_desc()
to prepare the MSI descriptor is also needed.
Export pci_msix_prepare_desc() to allow PCI controllers to support dynamic
MSI-X vector allocation.
Jakub Kicinski [Fri, 13 Jun 2025 17:27:51 +0000 (10:27 -0700)]
eth: gianfar: migrate to new RXFH callbacks
Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool:
add dedicated callbacks for getting and setting rxfh fields").
Uniquely, this driver supports only the SET operation. It does not
support GET at all. The SET callback also always returns 0, even
tho it checks a bunch of conditions, and if my quick reading is
right, expects the user to insert filtering rules for given flow
type first? Long story short it seems too convoluted to easily
add the GET as part of the conversion.
Heiner Kallweit [Sat, 14 Jun 2025 20:31:57 +0000 (22:31 +0200)]
net: phy: improve phy_driver_is_genphy
Use new flag phydev->is_genphy_driven to simplify this function.
Note that this includes a minor functional change:
Now this function returns true if ANY of the genphy drivers
is bound to the PHY device.
We have only one user in DSA driver mt7530, and there the
functional change doesn't matter.
====================
eth: intel: migrate to new RXFH callbacks
Migrate Intel drivers to the recently added dedicated .get_rxfh_fields
and .set_rxfh_fields ethtool callbacks.
Note that I'm deleting all the boilerplate kdoc from the affected
functions in the more recent drivers. If the maintainers feel strongly
I can respin and add it back, but it really feels useless and undue
burden for refactoring. No other vendor does this.
Jakub Kicinski [Sat, 14 Jun 2025 18:09:07 +0000 (11:09 -0700)]
eth: iavf: migrate to new RXFH callbacks
Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool:
add dedicated callbacks for getting and setting rxfh fields").
I'm deleting all the boilerplate kdoc from the affected functions.
It is somewhere between pointless and incorrect, just a burden for
people refactoring the code.
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Joe Damato <joe@dama.to> Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250614180907.4167714-8-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sat, 14 Jun 2025 18:09:06 +0000 (11:09 -0700)]
eth: ice: migrate to new RXFH callbacks
Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool:
add dedicated callbacks for getting and setting rxfh fields").
I'm deleting all the boilerplate kdoc from the affected functions.
It is somewhere between pointless and incorrect, just a burden for
people refactoring the code.
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Joe Damato <joe@dama.to> Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250614180907.4167714-7-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sat, 14 Jun 2025 18:09:05 +0000 (11:09 -0700)]
eth: i40e: migrate to new RXFH callbacks
Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool:
add dedicated callbacks for getting and setting rxfh fields").
I'm deleting all the boilerplate kdoc from the affected functions.
It is somewhere between pointless and incorrect, just a burden for
people refactoring the code.
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Joe Damato <joe@dama.to> Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250614180907.4167714-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sat, 14 Jun 2025 18:09:04 +0000 (11:09 -0700)]
eth: fm10k: migrate to new RXFH callbacks
Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool:
add dedicated callbacks for getting and setting rxfh fields").
.get callback moves out of the switch and set_rxnfc disappears
as ETHTOOL_SRXFH as the only functionality.
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Joe Damato <joe@dama.to> Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250614180907.4167714-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sat, 14 Jun 2025 18:09:03 +0000 (11:09 -0700)]
eth: ixgbe: migrate to new RXFH callbacks
Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool:
add dedicated callbacks for getting and setting rxfh fields").
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Joe Damato <joe@dama.to> Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250614180907.4167714-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sat, 14 Jun 2025 18:09:02 +0000 (11:09 -0700)]
eth: igc: migrate to new RXFH callbacks
Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool:
add dedicated callbacks for getting and setting rxfh fields").
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Joe Damato <joe@dama.to> Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250614180907.4167714-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sat, 14 Jun 2025 18:09:01 +0000 (11:09 -0700)]
eth: igb: migrate to new RXFH callbacks
Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool:
add dedicated callbacks for getting and setting rxfh fields").
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Joe Damato <joe@dama.to> Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250614180907.4167714-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sat, 14 Jun 2025 18:06:38 +0000 (11:06 -0700)]
eth: enetc: migrate to new RXFH callbacks
Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool:
add dedicated callbacks for getting and setting rxfh fields").
This driver's RXFH config is read only / fixed so the conversion
is trivial.
Jakub Kicinski [Sat, 14 Jun 2025 18:06:37 +0000 (11:06 -0700)]
eth: e1000e: migrate to new RXFH callbacks
Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool:
add dedicated callbacks for getting and setting rxfh fields").
This driver's RXFH config is read only / fixed and it's the only
get_rxnfc sub-command the driver supports. So convert the get_rxnfc
handler into a get_rxfh_fields handler.
Jakub Kicinski [Sat, 14 Jun 2025 18:06:36 +0000 (11:06 -0700)]
eth: lan743x: migrate to new RXFH callbacks
Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool:
add dedicated callbacks for getting and setting rxfh fields").
This driver's RXFH config is read only / fixed so the conversion
is purely factoring out the handling into a helper.
Jakub Kicinski [Sat, 14 Jun 2025 18:06:35 +0000 (11:06 -0700)]
eth: cxgb4: migrate to new RXFH callbacks
Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool:
add dedicated callbacks for getting and setting rxfh fields").
This driver's RXFH config is read only / fixed so the conversion
is purely factoring out the handling into a helper.
Jakub Kicinski [Sat, 14 Jun 2025 18:06:34 +0000 (11:06 -0700)]
eth: cisco: migrate to new RXFH callbacks
Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool:
add dedicated callbacks for getting and setting rxfh fields").
This driver's RXFH config is read only / fixed so the conversion
is trivial.
Jakub Kicinski [Tue, 17 Jun 2025 00:37:54 +0000 (17:37 -0700)]
Merge branch 'cn20k-silicon-with-mbox-support'
Subbaraya Sundeep says:
====================
CN20K silicon with mbox support
CN20K is the next generation silicon in the Octeon series with various
improvements and new features.
Along with other changes the mailbox communication mechanism between RVU
(Resource virtualization Unit) SRIOV PFs/VFs with Admin function (AF) has
also gone through some changes.
Some of those changes are
- Separate IRQs for mbox request and response/ack.
- Configurable mbox size, default being 64KB.
- Ability for VFs to communicate with RVU AF instead of going through
parent SRIOV PF.
Due to more memory requirement due to configurable mbox size, mbox memory
will now have to be allocated by
- AF (PF0) for communicating with other PFs and all VFs in the system.
- PF for communicating with it's child VFs.
On previous silicons mbox memory was reserved and configured by firmware.
This patch series add basic mbox support for AF (PF0) <=> PFs and
PF <=> VFs. AF <=> VFs communication and variable mbox size support will
come in later.
Patch #1 Supported co-existance of bit encoding PFs and VFs in 16-bit
hardware pcifunc format between CN20K silicon and older octeon
series. Also exported PF,VF masks and shifts present in mailbox
module to all other modules.
Patch #2 Added basic mbox operation APIs and structures to support both
CN20K and previous version of silicons.
Patch #3 This patch adds support for basic mbox infrastructure
implementation for CN20K silicon in AF perspective. There are
few updates w.r.t MBOX ACK interrupt and offsets in CN20k.
Patch #4 Added mbox implementation between NIC PF and AF for CN20K.
Patch #5 Added mbox communication support between AF and AF's VFs.
Patch #6 This patch adds support for MBOX communication between NIC PF and
its VFs.
====================
Sai Krishna [Wed, 11 Jun 2025 11:01:56 +0000 (16:31 +0530)]
octeontx2-pf: CN20K mbox implementation between PF-VF
This patch implements the CN20k MBOX communication between PF and
it's VFs. CN20K silicon got extra interrupt of MBOX response for trigger
interrupt. Also few of the CSR offsets got changed in CN20K against
prior series of silicons.
Sai Krishna [Wed, 11 Jun 2025 11:01:55 +0000 (16:31 +0530)]
octeontx2-af: CN20K mbox implementation for AF's VF
This patch implements the CN20k MBOX communication between AF and
AF's VFs. This implementation uses separate trigger interrupts
for request, response messages against using trigger message data in CN10K.
Sai Krishna [Wed, 11 Jun 2025 11:01:54 +0000 (16:31 +0530)]
octeontx2-pf: CN20K mbox REQ/ACK implementation for NIC PF
This implementation uses separate trigger interrupts for request,
response messages against using trigger message data in CN10K.
This patch adds support for basic mbox implementation for CN20K
from NIC PF side.
Sai Krishna [Wed, 11 Jun 2025 11:01:53 +0000 (16:31 +0530)]
octeontx2-af: CN20k mbox to support AF REQ/ACK functionality
This implementation uses separate trigger interrupts for request,
response MBOX messages against using trigger message data in CN10K.
This patch adds support for basic mbox implementation for CN20K
from AF side.
Sai Krishna [Wed, 11 Jun 2025 11:01:52 +0000 (16:31 +0530)]
octeontx2-af: CN20k basic mbox operations and structures
This patch adds basic mbox operation APIs and structures to add support
for mbox module on CN20k silicon. There are few CSR offsets, interrupts
changed between CN20k and prior Octeon series of devices.
octeontx2: Set appropriate PF, VF masks and shifts based on silicon
Number of RVU PFs on CN20K silicon have increased to 96 from maximum
of 32 that were supported on earlier silicons. Every RVU PF and VF is
identified by HW using a 16bit PF_FUNC value. Due to the change in
Max number of PFs in CN20K, the bit encoding of this PF_FUNC has changed.
This patch handles the change by using helper functions(using silicon
check) to use PF,VF masks and shifts to support both new silicon CN20K,
OcteonTx series. These helper functions are used in different modules.
Also moved the NIX AF register offset macros to other files which
will be posted in coming patches.
====================
seg6: Allow End.X behavior to accept an oif
Patches #1-#3 gradually extend the End.X behavior to accept an output
interface as an optional argument. This is needed for cases where user
space wishes to specify an IPv6 link-local address as the nexthop
address.
Patch #4 adds test cases to the existing End.X selftest to cover the new
functionality.
====================
Ido Schimmel [Thu, 12 Jun 2025 12:23:23 +0000 (15:23 +0300)]
selftests: seg6: Add test cases for End.X with link-local nexthop
In the current test topology, all the routers are connected to each
other via dedicated links with addresses of the form fcf0:0:x:y::/64.
The test configures rt-3 with an adjacency with rt-4 and rt-4 with an
adjacency with rt-1:
# ip -n rt_3-IgWSBJ -6 route show tab 90 fcbb:0:300::/48
fcbb:0:300::/48 encap seg6local action End.X nh6 fcf0:0:3:4::4 flavors next-csid lblen 32 nflen 16 dev dum0 metric 1024 pref medium
# ip -n rt_4-JdCunK -6 route show tab 90 fcbb:0:400::/48
fcbb:0:400::/48 encap seg6local action End.X nh6 fcf0:0:1:4::1 flavors next-csid lblen 32 nflen 16 dev dum0 metric 1024 pref medium
The routes are used when pinging hs-2 from hs-1 and vice-versa.
Extend the test to also cover End.X behavior with an IPv6 link-local
nexthop address and an output interface. Configure every router
interface with an IPv6 link-local address of the form fe80::x:y/64 and
before re-running the ping tests, replace the previous End.X routes with
routes that use the new IPv6 link-local addresses:
# ip -n rt_3-IgWSBJ -6 route show tab 90 fcbb:0:300::/48
fcbb:0:300::/48 encap seg6local action End.X nh6 fe80::4:3 oif veth-rt-3-4 flavors next-csid lblen 32 nflen 16 dev dum0 metric 1024 pref medium
# ip -n rt_4-JdCunK -6 route show tab 90 fcbb:0:400::/48
fcbb:0:400::/48 encap seg6local action End.X nh6 fe80::1:4 oif veth-rt-4-1 flavors next-csid lblen 32 nflen 16 dev dum0 metric 1024 pref medium
The new test cases fail without the previous patch ("seg6: Allow End.X
behavior to accept an oif"):
# ./srv6_end_x_next_csid_l3vpn_test.sh
[...]
################################################################################
TEST SECTION: SRv6 VPN connectivity test hosts (h1 <-> h2, IPv6), link-local
################################################################################
Ido Schimmel [Thu, 12 Jun 2025 12:23:22 +0000 (15:23 +0300)]
seg6: Allow End.X behavior to accept an oif
Extend the End.X behavior to accept an output interface as an optional
attribute and make use of it when resolving a route. This is needed when
user space wants to use a link-local address as the nexthop address.
Before:
# ip route add 2001:db8:1::/64 encap seg6local action End.X nh6 fe80::1 oif eth0 dev sr6
# ip route add 2001:db8:2::/64 encap seg6local action End.X nh6 2001:db8:10::1 dev sr6
$ ip -6 route show
2001:db8:1::/64 encap seg6local action End.X nh6 fe80::1 dev sr6 metric 1024 pref medium
2001:db8:2::/64 encap seg6local action End.X nh6 2001:db8:10::1 dev sr6 metric 1024 pref medium
After:
# ip route add 2001:db8:1::/64 encap seg6local action End.X nh6 fe80::1 oif eth0 dev sr6
# ip route add 2001:db8:2::/64 encap seg6local action End.X nh6 2001:db8:10::1 dev sr6
$ ip -6 route show
2001:db8:1::/64 encap seg6local action End.X nh6 fe80::1 oif eth0 dev sr6 metric 1024 pref medium
2001:db8:2::/64 encap seg6local action End.X nh6 2001:db8:10::1 dev sr6 metric 1024 pref medium
Note that the oif attribute is not dumped to user space when it was not
specified (as an oif of 0) since each entry keeps track of the optional
attributes that it parsed during configuration (see struct
seg6_local_lwt::parsed_optattrs).
Ido Schimmel [Thu, 12 Jun 2025 12:23:21 +0000 (15:23 +0300)]
seg6: Call seg6_lookup_any_nexthop() from End.X behavior
seg6_lookup_nexthop() is a wrapper around seg6_lookup_any_nexthop().
Change End.X behavior to invoke seg6_lookup_any_nexthop() directly so
that we would not need to expose the new output interface argument
outside of the seg6local module.
Ido Schimmel [Thu, 12 Jun 2025 12:23:20 +0000 (15:23 +0300)]
seg6: Extend seg6_lookup_any_nexthop() with an oif argument
seg6_lookup_any_nexthop() is called by the different endpoint behaviors
(e.g., End, End.X) to resolve an IPv6 route. Extend the function with an
output interface argument so that it could be used to resolve a route
with a certain output interface. This will be used by subsequent patches
that will extend the End.X behavior with an output interface as an
optional argument.
ip6_route_input_lookup() cannot be used when an output interface is
specified as it ignores this parameter. Similarly, calling
ip6_pol_route() when a table ID was not specified (e.g., End.X behavior)
is wrong.
Therefore, when an output interface is specified without a table ID,
resolve the route using ip6_route_output() which will take the output
interface into account.
Note that no endpoint behavior currently passes both a table ID and an
output interface, so the oif argument passed to ip6_pol_route() is
always zero and there are no functional changes in this regard.
Jakub Kicinski [Mon, 16 Jun 2025 22:27:27 +0000 (15:27 -0700)]
Merge branch 'gve-add-rx-hw-timestamping-support'
Ziwei Xiao says:
====================
gve: Add Rx HW timestamping support
This patch series add the support of Rx HW timestamping, which sends
adminq commands periodically to the device for clock synchronization with
the NIC.
The ability to read the PHC from user space will be added in the
future patch series when adding the actual PTP support. For this patch
series, it's adding the initial ptp to utilize the ptp_schedule_worker
to schedule the work of syncing the NIC clock.
====================
John Fraker [Sat, 14 Jun 2025 00:07:53 +0000 (00:07 +0000)]
gve: Implement ndo_hwtstamp_get/set for RX timestamping
Implement ndo_hwtstamp_get/set to enable hardware RX timestamping,
providing support for SIOC[SG]HWTSTAMP IOCTLs. Included with this support
is the small change necessary to read the rx timestamp out of the rx
descriptor, now that timestamps start being enabled. The gve clock is
only used for hardware timestamps, so started when timestamps are
requested and stopped when not needed.
This version only supports RX hardware timestamping with the rx filter
HWTSTAMP_FILTER_ALL. If the user attempts to configure a more
restrictive filter, the filter will be set to HWTSTAMP_FILTER_ALL in the
returned structure.
Signed-off-by: John Fraker <jfraker@google.com> Signed-off-by: Ziwei Xiao <ziweixiao@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Link: https://patch.msgid.link/20250614000754.164827-8-hramamurthy@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
John Fraker [Sat, 14 Jun 2025 00:07:52 +0000 (00:07 +0000)]
gve: Add rx hardware timestamp expansion
Allow the rx path to recover the high 32 bits of the full 64 bit rx
timestamp.
Use the low 32 bits of the last synced nic time and the 32 bits of the
timestamp provided in the rx descriptor to generate a difference, which
is then applied to the last synced nic time to reconstruct the complete
64-bit timestamp.
This scheme remains accurate as long as no more than ~2 seconds have
passed between the last read of the nic clock and the timestamping
application of the received packet.
Signed-off-by: John Fraker <jfraker@google.com> Signed-off-by: Ziwei Xiao <ziweixiao@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Link: https://patch.msgid.link/20250614000754.164827-7-hramamurthy@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kevin Yang [Sat, 14 Jun 2025 00:07:51 +0000 (00:07 +0000)]
gve: Add support to query the nic clock
Query the nic clock and store the results. The timestamp delivered
in descriptors has a wraparound time of ~4 seconds so 250ms is chosen
as the sync cadence to provide a balance between performance, and
drift potential when we do start associating host time and nic time.
Leverage PTP's aux_work to query the nic clock periodically.
Signed-off-by: Kevin Yang <yyd@google.com> Signed-off-by: John Fraker <jfraker@google.com> Signed-off-by: Tim Hostetler <thostet@google.com> Signed-off-by: Ziwei Xiao <ziweixiao@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Link: https://patch.msgid.link/20250614000754.164827-6-hramamurthy@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ziwei Xiao [Sat, 14 Jun 2025 00:07:50 +0000 (00:07 +0000)]
gve: Add adminq lock for queues creation and destruction
Adminq commands for queues creation and destruction were not
consistently protected by the driver's adminq_lock. This was previously
benign as these operations were always initiated from contexts holding
kernel-level locks (e.g., rtnl_lock, netdev_lock), which provided
serialization.
Upcoming PTP aux_work will issue adminq commands directly from the
driver to read the NIC clock, without such kernel lock protection.
To prevent race conditions with this new PTP work, this patch ensures
the adminq_lock is held during queues creation and destruction.
Signed-off-by: Ziwei Xiao <ziweixiao@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Link: https://patch.msgid.link/20250614000754.164827-5-hramamurthy@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
John Fraker [Sat, 14 Jun 2025 00:07:48 +0000 (00:07 +0000)]
gve: Add adminq command to report nic timestamp
Add an adminq command to read NIC's hardware clock. The driver
allocates dma memory and passes that dma memory address to the device.
The device then writes the clock to the given address.
Signed-off-by: Jeff Rogers <jefrogers@google.com> Signed-off-by: John Fraker <jfraker@google.com> Signed-off-by: Ziwei Xiao <ziweixiao@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Link: https://patch.msgid.link/20250614000754.164827-3-hramamurthy@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
John Fraker [Sat, 14 Jun 2025 00:07:47 +0000 (00:07 +0000)]
gve: Add device option for nic clock synchronization
Add the device option and negotiation with the device for clock
synchronization with the nic. This option is necessary before the driver
will advertise support for hardware timestamping or other related
features.
Signed-off-by: Jeff Rogers <jefrogers@google.com> Signed-off-by: John Fraker <jfraker@google.com> Signed-off-by: Ziwei Xiao <ziweixiao@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Link: https://patch.msgid.link/20250614000754.164827-2-hramamurthy@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Haiyang Zhang [Fri, 13 Jun 2025 17:00:34 +0000 (10:00 -0700)]
net: mana: Add handler for hardware servicing events
To collaborate with hardware servicing events, upon receiving the special
EQE notification from the HW channel, remove the devices on this bus.
Then, after a waiting period based on the device specs, rescan the parent
bus to recover the devices.
====================
netpoll: Untangle netconsole and netpoll
Initially netpoll and netconsole were created together, and some
functions are in the wrong file. Seperate netconsole-only functions
in netconsole, avoiding exports.
1. Expose netpoll logging macros in the public header to enable consistent
log formatting across netpoll consumers.
2. Relocate netconsole-specific functions from netpoll to the netconsole
module where they are actually used, reducing unnecessary coupling.
3. Remove unnecessary function exports
4. Rename netpoll parsing functions in netconsole to better reflect their
specific usage.
5. Create a test to check that cmdline works fine. This was in my todo
list since [1], this was a good time to add it here to make sure this
patchset doesn't regress.
PS: The code was split in a way that it is easy to review. When copying
the functions from netpoll to netconsole, I do not change than other
than adding `static`. This will make checkpatch unhappy, but, further
patches will address the issues. It is done this way to make it easy for
reviewers.
Breno Leitao [Fri, 13 Jun 2025 11:31:37 +0000 (04:31 -0700)]
selftests: net: add netconsole test for cmdline configuration
Add a new selftest to verify netconsole module loading with command
line arguments. This test exercises the init_netconsole() path and
validates proper parsing of the netconsole= parameter format.
The test:
- Loads netconsole module with cmdline configuration instead of
dynamic reconfiguration
- Validates message transmission through the configured target
- Adds helper functions for cmdline string generation and module
validation
This complements existing netconsole selftests by covering the
module initialization code path that processes boot-time parameters.
This test is useful to test issues like the one described in [1].
Breno Leitao [Fri, 13 Jun 2025 11:31:35 +0000 (04:31 -0700)]
netconsole: improve code style in parser function
Split assignment from conditional checks and use preferred null pointer
check style (!delim instead of == NULL) in netconsole_parser_cmdline().
This improves code readability and follows kernel coding style
conventions.
Breno Leitao [Fri, 13 Jun 2025 11:31:34 +0000 (04:31 -0700)]
netconsole: rename functions to better reflect their purpose
Rename netpoll_parse_options() to netconsole_parser_cmdline() and
netpoll_print_options() to netconsole_print_banner() to better
describe what these functions actually do within the netconsole
context.
Also fix minor code style issues including variable declaration
ordering and spacing.
These functions are specific to netconsole functionality rather
than general netpoll operations, so the new names better reflect
their actual purpose.
Breno Leitao [Fri, 13 Jun 2025 11:31:33 +0000 (04:31 -0700)]
netpoll: move netpoll_print_options to netconsole
Move netpoll_print_options() from net/core/netpoll.c to
drivers/net/netconsole.c and make it static. This function is only used
by netconsole, so there's no need to export it or keep it in the public
netpoll API.
This reduces the netpoll API surface and improves code locality
by keeping netconsole-specific functionality within the netconsole
driver.
Breno Leitao [Fri, 13 Jun 2025 11:31:32 +0000 (04:31 -0700)]
netpoll: relocate netconsole-specific functions to netconsole module
Move netpoll_parse_ip_addr() and netpoll_parse_options() from the generic
netpoll module to the netconsole module where they are actually used.
These functions were originally placed in netpoll but are only consumed by
netconsole. This refactoring improves code organization by:
- Removing unnecessary exported symbols from netpoll
- Making netpoll_parse_options() static (no longer needs global visibility)
- Reducing coupling between netpoll and netconsole modules
The functions remain functionally identical - this is purely a code
reorganization to better reflect their actual usage patterns. Here are
the changes:
1) Move both functions from netpoll to netconsole
2) Add static to netpoll_parse_options()
3) Removed the EXPORT_SYMBOL()
PS: This diff does not change the function format, so, it is easy to
review, but, checkpatch will not be happy. A follow-up patch will
address the current issues reported by checkpatch.
Breno Leitao [Fri, 13 Jun 2025 11:31:31 +0000 (04:31 -0700)]
netpoll: expose netpoll logging macros in public header
Move np_info(), np_err(), and np_notice() macros from internal
implementation to the public netpoll header file to make them
available for use by netpoll consumers.
These logging macros provide consistent formatting for netpoll-related
messages by automatically prefixing log output with the netpoll instance
name.
The goal is to use the exact same format that is being displayed today,
instead of creating something netconsole-specific.
Breno Leitao [Fri, 13 Jun 2025 11:31:30 +0000 (04:31 -0700)]
netpoll: remove __netpoll_cleanup from exported API
Since commit 97714695ef90 ("net: netconsole: Defer netpoll cleanup to
avoid lock release during list traversal"), netconsole no longer uses
__netpoll_cleanup(). With no remaining users, remove this function
from the exported netpoll API.
The function remains available internally within netpoll for use by
netpoll_cleanup().
Breno Leitao [Fri, 13 Jun 2025 17:15:46 +0000 (10:15 -0700)]
ptp: Use ratelimite for freerun error message
Replace pr_err() with pr_err_ratelimited() in ptp_clock_settime() to
prevent log flooding when the physical clock is free running, which
happens on some of my hosts. This ensures error messages are
rate-limited and improves kernel log readability.
====================
net: phy: make phy_package a separate module
Only a handful of PHY drivers needs the PHY package functionality,
therefore make it a separate module which is built only if needed.
====================
It appears that the GMAC_ANE_ADV and GMAC_ANE_LPA registers are only
available for TBI and RTBI PHY interfaces. In commit 482b3c3ba757
("net: stmmac: Drop TBI/RTBI PCS flags") support for these was dropped,
and thus it no longer makes sense to access these registers.
Remove the *_get_adv_lp() functions, and the now redundant struct
rgmii_adv and STMMAC_PCS_* definitions.
Steven Rostedt [Thu, 12 Jun 2025 13:46:16 +0000 (09:46 -0400)]
net/tcp_ao: tracing: Hide tcp_ao events under CONFIG_TCP_AO
Several of the tcp_ao events are only called when CONFIG_TCP_AO is
defined. As each event can take up to 5K regardless if they are used or
not, it's best not to define them when they are not used. Add #ifdef
around these events when they are not used.
Add ethqos_pcs_set_inband() to improve readability, and to allow future
changes when phylink PCS support is properly merged.
Reviewed-by: Andrew Halaney <ahalaney@redhat.com> Tested-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> # sa8775p-ride-r3 Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/E1uPkbO-004EyA-EU@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yajun Deng [Thu, 12 Jun 2025 14:27:07 +0000 (14:27 +0000)]
net: sysfs: Implement is_visible for phys_(port_id, port_name, switch_id)
phys_port_id_show, phys_port_name_show and phys_switch_id_show would
return -EOPNOTSUPP if the netdev didn't implement the corresponding
method.
There is no point in creating these files if they are unsupported.
Put these attributes in netdev_phys_group and implement the is_visible
method. make phys_(port_id, port_name, switch_id) invisible if the netdev
dosen't implement the corresponding method.
net: ti: icssg-prueth: Read firmware-names from device tree
Refactor the way firmware names are handled for the ICSSG PRUETH driver.
Instead of using hardcoded firmware name arrays for different modes (EMAC,
SWITCH, HSR), the driver now reads the firmware names from the device tree
property "firmware-name". Only the EMAC firmware names are specified in the
device tree property. The firmware names for all other supported modes are
generated dynamically based on the EMAC firmware names by replacing
substrings (e.g., "eth" with "sw" or "hsr") as appropriate.
Example: Below are the firmwares used currently for PRU0 core
All three firmware names are same except for the operating mode.
In general for PRU0 core, firmware name is,
ti-pruss/am65x-sr2-pru0-pru<mode>-fw.elf
Since the EMAC firmware names are defined in DT, driver will read those
directly and for other modes swap the mode name. i.e. eth -> sw or
eth -> hsr.
This preserves backwards compatibility as ICSSG driver is supported only
by AM65x and AM64x. Both of these have "firmware-name" property
populated in their device tree.
Jakub Kicinski [Sat, 14 Jun 2025 01:23:01 +0000 (18:23 -0700)]
Merge branch 'net-stmmac-rk-much-needed-cleanups'
Russell King says:
====================
net: stmmac: rk: much needed cleanups
This series starts attacking the reams of fairly identical duplicated
code in dwmac-rk. Every new SoC that comes along seems to need more
code added to this file because e.g. the way the clock is controlled
is different in every SoC.
The first thing to realise is that the driver only supports RMII and
RGMII interface modes. So, the first patch adds a .get_interfaces()
implementation which reports this for phylink's usage, thus ensuring
that we error out during initialisation should something that isn't
supported be specified. Note that there is one case where there are
a pair of interfaces, one supports only RMII the other supports RMII
and RGMII, but we report both anyway - something that the existing
driver allows. A future patch may attempt to fix this.
Rather than writing code, let's realise that there are two major
implementations here:
1. a struct clk that needs to be set.
2. writing a register with settings for RGMII and RMII speeds.
Provide implementations for these, Also realise that as a result
of doing this, we can kill off the .set_rgmii_speed() and
.set_rmii_speed() methods by combining them together - indeed,
this is what later SoCs already do by pointing both these methods
at the same function.
Overall, this patch series shrinks the file LOC by almost 8.7%
by removing 175 lines from over 2000 lines.
Apart from the error reporting changing and restricting interface
modes to those that the driver supports, no functional change is
anticipated with this patch. However, I have no hardware to test
this.
====================
Now that no SoC implements the .set_*_speed() methods, we can get rid
of these methods and the now unused code in rk_set_clk_tx_rate().
Arrange for the function to return an error when the .set_speed()
method is not implemented.
px30_set_rmii_speed() doesn't need to be as verbose as it is - it
merely needs the values for the register and clock rate which depend
on the speed, and then call the appropriate functions. Rewrite the
function to make it so.
As a result of the previous patches, many of the .set_rgmii_speed()
and .set_rmii_speed() implementations are identical apart from the
interface mode. Add a new .set_speed() function which takes the
interface mode in addition to the speed, and use it to combine the
separate implementations, calling the common rk_set_reg_speed()
function.
Also convert rk_set_clk_mac_speed() to be called by this new method
pointer, rather than having these implementations called from both
.set_*_speed() methods.
Remove all the error messages from the .set_speed() methods, as these
return an error code which is propagated up to stmmac_mac_link_up()
which will print the error.
Just like rk3568, there is no need to have separate RGMII and RMII
methods to set clk_mac_speed() as rgmii_clock() can be used to return
the clock rate for both RGMII and RMII interface modes. Combine these
two methods.
net: stmmac: rk: add struct for programming register based speeds
There is a common pattern in the driver where many SoCs need to write a
single register with a value dependent on the interface mode and speed.
Rather than having a lot of repeated code, add some common functions
and a struct to contain the values to be written to a register to
select the RGMII and RMII speeds.
Rather than having lots of regmap_write()s to the same register but
with different values depending on the speed, reorganise the
functions to use a local variable for the value, and then have one
regmap_write() call to write it to the register. This reduces the
amount of code and is a step towards further reducing the code size.
RK platforms support RGMII and/or RMII depending on the SoC. Detect
whether support for a SoC exists by whether the interface specific
set_to functions have been populated, and set the appropriate bits in
phylink's bitmap of interfaces.
This assumes all dwmac interfaces on a SoC have identical support,
but it should be noted that this is not true for RK3528 which only
supports RGMII on GMAC1. However, the existing code structure
permits RGMII to be configured on GMAC0 without complaint, so
preserve this behaviour even though it is incorrect to avoid
functional change.
Phase offset measurement is typically performed against the current active
source. However, some DPLL (Digital Phase-Locked Loop) devices may offer
the capability to monitor phase offsets across all available inputs.
The attribute and current feature state shall be included in the response
message of the ``DPLL_CMD_DEVICE_GET`` command for supported DPLL devices.
In such cases, users can also control the feature using the
``DPLL_CMD_DEVICE_SET`` command by setting the ``enum dpll_feature_state``
values for the attribute.
Once enabled the phase offset measurements for the input shall be returned
in the ``DPLL_A_PIN_PHASE_OFFSET`` attribute.
Implement feature support in ice driver for dpll-enabled devices.
ice: add phase offset monitor for all PPS dpll inputs
Implement a new admin command and helper function to handle and obtain
CGU measurements for input pins.
Add new callback operations to control the dpll device-level feature
"phase offset monitor," allowing it to be enabled or disabled. If the
feature is enabled, provide users with measured phase offsets and
notifications.
Initialize PPS DPLL with new callback operations if the feature is
supported by the firmware.
Add new callback operations for a dpll device:
- phase_offset_monitor_get(..) - to obtain current state of phase offset
monitor feature from dpll device,
- phase_offset_monitor_set(..) - to allow feature configuration.
Obtain the feature state value using the get callback and provide it to
the user if the device driver implements callbacks.
dpll: add phase-offset-monitor feature to netlink spec
Add enum dpll_feature_state for control over features.
Add dpll device level attribute:
DPLL_A_PHASE_OFFSET_MONITOR - to allow control over a phase offset monitor
feature. Attribute is present and shall return current state of a feature
(enum dpll_feature_state), if the device driver provides such capability,
otherwie attribute shall not be present.
Improve the .set_clk_tx_rate() method error message to include the
PHY interface mode along with the speed, which will be helpful to
the RK implementations.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/E1uPjjx-0049r5-NN@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Improve the rgmii_clock() documentation to indicate that it can also
be used for MII, GMII and RMII modes as well as RGMII as the required
clock rates are identical, but note that it won't error out for 1G
speeds for MII and RMII.
RubenKelevra [Thu, 12 Jun 2025 14:50:12 +0000 (16:50 +0200)]
net: pfcp: fix typo in message_priority field name
The field is spelled "message_priprity" in the big-endian bit-field
definition. Nothing in-tree currently references the member, so the
typo does not break kernel builds, but it is clearly incorrect.
Jakub Kicinski [Sat, 14 Jun 2025 01:09:48 +0000 (18:09 -0700)]
Merge branch 'dp83tg720-reduce-link-recovery'
Oleksij Rempel says:
====================
dp83tg720: Reduce link recovery
This patch series improves the link recovery behavior of the TI
DP83TG720 PHY driver.
Previously, we introduced randomized reset delay logic to avoid reset
collisions in multi-PHY setups. While this approach was functional, it
had notable drawbacks: unpredictable behavior, longer and more variable
link recovery times, and overall higher complexity in link handling.
With this new approach, we replace the randomized delay with
deterministic, role-specific delays in the PHY reset logic. This enables
us to:
- Remove the redundant empirical 600 ms delay in read_status()
- Drop the random polling interval logic
- Introduce a clean, adaptive polling strategy with consistent
behavior and improved responsiveness
As a result, the PHY is now able to recover link reliably in under
1000_ms
====================
David Jander [Thu, 12 Jun 2025 10:41:57 +0000 (12:41 +0200)]
net: phy: dp83tg720: switch to adaptive polling and remove random delays
Now that the PHY reset logic includes a role-specific asymmetric delay
to avoid synchronized reset deadlocks, the previously used randomized
polling intervals are no longer necessary.
This patch removes the get_random_u32_below()-based logic and introduces
an adaptive polling strategy:
- Fast polling for a short time after link-down
- Slow polling if the link remains down
- Slower polling when the link is up
This balances CPU usage and responsiveness while avoiding reset
collisions. Additionally, the driver still relies on polling for
all link state changes, as interrupt support is not implemented,
and link-up events are not reliably signaled by the PHY.
The polling parameters are now documented in the updated top-of-file
comment.
Co-developed-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: David Jander <david@protonic.nl> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250612104157.2262058-4-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that dp83tg720_soft_reset() introduces role-specific delays to avoid
reset synchronization deadlocks, the fixed 600ms post-reset delay in
dp83tg720_read_status() is no longer needed.
The new logic provides both the required MDC timing and link stabilization,
making the old empirical delay redundant and unnecessarily long.
Co-developed-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: David Jander <david@protonic.nl> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250612104157.2262058-3-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Jander [Thu, 12 Jun 2025 10:41:55 +0000 (12:41 +0200)]
net: phy: dp83tg720: implement soft reset with asymmetric delay
Add a .soft_reset callback for the DP83TG720 PHY that issues a hardware
reset followed by an asymmetric post-reset delay. The delay differs
based on the PHY's master/slave role to avoid synchronized reset
deadlocks, which are known to occur when both link partners use
identical reset intervals.
The delay includes:
- a fixed 1ms wait to satisfy MDC access timing per datasheet, and
- an empirically chosen extra delay (97ms for master, 149ms for slave).
Co-developed-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: David Jander <david@protonic.nl> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250612104157.2262058-2-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Heiner Kallweit [Wed, 11 Jun 2025 20:11:21 +0000 (22:11 +0200)]
net: phy: improve mdio-boardinfo.h
There's no need to include phy.h and mutex.h in mdio-boardinfo.h.
However mdio-boardinfo.c included phy.h indirectly this way so far,
include it explicitly instead. Whilst at it, sort the included
headers properly.