git.ipfire.org Git - thirdparty/kernel/stable.git/log

net: add sk->sk_drop_counters

Some sockets suffer from heavy false sharing on sk->sk_drops,
and fields in the same cache line.

Add sk->sk_drop_counters to:

- move the drop counter(s) to dedicated cache lines.
- Add basic NUMA awareness to these drop counter(s).

Following patches will use this infrastructure for UDP and RAW sockets.

sk_clone_lock() is not yet ready, it would need to properly
set newsk->sk_drop_counters if we plan to use this for TCP sockets.

v2: used Paolo suggestion from https://lore.kernel.org/netdev/8f09830a-d83d-43c9-b36b-88ba0a23e9b2@redhat.com/

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250826125031.1578842-4-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: add sk_drops_skbadd() helper

Existing sk_drops_add() helper is renamed to sk_drops_skbadd().

Add sk_drops_add() and convert sk_drops_inc() to use it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250826125031.1578842-3-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: add sk_drops_read(), sk_drops_inc() and sk_drops_reset() helpers

We want to split sk->sk_drops in the future to reduce
potential contention on this field.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250826125031.1578842-2-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

uapi: wrap compiler_types.h in an ifdef instead of the implicit strip

The uAPI stddef header includes compiler_types.h, a kernel-only
header, to make sure that kernel definitions of annotations
like __counted_by() take precedence.

There is a hack in scripts/headers_install.sh which strips includes
of compiler.h and compiler_types.h when installing uAPI headers.
While explicit handling makes sense for compiler.h, which is included
all over the uAPI, compiler_types.h is only included by stddef.h
(within the uAPI, obviously it's included in kernel code a lot).

Remove the stripping from scripts/headers_install.sh and wrap
the include of compiler_types.h in #ifdef __KERNEL__ instead.
This should be equivalent functionally, but is easier to understand
to a casual reader of the code. It also makes it easier to work
with kernel headers directly from under tools/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250825201828.2370083-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'eth-fbnic-extend-hw-stats-support'

Jakub Kicinski says:

====================
eth: fbnic: Extend hw stats support

Mohsin says:

Extend hardware stats support for fbnic by adding the ability to reset
hardware stats when the device experience a reset due to a PCI error and
include MAC stats in the hardware stats reset. Additionally, expand
hardware stats coverage to include FEC, PHY, and Pause stats.

v1: https://lore.kernel.org/20250822164731.1461754-1-kuba@kernel.org
====================

Link: https://patch.msgid.link/20250825200206.2357713-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

eth: fbnic: Add pause stats support

Add support to read pause stats for fbnic. Unlike FEC and PCS stats,
pause stats won't wrap, do not fetch them under the service task. Since,
they are exclusively accessed via the ethtool API, don't include them in
fbnic_get_hw_stats().

]# ethtool -I -a eth0
Pause parameters for eth0:
Autonegotiate: on
RX: off
TX: off
Statistics:
tx_pause_frames: 0
rx_pause_frames: 0

Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250825200206.2357713-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

eth: fbnic: Read PHY stats via the ethtool API

Provide support to read PHY stats (FEC and PCS) via the ethtool API.

]# ethtool -I --show-fec eth0
FEC parameters for eth0:
Supported/Configured FEC encodings: RS
Active FEC encoding: RS
Statistics:
corrected_blocks: 0
uncorrectable_blocks: 0

]# ethtool -S eth0 --groups eth-phy
Standard stats for eth0:
eth-phy-SymbolErrorDuringCarrier: 0

Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250825200206.2357713-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

eth: fbnic: Fetch PHY stats from device

Add support to fetch PHY stats consisting of PCS and FEC stats from the
device. When reading the stats counters, the lo part is read first, which
latches the hi part to ensure consistent reading of the stats counter.

FEC and PCS stats can wrap depending on the access frequency. To prevent
wrapping, fetch these stats periodically under the service task. Also to
maintain consistency fetch these stats along with other 32b stats under
__fbnic_get_hw_stats32().

Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250825200206.2357713-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

eth: fbnic: Reset MAC stats

Reset the MAC stats as part of the hardware stats reset to ensure
consistency. Currently, hardware stats are reset during device bring-up
and upon experiencing PCI errors; however, MAC stats are being skipped
during these resets.

When fbnic_reset_hw_stats() is called upon recovering from PCI error,
MAC stats are accessed outside the rtnl_lock. The only other access to
MAC stats is via the ethtool API, which is protected by rtnl_lock. This
can result in concurrent access to MAC stats and a potential race. Protect
the fbnic_reset_hw_stats() call in __fbnic_pm_attach() with rtnl_lock to
avoid this.

Note that fbnic_reset_hw_mac_stats() is called outside the hardware
stats lock which protects access to the fbnic_hw_stats. This is intentional
because MAC stats are fetched from the device outside this lock and are
exclusively read via the ethtool API.

Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250825200206.2357713-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

eth: fbnic: Reset hw stats upon PCI error

Upon experiencing a PCI error, fbnic reset the device to recover from
the failure. Reset the hardware stats as part of the device reset to
ensure accurate stats reporting.

Note that the reset is not really resetting the aggregate value to 0,
which may result in a spike for a system collecting deltas in stats.
Rather, the reset re-latches the current value as previous, in case HW
got reset.

Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250825200206.2357713-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

eth: fbnic: Move hw_stats_lock out of fbnic_dev

Move hw_stats_lock out of fbnic_dev to a more appropriate struct
fbnic_hw_stats since the only use of this lock is to protect access to
the hardware stats. While at it, enclose the lock and stats
initialization in a single init call.

Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250825200206.2357713-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'macsec-replace-custom-netlink-attribute-checks-with-policy-level-checks'

Sabrina Dubroca says:

====================
macsec: replace custom netlink attribute checks with policy-level checks

We can simplify attribute validation a lot by describing the accepted
ranges more precisely in the policies, using NLA_POLICY_MAX etc.

Some of the checks still need to be done later on, because the
attribute length and acceptable range can vary based on values that
can't be known when the policy is validated (cipher suite determines
the key length and valid ICV length, presence of XPN changes the PN
length, detection of duplicate SCIs or ANs, etc).

As a bonus, we get a few extack messages from the policy
validation. I'll add extack to the rest of the checks (mostly in the
genl commands) in an future series.

v1: https://lore.kernel.org/netdev/cover.1664379352.git.sd@queasysnail.net
====================

Link: https://patch.msgid.link/cover.1756202772.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

macsec: replace custom check on IFLA_MACSEC_ENCODING_SA with NLA_POLICY_MAX

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/085bc642136cf3d267ddbb114e6f0c4a9247c797.1756202772.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

macsec: replace custom checks for IFLA_MACSEC_* flags with NLA_POLICY_MAX

Those are all off/on flags.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/95707fb36adc1904fa327bc8f4eb055895aa6eff.1756202772.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

macsec: validate IFLA_MACSEC_VALIDATION with NLA_POLICY_MAX

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/629efe0b2150b30abc6472074018cbd521b46578.1756202772.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

macsec: use NLA_POLICY_VALIDATE_FN to validate IFLA_MACSEC_CIPHER_SUITE

Unfortunately, since the value of MACSEC_DEFAULT_CIPHER_ID doesn't fit
near the others, we can't use a simple range in the policy.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/015e43ade9548c7682c9739087eba0853b3a1331.1756202772.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

macsec: replace custom checks on IFLA_MACSEC_ICV_LEN with NLA_POLICY_RANGE

The existing checks already force this range.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/398cf16191a634ab343ecd811c481d7bdd44a933.1756202772.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

macsec: add NLA_POLICY_MAX for MACSEC_OFFLOAD_ATTR_TYPE and IFLA_MACSEC_OFFLOAD

This is equivalent to the existing checks allowing either
MACSEC_OFFLOAD_OFF or calling macsec_check_offload.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/37e1f1716f1d1d46d3d06c52317564b393fe60e6.1756202772.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

macsec: remove validate_add_rxsc

It's not doing much anymore.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/218147f2f11cab885abc86b779dcefcd3208a2f8.1756202772.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

macsec: use NLA_UINT for MACSEC_SA_ATTR_PN

MACSEC_SA_ATTR_PN is either a u32 or a u64, we can now use NLA_UINT
for this instead of a custom binary type. We can then use a min check
within the policy.

We need to keep the length checks done in macsec_{add,upd}_{rx,tx}sa
based on whether the device is set up for XPN (with 64b PNs instead of
32b).

On the dump side, keep the existing custom code as userspace may
expect a u64 when using XPN, and nla_put_uint may only output a u32
attribute if the value fits.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/c9d32bd479cd4464e09010fbce1becc75377c8a0.1756202772.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

macsec: use NLA_POLICY_MAX_LEN for MACSEC_SA_ATTR_KEY

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/192227ca0047b643d6530ece0a3679998b010fac.1756202772.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

macsec: replace custom checks on MACSEC_SA_ATTR_KEYID with NLA_POLICY_EXACT_LEN

The existing checks already specify that MACSEC_SA_ATTR_KEYID must
have length MACSEC_KEYID_LEN.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/c4c113328962aae4146183e7a27854e854c796fb.1756202772.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

macsec: replace custom checks on MACSEC_SA_ATTR_SALT with NLA_POLICY_EXACT_LEN

The existing checks already specify that MACSEC_SA_ATTR_SALT must have
length MACSEC_SALT_LEN.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/9699c5fd72322118b164cc8777fadabcce3b997c.1756202772.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

macsec: replace custom checks on MACSEC_*_ATTR_ACTIVE with NLA_POLICY_MAX

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/2b07434304c725c72a7d81a8460d0bbe8af384a2.1756202772.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

macsec: replace custom checks on MACSEC_SA_ATTR_AN with NLA_POLICY_MAX

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/22a7820cfc2cbfe5e33f030f1a3276e529cc70dc.1756202772.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-prevent-rps-table-overwrite-of-active-flows'

Krishna Kumar says:

====================
net: Prevent RPS table overwrite of active flows

This series splits the original RPS patch [1] into two patches for
net-next. It also addresses a kernel test robot warning by defining
rps_flow_is_active() only when aRFS is enabled. I tested v3 with
four builds and reboots: two for [PATCH 1/2] with aRFS enabled &
disabled, and two for [PATCH 2/2]. There are no code changes in v4
and v5, only documentation. Patch v6 has one line change to keep
'hash' field under #ifdef, and was test built with aRFS=on and
aRFS=off. The same two builds were done for v7, along with 15m load
testing with aRFS=on to ensure the new changes are correct.

The first patch prevents RPS table overwrite for active flows thereby
improving aRFS stability.

The second patch caches hash & flow_id in get_rps_cpu() to avoid
recalculating it in set_rps_cpu().

[1] lore.kernel.org/netdev/20250708081516.53048-1-krikku@gmail.com/
[2] lore.kernel.org/netdev/20250729104109.1687418-1-krikku@gmail.com/
====================

Link: https://patch.msgid.link/20250825031005.3674864-1-krikku@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: Cache hash and flow_id to avoid recalculation

get_rps_cpu() can cache flow_id and hash as both are required by
set_rps_cpu() instead of recalculating them twice.

Signed-off-by: Krishna Kumar <krikku@gmail.com>
Link: https://patch.msgid.link/20250825031005.3674864-3-krikku@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: Prevent RPS table overwrite of active flows

This patch fixes an issue where two different flows on the same RXq
produce the same hash resulting in continuous flow overwrites.

Flow #1: A packet for Flow #1 comes in, kernel calls the steering
         function. The driver gives back a filter id. The kernel saves
this filter id in the selected slot. Later, the driver's
service task checks if any filters have expired and then
installs the rule for Flow #1.
Flow #2: A packet for Flow #2 comes in. It goes through the same steps.
         But this time, the chosen slot is being used by Flow #1. The
driver gives a new filter id and the kernel saves it in the
same slot. When the driver's service task runs, it runs through
all the flows, checks if Flow #1 should be expired, the kernel
returns True as the slot has a different filter id, and then
the driver installs the rule for Flow #2.
Flow #1: Another packet for Flow #1 comes in. The same thing repeats.
         The slot is overwritten with a new filter id for Flow #1.

This causes a repeated cycle of flow programming for missed packets,
wasting CPU cycles while not improving performance. This problem happens
at higher rates when the RPS table is small, but tests show it still
happens even with 12,000 connections and an RPS size of 16K per queue
(global table size = 144x16K = 64K).

This patch prevents overwriting an rps_dev_flow entry if it is active.
The intention is that it is better to do aRFS for the first flow instead
of hurting all flows on the same hash. Without this, two (or more) flows
on one RX queue with the same hash can keep overwriting each other. This
causes the driver to reprogram the flow repeatedly.

Changes:
  1. Add a new 'hash' field to struct rps_dev_flow.
  2. Add rps_flow_is_active(): a helper function to check if a flow is
     active or not, extracted from rps_may_expire_flow(). It is further
     simplified as per reviewer feedback.
  3. In set_rps_cpu():
     - Avoid overwriting by programming a new filter if:
        - The slot is not in use, or
        - The slot is in use but the flow is not active, or
        - The slot has an active flow with the same hash, but target CPU
          differs.
     - Save the hash in the rps_dev_flow entry.
  4. rps_may_expire_flow(): Use earlier extracted rps_flow_is_active().

Testing & results:
  - Driver: ice (E810 NIC), Kernel: net-next
  - #CPUs = #RXq = 144 (1:1)
  - Number of flows: 12K
  - Eight RPS settings from 256 to 32768. Though RPS=256 is not ideal,
    it is still sufficient to cover 12K flows (256*144 rx-queues = 64K
    global table slots)
  - Global Table Size = 144 * RPS (effectively equal to 256 * RPS)
  - Each RPS test duration = 8 mins (org code) + 8 mins (new code).
  - Metrics captured on client

Legend for following tables:
Steer-C: #times ndo_rx_flow_steer() was Called by set_rps_cpu()
Steer-L: #times ice_arfs_flow_steer() Looped over aRFS entries
Add:     #times driver actually programmed aRFS (ice_arfs_build_entry())
Del:     #times driver deleted the flow (ice_arfs_del_flow_rules())
Units:   K = 1,000 times, M = 1 million times

  |-------|---------|------|     Org Code    |---------|---------|
  | RPS   | Latency | CPU  | Add    |  Del   | Steer-C | Steer-L |
  |-------|---------|------|--------|--------|---------|---------|
  | 256   | 227.0   | 93.2 | 1.6M   | 1.6M   | 121.7M  | 267.6M  |
  | 512   | 225.9   | 94.1 | 11.5M  | 11.2M  | 65.7M   | 199.6M  |
  | 1024  | 223.5   | 95.6 | 16.5M  | 16.5M  | 27.1M   | 187.3M  |
  | 2048  | 222.2   | 96.3 | 10.5M  | 10.5M  | 12.5M   | 115.2M  |
  | 4096  | 223.9   | 94.1 | 5.5M   | 5.5M   | 7.2M    | 65.9M   |
  | 8192  | 224.7   | 92.5 | 2.7M   | 2.7M   | 3.0M    | 29.9M   |
  | 16384 | 223.5   | 92.5 | 1.3M   | 1.3M   | 1.4M    | 13.9M   |
  | 32768 | 219.6   | 93.2 | 838.1K | 838.1K | 965.1K  | 8.9M    |
  |-------|---------|------|   New Code      |---------|---------|
  | 256   | 201.5   | 99.1 | 13.4K  | 5.0K   | 13.7K   | 75.2K   |
  | 512   | 202.5   | 98.2 | 11.2K  | 5.9K   | 11.2K   | 55.5K   |
  | 1024  | 207.3   | 93.9 | 11.5K  | 9.7K   | 11.5K   | 59.6K   |
  | 2048  | 207.5   | 96.7 | 11.8K  | 11.1K  | 15.5K   | 79.3K   |
  | 4096  | 206.9   | 96.6 | 11.8K  | 11.7K  | 11.8K   | 63.2K   |
  | 8192  | 205.8   | 96.7 | 11.9K  | 11.8K  | 11.9K   | 63.9K   |
  | 16384 | 200.9   | 98.2 | 11.9K  | 11.9K  | 11.9K   | 64.2K   |
  | 32768 | 202.5   | 98.0 | 11.9K  | 11.9K  | 11.9K   | 64.2K   |
  |-------|---------|------|--------|--------|---------|---------|

Some observations:
  1. Overall Latency improved: (1790.19-1634.94)/1790.19*100 = 8.67%
  2. Overall CPU increased:    (777.32-751.49)/751.45*100    = 3.44%
  3. Flow Management (add/delete) remained almost constant at ~11K
     compared to values in millions.

Signed-off-by: Krishna Kumar <krikku@gmail.com>
Link: https://patch.msgid.link/20250825031005.3674864-2-krikku@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: wwan: iosm: use int type to store negative error codes

The 'ret' variable in ipc_pcie_resources_request() either stores '-EBUSY'
directly or holds returns from pci_request_regions() and ipc_acquire_irq().
Storing negative error codes in u32 causes no runtime issues but is
stylistically inconsistent and very ugly. Change 'ret' from u32 to int
type - this has no runtime impact.

Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com>
Reviewed-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
Link: https://patch.msgid.link/20250826135021.510767-1-rongqianfeng@vivo.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: fixed_phy: simplify fixed_mdio_read

swphy_read_reg() doesn't change the passed struct fixed_phy_status,
so we can pass &fp->status directly.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/c49195c7-a3a1-485c-baed-9b33740752de@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

amd-xgbe: Use int type to store negative error codes

Use int instead of unsigned int for the 'ret' variable to store return
values from functions that either return zero on success or negative error
codes on failure. Storing negative error codes in an unsigned int causes
no runtime issues, but it's ugly as pants, Change 'ret' from unsigned int
to int type - this change has no runtime impact.

Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250826142159.525059-1-rongqianfeng@vivo.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: sun8i: drop unneeded default syscon value

For some odd reason we were very jealous about the value of the EMAC
clock register from the syscon block, insisting on a reset value and
only doing read-modify-write operations on that register, even though we
pretty much know the register layout.
This already led to a basically redundant entry for the H6, which only
differs by that value. We seem to have the same situation with the new
A523 SoC, which again is compatible to the A64, but has a different
syscon reset value.

Drop any assumptions about that value, and set or clear the bits that we
want to program, from scratch (starting with a value of 0). For the
remove() implementation, we just turn on the POWERDOWN bit, and deselect
the internal PHY, which mimics the existing code.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Acked-by: Corentin LABBE <clabbe.montjoie@gmail.com>
Tested-by: Corentin LABBE <clabbe.montjoie@gmail.com>
Tested-by: Paul Kocialkowski <paulk@sys-base.io>
Reviewed-by: Paul Kocialkowski <paulk@sys-base.io>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250825172055.19794-1-andre.przywara@arm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: nfc: ti,trf7970a: Restrict the ti,rx-gain-reduction-db values

Instead of stating the supported values for the ti,rx-gain-reduction-db
property in free text format, add an enum entry that can help validating
the devicetree files.

Signed-off-by: Fabio Estevam <festevam@gmail.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20250826141736.712827-1-festevam@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: rk: remove incorrect _DLY_DISABLE bit definition

The RK3328 GMAC clock delay macros define enable/disable controls for
TX and RX clock delay. While the TX definitions are correct, the
RXCLK_DLY_DISABLE macro incorrectly clears bit 0.

The macros RK3328_GMAC_TXCLK_DLY_DISABLE and
RK3328_GMAC_RXCLK_DLY_DISABLE are not referenced anywhere
in the driver code. Remove them to clean up unused definitions.

No functional change.

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20250826102219.49656-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: realtek: support for TRIGGER_NETDEV_LINK on RTL8211E and RTL8211F

This patch adds support for the TRIGGER_NETDEV_LINK trigger. It activates
the LED when a link is established, regardless of the speed.

Tested on Orange Pi PC2 with RTL8211E PHY.

Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250825211059.143231-1-olek2@wp.pl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'ipv6-sr-simplify-and-optimize-hmac-calculations'

Eric Biggers says:

====================
ipv6: sr: Simplify and optimize HMAC calculations

This series simplifies and optimizes the HMAC calculations in
IPv6 Segment Routing.
====================

Link: https://patch.msgid.link/20250824013644.71928-1-ebiggers@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: sr: Prepare HMAC key ahead of time

Prepare the HMAC key when it is added to the kernel, instead of
preparing it implicitly for every packet. This significantly improves
the performance of seg6_hmac_compute(). A microbenchmark on x86_64
shows seg6_hmac_compute() (with HMAC-SHA256) dropping from ~1978 cycles
to ~1419 cycles, a 28% improvement.

The size of 'struct seg6_hmac_info' increases by 128 bytes, but that
should be fine, since there should not be a massive number of keys.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Link: https://patch.msgid.link/20250824013644.71928-3-ebiggers@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: sr: Use HMAC-SHA1 and HMAC-SHA256 library functions

Use the HMAC-SHA1 and HMAC-SHA256 library functions instead of
crypto_shash. This is simpler and faster. Pre-allocating per-CPU hash
transformation objects and descriptors is no longer needed, and a
microbenchmark on x86_64 shows seg6_hmac_compute() (with HMAC-SHA256)
dropping from ~2494 cycles to ~1978 cycles, a 20% improvement.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Link: https://patch.msgid.link/20250824013644.71928-2-ebiggers@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'selftests-drv-net-ncdevmem-fix-error-paths'

Jakub Kicinski says:

====================
selftests: drv-net: ncdevmem: fix error paths

Make ncdevmem clean up after itself. While at it make sure it sets
HDS threshold to 0 automatically.

v1: https://lore.kernel.org/20250822200052.1675613-1-kuba@kernel.org
====================

Link: https://patch.msgid.link/20250825180447.2252977-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: drv-net: ncdevmem: explicitly set HDS threshold to 0

Make sure we set HDS threshold to 0 if the device supports changing it.
It's required for ZC.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250825180447.2252977-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: drv-net: ncdevmem: restore original HDS setting before exiting

Restore HDS settings if we modified them.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250825180447.2252977-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: drv-net: ncdevmem: restore old channel config

In case changing channel count with provider bound succeeds
unexpectedly - make sure we return to original settings.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250825180447.2252977-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: drv-net: ncdevmem: save IDs of flow rules we added

In prep for more selective resetting of ntuple filters
try to save the rule IDs to a table.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250825180447.2252977-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: drv-net: ncdevmem: remove use of error()

Using error() makes it impossible for callers to unwind their
changes. Replace error() calls with proper error handling.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250825180447.2252977-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: drv-net: hds: restore hds settings

The test currently modifies the HDS settings and doesn't restore them.
This may cause subsequent tests to fail (or pass when they should not).
Add defer()ed reset handling.

Link: https://patch.msgid.link/20250825175939.2249165-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv4: Convert ->flowi4_tos to dscp_t.

Convert the ->flowic_tos field of struct flowi_common from __u8 to
dscp_t, rename it ->flowic_dscp and propagate these changes to struct
flowi and struct flowi4.

We've had several bugs in the past where ECN bits could interfere with
IPv4 routing, because these bits were not properly cleared when setting
->flowi4_tos. These bugs should be fixed now and the dscp_t type has
been introduced to ensure that variables carrying DSCP values don't
accidentally have any ECN bits set. Several variables and structure
fields have been converted to dscp_t already, but the main IPv4 routing
structure, struct flowi4, is still using a __u8. To avoid any future
regression, this patch converts it to dscp_t.

There are many users to convert at once. Fortunately, around half of
->flowi4_tos users already have a dscp_t value at hand, which they
currently convert to __u8 using inet_dscp_to_dsfield(). For all of
these users, we just need to drop that conversion.

But, although we try to do the __u8 <-> dscp_t conversions at the
boundaries of the network or of user space, some places still store
TOS/DSCP variables as __u8 in core networking code. Those can hardly be
converted either because the data structure is part of UAPI or because
the same variable or field is also used for handling ECN in other parts
of the code. In all of these cases where we don't have a dscp_t
variable at hand, we need to use inet_dsfield_to_dscp() when
interacting with ->flowi4_dscp.

Changes since v1:
* Fix space alignment in __bpf_redirect_neigh_v4() (Ido).

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/29acecb45e911d17446b9a3dbdb1ab7b821ea371.1756128932.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'expose-burst-period-for-devlink-health-reporter'

Mark Bloch says:

====================
Expose burst period for devlink health reporter

Shahar writes:
--------------------------------------------------------------------------

Currently, the devlink health reporter initiates the grace period
immediately after recovering an error, which blocks further recovery
attempts until the grace period concludes. Since additional errors
are not generally expected during this short interval, any new error
reported during the grace period is not only rejected but also causes
the reporter to enter an error state that requires manual intervention.

This approach poses a problem in scenarios where a single root cause
triggers multiple related errors in quick succession - for example,
a PCI issue affecting multiple hardware queues. Because these errors
are closely related and occur rapidly, it is more effective to handle
them together rather than handling only the first one reported and
blocking any subsequent recovery attempts. Furthermore, setting the
reporter to an error state in this context can be misleading, as these
multiple errors are manifestations of a single underlying issue, making
it unlike the general case where additional errors are not expected
during the grace period.

To resolve this, introduce a configurable burst period attribute to the
devlink health reporter. This period starts when the first error
is recovered and lasts for a user-defined duration. Once this error
burst period expires, the grace period begins. After the grace period
ends, a new reported error will start the same flow again.

Timeline summary:

----|--------|------------------------------/----------------------/--
error is  error is      burst period             grace period
reported  recovered  (recoveries allowed)     (recoveries blocked)

With burst period, create a time window during which recovery attempts
are permitted, allowing all reported errors to be handled sequentially
before the grace period starts. Once the grace period begins, it
prevents any further error recoveries until it ends.

When burst period is set to 0, current behavior is preserved.

Design alternatives considered:

1. Recover all queues upon any error:
   A brute-force approach that recovers all queues on any error.
   While simple, it is overly aggressive and disrupts unaffected queues
   unnecessarily. Also, because this is handled entirely within the
   driver, it leads to a driver-specific implementation rather than a
   generic one.

2. Per-queue reporter:
   This design would isolate recovery handling per SQ or RQ, effectively
   removing interdependencies between queues. While conceptually clean,
   it introduces significant scalability challenges as the number of
   queues grows, as well as synchronization challenges across multiple
   reporters.

3. Error aggregation with delayed handling:
   Errors arriving during the grace period are saved and processed after
   it ends. While addressing the issue of related errors whose recovery
   is aborted as grace period started, this adds complexity due to
   synchronization needs and contradicts the assumption that no errors
   should occur during a healthy system’s grace period. Also, this
   breaks the important role of grace period in preventing an infinite
   loop of immediate error detection following recovery. In such cases
   we want to stop.

4. Allowing a fixed burst of errors before starting grace period:
   Allows a set number of recoveries before the grace period begins.
   However, it also requires limiting the error reporting window.
   To keep the design simple, the burst threshold becomes redundant.

The burst period design was chosen for its simplicity and precision in
addressing the problem at hand. It effectively captures the temporal
correlation of related errors and aligns with the original intent of
the grace period as a stabilization window where further errors are
unexpected, and if they do occur, they indicate an abnormal system
state.

v3: https://lore.kernel.org/1755111349-416632-1-git-send-email-tariqt@nvidia.com
v2: https://lore.kernel.org/1753390134-345154-1-git-send-email-tariqt@nvidia.com
v1: https://lore.kernel.org/1752768442-264413-1-git-send-email-tariqt@nvidia.com
====================

Link: https://patch.msgid.link/20250824084354.533182-1-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Set default burst period for TX and RX reporters

System errors can sometimes cause multiple errors to be reported
to the TX reporter at the same time. For instance, lost interrupts
may cause several SQs to time out simultaneously. When dev_watchdog
notifies the driver for that, it iterates over all SQs to trigger
recovery for the timed-out ones, via TX health reporter.
However, grace period allows only one recovery at a time, so only
the first SQ recovers while others remain blocked. Since no further
recoveries are allowed during the grace period, subsequent errors
cause the reporter to enter an ERROR state, requiring manual
intervention.

To address this, set the TX reporter's default burst period
to 0.5 second. This allows the reporter to detect and handle all
timed-out SQs within this window before initiating the grace period.

To account for the possibility of a similar issue in the RX reporter,
its default burst period is also configured.

Additionally, while here, align the TX definition prefix with the RX,
as these are used only in EN driver.

Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250824084354.533182-6-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: Make health reporter burst period configurable

Enable configuration of the burst period — a time window starting
from the first error recovery, during which the reporter allows
recovery attempts for each reported error.

This feature is helpful when a single underlying issue causes multiple
errors, as it delays the start of the grace period to allow sufficient
time for recovering all related errors. For example, if multiple TX
queues time out simultaneously, a sufficient burst period could allow
all affected TX queues to be recovered within that window. Without this
period, only the first TX queue that reports a timeout will undergo
recovery, while the remaining TX queues will be blocked once the grace
period begins.

Configuration example:
$ devlink health set pci/0000:00:09.0 reporter tx burst_period 500

Configuration example with ynl:
./tools/net/ynl/pyynl/cli.py \
--spec Documentation/netlink/specs/devlink.yaml \
--do health-reporter-set --json '{
  "bus-name": "auxiliary",
  "dev-name": "mlx5_core.eth.0",
  "port-index": 65535,
  "health-reporter-name": "tx",
  "health-reporter-burst-period": 500
}'

Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250824084354.533182-5-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: Introduce burst period for health reporter

Currently, the devlink health reporter starts the grace period
immediately after handling an error, blocking any further recoveries
until it finished.

However, when a single root cause triggers multiple errors in a short
time frame, it is desirable to treat them as a bulk of errors and to
allow their recoveries, avoiding premature blocking of subsequent
related errors, and reducing the risk of inconsistent or incomplete
error handling.

To address this, introduce a configurable burst period for devlink
health reporter. Start this period when the first error is handled,
and allow recovery attempts for reported errors during this window.
Once burst period expires, begin the grace period to block further
recoveries until it concludes.

Timeline summary:

----|--------|------------------------------/----------------------/--
error is error is burst period grace period
reported recovered (recoveries allowed) (recoveries blocked)

For calculating the burst period duration, use the same
last_recovery_ts as the grace period. Update it on recovery only
when the burst period is inactive (either disabled or at the
first error).

This patch implements the framework for the burst period and
effectively sets its value to 0 at reporter creation, so the current
behavior remains unchanged, which ensures backward compatibility.

A downstream patch will make the burst period configurable.

Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250824084354.533182-4-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: Move health reporter recovery abort logic to a separate function

Extract the health reporter recovery abort logic into a separate
function devlink_health_recover_abort().
The function encapsulates the conditions for aborting recovery:
- When auto-recovery is disabled
- When previous error wasn't recovered
- When within the grace period after last recovery

Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250824084354.533182-3-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: Move graceful period parameter to reporter ops

Move the default graceful period from a parameter to
devlink_health_reporter_create() to a field in the
devlink_health_reporter_ops structure.

This change improves consistency, as the graceful period is inherently
tied to the reporter's behavior and recovery policy. It simplifies the
signature of devlink_health_reporter_create() and its internal helper
functions. It also centralizes the reporter configuration at the ops
structure, preparing the groundwork for a downstream patch that will
introduce a devlink health reporter burst period attribute whose
default value will similarly be provided by the driver via the ops
structure.

Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250824084354.533182-2-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: fixed_phy: let fixed_phy_unregister free the phy_device

fixed_phy_register() creates and registers the phy_device. To be
symmetric, we should not only unregister, but also free the phy_device
in fixed_phy_unregister(). This allows to simplify code in users.

Note wrt of_phy_deregister_fixed_link():
put_device(&phydev->mdio.dev) and phy_device_free(phydev) are identical.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/ad8dda9a-10ed-4060-916b-3f13bdbb899d@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: fixed: let fixed_phy_add always use addr 0 and remove return value

We have only two users of fixed_phy_add(), both use address 0 and
ignore the return value. So simplify fixed_phy_add() accordingly.

Whilst at it, constify the fixed_phy_status configs.

Note:
fixed_phy_add() is a legacy function which shouldn't be used in new
code, as it's use may be problematic:
- No check whether a fixed phy exists already at the given address
- If fixed_phy_register() is called afterwards by any other driver,
then it will also use phy_addr 0, because fixed_phy_add() ignores
the ida which manages address assignment
Drivers using a fixed phy created by fixed_phy_add() in platform code,
should dynamically create a fixed phy with fixed_phy_register()
instead.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/762700e5-a0b1-41af-aa03-929822a39475@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: hns3: use kcalloc() instead of kzalloc()

As noted in the kernel documentation, open-coded multiplication in
allocator arguments is discouraged because it can lead to integer overflow.

Use devm_kcalloc() to gain built-in overflow protection, making memory
allocation safer when calculating allocation size compared to explicit
multiplication.

Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com>
Reviewed-by: Jijie Shao <shaojijie@huawei.com>
Link: https://patch.msgid.link/20250825142753.534509-1-rongqianfeng@vivo.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: as21xxx: better handle PHY HW reset on soft-reboot

On soft-reboot, with a reset GPIO defined for an Aeonsemi PHY, the
special match_phy_device fails to correctly identify that the PHY
needs to load the firmware again.

This is caused by the fact that PHY ID is read BEFORE the PHY reset
GPIO (if present) is asserted, so we can be in the scenario where the
phydev have the previous PHY ID (with the PHY firmware loaded) but
after reset the generic AS21xxx PHY is present in the PHY ID registers.

To better handle this, skip reading the PHY ID register only for the PHY
that are not AS21xxx (by matching for the Aeonsemi Vendor) and always
read the PHY ID for the other case to handle both firmware already
loaded or an HW reset.

Fixes: 830877d89edc ("net: phy: Add support for Aeonsemi AS21xxx PHYs")
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Link: https://patch.msgid.link/20250823134431.4854-2-ansuelsmth@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: introduce phy_id_compare_vendor() PHY ID helper

Introduce phy_id_compare_vendor() PHY ID helper to compare a PHY ID with
the PHY ID Vendor using the generic PHY ID Vendor mask.

While at it also rework the PHY_ID_MATCH macro and move the mask to
dedicated define so that PHY driver can make use of the mask if needed.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250823134431.4854-1-ansuelsmth@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ibmvnic: Increase max subcrq indirect entries with fallback

POWER8 support a maximum of 16 subcrq indirect descriptor entries per
H_SEND_SUB_CRQ_INDIRECT call, while POWER9 and newer hypervisors
support up to 128 entries. Increasing the max number of indirect
descriptor entries improves batching efficiency and reduces
hcall overhead, which enhances throughput under large workload on POWER9+.

Currently, ibmvnic driver always uses a fixed number of max indirect
descriptor entries (16). send_subcrq_indirect() treats all hypervisor
errors the same:
- Cleanup and Drop the entire batch of descriptors.
- Return an error to the caller.
- Rely on TCP/IP retransmissions to recover.
- If the hypervisor returns H_PARAMETER (e.g., because 128
   entries are not supported on POWER8), the driver will continue
   to drop batches, resulting in unnecessary packet loss.

In this patch:
Raise the default maximum indirect entries to 128 to improve ibmvnic
batching on morden platform. But also gracefully fall back to
16 entries for Power 8 systems.

Since there is no VIO interface to query the hypervisor’s supported
limit, vnic handles send_subcrq_indirect() H_PARAMETER errors:
- On first H_PARAMETER failure, log the failure context
- Reduce max_indirect_entries to 16 and allow the single batch to drop.
- Subsequent calls automatically use the correct lower limit,
    avoiding repeated drops.

The goal is to  optimizes performance on modern systems while handles
falling back for older POWER8 hypervisors.

Performance shows 40% improvements with MTU (1500) on largework load.

Signed-off-by: Mingming Cao <mmc@linux.ibm.com>
Reviewed-by: Brian King <bjking1@linux.ibm.com>
Reviewed-by: Haren Myneni <haren@linux.ibm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250821130215.97960-1-mmc@linux.ibm.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

octeontx2-af: Remove unused declarations

Commit 1845ada47f6d ("octeontx2-af: cn10k: Add RPM LMAC pause frame
support") remove cgx_lmac_[s|g]et_pause_frm() and leave these unused.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Link: https://patch.msgid.link/20250820123007.1705047-1-yuehaibing@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: phylink: remove stale an_enabled from doc

state->an_enabled was removed by
commit 4ee9b0dcf09f ("net: phylink: remove an_enabled")
but is left in mac_config() doc, so clean it.

Signed-off-by: David Yang <mmyangfl@gmail.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20250824013009.2443580-1-mmyangfl@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'tcp-follow-up-for-dccp-removal'

Kuniyuki Iwashima says:

====================
tcp: Follow up for DCCP removal.

As I mentioned in [0], TCP still has code for DCCP.

This series cleans up such leftovers.

[0]: https://patch.msgid.link/20250410023921.11307-3-kuniyu@amazon.com

v1: https://lore.kernel.org/20250821061540.2876953-1-kuniyu@google.com
====================

Link: https://patch.msgid.link/20250822190803.540788-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: Move TCP-specific diag functions to tcp_diag.c.

tcp_diag_dump() / tcp_diag_dump_one() is just a wrapper of
inet_diag_dump_icsk() / inet_diag_dump_one_icsk(), respectively.

Let's inline them in tcp_diag.c and move static callees as well.

Note that inet_sk_attr_size() is merged into tcp_diag_get_aux_size(),
and we remove inet_diag_handler.idiag_get_aux_size() accordingly.

While at it, BUG_ON() is replaced with DEBUG_NET_WARN_ON_ONCE().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250822190803.540788-7-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: Don't pass hashinfo to inet_diag helpers.

These inet_diag functions required struct inet_hashinfo because
they are shared by TCP and DCCP:

  * inet_diag_dump_icsk()
  * inet_diag_dump_one_icsk()
  * inet_diag_find_one_icsk()

DCCP has gone, and we don't need to pass hashinfo down to them.

Let's fetch net->ipv4.tcp_death_row.hashinfo directly in the first
2 functions.

Note that inet_diag_find_one_icsk() don't need hashinfo since the
previous patch.

We will move TCP-specific functions to tcp_diag.c in the next patch.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250822190803.540788-6-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: Don't pass hashinfo to socket lookup helpers.

These socket lookup functions required struct inet_hashinfo because
they are shared by TCP and DCCP.

  * __inet_lookup_established()
  * __inet_lookup_listener()
  * __inet6_lookup_established()
  * inet6_lookup_listener()

DCCP has gone, and we don't need to pass hashinfo down to them.

Let's fetch net->ipv4.tcp_death_row.hashinfo directly in the above
4 functions.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250822190803.540788-5-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: Remove hashinfo test for inet6?_lookup_run_sk_lookup().

Commit 6c886db2e78c ("net: remove duplicate sk_lookup helpers")
started to check if hashinfo == net->ipv4.tcp_death_row.hashinfo
in __inet_lookup_listener() and inet6_lookup_listener() and
stopped invoking BPF sk_lookup prog for DCCP.

DCCP has gone and the condition is always true.

Let's remove the hashinfo test.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250822190803.540788-4-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: Remove timewait_sock_ops.twsk_destructor().

Since DCCP has been removed, sk->sk_prot->twsk_prot->twsk_destructor
is always tcp_twsk_destructor().

Let's call tcp_twsk_destructor() directly in inet_twsk_free() and
remove ->twsk_destructor().

While at it, tcp_twsk_destructor() is un-exported.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250822190803.540788-3-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: Remove sk_protocol test for tcp_twsk_unique().

Commit 383eed2de529 ("tcp: get rid of twsk_unique()") added
sk->sk_protocol test in __inet_check_established() and
__inet6_check_established() to remove twsk_unique() and call
tcp_twsk_unique() directly.

DCCP has gone, and the condition is always true.

Let's remove the sk_protocol test.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250822190803.540788-2-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-airoha-add-ppe-support-for-rx-wlan-offload'

Lorenzo Bianconi says:

====================
net: airoha: Add PPE support for RX wlan offload

Introduce the missing bits to airoha ppe driver to offload traffic received
by the MT76 driver (wireless NIC) and forwarded by the Packet Processor
Engine (PPE) to the ethernet interface.

v2: https://lore.kernel.org/20250822-airoha-en7581-wlan-rx-offload-v2-0-8a76e1d3fec2@kernel.org
v1: https://lore.kernel.org/20250819-airoha-en7581-wlan-rx-offload-v1-0-71a097e0e2a1@kernel.org
====================

Link: https://patch.msgid.link/20250823-airoha-en7581-wlan-rx-offload-v3-0-f78600ec3ed8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: airoha: Introduce check_skb callback in ppe_dev ops

Export airoha_ppe_check_skb routine in ppe_dev ops. check_skb callback
will be used by the MT76 driver in order to offload the traffic received
by the wlan NIC and forwarded to the ethernet one.
Add rx_wlan parameter to airoha_ppe_check_skb routine signature.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250823-airoha-en7581-wlan-rx-offload-v3-3-f78600ec3ed8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: airoha: Add airoha_ppe_dev struct definition

Introduce airoha_ppe_dev struct as container for PPE offload callbacks
consumed by the MT76 driver during flowtable offload for traffic
received by the wlan NIC and forwarded to the wired one.
Add airoha_ppe_setup_tc_block_cb routine to PPE offload ops for MT76
driver.
Rely on airoha_ppe_dev pointer in airoha_ppe_setup_tc_block_cb
signature.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250823-airoha-en7581-wlan-rx-offload-v3-2-f78600ec3ed8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: airoha: Rely on airoha_eth struct in airoha_ppe_flow_offload_cmd signature

Rely on airoha_eth struct in airoha_ppe_flow_offload_cmd routine
signature and in all the called subroutines.
This is a preliminary patch to introduce flowtable offload for traffic
received by the wlan NIC and forwarded to the ethernet one.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250823-airoha-en7581-wlan-rx-offload-v3-1-f78600ec3ed8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: mxl-86110: add basic support for MxL86111 PHY

Add basic support for the MxL86111 PHY which in addition to the features
of the MxL86110 also comes with an SGMII interface.
Setup the interface mode and take care of in-band-an.

Currently only RGMII-to-UTP and SGMII-to-UTP modes are supported while the
PHY would also support RGMII-to-1000Base-X, including automatic selection
of the Fiber or UTP link depending on the presence of a link partner.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/707fd83ec0e11ea620d37f2125a394e9dd1b27fa.1755884175.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: mxl-86110: fix indentation in struct phy_driver

The .led_hw_control_get and .led_hw_control_set ops are indented with
spaces instead of tabs, unlike the rest of the values of the PHY's
struct phy_driver instance.
Use tabs instead of spaces resulting in a uniform indentation style.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/b9b7336ae309facc5e73874c62e64492fd749cc6.1755884175.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: mxl-86110: add basic support for led_brightness_set op

Add support for forcing each connected LED to be always on or always off
by implementing the led_brightness_set() op.
This is done by modifying the COM_EXT_LED_GEN_CFG register to enable
force-mode and forcing the LED either on or off.
When calling the led_hw_control_set() force-mode is again disabled for
that LED.
Implement mxl86110_modify_extended_reg() locked helper instead of
manually acquiring and releasing the MDIO bus lock for single
__mxl86110_modify_extended_reg() calls.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/58eeefc8c24e06cd2110d3cefbd4236b1a4f44a2.1755884175.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ppp: remove rwlock usage

In struct channel, the upl lock is implemented using rwlock_t,
protecting access to pch->ppp and pch->bridge.

As previously discussed on the list, using rwlock in the network fast
path is not recommended.
This patch replaces the rwlock with a spinlock for writers, and uses RCU
for readers.

- pch->ppp and pch->bridge are now declared as __rcu pointers.
- Readers use rcu_dereference_bh() under rcu_read_lock_bh().
- Writers use spin_lock() to update.

Signed-off-by: Qingfang Deng <dqfext@gmail.com>
Link: https://patch.msgid.link/20250822012548.6232-1-dqfext@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: mcast: Add ip6_mc_find_idev() helper

Extract the same code logic from __ipv6_sock_mc_join() and
ip6_mc_find_dev(), also add new helper ip6_mc_find_idev() to
reduce redundancy and enhance readability.

No functional changes intended.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com>
Link: https://patch.msgid.link/20250822064051.2991480-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: usb: lan78xx: add support for generic net selftests via ethtool

Integrate generic net_selftest framework by wiring up
.get_strings, .get_sset_count, and .self_test ethtool ops.

This enables execution of standard self-tests using
`ethtool -t <dev>` on LAN78xx devices.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20250822092555.2888870-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'tcp-annotate-data-races-around-icsk_retransmits-and-icsk_probes_out'

Eric Dumazet says:

====================
tcp: annotate data-races around icsk_retransmits and icsk_probes_out

icsk->icsk_retransmits is read locklessly from inet_sk_diag_fill(),
tcp_get_timestamping_opt_stats, get_tcp4_sock() and get_tcp6_sock().

icsk->icsk_probes_out is read locklessly from inet_sk_diag_fill(),
get_tcp4_sock() and get_tcp6_sock().

Add corresponding READ_ONCE()/WRITE_ONCE() annotations.
====================

Link: https://patch.msgid.link/20250822091727.835869-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: annotate data-races around icsk->icsk_probes_out

icsk->icsk_probes_out is read locklessly from inet_sk_diag_fill(),
get_tcp4_sock() and get_tcp6_sock().

Add corresponding READ_ONCE()/WRITE_ONCE() annotations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Link: https://patch.msgid.link/20250822091727.835869-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: annotate data-races around icsk->icsk_retransmits

icsk->icsk_retransmits is read locklessly from inet_sk_diag_fill(),
tcp_get_timestamping_opt_stats, get_tcp4_sock() and get_tcp6_sock().

Add corresponding READ_ONCE()/WRITE_ONCE() annotations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Link: https://patch.msgid.link/20250822091727.835869-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-ipv4-allow-directed-broadcast-routes-to-use-dst-hint'

Oscar Maes says:

====================
net: ipv4: allow directed broadcast routes to use dst hint

Currently, ip_extract_route_hint uses RTN_BROADCAST to decide
whether to use the route dst hint mechanism.

This check is too strict, as it prevents directed broadcast
routes from using the hint, resulting in poor performance
during bursts of directed broadcast traffic.

This series fixes this, and adds a new selftest to ensure
this does not regress.

Link to v2: https://lore.kernel.org/20250814140309.3742-1-oscmaes92@gmail.com
====================

Link: https://patch.msgid.link/20250819174642.5148-1-oscmaes92@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: add test for dst hint mechanism with directed broadcast addresses

Add a test for ensuring that the dst hint mechanism is used for
directed broadcast addresses.

This test relies on mausezahn for sending directed broadcast packets.
Additionally, a high GRO flush timeout is set to ensure that packets
will be received as lists.

The test determines if the hint mechanism was used by checking
the in_brd statistic using lnstat.

Signed-off-by: Oscar Maes <oscmaes92@gmail.com>
Link: https://patch.msgid.link/20250819174642.5148-3-oscmaes92@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ipv4: allow directed broadcast routes to use dst hint

Currently, ip_extract_route_hint uses RTN_BROADCAST to decide
whether to use the route dst hint mechanism.

This check is too strict, as it prevents directed broadcast
routes from using the hint, resulting in poor performance
during bursts of directed broadcast traffic.

Fix this in ip_extract_route_hint and modify ip_route_use_hint
to preserve the intended behaviour.

Signed-off-by: Oscar Maes <oscmaes92@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250819174642.5148-2-oscmaes92@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: Drop vim style annotation

Bindings files should not carry markings of editor setup, so drop vim
style annotation. No functional impact.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Acked-by: Gabriel Somlo <gsomlo@gmail.com>
Link: https://patch.msgid.link/20250821083038.46274-4-krzysztof.kozlowski@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: litex,liteeth: Correct example indentation

DTS example in the bindings should be indented with 2- or 4-spaces, so
correct a mixture of different styles to keep consistent 4-spaces.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Acked-by: Gabriel Somlo <gsomlo@gmail.com>
Link: https://patch.msgid.link/20250821083038.46274-3-krzysztof.kozlowski@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: rtnetlink: skip tests if tools or feats are missing

Some rtnetlink selftests assume the presence of ifconfig and iproute2
support for the `proto` keyword in `ip address` commands. These
assumptions can cause test failures on modern systems (e.g. Debian
Bookworm) where:

- ifconfig is not installed by default
- The iproute2 version lacks support for address protocol

This patch improves test robustness by:

- Skipping kci_test_promote_secondaries if ifconfig is missing
- Skipping do_test_address_proto if ip address help does not mention
proto

These changes ensure the tests degrade gracefully by reporting SKIP
instead of FAIL when prerequisites are not met, improving portability
across systems.

Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Alessandro Ratti <alessandro@0x65c.net>
Link: https://patch.msgid.link/20250822140633.891360-2-alessandro@0x65c.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-dsa-lantiq_gswip-prepare-for-supporting-new-features'

Daniel Golle says:

====================
net: dsa: lantiq_gswip: prepare for supporting new features

Prepare for supporting the newer standalone MaxLinear GSW1xx switch
family by refactoring the existing lantiq_gswip driver.
This is the first of a total of 3 series and doesn't yet introduce
any functional changes, but rather just makes the driver more
flexible, so new hardware and features can be supported in future.

This series has been preceded by an RFC series which covers everything
needed to support the MaxLinear GSW1xx family of switches. Andrew Lunn
had suggested to start with the 8 patches now submitted as they prepare
but don't yet introduce any functional changes.

Everything has been compile and runtime tested on AVM Fritz!Box 7490
(GSWIP version 2.1, VR9 v1.2)

Link: https://lore.kernel.org/aKDhFCNwjDDwRKsI@pidgin.makrotopia.org
====================

Link: https://patch.msgid.link/cover.1755878232.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: lantiq_gswip: store switch API version in priv

Store the switch API version in struct gswip_priv. As the hardware has
the 'major/minor' version bytes in the wrong order preventing numerical
comparisons the version to be stored in gswip_priv is constructed in
such a way that the REV field is the most significant byte and the MOD
field the least significant byte. Also provide a conveniance macro to
allow comparing the stored version of the hardware against the already
defined GSWIP_VERSION_* macros.

This is done in order to prepare supporting newer features such as 4096
VLANs and per-port configurable learning which are only available
starting from specific hardware versions.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/eddb51ae8d0b2046ca91906e93daad7be5af56d7.1755878232.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: lantiq_gswip: make DSA tag protocol model-specific

While the older Lantiq / Intel which are currently supported all use
the DSA_TAG_GSWIP tagging protocol, newer MaxLinear GSW1xx modules use
another 8-byte tagging protocol. Move the tag protocol information to
struct gswip_hw_info to make it possible for new models to specify
a different tagging protocol.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://patch.msgid.link/841281b62fdb472048fa98fbad6c88dfbf512825.1755878232.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: lantiq_gswip: load model-specific microcode

Load microcode as specified in struct hw_info instead of relying on
a single array of instructions. This is done in preparation to allow
loading different microcode for the MaxLinear GSW1xx family.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://patch.msgid.link/486d95c085913d506745fbe4a0ab5d1ebdc3ed63.1755878232.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: lantiq_gswip: introduce bitmap for MII ports

Instead of relying on hard-coded numbers for MII ports, introduce
a bitmap for MII ports.
This is done in order to prepare for supporting MaxLinear GSW1xx ICs
which got a different layout of ports.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://patch.msgid.link/019fc8ed06f2317976eac143320d1dc046e8f392.1755878232.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: lantiq_gswip: move definitions to header

Introduce header file and move register definitions as well as the
definitions struct gswip_hw_info and struct gswip_priv there.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://patch.msgid.link/551c72c93131cd200b38340741e68ff21793ba0b.1755878232.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: lantiq_gswip: prepare for more CPU port options

The MaxLinear GSW1xx series of switches support using either the
(R)(G)MII interface on port 5 or the SGMII interface on port 4 to be
used as CPU port. Prepare for supporting them by defining a mask of
allowed CPU ports instead of a single port.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://patch.msgid.link/879c66672d26fe49c1f5d9aa40d8ebc0f31885ab.1755878232.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: lantiq_gswip: deduplicate dsa_switch_ops

The two instances of struct dsa_switch_ops differ only by their
.phylink_get_caps op. Instead of having two instances of dsa_switch_ops,
rather just have a pointer to the phylink_get_caps function in
struct gswip_hw_info.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://patch.msgid.link/03d72eceeb3faecdbe03ce58aab40861cf6b77c1.1755878232.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: drv-net: xdp: make sure we're actually testing native XDP

Kernel tries to be helpful and attach the XDP program in generic
mode if the driver has no BPF ndo at all. Since the xdp.py tests
all have "native" in their names this can be quite confusing.
Force native / "drv" attachment. Note that netdevsim re-uses
the generic handler as its "native" handler, so we'll maintain
the test coverage of the generic mode that way. No need to test
both explicitly, I reckon.

Link: https://patch.msgid.link/20250822195645.1673390-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'aquantia-phy-driver-consolidation-part-1'

Vladimir Oltean says:

====================
Aquantia PHY driver consolidation - part 1

This started out as an effort to add some new features hinging on the
VEND1_GLOBAL_CFG_* registers, but I quickly started to notice that the
Aquantia PHY driver has a large code base, but individual PHYs only
implement arbitrary subsets of it.

The table below lists the PHYs known to me to have the
VEND1_GLOBAL_CFG_* registers.

PHY       Access from            Access from
           aqr107_read_rate()     aqr113c_fill_interface_modes()
------------------------------------------------------------------
AQR107    y                      n
AQCS109   y                      n
AQR111    y                      n
AQR111B0  y                      n
AQR112    y                      n
AQR412    y                      n
AQR113    y                      y
AQR113C   y                      y
AQR813    y                      n
AQR114C   y                      n
AQR115C   y                      y

Maybe you're wondering, after reading this, why don't more Aquantia PHYs
populate phydev->possible_interfaces based on the registers that they
are known to have? And why do AQR114C and AQR115C, PHYs from the same
generation, just having different max speeds, differ in this behaviour?
And why does AQR813, the 8-port variant of AQR113, not call
aqr113c_config_init(), but aqr107_config_init()?

I did wonder, and I don't know either, but I suspect it has to do with
developers not wanting to break what they can't test, and only touching
what they are interested in. Multiplied at a large enough scale, this
tends to result in unmaintainable code.

The tendency might also be encouraged by the slightly strange and
inconsistent naming scheme in this driver.

The set proposes a naming scheme based on generations, and feature
inheritance from Gen X to Gen X+1. This helps fill in missing
software functionalities where the hardware feature should be present.
I had to put a hard stop at 15 patches, so I've picked the more
meaningful functions to consolidate, rather than going through the
entire driver. Depending on review feedback, I can do more or I can
stop.

Furthermore, the set adds generation-appropriate support for two more
PHY IDs: AQR412 and AQR115, and fixes the improper reporting of AQR412C
as AQR412.

The changes were tested on AQR107, AQR112, AQR412C and AQR115.
====================

Link: https://patch.msgid.link/20250821152022.1065237-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: aquantia: add support for AQR115

AQR115 is similar to the already supported AQR115C, having speeds up to
2.5Gbps. In fact, the two differ only in the FCBGA package size (7x11mm
vs 7x7mm for the Compact variant). So it makes sense that the feature
set is identical for the 2 drivers.

This PHY is present on the newest PCB revision E (v4.0) of the NXP
LS1046A-RDB, having replaced the RTL8211FS SGMII PHY going to fm1-mac5.

Signed-off-by: Camelia Groza <camelia.groza@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250821152022.1065237-16-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: aquantia: promote AQR813 and AQR114C to aqr_gen4_config_init()

I'm not sure whether there is any similar real-life problem on AQR813
and AQR114C as were seen on the PHYs that these commit were written for:
- a7f3abcf6357 ("net: phy: aquantia: only poll GLOBAL_CFG regs on
aqr113, aqr113c and aqr115c")
- bed90b06b681 ("net: phy: aquantia: clear PMD Global Transmit Disable
bit during init")

but the inconsistency in handling between PHYs of the same generation is
striking. Apart from different firmware builds with different
provisioning, the only difference between these PHYs should be the max
link speed and/or the number of ports.

Let's try and see if there's any problem if all PHYs from the same
generation use the same config_init() method.

Cc: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Cc: Robert Marko <robimarko@gmail.com>
Cc: Paweł Owoc <frut3k7@gmail.com>
Cc: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250821152022.1065237-15-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: aquantia: rename aqr113c_config_init() to aqr_gen4_config_init()

aqr113c_config_init() is called by AQR113, AQR113C, AQR115C, all Gen4
PHYs. Thus, rename this to aqr_gen4_config_init().

Currently, aqr113c_config_init() calls aqr_gen2_config_init(). Since
we've established that these are Gen4 PHYs, it makes sense to inherit
the Gen3 feature set as well. Currently, aqr_gen3_config_init() just
calls aqr_gen2_config_init(), so we can safely make this extra
modification and expect no functional change.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250821152022.1065237-14-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: aquantia: reimplement aqcs109_config_init() as aqr_gen2_config_init()

I lack documentation for AQCS109, but from commit 99c864667c9f ("net:
phy: aquantia: add support for AQCS109"), it is known that "From
software point of view, it should be almost equivalent to AQR107."

Based on further conjecture of the device numbering scheme, I am
treating it as similar to AQR109 (a Gen2 PHY capable of to 2.5G).

Its current instructions are also present in other init sequences as
below:
- aqr_wait_reset_complete() ... aqr107_chip_info() as well as
aqr107_set_downshift() are in aqr_gen1_config_init()
- aqr_gen2_fill_interface_modes() is in aqr_gen2_config_init()

So it would be good to centralize this implementation by just calling
aqr_gen2_config_init().

In practice this completes support for the following features, which are
present on AQR109 already:
- Potentially reverse MDI lane order via "marvell,mdi-cfg-order"
- Restore polarity of active-high and active-low LEDs after reset.

Cc: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250821152022.1065237-13-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>