git.ipfire.org Git - thirdparty/kernel/linux.git/log

net: Add lease info to queue-get response

Populate nested lease info to the queue-get response that returns the
ifindex, queue id with type and optionally netns id if the device
resides in a different netns.

Example with ynl client:

  # ip a
  [...]
  4: enp10s0f0np0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp/id:24 qdisc mq state UP group default qlen 1000
    link/ether e8:eb:d3:a3:43:f6 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.2/24 scope global enp10s0f0np0
       valid_lft forever preferred_lft forever
    inet6 fe80::eaeb:d3ff:fea3:43f6/64 scope link proto kernel_ll
       valid_lft forever preferred_lft forever
  [...]

  # ethtool -i enp10s0f0np0
  driver: mlx5_core
  [...]

  # ./pyynl/cli.py \
      --spec ~/netlink/specs/netdev.yaml \
      --do queue-get \
      --json '{"ifindex": 4, "id": 15, "type": "rx"}'
  {'id': 15,
   'ifindex': 4,
   'lease': {'ifindex': 8, 'netns-id': 0, 'queue': {'id': 1, 'type': 'rx'}},
   'napi-id': 8227,
   'type': 'rx',
   'xsk': {}}

  # ip netns list
  foo (id: 0)

  # ip netns exec foo ip a
  [...]
  8: nk@NONE: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
      link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
      inet6 fe80::200:ff:fe00:0/64 scope link proto kernel_ll
         valid_lft forever preferred_lft forever
  [...]

  # ip netns exec foo ethtool -i nk
  driver: netkit
  [...]

  # ip netns exec foo ls /sys/class/net/nk/queues/
  rx-0  rx-1  tx-0

  # ip netns exec foo ./pyynl/cli.py \
      --spec ~/netlink/specs/netdev.yaml \
      --do queue-get \
      --json '{"ifindex": 8, "id": 1, "type": "rx"}'
  {'id': 1, 'ifindex': 8, 'type': 'rx'}

Note that the caller of netdev_nl_queue_fill_one() holds the netdevice
lock. For the queue-get we do not lock both devices. When queues get
{un,}leased, both devices are locked, thus if __netif_get_rx_queue_peer()
returns true, the peer pointer points to a valid device. The netns-id
is fetched via peernet2id_alloc() similarly as done in OVS.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Co-developed-by: David Wei <dw@davidwei.uk>
Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260115082603.219152-4-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: Implement netdev_nl_queue_create_doit

Implement netdev_nl_queue_create_doit which creates a new rx queue in a
virtual netdev and then leases it to a rx queue in a physical netdev.

Example with ynl client:

  # ./pyynl/cli.py \
      --spec ~/netlink/specs/netdev.yaml \
      --do queue-create \
      --json '{"ifindex": 8, "type": "rx", "lease": {"ifindex": 4, "queue": {"type": "rx", "id": 15}}}'
  {'id': 1}

Note that the netdevice locking order is always from the virtual to
the physical device.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Co-developed-by: David Wei <dw@davidwei.uk>
Signed-off-by: David Wei <dw@davidwei.uk>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20260115082603.219152-3-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: Add queue-create operation

Add a ynl netdev family operation called queue-create that creates a
new queue on a netdevice:

      name: queue-create
      attribute-set: queue
      flags: [admin-perm]
      do:
        request:
          attributes:
            - ifindex
            - type
            - lease
        reply: &queue-create-op
          attributes:
            - id

This is a generic operation such that it can be extended for various
use cases in future. Right now it is mandatory to specify ifindex,
the queue type which is enforced to rx and a lease. The newly created
queue id is returned to the caller.

A queue from a virtual device can have a lease which refers to another
queue from a physical device. This is useful for memory providers
and AF_XDP operations which take an ifindex and queue id to allow
applications to bind against virtual devices in containers. The lease
couples both queues together and allows to proxy the operations from
a virtual device in a container to the physical device.

In future, the nested lease attribute can be lifted and made optional
for other use-cases such as dynamic queue creation for physical
netdevs. The lack of lease and the specification of the physical
device as an ifindex will imply that we need a real queue to be
allocated. Similarly, the queue type enforcement to rx can then be
lifted as well to support tx.

An early implementation had only driver-specific integration [0], but
in order for other virtual devices to reuse, it makes sense to have
this as a generic API in core net.

For leasing queues, the virtual netdev must have real_num_rx_queue
less than num_rx_queues at the time of calling queue-create. The
queue-type must be rx as only rx queues are supported for leasing
for now. We also enforce that the queue-create ifindex must point
to a virtual device, and that the nested lease attribute's ifindex
must point to a physical device. The nested lease attribute set
contains a netns-id attribute which is currently only intended for
dumping as part of the queue-get operation. Also, it is modeled as
an s32 type similarly as done elsewhere in the stack.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Co-developed-by: David Wei <dw@davidwei.uk>
Signed-off-by: David Wei <dw@davidwei.uk>
Link: https://bpfconf.ebpf.io/bpfconf2025/bpfconf2025_material/lsfmmbpf_2025_netkit_borkmann.pdf
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20260115082603.219152-2-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'net-hinic3-pf-initialization'

Fan Gong says:

====================
net: hinic3: PF initialization

This is [1/3] part of hinic3 Ethernet driver second submission.
With this patch hinic3 becomes a complete Ethernet driver with
pf and vf.

The driver parts contained in this patch:
Add support for PF framework based on the VF code.
Add PF management interfaces to communicate with HW.
Add 8 netdev ops to configure NIC features.
Support mac filter to unicast and multicast.
Add HW event handler to manage port and link status.

V01: https://lore.kernel.org/netdev/cover.1760502478.git.zhuyikai1@h-partners.com/
V02: https://lore.kernel.org/netdev/cover.1760685059.git.zhuyikai1@h-partners.com/
V03: https://lore.kernel.org/netdev/cover.1761362580.git.zhuyikai1@h-partners.com/
V04: https://lore.kernel.org/netdev/cover.1761711549.git.zhuyikai1@h-partners.com/
V05: https://lore.kernel.org/netdev/cover.1762414088.git.zhuyikai1@h-partners.com/
V06: https://lore.kernel.org/netdev/cover.1762581665.git.zhuyikai1@h-partners.com/
V07: https://lore.kernel.org/netdev/cover.1763555878.git.zhuyikai1@h-partners.com/
V08: https://lore.kernel.org/netdev/cover.1767495881.git.zhuyikai1@h-partners.com/
V09: https://lore.kernel.org/netdev/cover.1767707500.git.zhuyikai1@h-partners.com/
V10: https://lore.kernel.org/netdev/cover.1767861236.git.zhuyikai1@h-partners.com/
====================

Link: https://patch.msgid.link/cover.1768375903.git.zhuyikai1@h-partners.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

hinic3: Add HW event handler

Add HINIC3_INIT_UP flags to trace netdev open status.
Add port module event handler.
Add link status event type(FAULT, PCIE link down, heart lost, mgmt
watchdog).

Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Fan Gong <gongfan1@huawei.com>
Link: https://patch.msgid.link/53a2b928136998f740d597bbd45ca1740b95538f.1768375903.git.zhuyikai1@h-partners.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

hinic3: Add mac filter ops

Add ops to support unicast and multicast to filter mac address in
packets sending and receiving.

Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Fan Gong <gongfan1@huawei.com>
Link: https://patch.msgid.link/9ea618ad5d6c404e222e9114e08e913a73177705.1768375903.git.zhuyikai1@h-partners.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

hinic3: Add adaptive IRQ coalescing with DIM

DIM offers a way to adjust the coalescing settings based
on load. As hinic3 rx and tx share interrupts, we only need
to base dim on rx stats.

Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Fan Gong <gongfan1@huawei.com>
Link: https://patch.msgid.link/af96c20a836800a5972a09cdaf520029d976ad48.1768375903.git.zhuyikai1@h-partners.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

hinic3: Add .ndo_vlan_rx_add/kill_vid and .ndo_validate_addr

Implement following callback function:
.ndo_vlan_rx_add_vid
.ndo_vlan_rx_kill_vid
.ndo_validate_addr

Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Fan Gong <gongfan1@huawei.com>
Link: https://patch.msgid.link/70be187afcaa7b38981d114c088ffdc2cba0b4f1.1768375903.git.zhuyikai1@h-partners.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

hinic3: Add .ndo_features_check

As we cannot solve packets with multiple stacked vlan, so we use
.ndo_features_check to check for these packets and return a smaller
feature without offload features.

Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Fan Gong <gongfan1@huawei.com>
Link: https://patch.msgid.link/3879b20b7ffa20106a3f8f56dbf2d5eb389f260a.1768375903.git.zhuyikai1@h-partners.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

hinic3: Add .ndo_set_features and .ndo_fix_features

Implement following callback function:
.ndo_set_features
.ndo_fix_features

The .ndo_set_features function includes five features: rx_csum,
tso, lro, rx_cvlan and vlan_filter.
Add these new features in netdev_feature_init.

Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Fan Gong <gongfan1@huawei.com>
Link: https://patch.msgid.link/682734a08fde421413048bf70057dafe3cbe8497.1768375903.git.zhuyikai1@h-partners.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

hinic3: Add .ndo_tx_timeout and .ndo_get_stats64

Implement following callback function:
.ndo_tx_timeout
.ndo_get_stats64

Use a work queue to trace tx_timeout callback and dump necessary
debug information.

Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Fan Gong <gongfan1@huawei.com>
Link: https://patch.msgid.link/ec34d2ff9b142e1e142e47700714533baf7e659c.1768375903.git.zhuyikai1@h-partners.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

hinic3: Add PF management interfaces

Add management and communication pathways between PF and HW.

Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Fan Gong <gongfan1@huawei.com>
Link: https://patch.msgid.link/447e72b1c5f255ea5d79ecf96c8dac58011e604d.1768375903.git.zhuyikai1@h-partners.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

hinic3: Add PF framework

Add support for PF framework based on the VF code.

Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Fan Gong <gongfan1@huawei.com>
Link: https://patch.msgid.link/4796d6076bbad0f096eb10261ab3c7d989c5f7b7.1768375903.git.zhuyikai1@h-partners.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'net-mlx5e-save-per-channel-async-icosq-in-default'

Tariq Toukan says:

====================
net/mlx5e: Save per-channel async ICOSQ in default

This series by William reduces the default number of SQs in a channel
from 3 down to 2, by not creating the async ICOSQ (asynchronous
internal-communication-operations send-queue).

This significantly improves the latency of channel configuration
operations, like interface up (create channels), interface down (destroy
channels), and channels reconfiguration (create new set, destroy old
one).

This reduces the per-channel memory usage, saves hardware resources, in
addition to the improved latency.

This significantly speeds up the setup/config stage on systems with high
number of channels or many netdevs, in particular systems with hundreds
or K's of SFs.

The two remaining default SQs per channel after this series:
1 TXQ SQ (for traffic), and 1 ICOSQ (for internal communication
operations with the device).

Perf numbers:
NIC: Connect-X7.
Test: Latency of interface up + down operations.

Measured 20% speedup.
Saving ~0.36 sec for 248 channels (~1.45 msec per channel).
====================

Link: https://patch.msgid.link/1768376800-1607672-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Conditionally create async ICOSQ

The async ICOSQ is only required by TLS RX (for re-sync flow) and XSK
TX. Create it only when these features are enabled instead of always
allocating it. This reduces per-channel memory usage, saves hardware
resources, improves latency, and decreases the default number of SQs
(from 3 to 2) and CQs (from 4 to 3). It also speeds up channel
open/close operations for a netdev when async ICOSQ is not needed.

Currently when TLS RX is enabled, there is no channel reset triggered.
As a result, async ICOSQ allocation is not triggered, causing a NULL
pointer crash. One solution is to do channel reset every time when
toggling TLS RX. However, it's not straightforward as the offload
state matters only on connection creation, and can go on beyond the
channels reset.

Instead, introduce a new field 'ktls_rx_was_enabled': if TLS RX is
enabled for the first time: reset channels, create async ICOSQ, set
the field. From that point on, no need to reset channels for any TLS
RX enable/disable. Async ICOSQ will always be needed.

For XSK TX, async ICOSQ is used in wakeup control and is guaranteed
to have async ICOSQ allocated.

This improves the latency of interface up/down operations when it
applies.

Perf numbers:
NIC: Connect-X7.
Test: Latency of interface up + down operations.

Measured 20% speedup.
Saving ~0.36 sec for 248 channels (~1.45 msec per channel).

Signed-off-by: William Tu <witu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1768376800-1607672-5-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Move async ICOSQ to dynamic allocation

Dynamically allocate async ICOSQ. ICO (Internal Communication
Operations) is for driver to communicate with the HW, and it's
not used for traffic. Currently mlx5 driver has sync and async
ICO send queues. The async ICOSQ means that it's not necessarily
under NAPI context protection. The patch is in preparation for
the later patch to detect its usage and enable it when necessary.

Signed-off-by: William Tu <witu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1768376800-1607672-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Use regular ICOSQ for triggering NAPI

Before the cited commit, ICOSQ is used to post NOP WQE to trigger
hardware interrupt and start NAPI, but this mechanism suffers from
a race condition: mlx5e_alloc_rx_mpwqe may post UMR WQEs to ICOSQ
_before_ NOP WQE is posted. The cited commit fixes the issue by
replacing ICOSQ with async ICOSQ, as a new way to post the NOP WQE
to trigger the hardware interrupt and NAPI.

The patch changes it back by replacing async ICOSQ with regular
ICOSQ, for the purpose of saving memory in later patches, and solves
the issue by adding a new SQ state, MLX5E_SQ_STATE_LOCK_NEEDED
for syncing the start of NAPI.

What it does:
- Switch trigger path from async ICOSQ to regular ICOSQ to reduce
  need for async SQ.
- Introduce MLX5E_SQ_STATE_LOCK_NEEDED and mlx5e_icosq_sync_lock(),
  unlock() to prevent the race where UMR WQEs could be posted before
  the NOP WQE used to trigger NAPI.
- Use synchronize_net() once per trigger cycle to quiesce in-flight
  softirqs before serializing the NOP WQE and any UMR postings via
  the ICOSQ lock.
- Wrap ICOSQ UMR posting in en_rx.c and xsk/rx.c with the new
  conditional lock.

The conditional locking approach is critical for performance: always
locking would impose unnecessary overhead. Synchronization is not needed
between regular NAPI cycles once the channel is activated and running.
The lock is only required to protect against the race during channel
activation—specifically, when the very first NOP WQE is posted to trigger
NAPI. After that initial trigger, normal NAPI polling handles subsequent
work without contention. The MLX5E_SQ_STATE_LOCK_NEEDED flag ensures we
pay the synchronization cost only when necessary.

Signed-off-by: William Tu <witu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1768376800-1607672-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Move async ICOSQ lock into ICOSQ struct

Move the async_icosq spinlock from the mlx5e_channel structure into
the mlx5e_icosq structure itself for better encapsulation and for
later patch to also use it for other icosq use cases.

Changes:
- Add spinlock_t lock field to struct mlx5e_icosq
- Remove async_icosq_lock field from struct mlx5e_channel
- Initialize the new lock in mlx5e_open_icosq()
- Update all lock usage in ktls_rx.c and en_main.c to use sq->lock
instead of c->async_icosq_lock

Signed-off-by: William Tu <witu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1768376800-1607672-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

octeon_ep: reset firmware ready status

Add support to reset firmware ready status
when the driver is removed(either in unload
or unbind)

Signed-off-by: Sathesh Edara <sedara@marvell.com>
Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
Signed-off-by: Vimlesh Kumar <vimleshk@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260115092048.870237-1-vimleshk@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-thunderbolt-various-improvements'

Mika Westerberg says:

====================
net: thunderbolt: Various improvements

This series improves the Thunderbolt networking driver so that it should
work with the bonding driver.

The discussion that started this patch series can be read below:

https://lore.kernel.org/netdev/CAFJzfF9N4Hak23sc-zh0jMobbkjK7rg4odhic1DQ1cC+=MoQoA@mail.gmail.com/

v2: https://lore.kernel.org/20260109122606.3586895-1-mika.westerberg@linux.intel.com
v1: https://lore.kernel.org/20251127131521.2580237-1-mika.westerberg@linux.intel.com
====================

Link: https://patch.msgid.link/20260115115646.328898-1-mika.westerberg@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: thunderbolt: Allow reading link settings

In order to use Thunderbolt networking as part of bonding device it
needs to support ->get_link_ksettings() ethtool operation, so that the
bonding driver can read the link speed and the related attributes. Add
support for this to the driver.

Signed-off-by: Ian MacDonald <ian@netstatz.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Link: https://patch.msgid.link/20260115115646.328898-5-mika.westerberg@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bonding: 3ad: Add support for SPEED_80000

Add support for ethtool SPEED_80000. This is needed to allow
Thunderbolt/USB4 networking driver to be used with the bonding driver.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20260115115646.328898-4-mika.westerberg@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethtool: Add support for 80Gbps speed

USB4 v2 link used in peer-to-peer networking is symmetric 80Gbps so in
order to support reading this link speed, add support for it to ethtool.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260115115646.328898-3-mika.westerberg@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: thunderbolt: Allow changing MAC address of the device

The MAC address we use is based on a suggestion in the USB4 Inter-domain
spec but it is not really used in the USB4NET protocol. It is more
targeted for the upper layers of the network stack. There is no reason
why it should not be changed by the userspace for example if needed for
bonding.

Reported-by: Ian MacDonald <ian@netstatz.com>
Closes: https://lore.kernel.org/netdev/CAFJzfF9N4Hak23sc-zh0jMobbkjK7rg4odhic1DQ1cC+=MoQoA@mail.gmail.com/
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Link: https://patch.msgid.link/20260115115646.328898-2-mika.westerberg@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'dpll-support-mode-switching'

Ivan Vecera says:

====================
dpll: support mode switching

This series adds support for switching the working mode (automatic vs
manual) of a DPLL device via netlink.

Currently, the DPLL subsystem allows userspace to retrieve the current
working mode but lacks the mechanism to configure it. Userspace is also
unaware of which modes a specific device actually supports, as it
currently assumes only the active mode is supported.

The series addresses these limitations by:
1. Introducing .supported_modes_get() callback to allow drivers to report
   all modes capable of running on the device.
2. Introducing .mode_set() callback and updating the netlink policy
   to allow userspace to request a mode change.
3. Implementing these callbacks in the zl3073x driver, enabling dynamic
   switching between automatic and manual modes.
====================

Link: https://patch.msgid.link/20260114122726.120303-1-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dpll: zl3073x: Implement device mode setting support

Add support for .supported_modes_get() and .mode_set() callbacks
to enable switching between manual and automatic modes via netlink.

Implement .supported_modes_get() to report available modes based
on the current hardware configuration:

* manual mode is always supported
* automatic mode is supported unless the dpll channel is configured
  in NCO (Numerically Controlled Oscillator) mode

Implement .mode_set() to handle the specific logic required when
transitioning between modes:

1) Transition to manual:
* If a valid reference is currently active, switch the hardware
  to ref-lock mode (force lock to that reference).
* If no reference is valid and the DPLL is unlocked, switch to freerun.
* Otherwise, switch to Holdover.

2) Transition to automatic:
* If the currently selected reference pin was previously marked
  as non-selectable (likely during a previous manual forcing
  operation), restore its priority and selectability in the hardware.
* Switch the hardware to Automatic selection mode.

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Prathosh Satish <Prathosh.Satish@microchip.com>
Link: https://patch.msgid.link/20260114122726.120303-4-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dpll: add dpll_device op to set working mode

Currently, userspace can retrieve the DPLL working mode but cannot
configure it. This prevents changing the device operation, such as
switching from manual to automatic mode and vice versa.

Add a new callback .mode_set() to struct dpll_device_ops. Extend
the netlink policy and device-set command handling to process
the DPLL_A_MODE attribute. Update the netlink YAML specification
to include the mode attribute in the device-set operation.

Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Link: https://patch.msgid.link/20260114122726.120303-3-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dpll: add dpll_device op to get supported modes

Currently, the DPLL subsystem assumes that the only supported mode is
the one currently active on the device. When dpll_msg_add_mode_supported()
is called, it relies on ops->mode_get() and reports that single mode
to userspace. This prevents users from discovering other modes the device
might be capable of.

Add a new callback .supported_modes_get() to struct dpll_device_ops. This
allows drivers to populate a bitmap indicating all modes supported by
the hardware.

Update dpll_msg_add_mode_supported() to utilize this new callback:

* if ops->supported_modes_get is defined, use it to retrieve the full
  bitmap of supported modes.
* if not defined, fall back to the existing behavior: retrieve
  the current mode via ops->mode_get and set the corresponding bit
  in the bitmap.

Finally, iterate over the bitmap and add a DPLL_A_MODE_SUPPORTED netlink
attribute for every set bit, accurately reporting the device's capabilities
to userspace.

Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Link: https://patch.msgid.link/20260114122726.120303-2-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: csum: Fix printk format in recv_get_packet_csum_status()

Following warning is encountered when building selftests on powerpc/32.

  CC       csum
csum.c: In function 'recv_get_packet_csum_status':
csum.c:710:50: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'size_t' {aka 'unsigned int'} [-Wformat=]
  710 |                         error(1, 0, "cmsg: len=%lu expected=%lu",
      |                                                ~~^
      |                                                  |
      |                                                  long unsigned int
      |                                                %u
  711 |                               cm->cmsg_len, CMSG_LEN(sizeof(struct tpacket_auxdata)));
      |                               ~~~~~~~~~~~~
      |                                 |
      |                                 size_t {aka unsigned int}
csum.c:710:63: warning: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'unsigned int' [-Wformat=]
  710 |                         error(1, 0, "cmsg: len=%lu expected=%lu",
      |                                                             ~~^
      |                                                               |
      |                                                               long unsigned int
      |                                                             %u

cm->cmsg_len has type __kernel_size_t and CMSG() macro has the type
returned by sizeof() which is size_t.

size_t is 'unsigned int' on some platforms and 'unsigned long' on
other ones so use %zu instead of %lu.

The code in question was introduced by
commit 91a7de85600d ("selftests/net: add csum offload test").

Signed-off-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/8b69b40826553c1dd500d9d25e45883744f3f348.1768556791.git.chleroy@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'dsa-mxl-gsw1xx-support-r-g-mii-slew-rate-configuration'

Alexander Sverdlin says:

====================
dsa: mxl-gsw1xx: Support R(G)MII slew rate configuration

Maxlinear GSW1xx switches offer slew rate configuration bits for R(G)MII
interface. The default state of the configuration bits is "normal", while
"slow" can be used to reduce the radiated emissions. Add the support for
the latter option into the driver as well as the new DT bindings.
====================

Link: https://patch.msgid.link/20260114104509.618984-1-alexander.sverdlin@siemens.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: mxl-gsw1xx: Support R(G)MII slew rate configuration

Support newly introduced maxlinear,slew-rate-txc and
maxlinear,slew-rate-txd device tree properties to configure R(G)MII
interface pins' slew rate. It might be used to reduce the radiated
emissions.

Reviewed-by: Daniel Golle <daniel@makrotopia.org>
Tested-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://patch.msgid.link/20260114104509.618984-3-alexander.sverdlin@siemens.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: dsa: lantiq,gswip: add MaxLinear R(G)MII slew rate

Add new maxlinear,slew-rate-txc and maxlinear,slew-rate-txd uint32
properties. The properties are only applicable for ports in R(G)MII mode
and allow for slew rate reduction in comparison to "normal" default
configuration with the purpose to reduce radiated emissions.

Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20260114104509.618984-2-alexander.sverdlin@siemens.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'ipv6-more-data-race-annotations'

Eric Dumazet says:

====================
ipv6: more data-race annotations

Inspired by one unrelated syzbot report.

This series adds missing (and boring) data-race annotations in IPv6.

Only the first patch adds sysctl_ipv6_flowlabel group
to speedup ip6_make_flowlabel() a bit.
====================

Link: https://patch.msgid.link/20260115094141.3124990-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: annotate data-races in net/ipv6/route.c

sysctls are read while their values can change,
add READ_ONCE() annotations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260115094141.3124990-9-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: exthdrs: annotate data-race over multiple sysctl

Following four sysctls can change under us, add missing READ_ONCE().

- ipv6.sysctl.max_dst_opts_len
- ipv6.sysctl.max_dst_opts_cnt
- ipv6.sysctl.max_hbh_opts_len
- ipv6.sysctl.max_hbh_opts_cnt

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260115094141.3124990-8-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: annotate data-races around sysctl.ip6_rt_gc_interval

Add READ_ONCE() on lockless reads of net->ipv6.sysctl.ip6_rt_gc_interval

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260115094141.3124990-7-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: annotate data-races over sysctl.flowlabel_reflect

Add missing READ_ONCE() when reading ipv6.sysctl.flowlabel_reflect,
as its value can be changed under us.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260115094141.3124990-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: annotate data-races in ip6_multipath_hash_{policy,fields}()

Add missing READ_ONCE() when reading sysctl values.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260115094141.3124990-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: annotate date-race in ipv6_can_nonlocal_bind()

Add a missing READ_ONCE(), and add const qualifiers to the two parameters.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260115094141.3124990-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: annotate data-races from ip6_make_flowlabel()

Use READ_ONCE() to read sysctl values in ip6_make_flowlabel()
and ip6_make_flowlabel()

Add a const qualifier to 'struct net' parameters.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260115094141.3124990-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: add sysctl_ipv6_flowlabel group

Group together following struct netns_sysctl_ipv6 fields:

- flowlabel_consistency
- auto_flowlabels
- flowlabel_state_ranges

After this patch, ip6_make_flowlabel() uses a single cache line to fetch
auto_flowlabels and flowlabel_state_ranges (instead of two before the patch).

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260115094141.3124990-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-convert-drivers-to-get_rx_ring_count-part-2'

Breno Leitao says:

====================
net: convert drivers to .get_rx_ring_count (part 2)

Commit 84eaf4359c36 ("net: ethtool: add get_rx_ring_count callback to
optimize RX ring queries") added specific support for GRXRINGS callback,
simplifying .get_rxnfc.

Remove the handling of GRXRINGS in .get_rxnfc() by moving it to the new
.get_rx_ring_count().

This simplifies the RX ring count retrieval and aligns the following
drivers with the new ethtool API for querying RX ring parameters.
  * engleder/tsnep
  * mediatek
  * amazon/ena
  * microchip/lan743x
  * amd/xgbe
  * chelsio/cxgb4
  * wangxun/txgbe
  * cadence/macb

All of these change were compile-tested only.
====================

Link: https://patch.msgid.link/20260115-grxring_big_v2-v1-0-b3e1b58bced5@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: txgbe: convert to use .get_rx_ring_count

Use the newly introduced .get_rx_ring_count ethtool ops callback instead
of handling ETHTOOL_GRXRINGS directly in .get_rxnfc().

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20260115-grxring_big_v2-v1-9-b3e1b58bced5@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: macb: convert to use .get_rx_ring_count

Use the newly introduced .get_rx_ring_count ethtool ops callback instead
of handling ETHTOOL_GRXRINGS directly in .get_rxnfc().

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20260115-grxring_big_v2-v1-8-b3e1b58bced5@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: cxgb4: convert to use .get_rx_ring_count

Use the newly introduced .get_rx_ring_count ethtool ops callback instead
of handling ETHTOOL_GRXRINGS directly in .get_rxnfc().

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20260115-grxring_big_v2-v1-7-b3e1b58bced5@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: xgbe: convert to use .get_rx_ring_count

Use the newly introduced .get_rx_ring_count ethtool ops callback instead
of handling ETHTOOL_GRXRINGS directly in .get_rxnfc().

Since ETHTOOL_GRXRINGS was the only command handled by xgbe_get_rxnfc(),
remove the function entirely.

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20260115-grxring_big_v2-v1-6-b3e1b58bced5@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: lan743x: convert to use .get_rx_ring_count

Use the newly introduced .get_rx_ring_count ethtool ops callback instead
of handling ETHTOOL_GRXRINGS directly in .get_rxnfc().

Since ETHTOOL_GRXRINGS was the only command handled by
lan743x_ethtool_get_rxnfc(), remove the function entirely.

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20260115-grxring_big_v2-v1-5-b3e1b58bced5@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ena: convert to use .get_rx_ring_count

Use the newly introduced .get_rx_ring_count ethtool ops callback instead
of handling ETHTOOL_GRXRINGS directly in .get_rxnfc().

Since ETHTOOL_GRXRINGS was the only useful command handled by
ena_get_rxnfc(), remove the function entirely.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Arthur Kiyanovski <akiyano@amazon.com>
Link: https://patch.msgid.link/20260115-grxring_big_v2-v1-4-b3e1b58bced5@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mediatek: convert to use .get_rx_ring_count

Use the newly introduced .get_rx_ring_count ethtool ops callback instead
of handling ETHTOOL_GRXRINGS directly in .get_rxnfc().

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20260115-grxring_big_v2-v1-3-b3e1b58bced5@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: tsnep: convert to use .get_rx_ring_count

Use the newly introduced .get_rx_ring_count ethtool ops callback instead
of handling ETHTOOL_GRXRINGS directly in .get_rxnfc().

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20260115-grxring_big_v2-v1-2-b3e1b58bced5@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'selftests-net-improve-error-handling-in-passive-tfo-test'

Yohei Kojima says:

====================
selftests: net: improve error handling in passive TFO test

This series improves error handling in the passive TFO test by (1)
fixing a broken behavior when the child processes failed (or timed out),
and (2) adding more error handlng code in the test program.

The first patch fixes the behavior that the test didn't report failure
even if the server or the client process exited with non-zero status.
The second patch adds error handling code in the test program to improve
reliability of the test.
====================

Link: https://patch.msgid.link/cover.1768312014.git.yk@y-koj.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: improve error handling in passive TFO test

Improve the error handling in passive TFO test to check the return value
from sendto(), and to fail if read() or fprintf() failed.

Signed-off-by: Yohei Kojima <yk@y-koj.net>
Link: https://patch.msgid.link/24707c8133f7095c0e5a94afa69e75c3a80bf6e7.1768312014.git.yk@y-koj.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: fix passive TFO test to fail if child processes failed

Improve the passive TFO test to report failure if the server or the
client timed out or exited with non-zero status.

Before this commit, TFO test didn't fail even if exit(EXIT_FAILURE) is
added to the first line of the run_server() and run_client() functions.

Signed-off-by: Yohei Kojima <yk@y-koj.net>
Link: https://patch.msgid.link/214d399caec2e5de7738ced5736829915d507e4e.1768312014.git.yk@y-koj.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-phy-realtek-simplify-and-reunify-c22-c45-drivers'

Daniel Golle says:

====================
net: phy: realtek: simplify and reunify C22/C45 drivers

The RTL8221B PHY variants (VB-CG and VM-CG) were previously split into
separate C22 and C45 driver instances to support copper SFP modules
using the RollBall MDIO-over-I2C protocol, which only supports Clause-45
access. However, this split created significant code duplication and
complexity.

Commit 8af2136e77989 ("net: phy: realtek: add helper
RTL822X_VND2_C22_REG") exposed that RealTek PHYs map all standard
Clause-22 registers into MDIO_MMD_VEND2 at offset 0xa400.

With commit 1850ec20d6e71 ("net: phy: realtek: use paged access for
MDIO_MMD_VEND2 in C22 mode") it is now possible to access all MMD
registers transparently, regardless of whether the PHY is accessed via
C22 or C45 MDIO.

Further improve the translation logic for this register mapping, so a
single unified driver works efficiently with both access methods,
reducing code duplication.

The series also includes cleanup to remove unnecessary paged operations
on registers that aren't actually affected by page selection.

Testing was done on RTL8211F and RTL8221B-VB-CG (the latter in both
C22 and C45 modes).
====================

Link: https://patch.msgid.link/cover.1768275364.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: realtek: simplify bogus paged operations

Only registers 0x10~0x17 are affected by the value in the page
selection register 0x1f. Hence there is no point in using paged
operations when accessing any other registers.
Simplify the driver by using the normal phy_read and phy_write
operations for registers which are anyway not affected by paging.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/0c5cbb66ce3e72a011d76f8c3d61ebcac44483bb.1768275364.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: realtek: demystify PHYSR register location

Turns out that register address RTL_VND2_PHYSR (0xa434) maps to
Clause-22 register MII_RESV2. Use that to get rid of yet another magic
number, and rename access macros accordingly.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/6ed246e0aa3ca8038d2fa432d51518959fb89b6b.1768275364.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: realtek: reunify C22 and C45 drivers

Reunify the split C22/C45 drivers for the RTL8221B-VB-CG 2.5Gbps and
RTL8221B-VM-CG 2.5Gbps PHYs back into a single driver.

This is possible now by using all the driver operations previously used
by the C45 driver, as transparent access to all MMDs including
MDIO_MMD_VEND2 is now possible also over Clause-22 MDIO.

The unified driver will still only use Clause-45 access on any Clause-45
capable busses while still working fine on Clause-22 busses.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/bffcb85fdc20e07056976962d3caaa1be5d0ddb0.1768275364.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: realtek: simplify C22 reg access via MDIO_MMD_VEND2

RealTek 2.5GE PHYs have all standard Clause-22 registers mapped also
inside MDIO_MMD_VEND2 at offset 0xa400. This is used mainly in case the
PHY is connected to a Clause-45-only bus. The RTL8221B is frequently
used in copper SFP module which uses the RollBall MDIO-over-I2C
method which *only* supports Clause-45, for example.

In order to support using the PHY on Clause-45-only busses, the PHY
driver has previously been split into a C22-only and C45-only instances,
creating quite a bit of redundancy and confusion.

In preparation of reunifying the two driver instances, add support for
translating MDIO_MMD_VEND2 registers 0xa400 to 0xa43c back to Clause-22
registers 0 to 30 in case the PHY is accessed on a Clause-22 bus.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/fd49d86bd0445b76269fd3ea456c709c2066683f.1768275364.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: realtek: support interrupt also for C22 variants

Now that access to MDIO_MMD_VEND2 works transparently also in Clause-22
mode, add interrupt support also for the C22 variants of the
RTL8221B-VB-CG and RTL8221B-VM-CG. This results in the C22 and C45
driver instances now having all the same features implemented.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/7620084b1de01580edc2d0e1b9548507fb4643a8.1768275364.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: fix dwmac4 transmit performance regression

dwmac4's transmit performance dropped by a factor of four due to an
incorrect assumption about which definitions are for what. This
highlights the need for sane register macros.

Commit 8409495bf6c9 ("net: stmmac: cores: remove many xxx_SHIFT
definitions") changed the way the txpbl value is merged into the
register:

        value = readl(ioaddr + DMA_CHAN_TX_CONTROL(dwmac4_addrs, chan));
-       value = value | (txpbl << DMA_BUS_MODE_PBL_SHIFT);
+       value = value | FIELD_PREP(DMA_BUS_MODE_PBL, txpbl);

With the following in the header file:

#define DMA_BUS_MODE_PBL               BIT(16)
-#define DMA_BUS_MODE_PBL_SHIFT         16

The assumption here was that DMA_BUS_MODE_PBL was the mask for
DMA_BUS_MODE_PBL_SHIFT, but this turns out not to be the case.

The field is actually six bits wide, buts 21:16, and is called
TXPBL.

What's even more confusing is, there turns out to be a PBLX8
single bit in the DMA_CHAN_CONTROL register (0x1100 for channel 0),
and DMA_BUS_MODE_PBL seems to be used for that. However, this bit
et.al. was listed under a comment "/* DMA SYS Bus Mode bitmap */"
which is for register 0x1004.

Fix this up by adding an appropriately named field definition under
the DMA_CHAN_TX_CONTROL() register address definition.

Move the RPBL mask definition under DMA_CHAN_RX_CONTROL(), correctly
renaming it as well.

Also move the PBL bit definition under DMA_CHAN_CONTROL(), correctly
renaming it.

This removes confusion over the PBL fields.

Fixes: 8409495bf6c9 ("net: stmmac: cores: remove many xxx_SHIFT definitions")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Bisected-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://lore.kernel.org/51859704-57fd-4913-b09d-9ac58a57f185@bootlin.com
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/E1vgY1k-00000003vOC-0Z1H@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: move tcp_rate_skb_sent() to tcp_output.c

It is only called from __tcp_transmit_skb() and __tcp_retransmit_skb().

Move it in tcp_output.c and make it static.

clang compiler is now able to inline it from __tcp_transmit_skb().

gcc compiler inlines it in the two callers, which is also fine.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Link: https://patch.msgid.link/20260114165109.1747722-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: ti: cpsw_ale: Remove obsolete macros

- ALE_VERSION_MAJOR/MINOR are no longer used following the transition to
  regmaps in commit bbfc7e2b9ebe ("net: ethernet: ti: cpsw_ale: use
  regfields for ALE registers")
- ALE_VERSION_IR3 is unused since entry mask bits are no longer
  hardcoded with commit b5d31f294027 ("net: ethernet: ti: ale: optimize
  ale entry mask bits configuartion")
- ALE_VERSION_IR4 has never been used since its introduction in commit
  ca47130a744b ("net: netcp: ale: update to support unknown vlan
  controls for NU switch")

Signed-off-by: Stefan Wiehler <stefan.wiehler@nokia.com>
Link: https://patch.msgid.link/20260114144425.3973272-1-stefan.wiehler@nokia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/sched: cake: avoid separate allocation of struct cake_sched_config

Paolo pointed out that we can avoid separately allocating struct
cake_sched_config even in the non-mq case, by embedding it into struct
cake_sched_data. This reduces the complexity of the logic that swaps the
pointers and frees the old value, at the cost of adding 56 bytes to the
latter. Since cake_sched_data is already almost 17k bytes, this seems
like a reasonable tradeoff.

Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Fixes: bc0ce2bad36c ("net/sched: sch_cake: Factor out config variables into separate struct")
Link: https://patch.msgid.link/20260113143157.2581680-1-toke@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

docs: tls: Enhance TLS resync async process documentation

Expand the tls-offload.rst documentation to provide a more detailed
explanation of the asynchronous resync process, including the role
of struct tls_offload_resync_async in managing resync requests on
the kernel side.

Also, add documentation for helper functions
tls_offload_rx_resync_async_request_start/ _end/ _cancel.

Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1768298883-1602599-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'uapi-use-uapi-definitions-of-int_max-and-int_min'

Thomas Weißschuh says:

====================
uapi: Use UAPI definitions of INT_MAX and INT_MIN

Using <limits.h> to gain access to INT_MAX and INT_MIN introduces a
dependency on a libc, which UAPI headers should not do.

Introduce and use equivalent UAPI constants.
====================

Link: https://patch.msgid.link/20260113-uapi-limits-v2-0-93c20f4b2c1a@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netfilter: uapi: Use UAPI definition of INT_MAX and INT_MIN

Using <limits.h> to gain access to INT_MAX and INT_MIN introduces a
dependency on a libc, which UAPI headers should not do.

Use the equivalent UAPI constants.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Link: https://patch.msgid.link/20260113-uapi-limits-v2-3-93c20f4b2c1a@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ethtool: uapi: Use UAPI definition of INT_MAX

Using <limits.h> to gain access to INT_MAX introduces a dependency on a
libc, which UAPI headers should not do.

Use the equivalent UAPI constant.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Link: https://patch.msgid.link/20260113-uapi-limits-v2-2-93c20f4b2c1a@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

uapi: add INT_MAX and INT_MIN constants

Some UAPI headers use INT_MAX and INT_MIN. Currently they include
<limits.h> for their definitions, which introduces a problematic
dependency on libc.

Add custom, namespaced definitions of INT_MAX and INT_MIN using the
same values as the regular kernel code.
These definitions are not added to uapi/linux/limits.h, as that header
will conflict with libc definitions on some platforms.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Link: https://patch.msgid.link/20260113-uapi-limits-v2-1-93c20f4b2c1a@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: usb: sr9700: remove code to drive nonexistent MII

This device does not have a MII, even though the driver
contains code to drive one (because it originated as a copy of the
dm9601 driver). It also only supports 10Mbps half-duplex
operation (the DM9601 registers to set the speed/duplex mode
are read-only). Remove all MII-related code and implement
sr9700_get_link_ksettings which returns hardcoded correct
information for the link speed and duplex mode. Also add
announcement of the link status like many other Ethernet
drivers have.

Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Link: https://patch.msgid.link/20260113040649.54248-1-enelsonmoore@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-pcs-rzn1-miic-support-configurable-phy_link-polarity'

Lad Prabhakar says:

====================
net: pcs: rzn1-miic: Support configurable PHY_LINK polarity

This series adds support for configuring the active level of MIIC
PHY_LINK status signals on Renesas RZ/N1 and RZ/T2H/N2H platforms.

The MIIC block provides dedicated hardware PHY_LINK signals that indicate
EtherPHY link-up and link-down status independently of whether the MAC
(GMAC) or Ethernet switch (ETHSW) is used. While GMAC-based systems
typically obtain link state via MDIO and handle it in software, the
ETHSW relies on these PHY_LINK pins for both CPU-assisted operation and
switch-only forwarding paths that do not involve the host processor.

These hardware PHY_LINK signals are particularly important for use cases
requiring fast reaction to link-down events, such as redundancy protocols
including Device Level Ring (DLR). In such scenarios, relying solely on
software-based link detection introduces latency that can negatively
impact recovery time. The ETHSW therefore exposes PHY_LINK signals to
enable immediate hardware-level detection of cable or port failures.

Some systems require the PHY_LINK signal polarity to be configured as
active low rather than the default active high. This series introduces a
new DT property to describe the required polarity and adds corresponding
driver support to program the MIIC PHY_LINK register accordingly. The
configuration is accumulated during DT parsing and applied once hardware
initialization is complete, taking into account SoC-specific differences
between RZ/N1 and RZ/T2H/N2H.
====================

Link: https://patch.msgid.link/20260112173555.1166714-1-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: pcs: rzn1-miic: Add PHY_LINK active-level configuration support

Add support to configure the active level of MIIC PHY_LINK status signals
on a per-converter basis using a DT property.

MIIC provides dedicated PHY_LINK signals that indicate EtherPHY link-up and
link-down status in hardware. These signals are required regardless of
whether GMAC or ETHSW is used. With GMAC, link state is retrieved via
MDC/MDIO and handled in software, while ETHSW relies on PHY_LINK pins for
both CPU-assisted operation and switch-only data paths that do not involve
the host.

Hardware PHY_LINK signals are also critical for fast reaction to link-down
events, for example when running redundancy protocols such as Device Level
Ring (DLR), where rapid detection of cable faults is required to switch to
an alternate path without software latency.

Parse the requested polarity from DT, accumulate the configuration during
probing, and apply it to the MIIC_PHY_LINK register once hardware
initialization is complete, when the registers can be safely modified.
Handle SoC-specific bit layout differences between RZ/N1 and RZ/T2H/N2H
within the driver.

Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Link: https://patch.msgid.link/20260112173555.1166714-3-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: pcs: renesas,rzn1-miic: Add phy_link property

Add the renesas,miic-phy-link-active-low property to allow configuring
the active level of phy_link status signals provided by the MIIC block.

EtherPHY link-up and link-down status is required as a hardware IP
feature independent of whether GMAC or ETHSW is used. With GMAC, link
state is retrieved via MDC/MDIO and handled in software. In contrast,
ETHSW exposes dedicated PHY_LINK pins that provide this information
directly in hardware.

These PHY_LINK signals are required not only for host-controlled traffic
but also for switch-only forwarding paths where frames are exchanged
between external nodes without CPU involvement. This is particularly
important for redundancy protocols such as DLR (Device Level Ring),
which depend on fast detection of link-down events caused by cable or
port failures. Handling such events purely in software introduces
latency, which is why ETHSW provides dedicated hardware PHY_LINK pins.

Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20260112173555.1166714-2-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mctp i2c: initialise event handler read bytes

Set a 0xff value for i2c reads of an mctp-i2c device. Otherwise reads
will return "val" from the i2c bus driver. For i2c-aspeed and
i2c-npcm7xx that is a stack uninitialised u8.

Tested with "i2ctransfer -y 1 r10@0x34" where 0x34 is a mctp-i2c
instance, now it returns all 0xff.

Fixes: f5b8abf9fc3d ("mctp i2c: MCTP I2C binding driver")
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Link: https://patch.msgid.link/20260113-mctp-read-fix-v1-1-70c4b59c741c@codeconstruct.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

xgbe: Use netlink extack to report errors to ethtool

Upgrade XGBE driver to report errors via netlink extack instead
of netdev_error so ethtool userspace can be aware of failures.

Signed-off-by: Vishal Badole <Vishal.Badole@amd.com>
Reviewed-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Link: https://patch.msgid.link/20260114080357.1778132-1-Raju.Rangoju@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnxt_en: Fix build break on non-x86 platforms

Commit c470195b989fe added .getcrosststamp() interface where
the code uses boot_cpu_has() function which is available only
in x86 platforms. This fails the build on any other platform.

Since the interface is going to be supported only on x86 anyway,
we can simply compile out the entire support on non-x86 platforms.

Cover the .getcrosststamp support under CONFIG_X86

Fixes: c470195b989f ("bnxt_en: Add PTP .getcrosststamp() interface to get device/host times")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202601111808.WnBJCuWI-lkp@intel.com
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20260113183422.508851-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

hinic3: add WQ_PERCPU to alloc_workqueue users

This continues the effort to refactor workqueue APIs, which began with
the introduction of new workqueues and a new alloc_workqueue flag in:

commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")

The refactoring is going to alter the default behavior of
alloc_workqueue() to be unbound by default.

With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU. For more details see the Link tag below.

In order to keep alloc_workqueue() behavior identical, explicitly request
WQ_PERCPU.

Link: https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Reviewed-by: Fan Gong <gongfan1@huawei.com>
Link: https://patch.msgid.link/20260113151433.257320-1-marco.crivellari@suse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: realtek: fix in-band capabilities for 2.5G PHYs

It looks like the configuration of in-band AN only affects SGMII, and it
is always disabled for 2500Base-X. Adjust the reported capabilities
accordingly.

This is based on testing using OpenWrt on Zyxel XGS1010-12 rev A1 with
RTL8226-CG, and Zyxel XGS1210-12 rev B1 with RTL8221B-VB-CG. On these
devices, 2500Base-X in-band AN is known to work with some SFP modules
(containing an unknown PHY). However, with the built-in Realtek PHYs,
no auto-negotiation takes place, irrespective of the configuration of
the PHY.

Fixes: 10fbd71fc5f9b ("net: phy: realtek: implement configuring in-band an")
Signed-off-by: Jan Hoffmann <jan@3e8.eu>
Reviewed-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/20260113205557.503409-1-jan@3e8.eu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: minor __alloc_skb() optimization

We can directly call __finalize_skb_around()
instead of __build_skb_around() because @size is not zero.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260113131017.2310584-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: remove unused fixup unregistering functions

No user of PHY fixups unregisters these. IOW: The fixup unregistering
functions are unused and can be removed. Remove also documentation
for these functions. Whilst at it, remove also mentioning of
phy_register_fixup() from the Documentation, as this function has been
static since ea47e70e476f ("net: phy: remove fixup-related definitions
from phy.h which are not used outside phylib").

Fixup unregistering functions were added with f38e7a32ee4f
("phy: add phy fixup unregister functions") in 2016, and last user
was removed with 6782d06a47ad ("net: usb: lan78xx: Remove KSZ9031 PHY
fixup") in 2024.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/ff8ac321-435c-48d0-b376-fbca80c0c22e@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: usb: sr9700: fix byte numbering in comments

The comments describing the RX/TX headers and status response use
a combination of 0- and 1-based indexing, leading to confusion. Correct
the numbering and make it consistent. Also fix a typo "pm" for "pn".

This issue also existed in dm9601 and was fixed in commit 61189c78bda8
("dm9601: trivial comment fixes").

Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Peter Korsgaard <peter@korsgaard.com>
Link: https://patch.msgid.link/20260113075327.85435-1-enelsonmoore@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: dnet: remove driver

This legacy platform driver was used with some Qong board.
Support for this board was removed with
commit c93197b0041d ("ARM: imx: Remove i.MX31 board files")
in 2020. So remove this now orphaned driver.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/cef7c728-28ee-439f-b747-eb1c9394fe51@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-airoha-init-block-ack-memory-region-for-mt7996-npu-offloading'

Lorenzo Bianconi says:

====================
net: airoha: Init Block Ack memory region for MT7996 NPU offloading

This is a preliminary series in order to enable NPU offloading for
MT7996 (Eagle) chipset.
====================

Link: https://patch.msgid.link/20260108-airoha-ba-memory-region-v3-0-bf1814e5dcc4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: airoha: npu: Init BA memory region if provided via DTS

Initialize NPU Block Ack memory region if reserved via DTS.
Block Ack memory region is used by NPU MT7996 (Eagle) offloading.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20260108-airoha-ba-memory-region-v3-2-bf1814e5dcc4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: airoha: npu: Add BA memory region

Introduce Block Ack memory region used by NPU MT7996 (Eagle) offloading.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20260108-airoha-ba-memory-region-v3-1-bf1814e5dcc4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-phy-adin-enable-configuration-of-the-lp-termination-register'

Osose Itua says:

====================
net: phy: adin: enable configuration of the LP Termination Register
====================

Link: https://patch.msgid.link/20260107221913.1334157-1-osose.itua@savoirfairelinux.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: adin: enable configuration of the LP Termination Register

The ADIN1200/ADIN1300 provide a control bit that selects between normal
receive termination and the lowest common mode impedance for 100BASE-TX
operation. This behavior is controlled through the Low Power Termination
register (B_100_ZPTM_EN_DIMRX).

Bit 0 of this register enables normal termination when set (this is the
default), and selects the lowest common mode impedance when cleared.

Signed-off-by: Osose Itua <osose.itua@savoirfairelinux.com>
Acked-by: Nuno Sá <nuno.sa@analog.com>
Link: https://patch.msgid.link/20260107221913.1334157-3-osose.itua@savoirfairelinux.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: adi,adin: document LP Termination property

Add "adi,low-cmode-impedance" boolean property which, when present,
configures the PHY for the lowest common-mode impedance on the receive
pair for 100BASE-TX operation by clearing the B_100_ZPTM_EN_DIMRX bit.
This is suited for capacitive coupled applications and other
applications where there may be a path for high common-mode noise to
reach the PHY.

If this value is not present, the value of the bit by default is 1,
which is normal termination (zero-power termination) mode.

Signed-off-by: Osose Itua <osose.itua@savoirfairelinux.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Acked-by: Nuno Sá <nuno.sa@analog.com>
Link: https://patch.msgid.link/20260107221913.1334157-2-osose.itua@savoirfairelinux.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'phy_common_properties' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy

Vinod Koul says:

====================
phy common properties

Introduce "rx-polarity" and "tx-polarity" device tree properties
with Kunit tests (from Vladimir Oltean).

* tag 'phy_common_properties' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy:
  phy: add phy_get_rx_polarity() and phy_get_tx_polarity()
  dt-bindings: phy-common-props: RX and TX lane polarity inversion
  dt-bindings: phy-common-props: ensure protocol-names are unique
  dt-bindings: phy-common-props: create a reusable "protocol-names" definition
  dt-bindings: phy: rename transmit-amplitude.yaml to phy-common-props.yaml
====================

Link: https://patch.msgid.link/aWeXvFcGNK5T6As9@vaman
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ethtool: Clarify len/n_stats fields in/out semantics

Document that the 'len' field in ethtool_gstrings and 'n_stats' field in
ethtool_stats optionally serve dual purposes: on entry they specify the
number of items requested, and on return they indicate the number
actually returned (which is not necessarily the same).

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Link: https://patch.msgid.link/20260115060544.481550-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR (net-6.19-rc6).

No conflicts, or adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'net-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
"Including fixes from bluetooth, can and IPsec.

  Current release - regressions:

   - net: add net.core.qdisc_max_burst

   - can: propagate CAN device capabilities via ml_priv

  Previous releases - regressions:

   - dst: fix races in rt6_uncached_list_del() and
     rt_del_uncached_list()

   - ipv6: fix use-after-free in inet6_addr_del().

   - xfrm: fix inner mode lookup in tunnel mode GSO segmentation

   - ip_tunnel: spread netdev_lockdep_set_classes()

   - ip6_tunnel: use skb_vlan_inet_prepare() in __ip6_tnl_rcv()

   - bluetooth: hci_sync: enable PA sync lost event

   - eth: virtio-net:
      - fix the deadlock when disabling rx NAPI
      - fix misalignment bug in struct virtnet_info

  Previous releases - always broken:

   - ipv4: ip_gre: make ipgre_header() robust

   - can: fix SSP_SRC in cases when bit-rate is higher than 1 MBit.

   - eth:
      - mlx5e: profile change fix
      - octeon_ep_vf: fix free_irq dev_id mismatch in IRQ rollback
      - macvlan: fix possible UAF in macvlan_forward_source()"

* tag 'net-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (37 commits)
  virtio_net: Fix misalignment bug in struct virtnet_info
  net: can: j1939: j1939_xtp_rx_rts_session_active(): deactivate session upon receiving the second rts
  can: raw: instantly reject disabled CAN frames
  can: propagate CAN device capabilities via ml_priv
  Revert "can: raw: instantly reject unsupported CAN frames"
  net/sched: sch_qfq: do not free existing class in qfq_change_class()
  selftests: drv-net: fix RPS mask handling for high CPU numbers
  selftests: drv-net: fix RPS mask handling in toeplitz test
  ipv6: Fix use-after-free in inet6_addr_del().
  dst: fix races in rt6_uncached_list_del() and rt_del_uncached_list()
  net: hv_netvsc: reject RSS hash key programming without RX indirection table
  tools: ynl: render event op docs correctly
  net: add net.core.qdisc_max_burst
  net: airoha: Fix typo in airoha_ppe_setup_tc_block_cb definition
  net: phy: motorcomm: fix duplex setting error for phy leds
  net: octeon_ep_vf: fix free_irq dev_id mismatch in IRQ rollback
  net/mlx5e: Restore destroying state bit after profile cleanup
  net/mlx5e: Pass netdev to mlx5e_destroy_netdev instead of priv
  net/mlx5e: Don't store mlx5e_priv in mlx5e_dev devlink priv
  net/mlx5e: Fix crash on profile change rollback failure
  ...

Merge tag 'linux-can-fixes-for-6.19-20260115' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2026-01-15

this is a pull request of 4 patches for net/main, it super-seeds the
"can 2026-01-14" pull request. The dev refcount leak in patch #3 is
fixed.

The first 3 patches are by Oliver Hartkopp and revert the approach to
instantly reject unsupported CAN frames introduced in
net-next-for-v6.19 and replace it by placing the needed data into the
CAN specific ml_priv.

The last patch is by Tetsuo Handa and fixes a J1939 refcount leak for
j1939_session in session deactivation upon receiving the second RTS.

linux-can-fixes-for-6.19-20260115

* tag 'linux-can-fixes-for-6.19-20260115' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
  net: can: j1939: j1939_xtp_rx_rts_session_active(): deactivate session upon receiving the second rts
  can: raw: instantly reject disabled CAN frames
  can: propagate CAN device capabilities via ml_priv
  Revert "can: raw: instantly reject unsupported CAN frames"
====================

Link: https://patch.msgid.link/20260115090603.1124860-1-mkl@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge tag 'ipsec-2026-01-14' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec

Steffen Klassert says:

====================
pull request (net): ipsec 2026-01-14

1) Fix inner mode lookup in tunnel mode GSO segmentation.
   The protocol was taken from the wrong field.

2) Set ipv4 no_pmtu_disc flag only on output SAs. The
   insertation of input SAs can fail if no_pmtu_disc
   is set.

Please pull or let me know if there are problems.

ipsec-2026-01-14

* tag 'ipsec-2026-01-14' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
  xfrm: set ipv4 no_pmtu_disc flag only on output sa when direction is set
  xfrm: Fix inner mode lookup in tunnel mode GSO segmentation
====================

Link: https://patch.msgid.link/20260114121817.1106134-1-steffen.klassert@secunet.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: inline napi_skb_cache_get()

clang is inlining it already, gcc (14.2) does not.

Small space cost (215 bytes on x86_64) but faster sk_buff allocations.

$ scripts/bloat-o-meter -t net/core/skbuff.gcc.before.o net/core/skbuff.gcc.after.o
add/remove: 0/1 grow/shrink: 4/1 up/down: 359/-144 (215)
Function                                     old     new   delta
__alloc_skb                                  471     611    +140
napi_build_skb                               245     363    +118
napi_alloc_skb                               331     416     +85
skb_copy_ubufs                              1869    1885     +16
skb_shift                                   1445    1413     -32
napi_skb_cache_get                           112       -    -112
Total: Before=59941, After=60156, chg +0.36%

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260112131515.4051589-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'net-mlx5-hws-single-flow-counter-support'

Tariq Toukan says:

====================
net/mlx5: HWS single flow counter support

This small series refactors the flow counter bulk initialization code
and extends it so that single flow counters are also usable by hardware
steering (HWS) rules.

Patches 1-2 refactor the bulk init path: first by factoring out common
flow counter bulk initialization into mlx5_fc_bulk_init(), then by
splitting the bitmap allocation into mlx5_fs_bulk_bitmap_alloc(), with
no functional changes.

Patch 3 initializes bulk data for counters allocated via
mlx5_fc_single_alloc(), so they can be safely used by HWS rules.
====================

Link: https://patch.msgid.link/1768210825-1598472-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/mlx5: Initialize bulk for single flow counters

Ensure that flow counters allocated with mlx5_fc_single_alloc() have
bulk correctly initialized so they can safely be used in HWS rules.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1768210825-1598472-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/mlx5: fs, split bulk init

Refactor mlx5_fs_bulk_init() by moving bitmap allocation logic into a
new helper function mlx5_fs_bulk_bitmap_alloc(). This change does not
alter any logic.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1768210825-1598472-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/mlx5: fs, factor out flow counter bulk init

Add mlx5_fc_bulk_init() to handle bulk initialization of flow counters.
This change does not alter any logic, but refactors the code to remove
duplicate initialization logic by centralizing it in a single function.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1768210825-1598472-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'introduce-and-use-netif_xmit_timeout_ms-helper'

Tariq Toukan says:

====================
Introduce and use netif_xmit_timeout_ms() helper

This is V2, find V1 here:
https://lore.kernel.org/all/1764054776-1308696-1-git-send-email-tariqt@nvidia.com/

This series by Shahar introduces a new helper function
netif_xmit_timeout_ms() to check if a TX queue has timed out and report
the timeout duration.
It also encapsulates the check for whether the TX queue is stopped.

Replace duplicated open-coded timeout check in hns3 driver with the new
helper.

For mlx5e, refine the TX timeout recovery flow to act only on SQs whose
transmit timestamp indicates an actual timeout, as determined by the
helper. This prevents unnecessary channel reopen events caused by
attempting recovery on queues that are merely stopped but not truly
timed out.
====================

Link: https://patch.msgid.link/1768209383-1546791-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/mlx5e: Refine TX timeout handling to skip non-timed-out SQ

mlx5e_tx_timeout_work() is invoked when the dev_watchdog reports a
timed-out TX queue. Currently, the recovery flow is triggered for all
stopped SQs, which is not always correct — some SQs may be temporarily
stopped without actually timing out. Attempting to recover such SQs
results in no EQE being polled (since no real timeout occurred), which
the driver misinterprets as a recovery failure, unnecessarily causing
channel reopening.

Improve the logic to initiate recovery only for SQs that are both
stopped and timed out. Utilize the helper introduced in the previous
patch to determine whether the netdevice watchdog timeout period has
elapsed since the SQ’s last transmit timestamp.

Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Reviewed-by: Yael Chemla <ychemla@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1768209383-1546791-4-git-send-email-tariqt@nvidia.com
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>