]> git.ipfire.org Git - thirdparty/linux.git/log
thirdparty/linux.git
2 weeks agomt76: access ->pp through netmem_desc instead of page
Byungchul Park [Mon, 21 Jul 2025 02:18:28 +0000 (11:18 +0900)] 
mt76: access ->pp through netmem_desc instead of page

To eliminate the use of struct page in page pool, the page pool users
should use netmem descriptor and APIs instead.

Make mt76 access ->pp through netmem_desc instead of page.

Signed-off-by: Byungchul Park <byungchul@sk.com>
Link: https://patch.msgid.link/20250721021835.63939-6-byungchul@sk.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonetdevsim: access ->pp through netmem_desc instead of page
Byungchul Park [Mon, 21 Jul 2025 02:18:27 +0000 (11:18 +0900)] 
netdevsim: access ->pp through netmem_desc instead of page

To eliminate the use of struct page in page pool, the page pool users
should use netmem descriptor and APIs instead.

Make netdevsim access ->pp through netmem_desc instead of page.

Signed-off-by: Byungchul Park <byungchul@sk.com>
Link: https://patch.msgid.link/20250721021835.63939-5-byungchul@sk.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonetmem, mlx4: access ->pp_ref_count through netmem_desc instead of page
Byungchul Park [Mon, 21 Jul 2025 02:18:26 +0000 (11:18 +0900)] 
netmem, mlx4: access ->pp_ref_count through netmem_desc instead of page

To eliminate the use of struct page in page pool, the page pool users
should use netmem descriptor and APIs instead.

Make mlx4 access ->pp_ref_count through netmem_desc instead of page.

While at it, add a helper, pp_page_to_nmdesc() and __pp_page_to_nmdesc(),
that can be used to get netmem_desc from page only if it's a pp page.
For now that netmem_desc overlays on page, it can be achieved by just
casting, and use macro and _Generic to cover const casting as well.

Plus, change page_pool_page_is_pp() to check for 'const struct page *'
instead of 'struct page *' since it doesn't modify data and additionally
covers const type.

Signed-off-by: Byungchul Park <byungchul@sk.com>
Link: https://patch.msgid.link/20250721021835.63939-4-byungchul@sk.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonetmem: use netmem_desc instead of page to access ->pp in __netmem_get_pp()
Byungchul Park [Mon, 21 Jul 2025 02:18:25 +0000 (11:18 +0900)] 
netmem: use netmem_desc instead of page to access ->pp in __netmem_get_pp()

To eliminate the use of the page pool fields in struct page, the page
pool code should use netmem descriptor and APIs instead.

However, __netmem_get_pp() still accesses ->pp via struct page.  So
change it to use struct netmem_desc instead, since ->pp no longer will
be available in struct page.

While at it, add a helper, __netmem_to_nmdesc(), that can be used to
unsafely get pointer to netmem_desc backing the netmem_ref, only when
the netmem_ref is always backed by system memory.

Signed-off-by: Byungchul Park <byungchul@sk.com>
Link: https://patch.msgid.link/20250721021835.63939-3-byungchul@sk.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonetmem: introduce struct netmem_desc mirroring struct page
Byungchul Park [Mon, 21 Jul 2025 02:18:24 +0000 (11:18 +0900)] 
netmem: introduce struct netmem_desc mirroring struct page

To simplify struct page, the page pool members of struct page should be
moved to other, allowing these members to be removed from struct page.

Introduce a network memory descriptor to store the members, struct
netmem_desc, and make it union'ed with the existing fields in struct
net_iov, allowing to organize the fields of struct net_iov.

Signed-off-by: Byungchul Park <byungchul@sk.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Harry Yoo <harry.yoo@oracle.com>
Link: https://patch.msgid.link/20250721021835.63939-2-byungchul@sk.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agovxlan: remove redundant conversion of vni in vxlan_nl2conf
Wang Liang [Tue, 22 Jul 2025 09:30:49 +0000 (17:30 +0800)] 
vxlan: remove redundant conversion of vni in vxlan_nl2conf

The IFLA_VXLAN_ID data has been converted to local variable vni in
vxlan_nl2conf(), there is no need to do it again when set conf->vni.

Signed-off-by: Wang Liang <wangliang74@huawei.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20250722093049.1527505-1-wangliang74@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonetdevsim: add fw_update_flash_chunk_time_ms debugfs knobs
Jiri Pirko [Tue, 22 Jul 2025 09:19:45 +0000 (11:19 +0200)] 
netdevsim: add fw_update_flash_chunk_time_ms debugfs knobs

Netdevsim emulates firmware update and it takes 5 seconds to complete.
For some use cases, this is too long and unnecessary. Allow user to
configure the time by exposing debugfs a knob to set chunk time.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250722091945.79506-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agodevlink: Fix excessive stack usage in rate TC bandwidth parsing
Carolina Jubran [Tue, 22 Jul 2025 09:13:29 +0000 (12:13 +0300)] 
devlink: Fix excessive stack usage in rate TC bandwidth parsing

The devlink_nl_rate_tc_bw_parse function uses a large stack array for
devlink attributes, which triggers a warning about excessive stack
usage:

net/devlink/rate.c: In function 'devlink_nl_rate_tc_bw_parse':
net/devlink/rate.c:382:1: error: the frame size of 1648 bytes is larger than 1536 bytes [-Werror=frame-larger-than=]

Introduce a separate attribute set specifically for rate TC bandwidth
parsing that only contains the two attributes actually used: index
and bandwidth. This reduces the stack array from DEVLINK_ATTR_MAX
entries to just 2 entries, solving the stack usage issue.

Update devlink selftest to use the new 'index' and 'bw' attribute names
consistent with the YAML spec.

Example usage with ynl with the new spec:

    ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/devlink.yaml \
      --do rate-set --json '{
      "bus-name": "pci",
      "dev-name": "0000:08:00.0",
      "port-index": 1,
      "rate-tc-bws": [
        {"index": 0, "bw": 50},
        {"index": 1, "bw": 50},
        {"index": 2, "bw": 0},
        {"index": 3, "bw": 0},
        {"index": 4, "bw": 0},
        {"index": 5, "bw": 0},
        {"index": 6, "bw": 0},
        {"index": 7, "bw": 0}
      ]
    }'

    ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/devlink.yaml \
      --do rate-get --json '{
      "bus-name": "pci",
      "dev-name": "0000:08:00.0",
      "port-index": 1
    }'

    output for rate-get:
    {'bus-name': 'pci',
     'dev-name': '0000:08:00.0',
     'port-index': 1,
     'rate-tc-bws': [{'bw': 50, 'index': 0},
                     {'bw': 50, 'index': 1},
                     {'bw': 0, 'index': 2},
                     {'bw': 0, 'index': 3},
                     {'bw': 0, 'index': 4},
                     {'bw': 0, 'index': 5},
                     {'bw': 0, 'index': 6},
                     {'bw': 0, 'index': 7}],
     'rate-tx-max': 0,
     'rate-tx-priority': 0,
     'rate-tx-share': 0,
     'rate-tx-weight': 0,
     'rate-type': 'leaf'}

Fixes: 566e8f108fc7 ("devlink: Extend devlink rate API with traffic classes bandwidth management")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Closes: https://lore.kernel.org/netdev/20250708160652.1810573-1-arnd@kernel.org/
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202507171943.W7DJcs6Y-lkp@intel.com/
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Tested-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/1753175609-330621-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoMerge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox...
Jakub Kicinski [Wed, 23 Jul 2025 01:37:23 +0000 (18:37 -0700)] 
Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

Tariq Toukan says:

====================
mlx5-next updates 2025-07-22

The following pull-request contains common mlx5 updates

* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5: Expose cable_length field in PFCC register
  net/mlx5: Add IFC bits and enums for buf_ownership
  net/mlx5: Add IFC bits to support RSS for IPSec offload
====================

Link: https://patch.msgid.link/1753175048-330044-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoMerge branch 'tcp-a-couple-of-fixes'
Jakub Kicinski [Wed, 23 Jul 2025 01:21:17 +0000 (18:21 -0700)] 
Merge branch 'tcp-a-couple-of-fixes'

Paolo Abeni says:

====================
tcp: a couple of fixes

This series includes a couple of follow-up for the recent tcp receiver
changes, addressing issues outlined by the nipa CI and the mptcp
self-tests.

Note that despite the affected self-tests where MPTCP ones, the issues
are really in the TCP code, see patch 1 for the details.

v1: https://lore.kernel.org/cover.1752859383.git.pabeni@redhat.com
====================

Link: https://patch.msgid.link/cover.1753118029.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agotcp: do not increment BeyondWindow MIB for old seq
Paolo Abeni [Mon, 21 Jul 2025 17:20:22 +0000 (19:20 +0200)] 
tcp: do not increment BeyondWindow MIB for old seq

The mentioned MIB is currently incremented even when a packet
with an old sequence number (i.e. a zero window probe) is received,
which is IMHO misleading.

Explicitly restrict such MIB increment at the relevant events.

Fixes: 6c758062c64d ("tcp: add LINUX_MIB_BEYOND_WINDOW")
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/20d147292eb4b13b6535e0ad6f56be64d9c330d3.1753118029.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agotcp: do not set a zero size receive buffer
Paolo Abeni [Mon, 21 Jul 2025 17:20:21 +0000 (19:20 +0200)] 
tcp: do not set a zero size receive buffer

The nipa CI is reporting frequent failures in the mptcp_connect
self-tests.

In the failing scenarios (TCP -> MPTCP) the involved sockets are
actually plain TCP ones, as fallback for passive socket at 2whs
time cause the MPTCP listener to actually create a TCP socket.

The transfer is stuck due to the receiver buffer being zero.
With the stronger check in place, tcp_clamp_window() can be invoked
while the TCP socket has sk_rmem_alloc == 0, and the receive buffer
will be zeroed, too.

Check for the critical condition in tcp_prune_queue() and just
drop the packet without shrinking the receiver buffer.

Fixes: 1d2fbaad7cd8 ("tcp: stronger sk_rcvbuf checks")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20c18165d3f848e1c5c1b782d88c1a5ab38b3f70.1753118029.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoMerge branch 'net-mlx5-misc-changes-2025-07-21'
Jakub Kicinski [Wed, 23 Jul 2025 01:20:15 +0000 (18:20 -0700)] 
Merge branch 'net-mlx5-misc-changes-2025-07-21'

Tariq Toukan says:

====================
net/mlx5: misc changes 2025-07-21

This series by Lama contains misc enhancements to the SHAMPO parameters.
====================

Link: https://patch.msgid.link/1753081999-326247-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet/mlx5e: Remove duplicate mkey from SHAMPO header
Lama Kayal [Mon, 21 Jul 2025 07:13:19 +0000 (10:13 +0300)] 
net/mlx5e: Remove duplicate mkey from SHAMPO header

SHAMPO structure holds two variations of the mkey, which is unnecessary,
a duplication that's repeated per rq.

Remove duplicate mkey information and keep only one version, the one
used in the fast path, rename field to reflect field type clearly.

Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/1753081999-326247-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet/mlx5e: SHAMPO, Remove mlx5e_shampo_get_log_hd_entry_size()
Lama Kayal [Mon, 21 Jul 2025 07:13:18 +0000 (10:13 +0300)] 
net/mlx5e: SHAMPO, Remove mlx5e_shampo_get_log_hd_entry_size()

Refactor mlx5e_shampo_get_log_hd_entry_size() as macro, for more
simplicity.

Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/1753081999-326247-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet/mlx5e: SHAMPO, Cleanup reservation size formula
Lama Kayal [Mon, 21 Jul 2025 07:13:17 +0000 (10:13 +0300)] 
net/mlx5e: SHAMPO, Cleanup reservation size formula

The reservation size formula can be reduced to a simple evaluation of
MLX5E_SHAMPO_WQ_RESRV_SIZE. This leaves mlx5e_shampo_get_log_rsrv_size()
with one single use, which can be replaced with a macro for simplicity.

Also, function mlx5e_shampo_get_log_rsrv_size() is used only throughout
params.c, make it static.

Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/1753081999-326247-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agotcp: trace retransmit failures in tcp_retransmit_skb
Fan Yu [Mon, 21 Jul 2025 03:16:07 +0000 (11:16 +0800)] 
tcp: trace retransmit failures in tcp_retransmit_skb

Background
==========
When TCP retransmits a packet due to missing ACKs, the
retransmission may fail for various reasons (e.g., packets
stuck in driver queues, receiver zero windows, or routing issues).

The original tcp_retransmit_skb tracepoint:

  'commit e086101b150a ("tcp: add a tracepoint for tcp retransmission")'

lacks visibility into these failure causes, making production
diagnostics difficult.

Solution
========
Adds the retval("err") to the tcp_retransmit_skb tracepoint.
Enables users to know why some tcp retransmission failed and
users can filter retransmission failures by retval.

Compatibility description
=========================
This patch extends the tcp_retransmit_skb tracepoint
by adding a new "err" field at the end of its
existing structure (within TP_STRUCT__entry). The
compatibility implications are detailed as follows:

1) Structural compatibility for legacy user-space tools
Legacy tools/BPF programs accessing existing fields
(by offset or name) can still work without modification
or recompilation.The new field is appended to the end,
preserving original memory layout.

2) Note: semantic changes
The original tracepoint primarily only focused on
successfully retransmitted packets. With this patch,
the tracepoint now can figure out packets that may
terminate early due to specific reasons. For accurate
statistics, users should filter using "err" to
distinguish outcomes.

Before patched:
field:const void * skbaddr; offset:8; size:8; signed:0;
field:const void * skaddr; offset:16; size:8; signed:0;
field:int state; offset:24; size:4; signed:1;
field:__u16 sport; offset:28; size:2; signed:0;
field:__u16 dport; offset:30; size:2; signed:0;
field:__u16 family; offset:32; size:2; signed:0;
field:__u8 saddr[4]; offset:34; size:4; signed:0;
field:__u8 daddr[4]; offset:38; size:4; signed:0;
field:__u8 saddr_v6[16]; offset:42; size:16; signed:0;
field:__u8 daddr_v6[16]; offset:58; size:16; signed:0;

print fmt: "skbaddr=%p skaddr=%p family=%s sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 saddrv6=%pI6c daddrv6=%pI6c state=%s"

After patched:
field:const void * skbaddr; offset:8; size:8; signed:0;
field:const void * skaddr; offset:16; size:8; signed:0;
field:int state; offset:24; size:4; signed:1;
field:__u16 sport; offset:28; size:2; signed:0;
field:__u16 dport; offset:30; size:2; signed:0;
field:__u16 family; offset:32; size:2; signed:0;
field:__u8 saddr[4]; offset:34; size:4; signed:0;
field:__u8 daddr[4]; offset:38; size:4; signed:0;
field:__u8 saddr_v6[16]; offset:42; size:16; signed:0;
field:__u8 daddr_v6[16]; offset:58; size:16; signed:0;
field:int err; offset:76; size:4; signed:1;

print fmt: "skbaddr=%p skaddr=%p family=%s sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 saddrv6=%pI6c daddrv6=%pI6c state=%s err=%d"

Co-developed-by: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Fan Yu <fan.yu9@zte.com.cn>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250721111607626_BDnIJB0ywk6FghN63bor@zte.com.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: Kconfig: add endif/endmenu comments
Randy Dunlap [Mon, 21 Jul 2025 02:04:20 +0000 (19:04 -0700)] 
net: Kconfig: add endif/endmenu comments

Add comments on endif & endmenu blocks. This can save time
when searching & trying to understand kconfig menu dependencies.

The other endif & endmenu statements are already commented like this.

This makes it similar to drivers/net/Kconfig, which is already
commented like this.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250721020420.3555128-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoMerge branch 'selftests-drv-net-test-xdp-native-support'
Jakub Kicinski [Wed, 23 Jul 2025 01:15:55 +0000 (18:15 -0700)] 
Merge branch 'selftests-drv-net-test-xdp-native-support'

Mohsin Bashir says:

====================
selftests: drv-net: Test XDP native support

This patch series add tests to validate XDP native support for PASS,
DROP, ABORT, and TX actions, as well as headroom and tailroom adjustment.
For adjustment tests, validate support for both the extension and
shrinking cases across various packet sizes and offset values.

The pass criteria for head/tail adjustment tests require that at-least
one adjustment value works for at-least one packet size. This ensure
that the variability in maximum supported head/tail adjustment offset
across different drivers is being incorporated.

The results reported in this series are based on netdevsim. However,
the series is tested against multiple other drivers including fbnic.

Note: The XDP support for fbnic will be added later.
====================

Link: https://patch.msgid.link/20250719083059.3209169-1-mohsin.bashr@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoselftests: drv-net: Test head-adjustment support
Mohsin Bashir [Sat, 19 Jul 2025 08:30:59 +0000 (01:30 -0700)] 
selftests: drv-net: Test head-adjustment support

Add test to validate the headroom adjustment support for both extension
and the shrinking cases. For the extension part, eat up space from
the start of payload data whereas, for the shrinking part, populate
the newly available space with a tag. In the user-space, validate that a
test string is manipulated accordingly.
The negative and positive offset values result in shrinking and growing of
headroom (growing and shrinking of payload) respectively.

TAP version 13
1..9
ok 1 xdp.test_xdp_native_pass_sb
ok 2 xdp.test_xdp_native_pass_mb
ok 3 xdp.test_xdp_native_drop_sb
ok 4 xdp.test_xdp_native_drop_mb
ok 5 xdp.test_xdp_native_tx_mb
\# Failed run: pkt_sz 512, ... offset 1. Reason: Adjustment failed
ok 6 xdp.test_xdp_native_adjst_tail_grow_data
ok 7 xdp.test_xdp_native_adjst_tail_shrnk_data
\# Failed run: pkt_sz 512, ... offset -128. Reason: Adjustment failed
ok 8 xdp.test_xdp_native_adjst_head_grow_data
\# Failed run: pkt_sz (512) > HDS threshold (0) and offset 64 > 48
ok 9 xdp.test_xdp_native_adjst_head_shrnk_data
\# Totals: pass:9 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Link: https://patch.msgid.link/20250719083059.3209169-6-mohsin.bashr@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoselftests: drv-net: Test tail-adjustment support
Mohsin Bashir [Sat, 19 Jul 2025 08:30:58 +0000 (01:30 -0700)] 
selftests: drv-net: Test tail-adjustment support

Add test to validate support for the two cases of tail adjustment: 1)
tail extension, and 2) tail shrinking across different frame sizes and
offset values. For each of the two cases, test both the single and
multi-buffer cases by choosing appropriate packet size.

The negative offset value result in growing of tailroom (shrinking of
payload) while the positive offset result in shrinking of tailroom
(growing of payload).

Since the support for tail adjustment varies across drivers, classify the
test as pass if at least one combination of packet size and offset from a
pre-selected list results in a successful run. In case of an unsuccessful
run, report the failure and highlight the packet size and offset values
that caused the test to fail, as well as the values that resulted in the
last successful run.

Note: The growing part of this test for netdevsim may appear flaky when
the offset value is larger than 1. This behavior occurs because tailroom
is not explicitly reserved for netdevsim, with 1 being the typical
tailroom value. However, in certain cases, such as payload being the last
in the page with additional available space, the truesize is expanded.
This also result increases the tailroom causing the test to pass
intermittently. In contrast, when tailrrom is explicitly reserved, such
as in the of fbnic, the test results are deterministic.

./drivers/net/xdp.py
TAP version 13
1..7
ok 1 xdp.test_xdp_native_pass_sb
ok 2 xdp.test_xdp_native_pass_mb
ok 3 xdp.test_xdp_native_drop_sb
ok 4 xdp.test_xdp_native_drop_mb
ok 5 xdp.test_xdp_native_tx_mb
\# Failed run: ... successful run: ... offset 1. Reason: Adjustment failed
ok 6 xdp.test_xdp_native_adjst_tail_grow_data
ok 7 xdp.test_xdp_native_adjst_tail_shrnk_data
\# Totals: pass:7 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Link: https://patch.msgid.link/20250719083059.3209169-5-mohsin.bashr@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoselftests: drv-net: Test XDP_TX support
Mohsin Bashir [Sat, 19 Jul 2025 08:30:57 +0000 (01:30 -0700)] 
selftests: drv-net: Test XDP_TX support

Add test to verify the XDP_TX functionality by generating traffic from a
remote node on a specific UDP port and redirecting it back to the sender.

./drivers/net/xdp.py
TAP version 13
1..5
ok 1 xdp.test_xdp_native_pass_sb
ok 2 xdp.test_xdp_native_pass_mb
ok 3 xdp.test_xdp_native_drop_sb
ok 4 xdp.test_xdp_native_drop_mb
ok 5 xdp.test_xdp_native_tx_mb
\# Totals: pass:5 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Link: https://patch.msgid.link/20250719083059.3209169-4-mohsin.bashr@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoselftests: drv-net: Test XDP_PASS/DROP support
Mohsin Bashir [Sat, 19 Jul 2025 08:30:56 +0000 (01:30 -0700)] 
selftests: drv-net: Test XDP_PASS/DROP support

Test XDP_PASS/DROP in single buffer and multi buffer mode when
XDP native support is available.

./drivers/net/xdp.py
TAP version 13
1..4
ok 1 xdp.test_xdp_native_pass_sb
ok 2 xdp.test_xdp_native_pass_mb
ok 3 xdp.test_xdp_native_drop_sb
ok 4 xdp.test_xdp_native_drop_mb
\# Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Link: https://patch.msgid.link/20250719083059.3209169-3-mohsin.bashr@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: netdevsim: hook in XDP handling
Jakub Kicinski [Sat, 19 Jul 2025 08:30:55 +0000 (01:30 -0700)] 
net: netdevsim: hook in XDP handling

Add basic XDP support by hooking in do_xdp_generic().
This should be enough to validate most basic XDP tests.

Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Link: https://patch.msgid.link/20250719083059.3209169-2-mohsin.bashr@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoMerge branch 'octeontx2-af-rpm-misc-feaures'
Paolo Abeni [Tue, 22 Jul 2025 13:36:40 +0000 (15:36 +0200)] 
Merge branch 'octeontx2-af-rpm-misc-feaures'

Hariprasad Kelam says:

====================
Octeontx2-af: RPM: misc feaures

This series patches adds different features like debugfs
support for shared firmware structure and DMAC filter
related enhancements.

Patch1: Saves interface MAC address configured from DMAC filters.

Patch2: Disables the stale DMAC filters in driver initialization

Patch3: Configure dma mask for CGX/RPM drivers

Patch4: Debugfs support for shared firmware data.
====================

Link: https://patch.msgid.link/20250720163638.1560323-1-hkelam@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 weeks agoOcteontx2-af: Debugfs support for firmware data
Hariprasad Kelam [Sun, 20 Jul 2025 16:36:38 +0000 (22:06 +0530)] 
Octeontx2-af: Debugfs support for firmware data

MAC address, Link modes (supported and advertised) and eeprom data
for the Netdev interface are read from the shared firmware data.
This patch adds debugfs support for the same.

Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Link: https://patch.msgid.link/20250720163638.1560323-5-hkelam@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 weeks agoOcteontx2-af: RPM: Update DMA mask
Hariprasad Kelam [Sun, 20 Jul 2025 16:36:37 +0000 (22:06 +0530)] 
Octeontx2-af: RPM: Update DMA mask

CGX/RPM driver supports 48 bits of DMA addressing. Update
the DMA mask accordingly.

Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250720163638.1560323-4-hkelam@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 weeks agoOcteontx2-af: Disable stale DMAC filters
Subbaraya Sundeep [Sun, 20 Jul 2025 16:36:36 +0000 (22:06 +0530)] 
Octeontx2-af: Disable stale DMAC filters

During driver initialization disable stale DMAC filters
in CGX/RPM set by firmware.

Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250720163638.1560323-3-hkelam@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 weeks agoOcteontx2-af: Add programmed macaddr to RVU pfvf
Hariprasad Kelam [Sun, 20 Jul 2025 16:36:35 +0000 (22:06 +0530)] 
Octeontx2-af: Add programmed macaddr to RVU pfvf

Octeontx2/CN10k MAC block supports DMAC filters. DMAC filters
can be installed on the interface through ethtool.

When a user installs a DMAC filter, the interface's MAC address
is implicitly added to the filter list. To ensure consistency,
this MAC address must be kept in sync with the pfvf->mac_addr field,
which is used to install MAC-based NPC rules.

This patch updates the pfvf->mac_addr field with the programmed MAC
address and also enables VF interfaces to install DMAC filters.

Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250720163638.1560323-2-hkelam@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 weeks agoibmveth: Add multi buffers rx replenishment hcall support
Mingming Cao [Sat, 19 Jul 2025 09:13:56 +0000 (05:13 -0400)] 
ibmveth: Add multi buffers rx replenishment hcall support

This patch enables batched RX buffer replenishment in ibmveth by
using the new firmware-supported h_add_logical_lan_buffers() hcall
 to submit up to 8 RX buffers in a single call, instead of repeatedly
calling the single-buffer h_add_logical_lan_buffer() hcall.

During the probe, with the patch, the driver queries ILLAN attributes
to detect IBMVETH_ILLAN_RX_MULTI_BUFF_SUPPORT bit. If the attribute is
present, rx_buffers_per_hcall is set to 8, enabling batched replenishment.
Otherwise, it defaults to 1, preserving the original upstream behavior
 with no change in code flow for unsupported systems.

The core rx replenish logic remains the same. But when batching
is enabled, the driver aggregates up to 8 fully prepared descriptors
into a single h_add_logical_lan_buffers() hypercall. If any allocation
or DMA mapping fails while preparing a batch, only the successfully
prepared buffers are submitted, and the remaining are deferred for
the next replenish cycle.

If at runtime the firmware stops accepting the batched hcall—e,g,
after a Live Partition Migration (LPM) to a host that does not
support h_add_logical_lan_buffers(), the hypercall returns H_FUNCTION.
In that case, the driver transparently disables batching, resets
rx_buffers_per_hcall to 1, and falls back to the single-buffer hcall
in next future replenishments to take care of these and future buffers.

Test were done on systems with firmware that both supports and
does not support the new h_add_logical_lan_buffers hcall.

On supported firmware, this reduces hypercall overhead significantly
over multiple buffers. SAR measurements showed about a 15% improvement
in packet processing rate under moderate RX load, with heavier traffic
seeing gains more than 30%

Signed-off-by: Mingming Cao <mmc@linux.ibm.com>
Reviewed-by: Brian King <bjking1@linux.ibm.com>
Reviewed-by: Haren Myneni <haren@linux.ibm.com>
Reviewed-by: Dave Marquardt <davemarq@linux.ibm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250719091356.57252-1-mmc@linux.ibm.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 weeks agoip6_gre: Factor out common ip6gre tunnel match into helper
Yue Haibing [Sat, 19 Jul 2025 08:15:51 +0000 (16:15 +0800)] 
ip6_gre: Factor out common ip6gre tunnel match into helper

Extract common ip6gre tunnel match from ip6gre_tunnel_lookup() into new
helper function ip6gre_tunnel_match() to reduce code duplication.

No functional change intended.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250719081551.963670-1-yuehaibing@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 weeks agoMerge branch 'gve-af_xdp-zero-copy-for-dqo-rda'
Paolo Abeni [Tue, 22 Jul 2025 09:35:51 +0000 (11:35 +0200)] 
Merge branch 'gve-af_xdp-zero-copy-for-dqo-rda'

Joshua Washington says:

====================
gve: AF_XDP zero-copy for DQO RDA

This patch series adds support for AF_XDP zero-copy in the DQO RDA queue
format.

XSK infrastructure is updated to re-post buffers when adding XSK pools
because XSK umem will be posted directly to the NIC, a departure from
the bounce buffer model used in GQI QPL. A registry of XSK pools is
introduced to prevent the usage of XSK pools when in copy mode.

v1: https://lore.kernel.org/netdev/20250714160451.124671-1-jeroendb@google.com/
====================

Link: https://patch.msgid.link/20250717152839.973004-1-jeroendb@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 weeks agogve: implement DQO RX datapath and control path for AF_XDP zero-copy
Joshua Washington [Thu, 17 Jul 2025 15:28:39 +0000 (08:28 -0700)] 
gve: implement DQO RX datapath and control path for AF_XDP zero-copy

Add the RX datapath for AF_XDP zero-copy for DQ RDA. The RX path is
quite similar to that of the normal XDP case. Parallel methods are
introduced to properly handle XSKs instead of normal driver buffers.

To properly support posting from XSKs, queues are destroyed and
recreated, as the driver was initially making use of page pool buffers
instead of the XSK pool memory.

Expose support for AF_XDP zero-copy, as the TX and RX datapaths both
exist.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Link: https://patch.msgid.link/20250717152839.973004-6-jeroendb@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 weeks agogve: implement DQO TX datapath for AF_XDP zero-copy
Joshua Washington [Thu, 17 Jul 2025 15:28:38 +0000 (08:28 -0700)] 
gve: implement DQO TX datapath for AF_XDP zero-copy

In the descriptor clean path, a number of changes need to be made to
accommodate out of order completions and double completions.

The XSK stack can only handle completions being processed in order, as a
single counter is incremented in xsk_tx_completed to sigify how many XSK
descriptors have been completed. Because completions can come back out
of order in DQ, a separate queue of XSK descriptors must be maintained.
This queue keeps the pending packets in the order that they were written
so that the descriptors can be counted in xsk_tx_completed in the same
order.

For double completions, a new pending packet state and type are
introduced. The new type, GVE_TX_PENDING_PACKET_DQO_XSK, plays an
anlogous role to pre-existing _SKB and _XDP_FRAME pending packet types
for XSK descriptors. The new state, GVE_PACKET_STATE_XSK_COMPLETE,
represents packets for which no more completions are expected. This
includes packets which have received a packet completion or reinjection
completion, as well as packets whose reinjection completion timer have
timed out. At this point, such packets can be counted as part of
xsk_tx_completed() and freed.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Link: https://patch.msgid.link/20250717152839.973004-5-jeroendb@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 weeks agogve: keep registry of zc xsk pools in netdev_priv
Joshua Washington [Thu, 17 Jul 2025 15:28:37 +0000 (08:28 -0700)] 
gve: keep registry of zc xsk pools in netdev_priv

Relying on xsk_get_pool_from_qid for getting whether zero copy is
enabled on a queue is erroneous, as an XSK pool is registered in
xp_assign_dev whether AF_XDP zero-copy is enabled or not. This becomes
problematic when queues are restarted in copy mode, as all RX queues
with XSKs will register a pool, causing the driver to exercise the
zero-copy codepath.

This patch adds a bitmap to keep track of which queues have zero-copy
enabled.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Link: https://patch.msgid.link/20250717152839.973004-4-jeroendb@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 weeks agogve: merge xdp and xsk registration
Joshua Washington [Thu, 17 Jul 2025 15:28:36 +0000 (08:28 -0700)] 
gve: merge xdp and xsk registration

The existence of both of these xdp_rxq and xsk_rxq is redundant. xdp_rxq
can be used in both the zero-copy mode and the copy mode case. XSK pool
memory model registration is prioritized over normal memory model
registration to ensure that memory model registration happens only once
per queue.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Link: https://patch.msgid.link/20250717152839.973004-3-jeroendb@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 weeks agogve: deduplicate xdp info and xsk pool registration logic
Joshua Washington [Thu, 17 Jul 2025 15:28:35 +0000 (08:28 -0700)] 
gve: deduplicate xdp info and xsk pool registration logic

The XDP registration path currently has a lot of reused logic, leading
changes to the codepaths to be unnecessarily complex. gve_reg_xsk_pool
extracts the logic of registering an XSK pool with a queue into a method
that can be used by both XDP_SETUP_XSK_POOL and gve_reg_xdp_info.
gve_unreg_xdp_info is used to undo XDP info registration in the error
path instead of explicitly unregistering the XDP info, as it is more
complete and idempotent.

This patch will be followed by other changes to the XDP registration
logic, and will simplify those changes due to the use of common methods.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Link: https://patch.msgid.link/20250717152839.973004-2-jeroendb@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 weeks agoMerge branch 'ethtool-rss-support-creating-and-removing-contexts-via-netlink'
Jakub Kicinski [Tue, 22 Jul 2025 01:20:21 +0000 (18:20 -0700)] 
Merge branch 'ethtool-rss-support-creating-and-removing-contexts-via-netlink'

Jakub Kicinski says:

====================
ethtool: rss: support creating and removing contexts via Netlink

This series completes support of RSS configuration via Netlink.
All functionality supported by the IOCTL is now supported by
Netlink. Future series (time allowing) will add:
 - hashing on the flow label, which started this whole thing;
 - pinning the RSS context to a Netlink socket for auto-cleanup.

The first patch is a leftover held back from previous series
to avoid conflicting with Gal's fix.

Next 4 patches refactor existing code to make reusing it for
context creation possible. 2 patches after that add create
and delete commands. Last but not least the test is extended.
====================

Link: https://patch.msgid.link/20250717234343.2328602-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoselftests: drv-net: rss_api: context create and delete tests
Jakub Kicinski [Thu, 17 Jul 2025 23:43:43 +0000 (16:43 -0700)] 
selftests: drv-net: rss_api: context create and delete tests

Add test cases for creating and deleting contexts.

  TAP version 13
  1..12
  ok 1 rss_api.test_rxfh_nl_set_fail
  ok 2 rss_api.test_rxfh_nl_set_indir
  ok 3 rss_api.test_rxfh_nl_set_indir_ctx
  ok 4 rss_api.test_rxfh_indir_ntf
  ok 5 rss_api.test_rxfh_indir_ctx_ntf
  ok 6 rss_api.test_rxfh_nl_set_key
  ok 7 rss_api.test_rxfh_fields
  ok 8 rss_api.test_rxfh_fields_set
  ok 9 rss_api.test_rxfh_fields_set_xfrm # SKIP no input-xfrm supported
  ok 10 rss_api.test_rxfh_fields_ntf
  ok 11 rss_api.test_rss_ctx_add
  ok 12 rss_api.test_rss_ctx_ntf
  # Totals: pass:11 fail:0 xfail:0 xpass:0 skip:1 error:0

Link: https://patch.msgid.link/20250717234343.2328602-9-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoethtool: rss: support removing contexts via Netlink
Jakub Kicinski [Thu, 17 Jul 2025 23:43:42 +0000 (16:43 -0700)] 
ethtool: rss: support removing contexts via Netlink

Implement removing additional RSS contexts via Netlink.
Technically it'd be possible to shoehorn the delete operation
into ethnl_request_ops-compatible handler. The code ends
up longer than open coded version, and I think we'll need
a custom way of sending notifications at some stage (if we
allow tying the context lifetime to the netlink socket, in
the future).

Link: https://patch.msgid.link/20250717234343.2328602-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoethtool: rss: support creating contexts via Netlink
Jakub Kicinski [Thu, 17 Jul 2025 23:43:41 +0000 (16:43 -0700)] 
ethtool: rss: support creating contexts via Netlink

Support creating contexts via Netlink. Setting flow hashing
fields on the new context is not supported at this stage,
it can be added later.

An empty indirection table is not supported. This is a carry
over from the IOCTL interface where empty indirection table
meant delete. We can repurpose empty indirection table in
Netlink but for now to avoid confusion reject it using the
policy.

Support letting user choose the ID for the new context. This was
not possible in IOCTL since the context ID field for the create
action had to be set to the ETH_RXFH_CONTEXT_ALLOC magic value.

Link: https://patch.msgid.link/20250717234343.2328602-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoethtool: move ethtool_rxfh_ctx_alloc() to common code
Jakub Kicinski [Thu, 17 Jul 2025 23:43:40 +0000 (16:43 -0700)] 
ethtool: move ethtool_rxfh_ctx_alloc() to common code

Move ethtool_rxfh_ctx_alloc() to common code, Netlink will need it.

Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250717234343.2328602-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoethtool: rss: factor out populating response from context
Jakub Kicinski [Thu, 17 Jul 2025 23:43:39 +0000 (16:43 -0700)] 
ethtool: rss: factor out populating response from context

Similarly to previous change, factor out populating the response.
We will use this after the context was allocated to send a notification
so this time factor out from the additional context handling, rather
than context 0 handling (for request context didn't exist, for response
it does).

Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250717234343.2328602-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoethtool: rss: factor out allocating memory for response
Jakub Kicinski [Thu, 17 Jul 2025 23:43:38 +0000 (16:43 -0700)] 
ethtool: rss: factor out allocating memory for response

To ease the code reuse for RSS_CREATE we'll want to prepare
struct rss_reply_data for the new context. Unfortunately
we can't depend on the exiting scaffolding because the context
doesn't exist (ctx=NULL) when we start preparing. Factor out
the portion of the context 0 handling responsible for allocation
of request memory, so that we can call it directly.

Link: https://patch.msgid.link/20250717234343.2328602-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoethtool: rejig the RSS notification machinery for more types
Jakub Kicinski [Thu, 17 Jul 2025 23:43:37 +0000 (16:43 -0700)] 
ethtool: rejig the RSS notification machinery for more types

In anticipation for CREATE and DELETE notifications - explicitly
pass the notification type to ethtool_rss_notify(), when calling
from the IOCTL code.

Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250717234343.2328602-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoethtool: assert that drivers with sym hash are consistent for RSS contexts
Jakub Kicinski [Thu, 17 Jul 2025 23:43:36 +0000 (16:43 -0700)] 
ethtool: assert that drivers with sym hash are consistent for RSS contexts

Supporting per-RSS context configuration of hashing fields but
not the hashing algorithm would complicate the code a lot.
We'd need to cross check the config against all RSS contexts.
None of the drivers need this today, so explicitly prevent
new drivers with such skewed capabilities from registering.
If such driver appears it will need to first adjust the checks
in the core.

Link: https://patch.msgid.link/20250717234343.2328602-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoMerge branch 'mptcp-add-tcp_maxseg-sockopt-support'
Jakub Kicinski [Tue, 22 Jul 2025 00:48:36 +0000 (17:48 -0700)] 
Merge branch 'mptcp-add-tcp_maxseg-sockopt-support'

Matthieu Baerts says:

====================
mptcp: add TCP_MAXSEG sockopt support

The TCP_MAXSEG socket option was not supported by MPTCP, mainly because
it has never been requested before. But there are still valid use-cases,
e.g. with HAProxy.

- Patch 1 is a small cleanup patch in the MPTCP sockopt file.

- Patch 2 expose some code from TCP, to avoid duplicating it in MPTCP.

- Patch 3 adds TCP_MAXSEG sockopt support in MPTCP.

- Patch 4 is not related to the others, it fixes a typo in a comment.

Note that the new TCP_MAXSEG sockopt support has been validated by a new
packetdrill script on the MPTCP CI:

  https://github.com/multipath-tcp/packetdrill/pull/161

v1: https://lore.kernel.org/20250716-net-next-mptcp-tcp_maxseg-v1-0-548d3a5666f6@kernel.org
====================

Link: https://patch.msgid.link/20250719-net-next-mptcp-tcp_maxseg-v2-0-8c910fbc5307@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agomptcp: fix typo in a comment
moyuanhao [Fri, 18 Jul 2025 22:06:59 +0000 (00:06 +0200)] 
mptcp: fix typo in a comment

This patch fixes the follow spelling mistake in a comment:

  greter -> greater

Signed-off-by: moyuanhao <moyuanhao3676@163.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250719-net-next-mptcp-tcp_maxseg-v2-4-8c910fbc5307@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agomptcp: add TCP_MAXSEG sockopt support
Geliang Tang [Fri, 18 Jul 2025 22:06:58 +0000 (00:06 +0200)] 
mptcp: add TCP_MAXSEG sockopt support

The TCP_MAXSEG socket option is currently not supported by MPTCP, mainly
because it has never been requested before. But there are still valid
use-cases, e.g. with HAProxy.

This patch adds its support in MPTCP by propagating the value to all
subflows. The get part looks at the value on the first subflow, to be as
closed as possible to TCP. Only one value can be returned for the cached
MSS, so this can come only from one subflow.

Similar to mptcp_setsockopt_first_sf_only(), a generic helper
mptcp_setsockopt_all_subflows() is added to set sockopt for each
subflows of the mptcp socket.

Add a new member for struct mptcp_sock to store the TCP_MAXSEG value,
and return this value in getsockopt.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/515
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250719-net-next-mptcp-tcp_maxseg-v2-3-8c910fbc5307@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agotcp: add tcp_sock_set_maxseg
Geliang Tang [Fri, 18 Jul 2025 22:06:57 +0000 (00:06 +0200)] 
tcp: add tcp_sock_set_maxseg

Add a helper tcp_sock_set_maxseg() to directly set the TCP_MAXSEG
sockopt from kernel space.

This new helper will be used in the following patch from MPTCP.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250719-net-next-mptcp-tcp_maxseg-v2-2-8c910fbc5307@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agomptcp: sockopt: drop redundant tcp_getsockopt
Geliang Tang [Fri, 18 Jul 2025 22:06:56 +0000 (00:06 +0200)] 
mptcp: sockopt: drop redundant tcp_getsockopt

tcp_getsockopt() is called twice in mptcp_getsockopt_first_sf_only() in
different conditions, which makes the code a bit redundant.

The first call to tcp_getsockopt() when the first subflow exists can be
replaced by going to a new label "get" before the second call.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250719-net-next-mptcp-tcp_maxseg-v2-1-8c910fbc5307@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agos390/qeth: Make hw_trap sysfs attribute idempotent
Aswin Karuvally [Fri, 18 Jul 2025 14:17:11 +0000 (16:17 +0200)] 
s390/qeth: Make hw_trap sysfs attribute idempotent

Update qeth driver to allow writing an existing value to the "hw_trap"
sysfs attribute. Attempting such a write earlier resulted in -EINVAL.
In other words, make the sysfs attribute idempotent.

After:
    $ cat hw_trap
    disarm
    $ echo disarm > hw_trap
    $

Suggested-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Aswin Karuvally <aswin@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250718141711.1141049-1-wintera@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: phy: qcom: qca807x: Enable WoL support using shared library
Luo Jie [Fri, 18 Jul 2025 13:57:48 +0000 (21:57 +0800)] 
net: phy: qcom: qca807x: Enable WoL support using shared library

The Wake-on-LAN (WoL) functionality for the QCA807x series is identical
to that of the AT8031. WoL support for QCA807x is enabled by utilizing
the at8031_set_wol() function provided in the shared library.

Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Link: https://patch.msgid.link/20250718-qca807x_wol_support-v1-1-cfe323cbb4e8@quicinc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: usb: smsc95xx: add support for ethtool pause parameters
Oleksij Rempel [Fri, 18 Jul 2025 07:51:56 +0000 (09:51 +0200)] 
net: usb: smsc95xx: add support for ethtool pause parameters

Implement ethtool .get_pauseparam and .set_pauseparam handlers for
configuring flow control on smsc95xx. The driver now supports enabling
or disabling transmit and receive pause frames, with or without
autonegotiation. Pause settings are applied during link-up based on
current PHY state and user configuration.

Previously, the driver used phy_get_pause() during link-up handling,
but lacked initialization and an ethtool interface to configure pause
modes. As a result, flow control support was effectively non-functional.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250718075157.297923-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoMerge branch '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next...
Jakub Kicinski [Tue, 22 Jul 2025 00:18:33 +0000 (17:18 -0700)] 
Merge branch '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2025-07-18 (idpf, ice, igc, igbvf, ixgbevf)

For idpf:
Ahmed and Sudheer add support for flow steering via ntuple filters.
Current support is for IPv4 and TCP/UDP only.

Milena adds support for cross timestamping.

Ahmed preserves coalesce settings across resets.

For ice:
Alex adds reporting of 40GbE speed in devlink port split.

Dawid adds support for E835 devices.

Jesse refactors profile ptype processing for cleaner, more readable,
code.

Dave adds a couple of helper functions for LAG to reduce code
duplication.

For igc:
Siang adds support to configure "Default Queue" during runtime using
ethtool's Network Flow Classification (NFC) wildcard rule approach.

For igbvf:
Yuto Ohnuki removes unused fields from igbvf_adapter.

For ixgbevf:
Yuto Ohnuki removes unused fields from ixgbevf_adapter.

* '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  ixgbevf: remove unused fields from struct ixgbevf_adapter
  igbvf: remove unused fields from struct igbvf_adapter
  igc: Add wildcard rule support to ethtool NFC using Default Queue
  igc: Relocate RSS field definitions to igc_defines.h
  ice: breakout common LAG code into helpers
  ice: convert ice_add_prof() to bitmap
  ice: add E835 device IDs
  ice: add 40G speed to Admin Command GET PORT OPTION
  idpf: preserve coalescing settings across resets
  idpf: add cross timestamping
  idpf: add flow steering support
  virtchnl2: add flow steering support
  virtchnl2: rename enum virtchnl2_cap_rss
====================

Link: https://patch.msgid.link/20250718185118.2042772-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: usb: cdc-ncm: check for filtering capability
Oliver Neukum [Thu, 17 Jul 2025 12:06:17 +0000 (14:06 +0200)] 
net: usb: cdc-ncm: check for filtering capability

If the decice does not support filtering, filtering
must not be used and all packets delivered for the
upper layers to sort.

Signed-off-by: Oliver Neukum <oneukum@suse.com>
Link: https://patch.msgid.link/20250717120649.2090929-1-oneukum@suse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: stmmac: dwmac-renesas-gbeth: Add PM suspend/resume callbacks
Biju Das [Thu, 17 Jul 2025 07:11:06 +0000 (08:11 +0100)] 
net: stmmac: dwmac-renesas-gbeth: Add PM suspend/resume callbacks

Add PM suspend/resume callbacks for RZ/G3E SMARC EVK.

The PM deep entry is executed by pressing the SLEEP button and exit from
entry is by pressing the power button.

Logs:
root@smarc-rzg3e:~# PM: suspend entry (deep)
Filesystems sync: 0.115 seconds
Freezing user space processes
Freezing user space processes completed (elapsed 0.002 seconds)
OOM killer disabled.
Freezing remaining freezable tasks
Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
printk: Suspending console(s) (use no_console_suspend to debug)
NOTICE:  BL2: v2.10.5(release):2.10.5/rz_soc_dev-162-g7148ba838
NOTICE:  BL2: Built : 14:23:58, Jul  5 2025
NOTICE:  BL2: SYS_LSI_MODE: 0x13e06
NOTICE:  BL2: SYS_LSI_DEVID: 0x8679447
NOTICE:  BL2: SYS_LSI_PRR: 0x0
NOTICE:  BL2: Booting BL31
renesas-gbeth 15c30000.ethernet end0: Link is Down
Disabling non-boot CPUs ...
psci: CPU3 killed (polled 0 ms)
psci: CPU2 killed (polled 0 ms)
psci: CPU1 killed (polled 0 ms)
Enabling non-boot CPUs ...
Detected VIPT I-cache on CPU1
GICv3: CPU1: found redistributor 100 region 0:0x0000000014960000
CPU1: Booted secondary processor 0x0000000100 [0x412fd050]
CPU1 is up
Detected VIPT I-cache on CPU2
GICv3: CPU2: found redistributor 200 region 0:0x0000000014980000
CPU2: Booted secondary processor 0x0000000200 [0x412fd050]
CPU2 is up
Detected VIPT I-cache on CPU3
GICv3: CPU3: found redistributor 300 region 0:0x00000000149a0000
CPU3: Booted secondary processor 0x0000000300 [0x412fd050]
CPU3 is up
dwmac4: Master AXI performs fixed burst length
15c30000.ethernet end0: No Safety Features support found
15c30000.ethernet end0: IEEE 1588-2008 Advanced Timestamp supported
15c30000.ethernet end0: configuring for phy/rgmii-id link mode
dwmac4: Master AXI performs fixed burst length
15c40000.ethernet end1: No Safety Features support found
15c40000.ethernet end1: IEEE 1588-2008 Advanced Timestamp supported
15c40000.ethernet end1: configuring for phy/rgmii-id link mode
OOM killer enabled.
Restarting tasks: Starting
Restarting tasks: Done
random: crng reseeded on system resumption
PM: suspend exit

15c30000.ethernet end0: Link is Up - 1Gbps/Full - flow control rx/tx
root@smarc-rzg3e:~# ifconfig end0 192.168.10.7 up
root@smarc-rzg3e:~# ping 192.168.10.1
PING 192.168.10.1 (192.168.10.1) 56(84) bytes of data.
64 bytes from 192.168.10.1: icmp_seq=1 ttl=64 time=2.05 ms
64 bytes from 192.168.10.1: icmp_seq=2 ttl=64 time=0.928 ms

Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Link: https://patch.msgid.link/20250717071109.8213-1-biju.das.jz@bp.renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoMerge branch 'amd-xgbe-add-hardware-ptp-timestamping'
Jakub Kicinski [Mon, 21 Jul 2025 23:40:23 +0000 (16:40 -0700)] 
Merge branch 'amd-xgbe-add-hardware-ptp-timestamping'

Raju Rangoju says:

====================
amd-xgbe: add hardware PTP timestamping

Remove the hwptp abstraction and associated callbacks from the
struct xgbe_hw_if {} and move them to separate file after cleanup.

Adds complete support for hardware-based PTP (IEEE 1588)
timestamping to the AMD XGBE driver.
====================

Link: https://patch.msgid.link/20250718185628.4038779-1-Raju.Rangoju@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoamd-xgbe: add hardware PTP timestamping support
Raju Rangoju [Fri, 18 Jul 2025 18:56:28 +0000 (00:26 +0530)] 
amd-xgbe: add hardware PTP timestamping support

Adds complete support for hardware-based PTP (IEEE 1588)
timestamping to the AMD XGBE driver.

- Initialize and configure the MAC PTP registers based on link
  speed and reference clock.
- Support both 50MHz and 125MHz PTP reference clocks.
- Update the driver interface and version data to support PTP
  clock frequency selection.

Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Link: https://patch.msgid.link/20250718185628.4038779-3-Raju.Rangoju@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoand-xgbe: remove the abstraction for hwptp
Raju Rangoju [Fri, 18 Jul 2025 18:56:27 +0000 (00:26 +0530)] 
and-xgbe: remove the abstraction for hwptp

Remove the hwptp abstraction and associated callbacks from
the struct xgbe_hw_if {}.

The callback structure was only ever assigned a single function, without
null checks. This cleanup inlines the logic and moves all the hwtstamp
realted code a separate file, improving readability and maintainance.

Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Link: https://patch.msgid.link/20250718185628.4038779-2-Raju.Rangoju@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoselftests: tc: Add generic erspan_opts matching support for tc-flower
Li Shuang [Fri, 18 Jul 2025 14:16:12 +0000 (22:16 +0800)] 
selftests: tc: Add generic erspan_opts matching support for tc-flower

Add test cases to tc_flower.sh to validate generic matching on ERSPAN
options. Both ERSPAN Type II and Type III are covered.

Also add check_tc_erspan_support() to verify whether tc supports
erspan_opts.

Signed-off-by: Li Shuang <shuali@redhat.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/1f354a1afd60f29bbbf02bd60cb52ecfc0b6bd17.1752848172.git.shuali@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: usb: Remove duplicate assignments for net->pcpu_stat_type
Zqiang [Wed, 16 Jul 2025 00:15:24 +0000 (08:15 +0800)] 
net: usb: Remove duplicate assignments for net->pcpu_stat_type

This commit remove duplicate assignments for net->pcpu_stat_type
in usbnet_probe().

Signed-off-by: Zqiang <qiang.zhang@linux.dev>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 weeks agonet/mlx5: Expose cable_length field in PFCC register
Oren Sidi [Thu, 17 Jul 2025 06:48:15 +0000 (09:48 +0300)] 
net/mlx5: Expose cable_length field in PFCC register

Introduce new "cable_length" field in PFCC register and related fields
to enhance rx buffer configuration management:
1. cable_length: Shifts cable length handling to fw by storing a
   manually entered length from user in PFCC.cable_length
2. lane_rate_oper: In a case where PFCC.cable_length is not supported,
   helps compute a default cable length

Signed-off-by: Oren Sidi <osidi@nvidia.com>
Reviewed-by: Alex Lazar <alazar@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1752734895-257735-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2 weeks agonet/mlx5: Add IFC bits and enums for buf_ownership
Oren Sidi [Thu, 17 Jul 2025 06:48:14 +0000 (09:48 +0300)] 
net/mlx5: Add IFC bits and enums for buf_ownership

Extend structure layouts and defines buf_ownership.
buf_ownership indicates whether the buffer is managed by SW or FW.

Signed-off-by: Oren Sidi <osidi@nvidia.com>
Reviewed-by: Alex Lazar <alazar@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1752734895-257735-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2 weeks agonet/mlx5: Add IFC bits to support RSS for IPSec offload
Jianbo Liu [Thu, 17 Jul 2025 06:48:13 +0000 (09:48 +0300)] 
net/mlx5: Add IFC bits to support RSS for IPSec offload

This adds the capabilities, ipsec_next_header and inner/outer
l4_type_ext fields to support RSS for the decrypted packets.

These fields are specifically for firmware steering. HWS validation
logic is updated to correctly handle the changes, ensuring the
unsupported fields are not set.

Besides, reserved_at_c4 is fixed to reserved_at_d4 to reflect the
accurate offset within the structure.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1752734895-257735-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
3 weeks agobe2net: Use correct byte order and format string for TCP seq and ack_seq
Alok Tiwari [Thu, 17 Jul 2025 19:35:47 +0000 (12:35 -0700)] 
be2net: Use correct byte order and format string for TCP seq and ack_seq

The TCP header fields seq and ack_seq are 32-bit values in network
byte order as (__be32). these fields were earlier printed using
ntohs(), which converts only 16-bit values and produces incorrect
results for 32-bit fields. This patch is changeing the conversion
to ntohl(), ensuring correct interpretation of these sequence numbers.

Notably, the format specifier is updated from %d to %u to reflect the
unsigned nature of these fields.

improves the accuracy of debug log messages for TCP sequence and
acknowledgment numbers during TX timeouts.

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250717193552.3648791-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: bcmasp: Add support for re-starting auto-negotiation
Florian Fainelli [Thu, 17 Jul 2025 18:09:15 +0000 (11:09 -0700)] 
net: bcmasp: Add support for re-starting auto-negotiation

Wire-up ethtool_ops::nway_reset to phy_ethtool_nway_reset in order to
support re-starting auto-negotiation.

Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250717180915.2611890-1-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agoMerge branch 'net-maintain-netif-vs-dev-prefix-semantics'
Jakub Kicinski [Sat, 19 Jul 2025 00:27:51 +0000 (17:27 -0700)] 
Merge branch 'net-maintain-netif-vs-dev-prefix-semantics'

Stanislav Fomichev says:

====================
net: maintain netif vs dev prefix semantics

Commit cc34acd577f1 ("docs: net: document new locking reality")
introduced netif_ vs dev_ function semantics: the former expects locked
netdev, the latter takes care of the locking. We don't strictly
follow this semantics on either side, but there are more dev_xxx handlers
now that don't fit. Rename them to netif_xxx where appropriate. We care only
about driver-visible APIs, don't touch stack-internal routines.

The rest seem to be ok:
  * dev_xdp_prog_count - mostly called by sw drivers (bonding), should not matter
  * dev_get_by_xxx - too many to reasonably cleanup, already have different flavors
  * dev_fetch_sw_netstats - don't need instance lock
  * dev_get_tstats64 - never called directly, only as an ndo callback
  * dev_pick_tx_zero - never called directly, only as an ndo callback
  * dev_add_pack / dev_remove_pack - called early enough (in module init) to not matter
  * dev_get_iflink - mostly called by sw drivers, should not matter
  * dev_fill_forward_path - ditto
  * dev_getbyhwaddr_rcu - ditto
  * dev_getbyhwaddr - ditto
  * dev_getfirstbyhwtype - ditto
  * dev_valid_name - ditto
  * __dev_forward_skb dev_forward_skb dev_queue_xmit_nit - established helpers, no netif vs dev distinction
====================

Link: https://patch.msgid.link/20250717172333.1288349-1-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: s/dev_close_many/netif_close_many/
Stanislav Fomichev [Thu, 17 Jul 2025 17:23:33 +0000 (10:23 -0700)] 
net: s/dev_close_many/netif_close_many/

Commit cc34acd577f1 ("docs: net: document new locking reality")
introduced netif_ vs dev_ function semantics: the former expects locked
netdev, the latter takes care of the locking. We don't strictly
follow this semantics on either side, but there are more dev_xxx handlers
now that don't fit. Rename them to netif_xxx where appropriate.

netif_close_many is used only by vlan/dsa and one mtk driver, so move it into
NETDEV_INTERNAL namespace.

Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250717172333.1288349-8-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: s/dev_set_threaded/netif_set_threaded/
Stanislav Fomichev [Thu, 17 Jul 2025 17:23:32 +0000 (10:23 -0700)] 
net: s/dev_set_threaded/netif_set_threaded/

Commit cc34acd577f1 ("docs: net: document new locking reality")
introduced netif_ vs dev_ function semantics: the former expects locked
netdev, the latter takes care of the locking. We don't strictly
follow this semantics on either side, but there are more dev_xxx handlers
now that don't fit. Rename them to netif_xxx where appropriate.

Note that one dev_set_threaded call still remains in mt76 for debugfs file.

Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250717172333.1288349-7-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: s/dev_get_flags/netif_get_flags/
Stanislav Fomichev [Thu, 17 Jul 2025 17:23:31 +0000 (10:23 -0700)] 
net: s/dev_get_flags/netif_get_flags/

Commit cc34acd577f1 ("docs: net: document new locking reality")
introduced netif_ vs dev_ function semantics: the former expects locked
netdev, the latter takes care of the locking. We don't strictly
follow this semantics on either side, but there are more dev_xxx handlers
now that don't fit. Rename them to netif_xxx where appropriate.

Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250717172333.1288349-6-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: s/__dev_set_mtu/__netif_set_mtu/
Stanislav Fomichev [Thu, 17 Jul 2025 17:23:30 +0000 (10:23 -0700)] 
net: s/__dev_set_mtu/__netif_set_mtu/

Commit cc34acd577f1 ("docs: net: document new locking reality")
introduced netif_ vs dev_ function semantics: the former expects locked
netdev, the latter takes care of the locking. We don't strictly
follow this semantics on either side, but there are more dev_xxx handlers
now that don't fit. Rename them to netif_xxx where appropriate.

__netif_set_mtu is used only by bond, so move it into
NETDEV_INTERNAL namespace.

Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250717172333.1288349-5-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: s/dev_pre_changeaddr_notify/netif_pre_changeaddr_notify/
Stanislav Fomichev [Thu, 17 Jul 2025 17:23:29 +0000 (10:23 -0700)] 
net: s/dev_pre_changeaddr_notify/netif_pre_changeaddr_notify/

Commit cc34acd577f1 ("docs: net: document new locking reality")
introduced netif_ vs dev_ function semantics: the former expects locked
netdev, the latter takes care of the locking. We don't strictly
follow this semantics on either side, but there are more dev_xxx handlers
now that don't fit. Rename them to netif_xxx where appropriate.

netif_pre_changeaddr_notify is used only by ipvlan/bond, so move it into
NETDEV_INTERNAL namespace.

Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250717172333.1288349-4-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: s/dev_get_mac_address/netif_get_mac_address/
Stanislav Fomichev [Thu, 17 Jul 2025 17:23:28 +0000 (10:23 -0700)] 
net: s/dev_get_mac_address/netif_get_mac_address/

Commit cc34acd577f1 ("docs: net: document new locking reality")
introduced netif_ vs dev_ function semantics: the former expects locked
netdev, the latter takes care of the locking. We don't strictly
follow this semantics on either side, but there are more dev_xxx handlers
now that don't fit. Rename them to netif_xxx where appropriate.

netif_get_mac_address is used only by tun/tap, so move it into
NETDEV_INTERNAL namespace.

Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250717172333.1288349-3-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: s/dev_get_port_parent_id/netif_get_port_parent_id/
Stanislav Fomichev [Thu, 17 Jul 2025 17:23:27 +0000 (10:23 -0700)] 
net: s/dev_get_port_parent_id/netif_get_port_parent_id/

Commit cc34acd577f1 ("docs: net: document new locking reality")
introduced netif_ vs dev_ function semantics: the former expects locked
netdev, the latter takes care of the locking. We don't strictly
follow this semantics on either side, but there are more dev_xxx handlers
now that don't fit. Rename them to netif_xxx where appropriate.

Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250717172333.1288349-2-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agoselftests: rtnetlink: Add operational state test
Ido Schimmel [Thu, 17 Jul 2025 12:51:51 +0000 (15:51 +0300)] 
selftests: rtnetlink: Add operational state test

Virtual devices (e.g., VXLAN) that do not have a notion of a carrier are
created with an "UNKNOWN" operational state which some users find
confusing [1].

It is possible to set the operational state from user space either
during device creation or afterwards and some applications will start
doing that in order to avoid the above problem.

Add a test for this functionality to ensure it does not regress.

[1] https://lore.kernel.org/netdev/20241119153703.71f97b76@hermes.local/

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250717125151.466882-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: selftests: add PHY-loopback test for bad TCP checksums
Oleksij Rempel [Thu, 17 Jul 2025 08:35:24 +0000 (10:35 +0200)] 
net: selftests: add PHY-loopback test for bad TCP checksums

Detect NICs and drivers that either drop frames with a corrupted TCP
checksum or, worse, pass them up as valid.  The test flips one bit in
the checksum, transmits the packet in internal loopback, and fails when
the driver reports CHECKSUM_UNNECESSARY.

Discussed at:
https://lore.kernel.org/all/20250625132117.1b3264e8@kernel.org/

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250717083524.1645069-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: track pfmemalloc drops via SKB_DROP_REASON_PFMEMALLOC
Jesper Dangaard Brouer [Wed, 16 Jul 2025 16:26:53 +0000 (18:26 +0200)] 
net: track pfmemalloc drops via SKB_DROP_REASON_PFMEMALLOC

Add a new SKB drop reason (SKB_DROP_REASON_PFMEMALLOC) to track packets
dropped due to memory pressure. In production environments, we've observed
memory exhaustion reported by memory layer stack traces, but these drops
were not properly tracked in the SKB drop reason infrastructure.

While most network code paths now properly report pfmemalloc drops, some
protocol-specific socket implementations still use sk_filter() without
drop reason tracking:
- Bluetooth L2CAP sockets
- CAIF sockets
- IUCV sockets
- Netlink sockets
- SCTP sockets
- Unix domain sockets

These remaining cases represent less common paths and could be converted
in a follow-up patch if needed. The current implementation provides
significantly improved observability into memory pressure events in the
network stack, especially for key protocols like TCP and UDP, helping to
diagnose problems in production environments.

Reported-by: Matt Fleming <mfleming@cloudflare.com>
Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://patch.msgid.link/175268316579.2407873.11634752355644843509.stgit@firesoul
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: stream: add description for sk_stream_write_space()
Suchit Karunakaran [Wed, 16 Jul 2025 15:34:04 +0000 (21:04 +0530)] 
net: stream: add description for sk_stream_write_space()

Add a proper description for the sk_stream_write_space() function as
previously marked by a FIXME comment.
No functional changes.

Signed-off-by: Suchit Karunakaran <suchitkarunakaran@gmail.com>
Link: https://patch.msgid.link/20250716153404.7385-1-suchitkarunakaran@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agoixgbevf: remove unused fields from struct ixgbevf_adapter
Yuto Ohnuki [Thu, 17 Jul 2025 08:46:09 +0000 (09:46 +0100)] 
ixgbevf: remove unused fields from struct ixgbevf_adapter

Remove hw_rx_no_dma_resources and eitr_param fields from struct
ixgbevf_adapter since these fields are never referenced in the driver.

Note that the interrupt throttle rate is controlled by the
rx_itr_setting and tx_itr_setting variables.

This change simplifies the ixgbevf driver by removing unused fields,
which improves maintainability.

Signed-off-by: Yuto Ohnuki <ytohnuki@amazon.com>
Reviewed-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 weeks agoigbvf: remove unused fields from struct igbvf_adapter
Yuto Ohnuki [Mon, 7 Jul 2025 18:01:17 +0000 (19:01 +0100)] 
igbvf: remove unused fields from struct igbvf_adapter

Remove following unused fields from struct igbvf_adapter that are never
referenced in the driver.

- blink_timer
- eeprom_wol
- fc_autoneg
- int_mode
- led_status
- mng_vlan_id
- polling_interval
- rx_dma_failed
- test_icr
- test_rx_ring
- test_tx_ring
- tx_dma_failed
- tx_fifo_head
- tx_fifo_size
- tx_head_addr

Also removed the following fields from struct igbvf_adapter since they
are never read or used after initialization by igbvf_probe() and igbvf_sw_init().

- bd_number
- rx_abs_int_delay
- tx_abs_int_delay
- rx_int_delay
- tx_int_delay

This changes simplify the igbvf driver by removing unused fields, which
improves maintenability.

Tested-by: Kohei Enju <enjuk@amazon.com>
Signed-off-by: Yuto Ohnuki <ytohnuki@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 weeks agoigc: Add wildcard rule support to ethtool NFC using Default Queue
Song Yoong Siang [Fri, 20 Jun 2025 10:02:51 +0000 (18:02 +0800)] 
igc: Add wildcard rule support to ethtool NFC using Default Queue

Introduce support for a lowest priority wildcard (catch-all) rule in
ethtool's Network Flow Classification (NFC) for the igc driver. The
wildcard rule directs all unmatched network traffic, including traffic not
captured by Receive Side Scaling (RSS), to a specified queue. This
functionality utilizes the Default Queue feature available in I225/I226
hardware.

The implementation has been validated on Intel ADL-S systems with two
back-to-back connected I226 network interfaces.

Testing Procedure:
1. On the Device Under Test (DUT), verify the initial statistic:
   $ ethtool -S enp1s0 | grep rx_q.*packets
        rx_queue_0_packets: 0
        rx_queue_1_packets: 0
        rx_queue_2_packets: 0
        rx_queue_3_packets: 0

2. From the Link Partner, send 10 ARP packets:
   $ arping -c 10 -I enp170s0 169.254.1.2

3. On the DUT, verify the packet reception on Queue 0:
   $ ethtool -S enp1s0 | grep rx_q.*packets
        rx_queue_0_packets: 10
        rx_queue_1_packets: 0
        rx_queue_2_packets: 0
        rx_queue_3_packets: 0

4. On the DUT, add a wildcard rule to route all packets to Queue 3:
   $ sudo ethtool -N enp1s0 flow-type ether queue 3

5. From the Link Partner, send another 10 ARP packets:
   $ arping -c 10 -I enp170s0 169.254.1.2

6. Now, packets are routed to Queue 3 by the wildcard (Default Queue) rule:
   $ ethtool -S enp1s0 | grep rx_q.*packets
        rx_queue_0_packets: 10
        rx_queue_1_packets: 0
        rx_queue_2_packets: 0
        rx_queue_3_packets: 10

7. On the DUT, add a EtherType rule to route ARP packet to Queue 1:
   $ sudo ethtool -N enp1s0 flow-type ether proto 0x0806 queue 1

8. From the Link Partner, send another 10 ARP packets:
   $ arping -c 10 -I enp170s0 169.254.1.2

9. Now, packets are routed to Queue 1 by the EtherType rule because it is
   higher priority than the wildcard (Default Queue) rule:
   $ ethtool -S enp1s0 | grep rx_q.*packets
        rx_queue_0_packets: 10
        rx_queue_1_packets: 10
        rx_queue_2_packets: 0
        rx_queue_3_packets: 10

10. On the DUT, delete all the NFC rules:
    $ sudo ethtool -N enp1s0 delete 63
    $ sudo ethtool -N enp1s0 delete 64

11. From the Link Partner, send another 10 ARP packets:
    $ arping -c 10 -I enp170s0 169.254.1.2

12. Now, packets are routed to Queue 0 because the value of Default Queue
    is reset back to 0:
    $ ethtool -S enp1s0 | grep rx_q.*packets
         rx_queue_0_packets: 20
         rx_queue_1_packets: 10
         rx_queue_2_packets: 0
         rx_queue_3_packets: 10

Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Co-developed-by: Blanco Alcaine Hector <hector.blanco.alcaine@intel.com>
Signed-off-by: Blanco Alcaine Hector <hector.blanco.alcaine@intel.com>
Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 weeks agoigc: Relocate RSS field definitions to igc_defines.h
Song Yoong Siang [Fri, 20 Jun 2025 10:02:50 +0000 (18:02 +0800)] 
igc: Relocate RSS field definitions to igc_defines.h

Move the RSS field definitions related to IPv4 and IPv6 UDP from igc.h to
igc_defines.h to consolidate the RSS field definitions in a single header
file, improving code organization and maintainability.

This refactoring does not alter the functionality of the driver but
enhances the logical grouping of related constants

Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 weeks agoice: breakout common LAG code into helpers
Dave Ertman [Mon, 16 Jun 2025 11:03:22 +0000 (13:03 +0200)] 
ice: breakout common LAG code into helpers

In the VF handling code, parts of the code for lag can be broken out into
helper functions to reduce code duplication. Break this code out into
helper functions

Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 weeks agoice: convert ice_add_prof() to bitmap
Jesse Brandeburg [Wed, 18 Jun 2025 11:28:53 +0000 (13:28 +0200)] 
ice: convert ice_add_prof() to bitmap

Previously the ice_add_prof() took an array of u8 and looped over it with
for_each_set_bit(), examining each 8 bit value as a bitmap.
This was just hard to understand and unnecessary, and was triggering
undefined behavior sanitizers with unaligned accesses within bitmap
fields (on our internal tools/builds). Since the @ptype being passed in
was already declared as a bitmap, refactor this to use native types with
the advantage of simplifying the code to use a single loop.

Co-developed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
CC: Jesse Brandeburg <jbrandeburg@cloudflare.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 weeks agoice: add E835 device IDs
Dawid Osuchowski [Tue, 20 May 2025 09:30:59 +0000 (11:30 +0200)] 
ice: add E835 device IDs

E835 is an enhanced version of the E830.
It continues to use the same set of commands, registers and interfaces
as other devices in the 800 Series.

Following device IDs are added:
- 0x1248: Intel(R) Ethernet Controller E835-CC for backplane
- 0x1249: Intel(R) Ethernet Controller E835-CC for QSFP
- 0x124A: Intel(R) Ethernet Controller E835-CC for SFP
- 0x1261: Intel(R) Ethernet Controller E835-C for backplane
- 0x1262: Intel(R) Ethernet Controller E835-C for QSFP
- 0x1263: Intel(R) Ethernet Controller E835-C for SFP
- 0x1265: Intel(R) Ethernet Controller E835-L for backplane
- 0x1266: Intel(R) Ethernet Controller E835-L for QSFP
- 0x1267: Intel(R) Ethernet Controller E835-L for SFP

Reviewed-by: Konrad Knitter <konrad.knitter@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 weeks agoice: add 40G speed to Admin Command GET PORT OPTION
Aleksandr Loktionov [Fri, 16 May 2025 14:42:14 +0000 (14:42 +0000)] 
ice: add 40G speed to Admin Command GET PORT OPTION

Introduce the ICE_AQC_PORT_OPT_MAX_LANE_40G constant and update the code
to process this new option in both the devlink and the Admin Queue Command
GET PORT OPTION (opcode 0x06EA) message, similar to existing constants like
ICE_AQC_PORT_OPT_MAX_LANE_50G, ICE_AQC_PORT_OPT_MAX_LANE_100G, and so on.

This feature allows the driver to correctly report configuration options
for 2x40G on E823 and other cards in the future via devlink.

Example command:
 devlink port split pci/0000:01:00.0/0 count 2

Example dmesg:
 ice 0000:01:00.0: Available port split options and max port speeds (Gbps):
 ice 0000:01:00.0: Status  Split      Quad 0          Quad 1
 ice 0000:01:00.0:         count  L0  L1  L2  L3  L4  L5  L6  L7
 ice 0000:01:00.0:         2      40   -   -   -  40   -   -   -
 ice 0000:01:00.0:         2      50   -  50   -   -   -   -   -
 ice 0000:01:00.0:         4      25  25  25  25   -   -   -   -
 ice 0000:01:00.0:         4      25  25   -   -  25  25   -   -
 ice 0000:01:00.0: Active  8      10  10  10  10  10  10  10  10
 ice 0000:01:00.0:         1     100   -   -   -   -   -   -   -

Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 weeks agoidpf: preserve coalescing settings across resets
Ahmed Zaki [Fri, 20 Jun 2025 17:15:48 +0000 (11:15 -0600)] 
idpf: preserve coalescing settings across resets

The IRQ coalescing config currently reside only inside struct
idpf_q_vector. However, all idpf_q_vector structs are de-allocated and
re-allocated during resets. This leads to user-set coalesce configuration
to be lost.

Add new fields to struct idpf_vport_user_config_data to save the user
settings and re-apply them after reset.

Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 weeks agoidpf: add cross timestamping
Milena Olech [Wed, 18 Jun 2025 14:42:36 +0000 (16:42 +0200)] 
idpf: add cross timestamping

Add cross timestamp support through virtchnl mailbox messages and directly,
through PCIe BAR registers. Cross timestamping assumes that both system
time and device clock time values are cached simultaneously, what is
triggered by HW. Feature is enabled for both ARM and x86 archs.

Signed-off-by: Milena Olech <milena.olech@intel.com>
Reviewed-by: Karol Kolacinski <karol.kolacinski@intel.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 weeks agoidpf: add flow steering support
Ahmed Zaki [Wed, 23 Apr 2025 19:27:05 +0000 (13:27 -0600)] 
idpf: add flow steering support

Use the new virtchnl2 OP codes to communicate with the Control Plane to
add flow steering filters. We add the basic functionality for add/delete
with TCP/UDP IPv4 only. Support for other OP codes and protocols will be
added later.

Standard 'ethtool -N|--config-ntuple' should be used, for example:

    # ethtool -N ens801f0d1 flow-type tcp4 src-ip 10.0.0.1 action 6

to route all IPv4/TCP traffic from IP 10.0.0.1 to queue 6.

Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 weeks agovirtchnl2: add flow steering support
Sudheer Mogilappagari [Wed, 23 Apr 2025 19:27:04 +0000 (13:27 -0600)] 
virtchnl2: add flow steering support

Add opcodes and corresponding message structure to add and delete
flow steering rules. Flow steering enables configuration
of rules to take an action or subset of actions based on a match
criteria. Actions could be redirect to queue, redirect to queue
group, drop packet or mark.

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Co-developed-by: Dinesh Kumar <dinesh.kumar@intel.com>
Signed-off-by: Dinesh Kumar <dinesh.kumar@intel.com>
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 weeks agovirtchnl2: rename enum virtchnl2_cap_rss
Ahmed Zaki [Wed, 23 Apr 2025 19:27:03 +0000 (13:27 -0600)] 
virtchnl2: rename enum virtchnl2_cap_rss

The "enum virtchnl2_cap_rss" will be used for negotiating flow
steering capabilities. Instead of adding a new enum, rename
virtchnl2_cap_rss to virtchnl2_flow_types. Also rename the enum's
constants.

Flow steering will use this enum in the next patches.

Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 weeks agoet131x: Add missing check after DMA map
Thomas Fourier [Wed, 16 Jul 2025 09:47:30 +0000 (11:47 +0200)] 
et131x: Add missing check after DMA map

The DMA map functions can fail and should be tested for errors.
If the mapping fails, unmap and return an error.

Signed-off-by: Thomas Fourier <fourier.thomas@gmail.com>
Acked-by: Mark Einon <mark.einon@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250716094733.28734-2-fourier.thomas@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet: ag71xx: Add missing check after DMA map
Thomas Fourier [Wed, 16 Jul 2025 09:57:25 +0000 (11:57 +0200)] 
net: ag71xx: Add missing check after DMA map

The DMA map functions can fail and should be tested for errors.

Signed-off-by: Thomas Fourier <fourier.thomas@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250716095733.37452-3-fourier.thomas@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agoselftests/drivers/net: Support ipv6 for napi_id test
Tianyi Cui [Thu, 17 Jul 2025 01:19:13 +0000 (18:19 -0700)] 
selftests/drivers/net: Support ipv6 for napi_id test

Add support for IPv6 environment for napi_id test.

Test Plan:

    ./run_kselftest.sh -t drivers/net:napi_id.py
    TAP version 13
    1..1
    # timeout set to 45
    # selftests: drivers/net: napi_id.py
    # TAP version 13
    # 1..1
    # ok 1 napi_id.test_napi_id
    # # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
    ok 1 selftests: drivers/net: napi_id.py

Signed-off-by: Tianyi Cui <1997cui@gmail.com>
Link: https://patch.msgid.link/20250717011913.1248816-1-1997cui@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agoibmvnic: Use ndo_get_stats64 to fix inaccurate SAR reporting
Mingming Cao [Wed, 16 Jul 2025 15:21:15 +0000 (11:21 -0400)] 
ibmvnic: Use ndo_get_stats64 to fix inaccurate SAR reporting

VNIC testing on multi-core Power systems showed SAR stats drift
and packet rate inconsistencies under load.

Implements ndo_get_stats64 to provide safe aggregation of queue-level
atomic64 counters into rtnl_link_stats64 for use by tools like 'ip -s',
'ifconfig', and 'sar'. Switch to ndo_get_stats64 to align SAR reporting
with the standard kernel interface for retrieving netdev stats.

This removes redundant per-adapter stat updates, reduces overhead,
eliminates cacheline bouncing from hot path updates, and improves
the accuracy of reported packet rates.

Signed-off-by: Mingming Cao <mmc@linux.ibm.com>
Reviewed-by: Brian King <bjking1@linux.ibm.com>
Reviewed-by: Dave Marquardt <davemarq@linux.ibm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
----
Changes since v3:
link to v3: https://www.spinics.net/lists/netdev/msg1107999.html
-- keep per queue counters as u64 (this patch) and drop off patch 1 in v3

Link: https://patch.msgid.link/20250716152115.61143-1-mmc@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agoMerge branch 'net-mlx5-misc-changes-2025-07-16'
Jakub Kicinski [Fri, 18 Jul 2025 01:53:11 +0000 (18:53 -0700)] 
Merge branch 'net-mlx5-misc-changes-2025-07-16'

Tariq Toukan says:

====================
net/mlx5: misc changes 2025-07-16

This series contains misc enhancements to the mlx5 driver.

v1: https://lore.kernel.org/1752471585-18053-1-git-send-email-tariqt@nvidia.com
====================

Link: https://patch.msgid.link/1752675472-201445-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet/mlx5e: Properly access RCU protected qdisc_sleeping variable
Leon Romanovsky [Wed, 16 Jul 2025 14:17:49 +0000 (17:17 +0300)] 
net/mlx5e: Properly access RCU protected qdisc_sleeping variable

qdisc_sleeping variable is declared as "struct Qdisc __rcu" and
as such needs proper annotation while accessing it.

Without rtnl_dereference(), the following error is generated by sparse:

drivers/net/ethernet/mellanox/mlx5/core/en/qos.c:377:40: warning:
  incorrect type in initializer (different address spaces)
drivers/net/ethernet/mellanox/mlx5/core/en/qos.c:377:40:    expected
  struct Qdisc *qdisc
drivers/net/ethernet/mellanox/mlx5/core/en/qos.c:377:40:    got struct
  Qdisc [noderef] __rcu *qdisc_sleeping

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/1752675472-201445-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet/mlx5e: fix kdoc warning on eswitch.h
Moshe Shemesh [Wed, 16 Jul 2025 14:17:48 +0000 (17:17 +0300)] 
net/mlx5e: fix kdoc warning on eswitch.h

Fix the following kdoc warning:
git ls-files *.[ch] | egrep drivers/net/ethernet/mellanox/mlx5/core/ |\
xargs scripts/kernel-doc --none
drivers/net/ethernet/mellanox/mlx5/core/eswitch.h:824: warning: cannot
understand function prototype: 'struct mlx5_esw_event_info '

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/1752675472-201445-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 weeks agonet/mlx5: HWS, Enable IPSec hardware offload in legacy mode
Lama Kayal [Wed, 16 Jul 2025 14:17:47 +0000 (17:17 +0300)] 
net/mlx5: HWS, Enable IPSec hardware offload in legacy mode

IPSec hardware offload in legacy mode should not be affected by the
steering mode, hence it should also work properly with hmfs mode.

Remove steering mode validation when calculating the cap for packet
offload, this will also enable the missing cap MLX5_IPSEC_CAP_PRIO
needed for crypto offload.

Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/1752675472-201445-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>