]> git.ipfire.org Git - thirdparty/kernel/linux.git/log
thirdparty/kernel/linux.git
4 weeks agonet: stmmac: ingenic: prep PHY_INTF_SEL_x field after switch()
Russell King (Oracle) [Fri, 7 Nov 2025 08:28:55 +0000 (08:28 +0000)] 
net: stmmac: ingenic: prep PHY_INTF_SEL_x field after switch()

Move the preparation of the PHY_INTF_SEL_x bitfield out of the switch()
statement such that it only appears once.

Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vHHq3-0000000DjrD-1u8O@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks agonet: stmmac: ingenic: use PHY_INTF_SEL_x directly
Russell King (Oracle) [Fri, 7 Nov 2025 08:28:50 +0000 (08:28 +0000)] 
net: stmmac: ingenic: use PHY_INTF_SEL_x directly

Use the PHY_INTF_SEL_x values directly in each of the mac_set_mode
methods rather than the driver private MACPHYC_PHY_INFT_x definitions.
Remove the MACPHYC_PHY_INFT_x definitions.

Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vHHpy-0000000Djr7-1R1m@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks agonet: stmmac: ingenic: use PHY_INTF_SEL_x to select PHY interface
Russell King (Oracle) [Fri, 7 Nov 2025 08:28:45 +0000 (08:28 +0000)] 
net: stmmac: ingenic: use PHY_INTF_SEL_x to select PHY interface

Use the common dwmac definitions for the PHY interface selection field.

Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vHHpt-0000000Djr1-0wwr@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks agonet: stmmac: ingenic: simplify jz4775 mac_set_mode()
Russell King (Oracle) [Fri, 7 Nov 2025 08:28:40 +0000 (08:28 +0000)] 
net: stmmac: ingenic: simplify jz4775 mac_set_mode()

All paths configure the transmit clock as an input. Move this out of
the switch() statement to simplify the code.

Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vHHpo-0000000Djqv-0RD4@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks agonet: stmmac: ingenic: move ingenic_mac_init()
Russell King (Oracle) [Fri, 7 Nov 2025 08:28:34 +0000 (08:28 +0000)] 
net: stmmac: ingenic: move ingenic_mac_init()

Move ingenic_mac_init() to between variant specific set_mode()
implementations and ingenic_mac_probe(). No code changes.

Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vHHpi-0000000Djqp-4910@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks agousbnet: Add support for Byte Queue Limits (BQL)
Simon Schippers [Thu, 6 Nov 2025 17:56:15 +0000 (18:56 +0100)] 
usbnet: Add support for Byte Queue Limits (BQL)

In the current implementation, usbnet uses a fixed tx_qlen of:

USB2: 60 * 1518 bytes = 91.08 KB
USB3: 60 * 5 * 1518 bytes = 454.80 KB

Such large transmit queues can be problematic, especially for cellular
modems. For example, with a typical celluar link speed of 10 Mbit/s, a
fully occupied USB3 transmit queue results in:

454.80 KB / (10 Mbit/s / 8 bit/byte) = 363.84 ms

of additional latency.

This patch adds support for Byte Queue Limits (BQL) [1] to dynamically
manage the transmit queue size and reduce latency without sacrificing
throughput.

Testing was performed on various devices using the usbnet driver for
packet transmission:

- DELOCK 66045: USB3 to 2.5 GbE adapter (ax88179_178a)
- DELOCK 61969: USB2 to 1 GbE adapter (asix)
- Quectel RM520: 5G modem (qmi_wwan)
- USB2 Android tethering (cdc_ncm)

No performance degradation was observed for iperf3 TCP or UDP traffic,
while latency for a prioritized ping application was significantly
reduced. For example, using the USB3 to 2.5 GbE adapter, which was fully
utilized by iperf3 UDP traffic, the prioritized ping was improved from
1.6 ms to 0.6 ms. With the same setup but with a 100 Mbit/s Ethernet
connection, the prioritized ping was improved from 35 ms to 5 ms.

[1] https://lwn.net/Articles/469652/

Signed-off-by: Simon Schippers <simon.schippers@tu-dortmund.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20251106175615.26948-1-simon.schippers@tu-dortmund.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks agotg3: Fix num of RX queues being reported by ethtool
Breno Leitao [Fri, 7 Nov 2025 10:36:59 +0000 (02:36 -0800)] 
tg3: Fix num of RX queues being reported by ethtool

Using num_online_cpus() to report number of queues is actually not
correct, as reported by Michael[1].

netif_get_num_default_rss_queues() was used to replace num_online_cpus()
in the past, but tg3 ethtool callbacks didn't get converted. Doing it
now.

Link: https://lore.kernel.org/all/CACKFLim7ruspmqvjr6bNRq5Z_XXVk3vVaLZOons7kMCzsEG23A@mail.gmail.com/#t
Signed-off-by: Breno Leitao <leitao@debian.org>
Suggested-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20251107-tg3_counts-v1-1-337fe5c8ccb7@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks agoMerge branch 'net-dsa-b53-add-support-for-bcm5389-97-98-and-bcm63xx-arl-formats'
Jakub Kicinski [Tue, 11 Nov 2025 01:11:09 +0000 (17:11 -0800)] 
Merge branch 'net-dsa-b53-add-support-for-bcm5389-97-98-and-bcm63xx-arl-formats'

Jonas Gorski says:

====================
net: dsa: b53: add support for BCM5389/97/98 and BCM63XX ARL formats

Currently b53 assumes that all switches apart from BCM5325/5365 use the
same ARL formats, but there are actually multiple formats in use.

Older switches use a format apparently introduced with BCM5387/BCM5389,
while newer chips use a format apparently introduced with BCM5395.

Note that these numbers are not linear, BCM5397/BCM5398 use the older
format.

In addition to that the switches integrated into BCM63XX SoCs use their
own format. While accessing these normal read/write ARL entries are the
same format as BCM5389 one, the search format is different.

So in order to support all these different format, split all code
accessing these entries into chip-family specific functions, and collect
them in appropriate arl ops structs to keep the code cleaner.

Sent as net-next since the ARL accesses have never worked before, and
the extensive refactoring might be too much to warrant a fix.
====================

Link: https://patch.msgid.link/20251107080749.26936-1-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks agonet: dsa: b53: add support for bcm63xx ARL entry format
Jonas Gorski [Fri, 7 Nov 2025 08:07:49 +0000 (09:07 +0100)] 
net: dsa: b53: add support for bcm63xx ARL entry format

The ARL registers of BCM63XX embedded switches are somewhat unique. The
normal ARL table access registers have the same format as BCM5389, but
the ARL search registers differ:

* SRCH_CTL is at the same offset of BCM5389, but 16 bits wide. It does
  not have more fields, just needs to be accessed by a 16 bit read.
* SRCH_RSLT_MACVID and SRCH_RSLT are aligned to 32 bit, and have shifted
  offsets.
* SRCH_RSLT has a different format than the normal ARL data entry
  register.
* There is only one set of ENTRY_N registers, implying a 1 bin layout.

So add appropriate ops for bcm63xx and let it use it.

Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20251107080749.26936-9-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks agonet: dsa: b53: add support for 5389/5397/5398 ARL entry format
Jonas Gorski [Fri, 7 Nov 2025 08:07:48 +0000 (09:07 +0100)] 
net: dsa: b53: add support for 5389/5397/5398 ARL entry format

BCM5389, BCM5397 and BCM5398 use a different ARL entry format with just
a 16 bit fwdentry register, as well as different search control and data
offsets.

So add appropriate ops for them and switch those chips to use them.

Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20251107080749.26936-8-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks agonet: dsa: b53: move ARL entry functions into ops struct
Jonas Gorski [Fri, 7 Nov 2025 08:07:47 +0000 (09:07 +0100)] 
net: dsa: b53: move ARL entry functions into ops struct

Now that the differences in ARL entry formats are neatly contained into
functions per chip family, wrap them into an ops struct and add wrapper
functions to access them.

Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20251107080749.26936-7-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks agonet: dsa: b53: split reading search entry into their own functions
Jonas Gorski [Fri, 7 Nov 2025 08:07:46 +0000 (09:07 +0100)] 
net: dsa: b53: split reading search entry into their own functions

Split reading search entries into a function for each format.

Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20251107080749.26936-6-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks agonet: dsa: b53: provide accessors for accessing ARL_SRCH_CTL
Jonas Gorski [Fri, 7 Nov 2025 08:07:45 +0000 (09:07 +0100)] 
net: dsa: b53: provide accessors for accessing ARL_SRCH_CTL

In order to more easily support more formats, move accessing
ARL_SRCH_CTL into helper functions to contain the differences.

Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20251107080749.26936-5-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks agonet: dsa: b53: move writing ARL entries into their own functions
Jonas Gorski [Fri, 7 Nov 2025 08:07:44 +0000 (09:07 +0100)] 
net: dsa: b53: move writing ARL entries into their own functions

Move writing ARL entries into individual functions for each format.

Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20251107080749.26936-4-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks agonet: dsa: b53: move reading ARL entries into their own function
Jonas Gorski [Fri, 7 Nov 2025 08:07:43 +0000 (09:07 +0100)] 
net: dsa: b53: move reading ARL entries into their own function

Instead of duplicating the whole code iterating over all bins for
BCM5325, factor out reading and parsing the entry into its own
functions, and name it the modern one after the first chip with that ARL
format, (BCM53)95.

Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20251107080749.26936-3-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks agonet: dsa: b53: b53_arl_read{,25}(): use the entry for comparision
Jonas Gorski [Fri, 7 Nov 2025 08:07:42 +0000 (09:07 +0100)] 
net: dsa: b53: b53_arl_read{,25}(): use the entry for comparision

Align the b53_arl_read{,25}() functions by consistently using the
parsed arl entry instead of parsing the raw registers again.

Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20251107080749.26936-2-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks agoMerge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf...
Jakub Kicinski [Tue, 11 Nov 2025 00:43:51 +0000 (16:43 -0800)] 
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Martin KaFai Lau says:

====================
pull-request: bpf-next 2025-11-10

We've added 19 non-merge commits during the last 3 day(s) which contain
a total of 22 files changed, 1345 insertions(+), 197 deletions(-).

The main changes are:

1) Preserve skb metadata after a TC BPF program has changed the skb,
   from Jakub Sitnicki.
   This allows a TC program at the end of a TC filter chain to still see
   the skb metadata, even if another TC program at the front of the chain
   has changed the skb using BPF helpers.

2) Initial af_smc bpf_struct_ops support to control the smc specific
   syn/synack options, from D. Wythe.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
  bpf/selftests: Add selftest for bpf_smc_hs_ctrl
  net/smc: bpf: Introduce generic hook for handshake flow
  bpf: Export necessary symbols for modules with struct_ops
  selftests/bpf: Cover skb metadata access after bpf_skb_change_proto
  selftests/bpf: Cover skb metadata access after change_head/tail helper
  selftests/bpf: Cover skb metadata access after bpf_skb_adjust_room
  selftests/bpf: Cover skb metadata access after vlan push/pop helper
  selftests/bpf: Expect unclone to preserve skb metadata
  selftests/bpf: Dump skb metadata on verification failure
  selftests/bpf: Verify skb metadata in BPF instead of userspace
  bpf: Make bpf_skb_change_head helper metadata-safe
  bpf: Make bpf_skb_change_proto helper metadata-safe
  bpf: Make bpf_skb_adjust_room metadata-safe
  bpf: Make bpf_skb_vlan_push helper metadata-safe
  bpf: Make bpf_skb_vlan_pop helper metadata-safe
  vlan: Make vlan_remove_tag return nothing
  bpf: Unclone skb head on bpf_dynptr_write to skb metadata
  net: Preserve metadata on pskb_expand_head
  net: Helper to move packet data and metadata after skb_push/pull
====================

Link: https://patch.msgid.link/20251110232427.3929291-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks agonet: ravb: Correct bad check of timestamp control flags
Niklas Söderlund [Fri, 7 Nov 2025 20:01:00 +0000 (21:01 +0100)] 
net: ravb: Correct bad check of timestamp control flags

When converting the Renesas network drivers to use flags from enum
hwtstamp_rx_filters to control when to timestamp packages instead of a
driver specific schema with bit-wise flags an error was made.

The bit-wise driver specific flags correct logic to set get_ts was:

  q: RAVB_BE + tstamp_rx_ctrl: 0 => 0
  q: RAVB_NC + tstamp_rx_ctrl: 0 => 0
  q: RAVB_BE + tstamp_rx_ctrl: RAVB_RXTSTAMP_TYPE_V2_L2_EVENT => 0
  q: RAVB_NC + tstamp_rx_ctrl: RAVB_RXTSTAMP_TYPE_V2_L2_EVENT => 1
  q: RAVB_BE + tstamp_rx_ctrl: RAVB_RXTSTAMP_TYPE_ALL => 1
  q: RAVB_NC + tstamp_rx_ctrl: RAVB_RXTSTAMP_TYPE_ALL => 1

The converted logic to use enum flags mapped tstamp_rx_ctrl as

  0 to HWTSTAMP_FILTER_NONE
  RAVB_RXTSTAMP_TYPE_V2_L2_EVENT to HWTSTAMP_FILTER_PTP_V2_L2_EVENT
  RAVB_RXTSTAMP_TYPE_ALL to HWTSTAMP_FILTER_ALL

But the logic was incorrectly changed to:

  q: RAVB_BE + tstamp_rx_ctrl: HWTSTAMP_FILTER_NONE => 1 (error)
  q: RAVB_NC + tstamp_rx_ctrl: HWTSTAMP_FILTER_NONE => 0
  q: RAVB_BE + tstamp_rx_ctrl: HWTSTAMP_FILTER_PTP_V2_L2_EVENT => 0
  q: RAVB_NC + tstamp_rx_ctrl: HWTSTAMP_FILTER_PTP_V2_L2_EVENT => 1
  q: RAVB_BE + tstamp_rx_ctrl: HWTSTAMP_FILTER_ALL => 1
  q: RAVB_NC + tstamp_rx_ctrl: HWTSTAMP_FILTER_ALL => 0 (error)

This change restores the converted flag check to the correct logic of
the bit-wise driver specific flags.

Reported-by: Simon Horman <horms@kernel.org>
Closes: https://lore.kernel.org/linux-renesas-soc/aQ4xSv9629XF-Bt3@horms.kernel.org/
Fixes: 16e2e6cf75e6 ("net: ravb: Use common defines for time stamping control")
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Link: https://patch.msgid.link/20251107200100.3637869-1-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks agoptp: ocp: Document sysfs output format for backward compatibility
Zhongqiu Han [Fri, 7 Nov 2025 07:45:33 +0000 (15:45 +0800)] 
ptp: ocp: Document sysfs output format for backward compatibility

Add a comment to ptp_ocp_tty_show() explaining that the sysfs output
intentionally does not include a trailing newline. This is required for
backward compatibility with existing userspace software that reads the
sysfs attribute and uses the value directly as a device path.

A previous attempt to add a newline to align with common kernel
conventions broke userspace applications that were opening device paths
like "/dev/ttyS4\n" instead of "/dev/ttyS4", resulting in ENOENT errors.

This comment prevents future attempts to "fix" this behavior, which would
break existing userspace applications.

Link: https://lore.kernel.org/netdev/20251030124519.1828058-1-zhongqiu.han@oss.qualcomm.com/
Link: https://lore.kernel.org/netdev/aef3b850-5f38-4c28-a018-3b0006dc2f08@linux.dev/
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Zhongqiu Han <zhongqiu.han@oss.qualcomm.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20251107074533.416048-1-zhongqiu.han@oss.qualcomm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 weeks agosctp: Don't inherit do_auto_asconf in sctp_clone_sock().
Kuniyuki Iwashima [Thu, 6 Nov 2025 22:34:06 +0000 (22:34 +0000)] 
sctp: Don't inherit do_auto_asconf in sctp_clone_sock().

syzbot reported list_del(&sp->auto_asconf_list) corruption
in sctp_destroy_sock().

The repro calls setsockopt(SCTP_AUTO_ASCONF, 1) to a SCTP
listener, calls accept(), and close()s the child socket.

setsockopt(SCTP_AUTO_ASCONF, 1) sets sp->do_auto_asconf
to 1 and links sp->auto_asconf_list to a per-netns list.

Both fields are placed after sp->pd_lobby in struct sctp_sock,
and sctp_copy_descendant() did not copy the fields before the
cited commit.

Also, sctp_clone_sock() did not set them explicitly.

In addition, sctp_auto_asconf_init() is called from
sctp_sock_migrate(), but it initialises the fields only
conditionally.

The two fields relied on __GFP_ZERO added in sk_alloc(),
but sk_clone() does not use it.

Let's clear newsp->do_auto_asconf in sctp_clone_sock().

[0]:
list_del corruption. prev->next should be ffff8880799e9148, but was ffff8880799e8808. (prev=ffff88803347d9f8)
kernel BUG at lib/list_debug.c:64!
Oops: invalid opcode: 0000 [#1] SMP KASAN PTI
CPU: 0 UID: 0 PID: 6008 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/02/2025
RIP: 0010:__list_del_entry_valid_or_report+0x15a/0x190 lib/list_debug.c:62
Code: e8 7b 26 71 fd 43 80 3c 2c 00 74 08 4c 89 ff e8 7c ee 92 fd 49 8b 17 48 c7 c7 80 0a bf 8b 48 89 de 4c 89 f9 e8 07 c6 94 fc 90 <0f> 0b 4c 89 f7 e8 4c 26 71 fd 43 80 3c 2c 00 74 08 4c 89 ff e8 4d
RSP: 0018:ffffc90003067ad8 EFLAGS: 00010246
RAX: 000000000000006d RBX: ffff8880799e9148 RCX: b056988859ee6e00
RDX: 0000000000000000 RSI: 0000000000000202 RDI: 0000000000000000
RBP: dffffc0000000000 R08: ffffc90003067807 R09: 1ffff9200060cf00
R10: dffffc0000000000 R11: fffff5200060cf01 R12: 1ffff1100668fb3f
R13: dffffc0000000000 R14: ffff88803347d9f8 R15: ffff88803347d9f8
FS:  00005555823e5500(0000) GS:ffff88812613e000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000200000000480 CR3: 00000000741ce000 CR4: 00000000003526f0
Call Trace:
 <TASK>
 __list_del_entry_valid include/linux/list.h:132 [inline]
 __list_del_entry include/linux/list.h:223 [inline]
 list_del include/linux/list.h:237 [inline]
 sctp_destroy_sock+0xb4/0x370 net/sctp/socket.c:5163
 sk_common_release+0x75/0x310 net/core/sock.c:3961
 sctp_close+0x77e/0x900 net/sctp/socket.c:1550
 inet_release+0x144/0x190 net/ipv4/af_inet.c:437
 __sock_release net/socket.c:662 [inline]
 sock_close+0xc3/0x240 net/socket.c:1455
 __fput+0x44c/0xa70 fs/file_table.c:468
 task_work_run+0x1d4/0x260 kernel/task_work.c:227
 resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
 exit_to_user_mode_loop+0xe9/0x130 kernel/entry/common.c:43
 exit_to_user_mode_prepare include/linux/irq-entry-common.h:225 [inline]
 syscall_exit_to_user_mode_work include/linux/entry-common.h:175 [inline]
 syscall_exit_to_user_mode include/linux/entry-common.h:210 [inline]
 do_syscall_64+0x2bd/0xfa0 arch/x86/entry/syscall_64.c:100
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Fixes: 16942cf4d3e3 ("sctp: Use sk_clone() in sctp_accept().")
Reported-by: syzbot+ba535cb417f106327741@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/690d2185.a70a0220.22f260.000e.GAE@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Acked-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/20251106223418.1455510-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoMerge branch 'net-smc-introduce-smc_hs_ctrl'
Martin KaFai Lau [Mon, 10 Nov 2025 19:10:09 +0000 (11:10 -0800)] 
Merge branch 'net-smc-introduce-smc_hs_ctrl'

D. Wythe says:

====================
net/smc: Introduce smc_hs_ctrl

This patch aims to introduce BPF injection capabilities for SMC and
includes a self-test to ensure code stability.

Since the SMC protocol isn't ideal for every situation, especially
short-lived ones, most applications can't guarantee the absence of
such scenarios. Consequently, applications may need specific strategies
to decide whether to use SMC. For example, an application might limit SMC
usage to certain IP addresses or ports.

To maintain the principle of transparent replacement, we want applications
to remain unaffected even if they need specific SMC strategies. In other
words, they should not require recompilation of their code.

Additionally, we need to ensure the scalability of strategy implementation.
While using socket options or sysctl might be straightforward, it could
complicate future expansions.

Fortunately, BPF addresses these concerns effectively. Users can write
their own strategies in eBPF to determine whether to use SMC, and they can
easily modify those strategies in the future.

This is a rework of the series from [1]. Changes since [1] are limited to
the SMC parts:

1. Rename smc_ops to smc_hs_ctrl and change interface name.
2. Squash SMC patches, removing standalone non-BPF hook capability.
3. Fix typos

[1]: https://lore.kernel.org/bpf/20250123015942.94810-1-alibuda@linux.alibaba.com/#t

v2 -> v1:
  - Removed the fixes patch, which have already been merged on current branch.
  - Fixed compilation warning of smc_call_hsbpf() when CONFIG_SMC_HS_CTRL_BPF
    is not enabled.
  - Changed the default value of CONFIG_SMC_HS_CTRL_BPF to Y.
  - Fix typo and renamed some variables

v3 -> v2:
  - Removed the libbpf patch, which have already been merged on current branch.
  - Fixed sparse warning of smc_call_hsbpf() and xchg().

v4 -> v3:
   - Rebased on latest bpf-next, updated SMC loopback config from SMC_LO to DIBS_LO
     per upstream changes.

v5 -> v4:
    - Removed the redundant sk parameter from smc_call_hsbpf
    - Reject registration when bpf_link is set, link support will be added in the
      future.
    - Updated selftests with new test heplers.
====================

Link: https://patch.msgid.link/20251107035632.115950-1-alibuda@linux.alibaba.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
5 weeks agobpf/selftests: Add selftest for bpf_smc_hs_ctrl
D. Wythe [Fri, 7 Nov 2025 03:56:32 +0000 (11:56 +0800)] 
bpf/selftests: Add selftest for bpf_smc_hs_ctrl

This tests introduces a tiny smc_hs_ctrl for filtering SMC connections
based on IP pairs, and also adds a realistic topology model to verify it.

Also, we can only use SMC loopback under CI test, so an additional
configuration needs to be enabled.

Follow the steps below to run this test.

make -C tools/testing/selftests/bpf
cd tools/testing/selftests/bpf
sudo ./test_progs -t smc

Results shows:
Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Tested-by: Saket Kumar Bhaskar <skb99@linux.ibm.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Link: https://patch.msgid.link/20251107035632.115950-4-alibuda@linux.alibaba.com
5 weeks agonet/smc: bpf: Introduce generic hook for handshake flow
D. Wythe [Fri, 7 Nov 2025 03:56:31 +0000 (11:56 +0800)] 
net/smc: bpf: Introduce generic hook for handshake flow

The introduction of IPPROTO_SMC enables eBPF programs to determine
whether to use SMC based on the context of socket creation, such as
network namespaces, PID and comm name, etc.

As a subsequent enhancement, to introduce a new generic hook that
allows decisions on whether to use SMC or not at runtime, including
but not limited to local/remote IP address or ports.

User can write their own implememtion via bpf_struct_ops now to choose
whether to use SMC or not before TCP 3rd handshake to be comleted.

Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Link: https://patch.msgid.link/20251107035632.115950-3-alibuda@linux.alibaba.com
5 weeks agobpf: Export necessary symbols for modules with struct_ops
D. Wythe [Fri, 7 Nov 2025 03:56:30 +0000 (11:56 +0800)] 
bpf: Export necessary symbols for modules with struct_ops

Exports three necessary symbols for implementing struct_ops with
tristate subsystem.

To hold or release refcnt of struct_ops refcnt by inline funcs
bpf_try_module_get and bpf_module_put which use bpf_struct_ops_get(put)
conditionally.

And to copy obj name from one to the other with effective checks by
bpf_obj_name_cpy.

Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251107035632.115950-2-alibuda@linux.alibaba.com
5 weeks agoMerge branch 'make-tc-bpf-helpers-preserve-skb-metadata'
Martin KaFai Lau [Mon, 10 Nov 2025 18:52:33 +0000 (10:52 -0800)] 
Merge branch 'make-tc-bpf-helpers-preserve-skb-metadata'

Jakub Sitnicki says:

====================
Make TC BPF helpers preserve skb metadata

Changes in v4:
- Fix copy-paste bug in check_metadata() test helper (AI review)
- Add "out of scope" section (at the bottom)
- Link to v3: https://lore.kernel.org/r/20251026-skb-meta-rx-path-v3-0-37cceebb95d3@cloudflare.com

Changes in v3:
- Use the already existing BPF_STREAM_STDERR const in tests (Martin)
- Unclone skb head on bpf_dynptr_write to skb metadata (patch 3) (Martin)
- Swap order of patches 1 & 2 to refer to skb_postpush_data_move() in docs
- Mention in skb_data_move() docs how to move just the metadata
- Note in pskb_expand_head() docs to move metadata after skb_push() (Jakub)
- Link to v2: https://lore.kernel.org/r/20251019-skb-meta-rx-path-v2-0-f9a58f3eb6d6@cloudflare.com

Changes in v2:
- Tweak WARN_ON_ONCE check in skb_data_move() (patch 2)
- Convert all tests to verify skb metadata in BPF (patches 9-10)
- Add test coverage for modified BPF helpers (patches 12-15)
- Link to RFCv1: https://lore.kernel.org/r/20250929-skb-meta-rx-path-v1-0-de700a7ab1cb@cloudflare.com

This patch set continues our work [1] to allow BPF programs and user-space
applications to attach multiple bytes of metadata to packets via the
XDP/skb metadata area.

The focus of this patch set it to ensure that skb metadata remains intact
when packets pass through a chain of TC BPF programs that call helpers
which operate on skb head.

Currently, several helpers that either adjust the skb->data pointer or
reallocate skb->head do not preserve metadata at its expected location,
that is immediately in front of the MAC header. These are:

- bpf_skb_adjust_room
- bpf_skb_change_head
- bpf_skb_change_proto
- bpf_skb_change_tail
- bpf_skb_vlan_pop
- bpf_skb_vlan_push

In TC BPF context, metadata must be moved whenever skb->data changes to
keep the skb->data_meta pointer valid. I don't see any way around
it. Creative ideas how to avoid that would be very welcome.

With that in mind, we can patch the helpers in at least two different ways:

1. Integrate metadata move into header move

   Replace the existing memmove, which follows skb_push/pull, with a helper
   that moves both headers and metadata in a single call. This avoids an
   extra memmove but reduces transparency.

        skb_pull(skb, len);
-       memmove(skb->data, skb->data - len, n);
+       skb_postpull_data_move(skb, len, n);
        skb->mac_header += len;

        skb_push(skb, len)
-       memmove(skb->data, skb->data + len, n);
+       skb_postpush_data_move(skb, len, n);
        skb->mac_header -= len;

2. Move metadata separately

   Add a dedicated metadata move after the header move. This is more
   explicit but costs an additional memmove.

        skb_pull(skb, len);
        memmove(skb->data, skb->data - len, n);
+       skb_metadata_postpull_move(skb, len);
        skb->mac_header += len;

        skb_push(skb, len)
+       skb_metadata_postpush_move(skb, len);
        memmove(skb->data, skb->data + len, n);
        skb->mac_header -= len;

This patch set implements option (1), expecting that "you can have just one
memmove" will be the most obvious feedback, while readability is a,
somewhat subjective, matter of taste, which I don't claim to have ;-)

The structure of the patch set is as follows:

- patches 1-4 prepare ground for safe-proofing the BPF helpers
- patches 5-9 modify the BPF helpers to preserve skb metadata
- patches 10-11 prepare ground for metadata tests with BPF helper calls
- patches 12-16 adapt and expand tests to cover the made changes

Out of scope for this series:
- safe-proofing tunnel & tagging devices - VLAN, GRE, ...
  (next in line, in development preview at [2])
- metadata access after packet foward
  (to do after Rx path - once metadata reliably reaches sk_filter)

Thanks,
-jkbs

[1] https://lore.kernel.org/all/20250814-skb-metadata-thru-dynptr-v7-0-8a39e636e0fb@cloudflare.com/
[2] https://github.com/jsitnicki/linux/commits/skb-meta/safeproof-netdevs/
====================

Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-0-5ceb08a9b37b@cloudflare.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
5 weeks agoselftests/bpf: Cover skb metadata access after bpf_skb_change_proto
Jakub Sitnicki [Wed, 5 Nov 2025 20:19:53 +0000 (21:19 +0100)] 
selftests/bpf: Cover skb metadata access after bpf_skb_change_proto

Add a test to verify that skb metadata remains accessible after calling
bpf_skb_change_proto(), which modifies packet headroom to accommodate
different IP header sizes.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-16-5ceb08a9b37b@cloudflare.com
5 weeks agoselftests/bpf: Cover skb metadata access after change_head/tail helper
Jakub Sitnicki [Wed, 5 Nov 2025 20:19:52 +0000 (21:19 +0100)] 
selftests/bpf: Cover skb metadata access after change_head/tail helper

Add a test to verify that skb metadata remains accessible after calling
bpf_skb_change_head() and bpf_skb_change_tail(), which modify packet
headroom/tailroom and can trigger head reallocation.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-15-5ceb08a9b37b@cloudflare.com
5 weeks agoselftests/bpf: Cover skb metadata access after bpf_skb_adjust_room
Jakub Sitnicki [Wed, 5 Nov 2025 20:19:51 +0000 (21:19 +0100)] 
selftests/bpf: Cover skb metadata access after bpf_skb_adjust_room

Add a test to verify that skb metadata remains accessible after calling
bpf_skb_adjust_room(), which modifies the packet headroom and can trigger
head reallocation.

The helper expects an Ethernet frame carrying an IP packet so switch test
packet identification by source MAC address since we can no longer rely on
Ethernet proto being set to zero.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-14-5ceb08a9b37b@cloudflare.com
5 weeks agoselftests/bpf: Cover skb metadata access after vlan push/pop helper
Jakub Sitnicki [Wed, 5 Nov 2025 20:19:50 +0000 (21:19 +0100)] 
selftests/bpf: Cover skb metadata access after vlan push/pop helper

Add a test to verify that skb metadata remains accessible after calling
bpf_skb_vlan_push() and bpf_skb_vlan_pop(), which modify the packet
headroom.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-13-5ceb08a9b37b@cloudflare.com
5 weeks agoselftests/bpf: Expect unclone to preserve skb metadata
Jakub Sitnicki [Wed, 5 Nov 2025 20:19:49 +0000 (21:19 +0100)] 
selftests/bpf: Expect unclone to preserve skb metadata

Since pskb_expand_head() no longer clears metadata on unclone, update tests
for cloned packets to expect metadata to remain intact.

Also simplify the clone_dynptr_kept_on_{data,meta}_slice_write tests.
Creating an r/w dynptr slice is sufficient to trigger an unclone in the
prologue, so remove the extraneous writes to the data/meta slice.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-12-5ceb08a9b37b@cloudflare.com
5 weeks agoselftests/bpf: Dump skb metadata on verification failure
Jakub Sitnicki [Wed, 5 Nov 2025 20:19:48 +0000 (21:19 +0100)] 
selftests/bpf: Dump skb metadata on verification failure

Add diagnostic output when metadata verification fails to help with
troubleshooting test failures. Introduce a check_metadata() helper that
prints both expected and received metadata to the BPF program's stderr
stream on mismatch. The userspace test reads and dumps this stream on
failure.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-11-5ceb08a9b37b@cloudflare.com
5 weeks agoselftests/bpf: Verify skb metadata in BPF instead of userspace
Jakub Sitnicki [Wed, 5 Nov 2025 20:19:47 +0000 (21:19 +0100)] 
selftests/bpf: Verify skb metadata in BPF instead of userspace

Move metadata verification into the BPF TC programs. Previously,
userspace read metadata from a map and verified it once at test end.

Now TC programs compare metadata directly using __builtin_memcmp() and
set a test_pass flag. This enables verification at multiple points during
test execution rather than a single final check.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-10-5ceb08a9b37b@cloudflare.com
5 weeks agobpf: Make bpf_skb_change_head helper metadata-safe
Jakub Sitnicki [Wed, 5 Nov 2025 20:19:46 +0000 (21:19 +0100)] 
bpf: Make bpf_skb_change_head helper metadata-safe

Although bpf_skb_change_head() doesn't move packet data after skb_push(),
skb metadata still needs to be relocated. Use the dedicated helper to
handle it.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-9-5ceb08a9b37b@cloudflare.com
5 weeks agobpf: Make bpf_skb_change_proto helper metadata-safe
Jakub Sitnicki [Wed, 5 Nov 2025 20:19:45 +0000 (21:19 +0100)] 
bpf: Make bpf_skb_change_proto helper metadata-safe

bpf_skb_change_proto reuses the same headroom operations as
bpf_skb_adjust_room, already updated to handle metadata safely.

The remaining step is to ensure that there is sufficient headroom to
accommodate metadata on skb_push().

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-8-5ceb08a9b37b@cloudflare.com
5 weeks agobpf: Make bpf_skb_adjust_room metadata-safe
Jakub Sitnicki [Wed, 5 Nov 2025 20:19:44 +0000 (21:19 +0100)] 
bpf: Make bpf_skb_adjust_room metadata-safe

bpf_skb_adjust_room() may push or pull bytes from skb->data. In both cases,
skb metadata must be moved accordingly to stay accessible.

Replace existing memmove() calls, which only move payload, with a helper
that also handles metadata. Reserve enough space for metadata to fit after
skb_push.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-7-5ceb08a9b37b@cloudflare.com
5 weeks agobpf: Make bpf_skb_vlan_push helper metadata-safe
Jakub Sitnicki [Wed, 5 Nov 2025 20:19:43 +0000 (21:19 +0100)] 
bpf: Make bpf_skb_vlan_push helper metadata-safe

Use the metadata-aware helper to move packet bytes after skb_push(),
ensuring metadata remains valid after calling the BPF helper.

Also, take care to reserve sufficient headroom for metadata to fit.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-6-5ceb08a9b37b@cloudflare.com
5 weeks agobpf: Make bpf_skb_vlan_pop helper metadata-safe
Jakub Sitnicki [Wed, 5 Nov 2025 20:19:42 +0000 (21:19 +0100)] 
bpf: Make bpf_skb_vlan_pop helper metadata-safe

Use the metadata-aware helper to move packet bytes after skb_pull(),
ensuring metadata remains valid after calling the BPF helper.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-5-5ceb08a9b37b@cloudflare.com
5 weeks agovlan: Make vlan_remove_tag return nothing
Jakub Sitnicki [Wed, 5 Nov 2025 20:19:41 +0000 (21:19 +0100)] 
vlan: Make vlan_remove_tag return nothing

All callers ignore the return value.

Prepare to reorder memmove() after skb_pull() which is a common pattern.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-4-5ceb08a9b37b@cloudflare.com
5 weeks agobpf: Unclone skb head on bpf_dynptr_write to skb metadata
Jakub Sitnicki [Wed, 5 Nov 2025 20:19:40 +0000 (21:19 +0100)] 
bpf: Unclone skb head on bpf_dynptr_write to skb metadata

Currently bpf_dynptr_from_skb_meta() marks the dynptr as read-only when
the skb is cloned, preventing writes to metadata.

Remove this restriction and unclone the skb head on bpf_dynptr_write() to
metadata, now that the metadata is preserved during uncloning. This makes
metadata dynptr consistent with skb dynptr, allowing writes regardless of
whether the skb is cloned.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-3-5ceb08a9b37b@cloudflare.com
5 weeks agonet: Preserve metadata on pskb_expand_head
Jakub Sitnicki [Wed, 5 Nov 2025 20:19:39 +0000 (21:19 +0100)] 
net: Preserve metadata on pskb_expand_head

pskb_expand_head() copies headroom, including skb metadata, into the newly
allocated head, but then clears the metadata. As a result, metadata is lost
when BPF helpers trigger an skb head reallocation.

Let the skb metadata remain in the newly created copy of head.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-2-5ceb08a9b37b@cloudflare.com
5 weeks agonet: Helper to move packet data and metadata after skb_push/pull
Jakub Sitnicki [Wed, 5 Nov 2025 20:19:38 +0000 (21:19 +0100)] 
net: Helper to move packet data and metadata after skb_push/pull

Lay groundwork for fixing BPF helpers available to TC(X) programs.

When skb_push() or skb_pull() is called in a TC(X) ingress BPF program, the
skb metadata must be kept in front of the MAC header. Otherwise, BPF
programs using the __sk_buff->data_meta pseudo-pointer lose access to it.

Introduce a helper that moves both metadata and a specified number of
packet data bytes together, suitable as a drop-in replacement for
memmove().

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-1-5ceb08a9b37b@cloudflare.com
5 weeks agoMerge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next...
Jakub Kicinski [Sat, 8 Nov 2025 03:15:36 +0000 (19:15 -0800)] 
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2025-11-06 (i40, ice, iavf)

Mohammad Heib introduces a new devlink parameter, max_mac_per_vf, for
controlling the maximum number of MAC address filters allowed by a VF. This
allows administrators to control the VF behavior in a more nuanced manner.

Aleksandr and Przemek add support for Receive Side Scaling of GTP to iAVF
for VFs running on E800 series ice hardware. This improves performance and
scalability for virtualized network functions in 5G and LTE deployments.

* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  iavf: add RSS support for GTP protocol via ethtool
  ice: Extend PTYPE bitmap coverage for GTP encapsulated flows
  ice: improve TCAM priority handling for RSS profiles
  ice: implement GTP RSS context tracking and configuration
  ice: add virtchnl definitions and static data for GTP RSS
  ice: add flow parsing for GTP and new protocol field support
  i40e: support generic devlink param "max_mac_per_vf"
  devlink: Add new "max_mac_per_vf" generic device param
====================

Link: https://patch.msgid.link/20251106225321.1609605-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoMerge branch 'net-stmmac-lpc18xx-and-sti-convert-to-set_phy_intf_sel'
Jakub Kicinski [Sat, 8 Nov 2025 03:05:51 +0000 (19:05 -0800)] 
Merge branch 'net-stmmac-lpc18xx-and-sti-convert-to-set_phy_intf_sel'

Russell King says:

====================
net: stmmac: lpc18xx and sti: convert to set_phy_intf_sel()

This series converts lpc18xx and sti to use the new .set_phy_intf_sel()
method.
====================

Link: https://patch.msgid.link/aQyEs4DAZRWpAz32@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: stmmac: sti: use ->set_phy_intf_sel()
Russell King (Oracle) [Thu, 6 Nov 2025 11:23:52 +0000 (11:23 +0000)] 
net: stmmac: sti: use ->set_phy_intf_sel()

Rather than placing the phy_intf_sel() setup in the ->init() method,
move it to the new ->set_phy_intf_sel() method.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vGy5o-0000000DhQn-34JE@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: stmmac: sti: use stmmac_get_phy_intf_sel()
Russell King (Oracle) [Thu, 6 Nov 2025 11:23:47 +0000 (11:23 +0000)] 
net: stmmac: sti: use stmmac_get_phy_intf_sel()

Use stmmac_get_phy_intf_sel() to decode the PHY interface mode to the
phy_intf_sel value, validate the result and use that to set the
control register to select the operating mode for the DWMAC core.

Note that when an unsupported interface mode is used, the array would
decode this to PHY_INTF_SEL_GMII_MII, so preserve this behaviour.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vGy5j-0000000DhQh-2e0x@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: stmmac: sti: use PHY_INTF_SEL_x directly
Russell King (Oracle) [Thu, 6 Nov 2025 11:23:42 +0000 (11:23 +0000)] 
net: stmmac: sti: use PHY_INTF_SEL_x directly

Use the PHY_INTF_SEL_x values directly rather than the driver private
ETH_PHY_SEL_x values. Move the FIELD_PREP() into sti_dwmac_set_mode().
Use dwmac->interface directly.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vGy5e-0000000DhQb-2B7I@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: stmmac: sti: use PHY_INTF_SEL_x to select PHY interface
Russell King (Oracle) [Thu, 6 Nov 2025 11:23:37 +0000 (11:23 +0000)] 
net: stmmac: sti: use PHY_INTF_SEL_x to select PHY interface

Use the common dwmac definitions for the PHY interface selection field,
adding MII_PHY_SEL_VAL() temporarily to avoid line wrapping.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vGy5Z-0000000DhQV-1e2l@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: stmmac: lpc18xx: use ->set_phy_intf_sel()
Russell King (Oracle) [Thu, 6 Nov 2025 11:23:32 +0000 (11:23 +0000)] 
net: stmmac: lpc18xx: use ->set_phy_intf_sel()

Move the configuration of the dwmac PHY interface selection to the new
->set_phy_intf_sel() method.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vGy5U-0000000DhQP-19Hd@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: stmmac: lpc18xx: validate phy_intf_sel
Russell King (Oracle) [Thu, 6 Nov 2025 11:23:27 +0000 (11:23 +0000)] 
net: stmmac: lpc18xx: validate phy_intf_sel

Validate the phy_intf_sel value rather than the PHY interface mode.
This will allow us to transition to the ->set_phy_intf_sel() method.
Note that this will allow GMII as well as MII as the phy_intf_sel
value is the same for both.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vGy5P-0000000DhQJ-0Oi3@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: stmmac: lpc18xx: use stmmac_get_phy_intf_sel()
Russell King (Oracle) [Thu, 6 Nov 2025 11:23:21 +0000 (11:23 +0000)] 
net: stmmac: lpc18xx: use stmmac_get_phy_intf_sel()

Use stmmac_get_phy_intf_sel() to decode the PHY interface mode to the
phy_intf_sel value, and use the result to program the ethernet mode.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vGy5J-0000000DhQD-46Ob@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: stmmac: lpc18xx: use PHY_INTF_SEL_x directly
Russell King (Oracle) [Thu, 6 Nov 2025 11:23:16 +0000 (11:23 +0000)] 
net: stmmac: lpc18xx: use PHY_INTF_SEL_x directly

Use the PHY_INTF_SEL_x values directly rather than the driver private
LPC18XX_CREG_CREG6_ETHMODE_x definitions, and convert
LPC18XX_CREG_CREG6_ETHMODE_MASK to use GENMASK().

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vGy5E-0000000DhQ7-3cuy@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: stmmac: lpc18xx: convert to PHY_INTF_SEL_x
Russell King (Oracle) [Thu, 6 Nov 2025 11:23:11 +0000 (11:23 +0000)] 
net: stmmac: lpc18xx: convert to PHY_INTF_SEL_x

Use the common dwmac definitions for the PHY interface selection field.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vGy59-0000000DhQ1-393H@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoMerge branch 'net-use-skb_attempt_defer_free-in-napi_consume_skb'
Jakub Kicinski [Sat, 8 Nov 2025 03:02:42 +0000 (19:02 -0800)] 
Merge branch 'net-use-skb_attempt_defer_free-in-napi_consume_skb'

Eric Dumazet says:

====================
net: use skb_attempt_defer_free() in napi_consume_skb()

There is a lack of NUMA awareness and more generally lack
of slab caches affinity on TX completion path.

Modern drivers are using napi_consume_skb(), hoping to cache sk_buff
in per-cpu caches so that they can be recycled in RX path.

Only use this if the skb was allocated on the same cpu,
otherwise use skb_attempt_defer_free() so that the skb
is freed on the original cpu.

This removes contention on SLUB spinlocks and data structures,
and this makes sure that recycled sk_buff have correct NUMA locality.

After this series, I get ~50% improvement for an UDP tx workload
on an AMD EPYC 9B45 (IDPF 200Gbit NIC with 32 TX queues).

I will later refactor skb_attempt_defer_free()
to no longer have to care of skb_shared() and skb_release_head_state().
====================

Link: https://patch.msgid.link/20251106202935.1776179-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: increase skb_defer_max default to 128
Eric Dumazet [Thu, 6 Nov 2025 20:29:35 +0000 (20:29 +0000)] 
net: increase skb_defer_max default to 128

skb_defer_max value is very conservative, and can be increased
to avoid too many calls to kick_defer_list_purge().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://patch.msgid.link/20251106202935.1776179-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: fix napi_consume_skb() with alien skbs
Eric Dumazet [Thu, 6 Nov 2025 20:29:34 +0000 (20:29 +0000)] 
net: fix napi_consume_skb() with alien skbs

There is a lack of NUMA awareness and more generally lack
of slab caches affinity on TX completion path.

Modern drivers are using napi_consume_skb(), hoping to cache sk_buff
in per-cpu caches so that they can be recycled in RX path.

Only use this if the skb was allocated on the same cpu,
otherwise use skb_attempt_defer_free() so that the skb
is freed on the original cpu.

This removes contention on SLUB spinlocks and data structures.

After this patch, I get ~50% improvement for an UDP tx workload
on an AMD EPYC 9B45 (IDPF 200Gbit NIC with 32 TX queues).

80 Mpps -> 120 Mpps.

Profiling one of the 32 cpus servicing NIC interrupts :

Before:

mpstat -P 511 1 1

Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
Average:     511    0.00    0.00    0.00    0.00    0.00   98.00    0.00    0.00    0.00    2.00

    31.01%  ksoftirqd/511    [kernel.kallsyms]  [k] queued_spin_lock_slowpath
    12.45%  swapper          [kernel.kallsyms]  [k] queued_spin_lock_slowpath
     5.60%  ksoftirqd/511    [kernel.kallsyms]  [k] __slab_free
     3.31%  ksoftirqd/511    [kernel.kallsyms]  [k] idpf_tx_clean_buf_ring
     3.27%  ksoftirqd/511    [kernel.kallsyms]  [k] idpf_tx_splitq_clean_all
     2.95%  ksoftirqd/511    [kernel.kallsyms]  [k] idpf_tx_splitq_start
     2.52%  ksoftirqd/511    [kernel.kallsyms]  [k] fq_dequeue
     2.32%  ksoftirqd/511    [kernel.kallsyms]  [k] read_tsc
     2.25%  ksoftirqd/511    [kernel.kallsyms]  [k] build_detached_freelist
     2.15%  ksoftirqd/511    [kernel.kallsyms]  [k] kmem_cache_free
     2.11%  swapper          [kernel.kallsyms]  [k] __slab_free
     2.06%  ksoftirqd/511    [kernel.kallsyms]  [k] idpf_features_check
     2.01%  ksoftirqd/511    [kernel.kallsyms]  [k] idpf_tx_splitq_clean_hdr
     1.97%  ksoftirqd/511    [kernel.kallsyms]  [k] skb_release_data
     1.52%  ksoftirqd/511    [kernel.kallsyms]  [k] sock_wfree
     1.34%  swapper          [kernel.kallsyms]  [k] idpf_tx_clean_buf_ring
     1.23%  swapper          [kernel.kallsyms]  [k] idpf_tx_splitq_clean_all
     1.15%  ksoftirqd/511    [kernel.kallsyms]  [k] dma_unmap_page_attrs
     1.11%  swapper          [kernel.kallsyms]  [k] idpf_tx_splitq_start
     1.03%  swapper          [kernel.kallsyms]  [k] fq_dequeue
     0.94%  swapper          [kernel.kallsyms]  [k] kmem_cache_free
     0.93%  swapper          [kernel.kallsyms]  [k] read_tsc
     0.81%  ksoftirqd/511    [kernel.kallsyms]  [k] napi_consume_skb
     0.79%  swapper          [kernel.kallsyms]  [k] idpf_tx_splitq_clean_hdr
     0.77%  ksoftirqd/511    [kernel.kallsyms]  [k] skb_free_head
     0.76%  swapper          [kernel.kallsyms]  [k] idpf_features_check
     0.72%  swapper          [kernel.kallsyms]  [k] skb_release_data
     0.69%  swapper          [kernel.kallsyms]  [k] build_detached_freelist
     0.58%  ksoftirqd/511    [kernel.kallsyms]  [k] skb_release_head_state
     0.56%  ksoftirqd/511    [kernel.kallsyms]  [k] __put_partials
     0.55%  ksoftirqd/511    [kernel.kallsyms]  [k] kmem_cache_free_bulk
     0.48%  swapper          [kernel.kallsyms]  [k] sock_wfree

After:

mpstat -P 511 1 1

Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
Average:     511    0.00    0.00    0.00    0.00    0.00   51.49    0.00    0.00    0.00   48.51

    19.10%  swapper          [kernel.kallsyms]  [k] idpf_tx_splitq_clean_hdr
    13.86%  swapper          [kernel.kallsyms]  [k] idpf_tx_clean_buf_ring
    10.80%  swapper          [kernel.kallsyms]  [k] skb_attempt_defer_free
    10.57%  swapper          [kernel.kallsyms]  [k] idpf_tx_splitq_clean_all
     7.18%  swapper          [kernel.kallsyms]  [k] queued_spin_lock_slowpath
     6.69%  swapper          [kernel.kallsyms]  [k] sock_wfree
     5.55%  swapper          [kernel.kallsyms]  [k] dma_unmap_page_attrs
     3.10%  swapper          [kernel.kallsyms]  [k] fq_dequeue
     3.00%  swapper          [kernel.kallsyms]  [k] skb_release_head_state
     2.73%  swapper          [kernel.kallsyms]  [k] read_tsc
     2.48%  swapper          [kernel.kallsyms]  [k] idpf_tx_splitq_start
     1.20%  swapper          [kernel.kallsyms]  [k] idpf_features_check
     1.13%  swapper          [kernel.kallsyms]  [k] napi_consume_skb
     0.93%  swapper          [kernel.kallsyms]  [k] idpf_vport_splitq_napi_poll
     0.64%  swapper          [kernel.kallsyms]  [k] native_send_call_func_single_ipi
     0.60%  swapper          [kernel.kallsyms]  [k] acpi_processor_ffh_cstate_enter
     0.53%  swapper          [kernel.kallsyms]  [k] io_idle
     0.43%  swapper          [kernel.kallsyms]  [k] netif_skb_features
     0.41%  swapper          [kernel.kallsyms]  [k] __direct_call_cpuidle_state_enter2
     0.40%  swapper          [kernel.kallsyms]  [k] native_irq_return_iret
     0.40%  swapper          [kernel.kallsyms]  [k] idpf_tx_buf_hw_update
     0.36%  swapper          [kernel.kallsyms]  [k] sched_clock_noinstr
     0.34%  swapper          [kernel.kallsyms]  [k] handle_softirqs
     0.32%  swapper          [kernel.kallsyms]  [k] net_rx_action
     0.32%  swapper          [kernel.kallsyms]  [k] dql_completed
     0.32%  swapper          [kernel.kallsyms]  [k] validate_xmit_skb
     0.31%  swapper          [kernel.kallsyms]  [k] skb_network_protocol
     0.29%  swapper          [kernel.kallsyms]  [k] skb_csum_hwoffload_help
     0.29%  swapper          [kernel.kallsyms]  [k] x2apic_send_IPI
     0.28%  swapper          [kernel.kallsyms]  [k] ktime_get
     0.24%  swapper          [kernel.kallsyms]  [k] __qdisc_run

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://patch.msgid.link/20251106202935.1776179-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: allow skb_release_head_state() to be called multiple times
Eric Dumazet [Thu, 6 Nov 2025 20:29:33 +0000 (20:29 +0000)] 
net: allow skb_release_head_state() to be called multiple times

Currently, only skb dst is cleared (thanks to skb_dst_drop())

Make sure skb->destructor, conntrack and extensions are cleared.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20251106202935.1776179-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: add prefetch() in skb_defer_free_flush()
Eric Dumazet [Thu, 6 Nov 2025 08:55:00 +0000 (08:55 +0000)] 
net: add prefetch() in skb_defer_free_flush()

skb_defer_free_flush() is becoming more important these days.

Add a prefetch operation to reduce latency a bit on some
platforms like AMD EPYC 7B12.

On more recent cpus, a stall happens when reading skb_shinfo().
Avoiding it will require a more elaborate strategy.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/20251106085500.2438951-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoMerge branch 'psp-track-stats-from-core-and-provide-a-driver-stats-api'
Jakub Kicinski [Sat, 8 Nov 2025 02:54:07 +0000 (18:54 -0800)] 
Merge branch 'psp-track-stats-from-core-and-provide-a-driver-stats-api'

Daniel Zahka says:

====================
psp: track stats from core and provide a driver stats api

This series introduces stats counters for psp. Device key rotations,
and so called 'stale-events' are common to all drivers and are tracked
by the core.

A driver facing api is provided for reporting stats required by the
"Implementation Requirements" section of the PSP Architecture
Specification. Drivers must implement these stats.

Lastly, implementations of the driver stats api for mlx5 and netdevsim
are included.

Here is the output of running the psp selftest suite and then
printing out stats with the ynl cli on system with a psp-capable CX7:

  $ ./ksft-psp-stats/drivers/net/psp.py
  TAP version 13
  1..28
  ok 1 psp.test_case # SKIP Test requires IPv4 connectivity
  ok 2 psp.data_basic_send_v0_ip6
  ok 3 psp.test_case # SKIP Test requires IPv4 connectivity
  ok 4 psp.data_basic_send_v1_ip6
  ok 5 psp.test_case # SKIP Test requires IPv4 connectivity
  ok 6 psp.data_basic_send_v2_ip6 # SKIP ('PSP version not supported', 'hdr0-aes-gmac-128')
  ok 7 psp.test_case # SKIP Test requires IPv4 connectivity
  ok 8 psp.data_basic_send_v3_ip6 # SKIP ('PSP version not supported', 'hdr0-aes-gmac-256')
  ok 9 psp.test_case # SKIP Test requires IPv4 connectivity
  ok 10 psp.data_mss_adjust_ip6
  ok 11 psp.dev_list_devices
  ok 12 psp.dev_get_device
  ok 13 psp.dev_get_device_bad
  ok 14 psp.dev_rotate
  ok 15 psp.dev_rotate_spi
  ok 16 psp.assoc_basic
  ok 17 psp.assoc_bad_dev
  ok 18 psp.assoc_sk_only_conn
  ok 19 psp.assoc_sk_only_mismatch
  ok 20 psp.assoc_sk_only_mismatch_tx
  ok 21 psp.assoc_sk_only_unconn
  ok 22 psp.assoc_version_mismatch
  ok 23 psp.assoc_twice
  ok 24 psp.data_send_bad_key
  ok 25 psp.data_send_disconnect
  ok 26 psp.data_stale_key
  ok 27 psp.removal_device_rx # XFAIL Test only works on netdevsim
  ok 28 psp.removal_device_bi # XFAIL Test only works on netdevsim
  # Totals: pass:19 fail:0 xfail:2 xpass:0 skip:7 error:0
  #
  # Responder logs (0):
  # STDERR:
  #  Set PSP enable on device 1 to 0x3
  #  Set PSP enable on device 1 to 0x0

  $ ynl --family psp --dump get-stats
  [{'dev-id': 1,
              'key-rotations': 5,
              'rx-auth-fail': 21,
              'rx-bad': 0,
              'rx-bytes': 11844,
              'rx-error': 0,
              'rx-packets': 94,
              'stale-events': 6,
              'tx-bytes': 1128456,
              'tx-error': 0,
              'tx-packets': 780}]
====================

Link: https://patch.msgid.link/20251106002608.1578518-1-daniel.zahka@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonetdevsim: implement psp device stats
Daniel Zahka [Thu, 6 Nov 2025 00:26:06 +0000 (16:26 -0800)] 
netdevsim: implement psp device stats

For now only tx/rx packets/bytes are reported. This is not compliant
with the PSP Architecture Specification.

Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20251106002608.1578518-6-daniel.zahka@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet/mlx5e: Add PSP stats support for Rx/Tx flows
Jakub Kicinski [Thu, 6 Nov 2025 00:26:05 +0000 (16:26 -0800)] 
net/mlx5e: Add PSP stats support for Rx/Tx flows

Add all statistics described under the "Implementation Requirements"
section of the PSP Architecture Specification:

Rx successfully decrypted PSP packets:
psp_rx_pkts  : Number of packets decrypted successfully
psp_rx_bytes : Number of bytes decrypted successfully

Rx PSP authentication failure statistics:
psp_rx_pkts_auth_fail  : Number of PSP packets that failed authentication
psp_rx_bytes_auth_fail : Number of PSP bytes that failed authentication

Rx PSP bad frame error statistics:
psp_rx_pkts_frame_err;
psp_rx_bytes_frame_err;

Rx PSP drop statistics:
psp_rx_pkts_drop  : Number of PSP packets dropped
psp_rx_bytes_drop : Number of PSP bytes dropped

Tx successfully encrypted PSP packets:
psp_tx_pkts  : Number of packets encrypted successfully
psp_tx_bytes : Number of bytes encrypted successfully

Tx drops:
tx_drop : Number of misc psp related drops

The above can be seen using the ynl cli:
./pyynl/cli.py  --spec netlink/specs/psp.yaml --dump get-stats

Signed-off-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20251106002608.1578518-5-daniel.zahka@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agopsp: add stats from psp spec to driver facing api
Jakub Kicinski [Thu, 6 Nov 2025 00:26:04 +0000 (16:26 -0800)] 
psp: add stats from psp spec to driver facing api

Provide a driver api for reporting device statistics required by the
"Implementation Requirements" section of the PSP Architecture
Specification. Use a warning to ensure drivers report stats required
by the spec.

Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20251106002608.1578518-4-daniel.zahka@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoselftests: drv-net: psp: add assertions on core-tracked psp dev stats
Daniel Zahka [Thu, 6 Nov 2025 00:26:03 +0000 (16:26 -0800)] 
selftests: drv-net: psp: add assertions on core-tracked psp dev stats

Add assertions to existing test cases to cover key rotations and
'stale-events'.

Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20251106002608.1578518-3-daniel.zahka@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agopsp: report basic stats from the core
Jakub Kicinski [Thu, 6 Nov 2025 00:26:02 +0000 (16:26 -0800)] 
psp: report basic stats from the core

Track and report stats common to all psp devices from the core. A
'stale-event' is when the core marks the rx state of an active
psp_assoc as incapable of authenticating psp encapsulated data.

Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20251106002608.1578518-2-daniel.zahka@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: phy: fixed_phy: shrink size of struct fixed_phy_status
Heiner Kallweit [Wed, 5 Nov 2025 22:09:17 +0000 (23:09 +0100)] 
net: phy: fixed_phy: shrink size of struct fixed_phy_status

All three members are effectively of type bool, so make this explicit
and shrink size of struct fixed_phy_status.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/9eca3d7e-fa64-4724-8fdc-f2c1a8f2ae8f@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoMerge branch 'net-phy-add-open-alliance-tc14-10base-t1s-phy-cable-diagnostic-support'
Jakub Kicinski [Sat, 8 Nov 2025 02:52:36 +0000 (18:52 -0800)] 
Merge branch 'net-phy-add-open-alliance-tc14-10base-t1s-phy-cable-diagnostic-support'

Parthiban Veerasooran says:

====================
net: phy: Add Open Alliance TC14 10Base-T1S PHY cable diagnostic support

This patch series adds Open Alliance TC14 (OATC14) 10BASE-T1S cable
diagnostic feature support to the Linux kernel PHY subsystem and enable
this feature for Microchip LAN867x Rev.D0 PHYs. These patches provide
standardized cable test functionality for 10BASE-T1S Ethernet PHYs,
allowing users to perform cable diagnostics via ethtool.

Patch Summary:
1. add OATC14 10BASE-T1S PHY cable diagnostic support
- Implements support for the OATC14 cable diagnostic feature in
  Clause 45 PHYs.
- Adds functions to start a cable test and retrieve its status,
  mapping hardware results to ethtool codes.
- Exports these functions for use by PHY drivers.
- Open Alliance TC14 10BASE-T1S Advanced Diagnostic PHY Features.
  https://opensig.org/wp-content/uploads/2025/06/OPEN_Alliance_10BASE-T1S_Advanced_PHY_features_for-automotive_Ethernet_V2.1b.pdf

2. add cable diagnostic support for LAN867x Rev.D0
- Integrates the generic OATC14 cable test functions into the
  Microchip LAN867x Rev.D0 PHY driver.
- Enables ethtool cable diagnostics for this PHY, improving
  troubleshooting and maintenance.
====================

Link: https://patch.msgid.link/20251105051213.50443-1-parthiban.veerasooran@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: phy: microchip_t1s:: add cable diagnostic support for LAN867x Rev.D0
Parthiban Veerasooran [Wed, 5 Nov 2025 05:12:13 +0000 (10:42 +0530)] 
net: phy: microchip_t1s:: add cable diagnostic support for LAN867x Rev.D0

Enable Open Alliance TC14 (OATC14) 10Base-T1S cable diagnostic feature
for Microchip LAN867x Rev.D0 PHY by implementing `cable_test_start` and
`cable_test_get_status` using the generic C45 functions. This allows the
`ethtool` utility to perform cable diagnostic tests directly on the PHY,
improving network troubleshooting and maintenance.

Signed-off-by: Parthiban Veerasooran <parthiban.veerasooran@microchip.com>
Link: https://patch.msgid.link/20251105051213.50443-3-parthiban.veerasooran@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: phy: phy-c45: add OATC14 10BASE-T1S PHY cable diagnostic support
Parthiban Veerasooran [Wed, 5 Nov 2025 05:12:12 +0000 (10:42 +0530)] 
net: phy: phy-c45: add OATC14 10BASE-T1S PHY cable diagnostic support

Add support for Open Alliance TC14 (OATC14) 10BASE-T1S PHYs cable
diagnostic feature.

This patch implements:
- genphy_c45_oatc14_cable_test_start() to initiate a cable test
- genphy_c45_oatc14_cable_test_get_status() to retrieve test results
- Helper function to map PHY cable test status to ethtool result codes
- Function declarations and exports for use by PHY drivers

This enables ethtool to report ok, open, short, and undetectable cable
conditions on OATC14 10Base-T1S PHYs.

Open Alliance TC14 10BASE-T1S Advanced Diagnostic PHY Features
Specification ref:
https://opensig.org/wp-content/uploads/2025/06/OPEN_Alliance_10BASE-T1S_Advanced_PHY_features_for-automotive_Ethernet_V2.1b.pdf

Signed-off-by: Parthiban Veerasooran <parthiban.veerasooran@microchip.com>
Link: https://patch.msgid.link/20251105051213.50443-2-parthiban.veerasooran@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: mana: Fix incorrect speed reported by debugfs
Erni Sri Satya Vennela [Wed, 5 Nov 2025 19:04:28 +0000 (11:04 -0800)] 
net: mana: Fix incorrect speed reported by debugfs

Once the netshaper is created for MANA, the current bandwidth
is reported in debugfs like this:

$ sudo ./tools/net/ynl/pyynl/cli.py \
  --spec Documentation/netlink/specs/net_shaper.yaml \
  --do set \
  --json '{"ifindex":'3',
           "handle":{ "scope": "netdev", "id":'1' },
           "bw-max": 200000000 }'
None

$ sudo cat /sys/kernel/debug/mana/1/vport0/current_speed
200

After the shaper  is deleted, it is expected to report
the maximum speed supported by the SKU. But currently it is
reporting 0, which is incorrect.

Fix this inconsistency, by resetting apc->speed to apc->max_speed
during deletion of the shaper object. This will improve
readability and debuggability.

Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/1762369468-32570-1-git-send-email-ernis@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: airoha: Add the capability to consume out-of-order DMA tx descriptors
Lorenzo Bianconi [Thu, 6 Nov 2025 12:53:23 +0000 (13:53 +0100)] 
net: airoha: Add the capability to consume out-of-order DMA tx descriptors

EN7581 and AN7583 SoCs are capable of DMA mapping non-linear tx skbs on
non-consecutive DMA descriptors. This feature is useful when multiple
flows are queued on the same hw tx queue since it allows to fully utilize
the available tx DMA descriptors and to avoid the starvation of
high-priority flow we have in the current codebase due to head-of-line
blocking introduced by low-priority flows.

Tested-by: Xuegang Lu <xuegang.lu@airoha.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20251106-airoha-tx-linked-list-v2-1-0706d4a322bd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agotcp: add net.ipv4.tcp_comp_sack_rtt_percent
Eric Dumazet [Thu, 6 Nov 2025 11:52:36 +0000 (11:52 +0000)] 
tcp: add net.ipv4.tcp_comp_sack_rtt_percent

TCP SACK compression has been added in 2018 in commit
5d9f4262b7ea ("tcp: add SACK compression").

It is working great for WAN flows (with large RTT).
Wifi in particular gets a significant boost _when_ ACK are suppressed.

Add a new sysctl so that we can tune the very conservative 5 % value
that has been used so far in this formula, so that small RTT flows
can benefit from this feature.

delay = min ( 5 % of RTT, 1 ms)

This patch adds new tcp_comp_sack_rtt_percent sysctl
to ease experiments and tuning.

Given that we cap the delay to 1ms (tcp_comp_sack_delay_ns sysctl),
set the default value to 33 %.

Quoting Neal Cardwell ( https://lore.kernel.org/netdev/CADVnQymZ1tFnEA1Q=vtECs0=Db7zHQ8=+WCQtnhHFVbEOzjVnQ@mail.gmail.com/ )

The rationale for 33% is basically to try to facilitate pipelining,
where there are always at least 3 ACKs and 3 GSO/TSO skbs per SRTT, so
that the path can maintain a budget for 3 full-sized GSO/TSO skbs "in
flight" at all times:

+ 1 skb in the qdisc waiting to be sent by the NIC next
+ 1 skb being sent by the NIC (being serialized by the NIC out onto the wire)
+ 1 skb being received and aggregated by the receiver machine's
aggregation mechanism (some combination of LRO, GRO, and sack
compression)

Note that this is basically the same magic number (3) and the same
rationales as:

(a) tcp_tso_should_defer() ensuring that we defer sending data for no
longer than cwnd/tcp_tso_win_divisor (where tcp_tso_win_divisor = 3),
and
(b) bbr_quantization_budget() ensuring that cwnd is at least 3 GSO/TSO
skbs to maintain pipelining and full throughput at low RTTs

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Link: https://patch.msgid.link/20251106115236.3450026-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoMerge branch 'tcp-clean-up-syn-ack-rto-code-and-apply-max-rto'
Jakub Kicinski [Sat, 8 Nov 2025 02:05:28 +0000 (18:05 -0800)] 
Merge branch 'tcp-clean-up-syn-ack-rto-code-and-apply-max-rto'

Kuniyuki Iwashima says:

====================
tcp: Clean up SYN+ACK RTO code and apply max RTO.

Patch 1 - 4 are misc cleanup.

Patch 5 applies max RTO to non-TFO SYN+ACK.

Patch 6 adds a test for max RTO of SYN+ACK.
====================

Link: https://patch.msgid.link/20251106003357.273403-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoselftest: packetdrill: Add max RTO test for SYN+ACK.
Kuniyuki Iwashima [Thu, 6 Nov 2025 00:32:45 +0000 (00:32 +0000)] 
selftest: packetdrill: Add max RTO test for SYN+ACK.

This script sets net.ipv4.tcp_rto_max_ms to 1000 and checks
if SYN+ACK RTO is capped at 1s for TFO and non-TFO.

Without the previous patch, the max RTO is applied to TFO
SYN+ACK only, and non-TFO SYN+ACK RTO increases exponentially.

  # selftests: net/packetdrill: tcp_rto_synack_rto_max.pkt
  # TAP version 13
  # 1..2
  # tcp_rto_synack_rto_max.pkt:46: error handling packet: timing error:
     expected outbound packet at 5.091936 sec but happened at 6.107826 sec; tolerance 0.127974 sec
  # script packet:  5.091936 S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK>
  # actual packet:  6.107826 S. 0:0(0) ack 1 win 65535 <mss 1460,nop,nop,sackOK>
  # not ok 1 ipv4
  # tcp_rto_synack_rto_max.pkt:46: error handling packet: timing error:
     expected outbound packet at 5.075901 sec but happened at 6.091841 sec; tolerance 0.127976 sec
  # script packet:  5.075901 S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK>
  # actual packet:  6.091841 S. 0:0(0) ack 1 win 65535 <mss 1460,nop,nop,sackOK>
  # not ok 2 ipv6
  # # Totals: pass:0 fail:2 xfail:0 xpass:0 skip:0 error:0
  not ok 49 selftests: net/packetdrill: tcp_rto_synack_rto_max.pkt # exit=1

With the previous patch, all SYN+ACKs are retransmitted
after 1s.

  # selftests: net/packetdrill: tcp_rto_synack_rto_max.pkt
  # TAP version 13
  # 1..2
  # ok 1 ipv4
  # ok 2 ipv6
  # # Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0
  ok 49 selftests: net/packetdrill: tcp_rto_synack_rto_max.pkt

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20251106003357.273403-7-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agotcp: Apply max RTO to non-TFO SYN+ACK.
Kuniyuki Iwashima [Thu, 6 Nov 2025 00:32:44 +0000 (00:32 +0000)] 
tcp: Apply max RTO to non-TFO SYN+ACK.

Since commit 54a378f43425 ("tcp: add the ability to control
max RTO"), TFO SYN+ACK RTO is capped by the TFO full sk's
inet_csk(sk)->icsk_rto_max.

The value is inherited from the parent listener.

Let's apply the same cap to non-TFO SYN+ACK.

Note that req->rsk_listener is always non-NULL when we call
tcp_reqsk_timeout() in reqsk_timer_handler() or tcp_check_req().

It could be NULL for SYN cookie req, but we do not use
req->timeout then.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20251106003357.273403-6-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agotcp: Remove timeout arg from reqsk_timeout().
Kuniyuki Iwashima [Thu, 6 Nov 2025 00:32:43 +0000 (00:32 +0000)] 
tcp: Remove timeout arg from reqsk_timeout().

reqsk_timeout() is always called with @timeout being TCP_RTO_MAX.

Let's remove the arg.

As a prep for the next patch, reqsk_timeout() is moved to tcp.h
and renamed to tcp_reqsk_timeout().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20251106003357.273403-5-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agotcp: Remove redundant init for req->num_timeout.
Kuniyuki Iwashima [Thu, 6 Nov 2025 00:32:42 +0000 (00:32 +0000)] 
tcp: Remove redundant init for req->num_timeout.

Commit 5903123f662e ("tcp: Use BPF timeout setting for SYN ACK
RTO") introduced req->timeout and initialised it in 3 places:

  1. reqsk_alloc() sets 0
  2. inet_reqsk_alloc() sets TCP_TIMEOUT_INIT
  3. tcp_conn_request() sets tcp_timeout_init()

1. has been always redundant as 2. overwrites it immediately.

2. was necessary for TFO SYN+ACK but is no longer needed after
commit 8ea731d4c2ce ("tcp: Make SYN ACK RTO tunable by BPF
programs with TFO").

3. was moved to reqsk_queue_hash_req() in the previous patch.

Now, we always initialise req->timeout just before scheduling
the SYN+ACK timer:

  * For non-TFO SYN+ACK : reqsk_queue_hash_req()
  * For TFO SYN+ACK     : tcp_fastopen_create_child()

Let's remove the redundant initialisation of req->timeout in
reqsk_alloc() and inet_reqsk_alloc().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20251106003357.273403-4-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agotcp: Remove timeout arg from reqsk_queue_hash_req().
Kuniyuki Iwashima [Thu, 6 Nov 2025 00:32:41 +0000 (00:32 +0000)] 
tcp: Remove timeout arg from reqsk_queue_hash_req().

inet_csk_reqsk_queue_hash_add() is no longer shared by DCCP.

We do not need to pass req->timeout down to reqsk_queue_hash_req().

Let's move tcp_timeout_init() from tcp_conn_request() to
reqsk_queue_hash_req().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20251106003357.273403-3-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agotcp: Call tcp_syn_ack_timeout() directly.
Kuniyuki Iwashima [Thu, 6 Nov 2025 00:32:40 +0000 (00:32 +0000)] 
tcp: Call tcp_syn_ack_timeout() directly.

Since DCCP has been removed, we do not need to use
request_sock_ops.syn_ack_timeout().

Let's call tcp_syn_ack_timeout() directly.

Now other function pointers of request_sock_ops are
protocol-dependent.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20251106003357.273403-2-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonetlink: specs: netdev add missing stats to qstat-get
Jakub Kicinski [Tue, 4 Nov 2025 23:23:44 +0000 (15:23 -0800)] 
netlink: specs: netdev add missing stats to qstat-get

Add missing entries in C attribute list.

Link: https://patch.msgid.link/20251104232348.1954349-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoMerge branch 'net-renesas-cleanup-usage-of-gptp-flags'
Jakub Kicinski [Fri, 7 Nov 2025 01:38:27 +0000 (17:38 -0800)] 
Merge branch 'net-renesas-cleanup-usage-of-gptp-flags'

Niklas Söderlund says:

====================
net: renesas: Cleanup usage of gPTP flags

This series aim is to prepare for future work that will enable the use
of gPTP on R-Car RAVB on Gen4. Currently RAVB have a dedicated gPTP
implementation supported on Gen2 and Gen3 (ravb_ptp.c). For Gen4 a new
implementation that is already upstream (rcar_gen4_ptp.c) and used by
other Gen4 devices such as RTSN and RSWITCH is needed.

Unfortunately the design of the Gen2/Gen3 RAVB driver where driver
specific flags to control gPTP behavior have been mimicked in RTSN and
RSWITCH. This was OK as there was no overlap between the two gPTP
implementations. Now that RAVB needs to be able to use both having to
translate between driver specific flags and common net code flags
becomes even more cumbersome as there are two sets of driver specific
flags to pick from.

This series cleans this up for all Renesas drivers using gPTP by
removing all driver specific flags and using the common flags directly.
This simplifies drivers while at the same time prepare RAVB to be
extended with Gen4 support.

Patch 1/7 is a drive by patch where RSWITCH specific define was added in
the wrong header. Patch 2/7 removes a short-cut used in RTSN and RSWITCH
that prevents extending Gen4 support to RAVB without fuss. While patch
3/7 to 7/7 rework the Renesas drivers to use the common flags instead of
driver specific ones.

There is no intentional behavior change and only a small rework in logic
in the RAVB driver. Looking at patch 3/7, 4/7 and 7/7 one can clearly
see how the code have been copied from RAVB to the later implementations
in RTSN and RSWITCH.
====================

Link: https://patch.msgid.link/20251104222420.882731-1-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: ravb: Use common defines for time stamping control
Niklas Söderlund [Tue, 4 Nov 2025 22:24:20 +0000 (23:24 +0100)] 
net: ravb: Use common defines for time stamping control

Instead of translating to/from driver specific flags for packet time
stamp control use the common flags directly. This simplifies the driver
as the translating code can be removed while at the same time making it
clear the flags are not flags written to hardware registers.

The change from a device specific bit-field track variable to the common
enum datatypes forces us to touch the ravb_rx_rcar_hwstamp() in a non
trivial way. To make this cleaner and easier to understand expand the
nested conditions.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Link: https://patch.msgid.link/20251104222420.882731-8-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: ravb: Break out Rx hardware timestamping
Niklas Söderlund [Tue, 4 Nov 2025 22:24:19 +0000 (23:24 +0100)] 
net: ravb: Break out Rx hardware timestamping

Prepare for moving away from device specific bit-fields to track how to
do hardware Rx timestamping to using net common enums by breaking out
the timestamping to a helper function. This is done to create cleaner
code and prepare for easier changes improving the hardware timestapming.

There is no functional change.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Link: https://patch.msgid.link/20251104222420.882731-7-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: rcar_gen4_ptp: Remove unused defines
Niklas Söderlund [Tue, 4 Nov 2025 22:24:18 +0000 (23:24 +0100)] 
net: rcar_gen4_ptp: Remove unused defines

The driver specific flags to control packet time stamps have all been
replaced by values from enum hwtstamp_tx_types and enum
hwtstamp_rx_filters. Remove the driver specific flags as there are no
more users.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20251104222420.882731-6-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: rtsn: Use common defines for time stamping control
Niklas Söderlund [Tue, 4 Nov 2025 22:24:17 +0000 (23:24 +0100)] 
net: rtsn: Use common defines for time stamping control

Instead of translating to/from driver specific flags for packet time
stamp control use the common flags directly. This simplifies the driver
as the translating code can be removed while at the same time making it
clear the flags are not flags written to hardware registers.

One thing to note is that the bit-wise and check in rtsn_rx() of
RCAR_GEN4_RXTSTAMP_TYPE_V2_L2_EVENT is replaced with a not set check of
HWTSTAMP_FILTER_NONE. This is okay as the bit of device specific event
replaced was set for all modes except HWTSTAMP_FILTER_NONE.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Link: https://patch.msgid.link/20251104222420.882731-5-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: rswitch: Use common defines for time stamping control
Niklas Söderlund [Tue, 4 Nov 2025 22:24:16 +0000 (23:24 +0100)] 
net: rswitch: Use common defines for time stamping control

Instead of translating to/from driver specific flags for packet time
stamp control use the common flags directly. This simplifies the driver
as the translating code can be removed while at the same time making it
clear the flags are not flags written to hardware registers.

One thing to note is that the bit-wise and check in rswitch_rx() of
RCAR_GEN4_RXTSTAMP_TYPE_V2_L2_EVENT is replaced with a not set check of
HWTSTAMP_FILTER_NONE. This is okay as the bit of device specific event
replaced was set for all modes except HWTSTAMP_FILTER_NONE.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Link: https://patch.msgid.link/20251104222420.882731-4-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: rcar_gen4_ptp: Move control fields to users
Niklas Söderlund [Tue, 4 Nov 2025 22:24:15 +0000 (23:24 +0100)] 
net: rcar_gen4_ptp: Move control fields to users

The struct rcar_gen4_ptp_private provides two fields for convenience of
its users, tstamp_tx_ctrl and tstamp_rx_ctrl. These fields are not used
by the rcar_gen4_ptp driver itself but only by the drivers using it.

Upcoming work will enable the RAVB driver currently only supporting gPTP
on pre-Gen4 SoCs to use the Gen4 implementation as well. To facilitate
this the convenience of having these fields in struct
rcar_gen4_ptp_private becomes a problem as the RAVB driver already have
it's own driver specific fields for the same thing.

Move the fields from struct rcar_gen4_ptp_private to each driver using
the Gen4 gPTP clocks own private data structures. There is no functional
change.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20251104222420.882731-3-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: rswitch: Move definition of S4 gPTP offset
Niklas Söderlund [Tue, 4 Nov 2025 22:24:14 +0000 (23:24 +0100)] 
net: rswitch: Move definition of S4 gPTP offset

The files rcar_gen4_ptp.{c,h} implements an abstraction of the gPTP
support implemented together with different other IP blocks. The first
device added which supported this was RSWITCH on R-Car S4.

While doing so the RSWITCH R-Car S4 specific offset was added to the
generic Gen4 gPTP header file. Move it to the RSWITCH driver to make it
clear it only applies to this driver.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Link: https://patch.msgid.link/20251104222420.882731-2-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonetkit: Document fast vs slowpath members via macros
Daniel Borkmann [Fri, 31 Oct 2025 21:20:59 +0000 (22:20 +0100)] 
netkit: Document fast vs slowpath members via macros

Instead of a comment, just use two cachline groups to document the intent
for members often accessed in fast or slow path.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Co-developed-by: David Wei <dw@davidwei.uk>
Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20251031212103.310683-11-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoxsk: Move NETDEV_XDP_ACT_ZC into generic header
Daniel Borkmann [Fri, 31 Oct 2025 21:20:55 +0000 (22:20 +0100)] 
xsk: Move NETDEV_XDP_ACT_ZC into generic header

Move NETDEV_XDP_ACT_ZC into xdp_sock_drv.h header such that external code
can reuse it, and rename it into more generic NETDEV_XDP_ACT_XSK.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Co-developed-by: David Wei <dw@davidwei.uk>
Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20251031212103.310683-7-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: phy: qt2025: Wait until PHY becomes ready
FUJITA Tomonori [Wed, 5 Nov 2025 13:31:26 +0000 (22:31 +0900)] 
net: phy: qt2025: Wait until PHY becomes ready

Wait until a PHY becomes ready in the probe callback by
using read_poll_timeout function.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com>
Link: https://patch.msgid.link/20251105133126.3221948-1-fujita.tomonori@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agotg3: extract GRXRINGS from .get_rxnfc
Breno Leitao [Wed, 5 Nov 2025 18:01:12 +0000 (10:01 -0800)] 
tg3: extract GRXRINGS from .get_rxnfc

Commit 84eaf4359c36 ("net: ethtool: add get_rx_ring_count callback to
optimize RX ring queries") added specific support for GRXRINGS callback,
simplifying .get_rxnfc.

Remove the handling of GRXRINGS in .get_rxnfc() by moving it to the new
.get_rx_ring_count().

Given that tg3_get_rxnfc() only handles ETHTOOL_GRXRINGS, then this
function becomes useless now, and it is removed.

This also fixes the behavior for devices without MSIX support.
Previously, the function would return -EOPNOTSUPP, but now it correctly
returns 1.

The functionality remains the same: return the current queue count
if the device is running, otherwise return the minimum of online
CPUs and TG3_RSS_MAX_NUM_QS.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20251105-grxrings_v1-v1-1-54c2caafa1fd@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoiavf: add RSS support for GTP protocol via ethtool
Aleksandr Loktionov [Thu, 30 Oct 2025 13:59:50 +0000 (14:59 +0100)] 
iavf: add RSS support for GTP protocol via ethtool

Extend the iavf driver to support Receive Side Scaling (RSS)
configuration for GTP (GPRS Tunneling Protocol) flows using ethtool.

The implementation introduces new RSS flow segment headers and hash field
definitions for various GTP encapsulations, including:

  - GTPC
  - GTPU (IP, Extension Header, Uplink, Downlink)
  - TEID-based hashing

The ethtool interface is updated to parse and apply these new flow types
and hash fields, enabling fine-grained traffic distribution for GTP-based
mobile workloads.

This enhancement improves performance and scalability for virtualized
network functions (VNFs) and user plane functions (UPFs) in 5G and LTE
deployments.

Reviewed-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
5 weeks agoice: Extend PTYPE bitmap coverage for GTP encapsulated flows
Przemek Kitszel [Thu, 30 Oct 2025 13:59:49 +0000 (14:59 +0100)] 
ice: Extend PTYPE bitmap coverage for GTP encapsulated flows

Consolidate updates to the Protocol Type (PTYPE) bitmap definitions
across multiple flow types in the Intel ICE driver to support GTP
(GPRS Tunneling Protocol) encapsulated traffic.

Enable improved Receive Side Scaling (RSS) configuration for both user
and control plane GTP flows.

Cover a wide range of protocol and encapsulation scenarios, including:
 - MAC OFOS and IL
 - IPv4 and IPv6 (OFOS, IL, ALL, no-L4)
 - TCP, SCTP, ICMP
 - GRE OF
 - GTPC (control plane)

Expand the PTYPE bitmap entries to improve classification and
distribution of GTP traffic across multiple queues, enhancing
performance and scalability in mobile network environments.

Co-developed-by: Dan Nowlin <dan.nowlin@intel.com>
Signed-off-by: Dan Nowlin <dan.nowlin@intel.com>
Co-developed-by: Qi Zhang <qi.z.zhang@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Co-developed-by: Jie Wang <jie1x.wang@intel.com>
Signed-off-by: Jie Wang <jie1x.wang@intel.com>
Co-developed-by: Junfeng Guo <junfeng.guo@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
5 weeks agoice: improve TCAM priority handling for RSS profiles
Aleksandr Loktionov [Thu, 30 Oct 2025 13:59:48 +0000 (14:59 +0100)] 
ice: improve TCAM priority handling for RSS profiles

Enhance TCAM priority logic to avoid conflicts between RSS profiles
with overlapping PTGs and attributes.

Track used PTG and attribute combinations.
Ensure higher-priority profiles override lower ones.
Add helper for setting TCAM flags and masks.

Ensure RSS rule consistency and prevent unintended matches.

Co-developed-by: Dan Nowlin <dan.nowlin@intel.com>
Signed-off-by: Dan Nowlin <dan.nowlin@intel.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
5 weeks agoice: implement GTP RSS context tracking and configuration
Aleksandr Loktionov [Thu, 30 Oct 2025 13:59:47 +0000 (14:59 +0100)] 
ice: implement GTP RSS context tracking and configuration

This commit implements the core RSS context management and configuration
logic for GTP (GTPU) protocol support in VF RSS operations.

Key implementation features:
- GTPU hash context management with pre/post processing functions
- Context index calculation and mapping for different GTPU scenarios
- Integration with main RSS configuration flow via wrapper functions
- Support for IPv4/IPv6 GTPU RSS configurations
- Rollback mechanism for handling RSS rule conflicts
- Hash context reset and cleanup functionality

The implementation provides comprehensive GTPU RSS support by:
1. Adding ice_add_rss_cfg_pre_gtpu() for preprocessing GTPU contexts
2. Adding ice_add_rss_cfg_post_gtpu() for postprocessing configurations
3. Adding ice_calc_gtpu_ctx_idx() for context index calculation
4. Integrating GTPU logic into ice_add_rss_cfg_wrap() and
   ice_rem_rss_cfg_wrap()
5. Supporting context tracking in VF hash_ctx structures

This completes the GTP RSS infrastructure enabling VFs to configure
RSS hashing on GTP-encapsulated traffic.

Co-developed-by: Dan Nowlin <dan.nowlin@intel.com>
Signed-off-by: Dan Nowlin <dan.nowlin@intel.com>
Co-developed-by: Jie Wang <jie1x.wang@intel.com>
Signed-off-by: Jie Wang <jie1x.wang@intel.com>
Co-developed-by: Junfeng Guo <junfeng.guo@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Co-developed-by: Qi Zhang <qi.z.zhang@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Co-developed-by: Ting Xu <ting.xu@intel.com>
Signed-off-by: Ting Xu <ting.xu@intel.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
5 weeks agoice: add virtchnl definitions and static data for GTP RSS
Aleksandr Loktionov [Thu, 30 Oct 2025 13:59:46 +0000 (14:59 +0100)] 
ice: add virtchnl definitions and static data for GTP RSS

Add virtchnl protocol header and field definitions for advanced RSS
configuration including GTPC, GTPU, L2TPv2, ECPRI, PPP, GRE, and IP
fragment headers.

- Define new virtchnl protocol header types
- Add RSS field selectors for tunnel protocols
- Extend static mapping arrays for protocol field matching
- Add L2TPv2 session ID and length+session ID field support

This provides the foundational definitions needed for VF RSS
configuration of tunnel protocols.

Co-developed-by: Dan Nowlin <dan.nowlin@intel.com>
Signed-off-by: Dan Nowlin <dan.nowlin@intel.com>
Co-developed-by: Jie Wang <jie1x.wang@intel.com>
Signed-off-by: Jie Wang <jie1x.wang@intel.com>
Co-developed-by: Junfeng Guo <junfeng.guo@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Co-developed-by: Qi Zhang <qi.z.zhang@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Co-developed-by: Ting Xu <ting.xu@intel.com>
Signed-off-by: Ting Xu <ting.xu@intel.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
5 weeks agoice: add flow parsing for GTP and new protocol field support
Aleksandr Loktionov [Thu, 30 Oct 2025 13:59:45 +0000 (14:59 +0100)] 
ice: add flow parsing for GTP and new protocol field support

Introduce new protocol header types and field sizes to support GTPU, GTPC
tunneling protocols.

 - Add field size macros for GTP TEID, QFI, and other headers
 - Extend ice_flow_field_info and enum definitions
 - Update hash macros for new protocols
 - Add support for IPv6 prefix matching and fragment headers

This patch lays the groundwork for enhanced RSS and flow classification
capabilities.

Co-developed-by: Dan Nowlin <dan.nowlin@intel.com>
Signed-off-by: Dan Nowlin <dan.nowlin@intel.com>
Co-developed-by: Junfeng Guo <junfeng.guo@intel.com>
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Co-developed-by: Ting Xu <ting.xu@intel.com>
Signed-off-by: Ting Xu <ting.xu@intel.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
5 weeks agoMerge branch 'net-dsa-lantiq_gswip-add-support-for-maxlinear-gsw1xx-switch-family'
Jakub Kicinski [Thu, 6 Nov 2025 22:16:20 +0000 (14:16 -0800)] 
Merge branch 'net-dsa-lantiq_gswip-add-support-for-maxlinear-gsw1xx-switch-family'

Daniel Golle says:

====================
net: dsa: lantiq_gswip: Add support for MaxLinear GSW1xx switch family

This patch series extends the existing lantiq_gswip DSA driver to
support the MaxLinear GSW1xx family of dedicated Ethernet switch ICs.
These switches are based on the same IP as the Lantiq/Intel GSWIP found
in VR9 and xRX MIPS router SoCs which are currently supported by the
lantiq_gswip driver, but they are dedicated ICs connected via MDIO
rather than built-in components of a SoC accessible via memory-mapped
I/O.

The series includes several improvements and refactoring to implement
support for GSW1xx switch ICs by reusing the existing lantiq_gswip
driver.

The GSW1xx family includes several variants:
 - GSW120: 4 ports, 2 PHYs, RGMII & SGMII/2500Base-X
 - GSW125: 4 ports, 2 PHYs, RGMII & SGMII/2500Base-X, industrial temperature
 - GSW140: 6 ports, 4 PHYs, RGMII & SGMII/2500Base-X
 - GSW141: 6 ports, 4 PHYs, RGMII & SGMII
 - GSW145: 6 ports, 4 PHYs, RGMII & SGMII/2500Base-X, industrial temperature

Key features implemented:
 - MDIO-based register access using regmap
 - Support for SGMII/1000Base-X/2500Base-X SerDes interfaces
 - Configurable RGMII delays via device tree properties
 - Configurable RMII clock direction
 - Energy Efficient Ethernet (EEE) support
 - enabling/disabling learning
====================

Link: https://patch.msgid.link/cover.1762170107.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: dsa: add driver for MaxLinear GSW1xx switch family
Daniel Golle [Mon, 3 Nov 2025 12:20:28 +0000 (12:20 +0000)] 
net: dsa: add driver for MaxLinear GSW1xx switch family

Add driver for the MaxLinear GSW1xx family of Ethernet switch ICs which
are based on the same IP as the Lantiq/Intel GSWIP found in the Lantiq VR9
and Intel GRX MIPS router SoCs. The main difference is that instead of
using memory-mapped I/O to communicate with the host CPU these ICs are
connected via MDIO (or SPI, which isn't supported by this driver).
Implement the regmap API to access the switch registers over MDIO to allow
reusing lantiq_gswip_common for all core functionality.

The GSW1xx also comes with a SerDes port capable of 1000Base-X, SGMII and
2500Base-X, which can either be used to connect an external PHY or SFP
cage, or as the CPU port. Support for the SerDes interface is implemented
in this driver using the phylink_pcs interface.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Tested-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
Link: https://patch.msgid.link/b567ec1b4beb08fd37abf18b280c56d5d8253c26.1762170107.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: dsa: add tagging driver for MaxLinear GSW1xx switch family
Daniel Golle [Mon, 3 Nov 2025 12:20:20 +0000 (12:20 +0000)] 
net: dsa: add tagging driver for MaxLinear GSW1xx switch family

Add support for a new DSA tagging protocol driver for the MaxLinear
GSW1xx switch family. The GSW1xx switches use a proprietary 8-byte
special tag inserted between the source MAC address and the EtherType
field to indicate the source and destination ports for frames
traversing the CPU port.

Implement the tag handling logic to insert the special tag on transmit
and parse it on receive.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
Tested-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
Link: https://patch.msgid.link/0e973ebfd9433c30c96f50670da9e9449a0d98f2.1762170107.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agodt-bindings: net: dsa: lantiq,gswip: add support for MaxLinear GSW1xx switches
Daniel Golle [Mon, 3 Nov 2025 12:20:12 +0000 (12:20 +0000)] 
dt-bindings: net: dsa: lantiq,gswip: add support for MaxLinear GSW1xx switches

Extend the Lantiq GSWIP device tree binding to also cover MaxLinear
GSW1xx switches which are based on the same hardware IP but connected
via MDIO instead of being memory-mapped.

Add compatible strings for MaxLinear GSW120, GSW125, GSW140, GSW141,
and GSW145 switches and adjust the schema to handle the different
connection methods with conditional properties.

Add MaxLinear GSW125 example showing MDIO-connected configuration.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/fc96f1dedb2b418a63e69960356dde7f6eb86424.1762170107.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>