Kevin Hao [Tue, 10 Mar 2026 10:12:09 +0000 (18:12 +0800)]
net: macb: Clean up the .usrio settings in macb_config instances
All instances of macb_config currently have the .usrio set, but most of
them use &macb_default_usrio. In fact, there is no need to duplicate
this across all macb_config instances. Remove the .usrio setting from
instances that use &macb_default_usrio, and ensure that the default is
selected at runtime when no other value is explicitly set.
Kevin Hao [Tue, 10 Mar 2026 10:12:08 +0000 (18:12 +0800)]
net: macb: Clean up the .init settings in macb_config instances
All instances of macb_config currently have the .init field set, but most
of them use macb_init(). In fact, there is no need to duplicate this
across all macb_config instances. Introduce a new macb_init() function
that executes the specific .init if it is set; otherwise, it runs a
default initialization function.
Kevin Hao [Tue, 10 Mar 2026 10:12:07 +0000 (18:12 +0800)]
net: macb: Clean up the .clk_init setting in the macb_config instances
All instances of macb_config currently have .clk_init set, but most of
them use macb_clk_init(). In fact, there is no need to duplicate this
across all macb_config instances. Introduce a new macb_clk_init()
function that executes the specific .clk_init if it is set; otherwise,
it runs the default clock initialization function.
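The "use the specific callback if set, otherwise fall back to the default" pattern shared by these three macb cleanups can be sketched in plain C. This is only an illustration: struct demo_config and every demo_* function are hypothetical stand-ins, not the macb driver's actual types or code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-SoC configuration entry. */
struct demo_config {
	int (*clk_init)(int id);	/* optional override; NULL means "use the default" */
};

/* Default behaviour shared by most configurations (assumed for the sketch). */
static int demo_default_clk_init(int id)
{
	return id;
}

/* Example of a SoC-specific override (assumed for the sketch). */
int demo_custom_clk_init(int id)
{
	return id + 100;
}

/* Wrapper: run the override when one is set, otherwise the default. */
int demo_clk_init(const struct demo_config *cfg, int id)
{
	assert(cfg != NULL);

	if (cfg->clk_init)
		return cfg->clk_init(id);

	return demo_default_clk_init(id);
}
```

With this shape, a configuration that is happy with the default simply leaves the member unset, and only the exceptions need to name a callback.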
Daniel Golle [Tue, 10 Mar 2026 18:10:32 +0000 (18:10 +0000)]
selftests: net: local_termination: test link-local protocols
Add tests to local_termination.sh to verify that link-local frames
arrive. On some switches the DSA driver uses bridges to connect the
user ports to their CPU ports. More "intelligent" switches typically
don't forward link-local frames, but may trap them to an internal
microcontroller. The driver may have to change trapping rules, so
link-local frames end up on the DSA CPU ports instead of being
silently dropped or trapped to the internal microcontroller of the
switch.
Add two tests which help to validate this has been done correctly:
- Link-local STP BPDU should arrive at the Linux netdev when the
bridge has STP disabled (BR_NO_STP), in which case the bridge
forwards them rather than consuming them in the control plane
- Link-local LLDP should arrive at standalone ports (and the test
should be skipped on bridged ports similar to how it is done
for the IEEE1588v2/PTP tests)
Soichiro Ueda [Tue, 10 Mar 2026 07:28:31 +0000 (16:28 +0900)]
selftests: af_unix: validate SO_PEEK_OFF advancement and reset
Extend the so_peek_off selftest to ensure the socket peek offset is handled
correctly after both MSG_PEEK and actual data consumption.
Verify that the peek offset advances by the same amount as the number of
bytes read when performing a read with MSG_PEEK.
After exercising SO_PEEK_OFF via MSG_PEEK, drain the receive queue with a
non-peek recv() and verify that it can receive all the content in the
buffer and SO_PEEK_OFF returns back to 0.
The verification after actual data consumption was suggested by Miao Wang
when the original so_peek_off selftest was introduced.
====================
net: stmmac: start to shrink memory usage
Start shrinking stmmac's memory usage by avoiding using "int" for
members that are only used for 0/1 (boolean) values, or values that
can't be larger than 255.
In addition, as struct stmmac_dma_cfg is approximately a cache line,
shrinks below a cache line as a result of this patch set, and is
required, there is no point separately allocating this from
struct plat_stmmacenet_data. Embed it into the end of this struct
and set the existing pointer to avoid large wide-spread changes.
Lastly, add documentation for struct stmmac_dma_cfg, and document
the stmmac clocks as best we can given the driver history.
====================
Add documentation covering stmmac_clk, pclk, clk_ptp_ref and clk_tx_i
in the hope that this will help understand what each of these clocks
is for.
There is confusion around stmmac_clk and pclk which can't be easily
resolved today as the Imagination Technologies Pistachio board that
pclk was introduced for has no public documentation and is likely now
obsolete. So the origins of pclk are lost to the winds of time.
net: stmmac: use u8 for host_dma_width and similar struct members
We aren't going to see >= 256-bit address busses soon, so reduce
host_dma_width and associated other struct members that initialise
this from u32 to u8.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Acked-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com> # qcom-ethqos
Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Link: https://patch.msgid.link/E1vzX5P-0000000CVsK-0iwX@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: stmmac: use u8 for ?x_queues_to_use and number_?x_queues
The maximum number of queues is a compile time constant of only eight.
This makes using a 32-bit quantity wasteful. Instead, use u8 for
these and their associated variables.
When reading the DT properties, saturate at U8_MAX. Provided the core
provides DMA capabilities to describe the number of queues, this will
be capped by stmmac_hw_init() with a warning.
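A minimal sketch of why saturation matters when narrowing a 32-bit property value into a u8. The helper names are made up for illustration and the second function only mimics the later capping step; neither is the driver's code.

```c
#include <assert.h>
#include <stdint.h>

/* Narrow a 32-bit value to 8 bits, clamping instead of wrapping around. */
uint8_t saturate_to_u8(uint32_t value)
{
	return value > UINT8_MAX ? UINT8_MAX : (uint8_t)value;
}

/* Later stage (assumed): cap the request to what the hardware reports. */
uint8_t cap_queue_count(uint8_t requested, uint8_t hw_max)
{
	assert(hw_max > 0);

	return requested > hw_max ? hw_max : requested;
}
```

A plain cast would turn a bogus property value of 256 into 0, whereas saturating yields 255, which the later cap can still recognise as too large and reduce.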
net: stmmac: reorder structs to reduce memory consumption
Reorder some of the stmmac structures to allow them to pack better,
thereby using less memory. On aarch64, sizeof(struct stmmac_priv)
was 880, and with this change becomes 816, saving 64 bytes, which
is an 8% saving.
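The effect of member ordering on structure size can be seen with two toy structs. These are illustrative only and unrelated to the real stmmac layouts; exact sizes depend on the ABI (on common 64-bit ABIs such as aarch64 and x86-64 they come out as 24 and 16 bytes).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Small members interleaved with large ones force padding holes. */
struct demo_before {
	uint8_t flag_a;
	uint64_t counter;
	uint8_t flag_b;
	uint32_t width;
};

/* Same members, ordered from largest to smallest alignment. */
struct demo_after {
	uint64_t counter;
	uint32_t width;
	uint8_t flag_a;
	uint8_t flag_b;
};

/* Reordering must never make the struct larger. */
static_assert(sizeof(struct demo_after) <= sizeof(struct demo_before),
	      "reordered struct should not grow");

/* Number of bytes recovered by the reordering on this ABI. */
size_t demo_bytes_saved(void)
{
	return sizeof(struct demo_before) - sizeof(struct demo_after);
}
```

On Linux, the pahole tool reports such holes for real kernel structures.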
net: stmmac: convert plat_stmmacenet_data booleans to type bool
Convert members of struct plat_stmmacenet_data that are booleans to
type 'bool' and ensure their initialisers are true/false. Move the
has_xxx for the GMAC cores together, and move the COE members to the
end of the list of bool to avoid unused holes in the struct.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Link: https://patch.msgid.link/E1vzX59-0000000CVs2-3MHc@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: stmmac: provide plat_dat->dma_cfg in stmmac_plat_dat_alloc()
plat_dat->dma_cfg is unconditionally required for the operation of the
driver, so it would make sense to allocate it along with the plat_dat.
On Arm64, sizeof(*plat_dat) has recently shrunk from 880 to 816 bytes
and sizeof(*plat_dat->dma_cfg) has shrunk from 32 to 20 bytes.
Given that dma_cfg is required, and it is now less than a cache line,
it doesn't make sense to allocate this separately, so place it at the
end of struct plat_stmmacenet_data, and set plat_dat->dma_cfg to point
at that to avoid mass changes.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Link: https://patch.msgid.link/E1vzX54-0000000CVrw-2jfu@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
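The "embed the storage at the end and keep the existing pointer" approach described for dma_cfg can be sketched as follows. The struct and member names are hypothetical (the real layout of struct plat_stmmacenet_data is not reproduced here), and calloc() stands in for the kernel allocator.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, much reduced stand-in for the DMA configuration. */
struct demo_dma_cfg {
	int pbl;
	unsigned char fixed_burst;
};

struct demo_plat_data {
	int bus_id;
	/* Existing pointer, kept so that current users need no changes. */
	struct demo_dma_cfg *dma_cfg;
	/* Storage embedded at the end: one allocation instead of two. */
	struct demo_dma_cfg dma_cfg_storage;
};

/* Allocate the platform data and point dma_cfg at the embedded storage. */
struct demo_plat_data *demo_plat_alloc(void)
{
	struct demo_plat_data *plat = calloc(1, sizeof(*plat));

	if (!plat)
		return NULL;

	plat->dma_cfg = &plat->dma_cfg_storage;
	return plat;
}

/* Existing-style user: goes through the pointer, unaware of the embedding. */
void demo_set_pbl(struct demo_plat_data *plat, int pbl)
{
	assert(plat != NULL && plat->dma_cfg != NULL);

	plat->dma_cfg->pbl = pbl;
}
```

Callers that dereference plat->dma_cfg keep working unchanged, and a single free() releases both pieces.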
Johan Hovold [Mon, 9 Mar 2026 08:26:41 +0000 (09:26 +0100)]
net: mdio: mvusb: drop redundant device reference
Driver core holds a reference to the USB interface and its parent USB
device while the interface is bound to a driver and there is no need to
take additional references unless the structures are needed after
disconnect.
Drop the redundant device reference to reduce cargo culting, make it
easier to spot drivers where an extra reference is needed, and reduce
the risk of memory leaks when drivers fail to release it.
====================
amd-xgbe: Improve power management for S0i3
Improve the amd-xgbe power management handling to allow AMD platforms to
reach the deepest suspend state (S0i3) when modern standby is used.
The first patch cleans up the xgbe_powerdown() and xgbe_powerup()
helpers by removing an unused caller distinction and aligning the
ordering of operations with xgbe_stop().
The second patch adds proper PCI power management operations, following
the standard PCI PM model, so that the device can be cleanly put into
D3 and resumed back to D0. Without this, the amd_pmc driver reports:
"Last suspend didn't reach deepest state"
when the amd-xgbe driver is enabled.
These changes have been tested on AMD platforms using S0i3 modern
standby.
====================
Raju Rangoju [Sun, 8 Mar 2026 09:28:51 +0000 (14:58 +0530)]
amd-xgbe: add PCI power management for S0i3 support
The current suspend/resume implementation does not correctly handle PCI
device power state transitions, which prevents AMD platforms from
reaching the deepest suspend state (S0i3) when the amd-xgbe driver is
enabled.
In particular, the amd_pmc driver reports:
"Last suspend didn't reach deepest state"
when this device is present.
Implement proper PCI power management operations following the standard
PCI PM model so that the device can be cleanly powered down and resumed.
Suspend path:
- Power down the network interface
- Put the PHY into low-power mode
- Disable bus mastering to prevent DMA activity
- Save PCI configuration space
- Disable the PCI device
- Disable wake from D3 (S0i3 does not require Wake-on-LAN)
- Set the device to D3hot
Resume path:
- Restore the PCI power state to D0
- Restore PCI configuration space
- Enable the PCI device
- Re-enable bus mastering
- Re-enable device interrupts
- Clear the PHY low-power mode
- Power up the network interface
This allows systems using amd-xgbe to reach the deepest suspend state
when entering modern standby (S0i3).
Raju Rangoju [Sun, 8 Mar 2026 09:28:50 +0000 (14:58 +0530)]
amd-xgbe: Simplify powerdown/powerup paths
The caller parameter in xgbe_powerdown() and xgbe_powerup() was intended
to differentiate between driver and ioctl contexts, but the only
remaining usage is from the driver suspend/resume path.
Simplify this by:
- Removing the unused XGMAC_DRIVER_CONTEXT and XGMAC_IOCTL_CONTEXT
macros
- Dropping the now-unused caller parameter
- Reordering operations in xgbe_powerdown() to disable NAPI before
stopping TX/RX, matching the order used in xgbe_stop()
This makes the powerdown/powerup paths easier to follow and keeps the
ordering consistent with the rest of the driver.
Jiayuan Chen [Mon, 9 Mar 2026 12:39:16 +0000 (20:39 +0800)]
net: sched: cls_u32: Avoid memcpy() false-positive warning in u32_init_knode()
Syzbot reported a warning in u32_init_knode() [1].
Similar to commit 7cba18332e36 ("net: sched: cls_u32: Avoid memcpy()
false-positive warning") which addressed the same issue in u32_change(),
use unsafe_memcpy() in u32_init_knode() to work around the compiler's
inability to see into composite flexible array structs.
This silences the false-positive reported by syzbot:
memcpy: detected field-spanning write (size 32) of single field
"&new->sel" at net/sched/cls_u32.c:855 (size 16)
Since the memory is correctly allocated with kzalloc_flex() using
s->nkeys, this is purely a false positive and does not need a Fixes tag.
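The sizes in the warning are consistent with a 16-byte header followed by one 16-byte key. A user-space sketch of that situation, using simplified stand-in structs (not the real cls_u32 definitions) and calloc() in place of the kernel allocator:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins: a 16-byte key and a 16-byte selector header. */
struct demo_key {
	uint32_t mask, val, off, offmask;
};

struct demo_sel {
	uint8_t nkeys;
	uint8_t pad[15];
	struct demo_key keys[];	/* flexible array member */
};

/* Bytes needed for the header plus nkeys trailing keys. */
size_t demo_sel_size(uint8_t nkeys)
{
	return sizeof(struct demo_sel) + (size_t)nkeys * sizeof(struct demo_key);
}

/* Clone a selector: header and keys are copied with a single memcpy(). */
struct demo_sel *demo_sel_clone(const struct demo_sel *src)
{
	struct demo_sel *dst;
	size_t size;

	assert(src != NULL);

	size = demo_sel_size(src->nkeys);
	dst = calloc(1, size);
	if (!dst)
		return NULL;

	/*
	 * The copy is in bounds because the allocation used the same size,
	 * but a checker that only knows sizeof(*dst) sees a 16-byte field.
	 */
	memcpy(dst, src, size);
	return dst;
}
```

The copy is valid because the destination was allocated for the header plus keys; the checker only objects because the static size of the header type does not include the trailing keys.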
Eric Joyner [Fri, 6 Mar 2026 21:56:34 +0000 (13:56 -0800)]
ionic: Report additional media types from firmware
The device firmware supports reporting more media types than what was
there in the past, so map these new media types to existing ethtool
bits, which appears to be what other drivers do for media types that
match speeds but not physical spec.
And while here, make a very small cleanup in ionic_get_link_ksettings()
to remove some unnecessary code duplication.
Jakub Kicinski [Wed, 11 Mar 2026 02:33:07 +0000 (19:33 -0700)]
Merge branch 'tools-ynl-policy-query-support'
Jakub Kicinski says:
====================
tools: ynl: policy query support
Improve the Netlink policy support in YNL. This series grew out of
improvements to policy checking, when writing selftests I realized
that instead of doing all the policy parsing in the test we're
better off making it part of YNL itself.
Patch 1 adds pad handling, apparently we never hit pad with commonly
used families. nlctrl policy dumps use pad more frequently.
Patch 2 is a trivial refactor.
Patch 3 pays off some technical debt in terms of documentation.
The YnlFamily class is growing in size and it's quite hard to
find its members. So document it a little bit.
Patch 4 is the main dish, the implementation of get_policy(op)
in YnlFamily.
Patch 5 plugs the new functionality into the CLI.
====================
Jakub Kicinski [Tue, 10 Mar 2026 00:53:36 +0000 (17:53 -0700)]
tools: ynl: add Python API for easier access to policies
The format of Netlink policy dump is a bit curious with messages
in the same dump carrying both attrs and mapping info. Plus each
message carries a single piece of the puzzle the caller must then
reassemble.
I need to do this reassembly for a test, but I think it's generally
useful. So let's add proper support to YnlFamily to return more
user-friendly representation. See the various docs in the patch
for more details.
Rosen Penev [Sat, 7 Mar 2026 03:17:09 +0000 (19:17 -0800)]
net: mvneta: support EPROBE_DEFER when reading MAC address
If nvmem loads after the ethernet driver, MAC address assignments will
not take effect. of_get_ethdev_address() returns -EPROBE_DEFER in such a
case, so we need to handle that to avoid falling back to
eth_hw_addr_random().
Add extra goto section to just free stats as they are allocated right
above.
selftests: drv-net: rss: Add retries to test_rss_key_indir to reduce flakes
The test generates 16 flows, and verifies that traffic is distributed
across two queues via the NIC's RSS indirection table. The likelihood of the
flows skewing to a single queue is high, so we retry sending traffic up to
3 times.
Alternatively, we could increase the number of generated flows. But
debug kernels may struggle to ramp this many flows.
During manual testing, the test passed for 10,000 consecutive runs.
inet: add ip_local_port_step_width sysctl to improve port usage distribution
With the current port selection algorithm, ports that follow a reserved
port range or a port that has been in use for a long time are used more
often than others [1]. This causes an uneven port usage distribution.
This combines with cloud
environments blocking connections between the application server and the
database server if there was a previous connection with the same source
port, leading to connectivity problems between applications on cloud
environments.
The real issue here is that these firewalls cannot cope with
standards-compliant port reuse. This is a workaround for such situations
and an improvement on the distribution of ports selected.
The proposed solution is to implement a variant of RFC 6056 Algorithm 5.
The step size is selected randomly on every connect() call ensuring it
is a coprime with respect to the size of the range of ports we want to
scan. This way, we can ensure that all ports within the range are
scanned before returning an error. To enable this algorithm, the user
must configure the new sysctl option "net.ipv4.ip_local_port_step_width".
In addition, on graphs generated we can observe that the distribution of
source ports is more even with the proposed approach. [2]
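Why a step that is coprime with the range size guarantees that every port is visited can be checked with a short sketch. The way the step is derived from a random value below is an assumption made for illustration and is not taken from the patch; the range is limited to 64 entries only to keep the bookkeeping in one bitmap.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t demo_gcd(uint32_t a, uint32_t b)
{
	while (b) {
		uint32_t t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/* Turn an arbitrary random value into a step in [1, range) coprime with range. */
uint32_t demo_pick_step(uint32_t random_value, uint32_t range)
{
	uint32_t step;

	assert(range > 0);

	if (range == 1)
		return 1;

	step = random_value % range;
	if (step == 0)
		step = 1;

	/* Terminates: range - 1 is always coprime with range. */
	while (demo_gcd(step, range) != 1) {
		step++;
		if (step >= range)
			step = 1;
	}
	return step;
}

/* Return 1 if `range` steps from `start` visit every offset exactly once. */
int demo_walk_covers_range(uint32_t start, uint32_t step, uint32_t range)
{
	uint64_t seen = 0;
	uint32_t i, off;

	assert(range > 0 && range <= 64);

	off = start % range;
	for (i = 0; i < range; i++) {
		if (seen & (1ULL << off))
			return 0;	/* revisited an offset before covering all */
		seen |= 1ULL << off;
		off = (off + step) % range;
	}
	return 1;
}
```

For a range of 10, a step of 7 walks through all ten offsets, while a step of 4 cycles through only five of them and would miss free ports.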
This set addresses a few rds selftests clean ups and bugs encountered
when running in the ksft framework. The first patch is a clean up
patch that addresses pylint warnings, but otherwise no functional
changes. The next patch moves the test time out to a ksft settings
file so that the time out is set appropriately. And lastly we fix a
tcpdump segfault caused by a deprecated os.fork() call.
====================
The os.fork() call creates extra complexity because it forks the entire
process including the python interpreter. ip() then calls cmd() which
creates a subprocess.Popen. We can avoid the extra layering by simply
calling subprocess.Popen directly. Track the process handles directly
and terminate them at cleanup rather than relying on killall. Further
tcpdump's -Z flag attempts to change savefile ownership, which is not
supported by the 9p protocol. Fix this by writing pcap captures to
"/tmp" during the test and move them to the log directory after tcpdump
exits.
rds/run.sh sets a timer of 400s when calling test.py. However when
tests are run through ksft, a default 45s timer is applied. Fix this
by adding a ksft timeout in tools/testing/selftests/net/rds/settings
This series adds self tests to test the registers, the
msix interrupts, the tlv, and the firmware mailbox.
This series assumes that the
[PATCH net-next 0/2] Add debugfs hooks [1]
is present.
When the self tests are run with ethtool -t:
ethtool -t eth0
The test result is PASS
The test extra info:
Register test (offline) 0
MSI-X Interrupt test (offline) 0
FW mailbox test (on/offline) 0
====================
The mailbox self test ensures the interface to and from
the firmware is healthy by sending a test message and
fielding the response from the firmware.
This patch uses the new completion API [1][2] that allocates a
completion structure, binds the completion to the TEST
message, and uses a new FW parsing routine that wraps the
completion processing around the TLV parser.
The TLV (Type-Length-Value) self test uses a known set of data to create a
TLV message. These routines support the MBX self test by creating
the test messages and parsing the response message coming back
from the firmware.
This function is meant to test the global interrupt registers and the
PCIe IP MSI-X functionality. It essentially goes through and tests
various combinations of the set, clear, and mask bits in order to
verify the behavior is as we expect it to be from the driver.
dev_open() already is exported, but drivers which use the netdev
instance lock need to use netif_open() instead. netif_close() is
also already exported [1] so this completes the pairing.
This export is required for the following fbnic self tests to
avoid calling ndo_stop() and ndo_open() in favor of the
more appropriate netif_open() and netif_close() that notifies
any listeners that the interface went down to test and is now
coming back up.
net: mana: hardening: Validate doorbell ID from GDMA_REGISTER_DEVICE response
As a part of MANA hardening for CVM, add validation for the doorbell
ID (db_id) received from hardware in the GDMA_REGISTER_DEVICE response
to prevent out-of-bounds memory access when calculating the doorbell
page address.
In mana_gd_ring_doorbell(), the doorbell page address is calculated as:
addr = db_page_base + db_page_size * db_index
= (bar0_va + db_page_off) + db_page_size * db_index
The hardware could return values that cause this address to fall outside
the BAR0 MMIO region. In Confidential VM environments, hardware responses
cannot be fully trusted.
Add the following validations:
- Store the BAR0 size (bar0_size) in gdma_context during probe.
- Validate the doorbell page offset (db_page_off) read from device
registers does not exceed bar0_size during initialization, converting
mana_gd_init_registers() to return an error code.
- Validate db_id from GDMA_REGISTER_DEVICE response against the
maximum number of doorbell pages that fit within BAR0.
net: airoha: Move GDM forward port configuration in ndo_open/ndo_stop callbacks
This change allows the GDM forward port configuration to be set to
FE_PSE_PORT_DROP when stopping the network device. The hardware design
requires packet forwarding to be stopped when the interface is put down.
Moreover, PPE firmware requires PPE1 to be used for GDM3 or GDM4.
Jakub Kicinski [Tue, 10 Mar 2026 02:45:31 +0000 (19:45 -0700)]
Merge branch 'net-stmmac-further-ptp-cleanups'
Russell King says:
====================
net: stmmac: further ptp cleanups
The first uses a local variable when setting n_ext_ts which is a minor
simplification of the code. The second removes the now unnecessary
"available" flag for the PPS outputs.
====================
priv->pps[].available is set in stmmac_ptp_register() for all PPS
outputs reported by hardware up to STMMAC_PPS_MAX.
Since we now set priv->ptp_clock_ops.n_per_out to the number of PPS
outputs that both the hardware and driver can support to prevent
array overflow in stmmac_enable(), this makes priv->pps[].available
redundant. Remove this struct member.
Eric Dumazet [Sun, 8 Mar 2026 12:23:02 +0000 (12:23 +0000)]
tcp: move tp->chrono_type next tp->chrono_stat[]
chrono_type is currently in tcp_sock_read_txrx group, which
is supposed to hold read-mostly fields.
But chrono_type is mostly written in tx path, it should
be moved to tcp_sock_write_tx group, close to other
chrono fields (chrono_stat[], chrono_start).
Note this adds holes, but data locality is far more important.
Use a full u8 for the time being, compiler can generate
more efficient code.
Eric Dumazet [Sat, 7 Mar 2026 13:36:01 +0000 (13:36 +0000)]
net/sched: refine indirect call mitigation in tc_wrapper.h
Some modern cpus disable X86_FEATURE_RETPOLINE feature,
even if a direct call can still be beneficial.
Even when IBRS is present, an indirect call is more expensive
than a direct one:
Direct Calls:
Compilers can perform powerful optimizations like inlining,
where the function body is directly inserted at the call site,
eliminating call overhead entirely.
Indirect Calls:
Inlining is much harder, if not impossible, because the compiler
doesn't know the target function at compile time.
Techniques like Indirect Call Promotion can help by using
profile-guided optimization to turn frequently taken indirect calls
into conditional direct calls, but they still add complexity
and potential overhead compared to a truly direct call.
In this patch, I split tc_skip_wrapper in two different
static keys, one for tc_act() (tc_skip_wrapper_act)
and one for tc_classify() (tc_skip_wrapper_cls).
Then I enable the tc_skip_wrapper_cls only if the count
of builtin classifiers is above one.
I enable tc_skip_wrapper_act only if the count of builtin
actions is above one.
In our production kernels, we only have CONFIG_NET_CLS_BPF=y
and CONFIG_NET_ACT_BPF=y. Others are modules or are not compiled.
Tested on AMD Turin cpus, cls_bpf_classify() cost went
from 1% down to 0.18 %, and FDO will be able to inline
it in tcf_classify() for further gains.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20260307133601.3863071-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
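The general idea of replacing an indirect call with a compare plus a direct call can be sketched in user space. All names here are invented; the real tc_wrapper.h uses static keys and generated comparison chains, which this sketch does not attempt to reproduce.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*demo_classify_fn)(int pkt);

/* Hypothetical built-in classifier, known at compile time. */
int demo_cls_builtin(int pkt)
{
	return pkt & 1;
}

/* Hypothetical classifier that the wrapper does not know about. */
int demo_cls_other(int pkt)
{
	return pkt + 1;
}

/*
 * Wrapper: when the pointer matches a known built-in, call it directly so
 * the compiler may inline it; otherwise fall back to the indirect call.
 */
int demo_classify(demo_classify_fn fn, int pkt)
{
	assert(fn != NULL);

	if (fn == demo_cls_builtin)
		return demo_cls_builtin(pkt);	/* direct call */

	return fn(pkt);				/* indirect call */
}

/*
 * Condition named in the commit text: the skip key is enabled only if
 * more than one built-in exists. With a single built-in the wrapper is
 * one compare, so it stays in use. Other conditions are not modelled.
 */
int demo_may_skip_wrapper(unsigned int builtin_count)
{
	return builtin_count > 1;
}
```

With only one built-in, as in the production configuration mentioned above, the wrapper costs a single pointer comparison before the direct call.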
Eric Dumazet [Sat, 7 Mar 2026 09:22:14 +0000 (09:22 +0000)]
tcp: move sysctl_tcp_shrink_window to netns_ipv4_read_txrx group
Commit 18fd64d25422 ("netns-ipv4: reorganize netns_ipv4 fast path
variables") missed that __tcp_select_window() is reading
net->ipv4.sysctl_tcp_shrink_window.
Move this field to netns_ipv4_read_txrx group, as __tcp_select_window()
is used both in tx and rx paths.
net: airoha: Make flow control source port mapping dependent on nbq parameter
Flow control source port mapping for USB serdes needs to be configured
according to the GDM port nbq parameter. This is a preliminary patch
since nbq parameter is specific for the given port serdes and needs to
be read from the DTS (in the current codebase is assigned statically).
Alok Tiwari [Fri, 6 Mar 2026 18:08:19 +0000 (10:08 -0800)]
selftests: fib_tests: fix link-local retrieval in fib6_nexthop()
fib6_nexthop() retrieves the link-local address for two interfaces used
in the test. However, both lldummy and llv1 are obtained from dummy0.
llv1 is expected to be retrieved from veth1, which is the interface used
later in the test. The subsequent check and error message also expect
the address to be retrieved from veth1.
Jakub Kicinski [Tue, 10 Mar 2026 02:11:21 +0000 (19:11 -0700)]
Merge tag 'ib-gpio-remove-of-gpio-h-for-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux into mbox
Bartosz Golaszewski says:
====================
Immutable branch between GPIO and net
Convert remaining users of of_gpio.h to using GPIO descriptors and
remove the header.
* tag 'ib-gpio-remove-of-gpio-h-for-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: remove of_get_named_gpio() and <linux/of_gpio.h>
nfc: nfcmrvl: convert to gpio descriptors
nfc: s3fwrn5: convert to gpio descriptors
====================
Qingfang Deng [Fri, 6 Mar 2026 09:36:49 +0000 (17:36 +0800)]
ppp: simplify input error handling
Currently, ppp_input_error() indicates an error by allocating a 0-length
skb and calling ppp_do_recv(). It takes an error code argument, which is
stored in skb->cb, but not used by ppp_receive_frame().
Simplify the error handling by removing the unused parameter and the
unnecessary skb allocation. Instead, call ppp_receive_error() directly
from ppp_input_error() under the recv lock, and the length check in
ppp_receive_frame() can be removed.
Signed-off-by: Qingfang Deng <dqfext@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: ethernet: ti: am65-cpsw: Use also port number to identify timestamps
The driver uses packet-type (RX/TX) PTP-message type and PTP-sequence
number to identify a matching timestamp packet for a skb. If the same
PTP packet arrives on both ports (as in a PRP environment) then it is
not obvious which event belongs to which skb.
The event contains also the port number on which it was received.
Instead of masking it out, use it for matching.
Tested-by: Chintan Vankar <c-vankar@ti.com>
Reviewed-by: Martin Kaistra <martin.kaistra@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://patch.msgid.link/20260306144439.cVwaaopR@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Sat, 7 Mar 2026 16:34:30 +0000 (16:34 +0000)]
net/sched: do not reset queues in graft operations
Following typical script is extremely disruptive,
because each graft operation calls dev_deactivate()
which resets all the queues of the device.
QPARAM="limit 100000 flow_limit 1000 buckets 4096"
TXQS=64
for ETH in eth1
do
  tc qd del dev $ETH root 2>/dev/null
  tc qd add dev $ETH root handle 1: mq
  for i in `seq 1 $TXQS`
  do
    slot=$( printf %x $(( i )) )
    tc qd add dev $ETH parent 1:$slot fq $QPARAM
  done
done
One can add "ip link set dev $ETH down/up" to reduce the disruption time:
QPARAM="limit 100000 flow_limit 1000 buckets 4096"
TXQS=64
for ETH in eth1
do
  ip link set dev $ETH down
  tc qd del dev $ETH root 2>/dev/null
  tc qd add dev $ETH root handle 1: mq
  for i in `seq 1 $TXQS`
  do
    slot=$( printf %x $(( i )) )
    tc qd add dev $ETH parent 1:$slot fq $QPARAM
  done
  ip link set dev $ETH up
done
Or we can add a @reset_needed flag to dev_deactivate() and
dev_deactivate_many().
This flag is set to true at device dismantle or linkwatch_do_dev(),
and to false for graft operations.
In the future, we might only stop one queue instead of the whole
device, ie call dev_deactivate_queue() instead of dev_deactivate().
I think the problem (quadratic behavior) was added in commit
2fb541c862c9 ("net: sch_generic: aviod concurrent reset and enqueue op
for lockless qdisc") but this does not look serious enough to deserve
risky backports.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yunsheng Lin <linyunsheng@huawei.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20260307163430.470644-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tim Bird [Thu, 5 Mar 2026 00:47:22 +0000 (17:47 -0700)]
net: Add SPDX ids to some source files
Add SPDX-License-Identifier lines to several source
files under the network sub-directory. Work on files
in the core, dns_resolver, ipv4, ipv6 and
netfilter sub-dirs. Remove boilerplate
and license reference text to avoid ambiguity.
Rusty Russell has expressed that his contributions
were intended to be GPL-2.0-or-later.
====================
tools: ynl: convert samples into selftests
The "samples" were always poor man's tests, used to manually
confirm that C YNL works as expected. Since a proper tests/
directory now exists move the samples and use the kselftest
harness to turn them into selftests outputting KTAP.
====================
Jakub Kicinski [Sat, 7 Mar 2026 03:36:28 +0000 (19:36 -0800)]
tools: ynl: convert ethtool sample to selftest
Convert ethtool.c to use kselftest_harness.h with FIXTURE/TEST_F.
Move ethtool from BINS to TEST_GEN_FILES and add ethtool.sh wrapper
which sets up a netdevsim device before running the test binary.
Output:
TAP version 13
1..2
# Starting 2 tests from 1 test cases.
# RUN ethtool.channels ...
# nsim0: combined 1
# OK ethtool.channels
ok 1 ethtool.channels
# RUN ethtool.rings ...
# nsim0: rx 512 tx 512
# OK ethtool.rings
ok 2 ethtool.rings
# PASSED: 2 / 2 tests passed.
# Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0
Jakub Kicinski [Sat, 7 Mar 2026 03:36:27 +0000 (19:36 -0800)]
tools: ynl: convert devlink sample to selftest
Convert devlink.c to use kselftest_harness.h with FIXTURE/TEST_F.
Move devlink from BINS to TEST_GEN_FILES in the Makefile since
it's invoked via the devlink.sh wrapper which sets up netdevsim.
Output:
TAP version 13
1..2
# Starting 2 tests from 1 test cases.
# RUN devlink.dump ...
# netdevsim/netdevsim1337
# OK devlink.dump
ok 1 devlink.dump
# RUN devlink.info ...
# netdevsim/netdevsim1337:
# driver: netdevsim
# running fw:
# fw.mgmt: 10.20.30
# OK devlink.info
ok 2 devlink.info
# PASSED: 2 / 2 tests passed.
# Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0
Jakub Kicinski [Sat, 7 Mar 2026 03:36:25 +0000 (19:36 -0800)]
tools: ynl: convert tc and tc-filter-add samples to selftest
Convert tc.c and tc-filter-add.c to produce KTAP output with
kselftest_harness. Merge the two tests together. They both
test TC (one tests qdiscs, the other classifiers), but
they can easily live in a single selftest.
Make the test spawn a new netns, and run the operations on
lo to avoid onerous setup and cleanup.
TAP version 13
1..2
# Starting 2 tests from 1 test cases.
# RUN tc.qdisc ...
# lo: fq_codel limit: 10240p target: 5ms new_flow_cnt: 0
# OK tc.qdisc
ok 1 tc.qdisc
# RUN tc.flower ...
# flower pref 1 proto: 0x8100
# flower:
# vlan_id: 100
# vlan_prio: 5
# num_of_vlans: 3
# action order: 1 vlan push id 200 protocol 0x8100 priority 0
# action order: 2 vlan push id 300 protocol 0x8100 priority 0
# OK tc.flower
ok 2 tc.flower
# PASSED: 2 / 2 tests passed.
# Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0
Jakub Kicinski [Sat, 7 Mar 2026 03:36:24 +0000 (19:36 -0800)]
tools: ynl: convert rt-link sample to selftest
Convert rt-link.c to use kselftest_harness.h with FIXTURE/TEST_F.
Move rt-link from BINS to TEST_GEN_PROGS.
Output:
TAP version 13
1..3
# Starting 3 tests from 1 test cases.
# RUN rt_link.dump ...
# 1: lo: mtu 65536
# 2: sit0: mtu 1480 kind sit
# OK rt_link.dump
ok 1 rt_link.dump
# RUN rt_link.netkit ...
# 4: nk1: mtu 1500 kind netkit primary 1 policy blackhole
# OK rt_link.netkit
ok 2 rt_link.netkit
# RUN rt_link.netkit_err_msg ...
# OK rt_link.netkit_err_msg
ok 3 rt_link.netkit_err_msg
# PASSED: 3 / 3 tests passed.
# Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0
Jakub Kicinski [Sat, 7 Mar 2026 03:36:23 +0000 (19:36 -0800)]
tools: ynl: convert ovs sample to selftest
Convert ovs.c to produce KTAP output with kselftest_harness.
The single "crud" test creates a new OVS datapath, fetches it back
by name, then dumps all datapaths verifying the new one appears.
IIRC I added this test because ovs is a genetlink family but
has a family-specific fixed header.
TAP version 13
1..1
# Starting 1 tests from 1 test cases.
# RUN ovs.crud ...
# get:
# ynl-test(3): pid:0 cache:256
# dump:
# ynl-test(3): pid:0 cache:256
# OK ovs.crud
ok 1 ovs.crud
# PASSED: 1 / 1 tests passed.
# Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
Jakub Kicinski [Sat, 7 Mar 2026 03:36:22 +0000 (19:36 -0800)]
tools: ynl: convert netdev sample to selftest
Convert netdev.c to produce KTAP output with 3 tests:
- dev_dump: dump all netdev devices, skip if empty
- dev_get: query first device from dump by ifindex
- ntf_check: subscribe to "mgmt", create a veth via rt-link,
verify netdev notification is received, then delete the veth
Remove stdin/scanf-based UI. Add rt-link dependency for the veth
notification test.
TAP version 13
1..3
# Starting 3 tests from 1 test cases.
# RUN netdev.dump ...
# lo[1] xdp-features (0): xdp-rx-metadata-features (0): xsk-fea...
# sit0[2] xdp-features (0): xdp-rx-metadata-features (0): xsk-fea...
# OK netdev.dump
ok 1 netdev.dump
# RUN netdev.get ...
# lo[1] xdp-features (0): xdp-rx-metadata-features (0): xsk-fea...
# OK netdev.get
ok 2 netdev.get
# RUN netdev.ntf_check ...
# veth0[7] xdp-features (0): xdp-rx-metadata-features (7): timesta...
# OK netdev.ntf_check
ok 3 netdev.ntf_check
# PASSED: 3 / 3 tests passed.
# Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0
Jakub Kicinski [Sat, 7 Mar 2026 03:36:21 +0000 (19:36 -0800)]
tools: ynl: move samples to tests
The "samples" were always poor man's tests (used to manually
confirm that C YNL works).
Move all C sample programs from tools/net/ynl/samples/ to
tools/net/ynl/tests/, "merge" the Makefiles. The subsequent
changes will convert each sample into a proper KTAP selftest.
Since these are now tests rather than samples, default to
enabling ASan. After all, we're testing user space code here.
Sort the gitignore while at it, the page-pool entry was a leftover
so delete it.
Jialu Xu [Sat, 7 Mar 2026 03:06:26 +0000 (11:06 +0800)]
gpio: remove of_get_named_gpio() and <linux/of_gpio.h>
All in-tree consumers have been converted to the descriptor-based API.
Remove the deprecated of_get_named_gpio() helper, delete the
<linux/of_gpio.h> header, and drop the corresponding entry from
MAINTAINERS.
Also remove the completed TODO item for this cleanup.
Jialu Xu [Sat, 7 Mar 2026 03:06:24 +0000 (11:06 +0800)]
nfc: nfcmrvl: convert to gpio descriptors
Replace the legacy of_get_named_gpio() / gpio_request_one() /
gpio_set_value() API with the descriptor-based devm_gpiod_get_optional() /
gpiod_set_value() API from <linux/gpio/consumer.h>, removing the
dependency on <linux/of_gpio.h>.
The "reset-n-io" property rename quirk already exists in gpiolib-of.c
(added in commit 9c2cc7171e08), so no additional quirk is needed.
Jialu Xu [Sat, 7 Mar 2026 03:06:22 +0000 (11:06 +0800)]
nfc: s3fwrn5: convert to gpio descriptors
Replace the legacy of_get_named_gpio() / gpio_request_one() /
gpio_set_value() API with the descriptor-based devm_gpiod_get() /
gpiod_set_value() API from <linux/gpio/consumer.h>, removing the
dependency on <linux/of_gpio.h>.
This removes the s3fwrn5_i2c_parse_dt() and s3fwrn82_uart_parse_dt()
functions since devm_gpiod_get() handles both DT lookup and resource
management. The gpio_en and gpio_fw_wake fields in struct phy_common
are changed from int to struct gpio_desc *.
Add rename quirks in gpiolib-of.c for the deprecated "s3fwrn5,en-gpios"
and "s3fwrn5,fw-gpios" properties to maintain backward compatibility
with old device trees.
Aleksei Oladko [Fri, 6 Mar 2026 00:01:23 +0000 (00:01 +0000)]
selftests: net: make ovs-dpctl.py fail when pyroute2 is unsupported
The pmtu.sh kselftest configures OVS using ovs-dpctl.py and falls back
to ovs-vsctl only when ovs-dpctl.py fails. However, ovs-dpctl.py exits
with a success status when the installed pyroute2 package version is
lower than 0.6, even though the OVS datapath is not configured.
As a result, pmtu.sh assumes that the setup was successful and
continues running the test, which later fails due to the missing
OVS configuration.
Fix the exit code handling in ovs-dpctl.py so that pmtu.sh can detect
that the setup did not complete successfully and fall back to
ovs-vsctl.
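As an illustration only, the exit-status behavior described above can be
sketched in Python as follows. This is not the code from ovs-dpctl.py: the
helper names parse_major_minor() and exit_status_for() are hypothetical, and
the only facts taken from the commit message are the 0.6 version threshold
and the need for a non-zero exit status.

```python
import re

# Minimum supported pyroute2 version, as stated in the commit message.
MIN_PYROUTE2 = (0, 6)


def parse_major_minor(version):
    """Extract (major, minor) from a version string such as '0.7.12'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def exit_status_for(version):
    """Return 0 when the version is supported, 1 otherwise.

    Hypothetical helper: the point is only that an unsupported or
    unparsable dependency version maps to a non-zero status instead
    of a silent success.
    """
    try:
        supported = parse_major_minor(version) >= MIN_PYROUTE2
    except ValueError:
        supported = False
    return 0 if supported else 1
```

A script would pass the returned value to sys.exit(), so that a caller such
as pmtu.sh sees the failure and can take its fallback path.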
Johan Hovold [Thu, 5 Mar 2026 10:50:06 +0000 (11:50 +0100)]
net: usb: lan78xx: drop redundant device reference
Driver core holds a reference to the USB interface and its parent USB
device while the interface is bound to a driver and there is no need to
take additional references unless the structures are needed after
disconnect.
Drop the redundant device reference to reduce cargo culting, make it
easier to spot drivers where an extra reference is needed, and reduce
the risk of memory leaks when drivers fail to release it.
====================
net: ntb_netdev: Add Multi-queue support
ntb_netdev currently hard-codes a single NTB transport queue pair, which
means the datapath effectively runs as a single-queue netdev regardless
of available CPUs / parallel flows.
The longer-term motivation here is throughput scale-out: allow
ntb_netdev to grow beyond the single-QP bottleneck and make it possible
to spread TX/RX work across multiple queue pairs as link speeds and core
counts keep increasing.
Multi-queue also unlocks the standard networking knobs on top of it. In
particular, once the device exposes multiple TX queues, qdisc/tc can
steer flows/traffic classes into different queues (via
skb->queue_mapping), enabling per-flow/per-class scheduling and QoS in a
familiar way.
Usage
=====
1. Ensure the NTB device you want to use has multiple Memory Windows.
2. modprobe ntb_transport on both sides, if it's not built-in.
3. modprobe ntb_netdev on both sides, if it's not built-in.
4. Use ethtool -L to configure the desired number of queues.
The default number of real (combined) queues is 1.
e.g. ethtool -L eth0 combined 2 # to increase
ethtool -L eth0 combined 1 # to reduce back to 1
Note:
* If the NTB device has only a single Memory Window, ethtool -L eth0
combined N (N > 1) fails with:
"netlink error: No space left on device".
* ethtool -L can be executed while the net_device is up.
Compatibility
=============
The default remains a single queue, so behavior is unchanged unless
the user explicitly increases the number of queues.
Kernel base
===========
ntb-next (latest as of 2026-03-06):
commit 7b3302c687ca ("ntb_hw_amd: Fix incorrect debug message in link
disable path")
Testing / Results
=================
Environment / command line:
- 2x R-Car S4 Spider boards
"Kernel base" (see above) + this series
Without this series:
TCP / UDP : 589 Mbps / 580 Mbps
With this series (default single queue):
TCP / UDP : 583 Mbps / 583 Mbps
With this series + `ethtool -L eth0 combined 2`:
TCP / UDP : 576 Mbps / 584 Mbps
With this series + `ethtool -L eth0 combined 2` + [1], where flows are
properly distributed across queues:
TCP / UDP : 1.13 Gbps / 1.16 Gbps (re-measured with v3)
The 575~590 Mbps variation is run-to-run variance, i.e. no measurable
regression or improvement is observed with a single queue. The key
point is scaling from ~600 Mbps to ~1.20 Gbps once flows are
distributed across multiple queues.
Note: On R-Car S4 Spider, only BAR2 is usable for ntb_transport MW.
For testing, BAR2 was expanded from 1 MiB to 2 MiB and split into two
Memory Windows. A follow-up series is planned to add split BAR support
for vNTB. On platforms where multiple BARs can be used for the
datapath, this series should allow >=2 queues without additional
changes.
[1] [PATCH v2 00/10] NTB: epf: Enable per-doorbell bit handling while keeping legacy offset
https://lore.kernel.org/linux-pci/20260227084955.3184017-1-den@valinux.co.jp/
(subject was accidentally incorrect in the original posting)
====================
Koichiro Den [Thu, 5 Mar 2026 15:56:39 +0000 (00:56 +0900)]
net: ntb_netdev: Support ethtool channels for multi-queue
Support dynamic queue pair addition/removal via ethtool channels.
Use the combined channel count to control the number of netdev TX/RX
queues, each corresponding to a ntb_transport queue pair.
When the number of queues is reduced, tear down and free the removed
ntb_transport queue pairs (not just deactivate them) so other
ntb_transport clients can reuse the freed resources.
When the number of queues is increased, create additional queue pairs up
to NTB_NETDEV_MAX_QUEUES (=64). The effective limit is determined by the
underlying ntb_transport implementation and NTB hardware resources (the
number of MWs), so set_channels may return -ENOSPC if no more QPs can be
allocated.
Keep the default at one queue pair to preserve the previous behavior.
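The resize semantics above can be modelled in userspace with a small Python
sketch. It is a toy model, not the driver code: TransportModel and this
set_channels() are hypothetical stand-ins, and both the -EINVAL range check
and the rollback of partially added queue pairs on failure are assumptions
that the commit message does not state. Only the 64-queue bound, the freeing
of removed queue pairs, and the -ENOSPC result come from the text.

```python
import errno

NTB_NETDEV_MAX_QUEUES = 64  # upper bound quoted in the commit message


class TransportModel:
    """Toy stand-in for ntb_transport: hands out a limited number of QPs."""

    def __init__(self, available_qps):
        self.free = available_qps

    def create_qp(self):
        """Take one QP from the pool; return False when none are left."""
        if self.free == 0:
            return False
        self.free -= 1
        return True

    def free_qp(self):
        """Return one QP to the pool so other clients could reuse it."""
        self.free += 1


def set_channels(transport, current, requested):
    """Return (status, new_count); status is 0 or a negative errno."""
    # Assumption: out-of-range requests are rejected with -EINVAL.
    if not 1 <= requested <= NTB_NETDEV_MAX_QUEUES:
        return -errno.EINVAL, current
    count = current
    # Shrinking frees the removed queue pairs rather than parking them.
    while count > requested:
        transport.free_qp()
        count -= 1
    # Growing stops at the first allocation failure.
    while count < requested:
        if not transport.create_qp():
            # Assumption: QPs added during this call are rolled back.
            for _ in range(count - current):
                transport.free_qp()
            return -errno.ENOSPC, current
        count += 1
    return 0, count
```

With one spare queue pair in the pool, growing from 1 to 2 succeeds, while a
later request for 3 returns -ENOSPC and leaves the count at 2.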
Koichiro Den [Thu, 5 Mar 2026 15:56:38 +0000 (00:56 +0900)]
net: ntb_netdev: Factor out multi-queue helpers
Implementing .set_channels will otherwise duplicate the same multi-queue
operations at multiple call sites. Factor out the following helpers:
- ntb_netdev_update_carrier(): carrier is switched on when at least
one QP link is up
- ntb_netdev_queue_rx_drain(): drain and free all queued RX packets
for one QP
- ntb_netdev_queue_rx_fill(): prefill RX ring for one QP
Koichiro Den [Thu, 5 Mar 2026 15:56:37 +0000 (00:56 +0900)]
net: ntb_netdev: Gate subqueue stop/wake by transport link
When ntb_netdev is extended to multiple ntb_transport queue pairs, the
netdev carrier can be up as long as at least one QP link is up. In that
setup, a given QP may be link-down while the carrier remains on.
Make the link event handler start/stop the corresponding netdev TX
subqueue and drive carrier state based on whether any QP link is up.
Also guard subqueue wake/start points in the TX completion and timer
paths so a subqueue is not restarted while its QP link is down.
Stop all queues in ndo_open() and let the link event handler wake each
subqueue once ntb_transport link negotiation succeeds.
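The gating rules above can be summarised with a small Python model. It is a
sketch under stated assumptions rather than the driver logic itself:
NetdevModel and its method names are hypothetical, and
stop_for_backpressure() merely stands in for whatever condition stops a
subqueue on the TX path.

```python
class NetdevModel:
    """Toy model: one TX subqueue per QP, carrier derived from QP links."""

    def __init__(self, num_queues):
        self.link_up = [False] * num_queues
        # All subqueues start stopped, mirroring the ndo_open() behavior
        # described in the commit message.
        self.tx_stopped = [True] * num_queues
        self.carrier = False

    def link_event(self, queue, up):
        """Link handler: start or stop one subqueue, then update carrier."""
        self.link_up[queue] = up
        self.tx_stopped[queue] = not up
        # Carrier is on as long as at least one QP link is up.
        self.carrier = any(self.link_up)

    def stop_for_backpressure(self, queue):
        # Hypothetical: the TX path stops the subqueue, e.g. on a full ring.
        self.tx_stopped[queue] = True

    def tx_completion(self, queue):
        """Wake the subqueue only while its own QP link is up."""
        if self.link_up[queue]:
            self.tx_stopped[queue] = False
```

The key property is that tx_completion() on a link-down queue leaves the
subqueue stopped even though the carrier may still be on because of another
queue pair.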
Koichiro Den [Thu, 5 Mar 2026 15:56:36 +0000 (00:56 +0900)]
net: ntb_netdev: Introduce per-queue context
Prepare ntb_netdev for multi-queue operation by moving queue-pair state
out of struct ntb_netdev.
Introduce struct ntb_netdev_queue to carry the ntb_transport_qp pointer,
the per-QP TX timer and queue id. Pass this object as the callback
context and convert the RX/TX handlers and link event path accordingly.
The probe path allocates a fixed upper bound for netdev queues while
instantiating only a single ntb_transport queue pair, preserving the
previous behavior. Also store client_dev for future queue pair
creation/removal via the ntb_transport API.
====================
nfc: drop redundant USB device references
Driver core holds a reference to the USB interface and its parent USB
device while the interface is bound to a driver and there is no need to
take additional references unless the structures are needed after
disconnect.
Drop redundant device references to reduce cargo culting, make it easier
to spot drivers where an extra reference is needed, and reduce the risk
of memory leaks when drivers fail to release them.
====================
Johan Hovold [Thu, 5 Mar 2026 11:10:19 +0000 (12:10 +0100)]
nfc: port100: drop redundant device reference
Driver core holds a reference to the USB interface and its parent USB
device while the interface is bound to a driver and there is no need to
take additional references unless the structures are needed after
disconnect.
Drop the redundant device reference to reduce cargo culting, make it
easier to spot drivers where an extra reference is needed, and reduce
the risk of memory leaks when drivers fail to release it.
Johan Hovold [Thu, 5 Mar 2026 11:10:18 +0000 (12:10 +0100)]
nfc: pn533: drop redundant device reference
Driver core holds a reference to the USB interface and its parent USB
device while the interface is bound to a driver and there is no need to
take additional references unless the structures are needed after
disconnect.
Drop the redundant device reference to reduce cargo culting, make it
easier to spot drivers where an extra reference is needed, and reduce
the risk of memory leaks when drivers fail to release it.
When XDP_USE_NEED_WAKEUP is used and the fill ring is empty so no buffer
is allocated on RX side, allow RX NAPI to be descheduled. This avoids
wasting CPU cycles on polling. Users will be notified and they need to
make a wakeup call after refilling the ring.
Aleksei Oladko [Thu, 5 Mar 2026 21:10:00 +0000 (21:10 +0000)]
selftests: net: forwarding: fix IPv6 address leak in cleanup
Several forwarding tests (e.g., gre_multipath.sh) initialize both IPv4
and IPv6 addresses using simple_if_init, but only clean up IPv4
in simple_if_fini. This leaves stale IPv6 addresses on the interfaces,
which causes subsequent tests to fail when they encounter unexpected
address configuration.
The issue can be reproduced by running tests in sequence:
# run_kselftest.sh -t net/forwarding:ipip_hier_gre.sh
# run_kselftest.sh -t net/forwarding:min_max_mtu.sh
TAP version 13
1..1
# timeout set to 0
# selftests: net/forwarding: min_max_mtu.sh
# TEST: ping [ OK ]
# TEST: ping6 [ OK ]
# TEST: Test maximum MTU configuration [ OK ]
# TEST: Test traffic, packet size is maximum MTU [FAIL]
# Ping6, packet size: 65487 succeeded, but should have failed
# TEST: Test minimum MTU configuration [ OK ]
# TEST: Test traffic, packet size is minimum MTU [ OK ]
not ok 1 selftests: net/forwarding: min_max_mtu.sh # exit=1
Fix this by removing the unused IPv6 argument from simple_if_init in
tests that don't use IPv6 (gre_multipath.sh, ipip_lib.sh), and by
adding the missing IPv6 argument to simple_if_fini in tests that
use IPv6 (gre_multipath_nh.sh, gre_multipath_nh_res.sh).
Jakub Kicinski [Fri, 6 Mar 2026 23:39:12 +0000 (15:39 -0800)]
Merge branch 'net-stmmac-mdio-related-cleanups'
Russell King says:
====================
net: stmmac: mdio related cleanups
The first four patches clean up the MDC clock divisor selection code,
turning the three different ways we choose a divisor into tabular form,
rather than doing the selection purely in code.
Convert MDIO to use field_prep() which allows a non-constant mask to be
used when preparing fields.
Then use u32 and the associated typed GENMASK for MDIO register field
definitions.
Finally, an extra couple of patches that use appropriate types in
struct mdio_bus_data.
====================
The PCS and PHY masks are passed to the mdio bus layer as phy_mask
to prevent bus addresses between 0 and 31 inclusive being scanned,
and this is declared as u32. Also declare these as u32 in stmmac
for type consistency.
Since this is a u32, use BIT_U32() rather than BIT() to generate
values for these fields.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/E1vy6AY-0000000BtxJ-3smT@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
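
For readers unfamiliar with the mask semantics in this message, the
following Python sketch models a 32-bit mask over bus addresses 0 to 31. It
is illustrative only: bit_u32() is a Python stand-in for the kernel's
BIT_U32() macro, scanned_addresses() is a hypothetical helper, and neither
reflects the stmmac code.

```python
U32_MAX = 0xFFFFFFFF


def bit_u32(n):
    """Stand-in for a u32-typed BIT(): bit n of a 32-bit value."""
    if not 0 <= n <= 31:
        raise ValueError("bit index out of range for u32")
    return 1 << n


def scanned_addresses(phy_mask):
    """Return the bus addresses (0..31) whose bit is clear in phy_mask.

    A set bit excludes that address from scanning.
    """
    return [addr for addr in range(32) if not phy_mask & bit_u32(addr)]
```

For example, a mask built as bit_u32(0) | bit_u32(31) fits in 32 bits and
excludes the first and last addresses from the scan.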