git.ipfire.org Git - thirdparty/linux.git/log

net: airoha: select QDMA block according LAN/WAN configuration

Before this patch even GDM ports were assigned to QDMA0 while odd GDM
ports were using QDMA1, so, based on the DTS configuration, both QDMA0
and QDMA1 can theoretically receive traffic destinated to the host cpu
from LAN or WAN GDM ports.
Airoha folks reported the hw design assumes the LAN traffic destinated
to the host cpu is be forwarded to QDMA0 while traffic received on WAN
GDM port is managed by QDMA1. For this reason, select QDMA block according
to the GDM port LAN or WAN configuration:
- QDMA0 is used for GDM LAN devices
- QDMA1 is used for GDM WAN device

Assuming a device with three GDM ports, a typical configuration could be:
- MT7530 DSA switch -> GDM1 (eth0) -> QDMA0 (LAN traffic)
- External PHY -> GDM2 (eth1) -> QDMA1 (WAN traffic)
- External PHY -> GDM3 (eth2) -> QDMA0 (LAN traffic)

We can then bridge eth0 DSA port (lanX) with eth2 since they all tx/rx
LAN traffic.

Please note this patch introduces a change not visible to the user since
airoha_eth driver currently supports just the internal phy available via
the MT7530 DSA switch and there are no WAN interfaces officially supported
since PCS/external phy is not merged mainline yet (it will be posted with
following patches).

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20260313-airoha-qdma-lan-wan-mode-v2-1-7d577db6e40c@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'initial-support-for-pic64-hpsc-hx-ethernet-endpoint'

Charles Perry says:

====================
Initial support for PIC64-HPSC/HX Ethernet endpoint

This series add basic support for Microchip "PIC64-HPSC" and "PIC64HX"
Ethernet endpoint. Both SoCs contain 4 GEM IP with support for
MII/RGMII/SGMII/USXGMII at rates of 10M to 10G. Only RGMII and SGMII at a
rate of 1G is tested for now. Each GEM IP has 8 priority queues and the
revision register reads 0x220c010e.

One particularity of this instantiation of GEM is that the MDIO controller
within the GEM IP is disconnected from any physical pin and the SoC rely on
another standalone MDIO controller.

The maximum jumbo frame size also seems to be different on PIC64-HPSC/HX
(16383) than what most other platforms use (10240). I've found that I need
to tweak a bit the MTU calculation for this, otherwise the RXBS field of
the DMACFG register overflows. See patch 2 for more details.

PIC64-HPSC/HX also supports other features guarded behind CAPS bit like
MACB_CAPS_QBV but I've omitted those intentionally because I didn't test
these.
====================

Link: https://patch.msgid.link/20260313140610.3681752-1-charles.perry@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: macb: add support for Microchip pic64hpsc ethernet endpoint

pic64hpsc doesn't have the USRIO register so MACB_CAPS_USRIO_DISABLED is
used.

pic64hpsc does support PTP and has the timestamping unit so
MACB_CAPS_GEM_HAS_PTP is used.

jumbo_max_len is set to 16383 (0x3FFF) as reported by the DCFG2 register
bits 0..13. The JML register also has a default value of 0x3FFF.

dma_burst_length is set to 16 because that's what most other platforms
use and it worked for me so far. There is one other mode where bursts of
up to 256 are allowed but this might impact negatively other masters on
the NOC. The register default value is 4 (bursts up to 4).

Signed-off-by: Charles Perry <charles.perry@microchip.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260313140610.3681752-4-charles.perry@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: macb: add safeguards for jumbo frame larger than 10240

The RX buffers for GEM can have a maximum size of 16320 bytes
(0xff in the RXBS field of the DMACFG register means 255*64 =
16320 bytes).

The GEM IP has configurable maximum jumbo frame length that can go up to
16383. The actual value for this limit can be found in the
"jumbo_max_length" field (bits 0..13) of the DCFG2 register.
Currently, the macb driver doesn't use the DCFG2 register when
determining the max MTU, instead an hardcoded value (jumbo_max_len in
struct macb_config) is used for each platform. Right now the maximum
value for jumbo_max_len is 10240 (0x2800).

GEM uses one buffer per packet which means that one buffer must allow
room for the max MTU plus L2 encapsulation and alignment. This is a
limitation of the driver.

This commit adds a limit to max_mtu and rx_buffer_size so that the RXBS
field can never overflow when a large MTU is used.

With this commit, it is now possible to add new platforms with a
jumbo_max_len of 16383 so that the hardware properties of each IP can be
properly captured in struct macb_config.

Signed-off-by: Charles Perry <charles.perry@microchip.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260313140610.3681752-3-charles.perry@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

dt-bindings: net: cdns,macb: add a compatible for Microchip pic64hpsc

Add "microchip,pic64hpsc-gem" for "PIC64-HPSC" and
"microchip,pic64hx-gem" for "PIC64HX", compatible with the former.

The generic compatible "cdns,gem" works but offers limited features.
Keep it as a fallback.

The GEM IPs within pic64hpsc have their MDIO controllers unconnected
from any physical pin. Add a check to prevent adding PHYs under the GEM
node.

Signed-off-by: Charles Perry <charles.perry@microchip.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20260313140610.3681752-2-charles.perry@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: net: add ipv6 RA route to ECMP merge test

As commit bbf4a17ad9ff ("ipv6: Fix ECMP sibling count mismatch when
clearing RTF_ADDRCONF") pointed out, RA routes are not elegible for ECMP
merging.

Add a test scenario mixing RA and static routes with gateway to check
that they are not getting merged.

Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Link: https://patch.msgid.link/20260313124827.3945-1-fmancera@suse.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

ppp: remove pch->chan NULL checks from tx path

Now that ppp_disconnect_channel() is called before pch->chan is set to
NULL, a channel from ppp->channels list on the transmit path is
guaranteed to have non-NULL pch->chan.

Remove the pch->chan NULL checks from ppp_push(), ppp_mp_explode(), and
ppp_fill_forward_path(), where a channel is obtained from the list.
Remove the corresponding WRITE/READ_ONCE annotations as they no longer
race.

Signed-off-by: Qingfang Deng <dqfext@gmail.com>
Link: https://patch.msgid.link/20260312093732.277254-2-dqfext@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

ppp: disconnect channel before nullifying pch->chan

In ppp_unregister_channel(), pch->chan is set to NULL before calling
ppp_disconnect_channel(), which removes the channel from ppp->channels
list using list_del_rcu() + synchronize_net(). This creates an
intermediate state where the channel is still connected (on the list)
but already unregistered (pch->chan == NULL).

Call ppp_disconnect_channel() before setting pch->chan to NULL. After
the synchronize_net(), no new reader on the transmit path will hold a
reference to the channel from the list.

This eliminates the problematic state, and prepares for removing the
pch->chan NULL checks from the transmit path in a subsequent patch.

Signed-off-by: Qingfang Deng <dqfext@gmail.com>
Link: https://patch.msgid.link/20260312093732.277254-1-dqfext@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'net-dsa-mv88e6xxx-add-partial-support-for-tcam-entries'

Cedric Jehasse says:

====================
net: dsa: mv88e6xxx: Add partial support for TCAM entries

This series adds partial Ternary Content Addressable Memory (TCAM) for
the mv88e6390 and mv88e6393 family of switches. TCAM entries allow the
switch to match the first 48 or 96 bytes of a frame and take actions on
matched frames.

This patch introduces a subset of the available TCAM functionality.
Matching on ip addresses/protocol and trapping to the cpu.

Eg. to trap traffic with destination ip 224.0.1.129 to the cpu:

tc qdisc add dev p1 clsact
tc filter add dev p1 ingress protocol ip flower skip_sw \
dst_ip 224.0.1.129 action trap

Review of the mv88e6xxx changes have brought to light something in
cls_flower:
When adding a classifier with an ipv4 address both
FLOW_DISSECTOR_KEY_IPV4_ADDRS and FLOW_DISSECTOR_KEY_IPV6_ADDRS bits are
set in dissector->used_keys.
A change was made to address this.

Signed-off-by: Cedric Jehasse <cedric.jehasse@luminex.be>
====================

Link: https://patch.msgid.link/20260311-net-next-mv88e6xxx-tcam-v8-0-32dd5ba30002@luminex.be
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: dsa: mv88e6xxx: Add partial support for TCAM entries

This patch adds partial Ternary Content Addressable Memory (TCAM) for
the mv88e6390 and mv88e6393 family of switches. TCAM entries allow the
switch to match the first 48 or 96 bytes of a frame and take actions on
matched frames.

This patch introduces a subset of the available TCAM functionality.
Matching on ip addresses/protocol and trapping to the cpu.

Eg. to trap traffic with destination ip 224.0.1.129 to the cpu:

tc qdisc add dev p1 clsact
tc filter add dev p1 ingress protocol ip flower skip_sw \
dst_ip 224.0.1.129 action trap

Signed-off-by: Cedric Jehasse <cedric.jehasse@luminex.be>
Link: https://patch.msgid.link/20260311-net-next-mv88e6xxx-tcam-v8-2-32dd5ba30002@luminex.be
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/sched: cls_flower: remove unions from fl_flow_key

When creating a flower classifier with an ipv4 address the
flow_dissector has both FLOW_DISSECTOR_KEY_IPV4_ADDRS and
FLOW_DISSECTOR_KEY_IPV6_ADDRS bits set in used_keys.
This happens because ipv4/ipv6 fields are a union and
FL_KEY_SET_IF_MASKED() will interpret either being set as both.

Removing the unions fixes this behavior without needing special handling
for union fields.

Example of a command that caused FLOW_DISSECTOR_KEY_IPV4_ADDRS and
FLOW_DISSECTOR_KEY_IPV6_ADDRS to be set:
tc filter add dev p1 ingress protocol ip flower skip_sw \
dst_ip 224.0.1.129 action trap

Signed-off-by: Cedric Jehasse <cedric.jehasse@luminex.be>
Link: https://patch.msgid.link/20260311-net-next-mv88e6xxx-tcam-v8-1-32dd5ba30002@luminex.be
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: macb: set default_an_inband to true for SGMII

Most platforms using GEM in SGMII mode use in-band autonegotiation
because it is on by default in GEM's 1G PCS and is always on since
commit e276e5e40e92 ("net: macb: Disable PCS auto-negotiation for SGMII
fixed-link mode"). Leave it on if possible using the "default_an_inband"
flag of "struct phylink_config" so that platforms that lack in-band
autonegotiation configurability at the PHY do not break with commit
1338cfef1ff1 ("net: macb: fix SGMII with inband aneg disabled") which
will turn off in-band autoneg for non hot pluggable PHYs.

Once the majority of the PHY drivers that support SGMII have the
->config_inband() callback, this commit could be reverted so that non
hot pluggable PHY use outband negotiation with macb, like its the case
for other MACs.

Fixes: 1338cfef1ff1 ("net: macb: fix SGMII with inband aneg disabled")
Reported-by: Conor Dooley <conor.dooley@microchip.com>
Closes: https://lore.kernel.org/r/20260304-nebulizer-rounding-40fbc81a2ba1@spud
Signed-off-by: Charles Perry <charles.perry@microchip.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20260313142140.4040647-1-charles.perry@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: move MSI data out of struct stmmac_priv

Only three platforms supprt MSIs, which means having all the strings
and interrupt arrays always allocated wastes space. None of this data
is performance critical - this data is only used when requesting and
releasing the MSI interrupts.

Move the MSI data out of struct stmmac_priv into its own separately
allocated structure, and move its initialisation to a separate
function.

This removes 768 bytes from struct stmmac_priv.

Link: https://lore.kernel.org/r/aYtq4ypxXTvn_Is6@shell.armlinux.org.uk
Reviewed-by: Florian Bezdeka <florian.bezdeka@siemens.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w113e-0000000DDwc-2oRv@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'support-multi-channel-irqs-in-stmmac-platform-drivers'

Jan Petrous says:

====================
Support multi-channel IRQs in stmmac platform drivers

The stmmac core supports two interrupt modes, controlled by the
flag STMMAC_FLAG_MULTI_MSI_EN:

- When the flag is set, the driver uses multi-channel IRQ mode (Multi-IRQ).
- Otherwise, a single IRQ line is requested (aka MAC-IRQ):

static int stmmac_request_irq(struct net_device *dev)
{
        /* Request the IRQ lines */
        if (priv->plat->flags & STMMAC_FLAG_MULTI_MSI_EN)
                ret = stmmac_request_irq_multi_msi(dev);
        else
                ret = stmmac_request_irq_single(dev);
}

At present, only PCI drivers (Intel and Loongson) make use of the Multi-IRQ
mode. This concept can be extended to DT-based embedded glue drivers
(dwmac-xxx.c).

This series adds support for reading per-channel IRQs from the DT node
and reuses the existing STMMAC_FLAG_MULTI_MSI_EN flag to enable multi-IRQ
operation in platform drivers.

The final decision if Multi-IRQ gets enabled remains on glue driver
to allow implementing any reguirements/limitions the focused platform
needs.

NXP S32G2/S32G3/S32R SoCs integrate the DWMAC IP with multi-channel
interrupt support. The dwmac-s32.c driver change is provided as an example of
enabling multi-IRQ mode for non-PCI drivers.
====================

Link: https://patch.msgid.link/20260313-dwmac_multi_irq-v12-0-b5c9d0aa13d6@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

stmmac: s32: enable support for Multi-IRQ mode

Based on previous changes in platform driver, the vendor
glue driver can enable Multi-IRQ mode, if needed.

To get enabled Multi-IRQ mode for dwmac-s32, the driver checks:

  1) property of 'snps,mtl-xx-config' subnode
     defines 'snps,xx-queues-to-use' bigger then one, ie:

     ethernet@4033c000 {
         compatible = "nxp,s32g2-dwmac";
         ...
         snps,mtl-rx-config = <&mtl_rx_setup>;
         ...

         mtl_rx_setup: rx-queues-config {
             snps,rx-queues-to-use = <2>;
         };

  2) queue based IRQs are set, ie:

     ethernet@4033c000 {
         compatible = "nxp,s32g2-dwmac";
         ...
         interrupts = <GIC_SPI 57 IRQ_TYPE_LEVEL_HIGH>,
                      /* CHN 0: tx, rx */
                      <GIC_SPI 58 IRQ_TYPE_LEVEL_HIGH>,
                      <GIC_SPI 59 IRQ_TYPE_LEVEL_HIGH>,
                      /* CHN 1: tx, rx */
                      <GIC_SPI 60 IRQ_TYPE_LEVEL_HIGH>,
                      <GIC_SPI 61 IRQ_TYPE_LEVEL_HIGH>;
         interrupt-names = "macirq",
                           "tx-queue-0", "rx-queue-0",
                           "tx-queue-1", "rx-queue-1";

If those prerequisites are met, the driver switches to Multi-IRQ mode,
using per-queue IRQs for rx/tx data pathr:

[    1.387045] s32-dwmac 4033c000.ethernet: Multi-IRQ mode (per queue IRQs) selected

Now the driver owns all queues IRQs:

root@s32g399aevb3:~# grep eth /proc/interrupts
29:    0    0    0    0    0    0    0    0    GICv3  89 Level   eth0:mac
30:    0    0    0    0    0    0    0    0    GICv3  91 Level   eth0:rx-0
31:    0    0    0    0    0    0    0    0    GICv3  93 Level   eth0:rx-1
32:    0    0    0    0    0    0    0    0    GICv3  95 Level   eth0:rx-2
33:    0    0    0    0    0    0    0    0    GICv3  97 Level   eth0:rx-3
34:    0    0    0    0    0    0    0    0    GICv3  99 Level   eth0:rx-4
35:    0    0    0    0    0    0    0    0    GICv3  90 Level   eth0:tx-0
36:    0    0    0    0    0    0    0    0    GICv3  92 Level   eth0:tx-1
37:    0    0    0    0    0    0    0    0    GICv3  94 Level   eth0:tx-2
38:    0    0    0    0    0    0    0    0    GICv3  96 Level   eth0:tx-3
39:    0    0    0    0    0    0    0    0    GICv3  98 Level   eth0:tx-4

Otherwise, if one of the prerequisite don't met, the driver
continue with MAC IRQ mode:

[    1.387045] s32-dwmac 4033c000.ethernet: MAC IRQ mode selected

And only MAC IRQ will be attached:

root@s32g399aevb3:~# grep eth /proc/interrupts
29:    0    0    0    0    0    0    0    0    GICv3  89 Level   eth0:mac

What represents the original MAC IRQ mode and is fully backward
compatible.

Reviewed-by: Matthias Brugger <mbrugger@suse.com>
Signed-off-by: Jan Petrous (OSS) <jan.petrous@oss.nxp.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20260313-dwmac_multi_irq-v12-4-b5c9d0aa13d6@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: nxp,s32-dwmac: Declare per-queue interrupts

The DWMAC IP on NXP S32G/R SoCs has connected queue-based IRQ lines,
set them to allow using Multi-IRQ mode.

Reviewed-by: Matthias Brugger <mbrugger@suse.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Jan Petrous (OSS) <jan.petrous@oss.nxp.com>
Link: https://patch.msgid.link/20260313-dwmac_multi_irq-v12-3-b5c9d0aa13d6@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: platform: read channels irq

Read IRQ resources for all rx/tx channels, to allow Multi-IRQ mode
for platform glue drivers.

Reviewed-by: Matthias Brugger <mbrugger@suse.com>
Signed-off-by: Jan Petrous (OSS) <jan.petrous@oss.nxp.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20260313-dwmac_multi_irq-v12-1-b5c9d0aa13d6@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

xsk: remove repeated defines

Seems we have been carrying around repeated defines for unaligned mode
logic. Remove redundant ones.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260313111931.438911-1-maciej.fijalkowski@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: improve inet6_ehashfn() entropy

Instead of only using the 32 low order bits of the local address,
use all of them.

Xor net_hash_mix(net) with the 32 high order bits of the local address
so that we can use __jhash_mix() three times.

If we were hashing 4 extra bytes, we would need one __jhash_final()
which is a bit expensive.

Using net_hash_mix() at the beginning allows better register allocation.

We no longer use a cascade of two jhash and inet6_ehash_secret,
this was dubious/weak.

Add a comment explaining why @lport is not part of the jhash computation.

$ scripts/bloat-o-meter -t vmlinux.0 vmlinux.1
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-24 (-24)
Function old new delta
inet6_ehashfn 330 306 -24
Total: Before=24855958, After=24855934, chg -0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260313120346.3378811-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: py: give bpftrace more time to start

After commit under Fixes debug runners in the CI hit the following:

# subprocess.TimeoutExpired: Command '['bpftrace', '-f', 'json', '-q', '-e', 'kprobe:netpoll_poll_dev { @hits = count(); } interval:s:10 { exit(); }']' timed out after 15 seconds
# # Exception| net.lib.py.ksft.KsftFailEx: bpftrace failed to run!?: {}

in netpoll_basic.py >10% of the time. Let's give bpftool more time
to start, it can take a while on a debug kernel.

Fixes: 82562972b854 ("selftests: net: pass bpftrace timeout to cmd()")
Reviewed-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Link: https://patch.msgid.link/20260315160038.3187730-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ti: icssg-prueth: Add HSR multicast FDB port membership management

In HSR offload mode, multicast addresses can be added via HSR master
(hsr0) or directly to slave ports (eth1/eth2). The FDB must track port
membership: P0 (0x1) for HSR master, P1 (0x2) for slave port 1, and P2
(0x4) for slave port 2. When the same address is added from multiple
paths, memberships must accumulate.

Implement a hybrid approach using __dev_mc_sync() callbacks to track
basic add/delete operations, checking netdev_hw_addr->synced to
distinguish HSR-synced addresses from direct additions. Post-process
to handle overlapping memberships by checking refcount:
- refcount=2 with synced=1: HSR only (P0)
- refcount>=3 with synced=1: HSR + direct (P0|P1/P2)
- synced=0 with P0 set: HSR removed, clean up orphaned P0

On add operations, accumulate new membership with existing ports. On
delete operations, remove only the specific port and clean up orphaned
P0 bits if needed.

Add error handling for icssg_fdb_lookup() which can return negative
error codes (e.g., -ETIMEDOUT). On lookup failure in add/delete path,
default to no existing membership. In the post-processing path, skip
the address update to avoid corrupting FDB entries with garbage values.

VLAN Interface Handling:
Add support for multicast addresses added to VLAN interfaces on the HSR
master (e.g., hsr0.7). These addresses require P0 (HSR master) bit to be
set along with the port bits, since VLAN-tagged packets use separate FDB
entries per VLAN ID. Without P0, the HSR master would not receive
multicast packets on VLAN interfaces.

Track whether the add/del operation came from a VLAN interface path and
set P0 when in HSR offload mode with VLAN interfaces. Update orphaned P0
cleanup logic to preserve P0 for VLAN interfaces.

Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260311082923.2962937-1-danishanwar@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

virtio_net: add page_pool support for buffer allocation

Use page_pool for RX buffer allocation in mergeable and small buffer
modes to enable page recycling and avoid repeated page allocator calls.
skb_mark_for_recycle() enables page reuse in the network stack.

Big packets mode is unchanged because it uses page->private for linked
list chaining of multiple pages per buffer, which conflicts with
page_pool's internal use of page->private.

Implement conditional DMA premapping using virtqueue_dma_dev():
- When non-NULL (vhost, virtio-pci): use PP_FLAG_DMA_MAP with page_pool
handling DMA mapping, submit via virtqueue_add_inbuf_premapped()
- When NULL (VDUSE, direct physical): page_pool handles allocation only,
submit via virtqueue_add_inbuf_ctx()

This preserves the DMA premapping optimization from commit 31f3cd4e5756b
("virtio-net: rq submits premapped per-buffer") while adding page_pool
support as a prerequisite for future zero-copy features (devmem TCP,
io_uring ZCRX).

Page pools are created in probe and destroyed in remove (not open/close),
following existing driver behavior where RX buffers remain in virtqueues
across interface state changes.

Signed-off-by: Vishwanath Seshagiri <vishs@meta.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://patch.msgid.link/20260310183107.2822016-1-vishs@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'dpaa2-fix-config-relation-with-fsl_dpaa2_switch'

Cai Xinchen says:

====================
dpaa2: fix config relation with FSL_DPAA2_SWITCH

When compile FSL_DPAA2_SWITCH, it needs to set CONFIG_FSL_DPAA2_ETH=y,
otherwise it cannot be compiled.
And as Ioana Ciornei sugggested, FSL_DPAA2_SWITCH included dpaa2-mac.o in
the driver, but it does not select PCS_LYNX, PHYLINK and FSL_XGMAC_MDIO.
And FSL_DPAA2_SWITCH depends on FSL_MC_BUS && FSL_MC_DPIO becuase it uses
fsl_mc_driver APIs.
====================

Link: https://patch.msgid.link/20260312065907.476663-1-caixinchen1@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dpaa2: compile dpaa2 even CONFIG_FSL_DPAA2_ETH=n

CONFIG_FSL_DPAA2_ETH and CONFIG_FSL_DPAA2_SWITCH are not
associated, but the compilation of FSL_DPAA2_SWITCH depends on
the compilation of the dpaa2 folder. The files controlled by
CONFIG_FSL_DPAA2_SWITCH in the dpaa2 folder are not controlled
by CONFIG_FSL_DPAA2_ETH, except for the files controlled by
CONFIG_FSL_DPAA2_SWITCH. Therefore, removing the restriction will
not affect the compilation of the files in the directory.

Fixes: f48298d3fbfaa ("staging: dpaa2-switch: move the driver out of staging")
Suggested-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Cai Xinchen <caixinchen1@huawei.com>
Link: https://patch.msgid.link/20260312065907.476663-3-caixinchen1@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dpaa2: add independent dependencies for FSL_DPAA2_SWITCH

Since the commit 84cba72956fd ("dpaa2-switch: integrate
the MAC endpoint support") included dpaa2-mac.o in the driver,
but it didn't select PCS_LYNX, PHYLINK and FSL_XGMAC_MDIO. it
will lead to link error, such as
undefined reference to `phylink_ethtool_ksettings_set'
undefined reference to `lynx_pcs_create_fwnode'

And the same reason as the commit d2624e70a2f53 ("dpaa2-eth: select
XGMAC_MDIO for MDIO bus support"), enable the FSL_XGMAC_MDIO Kconfig
option in order to have MDIO access to internal and external PHYs.

Because dpaa2-switch uses fsl_mc_driver APIs, add depends on FSL_MC_BUS
&& FSL_MC_DPIO as FSL_DPAA2_SWITCH do.

FSL_XGMAC_MDIO and FSL_MC_BUS depend on OF, thus the dependence of
FSL_MC_BUS can satisfy FSL_XGMAC_MDIO's OF requirement.

Fixes: 84cba72956fd ("dpaa2-switch: integrate the MAC endpoint support")
Suggested-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Cai Xinchen <caixinchen1@huawei.com>
Link: https://patch.msgid.link/20260312065907.476663-2-caixinchen1@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/net: packetdrill: add tcp_disorder_fin_in_FIN_WAIT.pkt

Commit 795a7dfbc3d9 ("net: tcp: accept old ack during closing")
was fixing an old bug, add a test to make sure we won't break
this case in future kernels.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Menglong Dong <menglong8.dong@gmail.com>
Link: https://patch.msgid.link/20260313115429.3365751-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'devlink-introduce-shared-devlink-instance-for-pfs-on-same-chip'

Jiri Pirko says:

====================
devlink: introduce shared devlink instance for PFs on same chip

Multiple PFs on a network adapter often reside on the same physical
chip, running a single firmware. Some resources and configurations
are inherently shared among these PFs - PTP clocks, VF group rates,
firmware parameters, and others. Today there is no good object in
the devlink model to attach these chip-wide configuration knobs to.
Drivers resort to workarounds like pinning shared state to PF0 or
maintaining ad-hoc internal structures (e.g., ice_adapter) that are
invisible to userspace.

This problem was discussed extensively starting with Przemek Kitszel's
"whole device devlink instance" RFC for the ice driver [1]. Several
approaches for representing the parent instance were considered:
using a partial PCI BDF as the dev_name (breaks when PFs have different
BDFs in VMs), creating a per-driver bus, using auxiliary devices, or
using faux devices. All of these required a backing struct device for
the parent devlink instance, which does not naturally exist - there is
no PCI device that represents the chip as a whole.

This patchset takes a different approach: allow devlink instances to
exist without any backing struct device. The instance is identified
purely by its internal index, exposed over devlin netlink. This avoids
fabricating fake devices and keeps the devlink handle semantics clean.

The first ten patches prepare the devlink core for device-less
instances by decoupling the handle from the parent device. The last
three introduce the shared devlink infrastructure and its first user
in the mlx5 driver.

Example output showing the shared instance and nesting:

  pci/0000:08:00.0: index 0
    nested_devlink:
      auxiliary/mlx5_core.eth.0
  devlink_index/1: index 1
    nested_devlink:
      pci/0000:08:00.0
      pci/0000:08:00.1
  auxiliary/mlx5_core.eth.0: index 2
  pci/0000:08:00.1: index 3
    nested_devlink:
      auxiliary/mlx5_core.eth.1
  auxiliary/mlx5_core.eth.1: index 4

[1] https://lore.kernel.org/netdev/20250219164410.35665-1-przemyslaw.kitszel@intel.com/
---
Decoupled from "devlink and mlx5: Support cross-function rate scheduling"
patchset to maintain 15-patches limit.

See individual patches for changelog.
====================

Link: https://patch.msgid.link/20260312100407.551173-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Add a shared devlink instance for PFs on same chip

Use the previously introduced shared devlink infrastructure to create
a shared devlink instance for mlx5 PFs that reside on the same physical
chip. The shared instance is identified by the chip's serial number
extracted from PCI VPD (V3 keyword, with fallback to serial number
for older devices).

Each PF that probes calls mlx5_shd_init() which extracts the chip serial
number and uses devlink_shd_get() to get or create the shared instance.
When a PF is removed, mlx5_shd_uninit() calls devlink_shd_put()
to release the reference. The shared instance is automatically destroyed
when the last PF is removed.

Make the PF devlink instances nested in this shared devlink instance,
allowing userspace to identify which PFs belong to the same physical
chip.

Example:

pci/0000:08:00.0: index 0
  nested_devlink:
    auxiliary/mlx5_core.eth.0
devlink_index/1: index 1
  nested_devlink:
    pci/0000:08:00.0
    pci/0000:08:00.1
auxiliary/mlx5_core.eth.0: index 2
pci/0000:08:00.1: index 3
  nested_devlink:
    auxiliary/mlx5_core.eth.1
auxiliary/mlx5_core.eth.1: index 4

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20260312100407.551173-14-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

documentation: networking: add shared devlink documentation

Document shared devlink instances for multiple PFs on the same chip.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20260312100407.551173-13-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: introduce shared devlink instance for PFs on same chip

Multiple PFs may reside on the same physical chip, running a single
firmware. Some of the resources and configurations may be shared among
these PFs. Currently, there is no good object to pin the configuration
knobs on.

Introduce a shared devlink instance, instantiated upon probe of
the first PF and removed during remove of the last PF. The shared
devlink instance is not backed by any device device, as there is
no PCI device related to it.

The implementation uses reference counting to manage the lifecycle:
each PF that probes calls devlink_shd_get() to get or create
the shared instance, and calls devlink_shd_put() when it removes.
The shared instance is automatically destroyed when the last PF removes.

Example:

pci/0000:08:00.0: index 0
  nested_devlink:
    auxiliary/mlx5_core.eth.0
devlink_index/1: index 1
  nested_devlink:
    pci/0000:08:00.0
    pci/0000:08:00.1
auxiliary/mlx5_core.eth.0: index 2
pci/0000:08:00.1: index 3
  nested_devlink:
    auxiliary/mlx5_core.eth.1
auxiliary/mlx5_core.eth.1: index 4

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20260312100407.551173-12-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: allow devlink instance allocation without a backing device

Allow devlink_alloc_ns() to be called with dev=NULL to support
device-less devlink instances. When dev is NULL, the instance is
identified over netlink using "devlink_index" as bus_name and
the decimal index value as dev_name.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20260312100407.551173-11-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: add devl_warn() helper and use it in port warnings

Introduce devl_warn() macro that uses dev_warn() when a backing
device is available and falls back to pr_warn() otherwise. Convert
all dev_warn() calls in port.c to use it, preparing for devlink
instances without a backing device.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20260312100407.551173-10-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: add devlink_dev_driver_name() helper and use it in trace events

In preparation to dev-less devlinks, add devlink_dev_driver_name()
that returns the driver name stored in devlink struct, and use it in
all trace events.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20260312100407.551173-9-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: introduce __devlink_alloc() with dev driver pointer

Introduce __devlink_alloc() as an internal devlink allocator that
accepts a struct device_driver pointer and stores it in the devlink
instance. This allows internal devlink code (e.g. shared instances)
to associate a driver with a devlink instance without need to pass dev
pointer.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20260312100407.551173-8-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: support index-based notification filtering

Extend the notification filter descriptor with devlink_index so
that userspace can filter notifications by devlink instance index
in addition to bus_name/dev_name.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20260312100407.551173-7-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: support index-based lookup via bus_name/dev_name handle

Devlink instances without a backing device use bus_name
"devlink_index" and dev_name set to the decimal index string.
When user space sends this handle, detect the pattern and perform
a direct xarray lookup by index instead of iterating all instances.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20260312100407.551173-6-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: allow to use devlink index as a command handle

Currently devlink instances are addressed bus_name/dev_name tuple.
Allow the newly introduced DEVLINK_ATTR_INDEX to be used as
an alternative handle for all devlink commands.

When DEVLINK_ATTR_INDEX is present in the request, use it for a direct
xarray lookup instead of iterating over all instances comparing
bus_name/dev_name strings.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20260312100407.551173-5-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: avoid extra iterations when found devlink is not registered

Since the one found is not registered, very unlikely another one with
the same bus_name/dev_name is going to be found. Stop right away and
prepare common "found" path for the follow-up patch.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20260312100407.551173-4-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: add helpers to get bus_name/dev_name

Introduce devlink_bus_name() and devlink_dev_name() helpers and
convert all direct accesses to devlink->dev->bus->name and
dev_name(devlink->dev) to use them.

This prepares for dev-less devlink instances where these helpers
will be extended to handle the missing device.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20260312100407.551173-3-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: expose devlink instance index over netlink

Each devlink instance has an internally assigned index used for xarray
storage. Expose it as a new DEVLINK_ATTR_INDEX uint attribute alongside
the existing bus_name and dev_name handle.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20260312100407.551173-2-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-phy-further-decouple-provider-from-consumer-part'

Heiner Kallweit says:

====================
net: phy: further decouple provider from consumer part

This series aims at further decoupling the provider and consumer
part in phylib.
====================

Link: https://patch.msgid.link/9d5724bc-e525-4f8f-b3f8-b16dd5a1164e@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: move remaining provider code to mdio_bus_provider.c

This moves definition of mdio_bus class and bus_type to the provider
side, what allows to make them private to libphy.
As a prerequisite MDIO statistics handling is moved to the
provider side as well.

Note: This patch causes a checkpatch error "Macros with complex values
      should be enclosed in parentheses" for
      MDIO_BUS_STATS_ADDR_ATTR_GROUP. I consider this a false positive
      here, in addition the patch just moves existing code.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/47b85676-b349-4aa0-a5ef-cd37769a4c69@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: move registering mdio_bus_class and mdio_bus_type to libphy

The MDIO consumer side shouldn't register class and bus_type.
Therefore move this to libphy.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/b15b378a-fda2-44b9-9d63-bf82919b71b2@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: move (of_)mdio_find_bus to mdio_bus_provider.c

Functionality outside libphy shouldn't access mdio_bus_class directly.
So move both functions to the provider side. This is a step towards
making mdio_bus_class private to libphy.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/b6161c64-68ac-4524-82ec-5b7d81b86dbc@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: make mdio_device.c part of libphy

This patch
- makes mdio_device.c part of libphy
- makes mdio_device_(un)register_reset() static
- moves mdiobus_(un)register_device() from mdio_bus.c to mdio_device.c,
stops exporting both functions and makes them private to phylib

This further decouples the MDIO consumer functionality from libphy.

Note: This makes MDIO driver registration part of phylib, therefore
adjust Kconfig dependencies where needed.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/c6dbf9b3-3ca0-434b-ad3a-71fe602ab809@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: move mdio_device reset handling functions in the code

In preparation of a follow-up patch this moves reset-related functions
in the code.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/8ea1a929-33b8-49ee-afe6-355f5a7d2bd1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: usb: cdc-ether: unify error handling in probe

usbnet_generic_cdc_bind() is duplicating the error handling
multiple times. That is bad. Unify it with jumps.

V2: Update error logging with every cause a unique message

Signed-off-by: Oliver Neukum <oneukum@suse.com>
Link: https://patch.msgid.link/20260312084612.1469853-1-oneukum@suse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mdio: remove selecting FIXED_PHY for FWNODE_MDIO

Fwnode MDIO has never used the fixed PHY code, therefore don't select
symbol FIXED_PHY.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/880ca62b-a5d3-4865-bbce-2d2210928239@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv4: validate IPV4_DEVCONF attributes properly

As the IPV4_DEVCONF netlink attributes are not being validated, it is
possible to use netlink to set read-only values like mc_forwarding. In
addition, valid ranges are not being validated neither but that is less
relevant as they aren't in sysctl.

To avoid similar situations in the future, define a NLA policy for
IPV4_DEVCONF attributes which are nested in IFLA_INET_CONF.

Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Link: https://patch.msgid.link/20260312142637.5704-1-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: avoid passing pci_dev

The pci_dev is only used to provide the ethtool bus_info using
pci_name(priv->plat->pdev). This is the same as dev_name(priv->device).
Thus, rather than passing the pci_dev, make use of what we already
have.

To avoid unexpectedly exposing the device name through ethtool where
it wasn't provided before, add a flag priv->plat->provide_bus_info
to enable this, which only dwmac-intel needs to set.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/E1w0evI-0000000CzY7-1fyo@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: enetc: remove stray semicolon

Remove stray semicolon from ntmp_table_name().

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260313112454.427191-1-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

hinic3: Fix spelling mistake "capbility" -> "capability"

There is a spelling mistake in a dev_dbg message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20260312235922.3442120-1-colin.i.king@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-add-skb_drop_reason_recursion_limit'

Eric Dumazet says:

====================
net: add SKB_DROP_REASON_RECURSION_LIMIT

Add a new drop reason : SKB_DROP_REASON_RECURSION_LIMIT

Used for packets dropped in a too deep virtual device chain,
from tunnels and __dev_queue_xmit()

__dev_queue_xmit() can also return SKB_DROP_REASON_DEV_READY
====================

Link: https://patch.msgid.link/20260312201824.203093-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: plumb drop reasons to __dev_queue_xmit()

Add drop reasons to __dev_queue_xmit():

- SKB_DROP_REASON_DEV_READY : device is not UP.

- SKB_DROP_REASON_RECURSION_LIMIT : recursion limit on virtual device is hit.

Also add an unlikely() for the SKB_DROP_REASON_DEV_READY case,
and reduce indentation level.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20260312201824.203093-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dropreason: add SKB_DROP_REASON_RECURSION_LIMIT

ip[6]tunnel_xmit() can drop packets if a too deep recursion level
is detected.

Add SKB_DROP_REASON_RECURSION_LIMIT drop reason.

We will use this reason later in __dev_queue_xmit().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20260312201824.203093-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'tcp-rfc-7323-compliant-window-retraction-handling'

Simon Baatz says:

====================
tcp: RFC 7323-compliant window retraction handling

this series implements the receiver-side requirements for TCP window
retraction as specified in RFC 7323 and adds packetdrill tests to
cover the new behavior.

Please see the first patch for background and implementation
details. Since MPTCP adjusts the TCP receive window on subflows, the
relevant MPTCP code paths are updated accordingly.
====================

Link: https://patch.msgid.link/20260309-tcp_rfc7323_retract_wnd_rfc-v3-0-4c7f96b1ec69@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/net: packetdrill: add tcp_rcv_neg_window.pkt

The test ensures we correctly apply the maximum advertised window limit
when rcv_nxt advances past rcv_mwnd_seq, so that the "usable window"
is properly clamped to zero rather than becoming negative.

Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260309-tcp_rfc7323_retract_wnd_rfc-v3-6-4c7f96b1ec69@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/net: packetdrill: add tcp_rcv_wnd_shrink_allowed.pkt

This test verifies the sequence number checks using the maximum
advertised window sequence number when net.ipv4.tcp_shrink_window
is enabled.

Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260309-tcp_rfc7323_retract_wnd_rfc-v3-5-4c7f96b1ec69@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/net: packetdrill: add tcp_rcv_wnd_shrink_nomem.pkt

This test verifies
- the sequence number checks using the maximum advertised window
sequence number and
- the logic for handling received data in tcp_data_queue()

for the cases:

1. The window is reduced to zero because of memory

2. The window grows again but still does not reach the originally
advertised window

Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260309-tcp_rfc7323_retract_wnd_rfc-v3-4-4c7f96b1ec69@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: increase LINUX_MIB_BEYOND_WINDOW for SKB_DROP_REASON_TCP_OVERWINDOW

Since commit 9ca48d616ed7 ("tcp: do not accept packets beyond
window"), the path leading to SKB_DROP_REASON_TCP_OVERWINDOW in
tcp_data_queue() is probably dead. However, it can be reached now when
tcp_max_receive_window() is larger than tcp_receive_window(). In that
case, increment LINUX_MIB_BEYOND_WINDOW as done in tcp_sequence().

Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260309-tcp_rfc7323_retract_wnd_rfc-v3-3-4c7f96b1ec69@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: keep rcv_mwnd_seq in sync with subflow rcv_wnd

MPTCP shares a receive window across subflows and applies it at the
subflow level by adjusting each subflow's rcv_wnd when needed. With
the new TCP tracking of the maximum advertised window sequence,
rcv_mwnd_seq must stay consistent with these subflow-level rcv_wnd
adjustments.

Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260309-tcp_rfc7323_retract_wnd_rfc-v3-2-4c7f96b1ec69@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: implement RFC 7323 window retraction receiver requirements

By default, the Linux TCP implementation does not shrink the
advertised window (RFC 7323 calls this "window retraction") with the
following exceptions:

- When an incoming segment cannot be added due to the receive buffer
  running out of memory. Since commit 8c670bdfa58e ("tcp: correct
  handling of extreme memory squeeze") a zero window will be
  advertised in this case. It turns out that reaching the required
  memory pressure is easy when window scaling is in use. In the
  simplest case, sending a sufficient number of segments smaller than
  the scale factor to a receiver that does not read data is enough.

- Commit b650d953cd39 ("tcp: enforce receive buffer memory limits by
  allowing the tcp window to shrink") addressed the "eating memory"
  problem by introducing a sysctl knob that allows shrinking the
  window before running out of memory.

However, RFC 7323 does not only state that shrinking the window is
necessary in some cases, it also formulates requirements for TCP
implementations when doing so (Section 2.4).

This commit addresses the receiver-side requirements: After retracting
the window, the peer may have a snd_nxt that lies within a previously
advertised window but is now beyond the retracted window. This means
that all incoming segments (including pure ACKs) will be rejected
until the application happens to read enough data to let the peer's
snd_nxt be in window again (which may be never).

To comply with RFC 7323, the receiver MUST honor any segment that
would have been in window for any ACK sent by the receiver and, when
window scaling is in effect, SHOULD track the maximum window sequence
number it has advertised. This patch tracks that maximum window
sequence number rcv_mwnd_seq throughout the connection and uses it in
tcp_sequence() when deciding whether a segment is acceptable.

rcv_mwnd_seq is updated together with rcv_wup and rcv_wnd in
tcp_select_window(). If we count tcp_sequence() as fast path, it is
read in the fast path. Therefore, rcv_mwnd_seq is put into rcv_wnd's
cacheline group.

The logic for handling received data in tcp_data_queue() is already
sufficient and does not need to be updated.

Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260309-tcp_rfc7323_retract_wnd_rfc-v3-1-4c7f96b1ec69@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ptp: ocp: Add support for Xilinx-based Adva TimeCard variant

Add support for the Adva TimeCard model built on a Xilinx-based design.
This patch enables detection and integration of the new hardware within
the existing OCP timecard framework.
The Xilinx variant relies on the shared driver infrastructure, requiring
only small, targeted additions to accommodate its specific
characteristics.

Signed-off-by: Sagi Maimon <maimon.sagi@gmail.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20260312082009.249622-1-maimon.sagi@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'udp-retire-udp-lite'

Kuniyuki Iwashima says:

====================
udp: Retire UDP-Lite.

In 2023, syzbot found a null-ptr-deref bug triggered when UDP-Lite
attempted to charge an skb after the total memory usage for UDP-Lite
_and_ UDP exceeded a system-wide threshold, net.ipv4.udp_mem[1].

Since this threshold is shared with UDP, the bug would have been
easy to trigger if any real-world applications were using UDP-Lite;
however, only syzbot ever stumbled upon it.

The bug had persisted since 2016, suggesting that UDP-Lite had
remained unused for 7 years at that point.

The bug was fixed in commit ad42a35bdfc6 ("udplite: Fix NULL pointer
dereference in __sk_mem_raise_allocated()."), and we added another
commit be28c14ac8bb ("udplite: Print deprecation notice.") to
announce the deprecation plan.

Since then, no one has complained, so it is time to officially
retire UDP-Lite.

This series first removes IPv6 and IPv4 UDP-Lite sockets, then
gradually cleans up the remaining dead/unnecessary code within
the UDP stack.

By removing a bunch of conditionals for UDP-Lite from the fast
path, udp_rr with 20,000 flows sees a 10% increase in pps
(13.3 Mpps -> 14.7 Mpps)  on an AMD EPYC 7B12 (Zen 2) 64-Core
Processor platform.

[ With FDO, the baseline is much higher and the delta was ~3%,
  20.1 Mpps -> 20.7 Mpps ]

Before:

$ nstat > /dev/null; sleep 1; nstat | grep Udp
Udp6InDatagrams                 14013408           0.0
Udp6OutDatagrams                14013128           0.0

After:

$ nstat > /dev/null; sleep 1; nstat | grep Udp
Udp6InDatagrams                 15491971           0.0
Udp6OutDatagrams                15491671           0.0

$ ./scripts/bloat-o-meter vmlinux.before vmlinux.after
add/remove: 13/75 grow/shrink: 11/75 up/down: 13777/-18401 (-4624)
Function                                     old     new   delta
udp4_gro_receive                             872     866      -6
udp6_gro_receive                             910     903      -7
udp_rcv                                       32    1727   +1695
udpv6_rcv                                     32    1450   +1418
__udp4_lib_rcv                              2045       -   -2045
__udp6_lib_rcv                              2084       -   -2084
udp_unicast_rcv_skb                          160     149     -11
udp6_unicast_rcv_skb                         196     181     -15
__udp4_lib_mcast_deliver                     925     846     -79
__udp6_lib_mcast_deliver                     922     810    -112
__udp4_lib_lookup                            973     969      -4
__udp6_lib_lookup                            940     929     -11
__udp4_lib_lookup_skb                        106     100      -6
__udp6_lib_lookup_skb                         71      66      -5
udp4_lib_lookup_skb                          132     127      -5
udp6_lib_lookup_skb                           87      81      -6
udp_queue_rcv_skb                            326     356     +30
udpv6_queue_rcv_skb                          331     361     +30
udp_queue_rcv_one_skb                       1233     914    -319
udpv6_queue_rcv_one_skb                     1250     930    -320
__udp_enqueue_schedule_skb                  1067     995     -72
udp_rcv_segment                              520     480     -40
udp_post_segment_fix_csum                    120       -    -120
udp_lib_checksum_complete                    200      84    -116
udp_err                                       27    1103   +1076
udpv6_err                                     36    1417   +1381
__udp4_lib_err                              1112       -   -1112
__udp6_lib_err                              1448       -   -1448
udp_recvmsg                                 1149     994    -155
udpv6_recvmsg                               1349    1294     -55
udp_sendmsg                                 2730    2648     -82
udp_send_skb                                 909     681    -228
udpv6_sendmsg                               3022    2861    -161
udp_v6_send_skb                             1214     952    -262
...
Total: Before=18446744073748075501, After=18446744073748070877, chg -0.00%
====================

Link: https://patch.msgid.link/20260311052020.1213705-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

udp: Don't pass proto to __udp4_lib_rcv() and __udp6_lib_rcv().

UDP and UDP-Lite shared __udp4_lib_rcv() and __udp6_lib_rcv()
by passing IPPROTO_UDP or IPPROTO_UDPLITE.

Now, @proto is always IPPROTO_UDP.

Let's not pass it and rename the functions accordingly.

With this series removing a bunch of conditionals for UDP-Lite
from the fast path, udp_rr with 20,000 flows sees a 10% increase
in pps (13.3 Mpps -> 14.7 Mpps)  on an AMD EPYC 7B12 (Zen 2)
64-Core Processor platform.

[ With FDO, the baseline is much higher and the delta was ~3%,
  20.1 Mpps -> 20.7 Mpps ]

Before:

$ nstat > /dev/null; sleep 1; nstat | grep Udp
Udp6InDatagrams                 14013408           0.0
Udp6OutDatagrams                14013128           0.0

After:

$ nstat > /dev/null; sleep 1; nstat | grep Udp
Udp6InDatagrams                 15491971           0.0
Udp6OutDatagrams                15491671           0.0

$ ./scripts/bloat-o-meter vmlinux.before vmlinux.after
add/remove: 13/75 grow/shrink: 11/75 up/down: 13777/-18401 (-4624)
Function                                     old     new   delta
udp4_gro_receive                             872     866      -6
udp6_gro_receive                             910     903      -7
udp_rcv                                       32    1727   +1695
udpv6_rcv                                     32    1450   +1418
__udp4_lib_rcv                              2045       -   -2045
__udp6_lib_rcv                              2084       -   -2084
udp_unicast_rcv_skb                          160     149     -11
udp6_unicast_rcv_skb                         196     181     -15
__udp4_lib_mcast_deliver                     925     846     -79
__udp6_lib_mcast_deliver                     922     810    -112
__udp4_lib_lookup                            973     969      -4
__udp6_lib_lookup                            940     929     -11
__udp4_lib_lookup_skb                        106     100      -6
__udp6_lib_lookup_skb                         71      66      -5
udp4_lib_lookup_skb                          132     127      -5
udp6_lib_lookup_skb                           87      81      -6
udp_queue_rcv_skb                            326     356     +30
udpv6_queue_rcv_skb                          331     361     +30
udp_queue_rcv_one_skb                       1233     914    -319
udpv6_queue_rcv_one_skb                     1250     930    -320
__udp_enqueue_schedule_skb                  1067     995     -72
udp_rcv_segment                              520     480     -40
udp_post_segment_fix_csum                    120       -    -120
udp_lib_checksum_complete                    200      84    -116
udp_err                                       27    1103   +1076
udpv6_err                                     36    1417   +1381
__udp4_lib_err                              1112       -   -1112
__udp6_lib_err                              1448       -   -1448
udp_recvmsg                                 1149     994    -155
udpv6_recvmsg                               1349    1294     -55
udp_sendmsg                                 2730    2648     -82
udp_send_skb                                 909     681    -228
udpv6_sendmsg                               3022    2861    -161
udp_v6_send_skb                             1214     952    -262
...
Total: Before=18446744073748075501, After=18446744073748070877, chg -0.00%

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260311052020.1213705-16-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

udp: Don't pass udptable to IPv4 socket lookup functions.

Since UDP and UDP-Lite had dedicated socket hash tables for
each, we have had to pass the pointer down to many socket
lookup functions.

UDP-Lite gone, and we do not need to do that.

Let's fetch net->ipv4.udp_table only where needed in IPv4
stack: __udp4_lib_lookup(), __udp4_lib_mcast_deliver(),
and udp_diag_dump().

Some functions are renamed as the wrapper functions are no
longer needed.

  __udp4_lib_err()     -> udp_err()
  __udp_diag_destroy() -> udp_diag_destroy()
  udp_dump_one()       -> udp_diag_dump_one()
  udp_dump()           -> udp_diag_dump()

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260311052020.1213705-15-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

udp: Don't pass udptable to IPv6 socket lookup functions.

Since UDP and UDP-Lite had dedicated socket hash tables for
each, we have had to pass the pointer down to many socket
lookup functions.

UDP-Lite gone, and we do not need to do that.

Let's fetch net->ipv4.udp_table only where needed in IPv6
stack: __udp6_lib_lookup() and __udp6_lib_mcast_deliver().

__udp6_lib_err() is renamed to udpv6_err() as its wrapper
is no longer needed.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260311052020.1213705-14-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

udp: Remove dead check in __udp[46]_lib_lookup() for BPF.

BPF socket lookup for SO_REUSEPORT does not support UDP-Lite.

In __udp4_lib_lookup() and __udp6_lib_lookup(), it checks if
the passed udptable pointer is the same as net->ipv4.udp_table,
which is only true for UDP.

Now, the condition is always true.

Let's remove the check.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260311052020.1213705-13-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

udp: Remove udp_table in struct udp_seq_afinfo.

Since UDP and UDP-Lite had dedicated socket hash tables for
each, we have had to fetch them from different pointers for
procfs or bpf iterator.

UDP always has its global or per-netns table in
net->ipv4.udp_table and struct udp_seq_afinfo.udp_table
is NULL.

OTOH, UDP-Lite had only one global table in the pointer.

We no longer use the field.

Let's remove it and udp_get_table_seq().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260311052020.1213705-12-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

udp: Remove struct proto.h.udp_table.

Since UDP and UDP-Lite had dedicated socket hash tables for
each, we have had to fetch them from different pointers.

UDP always has its global or per-netns table in
net->ipv4.udp_table and struct proto.h.udp_table is NULL.

OTOH, UDP-Lite had only one global table in the pointer.

We no longer use the field.

Let's remove it and udp_get_table_prot().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260311052020.1213705-11-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

udp: Remove UDPLITE_SEND_CSCOV and UDPLITE_RECV_CSCOV.

UDP-Lite supports variable-length checksum and has two socket
options, UDPLITE_SEND_CSCOV and UDPLITE_RECV_CSCOV, to control
the checksum coverage.

Let's remove the support.

setsockopt(UDPLITE_SEND_CSCOV / UDPLITE_RECV_CSCOV) was only
available for UDP-Lite and returned -ENOPROTOOPT for UDP.

Now, the options are handled in ip_setsockopt() and
ipv6_setsockopt(), which still return the same error.

getsockopt(UDPLITE_SEND_CSCOV / UDPLITE_RECV_CSCOV) was available
for UDP and always returned 0, meaning full checksum, but now
-ENOPROTOOPT is returned.

Given that getsockopt() is meaningless for UDP and even the options
are not defined under include/uapi/, this should not be a problem.

  $ man 7 udplite
  ...
  BUGS
       Where glibc support is missing, the following definitions
       are needed:

           #define IPPROTO_UDPLITE     136
           #define UDPLITE_SEND_CSCOV  10
           #define UDPLITE_RECV_CSCOV  11

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260311052020.1213705-10-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

udp: Remove partial csum code in TX.

UDP TX paths also have some code for UDP-Lite partial
checksum:

* udplite_csum() in udp_send_skb() and udp_v6_send_skb()
* udplite_getfrag() in udp_sendmsg() and udpv6_sendmsg()

Let's remove such code.

Now, we can use IPPROTO_UDP directly instead of sk->sk_protocol
or fl6->flowi6_proto for csum_tcpudp_magic() and csum_ipv6_magic().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260311052020.1213705-9-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

udp: Remove partial csum code in RX.

UDP-Lite supports the partial checksum and the coverage is
stored in the position of the length field of struct udphdr.

In RX paths, udp4_csum_init() / udp6_csum_init() save the value
in UDP_SKB_CB(skb)->cscov and set UDP_SKB_CB(skb)->partial_cov
to 1 if the coverage is not full.

The subsequent processing diverges depending on the value,
but such paths are now dead.

Also, these functions have some code guarded for UDP:

* udp_unicast_rcv_skb / udp6_unicast_rcv_skb
* __udp4_lib_rcv() and __udp6_lib_rcv().

Let's remove the partial csum code and the unnecessary
guard for UDP-Lite in RX.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260311052020.1213705-8-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

smack: Remove IPPROTO_UDPLITE support in security_sock_rcv_skb().

smack_socket_sock_rcv_skb() is registered as socket_sock_rcv_skb,
which is called as security_sock_rcv_skb() in sk_filter_trim_cap().

Now that UDP-Lite is gone, let's remove the IPPROTO_UDPLITE support
in smack_socket_sock_rcv_skb().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Link: https://patch.msgid.link/20260311052020.1213705-7-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

udp: Remove UDP-Lite SNMP stats.

Since UDP and UDP-Lite shared most of the code, we have had
to check the protocol every time we increment SNMP stats.

Now that the UDP-Lite paths are dead, let's remove UDP-Lite
SNMP stats.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260311052020.1213705-6-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv4: Retire UDP-Lite.

We have deprecated IPv6 UDP-Lite sockets.

Let's drop support for IPv4 UDP-Lite sockets as well.

Most of the changes are similar to the IPv6 patch: removing
udplite.c and udp_impl.h, marking most functions in udp_impl.h
as static, moving the prototype for udp_recvmsg() to udp.h, and
adding INDIRECT_CALLABLE_SCOPE for it.

In addition, the INET_DIAG support for UDP-Lite is dropped.

We will remove the remaining dead code in the following patches.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260311052020.1213705-5-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: Remove UDP-Lite support for IPV6_ADDRFORM.

We cannot create IPv6 UDP-Lite sockets anymore.

Let's remove dead code in IPV6_ADDRFORM.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260311052020.1213705-4-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: Retire UDP-Lite.

As announced in commit be28c14ac8bb ("udplite: Print deprecation
notice."), it's time to deprecate UDP-Lite.

As a first step, let's drop support for IPv6 UDP-Lite sockets.

We will remove the remaining dead code gradually.

Along with the removal of udplite.c, most of the functions exposed
via udp_impl.h are made static.

The prototypes of udpv6_sendmsg() and udpv6_recvmsg() are moved
to udp.h, but only udpv6_recvmsg() has INDIRECT_CALLABLE_DECLARE()
because udpv6_sendmsg() is exported for rxrpc since commit ed472b0c8783
("rxrpc: Call udp_sendmsg() directly").

Also, udpv6_recvmsg() needs INDIRECT_CALLABLE_SCOPE for
CONFIG_MITIGATION_RETPOLINE=n.

Note that udplite.h is included temporarily for udplite_csum().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260311052020.1213705-3-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

udp: Make udp[46]_seq_show() static.

Since commit a3d2599b2446 ("ipv{4,6}/udp{,lite}: simplify proc
registration"), udp4_seq_show() and udp6_seq_show() are not
used in net/ipv4/udplite.c and net/ipv6/udplite.c.

Instead, udp_seq_ops and udp6_seq_ops are exposed to UDP-Lite.

Let's make udp4_seq_show() and udp6_seq_show() static.

udp_seq_ops and udp6_seq_ops are moved to udp_impl.h so that
we can make them static when the header is removed.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260311052020.1213705-2-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netlink: update outdated comment

The function netlink_clear_multicast_users() was removed as unused
in commit 2173f8d953e7 ("netlink: cleanup tap related functions").
Update the comment in netlink_change_ngroups() to remove the stale
reference, replacing it with a general description of the behavior
while preserving the warning.

Assisted-by: unnamed:deepseek-v3.2 coccinelle
Signed-off-by: Kexin Sun <kexinsun@smail.nju.edu.cn>
Link: https://patch.msgid.link/20260311133519.688-1-kexinsun@smail.nju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

qlcnic: update outdated comment

The function pci_unmap_page() was a compatibility wrapper around
dma_unmap_page(), and was removed by commit 7968778914e5 ("PCI:
Remove the deprecated pci-dma-compat.h API"). Update the comment
accordingly.

Assisted-by: unnamed:deepseek-v3.2 coccinelle
Signed-off-by: Kexin Sun <kexinsun@smail.nju.edu.cn>
Link: https://patch.msgid.link/20260311133012.519-1-kexinsun@smail.nju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: add skb_defer_disable_key static key

Add a static key to bypass skb_attempt_defer_free() steps
if net.core.skb_defer_max is set to zero.

Main benefit is the atomic_long_inc_return() avoidance.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260311191340.1996888-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: page_pool: scale alloc cache with PAGE_SIZE

The current page_pool alloc-cache size and refill values were chosen to
match the NAPI budget and to leave headroom for XDP_DROP recycling.
These fixed values do not scale well with large pages,
as they significantly increase a given page_pool's memory footprint.

Scale these values to better balance memory footprint across page sizes,
while keeping behavior on 4KB-page systems unchanged.

Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Nimrod Oren <noren@nvidia.com>
Link: https://patch.msgid.link/20260309081301.103152-1-noren@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'genetlink-apply-reject-policy-for-split-ops-on-the-dispatch-path'

Jakub Kicinski says:

====================
genetlink: apply reject policy for split ops on the dispatch path

Looks like I somehow missed adding default reject policies to commands
in families using split Netlink ops. I realized this randomly trying
to dump page pools for a specific device and always getting all of them
back. The per-device dump is simply not implemented so the request
should have been rejected. Patch 2 is the real change, the rest is
just accompaniment.
====================

Link: https://patch.msgid.link/20260311032839.417748-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: add test for Netlink policy dumps

Add validation for the nlctrl family, accessing family info and
dumping policies.

  TAP version 13
  1..4
  ok 1 nl_nlctrl.getfamily_do
  ok 2 nl_nlctrl.getfamily_dump
  ok 3 nl_nlctrl.getpolicy_dump
  ok 4 nl_nlctrl.getpolicy_by_op
  # Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0

Link: https://patch.msgid.link/20260311032839.417748-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: make sure that Netlink rejects unknown attrs in dump

Add a test case for rejecting attrs if policy is not set.
dev_get dump has no input policy (accepts no attrs).

Link: https://patch.msgid.link/20260311032839.417748-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

genetlink: apply reject policy for split ops on the dispatch path

Commit 4fa86555d1cd ("genetlink: piggy back on resv_op to default to
a reject policy") added genl_policy_reject_all to ensure that ops
without an explicit policy reject all attributes rather than silently
accepting them. This change was applied to net.

When split ops were later introduced in net-next in
commit b8fd60c36a44 ("genetlink: allow families to use split ops directly"),
genl_op_fill_in_reject_policy_split() was added and called from
genl_op_from_split() (used for policy dumping and registration).
However, genl_get_cmd_split(), which is called for incoming messages,
copies split_ops entries as-is without applying the reject policy.
This means that split ops without policy accept all inputs.

This looks like an omission / mistake made when splitting the changes
between net and net-next. Let's try to re-introduce the checking.
Not considering this a fix given the regression potential.
If anyone reports issues we should probably fill in fake policies
for specific ops rather than reverting this.

Link: https://patch.msgid.link/20260311032839.417748-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

genetlink: use maxattr of 0 for the reject policy

Commit 4fa86555d1cd ("genetlink: piggy back on resv_op to default to
a reject policy") added genl_policy_reject_all to ensure that ops
without an explicit policy reject all attributes rather than silently
accepting them.

The reject policy had maxattr of 1. Passing info->attrs of size 2
may surprise families. Devlink, for instance, assumes that if
info->attrs is set it's safe to access DEVLINK_ATTR_BUS_NAME (1)
and DEVLINK_ATTR_DEV_NAME (2).

Before plugging reject policies into split ops we need to make sure
the genetlink code will not populate info->attrs if family
had no explicit policy for the op.

While even shared code paths within the families can figure out
that given op has no policy fairly easily themselves, passing attrs
with fixed size of 2 feels fairly useless and error prone.

This change has no user-visible impact, reject attrs are not
reported to the user space via getpolicy. We do have to remove
the safety check in netlink_policy_dump_get_policy_idx()
but it seems to have been there to catch likely faulty input,
the code can handle maxattr = 0 just fine.

Link: https://patch.msgid.link/20260311032839.417748-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: vitesse: add inband caps and configuration

Add support for VSC8662 reporting its inband capabilities, and also
hook to configure the PHY's inband mode.

This fixes a regression in the macb driver caused by commit
1338cfef1ff1 ("net: macb: fix SGMII with inband aneg disabled")

Cc: stable+noautosel@kernel.org # neither this nor commit under fixes should be backported
Reported-by: Conor Dooley <conor@kernel.org>
Link: https://lore.kernel.org/r/20260304-nebulizer-rounding-40fbc81a2ba1@spud
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Fixes: 1338cfef1ff1b958 ("net: macb: fix SGMII with inband aneg disabled")
Link: https://patch.msgid.link/E1w082O-0000000ChNc-1wDz@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ynl: ethtool: remove duplicated unspec entry

There is a duplicated unspec entry. Remove it.
No user impact expected, found by inspection.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20260311-b4-drop_dup_unspec-v1-1-e0dfa47b5981@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

docs: octeontx2: fix typo in documentation

Fix spelling mistake "Crate" to "Create" in the documentation.

Signed-off-by: ShravyaPanchagiri <shravy112@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260311030450.8461-1-shravy112@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: xgbe: use device_get_mac_addr

device_get_mac_addr is basically device_property_read_u8_array with an
is_valid_ether_addr call. Allows just checking for ret.

Remove XGBE_MAC_ADDR_PROPERTY. device_get_mac_addr supports more
properties than just "mac-address".

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Sai Krishna <saikrishnag@marvell.com>
Link: https://patch.msgid.link/20260310194647.3794-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR (net-7.0-rc4).

drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
  db25c42c2e1f9 ("net/mlx5e: RX, Fix XDP multi-buf frag counting for striding RQ")
  dff1c3164a692 ("net/mlx5e: SHAMPO, Always calculate page size")
https://lore.kernel.org/aa7ORohmf67EKihj@sirena.org.uk

drivers/net/ethernet/ti/am65-cpsw-nuss.c
  840c9d13cb1ca ("net: ethernet: ti: am65-cpsw-nuss: Fix rx_filter value for PTP support")
  a23c657e332f2 ("net: ethernet: ti: am65-cpsw: Use also port number to identify timestamps")
https://lore.kernel.org/abK3EkIXuVgMyGI7@sirena.org.uk

No adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'net-7.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
"Including fixes from CAN and netfilter.

  Current release - regressions:

   - eth: mana: Null service_wq on setup error to prevent double destroy

  Previous releases - regressions:

   - nexthop: fix percpu use-after-free in remove_nh_grp_entry

   - sched: teql: fix NULL pointer dereference in iptunnel_xmit on TEQL slave xmit

   - bpf: fix nd_tbl NULL dereference when IPv6 is disabled

   - neighbour: restore protocol != 0 check in pneigh update

   - tipc: fix divide-by-zero in tipc_sk_filter_connect()

   - eth:
      - mlx5:
         - fix crash when moving to switchdev mode
         - fix DMA FIFO desync on error CQE SQ recovery
      - iavf: fix PTP use-after-free during reset
      - bonding: fix type confusion in bond_setup_by_slave()
      - lan78xx: fix WARN in __netif_napi_del_locked on disconnect

  Previous releases - always broken:

   - core: add xmit recursion limit to tunnel xmit functions

   - net-shapers: don't free reply skb after genlmsg_reply()

   - netfilter:
      - fix stack out-of-bounds read in pipapo_drop()
      - fix OOB read in nfnl_cthelper_dump_table()

   - mctp:
      - fix device leak on probe failure
      - i2c: fix skb memory leak in receive path

   - can: keep the max bitrate error at 5%

   - eth:
      - bonding: fix nd_tbl NULL dereference when IPv6 is disabled
      - bnxt_en: fix RSS table size check when changing ethtool channels
      - amd-xgbe: prevent CRC errors during RX adaptation with AN disabled
      - octeontx2-af: devlink: fix NIX RAS reporter recovery condition"

* tag 'net-7.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (71 commits)
  net: prevent NULL deref in ip[6]tunnel_xmit()
  octeontx2-af: devlink: fix NIX RAS reporter to use RAS interrupt status
  octeontx2-af: devlink: fix NIX RAS reporter recovery condition
  net: ethernet: ti: am65-cpsw-nuss: Fix rx_filter value for PTP support
  net/mana: Null service_wq on setup error to prevent double destroy
  selftests: rtnetlink: add neighbour update test
  neighbour: restore protocol != 0 check in pneigh update
  net: dsa: realtek: Fix LED group port bit for non-zero LED group
  tipc: fix divide-by-zero in tipc_sk_filter_connect()
  net: dsa: microchip: Fix error path in PTP IRQ setup
  bpf: bpf_out_neigh_v6: Fix nd_tbl NULL dereference when IPv6 is disabled
  bpf: bpf_out_neigh_v4: Fix nd_tbl NULL dereference when IPv6 is disabled
  net: bonding: Fix nd_tbl NULL dereference when IPv6 is disabled
  ipv6: move the disable_ipv6_mod knob to core code
  net: bcmgenet: fix broken EEE by converting to phylib-managed state
  net-shapers: don't free reply skb after genlmsg_reply()
  net: dsa: mxl862xx: don't set user_mii_bus
  net: ethernet: arc: emac: quiesce interrupts before requesting IRQ
  page_pool: store detach_time as ktime_t to avoid false-negatives
  net: macb: Shuffle the tx ring before enabling tx
  ...

Merge tag 'apparmor-pr-mainline-2026-03-09' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor

Pull AppArmor fixes from John Johansen:
- fix race between freeing data and fs accessing it
- fix race on unreferenced rawdata dereference
- fix differential encoding verification
- fix unconfined unprivileged local user can do privileged policy management
- Fix double free of ns_name in aa_replace_profiles()
- fix missing bounds check on DEFAULT table in verify_dfa()
- fix side-effect bug in match_char() macro usage
- fix: limit the number of levels of policy namespaces
- replace recursive profile removal with iterative approach
- fix memory leak in verify_header
- validate DFA start states are in bounds in unpack_pdb

* tag 'apparmor-pr-mainline-2026-03-09' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor:
  apparmor: fix race between freeing data and fs accessing it
  apparmor: fix race on rawdata dereference
  apparmor: fix differential encoding verification
  apparmor: fix unprivileged local user can do privileged policy management
  apparmor: Fix double free of ns_name in aa_replace_profiles()
  apparmor: fix missing bounds check on DEFAULT table in verify_dfa()
  apparmor: fix side-effect bug in match_char() macro usage
  apparmor: fix: limit the number of levels of policy namespaces
  apparmor: replace recursive profile removal with iterative approach
  apparmor: fix memory leak in verify_header
  apparmor: validate DFA start states are in bounds in unpack_pdb

net: prevent NULL deref in ip[6]tunnel_xmit()

Blamed commit missed that both functions can be called with dev == NULL.

Also add unlikely() hints for these conditions that only fuzzers can hit.

Fixes: 6f1a9140ecda ("net: add xmit recursion limit to tunnel xmit functions")
Signed-off-by: Eric Dumazet <edumazet@google.com>
CC: Weiming Shi <bestswngs@gmail.com>
Link: https://patch.msgid.link/20260312043908.2790803-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: ti: icssg: Fix wrong macro used in RX classifier configuration

The RX_CLASS_OR_REG macro is being used with RX_CLASS_OR_EN parameter
when writing to the AND enable register. This should use RX_CLASS_AND_EN
instead to properly configure the classifier AND enable register.

Fix this by using the correct RX_CLASS_AND_EN macro parameter for
RX_CLASS_OR_REG when configuring the PTP duplicate and HSR tag
classifiers.

Fixes: f56438a74d88 ("net: ti: icssg: Add HSR/PRP protocol frame filtering")
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Link: https://patch.msgid.link/20260310132035.1299787-1-danishanwar@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

tcp: add tcp_release_cb_cond() helper

Majority of tcp_release_cb() calls do nothing at all.

Provide tcp_release_cb_cond() helper so that release_sock()
can avoid these calls.

Also hint the compiler that __release_sock() and wake_up()
are rarely called.

$ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-77 (-77)
Function old new delta
release_sock 258 181 -77
Total: Before=25235790, After=25235713, chg -0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260310124451.2280968-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

dt-bindings: net: qcom,ipa: document qcm2290 compatible

Document that ipa on qcm2290 uses version 4.2, the same
as sc7180.

Acked-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Wojciech Slenska <wojciech.slenska@gmail.com>
Link: https://patch.msgid.link/20260310112309.79261-2-wojciech.slenska@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'net-hinic3-pf-initialization'

Fan Gong says:

====================
net: hinic3: PF initialization

This is [2/3] part of hinic3 Ethernet driver second submission.
With this patch hinic3 becomes a complete Ethernet driver with
pf and vf.

Add cmdq detailed-response interfaces.
Add dump interfaces for cmdq, aeq, ceq and mailbox.
Add msg_send_lock for message sending concurrency.
Add PF device support and chip_present_flag to check cards.
Add rx vlan offload support.
Add PF FLR wait and timeout handling.
Add 5 ethtool ops for information of driver and link.

v1: https://lore.kernel.org/netdev/cover.1771916043.git.zhuyikai1@h-partners.com/
v2: https://lore.kernel.org/netdev/cover.1772697509.git.zhuyikai1@h-partners.com/
====================

Link: https://patch.msgid.link/cover.1773062356.git.zhuyikai1@h-partners.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>