Xiang Mei [Sat, 4 Apr 2026 22:04:12 +0000 (15:04 -0700)]
bonding: remove unused bond_is_first_slave and bond_is_last_slave macros
Since commit 2884bf72fb8f ("net: bonding: fix use-after-free in
bond_xmit_broadcast()"), bond_is_last_slave() was only used in
bond_xmit_broadcast(). After the recent fix replaced that usage with
a simple index comparison, bond_is_last_slave() has no remaining
callers. bond_is_first_slave() likewise has no callers.
Jakub Kicinski [Thu, 9 Apr 2026 01:58:08 +0000 (18:58 -0700)]
Merge tag 'nf-next-26-04-08' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Florian Westphal says:
====================
netfilter: updates for net-next
1) Fix ancient sparse warnings in nf conntrack nat modules, from
Sun Jian.
2) Fix typo in enum description, from Jelle van der Waa.
3) remove redundant refetch of netns pointer in nf_conntrack_sip.
4) add a deprecation warning for dccp match.
We can extend the deadline later if needed, but plan atm is to
remove the feature.
5) remove nf_conntrack_h323 debug code that can read out-of-bounds
with malformed messages. This code was commented out, but better
remove this.
6+7) add more netlink policy validations in netfilter.
This could theoretically cause issues when a client sends e.g.
unsupported feature flags that were previously ignored, so we
may have to relax some changes. For now, try to be stricter and
reject upfront.
8+9) minor code cleanup in nft_set_pipapo (an nftables set backend).
10) Add nftables matching support fro double-tagged vlan and pppoe
frames, from Pablo Neira Ayuso.
11) Fix up indentation of debug messages in nf_conntrack_h323 conntrack
helper, from David Laight.
12) Add a helper to iterate to next flow action and bail out if the
maximum number of actions is reached, also from Pablo.
* tag 'nf-next-26-04-08' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
netfilter: nf_tables_offload: add nft_flow_action_entry_next() and use it
netfilter: nf_conntrack_h323: Correct indentation when H323_TRACE defined
netfilter: nft_meta: add double-tagged vlan and pppoe support
netfilter: nft_set_pipapo_avx2: remove redundant loop in lookup_slow
netfilter: nft_set_pipapo: increment data in one step
netfilter: nf_tables: add netlink policy based cap on registers
netfilter: add more netlink-based policy range checks
netfilter: nf_conntrack_h323: remove unreliable debug code in decode_octstr
netfilter: add deprecation warning for dccp support
netfilter: nf_conntrack_sip: remove net variable shadowing
netfilter: nf_tables: Fix typo in enum description
netfilter: use function typedefs for __rcu NAT helper hook pointers
====================
1) Update outdated comment in xfrm_dst_check().
From kexinsun.
2) Drop support for HMAC-RIPEMD-160 from IPsec.
From Eric Biggers.
* tag 'ipsec-next-2026-04-08' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
xfrm: Drop support for HMAC-RIPEMD-160
xfrm: update outdated comment
====================
David Laight [Thu, 26 Mar 2026 20:18:19 +0000 (20:18 +0000)]
netfilter: nf_conntrack_h323: Correct indentation when H323_TRACE defined
The trace lines are indented using PRINT("%*.s", xx, " ").
Userspace will treat this as "%*.0s" and will output no characters
when 'xx' is zero, the kernel treats it as "%*s" and will output
a single ' ' - which is probably what is intended.
Change all the formats to "%*s" removing the default precision.
This gives a single space indent when level is zero.
Signed-off-by: David Laight <david.laight.linux@gmail.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Florian Westphal <fw@strlen.de>
netfilter: nft_meta: add double-tagged vlan and pppoe support
Currently:
add rule netdev x y ip saddr 1.1.1.1
does not work with neither double-tagged vlan nor pppoe packets. This is
because the network and transport header offset are not pointing to the
IP and transport protocol headers in the stack.
This patch expands NFT_META_PROTOCOL and NFT_META_L4PROTO to parse
double-tagged vlan and pppoe packets so matching network and transport
header fields becomes possible with the existing userspace generated
bytecode. Note that this parser only supports double-tagged vlan which
is composed of vlan offload + vlan header in the skb payload area for
simplicity.
NFT_META_PROTOCOL is used by bridge and netdev family as an implicit
dependency in the bytecode to match on network header fields.
Similarly, there is also NFT_META_L4PROTO, which is also used as an
implicit dependency when matching on the transport protocol header
fields.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
Florian Westphal [Tue, 17 Mar 2026 14:50:08 +0000 (15:50 +0100)]
netfilter: nft_set_pipapo: increment data in one step
Since commit e807b13cb3e3 ("nft_set_pipapo: Generalise group size for buckets")
there is no longer a need to increment the data pointer in two steps.
Switch to a single invocation of NFT_PIPAPO_GROUPS_PADDED_SIZE() helper,
like the avx2 implementation.
Florian Westphal [Fri, 13 Mar 2026 12:12:30 +0000 (13:12 +0100)]
netfilter: nf_tables: add netlink policy based cap on registers
Should have no effect in practice; all of these use the
nft_parse_register_load/store apis which is mandatory anyway due
to the need to further validate the register load/store, e.g.
that the size argument doesn't result in out-of-bounds load/store.
OTOH this is a simple method to reject obviously wrong input
at earlier stage.
netfilter: add more netlink-based policy range checks
These spots either already check the attribute range manually
before use or the consuming functions tolerate unexpected values.
Nevertheless, add more range checks via netlink policy so we gain
more users and avoid possible re-use in other places that might
not have the required manual checks. This also improves error
reporting: netlink core can generate extack errors.
Sun Jian [Tue, 3 Mar 2026 10:15:25 +0000 (18:15 +0800)]
netfilter: use function typedefs for __rcu NAT helper hook pointers
After commit 07919126ecfc ("netfilter: annotate NAT helper hook pointers
with __rcu"), sparse can warn about type/address-space mismatches when
RCU-dereferencing NAT helper hook function pointers.
The hooks are __rcu-annotated and accessed via rcu_dereference(), but the
combination of complex function pointer declarators and the WRITE_ONCE()
machinery used by RCU_INIT_POINTER()/rcu_assign_pointer() can confuse
sparse and trigger false positives.
Introduce typedefs for the NAT helper function types, so __rcu applies to
a simple "fn_t __rcu *" pointer form. Also replace local typeof(hook)
variables with "fn_t *" to avoid propagating __rcu address space into
temporaries.
No functional change intended.
Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202603022359.3dGE9fwI-lkp@intel.com/ Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de>
Jakub Kicinski [Sat, 4 Apr 2026 23:01:03 +0000 (16:01 -0700)]
selftests: drv-net: adjust to socat changes
socat v1.8.1.0 now defaults to shut-null, it sends an extra
0-length UDP packet when sender disconnects. This breaks
our tests which expect the exact packet sequence.
Add shut-none which was the old default where necessary.
net: hsr: emit notification for PRP slave2 changed hw addr on port deletion
On PRP protocol, when deleting the port the MAC address change
notification was missing. In addition to that, make sure to only perform
the MAC address change on slave2 deletion and PRP protocol as the
operation isn't necessary for HSR nor slave1.
Note that the eth_hw_addr_set() is correct on PRP context as the slaves
are either in promiscuous mode or forward offload enabled.
Reported-by: Luka Gejak <luka.gejak@linux.dev> Closes: https://lore.kernel.org/netdev/DHFCZEM93FTT.1RWFBIE32K7OT@linux.dev/ Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Reviewed-by: Felix Maurer <fmaurer@redhat.com> Link: https://patch.msgid.link/20260403123928.4249-2-fmancera@suse.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
====================
net/mlx5e: XDP, Add support for multi-packet per page
This series removes the limitation of having one packet per page in XDP
mode. This has the following implications:
- XDP in Striding RQ mode can now be used on 64K page systems.
- XDP in Legacy RQ mode was using a single packet per page which on 64K
page systems is quite inefficient. The improvement can be observed
with an XDP_DROP test when running in Legacy RQ mode on a ARM
Neoverse-N1 system with a 64K page size:
+-----------------------------------------------+
| MTU | baseline | this change | improvement |
|------+------------+-------------+-------------|
| 1500 | 15.55 Mpps | 18.99 Mpps | 22.0 % |
| 9000 | 15.53 Mpps | 18.24 Mpps | 17.5 % |
+-----------------------------------------------+
After lifting this limitation, the series switches to using fragments
for the side page in non-linear mode. This small improvement is at most
visible for XDP_DROP tests with small 64B packets and a large enough MTU
for Striding RQ to be in non-linear mode:
+----------------------------------------------------------------------+
| System | MTU | baseline | this change | improvement |
|----------------------+------+------------+-------------+-------------|
| 4K page x86_64 [1] | 9000 | 26.30 Mpps | 30.45 Mpps | 15.80 % |
| 64K page aarch64 [2] | 9000 | 15.27 Mpps | 20.10 Mpps | 31.62 % |
+----------------------------------------------------------------------+
This series does not cover the xsk (AF_XDP) paths for 64K page systems.
net/mlx5e: XDP, Use page fragments for linear data in multibuf-mode
Currently in XDP multi-buffer mode for striding rq a whole page is
allocated for the linear part of the XDP buffer. This is wasteful,
especially on systems with larger page sizes.
This change splits the page into fixed sized fragments. The page is
replenished when the maximum number of allowed fragments is reached.
When a fragment is not used, it will be simply recycled on next packet.
This is great for XDP_DROP as the fragment can be recycled for the next
packet. In the most extreme case (XDP_DROP everything), there will be 0
fragments used => only one linear page allocation for the lifetime of
the XDP program.
The previous page_pool size increase was too conservative (doubling the
size) and now there are much fewer allocations (1/8 for a 4K page). So
drop the page_pool size extension altogether when the linear side page
is used.
This small improvement is at most visible for XDP_DROP tests with small
64B packets and a large enough MTU for Striding RQ to be in non-linear
mode:
+----------------------------------------------------------------------+
| System | MTU | baseline | this change | improvement |
|----------------------+------+------------+-------------+-------------|
| 4K page x86_64 [1] | 9000 | 26.30 Mpps | 30.45 Mpps | 15.80 % |
| 64K page aarch64 [2] | 9000 | 15.27 Mpps | 20.10 Mpps | 31.62 % |
+----------------------------------------------------------------------+
Currently in striding rq there is one mlx5e_frag_page member per WQE for
the linear page. This linear page is used only in XDP multi-buffer mode.
This is wasteful because only one linear page is needed per rq: the page
gets refreshed on every packet, regardless of WQE. Furthermore, it is
not needed in other modes (non-XDP, XDP single-buffer).
This change moves the linear page into its own structure (struct
mlx5_mpw_linear_info) and allocates it only when necessary.
A special structure is created because an upcoming patch will extend
this structure to support fragmentation of the linear page.
Currently XDP mode always uses PAGE_SIZE strides. This limitation
existed because page fragment counting was not implemented when XDP was
added. Furthermore, due to this limitation there were other issues as
well on system with larger pages (e.g. 64K):
- XDP for Striding RQ was effectively disabled on such systems.
- Legacy RQ allows the configuration but uses a fixed scheme of one XDP
buffer per page which is inefficient.
As fragment counting was added during the driver conversion to
page_pool and the support for XDP multi-buffer, it is now possible
to remove this stride size limitation. This patch does just that.
Now it is possible to use XDP on systems with higher page sizes (e.g.
64K):
- For Striding RQ, loading the program is no longer blocked.
Although a 64K page can fit any packet, MTUs that result in
stride > 8K will still make the RQ in non-linear mode. That's
because the HW doesn't support a higher than 8K stride.
- For Legacy RQ, the stride size was PAGE_SIZE which was very
inefficient. Now the stride size will be calculated relative to MTU.
Legacy RQ will always be in linear mode for larger system pages.
This can be observed with an XDP_DROP test [1] when running
in Legacy RQ mode on a ARM Neoverse-N1 system with a 64K
page size:
+-----------------------------------------------+
| MTU | baseline | this change | improvement |
|------+------------+-------------+-------------|
| 1500 | 15.55 Mpps | 18.99 Mpps | 22.0 % |
| 9000 | 15.53 Mpps | 18.24 Mpps | 17.5 % |
+-----------------------------------------------+
There are performance benefits for Striding RQ mode as well:
- Striding RQ non-linear mode now uses 256B strides, just like
non-XDP mode.
- Striding RQ linear mode can now fit a number of XDP buffers per page
that is relative to the MTU size. That means that on 4K page systems
and a small enough MTU, 2 XDP buffers can fit in one page.
The above benefits for Striding RQ can be observed with an
XDP_DROP test [1] when running on a 4K page x86_64 system
(Intel Xeon Platinum 8580):
+-----------------------------------------------+
| MTU | baseline | this change | improvement |
|------+------------+-------------+-------------|
| 1000 | 28.36 Mpps | 33.98 Mpps | 19.82 % |
| 9000 | 20.76 Mpps | 26.30 Mpps | 26.70 % |
+-----------------------------------------------+
[1] Test description:
- xdp-bench with XDP_DROP
- RX: single queue
- TX: sends 64B packets to saturate CPU on RX side
net/mlx5e: XDP, Improve dma address calculation of linear part for XDP_TX
When calculating the dma address of the linear part of an XDP frame, the
formula assumes that there is a single XDP buffer per page. Extend the
formula to allow multiple XDP buffers per page by calculating the data
offset in the page.
This is a preparation for the upcoming removal of a single XDP buffer
per page limitation when the formula will no longer be correct.
net/mlx5e: XSK, Increase size for chunk_size param
When 64K pages are used, chunk_size can take the 64K value
which doesn't fit in u16. This results in overflows that
are detected in mlx5e_mpwrq_log_wqe_sz().
Eric Biggers [Sun, 5 Apr 2026 01:15:13 +0000 (18:15 -0700)]
xfrm: Drop support for HMAC-RIPEMD-160
Drop support for HMAC-RIPEMD-160 from IPsec to reduce the UAPI surface
and simplify future maintenance. It's almost certainly unused.
RIPEMD-160 received some attention in the early 2000s when SHA-* weren't
quite as well established. But it never received much adoption outside
of certain niches such as Bitcoin.
It's actually unclear that Linux + IPsec + HMAC-RIPEMD-160 has *ever*
been used, even historically. When support for it was added in 2003, it
was done so in a "cleanup" commit without any justification [1]. It
didn't actually work until someone happened to fix it 5 years later [2].
That person didn't use or test it either [3]. Finally, also note that
"hmac(rmd160)" is by far the slowest of the algorithms in aalg_list[].
Of course, today IPsec is usually used with an AEAD, such as AES-GCM.
But even for IPsec users still using a dedicated auth algorithm, they
almost certainly aren't using, and shouldn't use, HMAC-RIPEMD-160.
Thus, let's just drop support for it. Note: no kconfig update is
needed, since CRYPTO_RMD160 wasn't actually being selected anyway.
References:
[1] linux-history commit d462985fc1941a47
("[IPSEC]: Clean up key manager algorithm handling.")
[2] linux commit a13366c632132bb9
("xfrm: xfrm_algo: correct usage of RIPEMD-160")
[3] https://lore.kernel.org/all/1212340578-15574-1-git-send-email-rueegsegger@swiss-it.ch
Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
selftests: mptcp: join: recreate signal endp with same ID
In this "delete re-add signal" MPTCP Join subtest, the endpoint linked
to the initial subflow is removed, but readded once with different ID.
It appears that there was an issue when reusing the same ID, recently
fixed by commit d191101dee25 ("mptcp: pm: in-kernel: always set ID as
avail when rm endp"). The test then now reuses the same ID the first
time, but continue to use another one (88) the second time.
Factor out a new helper tcp_recv_should_stop() from tcp_recvmsg_locked()
and tcp_splice_read() to check whether to stop receiving. And use this
helper in mptcp_recvmsg() and mptcp_splice_read() to reduce redundant code.
Suggested-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260403-net-next-mptcp-msg_eor-misc-v1-3-b0b33bea3fed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gang Yan [Fri, 3 Apr 2026 11:29:28 +0000 (13:29 +0200)]
mptcp: preserve MSG_EOR semantics in sendmsg path
Extend MPTCP's sendmsg handling to recognize and honor the MSG_EOR flag,
which marks the end of a record for application-level message boundaries.
Data fragments tagged with MSG_EOR are explicitly marked in the
mptcp_data_frag structure and skb context to prevent unintended
coalescing with subsequent data chunks. This ensures the intent of
applications using MSG_EOR is preserved across MPTCP subflows,
maintaining consistent message segmentation behavior.
Gang Yan [Fri, 3 Apr 2026 11:29:27 +0000 (13:29 +0200)]
mptcp: reduce 'overhead' from u16 to u8
The 'overhead' in struct mptcp_data_frag can safely use u8, as it
represents 'alignment + sizeof(mptcp_data_frag)'. With a maximum
alignment of 7('ALIGN(1, sizeof(long)) - 1'), the overhead is at most
47, well below U8_MAX and validated with BUILD_BUG_ON().
This patch also adds a field named 'unused' for further extensions.
dpaa2: avoid linking objects into multiple modules
Each object file contains information about which module it gets linked
into, so linking the same file into multiple modules now causes a warning:
scripts/Makefile.build:254: drivers/net/ethernet/freescale/dpaa2/Makefile: dpaa2-mac.o is added to multiple modules: fsl-dpaa2-eth fsl-dpaa2-switch
scripts/Makefile.build:254: drivers/net/ethernet/freescale/dpaa2/Makefile: dpmac.o is added to multiple modules: fsl-dpaa2-eth fsl-dpaa2-switch
Change the way that dpaa2 is built by moving the two common files into a
separate module with exported symbols instead.
net: ethernet: ti-cpsw: fix linking built-in code to modules
There are six variants of the cpsw driver, sharing various parts of
the code: davinci-emac, cpsw, cpsw-switchdev, netcp, netcp_ethss and
am65-cpsw-nuss.
I noticed that this means some files can be linked into more than
one loadable module, or even part of vmlinux but also linked into
a loadable module, both of which mess up assumptions of the build
system, and causes warnings:
scripts/Makefile.build:279: cpsw_ale.o is added to multiple modules: ti-am65-cpsw-nuss ti_cpsw ti_cpsw_new
scripts/Makefile.build:279: cpsw_priv.o is added to multiple modules: ti_cpsw ti_cpsw_new
scripts/Makefile.build:279: cpsw_sl.o is added to multiple modules: ti-am65-cpsw-nuss ti_cpsw ti_cpsw_new
scripts/Makefile.build:279: cpsw_ethtool.o is added to multiple modules: ti_cpsw ti_cpsw_new
scripts/Makefile.build:279: davinci_cpdma.o is added to multiple modules: ti_cpsw ti_cpsw_new ti_davinci_emac
Change this back to having separate modules for each portion that
can be linked standalone, exporting symbols as needed:
- ti-cpsw-common.ko now contains both cpsw-common.o and
davinci_cpdma.o as they are always used together
- ti-cpsw-priv.ko contains cpsw_priv.o, cpsw_sl.o and cpsw_ethtool.o,
which are the core of the cpsw and cpsw-new drivers.
- ti-cpsw-sl.ko contains the cpsw-sl.o object and is used on
ti-am65-cpsw-nuss.ko in addition to the two other cpsw variants.
- ti-cpsw-ale.o is the one standalone module that is used by all
except davinci_emac.
Each of these will be built-in if any of its users are built-in, otherwise
it's a loadable module if there is at least one module using it. I did
not bring back the separate Kconfig symbols for this, but just handle
it using Makefile logic.
Note: ideally this is something that Kbuild complains about, but usually
we just notice when something using THIS_MODULE misbehaves in a way that
a user notices.
net: ethernet: ti-cpsw:: rename soft_reset() function
While looking at the glob symbols shared between the cpsw drivers,
I noticed that soft_reset() is the only one that is missing a proper
namespace prefix, and will pollute the kernel namespace, so rename
it to be consistent with the other symbols.
Jakub Kicinski [Fri, 3 Apr 2026 22:05:01 +0000 (15:05 -0700)]
eth: remove the driver for acenic / tigon1&2
The entire git history for this driver looks like tree-wide
and automated cleanups. There's even more coming now with
AI, so let's try to delete it instead.
Jason Xing [Thu, 2 Apr 2026 03:41:14 +0000 (11:41 +0800)]
net: advance skb_defer_disable_key check in napi_consume_skb
When net.core.skb_defer_max is adjusted to zero, napi_consume_skb()
shouldn't go into that deeper in skb_attempt_defer_free() because it adds
an additional pair of local_bh_enable/disable() which is evidently not
needed. Advancing the check of the static key saves more cycles and
benefits non defer case.
====================
net: dsa: mxl862xx: add support for bridge offloading
As a next step to complete the mxl862xx DSA driver, add support for
offloading forwarding between bridged ports to the switch hardware.
This works pretty much without any big surprises, apart from two
subtleties:
* per-port control over flooding behavior has to be implemented by
(ab)using a 0-rate QoS meters as stopper in lack of any better
option.
* STP state transition unconditionally enables learning on a port
even if it was previously explicitely disabled (a firmware bug)
Note that as the driver is still lacking all VLAN features (which
are going to be added next), at this point some of the
bridge_vlan_aware.sh tests are failing after applying this series.
This is expected and cannot be avoided without implementing
port_vlan_filtering + port_vlan_add/del. And adding both bridge and
VLAN offloading at the same time would be too much for anyone to
review, so VLAN support is going to be submitted in a follow-up
series immediately after this series has been accepted.
All other relevant selftests (including bridge_vlan_unaware.sh) are
still passing.
Inspired by the comments received from Paolo Abeni as reply to v5
the driver now no longer caches bridge port membership in the
driver, but instead imports an existing helper from yt921x.c to dsa.h
in order to allow the driver to easily iterate over bridge members.
The mapping between DSA bridge num and firmware bridge ID is done
using a simple fixed-size array in mxl862xx_priv.
====================
Daniel Golle [Wed, 1 Apr 2026 13:35:01 +0000 (14:35 +0100)]
net: dsa: mxl862xx: implement bridge offloading
Implement joining and leaving bridges as well as add, delete and dump
operations on isolated FDBs, port MDB membership management, and
setting a port's STP state.
The switch supports a maximum of 63 bridges, however, up to 12 may
be used as "single-port bridges" to isolate standalone ports.
Allowing up to 48 bridges to be offloaded seems more than enough on
that hardware, hence that is set as max_num_bridges.
A total of 128 bridge ports are supported in the bridge portmap, and
virtual bridge ports have to be used eg. for link-aggregation, hence
potentially exceeding the number of hardware ports.
The firmware-assigned bridge identifier (FID) for each offloaded bridge
is stored in an array used to map DSA bridge num to firmware bridge ID,
avoiding the need for a driver-private bridge tracking structure.
Bridge member portmaps are rebuilt on join/leave using
dsa_switch_for_each_bridge_member().
As there are now more users of the BRIDGEPORT_CONFIG_SET API and the
state of each port is cached locally, introduce a helper function
mxl862xx_set_bridge_port(struct dsa_switch *ds, int port) which
applies the cached per-port state to hardware. For standalone user
ports (dp->bridge == NULL), it additionally resets the port to
single-port bridge state: CPU-only portmap, learning and flooding
disabled. The CPU port path sets its state explicitly before calling
this helper and is therefore not affected by the reset.
Note that MASK_VLAN_BASED_MAC_LEARNING is intentionally absent from
the firmware write mask. After mxl862xx_reset(), the firmware
initialises all VLAN-based MAC learning fields to 0 (disabled), so
SVL is the active mode by default without having to set it explicitly.
Note that there is no convenient way to control flooding on per-port
level, so the driver is using a 0-rate QoS meter setup as a stopper in
lack of any better option. In order to be perfect the firmware-enforced
minimum bucket size is bypassed by directly writing 0s to the relevant
registers -- without that at least one 64-byte packet could still
pass before the meter would change from 'yellow' into 'red' state.
Daniel Golle [Wed, 1 Apr 2026 13:34:52 +0000 (14:34 +0100)]
dsa: tag_mxl862xx: set dsa_default_offload_fwd_mark()
The MxL862xx offloads bridge forwarding in hardware, so set
dsa_default_offload_fwd_mark() to avoid duplicate forwarding of
packets of (eg. flooded) frames arriving at the CPU port.
Link-local frames are directly trapped to the CPU port only, so don't
set dsa_default_offload_fwd_mark() on those.
Daniel Golle [Wed, 1 Apr 2026 13:34:42 +0000 (14:34 +0100)]
net: dsa: add bridge member iteration macro
Drivers that offload bridges need to iterate over the ports that are
members of a given bridge, for example to rebuild per-port forwarding
bitmaps when membership changes. Currently drivers typically open-code
this by combining dsa_switch_for_each_user_port() with a
dsa_port_offloads_bridge_dev() check, or cache bridge membership
within the driver.
Add dsa_switch_for_each_bridge_member() macro to express this pattern
directly, and use it for the existing dsa_bridge_ports() inline
helper.
vsock: avoid timeout for non-blocking accept() with empty backlog
A common pattern in epoll network servers is to eagerly accept all
pending connections from the non-blocking listening socket after
epoll_wait indicates the socket is ready by calling accept in a loop
until EAGAIN is returned indicating that the backlog is empty.
Scheduling a timeout for a non-blocking accept with an empty backlog
meant AF_VSOCK sockets used by epoll network servers incurred hundreds
of microseconds of additional latency per accept loop compared to
AF_INET or AF_UNIX sockets.
Signed-off-by: Laurence Rowe <laurencerowe@gmail.com> Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20260402204918.130395-1-laurencerowe@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Zahka [Fri, 3 Apr 2026 10:26:31 +0000 (03:26 -0700)]
psp: add missing device stats to get-stats reply attributes
Commit f05d26198cf2 ("psp: add stats from psp spec to driver facing
api") added device statistics (rx-packets, rx-bytes, rx-auth-fail,
rx-error, rx-bad, tx-packets, tx-bytes, tx-error) to the stats
attribute-set but did not add them to the get-stats operation reply
attributes. The kernel reports these attributes in the reply, so
list them in the spec to match.
Jeremy Kerr [Fri, 3 Apr 2026 02:24:51 +0000 (10:24 +0800)]
net: mctp: defer creation of dst after source-address check
Sashiko reports:
> mctp_dst_from_route() increments the device reference count by calling
> mctp_dev_hold(). When a valid route is found and dst is NULL, the
> structure copy is bypassed and rc is set to 0.
Instead of optimistically creating a dst from the final route (then
releasing it if the saddr is invalid), perform the saddr check first.
This means we don't have an unuecessary hold/release on the dev, which
could leak if the dst pointer is NULL. No caller passes a NULL dst at
present though (so the leak is not possible), but this is an intended
use of mctp_dst_from_route().
Jakub Kicinski [Thu, 2 Apr 2026 21:54:44 +0000 (14:54 -0700)]
selftests: net: py: color the basics in the output
Sometimes it's hard to spot the ok / not ok lines in the output.
This is especially true for the GRO tests which retries a lot
so there's a wall of non-fatal output printed.
Try to color the crucial lines green / red / yellow when running
in a terminal.
====================
dpll: add frequency monitoring feature
This series adds support for monitoring the measured input frequency
of DPLL input pins via the DPLL netlink interface.
Some DPLL devices can measure the actual frequency being received on
input pins. The approach mirrors the existing phase-offset-monitor
feature: a device-level attribute (DPLL_A_FREQUENCY_MONITOR) enables
or disables monitoring, and a per-pin attribute
(DPLL_A_PIN_MEASURED_FREQUENCY) exposes the measured frequency in
millihertz (mHz) when monitoring is enabled.
Patch 1 adds the new attributes to the DPLL netlink spec (dpll.yaml),
the DPLL_PIN_MEASURED_FREQUENCY_DIVIDER constant, regenerates the
auto-generated UAPI header and netlink policy, and updates
Documentation/driver-api/dpll.rst.
Patch 2 adds the callback operations (freq_monitor_get/set for
devices, measured_freq_get for pins) and the corresponding netlink
GET/SET handlers in the DPLL core. The core only invokes
measured_freq_get when the frequency monitor is enabled on the parent
device. The freq_monitor_get callback is required when measured_freq_get
is provided.
Patch 3 implements the feature in the ZL3073x driver by extracting
a common measurement latch helper from the existing FFO update path,
adding a frequency measurement function, and wiring up the new
callbacks.
====================
Ivan Vecera [Thu, 2 Apr 2026 18:40:57 +0000 (20:40 +0200)]
dpll: zl3073x: implement frequency monitoring
Extract common measurement latch logic from zl3073x_ref_ffo_update()
into a new zl3073x_ref_freq_meas_latch() helper and add
zl3073x_ref_freq_meas_update() that uses it to latch and read absolute
input reference frequencies in Hz.
Add meas_freq field to struct zl3073x_ref and the corresponding
zl3073x_ref_meas_freq_get() accessor. The measured frequencies are
updated periodically alongside the existing FFO measurements.
Add freq_monitor boolean to struct zl3073x_dpll and implement the
freq_monitor_set/get device callbacks to enable/disable frequency
monitoring via the DPLL netlink interface.
Implement measured_freq_get pin callback for input pins that returns the
measured input frequency in mHz.
Ivan Vecera [Thu, 2 Apr 2026 18:40:56 +0000 (20:40 +0200)]
dpll: add frequency monitoring callback ops
Add new callback operations for a dpll device:
- freq_monitor_get(..) - to obtain current state of frequency monitor
feature from dpll device,
- freq_monitor_set(..) - to allow feature configuration.
Add new callback operation for a dpll pin:
- measured_freq_get(..) - to obtain the measured frequency in mHz.
Obtain the feature state value using the get callback and provide it to
the user if the device driver implements callbacks. The measured_freq_get
pin callback is only invoked when the frequency monitor is enabled.
The freq_monitor_get device callback is required when measured_freq_get
is provided by the driver.
Ivan Vecera [Thu, 2 Apr 2026 18:40:55 +0000 (20:40 +0200)]
dpll: add frequency monitoring to netlink spec
Add DPLL_A_FREQUENCY_MONITOR device attribute to allow control over
the frequency monitor feature. The attribute uses the existing
dpll_feature_state enum (enable/disable) and is present in both
device-get reply and device-set request.
Add DPLL_A_PIN_MEASURED_FREQUENCY pin attribute to expose the measured
input frequency in millihertz (mHz). The attribute is present in the
pin-get reply. Add DPLL_PIN_MEASURED_FREQUENCY_DIVIDER constant to
allow userspace to extract integer and fractional parts.
net: ethernet: ravb: Suspend and resume the transmission flow
The current driver does not follow the latest datasheet and does not
suspend the flow when stopping DMA and resume it when starting. Update
the driver to do so.
====================
net: macb: Remove dedicated IRQ handler for WoL
During debugging of a suspend/resume issue, I observed that the macb driver
employs a dedicated IRQ handler for Wake-on-LAN (WoL) support. To my knowledge,
no other Ethernet driver adopts this approach. This implementation unnecessarily
complicates the suspend/resume process without providing any clear benefit.
Instead, we can easily modify the existing IRQ handler to manage WoL events,
avoiding any overhead in the TX/RX hot path.
The net throughput shows no significant difference following these changes.
The following data(net throughput and execution time of macb_interrupt) were
collected from my AMD Zynqmp board using the commands:
taskset -c 1,2,3 iperf3 -c 192.168.3.4 -t 60 -Z -P 3 -R
cat /sys/kernel/debug/tracing/trace_stat/function0
Kevin Hao [Thu, 2 Apr 2026 13:41:25 +0000 (21:41 +0800)]
net: macb: Remove dedicated IRQ handler for WoL
In the current implementation, the suspend/resume path frees the
existing IRQ handler and sets up a dedicated WoL IRQ handler, then
restores the original handler upon resume. This approach is not used
by any other Ethernet driver and unnecessarily complicates the
suspend/resume process. After adjusting the IRQ handler in the previous
patches, we can now handle WoL interrupts without introducing any
overhead in the TX/RX hot path. Therefore, the dedicated WoL IRQ
handler is removed.
I have verified WoL functionality on my AMD ZynqMP board using the
following steps:
root@amd-zynqmp:~# ifconfig end0 192.168.3.3
root@amd-zynqmp:~# ethtool -s end0 wol a
root@amd-zynqmp:~# echo mem >/sys/power/state
Kevin Hao [Thu, 2 Apr 2026 13:41:24 +0000 (21:41 +0800)]
net: macb: Factor out the handling of non-hot IRQ events into a separate function
In the current code, the IRQ handler checks each IRQ event sequentially.
Since most IRQ events are related to TX/RX operations, while other
events occur infrequently, this approach introduces unnecessary overhead
in the hot path for TX/RX processing. This patch reduces such overhead
by extracting the handling of all non-TX/RX events into a new function
and consolidating these events under a new flag. As a result, only a
single check is required to determine whether any non-TX/RX events have
occurred. If such events exist, the handler jumps to the new function.
This optimization reduces four conditional checks to one and prevents
the instruction cache from being polluted with rarely used code in the
hot path.
Kevin Hao [Thu, 2 Apr 2026 13:41:23 +0000 (21:41 +0800)]
net: macb: Introduce macb_queue_isr_clear() helper function
The current implementation includes several occurrences of the
following pattern:
if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
queue_writel(queue, ISR, value);
Introduces a helper function to consolidate these repeated code
segments. No functional changes are made.
Kevin Hao [Thu, 2 Apr 2026 13:41:22 +0000 (21:41 +0800)]
net: macb: Replace open-coded implementation with napi_schedule()
The driver currently duplicates the logic of napi_schedule() primarily
to include additional debug information. However, these debug details
are not essential for a specific driver and can be effectively obtained
through existing tracepoints in the networking core, such as
/sys/kernel/tracing/events/napi/napi_poll. Therefore, this patch
replaces the open-coded implementation with napi_schedule() to
simplify the driver's code.
Qingfang Deng [Thu, 2 Apr 2026 05:00:50 +0000 (13:00 +0800)]
ppp: update Kconfig help message
Both links of the PPPoE section are no longer valid, and the CVS version
is no longer relevant.
- Replace the TLDP URL with the pppd project homepage.
- Update pppd version requirement for PPPoE.
- Update RP-PPPoE project homepage, and clarify that it's only needed
for server mode.
====================
selftests: drv-net: gro: more test cases
Add a few more test cases for GRO.
First 4 patches are unchanged from v1.
Patches 5 and 6 are new. Willem pointed out that the defines are
duplicated and all these imprecise defines have been annoying me
for a while so I decided to clean them up.
With the defines cleaned up and now more precise patch 7 (was 5)
no longer has to play any games with the MTU for ip6ip6.
The last patch now sends 3 segments as requested.
====================
Jakub Kicinski [Thu, 2 Apr 2026 20:59:59 +0000 (13:59 -0700)]
selftests: drv-net: gro: test ip6ip6
We explicitly test ipip encap. Let's add ip6ip6, too. Having
just ipip seems like favoring IPv4 which we should not do :)
Testing all combinations is left for future work, not sure
it's actually worth it.
Jakub Kicinski [Thu, 2 Apr 2026 20:59:58 +0000 (13:59 -0700)]
selftests: drv-net: gro: make large packet math more precise
When constructing the packets for large_* test cases we use
a static value for packet count and MSS. It works okay for
ipv4 vs ipv6 but the gap between ipv4 and ip6ip6 is going to
be quite significant.
Make the defines calculate the worst case values, those
are only used for sizing stack arrays. Create helpers for
calculating precise values based on the exact test case.
Jakub Kicinski [Thu, 2 Apr 2026 20:59:57 +0000 (13:59 -0700)]
selftests: drv-net: gro: remove TOTAL_HDR_LEN
Willem points out TOTAL_HDR_LEN is identical to MAX_HDR_LEN.
This seems to have been the case ever since the test was added.
Replace the uses of TOTAL_HDR_LEN with MAX_HDR_LEN, MAX seems
more common for what this value is.
Jakub Kicinski [Thu, 2 Apr 2026 20:59:56 +0000 (13:59 -0700)]
selftests: drv-net: gro: prepare for ip6ip6 support
Try to use already calculated offsets and not depend on the ipip
flag as much. This patch should not change any functionality,
it's just a cleanup to make ip6ip6 support easier.
Jakub Kicinski [Thu, 2 Apr 2026 20:59:55 +0000 (13:59 -0700)]
selftests: drv-net: gro: always wait for FIN in the capacity test
The new capacity/order test exits as soon as it sees the expected
packet sequence. This may allow the "flushing" FIN packet to spill
over to the next test. Let's always wait for the FIN before exiting.
Jakub Kicinski [Thu, 2 Apr 2026 20:59:54 +0000 (13:59 -0700)]
selftests: drv-net: gro: add 1 byte payload test
Small IPv4 packets get padded to 60B, this may break / confuse
some buggy implementations. Add a test to coalesce a 1B payload.
Keep this separate from the lrg_sml test because I suspect some
implementations may not handle this case (treat padded frames
as ineligible for coalescing).
Jakub Kicinski [Thu, 2 Apr 2026 20:59:53 +0000 (13:59 -0700)]
selftests: drv-net: gro: add data burst test case
Add a test trying to induce a GRO context timeout followed
by another sequence of packets for the same flow. The second
burst arrives 100ms after the first one so any implementation
(SW or HW) must time out waiting at that point. We expect both
bursts to be aggregated successfully but separately.
The clocks for qcom-ethqos return a rate of zero as firmware manages
their rate. According to hardware documentation, the clock which is
fed to the slave AHB interface can range between 50 to 100MHz for
non-RGMII and 30 to 75MHz for boards with a RGMII interfaces.
Currently, stmmac uses an undefined divisor value. Instead, use
STMMAC_CSR_60_100M which will mean we meet IEEE 802.3 specification
since this will generate:
Unfortunately, this divisor makes the upper bound of both ranges exeed
the IEEE 802.3 specification, and thus we can not use it without knowing
for certain what the current CSR clock rate actually is.
So, STMMAC_CSR_60_100M is the best fit for all boards based on the
information provided thus far.
stmmac: cleanup dead dependencies on STMMAC_PLATFORM and STMMAC_ETH in Kconfig
There are already 'if STMMAC_ETH' and 'STMMAC_PLATFORM'
conditions wrapping these config options, making the
'depends on' statements duplicate dependencies (dead code).
I propose leaving the outer 'if STMMAC_PLATFORM...endif' and
'if STMMAC_ETH...endif' conditions, and removing the
individual 'depends on' statements.
This dead code was found by kconfirm, a static analysis tool for Kconfig.
This is the first of four series adding SR-IOV V2 support to the enic
driver for Cisco VIC 14xx/15xx adapters.
The existing V1 SR-IOV implementation has VFs that interact directly
with the VIC firmware, leaving the PF driver with no visibility or
control over VF behavior. V2 introduces a PF-mediated model where VFs
communicate with the PF through a mailbox over a dedicated admin
channel. This brings enic in line with the standard Linux SR-IOV
model, enabling full PF management of VFs via ip link (MAC, VLAN,
link state, spoofchk, trust, and per-VF statistics).
This preparatory series adds detection and resource helper code with
no functional change to existing driver behavior:
- Extend BAR resource discovery for admin channel resources
- Register the V2 VF PCI device ID
- Detect VF type (V1/V2/usNIC) from SR-IOV PCI capability
- Make enic_dev_enable/disable ref-counted for shared use by data
path and admin channel
- Add type-aware resource allocation for admin WQ/RQ/CQ/INTR
- Detect presence of admin channel resources at probe time
Tested on VIC 14xx and 15xx series adapters with V2 VFs under KVM
(sriov_numvfs, VF passthrough, ip link VF configuration, VF traffic).
Based in part on initial work by Christian Benvenuti.
====================
Check for the presence of admin channel BAR resources
(RES_TYPE_ADMIN_WQ, ADMIN_RQ, ADMIN_CQ, SRIOV_INTR) during resource
discovery. Set has_admin_channel when all four are available.
Use ARRAY_SIZE(enic->admin_cq) for the admin CQ count check since the
driver allocates two admin CQs (one for WQ completions, one for RQ
completions) and both must be backed by hardware resources.
Add admin WQ, RQ, CQ and INTR fields to struct enic for use by the
upcoming admin channel open/close paths.
enic: add type-aware alloc for WQ, RQ, CQ and INTR resources
The existing vnic_wq_alloc(), vnic_rq_alloc(), vnic_cq_alloc() and
vnic_intr_alloc() hardcode data-path resource types (RES_TYPE_WQ,
RES_TYPE_RQ, RES_TYPE_CQ, RES_TYPE_INTR_CTRL). The upcoming admin
channel uses different BAR resource types (RES_TYPE_ADMIN_WQ/RQ/CQ,
RES_TYPE_SRIOV_INTR) for its queues.
Add _with_type() variants that accept an explicit resource type
parameter. Refactor the original functions as thin wrappers that
pass the default data-path type. No functional change.
Both the data path (ndo_open/ndo_stop) and the upcoming admin channel
need to enable and disable the vNIC device independently. Without
reference counting, closing the admin channel while the netdev is up
would inadvertently disable the entire device.
Add an enable_count to struct enic, protected by the existing
devcmd_lock. enic_dev_enable() issues CMD_ENABLE_WAIT only on the
first caller (0 -> 1 transition), and enic_dev_disable() issues
CMD_DISABLE only when the last caller releases (1 -> 0 transition).
Also check the return value of enic_dev_enable() in enic_open() and
fail the open if the firmware enable command fails. Without this check,
a failed enable leaves enable_count at zero while the interface appears
up, which can cause a later admin channel enable/disable cycle to
incorrectly disable the hardware under the active data path.
Read the VF device ID from the SR-IOV PCI capability at probe time to
determine whether the PF is configured for V1, USNIC, or V2 virtual
functions. Store the result in enic->vf_type for use by subsequent
SR-IOV operations.
The VF type is a firmware-configured property (set via UCSM, CIMC,
Intersight etc) that is immutable from the driver's perspective. Only
PFs are probed for this capability; VFs and dynamic vnics skip
detection.
Register the V2 VF PCI device ID (0x02b7) so the driver binds to V2
virtual functions created via sriov_configure. Update enic_is_sriov_vf()
to recognize V2 VFs alongside the existing V1 type.
enic: extend resource discovery for SR-IOV admin channel
VIC firmware exposes admin channel resources (WQ, RQ, CQ) for PF-VF
communication when SR-IOV is active. Add the corresponding resource
type definitions and teach the discovery and access functions to
handle them.
====================
net: phy: microchip: add downshift support for LAN88xx
Add standard ETHTOOL_PHY_DOWNSHIFT tunable support for the Microchip
LAN88xx PHY, following the same pattern used by Marvell and other PHY
drivers.
Ethernet cables with faulty or missing pairs (specifically C and D)
can successfully auto-negotiate 1000BASE-T but fail to establish a
stable link. The LAN88xx PHY supports automatic downshift to
100BASE-TX after a configurable number of failed attempts (2-5).
Patch 1 adds the get/set tunable implementation.
Patch 2 enables downshift by default with a count of 2. The setting is
stored in the driver's private data so that user changes via ethtool are
preserved across suspend/resume cycles.
Based on an earlier downstream implementation by Phil Elwell.
Tested on Raspberry Pi 3B+ (LAN7515/LAN88xx).
====================
net: phy: microchip: enable downshift by default on LAN88xx
Enable auto-downshift from 1000BASE-T to 100BASE-TX after 2 failed
auto-negotiation attempts by default. This ensures that links with
faulty or missing cable pairs (C and D) fall back to 100Mbps without
requiring userspace configuration.
The downshift count is stored in the driver's private data and applied
in config_init, so user changes via ethtool are preserved across
suspend/resume cycles.
Users can override or disable downshift at runtime:
net: phy: microchip: add downshift tunable support for LAN88xx
Implement the standard ETHTOOL_PHY_DOWNSHIFT tunable for the LAN88xx
PHY. This allows runtime configuration of the auto-downshift feature
via ethtool:
ethtool --set-phy-tunable eth0 downshift on count 3
The LAN88xx PHY supports downshifting from 1000BASE-T to 100BASE-TX
after 2-5 failed auto-negotiation attempts. Valid count values are
2, 3, 4 and 5.
This is based on an earlier downstream implementation by Phil Elwell.
Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/20260401123848.696766-2-nb@tipi-net.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Wagner [Wed, 1 Apr 2026 11:49:31 +0000 (12:49 +0100)]
net: phy: bcm84881: add LED framework support for BCM84891/BCM84892
Expose LED1 and LED2 pins via the PHY LED framework. Each pin has a
source mask (MASK_LOW + MASK_EXT registers) selecting which hardware
events light it, plus a CTL field in the shared 0xA83B register
(RMW; LED4 is firmware-controlled per the datasheet).
Hardware can offload per-speed link triggers (1000/2500/5000/10000),
RX/TX activity, and force-on. LINK_100 is accepted only alongside
LINK_1000: source bit 4 lights at both speeds and 100-alone isn't
representable, so the unrepresentable case falls to software.
The chip has five LED pins; only LED1/LED2 are exposed here as those
are the only ones characterized on tested hardware. LED4 is firmware-
controlled regardless of strap configuration.
Tested on TRENDnet TEG-S750 (LED1/LED2 wired to an antiparallel
bicolor LED): brightness_set via sysfs; netdev trigger offloaded=1
with amber lit at 100M/1G/2.5G and green lit at 10G via respective
link_* modes; LED off immediately on cable unplug with no software
involvement.
This is a more refined version of the previous patch series fixing
and cleaning up the TSO code.
I'm not sure whether "TSO" or "GSO" should be used to describe this
feature - although it primarily handles TCP, dwmac4 appears to also
be able to handle UDP.
In essence, this series adds a .ndo_features_check() method to handle
whether TSO/GSO can be used for a particular skbuff - checking which
queue the skbuff is destined for and whether that has TBS available
which precludes TSO being enabled on that channel.
I'm also adding a check that the header is smaller than 1024 bytes,
as documented in those sources which have TSO support - this is due
to the hardware buffering the header in "TSO memory" which I guess
is limited to 1KiB. I expect this test never to trigger, but if
the headers ever exceed that size, the hardware will likely fail.
While IPv4 headers are unlikely to be anywhere near this, there is
nothing in the protocol which prevents IPv6 headers up to 64KiB.
As we now have a .ndo_features_check() method, I'm moving the VLAN
insertion for TSO packets into core code by unpublishing the VLAN
insertion features when we use TSO. Another move is for checksumming,
which is required for TSO, but stmmac's requirements for offloading
checksums are more strict - and this seems to be a bug in the TSO
path.
I've changed the hardware initialisation to always enable TSO support
on the channels even if the user requests TSO/GSO to be disabled -
this fixes another issue as pointed out by Jakub in a previous review.
I'm moving the setup of the GSO features, cleaning those up, and
adding a warning if platform glue requests this to be enabled but the
hardware has no support. Hopefully this will never trigger if everyone
got the STMMAC_FLAG_TSO_EN flag correct. Also adding a check for TxPBL
value.
Finally, moving the "TSO supported" message to the new
stmmac_set_gso_features() function so keep all this TSO stuff together.
====================
Documentation states that TxPBL must be >= 4 to allow TSO support, but
the driver doesn't check this. TxPBL comes from the platform glue code
or DT. Add a check with a warning if platform glue code attempts to
enable TSO support with TxPBL too low.
net: stmmac: simplify GSO/TSO test in stmmac_xmit()
The test in stmmac_xmit() to see whether we should pass the skbuff to
stmmac_tso_xmit() is more complex than it needs to be. This test can
be simplified by storing the mask of GSO types that we will pass, and
setting it according to the enabled features.
Note that "tso" is a mis-nomer since commit b776620651a1 ("net:
stmmac: Implement UDP Segmentation Offload"). Also note that this
commit controls both via the TSO feature. We preserve this behaviour
in this commit.
Also, this commit unconditionally accessed skb_shinfo(skb)->gso_type
for all frames, even when skb_is_gso() was false. This access is
eliminated.
net: stmmac: move check for hardware checksum supported
Add a check in .ndo_features_check() to indicate whether hardware
checksum can be performed on the skbuff. Where hardware checksum is
not supported - either because the channel does not support Tx COE
or the skb isn't suitable (stmmac uses a tighter test than
can_checksum_protocol()) we also need to disable TSO, which will be
done by harmonize_features() in net/core/dev.c
This fixes a bug where a channel which has COE disabled may still
receive TSO skbuffs.