Julian Braha [Mon, 30 Mar 2026 21:32:58 +0000 (22:32 +0100)]
ppp: dead code cleanup in Kconfig
There is already an 'if PPP' condition wrapping several config options
e.g. PPP_MPPE and PPPOE, making the 'depends on PPP' statement for each of
these a duplicate dependency (dead code).
I propose leaving the outer 'if PPP...endif' and removing the individual
'depends on PPP' statement from each option.
This dead code was found by kconfirm, a static analysis tool for Kconfig.
Nagamani PV [Mon, 30 Mar 2026 11:44:36 +0000 (13:44 +0200)]
net/iucv: Add missing kernel-doc return value descriptions
Add missing return value descriptions for several functions in
net/iucv/af_iucv.c and net/iucv/iucv.c to address kernel-doc warnings.
Warnings detected with:
scripts/kernel-doc -none -Wall net/iucv/*
Warning: net/iucv/af_iucv.c:131 No description found for return value of 'iucv_msg_length'
Warning: net/iucv/af_iucv.c:150 No description found for return value of 'iucv_sock_in_state'
...
No functional change.
Reviewed-by: Aswin Karuvally <aswin@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Nagamani PV <nagamani@linux.ibm.com>
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Link: https://patch.msgid.link/20260330114436.2010108-1-wintera@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: vxlan: check ipv6_mod_enabled() on neigh_reduce()
IPv6 must be enabled or otherwise neigh_reduce() might cause a kernel
panic. This was prevented by a check on in6_dev. Use ipv6_mod_enabled()
instead as it is cleaner and also consistent with the code at
route_shortcircuit().
net: mana: hardening: Validate adapter_mtu from MANA_QUERY_DEV_CONFIG
As a part of MANA hardening for CVM, validate the adapter_mtu value
returned from the MANA_QUERY_DEV_CONFIG HWC command.
The adapter_mtu value is used to compute ndev->max_mtu via:
gc->adapter_mtu - ETH_HLEN. If hardware returns a bogus adapter_mtu
smaller than ETH_HLEN (e.g. 0), the unsigned subtraction wraps to a
huge value, silently allowing oversized MTU settings.
Add a validation check to reject adapter_mtu values below
ETH_MIN_MTU + ETH_HLEN, returning -EPROTO to fail the device
configuration early with a clear error message.
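The wrap-around and the proposed check can be sketched in userspace C. This is a minimal illustration, not the driver code: the helper name demo_mtu_from_adapter() is hypothetical, and ETH_HLEN (14) and ETH_MIN_MTU (68) are redefined locally with their usual values.

```c
#include <errno.h>
#include <stdint.h>

#define ETH_HLEN    14u	/* Ethernet header length in bytes */
#define ETH_MIN_MTU 68u	/* minimum MTU accepted for an Ethernet device */

/*
 * Hypothetical helper: derive max_mtu from a device-reported adapter_mtu.
 * Returns 0 on success, -EPROTO if the reported value is implausibly small.
 */
static int demo_mtu_from_adapter(uint32_t adapter_mtu, uint32_t *max_mtu)
{
	/* Without this check, adapter_mtu - ETH_HLEN wraps for values < 14. */
	if (adapter_mtu < ETH_MIN_MTU + ETH_HLEN)
		return -EPROTO;

	*max_mtu = adapter_mtu - ETH_HLEN;
	return 0;
}
```

With a reported value of 0, the unchecked unsigned subtraction would yield 4294967282, which is why the check has to come before the subtraction.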
Inspired by a recent discussion[1] I have come up with this pair of
small improvements to DMA error reporting with declance.
[1] Sebastian Andrzej Siewior, "declance: Remove IRQF_ONESHOT",
<https://lore.kernel.org/r/20260127135334.qUEaYP9G@linutronix.de/>
====================
declance: Include the offending address with DMA errors
The address latched in the I/O ASIC LANCE DMA Pointer Register uses the
TURBOchannel bus address encoding and therefore bits 33:29 of the location
referred to occupy bits 4:0, bits 28:2 are left-shifted by 3, and bits 1:0
are hardwired to zero. In reality no TURBOchannel system exceeds 1GiB
of RAM though, so the address reported will always fit in 8 hex digits.
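One reading of the encoding described above, as a small decoder. This is a sketch under that reading only: the function name is hypothetical and the code is not taken from the driver.

```c
#include <stdint.h>

/*
 * Hypothetical decoder: rebuild a byte address from a register value laid
 * out as described above (register bits 4:0 carry address bits 33:29,
 * register bits 31:5 carry address bits 28:2, address bits 1:0 are zero).
 */
static uint64_t demo_tc_dma_ptr_to_addr(uint32_t reg)
{
	uint64_t hi = (uint64_t)(reg & 0x1fu) << 29;	/* address bits 33:29 */
	uint64_t lo = (reg & 0xffffffe0u) >> 3;		/* address bits 28:2 */

	return hi | lo;
}
```

Since no system has more than 1GiB of RAM, the high part is expected to stay small enough for the result to fit in 8 hex digits.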
Daniel Wagner [Mon, 30 Mar 2026 22:53:10 +0000 (23:53 +0100)]
net: phy: bcm84881: add BCM84891/BCM84892 support
The BCM84891 and BCM84892 are 10GBASE-T PHYs in the same family as the
BCM84881, sharing the register map and most callbacks. They add USXGMII
as a host interface mode.
bcm8489x_config_init() is separate from bcm84881_config_init(): it
allows only USXGMII (the only host mode available on the tested
hardware) and clears MDIO_CTRL1_LPOWER, which is set at boot on the
tested platform. This does not recur on ifdown/ifup, cable events, or
link-partner advertisement changes, so config_init is sufficient.
For USXGMII, read_status() skips the 0x4011 host-mode register: it
returns the same value regardless of negotiated copper speed (USXGMII
symbol replication). Speed comes from phy_resolve_aneg_linkmode() via
standard C45 AN resolution.
Tested on TRENDnet TEG-S750 (RTL9303 + 1x BCM84891 + 4x BCM84892)
running OpenWrt, where the MDIO controller driver is currently
OpenWrt-specific. Link verified at 100M, 1G, 2.5G, 10G.
Signed-off-by: Daniel Wagner <wagner.daniel.t@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20260330225310.2801264-1-wagner.daniel.t@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alexander Duyck [Fri, 27 Mar 2026 20:44:45 +0000 (13:44 -0700)]
fbnic: Set Relaxed Ordering PCIe TLP attributes for DMA engines
Add ATTR CSR bit field definitions for the DMA engine TLP header
configuration registers:
AW_CFG: RDE_ATTR[17:15], RQM_ATTR[14:12], TQM_ATTR[11:9]
AR_CFG: TDE_ATTR[17:15], RQM_ATTR[14:12], TQM_ATTR[11:9]
These fields control the PCIe TLP attribute bits for outbound
transactions from the TQM, RQM, RDE (write path), and TDE (read path)
DMA engines. An enum is added with standard PCIe TLP attribute values:
NS (No Snoop), RO (Relaxed Ordering), and IDO (ID-based Ordering).
Read the PCIe Relaxed Ordering capability at probe time and store it in
fbnic_dev. Configure Relaxed Ordering on the PCIe TLP attributes in
fbnic_mbx_init_desc_ring when the capability is enabled. For the write
path (AW_CFG), set RO on RDE and TQM attributes. For the read path
(AR_CFG), set RO on all three attributes (TDE, RQM, TQM). This allows
the PCIe fabric to reorder these transactions for improved throughput.
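The field layout and the write-path/read-path policy above can be sketched as follows. The macro and function names are illustrative, not the driver's, and the attribute encoding (one bit each for NS, RO and IDO within a 3-bit field) is an assumption based on the usual PCIe TLP attribute layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field positions from the text above; the macro names are illustrative. */
#define DEMO_AW_CFG_RDE_ATTR_SHIFT 15
#define DEMO_AW_CFG_RQM_ATTR_SHIFT 12
#define DEMO_AW_CFG_TQM_ATTR_SHIFT 9
#define DEMO_AR_CFG_TDE_ATTR_SHIFT 15
#define DEMO_AR_CFG_RQM_ATTR_SHIFT 12
#define DEMO_AR_CFG_TQM_ATTR_SHIFT 9

/* Assumed encoding: one bit per attribute inside each 3-bit field. */
enum demo_tlp_attr {
	DEMO_TLP_ATTR_NS  = 1,	/* No Snoop */
	DEMO_TLP_ATTR_RO  = 2,	/* Relaxed Ordering */
	DEMO_TLP_ATTR_IDO = 4,	/* ID-based Ordering */
};

/* Write path: set RO on the RDE and TQM fields only. */
static uint32_t demo_aw_cfg_set_ro(uint32_t val, bool ro_enabled)
{
	if (!ro_enabled)
		return val;
	val |= (uint32_t)DEMO_TLP_ATTR_RO << DEMO_AW_CFG_RDE_ATTR_SHIFT;
	val |= (uint32_t)DEMO_TLP_ATTR_RO << DEMO_AW_CFG_TQM_ATTR_SHIFT;
	return val;
}

/* Read path: set RO on all three fields (TDE, RQM, TQM). */
static uint32_t demo_ar_cfg_set_ro(uint32_t val, bool ro_enabled)
{
	if (!ro_enabled)
		return val;
	val |= (uint32_t)DEMO_TLP_ATTR_RO << DEMO_AR_CFG_TDE_ATTR_SHIFT;
	val |= (uint32_t)DEMO_TLP_ATTR_RO << DEMO_AR_CFG_RQM_ATTR_SHIFT;
	val |= (uint32_t)DEMO_TLP_ATTR_RO << DEMO_AR_CFG_TQM_ATTR_SHIFT;
	return val;
}
```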
Chih Kai Hsu [Thu, 26 Mar 2026 07:39:23 +0000 (15:39 +0800)]
r8152: fix incorrect register write to USB_UPHY_XTAL
The old code used ocp_write_byte() to clear the OOBS_POLLING bit
(BIT(8)) in the USB_UPHY_XTAL register, but this doesn't correctly
clear a bit in the upper byte of the 16-bit register.
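Why a byte-wide update cannot clear BIT(8) can be shown with plain integer arithmetic. This models only the arithmetic, not the actual semantics of ocp_write_byte(); both function names are hypothetical.

```c
#include <stdint.h>

#define DEMO_OOBS_POLLING (1u << 8)	/* bit 8 lives in the upper byte */

/* Byte-wide update: only the low 8 bits of the register can change. */
static uint16_t demo_clear_via_byte(uint16_t reg)
{
	uint8_t low = (uint8_t)reg;

	/* ~BIT(8) truncated to 8 bits is 0xff, so nothing is cleared. */
	low &= (uint8_t)~DEMO_OOBS_POLLING;
	return (uint16_t)((reg & 0xff00u) | low);
}

/* Word-wide read-modify-write: bit 8 is actually cleared. */
static uint16_t demo_clear_via_word(uint16_t reg)
{
	return (uint16_t)(reg & ~DEMO_OOBS_POLLING);
}
```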
====================
net: stmmac: qcom-ethqos: more cleanups
Further cleanups to qcom-ethqos, mainly concentrating on the RGMII
code, making it clearer what the differences are for each speed, thus
making the code more readable.
I'm still not really happy with this. The speed specific configuration
remains split between ethqos_fix_mac_speed_rgmii() and
ethqos_rgmii_macro_init(), where the latter is only ever called from
the former. So, I think further work is needed here - maybe it needs
restructuring into the various component parts of the RGMII block?
====================
The comment for calculating the prg_rclk_dly value is incorrect as it
omits the brackets around the divisor. Add the brackets to allow the
reader to correctly evaluate the value. Validated with the values given
in the driver.
net: stmmac: qcom-ethqos: move loopback decision next to reg update
Move the loopback decision next to the register update, and make the
local variable unsigned. As a result, there is now no need for the
comment referring to the programming being later.
Rather than coding the entire register update twice with different
values, use a local variable to specify the value and have one
register update statement that uses this local variable. This results
in neater code.
net: stmmac: qcom-ethqos: finally eliminate the switch
Move the RCLK delay configuration out of the switch, which just leaves
the RGMII_CONFIG_LOOPBACK_EN setting in all three paths. This makes it
trivial to eliminate the switch.
Move RGMII_CONFIG2_RX_PROG_SWAP out of the switch. 1G speed always
sets this field. 100M and 10M set it for has_emac_ge_3 devices,
otherwise it is cleared.
Move the speed programming for 100M and 10M out of the switch. There
is no programming done for 1G speed.
It looks like there are two fields, 7:6 which are programmed to '1'
to select a /2 divisor for 100M, and bits 16:8 which are programmed
to '19' to select a /20 divisor.
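The guess above amounts to "field value = divisor - 1". A minimal sketch of that interpretation follows; the mask names and the helpers are hypothetical, and the relationship itself is an assumption drawn from the two values quoted, not from hardware documentation.

```c
#include <stdint.h>

#define DEMO_DIV_7_6_MASK   0x000000c0u	/* bits 7:6 */
#define DEMO_DIV_7_6_SHIFT  6
#define DEMO_DIV_16_8_MASK  0x0001ff00u	/* bits 16:8 */
#define DEMO_DIV_16_8_SHIFT 8

/* Assumed relationship: the programmed value is the divisor minus one. */
static uint32_t demo_divisor_to_field(uint32_t divisor)
{
	return divisor - 1;
}

/* Place a field value at its position, dropping bits outside the mask. */
static uint32_t demo_field_prep(uint32_t mask, unsigned int shift, uint32_t v)
{
	return (v << shift) & mask;
}
```

Under this reading, a /2 divisor programs '1' into bits 7:6 and a /20 divisor programs '19' into bits 16:8, matching the values in the driver.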
net: stmmac: qcom-ethqos: move two more RGMII_IO_MACRO_CONFIG2 out
RGMII_CONFIG2_DATA_DIVIDE_CLK_SEL is always cleared, and
RGMII_CONFIG2_TX_CLK_PHASE_SHIFT_EN is always updated with the phase
shift in each path through the switch, so these are independent of
the speed. Move them out.
net: stmmac: qcom-ethqos: move 1G vs 100M/10M RGMII settings
Move RGMII_CONFIG_BYPASS_TX_ID_EN, RGMII_CONFIG_POS_NEG_DATA_SEL and
RGMII_CONFIG_PROG_SWAP. There are two states for these: one group for
1G, and the logical inversion for 100M and 10M. Move this out of the
switch into an if-else clause.
net: stmmac: qcom-ethqos: move detection of invalid RGMII speed
Move detection of invalid RGMII speeds (which will never be triggered)
before the switch() to allow register modifications that are common to
all speeds to be moved out of the switch.
Since ethqos_fix_mac_speed() is called via a function pointer, and only
indirects via the configure_func function pointer, eliminate this
unnecessary indirection.
net: stmmac: qcom-ethqos: pass ethqos to ethqos_pcs_set_inband()
Rather than getting the stmmac_priv pointer in
ethqos_configure_sgmii(), move it into ethqos_pcs_set_inband() and pass
the struct qcom_ethqos pointer instead.
ethqos_configure() does nothing more than indirect via
ethqos->configure_func, and is only called from ethqos_fix_mac_speed()
just below. Move the indirect call into ethqos_fix_mac_speed().
Jan Hoffmann [Sun, 29 Mar 2026 19:11:11 +0000 (21:11 +0200)]
net: sfp: add quirk for ZOERAX SFP-2.5G-T
This is a 2.5G copper module which appears to be based on a Motorcomm
YT8821 PHY. There doesn't seem to be a usable way to access the PHY
(I2C address 0x56 provides only read-only C22 access, and Rollball is
also not working).
The module does not report the correct extended compliance code for
2.5GBase-T, and instead claims to support SONET OC-48 and Fibre Channel:
Despite this, the kernel still enables the correct 2500Base-X interface
mode. However, for the module to actually work, it is also necessary to
disable inband auto-negotiation.
Enable the existing "sfp_quirk_oem_2_5g" for this module, which handles
that and also sets the bit for 2500Base-T link mode.
====================
net: hsr: subsystem cleanups and modernization
This series contains two focused HSR cleanups with practical benefit.
It constifies protocol operation tables and replaces a hardcoded
function name with __func__ to keep diagnostics correct across
refactoring.
====================
Luka Gejak [Thu, 26 Mar 2026 17:46:00 +0000 (18:46 +0100)]
net: hsr: use __func__ instead of hardcoded function name
Replace the hardcoded string "hsr_get_untagged_frame" with the
standard __func__ macro in netdev_warn_once() call to make the code
more robust to refactoring.
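A userspace illustration of the idea, using snprintf() in place of netdev_warn_once(). The function name and message text are made up for the example.

```c
#include <stdio.h>

/*
 * __func__ expands to the name of the enclosing function, so the message
 * prefix follows the function automatically if it is ever renamed.
 */
static int demo_get_untagged_frame(char *buf, size_t len)
{
	return snprintf(buf, len, "%s: unexpected frame", __func__);
}
```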
Luka Gejak [Thu, 26 Mar 2026 17:45:59 +0000 (18:45 +0100)]
net: hsr: constify hsr_ops and prp_ops protocol operation structures
The hsr_ops and prp_ops structures are assigned to hsr->proto_ops during
device initialization and are never modified at runtime. Declaring them
as const allows the compiler to place these structures in read-only
memory, which improves security by preventing accidental or malicious
modification of the function pointers they contain.
The proto_ops field in struct hsr_priv is also updated to a const
pointer to maintain type consistency.
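The pattern can be sketched with stand-in types. None of the names below are the real HSR structures or callbacks; they only show a const ops table referenced through a const pointer.

```c
/* Stand-in for a protocol operations table. */
struct demo_proto_ops {
	int (*tag_len)(void);
};

static int demo_hsr_tag_len(void) { return 6; }	/* illustrative values */
static int demo_prp_tag_len(void) { return 4; }

/* const lets the compiler place the tables in read-only memory. */
static const struct demo_proto_ops demo_hsr_ops = {
	.tag_len = demo_hsr_tag_len,
};

static const struct demo_proto_ops demo_prp_ops = {
	.tag_len = demo_prp_tag_len,
};

struct demo_priv {
	/* Pointer to const: the table cannot be modified through it. */
	const struct demo_proto_ops *proto_ops;
};

static void demo_priv_init(struct demo_priv *priv, int is_prp)
{
	priv->proto_ops = is_prp ? &demo_prp_ops : &demo_hsr_ops;
}
```

An assignment such as priv->proto_ops->tag_len = NULL is then rejected at compile time.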
Jakub Kicinski [Sun, 29 Mar 2026 21:35:29 +0000 (14:35 -0700)]
Merge branch 'macb-usrio-tsu-patches'
Conor Dooley says:
====================
macb usrio/tsu patches
At the very least, it'd be good if the SoC vendor folks could check
their platforms and see if their usrio stuff actually lines up with what
the driver currently calls "macb_default_usrio". Ours didn't and it was
a nasty surprise.
Ryan and I figured out that the sama7g5 stuff is not actually using the
same usrio bits as earlier devices, so there's now more patches in this
series to split them apart. I've not tested the split or the new
property due to lack of hardware, but Ryan has.
Marking this stuff net-next, because although they're fixes I don't see
any particular urgency, and it avoids creating some dependencies between
cleanup items and the fixes.
====================
Théo Lebrun [Wed, 25 Mar 2026 16:28:18 +0000 (16:28 +0000)]
net: macb: drop usrio pointer on EyeQ5 config
USRIO is disabled on this platform, drop its inherited usrio config.
We will end up with MACB_CAPS_USRIO_DISABLED on this platform:
- We have no config->usrio so macb_configure_caps() deduces that the
feature is disabled.
- Anecdotally, we would also land in the runtime detection codepath
that reads DCFG1.
Théo Lebrun [Wed, 25 Mar 2026 16:28:17 +0000 (16:28 +0000)]
net: macb: set MACB_CAPS_USRIO_DISABLED if no usrio config is provided
bp->usrio is copied directly from dt_conf->usrio in macb_probe().
If dt_conf->usrio is NULL, we do not want to land in USRIO write
codepaths which dereference bp->usrio. Inherit automatically
MACB_CAPS_USRIO_DISABLED to avoid those.
This means a macb_config that wants to disable usrio can simply drop
its .usrio field, rather than add the disabled capability explicitly.
Nit: drop the dt_conf NULL check because the pointer is always valid.
DCFG1 (design config 1 register) carries a bit indicating whether User
I/O feature has been enabled or not. The MACB/GEM driver has a cap flag
indicating that HW has the feature disabled (default is enabled). Add
the missing connection between DCFG1 bit and MACB_CAPS_USRIO_DISABLED.
Indirect impact: avoid useless writel() on USERIO register; this is not
an important fix because USERIO is anyway read-only when feature is
disabled.
If for some reason a compatible sets USRIO_DISABLED but DCFG1 indicates
it is enabled, we still keep the disabled capability flag. This ensures
we don't break "cdns,np4-macb" that sets the flag from compatible match
data.
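The combined rules for the disabled flag can be sketched as below: a missing usrio config implies disabled, a clear DCFG1 bit implies disabled, and a flag set from match data is never cleared. All names are stand-ins, and the DCFG1 bit position is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_CAPS_USRIO_DISABLED (1u << 0)	/* stand-in for MACB_CAPS_USRIO_DISABLED */
#define DEMO_DCFG1_USER_IO	 (1u << 9)	/* hypothetical bit position */

struct demo_usrio_config {
	uint32_t mii;
};

struct demo_config {
	uint32_t caps;				/* caps from match data */
	const struct demo_usrio_config *usrio;	/* NULL when not provided */
};

static uint32_t demo_configure_caps(const struct demo_config *cfg,
				    uint32_t dcfg1)
{
	uint32_t caps = cfg->caps;

	/* No usrio config: never enter codepaths that dereference it. */
	if (!cfg->usrio)
		caps |= DEMO_CAPS_USRIO_DISABLED;

	/* Hardware reports the feature as absent. */
	if (!(dcfg1 & DEMO_DCFG1_USER_IO))
		caps |= DEMO_CAPS_USRIO_DISABLED;

	/* The flag is only ever set here, so match data always wins. */
	return caps;
}
```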
Conor Dooley [Wed, 25 Mar 2026 16:28:15 +0000 (16:28 +0000)]
net: macb: timer adjust mode is not supported
The ptp portion of this driver controls the tsu's timer using the
controls for "increment mode", which is not compatible with the hardware
trying to control it via the gem_tsu_inc_ctrl and gem_tsu_ms inputs in
"timer adjust mode". Abort probe if the property signalling that the
relevant signals have been wired up is present.
The GEM IP has two methods for modifying the ptp timer. The first of
these, named "increment mode", relies on software controlling the timer
by setting tsu_timer_incr and tsu_timer_incr_sub_nsec and performing
once-off adjustments via the tsu_timer_adjust register. This is what the
macb driver uses. The second mechanism, "timer adjust mode" uses the
gem_tsu_inc_ctrl and gem_tsu_ms signals to control the timer. These
modes are not intended to be used in parallel, but both can be possible
on the same device and which mode is used cannot be determined from the
compatible on all devices, because some users of the GEM IP are SoC
FPGAs that permit configuring how the IP is wired up.
Add a property to indicate that gem_tsu_inc_ctrl and gem_tsu_ms are wired
up for timer adjust mode.
Conor Dooley [Wed, 25 Mar 2026 16:28:13 +0000 (16:28 +0000)]
net: macb: clean up tsu clk rate acquisition
tsu_clk is grabbed during probe, so doesn't need to be re-grabbed here.
pclk is mandatory, probe will fail if it is err/NULL, so there's no need
to check it here or have a !pclk 3rd arm. Simplify gem_get_tsu_rate() to
account for these facts.
Conor Dooley [Wed, 25 Mar 2026 16:28:12 +0000 (16:28 +0000)]
net: macb: warn on pclk use as a tsu_clk fallback
The Cadence GEM IP has a configuration parameter which determines the
source of the clock used for the timestamp unit (if it is enabled),
switching it between using the pclk and a dedicated input.
When ptp support was added to the macb driver, a new tsu_clk was added
to represent the dedicated input. While this is understandable, I think
it is bug prone and that the tsu_clk should represent whatever clock is
used for the timestamper and not just that specific input.
From a driver point of view, the benefit of taking the conceptual
approach is avoiding misconfiguring the driver when the hardware
supports ptp (and it is set as a capability in the relevant per-device
structure) but no tsu_clk is provided in devicetree. At the moment, the
timestamper will be registered and programmed with an increment that
reflects the pclk in these cases, but will malfunction if the pclk and
tsu_clk frequencies do not match. Obviously, this means the devicetree
incorrectly represents the hardware, but this change in approach would
make the driver more resilient without meaningfully impacting correctly
described users.
Out of the devices that claim MACB_CAPS_GEM_HAS_PTP the fu540, mpfs,
sama5d2 and sama7g5-emac (but not sama7g5-gem) are at risk of having
this problem with the in-kernel devicetrees. mpfs and sama7g5-emac
have been confirmed to be incorrect, and sama5d2 is correct. It may be
that the other platforms actually do use the pclk for the timestamper
(either by supplying pclk to the tsu_clk input of the IP, or by having
the IP block configured to use pclk instead of the tsu_clk input), but
at least two are wrong, as they do not use pclk for the tsu_clk, so the
driver is registering the ptp clock incorrectly.
Add a warning if no tsu_clk is provided on a platform that uses the
timestamper, to encourage people to specifically provide a tsu_clk and
avoid silently registering the timestamper with the wrong clock. If the
pclk is actually used, it can be provided as a tsu_clk for improved
clarity in devicetrees.
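The fallback-with-warning behaviour can be modelled in userspace like this. The struct, the function and the warning text are illustrative only; the real driver works with struct clk and the kernel's logging helpers.

```c
#include <stdbool.h>
#include <stdio.h>

struct demo_clk {
	unsigned long rate;	/* Hz */
};

/*
 * Prefer the dedicated tsu_clk. If it is missing, warn and fall back to
 * pclk, which is mandatory and therefore assumed to be non-NULL here.
 */
static unsigned long demo_tsu_rate(const struct demo_clk *tsu_clk,
				   const struct demo_clk *pclk, bool *warned)
{
	if (tsu_clk) {
		*warned = false;
		return tsu_clk->rate;
	}

	fprintf(stderr, "no tsu_clk provided, falling back to pclk\n");
	*warned = true;
	return pclk->rate;
}
```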
While this changes the meaning of the devicetree property, it is
backwards compatible as there's no functional change for platforms that
didn't provide a tsu_clk and the changed meaning of providing a tsu_clk
in the devicetree does not impact platforms that already provided one as
the decision about the tsu clock source is at IP instantiation time
rather than at runtime, so there's no driver behaviour that needs to
change based on the input to the IP used for the timestamping unit.
Conor Dooley [Wed, 25 Mar 2026 16:28:11 +0000 (16:28 +0000)]
net: macb: add mpfs specific usrio configuration
On mpfs the driver needs to make sure the tsu clock source is not the
fabric, as this requires that the hardware is in Timer Adjust mode,
which is not compatible with the linux driver trying to control the
hardware. It is unlikely that this will be set, as the peripheral is
reset during probe, but if the resets are not provided in devicetree
it's probable that this bit is set incorrectly, as U-Boot's macb driver
has the same issue with using usrio settings for at91 platforms as the
default.
Conor Dooley [Wed, 25 Mar 2026 16:28:09 +0000 (16:28 +0000)]
net: macb: rework usrio refclk selection code
The USRIO based refclk selection code abuses a capability flag to set
the refclk to an external source based on match data/compatible on
sama7g5-emac and use an internal source for the gmac.
Ryan previously added a property in an attempt to decouple the refclk
source from the compatible, because this is not fixed by compatible
and there's variance based on the choices made by board designers.
Originally when Ryan added it, he removed the capability flag entirely
from match data, but this changed the default for the sama7g5-emac and
the removal had to be reverted for these devices. Because these devices
default to an external refclk, and the current property is only capable
of communicating external refclks, there's no way to make the
sama7g5-emac use an internal refclk.
Additionally, this property has no limiting based on compatible, and
if used on a platform with an external refclk that is not controlled
by USRIO the capability would be erroneously set. Because of the reuse
of the at91_default_usrio struct by non-at91 devices, this could cause
the refclk bit to be set in error, on a system where the refclk is
externally provided without usrio settings being required.
Change the new capability flag so that it actually represents the
hardware being capable of controlling the refclk source via USRIO,
and move the selection of default behaviour into the macb_usrio_config
struct provided as part of match data.
Modify the devicetree code to support a new property,
"cdns,refclk-source" which will support devices with either default,
retaining support for "cdns,refclk-external" for compatibility reasons.
Conor Dooley [Wed, 25 Mar 2026 16:28:08 +0000 (16:28 +0000)]
dt-bindings: net: cdns,macb: replace cdns,refclk-ext with cdns,refclk-source
Ryan added cdns,refclk-ext with the intent of decoupling the source of
the reference clock on sama7g5 (and related platforms) from the
compatible. Unfortunately, the default for sama7g5-emac is an external
reference clock, so this property had no effect there, so that
compatibility with older devicetrees is preserved.
Replace cdns,refclk-ext with one that supports both default states and
therefore is usable for sama7g5-emac.
For now, limit it to only the platforms that have USRIO controlled
reference clock selection, but this could be generalised in the future.
The existing property only works on devices that are compatible with
sama7g5-gem, so mark it deprecated, and limit its use to that specific
scenario.
Conor Dooley [Wed, 25 Mar 2026 16:28:07 +0000 (16:28 +0000)]
net: macb: split USRIO_HAS_CLKEN capability in two
While trying to rework the internal/external refclk selection on
sama7g5, Ryan and I noticed that the sama7g5 was "overloading" the
meaning of MACB_CAPS_USRIO_HAS_CLKEN, using it differently to how it was
originally intended.
Originally, on the macb hardware on sam9620 et al,
MACB_CAPS_USRIO_HAS_CLKEN represented the hardware having a bit that
needed to be set to turn on the input clock to the transceivers. The
sama7g5 doesn't have this bit, so for some reason the decision was made
to reuse this capability flag to control selection of internal/external
references.
Split the caps in two, so that capabilities do what they say on the tin,
and allow reworking the refclk selection handling without impacting the
older devices that use MACB_CAPS_USRIO_HAS_CLKEN for its original purpose.
Conor Dooley [Wed, 25 Mar 2026 16:28:06 +0000 (16:28 +0000)]
net: macb: rename macb_default_usrio to at91_default_usrio as not all platforms have mii mode control in usrio
Calling this structure macb_default_usrio is misleading, I believe, as
it implies that it should be used if your platform has nothing special
to do in usrio. Since usrio is platform dependent, the default here is
probably for each usrio to do nothing, with the macb documentation I
have access to prescribing no standard behaviour here. We noticed that
this was problematic because on mpfs, a bit that macb_default_usrio
sets to deal with the MII mode actually changes the source for the
tsu_clk to something with how the majority of mpfs devices are actually
configured!
Rename it to at91_default_usrio, since that's where the values actually
come from for these. I have no idea if any of the other platforms that
use the default actually copied at91's usrio configuration or if they
have usrio configurations where what the driver does has no impact.
Gate touching these bits behind a capability, like the clken refclock
usrio knob, so that platforms without the MII mode stuff can avoid
running this code.
Conor Dooley [Wed, 25 Mar 2026 16:28:05 +0000 (16:28 +0000)]
Revert "net: macb: Clean up the .usrio settings in macb_config instances"
Commit 0ae998c4efd69 ("net: macb: Clean up the .usrio settings in
macb_config instances") was a misguided attempt to clean up the driver
that actually just propagated problematic code. The default for usrio is
actually no usrio, and already there are issues with people using the
problematically named "macb_default_usrio" on platforms where the usrio
does not have this so-called default behaviour. usrio is platform
specific and using the default at91 usrio settings should be opt-in
only. Revert the "cleanup" patch.
====================
bnxt_en: Add XDP RSS hash metadata support
This series adds XDP RSS hash metadata extraction support for the bnxt_en
driver and includes selftests to validate the functionality. I was able
to test this on a BCM57414 NIC.
====================
This test loads xdp_metadata.bpf which calls bpf_xdp_metadata_rx_hash() on
incoming packets. The metadata from that packet is then sent to a BPF
map for validation. It borrows structure from xdp.py, reusing common
functions.
The test checks the device's xdp-rx-metadata-features via netlink
before running and skips on devices that do not advertise hash support.
This can be run on veth devices as well as real hardware.
The test is fairly simple and just verifies that a TCP or UDP packet can be
identified as an L4 flow. This minimal test also passes if run on a veth
device.
Chris J Arges [Wed, 25 Mar 2026 20:09:51 +0000 (15:09 -0500)]
selftests: net: move common xdp.py functions into lib
This moves a few functions which can be useful to other python programs
that manipulate XDP programs. This also refactors xdp.py to use the
refactored functions.
Chris J Arges [Wed, 25 Mar 2026 20:09:49 +0000 (15:09 -0500)]
bnxt_en: Move bnxt_rss_ext_op into header
This allows bnxt_rss_ext_op to be used by other functions. In addition this
modifies the rxcmp argument to be const since the function only reads from
this structure.
Add support for extracting RSS hash values and hash types from hardware
completion descriptors in XDP programs for bnxt_en.
Add IP_TYPE definition for determining if completion is ipv4 or ipv6. In
addition add ITYPE_ICMP flag for identifying ICMP completions.
Signed-off-by: Chris J Arges <carges@cloudflare.com>
Reviewed-by: Joe Damato <joe@dama.to>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Chris J Arges [Wed, 25 Mar 2026 20:09:47 +0000 (15:09 -0500)]
bnxt_en: use bnxt_xdp_buff for xdp context
This adds bnxt_xdp_buff which embeds the xdp_buff struct and stores
pointers to hardware RX completion descriptors (rx_cmp and rx_cmp_ext)
along with the completion type.
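The embedding pattern can be sketched with stand-in types. The struct layouts, the rss_hash field and the helper below are assumptions for illustration and do not reflect the real bnxt_en or xdp_buff definitions.

```c
#include <stddef.h>
#include <stdint.h>

struct demo_xdp_buff {			/* stand-in for struct xdp_buff */
	void *data;
	void *data_end;
};

struct demo_rx_cmp {			/* stand-in for a completion descriptor */
	uint32_t rss_hash;
};

struct demo_bnxt_xdp_buff {
	struct demo_xdp_buff xdp;	/* embedded, so a context pointer maps back */
	const struct demo_rx_cmp *rxcmp;
	const struct demo_rx_cmp *rxcmp_ext;
	uint16_t cmp_type;
};

/* Recover the containing structure from a pointer to its member. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* A metadata callback only receives the generic context pointer. */
static uint32_t demo_rx_hash(struct demo_xdp_buff *ctx)
{
	struct demo_bnxt_xdp_buff *bx =
		demo_container_of(ctx, struct demo_bnxt_xdp_buff, xdp);

	return bx->rxcmp->rss_hash;
}
```

Keeping the descriptor pointers next to the embedded buffer gives a metadata callback access to them without any extra lookup.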
Linus Walleij [Fri, 27 Mar 2026 12:23:45 +0000 (13:23 +0100)]
net: dsa: qca8k: Use the right GPIO header
The driver header for qca8k includes the legacy GPIO header
<linux/gpio.h> but does not use any symbols from it and actually
wants <linux/gpio/consumer.h> so fix this up.
Eric Dumazet [Fri, 27 Mar 2026 04:06:46 +0000 (04:06 +0000)]
tcp: use __jhash_final() in inet6_ehashfn()
I misread jhash2() implementation.
Last round should use __jhash_final() instead of __jhash_mix().
Using __jhash_mix() here leaves entropy distributed across a, b, and c,
which might lead to incomplete diffusion of the faddr and fport bits
into the bucket index. Replacing this last __jhash_mix() with
__jhash_final() provides the correct avalanche properties
for the returned value in c.
$ scripts/bloat-o-meter -t vmlinux.0 vmlinux
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-4 (-4)
Function old new delta
inet6_ehashfn 306 302 -4
Total: Before=25155089, After=25155085, chg -0.00%
Fixes: 854587e69ef3 ("tcp: improve inet6_ehashfn() entropy")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260327040646.3849503-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
Convert CONFIG_IPV6 to built-in and remove stubs
Historically, the Linux kernel has supported compiling the IPv6 stack as
a loadable module. While this made sense in the early days of IPv6
adoption, modern deployments and distributions overwhelmingly either
build IPv6 directly into the kernel (CONFIG_IPV6=y) or disable it
entirely (CONFIG_IPV6=n). Although the modular IPv6 use-case offers image
size and memory savings for specific setups, this benefit is outweighed by
the implementation and maintenance burden it imposes on other
subsystems.
In addition, most of the distributions are already using CONFIG_IPV6=y
by default [1], including OpenWrt [2] and Android gki_defconfig [3]. So
this won't have an impact on them. The most impacted architecture would
probably be arm64 as their default config is still using CONFIG_IPV6=m.
To allow core networking, BPF, Netfilter, and various device drivers to
safely interact with a potentially unloaded IPv6 module, the kernel
relies on indirect call structures like ipv6_stub, ipv6_bpf_stub, and
nf_ipv6_ops, along with dynamic RCU registrations for things like ICMPv6
senders.
This patch series addresses this by changing CONFIG_IPV6 from a tristate
to a boolean, enforcing that IPv6 is either built-in or disabled. This
allows us to completely rip out the stub infrastructures and safely
replace them with direct function calls.
The bloat-o-meter reports the following results for m68k, arm64, x86_64
defconfig.
m68k (keep in mind that CONFIG_IPV6 is disabled now):
add/remove: 65/938 grow/shrink: 36/254 up/down: 3022/-49692 (-46670)
Considering that each new kernel release increases sizes by 30-40KiB on
average, this size increase isn't a huge jump for the distributions that
are still using CONFIG_IPV6=m. For the ones that are already using
CONFIG_IPV6=y, the size is reduced actually.
All the patches have been independently build tested with allmodconfig
and allmodconfig + CONFIG_IPV6=n. In addition, the net selftests have been
run against them on virtme-ng.
The series applied as a whole has been tested with allyesconfig and also
allyesconfig + CONFIG_IPV6=n, but not all patches have been independently
tested this way.
netfilter: remove nf_ipv6_ops and use direct function calls
As IPv6 is built-in only, nf_ipv6_ops can be removed completely as it is
no longer necessary.
Convert all nf_ipv6_ops usage to direct function calls instead. In
addition, remove the ipv6_netfilter_init/fini() functions as they are
not necessary any longer.
bpf: remove ipv6_bpf_stub completely and use direct function calls
As IPv6 is built-in only, the ipv6_bpf_stub can be removed completely.
Convert all ipv6_bpf_stub usage to direct function calls instead. The
fallback functions introduced previously will prevent linkage errors
when CONFIG_IPV6 is disabled.
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Tested-by: Ricardo B. Marlière <rbm@suse.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260325120928.15848-10-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: convert remaining ipv6_stub users to direct function calls
As IPv6 is built-in only, the ipv6_stub infrastructure is no longer
necessary.
Convert remaining ipv6_stub users to make direct function calls. The
fallback functions introduced previously will prevent linkage errors
when CONFIG_IPV6 is disabled.
ipv4: drop ipv6_stub usage and use direct function calls
As IPv6 is built-in only, the ipv6_stub infrastructure is no longer
necessary.
The IPv4 stack interacts with IPv6 mainly to support IPv4 routes with
IPv6 next-hops (RFC 8950). Convert all these cross-family calls from
ipv6_stub to direct function calls. The fallback functions introduced
previously will prevent linkage errors when CONFIG_IPV6 is disabled.
drivers: net: drop ipv6_stub usage and use direct function calls
As IPv6 is built-in only, the ipv6_stub infrastructure is no longer
necessary.
Convert all drivers currently utilizing ipv6_stub to make direct
function calls. The fallback functions introduced previously will
prevent linkage errors when CONFIG_IPV6 is disabled.
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Tested-by: Ricardo B. Marlière <rbm@suse.com>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Antonio Quartulli <antonio@openvpn.net>
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/20260325120928.15848-7-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In preparation for dropping ipv6_stub and converting its users to direct
function calls, introduce static inline dummy functions and fallback
macros in the IPv6 networking headers. In addition, introduce checks on
fib6_nh_init(), ip6_dst_lookup_flow() and ip6_fragment() to avoid a
crash due to ipv6.disable=1 set during booting. The other functions are
safe as they cannot be called with ipv6.disable=1 set.
These fallbacks ensure that when CONFIG_IPV6 is completely disabled,
there are no compile or link errors from code paths that are not guarded
by the preprocessor macro IS_ENABLED(CONFIG_IPV6).
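The fallback pattern described above can be sketched in plain C. This is
a minimal userspace illustration, not the contents of the actual IPv6
headers: HAVE_FEATURE_V6, v6_route_lookup(), send_via_v6() and the choice
of -EAFNOSUPPORT as the return value are all assumptions made for the
example.

```c
#include <errno.h>

/*
 * Hypothetical switch standing in for CONFIG_IPV6. It is left undefined
 * here to model a build without IPv6.
 */
/* #define HAVE_FEATURE_V6 1 */

#ifdef HAVE_FEATURE_V6
/* With the feature enabled, the real implementation lives elsewhere. */
int v6_route_lookup(int ifindex);
#else
/*
 * Fallback: a static inline dummy, so callers still compile and link
 * without the real symbol. The call simply fails at run time.
 */
static inline int v6_route_lookup(int ifindex)
{
	(void)ifindex;
	return -EAFNOSUPPORT;	/* assumed error code for the sketch */
}
#endif

/* A caller that is not wrapped in any preprocessor guard. */
static inline int send_via_v6(int ifindex)
{
	return v6_route_lookup(ifindex);
}
```

With the switch undefined, send_via_v6() builds without any reference to
an external symbol and returns the error from the dummy.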
In addition, export ndisc_send_na(), ip6_route_input() and
ip6_fragment().
As IPv6 is built-in only, there is no need to maintain the sender
registration infrastructure used to allow built-in subsystems to send
ICMPv6 messages when IPv6 was compiled as a module.
Drop the registration mechanism and the __icmpv6_send() sender
implementation. While icmpv6_send() users could be converted to
icmp6_send(), that doesn't seem necessary as none of them are using the
force_saddr parameter.
ipv6: replace IS_BUILTIN(CONFIG_IPV6) with IS_ENABLED(CONFIG_IPV6)
As IPv6 is built-in only, it does not make sense to continue using
IS_BUILTIN(CONFIG_IPV6). Therefore, replace it with IS_ENABLED() when
necessary and drop it if it isn't valid anymore.
Note that there is still one instance related to ICMPv6; as it requires
more changes, it will be handled separately.
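Why the two macros become interchangeable for a bool option can be shown
with a userspace sketch modelled on the kernel's kconfig.h helpers. The
helper macro names below are simplified, and CONFIG_DEMO_BOOL,
CONFIG_DEMO_TRI and CONFIG_DEMO_OFF are hypothetical options invented for
the example.

```c
/* A bool option set to y defines CONFIG_x; a tristate set to m defines
 * CONFIG_x_MODULE instead. Both options here are hypothetical. */
#define CONFIG_DEMO_BOOL 1
#define CONFIG_DEMO_TRI_MODULE 1

/* Expands to "0," only when the pasted value is 1, which shifts the
 * literal 1 into the second argument position below. */
#define ARG_PLACEHOLDER_1 0,
#define TAKE_SECOND_ARG(ignored, val, ...) val

#define IS_DEFINED_(x)			IS_DEFINED_2(x)
#define IS_DEFINED_2(val)		IS_DEFINED_3(ARG_PLACEHOLDER_##val)
#define IS_DEFINED_3(arg1_or_junk)	TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)

#define OR_(x, y)			OR_2(x, y)
#define OR_2(x, y)			OR_3(ARG_PLACEHOLDER_##x, y)
#define OR_3(arg1_or_junk, y)		TAKE_SECOND_ARG(arg1_or_junk 1, y, 0)

/* 1 if the option is built in (=y), 0 otherwise. */
#define IS_BUILTIN(option)	IS_DEFINED_(option)
/* 1 if the option is built as a module (=m), 0 otherwise. */
#define IS_MODULE(option)	IS_DEFINED_(option##_MODULE)
/* 1 if the option is either =y or =m, 0 otherwise. */
#define IS_ENABLED(option)	OR_(IS_BUILTIN(option), IS_MODULE(option))
```

In this sketch a bool option can never define the _MODULE variant, so
IS_BUILTIN() and IS_ENABLED() always agree for it. Only a tristate set to
m makes the two differ.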
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Tested-by: Ricardo B. Marlière <rbm@suse.com> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260325120928.15848-4-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ipv6: convert CONFIG_IPV6 to built-in only and clean up Kconfigs
Maintaining a modular IPv6 stack offers image size savings for specific
setups, but this benefit is outweighed by the architectural burden it
imposes on subsystems in terms of implementation and maintenance.
Therefore, drop it.
Change CONFIG_IPV6 from tristate to bool. Remove all Kconfig
dependencies across the tree that explicitly checked for IPV6=m. In
addition, remove MODULE_DESCRIPTION(), MODULE_ALIAS(), MODULE_AUTHOR()
and MODULE_LICENSE().
This also replaces module_init() with device_initcall(). It is not
possible to use fs_initcall() as IPv4 does because that creates a race
condition on IPv6 addrconf.
Finally, change the default configs from CONFIG_IPV6=m to CONFIG_IPV6=y,
except for m68k: according to bloat-o-meter, the image grows by about
330KB there, which isn't acceptable. Instead, disable IPv6 on this
architecture by default. This is aligned with m68k RAM requirements and
recommendations [1].
[1] http://www.linux-m68k.org/faq/ram.html
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Tested-by: Ricardo B. Marlière <rbm@suse.com> Acked-by: Krzysztof Kozlowski <krzk@kernel.org> # arm64 Link: https://patch.msgid.link/20260325120928.15848-2-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Thu, 26 Mar 2026 20:32:33 +0000 (22:32 +0200)]
vrf: Remove unnecessary RCU protection around dst entries
During initialization of a VRF device, the VRF driver creates two dst
entries (for IPv4 and IPv6). They are attached to locally generated
packets that are transmitted out of the VRF ports (via the
l3mdev_l3_out() hook). Their purpose is to redirect packets towards the
VRF device instead of having the packets egress directly out of the VRF
ports. This is useful, for example, when a queuing discipline is
configured on the VRF device.
In order to avoid a NULL pointer dereference, commit b0e95ccdd775 ("net:
vrf: protect changes to private data with rcu") made the pointers to the
dst entries RCU protected. As far as I can tell, this was needed because
back then the dst entries were released (and the pointers reset to NULL)
before removing the VRF ports.
Later on, commit f630c38ef0d7 ("vrf: fix bug_on triggered by rx when
destroying a vrf") moved the removal of the VRF ports to the VRF
device's dellink() callback. As such, the tear down sequence of a VRF
device looks as follows:
1. VRF ports are removed.
2. VRF device is unregistered.
   a. Device is closed.
   b. An RCU grace period passes.
   c. ndo_uninit() is called.
      i. dst entries are released.
Given the above, the Tx path will always see the same fully initialized
dst entries and will never race with the ndo_uninit() callback.
Therefore, there is no need to make the pointers to the dst entries RCU
protected. Remove it as well as the unnecessary NULL checks in the Tx
path.
Ido Schimmel [Thu, 26 Mar 2026 20:32:32 +0000 (22:32 +0200)]
vrf: Use dst_dev_put() instead of using loopback device
Use dst_dev_put() to clean up the device referenced by the dst entry
instead of partially open coding it. Internally, the helper uses the
blackhole device instead of the loopback device.
Jakub Kicinski [Sat, 28 Mar 2026 03:57:40 +0000 (20:57 -0700)]
Merge branch 'net-stmmac-disable-eee-on-i-mx'
Laurent Pinchart says:
====================
net: stmmac: Disable EEE on i.MX
This small patch series fixes a long-standing interrupt storm issue with
stmmac on NXP i.MX platforms.
The initial attempt to fix^Wwork around the problem in DT ([1]) was
painfully but rightfully rejected by Russell, who helped me investigate
the issue in depth. It turned out that the root cause is a mistake in
how interrupts are wired in the SoC, a hardware bug that has been
replicated in all i.MX SoCs that integrate an stmmac. The only viable
solution is to disable EEE on those devices.
Individual patches explain the issue in more detail. Patch 1/2,
authored by Russell, adds a new STMMAC_FLAG to disable EEE, and patch
2/2 sets the flag for i.MX platforms.
Laurent Pinchart [Wed, 25 Mar 2026 21:00:03 +0000 (23:00 +0200)]
net: stmmac: imx: Disable EEE
The i.MX8MP suffers from an interrupt storm related to the stmmac and
EEE. A long and tedious analysis ([1]) concluded that the SoC wires the
stmmac lpi_intr_o signal to an OR gate along with the main dwmac
interrupts, which causes an interrupt storm for two reasons.
First, there's a race condition due to the interrupt deassertion being
synchronous to the RX clock domain:
- When the PHY exits LPI mode, it restarts generating the RX clock
(clk_rx_i input signal to the GMAC).
- The MAC detects exit from LPI, and asserts lpi_intr_o. This triggers
the ENET_EQOS interrupt.
- Before the CPU has time to process the interrupt, the PHY enters LPI
mode again, and stops generating the RX clock.
- The CPU processes the interrupt and reads the GMAC4_LPI_CTRL_STATUS
registers. This does not clear lpi_intr_o as there's no clk_rx_i.
An attempt was made to fix the issue by not stopping RX_CLK in Rx LPI
state ([2]). This alleviates the symptoms but doesn't fix the issue.
Since lpi_intr_o takes four RX_CLK cycles to clear, an interrupt storm
can still occur during that window. In 1000T mode this is harder to
notice, but slower receive clocks cause hundreds to thousands of
spurious interrupts.
Fix the issue by disabling EEE completely on i.MX8MP.
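The effect of the link speed on the four-cycle clearing window described
above can be estimated with a little arithmetic. The sketch below assumes
the usual (R)GMII/MII receive clock rates of 125 MHz, 25 MHz and 2.5 MHz
for 1000, 100 and 10 Mb/s; those rates and the helper names are
assumptions, not taken from the commit message, and any extra
synchronisation latency is ignored.

```c
#include <stdint.h>

/* RX_CLK cycles that lpi_intr_o needs in order to clear, per the text. */
#define LPI_CLEAR_CYCLES 4u

/*
 * Assumed receive clock rate in Hz for a given link speed in Mb/s.
 * Returns 0 for a speed this sketch does not know about.
 */
static inline uint64_t rx_clk_hz(unsigned int speed_mbps)
{
	switch (speed_mbps) {
	case 1000:
		return 125000000u;
	case 100:
		return 25000000u;
	case 10:
		return 2500000u;
	default:
		return 0;
	}
}

/*
 * Time in nanoseconds that the interrupt stays asserted after the status
 * register has been read, given a running receive clock.
 * Returns 0 for unknown speeds.
 */
static inline uint64_t lpi_clear_window_ns(unsigned int speed_mbps)
{
	uint64_t hz = rx_clk_hz(speed_mbps);

	return hz ? (LPI_CLEAR_CYCLES * UINT64_C(1000000000)) / hz : 0;
}
```

Under these assumptions the window is 32 ns at 1000 Mb/s, 160 ns at
100 Mb/s and 1600 ns at 10 Mb/s, which is in line with the storm being
harder to notice at gigabit speed.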
Some platforms have problems when EEE is enabled, and thus need a way
to disable stmmac EEE support. Add a flag before the other LPI related
flags which tells stmmac to avoid populating the phylink LPI
capabilities, which causes phylink to call phy_disable_eee() for any
PHY that is attached to the affected phylink instance.
iMX8MP is an example - the lpi_intr_o signal is wired to an OR gate
along with the main dwmac interrupts. Since lpi_intr_o is synchronous
to the receive clock domain, and takes four clock cycles to clear, this
leads to interrupt storms as the interrupt remains asserted for some
time after the LPI control and status register is read.
This problem becomes worse when the receive clock from the PHY stops
when the receive path enters LPI state - which means that lpi_intr_o
cannot deassert until the clock restarts. Since the LPI state of the
receive path depends on the link partner, this is out of our control.
We could disable RX clock stop at the PHY, but that doesn't get around
the slow-to-deassert lpi_intr_o mentioned in the above paragraph.
Previously, iMX8MP worked around this by disabling gigabit EEE, but
this is insufficient - the problem is also visible at 100M speeds,
where the receive clock is slower.
There is extensive discussion and investigation in the thread linked
below, the result of which is summarised in this commit message.
net: mana: Use at least SZ_4K in doorbell ID range check
mana_gd_ring_doorbell() accesses offsets up to DOORBELL_OFFSET_EQ
(0xFF8) + 8 bytes = 4KB within each doorbell page. A db_page_size
smaller than SZ_4K is fundamentally incompatible with the driver:
doorbell pages would overlap and the device cannot function correctly.
Validate db_page_size at the source and fail the
probe early if the value is below SZ_4K. This ensures the doorbell ID
range check in mana_gd_register_device() can rely on db_page_size
being valid.
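The size argument above can be sketched as follows. The offset and SZ_4K
values come from the text; DOORBELL_REG_SIZE, check_db_page_size() and the
-EPROTO return value are hypothetical names and choices made for the
illustration, not the driver's actual code.

```c
#include <errno.h>
#include <stdint.h>

#define SZ_4K			0x1000u
#define DOORBELL_OFFSET_EQ	0xFF8u	/* highest offset named in the text */
#define DOORBELL_REG_SIZE	8u	/* bytes accessed at that offset */

/*
 * Hypothetical helper: reject a doorbell page size that cannot hold the
 * access at DOORBELL_OFFSET_EQ. The error code is an assumption.
 */
static inline int check_db_page_size(uint32_t db_page_size)
{
	if (db_page_size < SZ_4K)
		return -EPROTO;

	return 0;
}
```

The last byte touched is at 0xFF8 + 8 = 0x1000, so any page smaller than
4KB would make accesses spill into the next doorbell page.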
Fixes: 89fe91c65992 ("net: mana: hardening: Validate doorbell ID from GDMA_REGISTER_DEVICE response") Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260325180423.1923060-1-ernis@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiri Pirko [Wed, 25 Mar 2026 15:28:01 +0000 (16:28 +0100)]
mlx5: shd: Gracefully avoid shared devlink creation when no usable SN is found
On some HW, not even fall-back "SERIALNO" is found in VPD data. Unify
the behavior with the case there are no VPD data at all and avoid
creation of shared devlink instance.
Fixes: 2a8c8a03f306 ("net/mlx5: Add a shared devlink instance for PFs on same chip") Reported-by: Adam Young <admiyo@amperemail.onmicrosoft.com> Closes: https://lore.kernel.org/all/bab5b6bc-aa42-4af1-80d1-e56bcef06bc2@amperemail.onmicrosoft.com/ Reported-by: Ben Copeland <ben.copeland@linaro.org> Closes: https://lore.kernel.org/all/20260324151014.860376-1-ben.copeland@linaro.org/ Signed-off-by: Jiri Pirko <jiri@nvidia.com> Tested-by: Ben Copeland <ben.copeland@linaro.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260325152801.236343-1-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
value changed: 0xffff888104a93a00 -> 0x0000000000000000
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 UID: 0 PID: 22632 Comm: syz.0.4135 Tainted: G W syzkaller #0
PREEMPT(full)
Tainted: [W]=WARN
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/24/2026
There is no race on ring accesses: reading/writing a partial pointer
would be fine, because the reading is done by the producer,
which merely cares about NULL/non-NULL.
Document and disable the warnings using data_race().
net: qrtr: fix endian handling of confirm_rx field
Convert confirm_rx to little endian when enqueueing and convert it back on
receive. This fixes control flow on big endian hosts; little endian hosts
are unaffected.
On transmit, store confirm_rx as __le32 using cpu_to_le32(). On receive,
apply le32_to_cpu() before using the value. !! ensures the value is 0 or 1
in native endianness, so the conversion isn’t strictly required here, but
it is kept for consistency and clarity.
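The round trip described above can be sketched in portable userspace C.
put_le32() and get_le32() are stand-ins for cpu_to_le32() and
le32_to_cpu(), written byte by byte so that they behave the same on any
host. All names are hypothetical, and applying !! on both the send and
the receive side is an assumption of this sketch.

```c
#include <stdint.h>

/* Store a 32-bit value in little endian byte order. */
static inline void put_le32(uint8_t out[4], uint32_t v)
{
	out[0] = (uint8_t)(v & 0xff);
	out[1] = (uint8_t)((v >> 8) & 0xff);
	out[2] = (uint8_t)((v >> 16) & 0xff);
	out[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Load a 32-bit value stored in little endian byte order. */
static inline uint32_t get_le32(const uint8_t in[4])
{
	return (uint32_t)in[0] |
	       ((uint32_t)in[1] << 8) |
	       ((uint32_t)in[2] << 16) |
	       ((uint32_t)in[3] << 24);
}

/* Sender: normalise the flag to 0 or 1, then store it little endian. */
static inline void encode_confirm_rx(uint8_t wire[4], int want_confirm)
{
	put_le32(wire, (uint32_t)!!want_confirm);
}

/* Receiver: convert back to host order before testing the flag. */
static inline int decode_confirm_rx(const uint8_t wire[4])
{
	return !!get_le32(wire);
}
```

Because the wire format is fixed, a flag of 1 always lands in the first
byte regardless of the host's byte order.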
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Alexander Wilhelm <alexander.wilhelm@westermo.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
net: stmmac: remove unused and unimplemented AXI properties
Commit afea03656add ("stmmac: rework DMA bus setting and introduce new
platform AXI structure") added support for parsing all the stmmac AXI
attributes, and added code to set most of the appropriate register bits
with three exceptions:
snps,kbbe
snps,mb
snps,rb
These were parsed by the driver, but the result of parsing was never
used by any of the cores.
Moreover, no DTS in the kernel makes use of these properties.
Thus, it doesn't make sense for the driver to parse these, so let's
remove them. Also remove them from the DT binding document.
====================
dt-bindings: remove unimplemented AXI snps,kbbe snps,mb and snps,rb
Remove the AXI snps,kbbe snps,mb and snps,rb properties as they have
not been used, and although the driver parses these, the code hasn't
ever used the parsed result. This parsing has now been removed.
These were introduced by commit afea03656add ("stmmac: rework DMA bus
setting and introduce new platform AXI structure").