Jakub Kicinski [Mon, 1 Sep 2025 21:12:08 +0000 (14:12 -0700)]
eth: fbnic: split fbnic_enable()
Factor out handling a single nv from fbnic_enable() to make
it reusable for queue ops. Use a __ prefix for the factored
out code. The real fbnic_nv_enable() which will include
fbnic_wrfl() will be added with the qops, to avoid unused
function warnings.
Jakub Kicinski [Mon, 1 Sep 2025 21:12:06 +0000 (14:12 -0700)]
eth: fbnic: split fbnic_disable()
Factor out handling a single nv from fbnic_disable() to make
it reusable for queue ops. Use a __ prefix for the factored
out code. The real fbnic_nv_disable() which will include
fbnic_wrfl() will be added with the qops, to avoid unused
function warnings.
Jakub Kicinski [Mon, 1 Sep 2025 21:12:05 +0000 (14:12 -0700)]
eth: fbnic: request ops lock
We'll add queue ops soon so. queue ops will opt the driver into
extra locking. Request this locking explicitly already to make
future patches smaller and easier to review.
Jakub Kicinski [Mon, 1 Sep 2025 21:12:04 +0000 (14:12 -0700)]
eth: fbnic: use netmem_ref where applicable
Use netmem_ref instead of struct page pointer in prep for
unreadable memory. fbnic has separate free buffer submission
queues for headers and for data. Refactor the helper which
returns page pointer for a submission buffer to take the
high level queue container, create a separate handler
for header and payload rings. This ties the "upcast" from
netmem to system page to use of sub0 which we know has
system pages.
Jakub Kicinski [Mon, 1 Sep 2025 21:12:02 +0000 (14:12 -0700)]
eth: fbnic: move xdp_rxq_info_reg() to resource alloc
Move rxq_info and mem model registration from fbnic_alloc_napi_vector()
and fbnic_alloc_nv_resources() to fbnic_alloc_rx_qt_resources().
The rxq_info is now registered later in the process, but that
should not cause any issues.
rxq_info lives in the fbnic_q_triad (qt) struct so qt init is a more
natural place. Encapsulating the logic in the qt functions will also
allow simplifying the cleanup in the NAPI related alloc functions
in the next commit.
Rx does not have a dedicated fbnic_free_rx_qt_resources(),
but we can use xdp_rxq_info_is_reg() to tell whether given
rxq_info was in use (effectively - if it's a qt for an Rx queue).
Having to pass nv into fbnic_alloc_rx_qt_resources() is not
great in terms of layering, but that's temporary, pp will
move soon..
Jakub Kicinski [Mon, 1 Sep 2025 21:12:01 +0000 (14:12 -0700)]
eth: fbnic: move page pool pointer from NAPI to the ring struct
In preparation for memory providers we need a closer association
between queues and page pools. We used to have a page pool at the
NAPI level to serve all associated queues but with MP the queues
under a NAPI may no longer be created equal.
The "ring" structure in fbnic is a descriptor ring. We have separate
"rings" for payload and header pages ("to device"), as well as a ring
for completions ("from device"). Technically we only need the page
pool pointers in the "to device" rings, so adding the pointer to
the ring struct is a bit wasteful. But it makes passing the structures
around much easier.
For now both "to device" rings store a pointer to the same
page pool. Using more than one queue per NAPI is extremely rare
so don't bother trying to share a single page pool between queues.
ipv6: sit: Add ipip6_tunnel_dst_find() for cleanup
Extract the dst lookup logic from ipip6_tunnel_xmit() into new helper
ipip6_tunnel_dst_find() to reduce code duplication and enhance readability.
No functional change intended.
On a x86_64, with allmodconfig object size is also reduced:
The current R-Car S4 rswitch driver only supports port based fowarding.
This patch set adds HW offloading for L2 switching/bridgeing. The driver
hooks into switchdev.
1. Rename the base driver file to keep the driver name (rswitch.ko)
2. Add setting of default MAC ageing time in hardware.
3. Add the L2 driver extension in a separate file. The HW offloading
is automatically configured when a port is added to the bridge device.
Usage example:
ip link add name br0 type bridge
ip link set dev tsn0 master br0
ip link set dev tsn1 master br0
ip link set dev br0 up
ip link set dev tsn0 up
ip link set dev tsn1 up
Layer 2 traffic is now fowarded by HW from port TSN0 to port TSN1.
4. Provides the functionality to set the MAC table ageing time in the
Rswitch.
Usage example:
ip link change dev br0 type bridge ageing 100
Michael Dege [Mon, 1 Sep 2025 04:58:07 +0000 (06:58 +0200)]
net: renesas: rswitch: add offloading for L2 switching
Add hardware offloading for L2 switching on R-Car S4.
On S4 brdev is limited to one per-device (not per port). Reasoning
is that hw L2 forwarding support lacks any sort of source port based
filtering, which makes it unusable to offload more than one bridge
device. Either you allow hardware to forward destination MAC to a
port, or you have to send it to CPU. You can't make it forward only
if src and dst ports are in the same brdev.
Michael Dege [Mon, 1 Sep 2025 04:58:05 +0000 (06:58 +0200)]
net: renesas: rswitch: rename rswitch.c to rswitch_main.c
Adding new functionality to the driver. Therefore splitting into multiple
c files to keep them manageable. New functionality will be added to
separate files.
Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: Michael Dege <michael.dege@renesas.com> Link: https://patch.msgid.link/20250901-add_l2_switching-v5-1-5f13e46860d5@renesas.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
====================
net: phy: micrel: Add PTP support for lan8842
The PTP block in lan8842 is the same as lan8814 so reuse all these
functions. The first patch of the series just does cosmetic changes such
that lan8842 can reuse the function lan8814_ptp_probe. There should not be
any functional changes here. While the second patch adds the PTP support
to lan8842.
====================
It has the same PTP IP block as lan8814, only the number of GPIOs is
different, all the other functionality is the same. So reuse the same
functions as lan8814 for lan8842.
There is a revision of lan8842 called lan8832 which doesn't have the PTP
IP block. So make sure in that case the PTP is not initialized.
net: phy: micrel: Introduce function __lan8814_ptp_probe_once
Introduce the function __lan8814_ptp_probe_once as this function will be
used also by lan8842 driver which has a different number of GPIOs
compared to lan8814. This change doesn't have any functional
changes.
Juraj Šarinay [Tue, 2 Sep 2025 11:36:28 +0000 (13:36 +0200)]
net: nfc: nci: Increase NCI_DATA_TIMEOUT to 3000 ms
An exchange with a NFC target must complete within NCI_DATA_TIMEOUT.
A delay of 700 ms is not sufficient for cryptographic operations on smart
cards. CardOS 6.0 may need up to 1.3 seconds to perform 256-bit ECDH
or 3072-bit RSA. To prevent brute-force attacks, passports and similar
documents introduce even longer delays into access control protocols
(BAC/PACE).
The timeout should be higher, but not too much. The expiration allows
us to detect that a NFC target has disappeared.
====================
net: stmmac: allow generation of flexible PPS relative to MAC time
When doing some testing on stm32mp2x platforms(MACv5), I noticed that
the command previously used with a MACv4 for genering a PPS signal:
echo "0 0 0 1 1" > /sys/class/ptp/ptp0/period
did not work.
This is because the arguments passed through this command must contain
the start time at which the PPS should be generated, relative to the
MAC system time. For some reason, a time set in the past seems to work
with a MACv4.
Because passing such an argument is tedious, consider that any time
set in the past is an offset regarding the MAC system time. This way,
this does not impact existing scripts and the past time use case is
handled. Edit: But maybe that's not important and we can just change
the default behavior to this.
Example to generate a flexible PPS signal that has a 1s period 3s
relative to when the command was entered:
drivers: net: stmmac: handle start time set in the past for flexible PPS
In case the time arguments used for flexible PPS signal generation are in
the past, consider the arguments to be a time offset relative to the MAC
system time.
This way, past time use case is handled and it avoids the tedious work
of passing an absolute time value for the flexible PPS signal generation
while not breaking existing scripts that may rely on this behavior.
Joy Zou [Mon, 1 Sep 2025 10:36:32 +0000 (18:36 +0800)]
net: stmmac: imx: add i.MX91 support
Add i.MX91 specific settings for EQoS.
Reviewed-by: Alexander Stein <alexander.stein@ew.tq-group.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Joy Zou <joy.zou@nxp.com> Link: https://patch.msgid.link/20250901103632.3409896-7-joy.zou@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 3 Sep 2025 23:06:44 +0000 (16:06 -0700)]
Merge tag 'nf-next-25-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Florian Westphal says:
====================
netfilter: updates for net-next
1) prefer vmalloc_array in ebtables, from Qianfeng Rong.
2) Use csum_replace4 instead of open-coding it, from Christophe Leroy.
3+4) Get rid of GFP_ATOMIC in transaction object allocations, those
cause silly failures with large sets under memory pressure, from
myself.
5) Remove test for AVX cpu feature in nftables pipapo set type,
testing for AVX2 feature is sufficient.
6) Unexport a few function in nf_reject infra: no external callers.
7) Extend payload offset to u16, this was restricted to values <=255
so far, from Fernando Fernandez Mancera.
* tag 'nf-next-25-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
netfilter: nft_payload: extend offset to 65535 bytes
netfilter: nf_reject: remove unneeded exports
netfilter: nft_set_pipapo: remove redundant test for avx feature bit
netfilter: nf_tables: all transaction allocations can now sleep
netfilter: nf_tables: allow iter callbacks to sleep
netfilter: nft_payload: Use csum_replace4() instead of opencoding
netfilter: ebtables: Use vmalloc_array() to improve code
====================
STMMAC on SoCFPGA uses exactly one interrupt in in-kernel DTS and common
snps,dwmac.yaml binding is flexible, so define precise constraint for
this device.
tools: ynl-gen: use macro for binary min-len check
This patch changes the generated min-len check for binary
attributes to use the NLA_POLICY_MIN_LEN() macro, thereby the
generated code supports strict policy validation.
With this change TypeBinary will always generate a NLA_BINARY
attribute policy.
This doesn't change any currently generated code, as it isn't
used in any specs currently used for generating code.
While updating the binary min-len implementation, I noticed that
the only user, should AFAICT be using exact-len instead.
In net/ipv4/fou_core.c FOU_ATTR_LOCAL_V6 and FOU_ATTR_PEER_V6
are only used for singular IPv6 addresses, and there are AFAICT
no known implementations trying to send more, it therefore
appears safe to change it to an exact-len policy.
This patch therefore changes the local-v6/peer-v6 attributes to
use an exact-len check, instead of a min-len check.
Accelerated Receive Flow Steering (aRFS) relies on sockets recording
their RX flow hash into the rps_sock_flow_table so that incoming packets
are steered to the CPU where the application runs.
With MPTCP, the application interacts with the parent MPTCP socket while
data is carried over per-subflow TCP sockets. Without recording these
subflows, aRFS cannot steer interrupts and RX processing for the flows
to the desired CPU.
Record all subflows in the RPS table by calling sock_rps_record_flow()
for each subflow at the start of mptcp_sendmsg(), mptcp_recvmsg() and
mptcp_stream_accept(), by using the new helper
mptcp_rps_record_subflows().
It does not by itself improve throughput, but ensures that IRQ and RX
processing are directed to the right CPU, which is a
prerequisite for effective aRFS.
Add a helper to check if RFS is needed or not. Allows to make the code a
bit cleaner and the next patch to have MPTCP use this helper to decide
whether or not to iterate over the subflows.
tun_flow_update() was calling sock_rps_record_flow_hash() regardless of
the state of rfs_needed. This was not really a bug as sock_flow_table
simply ends up being NULL and thus everything will be fine.
This commit here thus also implicitly makes tun_flow_update() respect
the state of rfs_needed.
Gang Yan [Tue, 2 Sep 2025 21:11:34 +0000 (23:11 +0200)]
selftests: mptcp: add checks for fallback counters
Recently, some mib counters about fallback has been added, this patch
provides a method to check the expected behavior of these mib counters
during the test execution.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/571 Signed-off-by: Gang Yan <yangang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250902-net-next-mptcp-misc-feat-6-18-v2-2-fa02bb3188b1@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
net: dsa: lantiq_gswip: prepare for supporting MaxLinear GSW1xx
Continue to prepare for supporting the newer standalone MaxLinear GSW1xx
switch family by extending the existing lantiq_gswip driver to allow it
to support MII interfaces and MDIO bus of the GSW1xx.
This series has been preceded by an RFC series which covers everything
needed to support the MaxLinear GSW1xx family of switches. Andrew Lunn
had suggested to split it into a couple of smaller series and start
with the changes which don't yet make actual functional changes or
support new features.
Everything has been compile and runtime tested on AVM Fritz!Box 7490
(GSWIP version 2.1, VR9 v1.2)
Daniel Golle [Sat, 30 Aug 2025 02:34:48 +0000 (03:34 +0100)]
net: dsa: lantiq_gswip: move MDIO bus registration to .setup()
Instead of registering the switch MDIO bus in the probe() function, move
the call to gswip_mdio() into the .setup() DSA switch op, so it can be
reused independently of the probe() function.
Daniel Golle [Sat, 30 Aug 2025 02:34:23 +0000 (03:34 +0100)]
net: dsa: lantiq_gswip: support standard MDIO node name
Instead of matching against the child node's compatible string also
support locating the node of the device tree node of the MDIO bus
in the standard way by referencing the node name ("mdio").
Daniel Golle [Sat, 30 Aug 2025 02:34:13 +0000 (03:34 +0100)]
net: dsa: lantiq_gswip: support offset of MII registers
The MaxLinear GSW1xx family got a single (R)(G)MII port at index 5 but
the registers MII_PCDU and MII_CFG are those of port 0.
Allow applying an offset for the port index to access those registers.
Daniel Golle [Sat, 30 Aug 2025 02:33:48 +0000 (03:33 +0100)]
net: dsa: lantiq_gswip: ignore SerDes modes in phylink_mac_config()
We can safely ignore SerDes interface modes 1000Base-X, 2500Base-X and
SGMII in phylink_mac_config() as they are being taken care of by the PCS
and the SGMII port anyway doesn't have MII_CFG and MII_PCDU registers
and hence gswip_phylink_mac_config() is already a no-op apart from
outputing a misleading error message.
Return early in case of SerDes interface modes to avoid printing that
error message.
Daniel Golle [Sat, 30 Aug 2025 02:33:00 +0000 (03:33 +0100)]
net: dsa: lantiq_gswip: support model-specific mac_select_pcs()
Call mac_select_pcs() function if provided in struct gswip_hwinfo.
The MaxLinear GSW1xx series got one port wired to a SerDes PCS and
PHY which can do 1000Base-X, 2500Base-X and SGMII. Support for the
SerDes port will be provided using phylink_pcs, so provide a
convenient way for mac_select_pcs() to differ based on the hardware
model.
Daniel Golle [Sat, 30 Aug 2025 02:32:42 +0000 (03:32 +0100)]
net: dsa: lantiq_gswip: move to dedicated folder
Move the lantiq_gswip driver to its own folder and update
MAINTAINERS file accordingly.
This is done ahead of extending the driver to support the MaxLinear
GSW1xx series of standalone switch ICs, which includes adding a bunch
of files.
ipv6: Add sanity checks on ipv6_devconf.rpl_seg_enabled
In ipv6_rpl_srh_rcv() we use min(net->ipv6.devconf_all->rpl_seg_enabled,
idev->cnf.rpl_seg_enabled) is intended to return 0 when either value is
zero, but if one of the values is negative it will in fact return non-zero.
net: macb: Validate the value of base_time properly
In macb_taprio_setup_replace(), the value of start_time is being
compared against zero which would never be true since start_time
is an unsigned value. Due to this there is a chance that an
incorrect config base time value can be used for computation.
Fix by checking the value of conf->base_time directly.
This issue was reported by static coverity analyzer.
Jakub Kicinski [Mon, 1 Sep 2025 17:31:39 +0000 (10:31 -0700)]
selftests: drv-net: rss_ctx: make the test pass with few queues
rss_ctx.test_rss_key_indir implicitly expects at least 5 queues,
as it checks that the traffic on first 2 queues is lower than
the remaining queues when we use all queues. Special case fewer
queues.
Jakub Kicinski [Mon, 1 Sep 2025 17:31:38 +0000 (10:31 -0700)]
selftests: drv-net: rss_ctx: use Netlink for timed reconfig
The rss_ctx test has gotten pretty flaky after I increased
the queue count in NIPA 2->3. Not 100% clear why. We get
a lot of failures in the rss_ctx.test_hitless_key_update case.
Looking closer it appears that the failures are mostly due
to startup costs. I measured the following timing for ethtool -X:
- python cmd(shell=True) : 150-250msec
- python cmd(shell=False) : 50- 70msec
- timed in bash : 45- 55msec
- YNL Netlink call : 2- 4msec
- .set_rxfh callback : 1- 2msec
The target in the test was set to 200msec. We were mostly measuring
ethtool startup cost it seems. Switch to YNL since it's 100x faster.
Lower the pass criteria to 150msec, no real science behind this number
but we removed some overhead, drivers which previously passed 200msec
should easily pass 150msec now.
Separately we should probably follow up on defaulting to shell=False,
when script doesn't explicitly ask for True, because the overhead
is rather significant.
Switch from _rss_key_rand() to random.randbytes(), YNL takes a binary
array rather than array of ints.
Jakub Kicinski [Sat, 30 Aug 2025 18:43:16 +0000 (11:43 -0700)]
selftests: drv-net: adjust tests before defaulting to shell=False
Clean up tests which expect shell=True without explicitly passing
that param to cmd(). There seems to be only one such case, and
in fact it's better converted to a direct write.
Eric Dumazet [Mon, 1 Sep 2025 09:26:07 +0000 (09:26 +0000)]
net_sched: add back BH safety to tcf_lock
Jamal reported that we had to use BH safety after all,
because stats can be updated from BH handler.
Fixes: 3133d5c15cb5 ("net_sched: remove BH blocking in eight actions") Fixes: 53df77e78590 ("net_sched: act_skbmod: use RCU in tcf_skbmod_dump()") Fixes: e97ae742972f ("net_sched: act_tunnel_key: use RCU in tunnel_key_dump()") Fixes: 48b5e5dbdb23 ("net_sched: act_vlan: use RCU in tcf_vlan_dump()") Reported-by: Jamal Hadi Salim <jhs@mojatatu.com> Closes: https://lore.kernel.org/netdev/CAM0EoMmhq66EtVqDEuNik8MVFZqkgxFbMu=fJtbNoYD7YXg4bA@mail.gmail.com/ Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/20250901092608.2032473-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: selftests: clean up tools/testing/selftests/net/lib/py/utils.py
This patch improves the utils.py module by removing unused imports
(errno, random), simplifying the fd_read_timeout() function by
eliminating unnecessary else clause, and cleaning up code style in the
defer class constructor.
Additionally, it renames the parameter in rand_port() from 'type' to
'stype' to avoid shadowing the built-in Python name 'type', improving
code clarity and preventing potential issues.
These changes enhance code readability and maintainability without
affecting functionality.
James Flowers [Mon, 1 Sep 2025 03:04:59 +0000 (20:04 -0700)]
net/smc: Replace use of strncpy on NUL-terminated string with strscpy
strncpy is deprecated for use on NUL-terminated strings, as indicated in
Documentation/process/deprecated.rst. strncpy NUL-pads the destination
buffer and doesn't guarantee the destination buffer will be NUL
terminated.
Signed-off-by: James Flowers <bold.zone2373@fastmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Reviewed-by: Mahanta Jambigi <mjambigi@linux.ibm.com> Link: https://patch.msgid.link/20250901030512.80099-1-bold.zone2373@fastmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jay Vosburgh [Sat, 30 Aug 2025 00:08:37 +0000 (17:08 -0700)]
bonding: Remove support for use_carrier
Remove the implementation of use_carrier, the link monitoring
method that utilizes ethtool or ioctl to determine the link state of an
interface in a bond. Bonding will always behaves as if use_carrier=1,
which relies on netif_carrier_ok() to determine the link state of
interfaces.
To avoid acquiring RTNL many times per second, bonding inspects
link state under RCU, but not under RTNL. However, ethtool
implementations in drivers may sleep, and therefore this strategy is
unsuitable for use with calls into driver ethtool functions.
The use_carrier option was introduced in 2003, to provide
backwards compatibility for network device drivers that did not support
the then-new netif_carrier_ok/on/off system. Device drivers are now
expected to support netif_carrier_*, and the use_carrier backwards
compatibility logic is no longer necessary.
The option itself remains, but when queried always returns 1,
and may only be set to 1.
netfilter: nft_payload: extend offset to 65535 bytes
In some situations 255 bytes offset is not enough to match or manipulate
the desired packet field. Increase the offset limit to 65535 or U16_MAX.
In addition, the nla policy maximum value is not set anymore as it is
limited to s16. Instead, the maximum value is checked during the payload
expression initialization function.
Tested with the nft command line tool.
table ip filter {
chain output {
@nh,2040,8 set 0xff
@nh,524280,8 set 0xff
@nh,524280,8 0xff
@nh,2040,8 0xff
}
}
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Florian Westphal <fw@strlen.de>
Florian Westphal [Sun, 24 Aug 2025 10:44:39 +0000 (12:44 +0200)]
netfilter: nft_set_pipapo: remove redundant test for avx feature bit
Sebastian points out that avx2 depends on avx, see check_cpufeature_deps()
in arch/x86/kernel/cpu/cpuid-deps.c:
avx2 feature bit will be cleared when avx isn't available.
No functional change intended.
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de>
Florian Westphal [Fri, 22 Aug 2025 08:15:37 +0000 (10:15 +0200)]
netfilter: nf_tables: allow iter callbacks to sleep
Quoting Sven Auhagen:
we do see on occasions that we get the following error message, more so on
x86 systems than on arm64:
Error: Could not process rule: Cannot allocate memory delete table inet filter
It is not a consistent error and does not happen all the time.
We are on Kernel 6.6.80, seems to me like we have something along the lines
of the nf_tables: allow clone callbacks to sleep problem using GFP_ATOMIC.
As hinted at by Sven, this is because of GFP_ATOMIC allocations during
set flush.
When set is flushed, all elements are deactivated. This triggers a set
walk and each element gets added to the transaction list.
The rbtree and rhashtable sets don't allow the iter callback to sleep:
rbtree walk acquires read side of an rwlock with bh disabled, rhashtable
walk happens with rcu read lock held.
Rbtree is simple enough to resolve:
When the walk context is ITER_READ, no change is needed (the iter
callback must not deactivate elements; we're not in a transaction).
When the iter type is ITER_UPDATE, the rwlock isn't needed because the
caller holds the transaction mutex, this prevents any and all changes to
the ruleset, including add/remove of set elements.
Rhashtable is slightly more complex.
When the iter type is ITER_READ, no change is needed, like rbtree.
For ITER_UPDATE, we hold transaction mutex which prevents elements from
getting free'd, even outside of rcu read lock section.
So build a temporary list of all elements while doing the rcu iteration
and then call the iterator in a second pass.
The disadvantage is the need to iterate twice, but this cost comes with
the benefit to allow the iter callback to use GFP_KERNEL allocations in
a followup patch.
The new list based logic makes it necessary to catch recursive calls to
the same set earlier.
Such walk -> iter -> walk recursion for the same set can happen during
ruleset validation in case userspace gave us a bogus (cyclic) ruleset
where verdict map m jumps to chain that sooner or later also calls
"vmap @m".
Before the new ->in_update_walk test, the ruleset is rejected because the
infinite recursion causes ctx->level to exceed the allowed maximum.
But with the new logic added here, elements would get skipped:
nft_rhash_walk_update would see elements that are on the walk_list of
an older stack frame.
As all recursive calls into same map results in -EMLINK, we can avoid this
problem by using the new in_update_walk flag and reject immediately.
Next patch converts the problematic GFP_ATOMIC allocations.
Reported-by: Sven Auhagen <Sven.Auhagen@belden.com> Closes: https://lore.kernel.org/netfilter-devel/BY1PR18MB5874110CAFF1ED098D0BC4E7E07BA@BY1PR18MB5874.namprd18.prod.outlook.com/ Signed-off-by: Florian Westphal <fw@strlen.de>
Qianfeng Rong [Sun, 17 Aug 2025 09:15:56 +0000 (17:15 +0800)]
netfilter: ebtables: Use vmalloc_array() to improve code
Remove array_size() calls and replace vmalloc() with vmalloc_array() to
simplify the code. vmalloc_array() is also optimized better, uses fewer
instructions, and handles overflow more concisely[1].
[1]: https://lore.kernel.org/lkml/abc66ec5-85a4-47e1-9759-2f60ab111971@vivo.com/ Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com> Signed-off-by: Florian Westphal <fw@strlen.de>
An mlx5 E-Switch FDB table can manage vports belonging to other sibling
physical functions, such as ECPF (ARM embedded cores) and Host PF (x86).
This enables a single source of truth for SDN software to manage network
pipelines from one host. While such functionality already exists in mlx5,
it is currently limited by static vport allocation,
meaning the number of vports shared between multi-host functions
must be known pre-boot.
This patchset enables delegated/external vports to be discovered
dynamically when switchdev mode is enabled, leveraging new firmware
capabilities for dynamic vport creation.
Adjacent functions that delegate their SR-IOV VFs to sibling PFs, can be
dynamically discovered on the sibling PF's switchdev mode enabling,
after sriov was enabled on the originating PF, allowing for more
flexible and scalable management in multi-host and ECPF-to-host
scenarios.
The patchset consists of the following changes:
- Refactoring of ACL root namespace handling: The storage of vport ACL root
namespaces is converted from a linear array to an xarray, allowing dynamic
creation of ACLs per individual vport.
- Improvements for vhca_id to vport mapping.
- Dynamic querying and creation of delegated functions/vports.
====================
net/mlx5: E-switch, Set representor attributes for adjacent VFs
Adjacent vfs get their devlink port information from firmware,
use the information (pfnum, function id) from FW when populating the
devlink port attributes.
Before:
$ devlink port show
pci/0000:00:03.0/180225: type eth netdev eth0 flavour pcivf controller 0 pfnum 0 vfnum 49152 external false splittable false
function:
hw_addr 00:00:00:00:00:00
After:
$ devlink port show
pci/0000:00:03.0/180225: type eth netdev enp0s3npf0vf2 flavour pcivf controller 0 pfnum 0 vfnum 2 external false splittable false
function:
hw_addr 00:00:00:00:00:00
Signed-off-by: Adithya Jayachandran <ajayachandra@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250829223722.900629-7-saeed@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Saeed Mahameed [Fri, 29 Aug 2025 22:37:20 +0000 (15:37 -0700)]
net/mlx5: E-Switch, Register representors for adjacent vports
Register representors for adjacent vports dynamically when they are
discovered. Dynamically added representors state will now be set to
'REGISTERED' when the representor type was already registered,
otherwise they won't be loaded.
net/mlx5: E-Switch, Add support for adjacent functions vports discovery
Adding driver support to query adjacent functions vports, AKA
delegated vports.
Adjacent functions can delegate their sriov vfs to other sibling PF in
the system, to be managed by the eswitch capable sibling PF.
E.g, ECPF to Host PF, multi host PF between each other, etc.
Only supported in switchdev mode.
Signed-off-by: Adithya Jayachandran <ajayachandra@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250829223722.900629-4-saeed@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Saeed Mahameed [Fri, 29 Aug 2025 22:37:17 +0000 (15:37 -0700)]
net/mlx5: E-Switch, Move vport acls root namespaces creation to eswitch
Move the loop that creates the vports ACLs root name spaces to eswitch,
since it is the eswitch responsibility to decide when and how many
vports ACLs root namespaces to create, in the next patches we will use
the fs_core vport ACL root namespace APIs to create/remove root ns
ACLs dynamically for dynamically created vports.
Saeed Mahameed [Fri, 29 Aug 2025 22:37:16 +0000 (15:37 -0700)]
net/mlx5: FS, Convert vport acls root namespaces to xarray
Before this patch it was a linear array and could only support a certain
number of vports, in the next patches, vport numbers are not bound to a
well known limit, thus convert acl root name space storage to xarray.
In addition create fs_core public API to add/remove vport acl namespaces
as it is the eswitch responsibility to create the vports and their
root name spaces for acls, in the next patch we will move
mlx5_fs_ingress_acls_{init,cleanup} to eswitch and will use
the individual mlx5_fs_vport_{egress,ingresS}_acl_ns_{add,remove}
APIs for dynamically create vports.
====================
Add NETC Timer PTP driver and add PTP support for i.MX95
This series adds NETC Timer PTP clock driver, which supports precise
periodic pulse, time capture on external pulse and PTP synchronization.
It also adds PTP support to the enetc v4 driver for i.MX95 and optimizes
the PTP-related code in the enetc driver.
====================
Wei Fang [Fri, 29 Aug 2025 05:06:15 +0000 (13:06 +0800)]
net: enetc: don't update sync packet checksum if checksum offload is used
For ENETC v4, the hardware has the capability to support Tx checksum
offload. so the enetc driver does not need to update the UDP checksum
of PTP sync packets if Tx checksum offload is enabled.
Wei Fang [Fri, 29 Aug 2025 05:06:14 +0000 (13:06 +0800)]
net: enetc: add PTP synchronization support for ENETC v4
Regarding PTP, ENETC v4 has some changes compared to ENETC v1 (LS1028A),
mainly as follows.
1. ENETC v4 uses a different PTP driver, so the way to get phc_index is
different from LS1028A. Therefore, enetc_get_ts_info() has been modified
appropriately to be compatible with ENETC v1 and v4.
2. The PMa_SINGLE_STEP register has changed in ENETC v4, not only the
register offset, but also some register fields. Therefore, two helper
functions are added, enetc_set_one_step_ts() for ENETC v1 and
enetc4_set_one_step_ts() for ENETC v4.
3. Since the generic helper functions from ptp_clock are used to get
the PHC index of the PTP clock, so FSL_ENETC_CORE depends on Kconfig
symbol "PTP_1588_CLOCK_OPTIONAL". But FSL_ENETC_CORE can only be
selected, so add the dependency to FSL_ENETC, FSL_ENETC_VF and
NXP_ENETC4. Perhaps the best approach would be to change FSL_ENETC_CORE
to a visible menu entry. Then make FSL_ENETC, FSL_ENETC_VF, and
NXP_ENETC4 depend on it, but this is not the goal of this patch, so this
may be changed in the future.
Wei Fang [Fri, 29 Aug 2025 05:06:13 +0000 (13:06 +0800)]
net: enetc: move sync packet modification before dma_map_single()
Move sync packet content modification before dma_map_single() to follow
correct DMA usage process, even though the previous sequence worked due
to hardware DMA-coherence support (LS1028A). But for the upcoming i.MX95,
its ENETC (v4) does not support "dma-coherent", so this step is very
necessary. Otherwise, the originTimestamp and correction fields of the
sent packets will still be the values before the modification.
The ENETC_F_RX_TSTAMP flag of priv->active_offloads can only be set when
CONFIG_FSL_ENETC_PTP_CLOCK is enabled. Similarly, rx_ring->ext_en can
only be set when CONFIG_FSL_ENETC_PTP_CLOCK is enabled as well. So it is
safe to remove unnecessary CONFIG_FSL_ENETC_PTP_CLOCK check.
Wei Fang [Fri, 29 Aug 2025 05:06:11 +0000 (13:06 +0800)]
net: enetc: extract enetc_update_ptp_sync_msg() to handle PTP Sync packets
Move PTP Sync packet processing from enetc_map_tx_buffs() to a new helper
function enetc_update_ptp_sync_msg() to simplify the original function.
Prepare for upcoming ENETC v4 one-step support. There is no functional
change. It is worth mentioning that ENETC_TXBD_TSTAMP is added to replace
0x3fffffff.
Wei Fang [Fri, 29 Aug 2025 05:06:10 +0000 (13:06 +0800)]
net: enetc: save the parsed information of PTP packet to skb->cb
Currently, the Tx PTP packets are parsed twice in the enetc driver, once
in enetc_xmit() and once in enetc_map_tx_buffs(). The latter is duplicate
and is unnecessary, since the parsed information can be saved to skb->cb
so that enetc_map_tx_buffs() can get the previously parsed data from
skb->cb. Therefore, add struct enetc_skb_cb as the format of the data
in the skb->cb buffer to save the parsed information of PTP packet. Use
saved information in enetc_map_tx_buffs() to avoid parsing data again.
In addition, rename variables offset1 and offset2 in enetc_map_tx_buffs()
to corr_off and tstamp_off for better readability.
F.S. Peng [Fri, 29 Aug 2025 05:06:08 +0000 (13:06 +0800)]
ptp: netc: add external trigger stamp support
The NETC Timer is capable of recording the timestamp on receipt of an
external pulse on a GPIO pin. It supports two such external triggers.
The recorded value is saved in a 16 entry FIFO accessed by
TMR_ETTSa_H/L. An interrupt can be generated when the trigger occurs,
when the FIFO reaches a threshold or overflows.
Wei Fang [Fri, 29 Aug 2025 05:06:07 +0000 (13:06 +0800)]
ptp: netc: add periodic pulse output support
NETC Timer has three pulse channels, all of which support periodic pulse
output. Bind the channel to a ALARM register and then sets a future time
into the ALARM register. When the current time is greater than the ALARM
value, the FIPER register will be triggered to count down, and when the
count reaches 0, the pulse will be triggered. The PPS signal is also
implemented in this way.
i.MX95 only has ALARM1 can be used as an indication to the FIPER start
down counting, but i.MX943 has ALARM1 and ALARM2 can be used. Therefore,
only one channel can work for i.MX95, two channels for i.MX943 as most.
In addition, change the PPS channel to be dynamically selected from fixed
number (0) because add PTP_CLK_REQ_PEROUT support.
Wei Fang [Fri, 29 Aug 2025 05:06:06 +0000 (13:06 +0800)]
ptp: netc: add PTP_CLK_REQ_PPS support
The NETC Timer is capable of generating a PPS interrupt to the host. To
support this feature, a 64-bit alarm time (which is a integral second
of PHC in the future) is set to TMR_ALARM, and the period is set to
TMR_FIPER. The alarm time is compared to the current time on each update,
then the alarm trigger is used as an indication to the TMR_FIPER starts
down counting. After the period has passed, the PPS event is generated.
According to the NETC block guide, the Timer has three FIPERs, any of
which can be used to generate the PPS events, but in the current
implementation, we only need one of them to implement the PPS feature,
so FIPER 0 is used as the default PPS generator. Also, the Timer has
2 ALARMs, currently, ALARM 0 is used as the default time comparator.
However, if the time is adjusted or the integer of period is changed when
PPS is enabled, the PPS event will not be generated at an integral second
of PHC. The suggested steps from IP team if time drift happens:
1. Disable FIPER before adjusting the hardware time
2. Rearm ALARM after the time adjustment to make the next PPS event be
generated at an integral second of PHC.
3. Re-enable FIPER.
Wei Fang [Fri, 29 Aug 2025 05:06:05 +0000 (13:06 +0800)]
ptp: netc: add NETC V4 Timer PTP driver support
NETC V4 Timer provides current time with nanosecond resolution, precise
periodic pulse, pulse on timeout (alarm), and time capture on external
pulse support. And it supports time synchronization as required for
IEEE 1588 and IEEE 802.1AS-2020.
Inside NETC, ENETC can capture the timestamp of the sent/received packet
through the PHC provided by the Timer and record it on the Tx/Rx BD. And
through the relevant PHC interfaces provided by the driver, the enetc V4
driver can support PTP time synchronization.
In addition, NETC V4 Timer is similar to the QorIQ 1588 timer, but it is
not exactly the same. The current ptp-qoriq driver is not compatible with
NETC V4 Timer, most of the code cannot be reused, see below reasons.
1. The architecture of ptp-qoriq driver makes the register offset fixed,
however, the offsets of all the high registers and low registers of V4
are swapped, and V4 also adds some new registers. so extending ptp-qoriq
to make it compatible with V4 Timer is tantamount to completely rewriting
ptp-qoriq driver.
2. The usage of some functions is somewhat different from QorIQ timer,
such as the setting of TCLK_PERIOD and TMR_ADD, the logic of configuring
PPS, etc., so making the driver compatible with V4 Timer will undoubtedly
increase the complexity of the code and reduce readability.
3. QorIQ is an expired brand. It is difficult for us to verify whether
it works stably on the QorIQ platforms if we refactor the driver, and
this will make maintenance difficult, so refactoring the driver obviously
does not bring any benefits.
Therefore, add this new driver for NETC V4 Timer. Note that the missing
features like PEROUT, PPS and EXTTS will be added in subsequent patches.
Wei Fang [Fri, 29 Aug 2025 05:06:04 +0000 (13:06 +0800)]
ptp: add helpers to get the phc_index by of_node or dev
Some Ethernet controllers do not have an integrated PTP timer function.
Instead, the PTP timer is a separated device and provides PTP hardware
clock to the Ethernet controller to use. Therefore, the Ethernet
controller driver needs to obtain the PTP clock's phc_index in its
ethtool_ops::get_ts_info(). Currently, most drivers implement this in
the following ways.
1. The PTP device driver adds a custom API and exports it to the Ethernet
controller driver.
2. The PTP device driver adds private data to its device structure. So
the private data structure needs to be exposed to the Ethernet controller
driver.
When registering the ptp clock, ptp_clock_register() always saves the
ptp_clock pointer to the private data of ptp_clock::dev. Therefore, as
long as ptp_clock::dev is obtained, the phc_index can be obtained. So
the following generic APIs can be added to the ptp driver to obtain the
phc_index.
1. ptp_clock_index_by_dev(): Obtain the phc_index by the device pointer
of the PTP device.
2.ptp_clock_index_by_of_node(): Obtain the phc_index by the of_node
pointer of the PTP device.
Also, we can add another API like ptp_clock_index_by_fwnode() to get the
phc_index by fwnode of PTP device. However, this API is not used in this
patch set, so it is better to add it when needed.
Wei Fang [Fri, 29 Aug 2025 05:06:03 +0000 (13:06 +0800)]
dt-bindings: net: move ptp-timer property to ethernet-controller.yaml
For some Ethernet controllers, the PTP timer function is not integrated.
Instead, the PTP timer is a separate device and provides PTP Hardware
Clock (PHC) to the Ethernet controller to use, such as NXP FMan MAC,
ENETC, etc. Therefore, a property is needed to indicate this hardware
relationship between the Ethernet controller and the PTP timer.
Since this use case is also very common, it is better to add a generic
property to ethernet-controller.yaml. According to the existing binding
docs, there are two good candidates, one is the "ptp-timer" defined in
fsl,fman-dtsec.yaml, and the other is the "ptimer-handle" defined in
fsl,fman.yaml. From the perspective of the name, the former is more
straightforward, so move the "ptp-timer" from fsl,fman-dtsec.yaml to
ethernet-controller.yaml.
Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20250829050615.1247468-3-wei.fang@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Wei Fang [Fri, 29 Aug 2025 05:06:02 +0000 (13:06 +0800)]
dt-bindings: ptp: add NETC Timer PTP clock
NXP NETC (Ethernet Controller) is a multi-function PCIe Root Complex
Integrated Endpoint (RCiEP), the Timer is one of its functions which
provides current time with nanosecond resolution, precise periodic
pulse, pulse on timeout (alarm), and time capture on external pulse
support. And also supports time synchronization as required for IEEE
1588 and IEEE 802.1AS-2020. So add device tree binding doc for the PTP
clock based on NETC Timer.
NETC Timer has three reference clock sources, but the clock mux is inside
the IP. Therefore, the driver will parse the clock name to select the
desired clock source. If the clocks property is not present, NETC Timer
will use the system clock of NETC IP as its reference clock. Because the
Timer is a PCIe function of NETC IP, the system clock of NETC is always
available to the Timer.
Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://patch.msgid.link/20250829050615.1247468-2-wei.fang@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Commit 3c7826d0b106 ("net: stmmac: Separate C22 and C45 transactions
for xgmac") missed a change that happened in commit e2d0acd40c87
("net: stmmac: using pm_runtime_resume_and_get instead of
pm_runtime_get_sync").
Update the two clause 45 functions that didn't get switched to
pm_runtime_resume_and_get().
Miroslav Lichvar [Thu, 28 Aug 2025 10:32:53 +0000 (12:32 +0200)]
ptp: Limit time setting of PTP clocks
Networking drivers implementing PTP clocks and kernel socket code
handling hardware timestamps use the 64-bit signed ktime_t type counting
nanoseconds. When a PTP clock reaches the maximum value in year 2262,
the timestamps returned to applications will overflow into year 1667.
The same thing happens when injecting a large offset with
clock_adjtime(ADJ_SETOFFSET).
The commit 7a8e61f84786 ("timekeeping: Force upper bound for setting
CLOCK_REALTIME") limited the maximum accepted value setting the system
clock to 30 years before the maximum representable value (i.e. year
2232) to avoid the overflow, assuming the system will not run for more
than 30 years.
Enforce the same limit for PTP clocks. Don't allow negative values and
values closer than 30 years to the maximum value. Drivers may implement
an even lower limit if the hardware registers cannot represent the whole
interval between years 1970 and 2262 in the required resolution.
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <jstultz@google.com> Cc: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20250828103300.1387025-1-mlichvar@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: ethernet: qualcomm: QCOM_PPE should depend on ARCH_QCOM
The Qualcomm Technologies, Inc. Packet Process Engine (PPE) is only
present on Qualcomm IPQ SoCs. Hence add a dependency on ARCH_QCOM, to
prevent asking the user about this driver when configuring a kernel
without Qualcomm platform support,
Eric Dumazet [Thu, 28 Aug 2025 19:58:23 +0000 (19:58 +0000)]
ipv4: start using dst_dev_rcu()
Change icmpv4_xrlim_allow(), ip_defrag() to prevent possible UAF.
Change ipmr_prepare_xmit(), ipmr_queue_fwd_xmit(), ip_mr_output(),
ipv4_neigh_lookup() to use lockdep enabled dst_dev_rcu().
Fixes: 4a6ce2b6f2ec ("net: introduce a new function dst_dev_put()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250828195823.3958522-9-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>