Ivan Vecera [Sat, 26 Jul 2025 18:41:45 +0000 (20:41 +0200)]
dpll: zl3073x: Fix build failure
If CONFIG_ZL3073X is enabled but both CONFIG_ZL3073X_I2C and
CONFIG_ZL3073X_SPI are disabled, the compilation may fail because
CONFIG_REGMAP is not enabled.
Fix the issue by selecting CONFIG_REGMAP when CONFIG_ZL3073X is enabled.
Jakub Kicinski [Sat, 26 Jul 2025 15:53:49 +0000 (08:53 -0700)]
selftests: bpf: fix legacy netfilter options
Recent commit to add NETFILTER_XTABLES_LEGACY missed setting
a couple of configs to y. They are still enabled but as modules
which appears to have upset BPF CI, e.g.:
test_bpf_nf_ct:FAIL:iptables-legacy -t raw -A PREROUTING -j CONNMARK --set-mark 42/0 unexpected error: 768 (errno 0)
Merge in late fixes to prepare for the 6.17 net-next PR.
Conflicts:
net/core/neighbour.c 1bbb76a89948 ("neighbour: Fix null-ptr-deref in neigh_flush_dev().") 13a936bb99fb ("neighbour: Protect tbl->phash_buckets[] with a dedicated mutex.") 03dc03fa0432 ("neighbor: Add NTF_EXT_VALIDATED flag for externally validated entries")
Adjacent changes:
drivers/net/usb/usbnet.c 0d9cfc9b8cb1 ("net: usbnet: Avoid potential RCU stall on LINK_CHANGE event") 2c04d279e857 ("net: usb: Convert tasklet API to new bottom half workqueue mechanism")
net/ipv6/route.c 31d7d67ba127 ("ipv6: annotate data-races around rt->fib6_nsiblings") 1caf27297215 ("ipv6: adopt dst_dev() helper") 3b3ccf9ed05e ("net: Remove unnecessary NULL check for lwtunnel_fill_encap()")
====================
ipv6: f6i->fib6_siblings and rt->fib6_nsiblings fixes
Series based on an internal syzbot report with a repro.
After fixing (in the first patch) the original minor issue,
I found that syzbot repro was able to trigger a second
more serious bug in rt6_nlmsg_size().
Code review then led to the two final patches.
I have not released the syzbot bug, because other issues
still need investigations.
====================
Eric Dumazet [Fri, 25 Jul 2025 14:07:24 +0000 (14:07 +0000)]
ipv6: fix possible infinite loop in fib6_info_uses_dev()
fib6_info_uses_dev() seems to rely on RCU without an explicit
protection.
Like the prior fix in rt6_nlmsg_size(),
we need to make sure fib6_del_route() or fib6_add_rt2node()
have not removed the anchor from the list, or we risk an infinite loop.
This is because fib6_del_route() and fib6_add_rt2node()
uses list_del_rcu(), which can confuse rcu readers,
because they might no longer see the head of the list.
Eric Dumazet [Fri, 25 Jul 2025 14:07:22 +0000 (14:07 +0000)]
ipv6: add a retry logic in net6_rt_notify()
inet6_rt_notify() can be called under RCU protection only.
This means the route could be changed concurrently
and rt6_fill_node() could return -EMSGSIZE.
Re-size the skb when this happens and retry, removing
one WARN_ON() that syzbot was able to trigger:
Fixes: 169fd62799e8 ("ipv6: Get rid of RTNL for SIOCADDRT and RTM_NEWROUTE.") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Link: https://patch.msgid.link/20250725140725.3626540-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Simon Horman [Fri, 25 Jul 2025 09:56:47 +0000 (10:56 +0100)]
net/sched: taprio: align entry index attr validation with mqprio
Both taprio and mqprio have code to validate respective entry index
attributes. The validation is indented to ensure that the attribute is
present, and that it's value is in range, and that each value is only
used once.
The purpose of this patch is to align the implementation of taprio with
that of mqprio as there seems to be no good reason for them to differ.
For one thing, this way, bugs will be present in both or neither.
As a follow-up some consideration could be given to a common function
used by both sch.
No functional change intended.
Except of tdc run: the results of the taprio tests
# ok 81 ba39 - Add taprio Qdisc to multi-queue device (8 queues)
# ok 82 9462 - Add taprio Qdisc with multiple sched-entry
# ok 83 8d92 - Add taprio Qdisc with txtime-delay
# ok 84 d092 - Delete taprio Qdisc with valid handle
# ok 85 8471 - Show taprio class
# ok 86 0a85 - Add taprio Qdisc to single-queue device
# ok 87 6f62 - Add taprio Qdisc with too short interval
# ok 88 831f - Add taprio Qdisc with too short cycle-time
# ok 89 3e1e - Add taprio Qdisc with an invalid cycle-time
# ok 90 39b4 - Reject grafting taprio as child qdisc of software taprio
# ok 91 e8a1 - Reject grafting taprio as child qdisc of offloaded taprio
# ok 92 a7bf - Graft cbs as child of software taprio
# ok 93 6a83 - Graft cbs as child of offloaded taprio
Alexander Stein [Fri, 25 Jul 2025 05:56:13 +0000 (07:56 +0200)]
net: fsl_pq_mdio: use dev_err_probe
Silence deferred probes using dev_err_probe(). This can happen when
the ethernet PHY uses an IRQ line attached to a i2c GPIO expander. If the
i2c bus is not yet ready, a probe deferral can occur.
Xiumei Mu [Fri, 25 Jul 2025 03:50:28 +0000 (11:50 +0800)]
selftests: rtnetlink.sh: remove esp4_offload after test
The esp4_offload module, loaded during IPsec offload tests, should
be reset to its default settings after testing.
Otherwise, leaving it enabled could unintentionally affect subsequence
test cases by keeping offload active.
Jason Xing [Wed, 23 Jul 2025 14:23:27 +0000 (22:23 +0800)]
igb: xsk: solve negative overflow of nb_pkts in zerocopy mode
There is no break time in the while() loop, so every time at the end of
igb_xmit_zc(), negative overflow of nb_pkts will occur, which renders
the return value always false. But theoretically, the result should be
set after calling xsk_tx_peek_release_desc_batch(). We can take
i40e_xmit_zc() as a good example.
Returning false means we're not done with transmission and we need one
more poll, which is exactly what igb_xmit_zc() always did before this
patch. After this patch, the return value depends on the nb_pkts value.
Two cases might happen then:
1. if (nb_pkts < budget), it means we process all the possible data, so
return true and no more necessary poll will be triggered because of
this.
2. if (nb_pkts == budget), it means we might have more data, so return
false to let another poll run again.
Fixes: f8e284a02afc ("igb: Add AF_XDP zero-copy Tx support") Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Link: https://patch.msgid.link/20250723142327.85187-3-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jason Xing [Wed, 23 Jul 2025 14:23:26 +0000 (22:23 +0800)]
stmmac: xsk: fix negative overflow of budget in zerocopy mode
A negative overflow can happen when the budget number of descs are
consumed. as long as the budget is decreased to zero, it will again go
into while (budget-- > 0) statement and get decreased by one, so the
overflow issue can happen. It will lead to returning true whereas the
expected value should be false.
In this case where all the budget is used up, it means zc function
should return false to let the poll run again because normally we
might have more data to process. Without this patch, zc function would
return true instead.
Fixes: 132c32ee5bc0 ("net: stmmac: Add TX via XDP zero-copy socket") Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Link: https://patch.msgid.link/20250723142327.85187-2-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tristram Ha [Fri, 25 Jul 2025 00:17:53 +0000 (17:17 -0700)]
net: dsa: microchip: Disable PTP function of KSZ8463
The PTP function of KSZ8463 is on by default. However, its proprietary
way of storing timestamp directly in a reserved field inside the PTP
message header is not suitable for use with the current Linux PTP stack
implementation. It is necessary to disable the PTP function to not
interfere the normal operation of the MAC.
Note the PTP driver for KSZ switches does not work for KSZ8463 and is not
activated for it.
Tristram Ha [Fri, 25 Jul 2025 00:17:52 +0000 (17:17 -0700)]
net: dsa: microchip: Setup fiber ports for KSZ8463
The fiber ports in KSZ8463 cannot be detected internally, so it requires
specifying that condition in the device tree. Like the one used in
Micrel PHY the port link can only be read and there is no write to the
PHY. The driver programs registers to operate fiber ports correctly.
Tristram Ha [Fri, 25 Jul 2025 00:17:49 +0000 (17:17 -0700)]
net: dsa: microchip: Add KSZ8463 switch support to KSZ DSA driver
KSZ8463 switch is a 3-port switch based from KSZ8863. Its major
difference from other KSZ SPI switches is its register access is not a
simple continual 8-bit transfer with automatic address increase but uses
a byte-enable mechanism specifying 8-bit, 16-bit, or 32-bit access. Its
registers are also defined in 16-bit format because it shares a design
with a MAC controller using 16-bit access. As a result some common
register accesses need to be re-arranged.
This patch adds the basic structure for using KSZ8463. It cannot use the
same regmap table for other KSZ switches as it interprets the 16-bit
value as little-endian and its SPI commands are different.
KSZ8463 uses a byte-enable mechanism to specify 8-bit, 16-bit, and 32-bit
access. The register is first shifted right by 2 then left by 4. Extra
4 bits are added. If the access is 8-bit one of the 4 bits is set. If
the access is 16-bit two of the 4 bits are set. If the access is 32-bit
all 4 bits are set. The SPI command for read or write is then added.
Because of this register transformation separate SPI read and write
functions are provided for KSZ8463.
KSZ8463's internal PHYs use standard PHY register definitions so there is
no need to remap things. However, the hardware has a bug that the high
word and low word of the PHY id are swapped. In addition the port
registers are arranged differently so KSZ8463 has its own mapping for
port registers and PHY registers. Therefore the PORT_CTRL_ADDR macro is
replaced with the get_port_addr helper function.
Currently, UDP exchange is prone to failure when cmd attempt to send data
while socat in bkg is not ready. Since, the behavior is probabilistic, this
can result in flakiness for XDP tests. While testing
test_xdp_native_tx_mb() on netdevsim, a failure rate of around 1% in 500
500 iterations was observed.
Use wait_port_listen() to ensure that the bkg socat is started and ready to
receive before cmd start sending. With proposed changes, a re-run of the
same test passed 100% of time.
====================
arm64: dts: socfpga: enable ethernet support for Agilex5
This patch set enables ethernet support for the Agilex5 family of SOCFPGAs,
and specifically enables gmac2 on the Agilex5 SOCFPGA Premium Development
Kit.
Patch 1 defines Agilex5 compatibility string in the device tree bindings.
Patch 2 add the new compatibility string to dwmac-socfpga.c.
====================
Jakub Kicinski [Fri, 25 Jul 2025 23:48:37 +0000 (16:48 -0700)]
Merge tag 'linux-can-fixes-for-6.16-20250725' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2025-07-25
The patch is by Stephane Grosjean and adds support the recent firmware
of USB CAN FD interfaces to the peak_usb driver.
* tag 'linux-can-fixes-for-6.16-20250725' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: peak_usb: fix USB FD devices potential malfunction
====================
The first patch is by Khaled Elnaggar and converts the janz-ican3
driver's fwinfo_show() to sysfs_emit().
Vincent Mailhol contributes 3 patches that first fix a warning in the
ti_hecc driver and then add missing COMPILE_TEST more compile
coverage to the ti_hecc and tscan1 driver.
Randy Dunlap's patch let's the tscan1 driver depend on PC104.
A patch by Luis Felipe Hernandez fixes a kernel-doc error in the
ctucanfd driver.
Jimmy Assarsson contributes 10 patches for the kvaser_pciefd and 11
for the kvaser_usb driver. Both series simplify the identification of
physical the CAN interfaces and add devlink support to get information
about the running firmware.
* tag 'linux-can-next-for-6.17-20250725' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: (27 commits)
Documentation: devlink: add devlink documentation for the kvaser_usb driver
can: kvaser_usb: Add devlink port support
can: kvaser_usb: Expose device information via devlink info_get()
can: kvaser_usb: Add devlink support
can: kvaser_usb: Store additional device information
can: kvaser_usb: Store the different firmware version components in a struct
can: kvaser_usb: Move comment regarding max_tx_urbs
can: kvaser_usb: Add intermediate variables
can: kvaser_usb: Assign netdev.dev_port based on device channel index
can: kvaser_usb: Add support for ethtool set_phys_id()
can: kvaser_usb: Add support to control CAN LEDs on device
Documentation: devlink: add devlink documentation for the kvaser_pciefd driver
can: kvaser_pciefd: Add devlink port support
can: kvaser_pciefd: Expose device firmware version via devlink info_get()
can: kvaser_pciefd: Add devlink support
can: kvaser_pciefd: Split driver into C-file and header-file.
can: kvaser_pciefd: Store device channel index
can: kvaser_pciefd: Store the different firmware version components in a struct
can: kvaser_pciefd: Add intermediate variable for device struct in probe()
can: kvaser_pciefd: Add support for ethtool set_phys_id()
...
====================
It is a prework to allow reusing some specific Intel code (eq. fwlog).
Move common *_aq_desc structure to libie header and changing
it in ice, ixgbe, i40e and iavf.
Only generic adminq commands can be easily moved to common header, as
rest is slightly different. Format remains the same. It will be better
to correctly move it when it will be needed to commonize other part of
the code.
Move *_aq_str() to new libie module (libie_adminq) and use it across
drivers. The functions are exactly the same in each driver. Some more
adminq helpers/functions can be moved to libie_adminq when needed.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
i40e: use libie_aq_str
iavf: use libie_aq_str
ice: use libie_aq_str
libie: add adminq helper for converting err to str
iavf: use libie adminq descriptors
i40e: use libie adminq descriptors
ixgbe: use libie adminq descriptors
ice, libie: move generic adminq descriptors to lib
====================
There are cases in networking (e.g. wireguard, sctp) where a union is
used to provide coverage for either IPv4 or IPv6 network addresses,
and they include an embedded "struct sockaddr" as well (for "sa_family"
and raw "sa_data" access). The current struct sockaddr contains a
flexible array, which means these unions should not be further embedded
in other structs because they do not technically have a fixed size (and
are generating warnings for the coming -Wflexible-array-not-at-end flag
addition). But the future changes to make struct sockaddr a fixed size
(i.e. with a 14 byte sa_data member) make the "sa_data" uses with an IPv6
address a potential place for the compiler to get upset about object size
mismatches. Therefore, we need a sockaddr that cleanly provides both an
sa_family member and an appropriately fixed-sized sa_data member that does
not bloat member usage via the potential alternative of sockaddr_storage
to cover both IPv4 and IPv6, to avoid unseemly churn in the affected code
bases.
Introduce sockaddr_inet as a unified structure for holding both IPv4 and
IPv6 addresses (i.e. large enough to accommodate sockaddr_in6).
The structure is defined in linux/in6.h since its max size is sized
based on sockaddr_in6 and provides a more specific alternative to the
generic sockaddr_storage for IPv4 with IPv6 address family handling.
The "sa_family" member doesn't use the sa_family_t type to avoid needing
layer violating header inclusions.
Also includes the replacements for wireguard and sctp.
====================
There are cases in networking (e.g. wireguard, sctp) where a union is
used to provide coverage for either IPv4 or IPv6 network addresses,
and they include an embedded "struct sockaddr" as well (for "sa_family"
and raw "sa_data" access). The current struct sockaddr contains a
flexible array, which means these unions should not be further embedded
in other structs because they do not technically have a fixed size (and
are generating warnings for the coming -Wflexible-array-not-at-end flag
addition). But the future changes to make struct sockaddr a fixed size
(i.e. with a 14 byte sa_data member) make the "sa_data" uses with an IPv6
address a potential place for the compiler to get upset about object size
mismatches. Therefore, we need a sockaddr that cleanly provides both an
sa_family member and an appropriately fixed-sized sa_data member that does
not bloat member usage via the potential alternative of sockaddr_storage
to cover both IPv4 and IPv6, to avoid unseemly churn in the affected code
bases.
Introduce sockaddr_inet as a unified structure for holding both IPv4 and
IPv6 addresses (i.e. large enough to accommodate sockaddr_in6).
The structure is defined in linux/in6.h since its max size is sized
based on sockaddr_in6 and provides a more specific alternative to the
generic sockaddr_storage for IPv4 with IPv6 address family handling.
The "sa_family" member doesn't use the sa_family_t type to avoid needing
layer violating header inclusions.
net/mlx5e: Support routed networks during IPsec MACs initialization
Remote IPsec tunnel endpoint may refer to a network segment that is
not directly connected to the host. In such a case, IPsec tunnel
endpoints are connected to a router and reachable via a routing path.
In IPsec packet offload mode, HW is initialized with the MAC address
of both IPsec tunnel endpoints.
Extend the current IPsec init MACs procedure to resolve nexthop for
routed networks. Direct neighbour lookup and probe is still used
for directly connected networks and as a fallback mechanism if fib
lookup fails.
Signed-off-by: Alexandre Cassen <acassen@corp.free.fr> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/1753194228-333722-2-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
net: dsa: b53: mmap: Add bcm63xx EPHY power control
The gpio controller on some bcm63xx SoCs has a register for
controlling functionality of the internal fast ethernet phys.
These patches allow the b53 driver to enable/disable phy
power.
The register also contains reset bits which will be set by
a reset driver in another patch series:
https://lore.kernel.org/all/20250715234605.36216-1-kylehendrydev@gmail.com/
net: dsa: b53: mmap: Add syscon reference and register layout for bcm63268
On bcm63xx SoCs there are registers that control the PHYs in
the GPIO controller. Allow the b53 driver to access them
by passing in the syscon through the device tree.
Add a structure to describe the ephy control register
and add register info for bcm63268.
Commit 21b688dabecb ("net: phy: micrel: Cable Diag feature for lan8814
phy") introduced cable_test support for the LAN8814 that reuses parts of
the KSZ886x logic and introduced the cable_diag_reg and pair_mask
parameters to account for differences between those chips.
However, it did not update the ksz8081_type struct, so those members are
now 0, causing no pairs to be tested in ksz886x_cable_test_get_status
and ksz886x_cable_test_wait_for_completion to poll the wrong register
for the affected PHYs (Basic Control/Reset, which is 0 in normal
operation) and exit immediately.
Fix this by setting both struct members accordingly.
Tristram Ha [Wed, 23 Jul 2025 03:04:03 +0000 (20:04 -0700)]
net: dsa: microchip: Fix wrong rx drop MIB counter for KSZ8863
When KSZ8863 support was first added to KSZ driver the RX drop MIB
counter was somehow defined as 0x105. The TX drop MIB counter
starts at 0x100 for port 1, 0x101 for port 2, and 0x102 for port 3, so
the RX drop MIB counter should start at 0x103 for port 1, 0x104 for
port 2, and 0x105 for port 3.
There are 5 ports for KSZ8895, so its RX drop MIB counter starts at
0x105.
Fixes: 4b20a07e103f ("net: dsa: microchip: ksz8795: add support for ksz88xx chips") Signed-off-by: Tristram Ha <tristram.ha@microchip.com> Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20250723030403.56878-1-Tristram.Ha@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gabriel Goller [Tue, 22 Jul 2025 08:18:45 +0000 (10:18 +0200)]
ipv6: add `force_forwarding` sysctl to enable per-interface forwarding
It is currently impossible to enable ipv6 forwarding on a per-interface
basis like in ipv4. To enable forwarding on an ipv6 interface we need to
enable it on all interfaces and disable it on the other interfaces using
a netfilter rule. This is especially cumbersome if you have lots of
interfaces and only want to enable forwarding on a few. According to the
sysctl docs [0] the `net.ipv6.conf.all.forwarding` enables forwarding
for all interfaces, while the interface-specific
`net.ipv6.conf.<interface>.forwarding` configures the interface
Host/Router configuration.
Introduce a new sysctl flag `force_forwarding`, which can be set on every
interface. The ip6_forwarding function will then check if the global
forwarding flag OR the force_forwarding flag is active and forward the
packet.
To preserve backwards-compatibility reset the flag (on all interfaces)
to 0 if the net.ipv6.conf.all.forwarding flag is set to 0.
Add a short selftest that checks if a packet gets forwarded with and
without `force_forwarding`.
Simon Horman [Thu, 24 Jul 2025 13:10:54 +0000 (14:10 +0100)]
octeontx2-af: use unsigned int as iterator for unsigned values
The local variable i is used to iterate over unsigned
values. The lower bound of the loop is set to 0. While
the upper bound is cgx->lmac_count, where they lmac_count is
an u8. So the theoretical upper bound is 255.
As is, GCC can't see this range of values and warns that
a formatted string, which includes the %d representation of i,
may overflow the buffer provided.
GCC 15.1.0 says:
.../cgx.c: In function 'cgx_lmac_init':
.../cgx.c:1737:49: warning: '%d' directive writing between 1 and 11 bytes into a region of size between 4 and 6 [-Wformat-overflow=]
1737 | sprintf(lmac->name, "cgx_fwi_%d_%d", cgx->cgx_id, i);
| ^~
.../cgx.c:1737:37: note: directive argument in the range [-2147483641, 254]
1737 | sprintf(lmac->name, "cgx_fwi_%d_%d", cgx->cgx_id, i);
| ^~~~~~~~~~~~~~~
.../cgx.c:1737:17: note: 'sprintf' output between 12 and 24 bytes into a destination of size 16
1737 | sprintf(lmac->name, "cgx_fwi_%d_%d", cgx->cgx_id, i);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Empirically, changing the type of i from (signed) int to unsigned int
addresses this problem. I assume by allowing GCC to see the range of
values described above.
Also update the format specifiers for the integer values in the string
in question from %d to %u. This seems appropriate as they are now both
unsigned.
No functional change intended.
Compile tested only.
Paolo Abeni [Wed, 23 Jul 2025 14:32:23 +0000 (16:32 +0200)]
mptcp: track fallbacks accurately via mibs
Add the mibs required to cover the few possible fallback causes still
lacking suck info.
Move the relevant mib increment into the fallback helper, so that no
eventual future fallback operation will miss a paired mib increment.
Additionally track failed fallback via its own mib, such mib is
incremented only when a fallback mandated by the protocol fails - due to
racing subflow creation.
While at the above, rename an existing helper to reduce long lines
problems all along.
selftests: net: Skip test if IPv6 is not configured
Extend the `check_for_dependencies()` function in `lib_netcons.sh` to check
whether IPv6 is enabled by verifying the existence of
`/proc/net/if_inet6`. Having IPv6 is a now a dependency of netconsole
tests. If the file does not exist, the script will skip the test with an
appropriate message suggesting to verify if `CONFIG_IPV6` is enabled.
This prevents the test to misbehave if IPv6 is not configured.
Yi Cong [Thu, 24 Jul 2025 01:31:33 +0000 (09:31 +0800)]
usbnet: Set duplex status to unknown in the absence of MII
Currently, USB CDC devices that do not use MDIO to get link status have
their duplex mode set to half-duplex by default. However, since the CDC
specification does not define a duplex status, this can be misleading.
This patch changes the default to DUPLEX_UNKNOWN in the absence of MII,
which more accurately reflects the state of the link and avoids implying
an incorrect or error state.
Which is similar to recent syzkaller reports in [0] and [1] and triggers
because macsec does not advertise IFF_UNICAST_FLT although it has proper
ndo_set_rx_mode callback that takes care of pushing uc/mc addresses
down to the real device.
In general, dev_uc_add call path is problematic for stacking
non-IFF_UNICAST_FLT because we might grab netdev instance lock under
addr_list_lock spinlock, so this is not a systemic fix.
John Ernberg [Wed, 23 Jul 2025 10:25:35 +0000 (10:25 +0000)]
net: usbnet: Avoid potential RCU stall on LINK_CHANGE event
The Gemalto Cinterion PLS83-W modem (cdc_ether) is emitting confusing link
up and down events when the WWAN interface is activated on the modem-side.
Interrupt URBs will in consecutive polls grab:
* Link Connected
* Link Disconnected
* Link Connected
Where the last Connected is then a stable link state.
When the system is under load this may cause the unlink_urbs() work in
__handle_link_change() to not complete before the next usbnet_link_change()
call turns the carrier on again, allowing rx_submit() to queue new SKBs.
In that event the URB queue is filled faster than it can drain, ending up
in a RCU stall:
rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 0-.... } 33108 jiffies s: 201 root: 0x1/.
rcu: blocking rcu_node structures (internal RCU debug):
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
net: hibmcge: support for statistics of reset failures
Add a statistical item to count the number of reset failures.
This statistical item can be queried using ethtool -S or
reported through diagnose information.
Shahar Shitrit [Wed, 23 Jul 2025 07:44:32 +0000 (10:44 +0300)]
net/mlx5e: Fix potential deadlock by deferring RX timeout recovery
mlx5e_reporter_rx_timeout() is currently invoked synchronously
in the driver's open error flow. This causes the thread holding
priv->state_lock to attempt acquiring the devlink lock, which
can result in a circular dependency with other devlink operations.
For example:
- Devlink health diagnose flow:
- __devlink_nl_pre_doit() acquires the devlink lock.
- devlink_nl_health_reporter_diagnose_doit() invokes the
driver's diagnose callback.
- mlx5e_rx_reporter_diagnose() then attempts to acquire
priv->state_lock.
- Driver open flow:
- mlx5e_open() acquires priv->state_lock.
- If an error occurs, devlink_health_reporter may be called,
attempting to acquire the devlink lock.
To prevent this circular locking scenario, defer the RX timeout
recovery by scheduling it via a workqueue. This ensures that the
recovery work acquires locks in a consistent order: first the
devlink lock, then priv->state_lock.
Additionally, make the recovery work acquire the netdev instance
lock to safely synchronize with the open/close channel flows,
similar to mlx5e_tx_timeout_work. Repeatedly attempt to acquire
the netdev instance lock until it is taken or the target RQ is no
longer active, as indicated by the MLX5E_STATE_CHANNELS_ACTIVE bit.
Fixes: 32c57fb26863 ("net/mlx5e: Report and recover from rx timeout") Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1753256672-337784-4-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jianbo Liu [Wed, 23 Jul 2025 07:44:31 +0000 (10:44 +0300)]
net/mlx5e: Remove skb secpath if xfrm state is not found
Hardware returns a unique identifier for a decrypted packet's xfrm
state, this state is looked up in an xarray. However, the state might
have been freed by the time of this lookup.
Currently, if the state is not found, only a counter is incremented.
The secpath (sp) extension on the skb is not removed, resulting in
sp->len becoming 0.
Subsequently, functions like __xfrm_policy_check() attempt to access
fields such as xfrm_input_state(skb)->xso.type (which dereferences
sp->xvec[sp->len - 1]) without first validating sp->len. This leads to
a crash when dereferencing an invalid state pointer.
This patch prevents the crash by explicitly removing the secpath
extension from the skb if the xfrm state is not found after hardware
decryption. This ensures downstream functions do not operate on a
zero-length secpath.
net/mlx5e: Clear Read-Only port buffer size in PBMC before update
When updating the PBMC register, we read its current value,
modify desired fields, then write it back.
The port_buffer_size field within PBMC is Read-Only (RO).
If this RO field contains a non-zero value when read,
attempting to write it back will cause the entire PBMC
register update to fail.
This commit ensures port_buffer_size is explicitly cleared
to zero after reading the PBMC register but before writing
back the modified value.
This allows updates to other fields in the PBMC register to succeed.
Yi Chen [Thu, 24 Jul 2025 08:06:53 +0000 (16:06 +0800)]
selftests: netfilter: ipvs.sh: Explicity disable rp_filter on interface tunl0
Although setup_ns() set net.ipv4.conf.default.rp_filter=0,
loading certain module such as ipip will automatically create a tunl0 interface
in all netns including new created ones. In the script, this is before than
default.rp_filter=0 applied, as a result tunl0.rp_filter remains set to 1
which causes the test report FAIL when ipip module is preloaded.
Before fix:
Testing DR mode...
Testing NAT mode...
Testing Tunnel mode...
ipvs.sh: FAIL
After fix:
Testing DR mode...
Testing NAT mode...
Testing Tunnel mode...
ipvs.sh: PASS
Fixes: 7c8b89ec506e ("selftests: netfilter: remove rp_filter configuration") Signed-off-by: Yi Chen <yiche@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: nft_set_pipapo: prefer kvmalloc for scratch maps
The scratchmap size depends on the number of elements in the set.
For huge sets, each scratch map can easily require very large
allocations, e.g. for 100k entries each scratch map will require
close to 64kbyte of memory.
netfilter: nft_set: remove indirection from update API call
This stems from a time when sets and nft_dynset resided in different kernel
modules. We can replace this with a direct call.
We could even remove both ->update and ->delete, given its only
supported by rhashtable, but on the off-chance we'll see runtime
add/delete for other types or a new set type keep that as-is for now.
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Tue, 8 Jul 2025 13:04:02 +0000 (15:04 +0200)]
netfilter: nfnetlink_hook: Dump flowtable info
Introduce NFNL_HOOK_TYPE_NFT_FLOWTABLE to distinguish flowtable hooks
from base chain ones. Nested attributes are shared with the old NFTABLES
hook info type since they fit apart from their misleading name.
Old nftables in user space will ignore this new hook type and thus
continue to print flowtable hooks just like before, e.g.:
Phil Sutter [Tue, 8 Jul 2025 13:04:01 +0000 (15:04 +0200)]
netfilter: nfnetlink: New NFNLA_HOOK_INFO_DESC helper
Introduce a helper routine adding the nested attribute for use by a
second caller later.
Note how this introduces cancelling of 'nest2' for categorical reasons.
Since always followed by cancelling of the outer 'nest', it is
technically not needed.
Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
ipvs: Rename del_timer in comment in ip_vs_conn_expire_now()
Commit 8fa7292fee5c ("treewide: Switch/rename to timer_delete[_sync]()")
switched del_timer to timer_delete, but did not modify the comment for
ip_vs_conn_expire_now(). Now fix it.
Some specified options rely on NETFILTER_XTABLES_LEGACY to be enabled.
IP_NF_TARGET_TTL for instance depends on IP_NF_MANGLE which in turn
depends on IP_NF_IPTABLES_LEGACY -> NETFILTER_XTABLES_LEGACY.
Enable relevant iptables config options explicitly, this is needed
to avoid breakage when symbols related to iptables-legacy
will depend on NETFILTER_LEGACY resp. IP_TABLES_LEGACY.
This also means that the classic tables (Kernel modules) will
not be enabled by default, so enable them too.
Signed-off-by: Florian Westphal <fw@strlen.de>
[bigeasy: Split out the config bits from the main patch] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The seqcount xt_recseq is used to synchronize the replacement of
xt_table::private in xt_replace_table() against all readers such as
ipt_do_table()
To ensure that there is only one writer, the writing side disables
bottom halves. The sequence counter can be acquired recursively. Only the
first invocation modifies the sequence counter (signaling that a writer
is in progress) while the following (recursive) writer does not modify
the counter.
The lack of a proper locking mechanism for the sequence counter can lead
to live lock on PREEMPT_RT if the high prior reader preempts the
writer. Additionally if the per-CPU lock on PREEMPT_RT is removed from
local_bh_disable() then there is no synchronisation for the per-CPU
sequence counter.
The affected code is "just" the legacy netfilter code which is replaced
by "netfilter tables". That code can be disabled without sacrificing
functionality because everything is provided by the newer
implementation. This will only requires the usage of the "-nft" tools
instead of the "-legacy" ones.
The long term plan is to remove the legacy code so lets accelerate the
progress.
Relax dependencies on iptables legacy, replace select with depends on,
this should cause no harm to existing kernel configs and users can still
toggle IP{6}_NF_IPTABLES_LEGACY in any case.
Make EBTABLES_LEGACY, IPTABLES_LEGACY and ARPTABLES depend on
NETFILTER_XTABLES_LEGACY. Hide xt_recseq and its users,
xt_register_table() and xt_percpu_counter_alloc() behind
NETFILTER_XTABLES_LEGACY. Let NETFILTER_XTABLES_LEGACY depend on
!PREEMPT_RT.
This will break selftest expecing the legacy options enabled and will be
addressed in a following patch.
Co-developed-by: Florian Westphal <fw@strlen.de> Co-developed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Lance Yang [Mon, 26 May 2025 08:59:02 +0000 (16:59 +0800)]
netfilter: load nf_log_syslog on enabling nf_conntrack_log_invalid
When no logger is registered, nf_conntrack_log_invalid fails to log invalid
packets, leaving users unaware of actual invalid traffic. Improve this by
loading nf_log_syslog, similar to how 'iptables -I FORWARD 1 -m conntrack
--ctstate INVALID -j LOG' triggers it.
Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Zi Li <zi.li@linux.dev> Signed-off-by: Lance Yang <lance.yang@linux.dev> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Merge patch series "can: kvaser_usb: Simplify identification of physical CAN interfaces"
Jimmy Assarsson <extja@kvaser.com> says:
This patch series simplifies the process of identifying which network
interface (can0..canX) corresponds to which physical CAN channel on
Kvaser USB based CAN interfaces.
Note that this patch series is based on [1]
"can: kvaser_pciefd: Simplify identification of physical CAN interfaces"
Changes in v3:
- Fix GCC compiler array warning (-Warray-bounds)
- Fix transient Sparse warning
- Add tag Reviewed-by Vincent Mailhol
Changes in v2:
- New patch with devlink documentation
- New patch assigning netdev.dev_port
- Formatting and refactoring
Jimmy Assarsson [Fri, 25 Jul 2025 12:34:52 +0000 (14:34 +0200)]
Documentation: devlink: add devlink documentation for the kvaser_usb driver
List the version information reported by the kvaser_usb driver
through devlink.
Suggested-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Jimmy Assarsson <extja@kvaser.com> Link: https://patch.msgid.link/20250725123452.41-12-extja@kvaser.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Jimmy Assarsson [Fri, 25 Jul 2025 12:34:51 +0000 (14:34 +0200)]
can: kvaser_usb: Add devlink port support
Register each CAN channel of the device as an devlink physical port.
This makes it easier to get device information for a given network
interface (i.e. can2).
Example output:
$ devlink dev
usb/1-1.3:1.0
$ devlink port
usb/1-1.3:1.0/0: type eth netdev can0 flavour physical port 0 splittable false
usb/1-1.3:1.0/1: type eth netdev can1 flavour physical port 1 splittable false
$ devlink port show can1
usb/1-1.3:1.0/1: type eth netdev can1 flavour physical port 0 splittable false
$ devlink dev info
usb/1-1.3:1.0:
driver kvaser_usb
serial_number 1020
versions:
fixed:
board.rev 1
board.id 7330130009653
running:
fw 3.22.527
$ ethtool -i can1
driver: kvaser_usb
version: 6.12.10-arch1-1
firmware-version: 3.22.527
expansion-rom-version:
bus-info: 1-1.3:1.0
supports-statistics: no
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no
Jimmy Assarsson [Fri, 25 Jul 2025 12:34:47 +0000 (14:34 +0200)]
can: kvaser_usb: Store the different firmware version components in a struct
Store firmware version in kvaser_usb_fw_version struct, specifying the
different components of the version number.
And drop debug prinout of firmware version, since later patches will expose
it via the devlink interface.
Jimmy Assarsson [Fri, 25 Jul 2025 12:34:44 +0000 (14:34 +0200)]
can: kvaser_usb: Assign netdev.dev_port based on device channel index
Assign netdev.dev_port based on the device channel index, to indicate the
port number of the network device.
While this driver already uses netdev.dev_id for that purpose, dev_port is
more appropriate. However, retain dev_id to avoid potential regressions.
Fixes: 3e66d0138c05 ("can: populate netdev::dev_id for udev discrimination") Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Jimmy Assarsson <extja@kvaser.com> Link: https://patch.msgid.link/20250725123452.41-4-extja@kvaser.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Merge patch series "can: kvaser_pciefd: Simplify identification of physical CAN interfaces"
Jimmy Assarsson <extja@kvaser.com> says:
This patch series simplifies the process of identifying which network
interface (can0..canX) corresponds to which physical CAN channel on
Kvaser PCIe based CAN interfaces.
Changes in v4:
- Fix transient Sparse warning
- Add tag Reviewed-by Vincent Mailhol
Changes in v3:
- Fixed typo; kvaser_pcied -> kvaser_pciefd in documentation patch
Changes in v2:
- Replace use of netdev.dev_id with netdev.dev_port
- Formatting and refactoring
- New patch with devlink documentation