====================
mptcp: avoid dup NL events and propagate error
Here are two fixes affecting the MPTCP Netlink events with their tests:
- Patches 1 & 2: a subflow closed NL event was visible multiple times in
some specific conditions. A fix for v5.12.
- Patches 3 & 4: subflow closed NL events never contained the error
code, even when expected. A fix for v5.11.
Plus an extra fix:
- Patch 5: fix a false positive with the "signal addresses race test"
subtest when validating the MPTCP Join selftest on a v5.15.y stable
kernel.
====================
selftests: mptcp: join: fix local endp not being tracked
When running this mptcp_join.sh selftest on older kernel versions not
supporting local endpoints tracking, this test fails because 3 MP_JOIN
ACKs have been received, while only 2 were expected.
It is not clear why only 2 MP_JOIN ACKs were expected on old kernel
versions, while 3 MP_JOIN SYN and SYN+ACK were expected. When testing on
the v5.15.197 kernel, 3 MP_JOIN ACKs are seen, which is also what is
expected in the selftests included in this kernel version, see commit f4480eaad489 ("selftests: mptcp: add missing join check").
Switch the expected MP_JOIN ACKs to 3. While at it, move this
chk_join_nr helper out of the special condition for older kernel
versions as it is now the same as with more recent ones. Also, invert
the condition to be more logical: what's expected on newer kernel
versions having such helper first.
Fixes: d4c81bbb8600 ("selftests: mptcp: join: support local endpoint being tracked or not") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260127-net-mptcp-dup-nl-events-v1-5-7f71e1bc4feb@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
selftests: mptcp: check subflow errors in close events
This validates the previous commit: subflow closed events should contain
an error field when a subflow got closed with an error, e.g. reset or
timeout.
For this test, the chk_evt_nr helper has been extended to check
attributes in the matched events.
In this test, the 2 subflow closed events should have an error.
The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.
Some subflow socket errors need to be reported to the MPTCP socket: the
initial subflow connect (MP_CAPABLE), and the ones from the fallback
sockets. The others are not propagated.
The issue is that sock_error() was used to retrieve the error, which was
also resetting the sk_err field. Because of that, when notifying the
userspace about subflow close events later on from the MPTCP worker, the
ssk->sk_err field was always 0.
Now, the error (sk_err) is only reset when propagating it to the msk.
selftests: mptcp: check no dup close events after error
This validates the previous commit: subflow closed events are re-sent
with less info when the initial subflow is disconnected after an error
and each time a subflow is closed after that.
In this new test, the userspace PM is involved because that's how it was
discovered, but it is not specific to it. The initial subflow is
terminated with a RESET, and that will cause the subflow disconnect.
Then, a new subflow is initiated, but also got rejected, which cause a
second subflow closed event, but not a third one.
While at it, in case of failure to get the expected amount of events,
the events are printed.
The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.
mptcp: avoid dup SUB_CLOSED events after disconnect
In case of subflow disconnect(), which can also happen with the first
subflow in case of errors like timeout or reset, mptcp_subflow_ctx_reset
will reset most fields from the mptcp_subflow_context structure,
including close_event_done. Then, when another subflow is closed, yet
another SUB_CLOSED event for the disconnected initial subflow is sent.
Because of the previous reset, there are no source address and
destination port.
A solution is then to also check the subflow's local id: it shouldn't be
negative anyway.
Another solution would be not to reset subflow->close_event_done at
disconnect time, but when reused. But then, probably the whole reset
could be done when being reused. Let's not change this logic, similar
to TCP with tcp_disconnect().
Jianbo Liu [Tue, 27 Jan 2026 08:52:41 +0000 (10:52 +0200)]
net/mlx5e: Skip ESN replay window setup for IPsec crypto offload
Commit a5e400a985df ("net/mlx5e: Honor user choice of IPsec replay
window size") introduced logic to setup the ESN replay window size.
This logic is only valid for packet offload.
However, the check to skip this block only covered outbound offloads.
It was not skipped for crypto offload, causing it to fall through to
the new switch statement and trigger its WARN_ON default case (for
instance, if a window larger than 256 bits was configured).
Fix this by amending the condition to also skip the replay window
setup if the offload type is not XFRM_DEV_OFFLOAD_PACKET.
Fixes: a5e400a985df ("net/mlx5e: Honor user choice of IPsec replay window size") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1769503961-124173-5-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Parav Pandit [Tue, 27 Jan 2026 08:52:40 +0000 (10:52 +0200)]
net/mlx5: Fix vhca_id access call trace use before alloc
HCA CAP structure is allocated in mlx5_hca_caps_alloc().
mlx5_mdev_init()
mlx5_hca_caps_alloc()
And HCA CAP is read from the device in mlx5_init_one().
The vhca_id's debugfs file is published even before above two
operations are done.
Due to this when user reads the vhca id before the initialization,
following call trace is observed.
Fix this by deferring debugfs publication until the HCA CAP is
allocated and read from the device.
Shay Drory [Tue, 27 Jan 2026 08:52:38 +0000 (10:52 +0200)]
net/mlx5: fs, Fix inverted cap check in tx flow table root disconnect
The capability check for reset_root_to_default was inverted, causing
the function to return -EOPNOTSUPP when the capability IS supported,
rather than when it is NOT supported.
Fix the capability check condition.
Fixes: 3c9c34c32bc6 ("net/mlx5: fs, Command to control TX flow table root") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1769503961-124173-2-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Wei Fang [Mon, 26 Jan 2026 08:15:44 +0000 (16:15 +0800)]
net: phy: micrel: fix clk warning when removing the driver
Since the commit 25c6a5ab151f ("net: phy: micrel: Dynamically control
external clock of KSZ PHY"), the clock of Micrel PHY has been enabled
by phy_driver::resume() and disabled by phy_driver::suspend(). However,
devm_clk_get_optional_enabled() is used in kszphy_probe(), so the clock
will automatically be disabled when the device is unbound from the bus.
Therefore, this could cause the clock to be disabled twice, resulting
in clk driver warnings.
For example, this issue can be reproduced on i.MX6ULL platform, and we
can see the following logs when removing the FEC MAC drivers.
$ echo 2188000.ethernet > /sys/bus/platform/drivers/fec/unbind
$ echo 20b4000.ethernet > /sys/bus/platform/drivers/fec/unbind
[ 109.758207] ------------[ cut here ]------------
[ 109.758240] WARNING: drivers/clk/clk.c:1188 at clk_core_disable+0xb4/0xd0, CPU#0: sh/639
[ 109.771011] enet2_ref already disabled
[ 109.793359] Call trace:
[ 109.822006] clk_core_disable from clk_disable+0x28/0x34
[ 109.827340] clk_disable from clk_disable_unprepare+0xc/0x18
[ 109.833029] clk_disable_unprepare from devm_clk_release+0x1c/0x28
[ 109.839241] devm_clk_release from devres_release_all+0x98/0x100
[ 109.845278] devres_release_all from device_unbind_cleanup+0xc/0x70
[ 109.851571] device_unbind_cleanup from device_release_driver_internal+0x1a4/0x1f4
[ 109.859170] device_release_driver_internal from bus_remove_device+0xbc/0xe4
[ 109.866243] bus_remove_device from device_del+0x140/0x458
[ 109.871757] device_del from phy_mdio_device_remove+0xc/0x24
[ 109.877452] phy_mdio_device_remove from mdiobus_unregister+0x40/0xac
[ 109.883918] mdiobus_unregister from fec_enet_mii_remove+0x40/0x78
[ 109.890125] fec_enet_mii_remove from fec_drv_remove+0x4c/0x158
[ 109.896076] fec_drv_remove from device_release_driver_internal+0x17c/0x1f4
[ 109.962748] WARNING: drivers/clk/clk.c:1047 at clk_core_unprepare+0xfc/0x13c, CPU#0: sh/639
[ 109.975805] enet2_ref already unprepared
[ 110.002866] Call trace:
[ 110.031758] clk_core_unprepare from clk_unprepare+0x24/0x2c
[ 110.037440] clk_unprepare from devm_clk_release+0x1c/0x28
[ 110.042957] devm_clk_release from devres_release_all+0x98/0x100
[ 110.048989] devres_release_all from device_unbind_cleanup+0xc/0x70
[ 110.055280] device_unbind_cleanup from device_release_driver_internal+0x1a4/0x1f4
[ 110.062877] device_release_driver_internal from bus_remove_device+0xbc/0xe4
[ 110.069950] bus_remove_device from device_del+0x140/0x458
[ 110.075469] device_del from phy_mdio_device_remove+0xc/0x24
[ 110.081165] phy_mdio_device_remove from mdiobus_unregister+0x40/0xac
[ 110.087632] mdiobus_unregister from fec_enet_mii_remove+0x40/0x78
[ 110.093836] fec_enet_mii_remove from fec_drv_remove+0x4c/0x158
[ 110.099782] fec_drv_remove from device_release_driver_internal+0x17c/0x1f4
After analyzing the process of removing the FEC driver, as shown below,
it can be seen that the clock was disabled twice by the PHY driver.
fec_drv_remove()
--> fec_enet_close()
--> phy_stop()
--> phy_suspend()
--> kszphy_suspend() #1 The clock is disabled
--> fec_enet_mii_remove()
--> mdiobus_unregister()
--> phy_mdio_device_remove()
--> device_del()
--> devm_clk_release() #2 The clock is disabled again
Therefore, devm_clk_get_optional() is used to fix the above issue. And
to avoid the issue mentioned by the commit 985329462723 ("net: phy:
micrel: use devm_clk_get_optional_enabled for the rmii-ref clock"), the
clock is enabled by clk_prepare_enable() to get the correct clock rate.
Fixes: 25c6a5ab151f ("net: phy: micrel: Dynamically control external clock of KSZ PHY") Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20260126081544.983517-1-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Zahka [Mon, 26 Jan 2026 19:38:17 +0000 (11:38 -0800)]
net/mlx5e: don't assume psp tx skbs are ipv6 csum handling
mlx5e_psp_handle_tx_skb() assumes skbs are ipv6 when doing a partial
TCP checksum with tso. Make correctly mlx5e_psp_handle_tx_skb() handle
ipv4 packets.
Jakub Kicinski [Thu, 29 Jan 2026 03:40:54 +0000 (19:40 -0800)]
Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2026-01-27 (ixgbe, ice)
For ixgbe:
Kohei Enju adjusts the cleanup path on firmware error to resolve some
memory leaks and removes an instance of double init, free on ACI mutex.
For ice:
Aaron Ma adds NULL checks for q_vectors to avoid NULL pointer
dereference.
Jesse Brandeburg removes UDP checksum mismatch from being counted in Rx
errors.
* '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: stop counting UDP csum mismatch as rx_errors
ice: Fix NULL pointer dereference in ice_vsi_set_napi_queues
ixgbe: don't initialize aci lock in ixgbe_recovery_probe()
ixgbe: fix memory leaks in the ixgbe_recovery_probe() path
====================
Martin Kaiser [Tue, 27 Jan 2026 10:19:23 +0000 (11:19 +0100)]
net: bridge: fix static key check
Fix the check if netfilter's static keys are available. netfilter defines
and exports static keys if CONFIG_JUMP_LABEL is enabled. (HAVE_JUMP_LABEL
is never defined.)
Fixes: 971502d77faa ("bridge: netfilter: unroll NF_HOOK helper in bridge input path") Signed-off-by: Martin Kaiser <martin@kaiser.cx> Reviewed-by: Florian Westphal <fw@strlen.de> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20260127101925.1754425-1-martin@kaiser.cx Signed-off-by: Jakub Kicinski <kuba@kernel.org>
nfc: nci: Fix race between rfkill and nci_unregister_device().
syzbot reported the splat below [0] without a repro.
It indicates that struct nci_dev.cmd_wq had been destroyed before
nci_close_device() was called via rfkill.
nci_dev.cmd_wq is only destroyed in nci_unregister_device(), which
(I think) was called from virtual_ncidev_close() when syzbot close()d
an fd of virtual_ncidev.
The problem is that nci_unregister_device() destroys nci_dev.cmd_wq
first and then calls nfc_unregister_device(), which removes the
device from rfkill by rfkill_unregister().
So, the device is still visible via rfkill even after nci_dev.cmd_wq
is destroyed.
Let's unregister the device from rfkill first in nci_unregister_device().
Note that we cannot call nfc_unregister_device() before
nci_close_device() because
1) nfc_unregister_device() calls device_del() which frees
all memory allocated by devm_kzalloc() and linked to
ndev->conn_info_list
2) nci_rx_work() could try to queue nci_conn_info to
ndev->conn_info_list which could be leaked
Thus, nfc_unregister_device() is split into two functions so we
can remove rfkill interfaces only before nci_close_device().
Jordan Rhee [Tue, 27 Jan 2026 01:02:10 +0000 (01:02 +0000)]
gve: fix probe failure if clock read fails
If timestamping is supported, GVE reads the clock during probe,
which can fail for various reasons. Previously, this failure would
abort the driver probe, rendering the device unusable. This behavior
has been observed on production GCP VMs, causing driver initialization
to fail completely.
This patch allows the driver to degrade gracefully. If gve_init_clock()
fails, it logs a warning and continues loading the driver without PTP
support.
Cc: stable@vger.kernel.org Fixes: a479a27f4da4 ("gve: Move gve_init_clock to after AQ CONFIGURE_DEVICE_RESOURCES call") Signed-off-by: Jordan Rhee <jordanrhee@google.com> Reviewed-by: Shachar Raindel <shacharr@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Link: https://patch.msgid.link/20260127010210.969823-1-hramamurthy@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Mon, 26 Jan 2026 07:14:55 +0000 (09:14 +0200)]
net/mlx5e: Account for netdev stats in ndo_get_stats64
The driver's ndo_get_stats64 callback is only reporting mlx5 counters,
without accounting for the netdev stats, causing errors from the network
stack to be invisible in statistics.
Add netdev_stats_to_stats64() call to first populate the counters, then
add mlx5 counters on top, ensuring both are accounted for (where
appropriate).
Fixes: f62b8bb8f2d3 ("net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionality") Signed-off-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1769411695-18820-4-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mark Bloch [Mon, 26 Jan 2026 07:14:54 +0000 (09:14 +0200)]
net/mlx5e: TC, delete flows only for existing peers
When deleting TC steering flows, iterate only over actual devcom
peers instead of assuming all possible ports exist. This avoids
touching non-existent peers and ensures cleanup is limited to
devices the driver is currently connected to.
Shay Drory [Mon, 26 Jan 2026 07:14:53 +0000 (09:14 +0200)]
net/mlx5: Fix Unbinding uplink-netdev in switchdev mode
It is possible to unbind the uplink ETH driver while the E-Switch is
in switchdev mode. This leads to netdevice reference counting issues[1],
as the driver removal path was not designed to clean up from this state.
During uplink ETH driver removal (_mlx5e_remove), the code now waits for
any concurrent E-Switch mode transition to finish. It then removes the
REPs auxiliary device, if exists. This ensures a graceful cleanup.
[1]
unregister_netdevice: waiting for eth2 to become free. Usage count = 2
ref_tracker: netdev@00000000c912e04b has 1/1 users at
ib_device_set_netdev+0x130/0x270 [ib_core]
mlx5_ib_vport_rep_load+0xf4/0x3e0 [mlx5_ib]
mlx5_esw_offloads_rep_load+0xc7/0xe0 [mlx5_core]
esw_offloads_enable+0x583/0x900 [mlx5_core]
mlx5_eswitch_enable_locked+0x1b2/0x290 [mlx5_core]
mlx5_devlink_eswitch_mode_set+0x107/0x3e0 [mlx5_core]
devlink_nl_eswitch_set_doit+0x60/0xd0
genl_family_rcv_msg_doit+0xe0/0x130
genl_rcv_msg+0x183/0x290
netlink_rcv_skb+0x4b/0xf0
genl_rcv+0x24/0x40
netlink_unicast+0x255/0x380
netlink_sendmsg+0x1f3/0x420
__sock_sendmsg+0x38/0x60
__sys_sendto+0x119/0x180
__x64_sys_sendto+0x20/0x30
Fixes: 7a9fb35e8c3a ("net/mlx5e: Do not reload ethernet ports when changing eswitch mode") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1769411695-18820-2-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since the beginning, the Intel ice driver has counted receive checksum
offload mismatches into the rx_errors member of the rtnl_link_stats64
struct. In ethtool -S these show up as rx_csum_bad.nic.
I believe counting these in rx_errors is fundamentally wrong, as it's
pretty clear from the comments in if_link.h and from every other statistic
the driver is summing into rx_errors, that all of them would cause a
"hardware drop" except for the UDP checksum mismatch, as well as the fact
that all the other causes for rx_errors are L2 reasons, and this L4 UDP
"mismatch" is an outlier.
A last nail in the coffin is that rx_errors is monitored in production and
can indicate a bad NIC/cable/Switch port, but instead some random series of
UDP packets with bad checksums will now trigger this alert. This false
positive makes the alert useless and affects us as well as other companies.
This packet with presumably a bad UDP checksum is *already* passed to the
stack, just not marked as offloaded by the hardware/driver. If it is
dropped by the stack it will show up as UDP_MIB_CSUMERRORS.
And one more thing, none of the other Intel drivers, and at least bnxt_en
and mlx5 both don't appear to count UDP offload mismatches as rx_errors.
Here is a related customer complaint:
https://community.intel.com/t5/Ethernet-Products/ice-rx-errros-is-too-sensitive-to-IP-TCP-attack-packets-Intel/td-p/1662125
Fixes: 4f1fe43c920b ("ice: Add more Rx errors to netdev's rx_error counter") Cc: Tony Nguyen <anthony.l.nguyen@intel.com> Cc: Jake Keller <jacob.e.keller@intel.com> Cc: IWL <intel-wired-lan@lists.osuosl.org> Signed-off-by: Jesse Brandeburg <jbrandeburg@cloudflare.com> Acked-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add defensive checks for both the ring pointer and its q_vector
before dereferencing, allowing the system to resume successfully even when
q_vectors are unmapped.
Fixes: 2a5dc090b92cf ("ice: move netif_queue_set_napi to rtnl-protected sections") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Aaron Ma <aaron.ma@canonical.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Kohei Enju [Thu, 11 Dec 2025 09:15:32 +0000 (18:15 +0900)]
ixgbe: don't initialize aci lock in ixgbe_recovery_probe()
hw->aci.lock is already initialized in ixgbe_sw_init(), so
ixgbe_recovery_probe() doesn't need to initialize the lock. This
function is also not responsible for destroying the lock on failures.
Additionally, change the name of label in accordance with this change.
Fixes: 29cb3b8d95c7 ("ixgbe: add E610 implementation of FW recovery mode") Reported-by: Simon Horman <horms@kernel.org> Closes: https://lore.kernel.org/intel-wired-lan/aTcFhoH-z2btEKT-@horms.kernel.org/ Signed-off-by: Kohei Enju <enjuk@amazon.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Kohei Enju [Thu, 11 Dec 2025 09:15:31 +0000 (18:15 +0900)]
ixgbe: fix memory leaks in the ixgbe_recovery_probe() path
When ixgbe_recovery_probe() is invoked and this function fails,
allocated resources in advance are not completely freed, because
ixgbe_probe() returns ixgbe_recovery_probe() directly and
ixgbe_recovery_probe() only frees partial resources, resulting in memory
leaks including:
- adapter->io_addr
- adapter->jump_tables[0]
- adapter->mac_table
- adapter->rss_key
- adapter->af_xdp_zc_qps
The leaked MMIO region can be observed in /proc/vmallocinfo, and the
remaining leaks are reported by kmemleak.
Don't return ixgbe_recovery_probe() directly, and instead let
ixgbe_probe() to clean up resources on failures.
Fixes: 29cb3b8d95c7 ("ixgbe: add E610 implementation of FW recovery mode") Signed-off-by: Kohei Enju <enjuk@amazon.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
bonding: fix use-after-free due to enslave fail after slave array update
Fix a use-after-free which happens due to enslave failure after the new
slave has been added to the array. Since the new slave can be used for Tx
immediately, we can use it after it has been freed by the enslave error
cleanup path which frees the allocated slave memory. Slave update array is
supposed to be called last when further enslave failures are not expected.
Move it after xdp setup to avoid any problems.
It is very easy to reproduce the problem with a simple xdp_pass prog:
ip l add bond1 type bond mode balance-xor
ip l set bond1 up
ip l set dev bond1 xdp object xdp_pass.o sec xdp_pass
ip l add dumdum type dummy
Then run in parallel:
while :; do ip l set dumdum master bond1 1>/dev/null 2>&1; done;
mausezahn bond1 -a own -b rand -A rand -B 1.1.1.1 -c 0 -t tcp "dp=1-1023, flags=syn"
Fixes: 9e2ee5c7e7c3 ("net, bonding: Add XDP support to the bonding driver") Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Reported-by: Chen Zhen <chenzhen126@huawei.com> Closes: https://lore.kernel.org/netdev/fae17c21-4940-5605-85b2-1d5e17342358@huawei.com/ CC: Jussi Maki <joamaki@gmail.com> CC: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://patch.msgid.link/20260123120659.571187-1-razor@blackwall.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
nfc: llcp: Fix memleak in nfc_llcp_send_ui_frame().
syzbot reported various memory leaks related to NFC, struct
nfc_llcp_sock, sk_buff, nfc_dev, etc. [0]
The leading log hinted that nfc_llcp_send_ui_frame() failed
to allocate skb due to sock_error(sk) being -ENXIO.
ENXIO is set by nfc_llcp_socket_release() when struct
nfc_llcp_local is destroyed by local_cleanup().
The problem is that there is no synchronisation between
nfc_llcp_send_ui_frame() and local_cleanup(), and skb
could be put into local->tx_queue after it was purged in
local_cleanup():
Vivian Wang [Fri, 23 Jan 2026 03:52:23 +0000 (11:52 +0800)]
net: spacemit: Check for netif_carrier_ok() in emac_stats_update()
Some PHYs stop the refclk for power saving, usually while link down.
This causes reading stats to time out.
Therefore, in emac_stats_update(), also don't update and reschedule if
!netif_carrier_ok(). But that means we could be missing later updates if
the link comes back up, so also reschedule when link up is detected in
emac_adjust_link().
While we're at it, improve the comments and error message prints around
this to reflect the better understanding of how this could happen.
Hopefully if this happens again on new hardware, these comments will
direct towards a solution.
Closes: https://lore.kernel.org/r/20260119141620.1318102-1-amadeus@jmu.edu.cn/ Fixes: bfec6d7f2001 ("net: spacemit: Add K1 Ethernet MAC") Co-developed-by: Chukun Pan <amadeus@jmu.edu.cn> Signed-off-by: Chukun Pan <amadeus@jmu.edu.cn> Signed-off-by: Vivian Wang <wangruikang@iscas.ac.cn> Link: https://patch.msgid.link/20260123-k1-ethernet-clarify-stat-timeout-v3-1-93b9df627e87@iscas.ac.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kery Qi [Fri, 23 Jan 2026 21:10:31 +0000 (05:10 +0800)]
rocker: fix memory leak in rocker_world_port_post_fini()
In rocker_world_port_pre_init(), rocker_port->wpriv is allocated with
kzalloc(wops->port_priv_size, GFP_KERNEL). However, in
rocker_world_port_post_fini(), the memory is only freed when
wops->port_post_fini callback is set:
if (!wops->port_post_fini)
return;
wops->port_post_fini(rocker_port);
kfree(rocker_port->wpriv);
Since rocker_ofdpa_ops does not implement port_post_fini callback
(it is NULL), the wpriv memory allocated for each port is never freed
when ports are removed. This leads to a memory leak of
sizeof(struct ofdpa_port) bytes per port on every device removal.
Fix this by always calling kfree(rocker_port->wpriv) regardless of
whether the port_post_fini callback exists.
Fixes: e420114eef4a ("rocker: introduce worlds infrastructure") Signed-off-by: Kery Qi <qikeyu2017@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260123211030.2109-2-qikeyu2017@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reported by the following Smatch static checker warning:
drivers/net/dsa/yt921x.c:702 yt921x_read_mib()
warn: was expecting a 64 bit value instead of '(~0)'
Fixes: 186623f4aa72 ("net: dsa: yt921x: Add support for Motorcomm YT921x") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/netdev/aPsjYKQMzpY0nSXm@stanley.mountain/ Suggested-by: David Laight <david.laight.linux@gmail.com> Signed-off-by: David Yang <mmyangfl@gmail.com> Link: https://patch.msgid.link/20260122170512.2713738-1-mmyangfl@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Zeng Chi [Fri, 23 Jan 2026 08:57:49 +0000 (16:57 +0800)]
net/mlx5: Fix return type mismatch in mlx5_esw_vport_vhca_id()
The function mlx5_esw_vport_vhca_id() is declared to return bool,
but returns -EOPNOTSUPP (-45), which is an int error code. This
causes a signedness bug as reported by smatch.
This patch fixes this smatch report:
drivers/net/ethernet/mellanox/mlx5/core/eswitch.h:981 mlx5_esw_vport_vhca_id()
warn: signedness bug returning '(-45)'
Fixes: 1baf30426553 ("net/mlx5: E-Switch, Set/Query hca cap via vhca id") Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Zeng Chi <zengchi@kylinos.cn> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260123085749.1401969-1-zeng_chi911@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kery Qi [Thu, 22 Jan 2026 17:04:01 +0000 (01:04 +0800)]
net: wwan: t7xx: fix potential skb->frags overflow in RX path
When receiving data in the DPMAIF RX path,
the t7xx_dpmaif_set_frag_to_skb() function adds
page fragments to an skb without checking if the number of
fragments has exceeded MAX_SKB_FRAGS. This could lead to a buffer overflow
in skb_shinfo(skb)->frags[] array, corrupting adjacent memory and
potentially causing kernel crashes or other undefined behavior.
This issue was identified through static code analysis by comparing with a
similar vulnerability fixed in the mt76 driver commit b102f0c522cf ("mt76:
fix array overflow on receiving too many fragments for a packet").
The vulnerability could be triggered if the modem firmware sends packets
with excessive fragments. While under normal protocol conditions (MTU 3080
bytes, BAT buffer 3584 bytes),
a single packet should not require additional
fragments, the kernel should not blindly trust firmware behavior.
Malicious, buggy, or compromised firmware could potentially craft packets
with more fragments than the kernel expects.
Fix this by adding a bounds check before calling skb_add_rx_frag() to
ensure nr_frags does not exceed MAX_SKB_FRAGS.
The check must be performed before unmapping to avoid a page leak
and double DMA unmap during device teardown.
ipv6: use the right ifindex when replying to icmpv6 from localhost
When replying to a ICMPv6 echo request that comes from localhost address
the right output ifindex is 1 (lo) and not rt6i_idev dev index. Use the
skb device ifindex instead. This fixes pinging to a local address from
localhost source address.
$ ping6 -I ::1 2001:1:1::2 -c 3
PING 2001:1:1::2 (2001:1:1::2) from ::1 : 56 data bytes
64 bytes from 2001:1:1::2: icmp_seq=1 ttl=64 time=0.037 ms
64 bytes from 2001:1:1::2: icmp_seq=2 ttl=64 time=0.069 ms
64 bytes from 2001:1:1::2: icmp_seq=3 ttl=64 time=0.122 ms
2001:1:1::2 ping statistics
3 packets transmitted, 3 received, 0% packet loss, time 2032ms
rtt min/avg/max/mdev = 0.037/0.076/0.122/0.035 ms
Fixes: 1b70d792cf67 ("ipv6: Use rt6i_idev index for echo replies to a local address") Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20260121194409.6749-1-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Zilin Guan [Fri, 23 Jan 2026 06:57:16 +0000 (06:57 +0000)]
net: mvpp2: cls: Fix memory leak in mvpp2_ethtool_cls_rule_ins()
In mvpp2_ethtool_cls_rule_ins(), the ethtool_rule is allocated by
ethtool_rx_flow_rule_create(). If the subsequent conversion to flow
type fails, the function jumps to the clean_rule label.
However, the clean_rule label only frees efs, skipping the cleanup
of ethtool_rule, which leads to a memory leak.
Fix this by jumping to the clean_eth_rule label, which properly calls
ethtool_rx_flow_rule_destroy() before freeing efs.
Compile tested only. Issue found using a prototype static analysis tool
and code review.
Fixes: f4f1ba18195d ("net: mvpp2: cls: Report an error for unsupported flow types") Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20260123065716.2248324-1-zilin@seu.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Edward Cree [Fri, 23 Jan 2026 16:16:34 +0000 (16:16 +0000)]
sfc: fix deadlock in RSS config read
Since cited commit, core locks the net_device's rss_lock when handling
ethtool -x command, so driver's implementation should not lock it
again. Remove the latter.
Fixes: 040cef30b5e6 ("net: ethtool: move get_rxfh callback under the rss_lock") Reported-by: Damir Mansurov <damir.mansurov@oktetlabs.ru> Closes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1126015 Suggested-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://patch.msgid.link/20260123161634.1215006-1-edward.cree@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Zilin Guan [Wed, 21 Jan 2026 13:05:51 +0000 (13:05 +0000)]
octeon_ep: Fix memory leak in octep_device_setup()
In octep_device_setup(), if octep_ctrl_net_init() fails, the function
returns directly without unmapping the mapped resources and freeing the
allocated configuration memory.
Fix this by jumping to the unsupported_dev label, which performs the
necessary cleanup. This aligns with the error handling logic of other
paths in this function.
Compile tested only. Issue found using a prototype static analysis tool
and code review.
Fixes: 577f0d1b1c5f ("octeon_ep: add separate mailbox command and response queues") Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20260121130551.3717090-1-zilin@seu.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 23 Jan 2026 18:47:02 +0000 (10:47 -0800)]
Merge tag 'for-net-2026-01-22' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- hci_uart: fix null-ptr-deref in hci_uart_write_work
- MGMT: Fix memory leak in set_ssp_complete
* tag 'for-net-2026-01-22' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: MGMT: Fix memory leak in set_ssp_complete
Bluetooth: hci_uart: fix null-ptr-deref in hci_uart_write_work
====================
Sinc commit 79a6d1bfe114 ("can: gs_usb: gs_usb_receive_bulk_callback():
unanchor URL on usb_submit_urb() error") a failing resubmit URB will print
an info message.
In the case of a short read where netdev has not yet been assigned,
initialize as NULL to avoid dereferencing an undefined value. Also report
the error value of the failed resubmit.
Fixes: 79a6d1bfe114 ("can: gs_usb: gs_usb_receive_bulk_callback(): unanchor URL on usb_submit_urb() error") Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://lore.kernel.org/all/20260119181904.1209979-1-kuba@kernel.org/ Link: https://patch.msgid.link/20260120-gs_usb-fix-error-message-v1-1-6be04de572bc@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Zilin Guan [Tue, 20 Jan 2026 13:46:40 +0000 (13:46 +0000)]
net/mlx5: Fix memory leak in esw_acl_ingress_lgcy_setup()
In esw_acl_ingress_lgcy_setup(), if esw_acl_table_create() fails,
the function returns directly without releasing the previously
created counter, leading to a memory leak.
Fix this by jumping to the out label instead of returning directly,
which aligns with the error handling logic of other paths in this
function.
Compile tested only. Issue found using a prototype static analysis tool
and code review.
Jianpeng Chang [Wed, 21 Jan 2026 05:29:26 +0000 (13:29 +0800)]
Bluetooth: MGMT: Fix memory leak in set_ssp_complete
Fix memory leak in set_ssp_complete() where mgmt_pending_cmd structures
are not freed after being removed from the pending list.
Commit 302a1f674c00 ("Bluetooth: MGMT: Fix possible UAFs") replaced
mgmt_pending_foreach() calls with individual command handling but missed
adding mgmt_pending_free() calls in both error and success paths of
set_ssp_complete(). Other completion functions like set_le_complete()
were fixed correctly in the same commit.
This causes a memory leak of the mgmt_pending_cmd structure and its
associated parameter data for each SSP command that completes.
Add the missing mgmt_pending_free(cmd) calls in both code paths to fix
the memory leak. Also fix the same issue in set_advertising_complete().
Fixes: 302a1f674c00 ("Bluetooth: MGMT: Fix possible UAFs") Signed-off-by: Jianpeng Chang <jianpeng.chang.cn@windriver.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Jia-Hong Su [Sun, 18 Jan 2026 12:08:59 +0000 (20:08 +0800)]
Bluetooth: hci_uart: fix null-ptr-deref in hci_uart_write_work
hci_uart_set_proto() sets HCI_UART_PROTO_INIT before calling
hci_uart_register_dev(), which calls proto->open() to initialize
hu->priv. However, if a TTY write wakeup occurs during this window,
hci_uart_tx_wakeup() may schedule write_work before hu->priv is
initialized, leading to a NULL pointer dereference in
hci_uart_write_work() when proto->dequeue() accesses hu->priv.
Linus Torvalds [Thu, 22 Jan 2026 17:32:11 +0000 (09:32 -0800)]
Merge tag 'net-6.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from CAN and wireless.
Pretty big, but hard to make up any cohesive story that would explain
it, a random collection of fixes. The two reverts of bad patches from
this release here feel like stuff that'd normally show up by rc5 or
rc6. Perhaps obvious thing to say, given the holiday timing.
That said, no active investigations / regressions. Let's see what the
next week brings.
Current release - fix to a fix:
- can: alloc_candev_mqs(): add missing default CAN capabilities
Current release - regressions:
- usbnet: fix crash due to missing BQL accounting after resume
Hariprasad Kelam [Wed, 21 Jan 2026 09:48:19 +0000 (15:18 +0530)]
Octeontx2-af: Add proper checks for fwdata
firmware populates MAC address, link modes (supported, advertised)
and EEPROM data in shared firmware structure which kernel access
via MAC block(CGX/RPM).
Accessing fwdata, on boards booted with out MAC block leading to
kernel panics.
Fixes: 997814491cee ("Octeontx2-af: Fetch MAC channel info from firmware") Fixes: 5f21226b79fd ("Octeontx2-pf: ethtool: support multi advertise mode") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Link: https://patch.msgid.link/20260121094819.2566786-1-hkelam@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ivan Vecera [Wed, 21 Jan 2026 13:00:11 +0000 (14:00 +0100)]
dpll: Prevent duplicate registrations
Modify the internal registration helpers dpll_xa_ref_{dpll,pin}_add()
to reject duplicate registration attempts.
Previously, if a caller attempted to register the same pin multiple
times (with the same ops, priv, and cookie) on the same device, the core
silently increments the reference count and return success. This behavior
is incorrect because if the caller makes these duplicate registrations
then for the first one dpll_pin_registration is allocated and for others
the associated dpll_pin_ref.refcount is incremented. During the first
unregistration the associated dpll_pin_registration is freed and for
others WARN is fired.
Fix this by updating the logic to return `-EEXIST` if a matching
registration is found to enforce a strict "register once" policy.
Fixes: 9431063ad323 ("dpll: core: Add DPLL framework base functions") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20260121130012.112606-1-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Incorrectly transmitted interrupt number instead of queue number
when using netif_queue_set_napi. Besides, move this to appropriate
code location to set napi.
Remove redundant netif_stop_subqueue beacuase it is not part of the
hinic3_send_one_skb process.
Zilin Guan [Thu, 22 Jan 2026 11:41:28 +0000 (11:41 +0000)]
can: at91_can: Fix memory leak in at91_can_probe()
In at91_can_probe(), the dev structure is allocated via alloc_candev().
However, if the subsequent call to devm_phy_optional_get() fails, the
code jumps directly to exit_iounmap, missing the call to free_candev().
This results in a memory leak of the allocated net_device structure.
Fix this by jumping to the exit_free label instead, which ensures that
free_candev() is called to properly release the memory.
Compile tested only. Issue found using a prototype static analysis tool
and code review.
Jakub Kicinski [Thu, 22 Jan 2026 15:54:30 +0000 (07:54 -0800)]
Merge tag 'wireless-2026-11-22' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Johannes Berg says:
====================
Another set of updates:
- various small fixes for ath10k/ath12k/mwifiex/rsi
- cfg80211 fix for HE bitrate overflow
- mac80211 fixes
- S1G beacon handling in scan
- skb tailroom handling for HW encryption
- CSA fix for multi-link
- handling of disabled links during association
* tag 'wireless-2026-11-22' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: cfg80211: ignore link disabled flag from userspace
wifi: mac80211: apply advertised TTLM from association response
wifi: mac80211: parse all TTLM entries
wifi: mac80211: don't increment crypto_tx_tailroom_needed_cnt twice
wifi: mac80211: don't perform DA check on S1G beacon
wifi: ath12k: Fix wrong P2P device link id issue
wifi: ath12k: fix dead lock while flushing management frames
wifi: ath12k: Fix scan state stuck in ABORTING after cancel_remain_on_channel
wifi: ath12k: cancel scan only on active scan vdev
wifi: mwifiex: Fix a loop in mwifiex_update_ampdu_rxwinsize()
wifi: mac80211: correctly check if CSA is active
wifi: cfg80211: Fix bitrate calculation overflow for HE rates
wifi: rsi: Fix memory corruption due to not set vif driver data size
wifi: ath12k: don't force radio frequency check in freq_to_idx()
wifi: ath12k: fix dma_free_coherent() pointer
wifi: ath10k: fix dma_free_coherent() pointer
====================
The original series was posted by Melbin K Mathew <mlbnkm1@gmail.com> till v4.
Since it's a real issue and the original author seems busy, I'm sending
the new version fixing my comments but keeping the authorship (and restoring
mine on patch 2 as reported on v4).
This series fixes TX credit handling in virtio-vsock:
Patch 1: Fix potential underflow in get_credit() using s64 arithmetic
Patch 2: Fix vsock_test seqpacket bounds test
Patch 3: Cap TX credit to local buffer size (security hardening)
Patch 4: Add stream TX credit bounds regression test
The core issue is that a malicious guest can advertise a huge buffer
size via SO_VM_SOCKETS_BUFFER_SIZE, causing the host to allocate
excessive sk_buff memory when sending data to that guest.
On an unpatched Ubuntu 22.04 host (~64 GiB RAM), running a PoC with
32 guest vsock connections advertising 2 GiB each and reading slowly
drove Slab/SUnreclaim from ~0.5 GiB to ~57 GiB; the system only
recovered after killing the QEMU process.
With this series applied, the same PoC shows only ~35 MiB increase in
Slab/SUnreclaim, no host OOM, and the guest remains responsive.
====================
Melbin K Mathew [Wed, 21 Jan 2026 09:36:28 +0000 (10:36 +0100)]
vsock/test: add stream TX credit bounds test
Add a regression test for the TX credit bounds fix. The test verifies
that a sender with a small local buffer size cannot queue excessive
data even when the peer advertises a large receive buffer.
The client:
- Sets a small buffer size (64 KiB)
- Connects to server (which advertises 2 MiB buffer)
- Sends in non-blocking mode until EAGAIN
- Verifies total queued data is bounded
This guards against the original vulnerability where a remote peer
could cause unbounded kernel memory allocation by advertising a large
buffer and reading slowly.
Suggested-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Melbin K Mathew <mlbnkm1@gmail.com>
[Stefano: use sock_buf_size to check the bytes sent + small fixes] Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20260121093628.9941-5-sgarzare@redhat.com Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Melbin K Mathew [Wed, 21 Jan 2026 09:36:27 +0000 (10:36 +0100)]
vsock/virtio: cap TX credit to local buffer size
The virtio transports derives its TX credit directly from peer_buf_alloc,
which is set from the remote endpoint's SO_VM_SOCKETS_BUFFER_SIZE value.
On the host side this means that the amount of data we are willing to
queue for a connection is scaled by a guest-chosen buffer size, rather
than the host's own vsock configuration. A malicious guest can advertise
a large buffer and read slowly, causing the host to allocate a
correspondingly large amount of sk_buff memory.
The same thing would happen in the guest with a malicious host, since
virtio transports share the same code base.
Introduce a small helper, virtio_transport_tx_buf_size(), that
returns min(peer_buf_alloc, buf_alloc), and use it wherever we consume
peer_buf_alloc.
This ensures the effective TX window is bounded by both the peer's
advertised buffer and our own buf_alloc (already clamped to
buffer_max_size via SO_VM_SOCKETS_BUFFER_MAX_SIZE), so a remote peer
cannot force the other to queue more data than allowed by its own
vsock settings.
On an unpatched Ubuntu 22.04 host (~64 GiB RAM), running a PoC with
32 guest vsock connections advertising 2 GiB each and reading slowly
drove Slab/SUnreclaim from ~0.5 GiB to ~57 GiB; the system only
recovered after killing the QEMU process. That said, if QEMU memory is
limited with cgroups, the maximum memory used will be limited.
Only ~35 MiB increase in Slab/SUnreclaim, no host OOM, and the guest
remains responsive.
Compatibility with non-virtio transports:
- VMCI uses the AF_VSOCK buffer knobs to size its queue pairs per
socket based on the local vsk->buffer_* values; the remote side
cannot enlarge those queues beyond what the local endpoint
configured.
- Hyper-V's vsock transport uses fixed-size VMBus ring buffers and
an MTU bound; there is no peer-controlled credit field comparable
to peer_buf_alloc, and the remote endpoint cannot drive in-flight
kernel memory above those ring sizes.
- The loopback path reuses virtio_transport_common.c, so it
naturally follows the same semantics as the virtio transport.
This change is limited to virtio_transport_common.c and thus affects
virtio-vsock, vhost-vsock, and loopback, bringing them in line with the
"remote window intersected with local policy" behaviour that VMCI and
Hyper-V already effectively have.
Fixes: 06a8fc78367d ("VSOCK: Introduce virtio_vsock_common.ko") Suggested-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Melbin K Mathew <mlbnkm1@gmail.com>
[Stefano: small adjustments after changing the previous patch]
[Stefano: tweak the commit message] Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Luigi Leonardi <leonardi@redhat.com> Link: https://patch.msgid.link/20260121093628.9941-4-sgarzare@redhat.com Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The test requires the sender (client) to send all messages before waking
up the receiver (server).
Since virtio-vsock had a bug and did not respect the size of the TX
buffer, this test worked, but now that we are going to fix the bug, the
test hangs because the sender would fill the TX buffer before waking up
the receiver.
Set the buffer size in the sender (client) as well, as we already do for
the receiver (server).
Fixes: 5c338112e48a ("test/vsock: rework message bounds test") Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20260121093628.9941-3-sgarzare@redhat.com Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Melbin K Mathew [Wed, 21 Jan 2026 09:36:25 +0000 (10:36 +0100)]
vsock/virtio: fix potential underflow in virtio_transport_get_credit()
The credit calculation in virtio_transport_get_credit() uses unsigned
arithmetic:
ret = vvs->peer_buf_alloc - (vvs->tx_cnt - vvs->peer_fwd_cnt);
If the peer shrinks its advertised buffer (peer_buf_alloc) while bytes
are in flight, the subtraction can underflow and produce a large
positive value, potentially allowing more data to be queued than the
peer can handle.
Reuse virtio_transport_has_space() which already handles this case and
add a comment to make it clear why we are doing that.
Fixes: 06a8fc78367d ("VSOCK: Introduce virtio_vsock_common.ko") Suggested-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Melbin K Mathew <mlbnkm1@gmail.com>
[Stefano: use virtio_transport_has_space() instead of duplicating the code]
[Stefano: tweak the commit message] Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Luigi Leonardi <leonardi@redhat.com> Link: https://patch.msgid.link/20260121093628.9941-2-sgarzare@redhat.com Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Clemens Gruber [Wed, 21 Jan 2026 08:37:51 +0000 (09:37 +0100)]
net: fec: account for VLAN header in frame length calculations
The MAX_FL (maximum frame length) and related calculations used ETH_HLEN,
which does not account for the 4-byte VLAN tag in tagged frames. This
caused the hardware to reject valid VLAN frames as oversized, resulting
in RX errors and dropped packets.
Use VLAN_ETH_HLEN instead of ETH_HLEN in the MAX_FL register setup,
cut-through mode threshold, buffer allocation, and max_mtu calculation.
Cc: stable@kernel.org # v6.18+ Fixes: 62b5bb7be7bc ("net: fec: update MAX_FL based on the current MTU") Fixes: d466c16026e9 ("net: fec: enable the Jumbo frame support for i.MX8QM") Fixes: 59e9bf037d75 ("net: fec: add change_mtu to support dynamic buffer allocation") Fixes: ec2a1681ed4f ("net: fec: use a member variable for maximum buffer size") Signed-off-by: Clemens Gruber <mail@clemensgruber.at> Reviewed-by: Wei Fang <wei.fang@nxp.com> Link: https://patch.msgid.link/20260121083751.66997-1-mail@clemensgruber.at Signed-off-by: Paolo Abeni <pabeni@redhat.com>
David Yang [Wed, 21 Jan 2026 07:29:26 +0000 (15:29 +0800)]
net: openvswitch: fix data race in ovs_vport_get_upcall_stats
In ovs_vport_get_upcall_stats(), some statistics protected by
u64_stats_sync, are read and accumulated in ignorance of possible
u64_stats_fetch_retry() events. These statistics are already accumulated
by u64_stats_inc(). Fix this by reading them into temporary variables
first.
Fixes: 1933ea365aa7 ("net: openvswitch: Add support to count upcall packets") Signed-off-by: David Yang <mmyangfl@gmail.com> Acked-by: Ilya Maximets <i.maximets@ovn.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Aaron Conole <aconole@redhat.com> Link: https://patch.msgid.link/20260121072932.2360971-1-mmyangfl@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Linus Torvalds [Thu, 22 Jan 2026 05:53:26 +0000 (21:53 -0800)]
Merge tag 'hyperv-fixes-signed-20260121' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv fixes from Wei Liu:
- Fix ARM64 port of the MSHV driver (Anirudh Rayabharam)
- Fix huge page handling in the MSHV driver (Stanislav Kinsburskii)
- Minor fixes to driver code (Julia Lawall, Michael Kelley)
* tag 'hyperv-fixes-signed-20260121' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
mshv: handle gpa intercepts for arm64
mshv: add definitions for arm64 gpa intercepts
mshv: Add __user attribute to argument passed to access_ok()
mshv: Store the result of vfs_poll in a variable of type __poll_t
mshv: Align huge page stride with guest mapping
Drivers: hv: Always do Hyper-V panic notification in hv_kmsg_dump()
Drivers: hv: vmbus: fix typo in function name reference
Ratheesh Kannoth [Wed, 21 Jan 2026 03:39:34 +0000 (09:09 +0530)]
octeontx2-af: Fix error handling
This commit adds error handling and rollback logic to
rvu_mbox_handler_attach_resources() to properly clean up partially
attached resources when rvu_attach_block() fails.
Daniel Golle [Wed, 21 Jan 2026 02:23:17 +0000 (02:23 +0000)]
net: pcs: pcs-mtk-lynxi: report in-band capability for 2500Base-X
It turns out that 2500Base-X actually works fine with in-band status on
MediaTek's LynxI PCS -- I wrongly concluded it didn't because it is
broken in all the copper SFP modules and GPON sticks I used for testing.
Hence report LINK_INBAND_ENABLE also for 2500Base-X mode.
This reverts most of commit a003c38d9bbb ("net: pcs: pcs-mtk-lynxi:
correctly report in-band status capabilities").
The removal of the QSGMII interface mode was correct and is left
untouched.
The lockless accesses to these to values aren't actually a problem as the
read only needs an approximate time of last transmission for the purposes
of deciding whether or not the transmission of a keepalive packet is
warranted yet.
Also, as ->last_tx_at is a 64-bit value, tearing can occur on a 32-bit
arch.
Fix both of these by switching to an unsigned int for ->last_tx_at and only
storing the LSW of the time64_t. It can then be reconstructed at need
provided no more than 68 years has elapsed since the last transmission.
Fixes: ace45bec6d77 ("rxrpc: Fix firewall route keepalive") Reported-by: syzbot+6182afad5045e6703b3d@syzkaller.appspotmail.com Closes: https://lore.kernel.org/r/695e7cfb.050a0220.1c677c.036b.GAE@google.com/ Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org Link: https://patch.msgid.link/1107124.1768903985@warthog.procyon.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Tue, 20 Jan 2026 21:10:39 +0000 (23:10 +0200)]
net: dsa: fix off-by-one in maximum bridge ID determination
Prior to the blamed commit, the bridge_num range was from
0 to ds->max_num_bridges - 1. After the commit, it is from
1 to ds->max_num_bridges.
So this check:
if (bridge_num >= max)
return 0;
must be updated to:
if (bridge_num > max)
return 0;
in order to allow the last bridge_num value (==max) to be used.
This is easiest visible when a driver sets ds->max_num_bridges=1.
The observed behaviour is that even the first created bridge triggers
the netlink extack "Range of offloadable bridges exceeded" warning, and
is handled in software rather than being offloaded.
Eric Dumazet [Tue, 20 Jan 2026 16:17:44 +0000 (16:17 +0000)]
bonding: provide a net pointer to __skb_flow_dissect()
After 3cbf4ffba5ee ("net: plumb network namespace into __skb_flow_dissect")
we have to provide a net pointer to __skb_flow_dissect(),
either via skb->dev, skb->sk, or a user provided pointer.
In the following case, syzbot was able to cook a bare skb.
Taehee Yoo [Tue, 20 Jan 2026 13:39:30 +0000 (13:39 +0000)]
selftests: net: amt: wait longer for connection before sending packets
Both send_mcast4() and send_mcast6() use sleep 2 to wait for the tunnel
connection between the gateway and the relay, and for the listener
socket to be created in the LISTENER namespace.
However, tests sometimes fail because packets are sent before the
connection is fully established.
Increase the waiting time to make the tests more reliable, and use
wait_local_port_listen() to explicitly wait for the listener socket.
Andrey Vatoropin [Tue, 20 Jan 2026 11:37:47 +0000 (11:37 +0000)]
be2net: Fix NULL pointer dereference in be_cmd_get_mac_from_list
When the parameter pmac_id_valid argument of be_cmd_get_mac_from_list() is
set to false, the driver may request the PMAC_ID from the firmware of the
network card, and this function will store that PMAC_ID at the provided
address pmac_id. This is the contract of this function.
However, there is a location within the driver where both
pmac_id_valid == false and pmac_id == NULL are being passed. This could
result in dereferencing a NULL pointer.
To resolve this issue, it is necessary to pass the address of a stub
variable to the function.
This change lead to MHI WWAN device can't connect to internet.
I found a netwrok issue with kernel 6.19-rc4, but network works
well with kernel 6.18-rc1. After checking, this commit is the
root cause.
Before appliing this serial changes on MHI WWAN network, we shall
revert this change in case of v6.19 being impacted.
Linus Torvalds [Wed, 21 Jan 2026 17:34:45 +0000 (09:34 -0800)]
Merge tag 'soc-fixes-6.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull SoC fixes from Arnd Bergmann:
"The main changes are devicetree updates for qualcomm and rockchips
arm64 platforms, fixing minor mistakes in SoC and board specific
settings:
- GPIO settings for Pinephone Pro buttons
- Register ranges for rk3576 GPU
- Power domains on sc8280xp
- Clocks on qcom talos
- dtc warnings for extraneous properties, nonstandard node names and
undocument identifiers
The Tegra210 platform gets a single revert for a devicetree change
that caused a 6.19 regression.
On 32-bit Arm, we have trivial fixes for Microchip SAMA7 devicetree
files and NPCM Kconfig, as well as Andrew Jeffery being officially
listed as MAINTAINER for NPCM.
A single driver fix is for Qualcomm RPMHD power domains, bringing the
driver up to date with a devicetree change that added additional power
domains to be enabled"
* tag 'soc-fixes-6.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (27 commits)
MAINTAINERS: Add Andrew as M: to ARM/NUVOTON NPCM ARCHITECTURE
MAINTAINERS: update email address for Yixun Lan
Revert "arm64: tegra: Add interconnect properties for Tegra210"
arm64: dts: rockchip: Drop unsupported properties
arm64: dts: rockchip: Fix gpio pinctrl node names
arm64: dts: rockchip: Fix pinctrl property typo on rk3326-odroid-go3
arm64: dts: rockchip: Drop "sitronix,st7789v" fallback compatible from rk3568-wolfvision
ARM: dts: microchip: sama7d65: fix size-cells property for i2c3
ARM: dts: microchip: sama7d65: fix the ranges property for flx9
arm: npcm: drop unused Kconfig ERRATA symbol
arm64: dts: rockchip: Fix wrong register range of rk3576 gpu
arm64: dts: rockchip: Configure MCLK for analog sound on NanoPi M5
arm64: dts: rockchip: Fix headphones widget name on NanoPi M5
ARM: dts: microchip: lan966x: Fix the access to the PHYs for pcb8290
arm64: dts: rockchip: remove redundant max-link-speed from nanopi-r4s
arm64: dts: rockchip: remove dangerous max-link-speed from helios64
arm64: dts: rockchip: fix unit-address for RK3588 NPU's core1 and core2's IOMMU
arm64: dts: rockchip: Fix wifi interrupts flag on Sakura Pi RK3308B
arm64: dts: qcom: sm8650: Fix compile warnings in USB controller node
arm64: dts: qcom: sm8550: Fix compile warnings in USB controller node
...
Linus Torvalds [Wed, 21 Jan 2026 16:42:34 +0000 (08:42 -0800)]
Merge tag 'for-6.19-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- protect reading super block vs setting block size externally (found
by syzbot)
- make sure no transaction is started in read-only mode even with some
rescue mount option combinations
- fix checksum calculation of backup super blocks when block-group-tree
is enabled
- more extensive mount-time checks of device items that could be left
after device replace and attempting degraded mount
- fix build warning with -Wmaybe-uninitialized on loongarch64-gcc 12
* tag 'for-6.19-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: add extra device item checks at mount
btrfs: fix missing fields in superblock backup with BLOCK_GROUP_TREE
btrfs: reject new transactions if the fs is fully read-only
btrfs: sync read disk super and set block size
btrfs: fix Wmaybe-uninitialized warning in replay_one_buffer()
Swaraj Gaikwad [Tue, 13 Jan 2026 15:06:39 +0000 (20:36 +0530)]
slab: fix kmalloc_nolock() context check for PREEMPT_RT
On PREEMPT_RT kernels, local_lock becomes a sleeping lock. The current
check in kmalloc_nolock() only verifies we're not in NMI or hard IRQ
context, but misses the case where preemption is disabled.
When a BPF program runs from a tracepoint with preemption disabled
(preempt_count > 0), kmalloc_nolock() proceeds to call
local_lock_irqsave() which attempts to acquire a sleeping lock,
triggering:
BUG: sleeping function called from invalid context
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 6128
preempt_count: 2, expected: 0
Fix this by checking !preemptible() on PREEMPT_RT, which directly
expresses the constraint that we cannot take a sleeping lock when
preemption is disabled. This encompasses the previous checks for NMI
and hard IRQ contexts while also catching cases where preemption is
disabled.
Fixes: af92793e52c3 ("slab: Introduce kmalloc_nolock() and kfree_nolock().") Reported-by: syzbot+b1546ad4a95331b2101e@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=b1546ad4a95331b2101e Signed-off-by: Swaraj Gaikwad <swarajgaikwad1925@gmail.com> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Harry Yoo <harry.yoo@oracle.com> Link: https://patch.msgid.link/20260113150639.48407-1-swarajgaikwad1925@gmail.co Cc: <stable@vger.kernel.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Jeongjun Park [Mon, 19 Jan 2026 06:33:59 +0000 (15:33 +0900)]
netrom: fix double-free in nr_route_frame()
In nr_route_frame(), old_skb is immediately freed without checking if
nr_neigh->ax25 pointer is NULL. Therefore, if nr_neigh->ax25 is NULL,
the caller function will free old_skb again, causing a double-free bug.
Therefore, to prevent this, we need to modify it to check whether
nr_neigh->ax25 is NULL before freeing old_skb.
Cc: <stable@vger.kernel.org> Reported-by: syzbot+999115c3bf275797dc27@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/69694d6f.050a0220.58bed.0029.GAE@google.com/ Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Jeongjun Park <aha310510@gmail.com> Link: https://patch.msgid.link/20260119063359.10604-1-aha310510@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Laurent Vivier [Mon, 19 Jan 2026 07:55:18 +0000 (08:55 +0100)]
usbnet: limit max_mtu based on device's hard_mtu
The usbnet driver initializes net->max_mtu to ETH_MAX_MTU before calling
the device's bind() callback. When the bind() callback sets
dev->hard_mtu based the device's actual capability (from CDC Ethernet's
wMaxSegmentSize descriptor), max_mtu is never updated to reflect this
hardware limitation).
This allows userspace (DHCP or IPv6 RA) to configure MTU larger than the
device can handle, leading to silent packet drops when the backend sends
packet exceeding the device's buffer size.
Fix this by limiting net->max_mtu to the device's hard_mtu after the
bind callback returns.
See https://gitlab.com/qemu-project/qemu/-/issues/3268 and
https://bugs.passt.top/attachment.cgi?bugid=189
Jiawen Wu [Mon, 19 Jan 2026 06:59:35 +0000 (14:59 +0800)]
net: txgbe: remove the redundant data return in SW-FW mailbox
For these two firmware mailbox commands, in txgbe_test_hostif() and
txgbe_set_phy_link_hostif(), there is no need to read data from the
buffer.
Under the current setting, OEM firmware will cause the driver to fail to
probe. Because OEM firmware returns more link information, with a larger
OEM structure txgbe_hic_ephy_getlink. However, the current driver does
not support the OEM function. So just fix it in the way that does not
involve reading the returned data.
====================
fix some bugs in the flow director of HNS3 driver
This patchset fixes two bugs in the flow director:
1. Incorrect definition of HCLGE_FD_AD_COUNTER_NUM_M
2. Incorrect assignment of HCLGE_FD_AD_NXT_KEY
====================
Tao Wang reports that sometimes, after resume, stmmac can watchdog:
NETDEV WATCHDOG: CPU: x: transmit queue x timed out xx ms
When this occurs, the DMA transmit descriptors contain:
eth0: 221 [0x0000000876d10dd0]: 0x73660cbe 0x8 0x42 0xb04416a0
eth0: 222 [0x0000000876d10de0]: 0x77731d40 0x8 0x16a0 0x90000000
where descriptor 221 is the TSO header and 222 is the TSO payload.
tdes3 for descriptor 221 (0xb04416a0) has both bit 29 (first
descriptor) and bit 28 (last descriptor) set, which is incorrect.
The following packet also has bit 28 set, but isn't marked as a
first descriptor, and this causes the transmit DMA to stall.
This occurs because stmmac_tso_allocator() populates the first
descriptor, but does not set .last_segment correctly. There are two
places where this matters: one is later in stmmac_tso_xmit() where
we use it to update the TSO header descriptor. The other is in the
ring/chain mode clean_desc3() which is a performance optimisation.
Rather than using tx_q->tx_skbuff_dma[].last_segment to determine
whether the first descriptor entry is the only segment, calculate the
number of descriptor entries used. If there is only one descriptor,
then the first is also the last, so mark it as such.
Further work will be necessary to either eliminate .last_segment
entirely or set it correctly. Code analysis also indicates that a
similar issue exists with .is_jumbo. These will be the subject of
a future patch.
Reported-by: Tao Wang <tao03.wang@horizon.auto> Fixes: c2837423cb54 ("net: stmmac: Rework TX Coalesce logic") Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1vhq8O-00000005N5s-0Ke5@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Yang [Mon, 19 Jan 2026 15:34:36 +0000 (23:34 +0800)]
be2net: fix data race in be_get_new_eqd
In be_get_new_eqd(), statistics of pkts, protected by u64_stats_sync, are
read and accumulated in ignorance of possible u64_stats_fetch_retry()
events. Before the commit in question, these statistics were retrieved
one by one directly from queues. Fix this by reading them into temporary
variables first.
Fixes: 209477704187 ("be2net: set interrupt moderation for Skyhawk-R using EQ-DB") Signed-off-by: David Yang <mmyangfl@gmail.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20260119153440.1440578-1-mmyangfl@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Yang [Mon, 19 Jan 2026 16:27:16 +0000 (00:27 +0800)]
idpf: Fix data race in idpf_net_dim
In idpf_net_dim(), some statistics protected by u64_stats_sync, are read
and accumulated in ignorance of possible u64_stats_fetch_retry() events.
The correct way to copy statistics is already illustrated by
idpf_add_queue_stats(). Fix this by reading them into temporary variables
first.
Fixes: c2d548cad150 ("idpf: add TX splitq napi poll support") Fixes: 3a8845af66ed ("idpf: add RX splitq napi poll support") Signed-off-by: David Yang <mmyangfl@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260119162720.1463859-1-mmyangfl@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Yang [Mon, 19 Jan 2026 16:07:37 +0000 (00:07 +0800)]
net: hns3: fix data race in hns3_fetch_stats
In hns3_fetch_stats(), ring statistics, protected by u64_stats_sync, are
read and accumulated in ignorance of possible u64_stats_fetch_retry()
events. These statistics are already accumulated by
hns3_ring_stats_update(). Fix this by reading them into a temporary
buffer first.
nfc: MAINTAINERS: Orphan the NFC and look for new maintainers
NFC stack in Linux is in poor shape, with several bugs being discovered
last years via fuzzing, not much new development happening and limited
review and testing. It requires some more effort than drive-by reviews
I have been offering last one or two years.
I don't have much time nor business interests to keep looking at NFC,
so let's drop me from the maintainers to clearly indicate that more
hands are needed.
Linus Torvalds [Tue, 20 Jan 2026 23:01:15 +0000 (15:01 -0800)]
Merge tag 'devicetree-fixes-for-6.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree fixes from Rob Herring:
- Fix a refcount leak in of_alias_scan()
- Support descending into child nodes when populating nodes
in /firmware
* tag 'devicetree-fixes-for-6.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
of: fix reference count leak in of_alias_scan()
of: platform: Use default match table for /firmware
Linus Torvalds [Tue, 20 Jan 2026 21:32:16 +0000 (13:32 -0800)]
Merge tag 'mm-hotfixes-stable-2026-01-20-13-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
- A patch series from David Hildenbrand which fixes a few things
related to hugetlb PMD sharing
- The remainder are singletons, please see their changelogs for details
* tag 'mm-hotfixes-stable-2026-01-20-13-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm: restore per-memcg proactive reclaim with !CONFIG_NUMA
mm/kfence: fix potential deadlock in reboot notifier
Docs/mm/allocation-profiling: describe sysctrl limitations in debug mode
mm: do not copy page tables unnecessarily for VM_UFFD_WP
mm/hugetlb: fix excessive IPI broadcasts when unsharing PMD tables using mmu_gather
mm/rmap: fix two comments related to huge_pmd_unshare()
mm/hugetlb: fix two comments related to huge_pmd_unshare()
mm/hugetlb: fix hugetlb_pmd_shared()
mm: remove unnecessary and incorrect mmap lock assert
x86/kfence: avoid writing L1TF-vulnerable PTEs
mm/vma: do not leak memory when .mmap_prepare swaps the file
migrate: correct lock ordering for hugetlb file folios
panic: only warn about deprecated panic_print on write access
fs/writeback: skip AS_NO_DATA_INTEGRITY mappings in wait_sb_inodes()
mm: take into account mm_cid size for mm_struct static definitions
mm: rename cpu_bitmap field to flexible_array
mm: add missing static initializer for init_mm::mm_cid.lock
Mina Almasry [Thu, 11 Dec 2025 10:19:29 +0000 (10:19 +0000)]
idpf: read lower clock bits inside the time sandwich
PCIe reads need to be done inside the time sandwich because PCIe
writes may get buffered in the PCIe fabric and posted to the device
after the _postts completes. Doing the PCIe read inside the time
sandwich guarantees that the write gets flushed before the _postts
timestamp is taken.
Cc: lrizzo@google.com Cc: namangulati@google.com Cc: willemb@google.com Cc: intel-wired-lan@lists.osuosl.org Cc: milena.olech@intel.com Cc: jacob.e.keller@intel.com Fixes: 5cb8805d2366 ("idpf: negotiate PTP capabilities and get PTP clock") Suggested-by: Shachar Raindel <shacharr@google.com> Signed-off-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Paul Greenwalt [Mon, 29 Dec 2025 08:52:34 +0000 (03:52 -0500)]
ice: fix devlink reload call trace
Commit 4da71a77fc3b ("ice: read internal temperature sensor") introduced
internal temperature sensor reading via HWMON. ice_hwmon_init() was added
to ice_init_feature() and ice_hwmon_exit() was added to ice_remove(). As a
result if devlink reload is used to reinit the device and then the driver
is removed, a call trace can occur.
BUG: unable to handle page fault for address: ffffffffc0fd4b5d
Call Trace:
string+0x48/0xe0
vsnprintf+0x1f9/0x650
sprintf+0x62/0x80
name_show+0x1f/0x30
dev_attr_show+0x19/0x60
The call trace repeats approximately every 10 minutes when system
monitoring tools (e.g., sadc) attempt to read the orphaned hwmon sysfs
attributes that reference freed module memory.
The sequence is:
1. Driver load, ice_hwmon_init() gets called from ice_init_feature()
2. Devlink reload down, flow does not call ice_remove()
3. Devlink reload up, ice_hwmon_init() gets called from
ice_init_feature() resulting in a second instance
4. Driver unload, ice_hwmon_exit() called from ice_remove() leaving the
first hwmon instance orphaned with dangling pointer
Fix this by moving ice_hwmon_exit() from ice_remove() to
ice_deinit_features() to ensure proper cleanup symmetry with
ice_hwmon_init().
Fixes: 4da71a77fc3b ("ice: read internal temperature sensor") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Commit 1390b8b3d2be ("ice: remove duplicate call to ice_deinit_hw() on
error paths") removed ice_deinit_hw() from ice_deinit_dev(). As a result
ice_devlink_reinit_down() no longer calls ice_deinit_hw(), but
ice_devlink_reinit_up() still calls ice_init_hw(). Since the control
queues are not uninitialized, ice_init_hw() fails with -EBUSY.
Add ice_deinit_hw() to ice_devlink_reinit_down() to correspond with
ice_init_hw() in ice_devlink_reinit_up().
Fixes: 1390b8b3d2be ("ice: remove duplicate call to ice_deinit_hw() on error paths") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Cody Haas [Sat, 13 Dec 2025 00:22:26 +0000 (16:22 -0800)]
ice: Fix persistent failure in ice_get_rxfh
Several ioctl functions have the ability to call ice_get_rxfh, however
all of these ioctl functions do not provide all of the expected
information in ethtool_rxfh_param. For example, ethtool_get_rxfh_indir does
not provide an rss_key. This previously caused ethtool_get_rxfh_indir to
always fail with -EINVAL.
This change draws inspiration from i40e_get_rss to handle this
situation, by only calling the appropriate rss helpers when the
necessary information has been provided via ethtool_rxfh_param.
Fixes: b66a972abb6b ("ice: Refactor ice_set/get_rss into LUT and key specific functions") Signed-off-by: Cody Haas <chaas@riotgames.com> Closes: https://lore.kernel.org/intel-wired-lan/CAH7f-UKkJV8MLY7zCdgCrGE55whRhbGAXvgkDnwgiZ9gUZT7_w@mail.gmail.com/ Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>