neightbl_set() fetches neigh_tables[] and updates attributes under
write_lock_bh(&tbl->lock), so RTNL is not needed.
neigh_table_clear() synchronises RCU only, and rcu_dereference_rtnl()
protects nothing here.
If we released RCU after fetching neigh_tables[], there would be no
synchronisation to block neigh_table_clear() further, so RCU is held
until the end of the function.
Another option would be to protect neigh_tables[] user with SRCU
and add synchronize_srcu() in neigh_table_clear().
But, holding RCU should be fine as we hold write_lock_bh() for the
rest of neightbl_set() anyway.
Let's perform RTM_SETNEIGHTBL under RCU and drop RTNL.
NEIGH_VAR() is read locklessly in the fast path, and IPv6 ndisc uses
NEIGH_VAR_SET() locklessly.
The next patch will convert neightbl_dump_info() to RCU.
Let's annotate accesses to neigh_param with READ_ONCE() and WRITE_ONCE().
Note that ndisc_ifinfo_sysctl_change() uses &NEIGH_VAR() and we cannot
use '&' with READ_ONCE(), so NEIGH_VAR_PTR() is introduced.
Note also that NEIGH_VAR_INIT() does not need WRITE_ONCE() as it is before
parms is published. Also, the only user hippi_neigh_setup_dev() is no
longer called since commit e3804cbebb67 ("net: remove COMPAT_NET_DEV_OPS"),
which looks wrong, but probably no one uses HIPPI and RoadRunner.
====================
net: dsa: lantiq_gswip: use regmap for register access
This series refactors the lantiq_gswip driver to utilize the regmap API
for register access, replacing the previous approach of open-coding
register operations.
Using regmap paves the way for supporting different busses to access the
switch registers, for example it makes it easier to use an MDIO-based
method required to access the registers of the MaxLinear GSW1xx series
of dedicated switch ICs.
Apart from that, the use of regmap improves readability and
maintainability of the driver by standardizing register access.
When ever possible changes were made using Coccinelle semantic patches,
sometimes adjusting white space and adding line breaks when needed.
The remaining changes which were not done using semantic patches are
small and should be easy to review and verify.
The whole series has been Tested-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
====================
The 'clear' parameter of gswip_mii_mask_cfg() and gswip_mii_mask_pcdu()
is inconsistent with the semantics of regmap_write_bits() which also
applies the mask to the value to be written.
Change the semantic mask/set of the functions gswip_mii_mask_cfg() and
gswip_mii_mask_pcdu() to follow the regmap_write_bits() pattern.
Daniel Golle [Tue, 21 Oct 2025 11:16:37 +0000 (12:16 +0100)]
net: dsa: lantiq_gswip: convert accessors to use regmap
Use regmap for register access in preparation for supporting the MaxLinear
GSW1xx family of switches connected via MDIO or SPI.
Rewrite the existing accessor read-poll-timeout functions to use calls to
the regmap API for now.
Daniel Golle [Tue, 21 Oct 2025 11:16:30 +0000 (12:16 +0100)]
net: dsa: lantiq_gswip: clarify GSWIP 2.2 VLAN mode in comment
The comment above writing the default PVID incorrectly states that
"GSWIP 2.2 (GRX300) and later program here the VID directly."
The truth is that even GSWIP 2.2 and newer maintain the behavior of
GSWIP 2.1 unless the VLANMD bit in PCE Global Control Register 1 is
set ("GSWIP2.2 VLAN Mode").
Fix the misleading comment accordingly.
Eric Biggers [Wed, 22 Oct 2025 22:12:09 +0000 (15:12 -0700)]
tcp: Remove unnecessary null check in tcp_inbound_md5_hash()
The 'if (!key && hash_location)' check in tcp_inbound_md5_hash() implies
that hash_location might be null. However, later code in the function
dereferences hash_location anyway, without checking for null first.
Fortunately, there is no real bug, since tcp_inbound_md5_hash() is
called only with non-null values of hash_location.
Therefore, remove the unnecessary and misleading null check of
hash_location. This silences a Smatch static checker warning
(https://lore.kernel.org/netdev/aPi4b6aWBbBR52P1@stanley.mountain/)
Also fix the related comment at the beginning of the function.
Signed-off-by: Eric Biggers <ebiggers@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com> Link: https://patch.msgid.link/20251022221209.19716-1-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sunday Adelodun [Tue, 21 Oct 2025 19:59:06 +0000 (20:59 +0100)]
net: unix: remove outdated BSD behavior comment in unix_release_sock()
Remove the long-standing comment in unix_release_sock() that described a
behavioral difference between Linux and BSD regarding when ECONNRESET is
sent to connected UNIX sockets upon closure.
As confirmed by testing on macOS (similar to BSD behavior), ECONNRESET
is only observed for SOCK_DGRAM sockets, not for SOCK_STREAM. Meanwhile,
Linux already returns ECONNRESET in cases where a socket is closed with
unread data or is not yet accept()ed. This means the previous comment no
longer accurately describes current behavior and is misleading.
Linus Torvalds [Thu, 23 Oct 2025 17:03:18 +0000 (07:03 -1000)]
Merge tag 'net-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from can. Slim pickings, I'm guessing people haven't
really started testing.
Current release - new code bugs:
- eth: mlx5e:
- psp: avoid 'accel' NULL pointer dereference
- skip PPHCR register query for FEC histogram if not supported
Previous releases - regressions:
- bonding: update the slave array for broadcast mode
- rtnetlink: re-allow deleting FDB entries in user namespace
- eth: dpaa2: fix the pointer passed to PTR_ALIGN on Tx path
Previous releases - always broken:
- can: drop skb on xmit if device is in listen-only mode
- gro: clear skb_shinfo(skb)->hwtstamps in napi_reuse_skb()
- eth: mlx5e
- RX, fix generating skb from non-linear xdp_buff if program
trims frags
- make devcom init failures non-fatal, fix races with IPSec
Misc:
- some documentation formatting 'fixes'"
* tag 'net-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
net/mlx5: Fix IPsec cleanup over MPV device
net/mlx5: Refactor devcom to return NULL on failure
net/mlx5e: Skip PPHCR register query if not supported by the device
net/mlx5: Add PPHCR to PCAM supported registers mask
virtio-net: zero unused hash fields
net: phy: micrel: always set shared->phydev for LAN8814
vsock: fix lock inversion in vsock_assign_transport()
ovpn: use datagram_poll_queue for socket readiness in TCP
espintcp: use datagram_poll_queue for socket readiness
net: datagram: introduce datagram_poll_queue for custom receive queues
net: bonding: fix possible peer notify event loss or dup issue
net: hsr: prevent creation of HSR device with slaves from another netns
sctp: avoid NULL dereference when chunk data buffer is missing
ptp: ocp: Fix typo using index 1 instead of i in SMA initialization loop
net: ravb: Ensure memory write completes before ringing TX doorbell
net: ravb: Enforce descriptor type ordering
net: hibmcge: select FIXED_PHY
net: dlink: use dev_kfree_skb_any instead of dev_kfree_skb
Documentation: networking: ax25: update the mailing list info.
net: gro_cells: fix lock imbalance in gro_cells_receive()
...
Linus Torvalds [Thu, 23 Oct 2025 16:53:12 +0000 (06:53 -1000)]
Merge tag 'acpi-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These fix a fallout of a recent ACPI properties management update and
work around a compiler bug in ACPICA:
- Fix a recent coding mistake causing __acpi_node_get_property_reference()
arguments to be put in an incorrect order (Sunil V L)
- Work around bogus -Wstringop-overread warning on LoongArch since
GCC 11 in ACPICA (Xi Ruoyao)"
* tag 'acpi-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPICA: Work around bogus -Wstringop-overread warning since GCC 11
ACPI: property: Fix argument order in __acpi_node_get_property_reference()
Linus Torvalds [Thu, 23 Oct 2025 16:48:32 +0000 (06:48 -1000)]
Merge tag 'pm-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These revert a cpuidle menu governor commit leading to a performance
regression, fix an amd-pstate driver regression introduced recently,
and fix new conditional guard definitions for runtime PM.
- Add missing _RET == 0 condition to recently introduced conditional
guard definitions for runtime PM (Rafael Wysocki)
- Revert a cpuidle menu governor change that introduced a serious
performance regression on Chromebooks with Intel Jasper Lake
processors (Rafael Wysocki)
- Fix an amd-pstate driver regression leading to EPP=0 after
hibernation (Mario Limonciello)"
* tag 'pm-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: runtime: Fix conditional guard definitions
Revert "cpuidle: menu: Avoid discarding useful information"
cpufreq/amd-pstate: Fix a regression leading to EPP 0 after hibernate
Linus Torvalds [Thu, 23 Oct 2025 16:44:43 +0000 (06:44 -1000)]
Merge tag 'for-6.18-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- in send, fix duplicated rmdir operations when using extrefs
(hardlinks), receive can fail with ENOENT
- fixup of error check when reading extent root in ref-verify and
damaged roots are allowed by mount option (found by smatch)
- fix freeing partially initialized fs info (found by syzkaller)
- fix use-after-free when printing ref_tracking status of delayed
inodes
* tag 'for-6.18-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: ref-verify: fix IS_ERR() vs NULL check in btrfs_build_ref_tree()
btrfs: fix delayed_node ref_tracker use after free
btrfs: send: fix duplicated rmdir operations when using extrefs
btrfs: directly free partially initialized fs_info in btrfs_check_leaked_roots()
When we do mlx5e_detach_netdev() we eventually disable blocking events
notifier, among those events are IPsec MPV events from IB to core.
So before disabling those blocking events, make sure to also unregister
the devcom device and mark all this device operations as complete,
in order to prevent the other device from using invalid netdev
during future devcom events which could cause the trace below.
net/mlx5: Refactor devcom to return NULL on failure
Devcom device and component registration isn't always critical to the
functionality of the caller, hence the registration can fail and we can
continue working with an ERR_PTR value saved inside a variable.
In order to avoid that make sure all devcom failures return NULL.
Alexei Lazar [Wed, 22 Oct 2025 12:29:39 +0000 (15:29 +0300)]
net/mlx5: Add PPHCR to PCAM supported registers mask
Add the PPHCR bit to the port_access_reg_cap_mask field of PCAM
register to indicate that the device supports the PPHCR register
and the RS-FEC histogram feature.
Jason Wang [Wed, 22 Oct 2025 03:44:21 +0000 (11:44 +0800)]
virtio-net: zero unused hash fields
When GSO tunnel is negotiated virtio_net_hdr_tnl_from_skb() tries to
initialize the tunnel metadata but forget to zero unused rxhash
fields. This may leak information to another side. Fixing this by
zeroing the unused hash fields.
Acked-by: Michael S. Tsirkin <mst@redhat.com> Fixes: a2fb4bc4e2a6a ("net: implement virtio helpers to handle UDP GSO tunneling") Cc: <stable@vger.kernel.org> Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://patch.msgid.link/20251022034421.70244-1-jasowang@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Robert Marko [Tue, 21 Oct 2025 13:20:26 +0000 (15:20 +0200)]
net: phy: micrel: always set shared->phydev for LAN8814
Currently, during the LAN8814 PTP probe shared->phydev is only set if PTP
clock gets actually set, otherwise the function will return before setting
it.
This is an issue as shared->phydev is unconditionally being used when IRQ
is being handled, especially in lan8814_gpio_process_cap and since it was
not set it will cause a NULL pointer exception and crash the kernel.
So, simply always set shared->phydev to avoid the NULL pointer exception.
Fixes: b3f1a08fcf0d ("net: phy: micrel: Add support for PTP_PF_EXTTS for lan8814") Signed-off-by: Robert Marko <robert.marko@sartura.hr> Tested-by: Horatiu Vultur <horatiu.vultur@microchip.com> Link: https://patch.msgid.link/20251021132034.983936-1-robert.marko@sartura.hr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
vsock: fix lock inversion in vsock_assign_transport()
Syzbot reported a potential lock inversion deadlock between
vsock_register_mutex and sk_lock-AF_VSOCK when vsock_linger() is called.
The issue was introduced by commit 687aa0c5581b ("vsock: Fix
transport_* TOCTOU") which added vsock_register_mutex locking in
vsock_assign_transport() around the transport->release() call, that can
call vsock_linger(). vsock_assign_transport() can be called with sk_lock
held. vsock_linger() calls sk_wait_event() that temporarily releases and
re-acquires sk_lock. During this window, if another thread hold
vsock_register_mutex while trying to acquire sk_lock, a circular
dependency is created.
Fix this by releasing vsock_register_mutex before calling
transport->release() and vsock_deassign_transport(). This is safe
because we don't need to hold vsock_register_mutex while releasing the
old transport, and we ensure the new transport won't disappear by
obtaining a module reference first via try_module_get().
====================
fix poll behaviour for TCP-based tunnel protocols
This patch series introduces a polling function for datagram-style
sockets that operates on custom skb queues, and updates ovpn (the
OpenVPN data-channel offload module) and espintcp (the TCP Encapsulation
of IKE and IPsec Packets implementation) to use it accordingly.
Protocols like the aforementioned one decapsulate packets received over
TCP and deliver userspace-bound data through a separate skb queue, not
the standard sk_receive_queue. Previously, both relied on
datagram_poll(), which would signal readiness based on non-userspace
packets, leading to misleading poll results and unnecessary recv
attempts in userspace.
Patch 1 introduces datagram_poll_queue(), a variant of datagram_poll()
that accepts an explicit receive queue. This builds on the approach
introduced in commit b50b058, which extended other skb-related functions
to support custom queues. Patch 2 and 3 update espintcp_poll() and
ovpn_tcp_poll() respectively to use this helper, ensuring readiness is
only signaled when userspace data is available.
Each patch is self-contained and the ovpn one includes rationale and
lifecycle enforcement where appropriate.
====================
Ralf Lici [Tue, 21 Oct 2025 10:09:42 +0000 (12:09 +0200)]
ovpn: use datagram_poll_queue for socket readiness in TCP
openvpn TCP encapsulation uses a custom queue to deliver packets to
userspace. Currently it relies on datagram_poll, which checks
sk_receive_queue, leading to false readiness signals when that queue
contains non-userspace packets.
Switch ovpn_tcp_poll to use datagram_poll_queue with the peer's
user_queue, ensuring poll only signals readiness when userspace data is
actually available. Also refactor ovpn_tcp_poll in order to enforce the
assumption we can make on the lifetime of ovpn_sock and peer.
Ralf Lici [Tue, 21 Oct 2025 10:09:41 +0000 (12:09 +0200)]
espintcp: use datagram_poll_queue for socket readiness
espintcp uses a custom queue (ike_queue) to deliver packets to
userspace. The polling logic relies on datagram_poll, which checks
sk_receive_queue, which can lead to false readiness signals when that
queue contains non-userspace packets.
Switch espintcp_poll to use datagram_poll_queue with ike_queue, ensuring
poll only signals readiness when userspace data is actually available.
Ralf Lici [Tue, 21 Oct 2025 10:09:40 +0000 (12:09 +0200)]
net: datagram: introduce datagram_poll_queue for custom receive queues
Some protocols using TCP encapsulation (e.g., espintcp, openvpn) deliver
userspace-bound packets through a custom skb queue rather than the
standard sk_receive_queue.
Introduce datagram_poll_queue that accepts an explicit receive queue,
and convert datagram_poll into a wrapper around datagram_poll_queue.
This allows protocols with custom skb queues to reuse the core polling
logic without relying on sk_receive_queue.
Cc: Sabrina Dubroca <sd@queasysnail.net> Cc: Antonio Quartulli <antonio@openvpn.net> Signed-off-by: Ralf Lici <ralf@mandelbit.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Antonio Quartulli <antonio@openvpn.net> Link: https://patch.msgid.link/20251021100942.195010-2-ralf@mandelbit.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Merge an ACPI device properties handling change fixing the order of
__acpi_node_get_property_reference() arguments broken by a recent
update (Sunil V L)
* 'acpi-property':
ACPI: property: Fix argument order in __acpi_node_get_property_reference()
- Revert a cpuidle menu governor change that introduced a serious
performance regression on Chromebooks with Intel Jasper Lake
processors (Rafael Wysocki)
- Fix an amd-pstate driver regression leading to EPP=0 after
hibernation (Mario Limonciello)
Horatiu Vultur [Tue, 21 Oct 2025 07:07:26 +0000 (09:07 +0200)]
net: phy: micrel: Add support for non PTP SKUs for lan8814
The lan8814 has 4 different SKUs and for 2 of these SKUs the PTP is
disabled. All these SKUs have the same value in the register 2 and 3.
Meaning that we can't differentiate them based on device id, therefore
check the SKU register and based on this allow or not to create a PTP
device.
Tonghao Zhang [Tue, 21 Oct 2025 05:09:33 +0000 (13:09 +0800)]
net: bonding: fix possible peer notify event loss or dup issue
If the send_peer_notif counter and the peer event notify are not synchronized.
It may cause problems such as the loss or dup of peer notify event.
Before this patch:
- If should_notify_peers is true and the lock for send_peer_notif-- fails, peer
event may be sent again in next mii_monitor loop, because should_notify_peers
is still true.
- If should_notify_peers is true and the lock for send_peer_notif-- succeeded,
but the lock for peer event fails, the peer event will be lost.
This patch locks the RTNL for send_peer_notif, events, and commit simultaneously.
Fixes: 07a4ddec3ce9 ("bonding: add an option to specify a delay between peer notifications") Cc: Jay Vosburgh <jv@jvosburgh.net> Cc: Andrew Lunn <andrew+netdev@lunn.ch> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Hangbin Liu <liuhangbin@gmail.com> Cc: Nikolay Aleksandrov <razor@blackwall.org> Cc: Vincent Bernat <vincent@bernat.ch> Cc: <stable@vger.kernel.org> Signed-off-by: Tonghao Zhang <tonghao@bamaicloud.com> Acked-by: Jay Vosburgh <jv@jvosburgh.net> Link: https://patch.msgid.link/20251021050933.46412-1-tonghao@bamaicloud.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Markus Elfring [Mon, 20 Oct 2025 13:46:11 +0000 (15:46 +0200)]
net: ti: icssg-prueth: Omit a variable reassignment in prueth_netdev_init()
An error code was assigned to a variable and checked accordingly.
This value was passed to a dev_err_probe() call in an if branch.
This function is documented in the way that the same value is returned.
Thus delete two redundant variable reassignments.
The source code was transformed by using the Coccinelle software.
net: hsr: prevent creation of HSR device with slaves from another netns
HSR/PRP driver does not handle correctly having slaves/interlink devices
in a different net namespace. Currently, it is possible to create a HSR
link in a different net namespace than the slaves/interlink with the
following command:
ip link add hsr0 netns hsr-ns type hsr slave1 eth1 slave2 eth2
As there is no use-case on supporting this scenario, enforce that HSR
device link matches netns defined by IFLA_LINK_NETNSID.
The iproute2 command mentioned above will throw the following error:
Error: hsr: HSR slaves/interlink must be on the same net namespace than HSR link.
Fixes: f421436a591d ("net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0)") Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Link: https://patch.msgid.link/20251020135533.9373-1-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pei Xiao [Tue, 21 Oct 2025 09:42:27 +0000 (17:42 +0800)]
eth: fbnic: fix integer overflow warning in TLV_MAX_DATA definition
The TLV_MAX_DATA macro calculates (PAGE_SIZE - 512) which can exceed
the maximum value of a 16-bit unsigned integer on architectures with
large page sizes, causing compiler warnings:
drivers/net/ethernet/meta/fbnic/fbnic_tlv.h:83:24: warning: conversion
from 'long unsigned int' to 'short unsigned int' changes value from
'261632' to '65024' [-Woverflow]
Fix this by explicitly masking the result to 16 bits using bitwise AND
with 0xFFFF, ensuring the value fits within the expected data type
while maintaining the intended behavior for normal page sizes.
This preserves the existing functionality while eliminating the
compiler warning and potential undefined behavior from integer
truncation.
-Wflex-array-member-not-at-end was introduced in GCC-14, and we are
getting ready to enable it, globally.
Use regular arrays instead of flexible-array members (they're not
really needed in this case) in a couple of unions, and fix the
following warnings:
1 drivers/net/ethernet/spacemit/k1_emac.c:122:42: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
1 drivers/net/ethernet/spacemit/k1_emac.c:122:32: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
1 drivers/net/ethernet/spacemit/k1_emac.c:121:42: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
1 drivers/net/ethernet/spacemit/k1_emac.c:121:32: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Vivian Wang <wangruikang@iscas.ac.cn> Link: https://patch.msgid.link/aPd0YjO-oP60Lgvj@kspp Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alexey Simakov [Tue, 21 Oct 2025 13:00:36 +0000 (16:00 +0300)]
sctp: avoid NULL dereference when chunk data buffer is missing
chunk->skb pointer is dereferenced in the if-block where it's supposed
to be NULL only.
chunk->skb can only be NULL if chunk->head_skb is not. Check for frag_list
instead and do it just before replacing chunk->skb. We're sure that
otherwise chunk->skb is non-NULL because of outer if() condition.
Jiasheng Jiang [Tue, 21 Oct 2025 18:24:56 +0000 (18:24 +0000)]
ptp: ocp: Fix typo using index 1 instead of i in SMA initialization loop
In ptp_ocp_sma_fb_init(), the code mistakenly used bp->sma[1]
instead of bp->sma[i] inside a for-loop, which caused only SMA[1]
to have its DIRECTION_CAN_CHANGE capability cleared. This led to
inconsistent capability flags across SMA pins.
Replace the has_gmac, has_gmac4 and has_xgmac ints, of which only one
can be set when matching a core to its driver backend, with an
enumerated type carrying the DWMAC core type.
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Acked-by: Chen-Yu Tsai <wens@kernel.org> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com> Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Link: https://patch.msgid.link/E1vB6ld-0000000BIPy-2Qi4@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This series addresses several issues in the Renesas Ethernet AVB (ravb)
driver related descriptor ordering.
A potential ordering hazard in descriptor setup could cause
the DMA engine to start prematurely, leading to TX stalls on some
platforms.
The series includes the following changes:
Enforce descriptor type ordering to prevent early DMA start
Ensure proper write ordering of TX descriptor type fields to prevent the
DMA engine from observing an incomplete descriptor chain. This fixes
observed TX stalls on RZ/G2L platforms running RT kernels.
Tested on R/G1x Gen2, RZ/G2x Gen3 and RZ/G2L family hardware.
====================
Lad Prabhakar [Fri, 17 Oct 2025 15:18:30 +0000 (16:18 +0100)]
net: ravb: Ensure memory write completes before ringing TX doorbell
Add a final dma_wmb() barrier before triggering the transmit request
(TCCR_TSRQ) to ensure all descriptor and buffer writes are visible to
the DMA engine.
According to the hardware manual, a read-back operation is required
before writing to the doorbell register to guarantee completion of
previous writes. Instead of performing a dummy read, a dma_wmb() is
used to both enforce the same ordering semantics on the CPU side and
also to ensure completion of writes.
Lad Prabhakar [Fri, 17 Oct 2025 15:18:29 +0000 (16:18 +0100)]
net: ravb: Enforce descriptor type ordering
Ensure the TX descriptor type fields are published in a safe order so the
DMA engine never begins processing a descriptor chain before all descriptor
fields are fully initialised.
For multi-descriptor transmits the driver writes DT_FEND into the last
descriptor and DT_FSTART into the first. The DMA engine begins processing
when it observes DT_FSTART. Move the dma_wmb() barrier so it executes
immediately after DT_FEND and immediately before writing DT_FSTART
(and before DT_FSINGLE in the single-descriptor case). This guarantees
that all prior CPU writes to the descriptor memory are visible to the
device before DT_FSTART is seen.
This avoids a situation where compiler/CPU reordering could publish
DT_FSTART ahead of DT_FEND or other descriptor fields, allowing the DMA to
start on a partially initialised chain and causing corrupted transmissions
or TX timeouts. Such a failure was observed on RZ/G2L with an RT kernel as
transmit queue timeouts and device resets.
Linus Torvalds [Thu, 23 Oct 2025 01:00:34 +0000 (15:00 -1000)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"All driver fixes. The big change is the storvsc one to rejig the
hyper-v channel handling to be more efficient for SMP virtual
machines"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: phy: dt-bindings: Add QMP UFS PHY compatible for Kaanapali
scsi: ufs: qcom: dt-bindings: Document the Kaanapali UFS controller
scsi: libfc: Prevent integer overflow in fc_fcp_recv_data()
scsi: qla4xxx: Fix typos in comments
scsi: storvsc: Prefer returning channel with the same CPU as on the I/O issuing CPU
Linus Torvalds [Thu, 23 Oct 2025 00:57:35 +0000 (14:57 -1000)]
Merge tag 'mm-hotfixes-stable-2025-10-22-12-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull hotfixes from Andrew Morton:
"17 hotfixes. 12 are cc:stable and 14 are for MM.
There's a two-patch DAMON series from SeongJae Park which addresses a
missed check and possible memory leak. Apart from that it's all
singletons - please see the changelogs for details"
* tag 'mm-hotfixes-stable-2025-10-22-12-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
csky: abiv2: adapt to new folio flags field
mm/damon/core: use damos_commit_quota_goal() for new goal commit
mm/damon/core: fix potential memory leak by cleaning ops_filter in damon_destroy_scheme
hugetlbfs: move lock assertions after early returns in huge_pmd_unshare()
vmw_balloon: indicate success when effectively deflating during migration
mm/damon/core: fix list_add_tail() call on damon_call()
mm/mremap: correctly account old mapping after MREMAP_DONTUNMAP remap
mm: prevent poison consumption when splitting THP
ocfs2: clear extent cache after moving/defragmenting extents
mm: don't spin in add_stack_record when gfp flags don't allow
dma-debug: don't report false positives with DMA_BOUNCE_UNALIGNED_KMALLOC
mm/damon/sysfs: dealloc commit test ctx always
mm/damon/sysfs: catch commit test ctx alloc failure
hung_task: fix warnings caused by unaligned lock pointers
Linus Torvalds [Wed, 22 Oct 2025 15:17:32 +0000 (05:17 -1000)]
Merge tag 'platform-drivers-x86-v6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Ilpo Järvinen:
- alienware-wmi-wmax:
- Fix NULL pointer dereference in sleep handlers
- Add AWCC support to Dell G15 5530
- mellanox: mlxbf-pmc: add sysfs_attr_init() to count_clock init
* tag 'platform-drivers-x86-v6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: alienware-wmi-wmax: Add AWCC support to Dell G15 5530
MAINTAINERS: add Denis Benato as maintainer for asus notebooks
platform/mellanox: mlxbf-pmc: add sysfs_attr_init() to count_clock init
platform/x86: alienware-wmi-wmax: Fix NULL pointer dereference in sleep handlers
Since pm_runtime_get_active() returns 0 on success, all of the
DEFINE_GUARD_COND() macros in pm_runtime.h need the "_RET == 0"
condition at the end of the argument list or they would not work
correctly.
Leo Martins [Mon, 20 Oct 2025 23:16:15 +0000 (16:16 -0700)]
btrfs: fix delayed_node ref_tracker use after free
Move the print before releasing the delayed node.
In my initial testing there was a bug that was causing delayed_nodes
to not get freed which is why I put the print after the release. This
obviously neglects the case where the delayed node is properly freed.
Add condition to make sure we only print if we have more than one
reference to the delayed_node to prevent printing when we only have
the reference taken in btrfs_kill_all_delayed_nodes().
Fixes: b767a28d6154 ("btrfs: print leaked references in kill_all_delayed_nodes()") Tested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Leo Martins <loemra.dev@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
That commit broke cache=mmap, a mode that doesn't cache metadata,
but still has writeback cache.
In commit 290434474c33 ("fs/9p: Refresh metadata in d_revalidate
for uncached mode too") we considered metadata cache to be enough to
not look at the server, but in writeback cache too looking at the server
size would make the vfs consider the file has been truncated before the
data has been flushed out, making the following repro fail (nothing is
ever read back, the resulting file ends up with no data written)
```
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
char buf[4096];
int main(int argc, char *argv[])
{
int ret, i;
int fdw, fdr;
if (fdr < 0) {
fprintf(stderr, "cannot open fdr\n");
close(fdw);
return 1;
}
for (i = 0; i < 10; i++) {
ret = read(fdr, buf, sizeof(buf));
fprintf(stderr, "i: %d, read returns %d\n", i, ret);
}
close(fdr);
close(fdw);
return 0;
}
```
There is a fix for this particular reproducer but it looks like there
are other problems around metadata refresh (e.g. around file rename), so
revert this to avoid d_revalidate in uncached mode for now.
Heiner Kallweit [Mon, 20 Oct 2025 06:54:54 +0000 (08:54 +0200)]
net: hibmcge: select FIXED_PHY
hibmcge uses fixed_phy_register() et al, but doesn't cater for the case
that hibmcge is built-in and fixed_phy is a module. To solve this
select FIXED_PHY.
Fixes: 1d7cd7a9c69c ("net: hibmcge: support scenario without PHY") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Jijie Shao <shaojijie@huawei.com> Link: https://patch.msgid.link/c4fc061f-b6d5-418b-a0dc-6b238cdbedce@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This issue can occur when the link state changes from off to on
(e.g., plugging or unplugging the LAN cable) while transmitting a
packet. If the skb has a destructor, a warning message may be
printed in this situation.
Shi Hao [Sat, 18 Oct 2025 05:25:41 +0000 (10:55 +0530)]
eth: 3c515: replace cleanup_module with __exit
update old legacy cleanup_module from the file with __exit
module as per kernel code practices and restore the #ifdef
MODULE condition to allow successful compilation as a built
-in driver.
The file had an old cleanup_module still in use which could
be updated with __exit module function although its init_module
is indeed newer however the cleanup_module was still
using the older version of exit.
To set proper exit module function replace cleanup_module
with __exit corkscrew_exit_module to align it to the
kernel code consistency.
====================
net: common feature compute for upper interface
Some high-level virtual drivers need to compute features from their
lower devices, but each currently has its own implementation and may
miss some feature computations. This patch set introduces a common function
to compute features for such devices.
Currently, bonding, team, and bridge have been updated to use the new
helper.
====================
Hangbin Liu [Fri, 17 Oct 2025 03:41:55 +0000 (03:41 +0000)]
net: bridge: use common function to compute the features
Previously, bridge ignored all features propagation and DST retention,
only handling explicitly the GSO limits.
By switching to the new helper netdev_compute_master_upper_features(), the bridge
now expose additional features, depending on the lowers capabilities.
Since br_set_gso_limits() is already covered by the helper, it can be
removed safely.
Bridge has it's own way to update needed_headroom. So we don't need to
update it in the helper.
Hangbin Liu [Fri, 17 Oct 2025 03:41:54 +0000 (03:41 +0000)]
team: use common function to compute the features
Use the new helper netdev_compute_master_upper_features() to compute the
team device features. This helper performs both the feature computation
and the netdev_change_features() call.
Note that such change replace the lower layer traversing currently done
using team->port_list with netdev_for_each_lower_dev(). Such change is
safe as `port_list` contains exactly the same elements as
`team->dev->adj_list.lower` and the helper is always invoked under the
RTNL lock.
With this change, the explicit netdev_change_features() in team_add_slave()
can be safely removed, as team_port_add() already takes care of the
notification via netdev_compute_master_upper_features(), and same thing for
team_del_slave()
This also fixes missing computations for MPLS, XFRM, and TSO/GSO partial
features.
Hangbin Liu [Fri, 17 Oct 2025 03:41:53 +0000 (03:41 +0000)]
bonding: use common function to compute the features
Use the new functon netdev_compute_master_upper_features() to compute the bonding
features.
Note that bond_compute_features() currently uses bond_for_each_slave()
to traverse the lower devices list, and that is just a macro wrapper of
netdev_for_each_lower_private(). We use similar helper
netdev_for_each_lower_dev() in netdev_compute_master_upper_features() to
iterate the slave device, as there is not need to get the private data.
Hangbin Liu [Fri, 17 Oct 2025 03:41:52 +0000 (03:41 +0000)]
net: add a common function to compute features for upper devices
Some high level software drivers need to compute features from lower
devices. But each has their own implementations and may lost some
feature compute. Let's use one common function to compute features
for kinds of these devices.
The new helper uses the current bond implementation as the reference
one, as the latter already handles all the relevant aspects: netdev
features, TSO limits and dst retention.
Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20251017034155.61990-2-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 22 Oct 2025 00:42:54 +0000 (17:42 -0700)]
Merge tag 'linux-can-fixes-for-6.18-20251020' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2025-10-20
All patches are by me. The first 3 update the bxcan, esd and rockchip
driver to drop skbs in xmit of the device is in listen only mode.
The last patch targets the CAN netlink implementation to allow the
disabling of automatic restart after Bus-Off, even if the a driver
doesn't implement that callback.
* tag 'linux-can-fixes-for-6.18-20251020' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: netlink: can_changelink(): allow disabling of automatic restart
can: rockchip-canfd: rkcanfd_start_xmit(): use can_dev_dropped_skb() instead of can_dropped_invalid_skb()
can: esd: acc_start_xmit(): use can_dev_dropped_skb() instead of can_dropped_invalid_skb()
can: bxcan: bxcan_start_xmit(): use can_dev_dropped_skb() instead of can_dropped_invalid_skb()
====================
Given the introduction of @have_bh_lock variable, it seems the author
intent was to have the local_unlock_nested_bh() after the @unlock label.
Fixes: 25718fdcbdd2 ("net: gro_cells: Use nested-BH locking for gro_cell") Reported-by: syzbot+f9651b9a8212e1c8906f@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/68f65eb9.a70a0220.205af.0034.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20251020161114.1891141-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alok Tiwari [Mon, 20 Oct 2025 17:09:13 +0000 (10:09 -0700)]
devlink: region: correct port region lookup to use port_ops
The function devlink_port_region_get_by_name() incorrectly uses
region->ops->name to compare the region name. as it is not any critical
impact as ops and port_ops define as union for devlink_region but as per
code logic it should refer port_ops here.
No functional impact as ops and port_ops are part of same union,
and name is the first member of both.
Update it to use region->port_ops->name to properly reference
the name of the devlink port region.
====================
mptcp: handle late ADD_ADDR + selftests skip
Here are a few independent fixes related to MPTCP and its selftests:
- Patch 1: correctly handle ADD_ADDR being received after the switch to
'fully-established'. A fix for another recent fix backported up to
v5.14.
- Patches 2-5: properly mark some MPTCP Join subtests as 'skipped' if
the tested kernel doesn't support the feature being validated. Some
fixes for up to v5.13, v5.18, v6.11 and v6.18-rc1 respectively.
====================
mptcp: pm: in-kernel: C-flag: handle late ADD_ADDR
The special C-flag case expects the ADD_ADDR to be received when
switching to 'fully-established'. But for various reasons, the ADD_ADDR
could be sent after the "4th ACK", and the special case doesn't work.
On NIPA, the new test validating this special case for the C-flag failed
a few times, e.g.
102 default limits, server deny join id 0
syn rx [FAIL] got 0 JOIN[s] syn rx expected 2
Server ns stats
(...)
MPTcpExtAddAddrTx 1
MPTcpExtEchoAdd 1
synack rx [FAIL] got 0 JOIN[s] synack rx expected 2
ack rx [FAIL] got 0 JOIN[s] ack rx expected 2
join Rx [FAIL] see above
syn tx [FAIL] got 0 JOIN[s] syn tx expected 2
join Tx [FAIL] see above
I had a suspicion about what the issue could be: the ADD_ADDR might have
been received after the switch to the 'fully-established' state. The
issue was not easy to reproduce. The packet capture shown that the
ADD_ADDR can indeed be sent with a delay, and the client would not try
to establish subflows to it as expected.
A simple fix is not to mark the endpoints as 'used' in the C-flag case,
when looking at creating subflows to the remote initial IP address and
port. In this case, there is no need to try.
Note: newly added fullmesh endpoints will still continue to be used as
expected, thanks to the conditions behind mptcp_pm_add_addr_c_flag_case.
Move the corresponding check for full compress indexes to
`z_erofs_load_lcluster_from_disk()` to also cover subpage compact
compress indexes.
It also fixes the position of `m->type >= Z_EROFS_LCLUSTER_TYPE_MAX`
check, since it should be placed right after
`z_erofs_load_{compact,full}_lcluster()`.
Fixes: 8d2517aaeea3 ("erofs: fix up compacted indexes for block size < 4096") Fixes: 1a5223c182fd ("erofs: do sanity check on m->type in z_erofs_load_compact_lcluster()") Reported-by: Robert Morris <rtm@csail.mit.edu> Closes: https://lore.kernel.org/r/35167.1760645886@localhost Reviewed-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Recent changes require the raw folio flags to be accessed via ".f". The
merge commit introducing this change adapted most architecture code but
forgot the csky abiv2.
[rppt@kernel.org: add fix for arch/csky/abiv2/cacheflush.c] Link: https://lkml.kernel.org/r/aPCE238oxAB9QcZa@kernel.org Fixes: 53fbef56e07d ("mm: introduce memdesc_flags_t") Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Guo Ren <guoren@kernel.org> Acked-by: Zi Yan <ziy@nvidia.com> Cc: Guo Ren <guoren@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SeongJae Park [Tue, 14 Oct 2025 00:18:44 +0000 (17:18 -0700)]
mm/damon/core: use damos_commit_quota_goal() for new goal commit
When damos_commit_quota_goals() is called for adding new DAMOS quota goals
of DAMOS_QUOTA_USER_INPUT metric, current_value fields of the new goals
should be also set as requested.
However, damos_commit_quota_goals() is not updating the field for the
case, since it is setting only metrics and target values using
damos_new_quota_goal(), and metric-optional union fields using
damos_commit_quota_goal_union(). As a result, users could see the first
current_value parameter that committed online with a new quota goal is
ignored. Users are assumed to commit the current_value for
DAMOS_QUOTA_USER_INPUT quota goals, since it is being used as a feedback.
Hence the real impact would be subtle. That said, this is obviously not
intended behavior.
Fix the issue by using damos_commit_quota_goal() which sets all quota goal
parameters, instead of damos_commit_quota_goal_union(), which sets only
the union fields.
Link: https://lkml.kernel.org/r/20251014001846.279282-1-sj@kernel.org Fixes: 1aef9df0ee90 ("mm/damon/core: commit damos_quota_goal->nid") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> [6.16+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Enze Li [Tue, 14 Oct 2025 08:42:25 +0000 (16:42 +0800)]
mm/damon/core: fix potential memory leak by cleaning ops_filter in damon_destroy_scheme
Currently, damon_destroy_scheme() only cleans up the filter list but
leaves ops_filter untouched, which could lead to memory leaks when a
scheme is destroyed.
This patch ensures both filter and ops_filter are properly freed in
damon_destroy_scheme(), preventing potential memory leaks.
Link: https://lkml.kernel.org/r/20251014084225.313313-1-lienze@kylinos.cn Fixes: ab82e57981d0 ("mm/damon/core: introduce damos->ops_filters") Signed-off-by: Enze Li <lienze@kylinos.cn> Reviewed-by: SeongJae Park <sj@kernel.org> Tested-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
hugetlbfs: move lock assertions after early returns in huge_pmd_unshare()
When hugetlb_vmdelete_list() processes VMAs during truncate operations, it
may encounter VMAs where huge_pmd_unshare() is called without the required
shareable lock. This triggers an assertion failure in
hugetlb_vma_assert_locked().
The previous fix in commit dd83609b8898 ("hugetlbfs: skip VMAs without
shareable locks in hugetlb_vmdelete_list") skipped entire VMAs without
shareable locks to avoid the assertion. However, this prevented pages
from being unmapped and freed, causing a regression in
fallocate(PUNCH_HOLE) operations where pages were not freed immediately,
as reported by Mark Brown.
Instead of checking locks in the caller or skipping VMAs, move the lock
assertions in huge_pmd_unshare() to after the early return checks. The
assertions are only needed when actual PMD unsharing work will be
performed. If the function returns early because sz != PMD_SIZE or the
PMD is not shared, no locks are required and assertions should not fire.
This approach reverts the VMA skipping logic from commit dd83609b8898
("hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list")
while moving the assertions to avoid the assertion failure, keeping all
the logic within huge_pmd_unshare() itself and allowing page unmapping and
freeing to proceed for all VMAs.
Link: https://lkml.kernel.org/r/20251014113344.21194-1-kartikey406@gmail.com Fixes: dd83609b8898 ("hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list") Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com> Reported-by: <syzbot+f26d7c75c26ec19790e7@syzkaller.appspotmail.com> Reported-by: Mark Brown <broonie@kernel.org> Closes: https://syzkaller.appspot.com/bug?extid=f26d7c75c26ec19790e7 Suggested-by: David Hildenbrand <david@redhat.com> Suggested-by: Oscar Salvador <osalvador@suse.de> Tested-by: <syzbot+f26d7c75c26ec19790e7@syzkaller.appspotmail.com> Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
vmw_balloon: indicate success when effectively deflating during migration
When migrating a balloon page, we first deflate the old page to then
inflate the new page.
However, if inflating the new page succeeded, we effectively deflated the
old page, reducing the balloon size.
In that case, the migration actually worked: similar to migrating+
immediately deflating the new page. The old page will be freed back to
the buddy.
Right now, the core will leave the page be marked as isolated (as we
returned an error). When later trying to putback that page, we will run
into the WARN_ON_ONCE() in balloon_page_putback().
That handling was changed in commit 3544c4faccb8 ("mm/balloon_compaction:
stop using __ClearPageMovable()"); before that change, we would have
tolerated that way of handling it.
To fix it, let's just return 0 in that case, making the core effectively
just clear the "isolated" flag + freeing it back to the buddy as if the
migration succeeded. Note that the new page will also get freed when the
core puts the last reference.
Note that this also makes it all be more consistent: we will no longer
unisolate the page in the balloon driver while keeping it marked as being
isolated in migration core.
This was found by code inspection.
Link: https://lkml.kernel.org/r/20251014124455.478345-1-david@redhat.com Fixes: 3544c4faccb8 ("mm/balloon_compaction: stop using __ClearPageMovable()") Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com> Cc: Broadcom internal kernel review list <bcm-kernel-feedback-list@broadcom.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SeongJae Park [Tue, 14 Oct 2025 20:59:36 +0000 (13:59 -0700)]
mm/damon/core: fix list_add_tail() call on damon_call()
Each damon_ctx maintains callback requests using a linked list
(damon_ctx->call_controls). When a new callback request is received via
damon_call(), the new request should be added to the list. However, the
function is making a mistake at list_add_tail() invocation: putting the
new item to add and the list head to add it before, in the opposite order.
Because of the linked list manipulation implementation, the new request
can still be reached from the context's list head. But the list items
that were added before the new request are dropped from the list.
As a result, the callbacks are unexpectedly not invocated. Worse yet, if
the dropped callback requests were dynamically allocated, the memory is
leaked. Actually DAMON sysfs interface is using a dynamically allocated
repeat-mode callback request for automatic essential stats update. And
because the online DAMON parameters commit is using a non-repeat-mode
callback request, the issue can easily be reproduced, like below.
# damo start --damos_action stat --refresh_stat 1s
# damo tune --damos_action stat --refresh_stat 1s
The first command dynamically allocates the repeat-mode callback request
for automatic essential stat update. Users can see the essential stats
are automatically updated for every second, using the sysfs interface.
The second command calls damon_commit() with a new callback request that
was made for the commit. As a result, the previously added repeat-mode
callback request is dropped from the list. The automatic stats refresh
stops working, and the memory for the repeat-mode callback request is
leaked. It can be confirmed using kmemleak.
Fix the mistake on the list_add_tail() call.
Link: https://lkml.kernel.org/r/20251014205939.1206-1-sj@kernel.org Fixes: 004ded6bee11 ("mm/damon: accept parallel damon_call() requests") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> [6.17+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Lorenzo Stoakes [Mon, 13 Oct 2025 16:58:36 +0000 (17:58 +0100)]
mm/mremap: correctly account old mapping after MREMAP_DONTUNMAP remap
Commit b714ccb02a76 ("mm/mremap: complete refactor of move_vma()")
mistakenly introduced a new behaviour - clearing the VM_ACCOUNT flag of
the old mapping when a mapping is mremap()'d with the MREMAP_DONTUNMAP
flag set.
While we always clear the VM_LOCKED and VM_LOCKONFAULT flags for the old
mapping (the page tables have been moved, so there is no data that could
possibly be locked in memory), there is no reason to touch any other VMA
flags.
This is because after the move the old mapping is in a state as if it were
freshly mapped. This implies that the attributes of the mapping ought to
remain the same, including whether or not the mapping is accounted.
Link: https://lkml.kernel.org/r/20251013165836.273113-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Fixes: b714ccb02a76 ("mm/mremap: complete refactor of move_vma()") Reviewed-by: Pedro Falcato <pfalcato@suse.de> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Eric Dumazet [Fri, 17 Oct 2025 13:37:12 +0000 (13:37 +0000)]
net: avoid extra access to sk->sk_wmem_alloc in sock_wfree()
UDP TX packets destructor is sock_wfree().
It suffers from a cache line bouncing in sock_def_write_space_wfree().
Instead of reading sk->sk_wmem_alloc after we just did an atomic RMW
on it, use __refcount_sub_and_test() to get the old value for free,
and pass the new value to sock_def_write_space_wfree().
====================
net: airoha: Add AN7583 ethernet controller support
Introduce support for AN7583 ethernet controller to airoha-eth dirver.
The main differences between EN7581 and AN7583 is the latter runs a
single PPE module while EN7581 runs two of them. Moreover PPE SRAM in
AN7583 SoC is reduced to 8K (while SRAM is 16K on EN7581).