Guenter Roeck [Wed, 25 Mar 2026 01:54:11 +0000 (18:54 -0700)]
hwmon: (pmbus) Introduce the concept of "write-only" attributes
Attributes intended to clear sensor history are intended to be writeable
only. Reading those attributes today results in reporting more or less
random values. To avoid ABI surprises, have those attributes explicitly
return 0 when reading.
Fixes: 787c095edaa9d ("hwmon: (pmbus/core) Add support for rated attributes") Reviewed-by: Sanman Pradhan <psanman@juniper.net> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Guenter Roeck [Tue, 24 Mar 2026 23:41:07 +0000 (16:41 -0700)]
hwmon: (pmbus) Mark lowest/average/highest/rated attributes as read-only
Writing those attributes is not supported, so mark them as read-only.
Prior to this change, attempts to write into these attributes returned
an error.
Mark boolean fields in struct pmbus_limit_attr and in struct
pmbus_sensor_attr as bit fields to reduce configuration data size.
The data is scanned only while probing, so performance is not a concern.
Fixes: 6f183d33a02e6 ("hwmon: (pmbus) Add support for peak attributes") Reviewed-by: Sanman Pradhan <psanman@juniper.net> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Alexander Popov [Tue, 24 Mar 2026 22:46:02 +0000 (01:46 +0300)]
wifi: virt_wifi: remove SET_NETDEV_DEV to avoid use-after-free
Currently we execute `SET_NETDEV_DEV(dev, &priv->lowerdev->dev)` for
the virt_wifi net devices. However, unregistering a virt_wifi device in
netdev_run_todo() can happen together with the device referenced by
SET_NETDEV_DEV().
It can result in use-after-free during the ethtool operations performed
on a virt_wifi device that is currently being unregistered. Such a net
device can have the `dev.parent` field pointing to the freed memory,
but ethnl_ops_begin() calls `pm_runtime_get_sync(dev->dev.parent)`.
Let's remove SET_NETDEV_DEV for virt_wifi to avoid bugs like this:
==================================================================
BUG: KASAN: slab-use-after-free in __pm_runtime_resume+0xe2/0xf0
Read of size 2 at addr ffff88810cfc46f8 by task pm/606
Pengpeng Hou [Wed, 25 Mar 2026 00:42:45 +0000 (08:42 +0800)]
Bluetooth: btusb: clamp SCO altsetting table indices
btusb_work() maps the number of active SCO links to USB alternate
settings through a three-entry lookup table when CVSD traffic uses
transparent voice settings. The lookup currently indexes alts[] with
data->sco_num - 1 without first constraining sco_num to the number of
available table entries.
While the table only defines alternate settings for up to three SCO
links, data->sco_num comes from hci_conn_num() and is used directly.
Cap the lookup to the last table entry before indexing it so the
driver keeps selecting the highest supported alternate setting without
reading past alts[].
Fixes: baac6276c0a9 ("Bluetooth: btusb: handle mSBC audio over USB Endpoints") Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Hyunwoo Kim [Fri, 20 Mar 2026 11:23:10 +0000 (20:23 +0900)]
Bluetooth: L2CAP: Fix ERTM re-init and zero pdu_len infinite loop
l2cap_config_req() processes CONFIG_REQ for channels in BT_CONNECTED
state to support L2CAP reconfiguration (e.g. MTU changes). However,
since both CONF_INPUT_DONE and CONF_OUTPUT_DONE are already set from
the initial configuration, the reconfiguration path falls through to
l2cap_ertm_init(), which re-initializes tx_q, srej_q, srej_list, and
retrans_list without freeing the previous allocations and sets
chan->sdu to NULL without freeing the existing skb. This leaks all
previously allocated ERTM resources.
Additionally, l2cap_parse_conf_req() does not validate the minimum
value of remote_mps derived from the RFC max_pdu_size option. A zero
value propagates to l2cap_segment_sdu() where pdu_len becomes zero,
causing the while loop to never terminate since len is never
decremented, exhausting all available memory.
Fix the double-init by skipping l2cap_ertm_init() and
l2cap_chan_ready() when the channel is already in BT_CONNECTED state,
while still allowing the reconfiguration parameters to be updated
through l2cap_parse_conf_req(). Also add a pdu_len zero check in
l2cap_segment_sdu() as a safeguard.
Fixes: 96298f640104 ("Bluetooth: L2CAP: handle l2cap config request during open state") Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Hyunwoo Kim [Fri, 20 Mar 2026 11:01:26 +0000 (20:01 +0900)]
Bluetooth: L2CAP: Fix deadlock in l2cap_conn_del()
l2cap_conn_del() calls cancel_delayed_work_sync() for both info_timer
and id_addr_timer while holding conn->lock. However, the work functions
l2cap_info_timeout() and l2cap_conn_update_id_addr() both acquire
conn->lock, creating a potential AB-BA deadlock if the work is already
executing when l2cap_conn_del() takes the lock.
Move the work cancellations before acquiring conn->lock and use
disable_delayed_work_sync() to additionally prevent the works from
being rearmed after cancellation, consistent with the pattern used in
hci_conn_del().
Fixes: ab4eedb790ca ("Bluetooth: L2CAP: Fix corrupted list in hci_chan_del") Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Cen Zhang [Wed, 18 Mar 2026 12:54:03 +0000 (20:54 +0800)]
Bluetooth: btintel: serialize btintel_hw_error() with hci_req_sync_lock
btintel_hw_error() issues two __hci_cmd_sync() calls (HCI_OP_RESET
and Intel exception-info retrieval) without holding
hci_req_sync_lock(). This lets it race against
hci_dev_do_close() -> btintel_shutdown_combined(), which also runs
__hci_cmd_sync() under the same lock. When both paths manipulate
hdev->req_status/req_rsp concurrently, the close path may free the
response skb first, and the still-running hw_error path hits a
slab-use-after-free in kfree_skb().
Wrap the whole recovery sequence in hci_req_sync_lock/unlock so it
is serialized with every other synchronous HCI command issuer.
Below is the data race report and the kasan report:
BUG: data-race in __hci_cmd_sync_sk / btintel_shutdown_combined
read of hdev->req_rsp at net/bluetooth/hci_sync.c:199
by task kworker/u17:1/83:
__hci_cmd_sync_sk+0x12f2/0x1c30 net/bluetooth/hci_sync.c:200
__hci_cmd_sync+0x55/0x80 net/bluetooth/hci_sync.c:223
btintel_hw_error+0x114/0x670 drivers/bluetooth/btintel.c:254
hci_error_reset+0x348/0xa30 net/bluetooth/hci_core.c:1030
write/free by task ioctl/22580:
btintel_shutdown_combined+0xd0/0x360
drivers/bluetooth/btintel.c:3648
hci_dev_close_sync+0x9ae/0x2c10 net/bluetooth/hci_sync.c:5246
hci_dev_do_close+0x232/0x460 net/bluetooth/hci_core.c:526
BUG: KASAN: slab-use-after-free in
sk_skb_reason_drop+0x43/0x380 net/core/skbuff.c:1202
Read of size 4 at addr ffff888144a738dc
by task kworker/u17:1/83:
__hci_cmd_sync_sk+0x12f2/0x1c30 net/bluetooth/hci_sync.c:200
__hci_cmd_sync+0x55/0x80 net/bluetooth/hci_sync.c:223
btintel_hw_error+0x186/0x670 drivers/bluetooth/btintel.c:260
Fixes: 973bb97e5aee ("Bluetooth: btintel: Add generic function for handling hardware errors") Signed-off-by: Cen Zhang <zzzccc427@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Zhang Chen [Thu, 19 Mar 2026 09:32:11 +0000 (17:32 +0800)]
Bluetooth: L2CAP: Fix send LE flow credits in ACL link
When the L2CAP channel mode is L2CAP_MODE_ERTM/L2CAP_MODE_STREAMING,
l2cap_publish_rx_avail will be called and le flow credits will be sent in
l2cap_chan_rx_avail, even though the link type is ACL.
The logs in question as follows:
> ACL Data RX: Handle 129 flags 0x02 dlen 12
L2CAP: Unknown (0x16) ident 4 len 4
40 00 ed 05
< ACL Data TX: Handle 129 flags 0x00 dlen 10
L2CAP: Command Reject (0x01) ident 4 len 2
Reason: Command not understood (0x0000)
Bluetooth: Unknown BR/EDR signaling command 0x16
Bluetooth: Wrong link type (-22)
Fixes: ce60b9231b66 ("Bluetooth: compute LE flow credits based on recvbuf space") Signed-off-by: Zhang Chen <zhangchen01@kylinos.cn> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Boqun Feng [Thu, 19 Mar 2026 00:56:21 +0000 (17:56 -0700)]
rcu: Use an intermediate irq_work to start process_srcu()
Since commit c27cea4416a3 ("rcu: Re-implement RCU Tasks Trace in terms
of SRCU-fast") we switched to SRCU in BPF. However as BPF instrument can
happen basically everywhere (including where a scheduler lock is held),
call_srcu() now needs to avoid acquiring scheduler lock because
otherwise it could cause deadlock [1]. Fix this by following what the
previous RCU Tasks Trace did: using an irq_work to delay the queuing of
the work to start process_srcu().
[boqun: Apply Joel's feedback]
[boqun: Apply Andrea's test feedback]
Reported-by: Andrea Righi <arighi@nvidia.com> Closes: https://lore.kernel.org/all/abjzvz_tL_siV17s@gpd4/ Fixes: commit c27cea4416a3 ("rcu: Re-implement RCU Tasks Trace in terms of SRCU-fast") Link: https://lore.kernel.org/rcu/3c4c5a29-24ea-492d-aeee-e0d9605b4183@nvidia.com/ Suggested-by: Zqiang <qiang.zhang@linux.dev> Tested-by: Andrea Righi <arighi@nvidia.com> Tested-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Joel Fernandes <joelagnelf@nvidia.com> Signed-off-by: Boqun Feng <boqun@kernel.org>
Paul E. McKenney [Sat, 21 Mar 2026 03:29:20 +0000 (20:29 -0700)]
srcu: Push srcu_node allocation to GP when non-preemptible
When the srcutree.convert_to_big and srcutree.big_cpu_lim kernel boot
parameters specify initialization-time allocation of the srcu_node
tree for statically allocated srcu_struct structures (for example, in
DEFINE_SRCU() at build time instead of init_srcu_struct() at runtime),
init_srcu_struct_nodes() will attempt to dynamically allocate this tree
at the first run-time update-side use of this srcu_struct structure,
but while holding a raw spinlock. Because the memory allocator can
acquire non-raw spinlocks, this can result in lockdep splats.
This commit therefore uses the same SRCU_SIZE_ALLOC trick that is used
when the first run-time update-side use of this srcu_struct structure
happens before srcu_init() is called. The actual allocation then takes
place from workqueue context at the ends of upcoming SRCU grace periods.
[boqun: Adjust the sha1 of the Fixes tag]
Fixes: 175b45ed343a ("srcu: Use raw spinlocks so call_srcu() can be used under preempt_disable()") Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Boqun Feng <boqun@kernel.org>
Paul E. McKenney [Sat, 14 Mar 2026 11:12:58 +0000 (04:12 -0700)]
srcu: Use raw spinlocks so call_srcu() can be used under preempt_disable()
Tree SRCU has used non-raw spinlocks for many years, motivated by a desire
to avoid unnecessary real-time latency and the absence of any reason to
use raw spinlocks. However, the recent use of SRCU in tracing as the
underlying implementation of RCU Tasks Trace means that call_srcu()
is invoked from preemption-disabled regions of code, which in turn
requires that any locks acquired by call_srcu() or its callees must be
raw spinlocks.
This commit therefore converts SRCU's spinlocks to raw spinlocks.
[boqun: Add Fixes tag]
Reported-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Fixes: c27cea4416a3 ("rcu: Re-implement RCU Tasks Trace in terms of SRCU-fast") Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Boqun Feng <boqun@kernel.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Petr Mladek [Wed, 25 Mar 2026 12:34:18 +0000 (13:34 +0100)]
workqueue: Better describe stall check
Try to be more explicit why the workqueue watchdog does not take
pool->lock by default. Spin locks are full memory barriers which
delay anything. Obviously, they would primary delay operations
on the related worker pools.
Explain why it is enough to prevent the false positive by re-checking
the timestamp under the pool->lock.
Finally, make it clear what would be the alternative solution in
__queue_work() which is a hotter path.
Signed-off-by: Petr Mladek <pmladek@suse.com> Acked-by: Song Liu <song@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org>
Shuming Fan [Wed, 25 Mar 2026 11:04:06 +0000 (19:04 +0800)]
ASoC: SDCA: fix finding wrong entity
This patch fixes an issue like:
where searching for the entity 'FU 11' could incorrectly match 'FU 113' first.
The driver should first perform an exact match on the full string name.
If no exact match is found, it can then fall back to a partial match.
Fixes: 48fa77af2f4a ("ASoC: SDCA: Add terminal type into input/output widget name") Reviewed-by: Charles Keepax <ckeepax@opensource.cirrus.com> Signed-off-by: Shuming Fan <shumingf@realtek.com> Link: https://patch.msgid.link/20260325110406.3232420-1-shumingf@realtek.com Signed-off-by: Mark Brown <broonie@kernel.org>
Jianmin Lv [Fri, 20 Mar 2026 10:10:12 +0000 (18:10 +0800)]
MAINTAINERS: Update GPU driver maintainer information
I and Qianhai are GPU R&D engineers at Loongson, specializing
in kernel driver development. We understand that the current
Loongson GPU driver lacks dedicated maintenance resources
because of some reasons.
As Loongson GPU driver developers, we have both the capability
and the responsibility to continuously maintain the Loongson
GPU driver, ensuring minimal impact on its users. After internal
discussions, our team has decided to recommend me and Qianhai
to take over the maintenance responsibilities, and recommend
Huacai, Mingcong and Ruoyao to help to review.
And We'll continue to maintain it for current supported chips
and drive future updates according to chip support plan.
Sanman Pradhan [Wed, 25 Mar 2026 05:13:06 +0000 (05:13 +0000)]
hwmon: (adm1177) fix sysfs ABI violation and current unit conversion
The adm1177 driver exposes the current alert threshold through
hwmon_curr_max_alarm. This violates the hwmon sysfs ABI, where
*_alarm attributes are read-only status flags and writable thresholds
must use currN_max.
The driver also stores the threshold internally in microamps, while
currN_max is defined in milliamps. Convert the threshold accordingly
on both the read and write paths.
Widen the cached threshold and related calculations to 64 bits so
that small shunt resistor values do not cause truncation or overflow.
Also use 64-bit arithmetic for the mA/uA conversions, clamp writes
to the range the hardware can represent, and propagate failures from
adm1177_write_alert_thr() instead of silently ignoring them.
Update the hwmon documentation to reflect the attribute rename and
the correct units returned by the driver.
Fixes: 09b08ac9e8d5 ("hwmon: (adm1177) Add ADM1177 Hot Swap Controller and Digital Power Monitor driver") Signed-off-by: Sanman Pradhan <psanman@juniper.net> Acked-by: Nuno Sá <nuno.sa@analog.com> Link: https://lore.kernel.org/r/20260325051246.28262-1-sanman.pradhan@hpe.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Shuming Fan [Wed, 25 Mar 2026 09:20:17 +0000 (17:20 +0800)]
ASoC: SDCA: remove the max count of initialization table
The number of the initialization table may exceed 2048.
Therefore, this patch removes the limitation and allows the driver to
allocate memory dynamically based on the size of the initialization table.
Which seems to hint that the vma we are re-inserting for the ops unwind
is either invalid or overlapping with something already inserted in the
vm. It shouldn't be invalid since this is a re-insertion, so must have
worked before. Leaving the likely culprit as something already placed
where we want to insert the vma.
Following from that, for the case where we do something like a rebind in
the middle of a vma, and one or both mapped ends are already compatible,
we skip doing the rebind of those vma and set next/prev to NULL. As well
as then adjust the original unmap va range, to avoid unmapping the ends.
However, if we trigger the unwind path, we end up with three va, with
the two ends never being removed and the original va range in the middle
still being the shrunken size.
If this occurs, one failure mode is when another unwind op needs to
interact with that range, which can happen with a vector of binds. For
example, if we need to re-insert something in place of the original va.
In this case the va is still the shrunken version, so when removing it
and then doing a re-insert it can overlap with the ends, which were
never removed, triggering a warning like above, plus leaving the vm in a
bad state.
With that, we need two things here:
1) Stop nuking the prev/next tracking for the skip cases. Instead
relying on checking for skip prev/next, where needed. That way on the
unwind path, we now correctly remove both ends.
2) Undo the unmap va shrinkage, on the unwind path. With the two ends
now removed the unmap va should expand back to the original size again,
before re-insertion.
v2:
- Update the explanation in the commit message, based on an actual IGT of
triggering this issue, rather than conjecture.
- Also undo the unmap shrinkage, for the skip case. With the two ends
now removed, the original unmap va range should expand back to the
original range.
v3:
- Track the old start/range separately. vma_size/start() uses the va
info directly.
Tvrtko Ursulin [Tue, 24 Mar 2026 11:10:19 +0000 (11:10 +0000)]
drm/syncobj: Fix xa_alloc allocation flags
The xarray conversion blindly and wrongly replaced idr_alloc with xa_alloc
and kept the GFP_NOWAIT. It should have been GFP_KERNEL to account for
idr_preload it removed. Fix it.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Fixes: fec2c3c01f1c ("drm/syncobj: Convert syncobj idr to xarray") Reported-by: Himanshu Girotra <himanshu.girotra@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Himanshu Girotra <himanshu.girotra@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net> Link: https://lore.kernel.org/r/20260324111019.22467-1-tvrtko.ursulin@igalia.com
Miguel Ojeda [Wed, 25 Mar 2026 01:55:48 +0000 (02:55 +0100)]
dma-mapping: add missing `inline` for `dma_free_attrs`
Under an UML build for an upcoming series [1], I got `-Wstatic-in-inline`
for `dma_free_attrs`:
BINDGEN rust/bindings/bindings_generated.rs - due to target missing
In file included from rust/helpers/helpers.c:59:
rust/helpers/dma.c:17:2: warning: static function 'dma_free_attrs' is used in an inline function with external linkage [-Wstatic-in-inline]
17 | dma_free_attrs(dev, size, cpu_addr, dma_handle, attrs);
| ^
rust/helpers/dma.c:12:1: note: use 'static' to give inline function 'rust_helper_dma_free_attrs' internal linkage
12 | __rust_helper void rust_helper_dma_free_attrs(struct device *dev, size_t size,
| ^
| static
The issue is that `dma_free_attrs` was not marked `inline` when it was
introduced alongside the rest of the stubs.
Thus mark it.
Fixes: ed6ccf10f24b ("dma-mapping: properly stub out the DMA API for !CONFIG_HAS_DMA") Closes: https://lore.kernel.org/rust-for-linux/20260322194616.89847-1-ojeda@kernel.org/ [1] Signed-off-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20260325015548.70912-1-ojeda@kernel.org
Guangshuo Li [Mon, 23 Mar 2026 16:57:30 +0000 (00:57 +0800)]
net: mana: fix use-after-free in add_adev() error path
If auxiliary_device_add() fails, add_adev() jumps to add_fail and calls
auxiliary_device_uninit(adev).
The auxiliary device has its release callback set to adev_release(),
which frees the containing struct mana_adev. Since adev is embedded in
struct mana_adev, the subsequent fall-through to init_fail and access
to adev->id may result in a use-after-free.
Fix this by saving the allocated auxiliary device id in a local
variable before calling auxiliary_device_add(), and use that saved id
in the cleanup path after auxiliary_device_uninit().
Fixes: a69839d4327d ("net: mana: Add support for auxiliary device") Cc: stable@vger.kernel.org Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com> Link: https://patch.msgid.link/20260323165730.945365-1-lgs201920130244@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jonas Köppeler [Mon, 23 Mar 2026 17:49:20 +0000 (18:49 +0100)]
net_sched: codel: fix stale state for empty flows in fq_codel
When codel_dequeue() finds an empty queue, it resets vars->dropping
but does not reset vars->first_above_time. The reference CoDel
algorithm (Nichols & Jacobson, ACM Queue 2012) resets both:
dodeque_result codel_queue_t::dodeque(time_t now) {
...
if (r.p == NULL) {
first_above_time = 0; // <-- Linux omits this
}
...
}
Note that codel_should_drop() does reset first_above_time when called
with a NULL skb, but codel_dequeue() returns early before ever calling
codel_should_drop() in the empty-queue case. The post-drop code paths
do reach codel_should_drop(NULL) and correctly reset the timer, so a
dropped packet breaks the cycle -- but the next delivered packet
re-arms first_above_time and the cycle repeats.
For sparse flows such as ICMP ping (one packet every 200ms-1s), the
first packet arms first_above_time, the flow goes empty, and the
second packet arrives after the interval has elapsed and gets dropped.
The pattern repeats, producing sustained loss on flows that are not
actually congested.
Test: veth pair, fq_codel, BQL disabled, 30000 iptables rules in the
consumer namespace (NAPI-64 cycle ~14ms, well above fq_codel's 5ms
target), ping at 5 pps under UDP flood:
Before fix: 26% ping packet loss
After fix: 0% ping packet loss
Fix by resetting first_above_time to zero in the empty-queue path
of codel_dequeue(), matching the reference algorithm.
Fixes: 76e3cc126bb2 ("codel: Controlled Delay AQM") Fixes: d068ca2ae2e6 ("codel: split into multiple files") Co-developed-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: Jonas Köppeler <j.koeppeler@tu-berlin.de> Reported-by: Chris Arges <carges@cloudflare.com> Tested-by: Jonas Köppeler <j.koeppeler@tu-berlin.de> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/all/20260318134826.1281205-7-hawk@kernel.org/ Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260323174920.253526-1-hawk@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sabrina Dubroca [Mon, 23 Mar 2026 15:19:43 +0000 (16:19 +0100)]
rtnetlink: fix leak of SRCU struct in rtnl_link_register
Commit 6b57ff21a310 ("rtnetlink: Protect link_ops by mutex.") swapped
the EEXIST check with the init_srcu_struct, but didn't add cleanup of
the SRCU struct we just allocated in case of error.
net: lan743x: fix duplex configuration in mac_link_up
The driver does not explicitly configure the MAC duplex mode when
bringing the link up. As a result, the MAC may retain a stale duplex
setting from a previous link state, leading to duplex mismatches with
the link partner and degraded network performance.
Update lan743x_phylink_mac_link_up() to set or clear the MAC_CR_DPX_
bit according to the negotiated duplex mode.
This ensures the MAC configuration is consistent with the phylink
resolved state.
Fixes: a5f199a8d8a03 ("net: lan743x: Migrate phylib to phylink") Signed-off-by: Thangaraj Samynathan <thangaraj.s@microchip.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/20260323065345.144915-1-thangaraj.s@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
xietangxin [Thu, 12 Mar 2026 02:54:06 +0000 (10:54 +0800)]
virtio_net: Fix UAF on dst_ops when IFF_XMIT_DST_RELEASE is cleared and napi_tx is false
A UAF issue occurs when the virtio_net driver is configured with napi_tx=N
and the device's IFF_XMIT_DST_RELEASE flag is cleared
(e.g., during the configuration of tc route filter rules).
When IFF_XMIT_DST_RELEASE is removed from the net_device, the network stack
expects the driver to hold the reference to skb->dst until the packet
is fully transmitted and freed. In virtio_net with napi_tx=N,
skbs may remain in the virtio transmit ring for an extended period.
If the network namespace is destroyed while these skbs are still pending,
the corresponding dst_ops structure has freed. When a subsequent packet
is transmitted, free_old_xmit() is triggered to clean up old skbs.
It then calls dst_release() on the skb associated with the stale dst_entry.
Since the dst_ops (referenced by the dst_entry) has already been freed,
a UAF kernel paging request occurs.
fix it by adds skb_dst_drop(skb) in start_xmit to explicitly release
the dst reference before the skb is queued in virtio_net.
config_qdisc_route_filter() {
tc qdisc del dev $NETDEV root
tc qdisc add dev $NETDEV root handle 1: prio
tc filter add dev $NETDEV parent 1:0 \
protocol ip prio 100 route to 100 flowid 1:1
ip route add 192.168.1.100/32 dev $NETDEV realm 100
}
test_ns() {
ip netns add testns
ip link set $NETDEV netns testns
ip netns exec testns ifconfig $NETDEV 10.0.32.46/24
ip netns exec testns ping -c 1 10.0.32.1
ip netns del testns
}
Gao Xiang [Tue, 24 Mar 2026 15:54:07 +0000 (23:54 +0800)]
erofs: fix .fadvise() for page cache sharing
Currently, .fadvise() doesn't work well if page cache sharing is on
since shared inodes belong to a pseudo fs generated with init_pseudo(),
and sb->s_bdi is the default one &noop_backing_dev_info.
Then, generic_fadvise() will just behave as a no-op if sb->s_bdi is
&noop_backing_dev_info, but as the bdev fs (the bdev fs changes
inode_to_bdi() instead), it's actually NOT a pure memfs.
Let's generate a real bdi for erofs_ishare_mnt instead.
Fixes: d86d7817c042 ("erofs: implement .fadvise for page cache share") Reviewed-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Tejun Heo [Tue, 24 Mar 2026 20:21:47 +0000 (10:21 -1000)]
selftests/cgroup: Don't require synchronous populated update on task exit
test_cgcore_populated (test_core) and test_cgkill_{simple,tree,forkbomb}
(test_kill) check cgroup.events "populated 0" immediately after reaping
child tasks with waitpid(). This used to work because cgroup_task_exit() in
do_exit() unlinked tasks from css_sets before exit_notify() woke up
waitpid().
d245698d727a ("cgroup: Defer task cgroup unlink until after the task is done
switching out") moved the unlink to cgroup_task_dead() in
finish_task_switch(), which runs after exit_notify(). The populated counter
is now decremented after the parent's waitpid() can return, so there is no
longer a synchronous ordering guarantee. On PREEMPT_RT, where
cgroup_task_dead() is further deferred through lazy irq_work, the race
window is even larger.
The synchronous populated transition was never part of the cgroup interface
contract - it was an implementation artifact. Use cg_read_strcmp_wait() which
retries for up to 1 second, matching what these tests actually need to
verify: that the cgroup eventually becomes unpopulated after all tasks exit.
Fixes: d245698d727a ("cgroup: Defer task cgroup unlink until after the task is done switching out") Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Tejun Heo <tj@kernel.org> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Christian Brauner <brauner@kernel.org> Cc: cgroups@vger.kernel.org
Tejun Heo [Tue, 24 Mar 2026 20:21:25 +0000 (10:21 -1000)]
cgroup: Wait for dying tasks to leave on rmdir
a72f73c4dd9b ("cgroup: Don't expose dead tasks in cgroup") hid PF_EXITING
tasks from cgroup.procs so that systemd doesn't see tasks that have already
been reaped via waitpid(). However, the populated counter (nr_populated_csets)
is only decremented when the task later passes through cgroup_task_dead() in
finish_task_switch(). This means cgroup.procs can appear empty while the
cgroup is still populated, causing rmdir to fail with -EBUSY.
Fix this by making cgroup_rmdir() wait for dying tasks to fully leave. If the
cgroup is populated but all remaining tasks have PF_EXITING set (the task
iterator returns none due to the existing filter), wait for a kick from
cgroup_task_dead() and retry. The wait is brief as tasks are removed from the
cgroup's css_set between PF_EXITING assertion in do_exit() and
cgroup_task_dead() in finish_task_switch().
v2: cgroup_is_populated() true to false transition happens under css_set_lock
not cgroup_mutex, so retest under css_set_lock before sleeping to avoid
missed wakeups (Sebastian).
Fixes: a72f73c4dd9b ("cgroup: Don't expose dead tasks in cgroup") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202603222104.2c81684e-lkp@intel.com Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Michal Koutny <mkoutny@suse.com> Cc: cgroups@vger.kernel.org
s390/zcrypt: Fix memory leak with CCA cards used as accelerator
Tests showed that there is a memory leak if CCA cards are used as
accelerator for clear key RSA requests (ME and CRT). With the last
rework for the memory allocation the AP messages are allocated by
ap_init_apmsg() but for some reason on two places (ME and CRT) the
older allocation was still in place. So the first allocation simple
was never freed.
Thomas Richter [Fri, 6 Mar 2026 12:50:31 +0000 (13:50 +0100)]
s390/cpum_sf: Cap sampling rate to prevent lsctl exception
commit fcc43a7e294f ("s390/configs: Set HZ=1000") changed the interrupt
frequency of the system. On machines with heavy load and many perf event
overflows, this might lead to an exception. Dmesg displays these entries:
[112.242542] cpum_sf: Loading sampling controls failed: op 1 err -22
One line per CPU online.
The root cause is the CPU Measurement sampling facility overflow
adjustment. Whenever an overflow (too much samples per tick) occurs, the
sampling rate is adjusted and increased. This was done without observing
the maximum sampling rate limit. When the current sampling interval is
higher than the maximum sampling rate limit, the lsctl instruction raises
an exception. The error messages is the result of such an exception.
Observe the upper limit when the new sampling rate is recalculated.
Cc: stable@vger.kernel.org Fixes: 39d4a501a9ef ("s390/cpum_sf: Adjust sampling interval to avoid hitting sample limits") Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
landlock: Expand restrict flags example for ABI version 8
Add LANDLOCK_RESTRICT_SELF_TSYNC to the backwards compatibility example
for restrict flags. This introduces completeness, similar to that of
the ruleset attributes example. However, as the new example can impact
enforcement in certain cases, an appropriate warning is also included.
Additionally, I modified the two comments of the example to make them
more consistent with the ruleset attributes example's.
Linus Torvalds [Tue, 24 Mar 2026 19:41:29 +0000 (12:41 -0700)]
Merge tag 'cxl-fixes-7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull Compute Express Link (CXL) fixes from Dave Jiang:
- Adjust the startup priority of cxl_pmem to be higher than that of
cxl_acpi
- Use proper endpoint validity check upon sanitize
- Avoid incorrect DVSEC fallback when HDM decoders are enabled
- Fix CXL_ACPI and CXL_PMEM Kconfig tristate mismatch
- Fix leakage in __construct_region()
- Fix use after free of parent_port in cxl_detach_ep()
* tag 'cxl-fixes-7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl: Adjust the startup priority of cxl_pmem to be higher than that of cxl_acpi
cxl/mbox: Use proper endpoint validity check upon sanitize
cxl/hdm: Avoid incorrect DVSEC fallback when HDM decoders are enabled
cxl/acpi: Fix CXL_ACPI and CXL_PMEM Kconfig tristate mismatch
cxl/region: Fix leakage in __construct_region()
cxl/port: Fix use after free of parent_port in cxl_detach_ep()
thermal: intel: int340x: soc_slider: Set offset only for balanced mode
The slider offset can be set via debugfs for balanced mode. The offset
should be only applicable in balanced mode. For other modes, it should
be 0 when writing to MMIO offset,
Fixes: 8306bcaba06d ("thermal: intel: int340x: Add module parameter to change slider offset") Tested-by: Erin Park <erin.park@intel.com> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: 6.18+ <stable@vger.kernel.org> # 6.18+
[ rjw: Subject and changelog tweaks ] Link: https://patch.msgid.link/20260324172346.3317145-1-srinivas.pandruvada@linux.intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Alex Deucher [Thu, 26 Feb 2026 22:12:08 +0000 (17:12 -0500)]
drm/amd/display: Fix DCE LVDS handling
LVDS does not use an HPD pin so it may be invalid. Handle
this case correctly in link encoder creation.
Fixes: 7c8fb3b8e9ba ("drm/amd/display: Add hpd_source index check for DCE60/80/100/110/112/120 link encoders") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/5012 Cc: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Cc: Roman Li <roman.li@amd.com> Reviewed-by: Roman Li <roman.li@amd.com> Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 3b5620f7ee688177fcf65cf61588c5435bce1872) Cc: stable@vger.kernel.org
Donet Tom [Mon, 23 Mar 2026 04:28:36 +0000 (09:58 +0530)]
drm/amdgpu: Handle GPU page faults correctly on non-4K page systems
During a GPU page fault, the driver restores the SVM range and then maps it
into the GPU page tables. The current implementation passes a GPU-page-size
(4K-based) PFN to svm_range_restore_pages() to restore the range.
SVM ranges are tracked using system-page-size PFNs. On systems where the
system page size is larger than 4K, using GPU-page-size PFNs to restore the
range causes two problems:
Range lookup fails:
Because the restore function receives PFNs in GPU (4K) units, the SVM
range lookup does not find the existing range. This will result in a
duplicate SVM range being created.
VMA lookup failure:
The restore function also tries to locate the VMA for the faulting address.
It converts the GPU-page-size PFN into an address using the system page
size, which results in an incorrect address on non-4K page-size systems.
As a result, the VMA lookup fails with the message: "address 0xxxx VMA is
removed".
This patch passes the system-page-size PFN to svm_range_restore_pages() so
that the SVM range is restored correctly on non-4K page systems.
Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Donet Tom <donettom@linux.ibm.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 074fe395fb13247b057f60004c7ebcca9f38ef46)
Yang Wang [Thu, 19 Mar 2026 07:36:50 +0000 (03:36 -0400)]
drm/amd/pm: disable OD_FAN_CURVE if temp or pwm range invalid for smu v14
Forcibly disable the OD_FAN_CURVE feature when temperature or PWM range is invalid,
otherwise PMFW will reject this configuration on smu v14.0.2/14.0.3.
kernel log:
[ 969.761627] amdgpu 0000:03:00.0: amdgpu: Fan curve temp setting(50) must be within [0, 0]!
[ 1010.897800] amdgpu 0000:03:00.0: amdgpu: Fan curve temp setting(50) must be within [0, 0]!
Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit ab4905d466b60f170d85e19ca2a5d2b159aeb780) Cc: stable@vger.kernel.org
drm/amdkfd: Fix NULL pointer check order in kfd_ioctl_create_process
In kfd_ioctl_create_process(), the pointer 'p' is used before checking
if it is NULL.
The code accesses p->context_id before validating 'p'. This can lead
to a possible NULL pointer dereference.
Move the NULL check before using 'p' so that the pointer is validated
before access.
Fixes the below:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_chardev.c:3177 kfd_ioctl_create_process() warn: variable dereferenced before check 'p' (see line 3174)
Fixes: cc6b66d661fd ("amdkfd: introduce new ioctl AMDKFD_IOC_CREATE_PROCESS") Cc: Zhu Lingshan <lingshan.zhu@amd.com> Cc: Felix Kuehling <felix.kuehling@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 19d4149b22f57094bfc4b86b742381b3ca394ead)
Fixes: 9ae55f030dc5 ("drm/amdgpu: Follow up change to previous drm scheduler change.") Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 8b9e5259adc385b61a6590a13b82ae0ac2bd3482)
Weiming Shi [Tue, 24 Mar 2026 16:54:59 +0000 (00:54 +0800)]
ACPI: EC: clean up handlers on probe failure in acpi_ec_setup()
When ec_install_handlers() returns -EPROBE_DEFER on reduced-hardware
platforms, it has already started the EC and installed the address
space handler with the struct acpi_ec pointer as handler context.
However, acpi_ec_setup() propagates the error without any cleanup.
The caller acpi_ec_add() then frees the struct acpi_ec for non-boot
instances, leaving a dangling handler context in ACPICA.
Any subsequent AML evaluation that accesses an EC OpRegion field
dispatches into acpi_ec_space_handler() with the freed pointer,
causing a use-after-free:
BUG: KASAN: slab-use-after-free in mutex_lock (kernel/locking/mutex.c:289)
Write of size 8 at addr ffff88800721de38 by task init/1
Call Trace:
<TASK>
mutex_lock (kernel/locking/mutex.c:289)
acpi_ec_space_handler (drivers/acpi/ec.c:1362)
acpi_ev_address_space_dispatch (drivers/acpi/acpica/evregion.c:293)
acpi_ex_access_region (drivers/acpi/acpica/exfldio.c:246)
acpi_ex_field_datum_io (drivers/acpi/acpica/exfldio.c:509)
acpi_ex_extract_from_field (drivers/acpi/acpica/exfldio.c:700)
acpi_ex_read_data_from_field (drivers/acpi/acpica/exfield.c:327)
acpi_ex_resolve_node_to_value (drivers/acpi/acpica/exresolv.c:392)
</TASK>
Allocated by task 1:
acpi_ec_alloc (drivers/acpi/ec.c:1424)
acpi_ec_add (drivers/acpi/ec.c:1692)
Freed by task 1:
kfree (mm/slub.c:6876)
acpi_ec_add (drivers/acpi/ec.c:1751)
The bug triggers on reduced-hardware EC platforms (ec->gpe < 0)
when the GPIO IRQ provider defers probing. Once the stale handler
exists, any unprivileged sysfs read that causes AML to touch an
EC OpRegion (battery, thermal, backlight) exercises the dangling
pointer.
Fix this by calling ec_remove_handlers() in the error path of
acpi_ec_setup() before clearing first_ec. ec_remove_handlers()
checks each EC_FLAGS_* bit before acting, so it is safe to call
regardless of how far ec_install_handlers() progressed:
-ENODEV (handler not installed): only calls acpi_ec_stop()
-EPROBE_DEFER (handler installed): removes handler, stops EC
Fixes: 03e9a0e05739 ("ACPI: EC: Consolidate event handler installation code") Reported-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Weiming Shi <bestswngs@gmail.com> Link: https://patch.msgid.link/20260324165458.1337233-2-bestswngs@gmail.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Linus Torvalds [Tue, 24 Mar 2026 16:12:45 +0000 (09:12 -0700)]
Merge tag 'mm-hotfixes-stable-2026-03-23-17-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM fixes from Andrew Morton:
"6 hotfixes. 2 are cc:stable. All are for MM.
All are singletons - please see the changelogs for details"
* tag 'mm-hotfixes-stable-2026-03-23-17-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm/damon/stat: monitor all System RAM resources
mm/zswap: add missing kunmap_local()
mailmap: update email address for Muhammad Usama Anjum
zram: do not slot_free() written-back slots
mm/damon/core: avoid use of half-online-committed context
mm/rmap: clear vma->anon_vma on error
Linus Torvalds [Tue, 24 Mar 2026 15:58:38 +0000 (08:58 -0700)]
Merge tag 'perf-tools-fixes-for-v7.0-2-2026-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Fix parsing 'overwrite' in command line event definitions in
big-endian machines by writing correct union member
- Fix finding default metric in 'perf stat'
- Fix relative paths for including headers in 'perf kvm stat'
- Sync header copies with the kernel sources: msr-index.h, kvm,
build_bug.h
* tag 'perf-tools-fixes-for-v7.0-2-2026-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
tools headers: Synchronize linux/build_bug.h with the kernel sources
tools headers UAPI: Sync x86's asm/kvm.h with the kernel sources
tools headers UAPI: Sync linux/kvm.h with the kernel sources
tools arch x86: Sync the msr-index.h copy with the kernel sources
perf kvm stat: Fix relative paths for including headers
perf parse-events: Fix big-endian 'overwrite' by writing correct union member
perf metricgroup: Fix metricgroup__has_metric_or_groups()
tools headers: Skip arm64 cputype.h check
Amir Goldstein [Sun, 8 Mar 2026 11:02:21 +0000 (12:02 +0100)]
ovl: fix wrong detection of 32bit inode numbers
The implicit FILEID_INO32_GEN encoder was changed to be explicit,
so we need to fix the detection.
When mounting overlayfs with upperdir and lowerdir on different ext4
filesystems, the expected kmsg log is:
overlayfs: "xino" feature enabled using 32 upper inode bits.
But instead, since the regressing commit, the kmsg log was:
overlayfs: "xino" feature enabled using 2 upper inode bits.
Fixes: e21fc2038c1b9 ("exportfs: make ->encode_fh() a mandatory method for NFS export") Cc: stable@vger.kernel.org # v6.7+ Signed-off-by: Amir Goldstein <amir73il@gmail.com>
wifi: iwlwifi: mvm: fix potential out-of-bounds read in iwl_mvm_nd_match_info_handler()
The memcpy function assumes the dynamic array notif->matches is at least
as large as the number of bytes to copy. Otherwise, results->matches may
contain unwanted data. To guarantee safety, extend the validation in one
of the checks to ensure sufficient packet length.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Cc: stable@vger.kernel.org Fixes: 5ac54afd4d97 ("wifi: iwlwifi: mvm: Add handling for scan offload match info notification") Signed-off-by: Alexey Velichayshiy <a.velichayshiy@ispras.ru> Link: https://patch.msgid.link/20260207150335.1013646-1-a.velichayshiy@ispras.ru Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Danilo Krummrich [Tue, 24 Mar 2026 00:59:15 +0000 (01:59 +0100)]
spi: use generic driver_override infrastructure
When a driver is probed through __driver_attach(), the bus' match()
callback is called without the device lock held, thus accessing the
driver_override field without a lock, which can cause a UAF.
Fix this by using the driver-core driver_override infrastructure taking
care of proper locking internally.
Note that calling match() from __driver_attach() without the device lock
held is intentional. [1]
Also note that we do not enable the driver_override feature of struct
bus_type, as SPI - in contrast to most other buses - passes "" to
sysfs_emit() when the driver_override pointer is NULL. Thus, printing
"\n" instead of "(null)\n".
Sanman Pradhan [Mon, 23 Mar 2026 00:24:37 +0000 (00:24 +0000)]
hwmon: (peci/cputemp) Fix off-by-one in cputemp_is_visible()
cputemp_is_visible() validates the channel index against
CPUTEMP_CHANNEL_NUMS, but currently uses '>' instead of '>='.
As a result, channel == CPUTEMP_CHANNEL_NUMS is not rejected even though
valid indices are 0 .. CPUTEMP_CHANNEL_NUMS - 1.
Fix the bounds check by using '>=' so invalid channel indices are
rejected before indexing the core bitmap.
Sanman Pradhan [Mon, 23 Mar 2026 00:24:25 +0000 (00:24 +0000)]
hwmon: (peci/cputemp) Fix crit_hyst returning delta instead of absolute temperature
The hwmon sysfs ABI expects tempN_crit_hyst to report the temperature at
which the critical condition clears, not the hysteresis delta from the
critical limit.
The peci cputemp driver currently returns tjmax - tcontrol for
crit_hyst_type, which is the hysteresis margin rather than the
corresponding absolute temperature.
Return tcontrol directly, and update the documentation accordingly.
Sanman Pradhan [Thu, 19 Mar 2026 17:31:29 +0000 (17:31 +0000)]
hwmon: (pmbus/isl68137) Add mutex protection for AVS enable sysfs attributes
The custom avs0_enable and avs1_enable sysfs attributes access PMBus
registers through the exported API helpers (pmbus_read_byte_data,
pmbus_read_word_data, pmbus_write_word_data, pmbus_update_byte_data)
without holding the PMBus update_lock mutex. These exported helpers do
not acquire the mutex internally, unlike the core's internal callers
which hold the lock before invoking them.
The store callback is especially vulnerable: it performs a multi-step
read-modify-write sequence (read VOUT_COMMAND, write VOUT_COMMAND, then
update OPERATION) where concurrent access from another thread could
interleave and corrupt the register state.
Add pmbus_lock_interruptible()/pmbus_unlock() around both the show and
store callbacks to serialize PMBus register access with the rest of the
driver.
Sanman Pradhan [Thu, 19 Mar 2026 17:31:19 +0000 (17:31 +0000)]
hwmon: (pmbus/ina233) Fix error handling and sign extension in shunt voltage read
ina233_read_word_data() reads MFR_READ_VSHUNT via pmbus_read_word_data()
but has two issues:
1. The return value is not checked for errors before being used in
arithmetic. A negative error code from a failed I2C transaction is
passed directly to DIV_ROUND_CLOSEST(), producing garbage data.
2. MFR_READ_VSHUNT is a 16-bit two's complement value. Negative shunt
voltages (values with bit 15 set) are treated as large positive
values since pmbus_read_word_data() returns them zero-extended in an
int. This leads to incorrect scaling in the VIN coefficient
conversion.
Fix both issues by adding an error check, casting to s16 for proper
sign extension, and clamping the result to a valid non-negative range.
The clamp is necessary because read_word_data callbacks must return
non-negative values on success (negative values indicate errors to the
pmbus core).
Fixes: b64b6cb163f16 ("hwmon: Add driver for TI INA233 Current and Power Monitor") Cc: stable@vger.kernel.org Signed-off-by: Sanman Pradhan <psanman@juniper.net> Link: https://lore.kernel.org/r/20260319173055.125271-2-sanman.pradhan@hpe.com
[groeck: Fixed clamp to avoid losing the sign bit] Signed-off-by: Guenter Roeck <linux@roeck-us.net>
- Fix MLO scan timing (record the scan start in FW)
- don't send a 6E related command when not supported
- correctly set wifi generation data
====================
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Pengpeng Hou [Mon, 23 Mar 2026 08:08:45 +0000 (16:08 +0800)]
wifi: wl1251: validate packet IDs before indexing tx_frames
wl1251_tx_packet_cb() uses the firmware completion ID directly to index
the fixed 16-entry wl->tx_frames[] array. The ID is a raw u8 from the
completion block, and the callback does not currently verify that it
fits the array before dereferencing it.
Reject completion IDs that fall outside wl->tx_frames[] and keep the
existing NULL check in the same guard. This keeps the fix local to the
trust boundary and avoids touching the rest of the completion flow.
The variable valuesize is declared as u8 but accumulates the total
length of all SSIDs to scan. Each SSID contributes up to 33 bytes
(IEEE80211_MAX_SSID_LEN + 1), and with WILC_MAX_NUM_PROBED_SSID (10)
SSIDs the total can reach 330, which wraps around to 74 when stored
in a u8.
This causes kmalloc to allocate only 75 bytes while the subsequent
memcpy writes up to 331 bytes into the buffer, resulting in a 256-byte
heap buffer overflow.
Widen valuesize from u8 to u32 to accommodate the full range.
1) Add missing extack for XFRMA_SA_PCPU in add_acquire and allocspi.
From Sabrina Dubroca.
2) Fix the condition on x->pcpu_num in xfrm_sa_len by using the
proper check. From Sabrina Dubroca.
3) Call xdo_dev_state_delete during state update to properly cleanup
the xdo device state. From Sabrina Dubroca.
4) Fix a potential skb leak in espintcp when async crypto is used.
From Sabrina Dubroca.
5) Validate inner IPv4 header length in IPTFS payload to avoid
parsing malformed packets. From Roshan Kumar.
6) Fix skb_put() panic on non-linear skb during IPTFS reassembly.
From Fernando Fernandez Mancera.
7) Silence various sparse warnings related to RCU, state, and policy
handling. From Sabrina Dubroca.
8) Fix work re-schedule race after cancel in xfrm_nat_keepalive_net_fini().
From Hyunwoo Kim.
9) Prevent policy_hthresh.work from racing with netns teardown by using
a proper cleanup mechanism. From Minwoo Ra.
10) Validate that the family of the source and destination addresses match
in pfkey_send_migrate(). From Eric Dumazet.
11) Only publish mode_data after the clone is setup in the IPTFS receive path.
This prevents leaving x->mode_data pointing at freed memory on error.
From Paul Moses.
Please pull or let me know if there are problems.
ipsec-2026-03-23
* tag 'ipsec-2026-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
xfrm: iptfs: only publish mode_data after clone setup
af_key: validate families in pfkey_send_migrate()
xfrm: prevent policy_hthresh.work from racing with netns teardown
xfrm: Fix work re-schedule after cancel in xfrm_nat_keepalive_net_fini()
xfrm: avoid RCU warnings around the per-netns netlink socket
xfrm: add rcu_access_pointer to silence sparse warning for xfrm_input_afinfo
xfrm: policy: silence sparse warning in xfrm_policy_unregister_afinfo
xfrm: policy: fix sparse warnings in xfrm_policy_{init,fini}
xfrm: state: silence sparse warnings during netns exit
xfrm: remove rcu/state_hold from xfrm_state_lookup_spi_proto
xfrm: state: add xfrm_state_deref_prot to state_by* walk under lock
xfrm: state: fix sparse warnings around XFRM_STATE_INSERT
xfrm: state: fix sparse warnings in xfrm_state_init
xfrm: state: fix sparse warnings on xfrm_state_hold_rcu
xfrm: iptfs: fix skb_put() panic on non-linear skb during reassembly
xfrm: iptfs: validate inner IPv4 header length in IPTFS payload
esp: fix skb leak with espintcp and async crypto
xfrm: call xdo_dev_state_delete during state update
xfrm: fix the condition on x->pcpu_num in xfrm_sa_len
xfrm: add missing extack for XFRMA_SA_PCPU in add_acquire and allocspi
====================
wifi: ath12k: Pass the correct value of each TID during a stop AMPDU session
With traffic ongoing for data TID [TID 0], an DELBA request to
stop AMPDU for the BA session was received on management TID [TID 4].
The corresponding TID number was incorrectly passed to stop the BA session,
resulting in the BA session for data TIDs being stopped and the BA size
being reduced to 1, causing an overall dip in TCP throughput.
Fix this issue by passing the correct argument from
ath12k_dp_rx_ampdu_stop() to ath12k_dp_arch_peer_rx_tid_reo_update()
during an AMPDU stop session. Instead of passing peer->dp_peer->rx_tid,
which is the base address of the array, corresponding to TID 0, pass
the value of &peer->dp_peer->rx_tid[params->tid]. With this, the
different TID numbers are accounted for.
wifi: ath11k: Pass the correct value of each TID during a stop AMPDU session
During ongoing traffic, a request to stop an AMPDU session
for one TID could incorrectly affect other active sessions.
This can happen because an incorrect TID reference would be
passed when updating the BA session state, causing the wrong
session to be stopped. As a result, the affected session would
be reduced to a minimal BA size, leading to a noticeable
throughput degradation.
Fix this issue by passing the correct argument from
ath11k_dp_rx_ampdu_stop() to ath11k_peer_rx_tid_reo_update()
during a stop AMPDU session. Instead of passing peer->tx_tid, which
is the base address of the array, corresponding to TID 0; pass
the value of &peer->rx_tid[params->tid], where the different TID numbers
are accounted for.
Matt Roper [Thu, 19 Mar 2026 22:30:34 +0000 (15:30 -0700)]
drm/xe: Implement recent spec updates to Wa_16025250150
The hardware teams noticed that the originally documented workaround
steps for Wa_16025250150 may not be sufficient to fully avoid a hardware
issue. The workaround documentation has been augmented to suggest
programming one additional register; make the corresponding change in
the driver.
Looks like we ended up with a typo during device tree data parsing
as part of 4f16b6351bbff ("ASoC: codecs: wcd: add common helper for wcd
codecs") patch.
This will result in not parsing the device tree data and results in
zero mic bias values.
Fix this by calling wcd_dt_parse_micbias_info instead of
wcd_dt_parse_mbhc_data.
Fixes: 4f16b6351bbff ("ASoC: codecs: wcd: add common helper for wcd codecs") Cc: Stable@vger.kernel.org Reported-by: Joel Selvaraj <foss@joelselvaraj.com> Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@oss.qualcomm.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Link: https://patch.msgid.link/20260323231748.2217967-1-srinivas.kandagatla@oss.qualcomm.com Signed-off-by: Mark Brown <broonie@kernel.org>
Alice Ryhl [Tue, 24 Mar 2026 10:49:59 +0000 (10:49 +0000)]
rust: regulator: do not assume that regulator_get() returns non-null
The Rust `Regulator` abstraction uses `NonNull` to wrap the underlying
`struct regulator` pointer. When `CONFIG_REGULATOR` is disabled, the C
stub for `regulator_get` returns `NULL`. `from_err_ptr` does not treat
`NULL` as an error, so it was passed to `NonNull::new_unchecked`,
causing undefined behavior.
Fix this by using a raw pointer `*mut bindings::regulator` instead of
`NonNull`. This allows `inner` to be `NULL` when `CONFIG_REGULATOR` is
disabled, and leverages the C stubs which are designed to handle `NULL`
or are no-ops.
Fixes: 9b614ceada7c ("rust: regulator: add a bare minimum regulator abstraction") Reported-by: Miguel Ojeda <ojeda@kernel.org> Closes: https://lore.kernel.org/r/20260322193830.89324-1-ojeda@kernel.org Signed-off-by: Alice Ryhl <aliceryhl@google.com> Reviewed-by: Daniel Almeida <daniel.almeida@collabora.com> Link: https://patch.msgid.link/20260324-regulator-fix-v1-1-a5244afa3c15@google.com Signed-off-by: Mark Brown <broonie@kernel.org>
Jihed Chaibi [Sat, 21 Mar 2026 01:20:11 +0000 (02:20 +0100)]
ASoC: dt-bindings: stm32: Fix incorrect compatible string in stm32h7-sai match
The conditional block that defines clock constraints for the stm32h7-sai
variant references "st,stm32mph7-sai", which does not match any compatible
string in the enum. As a result, clock validation for the h7 variant is
silently skipped. Correct the compatible string to "st,stm32h7-sai".
Fixes: 8509bb1f11a1f ("ASoC: dt-bindings: add stm32mp25 support for sai") Signed-off-by: Jihed Chaibi <jihed.chaibi.dev@gmail.com> Reviewed-by: Olivier Moysan <olivier.moysan@foss.st.com> Link: https://patch.msgid.link/20260321012011.125791-1-jihed.chaibi.dev@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
Johannes Berg [Tue, 24 Mar 2026 09:33:26 +0000 (11:33 +0200)]
wifi: iwlwifi: mld: correctly set wifi generation data
In each MAC context, the firmware expects the wifi generation
data, i.e. whether or not HE/EHT (and in the future UHR) is
enabled on that MAC.
However, this is currently handled wrong in two ways:
- EHT is only enabled when the interface is also an MLD, but
we currently allow (despite the spec) connecting with EHT
but without MLO.
- when HE or EHT are used by TDLS peers, the firmware needs
to have them enabled regardless of the AP
Fix this by iterating setting up the data depending on the
interface type:
- for AP, just set it according to the BSS configuration
- for monitor, set it according to HW capabilities
- otherwise, particularly for client, iterate all stations
and then their links on the interface in question and set
according to their capabilities, this handles the AP and
TDLS peers. Re-calculate this whenever a TDLS station is
marked associated or removed so that it's kept updated,
for the AP it's already updated on assoc/disassoc.
wifi: iwlwifi: mvm: don't send a 6E related command when not supported
MCC_ALLOWED_AP_TYPE_CMD is related to 6E support. Do not send it if the
device doesn't support 6E.
Apparently, the firmware is mistakenly advertising support for this
command even on AX201 which does not support 6E and then the firmware
crashes.
Fixes: 0d2fc8821a7d ("wifi: iwlwifi: nvm: parse the VLP/AFC bit from regulatory") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220804 Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260324113316.e171f0163f2a.I0c444d1f82d1773054e7ffc391ad49697d58f44e@changeid
Calculate MLO scan start time based on actual
scan start notification from firmware instead of recording
time when scan command is sent.
Currently, MLO scan start time was captured immediately
after sending the scan command to firmware. However, the
actual scan start time may differ due to the FW being busy
with a previous scan.
In that case, the link selection code will think that the MLO
scan is too old, and will warn.
To fix it, Implement start scan notification handling to
capture the precise moment when firmware begins the scan
operation.
Willem de Bruijn [Fri, 20 Mar 2026 19:01:46 +0000 (15:01 -0400)]
net: correctly handle tunneled traffic on IPV6_CSUM GSO fallback
NETIF_F_IPV6_CSUM only advertises support for checksum offload of
packets without IPv6 extension headers. Packets with extension
headers must fall back onto software checksumming. Since TSO
depends on checksum offload, those must revert to GSO.
The below commit introduces that fallback. It always checks
network header length. For tunneled packets, the inner header length
must be checked instead. Extend the check accordingly.
A special case is tunneled packets without inner IP protocol. Such as
RFC 6951 SCTP in UDP. Those are not standard IPv6 followed by
transport header either, so also must revert to the software GSO path.
Cc: stable@vger.kernel.org Fixes: 864e3396976e ("net: gso: Forbid IPv6 TSO with extensions on devices with only IPV6_CSUM") Reported-by: Tangxin Xie <xietangxin@yeah.net> Closes: https://lore.kernel.org/netdev/0414e7e2-9a1c-4d7c-a99d-b9039cf68f40@yeah.net/ Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20260320190148.2409107-1-willemdebruijn.kernel@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* tag 'linux-can-fixes-for-7.0-20260323' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: isotp: fix tx.buf use-after-free in isotp_sendmsg()
can: gw: fix OOB heap access in cgw_csum_crc8_rel()
can: statistics: add missing atomic access in hot path
can: mcp251x: add error handling for power enable in open and resume
can: netlink: can_changelink(): add missing error handling to call can_ctrlmode_changelink()
====================
David Carlier [Fri, 20 Mar 2026 17:44:39 +0000 (17:44 +0000)]
net: ti: icssg-prueth: fix use-after-free of CPPI descriptor in RX path
cppi5_hdesc_get_psdata() returns a pointer into the CPPI descriptor.
In both emac_rx_packet() and emac_rx_packet_zc(), the descriptor is
freed via k3_cppi_desc_pool_free() before the psdata pointer is used
by emac_rx_timestamp(), which dereferences psdata[0] and psdata[1].
This constitutes a use-after-free on every received packet that goes
through the timestamp path.
Defer the descriptor free until after all accesses through the psdata
pointer are complete. For emac_rx_packet(), move the free into the
requeue label so both early-exit and success paths free the descriptor
after all accesses are done. For emac_rx_packet_zc(), move the free to
the end of the loop body after emac_dispatch_skb_zc() (which calls
emac_rx_timestamp()) has returned.
Fixes: 46eeb90f03e0 ("net: ti: icssg-prueth: Use page_pool API for RX buffer allocation") Signed-off-by: David Carlier <devnexen@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260320174439.41080-1-devnexen@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
====================
team: fix header_ops type confusion and add selftest
Hi,
This patch series fixes a panic reported by syzkaller in the team/bond/gre
stacked non-Ethernet configuration:
https://syzkaller.appspot.com/bug?extid=3d8bc31c45e11450f24c
The first patch fixes the header_ops type confusion / parse recursion
context issue in team. The second patch adds a selftest to reproduce the
reported scenario and prevent regressions in the future.
Add a team selftest that sets up:
g0 (gre) -> b0 (bond) -> t0 (team)
and triggers IPv6 traffic on t0. This reproduces the non-Ethernet
header_ops confusion scenario and protects against regressions in stacked
team/bond/gre configurations.
Using this script, the panic reported by syzkaller can be reproduced [1].
After the fix:
# ./non_ether_header_ops.sh
PASS: non-Ethernet header_ops stacking did not crash
Jiayuan Chen [Fri, 20 Mar 2026 07:21:26 +0000 (15:21 +0800)]
team: fix header_ops type confusion with non-Ethernet ports
Similar to commit 950803f72547 ("bonding: fix type confusion in
bond_setup_by_slave()") team has the same class of header_ops type
confusion.
For non-Ethernet ports, team_setup_by_port() copies port_dev->header_ops
directly. When the team device later calls dev_hard_header() or
dev_parse_header(), these callbacks can run with the team net_device
instead of the real lower device, so netdev_priv(dev) is interpreted as
the wrong private type and can crash.
The syzbot report shows a crash in bond_header_create(), but the root
cause is in team: the topology is gre -> bond -> team, and team calls
the inherited header_ops with its own net_device instead of the lower
device, so bond_header_create() receives a team device and interprets
netdev_priv() as bonding private data, causing a type confusion crash.
Fix this by introducing team header_ops wrappers for create/parse,
selecting a team port under RCU, and calling the lower device callbacks
with port->dev, so each callback always sees the correct net_device
context.
Also pass the selected lower device to the lower parse callback, so
recursion is bounded in stacked non-Ethernet topologies and parse
callbacks always run with the correct device context.
Fixes: 1d76efe1577b ("team: add support for non-ethernet devices") Reported-by: syzbot+3d8bc31c45e11450f24c@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/69b46af7.050a0220.36eb34.000e.GAE@google.com/T/ Cc: Jiayuan Chen <jiayuan.chen@linux.dev> Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com> Link: https://patch.msgid.link/20260320072139.134249-2-jiayuan.chen@linux.dev Signed-off-by: Paolo Abeni <pabeni@redhat.com>
====================
virtio-net: fix for VIRTIO_NET_F_GUEST_HDRLEN
The commit be50da3e9d4a ("net: virtio_net: implement exact header length
guest feature") introduces support for the VIRTIO_NET_F_GUEST_HDRLEN
feature in virtio-net.
This feature requires virtio-net to set hdr_len to the actual header
length of the packet when transmitting, the number of
bytes from the start of the packet to the beginning of the
transport-layer payload.
However, in practice, hdr_len was being set using skb_headlen(skb),
which is clearly incorrect. This path set fixes that issue.
As discussed in [0], this version checks the VIRTIO_NET_F_GUEST_HDRLEN is
negotiated.
Xuan Zhuo [Fri, 20 Mar 2026 02:18:18 +0000 (10:18 +0800)]
virtio-net: correct hdr_len handling for tunnel gso
The commit a2fb4bc4e2a6a03 ("net: implement virtio helpers to handle UDP
GSO tunneling.") introduces support for the UDP GSO tunnel feature in
virtio-net.
The virtio spec says:
If the \field{gso_type} has the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or
VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit set, \field{hdr_len} accounts for
all the headers up to and including the inner transport.
The commit did not update the hdr_len to include the inner transport.
I observed that the "hdr_len" is 116 for this packet:
Xuan Zhuo [Fri, 20 Mar 2026 02:18:17 +0000 (10:18 +0800)]
virtio-net: correct hdr_len handling for VIRTIO_NET_F_GUEST_HDRLEN
The commit be50da3e9d4a ("net: virtio_net: implement exact header length
guest feature") introduces support for the VIRTIO_NET_F_GUEST_HDRLEN
feature in virtio-net.
This feature requires virtio-net to set hdr_len to the actual header
length of the packet when transmitting, the number of
bytes from the start of the packet to the beginning of the
transport-layer payload.
However, in practice, hdr_len was being set using skb_headlen(skb),
which is clearly incorrect. This commit fixes that issue.
Fixes: be50da3e9d4a ("net: virtio_net: implement exact header length guest feature") Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://patch.msgid.link/20260320021818.111741-2-xuanzhuo@linux.alibaba.com Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Karol Wachowski [Mon, 23 Mar 2026 09:50:29 +0000 (10:50 +0100)]
accel/ivpu: Add disable clock relinquish workaround for NVL-A0
Turn on disable clock relinquish workaround for Nova Lake A0.
Without this workaround NPU may not power off correctly after
inference, leading to unexpected system behavior.
Darrick J. Wong [Mon, 23 Mar 2026 21:00:17 +0000 (14:00 -0700)]
iomap: fix lockdep complaint when reads fail
Zorro Lang reported the following lockdep splat:
"While running fstests xfs/556 on kernel 7.0.0-rc4+ (HEAD=04a9f1766954),
a lockdep warning was triggered indicating an inconsistent lock state
for sb->s_type->i_lock_key.
"The deadlock might occur because iomap_read_end_io (called from a
hardware interrupt completion path) invokes fserror_report, which then
calls igrab. igrab attempts to acquire the i_lock spinlock. However,
the i_lock is frequently acquired in process context with interrupts
enabled. If an interrupt occurs while a process holds the i_lock, and
that interrupt handler calls fserror_report, the system deadlocks.
"I hit this warning several times by running xfs/556 (mostly) or
generic/648 on xfs. More details refer to below console log."
along with this dmesg, for which I've cleaned up the stacktraces:
run fstests xfs/556 at 2026-03-18 20:05:30
XFS (sda3): Mounting V5 Filesystem 396e9164-c45a-4e05-be9d-b38c2c5c6477
XFS (sda3): Ending clean mount
XFS (sda3): Unmounting Filesystem 396e9164-c45a-4e05-be9d-b38c2c5c6477
XFS (sda3): Mounting V5 Filesystem bf3f89c3-3c45-4650-a9c7-744f39c0191e
XFS (sda3): Ending clean mount
XFS (sda3): Unmounting Filesystem bf3f89c3-3c45-4650-a9c7-744f39c0191e
XFS (dm-0): Mounting V5 Filesystem bf3f89c3-3c45-4650-a9c7-744f39c0191e
XFS (dm-0): Ending clean mount
device-mapper: table: 253:0: adding target device (start sect 209 len 1) caused an alignment inconsistency
device-mapper: table: 253:0: adding target device (start sect 210 len 62914350) caused an alignment inconsistency
buffer_io_error: 6 callbacks suppressed
Buffer I/O error on dev dm-0, logical block 209, async page read
Buffer I/O error on dev dm-0, logical block 209, async page read
XFS (dm-0): Unmounting Filesystem bf3f89c3-3c45-4650-a9c7-744f39c0191e
XFS (dm-0): Mounting V5 Filesystem bf3f89c3-3c45-4650-a9c7-744f39c0191e
XFS (dm-0): Ending clean mount
================================
WARNING: inconsistent lock state
7.0.0-rc4+ #1 Tainted: G S W
--------------------------------
inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
od/2368602 [HC1[1]:SC0[0]:HE0:SE1] takes: ff1100069f2b4a98 (&sb->s_type->i_lock_key#31){?.+.}-{3:3}, at: igrab+0x28/0x1a0
{HARDIRQ-ON-W} state was registered at:
__lock_acquire+0x40d/0xbd0
lock_acquire.part.0+0xbd/0x260
_raw_spin_lock+0x37/0x80
unlock_new_inode+0x66/0x2a0
xfs_iget+0x67b/0x7b0 [xfs]
xfs_mountfs+0xde4/0x1c80 [xfs]
xfs_fs_fill_super+0xe86/0x17a0 [xfs]
get_tree_bdev_flags+0x312/0x590
vfs_get_tree+0x8d/0x2f0
vfs_cmd_create+0xb2/0x240
__do_sys_fsconfig+0x3d8/0x9a0
do_syscall_64+0x13a/0x1520
entry_SYSCALL_64_after_hwframe+0x76/0x7e
irq event stamp: 3118
hardirqs last enabled at (3117): [<ffffffffb54e4ad8>] _raw_spin_unlock_irq+0x28/0x50
hardirqs last disabled at (3118): [<ffffffffb54b84c9>] common_interrupt+0x19/0xe0
softirqs last enabled at (3040): [<ffffffffb290ca28>] handle_softirqs+0x6b8/0x950
softirqs last disabled at (3023): [<ffffffffb290ce4d>] __irq_exit_rcu+0xfd/0x250
other info that might help us debug this:
Possible unsafe locking scenario:
Zorro's diagnosis makes sense, so the solution is to kick the failed
read handling to a workqueue much like we added for writeback ioends in
commit 294f54f849d846 ("fserror: fix lockdep complaint when igrabbing
inode").
Imre Deak [Fri, 20 Mar 2026 09:29:00 +0000 (11:29 +0200)]
drm/i915/dp_tunnel: Fix error handling when clearing stream BW in atomic state
Clearing the DP tunnel stream BW in the atomic state involves getting
the tunnel group state, which can fail. Handle the error accordingly.
This fixes at least one issue where drm_dp_tunnel_atomic_set_stream_bw()
failed to get the tunnel group state returning -EDEADLK, which wasn't
handled. This lead to the ctx->contended warn later in modeset_lock()
while taking a WW mutex for another object in the same atomic state, and
thus within the same already contended WW context.
Moving intel_crtc_state_alloc() later would avoid freeing saved_state on
the error path; this stable patch leaves that simplification for a
follow-up.
Cc: Uma Shankar <uma.shankar@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: <stable@vger.kernel.org> # v6.9+ Fixes: a4efae87ecb2 ("drm/i915/dp: Compute DP tunnel BW during encoder state computation") Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/7617 Reviewed-by: Michał Grzelak <michal.grzelak@intel.com> Reviewed-by: Uma Shankar <uma.shankar@intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Link: https://patch.msgid.link/20260320092900.13210-1-imre.deak@intel.com
(cherry picked from commit fb69d0076e687421188bc8103ab0e8e5825b1df1) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Linus Torvalds [Tue, 24 Mar 2026 04:30:14 +0000 (21:30 -0700)]
Merge tag 'xsa482-7.0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"Restrict the xen privcmd driver in unprivileged domU to only allow
hypercalls to target domain when using secure boot"
* tag 'xsa482-7.0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/privcmd: add boot control for restricted usage in domU
xen/privcmd: restrict usage in unprivileged domU
Wei Fang [Fri, 20 Mar 2026 09:42:22 +0000 (17:42 +0800)]
net: enetc: fix the output issue of 'ethtool --show-ring'
Currently, enetc_get_ringparam() only provides rx_pending and tx_pending,
but 'ethtool --show-ring' no longer displays these fields. Because the
ringparam retrieval path has moved to the new netlink interface, where
rings_fill_reply() emits the *x_pending only if the *x_max_pending values
are non-zero. So rx_max_pending and tx_max_pending to are added to
enetc_get_ringparam() to fix the issue.
Note that the maximum tx/rx ring size of hardware is 64K, but we haven't
added set_ringparam() to make the ring size configurable. To avoid users
mistakenly believing that the ring size can be increased, so set
the *x_max_pending to priv->*x_bd_count.
Martin KaFai Lau [Thu, 19 Mar 2026 18:18:17 +0000 (11:18 -0700)]
udp: Fix wildcard bind conflict check when using hash2
When binding a udp_sock to a local address and port, UDP uses
two hashes (udptable->hash and udptable->hash2) for collision
detection. The current code switches to "hash2" when
hslot->count > 10.
"hash2" is keyed by local address and local port.
"hash" is keyed by local port only.
The issue can be shown in the following bind sequence (pseudo code):
/* Correctly return -EADDRINUSE because "hash" is used
* instead of "hash2". udp_lib_lport_inuse() detects the
* conflict.
*/
bind(fail_fd, "[::]:8888")
/* After one more socket is bound to "[fd00::11]:8888",
* hslot->count exceeds 10 and "hash2" is used instead.
*/
bind(fd11, "[fd00::11]:8888")
bind(fail_fd, "[::]:8888") /* succeeds unexpectedly */
The same issue applies to the IPv4 wildcard address "0.0.0.0"
and the IPv4-mapped wildcard address "::ffff:0.0.0.0". For
example, if there are existing sockets bound to
"192.168.1.[1-11]:8888", then binding "0.0.0.0:8888" or
"[::ffff:0.0.0.0]:8888" can also miss the conflict when
hslot->count > 10.
TCP inet_csk_get_port() already has the correct check in
inet_use_bhash2_on_bind(). Rename it to
inet_use_hash2_on_bind() and move it to inet_hashtables.h
so udp.c can reuse it in this fix.
Fixes: 30fff9231fad ("udp: bind() optimisation") Reported-by: Andrew Onyshchuk <oandrew@meta.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260319181817.1901357-1-martin.lau@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Qingfang Deng [Fri, 20 Mar 2026 09:43:15 +0000 (17:43 +0800)]
net: airoha: add RCU lock around dev_fill_forward_path
Since 0417adf367a0 ("ppp: fix race conditions in ppp_fill_forward_path")
dev_fill_forward_path() should be called with RCU read lock held. This
fix was applied to net, while the Airoha flowtable commit was applied to
net-next, so it hadn't been an issue until net was merged into net-next.
Fixes: a8bdd935d1dd ("net: airoha: Add wlan flowtable TX offload") Signed-off-by: Qingfang Deng <dqfext@gmail.com> Acked-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20260320094315.525126-1-dqfext@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yochai Eisenrich [Thu, 19 Mar 2026 20:06:10 +0000 (22:06 +0200)]
net: fix fanout UAF in packet_release() via NETDEV_UP race
`packet_release()` has a race window where `NETDEV_UP` can re-register a
socket into a fanout group's `arr[]` array. The re-registration is not
cleaned up by `fanout_release()`, leaving a dangling pointer in the fanout
array.
`packet_release()` does NOT zero `po->num` in its `bind_lock` section.
After releasing `bind_lock`, `po->num` is still non-zero and `po->ifindex`
still matches the bound device. A concurrent `packet_notifier(NETDEV_UP)`
that already found the socket in `sklist` can re-register the hook.
For fanout sockets, this re-registration calls `__fanout_link(sk, po)`
which adds the socket back into `f->arr[]` and increments `f->num_members`,
but does NOT increment `f->sk_ref`.
The fix sets `po->num` to zero in `packet_release` while `bind_lock` is
held to prevent NETDEV_UP from linking, preventing the race window.
This bug was found following an additional audit with Claude Code based
on CVE-2025-38617.
selftest: net: Add GC test for temporary routes with exceptions.
Without the prior commit, IPv6 GC cannot track exceptions tied
to permanent routes if they were originally added as temporary
routes.
Let's add a test case for the issue.
1. Add temporary routes
2. Create exceptions for the temporary routes
3. Promote the routes to permanent routes
4. Check if GC can find and purge the exceptions
A few notes:
+ At step 4, unlike other test cases, we cannot wait for
$GC_WAIT_TIME. While the exceptions are always iterable via
netlink (since it traverses the entire fib tree instead of
tb6_gc_hlist), rt6_nh_dump_exceptions() skips expired entries.
If we waited for the expiration time, we would be unable to
distinguish whether the exceptions were truly purged by GC or
just hidden due to being expired.
+ For the same reason, at step 2, we use ICMPv6 redirect message
instead of Packet Too Big message. This is because MTU exceptions
always have RTF_EXPIRES, and rt6_age_examine_exception() does not
respect the period specified by net.ipv6.route.flush=1.
+ We add a neighbour entry for the redirect target with NTF_ROUTER.
Without this, the exceptions would be removed at step 3 when the
fib6_may_remove_gc_list() is called.
Without the fix, the exceptions remain even after GC is triggered
by sysctl -wq net.ipv6.route.flush=1.
ipv6: Don't remove permanent routes with exceptions from tb6_gc_hlist.
The cited commit mechanically put fib6_remove_gc_list()
just after every fib6_clean_expires() call.
When a temporary route is promoted to a permanent route,
there may already be exception routes tied to it.
If fib6_remove_gc_list() removes the route from tb6_gc_hlist,
such exception routes will no longer be aged.
Let's replace fib6_remove_gc_list() with a new helper
fib6_may_remove_gc_list() and use fib6_age_exceptions() there.
Note that net->ipv6 is only compiled when CONFIG_IPV6 is
enabled, so fib6_{add,remove,may_remove}_gc_list() are guarded.
Fixes: 5eb902b8e719 ("net/ipv6: Remove expired routes with a separated list of routes.") Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20260320072317.2561779-3-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ipv6: Remove permanent routes from tb6_gc_hlist when all exceptions expire.
Commit 5eb902b8e719 ("net/ipv6: Remove expired routes with a
separated list of routes.") introduced a per-table GC list and
changed GC to iterate over that list instead of traversing
the entire route table.
However, it forgot to add permanent routes to tb6_gc_hlist
when exception routes are added.
Commit cfe82469a00f ("ipv6: add exception routes to GC list
in rt6_insert_exception") fixed that issue but introduced
another one.
Even after all exception routes expire, the permanent routes
remain in tb6_gc_hlist, potentially negating the performance
benefits intended by the initial change.
Let's count gc_args->more before and after rt6_age_exceptions()
and remove the permanent route when the delta is 0.
Note that the next patch will reuse fib6_age_exceptions().
Fixes: cfe82469a00f ("ipv6: add exception routes to GC list in rt6_insert_exception") Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20260320072317.2561779-2-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lukas Wunner [Mon, 23 Mar 2026 06:52:39 +0000 (07:52 +0100)]
Documentation: PCI: Document PCIe TLP Header decoder for AER messages
The prefix/header of a TLP that caused an error may be recorded in the AER
Capability and emitted to the kernel log in raw hex format. Document the
existence and usage of tlp-tool, which decodes the TLP Header into
human-readable form.
The TLP Header hints at the root cause of an error, yet is often ignored
because of its seeming opaqueness. Instead, PCIe errors are frequently
worked around by a change in the kernel without fully understanding the
actual source of the problem. With more documentation on available tools
we'll hopefully come up with better solutions.
There are also wireshark dissectors for TLPs, but it seems they expect a
complete TLP, not just the header, and they cannot grok the hex format
emitted by the kernel directly. tlp-tool appears to be the most cut and
dried solution out there.
Joshua Hay [Sat, 7 Mar 2026 02:12:47 +0000 (18:12 -0800)]
idpf: only assign num refillqs if allocation was successful
As reported by AI review [1], if the refillqs allocation fails, refillqs
will be NULL but num_refillqs will be non-zero. The release function
will then dereference refillqs since it thinks the refillqs are present,
resulting in a NULL ptr dereference.
Only assign the num refillqs if the allocation was successful. This will
prevent the release function from entering the loop and accessing
refillqs.
Joshua Hay [Tue, 3 Mar 2026 01:28:31 +0000 (17:28 -0800)]
idpf: clear stale cdev_info ptr
Deinit calls idpf_idc_deinit_core_aux_device to free the cdev_info
memory, but leaves the adapter->cdev_info field with a stale pointer
value. This will bypass subsequent "if (!cdev_info)" checks if cdev_info
is not reallocated. For example, if idc_init fails after a reset,
cdev_info will already have been freed during the reset handling, but it
will not have been reallocated. The next reset or rmmod will result in a
crash.
Kohei Enju [Sat, 14 Feb 2026 19:14:25 +0000 (19:14 +0000)]
iavf: fix out-of-bounds writes in iavf_get_ethtool_stats()
iavf incorrectly uses real_num_tx_queues for ETH_SS_STATS. Since the
value could change in runtime, we should use num_tx_queues instead.
Moreover iavf_get_ethtool_stats() uses num_active_queues while
iavf_get_sset_count() and iavf_get_stat_strings() use
real_num_tx_queues, which triggers out-of-bounds writes when we do
"ethtool -L" and "ethtool -S" simultaneously [1].
For example when we change channels from 1 to 8, Thread 3 could be
scheduled before Thread 2, and out-of-bounds writes could be triggered
in Thread 3:
Petr Oros [Thu, 12 Feb 2026 07:53:11 +0000 (08:53 +0100)]
ice: use ice_update_eth_stats() for representor stats
ice_repr_get_stats64() and __ice_get_ethtool_stats() call
ice_update_vsi_stats() on the VF's src_vsi. This always returns early
because ICE_VSI_DOWN is permanently set for VF VSIs - ice_up() is never
called on them since queues are managed by iavf through virtchnl.
In __ice_get_ethtool_stats() the original code called
ice_update_vsi_stats() for all VSIs including representors, iterated
over ice_gstrings_vsi_stats[] to populate the data, and then bailed out
with an early return before the per-queue ring stats section. That early
return was necessary because representor VSIs have no rings on the PF
side - the rings belong to the VF driver (iavf), so accessing per-queue
stats would be invalid.
Move the representor handling to the top of __ice_get_ethtool_stats()
and call ice_update_eth_stats() directly to read the hardware GLV_*
counters. This matches ice_get_vf_stats() which already uses
ice_update_eth_stats() for the same VF VSI in legacy mode. Apply the
same fix to ice_repr_get_stats64().
Note that ice_gstrings_vsi_stats[] contains five software ring counters
(rx_buf_failed, rx_page_failed, tx_linearize, tx_busy, tx_restart) that
are always zero for representors since the PF never processes packets on
VF rings. This is pre-existing behavior unchanged by this patch.
Fixes: 7aae80cef7ba ("ice: add port representor ethtool ops and stats") Signed-off-by: Petr Oros <poros@redhat.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Patryk Holda <patryk.holda@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>