Peter Zijlstra [Tue, 26 May 2026 09:42:29 +0000 (11:42 +0200)]
sched/proxy: Switch proxy to use p->is_blocked
Rather than gate the proxy paths with p->blocked_on, use p->is_blocked.
This opens up the state: '->is_blocked && !->blocked_on' for future use.
Notably, only proxy and delayed tasks can be ->on_rq && ->is_blocked, and it is
guaranteed that sched_class::pick_task() will never return a delayed task.
Therefore any task returned from pick_next_task() that has ->is_blocked set,
must be a proxy task.
Peter Zijlstra [Wed, 27 May 2026 07:58:02 +0000 (09:58 +0200)]
sched/proxy: Only return migrate when needed
Current code will 'unconditionally' return migrate on PROXY_WAKING, even if the
task is (still) on the original CPU.
Check task_cpu(p) against p->waking_cpu, which per proxy_set_task_cpu()
preserves the original CPU the task was on. If they do not mis-match, there is
no need to go through the more expensive wakeup path.
Peter Zijlstra [Tue, 26 May 2026 09:28:46 +0000 (11:28 +0200)]
sched/proxy: Optimize try_to_wake_up()
The reason for the clause in try_to_wake_up() is, per its comment, that
find_proxy_task()'s proxy_deactivate() is not always called with a cleared
p->blocked_on.
However, that seems silly and easily cured. Make sure to always call
proxy_deactivate() with a cleared p->blocked_on such that we might remove this
clause from the common wake-up path.
Peter Zijlstra [Tue, 12 May 2026 02:56:17 +0000 (02:56 +0000)]
sched: Add blocked_donor link to task for smarter mutex handoffs
Add link to the task this task is proxying for, and use it so
the mutex owner can do an intelligent hand-off of the mutex to
the task that the owner is running on behalf.
[jstultz: This patch was split out from larger proxy patch] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Connor O'Brien <connoro@google.com> Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260512025635.2840817-8-jstultz@google.com
John Stultz [Tue, 12 May 2026 02:56:16 +0000 (02:56 +0000)]
sched: Add is_blocked task flag
Add a new is_blocked flag to the task struct. This flag is set
by try_to_block_task() and cleared by ttwu_do_wakeup() and
tracks if the task is blocked.
Traditionally this would mirror !p->on_rq, however due things
like DELAY_DEQUEUE and PROXY_EXEC, this can diverge, so its
useful to manage separately.
Additionally with this, we might be able to get rid of the
p->se.sched_delayed (ab)use in the core code (eventually).
Taken whole cloth from Peter's email:
https://lore.kernel.org/lkml/20260501132143.GC1026330@noisy.programming.kicks-ass.net/
With a few additional p->is_blocked = 0 in a few cases where
we return current if blocked_on gets zeroed or there is
no owner. This may hint that these current special cases
might be dropped eventually.
This change also helps resolve wait-queue stalls seen with
proxy-execution. See previous patch attempts for details:
https://lore.kernel.org/lkml/20260430215103.2978955-2-jstultz@google.com/
Reported-by: Vineeth Pillai <vineethrp@google.com> Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260512025635.2840817-7-jstultz@google.com
John Stultz [Tue, 12 May 2026 02:56:15 +0000 (02:56 +0000)]
sched: Have try_to_wake_up() handle return-migration for PROXY_WAKING case
This patch adds logic so try_to_wake_up() will notice if we are
waking a task where blocked_on == PROXY_WAKING, and if necessary
dequeue the task so the wakeup will naturally return-migrate the
donor task back to a cpu it can run on.
This helps performance as we do the dequeue and wakeup under the
locks normally taken in the try_to_wake_up() and avoids having
to do proxy_force_return() from __schedule(), which has to
re-take similar locks and then force a pick again loop.
This was split out from the larger proxy patch, and
significantly reworked.
Credits for the original patch go to:
Peter Zijlstra (Intel) <peterz@infradead.org>
Juri Lelli <juri.lelli@redhat.com>
Valentin Schneider <valentin.schneider@arm.com>
Connor O'Brien <connoro@google.com>
John Stultz [Tue, 12 May 2026 02:56:14 +0000 (02:56 +0000)]
sched: Rework block_task so it can be directly called
Pull most of the logic out of try_to_block_task() and put it
into block_task() directly, so that we can call block_task() and
not have to worry about the failing cases in try_to_block_task()
John Stultz [Tue, 12 May 2026 02:56:13 +0000 (02:56 +0000)]
sched: deadline: Add dl_rq->curr pointer to address issues with Proxy Exec
The DL scheduler keeps the current task in the rbtree, since
the deadline value isn't usually chagned while the task is
runnable.
This results in set_next_task() and put_prev_task() being
simpler, but unfortunately this causes complexity elsewhere.
Specifically when update_curr_dl() updates the deadline, it has
to dequeue and then enqueue the task.
From put_prev_task_dl(), we first call update_curr_dl(), and
then call enqueue_pushable_dl_task().
However, with Proxy Exec this goes awry. Since when a mutex is
released, we might wake the waiting rq->donor. This will cause
put_prev_task() to be called on the donor to take it off the
cpu for return migration.
At that point, from put_prev_task_dl() the update_curr_dl()
logic will dequeue & enqueue the task, and the enqueue function
will call enqueue_pushable_dl_task() (since the task_current()
check won't prevent it). Then back up the callstack in
put_prev_task_dl() we'll end up calling
enqueue_pushable_dl_task() again, tripping the
!RB_EMPTY_NODE(&p->pushable_dl_tasks) warning.
So to avoid this, use Peter's suggested[1] approach, and add a
dl_rq->curr pointer that is set/cleared from set_next_task()/
put_prev_task(), which effectively tracks the rq->donor. We can
then use this to avoid adding the active donor to the pushable
list from enqueue_task_dl().
John Stultz [Tue, 12 May 2026 02:56:12 +0000 (02:56 +0000)]
sched: deadline: Add some helper variables to cleanup deadline logic
As part of an improvement to handling pushable deadline tasks,
Peter suggested this cleanup[1], to use helper values for
dl_entity and dl_rq in the enqueue_task_dl() and
put_prev_task_dl() functions. There should be no functional
change from this patch.
To make sure this cleanup change doesn't obscure later logic
changes, I've split it into its own patch.
John Stultz [Tue, 12 May 2026 02:56:11 +0000 (02:56 +0000)]
sched: Rework prev_balance() to avoid stale prev references
Historically, the prev value from __schedule() was the rq->curr.
This prev value is passed down through numerous functions, and
used in the class scheduler implementations. The fact that
prev was on_cpu until the end of __schedule(), meant it was
stable across the rq lock drops that the class->balance()
implementations often do.
However, with proxy-exec, the prev passed to functions called
by __schedule() is rq->donor, which may not be the same as
rq->curr and may not be on_cpu, this makes the prev value
potentially unstable across rq lock drops.
A recently found issue with proxy-exec, is when we begin doing
return migration from try_to_wake_up(), its possible we may be
waking up the rq->donor. When we do this, we proxy_resched_idle()
to put_prev_set_next() setting the rq->donor to rq->idle, allowing
the rq->donor to be return migrated and allowed to run.
This however runs into trouble, as on another cpu we might be in
the middle of calling __schedule(). Conceptually the rq lock is
held for the majority of the time, but in calling prev_balance()
its possible the class->balance() handler call may briefly drop the rq lock.
This opens a window for try_to_wake_up() to wake and return migrate the
rq->donor before the class logic reacquires the rq lock.
Unfortunately prev_balance() pass in a prev argument, to which we pass
rq->donor. However this prev value can now become stale and incorrect across a
rq lock drop.
So, to correct this, rework the prev_balance() call so that it does not take a
"prev" argument.
Vineeth found came up with a test driver that could trip up
workqueue stalls. After fixing one issue this test found,
Vineeth reported the test was still failing.
Greatly simplified, a task that tries to take a mutex already
owned by another task that is sleeping, can hit a edge case in
the mutex_lock_common() case.
If the task fails to get the lock, calls into schedule, but gets
a spurious wakeup, it will find that it is first waiter, and
go into the mutex_optimistic_spin() logic. Though before calling
mutex_optimistic_spin(), we clear task blocked_on state, since
mutex_optimistic_spin() may call schedule() if need_resched() is
set.
After mutex_optimistic_spin() fails, we set blocked_on again,
restart the main mutex loop, try to take the lock and call into
schedule_preempt_disabled().
From there, with proxy-execution, we'll see the task is
blocked_on, follow the chain, see the owner is sleeping and
dequeue the waiting task from the runqueue.
This all sounds fine and reasonable. But what I had missed is
that in mutex_optimistic_spin(), not only do we call schedule()
but we set TASK_RUNNABLE right before doing so.
This is ok for that invocation of schedule(). But when we come
back we re-set the blocked_on we had just cleared, but we do not
re-set the task state to TASK_INTERRUPTIBLE/UNINTERRUPTIBLE.
This means we have a task that is blocked_on & TASK_RUNNABLE,
so when the proxy execution code dequeues the task, we are
in trouble since future wakeups will be shortcut by the
ttwu_state_match() check.
Thus, to avoid this, after mutex_optimistic_spin(), set the task
state back when we set blocked_on.
Many many thanks again to Vineeth for his very useful testing
driver that uncovered this long hidden bug, that I hadn't
tripped in all my testing! Very impressed with the problems he's
uncovered!
Reported-by: Vineeth Pillai <vineethrp@google.com> Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Vineeth Pillai <vineethrp@google.com> Link: https://patch.msgid.link/20260430215103.2978955-3-jstultz@google.com
Danilo Krummrich [Fri, 29 May 2026 00:00:54 +0000 (02:00 +0200)]
drm/tyr: use IoMem directly instead of Devres
Now that IoMem is lifetime-parameterized, use it directly in probe
rather than wrapping it in Devres and Arc. The I/O memory mapping is
only used during probe and not stored in driver data, so device-managed
revocation is unnecessary.
This removes the Devres access(dev) pattern from issue_soft_reset(),
GpuInfo::new(), and l2_power_on(), simplifying register access.
Reviewed-by: Eliot Courtney <ecourtney@nvidia.com> Reviewed-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Tested-by: Deborah Brouwer <deborah.brouwer@collabora.com> Link: https://patch.msgid.link/20260529000106.2257996-3-dakr@kernel.org Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Achkinazi, Igor [Thu, 28 May 2026 15:24:27 +0000 (15:24 +0000)]
nvme-multipath: set BIO_REMAPPED on bios remapped to per-path namespace disks
When nvme_ns_head_submit_bio() remaps a bio from the multipath head to a
per-path namespace, bio_set_dev() clears BIO_REMAPPED. The remapped bio
is then resubmitted through submit_bio_noacct() which calls
bio_check_eod() because BIO_REMAPPED is not set.
This races with nvme_ns_remove() which zeroes the per-path capacity
before synchronize_srcu():
CPU 0 (IO submission)
---------------------
srcu_read_lock()
nvme_find_path() -> ns
[NVME_NS_READY is set]
CPU 0 (IO submission)
---------------------
bio_set_dev(bio, ns->disk->part0)
[clears BIO_REMAPPED]
submit_bio_noacct(bio)
-> bio_check_eod() sees capacity=0
-> bio fails with IO error
The SRCU read lock prevents synchronize_srcu() from completing, but does
not prevent set_capacity(0) from executing. The bio fails the EOD check
before it reaches the NVMe driver, so nvme_failover_req() never gets a
chance to redirect it to another path of multipath. IO errors are
reported to the application despite another path being available.
On older kernels (before commit 0b64682e78f7 "block: skip unnecessary
checks for split bio"), the same race was also reachable through split
remainders resubmitted via submit_bio_noacct().
Fix this by setting BIO_REMAPPED after bio_set_dev() in
nvme_ns_head_submit_bio(). This skips bio_check_eod() on the per-path
device; the EOD check already passed on the multipath head.
NVMe per-path namespace devices are always whole disks (bd_partno=0), so
the blk_partition_remap() skip also gated by BIO_REMAPPED is a no-op.
The flag does not persist across failover and cannot go stale if the
namespace geometry changes between attempts: nvme_failover_req() calls
bio_set_dev() to redirect the bio back to the multipath head, which
clears BIO_REMAPPED. When nvme_requeue_work() resubmits through
submit_bio_noacct(), bio_check_eod() runs normally against the current
capacity.
Same approach as commit 3a905c37c351 ("block: skip bio_check_eod for
partition-remapped bios").
Fixes: a7c7f7b2b641 ("nvme: use bio_set_dev to assign ->bi_bdev") Cc: stable@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Igor Achkinazi <igor.achkinazi@dell.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
Zhenghang Xiao [Tue, 26 May 2026 10:53:28 +0000 (18:53 +0800)]
xfrm: iptfs: fix use-after-free on first_skb in __input_process_payload
__input_process_payload() stores first_skb into xtfs->ra_newskb under
drop_lock when starting partial reassembly, then unlocks and breaks out
of the processing loop. The post-loop check reads xtfs->ra_newskb
without the lock to decide whether first_skb is still owned:
if (first_skb && first_iplen && !defer && first_skb != xtfs->ra_newskb)
Between spin_unlock and this read, a concurrent CPU running
iptfs_reassem_cont() (or the drop_timer hrtimer) can complete
reassembly, NULL xtfs->ra_newskb, and free the skb. The check then
evaluates first_skb != NULL as true, and pskb_trim/ip_summed/consume_skb
operate on the freed skb — a use-after-free in skbuff_head_cache.
Replace the unlocked read with a local bool that records whether
first_skb was handed to the reassembly state in the current call. The
flag is set after the existing spin_unlock, before the break, using the
pointer equality that is stable at that point (first_skb == skb iff
first_skb was stored in ra_newskb).
Fixes: 3f3339885fb3 ("xfrm: iptfs: add reusing received skb for the tunnel egress packet") Signed-off-by: Zhenghang Xiao <kipreyyy@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
liyouhong [Fri, 29 May 2026 08:51:43 +0000 (16:51 +0800)]
nvme-multipath: require exact iopolicy names for module parameter
The iopolicy module parameter uses strncmp prefix matching, so values
like "numax" are accepted as "numa". The per-subsystem sysfs attribute
already requires an exact match via sysfs_streq(). Parse both through
a shared helper so invalid values are rejected consistently.
Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: liyouhong <liyouhong@kylinos.cn> Signed-off-by: Keith Busch <kbusch@kernel.org>
Adrian Korwel [Mon, 25 May 2026 14:58:32 +0000 (09:58 -0500)]
USB: serial: io_ti: fix heap overflow in build_i2c_fw_hdr()
build_i2c_fw_hdr() allocates a fixed-size buffer of
(16*1024 - 512) + sizeof(struct ti_i2c_firmware_rec) bytes, then
copies le16_to_cpu(img_header->Length) bytes into it without
validating that Length fits within the available space after the
firmware record header.
img_header->Length is a __le16 from the firmware file and can be
up to 65535. check_fw_sanity() validates the total firmware size
but not img_header->Length specifically.
Fix by rejecting images where img_header->Length exceeds the
available destination space.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Signed-off-by: Adrian Korwel <adriank20047@gmail.com> Signed-off-by: Johan Hovold <johan@kernel.org>
Adrian Korwel [Mon, 25 May 2026 14:58:31 +0000 (09:58 -0500)]
USB: serial: io_ti: fix heap overflow in get_manuf_info()
get_manuf_info() reads le16_to_cpu(rom_desc->Size) bytes from the
device I2C EEPROM into a buffer allocated with kmalloc_obj(), which
is sizeof(struct edge_ti_manuf_descriptor) = 10 bytes.
The Size field comes from the device and is only validated (in
check_i2c_image()) to make sure the descriptor fits within
TI_MAX_I2C_SIZE (16384 bytes), not against the destination buffer size.
A malicious USB device can therefore set Size to any value up to 16377,
causing a heap overflow of up to 16367 bytes when plugged into a host
running this driver.
valid_csum() is called after read_rom() and also iterates
buffer[0..Size-1], compounding the out-of-bounds access.
Fix by rejecting descriptors with unexpected length before calling
read_rom().
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Signed-off-by: Adrian Korwel <adriank20047@gmail.com>
[ johan: amend commit message; also check for short descriptors ] Signed-off-by: Johan Hovold <johan@kernel.org>
Rosen Penev [Sat, 30 May 2026 00:36:25 +0000 (17:36 -0700)]
ata: pata_ep93xx: avoid asm on non ARM
The raw ARM asm delay loop prevents COMPILE_TEST builds on
non-ARM architectures. Guard it with CONFIG_ARM and provide a
cpu_relax() fallback for compilation on other architectures.
Thomas Gleixner [Fri, 29 May 2026 20:00:04 +0000 (22:00 +0200)]
pps: Convert to ktime_get_snapshot_id()
ktime_get_snapshot() resolves to ktime_get_snapshot_id(CLOCK_REALTIME).
Make it obvious in the code and convert the readout to use the
snapshot::systime and monoraw fields instead of snapshot::real and raw,
which aregoing away.
Similar to the PPS generators, avoid the more expensive snapshot when
CONFIG_NTP_PPS is disabled.
No functional change intended.
Signed-off-by: Thomas Gleixner <tglx@kernel.org> Tested-by: Arthur Kiyanovski <akiyano@amazon.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260529195557.123410250@kernel.org
Thomas Gleixner [Fri, 29 May 2026 19:59:55 +0000 (21:59 +0200)]
timekeeping: Use system_time_snapshot::systime/monoraw instead of ::real/raw
system_time_snapshot::systime provides the same information as
system_time_snapshot::real when the snapshot was taken with
ktime_get_snapshot_id(CLOCK_REALTIME).
Convert the history interpolation over to use 'systime' and 'monoraw' as
'real/raw' are going away once all users are converted.
As a side effect this is the first step to support CLOCK_AUX with
get_device_crosstime_stamp() and the history interpolation.
Signed-off-by: Thomas Gleixner <tglx@kernel.org> Tested-by: David Woodhouse <dwmw@amazon.co.uk> Tested-by: Arthur Kiyanovski <akiyano@amazon.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260529195557.024415766@kernel.org
Thomas Gleixner [Fri, 29 May 2026 19:59:51 +0000 (21:59 +0200)]
timekeeping: Provide ktime_get_snapshot_id()
ktime_get_snapshot() provides a snapshot of the underlying clocksource
counter value and the corresponding CLOCK_MONOTONIC_RAW, CLOCK_REALTIME and
CLOCK_BOOTTIME timestamps.
There is no usage of CLOCK_REALTIME and CLOCK_BOOTTIME at the same time and
CLOCK_BOOTTIME support was just added for the ARM64 KVM tracing mechanism,
which needs CLOCK_BOOTTIME and the underlying clocksource counter value.
ktime_get_snapshot() is also not suitable for usage with CLOCK_AUX, but
that's a prerequisite to support PTP hardware timestamping for CLOCK_AUX
steering.
As a first step, rename ktime_get_snapshot() to ktime_get_snapshot_id(),
which now takes a clockid argument to select the clock which needs to be
captured. The result is stored in system_time_snapshot::systime, which will
replace the system_time_snapshot::real/boot members once all usage sites
have been converted.
ktime_get_snapshot() is a simple wrapper which hands in CLOCK_REALTIME as
clockid argument for the conversion period. That means CLOCK_REALTIME is
now captured twice, but that redunancy is only temporary.
As all usage sites of struct system_time_snapshot has to be updated anyway,
rename the 'raw' member to 'monoraw' for clarity.
No functional change vs. current users of ktime_get_snapshot()
Signed-off-by: Thomas Gleixner <tglx@kernel.org> Tested-by: David Woodhouse <dwmw@amazon.co.uk> Tested-by: Arthur Kiyanovski <akiyano@amazon.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260529195556.971591633@kernel.org
ext2: fix ignored return value of generic_write_sync()
Fix ext2_dio_write_iter() to propagate the error returned by
generic_write_sync() instead of silently discarding it, which could
cause write(2) to return success to userspace on O_SYNC/O_DSYNC files
even when the sync failed.
The correct pattern, already used in ext2_dax_write_iter() in the same
file and in ext4, xfs, f2fs among others, is:
if (ret > 0)
ret = generic_write_sync(iocb, ret);
Found by Linux Verification Center (linuxtesting.org) with SVACE.
[JK: Reflect also filemap_write_and_wait() return value]
Linus Walleij [Tue, 2 Jun 2026 08:53:32 +0000 (10:53 +0200)]
Merge tag 'qcom-arm64-for-7.2' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into soc/dt
Qualcomm Arm64 DeviceTree updates for v7.2
Introduce the Qualcomm IPQ9650 router/gateway platform and the RDP488
board. Add support for the Motorola Edge 30 and the Nothing Phone.
Describe the IPA block on the Agatti platform and missing OPP-levels for
the video encoder/decoder.
For Eliza, describe the QUP Serial Engines, GPI DMA, SDHCI, LLCC, IMEM,
QCE crypto, ADSP remoteproc and USB nodes. Enable DSI panel,
DisplayPort, USB, and ADSP support on the Eliza MTP.
On Glymur enable ADSP and CDSP remoteprocs, FastRPC, crypto hardware,
CPUfreq cooling devices, and coresight nodes. Enable the remoteprocs and
the LID sensor on the Glymur CRD.
Describe the CAN-FD controller found on the Hamoa EVK. Correct the
DisplayPort controller OPP tables.
Describe the watchdog on IPQ5210 and IPQ9650.
Describe USB controller and PHYs for the Kaanapali platform and enable
basic USB support on the MTP and QRD devices.
Enable the second display subsystem on Lemans and use this to enable
additional DisplayPort outputs on the Lemans Ride board, and IFP
mezzanine for the EVK. Also enable the GPIO expander on the Lemans EVK
to get the CAN signals out.
Add crypto hardware and qfprom nodes on Milos. Reduce the remotefs
shared memory size to avoid sanity checks in the modem firmware
rejecting the region.
Enable the vibrator on FairPhone FP6.
Add GPSDP FastRPC support on Monaco, and describe the Bluetooth
controller on the Arduino VENTUNO Q board.
Introduce an EL2 overlay for the Purwa IoT EVK.
Enable CAN bus controller on QCS6490 RB3gen2 and add a remotefs node.
Enable FastRPC on the SC8280XP ADSP.
Correct SDM630 and SDM660 ADSP FastRPC channel ids. Also add the ADSP
memory region on SDM630.
On SDM845 devices, enable NFC on Google Pixel 3, OnePlus 6, OnePlus 6T,
and SHIFT SHIFT6mq. Enable camera flash on LG devices. Rework the
framebuffer description on Samsung, SHIFT and Xiaomi devices. Enable
camera flash on LG devices. Fix Bluetooth and WiFi on LG and Xiaomi
devices.
Enable MDSS and the display panel on Xiaomi Mi A3.
Scale L3 and DDR clock votes based on CPUfreq selection.
Enable camera clock controller, cpufreq cooling devices, and correct the
DSI1 reference clock on SM8750.
On the Talos platform, describe the QSPI support, GPR and audio
services, and enable sound on the EVK target. Enable QSPI and describe
the SPINOR on this bus, on the QCS615 Ride.
Describe power-domain and iface clock for the Inline Crypto Engine (ICE)
across various platforms.
Fix the Bluetooth RFA supply name across a variety of devices.
* tag 'qcom-arm64-for-7.2' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: (131 commits)
arm64: dts: qcom: add support for pixel 3a xl with the tianma panel
arm64: dts: qcom: sdm670-google: add common device tree include
arm64: dts: qcom: hamoa-iot-evk: add MCP2518FD CAN on spi18
arm64: dts: qcom: sm8750: allow mode-switch events to reach the QMP Combo PHY
arm64: dts: qcom: sc8280xp: drop unused polling-delay-passive properties
arm64: dts: qcom: ipq5210: add watchdog node
arm64: dts: qcom: sdm845-xiaomi-beryllium: Correct IPA FW path
arm64: dts: qcom: monaco-arduino-monza: Add Bluetooth UART node
arm64: dts: qcom: glymur: Add qfprom efuse node
arm64: dts: qcom: milos: Add qfprom efuse node
arm64: dts: qcom: glymur: add coresight nodes
arm64: dts: qcom: qcs6490-rb3gen2: add rmtfs node
arm64: dts: qcom: lemans-evk: Enable CAN RX via I2C GPIO expander
arm64: dts: qcom: glymur: Fix wrong interrupt number for i2c19
arm64: dts: qcom: Drop unused remoteproc_adsp_glink label
arm64: dts: qcom: lemans: Add eDP ref clock for eDP PHYs
arm64: dts: qcom: sm8750: Add power-domain and iface clk for ice node
arm64: dts: qcom: sm8650: Add power-domain and iface clk for ice node
arm64: dts: qcom: sm8550: Add power-domain and iface clk for ice node
arm64: dts: qcom: sm8450: Add power-domain and iface clk for ice node
...
-> __domain_flush_pages( flush_size /* power of 2 flush_size */)
-> domain_flush_pages_v1()
-> build_inv_iommu_pages()
-> build_inv_address()
}
build_inv_address() will trigger a full invalidation if the chunk
size > (1 << 51). Consequently, the guest will issue multiple full
invalidations for a single call to amd_iommu_domain_flush_all()
Without this patch, we will see 10 time instead of 1 time full
invalidations for every amd_iommu_domain_flush_all().
Cc: stable@vger.kernel.org Fixes: a270be1b3fdf ("iommu/amd: Use only natural aligned flushes in a VM") Suggested-by: Josef Bacik <josef@toxicpanda.com> Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Weinan Liu <wnliu@google.com> Reviewed-by: Wei Wang <wei.w.wang@hotmail.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Simon Xue [Tue, 28 Apr 2026 16:05:31 +0000 (18:05 +0200)]
iommu/rockchip: disable fetch dte time limit
Disable the Bit 31 of the AUTO_GATING iommu register, as it causes
hangups with the RGA3 (Raster Graphics Acceleration 3) peripheral.
The RGA3 register description of the TRM already states that the bit
must be set to 1. The vendor kernel sets the bit unconditionally to
1 to fix VOP (Video Output Processor) screen black issues. This patch
squashes the 2 vendor kernel commits with the following commit messages:
Master fetch data and cpu update page table may work in parallel, may
have the following procedure:
New iommu version has the above bug, if fetch dte consecutively four
times, then it will be blocked. Fortunately, we can set bit 31 of
register MMU_AUTO_GATING to 1 to make it work as old version which does
not have this issue.
This issue only appears on RV1126 so far, so make a workaround dedicated
to "rockchip,rv1126" machine type.
iommu/rockchip: fix vop blocked and screen black on RK356X and RK3588
RK3568 and RK3588 has the same issue as RV1126/RV1109 that caused by
dte fetch time limit, So we can set BIT(31) of register 0x24 default
to 1 as a workaround.
Signed-off-by: Simon Xue <xxm@rock-chips.com> Signed-off-by: Sven Püschel <s.pueschel@pengutronix.de> Acked-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Rafael Passos [Tue, 2 Jun 2026 03:05:19 +0000 (00:05 -0300)]
HID: Input: Add battery list cleanup with devm action
The batteries list (hdev->batteries) is not cleaned up during
hidinput_disconnect(), but struct hid_battery entries are allocated
with devm_kzalloc.
When a driver is unbound (e.g. during devicereprobe), devm frees those
entries while their list_head nodesremain dangling in hdev->batteries,
which persists across rebinds.
Link: https://lore.kernel.org/all/20260602011949.2825852-1-rafael@rcpassos.me/ Fixes: 4a58ae85c3f9 ("HID: input: Add support for multiple batteries per device") Signed-off-by: Rafael Passos <rafael@rcpassos.me> Acked-by: Lucas Zampieri <lcasmz54@gmail.com> Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
Dave Jones [Mon, 18 May 2026 18:46:28 +0000 (14:46 -0400)]
NFS: write_completion: dereference loop-local req, not hdr->req
5d3869a41f36 ("NFS: fix writeback in presence of errors") introduced
a dereference of hdr->req->wb_lock_context in nfs_write_completion's
per-request loop. hdr->req is set once at nfs_pgheader_init() time
and is not refcount-protected for the lifetime of the loop; when hdr
aggregates requests from multiple page groups (common under heavy
NFSv3 writeback), a parallel COMMIT on hdr->req's group can drop the
last reference and free it while the outer loop is still iterating
requests from other groups. KASAN catches this as an 8-byte read at
offset +24 of a freed nfs_page slab object (wb_lock_context).
All requests in a given pgio share the same open_context, so reading
the loop-local req's wb_lock_context yields the same value and is
safe -- req is still on hdr->pages and holds its writeback kref
through the commit branch.
Allocated by task 121997 on cpu 3 at 31643.290294s:
kasan_save_stack+0x1e/0x40
kasan_save_track+0x13/0x60
__kasan_slab_alloc+0x62/0x70
kmem_cache_alloc_noprof+0x1ab/0x4e0
nfs_page_create+0x152/0x460 [nfs]
nfs_page_create_from_folio+0x7e/0x210 [nfs]
nfs_update_folio+0x7a9/0x32a0 [nfs]
nfs_write_end+0x290/0xc60 [nfs]
generic_perform_write+0x4ce/0x990
nfs_file_write+0x6b3/0xce0 [nfs]
vfs_write+0x63c/0xfa0
ksys_write+0x122/0x240
do_syscall_64+0xc3/0x13f0
entry_SYSCALL_64_after_hwframe+0x4b/0x53
Freed by task 122046 on cpu 0 at 31647.037964s:
kasan_save_stack+0x1e/0x40
kasan_save_track+0x13/0x60
kasan_save_free_info+0x37/0x60
__kasan_slab_free+0x3b/0x60
kmem_cache_free+0x11b/0x5a0
nfs_page_group_destroy+0x13a/0x210 [nfs]
nfs_unlock_and_release_request+0x64/0x90 [nfs]
nfs_commit_release_pages+0x339/0xbd0 [nfs]
nfs_commit_release+0x51/0xb0 [nfs]
rpc_free_task+0xee/0x160
rpc_async_release+0x5d/0xb0
process_one_work+0x9b0/0x1890
worker_thread+0x75a/0x10a0
kthread+0x3d3/0x4d0
ret_from_fork+0x669/0xa50
ret_from_fork_asm+0x11/0x20
The buggy address belongs to the object at ffff888118af2040\x0a which belongs to the cache nfs_page of size 96
The buggy address is located 24 bytes inside of\x0a freed 96-byte region [ffff888118af2040, ffff888118af20a0)
Memory state around the buggy address: ffff888118af1f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc ffff888118af1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888118af2000: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb
^ ffff888118af2080: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc ffff888118af2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================
Reviewed-by Jeff Layton <jlayton@kernel.org>
Fixes: 5d3869a41f36 ("NFS: fix writeback in presence of errors") Cc: Olga Kornievskaia <okorniev@redhat.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Anna Schumaker <anna@kernel.org> Cc: linux-nfs@vger.kernel.org Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
drm/imx: Fix three kernel-doc warnings in dcss-scaler.c
Fix the following W=1 kerneldoc warnings by adding the missing parameter
descriptions for @phase0_identity and @nn_interpolation in
dcss_scaler_filter_design() and @phase0_identity in
dcss_scaler_gaussian_filter()
Warning: drivers/gpu/drm/imx/dcss/dcss-scaler.c:173 function parameter 'phase0_identity' not described in 'dcss_scaler_gaussian_filter'
Warning: drivers/gpu/drm/imx/dcss/dcss-scaler.c:270 function parameter 'phase0_identity' not described in 'dcss_scaler_filter_design'
Warning: drivers/gpu/drm/imx/dcss/dcss-scaler.c:270 function parameter 'nn_interpolation' not described in 'dcss_scaler_filter_design'
Fixes: 9021c317b770 ("drm/imx: Add initial support for DCSS on iMX8MQ") Signed-off-by: Yicong Hui <yiconghui@gmail.com> Reviewed-by: Laurentiu Palcu <laurentiu.palcu@oss.nxp.com> Link: https://patch.msgid.link/20260406180013.2442096-1-yiconghui@gmail.com Signed-off-by: Liu Ying <victor.liu@nxp.com>
Joel Kamminga [Sat, 30 May 2026 18:49:43 +0000 (12:49 -0600)]
kbuild: rust: clean `*.long-type-*.txt` files
When rustc prints an error containing a long type that doesn't fit
in a line, it will write the whole thing in a .txt file -- see commit 420dd187e157 (".gitignore: ignore rustc long type txt files") for more
details.
These files are purely compiler artifacts and are not created
intentionally by the build system.
Thus add them to the `clean` target to stop them from cluttering up the
source tree.
Ronald Claveau [Wed, 13 May 2026 10:43:55 +0000 (12:43 +0200)]
arm64: dts: amlogic: t7: khadas-vim4: add PWM-driven status LED
The VIM4 board exposes a status LED wired to the PWM_AO_C_D output.
Enable the pwm_ao_cd controller with its pinmux, and declare a
pwm-leds node with a heartbeat trigger.
Enable and configure the three MMC controllers for the Khadas VIM4 board:
- sd_emmc_a: SDIO interface for the BCM43752 Wi-Fi module
- sd_emmc_b: SD card slot
- sd_emmc_c: eMMC storage
Ronald Claveau [Thu, 26 Mar 2026 09:59:16 +0000 (10:59 +0100)]
arm64: dts: amlogic: t7: Add PWM controller nodes
Add device tree nodes for the seven PWM controllers available
on the Amlogic T7 SoC, using amlogic,meson-s4-pwm as fallback compatible.
All nodes are disabled by default and should be
enabled in the board-specific DTS file.
Co-developed-by: Nick Xie <nick@khadas.com> Signed-off-by: Nick Xie <nick@khadas.com> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Signed-off-by: Ronald Claveau <linux-kernel-dev@aliel.fr> Link: https://patch.msgid.link/20260326-add-emmc-t7-vim4-v5-5-d3f182b48e9d@aliel.fr Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Ronald Claveau [Thu, 26 Mar 2026 09:59:14 +0000 (10:59 +0100)]
arm64: dts: amlogic: t7: Add MMC controller nodes
Add device tree nodes for the three MMC controllers available
on the Amlogic T7 SoC, using amlogic,meson-axg-mmc as fallback compatible.
All nodes are disabled by default and should be
enabled in the board-specific DTS file.
Ferus Castor [Mon, 1 Jun 2026 01:58:48 +0000 (18:58 -0700)]
ALSA: oxygen: add HT-Omega eClaro (7284:9783) support
The HT-Omega eClaro is a PCI sound card built on the C-Media CMI8788
(Oxygen HD) controller, with PCI subsystem ID 7284:9783.
Output hardware:
- AK4396VF stereo DAC: front L/R output, connected via SPI CE0
- CS4362A 6-channel DAC: surround, center/LFE, and side outputs,
connected via SPI CE1 with a 3-byte [0x30, reg, val] frame
The CS4362A uses inverse attenuation encoding (0 = 0 dB, 127 = max
attenuation) and a 0.5 dB/step logarithmic scale. Volume TLV is set
to TLV_DB_SCALE(-6350, 50, 0) to match the hardware. The channel-to-
register mapping was verified by listening test:
- Pair 1 (regs 7/8): side L/R (ALSA channels 6/7)
- Pair 2 (regs 10/11): center/LFE (ALSA channels 4/5)
- Pair 3 (regs 13/14): rear L/R (ALSA channels 2/3)
Input hardware:
- CS5361 stereo ADC: Line In and Mic In capture
accel/ivpu: Add buffer overflow check in MS get_info_ioctl
Add validation that the info size returned from the metric stream info
query is not exceeded when checked against the allocated buffer size.
If the firmware returns a size larger than the buffer, reject the
operation with -EOVERFLOW instead of proceeding with an incorrect
buffer copy.
Fixes: cdfad4db7756 ("accel/ivpu: Add NPU profiling support") Cc: stable@vger.kernel.org # v6.18+ Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@linux.intel.com> Reviewed-by: Karol Wachowski <karol.wachowski@linux.intel.com> Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com> Link: https://patch.msgid.link/20260529120841.135852-1-andrzej.kacprowski@linux.intel.com
accel/ivpu: Add bounds checks for firmware log indices
Add validation that read and write indices in the firmware log buffer
are within valid bounds (< data_size) before using them. If
out-of-bounds indices are encountered (from firmware), clamp them to
safe values instead of proceeding with invalid offsets.
This prevents potential out-of-bounds buffer access when firmware
supplies invalid log indices.
Fixes: 1fc1251149a7 ("accel/ivpu: Refactor functions in ivpu_fw_log.c") Cc: stable@vger.kernel.org # v6.18+ Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@linux.intel.com> Reviewed-by: Karol Wachowski <karol.wachowski@linux.intel.com> Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com> Link: https://patch.msgid.link/20260529115842.135378-1-andrzej.kacprowski@linux.intel.com
accel/ivpu: Add bounds check for firmware runtime memory
Validate that the firmware runtime memory specified in the image
header is properly aligned and sized to hold the firmware image.
This prevents errors during memory allocation and image transfer.
Fixes: 2007e210b6a1 ("accel/ivpu: Split FW runtime and global memory buffers") Cc: stable@vger.kernel.org # v7.0+ Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@linux.intel.com> Reviewed-by: Karol Wachowski <karol.wachowski@linux.intel.com> Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com> Link: https://patch.msgid.link/20260529120853.135876-1-andrzej.kacprowski@linux.intel.com
tools/testing/memblock: fix stale NUMA reservation tests
memblock allocations now reserve memory with MEMBLOCK_RSRV_KERN and,
on NUMA configurations, record the requested node on the reserved
region. Several memblock simulator NUMA tests still expected merges
that only worked before those reservation semantics changed, so the
suite aborted even though the allocator behavior was correct.
Update the NUMA merge expectations in the memblock_alloc_try_nid()
and memblock_alloc_exact_nid_raw() tests to match the current reserved
region metadata rules. For cases that should still merge, create the
pre-existing reservation with matching nid and MEMBLOCK_RSRV_KERN
metadata. Also strengthen the memblock_alloc_node() coverage by
checking the newly created reserved region directly instead of
re-reading the source memory node descriptor.
Finally, drop the stale README/TODO notes that still claimed
memblock_alloc_node() could not be tested.
The memblock simulator passes again with NUMA enabled after these
updates.
mm/fake-numa: fix under-allocation detection in uniform split
When splitting NUMA node uniformly, split_nodes_size_interleave_uniform()
returns the next absolute node ID, not the number of nodes created.
The existing under-allocation detection logic compares next absolute node
ID (ret) and request count (n), which only works when nid starts at 0.
For example, on a system with 2 physical NUMA nodes (node 0: 2GB, node
1: 128MB) and numa=fake=8U, 8 fake nodes are successfully created from
node 0 and split_nodes_size_interleave_uniform() returns 8. For node 1,
fake node nid starts at 8, but only 4 fake nodes are created due to
current FAKE_NODE_MIN_SIZE being 32MB, and
split_nodes_size_interleave_uniform() returns 12. By existing
under-allocation detection logic, "ret < n" (12 < 8) is false, so the
under-allocation will not be detected.
Fix under-allocation detection logic to compare the number of actually
created nodes (ret - nid) against the request count (n). Also skip
under-allocation detection logic for memoryless physical nodes where no
fake nodes are created.
Also, fix the outdated comment describing
split_nodes_size_interleave_uniform() to match the actual return value.
bpf: fix UAF by restoring RCU-delayed inode freeing in bpffs
commit 4f375ade6aa9 ("bpf: Avoid RCU context warning when unpinning
htab with internal structs") moved inode cleanup from ->free_inode()
into ->destroy_inode() to avoid sleeping in RCU context when calling
bpf_any_put(). However this removed the RCU delay on freeing the
inode itself and the cached symlink body (i_link), both of which
can be accessed by RCU pathwalk (pick_link, may_lookup etc.).
This causes a use-after-free when a concurrent unlinkat() drops the
last inode reference and destroy_inode() frees the inode immediately,
while another task is still walking the path in RCU mode and reads
inode->i_opflags (offset +2) inside current_time() -> is_mgtime().
KASAN reports:
BUG: KASAN: slab-use-after-free in is_mgtime include/linux/fs.h:2313
Read of size 2 at addr ffff8880407e4282 (offset +2 = i_opflags)
The rules (per Al Viro):
->destroy_inode() called immediately, can sleep, use for blocking
cleanup e.g. bpf_any_put()
->free_inode() called after RCU grace period, use for freeing
inode and anything RCU-accessible e.g. i_link
Fix: split the two concerns properly:
- keep bpf_any_put() in bpf_destroy_inode() since it is blocking
and needs to run promptly
- introduce bpf_free_inode() to handle kfree(i_link) and
free_inode_nonrcu() with proper RCU delay, preventing the UAF
Thomas Weißschuh [Sun, 31 May 2026 13:20:16 +0000 (15:20 +0200)]
platform/chrome: Prevent build for big-endian systems
Both ARM and ARM64 which are a dependency for CHROME_PLATFORMS have
seldomly used big-endian variants.
The ChromeOS EC framework and drivers are written under the assumption
that they will be running on a little-endian systems. Code which would
be broken on big-endian can be found trivially.
====================
Add StarFive jhb100 soc SGMII GMAC support
jhb100 is a Starfive new RISC-V SoC for datacenter BMC (BaseBoard
Managent Controller). Similar with Aspeed 27x0.
The jhb100 minimal system upstream is in progress:
https://patchwork.kernel.org/project/linux-riscv/cover/20260508053632.818548-1-changhuang.liang@starfivetech.com/
jhb100 GMAC still using designware GMAC core like JH7100 and JH7110,
and contains 2 SGMII interfaces, 1 RGMII/RMII interface, 1 RMII
interface. In JH7100/JH7110 dwmac-starfive.c have supported RGMII/RMII
interface. So require to add SGMII support to dwmac-starfive.c for JHB100.
SGMII serdes PHY has been integrated in JHB100 and do not have driver
setting.
In JHB100 EVB board, SGMII connect with motorcomm YT8531s external PHY
and support RJ45 ethernet port.
====================
Minda Chen [Wed, 27 May 2026 08:41:07 +0000 (16:41 +0800)]
net: stmmac: starfive: Add jhb100 SGMII interface
Add jhb100 compatible and SGMII support. jhb100 soc contains
2 SGMII interfaces and integrated with serdes PHY. SGMII with
split TX/RX MAC clock and need to set 2.5M/25M/125M TX/RX clock
rate in 10M/100M/1000M speed mode.
Minda Chen [Wed, 27 May 2026 08:41:06 +0000 (16:41 +0800)]
dt-bindings: net: starfive,jh7110-dwmac: Add jhb100 support
The jhb100 GMAC still using Synopsys designware GMAC core.
hardware features are similar with jh7100.
Add jhb100 GMAC compatible and reset, interrupts features.
jhb100 dwmac has only one reset signal and one interrupt
line.
jhb100 SGMII interface tx/rx mac clock is split and require to
set clock rate in 10M/100M/1000M speed. So dts need to add a
new rx clock in code, dts and dt binding doc.
Linus Torvalds [Tue, 2 Jun 2026 02:55:30 +0000 (19:55 -0700)]
Merge tag 'for-7.1/dm-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fix from Mikulas Patocka:
- fix race condition in dm-cache-policy-smq
* tag 'for-7.1/dm-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm cache policy smq: check allocation under invalidate lock
Mark Bloch [Thu, 28 May 2026 19:14:10 +0000 (22:14 +0300)]
devlink: Release nested relation on devlink free
devlink relation state is normally released from devl_unregister(), which
calls devlink_rel_put(). This misses devlink instances that get a nested
relation before registration and then fail probe before devl_register() is
reached.
That flow can happen for SFs. The child devlink gets linked to its
parent before registration, then a later probe error calls devlink_free()
directly. Since the instance was never registered, devl_unregister() is not
called and devlink->rel is leaked.
Release any pending relation from devlink_free() as well. The registered
path is unchanged because devl_unregister() already clears devlink->rel
before devlink_free() runs.
Fixes: c137743bce02 ("devlink: introduce object and nested devlink relationship infra") Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20260528191411.3270532-1-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
qrtr_send_resume_tx() calls qrtr_node_lookup() which takes a
reference on the returned node. If the subsequent call to
qrtr_alloc_ctrl_packet() fails due to memory allocation failure, the
function returns -ENOMEM without calling qrtr_node_release() to
release the node reference.
Add qrtr_node_release(node) before returning on the allocation failure
path to properly release the reference.
Signed-off-by: Wentao Liang <vulab@iscas.ac.cn> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Manivannan Sadhasivam <mani@kernel.org> Link: https://patch.msgid.link/20260528080019.1176700-1-vulab@iscas.ac.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Thompson [Thu, 28 May 2026 16:50:17 +0000 (16:50 +0000)]
net: lan743x: avoid netdev-based logging before netdev registration
This patch updates the lan743x driver to prevent the use of netdev-based
logging APIs (such as netdev_dbg) before the network device has been
successfully registered. Using netdev-based logging prior to registration
results in log messages referencing "(unnamed net_device) (uninitialized)",
which can be confusing and less informative.
The driver must use netif_msg_ APIs and device-based logging (e.g. dev_dbg)
until netdev registration is complete. This ensures log entries are
associated with the correct device context and improves log clarity. After
registration, netdev-based logging APIs can be used safely.
Linus Torvalds [Tue, 2 Jun 2026 02:50:33 +0000 (19:50 -0700)]
Merge tag 'auxdisplay-v7.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/andy/linux-auxdisplay
Pull auxdisplay updates from Andy Shevchenko:
- Fix potential out-of-bound access in line-display library
- Miscellaneous refactoring and cleaning up
[ Andy says this could easily be delayed until 7.2, but it's _so_ tiny
that it's more work for me to schedule it for later than to just take
it now, and just doesn't seem worth delaying - Linus ]
* tag 'auxdisplay-v7.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/andy/linux-auxdisplay:
auxdisplay: Kconfig: drop unneeded quotes in PANEL_BOOT_MESSAGE dep
auxdisplay: line-display: fix OOB read on zero-length message_store()
auxdisplay: max6959: use regmap_assign_bits() in max6959_enable()
Lee Jones [Wed, 27 May 2026 13:36:29 +0000 (13:36 +0000)]
l2tp: pppol2tp: hold reference to session in pppol2tp_ioctl()
pppol2tp_ioctl() read sock->sk->sk_user_data directly without any
locks or reference counting. If a controllable sleep was induced during
copy_from_user() (e.g. via a userfaultfd page fault sleep), a concurrent
socket close could trigger pppol2tp_session_close() asynchronously. This
frees the l2tp_session structure via the l2tp_session_del_work workqueue.
Upon resuming, the ioctl thread dereferences the stale session pointer,
resulting in a Use-After-Free (UAF).
Fix this by securely fetching the session reference using the RCU-safe,
refcounted helper pppol2tp_sock_to_session(sk) on entry. This locks the
session's refcount across the sleep. We structured the function to exit
via standard err breaks, guaranteeing that l2tp_session_put() is cleanly
called on all return paths to drop the reference.
To preserve existing behavior we validate the session and its magic
signature only for the specific L2TP commands that require it. This
ensures that generic/unknown ioctls called on an unconnected socket
still return -ENOIOCTLCMD and correctly fall back to generic handlers
(e.g. in sock_do_ioctl()).
net: wwan: t7xx: Add delay between MD and SAP suspend
SAP (Service Access Point) suspend occasionally times out with error
-110 (ETIMEDOUT), followed by modem port errors and complete modem
failure requiring a system reboot to recover.
Error symptoms:
mtk_t7xx 0000:72:00.0: [PM] SAP suspend error: -110
mtk_t7xx 0000:72:00.0: can't suspend (...returned -110)
mtk_t7xx 0000:07:00.0: Failed to send skb: -22
mtk_t7xx 0000:07:00.0: Write error on MBIM port, -22
The modem firmware needs time after receiving the MD (modem) suspend
request to complete internal operations before it is ready to accept
the SAP suspend request. Without this delay, if runtime PM attempts
to suspend while the firmware is busy, the SAP suspend command times
out, leaving the modem in an unrecoverable state.
Root cause and userspace interaction:
ModemManager 1.24+ includes changes that reduce the likelihood of this
issue by ensuring the modem is in a low-power state before the kernel
attempts runtime suspend. However, the kernel driver should not depend
on specific userspace behavior or ModemManager versions. Older versions
(1.20-1.22) are still widely deployed, and the kernel should be robust
regardless of userspace implementation details.
There appears to be no hardware status register or other mechanism
available to query whether the firmware is ready for SAP suspend.
A delay between the two suspend requests is the most reliable solution
found through testing.
Add a 50ms delay between MD suspend and SAP suspend. This gives the
firmware adequate time to complete internal operations without adding
significant latency to the suspend path. This makes the driver robust
across all ModemManager versions and system conditions.
Testing: 96+ hours of continuous operation with ModemManager 1.20.2
and Fibocom FM350-GL modem. Zero SAP suspend timeouts observed across
2000+ successful suspend/resume cycles. Previously failed within
24 hours with 100% reproducibility.
Petr Wozniak [Wed, 27 May 2026 05:39:09 +0000 (07:39 +0200)]
net: phy: sfp: probe for RollBall I2C-to-MDIO bridge in mdio-i2c
The "OEM"/"SFP-10G-T" quirk entry in sfp_fixup_rollball_cc()
unconditionally forces MDIO_I2C_ROLLBALL for all modules matching that
vendor/part-number combination. This works for modules that genuinely
implement a RollBall I2C-to-MDIO bridge, but silently breaks modules
that share the same EEPROM strings without having such a bridge.
The Realtek RTL8261BE-CG is one such module: a pure copper 10G SFP+
media converter with no I2C-to-MDIO bridge. Its EEPROM reports
vendor="OEM", part="SFP-10G-T-I", and -- critically -- Vendor OUI
00:00:00, making OUI-based differentiation impossible. With
MDIO_I2C_ROLLBALL forced, the module silently ACKs the unlock password
write, the MDIO bus is created, but no PHY responds; the SFP state
machine cycles through the RollBall PHY-probe retry window before
reporting no PHY.
Move the probe into i2c_mii_init_rollball() in mdio-i2c.c, where the
RollBall protocol constants are already defined. After sending the
unlock password, issue a CMD_READ and poll for CMD_DONE up to 200 ms
(10 x 20 ms, matching the existing rollball poll tolerance). A genuine
RollBall bridge asserts CMD_DONE within that window; modules without a
bridge never do, so i2c_mii_init_rollball() returns -ENODEV.
mdio_i2c_alloc() propagates -ENODEV to the caller to signal that no
bridge is present and PHY probing should be skipped.
sfp_sm_add_mdio_bus() catches -ENODEV and transitions
sfp->mdio_protocol to MDIO_I2C_NONE so the rest of the state machine
skips PHY probing for this module.
Any I2C-level error (NACK, timeout) during the probe is also treated as
-ENODEV: if the module does not respond at I2C address 0x51 at all,
there is certainly no RollBall bridge there, and SFP initialization
should not abort.
The probe writes are safe with respect to SFP EEPROM integrity: only
modules explicitly listed in the quirk table enter this path, and the
RollBall password unlock write to 0x51 was already issued by
i2c_mii_init_rollball() before the probe for all such modules. Any
module without a device at 0x51 NACKs the transfer and is treated as
-ENODEV.
Add "OEM"/"SFP-10G-T-I" to the quirk table so RTL8261BE modules enter
the probe path; genuine RollBall modules continue to work as before.
Jakub Kicinski [Tue, 2 Jun 2026 02:11:17 +0000 (19:11 -0700)]
Merge branch 'mv88e6xxx-serdes-on-mv88e6321'
Fidan Aliyeva says:
====================
mv88e6xxx: SERDES on mv88e6321
This patch series add code support to be able to use SERDES feature of
mv88e6321 version of Marvel mv88e6xxx series. mv88e6321 has 2 ports to
support high speed SERDES but the support is lacking in the driver.
mv88e6321 version has a similar architecture to mv88e6352 version making it
possible to reuse its pcs functions. That's why the patch series consist of
2 parts:
1. Refactor the serdes functions and pcs_init of mv88e6352 to be more
generic (patches 1-2).
2. Add the SERDES support for mv88e6321 reusing 6352's pcs functions
The final code has been tested on mv88e6321 ethernet device directly by ip
ping tests, performance tests and also verifying the switch's expected
register values.
Referred document: 88E6321/88E6320 Functional Specification
====================
Fidan Aliyeva [Thu, 28 May 2026 21:03:10 +0000 (23:03 +0200)]
mv88e6xxx: Add SERDES Support for mv88e6321
Add serdes and pcs_ops functions for mv88e6321. In mv88e6321
2 ports support serdes functionality; port 0 and port 1. These ports are
serdes-only ports.
Changes:
1. Add a function support to return the lane address for the port based on
cmode.
2. Reuse mv88e6352's serdes_get_regs* and pcs_init functions for mv88e6321.
Tested on mv88e6321 switch port 0.
Co-developed-by: Thomas Eckerman <thomas.eckerman.ext@ericsson.com> Signed-off-by: Thomas Eckerman <thomas.eckerman.ext@ericsson.com> Signed-off-by: Fidan Aliyeva <fidan.aliyeva.ext@ericsson.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20260528210310.1365858-4-fidan.aliyeva.ext@ericsson.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fidan Aliyeva [Thu, 28 May 2026 21:03:09 +0000 (23:03 +0200)]
mv88e6xxx: Refactor 6352's serdes functions
Changes:
1. Replace serdes check by mv88e6352_g2_scratch_port_has_serdes in
mv88e6352_pcs_init function by mv88e6xxx_serdes_get_lane function making it
more generic.
2. Replace serdes checks in mv88e6352_serdes_get_* functions with
mv88e6xxx_serdes_get_lane making them more generic.
3. Add lane argument to mv88e6352_serdes_read so it can be reused later for
6321.
Co-developed-by: Thomas Eckerman <thomas.eckerman.ext@ericsson.com> Signed-off-by: Thomas Eckerman <thomas.eckerman.ext@ericsson.com> Signed-off-by: Fidan Aliyeva <fidan.aliyeva.ext@ericsson.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20260528210310.1365858-3-fidan.aliyeva.ext@ericsson.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fidan Aliyeva [Thu, 28 May 2026 21:03:08 +0000 (23:03 +0200)]
mv88e6xxx: Add mv88e6352_serdes_get_lane
Changes:
1. Add mv88e6352_serdes_get_lane function which checks if the port
supports SERDES by calling mv88e6352_g2_scratch_port_has_serdes. Then
returns the address of the SERDES lane.
2. Add this function as .serdes_get_lane member to all the chip
versions which use mv88e6352_pcs_init.
Co-developed-by: Thomas Eckerman <thomas.eckerman.ext@ericsson.com> Signed-off-by: Thomas Eckerman <thomas.eckerman.ext@ericsson.com> Signed-off-by: Fidan Aliyeva <fidan.aliyeva.ext@ericsson.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20260528210310.1365858-2-fidan.aliyeva.ext@ericsson.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yizhou Zhao [Wed, 27 May 2026 08:18:01 +0000 (16:18 +0800)]
6lowpan: fix off-by-one in multicast context address compression
The second memcpy in lowpan_iphc_mcast_ctx_addr_compress() uses
&data[1] as destination and &ipaddr->s6_addr[11] as source, but
both should be offset by one: &data[2] and &ipaddr->s6_addr[12]
respectively.
This off-by-one has two consequences:
1. data[1] is overwritten with s6_addr[11], corrupting the RIID
field in the compressed multicast address
2. data[5] is never written, so uninitialized kernel stack memory
is transmitted over the network via lowpan_push_hc_data(),
leaking kernel stack contents
The correct inline data layout must match what the decompression
function lowpan_uncompress_multicast_ctx_daddr() expects:
data[0..1] = s6_addr[1..2] (flags/scope + RIID)
data[2..5] = s6_addr[12..15] (group ID)
Also zero-initialize the data array as a defensive measure against
similar bugs in the future.
Fixes: 5609c185f24d ("6lowpan: iphc: add support for stateful compression") Reported-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn> Reported-by: Yuxiang Yang <yangyx22@mails.tsinghua.edu.cn> Reported-by: Ao Wang <wangao@seu.edu.cn> Reported-by: Xuewei Feng <fengxw06@126.com> Reported-by: Qi Li <qli01@tsinghua.edu.cn> Reported-by: Ke Xu <xuke@tsinghua.edu.cn> Signed-off-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn> Acked-by: Alexander Aring <aahringo@redhat.com> Link: https://patch.msgid.link/20260527081806.42747-1-zhaoyz24@mails.tsinghua.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The Realtek Otto switch platform consist of four different series
- RTL838x aka maple : 28 port 1G Switches
- RTL839x aka cypress : 52 port 1G Switches
- RTL930x aka longan : 28 port 1G/2.5G/10G Switches
- RTL931x aka mango : 56 port 1G/2.5G/10G Switches
After establishing basic groundwork for multi device support, this series
harmonizes the command handling of the MDIO driver. It is the second step
to allow easier integration of the non RTL930x SoCs into this driver.
====================
net: mdio: realtek-rtl9300: use command runner for read_c22()
Convert the final missing read_c22() path to the new read enabled command
runner. Do it the same way as other implementations.
- bus calls otto_emdio_read_c22()
- this hands over to SoC specific otto_emdio_9300_read_c22()
- finally the registers are filled and the runner issued
With this cleanup remove the obsolete helper otto_emdio_wait_ready()
net: mdio: realtek-rtl9300: use command runner for read_c45()
Convert the read_c45() path to the new command runner. This needs the
additional helper otto_emdio_read_cmd() that can issue the command runner
and process a read operation. It is basically nothing more than
- run the command
- read the command result thorugh the I/O register
With this in place convert the read_c45() like the alread existing write
C22/C45 implementation.
- bus calls otto_emdio_read_c45()
- this handed over to SoC specific otto_emdio_9300_read_c45()
- the registers are filled
- the otto_emdio_read_cmd() is issued
- that calls the command runner
net: mdio: realtek-rtl9300: provide generic command runner
The current bus read/write commands for C22/C45 are RTL930x specific.
Avoid to duplicate those 200 lines of code for the RTL838x, RTL839x and
RTL931x targets. Instead provide a generic command runner that is SoC
independent. The implementation works as follows:
The runner will take a prepared list of the four MDIO registers. It will
feed the data into the registers. This generic write to all registers
(or to say "a little bit too much") is no issue. The hardware looks at
the to be executed command and will only take the pieces of data that
are really required. No side effects have been observed on any of the
four SoCs during the time this mechanism exists in downstream OpenWrt.
The last fed register is the C22/command register. This will be enriched
with the proper command flags from the caller. The hardware issues the
command and the runner will wait for its finalization.
Besides from feeding all registers the runner emulates the behaviour of
the old code as best as possible
- check defensively for a running command in advance
- Before this commit the driver had different MMIO timeout values.
1000s for command preparation, 100us after writes and 1000us after
reads. The new version uses a consistent 1000us timeout for all
of these.
- return -ENXIO in case of hardware failure (fail bit)
As a first consumer of this runner convert the write_c45() function.
This is realized in a multi stage approach
- a generic otto_emdio_write_c45() will be called by the bus
- this will forward the request to the device specific writer. In this
case otto_emdio_9300_write_c45().
- There the command data is filled in and the additional helper
otto_emdio_write_cmd() will be called
- That adds the write flag and issues the generic command runner.
With all the above mentioned in place, there is not much left to do in
otto_emdio_9300_write_c45(). It just fills the register fields and
calls the write helper with the right command bits.
====================
Minimize annotations for arena programs
BPF programs must currently include code to address two limitations
of function signatures that include arena types. First, arena arguments
must be annotated with __arg_arena in the function signature in addition
to __arena. Second, it is currently not allowed to return an arena pointer
from a subprog, even though it is safe to do so. These limitations require
extra annotations and typecasts respectively, and have proven sources of
confusion to programmers.
The patchset improves arena-related function signatures in two ways.
First, it removes the need for __arg_arena in function signatures.
Second, it allows subprogs to directly return arena pointers to their
caller.
To do this we add a new type tag to the existing __arena annotation.
The annotation is currently an alias for __attribute__((address_space(1))),
which is not discoverable from BTF alone and so cannot be used to
determine whether a pointer variable is an arena pointer during
verification. With the new type tag, we can determine whether
either the arguments and or the return value of a function belong
in an arena.
We test the new code by modifying libarena to take advantage of these
relaxed limitations.
- Added Acks by Eduard
- Complete the __arg_arena removal by removing them from htab (Alexei)
- Add a test in verifier_arena_globals1.c to confirm the new __arena attribute
works as expected in function argument and return types
- Reject type tags on non-pointer types, currently only possible in handcrafted
BTF (Eduard)
- Undo inaccurate change on verifier comment (AI)
- Fix error return value for invalid BTF return types during BTF parsing (Eduard)
- Rebased to fix conflict
- Removed the typedef foo * foo_t typedefs. Those were necessary to avoid
annotating each instance of the type with __arena. The new version of the
patch instead removes typedefs and uses __arena everywhere directly (see
patch 4/5 for more details).
- Reorganized the patchset to frontload all kernel-side changes and place
the libarena changes at the end.
====================
Emil Tsalapatis [Tue, 2 Jun 2026 00:41:20 +0000 (20:41 -0400)]
selftests/bpf: Add tests for the new type-tag based __arena identifier
Add selftests that combine the new type-based __arena identifier with
the volatile qualifier both in functions' arguments and return values.
This way we test both that they are recognized as arena arguments and
that they are not sensitive to the position they are placed in the type
compared to other qualifiers.
Emil Tsalapatis [Tue, 2 Jun 2026 00:41:19 +0000 (20:41 -0400)]
selftests/bpf: libarena: Directly return arena pointers from functions
Now that the __arena annotation includes a BTF type tag, and the
verifier can identify arena pointers at BTF loading time, return
arena pointers as their true type instead of casting to u64. Remove the
preprocessor typecast wrappers used to hide this from the caller.
Emil Tsalapatis [Tue, 2 Jun 2026 00:41:18 +0000 (20:41 -0400)]
selftests/bpf: Remove __arg_arena from the codebase
Now that BPF __arg_arena has been subsumed by __arena, remove
__arg_arena from the codebase. This way the user has one fewer
annotation to worry about.
To remove __arg_arena we remove the typedefs we were previously
using to minimize __arena annotations. This is because __arena
now also includes a BTF type tag, which is ignored for non-pointer
types. As a result, we cannot capture the whole __arena annotation
inside a typedef and need to directly annotate the pointer type when
declaring the variable.
The extra verbosity is worth it because the use of the __arena tag
is intuitive to the programmer and removes the __arg_arena tag that
has been a consistent source of confusion for users. The typedefs
can be reintroduced later (without __arg_arena) once compilers start
supporting BTF type tags for non-pointer types.
Emil Tsalapatis [Tue, 2 Jun 2026 00:41:17 +0000 (20:41 -0400)]
bpf: Allow subprogs to return arena pointers
BPF subprogs currently only return void or scalar values. However,
it is also safe to return arena pointers between subprogs in the same
BPF program: Arena pointers are guaranteed to be safe for both programs
at any point. Expand the verifier to permit returning an arena pointer
to the caller.
The main subprog is still not allowed to return an arena pointer because
arena pointers are internal to the BPF program, and the return values
permitted for each main subprog depend on the program type anyway.
Emil Tsalapatis [Tue, 2 Jun 2026 00:41:16 +0000 (20:41 -0400)]
verifier: parse BTF type tags for function arguments
The BTF parsing logic for function arguments goes through
the arguments' decl tags, but does not go into their type
tags. Add type tag parsing for function arguments.
Emil Tsalapatis [Tue, 2 Jun 2026 00:41:15 +0000 (20:41 -0400)]
selftests/bpf: libarena: Add "arena" BTF type tag to __arena qualifier
The arena qualifier currently designates its associated type
as belonging to address space 1. This property affects code
generation, but is not reflected in the BTF information of
the function.
This lack of information at the BTF level prevents us from
returning arena pointers from global subprograms. Subprogs
cannot return any data structure more complex than a scalar,
so pointers to structs are rejected as a return type. We
have no way of marking the return type as a pointer to an
arena, which is safe provided the two subprogs have the same
arena.
Expand the __arena qualifier to also attach a BTF type tag
to the type. This lets us determine whether a variable belongs
to an arena from its type alone through BTF parsing.
Follow-up fixes for the signed loader, includes also the recent
sashiko findings.
v1->v2:
- Fixed up verifier_map_ptr selftest
- Added patch 1/2/6/7 with a new map-in-map fix and a
redundant hash_buf memcpy cleanup as well as selftests
====================
Daniel Borkmann [Mon, 1 Jun 2026 15:02:48 +0000 (17:02 +0200)]
selftests/bpf: Test that exclusive maps are rejected in map-in-map
Add a subtest to map_excl that verifies an exclusive map (created with
excl_prog_hash) cannot be used in a map-of-maps, covering both kernel
enforcement points: i) the inner-map template at map-of-maps creation
and, ii) the element inserted into an existing map-of-maps.
KP Singh [Mon, 1 Jun 2026 15:02:47 +0000 (17:02 +0200)]
selftests/bpf: Adjust verifier_map_ptr for the map's excl field
Adding the u32 excl field at offset 32 of struct bpf_map right after the
sha[SHA256_DIGEST_SIZE] hash shifts the ops pointer from offset 32 to 40.
Therefore, fix up the test case.
Daniel Borkmann [Mon, 1 Jun 2026 15:02:46 +0000 (17:02 +0200)]
libbpf: Skip max_entries override on signed loaders
bpf_gen__map_create() lets the host-supplied loader ctx override a
map's max_entries at runtime (map_desc[idx].max_entries, when non-zero).
This is how the light skeleton sizes maps to the target machine, but
it happens after emit_signature_match() and is covered by neither the
signed loader instructions nor the hashed blob.
For a signed loader this means an untrusted host can re-dimension the
program's maps, outside what the signature attests to. Gate the override
on gen_hash so signed loaders use the signer-provided max_entries baked
into the blob.
Daniel Borkmann [Mon, 1 Jun 2026 15:02:45 +0000 (17:02 +0200)]
libbpf: Skip initial_value override on signed loaders
bpf_gen__map_update_elem() emits code that, when the host-supplied
loader ctx provides a non-NULL map_desc[idx].initial_value, overwrites
the blob value with bytes read from the host (bpf_copy_from_user /
bpf_probe_read_kernel) before the BPF_MAP_UPDATE_ELEM that populates
the program's .data/.rodata/.bss maps.
This override runs after emit_signature_match() has validated map->sha[],
and initial_value is part of neither the signed loader instructions nor
the hashed data blob. For a signed loader this lets an untrusted host
substitute global-variable contents into a program whose code carries
a valid signature, thus weakening what the signature attests to.
The blob already contains the signer-provided value (added via add_data()
and covered by the embedded, signed hash), so simply skip emitting the
override for signed loaders (gen_hash). Runtime initialization stays
available for the unsigned light-skeleton path as before. The jump
offsets within the override block are internal to it, so guarding the
whole block leaves them unchanged.