Randy Dunlap [Tue, 7 Apr 2026 00:50:39 +0000 (17:50 -0700)]
fuse: fuse_dev_i.h: clean up kernel-doc warnings
Change some "/**" to "/*" since they are not kernel-doc comments:
Warning: fs/fuse/fuse_dev_i.h:25 This comment starts with '/**', but isn't a kernel-doc comment. Refer to Documentation/doc-guide/kernel-doc.rst
* Request flags
Warning: fs/fuse/fuse_dev_i.h:58 This comment starts with '/**', but isn't a kernel-doc comment. Refer to Documentation/doc-guide/kernel-doc.rst
* A request to the client
Warning: fs/fuse/fuse_dev_i.h:117 This comment starts with '/**', but isn't a kernel-doc comment. Refer to Documentation/doc-guide/kernel-doc.rst
* Input queue callbacks
Warning: fs/fuse/fuse_dev_i.h:289 This comment starts with '/**', but isn't a kernel-doc comment. Refer to Documentation/doc-guide/kernel-doc.rst
* Fuse device instance
and more like this.
Convert enum fuse_req_flag to kernel-doc format.
Convert struct fuse_req, struct fuse_iqueue_ops, and struct fuse_dev
to kernel-doc format.
These warnings remain:
Warning: fs/fuse/fuse_dev_i.h:115 struct member 'ring_entry' not described in 'fuse_req'
Warning: fs/fuse/fuse_dev_i.h:115 struct member 'ring_queue' not described in 'fuse_req'
Binary build output is the same before and after these changes.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Randy Dunlap [Tue, 7 Apr 2026 00:50:38 +0000 (17:50 -0700)]
fuse-uring: drop kernel-doc notation for a comment
Use regular C comment syntax for a non-kernel-doc comment to avoid
a kernel-doc warning:
Warning: fs/fuse/dev_uring_i.h:104 This comment starts with '/**', but
isn't a kernel-doc comment.
* Describes if uring is for communication and holds alls the data needed
Binary build output is the same before and after this change.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
fuse: alloc pqueue before installing fch in fuse_dev
Prior to this patchset, fuse_dev (containing fuse_pqueue) was allocated on
mount. But now fuse_dev is allocated when opening /dev/fuse, even though
the queues are not needed at that time.
Delay allocation of the pqueue (4k worth of list_head) until just before
mounting or cloning a device.
Various distributions (e.g. Debian/Fedora) configure /dev/fuse as world
writable, so the pqueue allocation should be deferred to a privileged
operation (mount) to prevent unprivileged userspace from consuming pinned
kernel memory.
[Li Wang: fix kernel NULL pointer dereference in fuse_uring_add_to_pq()]
[Fix race in fuse_dev_release()]
fuse: fix io-uring background queue dispatch on request completion
When a background request completes via the io_uring path, the
background queue gets flushed to dispatch pending background requests,
but this is done before the connection-level background counters
(fc->num_background, fc->active_background) are properly accounted,
which may reduce effective queue depth to one.
The connection-level counters are decremented in fuse_request_end(), but
flush_bg_queue() flushes the /dev/fuse path queue (fc->bg_queue), not
the io_uring per-queue bg one, which means pending uring background
requests on the queue are never dispatched in this path.
Fix this by accounting the connection-level background counters first
before flushing the queue's background queue. Since
fuse_request_bg_finish() clears FR_BACKGROUND, fuse_request_end() will
skip the background cleanup branch entirely, which avoids any
double-decrements; it will call the wake_up(&req->waitq) branch but this
is effectively a no-op as background requests have no waiters on
req->waitq.
Alberto Ruiz [Wed, 8 Apr 2026 15:23:40 +0000 (17:23 +0200)]
fuse: fix device node leak in cuse_process_init_reply()
If device_add() succeeds during CUSE initialization but a subsequent
step (cdev_alloc() or cdev_add()) fails, the error path calls
put_device() without first calling device_del(). This leaks the
devtmpfs entry created by device_add(), leaving a stale /dev/<name>
node that persists until reboot.
Since the cuse_conn is never linked into cuse_conntbl on the failure
path, cuse_channel_release() sees cc->dev == NULL and skips
device_unregister(), so no other code path cleans up the node.
This has several consequences:
- The device name is permanently poisoned: any subsequent attempt to
create a CUSE device with the same name hits the stale sysfs entry,
device_add() fails, and the new device is aborted.
- The collision manifests as ENODEV returned to userspace with no
dmesg diagnostic, making it very difficult to debug.
- The failure is self-perpetuating: once a name is leaked, all future
attempts with that name fail identically.
Fix this by introducing an err_dev label that calls device_del() to
undo device_add() before falling through to err_unlock. The existing
err_unlock path from a device_add() failure correctly skips device_del()
since the device was never added.
Testing instructions can be found at the lore link below.
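The unwind order described in this message can be sketched in plain C. This is a userspace illustration only, not the CUSE code: fake_device_add(), fake_device_del(), fake_cdev_add() and the node_present flag are hypothetical stand-ins for device_add(), device_del(), cdev_add() and the devtmpfs node.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the devtmpfs node created by device_add(). */
static bool node_present;

static int fake_device_add(bool fail)
{
	if (fail)
		return -1;
	node_present = true;
	return 0;
}

static void fake_device_del(void)
{
	node_present = false;
}

static int fake_cdev_add(bool fail)
{
	return fail ? -1 : 0;
}

/*
 * Label layout as described in the message: a failure after the add
 * step jumps to err_dev, which undoes the add and falls through to
 * err_unlock; a failure of the add step itself skips the undo.
 */
static int init_device(bool add_fails, bool cdev_fails)
{
	int rc;

	rc = fake_device_add(add_fails);
	if (rc)
		goto err_unlock;

	rc = fake_cdev_add(cdev_fails);
	if (rc)
		goto err_dev;

	return 0;

err_dev:
	fake_device_del();
err_unlock:
	return rc;
}
```

With this layout, a late failure leaves no node behind, while an early failure never tries to delete something that was not added.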
Miklos Szeredi [Wed, 10 Jun 2026 11:02:53 +0000 (13:02 +0200)]
fuse: do not use start_removing_noperm()
Revert the fuse part of commit c9ba789dad15 ("VFS: introduce
start_creating_noperm() and start_removing_noperm()").
Commit c9ba789dad15 ("VFS: introduce start_creating_noperm() and
start_removing_noperm()") caused a regression in FUSE_NOTIFY_INVAL_ENTRY,
which failed to invalidate negative dentries.
This manifests in the filesystem returning -ENOENT for operations on an
existing file.
Fixing it properly while still keeping the start_removing* infrastructure
would add much additional complexity.
Instead revert to the original simple implementation.
The start_removing* infrastructure is needed in VFS to abstract the
filesystem locking. However filesystem code can still safely use the raw
locking primitives without affecting other filesystems.
On 32-bit kernels, size_t is also 32 bits, so the daemon-controlled
count multiplication can wrap. A prune notification with count
0x20000000 and no nodeid payload passes the check, enters the copy
loop, and asks the device copy path to read nodeids that are not
present in the userspace write buffer. In QEMU this reaches the
fuse_copy_fill() BUG_ON(!err) path.
Validate the payload length with array_size() instead. That accepts
exactly the same valid messages, but avoids wrapping arithmetic before
the copy loop consumes the count.
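The difference between a wrapping and a saturating size computation can be illustrated with a small sketch. It models a 32-bit size_t with uint32_t so the wrap is visible on any host. The helper array_size32() is a hypothetical stand-in for array_size(), and the equality form of the check is an assumption, since the exact kernel check is not quoted here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models a 32-bit size_t so the wrap is visible on any host. */
typedef uint32_t size32_t;

/* Hypothetical stand-in for array_size(): saturates instead of wrapping. */
static size32_t array_size32(size32_t n, size32_t elem)
{
	if (elem != 0 && n > UINT32_MAX / elem)
		return UINT32_MAX;
	return n * elem;
}

/* Wrapping check: count * 8 is reduced modulo 2^32 before the comparison. */
static bool payload_ok_wrapping(size32_t payload, size32_t count)
{
	return payload == (size32_t)(count * (size32_t)sizeof(uint64_t));
}

/* Saturating check: an overflowing count cannot match a small payload. */
static bool payload_ok_saturating(size32_t payload, size32_t count)
{
	return payload == array_size32(count, (size32_t)sizeof(uint64_t));
}
```

In this model a count of 0x20000000 with an empty payload passes the wrapping check, because 0x20000000 * 8 wraps to 0, but fails the saturating one. Ordinary messages such as two nodeids in 16 bytes pass both.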
Joanne Koong [Tue, 9 Jun 2026 21:36:58 +0000 (14:36 -0700)]
fuse-uring: remove request-less entries from ent_w_req_queue to fix NULL deref
If a copy into the userspace ring buffer fails, a request will be
terminated and fuse_uring_req_end() will set ent->fuse_req to NULL but
it will leave the entry on ent_w_req_queue in FRRS_FUSE_REQ state. This
can lead to a NULL deref if the request expiration logic scans
ent_w_req_queue in the window before the entry is moved off it.
Fix this by taking the entry off ent_w_req_queue and changing its state
from FRRS_FUSE_REQ to FRRS_INVALID before terminating the request.
Ji'an Zhou [Tue, 9 Jun 2026 09:58:51 +0000 (09:58 +0000)]
fuse: clear intr_entry in fuse_resend and fuse_remove_pending_req
When fuse_resend() moves a request from fpq->processing back to
fiq->pending, it sets FR_PENDING and clears FR_SENT but does not
remove the request's intr_entry from fiq->interrupts. If the
request had FR_INTERRUPTED set from a prior signal, intr_entry
remains dangling on fiq->interrupts. When the requesting task
then receives a fatal signal, fuse_remove_pending_req() sees
FR_PENDING=1, removes the request from fiq->pending and frees it
via the refcount path, also without cleaning intr_entry. The
stale intr_entry causes use-after-free when fuse_read_interrupt()
iterates fiq->interrupts:
- list_del_init(&req->intr_entry) -> UAF write on freed slab
- req->in.h.unique -> UAF read, data leaked to userspace
Remove intr_entry from fiq->interrupts in fuse_resend() for
interrupted requests before they are placed back on fiq->pending.
Add a WARN_ON if the intr_entry is not empty on request destruction.
Fixes: 760eac73f9f6 ("fuse: Introduce a new notification type for resend pending requests")
Cc: stable@vger.kernel.org # 6.9
Signed-off-by: Ji'an Zhou <eilaimemedsnaimel@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Bernd Schubert [Mon, 8 Jun 2026 21:03:45 +0000 (23:03 +0200)]
fuse-uring: make a fuse_req on SQE commit only findable after memcpy
Bad userspace might try to trick us by sending commit SQEs that carry the
request unique / commit-id of requests that have not even been sent to the
fuse-server yet (io_uring_cmd_done() not called).
fuse_uring_commit_fetch() ends the fuse request when the ring entry
has a wrong state, but that could have caused a use-after-free
with the memcpy operations in fuse_uring_send_in_task().
In order to avoid such races the call of fuse_uring_add_to_pq()
is moved after the copy operations and just before completing
the io-uring request - malicious userspace cannot find the request
anymore until all preparation work in fuse-client/kernel is completed.
This also moves fuse_uring_add_to_pq() a bit up in the code to
avoid a forward declaration. The move is deliberately not split into a
separate preparation commit, to make backporting to older kernels easier.
Bernd Schubert [Mon, 8 Jun 2026 21:03:44 +0000 (23:03 +0200)]
fuse-uring: Avoid queue->stopped races and set/read that value under lock
There are several readers of queue->stopped that check the value
under lock, but fuse_uring_commit_fetch() did not and actually
the value was not set under the lock in fuse_uring_abort_end_requests()
either. Especially in fuse_uring_commit_fetch it is important
to check under a lock, because due to races 'struct fuse_req'
might be freed with fuse_request_end, but another thread/cpu
might already do teardown work.
Bernd Schubert [Mon, 8 Jun 2026 21:03:43 +0000 (23:03 +0200)]
fuse-uring: Avoid use-after-free in fuse_uring_async_stop_queues
fuse_uring_async_stop_queues() might run when the last reference
on ring->queue_refs was already dropped.
In order to avoid an early destruction a reference on struct fuse_conn
is now taken before starting fuse_uring_async_stop_queues() and that
reference is only released when that delayed work queue terminates.
Chris Mason [Tue, 9 Jun 2026 00:28:55 +0000 (17:28 -0700)]
fuse-uring: end fuse_req on io-uring cancel task work
When io_uring delivers task work with tw.cancel set (PF_EXITING,
PF_KTHREAD fallback, or percpu_ref_is_dying on the ring context),
fuse_uring_send_in_task() takes the cancel branch, assigns
-ECANCELED, and falls through to fuse_uring_send(). That path only
flips the entry to FRRS_USERSPACE and completes the io_uring cmd;
it never discharges the ring entry's owning reference to the
fuse_req that fuse_uring_add_req_to_ring_ent() handed it at
dispatch time.
The fuse_req stays linked on fpq->processing[hash] and
fuse_request_end() is never invoked. The originating syscall
thread blocks in D-state in request_wait_answer() until
fuse_abort_conn() runs, which can be the entire connection
lifetime. For FR_BACKGROUND requests fc->num_background is never
decremented either, so repeated cancels inflate the counter until
max_background is hit and all later background ops stall. tw.cancel does
not imply a connection abort (e.g. a single io_uring worker thread exits
while the fuse connection stays up), so this cannot be left for
fuse_abort_conn() to clean up.
Ending the req but still routing the entry through fuse_uring_send()
is not enough: that leaves a req-less entry on ent_in_userspace, and
ent_list_request_expired() dereferences ent->fuse_req unconditionally
on the head of that list, which would then NULL-deref.
Fix the cancel branch to release the entry directly. Remove it from the
queue, complete the io_uring cmd, end the fuse_req, free the entry, and
drop its queue_refs (waking the teardown waiter if it was the last).
Fixes: c2c9af9a0b13 ("fuse: Allow to queue fg requests through io-uring")
Cc: stable@vger.kernel.org
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Assisted-by: kres (claude-opus-4-7)
Signed-off-by: Chris Mason <clm@meta.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Joanne Koong [Mon, 8 Jun 2026 19:21:49 +0000 (12:21 -0700)]
fuse-uring: fix moving cancelled entry to ent_in_userspace list
fuse_uring_cancel() moves entries that are available (these have no reqs
attached) to the ent_in_userspace list. ent_list_request_expired()
checks the first entry on ent_in_userspace and dereferences
ent->fuse_req unconditionally, which will crash on a cancelled entry
that was moved to this list.
Fix this by freeing the entry and dropping queue_refs directly in
fuse_uring_cancel(). This is safe because cancel is the cancel handler
itself - after io_uring_cmd_done(), no more cancels will be dispatched
for this command, and teardown serializes with cancel via queue->lock.
Since cancel now decrements queue_refs, fuse_uring_abort() must no
longer gate fuse_uring_abort_end_requests() on queue_refs > 0, as
cancelled entries may have already dropped queue_refs while requests are
still queued. Remove the gate so abort always flushes requests and stops
queues.
Joanne Koong [Mon, 8 Jun 2026 19:21:48 +0000 (12:21 -0700)]
fuse-uring: check connection abort during ring creation
Check fch->connected under fch->lock in fuse_uring_create() before
attaching a new ring. Without this, a race between fuse_uring_create()
and fuse_chan_abort() can result in the ring, queue, and fpq.processing
table being created after fuse_uring_abort() has already run, leading
to unnecessary allocation and teardown. These are eventually cleaned up
by fuse_uring_destruct() but will linger until the process exits, even
with the connection aborted.
Joanne Koong [Mon, 8 Jun 2026 19:21:47 +0000 (12:21 -0700)]
fuse-uring: fix race between registration and connection abortion
This fixes this race:
- thread a: io_uring_enter -> register sqe ->
fuse_uring_create_ring_ent -> allocate ent but doesn't grab queue_ref
yet
- thread b: fuse_conn_destroy() -> fuse_chan_abort() ->
fuse_uring_abort() is a no-op due to queue ref being 0
- thread a: grabs the queue_ref, queue_ref is now 1, rest of
fuse_uring_do_register() logic executes
- thread b: fuse_chan_abort() returns, fuse_chan_wait_aborted() now runs
and calls
"wait_event(ring->stop_waitq, atomic_read(&ring->queue_refs) == 0);"
The abort/unmount thread will hang indefinitely in an unkillable state, as
nothing will decrement queue_refs or wake stop_waitq, and the ring,
queue, and ent are leaked.
Fix this by checking fch->connected under fch->lock after the created
ent has grabbed a ref count on the queue. This ensures that in the
scenario above, it is guaranteed that we either release the queue ref
and wake up stop_waitq (in case fuse_chan_wait_aborted() is already
waiting) in fuse_uring_do_register() when we detect !fch->connected, or
if the connection is aborted after the check, it is guaranteed that the
async teardown worker will be running in the background cleaning up ents
and decrementing the ent's ref on the queue, which will unblock the
eventual queue and ring teardown.
Chris Mason [Fri, 5 Jun 2026 19:27:07 +0000 (12:27 -0700)]
fuse-uring: fix data races on ring->ready
On weakly-ordered architectures, the store to fiq->ops can be
reordered past the store to ring->ready, allowing a CPU that sees
ring->ready == true via fuse_uring_ready() to dispatch requests
through a stale fiq->ops pointer. Upgrade the store to
smp_store_release() and the load in fuse_uring_ready() to
smp_load_acquire() so that the preceding WRITE_ONCE(fiq->ops, ...)
is visible to any CPU that observes ring->ready == true.
Additionally, fuse_uring_do_register() publishes ring->ready with
WRITE_ONCE() but the fast-path check reads it with a plain load.
This is a marked-vs-unmarked access that KCSAN will flag. Wrap it in
READ_ONCE() to mark it without adding unnecessary ordering.
Also wrap the fc->ring load in fuse_uring_ready() in READ_ONCE() to
prevent the compiler from reloading it between the NULL check and the
dereference.
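The release/acquire pairing described in this message can be sketched with C11 atomics, which are used here as a userspace analogue of smp_store_release() and smp_load_acquire(). The struct layout and the names ring_publish(), ring_ops_if_ready(), double_it() and demo_ops are hypothetical and only illustrate the ordering, not the fuse-uring code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct ops {
	int (*send)(int);
};

struct ring {
	const struct ops *ops;	/* stands in for fiq->ops */
	atomic_bool ready;	/* stands in for ring->ready */
};

static int double_it(int x)
{
	return 2 * x;
}

static const struct ops demo_ops = { .send = double_it };

/* Writer: store the ops pointer first, then set ready with release order. */
static void ring_publish(struct ring *r, const struct ops *ops)
{
	r->ops = ops;
	atomic_store_explicit(&r->ready, true, memory_order_release);
}

/*
 * Reader: the acquire load pairs with the release store above, so a
 * reader that observes ready == true also observes the ops pointer.
 */
static const struct ops *ring_ops_if_ready(struct ring *r)
{
	if (!atomic_load_explicit(&r->ready, memory_order_acquire))
		return NULL;
	return r->ops;
}
```

Readers that see ready == false never touch the ops pointer, which is why the flag, and not the pointer, carries the ordering.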
Chris Mason [Fri, 5 Jun 2026 19:27:06 +0000 (12:27 -0700)]
fuse-uring: fix EFAULT clobber in fuse_uring_commit
copy_from_user() returns the number of bytes not copied as an unsigned
residual on failure (1..sizeof(struct fuse_out_header)). fuse_uring_commit
stores that residual in ssize_t err, sets req->out.h.error to -EFAULT,
then jumps to out: with err still holding the positive residual.
err = copy_from_user(&req->out.h, &ent->headers->in_out,
sizeof(req->out.h));
if (err) {
req->out.h.error = -EFAULT;
goto out; /* err is the positive residual */
}
...
out:
fuse_uring_req_end(ent, req, err);
fuse_uring_req_end() then runs
if (error)
req->out.h.error = error;
which overwrites the just-assigned -EFAULT with the positive residual.
FUSE callers such as fuse_simple_request() test err < 0 to detect
failure, so the positive value is interpreted as success and the
caller proceeds with an uninitialised or partial req->out.args.
Fix by assigning err = -EFAULT in the failure branch before jumping
to out, so fuse_uring_req_end() receives a negative errno and sets
req->out.h.error to -EFAULT.
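The clobber and its fix can be reproduced in a small userspace model. This is a sketch under stated assumptions: fake_copy() is a hypothetical stand-in for copy_from_user() that returns the number of bytes not copied, req_end() mimics only the error-override step quoted above, and the fixed flag exists purely to compare the two behaviours side by side.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct out_header {
	int error;
};

struct request {
	struct out_header out;
};

/* Hypothetical copy_from_user() stand-in: returns the bytes NOT copied. */
static size_t fake_copy(void *dst, const void *src, size_t len, size_t residual)
{
	memcpy(dst, src, len - residual);
	return residual;
}

/* Mimics the quoted step: a non-zero error overrides the header field. */
static void req_end(struct request *req, long error)
{
	if (error)
		req->out.error = (int)error;
}

static void commit(struct request *req, const struct out_header *user,
		   size_t residual, bool fixed)
{
	long err;

	err = (long)fake_copy(&req->out, user, sizeof(req->out), residual);
	if (err) {
		req->out.error = -EFAULT;
		if (fixed)
			err = -EFAULT;	/* the fix: pass a negative errno on */
	}
	req_end(req, err);
}
```

Without the fix the header ends up holding the positive residual, which a caller testing err < 0 would read as success. With the fix it holds -EFAULT.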
Matthew R. Ochs [Tue, 26 May 2026 15:20:21 +0000 (08:20 -0700)]
fuse: back uncached readdir buffers with pages
Commit dabb90391028 ("fuse: increase readdir buffer size") changed
fuse_readdir_uncached() to size its temporary buffer from ctx->count.
This is useful for overlayfs and other in-kernel callers that use
INT_MAX to indicate an unlimited directory read.
The larger buffer is currently supplied as a kvec output argument. For
virtiofs, kvec arguments are copied through req->argbuf, which is
allocated with kmalloc(..., GFP_ATOMIC). A large uncached readdir buffer
can therefore require a multi-megabyte contiguous atomic allocation
before the request is queued.
Avoid the large bounce-buffer allocation by backing uncached readdir
output with pages and setting out_pages. Transports such as virtiofs can
then pass the pages as scatter-gather entries instead of copying the
output through argbuf.
Map the pages with vm_map_ram() only while parsing the returned dirents.
The existing parser can then continue to use a linear kernel mapping.
[SzM: separate allocation of pages into a helper function]
Fixes: dabb90391028 ("fuse: increase readdir buffer size")
Cc: stable@vger.kernel.org
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Thu, 28 May 2026 08:58:24 +0000 (10:58 +0200)]
virtiofs: fix UAF on submount umount
iput() called from fuse_release_end() can Oops if the super block has
already been destroyed. Normally this is prevented by waiting for
num_waiting to go down to zero before commencing with super block shutdown.
This only works, however, for the last submount instance, as the wait
counter is per connection, not per superblock.
Revert to using synchronous release requests for the auto_submounts case,
which is virtiofs only at this time.
Reported-by: Aurélien Bombo <abombo@microsoft.com>
Reported-by: Zhihao Cheng <chengzhihao1@huawei.com>
Cc: Greg Kurz <gkurz@redhat.com>
Closes: https://github.com/kata-containers/kata-containers/issues/12589
Fixes: 26e5c67deb2e ("fuse: fix livelock in synchronous file put from fuseblk workers")
Cc: stable@vger.kernel.org
Reviewed-by: Greg Kurz <gkurz@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Arnd Bergmann [Mon, 15 Jun 2026 11:46:01 +0000 (13:46 +0200)]
Merge tag 'memory-controller-drv-7.2-2' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-mem-ctrl into soc/drivers
Memory controller drivers for v7.2, part two
A few improvements for Tegra Memory Controller drivers, including one
fix for UBSAN report for an older commit.
* tag 'memory-controller-drv-7.2-2' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-mem-ctrl:
memory: tegra234: drop dead NULL check in tegra234_mc_icc_aggregate()
memory: tegra264: drop redundant tegra264_mc_icc_aggregate()
memory: tegra186-emc: stop borrowing MC aggregate hook for EMC
Viktor Menshin [Mon, 15 Jun 2026 09:25:15 +0000 (18:25 +0900)]
ALSA: hda/realtek: Add quirk for Lenovo Xiaoxin 14 GT
The Lenovo Xiaoxin 14 GT (Chinese market model, AMD Ryzen AI 9 365)
produces constant electrical hissing and crackling noise from both
internal speakers and 3.5mm headphone jack during audio playback.
Audio works correctly on Windows.
The PCI SSID 17aa:3912 is not present in the quirk list. The device
shares the same AMD platform and ALC287 codec as neighboring Lenovo
14" AMD models (17aa:3911, 17aa:390d), so apply the same fixup.
Note: the fixup selection is based on similarity with neighboring
models and has not been verified by testing a compiled kernel.
Guidance from maintainers on the correct fixup is welcome.
powerpc/kexec: fix double get_cpu() imbalance in kexec_prepare_cpus
kexec_prepare_cpus_wait() calls get_cpu() internally to obtain the
current CPU id. kexec_prepare_cpus() calls kexec_prepare_cpus_wait()
twice -- once for KEXEC_STATE_IRQS_OFF and once for
KEXEC_STATE_REAL_MODE -- but only issues a single put_cpu() at the end,
leaving preempt_count elevated by one extra nesting level.
In practice the imbalance does not trigger a 'scheduling while atomic'
splat because the kexec path is a one-way trip: IRQs are already
disabled, no schedule() occurs after the leak, and
default_machine_kexec() overwrites preempt_count with HARDIRQ_OFFSET
before jumping into kexec_sequence() which never returns. However the
bookkeeping is still wrong.
kexec_prepare_cpus() calls local_irq_disable()/hard_irq_disable()
before invoking kexec_prepare_cpus_wait(), so the CPU is already pinned
and the get_cpu()/put_cpu() preempt_disable() bracketing is unnecessary.
Only the current CPU id is needed, so replace get_cpu() with
raw_smp_processor_id() and drop the now-unneeded put_cpu().
powerpc/powernv: fix preempt count leak in pnv_kexec_wait_secondaries_down
pnv_kexec_wait_secondaries_down() calls get_cpu() to obtain the current
CPU id but never calls the matching put_cpu(), leaking one
preempt_disable() nesting level on every invocation.
In practice the imbalance does not trigger a visible splat because the
kexec teardown path is a one-way trip: IRQs are already disabled, no
schedule() occurs after the leak, and default_machine_kexec() overwrites
preempt_count with HARDIRQ_OFFSET before jumping into kexec_sequence()
which never returns. However the bookkeeping is still wrong.
The function only needs the current CPU id, and this path runs with
interrupts disabled and the CPU pinned, so the preempt_disable()
side-effect of get_cpu() is unnecessary. Replace it with
raw_smp_processor_id().
powerpc/perf: fix preempt count underflow in fsl_emb_pmu_del
fsl_emb_pmu_del() unconditionally calls put_cpu_var(cpu_hw_events) at
the 'out:' label, but only calls the matching get_cpu_var() after the
'i < 0' early-return check. When event->hw.idx is negative the
function jumps to 'out:' without having taken get_cpu_var(), and the
trailing put_cpu_var() then issues an unmatched preempt_enable(),
underflowing preempt_count.
On a CONFIG_PREEMPT=y kernel preempt_count would underflow and
eventually present as a 'scheduling while atomic' BUG.
Move put_cpu_var() to pair with get_cpu_var() so the percpu access is
correctly bracketed and the 'out:' label only handles perf_pmu_enable.
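The bracketing problem can be modelled with a plain counter. In this sketch preempt_count is an ordinary int, and fake_get_cpu_var() / fake_put_cpu_var() are hypothetical stand-ins for the preempt_disable() / preempt_enable() side effects of get_cpu_var() / put_cpu_var(). The perf_pmu_enable() call that remains at the 'out:' label is left out for brevity.

```c
/* Hypothetical model of preempt_count as a plain counter. */
static int preempt_count;

static void fake_get_cpu_var(void)
{
	preempt_count++;
}

static void fake_put_cpu_var(void)
{
	preempt_count--;
}

/* Buggy shape: the early exit reaches the put without a matching get. */
static void pmu_del_buggy(int idx)
{
	if (idx < 0)
		goto out;
	fake_get_cpu_var();
	/* ... per-CPU work would go here ... */
out:
	fake_put_cpu_var();
}

/* Fixed shape: get and put bracket the per-CPU work; out: does neither. */
static void pmu_del_fixed(int idx)
{
	if (idx < 0)
		goto out;
	fake_get_cpu_var();
	/* ... per-CPU work would go here ... */
	fake_put_cpu_var();
out:
	return;
}
```

Calling the buggy shape with a negative index drives the counter below zero, while the fixed shape leaves it balanced on both paths.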
Amit Machhiwal [Mon, 25 May 2026 16:16:01 +0000 (21:46 +0530)]
powerpc/boot: Allow text relocations for pseries wrapper with binutils 2.46+
Binutils 2.46 changed the default linker behavior from '-z notext' to
'-z text', which treats dynamic relocations in read-only segments as
errors rather than warnings. This causes the pseries boot wrapper build
to fail with:
/usr/bin/ld.bfd: arch/powerpc/boot/wrapper.a(crt0.o): warning:
relocation against `_platform_stack_top' in read-only section `.text'
/usr/bin/ld.bfd: error: read-only segment has dynamic relocations
The pseries wrapper uses '-pie' to create position-independent code.
However, crt0.S contains a pointer to '_platform_stack_top' in the .text
section, which requires a dynamic relocation at runtime. This creates
DT_TEXTREL (text relocations), which were allowed by default in binutils
2.45 and earlier (via implicit '-z notext') but are now rejected by
binutils 2.46+.
Add '-z notext' linker flag to explicitly allow text relocations for
the pseries platform, similar to what is already done for the epapr
platform. This restores the previous behavior and allows the boot
wrapper to build successfully with binutils 2.46+.
A few old machines have not been converted away from the old-style
gpiolib interfaces. Make these select the new CONFIG_GPIOLIB_LEGACY
symbol so the code still works where it is needed but can be left
out otherwise.
This is the list of all gpio_request() calls in mips:
Linus Torvalds [Mon, 15 Jun 2026 10:23:57 +0000 (15:53 +0530)]
Merge tag 'pull-fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/viro/vfs
Pull udf fix from Al Viro:
"I just noticed that a udf fix had been sitting in #fixes since
February; still applicable, Jan's Acked-by applied. Very belated pull
request"
* tag 'pull-fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/viro/vfs:
udf: fix nls leak on udf_fill_super() failure
After a recent change in binutils that warns when local symbols have
non-default visibility [1], there are a couple instances when building
arch/mips:
Assembler messages:
{standard input}: Warning: local symbol `__memset' has non-default visibility
Assembler messages:
{standard input}: Warning: local symbol `__memcpy' has non-default visibility
Remove the '.hidden' directives for these symbols to clear up the
warnings, as they are pointless with a local symbol, which is by
definition hidden. This results in no changes to these symbols in nm's
output when assembled with various copies of binutils.
MIPS: VDSO: Avoid including .got in dynamic segment
After commit 2db1ec80dfd5 ("MIPS: VDSO: Fold MIPS_DISABLE_VDSO into
MIPS_GENERIC_GETTIMEOFDAY"), building ARCH=mips allnoconfig with LLVM=1
shows some warnings from llvm-readelf while checking the VDSO for
dynamic relocations:
llvm-readelf: warning: 'arch/mips/vdso/vdso.so.dbg.raw': invalid PT_DYNAMIC size (0xa4)
llvm-readelf: warning: 'arch/mips/vdso/vdso.so.dbg.raw': PT_DYNAMIC dynamic table is invalid: SHT_DYNAMIC will be used
The blamed commit alters the link order of objects into vdso.so.raw,
placing vgettimeofday.o after sigreturn.o. This ultimately results in
the .text section shrinking slightly in size, which in turn changes the
offset of the .dynamic section.
Changing the offset of the .dynamic section causes the dynamic segment
size to grow by the same amount, which triggers a warning in
llvm-readelf because PT_DYNAMIC's p_filesz (0xa4) is no longer a
multiple of its sh_entsize (8):
- DYNAMIC 0x000c20 0x00000c20 0x00000c20 0x00098 0x00098 R 0x10
+ DYNAMIC 0x000c14 0x00000c14 0x00000c14 0x000a4 0x000a4 R 0x10
The size of the dynamic segment was already incorrect before the blamed
commit, as it should be 0x90 like the .dynamic section above (18
entries at 8 bytes per entry); it just so happens that 0x98 % 8 is 0,
whereas 0xa4 % 8 is 4, so there was no warning.
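The arithmetic in the paragraph above can be checked with a few lines of C. The helper names dyn_size_valid() and dyn_expected_size() are hypothetical; the entry size of 8 bytes and the entry count of 18 are taken from the text.

```c
#include <stdbool.h>
#include <stdint.h>

#define DYN_ENTSIZE 8u	/* bytes per .dynamic entry, as stated in the text */

/* True when a PT_DYNAMIC size holds a whole number of entries. */
static bool dyn_size_valid(uint32_t p_filesz)
{
	return p_filesz % DYN_ENTSIZE == 0;
}

/* Expected segment size for a given number of .dynamic entries. */
static uint32_t dyn_expected_size(uint32_t entries)
{
	return entries * DYN_ENTSIZE;
}
```

Here 18 entries give 0x90, the old size 0x98 happens to be a multiple of 8 and so goes unnoticed, and 0xa4 is not.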
Looking at the section to segment mapping of the dynamic segment reveals
that it includes the .got section, as it is implicitly placed after
.dynamic by ld.lld's orphan section heuristics and inherits its segments
from the linker script.
Explicitly describe the .got section in the MIPS VDSO linker script
after .rodata, which switches back to the default text segment,
resulting in a dynamic segment that is the exact size of the .dynamic
section as expected with no other layout changes.
- DYNAMIC 0x000c14 0x00000c14 0x00000c14 0x000a4 0x000a4 R 0x10
+ DYNAMIC 0x000c14 0x00000c14 0x00000c14 0x00090 0x00090 R 0x4
- 03 .dynamic .got
+ 03 .dynamic
Closes: https://github.com/ClangBuiltLinux/linux/issues/2166
Fixes: 2db1ec80dfd5 ("MIPS: VDSO: Fold MIPS_DISABLE_VDSO into MIPS_GENERIC_GETTIMEOFDAY")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Jonas Jelonek [Mon, 8 Jun 2026 09:37:29 +0000 (09:37 +0000)]
MIPS: smp: report dying CPU to RCU in stop_this_cpu()
smp_send_stop() parks all secondary CPUs in stop_this_cpu(). The function
marks the CPU offline for the scheduler via set_cpu_online(false) but
never informs RCU, so RCU keeps expecting a quiescent state from CPUs
that are now spinning forever with interrupts disabled.
As long as nothing waits for an RCU grace period after smp_send_stop()
this is harmless, which is why it went unnoticed. Since commit 91840be8f710 ("irq_work: Fix use-after-free in irq_work_single() on PREEMPT_RT")
however, irq_work_sync() calls synchronize_rcu() on architectures without
an irq_work self-IPI, i.e. where arch_irq_work_has_interrupt() returns
false. That is the asm-generic default used by MIPS. Any irq_work_sync()
issued in the reboot/shutdown path after smp_send_stop() then blocks on
a grace period that can never complete, hanging the reboot:
WARNING: CPU: 0 PID: 15 at kernel/irq_work.c:144 irq_work_queue_on
...
rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
rcu: Offline CPU 1 blocking current GP.
rcu: Offline CPU 2 blocking current GP.
rcu: Offline CPU 3 blocking current GP.
This issue was noticed on several Realtek MIPS switch SoCs (MIPS
interAptiv) and came up during kernel bump downstream in OpenWrt from
6.18.33 to 6.18.34, after the backport of the patch to the 6.18 stable
branch. The patch also has been backported all the way back to 6.1.
Call rcutree_report_cpu_dead() once interrupts are disabled, mirroring the
generic CPU-hotplug offline path, so RCU stops waiting on the parked CPUs
and grace periods can still complete. MIPS shuts down all CPUs here
without going through the CPU-hotplug mechanism, so this report is not
otherwise issued. Reporting a dying CPU to RCU outside the regular hotplug
offline path is not unprecedented: arm64 does the same in cpu_die_early().
There it is an exception for a CPU that was coming online and is aborting
bringup, rather than the default shutdown action as on MIPS.
Fixes: 91840be8f710 ("irq_work: Fix use-after-free in irq_work_single() on PREEMPT_RT")
CC: stable@vger.kernel.org
Signed-off-by: Jonas Jelonek <jelonek.jonas@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
This patch addresses a critical memory management flaw. When
CONFIG_CPUMASK_OFFSTACK is enabled, cpumask_var_t is a pointer.
Consequently, sizeof(new_mask) evaluates to the pointer size, causing
copy_from_user() to clobber the mask pointer. Furthermore, the old
logic performed copy_from_user() before allocating the mask.
Fix this by allocating new_mask first. To handle variable-sized user
masks correctly, use cpumask_size() to truncate overly large user masks
or pad undersized masks with zeros before copying the data directly into
the allocated buffer.
Fixes: 295cbf6d63165 ("[MIPS] Move FPU affinity code into separate file.")
Cc: stable@vger.kernel.org
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
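The truncate-or-pad step from the message above might look like the following in userspace. This is only a sketch: MASK_BYTES is a hypothetical stand-in for cpumask_size(), copy_mask() is an illustrative name, and a plain memcpy() replaces copy_from_user().

```c
#include <stddef.h>
#include <string.h>

#define MASK_BYTES 16	/* hypothetical stand-in for cpumask_size() */

/*
 * Copy a caller-supplied mask of len bytes into a MASK_BYTES buffer:
 * oversized input is truncated, undersized input is zero-padded.
 */
static void copy_mask(unsigned char *dst, const unsigned char *src, size_t len)
{
	if (len > MASK_BYTES)
		len = MASK_BYTES;
	memset(dst, 0, MASK_BYTES);
	memcpy(dst, src, len);
}
```

The copy length is derived from the size of the allocated buffer, never from sizeof() applied to a variable that may be a pointer.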
Yadan Fan [Mon, 25 May 2026 04:04:36 +0000 (12:04 +0800)]
MIPS: mm: Fix out-of-bounds write in maar_res_walk()
maar_res_walk() uses wi->num_cfg as the index into the fixed-size
wi->cfg array, but checks whether the array is full only after it has
filled the selected entry. If walk_system_ram_range() reports more than
16 memory ranges, the overflow call writes one struct maar_config past
the end of the array before WARN_ON() prevents num_cfg from advancing.
Move the full-array check before taking the array slot and return non-zero
when the scratch array is full, so walk_system_ram_range() terminates the
walk instead of invoking the callback for further ranges.
Fixes: a5718fe8f70f ("MIPS: mm: Drop boot_mem_map") Signed-off-by: Yadan Fan <ydfan@suse.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
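The check-before-write ordering described above can be sketched as a small userspace model. It is an assumption-laden illustration rather than the kernel code: the demo_* names, the struct layout and the demo_walk() driver are hypothetical, and the WARN_ON() from the real function is omitted. Only the array size of 16 and the "return non-zero to stop the walk" behaviour are taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_CFG 16		/* fixed-size scratch array, as in the message */

struct demo_range {
	unsigned long lower, upper;
};

struct demo_walk_info {
	struct demo_range cfg[DEMO_MAX_CFG];
	unsigned int num_cfg;
};

/* Walk callback: a non-zero return value asks the walker to stop. */
static int demo_res_walk(unsigned long start, unsigned long len, void *arg)
{
	struct demo_walk_info *wi = arg;

	assert(wi != NULL);

	/* Check for a full array before taking a slot, not after filling it. */
	if (wi->num_cfg >= DEMO_MAX_CFG)
		return -1;

	wi->cfg[wi->num_cfg].lower = start;
	wi->cfg[wi->num_cfg].upper = start + len - 1;
	wi->num_cfg++;
	return 0;
}

/*
 * Hypothetical walker: reports nranges page-sized ranges and stops at the
 * first non-zero callback return. Returns the number of callback calls.
 */
static unsigned int demo_walk(unsigned int nranges,
			      int (*cb)(unsigned long, unsigned long, void *),
			      void *arg)
{
	unsigned int calls = 0;

	for (unsigned int i = 0; i < nranges; i++) {
		calls++;
		if (cb(i * 0x1000UL, 0x1000UL, arg))
			break;
	}
	return calls;
}
```

With more ranges than slots, the callback is invoked exactly once past the limit, writes nothing on that call, and the walk ends there.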
Rosen Penev [Wed, 27 May 2026 22:25:04 +0000 (15:25 -0700)]
MIPS: ath79: reduce ARCH_DMA_MINALIGN
Currently, ath79 SoCs use the default ARCH_DMA_MINALIGN value of 128
bytes defined in mach-generic. This is excessive for these platforms
and leads to significant memory waste in kmalloc.
Override ARCH_DMA_MINALIGN to use L1_CACHE_BYTES, which is 32 bytes for
ath79 SoCs.
Signed-off-by: Rosen Penev <rosenp@gmail.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Rosen Penev [Mon, 8 Jun 2026 05:32:02 +0000 (22:32 -0700)]
mips: dts: ar9132: fix wdt node name
Fixes the following warning:
$nodename:0: 'wdt@18060008' does not match
'^(timer|watchdog)(@.*|-([0-9]|[1-9][0-9]+))?$'
from schema $id: http://devicetree.org/schemas/watchdog/qca,ar7130-wdt.yaml#
Signed-off-by: Rosen Penev <rosenp@gmail.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
MIPS: mm: remove comment referring to removed CONFIG_MIPS_CMP
CMP support was removed in commit 7fb6f7b0af67 ("MIPS: Remove
deprecated CONFIG_MIPS_CMP"), but a comment referring to it remained in
arch/mips/mm/c-r4k.c. Remove it.
Discovered while searching for CONFIG_* symbols referenced in code but
not defined in any Kconfig file.
Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Linus Torvalds [Mon, 15 Jun 2026 09:55:17 +0000 (15:25 +0530)]
Merge tag 'x86-cpu-2026-06-14' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip
Pull x86 cpuid updates from Ingo Molnar:
- CPUID API updates (Ahmed S. Darwish):
- Introduce a centralized CPUID parser
- Introduce a centralized CPUID data model
- Introduce <asm/cpuid/leaf_types.h>
- Rename cpuid_leaf()/cpuid_subleaf() APIs
- treewide: Explicitly include the x86 CPUID headers
- Update to x86-cpuid-db v3.1 (Maciej Wieczor-Retman)
- Continued removal of pre-i586 support and related simplifications
(Ingo Molnar)
- Add Intel CPU model number for rugged Panther Lake (Tony Luck)
- Misc fixes, updates and cleanups by Arnd Bergmann, Chao Gao, Lukas
Bulwahn, Sohil Mehta, Maciej Wieczor-Retman.
* tag 'x86-cpu-2026-06-14' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip: (25 commits)
x86/cpu: Make CONFIG_X86_CX8 unconditional
x86/cpu: Remove unused !CONFIG_X86_TSC code
x86/cpuid: Update bitfields to x86-cpuid-db v3.1
tools/x86/kcpuid: Update bitfields to x86-cpuid-db v3.1
x86/cpu: Make CONFIG_X86_TSC unconditional
MAINTAINERS: Drop obsolete FPU EMULATOR section
x86/cpu: Fix a F00F bug warning and clean up surrounding code
x86/cpu: Add Intel CPU model number for rugged Panther Lake
x86/cpuid: Introduce a centralized CPUID parser
x86/cpu: Introduce a centralized CPUID data model
x86/cpuid: Introduce <asm/cpuid/leaf_types.h>
x86/cpuid: Rename cpuid_leaf()/cpuid_subleaf() APIs
x86/cpu: Do not include the CPUID API header in asm/processor.h
Documentation: core-api/cpu_hotplug: Remove stale cpu0_hotplug docs
x86/cpu, cpufreq: Remove AMD ELAN support
x86/fpu: Remove the math-emu/ FPU emulation library
x86/fpu: Remove the 'no387' boot option
x86/fpu: Remove MATH_EMULATION and related glue code
treewide: Explicitly include the x86 CPUID headers
x86/cpu: Remove the CONFIG_X86_INVD_BUG quirk
...
Linus Torvalds [Mon, 15 Jun 2026 09:20:18 +0000 (14:50 +0530)]
Merge tag 'sched-core-2026-06-14' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
"SMP load-balancing updates:
- A large series to introduce infrastructure for cache-aware load
balancing, with the goal of co-locating tasks that share data
within the same Last Level Cache (LLC) domain. By improving cache
locality, the scheduler can reduce cache bouncing and cache misses,
ultimately improving data access efficiency.
Implemented by Chen Yu and Tim Chen, based on early prototype work
by Peter Zijlstra, with fixes by Jianyong Wu, Peter Zijlstra and
Shrikanth Hegde.
- A series to simplify CONFIG_SCHED_SMT ifdef usage (Shrikanth Hegde)
Fair scheduler updates:
- A series to improve SD_ASYM_CPUCAPACITY scheduling by introducing
SMT awareness (Andrea Righi, K Prateek Nayak)
- A series to optimize cfs_rq and sched_entity allocation for better
data locality (Zecheng Li)
- A preparatory series to change fair/cgroup scheduling to a single
runqueue, without the final change (Peter Zijlstra)
- Optimize update_tg_load_avg()'s rate-limiting code (Rik van Riel)
- Allow account_cfs_rq_runtime() to throttle current hierarchy
(K Prateek Nayak)
- Update util_est after updating util_avg during dequeue, to fix the
util signal update logic, which reduces signal noise (Vincent
Guittot)
Scheduler topology updates:
- Allow multiple domains to claim sched_domain_shared (K Prateek
Nayak)
- Add parameter to split LLC (Peter Zijlstra)
Core scheduler updates:
- Use trace_call__<tp>() to save a static branch (Gabriele Monaco)
Scheduler statistics updates:
- Drop now-stale mul_u64_u64_div_u64() cputime over-approximation
guard (Nicolas Pitre)
Deadline scheduler updates:
- Reject debugfs dl_server writes for offline CPUs (Andrea Righi)
- Fix replenishment logic for non-deferred servers (Yuri Andriaccio)
RT scheduling updates:
- Turn RT_PUSH_IPI default off for non PREEMPT_RT (Steven Rostedt)
- Update default bandwidth for real-time tasks to 1.0 (Yuri
Andriaccio)
Proxy scheduling updates:
- A series to implement Optimized Donor Migration for Proxy Execution
(John Stultz, Peter Zijlstra)
- Various proxy scheduling cleanups and fixes (Peter Zijlstra,
K Prateek Nayak)
Misc fixes, improvements and cleanups by Aaron Lu, Andrea Righi,
Zenghui Yu, Chen Yu, Guanyou.Chen, John Stultz, Shrikanth Hegde,
Peter Zijlstra, Liang Luo and Yiyang Chen"
* tag 'sched-core-2026-06-14' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip: (91 commits)
sched/fair: Fix newidle vs core-sched
sched/deadline: Use task_on_rq_migrating() helper
sched/core: Combine separate 'else' and 'if' statements
sched/fair: Fix cpu_util runnable_avg arithmetic
sched/fair: Unify cfs_rq throttling via account_cfs_rq_runtime()
sched/fair: Move the throttled tasks to a local list in tg_unthrottle_up()
sched/fair: Call update_curr() before unthrottling the hierarchy
sched/fair: Use throttled_csd_list for local unthrottle
sched/fair: Convert cfs bandwidth throttling to use guards
sched/fair: Allocate cfs_tg_state with percpu allocator
sched/fair: Remove task_group->se pointer array
sched/fair: Co-locate cfs_rq and sched_entity in cfs_tg_state
sched: restore timer_slack_ns when resetting RT policy on fork
MAINTAINERS: Fix spelling mistake in Peter's name
sched: Simplify ttwu_runnable()
sched/proxy: Remove superfluous clear_task_blocked_in()
sched/proxy: Remove PROXY_WAKING
sched/proxy: Switch proxy to use p->is_blocked
sched/proxy: Only return migrate when needed
sched: Be more strict about p->is_blocked
...
- Fix various inaccurate hard-coded event configurations (Dapeng Mi)
Intel uncore PMU driver updates (Zide Chen):
- Fix discovery unit lookup bug for multi-die systems
- Guard against invalid box control address
- Fix PCI device refcount leak in UPI discovery
- Defer ADL global PMON enable to enable_box() to save power
- Fix uncore_die_to_cpu() for offline dies
- Implement global init callback for GNR uncore
AMD CPU PMU driver updates:
- Always use the NMI latency mitigation (Sandipan Das)
AMD uncore PMU driver updates:
- Use Node ID to identify DF and UMC domains (Sandipan Das)"
* tag 'perf-core-2026-06-14' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip: (22 commits)
perf/x86/amd/uncore: Use Node ID to identify DF and UMC domains
perf: Reveal PMU type in fdinfo
perf/x86/intel/uncore: Implement global init callback for GNR uncore
perf/x86/intel/uncore: Fix uncore_die_to_cpu() for offline dies
perf/x86/intel/uncore: Move die_to_cpu() to uncore.c
perf/x86/intel/uncore: Defer ADL global PMON enable to enable_box()
perf/x86/intel/uncore: Fix PCI device refcount leak in UPI discovery
perf/x86/intel/uncore: Guard against invalid box control address
perf/x86/intel/uncore: Fix discovery unit lookup for multi-die systems
perf/x86/amd/core: Always use the NMI latency mitigation
perf/x86/intel: Update event constraints and cache_extra_regsfor CWF
perf/x86/intel: Update event constraints and cache_extra_regsfor SRF
perf/x86/intel: Update event constraints and cache_extra_regsfor NVL
perf/x86/intel: Update event constraints for PTL
perf/x86/intel: Update event constraints and cache_extra_regsfor ARL
perf/x86/intel: Update event constraints and cache_extra_regsfor LNL
perf/x86/intel: Update event constraints and cache_extra_regsfor MTL
perf/x86/intel: Update event constraints and cache_extra_regsfor ADL
perf/x86/intel: Update event constraints for DMR
perf/x86/intel: Update event constraints and cache_extra_regsfor SPR
...
- Large series to address the robust futex unlock race for real, by
Thomas Gleixner:
"The robust futex unlock mechanism is racy with respect to the
clearing of the robust_list_head::list_op_pending pointer because
unlock and clearing the pointer are not atomic.
The race window is between the unlock and clearing the pending op
pointer. If the task is forced to exit in this window, exit will
access a potentially invalid pending op pointer when cleaning up
the robust list.
That happens if another task manages to unmap the object
containing the lock before the cleanup, which results in an UAF.
In the worst case this UAF can lead to memory corruption when
unrelated content has been mapped to the same address by the time
the access happens.
User space can't solve this problem without help from the kernel.
This series provides the kernel side infrastructure to help it
along:
1) Combined unlock, pointer clearing, wake-up for the
contended case
2) VDSO based unlock and pointer clearing helpers with a
fix-up function in the kernel when user space was interrupted
within the critical section.
... with help by André Almeida:
- Add a note about robust list race condition (André Almeida)
- Add self-tests for robust release operations (André Almeida)
Context analysis updates:
- Implement context analysis for 'struct rt_mutex'. (Bart Van Assche)
- Bump required Clang version to 23 (Marco Elver)
Guard infrastructure updates:
- Series to remove NULL check from unconditional guards (Dmitry
Ilvokhin)
Lockdep updates:
- Restore self-test migrate_disable() and sched_rt_mutex state on
PREEMPT_RT (Karl Mehltretter)
Membarriers updates:
- Use per-CPU mutexes for targeted commands (Aniket Gattani)
- Modernize membarrier_global_expedited with cleanup guards (Aniket
Gattani)
- Add rseq stress test for CFS throttle interactions (Aniket Gattani)
percpu-rwsems updates:
- Extract __percpu_up_read() to optimize inlining overhead (Dmitry
Ilvokhin)
Seqlocks updates:
- Allow UBSAN_ALIGNMENT to fail optimizing (Heiko Carstens)
Lock tracing:
- Add contended_release tracepoint to sleepable locks such as
mutexes, percpu-rwsems, rtmutexes, rwsems and semaphores (Dmitry
Ilvokhin)
MAINTAINERS updates:
- MAINTAINERS: Add RUST [SYNC] entry (Boqun Feng)
Misc updates and fixes by Randy Dunlap, YE WEI-HONG, Fabricio Parra,
Dmitry Ilvokhin and Peter Zijlstra"
* tag 'locking-core-2026-06-14' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip: (36 commits)
locking: Add contended_release tracepoint to sleepable locks
locking/percpu-rwsem: Extract __percpu_up_read()
tracing/lock: Remove unnecessary linux/sched.h include
futex: Optimize futex hash bucket access patterns
rust: sync: completion: Mark inline complete_all and wait_for_completion
MAINTAINERS: Add RUST [SYNC] entry
cleanup: Specify nonnull argument index
selftests: futex: Add tests for robust release operations
Documentation: futex: Add a note about robust list race condition
x86/vdso: Implement __vdso_futex_robust_try_unlock()
x86/vdso: Prepare for robust futex unlock support
futex: Provide infrastructure to plug the non contended robust futex unlock race
futex: Add robust futex unlock IP range
futex: Add support for unlocking robust futexes
futex: Cleanup UAPI defines
x86: Select ARCH_MEMORY_ORDER_TSO
uaccess: Provide unsafe_atomic_store_release_user()
futex: Provide UABI defines for robust list entry modifiers
futex: Move futex related mm_struct data into a struct
futex: Make futex_mm_init() void
...
Linus Torvalds [Mon, 15 Jun 2026 08:27:13 +0000 (13:57 +0530)]
Merge tag 'timers-vdso-2026-06-13' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip
Pull vdso updates from Thomas Gleixner:
- Remove the redundant CONFIG_GENERIC_TIME_VSYSCALL after converting
the remaining users over.
- Rework and sanitize the MIPS VDSO handling, so it does not handle the
time related VDSO if there is no VDSO capable clocksource available.
Also stop mapping VDSO data pages unconditionally even if there is no
usage possible.
* tag 'timers-vdso-2026-06-13' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip:
MIPS: VDSO: Fold MIPS_CLOCK_VSYSCALL into MIPS_GENERIC_GETTIMEOFDAY
MIPS: VDSO: Gate microMIPS restriction on GCC version
MIPS: VDSO: Fold MIPS_DISABLE_VDSO into MIPS_GENERIC_GETTIMEOFDAY
clocksource/drivers/mips-gic-timer: Only use VDSO_CLOCKMODE_GIC when it is a available
MIPS: csrc-r4k: Only use VDSO_CLOCKMODE_R4K when it is a available
MIPS: VDSO: Only map the data pages when the vDSO is used
MIPS: Introduce Kconfig MIPS_GENERIC_GETTIMEOFDAY
vdso/datastore: Always provide symbol declarations
MAINTAINERS: Add include/linux/vdso_datastore.h to vDSO block
vdso/gettimeofday: Rename __arch_get_vdso_u_timens_data()
vdso/treewide: Drop GENERIC_TIME_VSYSCALL
vdso/vsyscall: Gate update_vsyscall() behind CONFIG_GENERIC_GETTIMEOFDAY
riscv: vdso: Drop CONFIG_GENERIC_TIME_VSYSCALL guard around syscall fallbacks
Linus Torvalds [Mon, 15 Jun 2026 08:21:27 +0000 (13:51 +0530)]
Merge tag 'timers-ptp-2026-06-13' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip
Pull timekeeping updates from Thomas Gleixner:
"Updates for NTP/timekeeping and PTP:
- Expand timekeeping snapshot mechanisms
The various snapshot functions are mostly used for PTP to collect
"atomic" snapshots of various involved clocks.
They lack support for the recently introduced AUX clocks and do not
provide the underlying counter value (e.g. TSC) to user space.
Exposing the counter value snapshot allows for better control and
steering.
Convert the hard wired ktime_get_snapshot() to take a clock ID,
which allows the caller to select the clock ID to be captured along
with CLOCK_MONOTONIC_RAW. Additionally capture the underlying
hardware counter value and the clock source ID of the counter.
Expand the hardware based snapshot capture where devices provide a
mechanism to snapshot the hardware PTP clock and the system counter
(usually via PCI/PTM) to support AUX clocks and also provide the
captured counter value back to the caller and not only the clock
timestamps derived from it.
- Add a new optional read_snapshot() callback to clocksources
That is required to capture atomic snapshots from clocksources
which are derived from TSC with a scaling mechanism (e.g. Hyper-V,
KVMclock).
The value pair is handed back in the snapshot structure to the
callers, so they can do the necessary correlations in a more
precise way.
This touches usage sites of the affected functions and data structure
all over the tree, but stays fully backwards compatible for the
existing user space exposed interfaces. New PTP IOCTLs will provide
access to the extended functionality in later kernel versions"
* tag 'timers-ptp-2026-06-13' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip: (28 commits)
ptp: vmclock: Use hw_cycles from snapshot for precise TSC pairing
x86/kvmclock: Implement read_snapshot() for kvmclock clocksource
clocksource/hyperv: Implement read_snapshot() for TSC page clocksource
timekeeping: Add clocksource read_snapshot() method and hw_cycles to snapshot
ptp: Switch to ktime_get_snapshot_id() for pre/post timestamps
timekeeping: Add support for AUX clock cross timestamping
timekeeping: Remove system_device_crosststamp::sys_realtime
ALSA: hda/common: Use system_device_crosststamp::sys_systime
wifi: iwlwifi: Use system_device_crosststamp::sys_systime
ptp: Use system_device_crosststamp::sys_systime
timekeeping: Prepare for cross timestamps on arbitrary clock IDs
timekeeping: Remove ktime_get_snapshot()
virtio_rtc: Use provided clock ID for history snapshot
net/mlx5: Use provided clock ID for history snapshot
igc: Use provided clock ID for history snapshot
ice/ptp: Use provided clock ID for history snapshot
wifi: iwlwifi: Adopt PTP cross timestamps to core changes
timekeeping: Add CLOCK ID to system_device_crosststamp
timekeeping: Add system_counterval_t to struct system_device_crosststamp
timekeeping: Add CLOCK_AUX support for ktime_get_snapshot_id()
...
Linus Torvalds [Mon, 15 Jun 2026 08:18:52 +0000 (13:48 +0530)]
Merge tag 'timers-nohz-2026-06-13' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip
Pull NOHZ updates from Thomas Gleixner:
- Fix a long standing TOCTOU in get_cpu_sleep_time_us()
- Make the CPU offline NOHZ handling more robust by disabling NOHZ on
the outgoing CPU early instead of creating unneeded state which needs
to be undone.
- Unify idle CPU time accounting instead of having two different
accounting mechanisms. These two different mechanisms are not really
independent, but their different properties can in the worst case cause
global idle time to be observed going backwards.
- Consolidate the idle/iowait time retrieval interfaces instead of
converting back and forth between them.
- Make idle interrupt time accounting more robust. The original code
assumes that interrupt time accounting is enabled and therefore stops
elapsing idle time while an interrupt is handled in NOHZ dyntick
state. That assumption is not correct as interrupt time accounting
can be disabled at compile and runtime.
- Fix an accounting error between dyntick idle time and dyntick idle
steal time. The stolen time is not accounted and therefore idle time
becomes inaccurate. The stolen time is now accounted after the fact
as there is no way to predict the steal time upfront.
* tag 'timers-nohz-2026-06-13' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip:
sched/cputime: Handle dyntick-idle steal time correctly
sched/cputime: Handle idle irqtime gracefully
sched/cputime: Provide get_cpu_[idle|iowait]_time_us() off-case
tick/sched: Consolidate idle time fetching APIs
tick/sched: Account tickless idle cputime only when tick is stopped
tick/sched: Remove unused fields
tick/sched: Move dyntick-idle cputime accounting to cputime code
tick/sched: Remove nohz disabled special case in cputime fetch
tick/sched: Unify idle cputime accounting
s390/time: Prepare to stop elapsing in dynticks-idle
powerpc/time: Prepare to stop elapsing in dynticks-idle
sched/cputime: Correctly support generic vtime idle time
sched/cputime: Remove superfluous and error prone kcpustat_field() parameter
sched/idle: Handle offlining first in idle loop
tick/sched: Fix TOCTOU in nohz idle time fetch
Linus Torvalds [Mon, 15 Jun 2026 08:09:12 +0000 (13:39 +0530)]
Merge tag 'timers-core-2026-06-13' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip
Pull timer core updates from Thomas Gleixner:
"Updates for the time/timer core subsystem:
- Harden the user space controllable hrtimer interfaces further to
protect against unprivileged DoS attempts by arming timers in the
past.
- Add per-capacity hierarchies to the timer migration code to prevent
timer migration across different capacity domains. This code has
been disabled last minute as there is a pathological problem with
SoCs which advertise a larger number of capacity domains. The
problem is under investigation and the code won't be active before
v7.3, but that turned out to be less intrusive than a full revert
as it preserves the preparatory steps and allows people to work on
the final resolution
- Export time namespace functionality as a recent user can be built
as a module.
- Initialize the jiffies clocksource before using it. The recent
hardening against time moving backward requires that the related
members of struct clocksource have been initialized, otherwise it
clamps the readout to 0, which makes time stand still and causes
boot delays.
- Fix a more than twenty year old PID reference count leak in an
error path of the POSIX CPU timer code.
- The usual small fixes, improvements and cleanups all over the
place"
* tag 'timers-core-2026-06-13' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip: (31 commits)
posix-cpu-timers: Fix pid refcount leak in do_cpu_nanosleep() error path
time/jiffies: Register jiffies clocksource before usage
timers/migration: Temporarily disable per capacity hierarchies
timers/migration: Turn tmigr_hierarchy level_list into a flexible array
timers/migration: Deactivate per-capacity hierarchies under nohz_full
timers/migration: Fix hotplug migrator selection target on asymetric capacity machines
ntsync: Honour caller's time namespace for absolute MONOTONIC timeouts
time/namespace: Export init_time_ns and do_timens_ktime_to_host()
timers/migration: Update stale @online doc to @available
timers: Fix flseep() typo in kernel-doc comment
hrtimer: Fix the bogus return type of __hrtimer_start_range_ns()
hrtimer: Return ktime_t from hrtimer_get_next_event()/hrtimer_next_event_without()
clocksource: Clean up clocksource_update_freq() functions
alarmtimer: Remove stale return description from alarm_handle_timer()
selftests/posix_timers: Use CLOCK_THREAD_CPUTIME_ID for ITIMER_PROF measurements
scripts/timers: Add timer_migration_tree.py
timers/migration: Handle capacity in connect tracepoints
timers/migration: Split per-capacity hierarchies
timers/migration: Track CPUs in a hierarchy
timers/migration: Abstract out hierarchy to prepare for CPU capacity awareness
...
Linus Torvalds [Mon, 15 Jun 2026 08:04:03 +0000 (13:34 +0530)]
Merge tag 'timers-clocksource-2026-06-13' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip
Pull clocksource updates from Thomas Gleixner:
"Updates for clocksource/clockevent drivers:
- Add devm helpers for clocksources, which simplify driver
teardown and probe failure handling.
- More module conversion work
- Update the support for the ARM EL2 virtual timer including the
required ACPI changes.
- Add clockevent and clocksource support for the TI Dual Mode Timer
- Fix the support for multiple watchdog instances in the TEGRA186
driver
- Add D1 timer support to the SUN5I driver
- The usual devicetree updates, cleanups and small fixes all over the
place"
* tag 'timers-clocksource-2026-06-13' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip: (24 commits)
clocksource: move NXP timer selection to drivers/clocksource
clocksource/drivers/timer-tegra186: Reserve and service a kernel watchdog
clocksource/drivers/timer-tegra186: Register all accessible watchdog timers
clocksource/drivers/timer-tegra186: Correct num_wdts for Tegra186 and Tegra234
clocksource/drivers/timer-tegra186: Fix support for multiple watchdog instances
clocksource/drivers/timer-ti-dm: Add clockevent support
clocksource/drivers/timer-ti-dm: Add clocksource support
clocksource/drivers/timer-ti-dm: Fix property name in comment
dt-bindings: timer: arm,arch_timer: Fix requirements for interrupt description
clocksource/drivers/arm_arch_timer: Default to EL2 virtual timer when running VHE
ACPI: GTDT: Parse information related to the EL2 virtual timer
ACPI: GTDT: Account for GTDTv3 size when walking the platform timer descriptors
clocksource: Add devm_clocksource_register_*() helpers
clocksource/drivers/sun5i: Add D1 hstimer support
dt-bindings: timer: allwinner,sun5i-a13-hstimer: add H616 and D1
dt-bindings: timer: Add StarFive JHB100 clint
dt-bindings: timer: renesas,rz-mtu3: document RZ/{T2H,N2H}
dt-bindings: timer: renesas,rz-mtu3: Remove TCIU8 interrupt
dt-bindings: timer: Remove sifive,fine-ctr-bits property
clocksource/drivers/timer-of: Make the code compatible with modules
...
Linus Torvalds [Mon, 15 Jun 2026 08:00:04 +0000 (13:30 +0530)]
Merge tag 'smp-core-2026-06-13' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip
Pull smp core updates from Thomas Gleixner:
"Two small updates to the SMP/hotplug subsystem:
- Add cpuhplock.h to the maintained files
- Provide the missing stubs for lockdep_is_cpus_held() and
lockdep_is_cpus_write_held() so the usage sites can be simplified"
* tag 'smp-core-2026-06-13' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip:
cpu: Add lockdep_is_cpus_held()/lockdep_is_cpus_write_held() stubs for !CONFIG_HOTPLUG_CPU
MAINTAINERS: Add include/linux/cpuhplock.h to CPU HOTPLUG area
Linus Torvalds [Mon, 15 Jun 2026 07:55:32 +0000 (13:25 +0530)]
Merge tag 'irq-drivers-2026-06-13' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip
Pull interrupt chip driver updates from Thomas Gleixner:
- Replace the support for the AST2700-A0 early silicon with a proper
driver for the final A2 production silicon
- Rename and rework the StarFive JH8100 interrupt controller for the
new JHB100 SoC as JH8100 was discontinued before production.
- Add support for Amlogic A9 SoCs to the meson-gpio interrupt
controller
- Expand the Econet interrupt controller driver to support MIPS 34Kc
Vectored External Interrupt Controller mode.
- Prevent a NULL pointer dereference in the GICv4 code as the vLPI code
blindly assumes that the ITS was populated. Add the missing sanity
check.
- Add support for software triggered and for error interrupts to the
Renesas RZ/T2H driver.
- Add interrupt redirection support for the loongarch architecture.
- Add multicore support to the Realtek RTL interrupt driver
- The usual updates, enhancements and fixes all over the place
* tag 'irq-drivers-2026-06-13' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip: (32 commits)
irqchip/irq-realtek-rtl: Add multicore support
irqchip/irq-realtek-rtl: Add/simplify register helpers
irqchip/loongarch-ir: Add IR (interrupt redirection) irqchip support
irqchip/loongarch-avec: Return IRQ_SET_MASK_OK_DONE when keep affinity
irqchip/loongarch-avec: Prepare for interrupt redirection support
Docs/LoongArch: Add advanced extended IRQ model
irqchip/qcom-pdc: Use FIELD_GET() to extract bank index and bit position
irqchip/qcom-pdc: Add PDC_VERSION() macro to describe version register fields
irqchip/qcom-pdc: Tighten ioremap clamp to single DRV region size
irqchip/qcom-pdc: Split __pdc_enable_intr() into per-version helpers
irqchip/exynos-combiner: Remove useless spinlock
irqchip/renesas-rzt2h: Add error interrupts support
irqchip/renesas-rzt2h: Add software-triggered interrupts support
irqchip/gic-v4: Don't advertise VLPIs if no ITS is probed
irqchip/gic-v3-its: Use FIELD_MODIFY()
irqchip/econet-en751221: Support MIPS 34Kc VEIC mode
dt-bindings: interrupt-controller: econet: Add CPU interrupt mapping
irqchip/meson-gpio: Add support for Amlogic A9 SoCs
dt-bindings: interrupt-controller: Add support for Amlogic A9 SoCs
irqchip/meson-gpio: Use the correct register in meson_s4_gpio_irq_set_type()
...
Linus Torvalds [Mon, 15 Jun 2026 07:49:41 +0000 (13:19 +0530)]
Merge tag 'irq-core-2026-06-13' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip
Pull interrupt core updates from Thomas Gleixner:
- Rework of /proc/interrupts handling:
/proc/interrupts was subject to micro optimizations for a long time,
but most of the low hanging fruit was left on the table. This rework
addresses the major time consuming issues:
- Printing a long series of zeros one by one via a format string
instead of counting subsequent zeros and emitting a string
constant.
- Simplify and cache the conditions whether interrupts should be
printed
- Use a proper iteration over the interrupt descriptor xarray
instead of walking and testing one by one.
- Provide helper functions for the architecture code to emit the
architecture specific counters
- Convert the counter structure in x86 to an array, which
simplifies the output and add mechanisms to suppress unused
architecture interrupts, which just occupy space for nothing.
Adopt the new core mechanisms.
This adjusts the gdb scripts related to interrupt counter statistics
to work with the new mechanisms.
- Prevent a string overflow in the /proc/irq/$N/ directory name
creation code.
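The first item in the list above, counting consecutive zeros and emitting them from a string constant instead of formatting each one, can be sketched in userspace C. This is a minimal illustration and not the genirq implementation: the column format "%4u ", the chunk size of eight columns and all demo-style names (emit_zeros, format_counts) are assumptions made for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define COL		"   0 "	/* what "%4u " prints for a zero count */
#define COL_LEN		5
#define ZERO_CHUNK	8	/* zero columns held in the constant below */

static const char zero_cols[] = COL COL COL COL COL COL COL COL;

/* Emit nzeros pre-formatted zero columns with memcpy(), no format parsing. */
static size_t emit_zeros(char *out, size_t nzeros)
{
	size_t len = 0;

	while (nzeros) {
		size_t n = nzeros < ZERO_CHUNK ? nzeros : ZERO_CHUNK;

		memcpy(out + len, zero_cols, n * COL_LEN);
		len += n * COL_LEN;
		nzeros -= n;
	}
	return len;
}

/*
 * Format n counters into out. Runs of zeros are only counted while
 * scanning and are flushed in bulk before the next non-zero value.
 * Returns the length of the resulting string.
 */
static size_t format_counts(char *out, size_t out_size,
			    const unsigned int *counts, size_t n)
{
	size_t len = 0, zeros = 0;

	/* Worst case per column: 10 digits plus one space. */
	assert(out_size >= n * 11 + 1);

	for (size_t i = 0; i < n; i++) {
		if (counts[i] == 0) {
			zeros++;
			continue;
		}
		len += emit_zeros(out + len, zeros);
		zeros = 0;
		len += (size_t)sprintf(out + len, "%4u ", counts[i]);
	}
	len += emit_zeros(out + len, zeros);
	out[len] = '\0';
	return len;
}
```

The output is byte-for-byte what a per-column sprintf() loop would produce; the saving is that mostly-zero rows skip the format-string machinery.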
* tag 'irq-core-2026-06-13' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip:
x86/irq: Add missing 's' back to thermal event printout
genirq/proc: Speed up /proc/interrupts iteration
genirq/proc: Runtime size the chip name
genirq: Expose irq_find_desc_at_or_after() in core code
genirq: Add rcuref count to struct irq_desc
genirq/proc: Increase default interrupt number precision to four
genirq: Calculate precision only when required
genirq: Cache the condition for /proc/interrupts exposure
genirq/manage: Make NMI cleanup RT safe
genirq: Expose nr_irqs in core code
scripts/gdb: Update x86 interrupts to the array based storage
x86/irq: Move IOAPIC misrouted and PIC/APIC error counts into irq_stats
x86/irq: Suppress unlikely interrupt stats by default
x86/irq: Make irqstats array based
genirq/proc: Utilize irq_desc::tot_count to avoid evaluation
genirq/proc: Avoid formatting zero counts in /proc/interrupts
x86/irq: Optimize interrupts decimals printing
genirq/proc: Size interrupt directory names for 10-digit interrupt numbers
Linus Torvalds [Mon, 15 Jun 2026 07:44:36 +0000 (13:14 +0530)]
Merge tag 'core-rseq-2026-06-13' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip
Pull rseq update from Thomas Gleixner:
"A trivial update for RSEQ selftests to provide the config fragments
which contain the config options required to actually run the tests"
* tag 'core-rseq-2026-06-13' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip:
selftests/rseq: Add config fragment
Colton Jones [Mon, 15 Jun 2026 03:36:20 +0000 (03:36 +0000)]
ALSA: hda/realtek: Add CS35L41 I2C quirk for ASUS UM3405GA
The ASUS Zenbook 14 UM3405GA uses a Realtek ALC294 codec with two
Cirrus Logic CS35L41 speaker amplifiers exposed through the CSC3551 ACPI
device. The machine reports the Realtek subsystem ID 1043:19f4.
Without a PCI quirk, the codec falls back to generic pin matching and the
internal speakers remain silent even though PCM playback completes.
Add the UM3405GA subsystem ID and reuse the same ASUS I2C headset-mic
fixup used by the closely related UM3406HA. That fixup configures the
headset microphone pin and chains to CS35L41 I2C speaker-amp binding.
selftests/bpf: Work around llvm stack overflow in crypto progs
clang 23 fails to build crypto_bench.c and crypto_sanity.c with
"BPF stack limit exceeded". The progs fill a 408-byte
bpf_crypto_params on the stack and pass it to bpf_crypto_ctx_create().
clang 23 copies the byte-aligned cipher/key globals into it one byte at
a time through the stack, and keeps more than one copy of the struct
around. Together that blows the 512-byte limit.
Align the source arrays to 8 bytes so the copy is word-wise, and move
params off the stack into a static .bss var. static keeps it out of the
skeleton, where bpf_crypto_params is an incomplete type. Either change
alone is not enough.
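The two changes described above, aligning the source arrays and moving the parameter struct off the stack, look roughly like this in plain C. It is a sketch under stated assumptions: struct demo_params is a hypothetical stand-in and not the real bpf_crypto_params layout, the algorithm string and key bytes are invented, and the codegen effect the commit message reports is specific to the BPF target, so ordinary host compilers will not show it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a large parameter struct. */
struct demo_params {
	char algo[128];
	unsigned char key[256];
	unsigned int key_len;
};

/* 8-byte alignment allows the compiler to copy these arrays word-wise. */
static const char demo_algo[128] __attribute__((aligned(8))) = "ecb(aes)";
static const unsigned char demo_key[16] __attribute__((aligned(8))) = {
	0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
};

/* File-scope static: stored in .bss rather than in a caller's stack frame. */
static struct demo_params demo_params_storage;

/* Fill the static parameter block and hand back a pointer to it. */
static struct demo_params *demo_fill_params(void)
{
	struct demo_params *p = &demo_params_storage;

	memset(p, 0, sizeof(*p));
	memcpy(p->algo, demo_algo, sizeof(demo_algo));
	memcpy(p->key, demo_key, sizeof(demo_key));
	p->key_len = sizeof(demo_key);
	assert(p->key_len <= sizeof(p->key));
	return p;
}
```

Because the storage is static, the function is not reentrant; that trade-off is acceptable in a sketch but would need thought in any shared context.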
Linus Torvalds [Mon, 15 Jun 2026 07:11:17 +0000 (12:41 +0530)]
Merge tag 'driver-core-7.2-rc1' of gitolite.kernel.org:pub/scm/linux/kernel/git/driver-core/driver-core
Pull driver core updates from Danilo Krummrich:
"Deferred probe:
- Fix race where deferred probe timeout work could be permanently
canceled by using mod_delayed_work()
- Fix missing jiffies conversion in deferred_probe_extend_timeout()
- Guard timeout extension with delayed_work_pending() to prevent
premature firing
- Use system_percpu_wq instead of the deprecated system_wq
- Update deferred_probe_timeout documentation
device:
- Replace direct struct device bitfield access (can_match, dma_iommu,
dma_skip_sync, dma_ops_bypass, state_synced, dma_coherent,
of_node_reused, offline, offline_disabled) with flag-based
accessors using bit operations
- Reject devices with unregistered buses
- Delete unused DEVICE_ATTR_PREALLOC()
- Add low-level device attribute macros with const show/store
callbacks, allowing device attributes to reside in read-only memory
- Move core device attributes to read-only memory
- Constify group array pointers in driver_add_groups() /
driver_remove_groups(), struct bus_type, and struct device_driver
device property:
- Fix fwnode reference leak in fwnode_graph_get_endpoint_by_id()
- Initialize all fields of fwnode_handle in fwnode_init()
- Provide swnode_get()/swnode_put() wrappers around kobject_get/put()
- Allow passing struct software_node_ref_args pointers directly to
PROPERTY_ENTRY_REF()
driver_override:
- Migrate amba, cdx, vmbus, and rpmsg to the generic driver_override
infrastructure, fixing a UAF from unsynchronized access to
driver_override in bus match() callbacks
- Remove the now-unused driver_set_override()
firmware loader:
- Fix recursive lock deadlock in device_cache_fw_images() when async
work falls back to synchronous execution
- Fix device reference leak in firmware_upload_register()
platform:
- Pass KBUILD_MODNAME through the platform driver registration macro
to create module symlinks in sysfs for built-in drivers; move
module_kset initialization to a pure_initcall and tegra cbb
registration to core_initcall to ensure correct ordering
- Pass THIS_MODULE implicitly through a coresight_init_driver() macro
sysfs:
- Upgrade OOB write detection in sysfs_kf_seq_show() from printk to
WARN
- Add return value clamping to sysfs_kf_read()
Rust:
- ACPI:
Fix missing match data for PRP0001 by exporting
acpi_of_match_device()
- Auxiliary:
Replace drvdata() with dedicated registration data on
auxiliary_device. drvdata() exposed the driver's bus device private
data beyond the driver's own scope, creating ordering constraints
and forcing the data to outlive all registrations that access it.
Registration data is instead scoped structurally to the
Registration object, making lifecycle ordering enforced by
construction rather than convention.
- Rust-native device driver lifetimes (HRT):
Allow Rust device drivers to carry a lifetime parameter on their
bus device private data, tied to the device binding scope -- the
interval during which a bus device is bound to a driver. Device
resources like pci::Bar<'a> and IoMem<'a> can be stored directly in
the driver's bus device private data with a lifetime bounded by the
binding scope, so the compiler enforces at build time that they do
not outlive the binding. This removes Devres indirection from every
access site and eliminates try_access() failure paths in
destructors.
Bus driver traits use a Generic Associated Type (GAT) Data<'bound>
to introduce the lifetime on the private data, rather than
parameterizing the Driver trait itself. Auxiliary registration
data, where the lifetime is not introduced by a trait callback but
must be threaded through Registration, uses the ForLt trait (a
type-level abstraction for types generic over a lifetime).
Misc:
- Fix DT overlayed devices not probing by reverting the broken
treewide overlay fix and re-running fw_devlink consumer pickup when
an overlay is applied to a bound device
- Use root_device_register() for faux bus root device; add sanity
check for failed bus init
- Fix dev_has_sync_state() data race with READ_ONCE() and move it to
base.h
- Avoid spurious device_links warning when removing a device while
its supplier is unbinding
- Switch ISA bus to dynamic root device
- Fix suspicious RCU usage in kernfs_put()
- Remove devcoredump exit callback
- Constify devfreq_event_class"
* tag 'driver-core-7.2-rc1' of gitolite.kernel.org:pub/scm/linux/kernel/git/driver-core/driver-core: (81 commits)
software node: allow passing reference args to PROPERTY_ENTRY_REF()
driver core: platform: set mod_name in driver registration
coresight: pass THIS_MODULE implicitly through a macro
kernel: param: initialize module_kset in a pure_initcall
soc/tegra: cbb: Move driver registration from pure_initcall to core_initcall
firmware_loader: Fix recursive lock in device_cache_fw_images()
driver core: Use system_percpu_wq instead of system_wq
driver core: remove driver_set_override()
rpmsg: use generic driver_override infrastructure
Drivers: hv: vmbus: use generic driver_override infrastructure
cdx: use generic driver_override infrastructure
amba: use generic driver_override infrastructure
rust: devres: add 'static bound to Devres<T>
samples: rust: rust_driver_auxiliary: showcase lifetime-bound registration data
rust: auxiliary: generalize Registration over ForLt
rust: types: add `ForLt` trait for higher-ranked lifetime support
gpu: nova-core: separate driver type from driver data
samples: rust: rust_driver_pci: use HRT lifetime for Bar
rust: io: make IoMem and ExclusiveIoMem lifetime-parameterized
rust: pci: make Bar lifetime-parameterized
...
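The move from struct device bitfields to flag-based accessors, listed under "device:" above, can be illustrated with a userspace sketch. Everything here is hypothetical: struct device_sketch, the DEV_FLAG_* names and the dev_*_flag() helpers are not the kernel's, and C11 atomics stand in for the kernel's set_bit()/clear_bit()/test_bit().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag indices; the names echo the bitfields listed above. */
enum dev_flag_sketch {
	DEV_FLAG_CAN_MATCH,
	DEV_FLAG_OFFLINE,
	DEV_FLAG_STATE_SYNCED,
};

struct device_sketch {
	atomic_ulong flags;	/* one word replaces a run of one-bit bitfields */
};

/* Atomically set a single flag without disturbing its neighbours. */
static void dev_set_flag(struct device_sketch *dev, enum dev_flag_sketch f)
{
	atomic_fetch_or(&dev->flags, 1UL << f);
}

/* Atomically clear a single flag. */
static void dev_clear_flag(struct device_sketch *dev, enum dev_flag_sketch f)
{
	atomic_fetch_and(&dev->flags, ~(1UL << f));
}

/* Read one flag. */
static bool dev_test_flag(struct device_sketch *dev, enum dev_flag_sketch f)
{
	return (atomic_load(&dev->flags) & (1UL << f)) != 0;
}
```

Adjacent C bitfields share a storage unit, so a plain store to one is a read-modify-write of its neighbours as well. Per-bit atomic operations avoid that, which is the property this sketch is meant to show.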
Linus Torvalds [Mon, 15 Jun 2026 06:07:18 +0000 (11:37 +0530)]
Merge tag 'pm-7.2-rc1' of gitolite.kernel.org:pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"Over a half of the changes here are cpufreq updates that include core
modifications, fixes of the old-style governors, new hardware support
in drivers, assorted driver fixes and cleanups, and the removal of one
driver (AMD Elan SC4*).
Apart from that, the intel_idle driver will now be able to avoid
exposing redundant C-states if PC6 is disabled and there are new
sysctl knobs for device suspend/resume watchdog timeouts, hibernation
gets built-in LZ4 support for image compression and there is the usual
collection of assorted fixes and cleanups.
Specifics:
- Fix a race between cpufreq suspend and CPU hotplug during system
shutdown (Tianxiang Chen)
- Avoid redundant target() calls for unchanged limits and fix a typo
in a comment in the cpufreq core (Viresh Kumar)
- Fix concurrency issues related to sysfs attributes access that
affect cpufreq governors using the common governor code (Zhongqiu
Han)
- Simplify frequency limit handling in the conservative cpufreq
governor (Lifeng Zheng)
- Fix descriptions of the conservative governor freq_step tunable and
the ondemand governor sampling_down_factor tunable in the cpufreq
documentation (Pengjie Zhang)
- Fix use-after-free and double free during _OSC evaluation in the
PCC cpufreq driver (Yuho Choi)
- Rework the handling of policy min and max frequency values in the
cpufreq core to allow drivers to specify special initial values for
the scaling_min_freq and scaling_max_freq sysfs attributes (Pierre
Gondois)
- Add cpufreq scaling support for Qualcomm Shikra SoC (Taniya Das,
Imran Shaik)
- Improve the warning message on HWP-disabled hybrid processors
printed by the intel_pstate driver and sync policy->cur during CPU
offline in it (Yohei Kojima, Fushuai Wang)
- Drop cpufreq support for AMD Elan SC4* (Sean Young)
- Minor fixes for cpufreq drivers (Krzysztof Kozlowski, Akashdeep
Kaur, Hans Zhang, Guangshuo Li, Xueqin Luo)
- Clean up dead dependencies on X86 in the cpufreq Kconfig (Julian
Braha)
- Allow the intel_idle driver to avoid exposing C-states that are
redundant when PC6 is disabled (Artem Bityutskiy)
- Fix memory leak and a potential race in the OPP core (Abdun Nihaal,
Di Shen)
- Mark Rust OPP methods as inline (Nicolás Antinori)
- Fix misc device registration failure path in the PM QoS core (Yuho
Choi)
- Add sysctl interface for DPM watchdog timeouts (Tzung-Bi Shih)
- Use complete() instead of complete_all() in device_pm_sleep_init()
to avoid a false-positive warning from lockdep_assert_RT_in_threaded_ctx()
when CONFIG_PROVE_RAW_LOCK_NESTING is enabled (Jiakai Xu)
- Use a flexible array for CRC uncompressed buffers during
hibernation image saving (Rosen Penev)
- Make the LZ4 algorithm available for hibernation compression
(l1rox3)
- Move the preallocate_image() call during hibernation after the
"prepare" phase of the "freeze" transition (Matthew Leach)
- Fix a memory leak in rapl_add_package_cpuslocked() in the
intel_rapl power capping driver and use sysfs_emit() in
cpumask_show() in that driver (Sumeet Pawnikar, Yury Norov)
- Fix ValueError when parsing incomplete device properties in the
pm-graph utility (Gongwei Li)"
* tag 'pm-7.2-rc1' of gitolite.kernel.org:pub/scm/linux/kernel/git/rafael/linux-pm: (40 commits)
PM: dpm_watchdog: Add sysctl interface for DPM watchdog timeouts
PM: QoS: Fix misc device registration unwind
cpufreq: Use policy->min/max init as QoS request
cpufreq: Remove driver default policy->min/max init
cpufreq: Set default policy->min/max values for all drivers
cpufreq: Extract cpufreq_policy_init_qos() function
cpufreq: Documentation: fix conservative governor freq_step description
cpufreq: ti: Add EPROBE_DEFER for K3 SoCs
cpufreq: qcom: Add cpufreq scaling support for Qualcomm Shikra SoC
dt-bindings: cpufreq: Document Qualcomm Shikra SoC EPSS
powercap: intel_rapl: Use sysfs_emit() in cpumask_show()
cpufreq: governor: Fix stale prev_cpu_nice spike when enabling ignore_nice_load
cpufreq: governor: Fix data races on per-CPU idle/nice baselines
PM: hibernate: Use flexible array for CRC uncompressed buffers
powercap: intel_rapl: Fix memory leak in rapl_add_package_cpuslocked()
PM: hibernate: make LZ4 available for hibernation compression
PM: sleep: Use complete() in device_pm_sleep_init()
opp: rust: mark OPP methods as inline
cpufreq: intel_pstate: Improve warning message on HWP-disabled hybrid CPUs
cpufreq: elanfreq: Drop support for AMD Elan SC4*
...
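The flexible-array change for the hibernation CRC buffers follows a common C pattern, sketched below in userspace. The layout is an assumption: struct crc_data_sketch, its fields and crc_data_alloc() are hypothetical names, and calloc() stands in for the kernel allocator.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical layout: a small header followed by nr_bufs buffer pointers. */
struct crc_data_sketch {
	size_t nr_bufs;
	uint8_t *bufs[];	/* flexible array member, sized at allocation */
};

/* Allocate header and pointer array in one zeroed block. */
static struct crc_data_sketch *crc_data_alloc(size_t nr_bufs)
{
	struct crc_data_sketch *d;

	/* Reject counts whose total byte size would overflow size_t. */
	if (nr_bufs > (SIZE_MAX - sizeof(*d)) / sizeof(d->bufs[0]))
		return NULL;

	d = calloc(1, sizeof(*d) + nr_bufs * sizeof(d->bufs[0]));
	if (d)
		d->nr_bufs = nr_bufs;
	return d;
}
```

A single allocation means a single free() and no separately tracked pointer array. The explicit overflow check plays the role that size helpers such as struct_size() play in kernel code.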
Linus Torvalds [Mon, 15 Jun 2026 06:05:11 +0000 (11:35 +0530)]
Merge tag 'thermal-7.2-rc1' of gitolite.kernel.org:pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control updates from Rafael Wysocki:
"These add new hardware support (i.MX93 TMU, Amlogic T7, Intel Arrow
Lake, QCom Nord, Shikra and Hawi), fix issues in a number of places in
the thermal control core and drivers, clean up code and refactor it in
preparation for future changes:
- Rework the initialization and cleanup of thermal class cooling
devices to separate DT-based cooling device registration and
cooling device registration without DT (Daniel Lezcano, Ovidiu
Panait)
- Update the cooling device DT bindings to support 3-cell cooling
device representation, where the additional cell holds an ID to
select a cooling mechanism for devices that offer multiple cooling
mechanisms, and adjust the cooling device registration code
accordingly (Gaurav Kohli, Daniel Lezcano)
- Remove dead code from two functions in the thermal core and
simplify the unregistration of thermal governors (Rafael Wysocki)
- Fix critical temperature attribute removal handling in the generic
thermal zone hwmon support code and rework that code to register a
separate hwmon class device for each thermal zone (instead of using
one hwmon class device for all thermal zones of the same type) to
address thermal zone removal deadlocks (Rafael Wysocki)
- Use attribute groups for adding temperature attributes to hwmon
class devices associated with thermal zones (Rafael Wysocki)
- Pass WQ_UNBOUND when allocating the thermal workqueue (Marco
Crivellari)
- Fix potential shift overflow in ptc_mmio_write() and improve error
handling in proc_thermal_ptc_add() in the int340x thermal control
driver (Aravind Anilraj)
- Use sysfs_emit() for cpumask printing in the Intel powerclamp
thermal driver (Yury Norov)
- Add Arrow Lake CPU models to the intel_tcc_cooling driver (Srinivas
Pandruvada)
- Add QCom Nord, Shikra and Hawi temperature sensor DT bindings
(Deepti Jaggi, Gaurav Kohli, Dipa Ramesh Mantre)
- Use devm_add_action_or_reset() for clock disable on the NVidia
soctherm and switch it to devm cooling device registration version
(Daniel Lezcano)
- Add the Amlogic T7 thermal sensor along with thermal calibration
data read from SMC calls (Ronald Claveau)
- Fix atomic temperature read in the QCom tsens driver to comply with
hardware documentation (Priyansh Jain)
- Add SpacemiT K1 thermal sensor support (Shuwei Wu)
- Add i.MX93 temperature sensor support and filter out the invalid
temperature (Jacky Bai)
- Enable by default the TMU (Thermal Monitoring Unit) on Exynos
platform (Krzysztof Kozlowski)
- Rework interrupt initialization in the Tsens driver and add the
optional wakeup source (Priyansh Jain)
- Fix typo in a comment in the TSens QCom driver (Jinseok Kim)
- Fix trailing whitespace and repeated word in the OF code, remove
quoted string splitting across lines from the iMX7 driver, and
remove a stray space from the thermal_trip_of_attr() macro
definition (Mayur Kumar)
- Update the thermal testing facility code to avoid NULL pointer
dereferences by rejecting missing command arguments and replace
sscanf() with kstrtoint() or kstrtoul() in that code (Ovidiu
Panait, Samuel Moelius)"
* tag 'thermal-7.2-rc1' of gitolite.kernel.org:pub/scm/linux/kernel/git/rafael/linux-pm: (54 commits)
thermal: sysfs: Replace sscanf() with kstrtoul()
thermal: testing: Replace sscanf() with kstrtoint()
thermal: testing: reject missing command arguments
thermal: intel: intel_tcc_cooling: Add Arrow Lake CPU models
thermal/drivers/qcom/tsens: Disable wakeup interrupt setup on automotive targets
thermal/drivers/qcom/tsens: Switch wake IRQ handling to PM callbacks
thermal/core: Fix missing stub for devm_thermal_cooling_device_register
dt-bindings: thermal: cooling-devices: Update support for 3 cells cooling device
thermal/of: Support cooling device ID in cooling-spec
thermal/of: Pass cdev_id and introduce devm registration helper
thermal/of: Add cooling device ID support
thermal/of: Rename the devm_thermal_of_cooling_device_register() function
thermal/core: Make cooling device OF node conditional on CONFIG_THERMAL_OF
thermal/of: Move cooling device OF helpers out of thermal core
hwmon: Use non-OF thermal cooling device registration API
thermal/core: Add devm_thermal_cooling_device_register()
thermal/core: Introduce non-OF thermal_cooling_device_register()
thermal/drivers/samsung: Enable TMU by default
thermal/driver/qoriq: Workaround unexpected temperature readings from tmu
thermal/drivers/qoriq: Add i.MX93 tmu support
...
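The sscanf() to kstrtoint()/kstrtoul() conversions in the thermal testing code tighten input parsing. The pull message does not give the reasoning; one commonly cited difference is that sscanf("%d") accepts "12abc" as 12, while the kstrto*() helpers reject trailing garbage. The userspace approximation below uses strtol(); parse_int_strict() is a hypothetical name, and unlike kstrtoint() it still skips leading whitespace.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Userspace approximation of kstrtoint(): the whole string must be a
 * number (one trailing newline is tolerated) and must fit in an int. */
static int parse_int_strict(const char *s, int *out)
{
	char *end;
	long v;

	errno = 0;
	v = strtol(s, &end, 10);
	if (end == s)
		return -EINVAL;		/* no digits at all */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;		/* trailing garbage such as "12abc" */
	if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
		return -ERANGE;		/* does not fit in an int */
	*out = (int)v;
	return 0;
}
```

A caller checks the return value before using *out, which is left untouched on failure.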
Linus Torvalds [Mon, 15 Jun 2026 06:02:38 +0000 (11:32 +0530)]
Merge tag 'acpi-7.2-rc1' of gitolite.kernel.org:pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI support updates from Rafael Wysocki:
"These update the ACPICA code in the kernel to upstream version
20260408, introduce support for devres-based management of ACPI notify
handlers and update some core ACPI device drivers on top of that
(which includes some fixes and cleanups), add _DEP support for PCI/CXL
roots and Intel CVS devices, fix a couple of assorted issues and clean
up code:
- Fix multiple issues related to probe, removal and missing NVDIMM
device notifications in the ACPI NFIT driver (Rafael Wysocki)
- Add support for devres-based management of ACPI notify handlers to
the ACPI core (Rafael Wysocki)
- Switch multiple core ACPI device drivers (including the ACPI PAD,
ACPI video bus, ACPI HED, ACPI thermal zone, ACPI AC, ACPI battery,
and ACPI NFIT drivers) over to using devres-based resource
management during probe (Rafael Wysocki)
- Replace mutex_lock/unlock() with guard()/scoped_guard() in the ACPI
PMIC driver (Maxwell Doose)
- Fix message kref handling in the dead device path of the ACPI IPMI
address space handler (Yuho Choi)
- Use sysfs_emit() in idlecpus_show() in the ACPI processor
aggregator device (PAD) driver (Yury Norov)
- Clean up device_id_scheme initialization in the ACPI video bus
driver (Jean-Ralph Aviles)
- Clean up lid handling in the ACPI button driver and
acpi_button_probe(), reorganize installing and removing event
handlers in that driver and switch it over to using devres-based
resource management during probe (Rafael Wysocki)
- Add support for the Legacy Virtual Register (LVR) field in I2C
serial bus resource descriptors to ACPICA (Akhil R)
- Fix multiple issues related to bounds checks, input validation,
use-after-free, and integer overflow checks in the AML interpreter
in ACPICA (ikaros)
- Update the copyright year to 2026 in ACPICA files and make minor
changes related to ACPI 6.6 support (Pawel Chmielewski)
- Remove spurious precision from format used to dump parse trees in
ACPICA (David Laight)
- Add modern standby DSM GUIDs to ACPICA header files (Daniel
Schaefer)
- Update D3hot/cold device power states definitions in ACPICA header
files (Aymeric Wibo)
- Fix NULL pointer dereference in acpi_ns_custom_package() (Weiming
Shi)
- Update ACPICA version to 20260408 (Saket Dumbre)
- Add cpuidle driver check in acpi_processor_register_idle_driver()
to avoid evaluating _CST unnecessarily (Tony W Wang-oc)
- Suppress UBSAN warning caused by field misuse during PCC-based
register access in the ACPI CPPC library (Jeremy Linton)
- Add support for CPPC v4 to the ACPI CPPC library (Sumit Gupta)
- Update the ACPI device enumeration code to honor _DEP for ACPI0016
PCI/CXL host bridges and make the ACPI PCI root driver clear _DEP
dependencies for PCI roots that have become operational (Chen Pei)"
* tag 'acpi-7.2-rc1' of gitolite.kernel.org:pub/scm/linux/kernel/git/rafael/linux-pm: (74 commits)
ACPI: processor: Add cpuidle driver check in acpi_processor_register_idle_driver()
ACPI: IPMI: Fix message kref handling on dead device
ACPI: CPPC: Suppress UBSAN warning caused by field misuse
ACPI: scan: Honor _DEP for Intel CVS devices
ACPI: NFIT: core: Fix possible deadlock and missing notifications
ACPI: NFIT: core: Eliminate redundant local variable
ACPI: NFIT: core: Fix acpi_nfit_init() error cleanup
ACPI: NFIT: core: Fix possible NULL pointer dereference
ACPI: bus: Clean up devm_acpi_install_notify_handler()
ACPI: button: Switch over to devres-based resource management
ACPI: button: Reorganize installing and removing event handlers
ACPI: button: Use string literals for generating netlink messages
ACPI: button: Clean up adding and removing lid procfs interface
ACPI: button: Merge two switch () statements in acpi_button_probe()
ACPI: button: Drop redundant variable from acpi_button_probe()
ACPI: button: Rework device verification during probe
ACPI: CPPC: Add support for CPPC v4
ACPI: PAD: Use sysfs_emit() in idlecpus_show()
ACPI: scan: Honor _DEP for ACPI0016 PCI/CXL host bridge
ACPI: PCI: Clear _DEP dependencies after PCI root bridge attach
...
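The guard()/scoped_guard() conversion in the ACPI PMIC driver relies on scope-based cleanup. The userspace sketch below shows the underlying mechanism with the GCC/clang cleanup attribute. It is not the kernel's cleanup.h: struct lock_sketch is a toy lock that only records whether it is held, and GUARD(), guard_release() and read_value() are hypothetical names.

```c
/* Toy lock that only tracks whether it is held; stands in for a mutex. */
struct lock_sketch {
	int held;
};

static void lock_acquire(struct lock_sketch *l) { l->held = 1; }
static void lock_release(struct lock_sketch *l) { l->held = 0; }

/* Cleanup handler: called with a pointer to the guard variable
 * when that variable goes out of scope. */
static void guard_release(struct lock_sketch **g)
{
	if (*g)
		lock_release(*g);
}

/* Hypothetical GUARD(): take the lock now, drop it when the scope ends. */
#define GUARD(l) \
	struct lock_sketch *guard_ __attribute__((cleanup(guard_release))) = \
		(lock_acquire(l), (l))

/* Copy *src to *dst under the lock; every return path releases it. */
static int read_value(struct lock_sketch *l, const int *src, int *dst)
{
	GUARD(l);

	if (!src)
		return -1;	/* early return, no explicit unlock needed */
	*dst = *src;
	return 0;
}
```

The benefit over paired lock/unlock calls is that error paths cannot forget the unlock, since the compiler emits the release on every exit from the scope.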
Linus Torvalds [Mon, 15 Jun 2026 05:59:31 +0000 (11:29 +0530)]
Merge tag 'nolibc-20260614-for-7.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/nolibc/linux-nolibc
Pull nolibc updates from Thomas Weißschuh:
- New architectures: OpenRISC and 32-bit parisc
- New library functionality: alloca(), assert(), creat() and
ftruncate()
- Automatic large file support
- Proper 64-bit system call argument passing on x32 and MIPS N32
- Cleanups of the testmatrix
- Various bugfixes and cleanups
* tag 'nolibc-20260614-for-7.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/nolibc/linux-nolibc: (37 commits)
selftests/nolibc: test against -Wwrite-strings
selftests/nolibc: use mutable buffer for execve() argv string
tools/nolibc: cast default values of program_invocation_name
tools/nolibc: add ftruncate()
tools/nolibc: add a helper to split a 64-bit argument into 32-bit halves
selftests/nolibc: enable CONFIG_TMPFS for sparc32
tools/nolibc: stackprotector: Avoid stalling program startup if crng is not init yet
tools/nolibc: getopt: Fix potential out of bounds access
selftests/nolibc: test open mode handling
tools/nolibc: always pass mode to open syscall
tools/nolibc: split open mode handling into a macro
tools/nolibc: split implicit open flags into a macro
tools/nolibc: add support for 32-bit parisc
selftests/nolibc: avoid function pointer comparisons
tools/nolibc: add support for OpenRISC / or1k
selftests/nolibc: use vmlinux for MIPS tests
selftests/nolibc: trim IMAGE mappings
selftests/nolibc: trim DEFCONFIG mappings
selftests/nolibc: trim QEMU_ARCH mappings
selftests/nolibc: use QEMU_ARCH for QEMU_ARCH_USER
...
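The "split a 64-bit argument into 32-bit halves" helper named in the shortlog above addresses system calls such as ftruncate() that take a 64-bit value on ABIs with 32-bit argument slots. A minimal sketch of the arithmetic follows; the arg64_*() names are hypothetical and are not nolibc's, and the order in which the halves are passed to a system call is ABI-specific and not shown.

```c
#include <stdint.h>

/* Low 32 bits of a 64-bit argument. */
static uint32_t arg64_lo(uint64_t v)
{
	return (uint32_t)v;
}

/* High 32 bits of a 64-bit argument. */
static uint32_t arg64_hi(uint64_t v)
{
	return (uint32_t)(v >> 32);
}

/* Inverse operation, as the receiving side would reassemble the value. */
static uint64_t arg64_join(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}
```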
====================
bpf, skmsg: some fixes for skmsg
All fixes are from previous patches sent by Weiming Shi, Zhang Cen,
Kuniyuki and Sechang Lim, which have already been reviewed by me, John
and Jakub.
The automated reviewer (sashiko) may still flag a few other potential
issues on top of this series. After looking into them, they are either
already covered by the patches here, are the BPF program's own
responsibility (e.g. initializing the payload it pushes) and intentionally
left out, or only reachable under very narrow conditions that require a
specially crafted BPF program and an unusual sk_msg ring state, so they are
not practical to trigger and are left out of this series. I'm collecting
these fixes together because the same problems have been re-sent many
times in slightly different forms, and I hope this series can be
prioritized for merging so the duplicates can finally settle. With so
many AI-generated patches floating around for these spots, leaving them
unmerged just keeps wasting maintainer review cycles on the same issues.
v3->v4: Carry Kuniyuki Iwashima's reviewed-by tag.
Drop the __GFP_ZERO patch; initializing the pushed payload is the
BPF program's responsibility, not the kernel's (per maintainer
feedback).
https://lore.kernel.org/bpf/20260612130919.299124-1-jiayuan.chen@linux.dev/
v2->v3: Target bpf-next and carry Emil's reviewed-by tag.
Use reverse xmas tree style, as suggested by Cong
(not all code matches reverse xmas tree due to variable dependencies).
v1->v2: Fix a problem introduced while resolving the conflict.
====================
Sechang Lim [Mon, 15 Jun 2026 02:19:59 +0000 (10:19 +0800)]
selftests/bpf: add test for bpf_msg_pop_data() overflow
Add a test in sockmap_basic.c that calls bpf_msg_pop_data() with a length
close to U32_MAX, which overflows the start + len bounds check. The sk_msg
program records the return value over a sendmsg and the test checks that
the call is rejected with -EINVAL.
Sechang Lim [Mon, 15 Jun 2026 02:19:58 +0000 (10:19 +0800)]
bpf, sockmap: fix integer overflow in bpf_msg_pop_data() bounds check
start and len are u32, so
u64 last = start + len;
evaluates start + len in 32-bit and wraps before storing it in last.
The bounds check
if (start >= offset + l || last > msg->sg.size)
return -EINVAL;
can then be passed with an out-of-range start/len, after which the pop
loop runs off the end of the scatterlist and sk_msg_shift_left() calls
put_page() on the empty msg->sg.end slot.
Widen the addition with a (u64) cast so the bound is evaluated in
64-bit and a len near U32_MAX no longer wraps below msg->sg.size.
While here, change pop from int to u32. It counts bytes against the
unsigned scatterlist lengths and can never be negative, so the signed
type only invites sign-confusion in the pop loop.
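The wrap described above can be reproduced in plain userspace C. The sketch below is not the kernel patch: last_wrapping(), last_widened() and range_ok() are hypothetical names, and range_ok() stands in only for the "last > msg->sg.size" half of the bounds check. It assumes a platform where int is 32 bits wide, so that the sum of two u32 values is evaluated in 32 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Buggy form: start + len is evaluated in 32 bits and wraps
 * before the result is widened to 64 bits. */
static uint64_t last_wrapping(uint32_t start, uint32_t len)
{
	uint64_t last = start + len;

	return last;
}

/* Fixed form: cast one operand first so the sum is computed in 64 bits. */
static uint64_t last_widened(uint32_t start, uint32_t len)
{
	uint64_t last = (uint64_t)start + len;

	return last;
}

/* Hypothetical stand-in for the size half of the check:
 * true means the range would be accepted. */
static bool range_ok(uint64_t last, uint32_t size)
{
	return last <= size;
}
```

With start = 16 and len = U32_MAX - 7, the wrapping form yields 8, which passes a 4096-byte size check, while the widened form yields 4294967304 and is rejected.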
Fixes: 7246d8ed4dcc ("bpf: helper to pop data from messages") Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev> Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Sechang Lim <rhkrqnwk98@gmail.com> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Link: https://lore.kernel.org/r/20260615021959.140010-6-jiayuan.chen@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>