lib/smp_processor_id: Make migration check unconditional of SMP
Commit cac5cefbade90 ("sched/smp: Make SMP unconditional")
migrate_disable() even on UP builds.
Commit 06ddd17521bf1 ("sched/smp: Always define is_percpu_thread() and
scheduler_ipi()") made is_percpu_thread() check the affinity mask
instead replying always true for UP mask.
As a consequence smp_processor_id() now complains if invoked within a
migrate_disable() section because is_percpu_thread() checks its mask and
the migration check is left out.
Make migration check unconditional of SMP.
Fixes: cac5cefbade90 ("sched/smp: Make SMP unconditional") Closes: https://lore.kernel.org/oe-lkp/202507100448.6b88d6f1-lkp@intel.com Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Chen Yu <yu.c.chen@intel.com> Link: https://lore.kernel.org/r/20250710082748.-DPO1rjO@linutronix.de
Vincent Guittot [Tue, 8 Jul 2025 16:56:30 +0000 (18:56 +0200)]
sched/fair: Always trigger resched at the end of a protected period
Always trigger a resched after a protected period even if the entity is
still eligible. It can happen that an entity remains eligible at the end
of the protected period but must let an entity with a shorter slice to run
in order to keep its lag shorter than slice. This is particulalry true
with run to parity which tries to maximize the lag.
Vincent Guittot [Tue, 8 Jul 2025 16:56:29 +0000 (18:56 +0200)]
sched/fair: Fix entity's lag with run to parity
When an entity is enqueued without preempting current, we must ensure
that the slice protection is updated to take into account the slice
duration of the newly enqueued task so that its lag will not exceed
its slice (+ tick).
Vincent Guittot [Tue, 8 Jul 2025 16:56:28 +0000 (18:56 +0200)]
sched/fair: Limit run to parity to the min slice of enqueued entities
Run to parity ensures that current will get a chance to run its full
slice in one go but this can create large latency and/or lag for
entities with shorter slice that have exhausted their previous slice
and wait to run their next slice.
Clamp the run to parity to the shortest slice of all enqueued entities.
Even if the waking task can preempt current, it might not be the one
selected by pick_task_fair. Check that the waking task will be selected
if we cancel the slice protection before doing so.
Vincent Guittot [Tue, 8 Jul 2025 16:56:26 +0000 (18:56 +0200)]
sched/fair: Fix NO_RUN_TO_PARITY case
EEVDF expects the scheduler to allocate a time quantum to the selected
entity and then pick a new entity for next quantum.
Although this notion of time quantum is not strictly doable in our case,
we can ensure a minimum runtime for each task most of the time and pick a
new entity after a minimum time has elapsed.
Reuse the slice protection of run to parity to ensure such runtime
quantum.
Peter Zijlstra [Tue, 20 May 2025 09:19:30 +0000 (11:19 +0200)]
sched/deadline: Less agressive dl_server handling
Chris reported that commit 5f6bd380c7bd ("sched/rt: Remove default
bandwidth control") caused a significant dip in his favourite
benchmark of the day. Simply disabling dl_server cured things.
His workload hammers the 0->1, 1->0 transitions, and the
dl_server_{start,stop}() overhead kills it -- fairly obviously a bad
idea in hind sight and all that.
Change things around to only disable the dl_server when there has not
been a fair task around for a whole period. Since the default period
is 1 second, this ensures the benchmark never trips this, overhead
gone.
Fixes: 557a6bfc662c ("sched/fair: Add trivial fair server") Reported-by: Chris Mason <clm@meta.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Juri Lelli <juri.lelli@redhat.com> Acked-by: Juri Lelli <juri.lelli@redhat.com> Link: https://lkml.kernel.org/r/20250702121158.465086194@infradead.org
Dietmar reported that commit 3840cbe24cf0 ("sched: psi: fix bogus
pressure spikes from aggregation race") caused a regression for him on
a high context switch rate benchmark (schbench) due to the now
repeating cpu_clock() calls.
In particular the problem is that get_recent_times() will extrapolate
the current state to 'now'. But if an update uses a timestamp from
before the start of the update, it is possible to get two reads
with inconsistent results. It is effectively back-dating an update.
(note that this all hard-relies on the clock being synchronized across
CPUs -- if this is not the case, all bets are off).
Combine this problem with the fact that there are per-group-per-cpu
seqcounts, the commit in question pushed the clock read into the group
iteration, causing tree-depth cpu_clock() calls. On architectures
where cpu_clock() has appreciable overhead, this hurts.
Instead move to a per-cpu seqcount, which allows us to have a single
clock read for all group updates, increasing internal consistency and
lowering update overhead. This comes at the cost of a longer update
side (proportional to the tree depth) which can cause the read side to
retry more often.
Fixes: 3840cbe24cf0 ("sched: psi: fix bogus pressure spikes from aggregation race") Reported-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>, Link: https://lkml.kernel.org/20250522084844.GC31726@noisy.programming.kicks-ass.net
This creates 4 message threads pinned to CPUs 0-3, and 256x4 worker
threads spread across the rest of the CPUs. Neither the worker threads
or the message threads do any work, they just wake each other up and go
back to sleep as soon as possible.
The end result is the first 4 CPUs are pegged waking up those 1024
workers, and the rest of the CPUs are constantly banging in and out of
idle. If I take a v6.9 Linus kernel and revert that one commit,
performance goes from 3.4M RPS to 5.4M RPS.
schedstat shows there are ~100x more new idle balance operations, and
profiling shows the worker threads are spending ~20% of their CPU time
on new idle balance. schedstats also shows that almost all of these new
idle balance attemps are failing to find busy groups.
The fix used here is to crank up the cost of the newidle balance whenever it
fails. Since we don't want sd->max_newidle_lb_cost to grow out of
control, this also changes update_newidle_cost() to use
sysctl_sched_migration_cost as the upper limit on max_newidle_lb_cost.
Peter Zijlstra [Thu, 3 Jul 2025 08:24:39 +0000 (10:24 +0200)]
Merge tag 'rust-sched.2025.06.24' of git://git.kernel.org/pub/scm/linux/kernel/git/boqun/linux into sched/core
Rust task & schedule changes for v6.17:
- Make Task, CondVar and PollCondVar methods inline to avoid unnecessary
function calls
- Add might_sleep() support for Rust code: Rust's "#[track_caller]"
mechanism is used so that Rust's might_sleep() doesn't need to be
defined as a macro
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
# -----BEGIN PGP SIGNATURE-----
#
# iQEzBAABCAAdFiEEj5IosQTPz8XU1wRHSXnow7UH+rgFAmhbLOgACgkQSXnow7UH
# +riscwf/e/+KmJTox5/JOqs6yxxQdCHMaGnMK62E5AII7NsiUI8+XB9z6efzCMmy
# kS2W7aCmBZX67Y1B/xRL/ArHMBAJBi/CrCedZJcmzfB9aMa4Lj4mgiPkbUXkxE6Q
# F5CDQK21ftu+0Q7Hhlq92ec17ZWodOvNxCFBBmjtQqUvBzj0dY45jcG7brs+N+1Z
# t9UO3YokzukNqpIXTpG0HFP+XNafWWCgn9iIQ44lRxIaAoPI44uJjh1OXLTrZ1M9
# EMWYIrsY3b71Im78l6pzr+UOzJdJLI+QCBiz7ySLYz3kZ5dEfFdJOsumbc0G8A69
# VSGDFPEbJxZYuMxrH0E44XmxH4rJdA==
# =gD/Y
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 25 Jun 2025 12:55:36 AM CEST
# gpg: using RSA key 8F9228B104CFCFC5D4D704474979E8C3B507FAB8
# gpg: Can't check signature: No public key
Boqun Feng [Sun, 15 Jun 2025 22:14:00 +0000 (15:14 -0700)]
rust: Introduce file_from_location()
Most of kernel debugging facilities take a nul-terminated string for
file names for a callsite (generated from __FILE__), however the Rust
courterpart, Location, would return a Rust string (not nul-terminated)
from method .file(). And such a string cannot be passed to C debugging
function directly.
There is ongoing work to support a Location::file_with_nul() [1], which
returns a nul-terminated string from a Location. Since it's still
working in progress, and it will take some time before the feature
finally gets stabilized and the kernel's minimal rustc version might
also take a while to bump to a version that at least has that feature,
introduce a file_from_location() function, which returns a warning
string if Location::file_with_nul() is not available.
This should work in most cases because as for now the known usage of
Location::file_with_nul() is only in debugging code (e.g. might_sleep())
and there might be other information reported by the debugging code that
could help locate the problematic function, so missing the file name is
fine at the moment.
When building the kernel using the llvm-18.1.3-rust-1.85.0-x86_64
toolchain provided by kernel.org, the following symbols are generated:
$ nm vmlinux | grep ' _R'.*Task | rustfilt
... T <kernel::task::Task>::get_pid_ns
... T <kernel::task::Task>::tgid_nr_ns
... T <kernel::task::Task>::current_pid_ns
... T <kernel::task::Task>::signal_pending
... T <kernel::task::Task>::uid
... T <kernel::task::Task>::euid
... T <kernel::task::Task>::current
... T <kernel::task::Task>::wake_up
... T <kernel::task::Task as kernel::types::AlwaysRefCounted>::dec_ref
... T <kernel::task::Task as kernel::types::AlwaysRefCounted>::inc_ref
These Rust symbols are trivial wrappers around the C functions. It
doesn't make sense to go through a trivial wrapper for these functions,
so mark them inline.
[boqun: Capitalize the title, reword a bit to avoid listing all the C
functions as the code already shows them and remove the addresses of the
symbols in the commit log as they are different from build to build.]
Kunwu Chan [Mon, 17 Mar 2025 02:52:05 +0000 (10:52 +0800)]
rust: sync: Mark PollCondVar::drop() inline
When building the kernel using the llvm-18.1.3-rust-1.85.0-x86_64
with ARCH=arm64, the following symbols are generated:
$nm vmlinux | grep ' _R'.*PollCondVar | rustfilt
... T <kernel::sync::poll::PollCondVar as kernel::init::PinnedDrop>::drop
...
This Rust symbol is trivial wrappers around the C functions
__wake_up_pollfree() and synchronize_rcu(). It doesn't make sense to go
through a trivial wrapper for its functions, so mark it inline.
[boqun: Reword the commit title and re-format the commit log per tip
tree's requirement, remove unnecessary information from "nm vmlinux"
result.]
Kunwu Chan [Mon, 24 Mar 2025 06:18:34 +0000 (14:18 +0800)]
rust: sync: Mark CondVar::notify_*() inline
When build the kernel using the llvm-18.1.3-rust-1.85.0-x86_64
with ARCH=arm64, the following symbols are generated:
$nm vmlinux | grep ' _R'.*CondVar | rustfilt
... T <kernel::sync::condvar::CondVar>::notify_all
... T <kernel::sync::condvar::CondVar>::notify_one
... T <kernel::sync::condvar::CondVar>::notify_sync
...
These notify_*() symbols are trivial wrappers around the C functions
__wake_up() and __wake_up_sync(). It doesn't make sense to go through
a trivial wrapper for these functions, so mark them inline.
[boqun: Reword the commit title for consistency and reformat the commit
log.]
Tejun Heo [Sat, 14 Jun 2025 01:23:30 +0000 (15:23 -1000)]
sched/core: Reorganize cgroup bandwidth control interface file writes
- Move input parameter validation from tg_set_cfs_bandwidth() to the new
outer function tg_set_bandwidth(). The outer function handles parameters
in usecs, validates them and calls tg_set_cfs_bandwidth() which converts
them into nsecs. This matches tg_bandwidth() on the read side.
- max/min_cfs_* consts are now used by tg_set_bandwidth(). Relocate, convert
into usecs and drop "cfs" from the names.
- Reimplement cpu_cfs_{period|quote|burst}_write_*() using tg_bandwidth()
and tg_set_bandwidth() and replace "cfs" in the names with "bw".
- Update cpu_max_write() to use tg_set_bandiwdth(). cpu_period_quota_parse()
is updated to drop nsec conversion accordingly. This aligns the behavior
with cfs_period_quota_print().
- Drop now unused tg_set_cfs_{period|quota|burst}().
- While at it, for consistency, rename default_cfs_period() to
default_bw_period_us() and make it return usecs.
This is to prepare for adding bandwidth control support to sched_ext.
tg_set_bandwidth() will be used as the muxing point. No functional changes
intended.
Tejun Heo [Sat, 14 Jun 2025 01:23:29 +0000 (15:23 -1000)]
sched/core: Reorganize cgroup bandwidth control interface file reads
- Update tg_get_cfs_*() to return u64 values. These are now used as the low
level accessors to the fair's bandwidth configuration parameters.
Translation to usecs takes place in these functions.
- Add tg_bandwidth() which reads all three bandwidth parameters using
tg_get_cfs_*().
- Reimplement cgroup interface read functions using tg_bandwidth(). Drop cfs
from the function names.
This is to prepare for adding bandwidth control support to sched_ext.
tg_bandwidth() will be used as the muxing point similar to tg_weight(). No
functional changes.
Tejun Heo [Sat, 14 Jun 2025 01:23:28 +0000 (15:23 -1000)]
sched/core: Relocate tg_get_cfs_*() and cpu_cfs_*_read_*()
Collect the getters, relocate the trivial interface file wrappers, and put
all of them in period, quota, burst order to prepare for future changes.
Pure reordering. No functional changes.
Tejun Heo [Sat, 14 Jun 2025 01:23:27 +0000 (15:23 -1000)]
sched/fair: Move max_cfs_quota_period decl and default_cfs_period() def from fair.c to sched.h
max_cfs_quota_period is defined in core.c but has a declaration in fair.c.
Move the declaration to kernel/sched/sched.h. Also, move
default_cfs_period() from fair.c to sched.h. No functional changes.
Ingo Molnar [Wed, 28 May 2025 08:09:08 +0000 (10:09 +0200)]
sched/smp: Use the SMP version of idle_thread_set_boot_cpu()
Simplify the scheduler by making the CONFIG_SMP=y version of
idle_thread_set_boot_cpu() unconditional.
Note that idle_thread_set_boot_cpu() is already conditional
on CONFIG_GENERIC_SMP_IDLE_THREAD, which most architectures
select unconditionally on both UP and SMP kernels.
Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Shrikanth Hegde <sshegde@linux.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lore.kernel.org/r/20250528080924.2273858-28-mingo@kernel.org
Ingo Molnar [Wed, 28 May 2025 08:09:01 +0000 (10:09 +0200)]
sched/smp: Make SMP unconditional
Simplify the scheduler by making CONFIG_SMP=y primitives and data
structures unconditional.
Introduce transitory wrappers for functionality not yet converted to SMP.
Note that this patch is pretty large, because there's no clear separation
between various aspects of the SMP scheduler, it's basically a huge block
of #ifdef CONFIG_SMP. A fair amount of it has to be switched on for it to
boot and work on UP systems.
Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Shrikanth Hegde <sshegde@linux.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lore.kernel.org/r/20250528080924.2273858-21-mingo@kernel.org
Peter Zijlstra [Fri, 23 May 2025 16:26:21 +0000 (18:26 +0200)]
sched: Make clangd usable
Due to the weird Makefile setup of sched the various files do not
compile as stand alone units. The new generation of editors are trying
to do just this -- mostly to offer fancy things like completions but
also better syntax highlighting and code navigation.
Specifically, I've been playing around with neovim and clangd.
Setting up clangd on the kernel source is a giant pain in the arse
(this really should be improved), but once you do manage, you run into
dumb stuff like the above.
Fix up the scheduler files to at least pretend to work.
Linus Torvalds [Sun, 8 Jun 2025 18:33:00 +0000 (11:33 -0700)]
Merge tag 'timers-cleanups-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer cleanup from Thomas Gleixner:
"The delayed from_timer() API cleanup:
The renaming to the timer_*() namespace was delayed due massive
conflicts against Linux-next. Now that everything is upstream finish
the conversion"
* tag 'timers-cleanups-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
treewide, timers: Rename from_timer() to timer_container_of()
Linus Torvalds [Sun, 8 Jun 2025 18:27:20 +0000 (11:27 -0700)]
Merge tag 'x86-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A small set of x86 fixes:
- Cure IO bitmap inconsistencies
A failed fork cleans up all resources of the newly created thread
via exit_thread(). exit_thread() invokes io_bitmap_exit() which
does the IO bitmap cleanups, which unfortunately assume that the
cleanup is related to the current task, which is obviously bogus.
Make it work correctly
- A lockdep fix in the resctrl code removed the clearing of the
command buffer in two places, which keeps stale error messages
around. Bring them back.
- Remove unused trace events"
* tag 'x86-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
fs/resctrl: Restore the rdt_last_cmd_clear() calls after acquiring rdtgroup_mutex
x86/iopl: Cure TIF_IO_BITMAP inconsistencies
x86/fpu: Remove unused trace events
Zhang Rui [Fri, 30 May 2025 00:09:28 +0000 (08:09 +0800)]
tools/power turbostat: Avoid probing the same perf counters
For the RAPL package energy status counter, Intel and AMD share the same
perf_subsys and perf_name, but with different MSR addresses.
Both rapl_counter_arch_infos[0] and rapl_counter_arch_infos[1] are
introduced to describe this counter for different Vendors.
As a result, the perf counter is probed twice, and causes a failure in
in get_rapl_counters() because expected_read_size and actual_read_size
don't match.
Fix the problem by skipping the already probed counter.
Note, this is not a perfect fix. For example, if different
vendors/platforms use the same MSR value for different purpose, the code
can be fooled when it probes a rapl_counter_arch_infos[] entry that does
not belong to the running Vendor/Platform.
In a long run, better to put rapl_counter_arch_infos[] into the
platform_features so that this becomes Vendor/Platform specific.
Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
Zhang Rui [Sat, 17 May 2025 09:44:50 +0000 (17:44 +0800)]
tools/power turbostat: Allow probing RAPL with platform_features->rapl_msrs cleared
platform_features->rapl_msrs describes the RAPL MSRs supported. While
RAPL Perf counters can be exposed from different kernel backend drivers,
e.g. RAPL MSR I/F driver, or RAPL TPMI I/F driver.
Thus, turbostat should first blindly probe all the available RAPL Perf
counters, and falls back to the RAPL MSR counters if they are listed in
platform_features->rapl_msrs.
With this, platforms that don't have RAPL MSRs can clear the
platform_features->rapl_msrs bits and use RAPL Perf counters only.
Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
probe_rapl_msr() is reused for probing RAPL MSR counters, cstate MSR
counters and MPERF/APERF/SMI MSR counters, thus its name is misleading.
Similar to add_perf_counter(), introduce add_msr_counter() to probe a
counter via MSR. Introduce wrapper function add_rapl_msr_counter() at
the same time to add extra check for Zero return value for specified
RAPL counters.
No functional change intended.
Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
As the only caller of add_cstate_perf_counter_(),
add_cstate_perf_counter() just gives extra debug output on top. There is
no need to keep both functions.
Remove add_cstate_perf_counter_() and move all the logic to
add_cstate_perf_counter().
No functional change.
Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
commit 05a2f07db888 ("tools/power turbostat: read RAPL counters via
perf") that adds support to read RAPL counters via perf defines the
notion of a RAPL domain_id which is set to physical_core_id on
platforms which support per_core_rapl counters (Eg: AMD processors
Family 17h onwards) and is set to the physical_package_id on all the
other platforms.
However, the physical_core_id is only unique within a package and on
platforms with multiple packages more than one core can have the same
physical_core_id and thus the same domain_id. (For eg, the first cores
of each package have the physical_core_id = 0). This results in all
these cores with the same physical_core_id using the same entry in the
rapl_counter_info_perdomain[]. Since rapl_perf_init() skips the
perf-initialization for cores whose domain_ids have already been
visited, cores that have the same physical_core_id always read the
perf file corresponding to the physical_core_id of the first package
and thus the package-energy is incorrectly reported to be the same
value for different packages.
Note: This issue only arises when RAPL counters are read via perf and
not when they are read via MSRs since in the latter case the MSRs are
read separately on each core.
Fix this issue by associating each CPU with rapl_core_id which is
unique across all the packages in the system.
Fixes: 05a2f07db888 ("tools/power turbostat: read RAPL counters via perf") Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Len Brown <len.brown@intel.com>
tools/power turbostat: Add Android support for MSR device handling
It uses /dev/msrN device paths on Android instead of /dev/cpu/N/msr,
updates error messages and permission checks to reflect the Android
device path, and wraps platform-specific code with #if defined(ANDROID)
to ensure correct behavior on both Android and non-Android systems.
These changes improve compatibility and usability of turbostat on
Android devices.
Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
Linus Torvalds [Sun, 8 Jun 2025 18:07:33 +0000 (11:07 -0700)]
Merge tag 'perf-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 perf fix from Thomas Gleixner:
"A single fix for the x86 performance counters on Intel CPUs:
The MSR offset calculations for fixed performance counters are stored
at the wrong index in the configuration array causing the general
purpose counter MSR offset to be overwritten, so both the general
purpose and the fixed counters offsets are incorrect.
Correct the array index calculation to fix that"
* tag 'perf-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Fix incorrect MSR index calculations in intel_pmu_config_acr()
Linus Torvalds [Sun, 8 Jun 2025 18:02:53 +0000 (11:02 -0700)]
Merge tag 'irq-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fix from Thomas Gleixner:
"A single fix for the PCI/MSI code:
The conversion to per device MSI domains created a MSI domain with
size 1 instead of sizing it to the maximum possible number of MSI
interrupts for the device. This "worked" as the subsequent allocations
resized the domain, but the recent change to move the prepare() call
into the domain creation path broke this works by chance mechanism.
Size the domain properly at creation time"
* tag 'irq-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
PCI/MSI: Size device MSI domain with the maximum number of vectors
Linus Torvalds [Sun, 8 Jun 2025 17:35:12 +0000 (10:35 -0700)]
Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull mount fixes from Al Viro:
"Various mount-related bugfixes:
- split the do_move_mount() checks in subtree-of-our-ns and
entire-anon cases and adapt detached mount propagation selftest for
mount_setattr
- allow clone_private_mount() for a path on real rootfs
- fix a race in call of has_locked_children()
- fix move_mount propagation graph breakage by MOVE_MOUNT_SET_GROUP
- make sure clone_private_mnt() caller has CAP_SYS_ADMIN in the right
userns
- avoid false negatives in path_overmount()
- don't leak MNT_LOCKED from parent to child in finish_automount()
- do_change_type(): refuse to operate on unmounted/not ours mounts"
* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
do_change_type(): refuse to operate on unmounted/not ours mounts
clone_private_mnt(): make sure that caller has CAP_SYS_ADMIN in the right userns
selftests/mount_setattr: adapt detached mount propagation test
do_move_mount(): split the checks in subtree-of-our-ns and entire-anon cases
fs: allow clone_private_mount() for a path on real rootfs
fix propagation graph breakage by MOVE_MOUNT_SET_GROUP move_mount(2)
finish_automount(): don't leak MNT_LOCKED from parent to child
path_overmount(): avoid false negatives
fs/fhandle.c: fix a race in call of has_locked_children()
Linus Torvalds [Sun, 8 Jun 2025 17:20:21 +0000 (10:20 -0700)]
Merge tag '6.16-rc-part2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull more smb client updates from Steve French:
- multichannel/reconnect fixes
- move smbdirect (smb over RDMA) defines to fs/smb/common so they will
be able to be used in the future more broadly, and a documentation
update explaining setting up smbdirect mounts
- update email address for Paulo
* tag '6.16-rc-part2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: update internal version number
MAINTAINERS, mailmap: Update Paulo Alcantara's email address
cifs: add documentation for smbdirect setup
cifs: do not disable interface polling on failure
cifs: serialize other channels when query server interfaces is pending
cifs: deal with the channel loading lag while picking channels
smb: client: make use of common smbdirect_socket_parameters
smb: smbdirect: introduce smbdirect_socket_parameters
smb: client: make use of common smbdirect_socket
smb: smbdirect: add smbdirect_socket.h
smb: client: make use of common smbdirect.h
smb: smbdirect: add smbdirect.h with public structures
smb: client: make use of common smbdirect_pdu.h
smb: smbdirect: add smbdirect_pdu.h with protocol definitions
Linus Torvalds [Sun, 8 Jun 2025 15:19:01 +0000 (08:19 -0700)]
Merge tag 'trace-v6.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull more tracing fixes from Steven Rostedt:
- Fix regression of waiting a long time on updating trace event filters
When the faultable trace points were added, it needed task trace RCU
synchronization.
This was added to the tracepoint_synchronize_unregister() function.
The filter logic always called this function whenever it updated the
trace event filters before freeing the old filters. This increased
the time of "trace-cmd record" from taking 13 seconds to running over
2 minutes to complete.
Move the freeing of the filters to call_rcu*() logic, which brings
the time back down to 13 seconds.
The error path of the ring_buffer_subbuf_order_set() released the
mutex too early and allowed subsequent accesses to setting the
subbuffer size to corrupt the data and cause a bug.
By moving the mutex locking to the end of the error path, it prevents
the reentrant access to the critical data and also allows the
function to convert the taking of the mutex over to the guard()
logic.
- Remove unused power management clock events
The clock events were added in 2010 for power management. In 2011 arm
used them. In 2013 the code they were used in was removed. These
events have been wasting memory since then.
- Fix sparse warnings
There was a few places that sparse warned about trace_events_filter.c
where file->filter was referenced directly, but it is annotated with
an __rcu tag. Use the helper functions and fix them up to use
rcu_dereference() properly.
* tag 'trace-v6.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Add rcu annotation around file->filter accesses
tracing: PM: Remove unused clock events
ring-buffer: Fix buffer locking in ring_buffer_subbuf_order_set()
tracing: Fix regression of filter waiting a long time on RCU synchronization
Linus Torvalds [Sat, 7 Jun 2025 17:05:35 +0000 (10:05 -0700)]
Merge tag 'kbuild-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Add support for the EXPORT_SYMBOL_GPL_FOR_MODULES() macro, which
exports a symbol only to specified modules
- Improve ABI handling in gendwarfksyms
- Forcibly link lib-y objects to vmlinux even if CONFIG_MODULES=n
- Add checkers for redundant or missing <linux/export.h> inclusion
- Deprecate the extra-y syntax
- Fix a genksyms bug when including enum constants from *.symref files
* tag 'kbuild-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (28 commits)
genksyms: Fix enum consts from a reference affecting new values
arch: use always-$(KBUILD_BUILTIN) for vmlinux.lds
kbuild: set y instead of 1 to KBUILD_{BUILTIN,MODULES}
efi/libstub: use 'targets' instead of extra-y in Makefile
module: make __mod_device_table__* symbols static
scripts/misc-check: check unnecessary #include <linux/export.h> when W=1
scripts/misc-check: check missing #include <linux/export.h> when W=1
scripts/misc-check: add double-quotes to satisfy shellcheck
kbuild: move W=1 check for scripts/misc-check to top-level Makefile
scripts/tags.sh: allow to use alternative ctags implementation
kconfig: introduce menu type enum
docs: symbol-namespaces: fix reST warning with literal block
kbuild: link lib-y objects to vmlinux forcibly even when CONFIG_MODULES=n
tinyconfig: enable CONFIG_LD_DEAD_CODE_DATA_ELIMINATION
docs/core-api/symbol-namespaces: drop table of contents and section numbering
modpost: check forbidden MODULE_IMPORT_NS("module:") at compile time
kbuild: move kbuild syntax processing to scripts/Makefile.build
Makefile: remove dependency on archscripts for header installation
Documentation/kbuild: Add new gendwarfksyms kABI rules
Documentation/kbuild: Drop section numbers
...
Linus Torvalds [Sat, 7 Jun 2025 17:00:03 +0000 (10:00 -0700)]
Merge tag 'sh-for-v6.16-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux
Pull sh updates from John Paul Adrian Glaubitz:
- replace the __ASSEMBLY__ with __ASSEMBLER__ macro in all headers
since the latter is now defined automatically by both GCC and Clang
when compiling assembly code (Thomas Huth)
- set the default SPI mode for the ecovec24 board which became
necessary after a new mode member as added to the sh_msiof_spi_info
struct in cf9e4784f3bd ("spi: sh-msiof: Add slave mode support")
(Geert Uytterhoeven)
- remove unused variables in the kprobes code in
kprobe_exceptions_notify() (Mike Rapoport)
* tag 'sh-for-v6.16-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux:
sh: kprobes: Remove unused variables in kprobe_exceptions_notify()
sh: ecovec24: Make SPI mode explicit
sh: Replace __ASSEMBLY__ with __ASSEMBLER__ in all headers
Linus Torvalds [Sat, 7 Jun 2025 16:56:18 +0000 (09:56 -0700)]
Merge tag 'loongarch-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch updates from Huacai Chen:
- Adjust the 'make install' operation
- Support SCHED_MC (Multi-core scheduler)
- Enable ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS
- Enable HAVE_ARCH_STACKLEAK
- Increase max supported CPUs up to 2048
- Introduce the numa_memblks conversion
- Add PWM controller nodes in dts
- Some bug fixes and other small changes
* tag 'loongarch-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
platform/loongarch: laptop: Unregister generic_sub_drivers on exit
platform/loongarch: laptop: Add backlight power control support
platform/loongarch: laptop: Get brightness setting from EC on probe
LoongArch: dts: Add PWM support to Loongson-2K2000
LoongArch: dts: Add PWM support to Loongson-2K1000
LoongArch: dts: Add PWM support to Loongson-2K0500
LoongArch: vDSO: Correctly use asm parameters in syscall wrappers
LoongArch: Fix panic caused by NULL-PMD in huge_pte_offset()
LoongArch: Preserve firmware configuration when desired
LoongArch: Avoid using $r0/$r1 as "mask" for csrxchg
LoongArch: Introduce the numa_memblks conversion
LoongArch: Increase max supported CPUs up to 2048
LoongArch: Enable HAVE_ARCH_STACKLEAK
LoongArch: Enable ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS
LoongArch: Add SCHED_MC (Multi-core scheduler) support
LoongArch: Add some annotations in archhelp
LoongArch: Using generic scripts/install.sh in `make install`
LoongArch: Add a default install.sh
Linus Torvalds [Sat, 7 Jun 2025 16:40:08 +0000 (09:40 -0700)]
Merge tag 'sound-fix-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of fix patches for the 6.16-rc1 merge window.
Most of changes are about ASoC, especially lots of AVS driver fixes.
Larger LOCs are seen in TAS571x codec drivers, but the changes are
trivial and safe. The rest are all device-specific small fixes"
* tag 'sound-fix-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (27 commits)
ASoC: Intel: avs: boards: Fix rt5663 front end name
ASoC: Intel: avs: Simplify verification of parse_int_array() result
ALSA: usb-audio: Add implicit feedback quirk for RODE AI-1
ALSA: hda: Ignore unsol events for cards being shut down
ALSA: hda: Add new pci id for AMD GPU display HD audio controller
ALSA: hda: cs35l41: Constify regmap_irq_chip
ALSA: usb-audio: Add a quirk for Lenovo Thinkpad Thunderbolt 3 dock
ASoC: ti: omap-hdmi: Re-add dai_link->platform to fix card init
ASoC: pcm: Do not open FEs with no BEs connected
ASoC: rt1320: fix speaker noise when volume bar is 100%
ASoC: Intel: avs: Include missing string.h
ASoC: Intel: avs: Verify content returned by parse_int_array()
ASoC: Intel: avs: Verify kcalloc() status when setting constraints
ASoC: Intel: avs: Fix paths in MODULE_FIRMWARE hints
ASoC: Intel: avs: Fix possible null-ptr-deref when initing hw
ASoC: Intel: avs: Fix PPLCxFMT calculation
ASoC: Intel: avs: Fix deadlock when the failing IPC is SET_D0IX
ASoC: codecs: hda: Fix RPM usage count underflow
ASoC: amd: yc: Add support for Lenovo Yoga 7 16ARP8
ASoC: tas571x: fix tas5733 num_controls
...
Linus Torvalds [Sat, 7 Jun 2025 14:24:07 +0000 (07:24 -0700)]
Merge tag 'ubifs-for-linus-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs
Pull JFFS2 and UBIFS fixes from Richard Weinberger:
"JFFS2:
- Correctly check return code of jffs2_prealloc_raw_node_refs()
UBIFS:
- Spelling fixes"
* tag 'ubifs-for-linus-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
jffs2: check jffs2_prealloc_raw_node_refs() result in few other places
jffs2: check that raw node were preallocated before writing summary
ubifs: Fix grammar in error message