Linus Torvalds [Tue, 2 Dec 2025 19:35:49 +0000 (11:35 -0800)]
Merge tag 'x86_microcode_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 microcode loading updates from Borislav Petkov:
- Add microcode staging support on Intel: it moves the sole microcode
blobs loading to a non-critical path so that microcode loading
latencies are kept at minimum. The actual "directing" the hardware to
load microcode is the only step which is done on the critical path.
This scheme is also opportunistic as in: on a failure, the machinery
falls back to normal loading
- Add the capability to the AMD side of the loader to select one of two
per-family/model/stepping patches: one is pre-Entrysign and the other
is post-Entrysign; with the goal to take care of machines which
haven't updated their BIOS yet - something they should absolutely do
as this is the only proper Entrysign fix
- Other small cleanups and fixlets
* tag 'x86_microcode_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode: Mark early_parse_cmdline() as __init
x86/microcode/AMD: Select which microcode patch to load
x86/microcode/intel: Enable staging when available
x86/microcode/intel: Support mailbox transfer
x86/microcode/intel: Implement staging handler
x86/microcode/intel: Define staging state struct
x86/microcode/intel: Establish staging control logic
x86/microcode: Introduce staging step to reduce late-loading time
x86/cpu/topology: Make primary thread mask available with SMP=n
Linus Torvalds [Tue, 2 Dec 2025 19:04:37 +0000 (11:04 -0800)]
Merge tag 'ras_core_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS updates from Borislav Petkov:
- The second part of the AMD MCA interrupts rework after the
last-minute show-stopper from the last merge window was sorted out.
After this, the AMD MCA deferred errors, thresholding and corrected
errors interrupt handlers use common MCA code and are tightly
integrated into the core MCA code, thereby getting rid of
considerable duplication. All culminating into allowing CMCI error
thresholding storms to be detected at AMD too, using the common
infrastructure
- Add support for two new MCA bank bits on AMD Zen6 which denote
whether the error address logged is a system physical address, which
obviates the need for it to be translated before further error
recovery can be done
* tag 'ras_core_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Handle AMD threshold interrupt storms
x86/mce: Do not clear bank's poll bit in mce_poll_banks on AMD SMCA systems
x86/mce: Add support for physical address valid bit
x86/mce: Save and use APEI corrected threshold limit
x86/mce/amd: Define threshold restart function for banks
x86/mce/amd: Remove redundant reset_block()
x86/mce/amd: Support SMCA Corrected Error Interrupt
x86/mce/amd: Enable interrupt vectors once per-CPU on SMCA systems
x86/mce: Unify AMD DFR handler with MCA Polling
x86/mce: Unify AMD THR handler with MCA Polling
Linus Torvalds [Tue, 2 Dec 2025 18:45:50 +0000 (10:45 -0800)]
Merge tag 'edac_updates_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC updates from Borislav Petkov:
- imh_edac: Add a new EDAC driver for Intel Diamond Rapids and future
incarnations of this memory controllers architecture
- amd64_edac: Remove the legacy csrow sysfs interface which has been
deprecated and unused (we assume) for at least a decade
- Add the capability to fallback to BIOS-provided address translation
functionality (ACPI PRM) which can be used on systems unsupported by
the current AMD address translation library
- The usual fixes, fixlets, cleanups and improvements all over the
place
* tag 'edac_updates_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
RAS/AMD/ATL: Replace bitwise_xor_bits() with hweight16()
EDAC/igen6: Fix error handling in igen6_edac driver
EDAC/imh: Setup 'imh_test' debugfs testing node
EDAC/{skx_comm,imh}: Detect 2-level memory configuration
EDAC/skx_common: Extend the maximum number of DRAM chip row bits
EDAC/{skx_common,imh}: Add EDAC driver for Intel Diamond Rapids servers
EDAC/skx_common: Prepare for skx_set_hi_lo()
EDAC/skx_common: Prepare for skx_get_edac_list()
EDAC/{skx_common,skx,i10nm}: Make skx_register_mci() independent of pci_dev
EDAC/ghes: Replace deprecated strcpy() in ghes_edac_report_mem_error()
EDAC/ie31200: Fix error handling in ie31200_register_mci
RAS/CEC: Replace use of system_wq with system_percpu_wq
EDAC: Remove the legacy EDAC sysfs interface
EDAC/amd64: Remove NUM_CONTROLLERS macro
EDAC/amd64: Generate ctl_name string at runtime
RAS/AMD/ATL: Require PRM support for future systems
ACPI: PRM: Add acpi_prm_handler_available()
RAS/AMD/ATL: Return error codes from helper functions
Linus Torvalds [Tue, 2 Dec 2025 18:18:49 +0000 (10:18 -0800)]
Merge tag 'core-core-2025-12-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core irq cleanup from Thomas Gleixner:
"Tree wide cleanup of the remaining users of in_irq() which got
replaced by in_hardirq() and marked deprecated in 2020"
* tag 'core-core-2025-12-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
treewide: Remove in_irq()
Linus Torvalds [Tue, 2 Dec 2025 17:58:33 +0000 (09:58 -0800)]
Merge tag 'timers-core-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer core updates from Thomas Gleixner:
- Prevent a thundering herd problem when the timekeeper CPU is delayed
and a large number of CPUs compete to acquire jiffies_lock to do the
update. Limit it to one CPU with a separate "uncontended" atomic
variable.
- A set of improvements for the timer migration mechanism:
- Support imbalanced NUMA trees correctly
- Support dynamic exclusion of CPUs from the migrator duty to allow
the cpuset/isolation mechanism to exclude them from handling
timers of remote idle CPUs
- The usual small updates, cleanups and enhancements
* tag 'timers-core-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timers/migration: Exclude isolated cpus from hierarchy
cpumask: Add initialiser to use cleanup helpers
sched/isolation: Force housekeeping if isolcpus and nohz_full don't leave any
cgroup/cpuset: Rename update_unbound_workqueue_cpumask() to update_isolation_cpumasks()
timers/migration: Use scoped_guard on available flag set/clear
timers/migration: Add mask for CPUs available in the hierarchy
timers/migration: Rename 'online' bit to 'available'
selftests/timers/nanosleep: Add tests for return of remaining time
selftests/timers: Clean up kernel version check in posix_timers
time: Fix a few typos in time[r] related code comments
time: tick-oneshot: Add missing Return and parameter descriptions to kernel-doc
hrtimer: Store time as ktime_t in restart block
timers/migration: Remove dead code handling idle CPU checking for remote timers
timers/migration: Remove unused "cpu" parameter from tmigr_get_group()
timers/migration: Assert that hotplug preparing CPU is part of stable active hierarchy
timers/migration: Fix imbalanced NUMA trees
timers/migration: Remove locking on group connection
timers/migration: Convert "while" loops to use "for"
tick/sched: Limit non-timekeeper CPUs calling jiffies update
Linus Torvalds [Tue, 2 Dec 2025 17:54:27 +0000 (09:54 -0800)]
Merge tag 'timers-clocksource-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull clocksource updates from Thomas Gleixner:
"Updates for clocksource and clockevent drivers:
- A new driver for the Realtel system timer
- Prevent the unbinding of timers when the drivers do not support
that
- Expand the timer counter readout for the SPRD driver to 64 bit
to allow IOT devices suspend times of more than 36 hours, which
is the current limit of the 32-bi readout
- The usual small cleanups, fixes and enhancements all over the
place"
* tag 'timers-clocksource-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource/drivers: Add Realtek system timer driver
dt-bindings: timer: Add Realtek SYSTIMER
clocksource/drivers/stm32-lp: Drop unused module alias
clocksource/drivers/rda: Add sched_clock_register for RDA8810PL SoC
clocksource/drivers/nxp-stm: Prevent driver unbind
clocksource/drivers/nxp-pit: Prevent driver unbind
clocksource/drivers/arm_arch_timer_mmio: Prevent driver unbind
clocksource/drivers/nxp-stm: Fix section mismatches
clocksource/drivers/sh_cmt: Always leave device running after probe
clocksource/drivers/stm: Fix double deregistration on probe failure
clocksource/drivers/ralink: Fix resource leaks in init error path
clocksource/drivers/timer-sp804: Fix read_current_timer() issue when clock source is not registered
clocksource/drivers/sprd: Enable register for timer counter from 32 bit to 64 bit
Linus Torvalds [Tue, 2 Dec 2025 17:35:59 +0000 (09:35 -0800)]
Merge tag 'irq-msi-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull MSI updates from Thomas Gleixner:
"Updates for [PCI] MSI related code:
- Remove one variant of PCI/MSI management as all users have been
converted to use per device domains. That reduces the variants to
two:
The modern and the real archaic legacy variant, which keeps the
usual suspects in the museum category alive.
- Rework the platform MSI device ID detection mechanism in the ARM
GIC world to address resource leaks, duplicated code and other
details. This requires a corresponding preparatory step in the
PCI/iproc driver.
- Trivial core code cleanups"
* tag 'irq-msi-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic-its: Rework platform MSI deviceID detection
PCI: iproc: Implement MSI controller node detection with of_msi_xlate()
genirq/msi: Slightly simplify msi_domain_alloc()
PCI/MSI: Delete pci_msi_create_irq_domain()
Linus Torvalds [Tue, 2 Dec 2025 17:32:53 +0000 (09:32 -0800)]
Merge tag 'irq-drivers-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq driver updates from Thomas Gleixner:
"Boring updates for interrupt drivers:
- Support for a couple of new ARM64 and RISCV SoC variants and their
magic interrupt controllers which either can reuse existing code or
require quirks due to a botched hardware implementation
- More section mismatch fixes
- The usual cleanups and fixes all over the place"
* tag 'irq-drivers-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
irqchip/meson-gpio: Add support for Amlogic S6 S7 and S7D SoCs
dt-bindings: interrupt-controller: Add support for Amlogic S6 S7 and S7D SoCs
dt-bindings: interrupt-controller: aspeed,ast2700: Correct #interrupt-cells and interrupts count
irqchip/aclint-sswi: Add Nuclei UX900 support
dt-bindings: interrupt-controller: Add Anlogic DR1V90 ACLINT SSWI
dt-bindings: interrupt-controller: Add Anlogic DR1V90 ACLINT MSWI
dt-bindings: interrupt-controller: Add Anlogic DR1V90 PLIC
irqchip/irq-bcm7038-l1: Remove unused reg_mask_status()
irqchip/sifive-plic: Fix call to __plic_toggle() in M-Mode code path
irqchip/sifive-plic: Add support for UltraRISC DP1000 PLIC
irqchip/sifive-plic: Cache the interrupt enable state
dt-bindings: interrupt-controller: Add UltraRISC DP1000 PLIC
dt-bindings: vendor-prefixes: Add UltraRISC
irqchip/qcom-irq-combiner: Rename driver structure
irqchip/riscv-imsic: Inline imsic_vector_from_local_id()
irqchip/riscv-imsic: Embed the vector array in lpriv
irqchip/riscv-imsic: Remove redundant irq_data lookups
irqchip/ts4800: Drop unused module alias
irqchip/mvebu-pic: Drop unused module alias
irqchip/meson-gpio: Drop unused module alias
...
Linus Torvalds [Tue, 2 Dec 2025 17:14:26 +0000 (09:14 -0800)]
Merge tag 'irq-core-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq core updates from Thomas Gleixner:
"Updates for the interrupt core and treewide cleanups:
- Rework of the Per Processor Interrupt (PPI) management on ARM[64]
PPI support was built under the assumption that the systems are
homogenous so that the same CPU local device types are connected to
them. That's unfortunately wishful thinking and created horrible
workarounds.
This rework provides affinity management for PPIs so that they can
be individually configured in the firmware tables and mops up the
related drivers all over the place.
- Prevent CPUSET/isolation changes to arbitrarily affine interrupt
threads to random CPUs, which ignores user or driver settings.
- Plug a harmless race in the interrupt affinity proc interface,
which allows to see a half updated mask
- Adjust the priority of secondary interrupt threads on RT, so that
the combination of primary and secondary thread emulates the
hardware interrupt plus thread scenario. Having them at the same
priority can cause starvation issues in some drivers"
* tag 'irq-core-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
genirq: Remove cpumask availability check on kthread affinity setting
genirq: Fix interrupt threads affinity vs. cpuset isolated partitions
genirq: Prevent early spurious wake-ups of interrupt threads
genirq: Use raw_spinlock_irq() in irq_set_affinity_notifier()
genirq/manage: Reduce priority of forced secondary interrupt handler
genirq/proc: Fix race in show_irq_affinity()
genirq: Fix percpu_devid irq affinity documentation
perf: arm_pmu: Kill last use of per-CPU cpu_armpmu pointer
irqdomain: Kill of_node_to_fwnode() helper
genirq: Kill irq_{g,s}et_percpu_devid_partition()
irqchip: Kill irq-partition-percpu
irqchip/apple-aic: Drop support for custom PMU irq partitions
irqchip/gic-v3: Drop support for custom PPI partitions
coresight: trbe: Request specific affinities for per CPU interrupts
perf: arm_spe_pmu: Request specific affinities for per CPU interrupts
perf: arm_pmu: Request specific affinities for per CPU NMIs/interrupts
genirq: Add request_percpu_irq_affinity() helper
genirq: Allow per-cpu interrupt sharing for non-overlapping affinities
genirq: Update request_percpu_nmi() to take an affinity
genirq: Add affinity to percpu_devid interrupt requests
...
Linus Torvalds [Tue, 2 Dec 2025 17:07:48 +0000 (09:07 -0800)]
Merge tag 'core-debugobjects-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull debugobjects update from Thomas Gleixner:
"Two small updates for debugobjects:
- Allow pool refill on RT enabled kernels before the scheduler is up
and running to prevent pool exhaustion
- Correct the lockdep override to prevent false positives"
* tag 'core-debugobjects-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
debugobjects: Use LD_WAIT_CONFIG instead of LD_WAIT_SLEEP
debugobjects: Allow to refill the pool before SYSTEM_SCHEDULING
Linus Torvalds [Tue, 2 Dec 2025 16:48:53 +0000 (08:48 -0800)]
Merge tag 'core-rseq-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull rseq updates from Thomas Gleixner:
"A large overhaul of the restartable sequences and CID management:
The recent enablement of RSEQ in glibc resulted in regressions which
are caused by the related overhead. It turned out that the decision to
invoke the exit to user work was not really a decision. More or less
each context switch caused that. There is a long list of small issues
which sums up nicely and results in a 3-4% regression in I/O
benchmarks.
The other detail which caused issues due to extra work in context
switch and task migration is the CID (memory context ID) management.
It also requires to use a task work to consolidate the CID space,
which is executed in the context of an arbitrary task and results in
sporadic uncontrolled exit latencies.
The rewrite addresses this by:
- Removing deprecated and long unsupported functionality
- Moving the related data into dedicated data structures which are
optimized for fast path processing.
- Caching values so actual decisions can be made
- Replacing the current implementation with a optimized inlined
variant.
- Separating fast and slow path for architectures which use the
generic entry code, so that only fault and error handling goes into
the TIF_NOTIFY_RESUME handler.
- Rewriting the CID management so that it becomes mostly invisible in
the context switch path. That moves the work of switching modes
into the fork/exit path, which is a reasonable tradeoff. That work
is only required when a process creates more threads than the
cpuset it is allowed to run on or when enough threads exit after
that. An artificial thread pool benchmarks which triggers this did
not degrade, it actually improved significantly.
The main effect in migration heavy scenarios is that runqueue lock
held time and therefore contention goes down significantly"
* tag 'core-rseq-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
sched/mmcid: Switch over to the new mechanism
sched/mmcid: Implement deferred mode change
irqwork: Move data struct to a types header
sched/mmcid: Provide CID ownership mode fixup functions
sched/mmcid: Provide new scheduler CID mechanism
sched/mmcid: Introduce per task/CPU ownership infrastructure
sched/mmcid: Serialize sched_mm_cid_fork()/exit() with a mutex
sched/mmcid: Provide precomputed maximal value
sched/mmcid: Move initialization out of line
signal: Move MMCID exit out of sighand lock
sched/mmcid: Convert mm CID mask to a bitmap
cpumask: Cache num_possible_cpus()
sched/mmcid: Use cpumask_weighted_or()
cpumask: Introduce cpumask_weighted_or()
sched/mmcid: Prevent pointless work in mm_update_cpus_allowed()
sched/mmcid: Move scheduler code out of global header
sched: Fixup whitespace damage
sched/mmcid: Cacheline align MM CID storage
sched/mmcid: Use proper data structures
sched/mmcid: Revert the complex CID management
...
Linus Torvalds [Tue, 2 Dec 2025 16:01:39 +0000 (08:01 -0800)]
Merge tag 'core-uaccess-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scoped user access updates from Thomas Gleixner:
"Scoped user mode access and related changes:
- Implement the missing u64 user access function on ARM when
CONFIG_CPU_SPECTRE=n.
This makes it possible to access a 64bit value in generic code with
[unsafe_]get_user(). All other architectures and ARM variants
provide the relevant accessors already.
- Ensure that ASM GOTO jump label usage in the user mode access
helpers always goes through a local C scope label indirection
inside the helpers.
This is required because compilers are not supporting that a ASM
GOTO target leaves a auto cleanup scope. GCC silently fails to emit
the cleanup invocation and CLANG fails the build.
[ Editor's note: gcc-16 will have fixed the code generation issue
in commit f68fe3ddda4 ("eh: Invoke cleanups/destructors in asm
goto jumps [PR122835]"). But we obviously have to deal with clang
and older versions of gcc, so.. - Linus ]
This provides generic wrapper macros and the conversion of affected
architecture code to use them.
- Scoped user mode access with auto cleanup
Access to user mode memory can be required in hot code paths, but
if it has to be done with user controlled pointers, the access is
shielded with a speculation barrier, so that the CPU cannot
speculate around the address range check. Those speculation
barriers impact performance quite significantly.
This cost can be avoided by "masking" the provided pointer so it is
guaranteed to be in the valid user memory access range and
otherwise to point to a guaranteed unpopulated address space. This
has to be done without branches so it creates an address dependency
for the access, which the CPU cannot speculate ahead.
This results in repeating and error prone programming patterns:
if (can_do_masked_user_access())
from = masked_user_read_access_begin((from));
else if (!user_read_access_begin(from, sizeof(*from)))
return -EFAULT;
unsafe_get_user(val, from, Efault);
user_read_access_end();
return 0;
Efault:
user_read_access_end();
return -EFAULT;
which can be replaced with scopes and automatic cleanup:
- Convert code which implements the above pattern over to
scope_user.*.access(). This also corrects a couple of imbalanced
masked_*_begin() instances which are harmless on most
architectures, but prevent PowerPC from implementing the masking
optimization.
- Add a missing speculation barrier in copy_from_user_iter()"
* tag 'core-uaccess-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
lib/strn*,uaccess: Use masked_user_{read/write}_access_begin when required
scm: Convert put_cmsg() to scoped user access
iov_iter: Add missing speculation barrier to copy_from_user_iter()
iov_iter: Convert copy_from_user_iter() to masked user access
select: Convert to scoped user access
x86/futex: Convert to scoped user access
futex: Convert to get/put_user_inline()
uaccess: Provide put/get_user_inline()
uaccess: Provide scoped user access regions
arm64: uaccess: Use unsafe wrappers for ASM GOTO
s390/uaccess: Use unsafe wrappers for ASM GOTO
riscv/uaccess: Use unsafe wrappers for ASM GOTO
powerpc/uaccess: Use unsafe wrappers for ASM GOTO
x86/uaccess: Use unsafe wrappers for ASM GOTO
uaccess: Provide ASM GOTO safe wrappers for unsafe_*_user()
ARM: uaccess: Implement missing __get_user_asm_dword()
Linus Torvalds [Tue, 2 Dec 2025 05:33:01 +0000 (21:33 -0800)]
Merge tag 'core-bugs-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull bug handling infrastructure updates from Ingo Molnar:
"Core updates:
- Improve WARN(), which has vararg printf like arguments, to work
with the x86 #UD based WARN-optimizing infrastructure by hiding the
format in the bug_table and replacing this first argument with the
address of the bug-table entry, while making the actual function
that's called a UD1 instruction (Peter Zijlstra)
- Introduce the CONFIG_DEBUG_BUGVERBOSE_DETAILED Kconfig switch (Ingo
Molnar, s390 support by Heiko Carstens)
- <asm/bugs.h>: Make i386 use GENERIC_BUG_RELATIVE_POINTERS (Peter
Zijlstra)"
* tag 'core-bugs-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
x86/bugs: Make i386 use GENERIC_BUG_RELATIVE_POINTERS
x86/bug: Fix BUG_FORMAT vs KASLR
x86_64/bug: Inline the UD1
x86/bug: Implement WARN_ONCE()
x86_64/bug: Implement __WARN_printf()
x86/bug: Use BUG_FORMAT for DEBUG_BUGVERBOSE_DETAILED
x86/bug: Add BUG_FORMAT basics
bug: Allow architectures to provide __WARN_printf()
bug: Implement WARN_ON() using __WARN_FLAGS()
bug: Add report_bug_entry()
bug: Add BUG_FORMAT_ARGS infrastructure
bug: Clean up CONFIG_GENERIC_BUG_RELATIVE_POINTERS
bug: Add BUG_FORMAT infrastructure
x86: Rework __bug_table helpers
bugs/s390: Remove private WARN_ON() implementation
bugs/core: Reorganize fields in the first line of WARNING output, add ->comm[] output
bugs/sh: Concatenate 'cond_str' with '__FILE__' in __WARN_FLAGS(), to extend WARN_ON/BUG_ON output
bugs/parisc: Concatenate 'cond_str' with '__FILE__' in __WARN_FLAGS(), to extend WARN_ON/BUG_ON output
bugs/riscv: Concatenate 'cond_str' with '__FILE__' in __BUG_FLAGS(), to extend WARN_ON/BUG_ON output
bugs/riscv: Pass in 'cond_str' to __BUG_FLAGS()
...
* tag 'x86-core-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/dumpstack: Prevent KASAN false positive warnings in __show_regs()
x86/alternative: Drop not needed test after call of alt_replace_call()
* tag 'x86-apic-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic: Fix frequency in apic=verbose log output
x86/ioapic: Simplify mp_irqdomain_alloc() slightly
- Remove double update_rq_clock() in __set_cpus_allowed_ptr_locked()
(Hao Jia)
- Increase sched_tick_remote timeout (Phil Auld)
- sched/deadline: Use cpumask_weight_and() in dl_bw_cpus() (Shrikanth
Hegde)
- sched/deadline: Clean up select_task_rq_dl() (Shrikanth Hegde)"
* tag 'sched-core-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
sched: Provide and use set_need_resched_current()
sched/fair: Proportional newidle balance
sched/fair: Small cleanup to update_newidle_cost()
sched/fair: Small cleanup to sched_balance_newidle()
sched/fair: Revert max_newidle_lb_cost bump
sched/fair: Reimplement NEXT_BUDDY to align with EEVDF goals
sched/fair: Enable scheduler feature NEXT_BUDDY
sched: Increase sched_tick_remote timeout
sched/fair: Have SD_SERIALIZE affect newidle balancing
sched/fair: Skip sched_balance_running cmpxchg when balance is not due
sched/deadline: Minor cleanup in select_task_rq_dl()
sched/deadline: Use cpumask_weight_and() in dl_bw_cpus
sched/deadline: Document dl_server
sched/deadline: Fix dl_server stop condition
sched/deadline: Fix dl_server time accounting
sched/core: Remove double update_rq_clock() in __set_cpus_allowed_ptr_locked()
sched/eevdf: Fix min_vruntime vs avg_vruntime
sched/core: Add comment explaining force-idle vruntime snapshots
sched/core: Optimize core cookie matching check
sched/proxy: Yield the donor task
...
- Large series to prepare for and implement architectural PEBS
support for Intel platforms such as Clearwater Forest (CWF) and
Panther Lake (PTL). (Dapeng Mi, Kan Liang)
- Check dynamic constraints (Kan Liang)
- Optimize PEBS extended config (Peter Zijlstra)
- cstates:
- Remove PC3 support from LunarLake (Zhang Rui)
- Add Pantherlake support (Zhang Rui)
- Clearwater Forest support (Zide Chen)
AMD PMU driver:
- x86/amd: Check event before enable to avoid GPF (George Kennedy)
- perf/x86: Fix NULL event access and potential PEBS record loss
(Dapeng Mi)
- Misc other fixes and cleanups (Dapeng Mi, Ingo Molnar, Peter
Zijlstra)"
* tag 'perf-core-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
perf/x86/intel: Fix and clean up intel_pmu_drain_arch_pebs() type use
perf/x86/intel: Optimize PEBS extended config
perf/x86/intel: Check PEBS dyn_constraints
perf/x86/intel: Add a check for dynamic constraints
perf/x86/intel: Add counter group support for arch-PEBS
perf/x86/intel: Setup PEBS data configuration and enable legacy groups
perf/x86/intel: Update dyn_constraint base on PEBS event precise level
perf/x86/intel: Allocate arch-PEBS buffer and initialize PEBS_BASE MSR
perf/x86/intel: Process arch-PEBS records or record fragments
perf/x86/intel/ds: Factor out PEBS group processing code to functions
perf/x86/intel/ds: Factor out PEBS record processing code to functions
perf/x86/intel: Initialize architectural PEBS
perf/x86/intel: Correct large PEBS flag check
perf/x86/intel: Replace x86_pmu.drain_pebs calling with static call
perf/x86: Fix NULL event access and potential PEBS record loss
perf/x86: Remove redundant is_x86_event() prototype
entry,unwind/deferred: Fix unwind_reset_info() placement
unwind_user/x86: Fix arch=um build
perf: Support deferred user unwind
unwind_user/x86: Teach FP unwind about start of function
...
Introduce new objtool features and a klp-build script to generate
livepatch modules using a source .patch as input.
This builds on concepts from the longstanding out-of-tree kpatch
project which began in 2012 and has been used for many years to
generate livepatch modules for production kernels. However, this is a
complete rewrite which incorporates hard-earned lessons from 12+
years of maintaining kpatch.
Key improvements compared to kpatch-build:
- Integrated with objtool: Leverages objtool's existing control-flow
graph analysis to help detect changed functions.
- Works on vmlinux.o: Supports late-linked objects, making it
compatible with LTO, IBT, and similar.
- Simplified code base: ~3k fewer lines of code.
- Upstream: No more out-of-tree #ifdef hacks, far less cruft.
- Cleaner internals: Vastly simplified logic for
symbol/section/reloc inclusion and special section extraction.
- Robust __LINE__ macro handling: Avoids false positive binary diffs
caused by the __LINE__ macro by introducing a fix-patch-lines
script which injects #line directives into the source .patch to
preserve the original line numbers at compile time.
- Disassemble code with libopcodes instead of running objdump
(Alexandre Chartre)
- Disassemble support (-d option to objtool) by Alexandre Chartre,
which supports the decoding of various Linux kernel code generation
specials such as alternatives:
- Function validation tracing support (Alexandre Chartre)
- Various -ffunction-sections fixes (Josh Poimboeuf)
- Clang AutoFDO (Automated Feedback-Directed Optimizations) support
(Josh Poimboeuf)
- Misc fixes and cleanups (Borislav Petkov, Chen Ni, Dylan Hatch, Ingo
Molnar, John Wang, Josh Poimboeuf, Pankaj Raghav, Peter Zijlstra,
Thorsten Blum)
* tag 'objtool-core-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (129 commits)
objtool: Fix segfault on unknown alternatives
objtool: Build with disassembly can fail when including bdf.h
objtool: Trim trailing NOPs in alternative
objtool: Add wide output for disassembly
objtool: Compact output for alternatives with one instruction
objtool: Improve naming of group alternatives
objtool: Add Function to get the name of a CPU feature
objtool: Provide access to feature and flags of group alternatives
objtool: Fix address references in alternatives
objtool: Disassemble jump table alternatives
objtool: Disassemble exception table alternatives
objtool: Print addresses with alternative instructions
objtool: Disassemble group alternatives
objtool: Print headers for alternatives
objtool: Preserve alternatives order
objtool: Add the --disas=<function-pattern> action
objtool: Do not validate IBT for .return_sites and .call_sites
objtool: Improve tracing of alternative instructions
objtool: Add functions to better name alternatives
objtool: Identify the different types of alternatives
...
- lock: Add a Pin<&mut T> accessor (Daniel Almeida)"
* tag 'locking-core-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/local_lock: Fix all kernel-doc warnings
locking/local_lock: s/l/__l/ and s/tl/__tl/ to reduce the risk of shadowing
locking/local_lock: Add the <linux/local_lock*.h> headers to MAINTAINERS
locking/mutex: Redo __mutex_init() to reduce generated code size
rust: debugfs: Replace the usage of Rust native atomics
rust: sync: atomic: Implement Debug for Atomic<Debug>
rust: sync: atomic: Make Atomic*Ops pub(crate)
seqlock: Allow KASAN to fail optimizing
rust: debugfs: Implement Reader for Mutex<T> only when T is Unpin
seqlock: Change do_io_accounting() to use scoped_seqlock_read()
seqlock: Change do_task_stat() to use scoped_seqlock_read()
seqlock: Change thread_group_cputime() to use scoped_seqlock_read()
seqlock: Introduce scoped_seqlock_read()
documentation: seqlock: fix the wrong documentation of read_seqbegin_or_lock/need_seqretry
atomic: Skip alignment check for try_cmpxchg() old arg
rust: lock: Add a Pin<&mut T> accessor
rust: lock: Pin the inner data
rust: lock: guard: Add T: Unpin bound to DerefMut
locking/spinlock/debug: Fix data-race in do_raw_write_lock
Linus Torvalds [Tue, 2 Dec 2025 01:32:07 +0000 (17:32 -0800)]
Merge tag 'vfs-6.19-rc1.fd_prepare.fs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull fd prepare updates from Christian Brauner:
"This adds the FD_ADD() and FD_PREPARE() primitive. They simplify the
common pattern of get_unused_fd_flags() + create file + fd_install()
that is used extensively throughout the kernel and currently requires
cumbersome cleanup paths.
FD_ADD() - For simple cases where a file is installed immediately:
The primitives are centered around struct fd_prepare. FD_PREPARE()
encapsulates all allocation and cleanup logic and must be followed by
a call to fd_publish() which associates the fd with the file and
installs it into the caller's fdtable. If fd_publish() isn't called,
both are deallocated automatically. FD_ADD() is a shorthand that does
fd_publish() immediately and never exposes the struct to the caller.
I've implemented this in a way that it's compatible with the cleanup
infrastructure while also being usable separately. IOW, it's centered
around struct fd_prepare which is aliased to class_fd_prepare_t and so
we can make use of all the basica guard infrastructure"
* tag 'vfs-6.19-rc1.fd_prepare.fs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (42 commits)
io_uring: convert io_create_mock_file() to FD_PREPARE()
file: convert replace_fd() to FD_PREPARE()
vfio: convert vfio_group_ioctl_get_device_fd() to FD_ADD()
tty: convert ptm_open_peer() to FD_ADD()
ntsync: convert ntsync_obj_get_fd() to FD_PREPARE()
media: convert media_request_alloc() to FD_PREPARE()
hv: convert mshv_ioctl_create_partition() to FD_ADD()
gpio: convert linehandle_create() to FD_PREPARE()
pseries: port papr_rtas_setup_file_interface() to FD_ADD()
pseries: convert papr_platform_dump_create_handle() to FD_ADD()
spufs: convert spufs_gang_open() to FD_PREPARE()
papr-hvpipe: convert papr_hvpipe_dev_create_handle() to FD_PREPARE()
spufs: convert spufs_context_open() to FD_PREPARE()
net/socket: convert __sys_accept4_file() to FD_ADD()
net/socket: convert sock_map_fd() to FD_ADD()
net/kcm: convert kcm_ioctl() to FD_PREPARE()
net/handshake: convert handshake_nl_accept_doit() to FD_PREPARE()
secretmem: convert memfd_secret() to FD_ADD()
memfd: convert memfd_create() to FD_ADD()
bpf: convert bpf_token_create() to FD_PREPARE()
...
Linus Torvalds [Tue, 2 Dec 2025 00:38:21 +0000 (16:38 -0800)]
Merge tag 'vfs-6.19-rc1.autofs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull autofs update from Christian Brauner:
"Prevent futile mount triggers in private mount namespaces.
Fix a problematic loop in autofs when a mount namespace contains
autofs mounts that are propagation private and there is no
namespace-specific automount daemon to handle possible automounting.
Previously, attempted path resolution would loop until MAXSYMLINKS was
reached before failing, causing significant noise in the log.
The fix adds a check in autofs ->d_automount() so that the VFS can
immediately return EPERM in this case. Since the mount is propagation
private, EPERM is the most appropriate error code"
* tag 'vfs-6.19-rc1.autofs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
autofs: dont trigger mount if it cant succeed
Linus Torvalds [Tue, 2 Dec 2025 00:31:21 +0000 (16:31 -0800)]
Merge tag 'vfs-6.19-rc1.ovl' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull overlayfs cred guard conversion from Christian Brauner:
"This converts all of overlayfs to use credential guards, eliminating
manual credential management throughout the filesystem.
Credential guard conversion:
- Convert all of overlayfs to use credential guards, replacing the
manual ovl_override_creds()/ovl_revert_creds() pattern with scoped
guards.
This makes credential handling visually explicit and eliminates a
class of potential bugs from mismatched override/revert calls.
Introduced a specialized guard for file creation operations
that handles the two-phase credential override (mounter
credentials, then fs{g,u}id override). The new pattern is much
clearer:
- Introduce struct ovl_renamedata to simplify rename handling
- Don't override credentials for ovl_check_whiteouts() (unnecessary)
- Remove unneeded semicolon"
* tag 'vfs-6.19-rc1.ovl' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (54 commits)
ovl: remove unneeded semicolon
ovl: remove struct ovl_cu_creds and associated functions
ovl: port ovl_copy_up_tmpfile() to cred guard
ovl: mark *_cu_creds() as unused temporarily
ovl: port ovl_copy_up_workdir() to cred guard
ovl: add copy up credential guard
ovl: drop ovl_setup_cred_for_create()
ovl: port ovl_create_or_link() to new ovl_override_creator_creds cleanup guard
ovl: mark ovl_setup_cred_for_create() as unused temporarily
ovl: reflow ovl_create_or_link()
ovl: port ovl_create_tmpfile() to new ovl_override_creator_creds cleanup guard
ovl: add ovl_override_creator_creds cred guard
ovl: remove ovl_revert_creds()
ovl: port ovl_fill_super() to cred guard
ovl: refactor ovl_fill_super()
ovl: port ovl_lower_positive() to cred guard
ovl: port ovl_lookup() to cred guard
ovl: refactor ovl_lookup()
ovl: port ovl_copyfile() to cred guard
ovl: port ovl_rename() to cred guard
...
Linus Torvalds [Tue, 2 Dec 2025 00:13:46 +0000 (16:13 -0800)]
Merge tag 'vfs-6.19-rc1.directory.locking' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull directory locking updates from Christian Brauner:
"This contains the work to add centralized APIs for directory locking
operations.
This series is part of a larger effort to change directory operation
locking to allow multiple concurrent operations in a directory. The
ultimate goal is to lock the target dentry(s) rather than the whole
parent directory.
To help with changing the locking protocol, this series centralizes
locking and lookup in new helper functions. The helpers establish a
pattern where it is the dentry that is being locked and unlocked
(currently the lock is held on dentry->d_parent->d_inode, but that can
change in the future).
This also changes vfs_mkdir() to unlock the parent on failure, as well
as dput()ing the dentry. This allows end_creating() to only require
the target dentry (which may be IS_ERR() after vfs_mkdir()), not the
parent"
* tag 'vfs-6.19-rc1.directory.locking' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
nfsd: fix end_creating() conversion
VFS: introduce end_creating_keep()
VFS: change vfs_mkdir() to unlock on failure.
ecryptfs: use new start_creating/start_removing APIs
Add start_renaming_two_dentries()
VFS/ovl/smb: introduce start_renaming_dentry()
VFS/nfsd/ovl: introduce start_renaming() and end_renaming()
VFS: add start_creating_killable() and start_removing_killable()
VFS: introduce start_removing_dentry()
smb/server: use end_removing_noperm for for target of smb2_create_link()
VFS: introduce start_creating_noperm() and start_removing_noperm()
VFS/nfsd/cachefiles/ovl: introduce start_removing() and end_removing()
VFS/nfsd/cachefiles/ovl: add start_creating() and end_creating()
VFS: tidy up do_unlinkat()
VFS: introduce start_dirop() and end_dirop()
debugfs: rename end_creating() to debugfs_end_creating()
Linus Torvalds [Mon, 1 Dec 2025 23:34:41 +0000 (15:34 -0800)]
Merge tag 'vfs-6.19-rc1.directory.delegations' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull directory delegations update from Christian Brauner:
"This contains the work for recall-only directory delegations for
knfsd.
Add support for simple, recallable-only directory delegations. This
was decided at the fall NFS Bakeathon where the NFS client and server
maintainers discussed how to merge directory delegation support.
The approach starts with recallable-only delegations for several reasons:
1. RFC8881 has gaps that are being addressed in RFC8881bis. In
particular, it requires directory position information for
CB_NOTIFY callbacks, which is difficult to implement properly
under Linux. The spec is being extended to allow that information
to be omitted.
2. Client-side support for CB_NOTIFY still lags. The client side
involves heuristics about when to request a delegation.
3. Early indication shows simple, recallable-only delegations can
help performance. Anna Schumaker mentioned seeing a multi-minute
speedup in xfstests runs with them enabled.
With these changes, userspace can also request a read lease on a
directory that will be recalled on conflicting accesses. This may be
useful for applications like Samba. Users can disable leases
altogether via the fs.leases-enable sysctl if needed.
VFS changes:
- Dedicated Type for Delegations
Introduce struct delegated_inode to track inodes that may have
delegations that need to be broken. This replaces the previous
approach of passing raw inode pointers through the delegation
breaking code paths, providing better type safety and clearer
semantics for the delegation machinery.
- Break parent directory delegations in open(..., O_CREAT) codepath
- Allow mkdir to wait for delegation break on parent
- Allow rmdir to wait for delegation break on parent
- Add try_break_deleg calls for parents to vfs_link(), vfs_rename(),
and vfs_unlink()
- Make vfs_create(), vfs_mknod(), and vfs_symlink() break delegations
on parent directory
- Clean up argument list for vfs_create()
- Expose delegation support to userland
Filelock changes:
- Make lease_alloc() take a flags argument
- Rework the __break_lease API to use flags
- Add struct delegated_inode
- Push the S_ISREG check down to ->setlease handlers
- Lift the ban on directory leases in generic_setlease
NFSD changes:
- Allow filecache to hold S_IFDIR files
- Allow DELEGRETURN on directories
- Wire up GET_DIR_DELEGATION handling
Fixes:
- Fix kernel-doc warnings in __fcntl_getlease
- Add needed headers for new struct delegation definition"
* tag 'vfs-6.19-rc1.directory.delegations' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
vfs: add needed headers for new struct delegation definition
filelock: __fcntl_getlease: fix kernel-doc warnings
vfs: expose delegation support to userland
nfsd: wire up GET_DIR_DELEGATION handling
nfsd: allow DELEGRETURN on directories
nfsd: allow filecache to hold S_IFDIR files
filelock: lift the ban on directory leases in generic_setlease
vfs: make vfs_symlink break delegations on parent dir
vfs: make vfs_mknod break delegations on parent directory
vfs: make vfs_create break delegations on parent directory
vfs: clean up argument list for vfs_create()
vfs: break parent dir delegations in open(..., O_CREAT) codepath
vfs: allow rmdir to wait for delegation break on parent
vfs: allow mkdir to wait for delegation break on parent
vfs: add try_break_deleg calls for parents to vfs_{link,rename,unlink}
filelock: push the S_ISREG check down to ->setlease handlers
filelock: add struct delegated_inode
filelock: rework the __break_lease API to use flags
filelock: make lease_alloc() take a flags argument
Linus Torvalds [Mon, 1 Dec 2025 23:22:40 +0000 (15:22 -0800)]
Merge tag 'vfs-6.19-rc1.minix' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull minix fixes from Christian Brauner:
"Fix two syzbot corruption bugs in the minix filesystem.
Syzbot fuzzes filesystems by trying to mount and manipulate
deliberately corrupted images. This should not lead to BUG_ONs and
WARN_ONs for easy to detect corruptions.
- Add error handling to minix filesystem for inode corruption
detection, enabling the filesystem to report such corruptions
cleanly.
- Fix a drop_nlink warning in minix_rmdir() triggered by corrupted
directory link counts.
- Fix a drop_nlink warning in minix_rename() triggered by corrupted
inode link counts"
* tag 'vfs-6.19-rc1.minix' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
Fix a drop_nlink warning in minix_rename
Fix a drop_nlink warning in minix_rmdir
Add error handling to minix filesystem for inode corruption detection
Linus Torvalds [Mon, 1 Dec 2025 22:39:03 +0000 (14:39 -0800)]
Merge tag 'vfs-6.19-rc1.guards' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull superblock lock guard updates from Christian Brauner:
"This starts the work of introducing guards for superblock related
locks.
Introduce super_write_guard for scoped superblock write protection.
This provides a guard-based alternative to the manual sb_start_write()
and sb_end_write() pattern, allowing the compiler to automatically
handle the cleanup"
* tag 'vfs-6.19-rc1.guards' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
xfs: use super write guard in xfs_file_ioctl()
open: use super write guard in do_ftruncate()
btrfs: use super write guard in relocating_repair_kthread()
ext4: use super write guard in write_mmp_block()
btrfs: use super write guard in sb_start_write()
btrfs: use super write guard btrfs_run_defrag_inode()
btrfs: use super write guard in btrfs_reclaim_bgs_work()
fs: add super_write_guard
Linus Torvalds [Mon, 1 Dec 2025 22:18:01 +0000 (14:18 -0800)]
Merge tag 'vfs-6.19-rc1.fs_header' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull fs header updates from Christian Brauner:
"This contains initial work to start splitting up fs.h.
Begin the long-overdue work of splitting up the monolithic fs.h
header. The header has grown to over 3000 lines and includes types and
functions for many different subsystems, making it difficult to
navigate and causing excessive compilation dependencies.
This series introduces new focused headers for superblock-related
code:
- Rename fs_types.h to fs_dirent.h to better reflect its actual
content (directory entry types)
- Add fs/super_types.h containing superblock type definitions
- Add fs/super.h containing superblock function declarations
This is the first step in a longer effort to modularize the VFS
headers.
Cleanups:
- Inode Field Layout Optimization (Mateusz Guzik)
Move inode fields used during fast path lookup closer together to
improve cache locality during path resolution.
- current_umask() Optimization (Mateusz Guzik)
Inline current_umask() and move it to fs_struct.h. This improves
performance by avoiding function call overhead for this
frequently-used function, and places it in a more appropriate
header since it operates on fs_struct"
* tag 'vfs-6.19-rc1.fs_header' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: move inode fields used during fast path lookup closer together
fs: inline current_umask() and move it to fs_struct.h
fs: add fs/super.h header
fs: add fs/super_types.h header
fs: rename fs_types.h to fs_dirent.h
Linus Torvalds [Mon, 1 Dec 2025 21:45:41 +0000 (13:45 -0800)]
Merge tag 'kernel-6.19-rc1.cred' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull cred guard updates from Christian Brauner:
"This contains substantial credential infrastructure improvements
adding guard-based credential management that simplifies code and
eliminates manual reference counting in many subsystems.
Features:
- Kernel Credential Guards
Add with_kernel_creds() and scoped_with_kernel_creds() guards that
allow using the kernel credentials without allocating and copying
them. This was requested by Linus after seeing repeated
prepare_kernel_creds() calls that duplicate the kernel credentials
only to drop them again later.
The new guards completely avoid the allocation and never expose the
temporary variable to hold the kernel credentials anywhere in
callers.
- Generic Credential Guards
Add scoped_with_creds() guards for the common override_creds() and
revert_creds() pattern. This builds on earlier work that made
override_creds()/revert_creds() completely reference count free.
- Prepare Credential Guards
Add prepare credential guards for the more complex pattern of
preparing a new set of credentials and overriding the current
credentials with them:
- prepare_creds()
- modify new creds
- override_creds()
- revert_creds()
- put_cred()
Cleanups:
- Make init_cred static since it should not be directly accessed
- Add kernel_cred() helper to properly access the kernel credentials
- Fix scoped_class() macro that was introduced two cycles ago
- coredump: split out do_coredump() from vfs_coredump() for cleaner
credential handling
- coredump: move revert_cred() before coredump_cleanup()
- coredump: mark struct mm_struct as const
- coredump: pass struct linux_binfmt as const
- sev-dev: use guard for path"
* tag 'kernel-6.19-rc1.cred' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (36 commits)
trace: use override credential guard
trace: use prepare credential guard
coredump: use override credential guard
coredump: use prepare credential guard
coredump: split out do_coredump() from vfs_coredump()
coredump: mark struct mm_struct as const
coredump: pass struct linux_binfmt as const
coredump: move revert_cred() before coredump_cleanup()
sev-dev: use override credential guards
sev-dev: use prepare credential guard
sev-dev: use guard for path
cred: add prepare credential guard
net/dns_resolver: use credential guards in dns_query()
cgroup: use credential guards in cgroup_attach_permissions()
act: use credential guards in acct_write_process()
smb: use credential guards in cifs_get_spnego_key()
nfs: use credential guards in nfs_idmap_get_key()
nfs: use credential guards in nfs_local_call_write()
nfs: use credential guards in nfs_local_call_read()
erofs: use credential guards
...
Linus Torvalds [Mon, 1 Dec 2025 18:26:38 +0000 (10:26 -0800)]
Merge tag 'vfs-6.19-rc1.folio' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull folio updates from Christian Brauner:
"Add a new folio_next_pos() helper function that returns the file
position of the first byte after the current folio. This is a common
operation in filesystems when needing to know the end of the current
folio.
The helper is lifted from btrfs which already had its own version, and
is now used across multiple filesystems and subsystems:
- btrfs
- buffer
- ext4
- f2fs
- gfs2
- iomap
- netfs
- xfs
- mm
This fixes a long-standing bug in ocfs2 on 32-bit systems with files
larger than 2GiB. Presumably this is not a common configuration, but
the fix is backported anyway. The other filesystems did not have bugs,
they were just mildly inefficient.
This also introduce uoff_t as the unsigned version of loff_t. A recent
commit inadvertently changed a comparison from being unsigned (on
64-bit systems) to being signed (which it had always been on 32-bit
systems), leading to sporadic fstests failures.
Generally file sizes are restricted to being a signed integer, but in
places where -1 is passed to indicate "up to the end of the file", it
is convenient to have an unsigned type to ensure comparisons are
always unsigned regardless of architecture"
* tag 'vfs-6.19-rc1.folio' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: Add uoff_t
mm: Use folio_next_pos()
xfs: Use folio_next_pos()
netfs: Use folio_next_pos()
iomap: Use folio_next_pos()
gfs2: Use folio_next_pos()
f2fs: Use folio_next_pos()
ext4: Use folio_next_pos()
buffer: Use folio_next_pos()
btrfs: Use folio_next_pos()
filemap: Add folio_next_pos()
Linus Torvalds [Mon, 1 Dec 2025 18:17:39 +0000 (10:17 -0800)]
Merge tag 'vfs-6.19-rc1.coredump' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull pidfd and coredump updates from Christian Brauner:
"Features:
- Expose coredump signal via pidfd
Expose the signal that caused the coredump through the pidfd
interface. The recent changes to rework coredump handling to rely
on unix sockets are in the process of being used in systemd. The
previous systemd coredump container interface requires the coredump
file descriptor and basic information including the signal number
to be sent to the container. This means the signal number needs to
be available before sending the coredump to the container.
- Add supported_mask field to pidfd
Add a new supported_mask field to struct pidfd_info that indicates
which information fields are supported by the running kernel. This
allows userspace to detect feature availability without relying on
error codes or kernel version checks.
Cleanups:
- Drop struct pidfs_exit_info and prepare to drop exit_info pointer,
simplifying the internal publication mechanism for exit and
coredump information retrievable via the pidfd ioctl
- Use guard() for task_lock in pidfs
- Reduce wait_pidfd lock scope
- Add missing PIDFD_INFO_SIZE_VER1 constant
- Add missing BUILD_BUG_ON() assert on struct pidfd_info
Fixes:
- Fix PIDFD_INFO_COREDUMP handling
Selftests:
- Split out coredump socket tests and common helpers into separate
files for better organization
- Fix userspace coredump client detection issues
- Handle edge-triggered epoll correctly
- Ignore ENOSPC errors in tests
- Add debug logging to coredump socket tests, socket protocol tests,
and test helpers
- Add tests for PIDFD_INFO_COREDUMP_SIGNAL
- Add tests for supported_mask field
- Update pidfd header for selftests"
* tag 'vfs-6.19-rc1.coredump' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (23 commits)
pidfs: reduce wait_pidfd lock scope
selftests/coredump: add second PIDFD_INFO_COREDUMP_SIGNAL test
selftests/coredump: add first PIDFD_INFO_COREDUMP_SIGNAL test
selftests/coredump: ignore ENOSPC errors
selftests/coredump: add debug logging to coredump socket protocol tests
selftests/coredump: add debug logging to coredump socket tests
selftests/coredump: add debug logging to test helpers
selftests/coredump: handle edge-triggered epoll correctly
selftests/coredump: fix userspace coredump client detection
selftests/coredump: fix userspace client detection
selftests/coredump: split out coredump socket tests
selftests/coredump: split out common helpers
selftests/pidfd: add second supported_mask test
selftests/pidfd: add first supported_mask test
selftests/pidfd: update pidfd header
pidfs: expose coredump signal
pidfs: drop struct pidfs_exit_info
pidfs: prepare to drop exit_info pointer
pidfd: add a new supported_mask field
pidfs: add missing BUILD_BUG_ON() assert on struct pidfd_info
...
Linus Torvalds [Mon, 1 Dec 2025 17:47:41 +0000 (09:47 -0800)]
Merge tag 'namespace-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull namespace updates from Christian Brauner:
"This contains substantial namespace infrastructure changes including a new
system call, active reference counting, and extensive header cleanups.
The branch depends on the shared kbuild branch for -fms-extensions support.
Features:
- listns() system call
Add a new listns() system call that allows userspace to iterate
through namespaces in the system. This provides a programmatic
interface to discover and inspect namespaces, addressing
longstanding limitations:
Currently, there is no direct way for userspace to enumerate
namespaces. Applications must resort to scanning /proc/*/ns/ across
all processes, which is:
- Inefficient - requires iterating over all processes
- Incomplete - misses namespaces not attached to any running
process but kept alive by file descriptors, bind mounts, or
parent references
- Permission-heavy - requires access to /proc for many processes
- No ordering or ownership information
- No filtering per namespace type
Features include:
- Pagination support for large namespace sets
- Filtering by namespace type (MNT_NS, NET_NS, USER_NS, etc.)
- Filtering by owning user namespace
- Permission checks respecting namespace isolation
- Active Reference Counting
Introduce an active reference count that tracks namespace
visibility to userspace. A namespace is visible in the following
cases:
- The namespace is in use by a task
- The namespace is persisted through a VFS object (namespace file
descriptor or bind-mount)
- The namespace is a hierarchical type and is the parent of child
namespaces
The active reference count does not regulate lifetime (that's still
done by the normal reference count) - it only regulates visibility
to namespace file handles and listns().
This prevents resurrection of namespaces that are pinned only for
internal kernel reasons (e.g., user namespaces held by
file->f_cred, lazy TLB references on idle CPUs, etc.) which should
not be accessible via (1)-(3).
- Unified Namespace Tree
Introduce a unified tree structure for all namespaces with:
- Fixed IDs assigned to initial namespaces
- Lookup based solely on inode number
- Maintained list of owned namespaces per user namespace
- Simplified rbtree comparison helpers
Cleanups
- Header Reorganization:
- Move namespace types into separate header (ns_common_types.h)
- Decouple nstree from ns_common header
- Move nstree types into separate header
- Switch to new ns_tree_{node,root} structures with helper functions
- Use guards for ns_tree_lock
- Initial Namespace Reference Count Optimization
- Make all reference counts on initial namespaces a nop to avoid
pointless cacheline ping-pong for namespaces that can never go
away
- Drop custom reference count initialization for initial namespaces
- Add NS_COMMON_INIT() macro and use it for all namespaces
- pid: rely on common reference count behavior
- Miscellaneous Cleanups
- Rename exit_task_namespaces() to exit_nsproxy_namespaces()
- Rename is_initial_namespace() and make argument const
- Use boolean to indicate anonymous mount namespace
- Simplify owner list iteration in nstree
- nsfs: raise SB_I_NODEV, SB_I_NOEXEC, and DCACHE_DONTCACHE explicitly
- nsfs: use inode_just_drop()
- pidfs: raise DCACHE_DONTCACHE explicitly
- pidfs: simplify PIDFD_GET__NAMESPACE ioctls
- libfs: allow to specify s_d_flags
- cgroup: add cgroup namespace to tree after owner is set
- nsproxy: fix free_nsproxy() and simplify create_new_namespaces()
Fixes:
- setns(pidfd, ...) race condition
Fix a subtle race when using pidfds with setns(). When the target
task exits after prepare_nsset() but before commit_nsset(), the
namespace's active reference count might have been dropped. If
setns() then installs the namespaces, it would bump the active
reference count from zero without taking the required reference on
the owner namespace, leading to underflow when later decremented.
The fix resurrects the ownership chain if necessary - if the caller
succeeded in grabbing passive references, the setns() should
succeed even if the target task exits or gets reaped.
- Return EFAULT on put_user() error instead of success
- Make sure references are dropped outside of RCU lock (some
namespaces like mount namespace sleep when putting the last
reference)
- Don't skip active reference count initialization for network
namespace
- Add asserts for active refcount underflow
- Add asserts for initial namespace reference counts (both passive
and active)
- ipc: enable is_ns_init_id() assertions
- Fix kernel-doc comments for internal nstree functions
- Selftests
- 15 active reference count tests
- 9 listns() functionality tests
- 7 listns() permission tests
- 12 inactive namespace resurrection tests
- 3 threaded active reference count tests
- commit_creds() active reference tests
- Pagination and stress tests
- EFAULT handling test
- nsid tests fixes"
* tag 'namespace-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (103 commits)
pidfs: simplify PIDFD_GET_<type>_NAMESPACE ioctls
nstree: fix kernel-doc comments for internal functions
nsproxy: fix free_nsproxy() and simplify create_new_namespaces()
selftests/namespaces: fix nsid tests
ns: drop custom reference count initialization for initial namespaces
pid: rely on common reference count behavior
ns: add asserts for initial namespace active reference counts
ns: add asserts for initial namespace reference counts
ns: make all reference counts on initial namespace a nop
ipc: enable is_ns_init_id() assertions
fs: use boolean to indicate anonymous mount namespace
ns: rename is_initial_namespace()
ns: make is_initial_namespace() argument const
nstree: use guards for ns_tree_lock
nstree: simplify owner list iteration
nstree: switch to new structures
nstree: add helper to operate on struct ns_tree_{node,root}
nstree: move nstree types into separate header
nstree: decouple from ns_common header
ns: move namespace types into separate header
...
Linus Torvalds [Mon, 1 Dec 2025 17:20:51 +0000 (09:20 -0800)]
Merge tag 'vfs-6.19-rc1.writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull writeback updates from Christian Brauner:
"Features:
- Allow file systems to increase the minimum writeback chunk size.
The relatively low minimal writeback size of 4MiB means that
written back inodes on rotational media are switched a lot. Besides
introducing additional seeks, this also can lead to extreme file
fragmentation on zoned devices when a lot of files are cached
relative to the available writeback bandwidth.
This adds a superblock field that allows the file system to
override the default size, and sets it to the zone size for zoned
XFS.
- Add logging for slow writeback when it exceeds
sysctl_hung_task_timeout_secs. This helps identify tasks waiting
for a long time and pinpoint potential issues. Recording the
starting jiffies is also useful when debugging a crashed vmcore.
- Wake up waiting tasks when finishing the writeback of a chunk
Cleanups:
- filemap_* writeback interface cleanups.
Adding filemap_fdatawrite_wbc ended up being a mistake, as all but
the original btrfs caller should be using better high level
interfaces instead.
This series removes all these low-level interfaces, switches btrfs
to a more specific interface, and cleans up other too low-level
interfaces. With this the writeback_control that is passed to the
writeback code is only initialized in three places.
- Remove __filemap_fdatawrite, __filemap_fdatawrite_range, and
filemap_fdatawrite_wbc
- Add filemap_flush_nr helper for btrfs
- Push struct writeback_control into start_delalloc_inodes in btrfs
- Rename filemap_fdatawrite_range_kick to filemap_flush_range
- Stop opencoding filemap_fdatawrite_range in 9p, ocfs2, and mm
- Make wbc_to_tag() inline and use it in fs"
* tag 'vfs-6.19-rc1.writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: Make wbc_to_tag() inline and use it in fs.
xfs: set s_min_writeback_pages for zoned file systems
writeback: allow the file system to override MIN_WRITEBACK_PAGES
writeback: cleanup writeback_chunk_size
mm: rename filemap_fdatawrite_range_kick to filemap_flush_range
mm: remove __filemap_fdatawrite_range
mm: remove filemap_fdatawrite_wbc
mm: remove __filemap_fdatawrite
mm,btrfs: add a filemap_flush_nr helper
btrfs: push struct writeback_control into start_delalloc_inodes
btrfs: use the local tmp_inode variable in start_delalloc_inodes
ocfs2: don't opencode filemap_fdatawrite_range in ocfs2_journal_submit_inode_data_buffers
9p: don't opencode filemap_fdatawrite_range in v9fs_mmap_vm_close
mm: don't opencode filemap_fdatawrite_range in filemap_invalidate_inode
writeback: Add logging for slow writeback (exceeds sysctl_hung_task_timeout_secs)
writeback: Wake up waiting tasks when finishing the writeback of a chunk.
Linus Torvalds [Mon, 1 Dec 2025 17:02:34 +0000 (09:02 -0800)]
Merge tag 'vfs-6.19-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs inode updates from Christian Brauner:
"Features:
- Hide inode->i_state behind accessors. Open-coded accesses prevent
asserting they are done correctly. One obvious aspect is locking,
but significantly more can be checked. For example it can be
detected when the code is clearing flags which are already missing,
or is setting flags when it is illegal (e.g., I_FREEING when
->i_count > 0)
- Provide accessors for ->i_state, converts all filesystems using
coccinelle and manual conversions (btrfs, ceph, smb, f2fs, gfs2,
overlayfs, nilfs2, xfs), and makes plain ->i_state access fail to
compile
- Rework I_NEW handling to operate without fences, simplifying the
code after the accessor infrastructure is in place
Cleanups:
- Move wait_on_inode() from writeback.h to fs.h
- Spell out fenced ->i_state accesses with explicit smp_wmb/smp_rmb
for clarity
- Cosmetic fixes to LRU handling
- Push list presence check into inode_io_list_del()
- Touch up predicts in __d_lookup_rcu()
- ocfs2: retire ocfs2_drop_inode() and I_WILL_FREE usage
- Assert on ->i_count in iput_final()
- Assert ->i_lock held in __iget()
Fixes:
- Add missing fences to I_NEW handling"
* tag 'vfs-6.19-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (22 commits)
dcache: touch up predicts in __d_lookup_rcu()
fs: push list presence check into inode_io_list_del()
fs: cosmetic fixes to lru handling
fs: rework I_NEW handling to operate without fences
fs: make plain ->i_state access fail to compile
xfs: use the new ->i_state accessors
nilfs2: use the new ->i_state accessors
overlayfs: use the new ->i_state accessors
gfs2: use the new ->i_state accessors
f2fs: use the new ->i_state accessors
smb: use the new ->i_state accessors
ceph: use the new ->i_state accessors
btrfs: use the new ->i_state accessors
Manual conversion to use ->i_state accessors of all places not covered by coccinelle
Coccinelle-based conversion to use ->i_state accessors
fs: provide accessors for ->i_state
fs: spell out fenced ->i_state accesses with explicit smp_wmb/smp_rmb
fs: move wait_on_inode() from writeback.h to fs.h
fs: add missing fences to I_NEW handling
ocfs2: retire ocfs2_drop_inode() and I_WILL_FREE usage
...
Linus Torvalds [Mon, 1 Dec 2025 16:44:26 +0000 (08:44 -0800)]
Merge tag 'vfs-6.19-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"Features:
- Cheaper MAY_EXEC handling for path lookup. This elides MAY_WRITE
permission checks during path lookup and adds the
IOP_FASTPERM_MAY_EXEC flag so filesystems like btrfs can avoid
expensive permission work.
- Tidy up and inline step_into() and walk_component() for improved
code generation.
- Re-enable IOCB_NOWAIT writes to files. This refactors file
timestamp update logic, fixing a layering bypass in btrfs when
updating timestamps on device files and improving FMODE_NOCMTIME
handling in VFS now that nfsd started using it.
- Path lookup optimizations extracting slowpaths into dedicated
routines and adding branch prediction hints for mntput_no_expire(),
fd_install(), lookup_slow(), and various other hot paths.
- Enable clang's -fms-extensions flag, requiring a JFS rename to
avoid conflicts.
- Remove spurious exports in fs/file_attr.c.
- Stop duplicating union pipe_index declaration. This depends on the
shared kbuild branch that brings in -fms-extensions support which
is merged into this branch.
- Use MD5 library instead of crypto_shash in ecryptfs.
- Use largest_zero_folio() in iomap_dio_zero().
- Replace simple_strtol/strtoul with kstrtoint/kstrtouint in init and
initrd code.
- Various typo fixes.
Fixes:
- Fix emergency sync for btrfs. Btrfs requires an explicit sync_fs()
call with wait == 1 to commit super blocks. The emergency sync path
never passed this, leaving btrfs data uncommitted during emergency
sync.
- Use local kmap in watch_queue's post_one_notification().
- Add hint prints in sb_set_blocksize() for LBS dependency on THP"
* tag 'vfs-6.19-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (35 commits)
MAINTAINERS: add German Maglione as virtiofs co-maintainer
fs: inline step_into() and walk_component()
fs: tidy up step_into() & friends before inlining
orangefs: use inode_update_timestamps directly
btrfs: fix the comment on btrfs_update_time
btrfs: use vfs_utimes to update file timestamps
fs: export vfs_utimes
fs: lift the FMODE_NOCMTIME check into file_update_time_flags
fs: refactor file timestamp update logic
include/linux/fs.h: trivial fix: regualr -> regular
fs/splice.c: trivial fix: pipes -> pipe's
fs: mark lookup_slow() as noinline
fs: add predicts based on nd->depth
fs: move mntput_no_expire() slowpath into a dedicated routine
fs: remove spurious exports in fs/file_attr.c
watch_queue: Use local kmap in post_one_notification()
fs: touch up predicts in path lookup
fs: move fd_install() slowpath into a dedicated routine and provide commentary
fs: hide dentry_cache behind runtime const machinery
fs: touch predicts in do_dentry_open()
...
Linus Torvalds [Mon, 1 Dec 2025 16:14:00 +0000 (08:14 -0800)]
Merge tag 'vfs-6.19-rc1.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull iomap updates from Christian Brauner:
"FUSE iomap Support for Buffered Reads:
This adds iomap support for FUSE buffered reads and readahead. This
enables granular uptodate tracking with large folios so only
non-uptodate portions need to be read. Also fixes a race condition
with large folios + writeback cache that could cause data corruption
on partial writes followed by reads.
- Refactored iomap read/readahead bio logic into helpers
- Added caller-provided callbacks for read operations
- Moved buffered IO bio logic into new file
- FUSE now uses iomap for read_folio and readahead
Zero Range Folio Batch Support:
Add folio batch support for iomap_zero_range() to handle dirty
folios over unwritten mappings. Fix raciness issues where dirty data
could be lost during zero range operations.
- filemap_get_folios_tag_range() helper for dirty folio lookup
- Optional zero range dirty folio processing
- XFS fills dirty folios on zero range of unwritten mappings
- Removed old partial EOF zeroing optimization
DIO Write Completions from Interrupt Context:
Restore pre-iomap behavior where pure overwrite completions run
inline rather than being deferred to workqueue. Reduces context
switches for high-performance workloads like ScyllaDB.
- Removed unused IOCB_DIO_CALLER_COMP code
- Error completions always run in user context (fixes zonefs)
- Reworked REQ_FUA selection logic
- Inverted IOMAP_DIO_INLINE_COMP to IOMAP_DIO_OFFLOAD_COMP
Buffered IO Cleanups:
Some performance and code clarity improvements:
- Replace manual bitmap scanning with find_next_bit()
- Simplify read skip logic for writes
- Optimize pending async writeback accounting
- Better variable naming
- Documentation for iomap_finish_folio_write() requirements
Misaligned Vectors for Zoned XFS:
Enables sub-block aligned vectors in XFS always-COW mode for zoned
devices via new IOMAP_DIO_FSBLOCK_ALIGNED flag.
Bug Fixes:
- Allocate s_dio_done_wq for async reads (fixes syzbot report after
error completion changes)
- Fix iomap_read_end() for already uptodate folios (regression fix)"
* tag 'vfs-6.19-rc1.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (40 commits)
iomap: allocate s_dio_done_wq for async reads as well
iomap: fix iomap_read_end() for already uptodate folios
iomap: invert the polarity of IOMAP_DIO_INLINE_COMP
iomap: support write completions from interrupt context
iomap: rework REQ_FUA selection
iomap: always run error completions in user context
fs, iomap: remove IOCB_DIO_CALLER_COMP
iomap: use find_next_bit() for uptodate bitmap scanning
iomap: use find_next_bit() for dirty bitmap scanning
iomap: simplify when reads can be skipped for writes
iomap: simplify ->read_folio_range() error handling for reads
iomap: optimize pending async writeback accounting
docs: document iomap writeback's iomap_finish_folio_write() requirement
iomap: account for unaligned end offsets when truncating read range
iomap: rename bytes_pending/bytes_accounted to bytes_submitted/bytes_not_submitted
xfs: support sub-block aligned vectors in always COW mode
iomap: add IOMAP_DIO_FSBLOCK_ALIGNED flag
xfs: error tag to force zeroing on debug kernels
iomap: remove old partial eof zeroing optimization
xfs: fill dirty folios on zero range of unwritten mappings
...
Randy Dunlap [Fri, 28 Nov 2025 06:59:25 +0000 (22:59 -0800)]
locking/local_lock: Fix all kernel-doc warnings
Modify kernel-doc comments in local_lock.h to prevent warnings:
Warning: include/linux/local_lock.h:9 function parameter 'lock' not described in 'local_lock_init'
Warning: include/linux/local_lock.h:56 function parameter 'lock' not described in 'local_trylock_init'
Warning: include/linux/local_lock.h:56 expecting prototype for local_lock_init(). Prototype was for local_trylock_init() instead
Vincent Mailhol [Thu, 27 Nov 2025 14:41:40 +0000 (15:41 +0100)]
locking/local_lock: s/l/__l/ and s/tl/__tl/ to reduce the risk of shadowing
The Linux kernel coding style advises to avoid common variable
names in function-like macros to reduce the risk of namespace
collisions.
Throughout local_lock_internal.h, several macros use the rather common
variable names 'l' and 'tl'. This already resulted in an actual
collision: the __local_lock_acquire() function like macro is currently
shadowing the parameter 'l' of the:
Rename the variable 'l' to '__l' and the variable 'tl' to '__tl'
throughout the file to fix the current namespace collision and
to prevent future ones.
[ bigeasy: Rebase, update all l and tl instances in macros ]
Signed-off-by: Vincent Mailhol <mailhol@kernel.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Waiman Long <longman@redhat.com> Link: https://patch.msgid.link/20251127144140.215722-3-bigeasy@linutronix.de
locking/local_lock: Add the <linux/local_lock*.h> headers to MAINTAINERS
The local_lock_t was never added to the MAINTAINERS file since its
inclusion.
Add local_lock_t to the locking primitives section.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Waiman Long <longman@redhat.com> Link: https://patch.msgid.link/20251127144140.215722-2-bigeasy@linutronix.de
locking/mutex: Redo __mutex_init() to reduce generated code size
mutex_init() invokes __mutex_init() providing the name of the lock and
a pointer to a the lock class. With LOCKDEP enabled this information is
useful but without LOCKDEP it not used at all. Passing the pointer
information of the lock class might be considered negligible but the
name of the lock is passed as well and the string is stored. This
information is wasting storage.
Split __mutex_init() into a _genereic() variant doing the initialisation
of the lock and a _lockdep() version which does _genereic() plus the
lockdep bits. Restrict the lockdep version to lockdep enabled builds
allowing the compiler to remove the unused parameter.
MIPS: mm: kmalloc tlb_vpn array to avoid stack overflow
Owing to Config4.MMUSizeExt and VTLB/FTLB MMU features later MIPSr2+
cores can have more than 64 TLB entries. Therefore allocate an array
for uniquification instead of placing too an small array on the stack.
Fixes: 35ad7e181541 ("MIPS: mm: tlb-r4k: Uniquify TLB entries on init") Co-developed-by: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Cc: stable@vger.kernel.org # v6.17+: 9f048fa48740: MIPS: mm: Prevent a TLB shutdown on initial uniquification Cc: stable@vger.kernel.org # v6.17+ Tested-by: Gregory CLEMENT <gregory.clement@bootlin.com> Tested-by: Klara Modin <klarasmodin@gmail.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
David Howells [Sat, 29 Nov 2025 00:40:11 +0000 (00:40 +0000)]
afs: Fix uninit var in afs_alloc_anon_key()
Fix an uninitialised variable (key) in afs_alloc_anon_key() by setting it
to cell->anonymous_key. Without this change, the error check may return a
false failure with a bad error number.
Most of the time this is unlikely to happen because the first encounter
with afs_alloc_anon_key() will usually be from (auto)mount, for which all
subsequent operations must wait - apart from other (auto)mounts. Once the
call->anonymous_key is allocated, all further calls to afs_request_key()
will skip the call to afs_alloc_anon_key() for that cell.
Fixes: d27c71257825 ("afs: Fix delayed allocation of a cell's anonymous key") Reported-by: Paulo Alcantra <pc@manguebit.org> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Paulo Alcantara <pc@manguebit.org>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: syzbot+41c68824eefb67cdf00c@syzkaller.appspotmail.com
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 28 Nov 2025 22:08:09 +0000 (14:08 -0800)]
Merge tag 'spi-fix-v6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A disappointingly large set of device specific fixes that have built
up since I've been a bit tardy with sending a pull requests as people
kept sending me new new fixes.
The bcm63xx and lpspi issues could lead to corruption so the fixes are
fairly important for the affected parts, the other issues should all
be relatively minor"
* tag 'spi-fix-v6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: nxp-fspi: Propagate fwnode in ACPI case as well
spi: tegra114: remove Kconfig dependency on TEGRA20_APB_DMA
spi: amlogic-spifc-a1: Handle devm_pm_runtime_enable() errors
spi: spi-fsl-lpspi: fix watermark truncation caused by type cast
spi: cadence-quadspi: Fix cqspi_probe() error handling for runtime pm
spi: bcm63xx: fix premature CS deassertion on RX-only transactions
spi: spi-cadence-quadspi: Remove duplicate pm_runtime_put_autosuspend() call
spi: spi-cadence-quadspi: Enable pm runtime earlier to avoid imbalance
Linus Torvalds [Fri, 28 Nov 2025 18:01:24 +0000 (10:01 -0800)]
Merge tag 'vfs-6.18-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
- afs: Fix delayed allocation of a cell's anonymous key
The allocation of a cell's anonymous key is done in a background
thread along with other cell setup such as doing a DNS upcall. The
normal key lookup tries to use the key description on the anonymous
authentication key as the reference for request_key() - but it may
not yet be set, causing an oops
- ovl: fail ovl_lock_rename_workdir() if either target is unhashed
As well as checking that the parent hasn't changed after getting the
lock, the code needs to check that the dentry hasn't been unhashed.
Otherwise overlayfs might try to rename something that has been
removed
- namespace: fix a reference leak in grab_requested_mnt_ns
lookup_mnt_ns() already takes a reference on mnt_ns, and so
grab_requested_mnt_ns() doesn't need to take an extra reference
* tag 'vfs-6.18-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
afs: Fix delayed allocation of a cell's anonymous key
ovl: fail ovl_lock_rename_workdir() if either target is unhashed
fs/namespace: fix reference leak in grab_requested_mnt_ns
Linus Torvalds [Fri, 28 Nov 2025 17:16:20 +0000 (09:16 -0800)]
Merge tag 'tty-6.18-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull serial driver fixes from Greg KH:
"Here are two serial driver fixes for reported issues for 6.18-rc8.
These are:
- fix for a much reported symbol build loop that broke the build for
some kernel configurations
- amba-pl011 driver bugfix for a reported issue
Both have been in linux next (the last for weeks, the first for a
shorter amount of time), with no reported issues"
* tag 'tty-6.18-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: 8250: Fix 8250_rsa symbol loop
serial: amba-pl011: prefer dma_mapping_error() over explicit address checking
Linus Torvalds [Fri, 28 Nov 2025 17:12:40 +0000 (09:12 -0800)]
Merge tag 'usb-6.18-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB/Thunderbolt fixes from Greg KH:
"Here are some last-minutes USB and Thunderbolt driver fixes and new
device ids for 6.18-rc8. Included in here are:
- usb storage quirk fixup
- xhci driver fixes for reported issues
- usb gadget driver fixes
- dwc3 driver fixes
- UAS driver fixup
- thunderbolt new device ids
- usb-serial driver new ids
All of these have been in linux-next with no reported issues, many for
many weeks"
* tag 'usb-6.18-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (21 commits)
usb: gadget: renesas_usbf: Handle devm_pm_runtime_enable() errors
USB: storage: Remove subclass and protocol overrides from Novatek quirk
usb: uas: fix urb unmapping issue when the uas device is remove during ongoing data transfer
usb: dwc3: Fix race condition between concurrent dwc3_remove_requests() call paths
xhci: dbgtty: fix device unregister
usb: storage: sddr55: Reject out-of-bound new_pba
USB: serial: option: add support for Rolling RW101R-GL
usb: typec: ucsi: psy: Set max current to zero when disconnected
usb: gadget: f_eem: Fix memory leak in eem_unwrap
usb: dwc3: pci: Sort out the Intel device IDs
usb: dwc3: pci: add support for the Intel Nova Lake -S
drivers/usb/dwc3: fix PCI parent check
usb: storage: Fix memory leak in USB bulk transport
xhci: sideband: Fix race condition in sideband unregister
xhci: dbgtty: Fix data corruption when transmitting data form DbC to host
xhci: fix stale flag preventig URBs after link state error is cleared
USB: serial: ftdi_sio: add support for u-blox EVK-M101
usb: cdns3: Fix double resource release in cdns3_pci_probe
usb: gadget: udc: fix use-after-free in usb_gadget_state_work
usb: renesas_usbhs: Fix synchronous external abort on unbind
...
* tag 'mailbox-fixes-v6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jassibrar/mailbox:
mailbox: th1520: fix clock imbalance on probe failure
mailbox: pcc: don't zero error register
mailbox: mtk-gpueb: Add missing 'static' to mailbox ops struct
mailbox: mtk-cmdq: Refine DMA address handling for the command buffer
mailbox: mailbox-test: Fix debugfs_create_dir error checking
mailbox: omap-mailbox: Check for pending msgs only when mbox is exclusive
Arnd Bergmann [Fri, 21 Nov 2025 21:14:04 +0000 (22:14 +0100)]
Merge tag 'omap-for-v6.19/maintainers-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-omap into arm/fixes
MAINTAINERS: Add entry for TQ-Systems AM335 device trees
* tag 'omap-for-v6.19/maintainers-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-omap:
MAINTAINERS: Add entry for TQ-Systems AM335 device trees
Linus Torvalds [Fri, 28 Nov 2025 16:20:14 +0000 (08:20 -0800)]
Merge tag 'mmc-v6.18-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fix from Ulf Hansson:
- sdhci-of-dwcmshc: Fix reset handling for some variants
* tag 'mmc-v6.18-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci-of-dwcmshc: Promote the th1520 reset handling to ip level
Linus Torvalds [Fri, 28 Nov 2025 16:08:02 +0000 (08:08 -0800)]
Merge tag 'pmdomain-v6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm
Pull pmdomain fixes from Ulf Hansson:
- mediatek: Fix spinlock recursion in probe
- tegra: Use GENPD_FLAG_NO_STAY_ON to restore old behaviour
* tag 'pmdomain-v6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm:
pmdomain: tegra: Add GENPD_FLAG_NO_STAY_ON flag
pmdomains: mtk-pm-domains: Fix spinlock recursion in probe
Johan Hovold [Fri, 17 Oct 2025 05:54:14 +0000 (07:54 +0200)]
mailbox: th1520: fix clock imbalance on probe failure
The purpose of the devm_add_action_or_reset() helper is to call the
action function in case adding an action ever fails so drop the clock
disable from the error path to avoid disabling the clocks twice.
Fixes: 5d4d263e1c6b ("mailbox: Introduce support for T-head TH1520 Mailbox driver") Cc: Michal Wilczynski <m.wilczynski@samsung.com> Signed-off-by: Johan Hovold <johan@kernel.org> Reviewed-by: Michal Wilczynski <m.wilczynski@samsung.com> Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
Jamie Iles [Wed, 5 Nov 2025 14:42:29 +0000 (14:42 +0000)]
mailbox: pcc: don't zero error register
The error status mask for a type 3/4 subspace is used for reading the
error status, and the bitwise inverse is used for clearing the error
with the intent being to preserve any of the non-error bits. However,
we were previously applying the mask to extract the status and then
applying the inverse to the result which ended up clearing all bits.
Instead, store the inverse mask in the preserve mask and then use that
on the original value read from the error status so that only the error
is cleared.
mailbox: mtk-gpueb: Add missing 'static' to mailbox ops struct
mtk_gpueb_mbox_ops should be declared static. However, due to its const
nature, this specifier was missed, as it compiled fine without it and
with no warning by the compiler.
arc-linux-gcc (GCC) 12.5.0 doesn't seem to like it however, so add the
static to fix that.
Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202510100629.3nGvrhEU-lkp@intel.com/ Fixes: dbca0eabb821 ("mailbox: add MediaTek GPUEB IPI mailbox") Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
Jason-JH Lin [Wed, 22 Oct 2025 17:16:30 +0000 (01:16 +0800)]
mailbox: mtk-cmdq: Refine DMA address handling for the command buffer
GCE can only fetch the command buffer address from a 32-bit register.
Some SoCs support a 35-bit command buffer address for GCE, which
requires a right shift of 3 bits before setting the address into
the 32-bit register. A comment has been added to the header of
cmdq_get_shift_pa() to explain this requirement.
To prevent the GCE command buffer address from being DMA mapped beyond
its supported bit range, the DMA bit mask for the device is set during
initialization.
Additionally, to ensure the correct shift is applied when setting or
reading the register that stores the GCE command buffer address,
new APIs, cmdq_convert_gce_addr() and cmdq_revert_gce_addr(), have
been introduced for consistent operations on this register.
The variable type for the command buffer address has been standardized
to dma_addr_t to prevent handling issues caused by type mismatches.
Fixes: 0858fde496f8 ("mailbox: cmdq: variablize address shift in platform") Signed-off-by: Jason-JH Lin <jason-jh.lin@mediatek.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
Beleswar Padhi [Mon, 3 Nov 2025 20:11:11 +0000 (01:41 +0530)]
mailbox: omap-mailbox: Check for pending msgs only when mbox is exclusive
On TI K3 devices, the mailbox resides in the Always-On power domain
(LPSC_main_alwayson) and is shared among multiple processors. The
mailbox is not solely exclusive to Linux.
Currently, the suspend path checks all FIFO queues for pending messages
and blocks suspend if any are present. This behavior is unnecessary for
K3 devices, since some of the FIFOs are used for RTOS<->RTOS
communication and are independent of Linux.
For FIFOs used in Linux<->RTOS communication, any pending message would
trigger an interrupt, which naturally prevents suspend from completing.
Hence, there is no need for the mailbox driver to explicitly check for
pending messages on K3 platforms.
Introduce a device match flag to indicate whether the mailbox instance
is exclusive to Linux, and skip the pending message check for
non-exclusive instances (such as in K3).
Fixes: a49f991e740f ("arm64: dts: ti: k3-am62-verdin: Add missing cfg for TI IPC Firmware") Closes: https://lore.kernel.org/all/sid7gtg5vay5qgicsl6smnzwg5mnneoa35cempt5ddwjvedaio@hzsgcx6oo74l/ Signed-off-by: Beleswar Padhi <b-padhi@ti.com> Tested-by: Hiago De Franco <hiago.franco@toradex.com> Reviewed-by: Andrew Davis <afd@ti.com> Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
This now removes roughly double the code that it adds.
I've been playing with this to allow for moderately flexible usage of
the get_unused_fd_flags() + create file + fd_install() pattern that's
used quite extensively and requires cumbersome cleanup paths.
How callers allocate files is really heterogenous so it's not really
convenient to fold them into a single class. It's possibe to split them
into subclasses like for anon inodes. I think that's not necessarily
nice as well. This adds two primitives:
(1) FD_ADD() the simple cases a file is installed:
I've converted all of the easy cases over to it and it gets rid of an
aweful lot of convoluted cleanup logic. There are a bunch of other cases
that can also be converted after a bit of massaging.
It's centered around a simple struct. FD_PREPARE() encapsulates all of
allocation and cleanup logic and must be followed by a call to
fd_publish() which associates the fd with the file and installs it into
the callers fdtable. If fd_publish() isn't called both are deallocated.
FD_ADD() is a shorthand that does the fd_publish() and never exposes the
struct to the caller. That's often the case when they don't need access
to anything after installing the fd.
It mandates a specific order namely that first we allocate the fd and
then instantiate the file. But that shouldn't be a problem. Nearly
everyone I've converted used this order anyway.
There's a bunch of additional cases where it would be easy to convert
them to this pattern. For example, the whole sync file stuff in dma
currently returns the containing structure of the file instead of the
file itself even though it's only used to allocate files. Changing that
would make it fall into the FD_PREPARE() pattern easily. I've not done
that work yet.
There's room for extending this in a way that wed'd have subclasses for
some particularly often use patterns but as I said I'm not even sure
that's worth it.
* patches from https://patch.msgid.link/20251123-work-fd-prepare-v4-0-b6efa1706cfd@kernel.org: (47 commits)
kvm: convert kvm_vcpu_ioctl_get_stats_fd() to FD_PREPARE()
kvm: convert kvm_arch_supports_gmem_init_shared() to FD_PREPARE()
io_uring: convert io_create_mock_file() to FD_PREPARE()
file: convert replace_fd() to FD_PREPARE()
vfio: convert vfio_group_ioctl_get_device_fd() to FD_PREPARE()
tty: convert ptm_open_peer() to FD_PREPARE()
ntsync: convert ntsync_obj_get_fd() to FD_PREPARE()
media: convert media_request_alloc() to FD_PREPARE()
hv: convert mshv_ioctl_create_partition() to FD_PREPARE()
gpio: convert linehandle_create() to FD_PREPARE()
dma: port sw_sync_ioctl_create_fence() to FD_PREPARE()
pseries: port papr_rtas_setup_file_interface() to FD_PREPARE()
pseries: convert papr_platform_dump_create_handle() to FD_PREPARE()
spufs: convert spufs_gang_open() to FD_PREPARE()
papr-hvpipe: convert papr_hvpipe_dev_create_handle() to FD_PREPARE()
spufs: convert spufs_context_open() to FD_PREPARE()
net/socket: convert __sys_accept4_file() to FD_PREPARE()
net/socket: convert sock_map_fd() to FD_PREPARE()
net/sctp: convert sctp_getsockopt_peeloff_common() to FD_PREPARE()
net/kcm: convert kcm_ioctl() to FD_PREPARE()
...