- Fix various inaccurate hard-coded event configurations (Dapeng Mi)
Intel uncore PMU driver updates (Zide Chen):
- Fix discovery unit lookup bug for multi-die systems
- Guard against invalid box control address
- Fix PCI device refcount leak in UPI discovery
- Defer ADL global PMON enable to enable_box() to save power
- Fix uncore_die_to_cpu() for offline dies
- Implement global init callback for GNR uncore
AMD CPU PMU driver updates:
- Always use the NMI latency mitigation (Sandipan Das)
AMD uncore PMU driver updates:
- Use Node ID to identify DF and UMC domains (Sandipan Das)"
* tag 'perf-core-2026-06-14' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip: (22 commits)
perf/x86/amd/uncore: Use Node ID to identify DF and UMC domains
perf: Reveal PMU type in fdinfo
perf/x86/intel/uncore: Implement global init callback for GNR uncore
perf/x86/intel/uncore: Fix uncore_die_to_cpu() for offline dies
perf/x86/intel/uncore: Move die_to_cpu() to uncore.c
perf/x86/intel/uncore: Defer ADL global PMON enable to enable_box()
perf/x86/intel/uncore: Fix PCI device refcount leak in UPI discovery
perf/x86/intel/uncore: Guard against invalid box control address
perf/x86/intel/uncore: Fix discovery unit lookup for multi-die systems
perf/x86/amd/core: Always use the NMI latency mitigation
perf/x86/intel: Update event constraints and cache_extra_regsfor CWF
perf/x86/intel: Update event constraints and cache_extra_regsfor SRF
perf/x86/intel: Update event constraints and cache_extra_regsfor NVL
perf/x86/intel: Update event constraints for PTL
perf/x86/intel: Update event constraints and cache_extra_regsfor ARL
perf/x86/intel: Update event constraints and cache_extra_regsfor LNL
perf/x86/intel: Update event constraints and cache_extra_regsfor MTL
perf/x86/intel: Update event constraints and cache_extra_regsfor ADL
perf/x86/intel: Update event constraints for DMR
perf/x86/intel: Update event constraints and cache_extra_regsfor SPR
...
- Large series to address the robust futex unlock race for real, by
Thomas Gleixner:
"The robust futex unlock mechanism is racy with respect to the
clearing of the robust_list_head::list_op_pending pointer because
unlock and clearing the pointer are not atomic.
The race window is between the unlock and clearing the pending op
pointer. If the task is forced to exit in this window, exit will
access a potentially invalid pending op pointer when cleaning up
the robust list.
That happens if another task manages to unmap the object
containing the lock before the cleanup, which results in a UAF.
In the worst case this UAF can lead to memory corruption when
unrelated content has been mapped to the same address by the time
the access happens.
User space can't solve this problem without help from the kernel.
This series provides the kernel side infrastructure to help it
along:
1) Combined unlock, pointer clearing, wake-up for the
contended case
2) VDSO based unlock and pointer clearing helpers with a
fix-up function in the kernel when user space was interrupted
within the critical section.
... with help from André Almeida:
- Add a note about robust list race condition (André Almeida)
- Add self-tests for robust release operations (André Almeida)
Context analysis updates:
- Implement context analysis for 'struct rt_mutex'. (Bart Van Assche)
- Bump required Clang version to 23 (Marco Elver)
Guard infrastructure updates:
- Series to remove NULL check from unconditional guards (Dmitry
Ilvokhin)
Lockdep updates:
- Restore self-test migrate_disable() and sched_rt_mutex state on
PREEMPT_RT (Karl Mehltretter)
Membarriers updates:
- Use per-CPU mutexes for targeted commands (Aniket Gattani)
- Modernize membarrier_global_expedited with cleanup guards (Aniket
Gattani)
- Add rseq stress test for CFS throttle interactions (Aniket Gattani)
percpu-rwsems updates:
- Extract __percpu_up_read() to optimize inlining overhead (Dmitry
Ilvokhin)
Seqlocks updates:
- Allow UBSAN_ALIGNMENT to fail optimizing (Heiko Carstens)
Lock tracing:
- Add contended_release tracepoint to sleepable locks such as
mutexes, percpu-rwsems, rtmutexes, rwsems and semaphores (Dmitry
Ilvokhin)
MAINTAINERS updates:
- MAINTAINERS: Add RUST [SYNC] entry (Boqun Feng)
Misc updates and fixes by Randy Dunlap, YE WEI-HONG, Fabricio Parra,
Dmitry Ilvokhin and Peter Zijlstra"
* tag 'locking-core-2026-06-14' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip: (36 commits)
locking: Add contended_release tracepoint to sleepable locks
locking/percpu-rwsem: Extract __percpu_up_read()
tracing/lock: Remove unnecessary linux/sched.h include
futex: Optimize futex hash bucket access patterns
rust: sync: completion: Mark inline complete_all and wait_for_completion
MAINTAINERS: Add RUST [SYNC] entry
cleanup: Specify nonnull argument index
selftests: futex: Add tests for robust release operations
Documentation: futex: Add a note about robust list race condition
x86/vdso: Implement __vdso_futex_robust_try_unlock()
x86/vdso: Prepare for robust futex unlock support
futex: Provide infrastructure to plug the non contended robust futex unlock race
futex: Add robust futex unlock IP range
futex: Add support for unlocking robust futexes
futex: Cleanup UAPI defines
x86: Select ARCH_MEMORY_ORDER_TSO
uaccess: Provide unsafe_atomic_store_release_user()
futex: Provide UABI defines for robust list entry modifiers
futex: Move futex related mm_struct data into a struct
futex: Make futex_mm_init() void
...
Linus Torvalds [Mon, 15 Jun 2026 08:27:13 +0000 (13:57 +0530)]
Merge tag 'timers-vdso-2026-06-13' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip
Pull vdso updates from Thomas Gleixner:
- Remove the redundant CONFIG_GENERIC_TIME_VSYSCALL after converting
the remaining users over.
- Rework and sanitize the MIPS VDSO handling, so it does not provide the
time-related VDSO functions if no VDSO-capable clocksource is available.
Also stop mapping the VDSO data pages unconditionally even when they
cannot be used.
* tag 'timers-vdso-2026-06-13' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip:
MIPS: VDSO: Fold MIPS_CLOCK_VSYSCALL into MIPS_GENERIC_GETTIMEOFDAY
MIPS: VDSO: Gate microMIPS restriction on GCC version
MIPS: VDSO: Fold MIPS_DISABLE_VDSO into MIPS_GENERIC_GETTIMEOFDAY
clocksource/drivers/mips-gic-timer: Only use VDSO_CLOCKMODE_GIC when it is a available
MIPS: csrc-r4k: Only use VDSO_CLOCKMODE_R4K when it is a available
MIPS: VDSO: Only map the data pages when the vDSO is used
MIPS: Introduce Kconfig MIPS_GENERIC_GETTIMEOFDAY
vdso/datastore: Always provide symbol declarations
MAINTAINERS: Add include/linux/vdso_datastore.h to vDSO block
vdso/gettimeofday: Rename __arch_get_vdso_u_timens_data()
vdso/treewide: Drop GENERIC_TIME_VSYSCALL
vdso/vsyscall: Gate update_vsyscall() behind CONFIG_GENERIC_GETTIMEOFDAY
riscv: vdso: Drop CONFIG_GENERIC_TIME_VSYSCALL guard around syscall fallbacks
Linus Torvalds [Mon, 15 Jun 2026 08:21:27 +0000 (13:51 +0530)]
Merge tag 'timers-ptp-2026-06-13' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip
Pull timekeeping updates from Thomas Gleixner:
"Updates for NTP/timekeeping and PTP:
- Expand timekeeping snapshot mechanisms
The various snapshot functions are mostly used for PTP to collect
"atomic" snapshots of various involved clocks.
They lack support for the recently introduced AUX clocks and do not
provide the underlying counter value (e.g. TSC) to user space.
Exposing the counter value snapshot allows for better control and
steering.
Convert the hard-wired ktime_get_snapshot() to take a clock ID,
which allows the caller to select the clock ID to be captured along
with CLOCK_MONOTONIC_RAW. Additionally capture the underlying
hardware counter value and the clock source ID of the counter.
Expand the hardware based snapshot capture where devices provide a
mechanism to snapshot the hardware PTP clock and the system counter
(usually via PCI/PTM) to support AUX clocks and also provide the
captured counter value back to the caller and not only the clock
timestamps derived from it.
- Add a new optional read_snapshot() callback to clocksources
That is required to capture atomic snapshots from clocksources
which are derived from TSC with a scaling mechanism (e.g. Hyper-V,
KVMclock).
The value pair is handed back in the snapshot structure to the
callers, so they can do the necessary correlations in a more
precise way.
This touches usage sites of the affected functions and data structure
all over the tree, but stays fully backwards compatible for the
existing user space exposed interfaces. New PTP IOCTLs will provide
access to the extended functionality in later kernel versions"
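As a rough illustration of why handing back the raw counter value helps: once a caller holds a (counter, timestamp) pair captured together, it can project a later counter reading onto the same clock with the usual clocksource mult/shift arithmetic. This is a user-space sketch under stated assumptions: `struct counter_snapshot` and `counter_to_ns()` are hypothetical names, not the kernel's snapshot structure or API, and the 100 MHz mult/shift values in the usage are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair captured atomically: raw counter value and clock time. */
struct counter_snapshot {
	uint64_t cycles;	/* raw hardware counter value (e.g. TSC) */
	int64_t ns;		/* clock timestamp taken at the same instant */
};

/*
 * Project a later counter reading onto the snapshot's clock using the
 * clocksource-style conversion ns = (delta * mult) >> shift. The mask
 * handles counters narrower than 64 bits that wrap around. delta * mult
 * must not overflow 64 bits, so this only suits short intervals.
 */
static int64_t counter_to_ns(const struct counter_snapshot *snap,
			     uint64_t cycles, uint64_t mask,
			     uint32_t mult, uint32_t shift)
{
	uint64_t delta = (cycles - snap->cycles) & mask;

	return snap->ns + (int64_t)((delta * mult) >> shift);
}
```

With a 100 MHz counter (10 ns per cycle, i.e. mult = 10 << 8 and shift = 8), a reading 1000 cycles after the snapshot maps to 10000 ns after the snapshot's timestamp.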
* tag 'timers-ptp-2026-06-13' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip: (28 commits)
ptp: vmclock: Use hw_cycles from snapshot for precise TSC pairing
x86/kvmclock: Implement read_snapshot() for kvmclock clocksource
clocksource/hyperv: Implement read_snapshot() for TSC page clocksource
timekeeping: Add clocksource read_snapshot() method and hw_cycles to snapshot
ptp: Switch to ktime_get_snapshot_id() for pre/post timestamps
timekeeping: Add support for AUX clock cross timestamping
timekeeping: Remove system_device_crosststamp::sys_realtime
ALSA: hda/common: Use system_device_crosststamp::sys_systime
wifi: iwlwifi: Use system_device_crosststamp::sys_systime
ptp: Use system_device_crosststamp::sys_systime
timekeeping: Prepare for cross timestamps on arbitrary clock IDs
timekeeping: Remove ktime_get_snapshot()
virtio_rtc: Use provided clock ID for history snapshot
net/mlx5: Use provided clock ID for history snapshot
igc: Use provided clock ID for history snapshot
ice/ptp: Use provided clock ID for history snapshot
wifi: iwlwifi: Adopt PTP cross timestamps to core changes
timekeeping: Add CLOCK ID to system_device_crosststamp
timekeeping: Add system_counterval_t to struct system_device_crosststamp
timekeeping: Add CLOCK_AUX support for ktime_get_snapshot_id()
...
Linus Torvalds [Mon, 15 Jun 2026 08:18:52 +0000 (13:48 +0530)]
Merge tag 'timers-nohz-2026-06-13' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip
Pull NOHZ updates from Thomas Gleixner:
- Fix a long-standing TOCTOU in get_cpu_sleep_time_us()
- Make the CPU offline NOHZ handling more robust by disabling NOHZ on
the outgoing CPU early instead of creating unneeded state which needs
to be undone.
- Unify idle CPU time accounting instead of having two different
accounting mechanisms. These two mechanisms are not really
independent, but their different properties can, in the worst case,
cause global idle time to be observed going backwards.
- Consolidate the idle/iowait time retrieval interfaces instead of
converting back and forth between them.
- Make idle interrupt time accounting more robust. The original code
assumes that interrupt time accounting is enabled and therefore stops
elapsing idle time while an interrupt is handled in NOHZ dyntick
state. That assumption is not correct as interrupt time accounting
can be disabled at compile time and at runtime.
- Fix an accounting error between dyntick idle time and dyntick idle
steal time. The stolen time is not accounted and therefore idle time
becomes inaccurate. The stolen time is now accounted after the fact
as there is no way to predict the steal time upfront.
* tag 'timers-nohz-2026-06-13' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip:
sched/cputime: Handle dyntick-idle steal time correctly
sched/cputime: Handle idle irqtime gracefully
sched/cputime: Provide get_cpu_[idle|iowait]_time_us() off-case
tick/sched: Consolidate idle time fetching APIs
tick/sched: Account tickless idle cputime only when tick is stopped
tick/sched: Remove unused fields
tick/sched: Move dyntick-idle cputime accounting to cputime code
tick/sched: Remove nohz disabled special case in cputime fetch
tick/sched: Unify idle cputime accounting
s390/time: Prepare to stop elapsing in dynticks-idle
powerpc/time: Prepare to stop elapsing in dynticks-idle
sched/cputime: Correctly support generic vtime idle time
sched/cputime: Remove superfluous and error prone kcpustat_field() parameter
sched/idle: Handle offlining first in idle loop
tick/sched: Fix TOCTOU in nohz idle time fetch
Linus Torvalds [Mon, 15 Jun 2026 08:09:12 +0000 (13:39 +0530)]
Merge tag 'timers-core-2026-06-13' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip
Pull timer core updates from Thomas Gleixner:
"Updates for the time/timer core subsystem:
- Harden the user space controllable hrtimer interfaces further to
protect against unprivileged DoS attempts by arming timers in the
past.
- Add per-capacity hierarchies to the timer migration code to prevent
timer migration across different capacity domains. This code has
been disabled at the last minute as there is a pathological problem
with SoCs which advertise a larger number of capacity domains. The
problem is under investigation and the code won't be active before
v7.3. Disabling it turned out to be less intrusive than a full revert
as it preserves the preparatory steps and allows people to work on
the final resolution.
- Export time namespace functionality as a recent user can be built
as a module.
- Initialize the jiffies clocksource before using it. The recent
hardening against time moving backward requires that the related
members of struct clocksource have been initialized, otherwise it
clamps the readout to 0, which makes time stand still and causes
boot delays.
- Fix a more than twenty-year-old PID reference count leak in an
error path of the POSIX CPU timer code.
- The usual small fixes, improvements and cleanups all over the
place"
* tag 'timers-core-2026-06-13' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip: (31 commits)
posix-cpu-timers: Fix pid refcount leak in do_cpu_nanosleep() error path
time/jiffies: Register jiffies clocksource before usage
timers/migration: Temporarily disable per capacity hierarchies
timers/migration: Turn tmigr_hierarchy level_list into a flexible array
timers/migration: Deactivate per-capacity hierarchies under nohz_full
timers/migration: Fix hotplug migrator selection target on asymetric capacity machines
ntsync: Honour caller's time namespace for absolute MONOTONIC timeouts
time/namespace: Export init_time_ns and do_timens_ktime_to_host()
timers/migration: Update stale @online doc to @available
timers: Fix flseep() typo in kernel-doc comment
hrtimer: Fix the bogus return type of __hrtimer_start_range_ns()
hrtimer: Return ktime_t from hrtimer_get_next_event()/hrtimer_next_event_without()
clocksource: Clean up clocksource_update_freq() functions
alarmtimer: Remove stale return description from alarm_handle_timer()
selftests/posix_timers: Use CLOCK_THREAD_CPUTIME_ID for ITIMER_PROF measurements
scripts/timers: Add timer_migration_tree.py
timers/migration: Handle capacity in connect tracepoints
timers/migration: Split per-capacity hierarchies
timers/migration: Track CPUs in a hierarchy
timers/migration: Abstract out hierarchy to prepare for CPU capacity awareness
...
Linus Torvalds [Mon, 15 Jun 2026 08:04:03 +0000 (13:34 +0530)]
Merge tag 'timers-clocksource-2026-06-13' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip
Pull clocksource updates from Thomas Gleixner:
"Updates for clocksource/clockevent drivers:
- Add devm helpers for clocksources, which simplify driver teardown
and probe failure handling.
- More module conversion work
- Update the support for the ARM EL2 virtual timer including the
required ACPI changes.
- Add clockevent and clocksource support for the TI Dual Mode Timer
- Fix the support for multiple watchdog instances in the TEGRA186
driver
- Add D1 timer support to the SUN5I driver
- The usual devicetree updates, cleanups and small fixes all over the
place"
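The benefit of devm-style registration is that cleanup is tied to the device: resources registered during probe are released automatically, in reverse order of registration, when the driver unbinds or probe fails. The user-space model below illustrates only that ordering; `struct managed_dev`, `managed_add_action()`, `managed_release_all()` and `record_release()` are hypothetical and are neither the kernel's devres implementation nor the new devm_clocksource_register_*() helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capacity of the per-device action list in this model. */
#define MAX_ACTIONS 8

typedef void (*release_fn)(void *data);

/* Hypothetical stand-in for a device with managed (devm-style) resources. */
struct managed_dev {
	release_fn fn[MAX_ACTIONS];
	void *data[MAX_ACTIONS];
	size_t n;
};

/* Record a release action; returns 0 on success, -1 if the list is full. */
static int managed_add_action(struct managed_dev *dev, release_fn fn, void *data)
{
	if (dev->n == MAX_ACTIONS)
		return -1;
	dev->fn[dev->n] = fn;
	dev->data[dev->n] = data;
	dev->n++;
	return 0;
}

/* Run every action in reverse registration order (unbind or probe failure). */
static void managed_release_all(struct managed_dev *dev)
{
	while (dev->n > 0) {
		dev->n--;
		dev->fn[dev->n](dev->data[dev->n]);
	}
}

/* Example callback: logs which resource was released, in order. */
static int release_log[MAX_ACTIONS];
static size_t release_count;

static void record_release(void *data)
{
	release_log[release_count++] = *(int *)data;
}
```

A driver written this way can fail at any point in probe without a hand-written unwind path, because whatever was already registered is undone by the release step, last-registered first.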
* tag 'timers-clocksource-2026-06-13' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip: (24 commits)
clocksource: move NXP timer selection to drivers/clocksource
clocksource/drivers/timer-tegra186: Reserve and service a kernel watchdog
clocksource/drivers/timer-tegra186: Register all accessible watchdog timers
clocksource/drivers/timer-tegra186: Correct num_wdts for Tegra186 and Tegra234
clocksource/drivers/timer-tegra186: Fix support for multiple watchdog instances
clocksource/drivers/timer-ti-dm: Add clockevent support
clocksource/drivers/timer-ti-dm: Add clocksource support
clocksource/drivers/timer-ti-dm: Fix property name in comment
dt-bindings: timer: arm,arch_timer: Fix requirements for interrupt description
clocksource/drivers/arm_arch_timer: Default to EL2 virtual timer when running VHE
ACPI: GTDT: Parse information related to the EL2 virtual timer
ACPI: GTDT: Account for GTDTv3 size when walking the platform timer descriptors
clocksource: Add devm_clocksource_register_*() helpers
clocksource/drivers/sun5i: Add D1 hstimer support
dt-bindings: timer: allwinner,sun5i-a13-hstimer: add H616 and D1
dt-bindings: timer: Add StarFive JHB100 clint
dt-bindings: timer: renesas,rz-mtu3: document RZ/{T2H,N2H}
dt-bindings: timer: renesas,rz-mtu3: Remove TCIU8 interrupt
dt-bindings: timer: Remove sifive,fine-ctr-bits property
clocksource/drivers/timer-of: Make the code compatible with modules
...
Linus Torvalds [Mon, 15 Jun 2026 08:00:04 +0000 (13:30 +0530)]
Merge tag 'smp-core-2026-06-13' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip
Pull smp core updates from Thomas Gleixner:
"Two small updates to the SMP/hotplug subsystem:
- Add cpuhplock.h to the maintained files
- Provide the missing stubs for lockdep_is_cpus_held() and
lockdep_is_cpus_write_held() so the usage sites can be simplified"
* tag 'smp-core-2026-06-13' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip:
cpu: Add lockdep_is_cpus_held()/lockdep_is_cpus_write_held() stubs for !CONFIG_HOTPLUG_CPU
MAINTAINERS: Add include/linux/cpuhplock.h to CPU HOTPLUG area
Linus Torvalds [Mon, 15 Jun 2026 07:55:32 +0000 (13:25 +0530)]
Merge tag 'irq-drivers-2026-06-13' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip
Pull interrupt chip driver updates from Thomas Gleixner:
- Replace the support for the AST2700-A0 early silicon with a proper
driver for the final A2 production silicon
- Rename and rework the StarFive JH8100 interrupt controller for the
new JHB100 SoC as JH8100 was discontinued before production.
- Add support for Amlogic A9 SoCs to the meson-gpio interrupt
controller
- Expand the Econet interrupt controller driver to support MIPS 34Kc
Vectored External Interrupt Controller mode.
- Prevent a NULL pointer dereference in the GICv4 code as the vLPI code
blindly assumes that the ITS was populated. Add the missing sanity
check.
- Add support for software triggered and for error interrupts to the
Renesas RZ/T2H driver.
- Add interrupt redirection support for the LoongArch architecture.
- Add multicore support to the Realtek RTL interrupt driver
- The usual updates, enhancements and fixes all over the place
* tag 'irq-drivers-2026-06-13' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip: (32 commits)
irqchip/irq-realtek-rtl: Add multicore support
irqchip/irq-realtek-rtl: Add/simplify register helpers
irqchip/loongarch-ir: Add IR (interrupt redirection) irqchip support
irqchip/loongarch-avec: Return IRQ_SET_MASK_OK_DONE when keep affinity
irqchip/loongarch-avec: Prepare for interrupt redirection support
Docs/LoongArch: Add advanced extended IRQ model
irqchip/qcom-pdc: Use FIELD_GET() to extract bank index and bit position
irqchip/qcom-pdc: Add PDC_VERSION() macro to describe version register fields
irqchip/qcom-pdc: Tighten ioremap clamp to single DRV region size
irqchip/qcom-pdc: Split __pdc_enable_intr() into per-version helpers
irqchip/exynos-combiner: Remove useless spinlock
irqchip/renesas-rzt2h: Add error interrupts support
irqchip/renesas-rzt2h: Add software-triggered interrupts support
irqchip/gic-v4: Don't advertise VLPIs if no ITS is probed
irqchip/gic-v3-its: Use FIELD_MODIFY()
irqchip/econet-en751221: Support MIPS 34Kc VEIC mode
dt-bindings: interrupt-controller: econet: Add CPU interrupt mapping
irqchip/meson-gpio: Add support for Amlogic A9 SoCs
dt-bindings: interrupt-controller: Add support for Amlogic A9 SoCs
irqchip/meson-gpio: Use the correct register in meson_s4_gpio_irq_set_type()
...
Linus Torvalds [Mon, 15 Jun 2026 07:49:41 +0000 (13:19 +0530)]
Merge tag 'irq-core-2026-06-13' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip
Pull interrupt core updates from Thomas Gleixner:
- Rework of /proc/interrupts handling:
/proc/interrupts was subject to micro-optimizations for a long time,
but most of the low-hanging fruit was left on the table. This rework
addresses the major time-consuming issues:
- Printing a long series of zeros one by one via a format string
instead of counting subsequent zeros and emitting a string
constant.
- Simplify and cache the conditions which decide whether interrupts
should be printed
- Use a proper iteration over the interrupt descriptor xarray
instead of walking and testing one by one.
- Provide helper functions for the architecture code to emit the
architecture specific counters
- Convert the counter structure in x86 to an array, which
simplifies the output, and add mechanisms to suppress unused
architecture interrupts, which otherwise just occupy space.
Adopt the new core mechanisms.
This adjusts the gdb scripts related to interrupt counter statistics
to work with the new mechanisms.
- Prevent a string overflow in the /proc/irq/$N/ directory name
creation code.
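The zero-run idea in the first rework item can be illustrated in user space: instead of formatting every 0 counter through a format string, count the run of consecutive zeros and copy a block of already formatted zero fields. This is a sketch under stated assumptions, not the genirq/proc code; the `" %10u"` field layout, the chunk size and the `init_zero_chunk()`/`format_counts()` names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed layout: each counter is printed as " %10u" (separator + 10 digits). */
#define FIELD_WIDTH 11
/* Hypothetical number of pre-formatted zero fields copied per memcpy(). */
#define ZERO_CHUNK_FIELDS 8

static char zero_chunk[ZERO_CHUNK_FIELDS * FIELD_WIDTH + 1];

/* Format ZERO_CHUNK_FIELDS zero counters once, up front. */
static void init_zero_chunk(void)
{
	for (int i = 0; i < ZERO_CHUNK_FIELDS; i++)
		snprintf(zero_chunk + i * FIELD_WIDTH, FIELD_WIDTH + 1, " %10u", 0u);
}

/*
 * Format counts[0..n) into buf, which must hold n * FIELD_WIDTH + 1 bytes.
 * Non-zero counters go through the format string; runs of zeros are
 * copied from zero_chunk instead. Returns the number of characters written.
 */
static size_t format_counts(char *buf, const unsigned int *counts, size_t n)
{
	size_t pos = 0, i = 0;

	assert(zero_chunk[0] != '\0');	/* init_zero_chunk() must run first */

	while (i < n) {
		if (counts[i] != 0) {
			pos += (size_t)sprintf(buf + pos, " %10u", counts[i]);
			i++;
			continue;
		}

		/* Measure the run of consecutive zero counters. */
		size_t run = 0;
		while (i + run < n && counts[i + run] == 0)
			run++;
		i += run;

		/* Emit the run in chunks of pre-formatted zero fields. */
		while (run > 0) {
			size_t k = run < ZERO_CHUNK_FIELDS ? run : ZERO_CHUNK_FIELDS;

			memcpy(buf + pos, zero_chunk, k * FIELD_WIDTH);
			pos += k * FIELD_WIDTH;
			run -= k;
		}
	}
	buf[pos] = '\0';
	return pos;
}
```

The output is byte-for-byte what formatting every field individually would produce; the saving is that a long run of idle CPUs costs a few memcpy() calls instead of one format-string parse per zero.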
* tag 'irq-core-2026-06-13' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip:
x86/irq: Add missing 's' back to thermal event printout
genirq/proc: Speed up /proc/interrupts iteration
genirq/proc: Runtime size the chip name
genirq: Expose irq_find_desc_at_or_after() in core code
genirq: Add rcuref count to struct irq_desc
genirq/proc: Increase default interrupt number precision to four
genirq: Calculate precision only when required
genirq: Cache the condition for /proc/interrupts exposure
genirq/manage: Make NMI cleanup RT safe
genirq: Expose nr_irqs in core code
scripts/gdb: Update x86 interrupts to the array based storage
x86/irq: Move IOAPIC misrouted and PIC/APIC error counts into irq_stats
x86/irq: Suppress unlikely interrupt stats by default
x86/irq: Make irqstats array based
genirq/proc: Utilize irq_desc::tot_count to avoid evaluation
genirq/proc: Avoid formatting zero counts in /proc/interrupts
x86/irq: Optimize interrupts decimals printing
genirq/proc: Size interrupt directory names for 10-digit interrupt numbers
Linus Torvalds [Mon, 15 Jun 2026 07:44:36 +0000 (13:14 +0530)]
Merge tag 'core-rseq-2026-06-13' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip
Pull rseq update from Thomas Gleixner:
"A trivial update for RSEQ selftests to provide the config fragments
which contain the config options required to actually run the tests"
* tag 'core-rseq-2026-06-13' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip:
selftests/rseq: Add config fragment
Colton Jones [Mon, 15 Jun 2026 03:36:20 +0000 (03:36 +0000)]
ALSA: hda/realtek: Add CS35L41 I2C quirk for ASUS UM3405GA
The ASUS Zenbook 14 UM3405GA uses a Realtek ALC294 codec with two
Cirrus Logic CS35L41 speaker amplifiers exposed through the CSC3551 ACPI
device. The machine reports the Realtek subsystem ID 1043:19f4.
Without a PCI quirk, the codec falls back to generic pin matching and the
internal speakers remain silent even though PCM playback completes.
Add the UM3405GA subsystem ID and reuse the same ASUS I2C headset-mic
fixup used by the closely related UM3406HA. That fixup configures the
headset microphone pin and chains to the CS35L41 I2C speaker-amp binding.
selftests/bpf: Work around llvm stack overflow in crypto progs
clang 23 fails to build crypto_bench.c and crypto_sanity.c with
"BPF stack limit exceeded". The progs fill a 408-byte
bpf_crypto_params on the stack and pass it to bpf_crypto_ctx_create().
clang 23 copies the byte-aligned cipher/key globals into it one byte at
a time through the stack, and keeps more than one copy of the struct
around. Together that blows the 512-byte limit.
Align the source arrays to 8 bytes so the copy is word-wise, and move
params off the stack into a static .bss var. static keeps it out of the
skeleton, where bpf_crypto_params is an incomplete type. Either change
alone is not enough.
Linus Torvalds [Mon, 15 Jun 2026 07:11:17 +0000 (12:41 +0530)]
Merge tag 'driver-core-7.2-rc1' of gitolite.kernel.org:pub/scm/linux/kernel/git/driver-core/driver-core
Pull driver core updates from Danilo Krummrich:
"Deferred probe:
- Use mod_delayed_work() to fix a race where the deferred probe
timeout work could be permanently canceled
- Fix missing jiffies conversion in deferred_probe_extend_timeout()
- Guard timeout extension with delayed_work_pending() to prevent
premature firing
- Use system_percpu_wq instead of the deprecated system_wq
- Update deferred_probe_timeout documentation
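The deferred probe fixes hinge on the difference between two workqueue calls: queue_delayed_work() does nothing if the work is already pending, while mod_delayed_work() re-arms the timer whether or not the work is pending. The toy model below captures only those documented semantics, a seconds-to-jiffies conversion, and a pending check before extending the timeout; the `model_*` names, `MODEL_HZ` and the plain jiffies counter are assumptions, and the exact conditions in the driver core may differ.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_HZ 250	/* hypothetical tick rate for the jiffies conversion */

/* Minimal stand-in for a delayed work item: pending flag plus expiry. */
struct model_dwork {
	bool pending;
	unsigned long expires;	/* in jiffies */
};

/* Like queue_delayed_work(): no-op (returns false) if already pending. */
static bool model_queue(struct model_dwork *w, unsigned long now,
			unsigned long delay)
{
	if (w->pending)
		return false;
	w->pending = true;
	w->expires = now + delay;
	return true;
}

/*
 * Like mod_delayed_work(): always leaves the work pending with the new
 * expiry; returns true if it was pending (timer modified), false if it
 * was idle and has now been queued.
 */
static bool model_mod(struct model_dwork *w, unsigned long now,
		      unsigned long delay)
{
	bool was_pending = w->pending;

	w->pending = true;
	w->expires = now + delay;
	return was_pending;
}

/* Extend a timeout given in seconds, but only if the work is still pending. */
static void model_extend_timeout(struct model_dwork *w, unsigned long now,
				 unsigned int secs)
{
	if (w->pending)
		model_mod(w, now, (unsigned long)secs * MODEL_HZ);
}
```

Passing `secs` straight through as a delay would be off by a factor of `MODEL_HZ`, which is the kind of bug a missing jiffies conversion produces.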
device:
- Replace direct struct device bitfield access (can_match, dma_iommu,
dma_skip_sync, dma_ops_bypass, state_synced, dma_coherent,
of_node_reused, offline, offline_disabled) with flag-based
accessors using bit operations
- Reject devices with unregistered buses
- Delete unused DEVICE_ATTR_PREALLOC()
- Add low-level device attribute macros with const show/store
callbacks, allowing device attributes to reside in read-only memory
- Move core device attributes to read-only memory
- Constify group array pointers in driver_add_groups() /
driver_remove_groups(), struct bus_type, and struct device_driver
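A common reason for converting bitfields to flag bits is that adjacent C bitfields share a storage word, so two unsynchronized read-modify-write updates to different fields can overwrite each other, while atomic bit operations change one bit without touching the rest. In the kernel the natural primitives are set_bit(), clear_bit() and test_bit(); the sketch below models the same idea with C11 atomics, and the flag names and accessors are hypothetical rather than the new driver-core API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bit numbers, loosely named after fields listed above. */
enum model_dev_flag {
	MODEL_DEV_CAN_MATCH,
	MODEL_DEV_DMA_COHERENT,
	MODEL_DEV_OFFLINE,
};

struct model_device {
	atomic_ulong flags;	/* one bit per enum model_dev_flag */
};

/* Atomically set one flag without disturbing the others. */
static void model_dev_set_flag(struct model_device *dev, enum model_dev_flag f)
{
	atomic_fetch_or(&dev->flags, 1UL << f);
}

/* Atomically clear one flag without disturbing the others. */
static void model_dev_clear_flag(struct model_device *dev, enum model_dev_flag f)
{
	atomic_fetch_and(&dev->flags, ~(1UL << f));
}

/* Read one flag. */
static bool model_dev_test_flag(struct model_device *dev, enum model_dev_flag f)
{
	return atomic_load(&dev->flags) & (1UL << f);
}
```

Because each accessor touches a single bit atomically, a concurrent update to, say, the offline flag cannot clobber a concurrent update to the can_match flag, which plain bitfield assignments do not guarantee.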
device property:
- Fix fwnode reference leak in fwnode_graph_get_endpoint_by_id()
- Initialize all fields of fwnode_handle in fwnode_init()
- Provide swnode_get()/swnode_put() wrappers around kobject_get/put()
- Allow passing struct software_node_ref_args pointers directly to
PROPERTY_ENTRY_REF()
driver_override:
- Migrate amba, cdx, vmbus, and rpmsg to the generic driver_override
infrastructure, fixing a UAF from unsynchronized access to
driver_override in bus match() callbacks
- Remove the now-unused driver_set_override()
firmware loader:
- Fix recursive lock deadlock in device_cache_fw_images() when async
work falls back to synchronous execution
- Fix device reference leak in firmware_upload_register()
platform:
- Pass KBUILD_MODNAME through the platform driver registration macro
to create module symlinks in sysfs for built-in drivers; move
module_kset initialization to a pure_initcall and tegra cbb
registration to core_initcall to ensure correct ordering
- Pass THIS_MODULE implicitly through a coresight_init_driver() macro
sysfs:
- Upgrade OOB write detection in sysfs_kf_seq_show() from printk to
WARN
- Add return value clamping to sysfs_kf_read()
Rust:
- ACPI:
Fix missing match data for PRP0001 by exporting
acpi_of_match_device()
- Auxiliary:
Replace drvdata() with dedicated registration data on
auxiliary_device. drvdata() exposed the driver's bus device private
data beyond the driver's own scope, creating ordering constraints
and forcing the data to outlive all registrations that access it.
Registration data is instead scoped structurally to the
Registration object, making lifecycle ordering enforced by
construction rather than convention.
- Rust-native device driver lifetimes (HRT):
Allow Rust device drivers to carry a lifetime parameter on their
bus device private data, tied to the device binding scope -- the
interval during which a bus device is bound to a driver. Device
resources like pci::Bar<'a> and IoMem<'a> can be stored directly in
the driver's bus device private data with a lifetime bounded by the
binding scope, so the compiler enforces at build time that they do
not outlive the binding. This removes Devres indirection from every
access site and eliminates try_access() failure paths in
destructors.
Bus driver traits use a Generic Associated Type (GAT) Data<'bound>
to introduce the lifetime on the private data, rather than
parameterizing the Driver trait itself. Auxiliary registration
data, where the lifetime is not introduced by a trait callback but
must be threaded through Registration, uses the ForLt trait (a
type-level abstraction for types generic over a lifetime).
Misc:
- Fix DT overlayed devices not probing by reverting the broken
treewide overlay fix and re-running fw_devlink consumer pickup when
an overlay is applied to a bound device
- Use root_device_register() for faux bus root device; add sanity
check for failed bus init
- Fix dev_has_sync_state() data race with READ_ONCE() and move it to
base.h
- Avoid spurious device_links warning when removing a device while
its supplier is unbinding
- Switch ISA bus to dynamic root device
- Fix suspicious RCU usage in kernfs_put()
- Remove devcoredump exit callback
- Constify devfreq_event_class"
* tag 'driver-core-7.2-rc1' of gitolite.kernel.org:pub/scm/linux/kernel/git/driver-core/driver-core: (81 commits)
software node: allow passing reference args to PROPERTY_ENTRY_REF()
driver core: platform: set mod_name in driver registration
coresight: pass THIS_MODULE implicitly through a macro
kernel: param: initialize module_kset in a pure_initcall
soc/tegra: cbb: Move driver registration from pure_initcall to core_initcall
firmware_loader: Fix recursive lock in device_cache_fw_images()
driver core: Use system_percpu_wq instead of system_wq
driver core: remove driver_set_override()
rpmsg: use generic driver_override infrastructure
Drivers: hv: vmbus: use generic driver_override infrastructure
cdx: use generic driver_override infrastructure
amba: use generic driver_override infrastructure
rust: devres: add 'static bound to Devres<T>
samples: rust: rust_driver_auxiliary: showcase lifetime-bound registration data
rust: auxiliary: generalize Registration over ForLt
rust: types: add `ForLt` trait for higher-ranked lifetime support
gpu: nova-core: separate driver type from driver data
samples: rust: rust_driver_pci: use HRT lifetime for Bar
rust: io: make IoMem and ExclusiveIoMem lifetime-parameterized
rust: pci: make Bar lifetime-parameterized
...
Linus Torvalds [Mon, 15 Jun 2026 06:07:18 +0000 (11:37 +0530)]
Merge tag 'pm-7.2-rc1' of gitolite.kernel.org:pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"Over half of the changes here are cpufreq updates that include core
modifications, fixes of the old-style governors, new hardware support
in drivers, assorted driver fixes and cleanups, and the removal of one
driver (AMD Elan SC4*).
Apart from that, the intel_idle driver will now be able to avoid
exposing redundant C-states if PC6 is disabled and there are new
sysctl knobs for device suspend/resume watchdog timeouts, hibernation
gets built-in LZ4 support for image compression and there is the usual
collection of assorted fixes and cleanups.
Specifics:
- Fix a race between cpufreq suspend and CPU hotplug during system
shutdown (Tianxiang Chen)
- Avoid redundant target() calls for unchanged limits and fix a typo
in a comment in the cpufreq core (Viresh Kumar)
- Fix concurrency issues related to sysfs attributes access that
affect cpufreq governors using the common governor code (Zhongqiu
Han)
- Simplify frequency limit handling in the conservative cpufreq
governor (Lifeng Zheng)
- Fix descriptions of the conservative governor freq_step tunable and
the ondemand governor sampling_down_factor tunable in the cpufreq
documentation (Pengjie Zhang)
- Fix use-after-free and double free during _OSC evaluation in the
PCC cpufreq driver (Yuho Choi)
- Rework the handling of policy min and max frequency values in the
cpufreq core to allow drivers to specify special initial values for
the scaling_min_freq and scaling_max_freq sysfs attributes (Pierre
Gondois)
- Add cpufreq scaling support for Qualcomm Shikra SoC (Taniya Das,
Imran Shaik).
- Improve the warning message on HWP-disabled hybrid processors
printed by the intel_pstate driver and sync policy->cur during CPU
offline in it (Yohei Kojima, Fushuai Wang)
- Drop cpufreq support for AMD Elan SC4* (Sean Young)
- Minor fixes for cpufreq drivers (Krzysztof Kozlowski, Akashdeep
Kaur, Hans Zhang, Guangshuo Li, Xueqin Luo)
- Clean up dead dependencies on X86 in the cpufreq Kconfig (Julian
Braha)
- Allow the intel_idle driver to avoid exposing C-states that are
redundant when PC6 is disabled (Artem Bityutskiy)
- Fix memory leak and a potential race in the OPP core (Abdun Nihaal,
Di Shen)
- Mark Rust OPP methods as inline (Nicolás Antinori)
- Fix misc device registration failure path in the PM QoS core (Yuho
Choi)
- Add sysctl interface for DPM watchdog timeouts (Tzung-Bi Shih)
- Use complete() instead of complete_all() in device_pm_sleep_init()
to avoid a false-positive warning from lockdep_assert_RT_in_threaded_ctx()
when CONFIG_PROVE_RAW_LOCK_NESTING is enabled (Jiakai Xu)
- Use a flexible array for CRC uncompressed buffers during
hibernation image saving (Rosen Penev)
- Make the LZ4 algorithm available for hibernation compression
(l1rox3)
- Move the preallocate_image() call during hibernation after the
"prepare" phase of the "freeze" transition (Matthew Leach)
- Fix a memory leak in rapl_add_package_cpuslocked() in the
intel_rapl power capping driver and use sysfs_emit() in
cpumask_show() in that driver (Sumeet Pawnikar, Yury Norov)
- Fix ValueError when parsing incomplete device properties in the
pm-graph utility (Gongwei Li)"
* tag 'pm-7.2-rc1' of gitolite.kernel.org:pub/scm/linux/kernel/git/rafael/linux-pm: (40 commits)
PM: dpm_watchdog: Add sysctl interface for DPM watchdog timeouts
PM: QoS: Fix misc device registration unwind
cpufreq: Use policy->min/max init as QoS request
cpufreq: Remove driver default policy->min/max init
cpufreq: Set default policy->min/max values for all drivers
cpufreq: Extract cpufreq_policy_init_qos() function
cpufreq: Documentation: fix conservative governor freq_step description
cpufreq: ti: Add EPROBE_DEFER for K3 SoCs
cpufreq: qcom: Add cpufreq scaling support for Qualcomm Shikra SoC
dt-bindings: cpufreq: Document Qualcomm Shikra SoC EPSS
powercap: intel_rapl: Use sysfs_emit() in cpumask_show()
cpufreq: governor: Fix stale prev_cpu_nice spike when enabling ignore_nice_load
cpufreq: governor: Fix data races on per-CPU idle/nice baselines
PM: hibernate: Use flexible array for CRC uncompressed buffers
powercap: intel_rapl: Fix memory leak in rapl_add_package_cpuslocked()
PM: hibernate: make LZ4 available for hibernation compression
PM: sleep: Use complete() in device_pm_sleep_init()
opp: rust: mark OPP methods as inline
cpufreq: intel_pstate: Improve warning message on HWP-disabled hybrid CPUs
cpufreq: elanfreq: Drop support for AMD Elan SC4*
...
Linus Torvalds [Mon, 15 Jun 2026 06:05:11 +0000 (11:35 +0530)]
Merge tag 'thermal-7.2-rc1' of gitolite.kernel.org:pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control updates from Rafael Wysocki:
"These add new hardware support (i.MX93 TMU, Amlogic T7, Intel Arrow
Lake, QCom Nord, Shikra and Hawi), fix issues in a number of places in
the thermal control core and drivers, clean up code and refactor it in
preparation for future changes:
- Rework the initialization and cleanup of thermal class cooling
devices to separate DT-based cooling device registration and
cooling device registration without DT (Daniel Lezcano, Ovidiu
Panait)
- Update the cooling device DT bindings to support 3-cell cooling
device representation, where the additional cell holds an ID to
select a cooling mechanism for devices that offer multiple cooling
mechanisms, and adjust the cooling device registration code
accordingly (Gaurav Kohli, Daniel Lezcano)
- Remove dead code from two functions in the thermal core and
simplify the unregistration of thermal governors (Rafael Wysocki)
- Fix critical temperature attribute removal handling in the generic
thermal zone hwmon support code and rework that code to register a
separate hwmon class device for each thermal zone (instead of using
one hwmon class device for all thermal zones of the same type) to
address thermal zone removal deadlocks (Rafael Wysocki)
- Use attribute groups for adding temperature attributes to hwmon
class devices associated with thermal zones (Rafael Wysocki)
- Pass WQ_UNBOUND when allocating the thermal workqueue (Marco
Crivellari)
- Fix potential shift overflow in ptc_mmio_write() and improve error
handling in proc_thermal_ptc_add() in the int340x thermal control
driver (Aravind Anilraj)
- Use sysfs_emit() for cpumask printing in the Intel powerclamp
thermal driver (Yury Norov)
- Add Arrow Lake CPU models to the intel_tcc_cooling driver (Srinivas
Pandruvada)
- Add QCom Nord, Shikra and Hawi temperature sensor DT bindings
(Deepti Jaggi, Gaurav Kohli, Dipa Ramesh Mantre)
- Use devm_add_action_or_reset() for clock disable on the NVidia
soctherm and switch it to devm cooling device registration version
(Daniel Lezcano)
- Add the Amlogic T7 thermal sensor along with thermal calibration
data read from SMC calls (Ronald Claveau)
- Fix atomic temperature read in the QCom tsens driver to comply with
hardware documentation (Priyansh Jain)
- Add SpacemiT K1 thermal sensor support (Shuwei Wu)
- Add i.MX93 temperature sensor support and filter out the invalid
temperature (Jacky Bai)
- Enable by default the TMU (Thermal Monitoring Unit) on Exynos
platform (Krzysztof Kozlowski)
- Rework interrupt initialization in the Tsens driver and add the
optional wakeup source (Priyansh Jain)
- Fix typo in a comment in the TSens QCom driver (Jinseok Kim)
- Fix trailing whitespace and repeated word in the OF code, remove
quoted string splitting across lines from the iMX7 driver, and
remove a stray space from the thermal_trip_of_attr() macro
definition (Mayur Kumar)
- Update the thermal testing facility code to avoid NULL pointer
dereferences by rejecting missing command arguments and replace
sscanf() with kstrtoint() or kstrtoul() in that code (Ovidiu
Panait, Samuel Moelius)"
* tag 'thermal-7.2-rc1' of gitolite.kernel.org:pub/scm/linux/kernel/git/rafael/linux-pm: (54 commits)
thermal: sysfs: Replace sscanf() with kstrtoul()
thermal: testing: Replace sscanf() with kstrtoint()
thermal: testing: reject missing command arguments
thermal: intel: intel_tcc_cooling: Add Arrow Lake CPU models
thermal/drivers/qcom/tsens: Disable wakeup interrupt setup on automotive targets
thermal/drivers/qcom/tsens: Switch wake IRQ handling to PM callbacks
thermal/core: Fix missing stub for devm_thermal_cooling_device_register
dt-bindings: thermal: cooling-devices: Update support for 3 cells cooling device
thermal/of: Support cooling device ID in cooling-spec
thermal/of: Pass cdev_id and introduce devm registration helper
thermal/of: Add cooling device ID support
thermal/of: Rename the devm_thermal_of_cooling_device_register() function
thermal/core: Make cooling device OF node conditional on CONFIG_THERMAL_OF
thermal/of: Move cooling device OF helpers out of thermal core
hwmon: Use non-OF thermal cooling device registration API
thermal/core: Add devm_thermal_cooling_device_register()
thermal/core: Introduce non-OF thermal_cooling_device_register()
thermal/drivers/samsung: Enable TMU by default
thermal/driver/qoriq: Workaround unexpected temperature readings from tmu
thermal/drivers/qoriq: Add i.MX93 tmu support
...
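The thermal testing changes above reject missing command arguments and replace sscanf() with kstrtoint() or kstrtoul(). kstrtoint() is kernel-only, so the following is a rough userspace analogue of the same strict parsing, not the patch itself. parse_int_strict() is a hypothetical name. Like kstrtoint(), it rejects an empty or missing argument and trailing garbage, tolerates one trailing newline, and reports overflow as -ERANGE. Unlike kstrtoint(), strtol() also skips leading whitespace.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical userspace approximation of kstrtoint(s, 10, out):
 * the whole string must be a decimal integer, optionally followed by
 * a single '\n'. Returns 0, -EINVAL or -ERANGE. */
static int parse_int_strict(const char *s, int *out)
{
	char *end;
	long v;

	if (!s || !*s)
		return -EINVAL;		/* missing argument */

	errno = 0;
	v = strtol(s, &end, 10);
	if (end == s)
		return -EINVAL;		/* no digits at all */
	if (*end == '\n')
		end++;			/* tolerate one trailing newline */
	if (*end != '\0')
		return -EINVAL;		/* trailing garbage, e.g. "42x" */
	if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
		return -ERANGE;

	*out = (int)v;
	return 0;
}
```

Unlike a bare sscanf("%d"), which happily accepts "42x" as 42, this style of parser makes partial input an error instead of a silently truncated value.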
Linus Torvalds [Mon, 15 Jun 2026 06:02:38 +0000 (11:32 +0530)]
Merge tag 'acpi-7.2-rc1' of gitolite.kernel.org:pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI support updates from Rafael Wysocki:
"These update the ACPICA code in the kernel to upstream version
20260408, introduce support for devres-based management of ACPI notify
handlers and update some core ACPI device drivers on top of that
(which includes some fixes and cleanups), add _DEP support for PCI/CXL
roots and Intel CVS devices, fix a couple of assorted issues and clean
up code:
- Fix multiple issues related to probe, removal and missing NVDIMM
device notifications in the ACPI NFIT driver (Rafael Wysocki)
- Add support for devres-based management of ACPI notify handlers to
the ACPI core (Rafael Wysocki)
- Switch multiple core ACPI device drivers (including the ACPI PAD,
ACPI video bus, ACPI HED, ACPI thermal zone, ACPI AC, ACPI battery,
and ACPI NFIT drivers) over to using devres-based resource
management during probe (Rafael Wysocki)
- Replace mutex_lock/unlock() with guard()/scoped_guard() in the ACPI
PMIC driver (Maxwell Doose)
- Fix message kref handling in the dead device path of the ACPI IPMI
address space handler (Yuho Choi)
- Use sysfs_emit() in idlecpus_show() in the ACPI processor
aggregator device (PAD) driver (Yury Norov)
- Clean up device_id_scheme initialization in the ACPI video bus
driver (Jean-Ralph Aviles)
- Clean up lid handling in the ACPI button driver and
acpi_button_probe(), reorganize installing and removing event
handlers in that driver and switch it over to using devres-based
resource management during probe (Rafael Wysocki)
- Add support for the Legacy Virtual Register (LVR) field in I2C
serial bus resource descriptors to ACPICA (Akhil R)
- Fix multiple issues related to bounds checks, input validation,
use-after-free, and integer overflow checks in the AML interpreter
in ACPICA (ikaros)
- Update the copyright year to 2026 in ACPICA files and make minor
changes related to ACPI 6.6 support (Pawel Chmielewski)
- Remove spurious precision from format used to dump parse trees in
ACPICA (David Laight)
- Add modern standby DSM GUIDs to ACPICA header files (Daniel
Schaefer)
- Update D3hot/cold device power states definitions in ACPICA header
files (Aymeric Wibo)
- Fix NULL pointer dereference in acpi_ns_custom_package() (Weiming
Shi)
- Update ACPICA version to 20260408 (Saket Dumbre)
- Add cpuidle driver check in acpi_processor_register_idle_driver()
to avoid evaluating _CST unnecessarily (Tony W Wang-oc)
- Suppress UBSAN warning caused by field misuse during PCC-based
register access in the ACPI CPPC library (Jeremy Linton)
- Add support for CPPC v4 to the ACPI CPPC library (Sumit Gupta)
- Update the ACPI device enumeration code to honor _DEP for ACPI0016
PCI/CXL host bridges and make the ACPI PCI root driver clear _DEP
dependencies for PCI roots that have become operational (Chen Pei)"
* tag 'acpi-7.2-rc1' of gitolite.kernel.org:pub/scm/linux/kernel/git/rafael/linux-pm: (74 commits)
ACPI: processor: Add cpuidle driver check in acpi_processor_register_idle_driver()
ACPI: IPMI: Fix message kref handling on dead device
ACPI: CPPC: Suppress UBSAN warning caused by field misuse
ACPI: scan: Honor _DEP for Intel CVS devices
ACPI: NFIT: core: Fix possible deadlock and missing notifications
ACPI: NFIT: core: Eliminate redundant local variable
ACPI: NFIT: core: Fix acpi_nfit_init() error cleanup
ACPI: NFIT: core: Fix possible NULL pointer dereference
ACPI: bus: Clean up devm_acpi_install_notify_handler()
ACPI: button: Switch over to devres-based resource management
ACPI: button: Reorganize installing and removing event handlers
ACPI: button: Use string literals for generating netlink messages
ACPI: button: Clean up adding and removing lid procfs interface
ACPI: button: Merge two switch () statements in acpi_button_probe()
ACPI: button: Drop redundant variable from acpi_button_probe()
ACPI: button: Rework device verification during probe
ACPI: CPPC: Add support for CPPC v4
ACPI: PAD: Use sysfs_emit() in idlecpus_show()
ACPI: scan: Honor _DEP for ACPI0016 PCI/CXL host bridge
ACPI: PCI: Clear _DEP dependencies after PCI root bridge attach
...
Linus Torvalds [Mon, 15 Jun 2026 05:59:31 +0000 (11:29 +0530)]
Merge tag 'nolibc-20260614-for-7.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/nolibc/linux-nolibc
Pull nolibc updates from Thomas Weißschuh:
- New architectures: OpenRISC and 32-bit parisc
- New library functionality: alloca(), assert(), creat() and
ftruncate()
- Automatic large file support
- Proper 64-bit system call argument passing on x32 and MIPS N32
- Cleanups of the testmatrix
- Various bugfixes and cleanups
* tag 'nolibc-20260614-for-7.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/nolibc/linux-nolibc: (37 commits)
selftests/nolibc: test against -Wwrite-strings
selftests/nolibc: use mutable buffer for execve() argv string
tools/nolibc: cast default values of program_invocation_name
tools/nolibc: add ftruncate()
tools/nolibc: add a helper to split a 64-bit argument into 32-bit halves
selftests/nolibc: enable CONFIG_TMPFS for sparc32
tools/nolibc: stackprotector: Avoid stalling program startup if crng is not init yet
tools/nolibc: getopt: Fix potential out of bounds access
selftests/nolibc: test open mode handling
tools/nolibc: always pass mode to open syscall
tools/nolibc: split open mode handling into a macro
tools/nolibc: split implicit open flags into a macro
tools/nolibc: add support for 32-bit parisc
selftests/nolibc: avoid function pointer comparisons
tools/nolibc: add support for OpenRISC / or1k
selftests/nolibc: use vmlinux for MIPS tests
selftests/nolibc: trim IMAGE mappings
selftests/nolibc: trim DEFCONFIG mappings
selftests/nolibc: trim QEMU_ARCH mappings
selftests/nolibc: use QEMU_ARCH for QEMU_ARCH_USER
...
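One of the nolibc commits above adds a helper to split a 64-bit argument into 32-bit halves. A minimal sketch of that split could look like the following; struct u64_halves and split_u64() are hypothetical names, not the nolibc helper. Which register each half goes into, and in what order, is ABI-specific and is not modeled here.

```c
#include <stdint.h>

/* Hypothetical pair of 32-bit halves of a 64-bit syscall argument. */
struct u64_halves {
	uint32_t lo;	/* bits 31..0 */
	uint32_t hi;	/* bits 63..32 */
};

static struct u64_halves split_u64(uint64_t v)
{
	struct u64_halves h = {
		.lo = (uint32_t)v,		/* truncation keeps the low bits */
		.hi = (uint32_t)(v >> 32),	/* shift the high bits down */
	};

	return h;
}
```

For example, split_u64(0x0123456789ABCDEF) yields lo = 0x89ABCDEF and hi = 0x01234567.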
====================
bpf, skmsg: some fixes for skmsg
All fixes are from previous patches sent by Weiming Shi, Zhang Cen,
Kuniyuki and Sechang Lim, which have already been reviewed by me and John and Jakub.
The automated reviewer (sashiko) may still flag a few other potential
issues on top of this series. I looked into them: they are either
already covered by the patches here, the BPF program's own
responsibility (e.g. initializing the payload it pushes) and therefore
intentionally left out, or only reachable under very narrow conditions
that require a specially crafted BPF program and an unusual sk_msg ring
state, so they are not practical to trigger and are left out of this
series. I'm collecting these fixes together because the same problems
have been re-sent many times in slightly different forms, and I hope
this series can be prioritized for merging so the duplicates can
finally settle. With so many AI-generated patches floating around for
these spots, leaving them unmerged just keeps wasting maintainer review
cycles on the same issues.
v3->v4: Carry Kuniyuki Iwashima's reviewed-by tag.
Drop the __GFP_ZERO patch; initializing the pushed payload is the
BPF program's responsibility, not the kernel's (per maintainer
feedback).
https://lore.kernel.org/bpf/20260612130919.299124-1-jiayuan.chen@linux.dev/
v2->v3: Retarget to bpf-next and carry Emil's reviewed-by tag.
Use reverse xmas tree style as suggested by Cong
(not all code matches reverse xmas tree due to variable dependencies).
v1->v2: Fix a problem introduced while resolving a conflict.
====================
Sechang Lim [Mon, 15 Jun 2026 02:19:59 +0000 (10:19 +0800)]
selftests/bpf: add test for bpf_msg_pop_data() overflow
Add a test in sockmap_basic.c that calls bpf_msg_pop_data() with a length
close to U32_MAX, which overflows the start + len bounds check. The sk_msg
program records the return value over a sendmsg and the test checks that
the call is rejected with -EINVAL.
Sechang Lim [Mon, 15 Jun 2026 02:19:58 +0000 (10:19 +0800)]
bpf, sockmap: fix integer overflow in bpf_msg_pop_data() bounds check
start and len are u32, so
u64 last = start + len;
evaluates start + len in 32-bit and wraps before storing it in last.
The bounds check
if (start >= offset + l || last > msg->sg.size)
return -EINVAL;
can then be passed with an out-of-range start/len, after which the pop
loop runs off the end of the scatterlist and sk_msg_shift_left() calls
put_page() on the empty msg->sg.end slot:
Widen the addition with a (u64) cast so the bound is evaluated in
64-bit and a len near U32_MAX no longer wraps below msg->sg.size.
While here, change pop from int to u32. It counts bytes against the
unsigned scatterlist lengths and can never be negative, so the signed
type only invites sign-confusion in the pop loop.
Fixes: 7246d8ed4dcc ("bpf: helper to pop data from messages")
Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Sechang Lim <rhkrqnwk98@gmail.com>
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Link: https://lore.kernel.org/r/20260615021959.140010-6-jiayuan.chen@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
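The overflow in the bpf_msg_pop_data() fix comes down to where the widening happens. The sketch below models only the `last > msg->sg.size` half of the bounds check, in plain userspace C with hypothetical function names. It shows the pre-fix form, where the sum wraps in 32 bits, next to the form with the (u64) cast.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t u32;
typedef uint64_t u64;

/* Pre-fix form: start + len is evaluated in 32 bits and wraps before
 * the result is widened into the 64-bit 'last'. */
static bool pop_last_ok_wrapping(u32 start, u32 len, u32 sg_size)
{
	u64 last = start + len;

	return last <= sg_size;
}

/* Fixed form: casting one operand to u64 makes the addition itself
 * 64-bit, so a len near U32_MAX cannot wrap below sg_size. */
static bool pop_last_ok_widened(u32 start, u32 len, u32 sg_size)
{
	u64 last = (u64)start + len;

	return last <= sg_size;
}
```

With start = 16, len = UINT32_MAX - 7 and a message size of 100, the 32-bit sum wraps to 8, so the wrapping check accepts the request while the widened check rejects it.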
Zhang Cen [Mon, 15 Jun 2026 02:19:56 +0000 (10:19 +0800)]
bpf, sockmap: keep sk_msg copy state in sync
SK_MSG uses msg->sg.copy as per-scatterlist-entry provenance. Entries
with this bit set are copied before data/data_end are exposed to SK_MSG
BPF programs for direct packet access.
bpf_msg_pull_data(), bpf_msg_push_data(), and bpf_msg_pop_data()
rewrite the sk_msg scatterlist ring by collapsing, splitting, and
shifting entries. These operations move msg->sg.data[] entries, but the
parallel copy bitmap can be left behind on the old slot. A copied entry
can then return to msg->sg.start with its copy bit clear and be exposed
as directly writable packet data.
This corruption path requires an attached SK_MSG BPF program that calls
the mutating helpers; ordinary sockmap/TLS traffic that never runs
push/pop/pull helper sequences is not affected.
Keep msg->sg.copy synchronized with scatterlist entry moves, preserve
the copy bit when an entry is split, clear it when a helper replaces an
entry with a private page, and clear slots vacated by pull-data
compaction.
Fixes: 015632bb30da ("bpf: sk_msg program helper bpf_sk_msg_pull_data")
Fixes: 6fff607e2f14 ("bpf: sk_msg program helper bpf_msg_push_data")
Fixes: 7246d8ed4dcc ("bpf: helper to pop data from messages")
Cc: stable@vger.kernel.org
Co-developed-by: Han Guidong <2045gemini@gmail.com>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Han Guidong <2045gemini@gmail.com>
Signed-off-by: Zhang Cen <rollkingzzc@gmail.com>
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Link: https://lore.kernel.org/r/20260615021959.140010-4-jiayuan.chen@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
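The copy-state fix keeps a per-slot bitmap in step with the entries it describes. As a simplified illustration (not the kernel's sk_msg code), the toy model below uses a hypothetical struct toy_msg with a length array standing in for msg->sg.data[] and a bitmap standing in for msg->sg.copy. It ignores ring wrap-around and shows one slot move that carries the copy bit along and clears the vacated slot.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_SLOTS 8	/* hypothetical; the real ring size differs */

/* Toy stand-in for an sk_msg ring: one length per slot plus a parallel
 * bitmap where bit i means "slot i must be copied before exposure". */
struct toy_msg {
	uint32_t len[TOY_SLOTS];
	uint32_t copy;
};

static bool slot_copy(const struct toy_msg *m, int i)
{
	return m->copy & (1u << i);
}

static void set_slot_copy(struct toy_msg *m, int i, bool on)
{
	if (on)
		m->copy |= 1u << i;
	else
		m->copy &= ~(1u << i);
}

/* Move slot 'src' into slot 'dst' (dst != src): the copy bit travels
 * with the entry and the vacated slot is cleared, so a copied entry can
 * never reappear elsewhere with its copy bit missing. */
static void move_slot(struct toy_msg *m, int dst, int src)
{
	m->len[dst] = m->len[src];
	set_slot_copy(m, dst, slot_copy(m, src));
	m->len[src] = 0;
	set_slot_copy(m, src, false);
}
```

Moving only len[] and leaving the bitmap alone is exactly the desynchronization the commit describes: the entry lands in slot 0 while its copy bit stays behind on slot 1.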
Weiming Shi [Mon, 15 Jun 2026 02:19:55 +0000 (10:19 +0800)]
bpf, sockmap: Fix wrong rsge offset in bpf_msg_push_data()
When bpf_msg_push_data() splits a scatterlist element into head and
tail, the tail's page offset is advanced by `start` (absolute message
byte offset) instead of `start - offset` (byte position within the
element). This makes rsge.offset overshoot by `offset` bytes, pointing
to the wrong location within the page or beyond its boundary. Consumers
of the corrupted entry either silently read wrong data or trigger an
out-of-bounds access.
BUG: KASAN: slab-use-after-free in bpf_msg_pull_data (net/core/filter.c:2728)
Read of size 32752 at addr ffff8881042f0010 by task poc/130
Call Trace:
__asan_memcpy (mm/kasan/shadow.c:105)
bpf_msg_pull_data (net/core/filter.c:2728)
bpf_prog_run_pin_on_cpu (include/linux/bpf.h:1402)
sk_psock_msg_verdict (net/core/skmsg.c:934)
tcp_bpf_send_verdict (net/ipv4/tcp_bpf.c:421)
sock_sendmsg_nosec (net/socket.c:727)
Fixes: 6fff607e2f14 ("bpf: sk_msg program helper bpf_msg_push_data")
Reported-by: Xiang Mei <xmei5@asu.edu>
Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Link: https://lore.kernel.org/r/20260615021959.140010-3-jiayuan.chen@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
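The rsge offset bug is easiest to see with concrete numbers. The following is a simplified sketch of splitting one scatterlist element at an absolute message offset, using a hypothetical struct toy_sge rather than struct scatterlist. As in the commit text, `offset` is the absolute message offset where the element begins and `start` is the absolute split point, so the tail advances by `start - offset`.

```c
#include <stdint.h>

/* Hypothetical stand-in for one scatterlist element. */
struct toy_sge {
	uint32_t offset;	/* byte offset of the data within its page */
	uint32_t length;	/* number of bytes in this element */
};

/* Split 'sge', which begins at absolute message offset 'offset', at
 * absolute message offset 'start' (offset <= start < offset + length). */
static void split_sge(const struct toy_sge *sge, uint32_t offset,
		      uint32_t start, struct toy_sge *head,
		      struct toy_sge *tail)
{
	uint32_t front = start - offset;	/* bytes kept in the head */

	head->offset = sge->offset;
	head->length = front;
	/* The buggy version advanced by 'start' (absolute) here, which
	 * overshoots by 'offset' bytes. */
	tail->offset = sge->offset + front;
	tail->length = sge->length - front;
}
```

For an element at page offset 100 holding 50 bytes that starts at message byte 1000, splitting at byte 1020 gives a head at page offset 100 of length 20 and a tail at page offset 120 of length 30. Advancing by `start` would instead have put the tail at page offset 1120.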
Weiming Shi [Mon, 15 Jun 2026 02:19:54 +0000 (10:19 +0800)]
bpf, sockmap: reject overflowing copy + len in bpf_msg_push_data()
When the scatterlist ring is full or nearly full, bpf_msg_push_data()
enters a copy fallback path and computes copy + len for the page
allocation size. Since len comes from BPF with arg3_type = ARG_ANYTHING
and both are u32, a crafted len can wrap the sum to a small value,
causing an undersized allocation followed by an out-of-bounds memcpy.
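Rejecting the wrapping sum means testing for overflow before adding. Kernel code would typically use check_add_overflow() from <linux/overflow.h> for this. The portable sketch below spells the same test out by hand and is not necessarily what the patch does; copy_plus_len() is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Compute copy + len into *size, or return false if the u32 sum would
 * wrap. Checking 'len > UINT32_MAX - copy' cannot itself overflow. */
static bool copy_plus_len(uint32_t copy, uint32_t len, uint32_t *size)
{
	if (len > UINT32_MAX - copy)
		return false;	/* reject instead of allocating too little */

	*size = copy + len;
	return true;
}
```

With copy = 4096 and a BPF-supplied len of UINT32_MAX, the helper refuses the request instead of producing a wrapped size of 4095 and an undersized allocation.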
Gabriele Monaco [Thu, 11 Jun 2026 15:07:03 +0000 (17:07 +0200)]
selftests/bpf: Retry map update on helper_fill_hashmap()
helper_fill_hashmap() is also used by the parallel and stress map tests.
Those consistently fail with ENOMEM on kernels built with
PREEMPT_RT if preallocation is disabled. The failure is transient and
is caused only by the memory cache refill running in a preemptible
irq_work, which can easily stall under contention.
Use a retriable update in those cases to handle the transient ENOMEM and
make the test more stable on PREEMPT_RT as well.
Also fix the sign of the value printed in case of error (strerror()
expects a positive errno while updates return it negative).
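A retriable update of the kind described above can be sketched as a small loop that retries only on the transient -ENOMEM. The update_fn type, update_with_retry() and the flaky_update() demo are hypothetical; the only assumption taken from the commit is that updates return a negative errno on failure.

```c
#include <errno.h>

/* Assumed shape of an update call: 0 on success, negative errno on
 * failure, matching the convention the commit describes. */
typedef int (*update_fn)(void *ctx);

/* Call 'fn' up to 'max_tries' times, retrying only while it reports
 * the transient -ENOMEM; any other result is returned immediately. */
static int update_with_retry(update_fn fn, void *ctx, int max_tries)
{
	int err = -ENOMEM;

	for (int i = 0; i < max_tries && err == -ENOMEM; i++)
		err = fn(ctx);
	return err;
}

/* Demo stand-in for a flaky update: fails with -ENOMEM until the
 * counter pointed to by ctx reaches 3, then succeeds. */
static int flaky_update(void *ctx)
{
	int *calls = ctx;

	return ++*calls < 3 ? -ENOMEM : 0;
}
```

When reporting a final failure, pass -err to strerror(), since strerror() expects a positive errno.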
Linus Torvalds [Mon, 15 Jun 2026 03:55:48 +0000 (09:25 +0530)]
Merge tag 'rust-7.2' of gitolite.kernel.org:pub/scm/linux/kernel/git/ojeda/linux
Pull Rust updates from Miguel Ojeda:
"This one is big due to the vendoring of the `zerocopy` library, which
allows us to replace a bunch of `unsafe` code dealing with conversions
between byte sequences and other types with safe alternatives. More
details on that below (and in its merge commit).
Toolchain and infrastructure:
- Introduce support for the 'zerocopy' library [1][2]:
Fast, safe, compile error. Pick two.
Zerocopy makes zero-cost memory manipulation effortless. We write
`unsafe` so you don't have to.
It essentially provides derivable traits (e.g. 'FromBytes') and
macros (e.g. 'transmute!') for safely converting between byte
sequences and other types. Having such support allows us to remove
some 'unsafe' code.
It is among the most downloaded Rust crates and it is also used by
the Rust compiler itself.
It is licensed under "BSD-2-Clause OR Apache-2.0 OR MIT".
The crates are imported essentially as-is (only +2/-3 lines needed
to be adapted), plus SPDX identifiers. Upstream has since added the
SPDX identifiers as well as one of the tweaks at my request, thus
reducing our future diffs on updates -- I keep the details in one
of our usual live lists [3].
In total, it adds about 39k lines (about 32k without counting
'benches/', which are just for documentation purposes).
The series includes a few Kbuild and rust-analyzer improvements and
an example patch using it in Nova, removing one 'unsafe impl'.
I checked that the codegen of an isolated example function (similar
to the Nova patch on top) is essentially identical. It also turns
out that (for that particular case) the 'zerocopy' version, even
with 'debug-assertions' enabled, has no remaining panics, unlike a
few in the current code (since the compiler can prove the remaining
'ub_checks' statically).
So their "fast, safe" does indeed check out -- at least in that
case.
- Support AutoFDO. This allows Rust code to be profiled and optimized
based on the profile. Tested with Rust Binder: ~13% slower without
AutoFDO in the binderAddInts benchmark (using an app-launch
benchmark for the profile).
- Support Software Tag-Based KASAN.
In addition, fix KASAN Kconfig by requiring Clang.
- Add Kconfig options for each existing Rust KUnit test suite, such
as 'CONFIG_RUST_BITMAP_KUNIT_TEST'.
They are placed within a new menu, 'CONFIG_RUST_KUNIT_TESTS', in
the new 'rust/kernel/Kconfig.test' file.
- Support the upcoming Rust 1.98.0 release (expected 2026-08-20):
lint cleanups and an unstable flag rename.
- Disable 'rustdoc' documentation inlining for all prelude items,
which bloats the generated documentation.
- Ignore (in Git) and clean (in Kbuild) the (rarely) 'rustc'-generated
'*.long-type-*.txt' files.
'kernel' crate:
- Add new 'bitfield' module with the 'bitfield!' macro (extracted
from the existing 'register!' one), which declares integer types
that are split into distinct bit fields of arbitrary length.
Each field is a 'Bounded' of the appropriate bit width (ensuring
values are properly validated and avoiding implicit data loss) and
gets several generated getters and setters (infallible, 'const' and
fallible) as well as associated constants ('_MASK', '_SHIFT' and
'_RANGE'). It also supports fields that can be converted from/to
custom types, either fallibly ('?=>') or infallibly ('=>').
Add as well documentation and a test suite for it, as usual; and
update the 'register!' macro to use it.
It will be maintained by Alexandre Courbot (with Yury Norov as
reviewer) under a new 'MAINTAINERS' entry: 'RUST [BITFIELD]'.
- 'ptr' module: rework index projection syntax into keyworded syntax
and introduce panicking variant.
The keyword syntax ('build:', 'try:', 'panic:') is more explicit
and paves the way for perhaps adding more flavors in the future,
e.g. an 'unsafe' index projection.
For instance, projections now look like this:
fn f(p: *const [u8; 32]) -> Result {
    // Ok, within bounds, checked at build time.
    project!(p, [build: 1]);
    // Build error.
    project!(p, [build: 128]);
    // `OutOfBound` runtime error (convertible to `ERANGE`).
    project!(p, [try: 128]);
    // Runtime panic.
    project!(p, [panic: 128]);
    Ok(())
}
Update as well the users, which now look like e.g.
// Pointer to the first entry of the GSP message queue.
let data = project!(self.0.as_ptr(), .gspq.msgq.data[build: 0]);
- 'build_assert' module: make the module the home of its macros
instead of rendering them twice.
- Fix the 'Vec::reserve()' doctest to properly account for the
existing vector length in the capacity assertion.
- Fix an incorrect operator in the 'Vec::extend_with()' 'SAFETY'
comment; add a doc test demonstrating basic usage and the
zero-length case.
- Clean imports across several modules to follow the "kernel
vertical" import style in order to minimize conflicts.
'pin-init' crate:
- User visible changes:
- Do not generate 'non_snake_case' warnings for identifiers that
are syntactically just users of a field name. This would allow
all '#[allow(non_snake_case)]' in nova-core to be removed,
which Gary will send to the nova tree next cycle.
- Filter non-cfg attributes out properly in derived structs. This
improves pin-init compatibility with other derive macros.
- Insert projection types' where clause properly.
- Other changes:
- Bump MSRV to 1.82, plus associated cleanups.
- Overhaul how init slots are projected. The new approach is
easier to justify with safety comments.
- Mark more functions as inline, which should help mitigate the
super-long symbol name issue due to lack of inlining.
rust-analyzer:
- Support '--envs' for passing env vars for crates like 'zerocopy'.
'MAINTAINERS':
- Add the following reviewers to the 'RUST' entry:
- Daniel Almeida
- Tamir Duberstein
- Alexandre Courbot
- Onur Özkan
They have been involved in the Rust for Linux project for about 7
collective years and bring expertise across several domains, which
will be very useful to have around in the future.
Thanks everyone for stepping up!
And some other fixes, cleanups and improvements"
Link: https://github.com/google/zerocopy
Link: https://docs.rs/zerocopy
Link: https://github.com/Rust-for-Linux/linux/issues/1239
* tag 'rust-7.2' of gitolite.kernel.org:pub/scm/linux/kernel/git/ojeda/linux: (86 commits)
MAINTAINERS: add Onur Özkan as Rust reviewer
MAINTAINERS: add Alexandre Courbot as Rust reviewer
MAINTAINERS: add Tamir Duberstein as Rust reviewer
MAINTAINERS: add Daniel Almeida as Rust reviewer
kbuild: rust: clean `zerocopy-derive` in `mrproper`
rust: make `build_assert` module the home of related macros
rust: str: clean unused import for Rust >= 1.98
rust: str: use the "kernel vertical" imports style
rust: aref: use the "kernel vertical" imports style
rust: page: use the "kernel vertical" imports style
gpu: nova-core: firmware: parse `FalconUCodeDescV2` via `zerocopy`
rust: prelude: add `zerocopy{,_derive}::FromBytes`
rust: zerocopy-derive: enable support in kbuild
rust: zerocopy-derive: add `README.md`
rust: zerocopy-derive: avoid generating non-ASCII identifiers
rust: zerocopy-derive: add SPDX License Identifiers
rust: zerocopy-derive: import crate
rust: zerocopy: enable support in kbuild
rust: zerocopy: add `README.md`
rust: zerocopy: remove float `Display` support
...
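The new Rust 'bitfield!' macro described in this pull generates getters, fallible setters and '_MASK'/'_SHIFT' constants for each field. The plain-C sketch below shows the underlying mask-and-shift idea for a single field. It is not the macro's generated code, and the MODE_* names and bit layout are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 4-bit field occupying bits 7..4 of a 32-bit register. */
#define MODE_SHIFT 4u
#define MODE_WIDTH 4u
#define MODE_MASK  (((1u << MODE_WIDTH) - 1u) << MODE_SHIFT)

/* Getter: isolate the field's bits and move them down to bit 0. */
static uint32_t mode_get(uint32_t reg)
{
	return (reg & MODE_MASK) >> MODE_SHIFT;
}

/* Fallible setter: reject values wider than MODE_WIDTH bits instead of
 * silently truncating them, and leave other bits untouched. */
static bool mode_try_set(uint32_t *reg, uint32_t val)
{
	if (val >> MODE_WIDTH)
		return false;

	*reg = (*reg & ~MODE_MASK) | (val << MODE_SHIFT);
	return true;
}
```

The fallible setter mirrors the "avoiding implicit data loss" goal: writing 16 into a 4-bit field fails rather than storing 0.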
Linus Torvalds [Mon, 15 Jun 2026 03:46:00 +0000 (09:16 +0530)]
Merge tag 'rcu.release.v7.2' of gitolite.kernel.org:pub/scm/linux/kernel/git/rcu/linux
Pull RCU updates from Uladzislau Rezki:
"Torture test updates:
- Improve kvm-series.sh script by adding examples in its header
comment
- Lazy RCU is more fully tested now by replacing call_rcu_hurry()
with call_rcu() and doing rcu_barrier() to motivate lazy callbacks
during a stutter pause
- Add more synonyms for the "--do-normal" group of torture.sh
command-line arguments
Misc changes:
- Reduce stack usage of nocb_gp_wait() to address frame size warning
when built with CONFIG_UBSAN_ALIGNMENT
- synchronize_rcu() can now detect a flood of callers and latch the
normal/default path, temporarily switching to the wait_rcu_gp() path
- Document using rcu_access_pointer() to fetch the old pointer for
lockless cmpxchg() updates
- Simplify some RCU code using clamp_val()
- Fix a kerneldoc header comment typo in srcu_down_read_fast()"
* tag 'rcu.release.v7.2' of gitolite.kernel.org:pub/scm/linux/kernel/git/rcu/linux:
rcu/nocb: reduce stack usage in nocb_gp_wait()
rcu-tasks: Fix possible boot-time tests failed for the call_rcu_tasks()
rcu: Latch normal synchronize_rcu() path on flood
rcu: Document rcu_access_pointer() feeding into cmpxchg()
rcu: Simplify param_set_next_fqs_jiffies() by applying clamp_val()
rcu: Simplify rcu_do_batch() by applying clamp()
checkpatch: Undeprecate rcu_read_lock_trace() and rcu_read_unlock_trace()
srcu: Fix kerneldoc header comment typo in srcu_down_read_fast()
torture: Allow "norm" abbreviation for "normal"
torture: Improve kvm-series.sh header comment
torture: Add torture_sched_set_normal() for user-specified nice values
rcutorture: Fully test lazy RCU
Linus Torvalds [Mon, 15 Jun 2026 03:26:31 +0000 (08:56 +0530)]
Merge tag 'kcsan-20260612-v7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/melver/linux
Pull KCSAN update from Marco Elver:
- Silence -Wmaybe-uninitialized when calling __kcsan_check_access()
* tag 'kcsan-20260612-v7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/melver/linux:
kcsan: Silence -Wmaybe-uninitialized when calling __kcsan_check_access()
David Windsor [Thu, 11 Jun 2026 14:35:49 +0000 (10:35 -0400)]
selftests/bpf: Add test for sleepable lsm_cgroup rejection
Confirm the verifier rejects loading a sleepable BPF_LSM_CGROUP program,
as introduced in commit 5b038319be44 ("bpf: Reject sleepable
BPF_LSM_CGROUP programs at load time").
Merge remote-tracking branches 'ras/edac-drivers' and 'ras/edac-misc' into edac-updates
* ras/edac-drivers: (21 commits)
EDAC: Consistently define pci_device_ids using named initializers
EDAC/igen6: Add Intel Nova Lake-H SoC support
EDAC/igen6: Make registers for detecting IBECC configurable
EDAC/imh: Add RRL support for Intel Diamond Rapids server
EDAC/{skx_common,i10nm}: Prepare RRL for sub-channel granularity
EDAC/skx_common: Add SubChannel support to ADXL decode
EDAC/{skx_common,i10nm}: Move RRL handling to common code
EDAC/{skx_common,i10nm}: Introduce rrl_ctrl_mode
EDAC/{skx_common,i10nm}: Rename rrl_mode to rrl_source_type
EDAC/{skx_common,skx,i10nm}: Split skx_set_decode()
EDAC/{skx_common,i10nm,imh}: Move MC register access helpers to skx_common
EDAC/{skx_common,skx}: Fix UBSAN shift-out-of-bounds in skx_get_dimm_info
EDAC/igen6: Add one Intel Panther Lake-H SoC support
EDAC/igen6: Fix memory topology parsing for Panther Lake-H SoCs
EDAC/igen6: Fix call trace due to missing release()
EDAC/sb_edac: fix grammar in sb_decode_ddr3 warning
EDAC/i5400: disable error reporting at teardown and refactor helper
EDAC/i5100: disable error reporting at teardown and create helper
EDAC/i5000: disable error reporting at teardown and refactor helper
EDAC/i7300: disable error reporting if init fails and refactor helper
...
* ras/edac-misc:
RAS/AMD/ATL: Drop malformed default N from Kconfig
====================
bpf: Fix bpf_get/setsockopt to tos for ipv4-mapped ipv6 socket
When TCP over IPv4 is used through the INET6 API, sk->sk_family is
AF_INET6 even though the packets are IPv4: inet_csk(sk)->icsk_af_ops is
ipv6_mapped and uses ip_queue_xmit(). As a result, the tos sockopt does
not work with the bpf_[get,set]sockopt() helpers.
Leon Hwang [Sat, 13 Jun 2026 16:24:42 +0000 (00:24 +0800)]
bpf: Fix bpf_get/setsockopt to tos for ipv4-mapped ipv6 socket
When TCP over IPv4 is used via the INET6 API, bpf_get/setsockopt() at the
IPv4 level fails because sk->sk_family is AF_INET6. At the IPv6 level the
call succeeds but has no effect, because inet_csk(sk)->icsk_af_ops is
ipv6_mapped, which transmits via ip_queue_xmit() using inet_sk(sk)->tos.
To relax this restriction, allow getting/setting tos for sockets that may
be IPv4-mapped IPv6 sockets.
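The rule behind this fix can be illustrated outside BPF. The userspace sketch below (struct tos_opt and tos_opt_for() are hypothetical names, not kernel or libbpf API) picks which option actually governs the TOS byte on the wire: for an AF_INET6 socket whose peer is an IPv4-mapped address (::ffff:a.b.c.d), traffic leaves through ip_queue_xmit() as IPv4, so the IPv4 IP_TOS setting is the one that matters, not IPV6_TCLASS.

```c
#define _GNU_SOURCE
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical pair: the sockopt level/name that controls the TOS byte. */
struct tos_opt {
    int level;
    int optname;
};

static struct tos_opt tos_opt_for(int family, const struct in6_addr *peer)
{
    struct tos_opt o = { IPPROTO_IP, IP_TOS };

    /* Only a genuine IPv6 peer uses the IPv6 traffic class. A v4-mapped
     * peer on an AF_INET6 socket is sent as IPv4, so IP_TOS applies. */
    if (family == AF_INET6 && peer && !IN6_IS_ADDR_V4MAPPED(peer)) {
        o.level = IPPROTO_IPV6;
        o.optname = IPV6_TCLASS;
    }
    return o;
}
```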
====================
tools build: bpf: Append EXTRA_CFLAGS and HOST_EXTRACFLAGS
Append EXTRA_CFLAGS and HOST_EXTRACFLAGS to the BPF build.
This mitigates an issue introduced in GCC 15, where a {0} initializer
does not guarantee zeroing the entire union [1].
The common changes under tools to support EXTRA_CFLAGS and
HOST_EXTRACFLAGS are sent separately [2]. As suggested, BPF patches
would be picked up via the bpf tree, so this series only includes BPF
related changes.
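The GCC 15 issue mentioned above concerns code such as `union attr a = {0};`, which initializes only the first member; per the cover letter, the rest of the union is no longer guaranteed to be zero. A minimal, compiler-independent way to get an all-zero union is an explicit memset(). The union layout and helper names here are hypothetical, chosen only so that the first member is smaller than the whole union.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical union whose first member covers only one byte. */
union attr {
    unsigned char flag;
    unsigned long words[4];
};

/* Zero every byte of the union, not just the first member. */
static void attr_init(union attr *a)
{
    memset(a, 0, sizeof(*a));
}

/* Check the object representation byte by byte. */
static bool attr_all_zero(const union attr *a)
{
    const unsigned char *p = (const unsigned char *)a;

    for (size_t i = 0; i < sizeof(*a); i++)
        if (p[i])
            return false;
    return true;
}
```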
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
Changes in v2:
- Used strscpy() instead in patch 06 (Ihor).
- Added prefix "bpf-next" in subject (Alexei).
- Added patch 01 to pass host cflags to bootstrap libbpf.
- Added patch 08 to avoid static LLVM linking for cross build.
- Link to v1: https://lore.kernel.org/r/20260323-tools_build_fix_zero_init_bpf_only-v1-0-d1cfad2f4cd1@arm.com
====================
Leo Yan [Tue, 2 Jun 2026 14:47:17 +0000 (15:47 +0100)]
selftests/bpf: Avoid static LLVM linking for cross builds
The BPF selftests prefer static LLVM linking, which works for native
builds but can break cross builds. Its --link-static output may include
host-only libraries that are unavailable for the cross compilation,
causing link failures.
Avoid static LLVM linking for cross builds and use shared LLVM libraries
instead. Native builds keep the existing behavior.
Leo Yan [Tue, 2 Jun 2026 14:47:16 +0000 (15:47 +0100)]
selftests/bpf: Use common CFLAGS for urandom_read
The urandom_read helper and its shared library are built with $(CLANG)
directly rather than through the normal selftest $(CC) rules.
CFLAGS can contain flags intended only for $(CC) that may be
incompatible with $(CLANG); such flags are not necessarily valid for the
clang-only urandom_read build.
Split the BPF selftest local flags into COMMON_CFLAGS and append them to
CFLAGS for the normal build path. Use COMMON_CFLAGS directly for
urandom_read and liburandom_read.so, while still filtering out -static as
before.
Leo Yan [Tue, 2 Jun 2026 14:47:13 +0000 (15:47 +0100)]
libbpf: Initialize CFLAGS before including Makefile.include
tools/scripts/Makefile.include may expand EXTRA_CFLAGS in a future
change. This could alter the initialization of CFLAGS, as the default
options "-g -O2" would never be set once EXTRA_CFLAGS is expanded.
Prepare for this by moving the CFLAGS initialization before including
tools/scripts/Makefile.include, so it is not affected by the extended
EXTRA_CFLAGS.
Append EXTRA_CFLAGS to CFLAGS only after including Makefile.include and
place it last so that the extra flags propagate properly and can
override the default options.
tools/scripts/Makefile.include already appends $(CLANG_CROSS_FLAGS) to
CFLAGS, and the Makefile appends $(CLANG_CROSS_FLAGS) again; remove the
redundant append.
Leo Yan [Tue, 2 Jun 2026 14:47:10 +0000 (15:47 +0100)]
bpftool: Pass host flags to bootstrap libbpf
bpftool builds a bootstrap libbpf with HOSTCC, but the libbpf submake can
still inherit target build flags through CFLAGS. This can break cross
builds when host objects are compiled with target-only options.
Since HOST_CFLAGS contains warning options that are not suitable for
building libbpf, use LIBBPF_BOOTSTRAP_CFLAGS with the warning options
removed to build the bootstrap libbpf. Clear EXTRA_CFLAGS so target
extra flags are not mixed into the host bootstrap libbpf build.
====================
bpf: Allow uprobe_multi binary specified by file descriptor
Add the ability to create a uprobe_multi link on a binary identified
by a file descriptor. This avoids the race where the binary is
replaced between path resolution and attachment, ensuring we monitor the
intended binary.
v3 changes:
- guard t_user access with access_ok [sashiko]
v2 changes:
- move path retrieval into a separate function so CLASS(..) is not used in a function
with goto-based cleanup [sashiko]
- force zero path_fd in case BPF_F_UPROBE_MULTI_PATH_FD is not set [sashiko]
- add space around | in bpf_uprobe_multi_link_attach [Alexei]
====================
Jiri Olsa [Thu, 11 Jun 2026 11:42:30 +0000 (13:42 +0200)]
selftests/bpf: Fix typo in verify_umulti_link_info
We verify info.uprobe_multi.flags against the wrong kprobe-multi flag
(BPF_F_KPROBE_MULTI_RETURN). It has the same value as the correct
flag (BPF_F_UPROBE_MULTI_RETURN), so there is no functional change.
Jiri Olsa [Thu, 11 Jun 2026 11:42:28 +0000 (13:42 +0200)]
selftests/bpf: Add uprobe_multi path_fd test
Add a uprobe_multi link API selftest that opens /proc/self/exe and passes
the resulting descriptor through opts.uprobe_multi.path_fd with
BPF_F_UPROBE_MULTI_PATH_FD set.
Jiri Olsa [Thu, 11 Jun 2026 11:42:26 +0000 (13:42 +0200)]
bpf: Add support to specify uprobe_multi target via file descriptor
Allow uprobe_multi link to identify the target binary by an already
opened file descriptor.
Add a new BPF_F_UPROBE_MULTI_PATH_FD flag and a path_fd field to
the attr.link_create.uprobe_multi struct.
When the flag is set, we resolve the target from path_fd; without the
flag, we keep the existing string path behavior.
I don't see a use case for supporting O_PATH file descriptors, because
we need to read the binary first to get probe offsets, so I'm using
CLASS(fd, f), which fails for O_PATH fds.
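The reasoning for rejecting O_PATH descriptors can be observed from userspace: a descriptor opened with O_PATH cannot be read (read-style calls fail with EBADF), so nothing could be parsed from it to compute probe offsets. The sketch below opens /proc/self/exe, as the selftest does, and tries to read the ELF magic through the descriptor. read_elf_magic() is a hypothetical helper; the kernel-side check is the CLASS(fd, f) lookup described above, not this code.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Returns 0 if the first four bytes behind fd are the ELF magic,
 * -errno if the read itself fails, or -EINVAL on short read/bad magic. */
static int read_elf_magic(int fd)
{
    unsigned char buf[4];
    ssize_t n = pread(fd, buf, sizeof(buf), 0);

    if (n < 0)
        return -errno;
    if (n != (ssize_t)sizeof(buf) || memcmp(buf, "\177ELF", 4) != 0)
        return -EINVAL;
    return 0;
}
```

A regular O_RDONLY descriptor yields the magic bytes, while an O_PATH descriptor for the same file only identifies it and fails the read.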
Jiri Olsa [Thu, 11 Jun 2026 11:42:25 +0000 (13:42 +0200)]
bpf: Use user_path_at for path resolution in uprobe_multi
Resolve the uprobe_multi user path with user_path_at() instead of copying
the string with strndup_user() and passing it to kern_path(). This removes
the temporary allocation and keeps the lookup logic in one helper.
Linus Torvalds [Sun, 14 Jun 2026 23:36:02 +0000 (05:06 +0530)]
Merge tag 'for-linus-7.2-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen updates from Juergen Gross:
- Several small cleanups of various Xen related drivers
(xen/platform-pci, xen-balloon, xenbus, xen/mcelog)
- Cleanup for Xen PV-mode related code (includes dropping the Xen
debugfs code)
- Drop the additional lazy mmu mode tracking done by Xen specific code
* tag 'for-linus-7.2-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/xenbus: Replace strcpy() with memcpy()
x86/xen: Replace generic lazy tracking with cpu specific one
x86/xen: Get rid of last XEN_LAZY_MMU uses
mm: Refactor lazy_mmu_mode_pause() and lazy_mmu_mode_resume()
x86/xen: Change interface of xen_mc_issue()
x86/xen: Drop lazy mode from trace entries
x86/xen: Remove Xen debugfs support
x86/xen: Cleanup Xen related trace points
x86/xen: Guard PV-only stuff in xen-ops.h with CONFIG_XEN_PV
xen: balloon: Replace sprintf() with sysfs_emit()
xen/mcelog: mark g_physinfo, ncpus and xen_mce_chrdev_device as __ro_after_init
xen: constify xsd_errors array
xen/platform-pci: Simplify initialization of pci_device_id array
Linus Torvalds [Sun, 14 Jun 2026 23:31:15 +0000 (05:01 +0530)]
Merge tag 'kbuild-7.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux
Pull Kbuild / Kconfig updates from Nathan Chancellor:
"Kbuild:
- Remove broken module linking exclusion for BTF
- Add documentation around how offset header files work
- Include unstripped vDSO libraries in pacman packages
- Bump minimum version of LLVM for building the kernel to 17.0.1 and
clean up unnecessary workarounds
- Use a context manager in run-clang-tools
- Add dist macro value if present to release tag for RPM packages
- Detect and report truncated buf_printf() output in modpost
- Add __llvm_covfun and __llvm_covmap to section whitelist in modpost
- Support Clang's distributed ThinLTO mode
- Remove architecture specific configurations for AutoFDO and
Propeller to ease individual architecture maintenance
Kconfig:
- Add kconfig-sym-check target to look for dangling Kconfig symbol
references and invalid tristate literal values
- Harden against potential NULL pointer dereference
- Fix typo in Kconfig test comment"
* tag 'kbuild-7.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux: (31 commits)
kconfig: tests: fix typo in comment
kconfig: Remove the architecture specific config for Propeller
kconfig: Remove the architecture specific config for AutoFDO
modpost: Add __llvm_covfun and __llvm_covmap to section_white_list
kconfig: add kconfig-sym-check static checker
kbuild: Remove unnecessary 'T' modifier in cmd_ar_builtin_fixup
kbuild: distributed build support for Clang ThinLTO
kbuild: move vmlinux.a build rule to scripts/Makefile.vmlinux_a
scripts: modpost: detect and report truncated buf_printf() output
kbuild: rpm-pkg: append %{?dist} macro to Release tag
run-clang-tools: run multiprocessing.Pool as context manager
compiler-clang.h: Drop explicit version number from "all" diagnostic macro
compiler-clang.h: Remove __cleanup -Wunused-variable workaround
kbuild: Remove check for broken scoping with clang < 17 in CC_HAS_ASM_GOTO_OUTPUT
x86/entry/vdso32: Remove conditional omission of '.cfi_offset eflags'
x86/module: Revert "Deal with GOT based stack cookie load on Clang < 17"
x86/build: Drop unnecessary '-ffreestanding' addition to KBUILD_CFLAGS
scripts/Makefile.warn: Drop -Wformat handling for clang < 16
riscv: Drop tautological condition from TOOLCHAIN_NEEDS_OLD_ISA_SPEC
riscv: Remove tautological condition from selection of ARCH_SUPPORTS_CFI
...
Linus Torvalds [Sun, 14 Jun 2026 22:58:20 +0000 (04:28 +0530)]
Merge tag 'pull-configfs-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull configfs updates from Al Viro:
"A couple of fixes (UAF in configfs_lookup() and really old races
introduced when lseek() on configfs directories stopped locking those
directories; impact up to and including UAF).
Fixes aside, the main result is that configfs is finally switched to
tree-in-dcache machinery. It's *not* making use of recursive removal
helpers yet, and it still does the bloody awful "build subtree in full
sight of userland, with possibility of failure halfway through and
need to unroll" that forces the locking model from hell; dealing with
that is a separate patch series, once this one is out of the way.
However, it is using DCACHE_PERSISTENT properly now. And apparmorfs is
the sole remaining user of __simple_{unlink,rmdir}() at that point"
* tag 'pull-configfs-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
create_default_group(): pass parent's dentry instead of config_group
configfs_attach_group(): drop the unused parent_item argument
configs_attach_item(): drop unused parent_item argument
configfs_create(): lift parent timestamp updates into callers
kill configfs_drop_dentry()
configfs: mark pinned dentries persistent
configfs: dentry refcount needs to be pinned only once
switch configfs_detach_{group,item}() to passing dentry
configfs_remove_dir(), detach_attrs(): switch to passing dentry
populate_attrs(): move cleanup to the sole caller
populate_group(): move cleanup on failure to the sole caller
configfs_detach_rollback(): pass configfs_dirent instead of dentry
configfs_do_depend_item(): pass configfs_dirent instead of dentry
configfs_depend_prep(): pass configfs_dirent instead of dentry
configfs_detach_prep(): pass configfs_dirent instead of dentry
configfs_mkdir(): use take_dentry_name_snapshot()
configfs: fix lockless traversals of ->s_children
configfs_lookup(): don't leave ->s_dentry dangling on failure
Linus Torvalds [Sun, 14 Jun 2026 22:51:00 +0000 (04:21 +0530)]
Merge tag 'pull-d_add' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull dentry d_add() cleanups from Al Viro:
"This converts a bunch of unidiomatic uses of d_add() in ->lookup()
instances to equivalent uses of d_splice_alias(), which is the normal
mechanism for ->lookup()"
* tag 'pull-d_add' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
gfs2: use d_splice_alias() for ->lookup() return value
ntfs: use d_splice_alias() for ->lookup() return value
simple_lookup(): use d_splice_alias() for ->lookup() return value
ecryptfs: use d_splice_alias() for ->lookup() return value
configfs_lookup(): switch to d_splice_alias()
tracefs: use d_splice_alias() in ->lookup() instances
Linus Torvalds [Sun, 14 Jun 2026 22:45:31 +0000 (04:15 +0530)]
Merge tag 'pull-dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull dcache updates from Al Viro:
- d_alloc_parallel() API change (Neil's with my changes)
- NORCU fixes
- Reorganization and simplification of dentry eviction logic
- Simplifying rcu_read_lock() scopes in fs/dcache.c
- Secondary roots work - getting rid of NFS fake root dentries and
dealing with remaining shrink_dcache_for_umount() and
shrink_dentry_list() races
- making cursors NORCU (surprisingly easy)
* tag 'pull-dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (22 commits)
make cursors NORCU
nfs: get rid of fake root dentries
wind ->s_roots via ->d_sib instead of ->d_hash
shrink_dentry_tree(): unify the calls of shrink_dentry_list()
shrinking rcu_read_lock() scope in d_alloc_parallel()
d_walk(): shrink rcu_read_lock() scope
document dentry_kill()
adjust calling conventions of lock_for_kill(), fold __dentry_kill() into dentry_kill()
Document rcu_read_lock() use in select_collect2()
Shift rcu_read_{,un}lock() inside fast_dput()
simplify safety for lock_for_kill() slowpath
fold lock_for_kill() and __dentry_kill() into common helper
fold lock_for_kill() into shrink_kill()
shrink_dentry_list(): start with removing from shrink list
d_prune_aliases(): make sure to skip NORCU aliases
kill d_dispose_if_unused()
make to_shrink_list() return whether it has moved dentry to list
select_collect(): ignore dentries on shrink lists if they have positive refcounts
find_acceptable_alias(): skip NORCU aliases with zero refcount
fix a race between d_find_any_alias() and final dput() of NORCU dentries
...
Linus Torvalds [Sun, 14 Jun 2026 22:37:58 +0000 (04:07 +0530)]
Merge tag 'vfs-7.2-rc1.procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull procfs updates from Christian Brauner:
- Revamp fs/filesystems.c
The file was a mess with a hand-rolled linked list in desperate need
of a cleanup. The filesystems list is now RCU-ified, /proc files can
be marked permanent from outside fs/proc/, and the string emitted
when reading /proc/filesystems is pre-generated and cached instead of
pointer-chasing and printfing entry by entry on every read.
The file is read frequently because libselinux reads it and is linked
into numerous frequently used programs (even ones you would not
suspect, like sed!). Scalability also improves since reference
maintenance on open/close is bypassed.
A follow-up patch adds missing unlocks in some corner cases and
tidies things up.
- Relax the mount visibility check for subset=pid mounts
When procfs is mounted with subset=pid, all static files become
unavailable and only the dynamic pid information is accessible. In
that case there is no point in imposing the full mount visibility
restrictions on the mounter - everything that can be hidden in procfs
is already inaccessible. These restrictions prevented procfs from
being mounted inside rootless containers since almost all container
implementations overmount parts of procfs to hide certain
directories.
As part of this /proc/self/net is only shown in subset=pid mounts for
CAP_NET_ADMIN, reconfiguring subset=pid is rejected, the
SB_I_USERNS_VISIBLE superblock flag is replaced with an
FS_USERNS_MOUNT_RESTRICTED filesystem flag, fully visible mounts are
recorded in a list, and the mount restrictions are finally
documented.
- Protect ptrace_may_access() with exec_update_lock in procfs
Most uses of ptrace_may_access() in procfs should hold
exec_update_lock to avoid TOCTOU issues with concurrent privileged
execve() (like setuid binary execution).
This fixes the easy cases - the owner and visibility checks and the
FD link permission checks - with the gnarlier ones to follow later.
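The /proc/filesystems change described above replaces per-read formatting with a pre-generated, cached string. A single-threaded userspace sketch of that idea follows: the rendered text is rebuilt only when a generation counter shows the list changed. All names are hypothetical, and the kernel version additionally relies on RCU, which this sketch does not model. The line format ("nodev" or empty, a tab, the name) matches what /proc/filesystems prints.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct fs_entry {
    const char *name;
    int nodev;                /* 1 if no backing block device is needed */
};

struct fs_registry {
    struct fs_entry entries[16];
    size_t count;
    unsigned long gen;        /* bumped on every registration */
    unsigned long cached_gen; /* generation the cache was built for */
    char *cached;             /* rendered text, or NULL */
};

static int fs_register(struct fs_registry *r, const char *name, int nodev)
{
    if (r->count == sizeof(r->entries) / sizeof(r->entries[0]))
        return -1;
    r->entries[r->count++] = (struct fs_entry){ name, nodev };
    r->gen++;
    return 0;
}

/* Return the cached text, regenerating it only if the list changed. */
static const char *fs_read(struct fs_registry *r)
{
    size_t len = 1;           /* trailing NUL */
    char *buf, *p;

    if (r->cached && r->cached_gen == r->gen)
        return r->cached;

    for (size_t i = 0; i < r->count; i++)
        len += strlen("nodev\t") + strlen(r->entries[i].name) + 1;
    buf = malloc(len);
    if (!buf)
        return NULL;
    p = buf;
    *p = '\0';
    for (size_t i = 0; i < r->count; i++)
        p += sprintf(p, "%s\t%s\n", r->entries[i].nodev ? "nodev" : "",
                     r->entries[i].name);

    free(r->cached);
    r->cached = buf;
    r->cached_gen = r->gen;
    return buf;
}
```

Repeated reads return the same buffer until the next registration, which is where the savings for frequent readers such as libselinux would come from.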
* tag 'vfs-7.2-rc1.procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: fix ups and tidy ups to /proc/filesystems caching
proc: protect ptrace_may_access() with exec_update_lock (FD links)
proc: protect ptrace_may_access() with exec_update_lock (part 1)
docs: proc: add documentation about mount restrictions
proc: handle subset=pid separately in userns visibility checks
proc: prevent reconfiguring subset=pid
proc: subset=pid: Show /proc/self/net only for CAP_NET_ADMIN
fs: cache the string generated by reading /proc/filesystems
sysfs: remove trivial sysfs_get_tree() wrapper
fs: RCU-ify filesystems list
fs: move SB_I_USERNS_VISIBLE to FS_USERNS_MOUNT_RESTRICTED
proc: allow to mark /proc files permanent outside of fs/proc/
namespace: record fully visible mounts in list
Linus Torvalds [Sun, 14 Jun 2026 22:29:45 +0000 (03:59 +0530)]
Merge tag 'vfs-7.2-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"Features:
- Reduce pipe->mutex contention by pre-allocating pages outside the
lock in anon_pipe_write().
anon_pipe_write() called alloc_page() once per page while holding
pipe->mutex. The allocation can sleep doing direct reclaim and runs
memcg charging, which extends the critical section and stalls any
concurrent reader on the same mutex. Now up to 8 pages are
pre-allocated before the mutex is taken, leftovers are recycled
into the per-pipe tmp_page[] cache before unlock, and any remainder
is released after unlock, keeping the allocator out of the critical
section on both sides. On a writers x readers sweep with 64 KB
writes against a 1 MB pipe, throughput improves 6-28% and average
write latency drops 5-22%; under memory pressure - when the cost of
holding the mutex across reclaim is highest - throughput improves
21-48% and latency drops 17-33%. The microbenchmark is added to
selftests.
- uaccess/sockptr: fix the ignored_trailing logic in
copy_struct_to_user() to behave as documented and the usize check
in copy_struct_from_sockptr() for user pointers, and add
copy_struct_{from,to}_bounce_buffer() and copy_struct_to_sockptr()
helpers for upcoming users (IPPROTO_SMBDIRECT, IPPROTO_QUIC).
- bpf: add a sleepable bpf_real_inode() kfunc that resolves the real
inode backing a dentry via d_real_inode(). On overlayfs the inode
attached to the dentry doesn't carry the underlying device
information; this is used by the filesystem restriction BPF program
that was merged into systemd.
- docs: add guidelines for submitting new filesystems, motivated by
the maintenance burden abandoned and untestable filesystems impose
on VFS developers, blocking infrastructure work like folio
conversions and iomap migration.
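The copy_struct_* helpers in the uaccess/sockptr item follow the kernel's extensible-struct convention. The sketch below is a userspace analog (copy_struct_from_buf() is a hypothetical name) of the rules documented for copy_struct_from_user(): an older caller's shorter struct is zero-extended, and a newer caller's longer struct is accepted only if the bytes the kernel does not understand are all zero, otherwise -E2BIG. It does not model copy_struct_to_user()'s ignored_trailing reporting, whose exact semantics the summary does not spell out.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* dst holds ksize bytes (the "kernel" struct); src holds usize bytes. */
static int copy_struct_from_buf(void *dst, size_t ksize,
                                const void *src, size_t usize)
{
    size_t size = usize < ksize ? usize : ksize;

    if (usize > ksize) {
        /* Newer caller: trailing fields we don't know must be zero. */
        const unsigned char *tail = (const unsigned char *)src + ksize;

        for (size_t i = 0; i < usize - ksize; i++)
            if (tail[i])
                return -E2BIG;
    } else if (usize < ksize) {
        /* Older caller: zero-fill the fields it doesn't know about. */
        memset((unsigned char *)dst + size, 0, ksize - size);
    }
    memcpy(dst, src, size);
    return 0;
}
```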
Fixes:
- libfs: set SB_I_NOEXEC and SB_I_NODEV by default in init_pseudo()
and drop the now-redundant assignments in callers. This began as a
one-line dma-buf fix for a path_noexec() warning; a pseudo
filesystem has no reason not to set SB_I_NOEXEC. All init_pseudo()
callers were audited: the only visible effect is on dma-buf where
SB_I_NOEXEC silences the warning.
- Handle set_blocksize() failures in legacy filesystems (bfs, hpfs,
qnx4, jfs, befs, affs, isofs, minix, ntfs3, omfs). Mounting a
device with a sector size > PAGE_SIZE crashed roughly half of them;
the rest had the same missing error handling pattern. Plus a
follow-up releasing the superblock buffer_head when setting the
minix v3 block size fails.
- mount: honour SB_NOUSER in the new mount API.
- fs/fcntl: fix a SOFTIRQ-unsafe lock order in fasync signaling by
switching the process-group paths of send_sigio() and send_sigurg()
from read_lock(&tasklist_lock) to RCU, matching the single-PID
path.
- vfs: add an FS_USERNS_DELEGATABLE flag and set it for NFS, fixing
delegated NFS mounts (fsopen() in a container with the mount
performed by a privileged daemon) that broke when non-init
s_user_ns was tied to FS_USERNS_MOUNT.
- selftests/namespaces: fix a hang in nsid_test where an unreaped
grandchild kept the TAP pipe write-end open, a waitpid(-1) race in
listns_efault_test, and a false FAIL on kernels without listns()
where the tests should SKIP.
- filelock: fix the break_lease() stub signature for
CONFIG_FILE_LOCKING=n.
- init/initramfs_test: wait for the async initramfs unpacking before
running; the test and do_populate_rootfs() share the parser state.
- fs/coredump: reduce redundant log noise in
validate_coredump_safety().
- iomap: pass the correct length to fserror_report_io() in
__iomap_write_begin().
- backing-file: fix the backing_file_open() kerneldoc.
Cleanups:
- initramfs: refactor the cpio hex header parsing to use hex2bin()
instead of the hand-rolled simple_strntoul() which is reverted, and
extend the initramfs KUnit tests to cover header fields with 0x
prefixes.
- Replace __get_free_pages() and friends with kmalloc()/kzalloc()
across quota, proc, ocfs2/dlm, nilfs2, nfs, nfsd, libfs, jfs, jbd2,
isofs, fuse, select, namespace, configfs, binfmt_misc, bfs, and the
do_mounts init code - part of the larger work of replacing page
allocator calls with kmalloc().
- Use clear_and_wake_up_bit() in unlock_buffer() and
journal_end_buffer_io_sync() instead of open-coding the sequence.
- Drop unused VFS exports: unexport drop_super_exclusive(), remove
start_removing_user_path_at(), and fold __start_removing_path()
into start_removing_path().
- fs/read_write: narrow the __kernel_write() export with
EXPORT_SYMBOL_FOR_MODULES().
- vfs: uapi: retire octal and hex constants in favor of (1 << n) for
the O_ flags. Finding a free bit for a new flag across the
architectures was needlessly hard with the mixed bases.
- dcache: add extra sanity checks of dead dentries in dentry_free()
via a new DENTRY_WARN_ONCE() that also prints d_flags.
- iov_iter: use kmemdup_array() in dup_iter() to harden the
allocation against multiplication overflow.
- fs/pipe: write to ->poll_usage only once.
- vfs: remove an always-taken if-branch in find_next_fd().
- dcache: use kmalloc_flex() for struct external_name in __d_alloc().
- namei: use QSTR() instead of QSTR_INIT() in path_pts().
- sync_file_range: delete dead S_ISLNK code.
- Comment fixes: retire a stale comment in fget_task_next() and fix
assorted spelling mistakes"
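The dup_iter() cleanup above switches to kmemdup_array() so that "count * element size" cannot silently wrap. A userspace sketch of the same hardening follows; memdup_array() is a hypothetical helper that uses the GCC/Clang __builtin_mul_overflow() builtin to refuse an overflowing size. The kernel helper reaches the same end differently, by saturating the product so the allocation fails.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Duplicate n elements of esize bytes, or return NULL if n * esize
 * overflows size_t or the allocation fails. */
static void *memdup_array(const void *src, size_t n, size_t esize)
{
    size_t bytes;
    void *p;

    if (__builtin_mul_overflow(n, esize, &bytes))
        return NULL;
    p = malloc(bytes ? bytes : 1);
    if (p)
        memcpy(p, src, bytes);
    return p;
}
```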
* tag 'vfs-7.2-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (73 commits)
backing-file: fix backing_file_open() kerneldoc parameter
iomap: pass the correct len to fserror_report_io in __iomap_write_begin
vfs: add FS_USERNS_DELEGATABLE flag and set it for NFS
filelock: fix break_lease() stub signature for CONFIG_FILE_LOCKING=n
vfs: uapi: retire octal and hex numbers in favor of (1 << n) for O_ flags
bpf: add bpf_real_inode() kfunc
fs/read_write: Do not export __kernel_write() to the entire world
libfs: drop redundant SB_I_NOEXEC/SB_I_NODEV in init_pseudo() callers
libfs: set SB_I_NOEXEC and SB_I_NODEV by default in init_pseudo()
mount: honour SB_NOUSER in the new mount API
fs/fcntl: fix SOFTIRQ-unsafe lock order in fasync signaling
selftests/pipe: add pipe_bench microbenchmark
fs/pipe: pre-allocate pages outside pipe->mutex in anon_pipe_write
fs: retire stale comment in fget_task_next()
fs: fix spelling mistakes in comment
bfs: replace get_zeroed_page() with kzalloc()
binfmt_misc: replace __get_free_page() with kmalloc()
configfs: replace __get_free_pages() with kzalloc()
fs/namespace: use __getname() to allocate mntpath buffer
fs/select: replace __get_free_page() with kmalloc()
...
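The uapi item that retires octal and hex O_ constants in favor of (1 << n) notes that finding a free bit across architectures was hard with mixed bases. Once every flag is a plain bit, the search reduces to OR-ing the per-architecture masks and taking the lowest clear bit. The masks in the usage below are hypothetical, not the real O_ values of any architecture.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the lowest bit not used by any of the given flag masks,
 * or -1 if all 32 bits are taken. */
static int first_free_flag_bit(const uint32_t *arch_masks, size_t n)
{
    uint32_t used = 0;

    for (size_t i = 0; i < n; i++)
        used |= arch_masks[i];
    for (int bit = 0; bit < 32; bit++)
        if (!(used & (UINT32_C(1) << bit)))
            return bit;
    return -1;
}
```

For comparison, the octal literal 0200000 and (1 << 16) denote the same bit; only the second form shows which bit it is at a glance.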
Linus Torvalds [Sun, 14 Jun 2026 22:24:54 +0000 (03:54 +0530)]
Merge tag 'vfs-7.2-rc1.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull simple_xattr updates from Christian Brauner:
"This reworks the simple xattr api to make it more efficient and easier
to use for all consumers.
The simple_xattr hash table moves from the inode into a per-superblock
cache, removing the per-inode overhead for the common case of few or
no xattrs. The interface now passes struct simple_xattrs ** so lazy
allocation is handled internally instead of by every caller, kernfs
xattr operations on kernfs nodes shared between multiple superblocks
are properly serialized, and tmpfs constructs "security.foo" xattr
names with kasprintf() instead of kmalloc() plus two memcpy()s.
A follow-up fix links kernfs nodes to their parent before the LSM init
hook runs: with the per-sb cache kernfs_xattr_set() computes the cache
via kernfs_root(kn), which faulted on a freshly allocated node when
selinux_kernfs_init_security() called into it - reproducible as a NULL
pointer dereference on the first cgroup mkdir on SELinux-enabled
systems.
On top of this bpffs gains support for trusted.* and security.* xattrs
so that user space and BPF LSM programs can attach metadata - for
example a content hash or a security label - to pinned objects and
directories and inspect it uniformly like on other filesystems. The
store is in-memory and non-persistent, living only for the lifetime of
the mount like everything else in bpffs"
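The tmpfs cleanup mentioned above builds "security.foo" xattr names with kasprintf() instead of kmalloc() plus two memcpy()s. The same contrast in userspace, using glibc's asprintf() as the stand-in for kasprintf(); both helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One call: format and allocate the full name. */
static char *security_xattr_name(const char *suffix)
{
    char *name;

    if (asprintf(&name, "security.%s", suffix) < 0)
        return NULL;
    return name;
}

/* The open-coded equivalent: allocate, then copy prefix and suffix. */
static char *security_xattr_name_manual(const char *suffix)
{
    static const char prefix[] = "security.";
    size_t plen = sizeof(prefix) - 1, slen = strlen(suffix);
    char *name = malloc(plen + slen + 1);

    if (!name)
        return NULL;
    memcpy(name, prefix, plen);
    memcpy(name + plen, suffix, slen + 1); /* copies the NUL too */
    return name;
}
```

The formatted version leaves no length arithmetic to get wrong, which is the usual reason to prefer it.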
* tag 'vfs-7.2-rc1.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
bpf: Add simple xattr support to bpffs
kernfs: link kn to its parent before the LSM init hook
simpe_xattr: use per-sb cache
simple_xattr: change interface to pass struct simple_xattrs **
tmpfs: simplify constructing "security.foo" xattr names
kernfs: fix xattr race condition with multiple superblocks
Linus Torvalds [Sun, 14 Jun 2026 22:16:54 +0000 (03:46 +0530)]
Merge tag 'vfs-7.2-rc1.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull iomap updates from Christian Brauner:
- Add the vfs infrastructure required to implement fs-verity support
for XFS with a post-EOF merkle tree: fsverity generates and stores a
zero-block hash, and iomap learns to verify data on buffered reads,
to handle fsverity during writeback via the new IOMAP_F_FSVERITY
flag, and to write fsverity metadata through iomap_fsverity_write().
- Skip the memset of the iomap in iomap_iter() once the iteration is
done. In high-IOPS scenarios (4k randread NVMe polling via io_uring)
the pointless memset wasted memory write bandwidth; this improves
IOPS by about 5% on ext4 and xfs.
- Add balance_dirty_pages_ratelimited() to iomap_zero_iter(), aligning
it with iomap_write_iter(). This prepares for the exFAT iomap
conversion where zeroing beyond valid_size can trigger large-scale
zeroing operations that caused memory pressure without throttling.
- Remove the over-strict inline data boundary check. If a filesystem
provides a valid inline_data pointer and length there is no reason to
require that inline data must not cross a page boundary.
- Don't make REQ_POLLED imply REQ_NOWAIT, matching the earlier
equivalent block layer fix: there are valid cases to poll for I/O
completion without REQ_NOWAIT, and REQ_NOWAIT for file system writes
is currently not supported as writes aren't idempotent.
- Introduce IOMAP_F_ZERO_TAIL for filesystems that maintain a separate
valid data length (exFAT, NTFS). For a write starting at or beyond
valid_size, __iomap_write_begin() now zeroes only the tail portion of
the block while preserving valid data before it, instead of leaving
stale data in the page cache. The flag is also added to the iomap
trace event strings.
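The IOMAP_F_ZERO_TAIL behavior can be pictured on a single block buffer. The sketch below is a hypothetical model, not iomap code: it assumes the block was read from disk, keeps the bytes that lie below valid_size, and zeroes everything from valid_size to the end of the block before the write data is copied in, so no stale bytes survive past valid_size.

```c
#include <stddef.h>
#include <string.h>

/* block: bs bytes of page cache for the block starting at file offset
 * block_start. Bytes below valid_size are kept; the tail is zeroed. */
static void write_begin_zero_tail(unsigned char *block, size_t bs,
                                  long long block_start, long long valid_size)
{
    long long valid_off = valid_size - block_start;

    /* Clamp to the block: all-stale (0) or all-valid (bs). */
    if (valid_off < 0)
        valid_off = 0;
    if (valid_off > (long long)bs)
        valid_off = (long long)bs;
    memset(block + valid_off, 0, bs - (size_t)valid_off);
}
```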
* tag 'vfs-7.2-rc1.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
iomap: Add IOMAP_F_ZERO_TAIL flag to trace event strings
iomap: introduce iomap_fsverity_write() for writing fsverity metadata
iomap: teach iomap to read files with fsverity
iomap: introduce IOMAP_F_FSVERITY and teach writeback to handle fsverity
fsverity: generate and store zero-block hash
iomap: introduce IOMAP_F_ZERO_TAIL flag
iomap: don't make REQ_POLLED imply REQ_NOWAIT
iomap: remove over-strict inline data boundary check
iomap: add dirty page control to iomap_zero_iter
iomap: avoid memset iomap when iter is done
Linus Torvalds [Sun, 14 Jun 2026 22:10:54 +0000 (03:40 +0530)]
Merge tag 'vfs-7.2-rc1.eventpoll' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull eventpoll updates from Christian Brauner:
- eventpoll clarity refactor
The recent eventpoll UAF fixes (a6dc643c6931 and follow-ups) depended
on invariants in fs/eventpoll.c that were nowhere documented and had
to be reverse-engineered from the code: the lifetime relationships
between struct eventpoll, struct epitem, and struct file, the three
removal paths coordinating via epi_fget() pins and ep->mtx, the
ovflist sentinel-encoded scan state machine, the POLLFREE
release/acquire handshake, and the loop / path check globals
serialized by epnested_mutex. The fixes were correct but the next
person to touch this code would hit the same learning curve.
This series codifies those invariants in source and tightens the
surrounding structure. No functional changes intended:
- Documentation: a top-of-file overview with field-protection
tables for struct eventpoll and struct epitem, a section
gathering the loop-check / path-check globals next to their
declarations, labelled comments on the two sides of the POLLFREE
handshake, refreshed comments on epi_fget() and ep_remove_file(),
and a docblock on ep_clear_and_put() that names its two-pass
structure as load-bearing.
- Mechanical renames: ep_refcount_dec_and_test() -> ep_put() to
pair with ep_get(), attach_epitem() -> ep_attach_file() for
ep_remove_file() symmetry, the unused depth argument dropped from
epoll_mutex_lock(), and the CONFIG_KCMP block relocated next to
CONFIG_COMPAT so the hot-path code is contiguous.
- Helper extraction: ep_insert() splits into ep_alloc_epitem() and
ep_register_epitem(), ep_clear_and_put()'s two passes become
ep_drain_pollwaits() and ep_drain_tree() so the ordering
invariant is enforced by the call sequence rather than
convention, the per-event delivery loop body becomes
ep_deliver_event(), and the ep->mtx + epnested_mutex acquisition
dance lifts out of do_epoll_ctl() into ep_ctl_lock() /
ep_ctl_unlock().
- Sentinel and predicate cleanup: the EP_UNACTIVE_PTR overload is
hidden behind named helpers (ep_is_scanning, epi_on_ovflist,
...), epi->next is renamed to epi->ovflist_next, and the boolean
predicates return bool.
- The per-CTL_ADD scratch state (tfile_check_list, path_count[],
inserting_into) moves from file-scope globals into a
stack-allocated struct ep_ctl_ctx plumbed through the loop / path
check chain.
Two follow-up fixes are included: missing kernel-doc for the new @ctx
parameters, and restoring the EP_UNACTIVE_PTR sentinel for
ctx->tfile_check_list - replacing it with NULL termination broke
ep_remove_file()'s "never listed" check for the list tail, causing a
syzbot-reported use-after-free.
- io_uring related epoll cleanups
One of the nastier things about epoll is how it allows nesting
contexts inside each other, leading to the necessity of loop
detection and the issues that have come with that. There is no reason
to support nesting on the io_uring side, so contain the damage and
disallow nested contexts from there: eventpoll gains a file based
control interface and struct epoll_filefd is renamed to epoll_key.
The io_uring side proper goes on top of this through the block tree.
- Fix epoll_wait() reporting false negatives
ep_events_available() checks ep->rdllist and ep_is_scanning() without
a lock and can race with a concurrent scan such that neither check
sees the events, causing epoll_wait() with a zero timeout to wrongly
report no events even though events are available. A sequence lock
closes the race and a reproducer is added to the eventpoll selftests.
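The EP_UNACTIVE_PTR follow-up fix is easiest to see with a tiny model. If a NULL next pointer is what marks an entry as "never listed", the list itself must end in a distinct sentinel; with NULL termination, the tail entry's next is NULL and it looks unlisted. The names below are hypothetical; only the ((void *)-1L)-style sentinel value echoes eventpoll's EP_UNACTIVE_PTR.

```c
#include <stdbool.h>
#include <stddef.h>

/* Terminator distinct from NULL, modeled on EP_UNACTIVE_PTR. */
#define UNLISTED_SENTINEL ((struct item *)-1L)

struct item {
    struct item *next;  /* NULL means: not on the check list */
};

static bool item_listed(const struct item *it)
{
    return it->next != NULL;
}

/* Push onto a list whose empty value is UNLISTED_SENTINEL. */
static void item_push(struct item **head, struct item *it)
{
    it->next = *head;
    *head = it;
}
```

Starting the list head at NULL instead of the sentinel reproduces the bug: after one push, the only (tail) entry still reports as unlisted.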
* tag 'vfs-7.2-rc1.eventpoll' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (25 commits)
eventpoll: restore EP_UNACTIVE_PTR sentinel for ctx->tfile_check_list
eventpoll: Fix epoll_wait() report false negative
selftests/eventpoll: Add test for multiple waiters
eventpoll: add missing kernel-doc for @ctx function parameters
eventpoll: rename struct epoll_filefd to epoll_key
eventpoll: add file based control interface
eventpoll: export is_file_epoll()
eventpoll: pass struct epoll_filefd through ep_find() and ep_insert()
eventpoll: hoist CTL_ADD scratch state into struct ep_ctl_ctx
eventpoll: use bool for predicate helpers
eventpoll: rename epi->next and txlist for clarity
eventpoll: wrap EP_UNACTIVE_PTR in typed sentinel helpers
eventpoll: extract lock dance from do_epoll_ctl() into ep_ctl_lock()
eventpoll: extract ep_deliver_event() from ep_send_events()
eventpoll: split ep_clear_and_put() into drain helpers
eventpoll: split ep_insert() into alloc + register stages
eventpoll: relocate KCMP helpers near compat syscalls
eventpoll: rename attach_epitem() to ep_attach_file()
eventpoll: drop unused depth argument from epoll_mutex_lock()
eventpoll: rename ep_refcount_dec_and_test() to ep_put()
...
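The epoll_wait() false negative happens because a lockless reader checks two conditions separately. It can see an empty ready list while a scan is in progress, and then see "not scanning" after the scan has finished, so it misses the events in both checks. A sequence count forces the reader to retry until both values come from one consistent snapshot. The following is a hypothetical userspace model using C11 atomics. It follows the usual single-writer seqlock pattern and is not the actual eventpoll patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the two conditions ep_events_available() checks. */
struct ready_state {
	atomic_uint seq;             /* odd while an update is in progress */
	atomic_bool rdlist_nonempty; /* stand-in for !list_empty(&ep->rdllist) */
	atomic_bool scanning;        /* stand-in for ep_is_scanning() */
};

static void ready_state_init(struct ready_state *st)
{
	atomic_init(&st->seq, 0);
	atomic_init(&st->rdlist_nonempty, false);
	atomic_init(&st->scanning, false);
}

/* Writer side; assumes a single writer (the kernel holds a lock here). */
static void write_begin(struct ready_state *st)
{
	unsigned s = atomic_load_explicit(&st->seq, memory_order_relaxed);

	atomic_store_explicit(&st->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
}

static void write_end(struct ready_state *st)
{
	atomic_fetch_add_explicit(&st->seq, 1, memory_order_release);
}

static void publish_event(struct ready_state *st)
{
	write_begin(st);
	atomic_store_explicit(&st->rdlist_nonempty, true, memory_order_relaxed);
	write_end(st);
}

/* A scan moves events off the ready list; both flags change together. */
static void scan_begin(struct ready_state *st)
{
	write_begin(st);
	atomic_store_explicit(&st->scanning, true, memory_order_relaxed);
	atomic_store_explicit(&st->rdlist_nonempty, false, memory_order_relaxed);
	write_end(st);
}

static void scan_end(struct ready_state *st, bool events_left)
{
	write_begin(st);
	atomic_store_explicit(&st->rdlist_nonempty, events_left, memory_order_relaxed);
	atomic_store_explicit(&st->scanning, false, memory_order_relaxed);
	write_end(st);
}

/* Lockless reader: retry until both flags come from one snapshot. */
static bool events_available(struct ready_state *st)
{
	unsigned s1, s2;
	bool rd, scan;

	do {
		s1 = atomic_load_explicit(&st->seq, memory_order_acquire);
		rd = atomic_load_explicit(&st->rdlist_nonempty, memory_order_relaxed);
		scan = atomic_load_explicit(&st->scanning, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		s2 = atomic_load_explicit(&st->seq, memory_order_relaxed);
	} while ((s1 & 1) || s1 != s2);
	return rd || scan;
}
```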
Linus Torvalds [Sun, 14 Jun 2026 22:06:08 +0000 (03:36 +0530)]
Merge tag 'vfs-7.2-rc1.bh' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull buffer_head updates from Christian Brauner:
"This removes b_end_io from struct buffer_head.
Instead of setting bio->bi_end_io to end_bio_bh_io_sync() which then
calls bh->b_end_io(), the new bh_submit() and __bh_submit() interfaces
set bio->bi_end_io to the appropriate completion handler directly,
replacing two indirect function calls in the completion path with one.
It also removes a function pointer from the middle of a writable data
structure where it could be corrupted, shrinks struct buffer_head from
104 to 96 bytes (allowing roughly 7% more buffer_heads to be cached in
the same amount of memory), and removes some atomic operations because
the buffer refcount is no longer incremented before calling the end_io
handler.
All in-tree users (fs/buffer.c itself, ext4, jbd2, ocfs2, gfs2,
nilfs2, and md-bitmap) are converted, and submit_bh(),
mark_buffer_async_write(), and end_buffer_write_sync() are removed"
* tag 'vfs-7.2-rc1.bh' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (34 commits)
buffer: Remove end_buffer_write_sync()
buffer: Change calling convention for end_buffer_read_sync()
buffer: Remove b_end_io
buffer: Remove submit_bh()
md-bitmap: Convert read_file_page and write_file_page to bh_submit()
nilfs2: Convert nilfs_mdt_submit_block to bh_submit()
nilfs2: Convert nilfs_gccache_submit_read_data to bh_submit()
nilfs2: Convert nilfs_btnode_submit_block to bh_submit()
buffer: Remove mark_buffer_async_write()
gfs2: Convert gfs2_aspace_write_folio to bh_submit()
gfs2: Remove use of b_end_io in gfs2_meta_read_endio()
gfs2: Convert gfs2_dir_readahead to bh_submit()
gfs2: Convert gfs2_metapath_ra to bh_submit()
ocfs2: Convert ocfs2_write_super_or_backup to bh_submit()
ocfs2: Convert ocfs2_read_blocks to bh_submit()
ocfs2: Convert ocfs2_read_block to bh_submit()
ocfs2: Convert ocfs2_write_block to bh_submit()
jbd2: Convert jbd2_write_superblock() to bh_submit()
jbd2: Convert journal commit to bh_submit()
ext4: Convert ext4_commit_super() to bh_submit()
...
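The completion-path change can be illustrated with a toy userspace model that counts indirect calls. Every type and function here is hypothetical. Only the roles of end_bio_bh_io_sync() and b_end_io come from the text, and the model ignores refcounting, locking and the real bio API.

```c
#include <stddef.h>

struct buffer_head_model {
	int uptodate;
	/* Old layout only: per-buffer completion callback. */
	void (*b_end_io)(struct buffer_head_model *bh, int uptodate);
};

struct bio_model {
	void (*bi_end_io)(struct bio_model *bio);
	struct buffer_head_model *bi_private;
	int status;              /* 0 on success */
	unsigned indirect_calls; /* instrumentation for the sketch */
};

/* Block layer side: always one indirect call through bio->bi_end_io. */
static void bio_endio_model(struct bio_model *bio)
{
	bio->indirect_calls++;
	bio->bi_end_io(bio);
}

/* Old scheme: a generic trampoline forwards to bh->b_end_io. */
static void old_end_bio_bh_io_sync(struct bio_model *bio)
{
	struct buffer_head_model *bh = bio->bi_private;

	bio->indirect_calls++; /* second indirect call */
	bh->b_end_io(bh, bio->status == 0);
}

static void old_end_read(struct buffer_head_model *bh, int uptodate)
{
	bh->uptodate = uptodate;
}

/* New scheme: the specific handler is installed on the bio directly. */
static void new_end_read(struct bio_model *bio)
{
	bio->bi_private->uptodate = bio->status == 0;
}
```

In the model, the old path makes two indirect calls per completion and the new path makes one. Dropping the b_end_io member is also what shrinks the structure.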
Marco Crivellari [Thu, 14 May 2026 15:01:22 +0000 (17:01 +0200)]
drm/bridge: anx7625: Add WQ_PERCPU add to alloc_workqueue
This continues the effort to refactor workqueue APIs, which began with
the introduction of new workqueues and a new alloc_workqueue flag in:
commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")
The refactoring is going to alter the default behavior of
alloc_workqueue() to be unbound by default.
With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU. For more details see the Link tag below.
In order to keep alloc_workqueue() behavior identical, explicitly request
WQ_PERCPU.
Link: https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
wuyankun [Thu, 11 Jun 2026 01:55:45 +0000 (09:55 +0800)]
wifi: ath6kl: fix invalid workqueue flags in ath6kl_usb_create()
ath6kl_usb_create() currently creates ath6kl_wq with flags set to 0:
alloc_workqueue("ath6kl_wq", 0, 0)
This triggers a runtime warning in __alloc_workqueue() because the queue is
created with neither WQ_PERCPU nor WQ_UNBOUND set:
workqueue: ath6kl_wq is using neither WQ_PERCPU or WQ_UNBOUND.
Setting WQ_PERCPU.
Set WQ_PERCPU explicitly to match the actual execution model and remove the
warning during device probe. No functional change intended.
Fixes: 21c05ca88a54 ("workqueue: Add warnings and ensure one among WQ_PERCPU or WQ_UNBOUND is present")
Reported-by: syzbot+f80c62f371ba6a1e7d79@syzkaller.appspotmail.com
Link: https://lore.kernel.org/all/6a289c01.39669fcc.33b062.00aa.GAE@google.com/T/
Signed-off-by: wuyankun <wuyankun@uniontech.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
btrfs: Drop WQ_PERCPU from ordered_flags in btrfs_init_workqueues()
After commit 21c05ca88a54 ("workqueue: Add warnings and ensure one among
WQ_PERCPU or WQ_UNBOUND is present"), there is a warning from the
btrfs-qgroup-rescan workqueue at run time:
workqueue: btrfs-qgroup-rescan uses both WQ_PERCPU and WQ_UNBOUND. Dropped WQ_PERCPU, keeping WQ_UNBOUND.
WQ_PERCPU is included in ordered_flags after commit 69635d7f4b34 ("fs:
WQ_PERCPU added to alloc_workqueue users") and WQ_UNBOUND is set in
alloc_ordered_workqueue(), which btrfs_alloc_ordered_workqueue() calls.
Drop WQ_PERCPU from ordered_flags, as alloc_ordered_workqueue() notes
that only WQ_FREEZABLE and WQ_MEM_RECLAIM are meaningful.
Fixes: 69635d7f4b34 ("fs: WQ_PERCPU added to alloc_workqueue users")
Fixes: 21c05ca88a54 ("workqueue: Add warnings and ensure one among WQ_PERCPU or WQ_UNBOUND is present")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Breno Leitao <leitao@debian.org>
Acked-by: Marco Crivellari <marco.crivellari@suse.com>
Acked-by: David Sterba <dsterba@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
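Taken together, the ath6kl and btrfs warnings suggest how the allocation flags are normalized. If neither scope flag is set, WQ_PERCPU is added. If both are set, WQ_PERCPU is dropped. This sketch is inferred only from the two runtime messages quoted in those commits, not from __alloc_workqueue() itself, and the `MODEL_WQ_*` bit values are made up.

```c
#include <stdio.h>

/* Hypothetical bit values; the real WQ_* constants are not reproduced. */
enum {
	MODEL_WQ_UNBOUND = 1 << 1,
	MODEL_WQ_PERCPU  = 1 << 2,
};

static unsigned normalize_wq_flags(const char *name, unsigned flags)
{
	unsigned scope = flags & (MODEL_WQ_UNBOUND | MODEL_WQ_PERCPU);

	if (scope == 0) {
		/* The ath6kl case: alloc_workqueue(name, 0, 0). */
		fprintf(stderr, "workqueue: %s is using neither WQ_PERCPU or "
			"WQ_UNBOUND. Setting WQ_PERCPU.\n", name);
		return flags | MODEL_WQ_PERCPU;
	}
	if (scope == (MODEL_WQ_UNBOUND | MODEL_WQ_PERCPU)) {
		/* The btrfs case: ordered (unbound) queue plus WQ_PERCPU. */
		fprintf(stderr, "workqueue: %s uses both WQ_PERCPU and "
			"WQ_UNBOUND. Dropped WQ_PERCPU, keeping WQ_UNBOUND.\n", name);
		return flags & ~(unsigned)MODEL_WQ_PERCPU;
	}
	return flags; /* exactly one scope flag: nothing to fix */
}
```

Both fixes remove the warning at the source. ath6kl passes WQ_PERCPU explicitly, and btrfs stops adding it to flags that alloc_ordered_workqueue() combines with WQ_UNBOUND.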
Linus Torvalds [Sun, 14 Jun 2026 22:00:45 +0000 (03:30 +0530)]
Merge tag 'vfs-7.2-rc1.writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs writeback updates from Christian Brauner:
- Fix a race between cgroup_writeback_umount() and inode_switch_wbs()
When a container exits, a race between cgroup_writeback_umount() and
inode_switch_wbs()/cleanup_offline_cgwb() can trigger "VFS: Busy
inodes after unmount" followed by a use-after-free on percpu
counters.
There is a window between inode_prepare_wbs_switch() returning true
(having passed the SB_ACTIVE check and grabbed the inode) and the
subsequent wb_queue_isw() call: if cgroup_writeback_umount() observes
the global isw_nr_in_flight counter as non-zero but flush_workqueue()
finds nothing queued yet, it returns early - leaving a held inode
reference that blocks evict_inodes() and a later iput() that hits
freed percpu counters.
The race is closed by covering the window from
inode_prepare_wbs_switch() through wb_queue_isw() with an RCU
read-side critical section and synchronizing in the umount path.
On top of that the now-dead rcu_barrier() left over from the
queue_rcu_work() era is removed, and the global
synchronize_rcu()/flush_workqueue() pair is replaced with a per-sb
in-flight counter plus pin/unpin/drain helpers so umount no longer
serializes against switch activity on unrelated superblocks.
Under cgroup writeback churn on a 16 vCPU guest this takes umount
latency from ~92-138ms p50 down to ~5-8ms p50 and the cumulative cost
of cgroup_writeback_umount() from ~62ms to ~4us per call.
The initial race fix is kept separate and minimal so it backports
cleanly to stable trees that still queue switches via
queue_rcu_work().
- Improve write performance with RWF_DONTCACHE
Dirty DONTCACHE pages are now tracked per bdi_writeback so that the
writeback flusher can be kicked in a targeted fashion for
IOCB_DONTCACHE writes instead of relying on global writeback, and the
PG_dropbehind flag is preserved when a folio is split.
* tag 'vfs-7.2-rc1.writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
mm: kick writeback flusher for IOCB_DONTCACHE with targeted dirty tracking
mm: track DONTCACHE dirty pages per bdi_writeback
mm: preserve PG_dropbehind flag during folio split
writeback: use a per-sb counter to drain inode wb switches at umount
writeback: drop now-unnecessary rcu_barrier() in cgroup_writeback_umount()
writeback: fix race between cgroup_writeback_umount() and inode_switch_wbs()
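The per-sb drain described in the writeback pull can be pictured as a counter with pin, unpin and drain helpers, where umount waits only for switches on its own superblock. This is a hypothetical pthread-based model. The names are invented, and the kernel presumably uses its own atomic and wait primitives rather than a mutex and condition variable.

```c
#include <pthread.h>
#include <stddef.h>

struct sb_model {
	pthread_mutex_t lock;
	pthread_cond_t drained;
	unsigned long nr_switches; /* in-flight inode wb switches on this sb only */
};

static void sb_model_init(struct sb_model *sb)
{
	pthread_mutex_init(&sb->lock, NULL);
	pthread_cond_init(&sb->drained, NULL);
	sb->nr_switches = 0;
}

/* Taken when a switch is prepared, before it is queued. */
static void sb_switch_pin(struct sb_model *sb)
{
	pthread_mutex_lock(&sb->lock);
	sb->nr_switches++;
	pthread_mutex_unlock(&sb->lock);
}

/* Dropped when the switch completes; wakes a waiting umount at zero. */
static void sb_switch_unpin(struct sb_model *sb)
{
	pthread_mutex_lock(&sb->lock);
	if (--sb->nr_switches == 0)
		pthread_cond_broadcast(&sb->drained);
	pthread_mutex_unlock(&sb->lock);
}

/* umount side: wait for this superblock's switches only. */
static void sb_switch_drain(struct sb_model *sb)
{
	pthread_mutex_lock(&sb->lock);
	while (sb->nr_switches != 0)
		pthread_cond_wait(&sb->drained, &sb->lock);
	pthread_mutex_unlock(&sb->lock);
}
```

Because the counter is per superblock, draining one superblock never waits on switch activity elsewhere. The pull request credits this property for the umount latency improvement.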
Linus Torvalds [Sun, 14 Jun 2026 21:55:36 +0000 (03:25 +0530)]
Merge tag 'vfs-7.2-rc1.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs superblock updates from Christian Brauner:
"This retires sget().
CIFS plus the two ext4 KUnit tests (extents-test, mballoc-test) were
the last in-tree callers, and all three convert cleanly to sget_fc().
That lets sget() and its prototype come out, taking ~60 lines that
only existed to be kept in lockstep with sget_fc() on every
publish-path change"
* tag 'vfs-7.2-rc1.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: retire sget()
smb: client: convert cifs_smb3_do_mount() to sget_fc()
ext4: convert mballoc KUnit test to sget_fc()
ext4: convert extents KUnit test to sget_fc()
Linus Torvalds [Sun, 14 Jun 2026 21:41:05 +0000 (03:11 +0530)]
Merge tag 'vfs-7.2-rc1.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull openat2 updates from Christian Brauner:
"Features:
- Add O_EMPTYPATH to openat(2)/openat2(2). To get an operable file
descriptor from an O_PATH file descriptor it is possible to use
openat(fd, ".", O_DIRECTORY) for directories, but other file types
require going through open("/proc/<pid>/fd/<nr>") and thus depend
on a functioning procfs.
With O_EMPTYPATH an empty path string is accepted and LOOKUP_EMPTY
is set at path resolution time, allowing the file behind the file
descriptor to be reopened directly. Selftests are included.
- Add an OPENAT2_REGULAR flag for openat2(2) which refuses to open
anything but regular files with the new EFTYPE error code.
This implements the "ability to only open regular files" feature
requested by userspace via uapi-group.org and protects services
from being redirected to fifos, device nodes, and friends.
All atomic_open implementations were audited for OPENAT2_REGULAR
handling. Explicit checks were added to ceph, gfs2, nfs (v4), and
cifs/smb - these are the filesystems whose atomic_open can
encounter an existing non-regular file and would otherwise call
finish_open() on it or return a misleading error code.
The remaining implementations (9p, fuse, vboxsf, nfs v2/v3) only
call finish_open() on freshly created files and use
finish_no_open() for lookup hits, letting the VFS catch non-regular
files via the do_open() safety net.
Cleanups:
- Migrate the openat2 selftests to the kselftest harness and move
them under selftests/filesystems/. The tests were written in the
early days of selftests' TAP support and the modern kselftest
harness is much easier to follow and maintain. The contents of the
tests are unchanged and the new emptypath tests are ported on top.
- Make the LAST_XXX last-type constants private to fs/namei.c. The
only user outside of fs/namei.c was ksmbd which only needs to know
whether the last component is a regular one, so
vfs_path_parent_lookup() now performs the LAST_NORM check
internally. The ints are replaced with a dedicated enum last_type"
* tag 'vfs-7.2-rc1.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
vfs: replace ints with enum last_type for LAST_XXX
vfs: make LAST_XXX private to fs/namei.c
selftests: openat2: port emptypath_test to kselftest harness
kselftest/openat2: test for OPENAT2_REGULAR flag
openat2: new OPENAT2_REGULAR flag support
openat2: introduce EFTYPE error code
selftest: add tests for O_EMPTYPATH
vfs: add O_EMPTYPATH to openat(2)/openat2(2)
selftests: openat2: migrate to kselftest harness
selftests: openat2: switch from custom ARRAY_LEN to ARRAY_SIZE
selftests: openat2: move helpers to header
selftests: move openat2 tests to selftests/filesystems/
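A userspace helper for reopening the file behind a descriptor might try the new flag when the headers provide it and otherwise fall back to the procfs route the pull message describes. This is a sketch under assumptions. It assumes O_EMPTYPATH is used as `openat(fd, "", flags | O_EMPTYPATH)`, following the description above, and it does not assume any particular value for the constant. On systems whose headers lack it, only the /proc/self/fd path is compiled.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Reopen the file behind path_fd with new open flags (e.g. O_RDONLY). */
static int reopen_fd(int path_fd, int flags)
{
#ifdef O_EMPTYPATH
	/* Assumed usage of the new flag: empty path relative to path_fd. */
	return openat(path_fd, "", flags | O_EMPTYPATH);
#else
	/* Traditional route: requires a mounted, functioning procfs. */
	char proc_path[64];

	snprintf(proc_path, sizeof(proc_path), "/proc/self/fd/%d", path_fd);
	return open(proc_path, flags);
#endif
}
```

In practice `path_fd` would usually come from `open(path, O_PATH)`, which needs `_GNU_SOURCE` with glibc. The reopened descriptor refers to a new open file description with its own file offset.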
Linus Torvalds [Sun, 14 Jun 2026 21:35:50 +0000 (03:05 +0530)]
Merge tag 'kernel-7.2-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc kernel updates from Christian Brauner:
"Fixes
- rhashtable: give each instance its own lockdep class
syzbot reported a circular locking dependency between ht->mutex and
fs_reclaim via the simple_xattrs rhashtable being torn down during
inode eviction.
The predicted deadlock cannot occur: rhashtable_free_and_destroy()
cancels the deferred worker before taking ht->mutex and
acquisitions on distinct rhashtables are on distinct mutexes.
Lockdep flags a cycle anyway because every ht->mutex in the kernel
shared the single static lockdep class from
rhashtable_init_noprof().
The lockdep key is lifted to a per-call-site static key so every
rhashtable instance gets its own class.
- selftests/clone3: fix misuse of the libcap library interface in the
cap_checkpoint_restore test and remove unused variables
- selftests/pid_namespace: compute the pid_max test limits
dynamically instead of hardcoding values below the kernel-enforced
minimum of PIDS_PER_CPU_MIN * num_possible_cpus() which made the
tests fail on machines with many possible CPUs
- selftests: fix the Makefile TARGETS entry for nsfs which wasn't
adjusted when the tests moved under filesystems/
Cleanups
- ipc/sem.c: use unsigned int for nsops to match the declaration in
syscalls.h"
* tag 'kernel-7.2-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
selftests/clone3: remove unused variables
selftests/clone3: fix libcap interface usage
ipc/sem.c: use unsigned int for nsops
selftests: Fix Makefile target for nsfs
rhashtable: give each instance its own lockdep class
selftests/pid_namespace: compute pid_max test limits dynamically
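A per-call-site key works because a `static` variable declared inside a macro body is instantiated once per expansion. The kernel already uses this idiom in mutex_init(). Below is a userspace model with hypothetical names. It does not call rhashtable or lockdep and only shows how classes get distributed.

```c
/* Hypothetical stand-ins for struct lock_class_key and struct rhashtable. */
struct lock_class_key_model {
	char dummy;
};

struct table_model {
	const struct lock_class_key_model *mutex_class;
};

static void table_init_key(struct table_model *t,
			   const struct lock_class_key_model *key)
{
	t->mutex_class = key; /* the kernel would pass this to lockdep */
}

/* Each textual use of table_init() gets its own static key, so every call
 * site (not every call) ends up with a distinct lock class. */
#define table_init(t)                                        \
	do {                                                 \
		static struct lock_class_key_model table_key_; \
		table_init_key((t), &table_key_);              \
	} while (0)
```

Two tables initialized at different call sites get different classes. Tables created repeatedly from the same call site, for example in a loop, share one class.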
Linus Torvalds [Sun, 14 Jun 2026 21:30:58 +0000 (03:00 +0530)]
Merge tag 'kernel-7.2-rc1.task_exec_state' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull task_exec_state updates from Christian Brauner:
"This introduces a new per-task task_exec_state structure and relocates
the dumpable mode and the user namespace captured at execve() from
mm_struct onto it. It stays attached to the task for its full
lifetime.
__ptrace_may_access() and several /proc owner and visibility checks
need to consult two pieces of state for any observable task, including
zombies that have already gone through exit_mm(): the dumpable mode
and the user namespace captured at execve(). Both live on mm_struct
today, which exit_mm() clears from the task long before the task is
reaped. A reader that races with do_exit() observes task->mm == NULL
and either fails the check or falls back to init_user_ns - which
denies legitimate access to non-dumpable zombies that were running in
a nested user namespace.
mm_struct loses ->user_ns and the dumpability bits in ->flags.
MMF_DUMPABLE_BITS is reserved so the MMF_DUMP_FILTER_* layout exposed
via /proc/<pid>/coredump_filter stays stable. task->user_dumpable and
its exit_mm() snapshot are removed.
task_exec_state is the privilege domain established by an execve().
Within a thread group it is shared via refcount; across thread groups
each task has its own:
- Non-CLONE_VM clones (fork(), vfork() without CLONE_VM) allocate a
fresh exec_state inheriting the parent's dumpable mode and user_ns.
- execve() in the child allocates a fresh instance and installs it
under task_lock + exec_update_lock via task_exec_state_replace().
- Credential changes (setresuid, capset, ...) and
prctl(PR_SET_DUMPABLE) update dumpability on the current task's
exec_state, i.e., on the thread group's shared instance.
On top of this exec_mmap() no longer tears down the old mm while
holding exec_update_lock for writing and cred_guard_mutex. Neither
lock is needed for that: exec_update_lock only exists to make the mm
swap atomic with the later commit_creds() and all its readers operate
on the new mm; none looks at the detached old mm.
The cost was real: __mmput() runs exit_mmap() over the entire old
address space and can block in exit_aio() waiting for in-flight AIO,
so execve() of a large process blocked ptrace_attach() and every
exec_update_lock reader for the duration of the teardown.
The old mm is now stashed in bprm->old_mm and released from
setup_new_exec() after both locks are dropped, with a backstop in
free_bprm() for the error paths"
* tag 'kernel-7.2-rc1.task_exec_state' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
exec: free the old mm outside the exec locks
exec_state: relocate dumpable information
ptrace: add ptracer_access_allowed()
exec: introduce struct task_exec_state
sched/coredump: introduce enum task_dumpable
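The sharing rules for task_exec_state can be summarized in a small userspace model. Threads share one refcounted instance, fork() copies the dumpable mode and user_ns into a fresh one, and execve() replaces it. Everything here is hypothetical: the refcount is a plain int with no locking, the user namespace is an integer id, and the assumption that a fresh exec starts as "user dumpable" is for illustration only, since the real mode depends on credentials.

```c
#include <stdlib.h>

enum dumpable_model { DUMP_DISABLE = 0, DUMP_USER = 1, DUMP_ROOT = 2 };

struct exec_state_model {
	int refcount;
	enum dumpable_model dumpable;
	int user_ns_id; /* stand-in for the user namespace captured at execve() */
};

struct task_model {
	struct exec_state_model *exec_state;
};

static struct exec_state_model *exec_state_alloc(enum dumpable_model d, int ns)
{
	struct exec_state_model *es = malloc(sizeof(*es));

	if (!es)
		return NULL;
	es->refcount = 1;
	es->dumpable = d;
	es->user_ns_id = ns;
	return es;
}

static void exec_state_put(struct exec_state_model *es)
{
	if (es && --es->refcount == 0)
		free(es);
}

/* New thread in the same thread group: share the instance. */
static void clone_thread_model(struct task_model *child, const struct task_model *parent)
{
	child->exec_state = parent->exec_state;
	child->exec_state->refcount++;
}

/* fork(): fresh instance inheriting the parent's mode and user_ns. */
static int clone_fork_model(struct task_model *child, const struct task_model *parent)
{
	child->exec_state = exec_state_alloc(parent->exec_state->dumpable,
					     parent->exec_state->user_ns_id);
	return child->exec_state ? 0 : -1;
}

/* execve(): install a fresh instance and drop the old reference. */
static int exec_replace_model(struct task_model *t, int new_ns)
{
	struct exec_state_model *fresh = exec_state_alloc(DUMP_USER, new_ns);

	if (!fresh)
		return -1;
	exec_state_put(t->exec_state);
	t->exec_state = fresh;
	return 0;
}

/* prctl(PR_SET_DUMPABLE)-like update: affects the whole thread group. */
static void set_dumpable_model(struct task_model *t, enum dumpable_model d)
{
	t->exec_state->dumpable = d;
}
```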
Linus Torvalds [Sun, 14 Jun 2026 21:25:34 +0000 (02:55 +0530)]
Merge tag 'vfs-7.2-rc1.casefold' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs casefolding updates from Christian Brauner:
"This exposes the case folding behavior of local filesystems so that
file servers - nfsd, ksmbd, and user space file servers - can report
the actual behavior to clients instead of guessing.
Filesystems report case-insensitive and case-nonpreserving behavior
via new file_kattr flags in their fileattr_get implementations. fat,
exfat, ntfs3, hfs, hfsplus, xfs, cifs, nfs, vboxsf, and isofs are
wired up. Local filesystems that are not explicitly handled default to
the usual POSIX behavior of case-sensitive and case-preserving.
nfsd uses this to report case folding via NFSv3 PATHCONF and to
implement the NFSv4 FATTR4_CASE_INSENSITIVE and FATTR4_CASE_PRESERVING
attributes - both have been part of the NFS protocols for decades to
support clients on non-POSIX systems - and ksmbd reports it via
FS_ATTRIBUTE_INFORMATION. Exposing the information through the
fileattr uapi covers user space file servers.
The immediate motivation is interoperability: Windows NFS clients
hard-require servers to report case-insensitivity for Win32
applications to work correctly, and a client that knows the server is
case-insensitive can avoid issuing multiple LOOKUP/READDIR requests
searching for case variants.
The Linux NFS client already grew support for case-insensitive shares
years ago in support of the Hammerspace NFS server - negative dentry
caching must be disabled (a lookup for "FILE.TXT" failing must not
cache a negative entry when "file.txt" exists) and directory change
invalidation must drop cached case-folded name variants. Such servers
often operate in multi-protocol environments where a single file
service instance caters to both NFS and SMB clients, and nfsd needs to
report case folding properly to participate as a first-class citizen
there.
A follow-up series brings fixes for the initial work: the nfsd
case-info probe now uses kernel credentials, maps -ESTALE to
NFS3ERR_STALE, and has its cost capped across READDIR entries; the nfs
client avoids transiently zeroed case capability bits during the probe
and skips the pathconf probe when neither field is consumed; the
FS_CASEFOLD_FL semantics are clarified in the UAPI header; and the
tools UAPI headers are synced"
* tag 'vfs-7.2-rc1.casefold' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (22 commits)
nfsd: Cap case-folding probe cost across READDIR entries
nfsd: Map -ESTALE from case probe to NFS3ERR_STALE
nfsd: Use kernel credentials for case-info probe
fs: Clarify FS_CASEFOLD_FL semantics in UAPI header
nfs: Skip pathconf probe when neither field is consumed
nfs: Avoid transient zeroed case capability bits during probe
tools headers UAPI: Sync case-sensitivity flags from linux/fs.h
ksmbd: Report filesystem case sensitivity via FS_ATTRIBUTE_INFORMATION
nfsd: Implement NFSv4 FATTR4_CASE_INSENSITIVE and FATTR4_CASE_PRESERVING
nfsd: Report export case-folding via NFSv3 PATHCONF
isofs: Implement fileattr_get for case sensitivity
vboxsf: Implement fileattr_get for case sensitivity
nfs: Implement fileattr_get for case sensitivity
cifs: Implement fileattr_get for case sensitivity
xfs: Report case sensitivity in fileattr_get
hfsplus: Report case sensitivity in fileattr_get
hfs: Implement fileattr_get for case sensitivity
ntfs3: Implement fileattr_get for case sensitivity
exfat: Implement fileattr_get for case sensitivity
fat: Implement fileattr_get for case sensitivity
...
Linus Torvalds [Sun, 14 Jun 2026 21:20:44 +0000 (02:50 +0530)]
Merge tag 'vfs-7.2-rc1.directory.delegations' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs directory delegations from Christian Brauner:
"This contains the VFS prerequisites for supporting directory
delegations in nfsd via CB_NOTIFY callbacks.
The filelock core gains support for ignoring delegation breaks for
directory change events together with an inode_lease_ignore_mask()
helper, and fsnotify gains fsnotify_modify_mark_mask() and a
FSNOTIFY_EVENT_RENAME data type.
With this in place nfsd can request delegations on directories and set
up inotify watches to trigger sending CB_NOTIFY events to clients
instead of having every directory change break the delegation.
New tracepoints are added to fsnotify() and to the start of
break_lease(), and trace_break_lease_block() is passed the currently
blocking lease instead of the new one.
A follow-up fix moves the LEASE_BREAK_* flags out of
#ifdef CONFIG_FILE_LOCKING to fix the build for CONFIG_FILE_LOCKING=n
configurations"
* tag 'vfs-7.2-rc1.directory.delegations' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
filelock: move LEASE_BREAK_* flags out of #ifdef CONFIG_FILE_LOCKING
fsnotify: add FSNOTIFY_EVENT_RENAME data type
fsnotify: add fsnotify_modify_mark_mask()
fsnotify: new tracepoint in fsnotify()
filelock: add an inode_lease_ignore_mask helper
filelock: add a tracepoint to start of break_lease()
filelock: add support for ignoring deleg breaks for dir change events
filelock: pass current blocking lease to trace_break_lease_block() rather than "new_fl"
Linus Torvalds [Sun, 14 Jun 2026 21:14:23 +0000 (02:44 +0530)]
Merge tag 'vfs-7.2-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs inode updates from Christian Brauner:
"This extends the lockless ->i_count handling.
iput() could already decrement any value greater than one locklessly
but acquiring a reference always required taking inode->i_lock. Now
acquiring a reference is lockless as long as the count was already at
least 1, i.e., only the 0->1 and 1->0 transitions take the lock.
This avoids the lock for the common cases of nfs calling into the
inode hash and btrfs using igrab(). Cleanup-wise icount_read_once() is
added to line up with inode_state_read_once() and the open-coded
->i_count loads across the tree are converted, and ihold() is
relocated and tidied up.
On top of that some stale lock ordering annotations are retired from
the inode hash code: iunique() no longer takes the hash lock since the
inode hash became RCU-searchable and s_inode_list_lock is no longer
taken under the hash lock either"
* tag 'vfs-7.2-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: retire stale lock ordering annotations from inode hash
fs: allow lockless ->i_count bumps as long as it does not transition 0->1
fs: relocate and tidy up ihold()
fs: add icount_read_once() and stop open-coding ->i_count loads
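The new ->i_count rule is an "increment unless zero" fast path. A compare-and-swap loop bumps the count only while it is at least 1, and only the 0->1 transition falls back to the lock. The sketch below is a hypothetical userspace model with C11 atomics and a pthread mutex. It leaves out the inode state checks and the matching 1->0 handling in iput() that a real implementation needs.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inode model: only the reference count and its lock. */
struct inode_model {
	atomic_int i_count;
	pthread_mutex_t i_lock;
	unsigned slow_path_hits; /* instrumentation for the sketch */
};

/* Fast path: take a reference only if the count is already >= 1. */
static bool icount_inc_not_zero(struct inode_model *inode)
{
	int old = atomic_load_explicit(&inode->i_count, memory_order_relaxed);

	while (old >= 1) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak_explicit(&inode->i_count, &old,
							  old + 1,
							  memory_order_acq_rel,
							  memory_order_relaxed))
			return true;
	}
	return false;
}

static void inode_get_model(struct inode_model *inode)
{
	if (icount_inc_not_zero(inode))
		return;

	/* Slow path: the 0->1 transition is serialized by i_lock. */
	pthread_mutex_lock(&inode->i_lock);
	inode->slow_path_hits++;
	atomic_fetch_add_explicit(&inode->i_count, 1, memory_order_relaxed);
	pthread_mutex_unlock(&inode->i_lock);
}
```

Only the first reference takes the lock. Later references, such as repeated igrab()-style lookups, stay lockless.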
Linus Torvalds [Sun, 14 Jun 2026 21:08:54 +0000 (02:38 +0530)]
Merge tag 'vfs-7.2-rc1.exportfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull exportfs updates from Christian Brauner:
"This cleans up the exportfs support for block-style layouts that
provide direct block device access: the operations for layout-based
block device access are split out of struct export_operations into a
separate header, ->commit_blocks() no longer takes a struct iattr
argument, and the way support for layout-based block device access is
detected is reworked.
nfsd's blocklayout code also stops honoring loca_time_modify. This is
preparation for supporting export of more than a single device per
file system"
* tag 'vfs-7.2-rc1.exportfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
exportfs,nfsd: rework checking for layout-based block device access support
exportfs: don't pass struct iattr to ->commit_blocks
exportfs: split out the ops for layout-based block device access
nfsd/blocklayout: always ignore loca_time_modify
Linus Torvalds [Sun, 14 Jun 2026 21:04:37 +0000 (02:34 +0530)]
Merge tag 'vfs-7.2-rc1.kfunc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull bpf filesystem kfunc fix from Christian Brauner:
"The bpf_set_dentry_xattr() and bpf_remove_dentry_xattr() kfuncs locked
the inode of the supplied dentry without checking whether the dentry
is negative.
Passing a negative dentry (e.g., from security_inode_create) caused a
NULL pointer dereference. Negative dentries now fail with EINVAL. The
WARN_ON(!inode) in the bpf xattr permission helpers is dropped as well
since it could be triggered the same way, amounting to a denial of
service on systems with panic_on_warn enabled"
* tag 'vfs-7.2-rc1.kfunc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
bpf: fix crash in bpf_[set|remove]_dentry_xattr for negative dentries
Dan Carpenter [Thu, 11 Jun 2026 07:35:28 +0000 (10:35 +0300)]
smb/client: Fix error code in smb2_aead_req_alloc()
The "*num_sgs" variable is a u32 so "ERR_PTR(*num_sgs)" doesn't work.
We would have to do something similar to the previous line where it's
cast to int and then long. However, it's simpler to store the return in
an int ret variable.
This bug would eventually result in a crash when dereferencing the
invalid error pointer.
Fixes: d08089f649a0 ("cifs: Change the I/O paths to use an iterator rather than a page list")
Cc: stable@kernel.org
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
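The u32 problem is easy to reproduce in userspace on a 64-bit build. The block below uses local, simplified copies of the ERR_PTR()/IS_ERR() helpers modeled on include/linux/err.h. A negative errno stored in a u32 is zero-extended when it is converted to long, so the resulting pointer falls outside the top-4095 error range and IS_ERR() misses it. On common 32-bit compilers the conversion happens to wrap back to the negative value, which would hide the bug there.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified userspace copies of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
#define IS_ERR_VALUE(x) ((unsigned long)(x) >= (unsigned long)-MAX_ERRNO)

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline bool IS_ERR(const void *ptr)
{
	return IS_ERR_VALUE((unsigned long)ptr);
}

/* Buggy pattern: -12 stored in a u32 becomes 4294967284 as a 64-bit long. */
static void *err_from_u32(uint32_t num_sgs)
{
	return ERR_PTR(num_sgs);
}

/* Fixed pattern: keep the return value in an int, as the patch does. */
static void *err_from_int(int ret)
{
	return ERR_PTR(ret);
}
```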
Huiwen He [Mon, 8 Jun 2026 15:57:31 +0000 (23:57 +0800)]
smb/client: allow FS_IOC_SETFLAGS to clear compression
The CIFS FS_IOC_SETFLAGS path can set FS_COMPR_FL now, but it cannot
clear it again. This can be reproduced on a share backed by a filesystem
that supports compression, for example btrfs exported by Samba:
The final chattr -c fails with EOPNOTSUPP, and leaves the remote object
with the compressed attribute still set, because the client always sends
FSCTL_SET_COMPRESSION with COMPRESSION_FORMAT_DEFAULT. That is correct
for setting FS_COMPR_FL, but clearing FS_COMPR_FL requires sending
COMPRESSION_FORMAT_NONE.
Fix this by passing the requested compression state through the
set_compression operation. The SMB1 and SMB2 helpers no longer hard-code
COMPRESSION_FORMAT_DEFAULT.
When FS_COMPR_FL is set, send COMPRESSION_FORMAT_DEFAULT. When it is
cleared, send COMPRESSION_FORMAT_NONE. If the server accepts the request,
update the cached FILE_ATTRIBUTE_COMPRESSED bit under i_lock so
FS_IOC_GETFLAGS reports the new state.
Signed-off-by: Huiwen He <hehuiwen@kylinos.cn>
Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: Steve French <stfrench@microsoft.com>
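The mapping in this fix fits in two small functions. One picks the FSCTL_SET_COMPRESSION format from the requested flags, and the other mirrors the accepted state into the cached attributes. The constant values are assumptions that I believe match <linux/fs.h> (FS_COMPR_FL) and MS-FSCC (compression formats and FILE_ATTRIBUTE_COMPRESSED). The function names are hypothetical, and the real code updates the cached bit under i_lock.

```c
#include <stdint.h>

/* Assumed values: FS_COMPR_FL from <linux/fs.h>, the rest from MS-FSCC. */
#define FS_COMPR_FL                0x00000004
#define COMPRESSION_FORMAT_NONE    0x0000
#define COMPRESSION_FORMAT_DEFAULT 0x0001
#define FILE_ATTRIBUTE_COMPRESSED  0x00000800

/* Setting FS_COMPR_FL asks for DEFAULT; clearing it must send NONE. */
static uint16_t compression_format_for_flags(unsigned int fs_flags)
{
	return (fs_flags & FS_COMPR_FL) ? COMPRESSION_FORMAT_DEFAULT
					: COMPRESSION_FORMAT_NONE;
}

/* After the server accepts the request, update the cached attribute. */
static uint32_t update_cached_attrs(uint32_t cifs_attrs, unsigned int fs_flags)
{
	if (fs_flags & FS_COMPR_FL)
		return cifs_attrs | FILE_ATTRIBUTE_COMPRESSED;
	return cifs_attrs & ~(uint32_t)FILE_ATTRIBUTE_COMPRESSED;
}
```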
Huiwen He [Mon, 8 Jun 2026 15:57:30 +0000 (23:57 +0800)]
smb/client: use writable handle for FS_IOC_SETFLAGS compression
Setting the compressed flag on a CIFS mount can fail with -EACCES:
[compress_share]
vfs objects = btrfs
$ touch test.bin
$ chattr +c test.bin
chattr: Permission denied while setting flags on test.bin
This can be reproduced against a Samba share backed by a filesystem that
supports compression, such as btrfs.
FS_IOC_SETFLAGS is issued on the file handle opened by userspace. chattr
opens the target read-only before setting FS_COMPR_FL, so the SMB client
currently sends FSCTL_SET_COMPRESSION on a handle that may not have
FILE_WRITE_DATA access. Samba requires FILE_WRITE_DATA for
FSCTL_SET_COMPRESSION and rejects the request.
Use the current handle only if it already has FILE_WRITE_DATA. Otherwise
try an existing writable handle for the inode. If none is available, open
a temporary FILE_WRITE_DATA handle for the compression request.
After FSCTL_SET_COMPRESSION succeeds, update the cached compressed
attribute immediately, matching how smb2_set_sparse() updates
FILE_ATTRIBUTE_SPARSE_FILE after a successful FSCTL_SET_SPARSE.
Signed-off-by: Huiwen He <hehuiwen@kylinos.cn>
Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: Steve French <stfrench@microsoft.com>
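The handle-selection order in the writable-handle fix is a three-step fallback. This is a hypothetical model: `struct handle_model` stands in for the client's open-file records, and only FILE_WRITE_DATA (0x00000002 in the SMB2 access mask) is taken from the protocol.

```c
#include <stddef.h>
#include <stdint.h>

#define FILE_WRITE_DATA 0x00000002 /* SMB2 access mask bit */

/* Hypothetical handle record; the real client tracks open files itself. */
struct handle_model {
	uint32_t access;
	int id;
};

enum handle_source { USE_CURRENT, USE_EXISTING_WRITABLE, OPEN_TEMPORARY };

static enum handle_source
pick_compression_handle(const struct handle_model *cur,
			const struct handle_model *open_handles, size_t n,
			const struct handle_model **out)
{
	/* 1. The handle the ioctl came in on, if it can write data. */
	if (cur->access & FILE_WRITE_DATA) {
		*out = cur;
		return USE_CURRENT;
	}
	/* 2. Any other writable handle already open on the inode. */
	for (size_t i = 0; i < n; i++) {
		if (open_handles[i].access & FILE_WRITE_DATA) {
			*out = &open_handles[i];
			return USE_EXISTING_WRITABLE;
		}
	}
	/* 3. Caller opens a temporary FILE_WRITE_DATA handle and closes it. */
	*out = NULL;
	return OPEN_TEMPORARY;
}
```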
The lsattr reproducer depends on the previous contents of its userspace
buffer, so it may not reproduce on every setup. A deterministic
reproducer is to initialize the ioctl argument before FS_IOC_GETFLAGS
on a file without the compressed attribute:
int flags = 0x7fffffff;
ioctl(fd, FS_IOC_GETFLAGS, &flags);
On an affected kernel, flags remains 0x7fffffff. With the fix, it is
set to 0.
This happens because when the cached inode does not have the compressed
bit set, the CIFS fallback path in FS_IOC_GETFLAGS returns success
without calling put_user() to write the zero flags value into the user
buffer. As a result, the caller observes stale contents from its own
buffer.
Fix this by always writing the visible flags value back to the user
buffer before returning success, even when the value is zero.
Fixes: 64a5cfa6db94 ("Allow setting per-file compression via SMB2/3")
Signed-off-by: Huiwen He <hehuiwen@kylinos.cn>
Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: Steve French <stfrench@microsoft.com>
Huiwen He [Fri, 5 Jun 2026 16:35:17 +0000 (00:35 +0800)]
smb/client: update i_blocks after contiguous writes
When a lease allows CIFS to use cached inode attributes, getattr may
return the locally cached attributes instead of revalidating them from
the server. After local writes extend a file, the write path updates the
file size, but i_blocks can remain based on the old allocation size.
For example, while the file is still open after two contiguous writes,
the local block count can remain smaller than the written range:
after first write: st_size = 4096, st_blocks = 7
after second write: st_size = 12288, st_blocks = 21
after close: st_size = 12288, st_blocks = 24
This can make a fully written file look sparse:
i_blocks * 512 < i_size
and can cause swap activation to reject a valid write-created swapfile
as having holes. This results in xfstests skipping swap-related tests
on CIFS mounts:
generic/472 [not run] swapfiles are not supported
generic/494 [not run] swapfiles are not supported
generic/497 [not run] swapfiles are not supported
generic/569 [not run] swapfiles are not supported
generic/636 [not run] swapfiles are not supported
generic/643 [not run] swapfiles are not supported
Update the local i_blocks estimate after successful writes, but only
when the write starts at or before the currently known allocated range.
This lets sequential writes grow i_blocks while avoiding treating
write-past-EOF holes as allocated.
Skip the local estimate for files that are already marked sparse, since
their allocation needs to come from the server rather than from a
contiguous-write estimate.
Signed-off-by: Huiwen He <hehuiwen@kylinos.cn>
Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: Steve French <stfrench@microsoft.com>
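The i_blocks rule above can be written as a small update function. This is a hypothetical sketch. The struct, the helper names and the round-up-to-512-bytes choice are assumptions, and only the conditions (write starts at or before the known allocation, file not already sparse) come from the commit message. The `i_blocks * 512 < i_size` test is the "looks sparse" condition the message quotes.

```c
#include <stdbool.h>
#include <stdint.h>

struct cifs_inode_model {
	uint64_t i_size;
	uint64_t i_blocks; /* 512-byte units */
	bool sparse;       /* FILE_ATTRIBUTE_SPARSE_FILE already set */
};

static uint64_t div_round_up_512(uint64_t bytes)
{
	return (bytes + 511) / 512;
}

static void note_write(struct cifs_inode_model *ci, uint64_t pos, uint64_t len)
{
	uint64_t end = pos + len;
	uint64_t allocated = ci->i_blocks * 512;

	if (end > ci->i_size)
		ci->i_size = end;
	if (ci->sparse)
		return; /* allocation must come from the server */
	if (pos > allocated)
		return; /* write past the allocation: leave the hole uncounted */
	if (end > allocated)
		ci->i_blocks = div_round_up_512(end);
}

static bool looks_sparse(const struct cifs_inode_model *ci)
{
	return ci->i_blocks * 512 < ci->i_size;
}
```

Two contiguous writes of 4096 and 8192 bytes give 24 blocks for a 12288-byte file, so the file no longer looks sparse. A later write that starts past the allocation still leaves a real hole visible.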
Fredric Cover [Tue, 2 Jun 2026 00:55:10 +0000 (17:55 -0700)]
smb: client: fix races in cifsd thread creation
The cifsd demultiplex thread can run and access tcp_ses before the parent
thread has finished populating tcp_ses, which the worker thread accesses
locklessly.
Also, the kthread_run() macro may start the thread before it returns the
thread pointer. Because that pointer is stored in the structure the
thread reads, if the kernel is preempted after the thread is spawned but
before the thread pointer is stored, and the thread then attempts to
exit, it will sleep waiting for a SIGKILL signal.
Fix this by populating all of tcp_ses's fields first and starting the
thread last, using a split kthread_create()/wake_up_process() sequence.
Signed-off-by: Fredric Cover <fredric.cover.lkernel@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
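Userspace threads cannot be created in a stopped state, but a start gate gives the same ordering as kthread_create() followed by wake_up_process(). The creator populates every field, creates the thread, stores its handle, and only then releases it. This is a hypothetical pthread analogue of the fix, not cifs code. `port` stands in for the fields the demultiplex thread reads locklessly.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct server_model {
	int port;              /* stand-in for fields read without locks */
	pthread_t worker;      /* like the thread pointer stored in tcp_ses */
	pthread_t worker_seen; /* what the worker observed */
	int port_seen;
	pthread_mutex_t gate_lock;
	pthread_cond_t gate_cond;
	bool released;
};

static void *demux_thread_model(void *arg)
{
	struct server_model *s = arg;

	/* Stay parked until the creator has finished (the "wake up"). */
	pthread_mutex_lock(&s->gate_lock);
	while (!s->released)
		pthread_cond_wait(&s->gate_cond, &s->gate_lock);
	pthread_mutex_unlock(&s->gate_lock);

	s->port_seen = s->port;
	s->worker_seen = s->worker; /* safe: stored before the gate opened */
	return NULL;
}

static int server_start_model(struct server_model *s, int port)
{
	s->port = port; /* populate every field first */
	s->released = false;
	pthread_mutex_init(&s->gate_lock, NULL);
	pthread_cond_init(&s->gate_cond, NULL);

	/* "kthread_create": the thread exists but waits at the gate. */
	if (pthread_create(&s->worker, NULL, demux_thread_model, s) != 0)
		return -1;

	/* "wake_up_process": release it only after s->worker is set. */
	pthread_mutex_lock(&s->gate_lock);
	s->released = true;
	pthread_cond_signal(&s->gate_cond);
	pthread_mutex_unlock(&s->gate_lock);
	return 0;
}
```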
Qihang [Sun, 17 May 2026 08:25:27 +0000 (16:25 +0800)]
cifs: validate full SID length in security descriptors
parse_sid() only verified that the fixed SID header fit in the
returned security descriptor, but did not verify that the full SID
body described by num_subauth was present.
A malicious server can return a truncated owner or group SID whose
header lies within the descriptor buffer while sub_auth[] extends
past the end of the allocation, leading to an out-of-bounds read
when the client later parses or copies that SID.
Validate the full SID body in parse_sid(), centralize owner/group SID
lookup and bounds checking in sid_from_sd(), and use that validation
in parse_sec_desc(), build_sec_desc(), and copy_sec_desc() before
sub_auth[] is accessed.
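A minimal sketch of the full-length check, assuming the standard SID layout: 1-byte revision, 1-byte sub-authority count, 6-byte identifier authority, then num_subauth 32-bit sub-authorities. It also assumes the Windows limit of 15 sub-authorities. The helper name sid_fits() is hypothetical and is not the parse_sid()/sid_from_sd() code. Checking only the 8-byte header is the original bug; the body length must be derived from num_subauth and checked too:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SID_HEADER_LEN  8u  /* revision + num_subauth + 6-byte authority */
#define SID_MAX_SUBAUTH 15u /* Windows limit on sub-authorities */

/* True only if the whole SID at sid_off, including all num_subauth
 * 4-byte sub-authorities, lies inside a descriptor of sd_len bytes. */
static bool sid_fits(const uint8_t *sd, size_t sd_len, size_t sid_off)
{
	size_t full_len;
	uint8_t num_subauth;

	/* The fixed header must fit before num_subauth can be read. */
	if (sid_off > sd_len || sd_len - sid_off < SID_HEADER_LEN)
		return false;

	num_subauth = sd[sid_off + 1];
	if (num_subauth > SID_MAX_SUBAUTH)
		return false;

	/* The full body, not just the header, must fit. */
	full_len = SID_HEADER_LEN + (size_t)num_subauth * 4;
	return sd_len - sid_off >= full_len;
}
```

The subtractions are written as `sd_len - sid_off` only after `sid_off <= sd_len` is established, so the checks cannot wrap around on a hostile offset.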
Signed-off-by: Qihang <q.h.hack.winter@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
smb: client: resolve SWN tcon from live registrations
cifs_swn_notify() looks up a witness registration by id under
cifs_swnreg_idr_mutex, drops the mutex, and then uses the registration's
cached tcon pointer. That pointer is not a lifetime reference, and it is
not a stable representative once cifs_get_swn_reg() lets multiple tcons
for the same net/share name share one registration id.
A same-share second mount can keep the cifs_swn_reg alive after the first
tcon unregisters and is freed. The registration then still points at the
freed first tcon, so taking tc_lock or incrementing tc_count through
swnreg->tcon only moves the use-after-free earlier. Taking tc_lock while
holding cifs_swnreg_idr_mutex also violates the documented CIFS lock
order.
Fix this by making the registration store only the stable witness
identity: id, net name, share name, and notify flags. When a notify
arrives, copy that identity under cifs_swnreg_idr_mutex, drop the mutex,
then find and pin a live witness tcon that currently matches the net/share
pair under the normal cifs_tcp_ses_lock -> tc_lock order. The notification
path uses that pinned tcon directly and drops the reference when done.
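A simplified userspace model of this lookup pattern. All names are hypothetical, and the cifs_tcp_ses_lock -> tc_lock pair is collapsed into a single lock. The registration holds only a stable identity. The identity is copied out under the registry lock, that lock is dropped, and a live matching tcon is then found and pinned under the tcon lock, so no cached raw pointer is ever dereferenced:

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Stable witness identity only; no cached tcon pointer. */
struct reg {
	int id;
	char net[32];
	char share[32];
};

struct tcon {
	char net[32];
	char share[32];
	int refcount; /* 0 means already being torn down */
};

static pthread_mutex_t reg_lock = PTHREAD_MUTEX_INITIALIZER;  /* like cifs_swnreg_idr_mutex */
static pthread_mutex_t tcon_lock = PTHREAD_MUTEX_INITIALIZER; /* stands in for the tcon locks */

static struct reg regs[4];
static size_t nregs;
static struct tcon *tcons[4];
static size_t ntcons;

/* Step 1: copy the identity under the registry lock, then drop it. */
static int copy_identity(int id, struct reg *out)
{
	int ret = -1;

	pthread_mutex_lock(&reg_lock);
	for (size_t i = 0; i < nregs; i++) {
		if (regs[i].id == id) {
			*out = regs[i];
			ret = 0;
			break;
		}
	}
	pthread_mutex_unlock(&reg_lock);
	return ret;
}

/* Step 2: pick a live tcon matching net/share and take a reference. */
static struct tcon *pin_live_tcon(const struct reg *r)
{
	struct tcon *hit = NULL;

	pthread_mutex_lock(&tcon_lock);
	for (size_t i = 0; i < ntcons; i++) {
		struct tcon *t = tcons[i];

		if (t->refcount > 0 && !strcmp(t->net, r->net) &&
		    !strcmp(t->share, r->share)) {
			t->refcount++;
			hit = t;
			break;
		}
	}
	pthread_mutex_unlock(&tcon_lock);
	return hit;
}

/* Step 3: drop the reference once the notification is handled. */
static void unpin_tcon(struct tcon *t)
{
	pthread_mutex_lock(&tcon_lock);
	t->refcount--;
	pthread_mutex_unlock(&tcon_lock);
}
```

Because the two locks are never held together here, the sketch sidesteps the lock-order violation described above rather than modelling the exact kernel nesting.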
Registration and unregister messages now use the live tcon passed by the
caller instead of a cached tcon in the registration. The final unregister
send is folded into cifs_swn_unregister() while the registration is still
protected by cifs_swnreg_idr_mutex. This removes the previous
find/drop/reacquire raw-pointer window. The release path only removes the
idr entry and frees the stable identity strings.
This preserves the intended one-registration/many-tcon behavior: a
registration id represents a net/share pair, and notify handling acts on a
live representative selected at use time. It also preserves CLIENT_MOVE
ordering for the representative tcon because the old-IP unregister is sent
before cifs_swn_register() sends the new-IP register.
Fixes: fed979a7e082 ("cifs: Set witness notification handler for messages from userspace daemon") Cc: stable@vger.kernel.org Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Steve French <stfrench@microsoft.com>
This can be reproduced as follows:
unshare -n bash -c "
mkdir -p ${CIFS_MNT}
ip netns attach root 1
ip link add eth0 type veth peer veth0 netns root
ip link set eth0 up
ip -n root link set veth0 up
ip addr add 192.168.0.2/24 dev eth0
ip -n root addr add 192.168.0.1/24 dev veth0
ip route add default via 192.168.0.1 dev eth0
ip netns exec root sysctl net.ipv4.ip_forward=1
ip netns exec root iptables -t nat -A POSTROUTING -s 192.168.0.2 -o ${DEV} -j MASQUERADE
mount -t cifs ${CIFS_PATH} ${CIFS_MNT} -o vers=3.0,sec=ntlmssp,credentials=${CIFS_CRED},rsize=65536,wsize=65536,cache=none,echo_interval=1
touch ${CIFS_MNT}/a.txt
ip netns exec root iptables -t nat -D POSTROUTING -s 192.168.0.2 -o ${DEV} -j MASQUERADE
"
umount ${CIFS_MNT}
Fixes: 340cea84f691 ("cifs: open files should not hold ref on superblock") Signed-off-by: Jian Zhang <zhangjian496@huawei.com> Signed-off-by: Steve French <stfrench@microsoft.com>
smb: client: fix conflicting option validation for new mount API
Apply conflicting option validation consistently across all the new
mount API paths, for both mount and remount.
Some checks were only applied during initial mount validation, while
others were handled during option parsing, causing mount and
remount/reconfigure to behave differently.
Move the conflicting option checks into smb3_handle_conflicting_options()
and call it from the common validation paths, including for
multichannel/max_channels handling.
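A minimal sketch of the centralization idea: one function owns all cross-option checks, and both the mount and the reconfigure paths call it, so they cannot drift apart. The option struct, function names, and the conflict rule are hypothetical and only illustrate the structure. They are not the actual smb3_handle_conflicting_options() rules:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical option set. */
struct mount_opts {
	bool multichannel;
	unsigned int max_channels;
};

/* Single home for cross-option checks. */
static int handle_conflicting_options(const struct mount_opts *o)
{
	/* Illustrative rule only: more than one channel requested
	 * while multichannel is off is treated as a conflict. */
	if (!o->multichannel && o->max_channels > 1)
		return -EINVAL;
	return 0;
}

/* Both validation paths delegate to the same checker. */
static int validate_for_mount(const struct mount_opts *o)
{
	return handle_conflicting_options(o);
}

static int validate_for_reconfigure(const struct mount_opts *o)
{
	return handle_conflicting_options(o);
}
```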
Fixes: 24e0a1eff9e2 ("cifs: switch to new mount api") Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
Tingmao Wang [Fri, 12 Jun 2026 01:48:53 +0000 (02:48 +0100)]
selftests/landlock: Add tests for quiet flag with net rules
Tests that:
- Quiet flag works on network rules
- Quiet flag applied to unrelated ports has no effect
- Denied access not in quiet_access_net is still logged
This is not as thorough as the fs tests, but given the shared logic it
should be sufficient. There is also no "optional" access for network
rules.
Tingmao Wang [Fri, 12 Jun 2026 01:48:52 +0000 (02:48 +0100)]
selftests/landlock: Add tests for quiet flag with fs rules
Test various interactions of the quiet flag with filesystem rules:
- Non-optional access (tested with open and rename).
- Optional access (tested with truncate and ioctl).
- Behaviour around mounts matches that of normal Landlock rules.
- Behaviour around disconnected directories matches that of normal
  Landlock rules (test expected behaviour of 9a868cdbe66a ("landlock: Fix
  handling of disconnected directories") applied to the collected quiet
  flag).
- Multiple layers work as expected.