Linus Torvalds [Wed, 24 Jun 2026 01:36:41 +0000 (18:36 -0700)]
Merge tag 'nfs-for-7.2-1' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client updates from Anna Schumaker:
"New features:
- XPRTRDMA: Decouple req recycling from RPC completion
- NFS: Expose FMODE_NOWAIT for read-only files
Bugfixes:
- SUNRPC:
- Fix sunrpc sysfs error handling
- Fix uninitialized xprt_create_args structure
- XPRTRDMA:
- Harden connect and reply handling
- NFS:
- Fix EOF updates after fallocate/zero-range
- Keep PG_UPTODATE clear after read errors in page groups
- Use nfsi->rwsem to protect traversal of the file lock list
- Prevent resource leak in nfs_alloc_server()
- NFSv4:
- Clear exception state on successful mkdir retry
- Don't skip revalidate when holding a dir delegation and attrs are stale
- pNFS:
- Fix use-after-free in pnfs_update_layout()
- Defer return_range callbacks until after inode unlock
- Fix LAYOUTCOMMIT retry loop on OLD_STATEID
- Reject zero-length r_addr in nfs4_decode_mp_ds_addr
- NFS/flexfiles:
- Reject zero-length filehandle version arrays
- Fix checking if a layout is striped
- Fixes for honoring FF_FLAGS_NO_IO_THRU_MDS
Other cleanups and improvements:
- Remove the fileid field from struct nfs_inode
- Move long-delayed xprtrdma work onto the system_dfl_long_wq
- Convert xprtrdma send buffer free list to an llist
- Show "<redacted>" for cert_serial and privkey_serial mount options"
* tag 'nfs-for-7.2-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (42 commits)
NFS: Use common error handling code in nfs_alloc_server()
NFS: Prevent resource leak in nfs_alloc_server()
NFSv4/pNFS: reject zero-length r_addr in nfs4_decode_mp_ds_addr
nfs: don't skip revalidate on directory delegation when attrs flagged stale
xprtrdma: Return sendctx slot after Send preparation failure
xprtrdma: Repost Receive buffers for malformed replies
xprtrdma: Sanitize the reply credit grant after parsing
xprtrdma: Fix bcall rep leak and unbounded peek
xprtrdma: Resize reply buffers before reposting receives
xprtrdma: Check frwr_wp_create() during connect
xprtrdma: Initialize re_id before removal registration
xprtrdma: Fix ep kref imbalance on ADDR_CHANGE
xprtrdma: Convert send buffer free list to llist
NFS: correct CONFIG_NFS_V4 macro name in #endif comment
nfs: use nfsi->rwsem to protect traversal of the file lock list
NFSv4.1/pNFS: fix LAYOUTCOMMIT retry loop on OLD_STATEID
nfs: expose FMODE_NOWAIT for read-only files
nfs: add nowait version of nfs_start_io_direct
NFSv4/flexfiles: honor FF_FLAGS_NO_IO_THRU_MDS in pg_get_mirror_count_write
NFSv4/flexfiles: honor FF_FLAGS_NO_IO_THRU_MDS on fatal DS connect errors
...
Linus Torvalds [Wed, 24 Jun 2026 00:59:36 +0000 (17:59 -0700)]
Merge tag 'f2fs-for-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"The changes primarily focus on filesystem error reporting, reducing
memory footprint by reverting in-memory data structures used for
runtime validation, honoring FDP hints, and adding trace and debug
logs. In addition, there are critical bug fixes resolving
out-of-bounds read vulnerabilities in inline directory and ACL
handling, potential deadlocks in balance_fs, use-after-free issues in
atomic writes, and false data/node type assignments in large sections.
Enhancements:
- Revert in-memory sit version and block bitmaps
- support to report fserror
- add trace_f2fs_fault_report
- add iostat latency tracking for direct IO
- add logs in f2fs_disable_checkpoint()
- honor per-I/O write streams for direct writes
- map data writes to FDP streams
- skip inode folio lookup for cached overwrite
- skip direct I/O iostat context when disabled
- revert "check in-memory block bitmap"
- revert "check in-memory sit version bitmap"
Fixes:
- optimize representative type determination in GC
- fix incorrect FI_NO_EXTENT handling in __destroy_extent_node()
- fix potential deadlock in f2fs_balance_fs()
- fix potential deadlock in gc_merge path of f2fs_balance_fs()
- atomic: fix UAF issue on f2fs_inode_info.atomic_inode
- fix missing read bio submission on large folio error
- pass correct iostat type for single node writes
- fix to do sanity check on f2fs_get_node_folio_ra()
- validate orphan inode entry count
- keep atomic write retry from zeroing original data
- read COW data with the original inode during atomic write
- validate inline dentry name lengths before conversion
- validate dentry name length before lookup compares it
- reject setattr size changes on large folio files
- revert "remove non-uptodate folio from the page cache in move_data_block"
- validate ACL entry sizes in f2fs_acl_from_disk()
- bound i_inline_xattr_size for non-inline-xattr inodes
- fix listxattr handling of corrupted xattr entries
- fix to round down start offset of fallocate for pin file"
* tag 'f2fs-for-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (42 commits)
f2fs: fix to round down start offset of fallocate for pin file
f2fs: fix listxattr handling of corrupted xattr entries
f2fs: skip direct I/O iostat context when disabled
f2fs: remove unneeded f2fs_is_compressed_page()
f2fs: avoid unnecessary fscrypt_finalize_bounce_page()
f2fs: avoid unnecessary sanity check on ckpt_valid_blocks
f2fs: misc cleanup in f2fs_record_stop_reason()
f2fs: fix wrong description in printed log
f2fs: bound i_inline_xattr_size for non-inline-xattr inodes
f2fs: validate ACL entry sizes in f2fs_acl_from_disk()
Revert "f2fs: remove non-uptodate folio from the page cache in move_data_block"
f2fs: Split f2fs_write_end_io()
f2fs: Rename f2fs_post_read_wq into f2fs_wq
f2fs: Prepare for supporting delayed bio completion
f2fs: reject setattr size changes on large folio files
f2fs: validate dentry name length before lookup compares it
f2fs: validate inline dentry name lengths before conversion
f2fs: read COW data with the original inode during atomic write
f2fs: skip inode folio lookup for cached overwrite
f2fs: keep atomic write retry from zeroing original data
...
Linus Torvalds [Wed, 24 Jun 2026 00:16:31 +0000 (17:16 -0700)]
Merge tag 'x86-urgent-2026-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Ingo Molnar:
- Prevent NULL dereference on theoretical missing IO bitmap (Li
RongQing)
* tag 'x86-urgent-2026-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ioperm: Prevent NULL dereference on theoretical missing IO bitmap
Linus Torvalds [Tue, 23 Jun 2026 23:57:39 +0000 (16:57 -0700)]
Merge tag 'timers-urgent-2026-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc timer fixes from Ingo Molnar:
- Fix timekeeping locking order bug in the timekeeping init code
(Mikhail Gavrilov)
- Fix u64 multiplication bug in the posix-cpu-timers code on 32-bit
kernels (Zhan Xusheng)
- Fix macro name in comment block (Ethan Nelson-Moore)
- Fix off-by-one bug in the compat settimeofday() usecs validation code
(Wang Yan)
* tag 'timers-urgent-2026-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
time: Fix off-by-one in compat settimeofday() usec validation
hrtimer: Correct CONFIG_NO_HZ_COMMON macro name in comment
posix-cpu-timers: Use u64 multiplication in update_rlimit_cpu()
timekeeping: Register default clocksource before taking tk_core.lock
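The posix-cpu-timers entry above names a u64 multiplication fix on 32-bit kernels, but the patch itself is not reproduced in this log. The following is only an illustration of that general class of bug, assuming a seconds value held in a 32-bit unsigned long is multiplied by the nanoseconds-per-second constant before being widened. The function names and the sample limit are hypothetical.

```python
# Illustration only: models 32-bit vs 64-bit unsigned multiplication.
# The names below are hypothetical and are not taken from the kernel patch.
NSEC_PER_SEC = 1_000_000_000
U32_MASK = (1 << 32) - 1
U64_MASK = (1 << 64) - 1


def mul_32bit(seconds):
    """Multiply as a 32-bit unsigned long would: the product wraps modulo 2**32."""
    return (seconds * NSEC_PER_SEC) & U32_MASK


def mul_u64(seconds):
    """Widen to 64 bits before multiplying, so the full product is preserved."""
    return (seconds * NSEC_PER_SEC) & U64_MASK


if __name__ == "__main__":
    limit = 5  # a CPU time limit of 5 seconds (sample value)
    print("32-bit product:", mul_32bit(limit))
    print("64-bit product:", mul_u64(limit))
```

Any limit of five seconds or more overflows the 32-bit product in this model, which is why widening one operand before the multiplication matters.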
Linus Torvalds [Tue, 23 Jun 2026 23:43:24 +0000 (16:43 -0700)]
Merge tag 'smp-urgent-2026-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc CPU hotplug fixes from Ingo Molnar:
- Fix CPU hotplug error handling rollback bug (Bradley Morgan)
- Fix possible output OOB write bug in the sysfs hotplug states
printing code (Bradley Morgan)
* tag 'smp-urgent-2026-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
cpu: hotplug: Bound hotplug states sysfs output
cpu: hotplug: Preserve per instance callback errors
Linus Torvalds [Tue, 23 Jun 2026 23:15:53 +0000 (16:15 -0700)]
Merge tag 'locking-urgent-2026-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fix from Ingo Molnar:
- Fix the incorrect RCU protection in rt_spin_unlock() (Thomas
Gleixner)
* tag 'locking-urgent-2026-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/rt: Fix the incorrect RCU protection in rt_spin_unlock()
Linus Torvalds [Tue, 23 Jun 2026 23:05:54 +0000 (16:05 -0700)]
Merge tag 'core-urgent-2026-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc core fixes from Ingo Molnar:
- Fix an MM-CID race that can cause an OOB write (Rik van Riel)
- Fix a debugobjects OOM handling race (Thomas Gleixner)
* tag 'core-urgent-2026-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
debugobjects: Plug race against a concurrent OOM disable
sched/mmcid: Fix OOB clear_bit when CID is MM_CID_UNSET in fixup path
Linus Torvalds [Tue, 23 Jun 2026 23:02:03 +0000 (16:02 -0700)]
Merge tag 'irq-urgent-2026-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc irqchip driver fixes from Ingo Molnar:
- Fix indexing bug in the Crossbar irqchip driver (Bhargav Joshi)
- Fix a parent domain resource leak in the Crossbar irqchip driver
(Bhargav Joshi)
- Fix resource leak in the ImgTec PDC irqchip driver's exit logic
(Qingshuang Fu)
- Fix macro name in comment block (Ethan Nelson-Moore)
* tag 'irq-urgent-2026-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq/msi: Correct CONFIG_PCI_MSI_ARCH_FALLBACKS macro name in comment
irqchip/imgpdc: Fix resource leak, add missing chained handler cleanup on remove
irqchip/crossbar: Fix parent domain resource leak
irqchip/crossbar: Use correct index in crossbar_domain_free()
Linus Torvalds [Tue, 23 Jun 2026 22:51:14 +0000 (15:51 -0700)]
Merge tag 'dmaengine-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine
Pull dmaengine updates from Vinod Koul:
"Core:
- New devm_of_dma_controller_register() API
- Refactor devm_dma_request_chan() API
New Support:
- Loongson Multi-Channel DMA controller support
- Renesas RZ/{T2H,N2H} support
- Dw CV1800B DMA support
- Switchtec DMA engine driver
Updates:
- Xilinx AXI dma binding conversion
- Renesas CHCTRL register read updates
- AMD MDB Endpoint and non-LL mode Support
- AXI dma handling of SW and HW cyclic transfers termination
- Intel ioatdma and idxd driver updates"
* tag 'dmaengine-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: (62 commits)
dt-bindings: dma: snps,dw-axi-dmac: Add fallback compatible for CV1800B
MAINTAINERS: dmaengine/ti: Remove myself and add Vignesh as maintainer
dmaengine: qcom: Unify user-visible "Qualcomm" name
dt-bindings: dma: qcom,gpi: Document GPI DMA engine for Shikra SoC
dmaengine: qcom: hidma: use sysfs_emit() in sysfs show callbacks
dmaengine: dw-axi-dmac: fix PM for system sleep and channel alloc
dmaengine: dw-axi-dmac: drop redundant DMAC enable in block start
dmaengine: altera-msgdma: Use memcpy_toio for descriptor FIFO writes
dt-bindings: dma: fsl-edma: add dma-channel-mask property description
dmaengine: tegra: Fix burst size calculation
dmaengine: iop32x-adma: Remove a leftover header file
dmaengine: dma-axi-dmac: use DMA pool to manange DMA descriptor
dmaengine: dma-axi-dmac: Drop struct clk from main struct
dmaengine: dma-axi-dmac: Properly free struct axi_dmac_desc
dmaengine: Fix possible use after free
dmaengine: dw-edma: Add spinlock to protect DONE_INT_MASK and ABORT_INT_MASK
dmaengine: dw-edma-pcie: Reject devices without driver data
dmaengine: sh: rz-dmac: Add DMA ACK signal routing support
irqchip/renesas-rzv2h: Add DMA ACK signal routing support
dmaengine: dw-edma: Remove dw_edma_add_irq_mask()
...
Linus Torvalds [Tue, 23 Jun 2026 22:41:48 +0000 (15:41 -0700)]
Merge tag 'phy-for-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy
Pull phy updates from Vinod Koul:
"A bunch of new drivers, device support in existing drivers/bindings and
a few updates to existing drivers
New Support:
- Qualcomm Eliza QMP PHY, Eliza Synopsys eUSB2 support, Eliza PCIe
phy support, Nord QMP UFS PHY, IPQ5210 USB3 PHY support
- Econet EN751221 and EN7528 PCIe phy support
- NXPs TJA1145 CAN transceiver phy support
- TI DS125DF111 retimer phy support
- Rockchip RK3528 usb phy support
- TI J722S phy support
- Axiado eMMC PHY driver
- EyeQ5 Ethernet PHY driver
- Generic PHY driver for Lynx 10G SerDes
- Spacemit K3 USB2 PHY support
Updates:
- Tomi helping maintain zynqmp phys
- lynx phy updates to support 25GBASER
- Rockchip GRF for RK3568/RV1108 support
- Qualcomm QSERDES COM v2 support"
* tag 'phy-for-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy: (87 commits)
phy: rockchip: inno-usb2: Add missing clkout_ctl_phy kerneldoc
phy: Move MODULE_DEVICE_TABLE next to the table itself
phy: add basic support for NXPs TJA1145 CAN transceiver
dt-bindings: phy: add support for NXPs TJA1145 CAN transceiver
phy: freescale: phy-fsl-imx8qm-lvds-phy: Fix missing pm_runtime_disable() on probe error path
dt-bindings: phy: qcom,qmp-usb: Add ipq5210 USB3 PHY
dt-bindings: phy: qcom,qusb2: Document IPQ5210 compatible
phy: freescale: phy-fsl-imx8qm-lvds-phy: Use synchronous PM runtime put in reset
MAINTAINERS: expand Lynx 28G entry to cover Lynx 10G SerDes
phy: lynx-10g: new driver
dt-bindings: phy: lynx-10g: initial document
phy: lynx-28g: improve phy_validate() procedure
phy: lynx-28g: optimize read-modify-write operation
phy: lynx-28g: add support for big endian register maps
phy: lynx-28g: common probe() and remove()
phy: lynx-28g: make lynx_28g_pll_read_configuration() callable per PLL
phy: lynx-28g: move struct lynx_info definitions downwards
phy: lynx-28g: provide default lynx_lane_supports_mode() implementation
phy: lynx-28g: generalize protocol converter accessors
phy: lynx-28g: common lynx_pll_get()
...
Linus Torvalds [Tue, 23 Jun 2026 20:58:38 +0000 (13:58 -0700)]
Merge tag 'soundwire-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire
Pull soundwire updates from Vinod Koul:
- Improvements in handling of soundwire groups
- Additional checks flagged by various tools
- Intel driver updates for ghost Realtek device handling in firmware
and adding devices to wake lists
* tag 'soundwire-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire:
soundwire: dmi-quirks: Disable ghost Realtek devices
soundwire: only handle alert events when the peripheral is attached
soundwire: intel_ace2x: release bpt_stream when close it
soundwire: intel: Move suspend tracking from trigger to pm suspend
soundwire: intel_auxdevice: Add es9356 to wake_capable_list
soundwire: use krealloc_array to prevent integer overflow
soundwire: increase group->max_size after allocation
soundwire: fix bug in sdw_add_element_group_count found by syzkaller
soundwire: don't program SDW_SCP_BUSCLOCK_SCALE on a unattached Peripheral
soundwire: validate DT compatible before parsing it
soundwire: intel_auxdevice: Add cs42l43b to wake_capable_list
soundwire: stream: sdw_stream_remove_slave(): Check stream is valid
Linus Torvalds [Tue, 23 Jun 2026 20:36:09 +0000 (13:36 -0700)]
Merge tag 'sched_ext-for-7.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext tree reorg from Tejun Heo:
"Pure source reorganization with no functional change:
- the kernel/sched/ext* files move into a new kernel/sched/ext/
subdirectory
- the headers and sources are made self-contained so editor tooling
can parse each file on its own"
* tag 'sched_ext-for-7.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
sched_ext: Move shared helpers from ext.c into internal.h and cid.h
sched_ext: Make kernel/sched/ext/ sources self-contained for clangd
sched_ext: Move sources under kernel/sched/ext/
Provide khugepaged with the capability to collapse anonymous memory
regions to mTHPs
- "Remove CONFIG_READ_ONLY_THP_FOR_FS and enable file THP for writable
files" (Zi Yan)
Remove the READ_ONLY_THP_FOR_FS check in file_thp_enabled(), so that
khugepaged and MADV_COLLAPSE can run on filesystems with PMD THP
pagecache support even without READ_ONLY_THP_FOR_FS enabled
- "make MM selftests more CI friendly" (Mike Rapoport)
General fixes and cleanups to the MM selftests. Also move more MM
selftests under the kselftest framework, making them more amenable to
ongoing CI testing
- "selftests/mm: fix failures and robustness improvements" and
"selftests/mm: assorted fixes for hmm-tests" (Sayali Patil)
Fix several issues in MM selftests which were revealed by powerpc 64k
pagesize
* tag 'mm-stable-2026-06-23-08-55' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (118 commits)
Revert "mm: limit filemap_fault readahead to VMA boundaries"
mm/vmscan: pass NULL to trace vmscan node reclaim
mm: use mapping_mapped to simplify the code
selftests/mm: fix exclusive_cow test fork() handling
selftests/mm: remove hardcoded THP sizing assumptions in hmm tests
selftests/mm: allow PUD-level entries in compound testcase of hmm tests
mm/gup_test: reject wrapped user ranges
mm/page_frag: reject invalid CPUs in page_frag_test
mm/damon/core: always put unsuccessfully committed target pids
mm: page_isolation: avoid unsafe folio reads while scanning compound pages
mm/shrinker: do not hold RCU lock in shrinker_debugfs_count_show()
selftests: mm: fix and speedup "droppable" test
mm: merge writeout into pageout
MAINTAINERS: add Hao Ge as reviewer for codetag and alloc_tag
selftests/mm: clarify alternate unmapping in compaction_test
selftests/mm: move hwpoison setup into run_test() and silence modprobe output for memory-failure category
selftests/mm: skip uffd-stress test when nr_pages_per_cpu is zero
selftests/mm: skip uffd-wp-mremap if UFFD write-protect is unsupported
selftests/mm: ensure destination is hugetlb-backed in hugetlb-mremap
selftest/mm: register existing mapping with userfaultfd in hugetlb-mremap
...
Linus Torvalds [Tue, 23 Jun 2026 18:34:49 +0000 (11:34 -0700)]
Merge tag 'perf-tools-for-v7.2-1-2026-06-22' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools updates from Arnaldo Carvalho de Melo:
- Introduce 'perf inject --aslr' to remap ASLR-randomized addresses in
perf.data files, enabling reproducible analysis across runs with
different address space layouts
- Refactor evsel out of sample processing paths: store evsel in struct
perf_sample and remove the redundant evsel parameter from tool APIs,
tracepoint handlers, hist entry iterators, and db-export, simplifying
the entire tool callback chain
- Switch architecture detection from string-based perf_env__arch()
comparisons to the numeric ELF e_machine field across the codebase
(capstone, print_insn, c2c, lock-contention, sort, sample-raw,
machine, header), making cross-analysis more robust
- Overhaul ARM CoreSight ETM tests: add deterministic and named_threads
workloads, speed up basic and disassembly tests, add process
attribution and concurrent threads tests, remove unused workloads and
duplicate tests, queue context packets for the frontend decoder
- Add ARM SPE IMPDEF event decoding for Arm Neoverse N1, store MIDR in
arm_spe_pkt for per-CPU event mapping, handle missing CPU IDs
gracefully
- Refactor libunwind support: remove the libunwind-local backend, make
register reading cross-platform, add RISC-V libunwind support, allow
dynamic selection between libdw and libunwind unwinding at runtime
- Extensive hardening of perf.data parsing against crafted files: add
bounds checks and byte-swap validation for session records, feature
sections, header attributes, BPF metadata, auxtrace errors,
compressed events, CPU maps, build ID notes, and ELF program headers.
Add minimum event size validation and file offset diagnostics
- Fix libdw API contract violations across dwarf-aux, libdw,
probe-finder, annotate-data, and debuginfo subsystems. Fix callchain
parent update in ORDER_CALLER mode, support DWARF line 0 in inline
lists, handle multiple address spaces in callchains
- Fix numerous 'perf sched' bugs: thread reference leaks, memory leaks,
heap overflows with cross-machine recordings, NULL dereferences,
replace BUG_ON assertions with graceful error handling, bounds-check
CPU indices, fix SIGCHLD vs pause() races in sched stats
- Overhaul the build system: move BPF skeleton generation out of
Makefile.perf into bpf_skel.mak, decouple pmu-events from the prepare
target, make beauty generated C code standalone .o files, compile BPF
skeletons with -mcpu=v3, fix continuous rebuilds, various cleanups
- Add 'perf test' JUnit XML reporting with -j/--junit option, split
monolithic test suites into sub-tests, add summary reporting,
refactor parallel poll loop, fix test failures on musl-based systems
- Fix 'perf c2c' memory leaks in hist entry and format list handling,
use-after-free in error paths, bounds-check CPU and node IDs
- Fix 'perf bpf' metadata leaks on duplicate insert and alloc failure,
bounds-check array offsets, validate event sizes and func_info
fields, add NULL checks
- Fix symbols subsystem: bounds-check ELF and sysfs build ID note
iteration, validate p_filesz, fix 32-bit ELF bswap error, fix signed
overflow in size checks, bounds-check .gnu_debuglink section
- Fix tools lib api: null termination in filename__read_int/ull(),
uninitialized stack data in filename__write_int(), snprintf
truncation in mount_overload()
- Replace libbabeltrace with babeltrace2-ctf-writer for CTF conversion
in 'perf data'
- Add RISC-V SDT argument parsing for static tracepoints
- Add 'perf trace --show-cpu' option to display CPU id
- Add 'perf bench sched pipe --write-size' option
- Add a perf-specific .clang-format that overrides some kernel style
behaviors
- Update Intel vendor events for Alder Lake, Arrow Lake, Clearwater
Forest, Emerald Rapids, Granite Rapids, Grand Ridge, Lunar Lake,
Meteor Lake, Panther Lake, Sapphire Rapids, Sierra Forest
- Add IOMMU metrics for AMD and Intel
- Fix AMD event: switch l2_itlb_misses to
bp_l1_tlb_miss_l2_tlb_miss.all
- Add AMD IBS improvements: decode Streaming-store and Remote-Socket
flags, suppress bogus fields on Zen4+, skip privilege test on Zen6+
- Fix 'perf lock contention' SIGCHLD vs pause() race, allow 'mmap_lock'
in -L filter, enable end-timestamp for cgroup aggregation, fix
non-atomic data updates
- Fix 'perf stat' false NMI watchdog warning in aggregation modes,
bounds-check CPU index in topology callbacks, add aggr_nr metric
parser support for uncore scaling
- Fix 'perf timechart' memory leaks, CPU bounds checking,
use-after-free on corrupted callchains
- Add O_CLOEXEC to open() calls and use mkostemp() for temporary files
to prevent file descriptor leaks to child processes
- Fix s390 Python extension TEXTREL by compiling as PIC
- Fix build with ASAN for jitdump
- Fix build failure due to btf_vlen() return type change
* tag 'perf-tools-for-v7.2-1-2026-06-22' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (343 commits)
perf bpf: Fix up build failure due to change of btf_vlen() return type
perf dso: Set standard errno on decompression failure
perf bpf: Validate array presence before casting BPF prog info pointers
perf c2c: Fix hist entry and format list leaks in c2c_he_free()
perf c2c: Free format list entries when c2c_hists__init() fails
perf cs-etm: Bounds-check CPU in cs_etm__get_queue()
perf cs-etm: Require full global header in auxtrace_info size check
perf cs-etm: Validate num_cpu before metadata allocation
perf machine: Use snprintf() for guestmount path construction
perf machine: Propagate machine__init() error to callers
perf trace: Guard __probe_ip suppression with evsel__is_probe()
perf evsel: Add lazy-initialized probe type detection helpers
perf evsel: Add no-libtraceevent stubs for evsel__field() and evsel__common_field()
perf cs-etm: Reject CPU IDs that would overflow signed comparison
perf c2c: Free format list entries when releasing c2c hist entries
perf bpf: Bounds-check array offsets in bpil_offs_to_addr()
perf bpf: Reject oversized BPF metadata events that truncate header.size
perf bpf: Validate func_info_rec_size and sub_id in synthesize_bpf_prog_name()
perf sched: Replace (void*)1 sentinel with proper runtime allocation
perf hwmon: Fix fd check to accept fd 0 in hwmon_pmu__describe_items()
...
Linus Torvalds [Tue, 23 Jun 2026 15:31:33 +0000 (08:31 -0700)]
Merge tag 'platform-drivers-x86-v7.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver updates from Ilpo Järvinen:
- amd/hfi: Add support for dynamic ranking tables (version 3)
- amd/pmc:
- Add PMC driver support for AMD 1Ah M80H SoC
- Delay suspend for some Lenovo Laptops to avoid keyboard and lid
switch problems after s2idle
* tag 'platform-drivers-x86-v7.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (115 commits)
platform/x86/intel/pmc: Add NVL PCI IDs for SSRAM telemetry discovery
platform/x86/intel/pmc/ssram: Make PMT registration optional
platform/x86/intel/pmc/ssram: Add ACPI discovery scaffolding
platform/x86/intel/pmc/ssram: Switch to static array with per-index probe state
platform/x86/intel/pmc/ssram: Refactor DEVID/PWRMBASE extraction into helper
platform/x86/intel/pmc/ssram: Add PCI platform data
platform/x86/intel/pmc/ssram: Rename probe and PCI ID table for consistency
platform/x86/intel/pmc: Add ACPI PWRM telemetry driver for Nova Lake S
platform/x86/intel/pmc: Add PMC SSRAM Kconfig description
platform/x86/intel/pmt: Unify header fetch and add ACPI source
platform/x86/intel/pmt: Cache the telemetry discovery header
platform/x86/intel/pmt: Pass discovery index instead of resource
platform/x86/intel/pmt/telemetry: Move overlap check to post-decode hook
platform/x86/intel/pmt/crashlog: Split init into pre-decode
platform/x86/intel/pmt: Add pre/post decode hooks around header parsing
modpost: Handle malformed WMI GUID strings
platform/wmi: Make sysfs attributes const
platform/wmi: Make wmi_bus_class const
hwmon: (dell-smm) Use new buffer-based WMI API
platform/x86: dell-ddv: Use new buffer-based WMI API
...
Linus Torvalds [Tue, 23 Jun 2026 14:47:40 +0000 (07:47 -0700)]
Merge tag 'mailbox-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/jassibrar/mailbox
Pull mailbox updates from Jassi Brar:
"Core:
- add debugfs support for used channels
- fix resource leak on startup failure
- propagate tx error codes
- clarify blocking mode thread support
Drivers:
- exynos: remove unused register definitions
- imx: refactor IRQ handlers, migrate to devm helpers, and other
minor improvements
- mpfs: fix syscon presence check in inbox ISR
- mtk-adsp: fix use-after-free during device teardown
- qcom: add dt-bindings for QCOM Maili, Hawi, Shikra APCS, and Nord
CPUCP platform support"
* tag 'mailbox-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/jassibrar/mailbox: (23 commits)
mailbox: imx: Don't force-thread the primary handler
mailbox: imx: Move the RXDB part of the mailbox into the threaded handler
mailbox: imx: Move the RX part of the mailbox into the threaded handler
mailbox: imx: Start splitting the IRQ handler in primary and threaded handler
mailbox: imx: Use channel index instead of zero in imx_mu_specific_rx()
mailbox: imx: use devm_of_platform_populate()
mailbox: imx: Use devm_pm_runtime_enable()
mailbox: imx: Add a channel shutdown field
mailbox: imx: Forward the timeout/ error in imx_mu_generic_tx()
dt-bindings: mailbox: qcom: Add IPCC support for Maili Platform
mailbox: add list of used channels to debugfs
mailbox: don't free the channel if the startup callback failed
mailbox: Make mbox_send_message() return error code when tx fails
mailbox: Clarify multi-thread is not supported in blocking mode
mailbox: mtk-adsp: fix UAF during device teardown
mailbox: qcom: Unify user-visible "Qualcomm" name
mailbox: exynos: Drop unused register definitions
dt-bindings: mailbox: qcom: Add IPCC support for Hawi Platform
dt-bindings: mailbox: qcom,cpucp-mbox: Add Hawi compatible
dt-bindings: mailbox: qcom: Add Shikra APCS compatible
...
Linus Torvalds [Tue, 23 Jun 2026 14:39:49 +0000 (07:39 -0700)]
Merge tag 'for-next-tpm-7.2-rc1-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull tpm updates from Jarkko Sakkinen:
"Only bug fixes"
* tag 'for-next-tpm-7.2-rc1-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
tpm: fix event_size output in tpm1_binary_bios_measurements_show
tpm: tpm_crb_ffa: revert defered_probed when tpm_crb_ffa is built-in
tpm: tpm2-sessions: wait for async KPP completion in tpm_buf_append_salt
tpm: tpm_tis: Add settle time for some TPMs
tpm: tpm_tis: store entire did_vid
tpm_crb: Check ACPI_COMPANION() against NULL during probe
tpm: tpm_tis_spi: Use wait_woken() in wait_for_tmp_stat()
tpm: Initialize name_size_alg for non-NULL name in tpm_buf_append_name()
tpm: restore timeout for key creation commands
tpm: svsm: constify tpm_chip_ops
Linus Torvalds [Tue, 23 Jun 2026 14:35:37 +0000 (07:35 -0700)]
Merge tag 'linux_kselftest-next-7.2-rc1-second' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull more kselftest updates from Shuah Khan:
"Docs:
- remove obsolete wiki link from kselftest.rst
ftrace:
- drop invalid top-level local in test_ownership
- Fix trace_marker_raw test on 64K page kernels"
* tag 'linux_kselftest-next-7.2-rc1-second' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
docs: kselftest: remove link to obsolete wiki
selftests/ftrace: Fix trace_marker_raw test on 64K page kernels
selftests/ftrace: Drop invalid top-level local in test_ownership
Linus Torvalds [Tue, 23 Jun 2026 01:44:48 +0000 (18:44 -0700)]
Merge tag 'erofs-for-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs updates from Gao Xiang:
"The most notable change is the removal of the fscache backend: it has
been deprecated for almost two years, mainly because EROFS file-backed
mounts and fanotify pre-content hooks (together with erofs-utils) now
provide better functionality and simpler codebase. In addition,
fscache has depended on netfslib for years, which is undesirable for
EROFS since it is a local filesystem. More details in [1].
In addition, sparse support has been added to the pcluster layout,
which is helpful for large sparse AI datasets, and map requests for
chunk-based inodes have been optimized to be more efficient as well.
There are also the usual fixes and cleanups.
Summary:
- Report more consecutive chunks of the same type for
each iomap request
- Add sparse support for the pcluster layout
- Update the EROFS documentation overview
- Remove the deprecated fscache backend
- Various fixes and cleanups"
Link: https://lore.kernel.org/r/20260622013622.934174-1-hsiangkao@linux.alibaba.com
* tag 'erofs-for-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: handle 48-bit blocks_hi for compressed inodes
erofs: remove fscache backend entirely
erofs: simplify RCU read critical sections
erofs: add sparse support to pcluster layout
erofs: add folio order to trace_erofs_read_folio
erofs: introduce erofs_map_chunks()
erofs: call erofs_exit_ishare() before rcu_barrier()
erofs: update the overview of the documentation
erofs: clean up erofs_ishare_fill_inode()
Tejun Heo [Mon, 22 Jun 2026 17:29:39 +0000 (07:29 -1000)]
sched_ext: Move shared helpers from ext.c into internal.h and cid.h
idle.c and cid.c are included into build_policy.c together with ext.c and
use helpers that ext.c defines. Because the helpers live in ext.c, the two
files can not parse as standalone units and clangd reports errors in them.
Move the helpers to the headers they belong to. The op-dispatch macros and
helpers plus scx_parent() to internal.h, and scx_cpu_arg()/scx_cpu_ret() to
cid.h. No functional change. idle.c and cid.c now parse clean standalone.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Tejun Heo [Mon, 22 Jun 2026 17:29:39 +0000 (07:29 -1000)]
sched_ext: Make kernel/sched/ext/ sources self-contained for clangd
The sources under kernel/sched/ext/ build as a single translation unit:
build_policy.c includes the source files and headers. An LSP/clangd editor
parses each as a standalone unit, sees no types, and reports a flood of
errors.
Give each header its dependencies and include guard, and have each source
include the headers it uses.
ext.c, arena.c and the ext headers now parse clean standalone. idle.c and
cid.c still reference a few macros and helpers defined in ext.c. The next
patch moves those to shared headers.
Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Andrea Righi <arighi@nvidia.com>
Sunmin Jeong [Mon, 22 Jun 2026 05:28:17 +0000 (14:28 +0900)]
f2fs: fix to round down start offset of fallocate for pin file
Currently, the length of fallocate for a pinned file is section-aligned to
keep allocated sections from being selected as GC victims. However,
when the start offset of fallocate is not section-aligned, the allocated
sections can't be fully utilized, because a new section is allocated by
f2fs_allocate_pinning_section() after using blks_per_sec blocks regardless
of the start offset. As a result, several unexpected dirty segments may be
created, including blocks assigned to the pinned file.
To address this issue, round down the start offset of fallocate to a
multiple of the section length.
The issue can be reproduced as follows:
chunk=$(((2<<20)+4096)) # 2MB + 4KB
touch test
f2fs_io pinfile set test
f2fs_io fallocate 0 0 $chunk test
f2fs_io fallocate 0 $chunk $chunk test
f2fs_io fallocate 0 $((chunk*2)) $chunk test
f2fs_io fiemap 0 $((chunk*3)) test
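The alignment described above can be sketched in a few lines of C. This is only an illustration, not the code from the patch: SECTION_BYTES is an assumed 2 MiB section size, and round_down_pow2(), round_up_pow2() and align_pin_range() are hypothetical helper names.

```c
#include <stdint.h>

/* Assumed section size (2 MiB); real f2fs sections may be larger. */
#define SECTION_BYTES (2ULL << 20)

struct pin_range {
	uint64_t start;	/* section-aligned start offset */
	uint64_t len;	/* section-aligned length */
};

/* Round offset down to a multiple of align (align must be a power of two). */
static inline uint64_t round_down_pow2(uint64_t offset, uint64_t align)
{
	return offset & ~(align - 1);
}

/* Round offset up to a multiple of align (align must be a power of two). */
static inline uint64_t round_up_pow2(uint64_t offset, uint64_t align)
{
	return (offset + align - 1) & ~(align - 1);
}

/* Widen [offset, offset + len) so both ends sit on section boundaries. */
static inline struct pin_range align_pin_range(uint64_t offset, uint64_t len)
{
	struct pin_range r;

	r.start = round_down_pow2(offset, SECTION_BYTES);
	r.len = round_up_pow2(offset + len, SECTION_BYTES) - r.start;
	return r;
}
```

With the 2MB + 4KB chunk from the reproducer, the second fallocate call (offset = chunk, length = chunk) would map to a range starting at 2 MiB and spanning two full sections under these assumptions.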
Keshav Verma [Mon, 22 Jun 2026 15:14:21 +0000 (20:44 +0530)]
f2fs: fix listxattr handling of corrupted xattr entries
Validate the xattr entry before reading its fields in f2fs_listxattr().
Return -EFSCORRUPTED when the entry is outside the valid xattr storage
area instead of returning a successful partial result.
Fixes: 688078e7f36c ("f2fs: fix to avoid memory leakage in f2fs_listxattr") Cc: stable@kernel.org Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Keshav Verma <iganschel@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Wenjie Qi [Tue, 16 Jun 2026 03:06:55 +0000 (11:06 +0800)]
f2fs: skip direct I/O iostat context when disabled
F2FS iostat is optional and is disabled by default. Direct I/O still
allocates and binds a bio_iostat_ctx, updates the submit timestamp, and
replaces bi_end_io for every DIO bio even when sbi->iostat_enable is
false.
The byte accounting calls do not need an extra guard because
f2fs_update_iostat() already checks sbi->iostat_enable. Only skip the
DIO bio context setup when iostat is disabled. If iostat is enabled
through sysfs before submission, the existing context allocation and
latency accounting path is still used.
QEMU benchmark on a 1GiB F2FS virtio-blk image, with iostat_enable=0,
4KiB O_DIRECT I/O over a 64MiB file, 50000 iterations per run:
fscrypt_finalize_bounce_page() should be called only if fs-layer crypto
is in use, so avoid the unnecessary fscrypt_finalize_bounce_page() call in
the error path of f2fs_write_compressed_pages().
Note that fscrypt_finalize_bounce_page() checks the mapping of the bounce
page before retrieving the original page, so the previous code did not
cause any issue. Still, it is better to avoid coupling with any logic
inside fscrypt_finalize_bounce_page().
Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 15 Jun 2026 13:08:19 +0000 (21:08 +0800)]
f2fs: avoid unnecessary sanity check on ckpt_valid_blocks
The calculation of sec->ckpt_valid_blocks is the same in both
set_ckpt_valid_blocks() and sanity_check_valid_blocks(), so it
is not necessary to call sanity_check_valid_blocks() right after
set_ckpt_valid_blocks().
Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Bryam Vargas [Fri, 12 Jun 2026 04:00:36 +0000 (23:00 -0500)]
f2fs: bound i_inline_xattr_size for non-inline-xattr inodes
When the flexible_inline_xattr feature is enabled, do_read_inode() loads
the on-disk i_inline_xattr_size unconditionally:
if (f2fs_sb_has_flexible_inline_xattr(sbi))
fi->i_inline_xattr_size = le16_to_cpu(ri->i_inline_xattr_size);
but sanity_check_inode() only range-checks it when the inode also has the
FI_INLINE_XATTR flag set. An inode that carries an inline dentry or inline
data but not FI_INLINE_XATTR -- the normal layout for an inline
directory -- therefore keeps a fully attacker-controlled
i_inline_xattr_size from a crafted image.
get_inline_xattr_addrs() returns that value with no flag gating, so it
feeds the inode geometry:
A large i_inline_xattr_size drives MAX_INLINE_DATA() and NR_INLINE_DENTRY()
negative, so make_dentry_ptr_inline() sets d->max (int) to a negative
value. The inline directory walk then compares an unsigned long bit_pos
against that negative d->max, which is promoted to a huge unsigned bound,
and reads far past the inline area:
Mounting a crafted image and reading such a directory triggers an
out-of-bounds read in f2fs_fill_dentries(); the same underflow also
corrupts ADDRS_PER_INODE for regular files.
Validate i_inline_xattr_size against MAX_INLINE_XATTR_SIZE whenever the
flexible_inline_xattr feature is enabled -- i.e. whenever the value is
loaded from disk and consumed -- and keep the lower MIN_INLINE_XATTR_SIZE
bound gated on inodes that actually carry an inline xattr, so legitimate
inodes with i_inline_xattr_size == 0 are still accepted.
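The underflow and the proposed bounds can be illustrated with a small standalone sketch. All constants and function names below are hypothetical stand-ins (the real f2fs geometry and macros differ); the point is only to show how a negative int bound turns into a huge unsigned one, and how the two-sided check described above behaves.

```c
#include <stdbool.h>

/* Hypothetical geometry, not the real f2fs values. */
#define INLINE_AREA_SLOTS	100	/* slots available in the inode */
#define MAX_XATTR_SLOTS		50	/* stand-in for MAX_INLINE_XATTR_SIZE */
#define MIN_XATTR_SLOTS		6	/* stand-in for MIN_INLINE_XATTR_SIZE */

/* Signed result: negative once xattr_size exceeds the inline area. */
static inline int inline_dentry_max(int xattr_size)
{
	return INLINE_AREA_SLOTS - xattr_size;
}

/*
 * Mirrors the faulty comparison: the int bound is converted to
 * unsigned long, so a negative max becomes a huge positive bound.
 */
static inline bool walk_would_continue(unsigned long bit_pos, int max)
{
	return bit_pos < (unsigned long)max;
}

/*
 * Upper bound applies to every inode; lower bound only to inodes that
 * actually carry an inline xattr, so a size of 0 stays legal otherwise.
 */
static inline bool xattr_size_valid(int xattr_size, bool has_inline_xattr)
{
	if (xattr_size > MAX_XATTR_SLOTS)
		return false;
	if (has_inline_xattr && xattr_size < MIN_XATTR_SLOTS)
		return false;
	return true;
}
```

In this sketch an oversized value is rejected by xattr_size_valid() before inline_dentry_max() can go negative, which is the ordering the fix relies on.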
Zhang Cen [Mon, 15 Jun 2026 07:19:54 +0000 (15:19 +0800)]
f2fs: validate ACL entry sizes in f2fs_acl_from_disk()
f2fs_acl_count() only validates the aggregate ACL xattr length. A
malformed ACL can still place ACL_USER or ACL_GROUP in a slot that only
contains struct f2fs_acl_entry_short bytes, and f2fs_acl_from_disk()
then reads entry->e_id before verifying that a full entry fits.
Require a short entry before reading e_tag and e_perm, and require a
full entry before reading e_id for ACL_USER and ACL_GROUP. Return
-EFSCORRUPTED from these new truncated-entry checks, while keeping the
pre-existing -EINVAL paths unchanged.
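A minimal userspace sketch of the two-step size check follows. It is not the f2fs code: parse_acl_entry() is a hypothetical helper, the entry layout (2-byte tag, 2-byte perm, optional 4-byte id) and the tag values are assumptions for illustration, and fields are read in host byte order for simplicity.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed tag values and entry sizes, for illustration only. */
#define SKETCH_ACL_USER		0x02
#define SKETCH_ACL_GROUP	0x08
#define SHORT_ENTRY_BYTES	4	/* e_tag + e_perm */
#define FULL_ENTRY_BYTES	8	/* e_tag + e_perm + e_id */

/*
 * Parse one entry from [p, end). Returns the number of bytes consumed,
 * or -1 if the entry is truncated. No field is read before the bytes
 * backing it are known to be inside the buffer.
 */
static inline int parse_acl_entry(const uint8_t *p, const uint8_t *end,
				  uint16_t *tag, uint16_t *perm, uint32_t *id)
{
	if ((size_t)(end - p) < SHORT_ENTRY_BYTES)
		return -1;
	memcpy(tag, p, sizeof(*tag));
	memcpy(perm, p + 2, sizeof(*perm));
	*id = 0;

	if (*tag == SKETCH_ACL_USER || *tag == SKETCH_ACL_GROUP) {
		/* Only these tags carry an id, so require a full entry. */
		if ((size_t)(end - p) < FULL_ENTRY_BYTES)
			return -1;
		memcpy(id, p + 4, sizeof(*id));
		return FULL_ENTRY_BYTES;
	}
	return SHORT_ENTRY_BYTES;
}
```

A caller would map the -1 result to whatever error the filesystem uses for corruption; the commit above uses -EFSCORRUPTED for that case.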
Validation reproduced this kernel report:
KASAN slab-out-of-bounds in __f2fs_get_acl+0x6fb/0x7e0
RIP: 0033:0x7f4b835ea7aa
The buggy address belongs to the object at ffff888114589960 which belongs
to the cache kmalloc-8 of size 8
The buggy address is located 0 bytes to the right of allocated 8-byte
region [ffff888114589960, ffff888114589968)
Read of size 4
Call trace:
dump_stack_lvl+0x66/0xa0 (?:?)
print_report+0xce/0x630 (?:?)
__f2fs_get_acl+0x6fb/0x7e0 (fs/f2fs/acl.c:169)
srso_alias_return_thunk+0x5/0xfbef5 (?:?)
__virt_addr_valid+0x224/0x430 (?:?)
kasan_report+0xe0/0x110 (?:?)
__f2fs_get_acl+0x5/0x7e0 (fs/f2fs/acl.c:169)
__get_acl+0x281/0x380 (?:?)
vfs_get_acl+0x10b/0x190 (?:?)
do_get_acl+0x2a/0x410 (?:?)
do_get_acl+0x9/0x410 (?:?)
do_getxattr+0xe8/0x260 (?:?)
filename_getxattr+0xd1/0x140 (?:?)
do_getname+0x2d/0x2d0 (?:?)
path_getxattrat+0x16c/0x200 (?:?)
lock_release+0xc8/0x290 (?:?)
cgroup_update_frozen+0x9d/0x320 (?:?)
lockdep_hardirqs_on_prepare+0xea/0x1a0 (?:?)
trace_hardirqs_on+0x1a/0x170 (?:?)
_raw_spin_unlock_irq+0x28/0x50 (?:?)
do_syscall_64+0x115/0x6a0 (arch/x86/entry/syscall_64.c:87)
entry_SYSCALL_64_after_hwframe+0x77/0x7f (?:?)
Kernel panics keep being reported, especially when the f2fs partition
is almost full. Investigation shows that an f2fs page was freed to the
buddy allocator without being deleted from the LRU. The root cause is
the race in [2], which was introduced by this commit.
Three processes race in this scenario; their main activities are
described below.
The changed code in move_data_block() lets the GC path evict the tail-end
folio from the page cache through folio_end_dropbehind(). Once
folio_unmap_invalidate() removes the folio from mapping->i_pages, the
page-cache references for all pages in the folio are dropped. The folio
is then kept alive only by temporary external references, which allows a
later split to operate on a folio whose subpages are no longer protected
by page-cache references.
After the page-cache references are gone, split_folio_to_order() can
split the big folio into individual pages and put the resulting subpages
back on the LRU. For tail pages beyond EOF, split removes them from the
page cache and drops their page-cache references. A tail page can then
remain on the LRU with PG_lru set while holding only the split caller's
temporary reference. When free_folio_and_swap_cache() drops that final
reference, the page enters the final folio_put() release path.
In parallel, folio_isolate_lru() can observe the same tail page with a
non-zero refcount and PG_lru set. It clears PG_lru before taking its own
reference. If this races with the final folio_put() from the split path,
__folio_put() sees PG_lru already cleared and skips lruvec_del_folio().
The page is then freed back to the allocator while its lru links are
still present in the LRU list. A later LRU operation on a neighboring
page detects the stale link and reports list corruption.
Wenjie Qi [Wed, 10 Jun 2026 14:37:35 +0000 (22:37 +0800)]
f2fs: reject setattr size changes on large folio files
F2FS large folios are only enabled for immutable non-compressed files.
Writable open and writable mmap reject such mappings, but truncate(2)
through f2fs_setattr() misses the same guard.
If FS_IMMUTABLE_FL is cleared while the inode is still cached, the mapping
can keep large-folio support and ATTR_SIZE can change i_size. Reject size
changes in that state.
Cc: stable@kernel.org Fixes: 05e65c14ea59 ("f2fs: support large folio for immutable non-compressed case") Signed-off-by: Wenjie Qi <qiwenjie@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Samuel Moelius [Wed, 3 Jun 2026 16:11:26 +0000 (16:11 +0000)]
f2fs: validate dentry name length before lookup compares it
The f2fs dentry lookup path can use the on-disk name length before
checking that the name fits in the dentry filename area. A corrupted
dentry can then make lookup read beyond the filename slots.
The bounds check needs to happen before any comparison that consumes
the name length from disk.
Reject dentries with invalid name lengths before comparing their names.
Assisted-by: Codex:gpt-5.5-cyber-preview Signed-off-by: Samuel Moelius <sam.moelius@trailofbits.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
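The kind of bounds check described in this commit can be sketched as follows. This is an assumption-laden illustration rather than the patch itself: name_len_fits() is a hypothetical helper, and SLOT_LEN is a stand-in for the per-slot filename size.

```c
#include <stdbool.h>
#include <stddef.h>

#define SLOT_LEN 8	/* assumed bytes per filename slot */

/*
 * True if a name of name_len bytes that starts at filename slot `slot`
 * fits inside an area of nr_slots slots. Meant to run before any code
 * compares or copies name_len bytes.
 */
static inline bool name_len_fits(size_t slot, size_t name_len, size_t nr_slots)
{
	size_t slots_needed;

	if (name_len == 0 || slot >= nr_slots)
		return false;
	slots_needed = (name_len + SLOT_LEN - 1) / SLOT_LEN;
	return slots_needed <= nr_slots - slot;
}
```

The subtraction nr_slots - slot cannot wrap because slot < nr_slots has already been checked, which keeps the comparison safe even for corrupted inputs.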
Samuel Moelius [Wed, 3 Jun 2026 15:11:40 +0000 (15:11 +0000)]
f2fs: validate inline dentry name lengths before conversion
Inline dentry conversion copies names out of the inline dentry area
before checking that each recorded name length fits in the available
filename slots.
A corrupted image can therefore make the conversion path read past
the inline filename storage while building the regular dentry block.
Validate each inline dentry name length against the inline filename
area before copying it.
Assisted-by: Codex:gpt-5.5-cyber-preview Signed-off-by: Samuel Moelius <samuel.moelius@trailofbits.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Mikhail Lobanov [Mon, 15 Jun 2026 11:36:13 +0000 (14:36 +0300)]
f2fs: read COW data with the original inode during atomic write
When updating an atomic-write file, f2fs_write_begin() may read the
previously written data back from the COW inode:
prepare_atomic_write_begin() locates the block in the COW inode and sets
use_cow, and the read bio is then built with the COW inode:
and f2fs_grab_read_bio() decides whether to schedule fs-layer decryption
(STEP_DECRYPT) for the bio based on that inode via
fscrypt_inode_uses_fs_layer_crypto().
However, the folio being filled belongs to the original inode
(folio->mapping->host == inode), and the data stored in the COW block was
encrypted (or left as plaintext) using the original inode's context, not
the COW inode's -- see f2fs_encrypt_one_page(), which keys off
fio->page->mapping->host. fscrypt_decrypt_pagecache_blocks() likewise
operates on folio->mapping->host.
The COW inode is created as a tmpfile in the parent directory and inherits
its encryption policy from there. With test_dummy_encryption the newly
created COW inode gets the dummy policy and becomes encrypted, while a
pre-existing regular file -- created before the policy applied, e.g.
already present in the on-disk image -- stays unencrypted. The read
path then sets STEP_DECRYPT based on the encrypted COW inode and calls
fscrypt_decrypt_pagecache_blocks() on a folio whose host (the unencrypted
original inode) has a NULL ->i_crypt_info, dereferencing it:
Oops: general protection fault, probably for non-canonical address ...
KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
RIP: 0010:fscrypt_decrypt_pagecache_blocks+0xa0/0x310
Workqueue: f2fs_post_read_wq f2fs_post_read_work
Call Trace:
fscrypt_decrypt_bio+0x1eb/0x340
f2fs_post_read_work+0xba/0x140
process_one_work+0x91c/0x1a40
worker_thread+0x677/0xe90
kthread+0x2bc/0x3a0
The COW inode is only needed to locate the on-disk block, and that block
address is already resolved into @blkaddr by prepare_atomic_write_begin()
via __find_data_block(cow_inode, ...); f2fs_submit_page_read() then reads
from that physical @blkaddr directly, so the inode argument only selects
the post-read crypto context, not which block is fetched. Reading with
@inode therefore returns the same (latest, not-yet-committed) COW data,
while making both the fs-layer decryption decision and the inline crypto
path use the correct (original inode's) key.
With the COW inode no longer used at the read site, the use_cow flag has no
remaining consumer; drop it from f2fs_write_begin() and
prepare_atomic_write_begin().
Fixes: 591fc34e1f98 ("f2fs: use cow inode data when updating atomic write") Cc: stable@vger.kernel.org Signed-off-by: Mikhail Lobanov <m.lobanov@rosa.ru> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Wenjie Qi [Fri, 29 May 2026 02:29:24 +0000 (10:29 +0800)]
f2fs: skip inode folio lookup for cached overwrite
prepare_write_begin() first gets the inode folio and builds a dnode,
then checks the read extent cache. For an ordinary overwrite of a
non-inline and non-compressed file, an extent-cache hit already gives the
data block address and the following path does not need to allocate or
update any node state.
Check the read extent cache before fetching the inode folio for that
narrow case. Keep the existing paths for inline data, compressed files,
and writes that may extend past EOF, where the helper may need inline
conversion, compression preparation, or block reservation.
This avoids a node-folio lookup in the buffered overwrite fast path when
the mapping is already cached.
In a QEMU/KASAN x86_64 VM, using a small buffered overwrite workload on
an existing 1MiB file, median time improved as follows:
Wenjie Qi [Wed, 27 May 2026 12:06:28 +0000 (20:06 +0800)]
f2fs: keep atomic write retry from zeroing original data
A partial atomic write reserves a block in the COW inode before reading the
original data page for the untouched bytes in that page.
If that read fails, write_begin returns an error but leaves the COW inode
entry as NEW_ADDR. A retry of the same partial write then finds the COW
entry, treats it as existing COW data, and f2fs_write_begin() zeroes the
whole folio because blkaddr is NEW_ADDR.
If the retry is committed, the bytes outside the retried write range are
committed as zeroes instead of preserving the original file contents.
Only use the COW inode as the read source when it already has a real data
block. If the COW entry is still NEW_ADDR, treat it as a reservation to
reuse: keep reading the old data from the original inode and avoid
reserving or accounting the same atomic block again.
Cc: stable@kernel.org Fixes: 3db1de0e582c ("f2fs: change the current atomic write way") Signed-off-by: Wenjie Qi <qiwenjie@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Wenjie Qi [Tue, 26 May 2026 05:35:57 +0000 (13:35 +0800)]
f2fs: validate orphan inode entry count
f2fs_recover_orphan_inodes() trusts the orphan block entry_count when
replaying orphan inodes from the checkpoint pack. A corrupted entry_count
larger than F2FS_ORPHANS_PER_BLOCK makes the recovery loop read past the
ino[] array and interpret footer or following data as inode numbers.
On a crafted image, mounting an unpatched kernel can drive orphan recovery
into f2fs_bug_on() and panic the kernel. Validate entry_count before
consuming entries so corrupted checkpoint data fails the mount with
-EFSCORRUPTED and requests fsck instead.
Set ERROR_INCONSISTENT_ORPHAN as well, so the corruption reason can be
recorded in the superblock s_errors[] field. This gives fsck a persistent
hint even though mount-time orphan recovery failure may leave no chance to
persist SBI_NEED_FSCK through a checkpoint.
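The validation can be sketched as below. The structure, the constant values and the function name are hypothetical stand-ins chosen for illustration; only the shape of the check (reject entry_count before the loop consumes it) follows the description above.

```c
#include <stdint.h>

#define SKETCH_ORPHANS_PER_BLOCK 1020	/* stand-in for F2FS_ORPHANS_PER_BLOCK */
#define SKETCH_EFSCORRUPTED	 117	/* stand-in for EFSCORRUPTED */

struct orphan_block_sketch {
	uint32_t ino[SKETCH_ORPHANS_PER_BLOCK];
	uint32_t entry_count;	/* untrusted, read from disk */
};

/*
 * Count the entries that would be replayed. Returns 0 on success or a
 * negative error if entry_count would index past ino[].
 */
static inline int replay_orphans(const struct orphan_block_sketch *blk,
				 uint32_t *replayed)
{
	uint32_t i;

	*replayed = 0;
	if (blk->entry_count > SKETCH_ORPHANS_PER_BLOCK)
		return -SKETCH_EFSCORRUPTED;

	for (i = 0; i < blk->entry_count; i++)
		(*replayed)++;	/* a real implementation would recover ino[i] */
	return 0;
}
```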
Wenjie Qi [Fri, 22 May 2026 06:12:06 +0000 (14:12 +0800)]
f2fs: honor per-I/O write streams for direct writes
io_uring can pass a per-I/O write stream through kiocb->ki_write_stream,
and block direct I/O propagates that value to bio->bi_write_stream.
F2FS added FDP stream mapping for DATA writes, but its direct write
submit hook always rewrites bio->bi_write_stream from the inode write
hint and F2FS temperature. As a result, a direct write with an explicit
io_uring write_stream is submitted to the F2FS-selected stream instead
of the user-requested stream.
Validate an explicit write stream before starting F2FS direct I/O, pass
the kiocb through the iomap private pointer, and preserve the per-I/O
stream in the direct write bio. When no per-I/O stream is supplied, keep
using the existing F2FS temperature-to-stream mapping.
Fixes: 42f7a7a50a33 ("f2fs: map data writes to FDP streams") Signed-off-by: Wenjie Qi <qiwenjie@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
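The selection rule can be summarized in a small sketch. It assumes that stream 0 means "no explicit stream" and that valid streams are numbered 1..max_streams; pick_write_stream() and its parameters are hypothetical and do not correspond to an existing kernel function.

```c
/*
 * Choose the write stream for a direct write.
 *   per_io:      stream requested for this I/O, 0 if none (assumption)
 *   by_temp:     stream derived from the filesystem's temperature mapping
 *   max_streams: highest valid stream number
 * Returns the stream to use, or -1 if the explicit stream is invalid.
 */
static inline int pick_write_stream(unsigned int per_io, unsigned int by_temp,
				    unsigned int max_streams)
{
	if (per_io > max_streams)
		return -1;		/* reject before starting the I/O */
	if (per_io != 0)
		return (int)per_io;	/* honor the caller's request */
	return (int)by_temp;		/* fall back to the default mapping */
}
```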
The root cause should be: in the corrupted inode, there is a direct node
which has the same ino and nid in its footer, so in f2fs_do_truncate_blocks(),
after f2fs_get_dnode_of_data() finds such dnode:
1) ADDRS_PER_PAGE(dn.node_folio, inode) will return 923
2) once dn.ofs_in_node points to addr[923, 1017]
Then it will trigger the system panic.
Introduce NODE_TYPE_NON_IXNODE to indicate that the current node should
be neither an inode nor an xattr node, and use it in the path below to
detect an inconsistent node chain in the inode mapping table:
- f2fs_do_truncate_blocks
 - f2fs_get_dnode_of_data
  - f2fs_get_node_folio_ra
   - __get_node_folio
    - f2fs_sanity_check_node_footer
     - case NODE_TYPE_NON_IXNODE -> check whether it is inode|xnode
Chao Yu [Fri, 22 May 2026 06:59:12 +0000 (14:59 +0800)]
Revert: "f2fs: check in-memory sit version bitmap"
Commit ae27d62e6bef ("f2fs: check in-memory sit version bitmap") added
a mirror of the SIT version bitmap in order to detect in-memory
corruption. However, the check has not produced a single report in
almost a decade, so remove the code, which also saves memory.
Cc: wallentx <william.allentx@gmail.com> Suggested-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Fri, 22 May 2026 06:59:11 +0000 (14:59 +0800)]
Revert: "f2fs: check in-memory block bitmap"
Commit 355e78913c0d ("f2fs: check in-memory block bitmap") added
a mirror of the valid block bitmap in order to detect in-memory
corruption. However, the check has not produced a single report in
almost a decade, so remove the code, which also saves memory.
Cc: wallentx <william.allentx@gmail.com> Suggested-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Wenjie Qi [Thu, 21 May 2026 10:37:48 +0000 (18:37 +0800)]
f2fs: avoid false shutdown fserror reports
F2FS records image errors and checkpoint-stop reasons through the same
s_error_work worker. The ordinary f2fs_handle_error() path only updates
s_errors, but the worker still calls fserror_report_shutdown()
unconditionally after committing the superblock.
As a result, a metadata corruption report can be followed by a synthetic
FAN_FS_ERROR event with ESHUTDOWN and an invalid superblock file handle,
even though no stop reason was recorded.
Track whether save_stop_reason() actually changed the stop_reason array
and only report the shutdown fserror for that case. Pure s_errors updates
still commit the superblock, but no longer generate a false shutdown event.
Fixes: 50faed607d32 ("f2fs: support to report fserror") Cc: stable@kernel.org Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Wenjie Qi <qiwenjie@xiaomi.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Wenjie Qi [Thu, 21 May 2026 03:16:18 +0000 (11:16 +0800)]
f2fs: validate compress cache inode only when enabled
F2FS_COMPRESS_INO() uses NM_I(sbi)->max_nid as the synthetic inode
number for the compressed page cache inode. That inode only exists when
the compress_cache mount option is enabled.
When compress_cache is disabled, max_nid is outside the valid inode
range. A corrupted directory entry that points to ino == max_nid should
therefore be rejected by f2fs_check_nid_range(). However, is_meta_ino()
currently treats F2FS_COMPRESS_INO() as a meta inode unconditionally,
so f2fs_iget() bypasses do_read_inode() and its nid range check, and
instantiates a fake internal inode instead.
Gate the compressed cache inode case on COMPRESS_CACHE, matching
f2fs_init_compress_inode(). With compress_cache disabled, ino ==
max_nid now follows the normal inode path and is rejected as an
out-of-range nid.
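The gating can be pictured with the following sketch. The structure and function are simplified, hypothetical stand-ins for the superblock info and is_meta_ino(); only the conditional on the compress_cache option reflects the change described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the per-superblock state. */
struct sb_sketch {
	uint32_t node_ino;
	uint32_t meta_ino;
	uint32_t max_nid;	/* doubles as the compress cache inode number */
	bool compress_cache;	/* mount option state */
};

/*
 * Treat max_nid as an internal inode only when the compress cache is
 * enabled; otherwise it falls through to the normal nid range check.
 */
static inline bool is_meta_ino_sketch(const struct sb_sketch *sb, uint32_t ino)
{
	if (ino == sb->node_ino || ino == sb->meta_ino)
		return true;
	return sb->compress_cache && ino == sb->max_nid;
}
```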
Wenjie Qi [Wed, 20 May 2026 12:07:05 +0000 (20:07 +0800)]
f2fs: pass correct iostat type for single node writes
f2fs_write_single_node_folio() takes an io_type argument, but still
passes FS_GC_NODE_IO to __write_node_folio() unconditionally.
This was harmless while the helper was only used by
f2fs_move_node_folio(), whose caller passes FS_GC_NODE_IO. However,
commit fe9b8b30b971 ("f2fs: fix inline data not being written to disk
in writeback path") made f2fs_inline_data_fiemap() call the helper with
FS_NODE_IO for FIEMAP_FLAG_SYNC.
Honor the caller-supplied io_type so inline-data FIEMAP sync writeback is
accounted as normal node IO instead of GC node IO, while the GC path
continues to pass FS_GC_NODE_IO explicitly.
Cc: stable@kernel.org Fixes: fe9b8b30b971 ("f2fs: fix inline data not being written to disk in writeback path") Signed-off-by: Wenjie Qi <qiwenjie@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Wenjie Qi [Wed, 20 May 2026 09:52:04 +0000 (17:52 +0800)]
f2fs: fix missing read bio submission on large folio error
f2fs_read_data_large_folio() can keep a read bio across multiple
readahead folios. If a later folio hits an error before any of its
blocks are added to the bio, folio_in_bio is false and the current error
path returns immediately after ending that folio.
This can leave the bio accumulated for earlier folios unsubmitted. Those
folios then never receive read completion, and readers can wait
indefinitely on the locked folios.
Route errors through the common out path so any pending bio is submitted
before returning. Stop consuming more readahead folios once an error is
seen, and only wait on and clear the current folio when it was actually
added to the bio.
Cc: stable@kernel.org Fixes: a5d8b9d94e18 ("f2fs: fix to unlock folio in f2fs_read_data_large_folio()") Signed-off-by: Wenjie Qi <qiwenjie@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Tue, 19 May 2026 01:14:38 +0000 (01:14 +0000)]
f2fs: fix potential deadlock in gc_merge path of f2fs_balance_fs()
When a device is mounted with the gc_merge mount option, the following
potential deadlock can occur:
Kworker:
- f2fs_write_cache_pages
- f2fs_write_single_data_page
- f2fs_do_write_data_page
- folio_start_writeback --- set writeback flag on folio
- f2fs_outplace_write_data
: cached folio in internal bio cache
- f2fs_balance_fs
- wake_up(gc_thread)
: wake up gc thread to run foreground GC
- finish_wait(fggc_wq)
: wait on the waitqueue --- wait on GC thread to finish the work
Truncator:
- truncate_inode_pages_range
- __filemap_get_folio(, FGP_LOCK) --- lock folio
- truncate_inode_partial_folio
- folio_wait_writeback --- wait on writeback being cleared
GC thread:
- do_garbage_collect
- move_data_page
- f2fs_get_lock_data_folio
- lock on folio --- blocked on folio's lock
To avoid this deadlock, call the functions below to commit cached bios
in the GC_MERGE path of f2fs_balance_fs(), the same as is already done
in the NOGC_MERGE path.
- f2fs_submit_merged_write(sbi, DATA);
- f2fs_submit_all_merged_ipu_writes(sbi);
liujinbao1 [Wed, 13 May 2026 14:14:36 +0000 (22:14 +0800)]
f2fs: add iostat latency tracking for direct IO
F2FS did not collect iostat latency for direct IO reads and writes.
Hook iomap_dio_ops.submit_io to bind an iostat context and record the
submission timestamp. Replace bi_end_io with f2fs_dio_end_bio() to
collect IO latency on completion before calling back to the original
iomap_dio_bio_end_io(). This adds iostat latency tracking support for
F2FS DIO.
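The completion-callback wrapping described above follows a common pattern, sketched here in userspace C. Everything in the block is hypothetical: struct bio_sketch is a two-field stand-in for a bio, sketch_now is a fake clock used to keep the example deterministic, and none of the names correspond to real kernel symbols.

```c
#include <stdint.h>

struct bio_sketch;
typedef void (*end_io_fn)(struct bio_sketch *);

struct bio_sketch {
	end_io_fn end_io;	/* completion callback */
	void *private_data;	/* owner's cookie */
};

struct iostat_ctx_sketch {
	uint64_t submit_ts;	/* recorded at submission */
	uint64_t latency;	/* filled in at completion */
	end_io_fn orig_end_io;	/* saved original callback */
	void *orig_private;	/* saved original cookie */
};

static uint64_t sketch_now;	/* fake clock, advanced by the caller */

/* Example original completion: bumps an int counter passed as the cookie. */
static inline void orig_done_sketch(struct bio_sketch *bio)
{
	(*(int *)bio->private_data)++;
}

/* Wrapper completion: record latency, then chain to the original. */
static inline void latency_end_io(struct bio_sketch *bio)
{
	struct iostat_ctx_sketch *ctx = bio->private_data;

	ctx->latency = sketch_now - ctx->submit_ts;
	bio->end_io = ctx->orig_end_io;
	bio->private_data = ctx->orig_private;
	bio->end_io(bio);
}

/* Submission hook: save the original state and install the wrapper. */
static inline void bind_iostat_ctx(struct bio_sketch *bio,
				   struct iostat_ctx_sketch *ctx)
{
	ctx->submit_ts = sketch_now;
	ctx->latency = 0;
	ctx->orig_end_io = bio->end_io;
	ctx->orig_private = bio->private_data;
	bio->end_io = latency_end_io;
	bio->private_data = ctx;
}
```

Restoring the original callback and cookie before chaining means the original completion handler runs exactly as if no wrapper had been installed.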
Daeho Jeong [Thu, 14 May 2026 20:55:13 +0000 (13:55 -0700)]
f2fs: optimize representative type determination in GC
In large section mode, do_garbage_collect() previously determined the
section's representative type by looking only at the first segment of
the section. However, if data was fsynced into an area previously used
as a node section, and this area is recovered during roll-forward
recovery after sudden power off (SPO), GC would incorrectly assume the
section's type based on an empty or obsolete first segment. This caused
the recovered data segment to be misunderstood as being stuck inside a
node section, triggering false inconsistency panics (Inconsistent
segment type in SSA and SIT) and subsequent mount failures.
This patch optimizes do_garbage_collect() to determine the section's
representative type by identifying the first segment that actually
contains valid blocks (valid_blocks > 0) during the main GC loop. This
eliminates false alarms from empty/obsolete leading segments while
maintaining strict section-level type consistency checks for genuine
corruption.
Signed-off-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
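The selection rule from this commit can be sketched as a simple scan. The structure and function names are hypothetical, and the real code makes this decision inside the main GC loop rather than in a separate pass.

```c
#include <stddef.h>

/* Simplified per-segment summary; type values are arbitrary here. */
struct seg_sketch {
	unsigned int valid_blocks;
	int type;
};

/*
 * Return the type of the first segment that still holds valid blocks,
 * or -1 if the whole section is empty. Empty leading segments are
 * skipped because their recorded type may be obsolete.
 */
static inline int section_representative_type(const struct seg_sketch *segs,
					       size_t nsegs)
{
	size_t i;

	for (i = 0; i < nsegs; i++)
		if (segs[i].valid_blocks > 0)
			return segs[i].type;
	return -1;
}
```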
liujinbao1 [Wed, 6 May 2026 09:57:31 +0000 (17:57 +0800)]
f2fs: Add trace_f2fs_fault_report
Add trace_f2fs_fault_report to trigger reporting upon f2fs_bug_on,
need_fsck, stop_checkpoint, and handle_eio. Since f2fs_bug_on and
need_fsck can be triggered in hundreds of scenarios, define set_sbi_flag
as a macro to help capture the effective fault function and line number.
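Turning a function into a macro so the call site is captured is a standard C technique. The sketch below shows the idea with hypothetical names (set_flag_traced(), record_fault(), last_fault); it does not reproduce the actual set_sbi_flag definition or the tracepoint.

```c
/* Where the most recent fault flag was set. */
struct fault_site {
	const char *func;
	int line;
	int flag;
};

static struct fault_site last_fault;

/* Real worker: receives the call site explicitly. */
static inline void record_fault(const char *func, int line, int flag)
{
	last_fault.func = func;
	last_fault.line = line;
	last_fault.flag = flag;
}

/*
 * Because this is a macro, __func__ and __LINE__ expand in the caller,
 * so the report names the function that hit the fault rather than a
 * shared helper.
 */
#define set_flag_traced(flag) record_fault(__func__, __LINE__, (flag))

/* Example caller; returns its own line number for comparison. */
static inline int example_caller(void)
{
	set_flag_traced(7); return __LINE__;	/* same line on purpose */
}
```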
Cen Zhang [Tue, 5 May 2026 12:55:10 +0000 (20:55 +0800)]
f2fs: annotate lockless NAT counter reads
nat_cnt[] is updated while callers hold nat_tree_lock, but F2FS samples
the counters locklessly in f2fs_available_free_memory(),
excess_dirty_nats(), and excess_cached_nats(). Those helpers only steer
cache reclaim and background sync heuristics; they do not control NAT
entry lifetime or checkpoint correctness.
Document the intent with data_race(READ_ONCE()) and a short comment
instead of adding locking to the balance path.
Cen Zhang [Wed, 6 May 2026 01:07:09 +0000 (09:07 +0800)]
f2fs: annotate lockless last_time[] accesses
f2fs stores mount-wide activity timestamps in sbi->last_time[] and
samples them from background discard, GC, and balance paths without a
dedicated lock. The timestamps are used as best-effort heuristics to
decide whether background work should run now or sleep a bit longer.
The current helpers use plain loads and stores, so KCSAN can report races
between frequent foreground updates and background readers. Exact
freshness is not required here, but the intentional lockless accesses
should be marked explicitly.
Use WRITE_ONCE() in f2fs_update_time() and READ_ONCE() in
f2fs_time_over() and f2fs_time_to_wait(). This preserves the existing
heuristic behavior and avoids adding locking to hot paths.
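For readers outside the kernel, the annotation pattern can be approximated in userspace as follows. The SKETCH_READ_ONCE()/SKETCH_WRITE_ONCE() macros are simplified stand-ins built on a volatile cast and the GCC/Clang __typeof__ extension; they are not the kernel's definitions, and the helper names and the time-over rule are assumptions.

```c
#include <stdbool.h>

/* Simplified stand-ins for the kernel's READ_ONCE()/WRITE_ONCE(). */
#define SKETCH_READ_ONCE(x)	(*(const volatile __typeof__(x) *)&(x))
#define SKETCH_WRITE_ONCE(x, v)	(*(volatile __typeof__(x) *)&(x) = (v))

/* Mount-wide timestamps, updated and sampled without a lock. */
static unsigned long last_time_sketch[2];

/* Writer side: one marked store per update. */
static inline void update_time_sketch(int type, unsigned long now)
{
	SKETCH_WRITE_ONCE(last_time_sketch[type], now);
}

/*
 * Reader side: one marked load. A slightly stale value only shifts
 * when background work runs; it does not affect correctness.
 */
static inline bool time_over_sketch(int type, unsigned long now,
				    unsigned long interval)
{
	return now >= SKETCH_READ_ONCE(last_time_sketch[type]) + interval;
}
```

The marked accesses document that the race is intentional and keep the compiler from tearing or re-reading the value, without adding a lock to the hot path.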
Linus Torvalds [Mon, 22 Jun 2026 19:43:16 +0000 (12:43 -0700)]
Merge tag 'staging-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver updates from Greg KH:
"Here is the big set of staging driver updates for 7.2-rc1.
Nothing major in here, just constant grind of tiny cleanups and coding
style fixes and wrapper removals. Overall more code was removed than
added, always a nice sign that things are progressing forward.
Changes outside of drivers/staging/ were due to the octeon driver
changes, which for some reason also lives partially in the mips
subsystem. Someday that will all be untangled and cleaned up, or just
removed entirely; it's hard to tell which is going to be its fate.
Other than octeon driver cleanups, in here are the usual:
- rtl8723bs driver reworking and cleanups, being the bulk of this
merge window given all of the issues and wrappers involved in that
beast of a driver
- most driver cleanups
- sm750fb driver cleanups (which might be done, as this really should
be moved to the drm layer one of these days...)
- other tiny staging driver cleanups and fixes
All of these have been in linux-next for many weeks with no reported
issues"
* tag 'staging-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (199 commits)
staging: most: video: avoid double free on video register failure
staging: sm750: rename CamelCase variable Bpp to bpp
staging: rtl8723bs: delete superfluous switch statement
staging: sm750fb: Mark g_noaccel, g_nomtrr and g_dualview as __ro_after_init
staging: rtl8723bs: propagate errno through hal xmit path
staging: rtl8723bs: propagate errno through xmit enqueue path
staging: rtl8723bs: convert rtw_xmit_classifier to return errno
staging: rtl8723bs: make rtw_xmit_classifier static
staging: rtl8723bs: simplify rtw_xmit_classifier control flow
staging: rtl8723bs: make _rtw_enqueue_cmd return 0 on success
staging: rtl8723bs: simplify rtw_enqueue_cmd control flow
staging: rtl8723bs: make _rtw_enqueue_cmd static
staging: rtl8723bs: simplify _rtw_enqueue_cmd control flow
staging: rtl8723bs: fix multiple blank lines in more hal/ files
staging: rtl8723bs: remove unused TXDESC_64_BYTES code
staging: rtl8723bs: remove unused DBG_XMIT_BUF and DBG_XMIT_BUF_EXT code
staging: rtl8723bs: fix multiple blank lines in hal/Hal* files
staging: rtl8723bs: fix multiple blank lines in hal/ files
staging: rtl8723bs: rtw_mlme: add blank line for readability
staging: rtl8723bs: rtw_mlme: wrap rtw_sitesurvey_cmd condition
...
Linus Torvalds [Mon, 22 Jun 2026 19:06:22 +0000 (12:06 -0700)]
Merge tag 'spdx-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx
Pull SPDX updates from Greg KH:
"Here is a "big" set of SPDX-like patches for 7.2-rc1. It is the
addition of the ability for the kernel build process to generate a
Software Bill of Materials (SBOM) in the SPDX format, that matches up
exactly with just the files that are actually built for the specific
kernel image generated.
To generate an SBOM, after the kernel has been built, just do:
make sbom
and marvel at the JSON file that is generated...
This is needed by users for environments in which a SBOM is required
(medical, automotive, anything shipped in the EU, etc.) and cuts down
by a massive size the "naive" SBOM solution that many vendors have
done by just including _all_ of the kernel files in the resulting
document.
This result is still a giant JSON file, that I am told parses
properly, so we just have to trust that it is properly inclusive as
attempting to parse that thing by hand is impossible.
The scripts here are self-contained python scripts, no additional
libraries or tools to create the SBOM are needed, which is important
for many build systems. Overall it's just a bit over 4000 lines of
"simple" python code, the most complex part is the regex matching
lines, but those are nothing compared to what we maintain in
scripts/checkpatch.pl today...
The various parts where the tool touches the kbuild subsystem have
been acked by the kbuild maintainer, so all should be good here.
All of these patches have been in linux-next for weeks with no
reported problems"
* tag 'spdx-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx:
scripts/sbom: add unit tests for SPDX-License-Identifier parsing
scripts/sbom: add unit tests for command parsers
scripts/sbom: add SPDX build graph
scripts/sbom: add SPDX source graph
scripts/sbom: add SPDX output graph
scripts/sbom: collect file metadata
scripts/sbom: add shared SPDX elements
scripts/sbom: add JSON-LD serialization
scripts/sbom: add SPDX classes
scripts/sbom: add additional dependency sources for cmd graph
scripts/sbom: add cmd graph generation
scripts/sbom: add command parsers
scripts/sbom: setup sbom logging
scripts/sbom: integrate script in make process
scripts/sbom: add documentation
Mark Brown [Wed, 17 Jun 2026 13:00:38 +0000 (14:00 +0100)]
perf bpf: Fix up build failure due to change of btf_vlen() return type
Fix:
util/btf.c: In function '__btf_type__find_member_by_name':
util/btf.c:19:43: error: comparison of integer expressions of different signedness: 'int' and '__u32' {aka 'unsigned int'} [-Werror=sign-compare]
19 | for (i = 0, m = btf_members(t); i < btf_vlen(t); i++, m++) {
| ^
builtin-trace.c: In function 'syscall_arg__strtoul_btf_enum':
builtin-trace.c:967:27: error: comparison of integer expressions of different signedness: 'int' and '__u32' {aka 'unsigned int'} [-Werror=sign-compare]
967 | for (int i = 0; i < btf_vlen(bt); ++i, ++be) {
| ^
by making the variable the same type as the function.
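
As an illustration of the fix described above, here is a minimal sketch in which the loop index has the same unsigned type as the count it is compared against. demo_vlen() and sum_members() are hypothetical stand-ins, not the perf or libbpf code; only the idea of matching the types is taken from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a helper that returns an unsigned count. */
static uint32_t demo_vlen(const uint32_t *members, uint32_t n)
{
	(void)members;
	return n;
}

/*
 * Sum all members. The index is uint32_t, the same type demo_vlen()
 * returns, so "i < demo_vlen(...)" compares like with like and does
 * not trip -Wsign-compare.
 */
static uint64_t sum_members(const uint32_t *members, uint32_t n)
{
	uint64_t sum = 0;

	for (uint32_t i = 0; i < demo_vlen(members, n); i++)
		sum += members[i];
	return sum;
}
```

Declaring the index as plain int would make the same comparison mix signed and unsigned operands, which is what the quoted errors complain about.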
Committer note:
Add an extra hunk from Alan Maguire, fixing btf_enum_scnprintf().
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Linus Torvalds [Mon, 22 Jun 2026 18:51:49 +0000 (11:51 -0700)]
Merge tag 'tty-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty / serial driver updates from Greg KH:
"Here is the big set of TTY and Serial driver updates for 7.2-rc1.
Overall we end up removing more code than added, due to an obsolete
synclink_gt driver being removed from the tree, always a nice thing to
see happen.
Other than that driver removal, major things included in here are:
- max310x serial driver updates and fixes
- 8250 driver updates and rework in places to make it more "modern"
- dts file updates
- serial driver core tweaks and updates
- vt code cleanups
- vc_screen crash fixes
- other minor driver updates and cleanups
All of these have been in linux-next for well over a week with no
reported issues"
* tag 'tty-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (49 commits)
serial: 8250_pci: Don't specify conflicting values to pci_device_id members
vc_screen: fix null-ptr-deref in vcs_notifier() during concurrent vcs_write
serial: qcom_geni: Fix RX DMA stall when SE_DMA_RX_LEN_IN is zero
vt: merge ucs_is_zero_width()/ucs_is_double_width() into ucs_get_width()
serial: 8250: fix possible ISR soft lockup
dt-bindings: serial: rs485: remove deprecated .txt binding stub
serial: qcom-geni: trace: Add tracepoint support for Qualcomm GENI serial
tty: serial: Use named initializers for arrays of i2c_device_data
serial: 8250_dw: remove clock-notifier infrastructure
serial: 8250_dw: unregister 8250 port if clk_notifier_register() fails
amba/serial: amba-pl011: Bring back zx29 UART support
serial: 8250: Add support for console flow control
serial: 8250: Check LSR timeout on console flow control
serial: 8250: Set cons_flow on port registration
tty: serial: 8250: protect against NULL uart->port.dev in register
arm64: dts: add support for A9 based Amlogic BY401
dt-bindings: arm: amlogic: add A311Y3 support
serial: max310x: fix compile errors if CONFIG_SPI_MASTER is disabled
serial: qcom-geni: Avoid probing debug console UART without console support
serial: max310x: add comments for PLL limits
...
Linus Torvalds [Mon, 22 Jun 2026 16:30:31 +0000 (09:30 -0700)]
Merge tag 'i2c-7.2-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux
Pull more i2c updates from Andi Shyti:
"Cleanups:
- generic cleanups in qcom, qcom-cci and pxa, plus core cleanups in
algo-bit and atr
Fixes:
- davinci: clean up cpufreq notifier on probe failure
- imx-lpi2c: suspend the adapter while hardware is powered down
- ls2x-v2: return IRQ_HANDLED after servicing error interrupts
- stm32f7: fix timing calculation accuracy"
* tag 'i2c-7.2-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux:
i2c: pxa: Use named initializers for the platform_device_id array
i2c: imx-lpi2c: mark I2C adapter when hardware is powered down
i2c: stm32f7: truncate clock period instead of rounding it
dt-bindings: i2c: microchip,corei2c: permit resets
i2c: qcom: Unify user-visible "Qualcomm" name
i2c: ls2x-v2: return IRQ_HANDLED after servicing an error
i2c: atr: annotate i2c_atr_adap_desc->aliases with __counted_by_ptr
i2c: algo: bit: use str_plural helper in bit_xfer
dt-bindings: i2c: i2c-mux-pinctrl: change maintainer
dt-bindings: i2c: convert i2c-mux-reg to DT schema
i2c: davinci: Unregister cpufreq notifier on probe failure
i2c: qcom-cci: Remove overcautious disable_irq() calls
i2c: qcom-cci: Move cci_init() under cci_reset() function
i2c: qcom-cci: Do not check return value of cci_init()
Linus Torvalds [Mon, 22 Jun 2026 16:24:22 +0000 (09:24 -0700)]
Merge tag 'i3c/for-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux
Pull i3c updates from Alexandre Belloni:
"This cycle, there was a lot of work around the mipi-i3c-hci driver
that also led to improvements of the core. We also have support for a
new SoC, the Microchip SAMA7D65. And of course, there are small fixes
for the other controller drivers.
Subsystem:
- introduce dynamic address reconciliation after DAA
- add preliminary API for hub support
- fixes for dev_nack_retry_count handling
- move hot-join support in the core instead of open coding in
different drivers
Drivers:
- mipi-i3c-hci-pci: DMA abort, recovery and related improvements,
hot-join support, Microchip SAMA7D65 support, fix possible race in
IBI handling
- dw-i3c-master: fix IBI count register selection for versalnet
- svc: interrupt handling fixes for NPCM845"
* tag 'i3c/for-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux: (45 commits)
i3c: mipi-i3c-hci: Use named initializers for platform_device_id's .driver_data
i3c: master: Use unsigned int for dev_nack_retry_count consistently
i3c: master: Add missing runtime PM get in dev_nack_retry_count_store()
i3c: master: Update dev_nack_retry_count under maintenance lock
i3c: master: Expose the APIs to support I3C hub
i3c: master: rename i3c_master_reattach_i3c_dev() to *_locked
i3c: mipi-i3c-hci: add microchip sama7d65 SoC compatible with the required quirk
dt-bindings: i3c: mipi-i3c-hci: add Microchip SAMA7D65 compatible
i3c: Consistently define pci_device_ids using named initializers
i3c: master: Reconcile dynamic addresses after DAA
i3c: master: Move DAA API functions after i3c_master_add_i3c_dev_locked()
i3c: master: Make i3c_master_add_i3c_dev_locked() return void
i3c: mipi-i3c-hci: Tolerate i3c_master_add_i3c_dev_locked() failures in DAA
i3c: master: Prevent reuse of dynamic address on device add failure
i3c: mipi-i3c-hci: Ignore DISEC failures when disabling IBIs
i3c: mipi-i3c-hci: Fix race in i3c_hci_addr_to_dev()
i3c: mipi-i3c-hci: Add Hot-Join support
i3c: master: Export i3c_master_enec_disec_locked()
i3c: master: Defer new-device registration out of DAA caller context
i3c: dw: Drop redundant Hot-Join cancel_work_sync() in shutdown
...
The driver allocates domain generic chips using
irq_alloc_domain_generic_chips() during probe and sets up chained
handlers using irq_set_chained_handler_and_data(). However, on driver
removal, the generic chips are not freed and the chained handlers are
not removed.
The generic chips remain on the global gc_list and may later be accessed by
generic interrupt chip suspend, resume, or shutdown callbacks after the
driver has been removed, potentially resulting in a use-after-free and
kernel crash.
The chained handlers that were installed in probe for peripheral and
syswake interrupts are also left dangling, which can lead to spurious
interrupts accessing freed memory.
Fix these issues by:
- Setting IRQ_DOMAIN_FLAG_DESTROY_GC flag in domain->flags, so the
core code automatically removes generic chips when irq_domain_remove()
is called
- Clearing all chained handlers with NULL in pdc_intc_remove()
Fixes: b6ef9161e43a ("irq-imgpdc: add ImgTec PDC irqchip driver")
Signed-off-by: Qingshuang Fu <fuqingshuang@kylinos.cn>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260618021352.661773-1-fffsqian@163.com
Tejun Heo [Mon, 22 Jun 2026 15:32:54 +0000 (05:32 -1000)]
sched_ext: Move sources under kernel/sched/ext/
The sched_ext sources had grown to ten ext* files directly under
kernel/sched/. Move them into a new kernel/sched/ext/ subdirectory and drop
the now-redundant ext_ prefix. ext.c/h keep their names.
The include paths in build_policy.c and sched.h, the MAINTAINERS glob, and a
few documentation and comment references are updated to match. No code or
symbol changes.
Linus Torvalds [Mon, 22 Jun 2026 15:28:48 +0000 (08:28 -0700)]
Merge tag 'slab-for-7.2-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull more slab updates from Vlastimil Babka:
- Introduce and wire up a new alloc_flags parameter for modifying
slab-specific behavior without adding or reusing gfp flags. Also
introduce slab_alloc_context to keep function parameter bloat in
check. Both are similar to what the page allocator does.
kmalloc_flags() exposes alloc_flags for mm-internal users.
- SLAB_ALLOC_NOLOCK flag is used to implement kmalloc_nolock()
behavior without relying on lack of __GFP_RECLAIM, which caused
false positives with workarounds like fd3634312a04 ("debugobject:
Make it work with deferred page initialization - again").
- SLAB_ALLOC_NO_RECURSE replaces __GFP_NO_OBJ_EXT, which could have
been removed, but pending memory allocation profiling changes in
mm tree have grown a new user - there is however a work ongoing
to replace that too, so __GFP_NO_OBJ_EXT should eventually be
removed. (Vlastimil Babka)
- Add kmem_buckets_alloc_track_caller() with a user to be added in the
net tree (Pedro Falcato)
- Fixes for kernel-doc and slabinfo (Randy Dunlap, Yichong Chen)
* tag 'slab-for-7.2-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
tools/mm/slabinfo: fix total_objects attribute name
slab: recognize @GFP parameter as optional in kernel-doc
mm/slab: add a node-track-caller variant for kmem buckets allocation
mm/slab: replace __GFP_NO_OBJ_EXT with SLAB_ALLOC_NO_RECURSE for sheaves
mm/slab: remove __GFP_NO_OBJ_EXT usage from alloc_slab_obj_exts()
mm/slab: introduce kmalloc_flags()
mm/slab: allow __GFP_NOMEMALLOC and __GFP_NOWARN for kmalloc_nolock()
mm/slab: pass slab_alloc_context to __do_kmalloc_node()
mm/slab: allow kmem_cache_alloc_bulk() with any gfp flags
mm/slab: replace slab_alloc_node() parameters with slab_alloc_context
mm/slab: pass alloc_flags through slab_post_alloc_hook() chain
mm/slab: pass alloc_flags to new slab allocation
mm/slab: add alloc_flags to slab_alloc_context
mm/slab: replace struct partial_context with slab_alloc_context
mm/slab: introduce alloc_flags and SLAB_ALLOC_NOLOCK
mm/slab: introduce slab_alloc_context
mm/slab: stop inlining __slab_alloc_node()
mm/slab: do not init any kfence objects on allocation
Linus Torvalds [Mon, 22 Jun 2026 15:06:13 +0000 (08:06 -0700)]
Merge tag 'hyperv-next-signed-20260621' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv updates from Wei Liu:
- Use wakeup mailbox to boot APs in Hyper-V VTL2 TDX guests (Yunhong
Jiang, Ricardo Neri)
- Move the Hyper-V IOMMU to its own subdirectory (Mukesh Rathor)
- Cosmetic changes to mshv and balloon driver (Junrui Luo, Markus
Elfring)
* tag 'hyperv-next-signed-20260621' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
mshv: add bounds check on vp_index in mshv_intercept_isr()
hv_balloon: Simplify data output in hv_balloon_debug_show()
x86/hyperv: Cosmetic changes in irqdomain.c for readability
iommu/hyperv: Create hyperv subdirectory under drivers/iommu
x86/hyperv/vtl: Use the wakeup mailbox to boot secondary CPUs
x86/hyperv/vtl: Mark the wakeup mailbox page as private
x86/acpi: Add a helper to get the address of the wakeup mailbox
x86/hyperv/vtl: Setup the 64-bit trampoline for TDX guests
x86/realmode: Make the location of the trampoline configurable
x86/hyperv/vtl: Set real_mode_header in hv_vtl_init_platform()
x86/dt: Parse the Wakeup Mailbox for Intel processors
dt-bindings: reserved-memory: Wakeup Mailbox for Intel processors
x86/acpi: Add functions to setup and access the wakeup mailbox
x86/topology: Add missing struct declaration and attribute dependency
Linus Torvalds [Mon, 22 Jun 2026 14:43:48 +0000 (07:43 -0700)]
Merge tag 's390-7.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Alexander Gordeev:
- consolidate s390 idle time accounting by moving all CPU time tracking
to the architecture backend and eliminate the mix of architecture-
specific and common code accounting
- Add missing EXPORT_SYMBOL_GPL() to kcpustat_field_idle() and
kcpustat_field_iowait() functions
- Finalize ptep_get() conversion by replacing direct page table entry
dereferencing with proper accessors (ptep_get(), pmdp_get(), etc.)
- Explicitly check the buffer length in PKEY_VERIFYPROTK ioctl and
pkey_pckmo implementations and fail if the length is exceeded
* tag 's390-7.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/pkey: Check length in pkey_pckmo handler implementation
s390/pkey: Check length in PKEY_VERIFYPROTK ioctl
s390/idle: Add missing EXPORT_SYMBOL_GPL()
s390/mm: Complete ptep_get() conversion
s390/idle: Remove idle time and count sysfs files
s390/idle: Provide arch specific kcpustat_field_idle()/kcpustat_field_iowait()
s390/irq/idle: Use stcke instead of stckf for time stamps
s390/timex: Move union tod_clock type to separate header
Thomas Gleixner [Sun, 21 Jun 2026 14:47:44 +0000 (16:47 +0200)]
debugobjects: Plug race against a concurrent OOM disable
syzbot reported a puzzling splat:
WARNING: kernel/time/hrtimer.c:443 at stub_timer+0xa/0x20
stub_timer() is installed as timer callback function in
hrtimer_fixup_assert_init(), which is invoked when
debug_object_assert_init() can't find a shadow object. In that case debug
objects emits a warning about it before invoking the fixup.
Though the provided console log lacks this warning and instead has the
following a few seconds before the splat:
ODEBUG: Out of memory. ODEBUG disabled
So the object was looked up in debug_object_assert_init() and the lookup
failed due to a concurrent out of memory situation which disabled debug
objects and freed the shadow objects:
  debug_object_assert_init()
    if (!debug_objects_enabled)
      return;                           obj = alloc();
                                        if (!obj) {
                                          // Out of memory
                                          debug_objects_enabled = false;
                                          free_objects();
    obj = lookup_or_alloc();
    // The lookup failed because the other side
    // removed the objects, so this returns
    // an error code as the object in question
    // is not statically initialized
    if (!IS_ERR_OR_NULL(obj))
      return;
    if (!obj) {
      debug_oom();
      return;
    }
    print(...)
      if (!debug_objects_enabled)
        return;
    fixup(...)
The debug object splat is skipped because debug_objects_enabled is false,
but the fixup callback is invoked unconditionally, which makes the timer
dysfunctional.
This is only a problem in debug_object_assert_init() and
debug_object_activate() as both have to handle statically initialized
objects and therefore must handle the error pointer return case
gracefully. All other places only handle the found/not found case and the
NULL pointer return is a signal for OOM. Otherwise they get a valid shadow
object.
Plug the hole by checking whether debug objects are still enabled before
invoking the print and fixup function in those two places.
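
The effect of the added check can be modelled in a few lines of user-space C. This is only a single-threaded sketch: the two boolean parameters stand in for the value of debug_objects_enabled before and after the concurrent OOM disable, and all names are hypothetical rather than taken from lib/debugobjects.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Possible outcomes of the shadow object lookup in this sketch. */
enum lookup_result {
	LOOKUP_FOUND,		/* valid shadow object */
	LOOKUP_OOM,		/* NULL: allocation failed */
	LOOKUP_NOT_TRACKED,	/* error pointer: not statically initialized */
};

/*
 * Returns true if the fixup callback would be invoked.
 * enabled_at_entry:     flag value seen by the first check
 * enabled_after_lookup: flag value after the other side hit OOM
 * recheck:              true models the fixed code, false the old code
 */
static bool fixup_would_run(bool enabled_at_entry, bool enabled_after_lookup,
			    enum lookup_result res, bool recheck)
{
	if (!enabled_at_entry)
		return false;
	if (res == LOOKUP_FOUND || res == LOOKUP_OOM)
		return false;
	/* The fix: look at the flag again before print and fixup. */
	if (recheck && !enabled_after_lookup)
		return false;
	return true;
}
```

With recheck set to false the sketch reproduces the reported behaviour: the fixup runs even though debug objects were disabled in the meantime.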
Fixes: b84d435cc228 ("debugobjects: Extend to assert that an object is initialized")
Reported-by: syzbot+5e8dda76ca21dae314b6@syzkaller.appspotmail.com
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/874iiwlzlb.ffs@fw13
Wang Yan [Mon, 22 Jun 2026 10:33:48 +0000 (18:33 +0800)]
time: Fix off-by-one in compat settimeofday() usec validation
The compat version of settimeofday() uses '>' instead of '>=' when
validating tv_usec against USEC_PER_SEC, allowing the value 1000000 to pass
the check. After the subsequent conversion to nanoseconds (tv_nsec *=
NSEC_PER_USEC), this results in tv_nsec == NSEC_PER_SEC, which violates the
timespec invariant that tv_nsec must be strictly less than NSEC_PER_SEC.
The native settimeofday() was already fixed in commit ce4abda5e126 ("time:
Fix off-by-one in settimeofday() usec validation"), but the compat
counterpart was missed.
Fix it by using '>=' to reject tv_usec values outside the valid range [0,
USEC_PER_SEC - 1].
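
The off-by-one can be checked with a small stand-alone sketch. The DEMO_ constants mirror the kernel's USEC_PER_SEC, NSEC_PER_USEC and NSEC_PER_SEC values; the two helper functions are hypothetical and only model the range check, not the syscall itself.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_USEC_PER_SEC	1000000L
#define DEMO_NSEC_PER_USEC	1000L
#define DEMO_NSEC_PER_SEC	1000000000L

/* Old check: '>' lets the value 1000000 through. */
static bool usec_is_valid_buggy(long usec)
{
	return !(usec < 0 || usec > DEMO_USEC_PER_SEC);
}

/* Fixed check: '>=' accepts only [0, USEC_PER_SEC - 1]. */
static bool usec_is_valid(long usec)
{
	return !(usec < 0 || usec >= DEMO_USEC_PER_SEC);
}
```

An accepted tv_usec of 1000000 multiplied by 1000 gives exactly 1000000000, which is the invalid tv_nsec == NSEC_PER_SEC case the commit describes.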
Fixes: 5e0fb1b57bea ("y2038: time: avoid timespec usage in settimeofday()")
Signed-off-by: Wang Yan <wangyan01@kylinos.cn>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260622103348.120255-1-wangyan01@kylinos.cn
Zhan Xusheng [Mon, 22 Jun 2026 08:11:36 +0000 (16:11 +0800)]
erofs: handle 48-bit blocks_hi for compressed inodes
Combine i_nb.blocks_hi with i_u.blocks_lo when computing
inode->i_blocks for compressed inodes, mirroring the startblk_hi
handling for unencoded inodes a few lines above. Also evaluate
the shift in u64 to avoid truncation.
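
A rough sketch of the arithmetic, assuming a 16-bit high part and a 32-bit low part (48 bits in total) and a conversion from filesystem blocks to 512-byte sectors. The function names and the field widths are assumptions for illustration, not the on-disk definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Combine the assumed 16-bit high and 32-bit low halves into 48 bits. */
static uint64_t combine_blocks(uint16_t blocks_hi, uint32_t blocks_lo)
{
	return ((uint64_t)blocks_hi << 32) | blocks_lo;
}

/*
 * Convert a block count to 512-byte sectors. The operand is already
 * 64 bits wide, so the shift cannot drop high bits the way a 32-bit
 * shift would. blkbits is assumed to be at least 9.
 */
static uint64_t blocks_to_sectors(uint64_t nblocks, unsigned int blkbits)
{
	return nblocks << (blkbits - 9);
}
```

For a low part of 0x80000000 and 4 KiB blocks, a 32-bit shift by 3 wraps to 0, while the 64-bit version yields 0x400000000.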
Gao Xiang [Mon, 22 Jun 2026 01:36:22 +0000 (09:36 +0800)]
erofs: remove fscache backend entirely
EROFS over fscache was introduced to provide image lazy pulling
functionality. After the feature landed, the fscache subsystem made
netfs a new hard dependency, which is unexpected for a local filesystem,
and it has a kernel-defined caching hierarchy which can be inflexible
compared to the fanotify pre-content hooks. Therefore, this feature has
been deprecated for almost two years.
As EROFS file-backed mounts and fanotify pre-content hooks have both been
upstream for a while and already provide equivalent functionality
(erofs-utils has supported fanotify pre-content hooks), let's remove the
fscache backend now.
The main application of this feature is Nydus [1], and they plan to move
to use fanotify pre-content hooks in the near future too.
I hope this patch can be merged into Linux 7.2, which is also motivated
by newly found implementation issues [2][3] that are not worth
investigating given the deprecation and limited development resources.
The associated fscache/cachefiles cleanup patch will follow separately
through the vfs tree (netfs) later: it seems fine since the codebase is
isolated by CONFIG_CACHEFILES_ONDEMAND.
Gao Xiang [Sun, 21 Jun 2026 19:44:14 +0000 (03:44 +0800)]
erofs: add sparse support to pcluster layout
Although zeros can be compressed transparently on EROFS using fixed-size
output compression, which is why hole support was never prioritized for
the Android use cases, marking entire pclusters as holes is still useful to
preserve holes in sparse datasets; otherwise overlayfs will allocate more
space when copying up, and SEEK_HOLE won't report any hole.
This patch introduces two ways to mark a pcluster as a hole:
- A new Z_EROFS_LI_HOLE compatible flag (bit 14) in the HEAD lcluster
advise field for non-compact (full) indexes;
- A 0-block CBLKCNT value on the first NONHEAD lcluster.
The hole tag is preferred for maximum compatibility since pre-existing
kernels that do not understand Z_EROFS_LI_HOLE will decompress at the
stored blkaddr (the same blkaddr will be shared among all sparse
pclusters). Only the 0-block CBLKCNT approach also works for compact
indexes, but it is limited to big pclusters and new kernels.
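
The two markers can be pictured as simple predicates. This is a sketch under assumptions: DEMO_LI_HOLE only mirrors the "bit 14" stated above, and the 16-bit advise width and both helper names are hypothetical, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding: bit 14 of the HEAD lcluster advise field. */
#define DEMO_LI_HOLE	(1u << 14)

/* Marker 1: the flag in the HEAD lcluster (non-compact indexes). */
static bool head_marks_hole(uint16_t advise)
{
	return (advise & DEMO_LI_HOLE) != 0;
}

/* Marker 2: a 0-block CBLKCNT on the first NONHEAD lcluster. */
static bool cblkcnt_marks_hole(uint32_t cblkcnt)
{
	return cblkcnt == 0;
}
```

A reader that does not know the flag simply ignores bit 14, which matches the compatibility argument made for the hole tag.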
Linus Torvalds [Sun, 21 Jun 2026 21:09:49 +0000 (14:09 -0700)]
lib: Add stale 'raid6' directory to .gitignore file
I keep having to do this, because people think they can just move
directories around and move the gitignore files around with them.
You really can't do that - the old generated files stay around for
others, and still need to be ignored in the old location.
So when moving gitignore entries around because you moved the files (or
when moving a whole gitignore file around because the directory it was
in moved), the old gitignore situation needs to be dealt with.
Yes, those files may have moved in *your* tree when you moved the
directory. And yes, new repositories will never even have seen them.
But all those other developers that see the result of your move still
likely have a working tree with the old state, and the files that were
hidden from git by an old gitignore file do not suddenly become
relevant.
Fixes: 3626738bc714 ("raid6: move to lib/raid/")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 21 Jun 2026 20:20:19 +0000 (13:20 -0700)]
Merge tag 'mm-nonmm-stable-2026-06-21-10-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- "taskstats: fix TGID dead-thread stat retention" (Yiyang Chen)
Fix a taskstats TGID aggregation bug where fields added in the TGID
query path were not preserved after thread exit, and add a kselftest
covering the regression.
Fix a number of possible issues in the ocfs2 xattr code
- "lib and lib/cmdline enhancements" (Dmitry Antipov)
Provide additional robustness checking in the cmdline handling code
and its in-kernel testing and selftests
- "cleanup the RAID6 P/Q library" (Christoph Hellwig)
Clean up the RAID6 P/Q library to match the recent updates to the
RAID 5 XOR library and other CRC/crypto libraries
- "ocfs2: harden inode validators against forged metadata" (Michael
Bommarito)
Add three structural checks to OCFS2 dinode validation so malformed
on-disk fields are rejected before ocfs2_populate_inode() copies them
into the in-core inode
- "lib/raid: replace __get_free_pages() call with kmalloc()" (Mike
Rapoport)
Clean up the lib/raid code by using kmalloc() in more places
* tag 'mm-nonmm-stable-2026-06-21-10-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (108 commits)
ocfs2: fix circular locking dependency in ocfs2_dio_end_io_write
ocfs2: fix NULL h_transaction deref in ocfs2_assure_trans_credits
lib: interval_tree_test: validate benchmark parameters
ocfs2: avoid moving extents to occupied clusters
treewide: fix transposed "sign" typos and update spelling.txt
ocfs2: fix UBSAN array-index-out-of-bounds in ocfs2_sum_rightmost_rec
fat: reject BPB volumes whose data area starts beyond total sectors
selftests/uevent: increase __UEVENT_BUFFER_SIZE to avoid ENOBUFS on busy systems
lib/test_firmware: allocate the configured into_buf size
fs: efs: remove unneeded debug prints
checkpatch: cuppress warnings when Reported-by: is followed by Link:
MAINTAINERS: add Alexander as a kcov reviewer
mailmap: update Alexander Sverdlin's Email addresses
fs: fat: inode: replace sprintf() with scnprintf()
ocfs2: fix out-of-bounds write in ocfs2_remove_refcount_extent
ocfs2: fix race between ocfs2_control_install_private() and ocfs2_control_release()
ocfs2/dlm: require a ref for locking_state debugfs open
ocfs2: reject FITRIM ranges shorter than a cluster
ocfs2: validate fast symlink target during inode read
ocfs2: add journal NULL check in ocfs2_checkpoint_inode()
...
Linus Torvalds [Sun, 21 Jun 2026 19:25:17 +0000 (12:25 -0700)]
Merge tag 'mtd/for-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
Pull mtd updates from Miquel Raynal:
"NAND changes:
- Extend SPI NAND continuous read to Winbond devices, which requires
numerous changes in the spi-{mem,nand} layers such as the need for
a secondary read operation template
- Continuous reads in general have also been enhanced/fixed for
avoiding potential issues at probe time and at block boundaries
SPI NOR changes:
- Big set of cleanups and improvements to the locking support.
This series contains some cleanups and bug fixes for code and
documentation around write protection. Then support is added for
complement locking, which allows finer grained configuration of
what is considered locked and unlocked. Then complement locking is
enabled on a bunch of Winbond W25 flashes
- Fix die erase support on Spansion flashes.
Die erase is only supported on multi-die flashes, but the die erase
opcode was set for all. When the opcode is set, it overrides the
default chip erase opcode which should be used for single-die
flashes. Only set the opcode on multi-die flashes. Also, the opcode
was not set on multi-die s28hx-t flashes. Set it so they can use
die-erase correctly
General changes:
- A few drivers and mappings have been removed following SoCs support
removal
- And again, there is the usual load of misc improvements and fixes"
* tag 'mtd/for-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: (63 commits)
mtd: cfi: Use common error handling code in two functions
mtd: slram: simplify register_device() cleanup
mtd: slram: remove failed entries from the device list
mtd: rawnand: ndfc: use ioread32be/iowrite32be and allow COMPILE_TEST
mtd: spi-nor: spansion: add die erase support in s28hx-t
mtd: spi-nor: spansion: use die erase for multi-die devices only
mtd: spi-nor: winbond: Add W25Q02NWxxIM CMP locking support
mtd: spi-nor: winbond: Add W25Q01NWxxIM CMP locking support
mtd: spi-nor: winbond: Add W25Q01NWxxIQ CMP locking support
mtd: spi-nor: winbond: Add W25H02NWxxAM CMP locking support
mtd: spi-nor: winbond: Add W25H01NWxxAM CMP locking support
mtd: spi-nor: winbond: Add W25H512NWxxAM CMP locking support
mtd: spi-nor: Add steps for testing locking with CMP
mtd: spi-nor: swp: Add support for the complement feature
mtd: spi-nor: Add steps for testing locking support
mtd: maps: remove obsolete impa7 map driver
mtd: maps: remove uclinux map driver
mtd: maps: remove AMD Élan specific drivers
mtd: inftlmount: convert printk(KERN_WARNING) to pr_warn
mtd: Consistently define pci_device_ids
...
Bradley Morgan [Fri, 19 Jun 2026 16:37:18 +0000 (16:37 +0000)]
cpu: hotplug: Bound hotplug states sysfs output
states_show() adds CPU hotplug state names into a single sysfs buffer
using sprintf(). With enough registered states, this can write past the
end of the PAGE_SIZE buffer.
Use sysfs_emit_at() so output is bounded.
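
A user-space analogue of the bounded append may help show why the change is safe. emit_at() below is a hypothetical helper built on vsnprintf(); it is not the kernel's sysfs_emit_at(), it only follows the same idea of writing at an offset and never past the end of the buffer.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Append formatted text to buf at offset 'at' without writing past
 * 'size' bytes. Returns the number of characters actually stored,
 * not counting the terminating NUL.
 */
static int emit_at(char *buf, size_t size, size_t at, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (at >= size)
		return 0;

	va_start(ap, fmt);
	n = vsnprintf(buf + at, size - at, fmt, ap);
	va_end(ap);

	if (n < 0)
		return 0;
	/* vsnprintf() reports the untruncated length; clamp it. */
	if ((size_t)n >= size - at)
		n = (int)(size - at - 1);
	return n;
}
```

A caller accumulates the return values as its running offset, so once the buffer is full further calls add nothing instead of overflowing.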
Fixes: 98f8cdce1db5 ("cpu/hotplug: Add sysfs state interface")
Signed-off-by: Bradley Morgan <include@grrlz.net>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260619163719.12103-2-include@grrlz.net
Bradley Morgan [Fri, 19 Jun 2026 16:37:17 +0000 (16:37 +0000)]
cpu: hotplug: Preserve per instance callback errors
cpuhp_invoke_callback() unwinds earlier callbacks for the same
hotplug state when one instance fails. The rollback path currently
reuses ret, so a successful rollback can hide the original error and
make the failed transition look successful.
Keep the rollback result separate from the original error.
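
The pattern can be sketched outside the kernel as follows. invoke_instances() and demo_cb() are hypothetical and much simpler than cpuhp_invoke_callback(); the point is only that the rollback result lives in its own variable so the first error is what gets returned.

```c
#include <assert.h>
#include <stdbool.h>

typedef int (*instance_cb)(int instance, bool bringup);

static int rollback_calls;

/* Demo callback: bringing up instance 2 fails, teardown succeeds. */
static int demo_cb(int instance, bool bringup)
{
	if (!bringup) {
		rollback_calls++;
		return 0;
	}
	return instance == 2 ? -5 : 0;
}

/* Run cb for each instance; on failure undo the earlier ones. */
static int invoke_instances(instance_cb cb, int count)
{
	int i, ret = 0;

	for (i = 0; i < count; i++) {
		ret = cb(i, true);
		if (ret)
			break;
	}
	if (!ret)
		return 0;

	while (i-- > 0) {
		/* Separate variable: a clean rollback must not clear ret. */
		int rollback_ret = cb(i, false);

		(void)rollback_ret;
	}
	return ret;
}
```

Had the rollback loop assigned to ret instead, the successful teardown of instances 1 and 0 would leave ret at 0 and the caller would see success.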
Fixes: 724a86881d03 ("smp/hotplug: Callback vs state-machine consistency")
Signed-off-by: Bradley Morgan <include@grrlz.net>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260619163719.12103-1-include@grrlz.net
This patch caused a significant performance regression, so revert it, and
we can determine whether the approach is sensible or not moving forwards,
and if so how to avoid this.
There was a merge conflict with commit de97ae6222c1 ("mm/readahead: no
PG_readahead on EOF"). Care was taken to ensure that the revert retains
the behaviour of that commit and cleanly reverts only commit 7b32f64bc512
("mm: limit filemap_fault readahead to VMA boundaries").
Link: https://lore.kernel.org/20260619112852.104213-1-ljs@kernel.org
Fixes: 7b32f64bc512 ("mm: limit filemap_fault readahead to VMA boundaries")
Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202606181547.617a6967-lkp@intel.com
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Kalesh Singh <kaleshsingh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ben Dooks [Tue, 16 Jun 2026 09:59:06 +0000 (10:59 +0100)]
mm/vmscan: pass NULL to trace vmscan node reclaim
The tracepoint for node reclaims takes a `struct mem_cgroup *`
as the third argument, so pass NULL instead of 0 to fix the warnings
about using an integer as a pointer.
Fixes the following warnings:
mm/vmscan.c:6753:66: warning: Using plain integer as NULL pointer
mm/vmscan.c:6757:58: warning: Using plain integer as NULL pointer
mm/vmscan.c:7818:60: warning: Using plain integer as NULL pointer
selftests/mm: fix exclusive_cow test fork() handling
The test ignores the return value of fork(), so both the parent and the
(newly created) child run the COW verification loops and then call
hmm_buffer_free() before returning into the kselftest harness, which
_exit()s each side. This duplicated teardown sequence has been observed
to manifest as a SIGSEGV in the test child, e.g.:
hmm-tests[360141]: segfault (11) at 0 nip 10006964 lr 1000ac3c code 1
in hmm-tests[6964,10000000+30000]
Fix this by adopting the same fork()-then-wait pattern already used by the
nearby anon_write_child / anon_write_child_shared tests in this file: the
child performs the COW verification and then _exit(0)s so it does not run
the test teardown, while the parent independently verifies COW, waits for
the child, and only then frees the buffer.
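
The fork()-then-wait shape described above looks roughly like this in isolation. It is a sketch with a plain malloc() buffer standing in for the HMM buffer; verify_then_free() is a hypothetical name and none of this is the hmm-tests code.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork, let the child check its private copy and leave via _exit(),
 * and have only the parent run the teardown after reaping the child.
 * Returns the child's exit status, or -1 on error.
 */
static int verify_then_free(void)
{
	char *buf = malloc(64);
	pid_t pid;
	int status;

	if (!buf)
		return -1;
	buf[0] = 42;

	pid = fork();
	if (pid < 0) {
		free(buf);
		return -1;
	}
	if (pid == 0) {
		/* Child: verify, then exit without running any teardown. */
		_exit(buf[0] == 42 ? 0 : 1);
	}

	/* Parent: its write must not be visible in the child's copy. */
	buf[0] = 7;
	if (waitpid(pid, &status, 0) != pid) {
		free(buf);
		return -1;
	}
	free(buf);	/* teardown happens exactly once, in the parent */
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Checking the return value of fork() is what separates the two roles; without it both processes fall through to the same teardown.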
Link: https://lore.kernel.org/20260611034102.1030738-4-aboorvad@linux.ibm.com
Fixes: b659baea75469 ("mm: selftests for exclusive device memory")
Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Cc: Alex Sierra <alex.sierra@amd.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Balbir Singh <balbirs@nvidia.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Sayali Patil <sayalip@linux.ibm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Sayali Patil [Thu, 11 Jun 2026 03:41:01 +0000 (09:11 +0530)]
selftests/mm: remove hardcoded THP sizing assumptions in hmm tests
migrate_partial_unmap_fault() and migrate_remap_fault() use hardcoded
offsets based on a 2MB PMD size. Similarly, benchmark_thp_migration()
assumes a fixed 2MB THP size when generating test buffer sizes.
Derive offsets and test sizes from the runtime PMD page size returned by
read_pmd_pagesize(). If unavailable, fall back to TWOMEG. This allows
the tests to adapt correctly on systems where PMD-sized THP differs from
2MB. Also replace the fixed 1MB unmap size with a PMD-relative value
derived from the runtime PMD size.
On systems with larger PMD sizes, computed test buffer sizes can exceed
INT_MAX. Skip such test cases to avoid overflow.
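
The sizing logic can be sketched as two small helpers. Both are hypothetical: the detected size is passed in as a parameter (0 meaning "unavailable") instead of being read from the system, and DEMO_TWOMEG stands in for the TWOMEG fallback.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_TWOMEG	(UINT64_C(1) << 21)

/* Use the detected PMD size, or fall back to 2MB if it is unknown. */
static uint64_t effective_pmd_size(uint64_t detected)
{
	return detected ? detected : DEMO_TWOMEG;
}

/* True if pmd_size * multiplier still fits in an int. */
static bool size_fits_int(uint64_t pmd_size, unsigned int multiplier)
{
	return multiplier == 0 ||
	       pmd_size <= (uint64_t)INT_MAX / multiplier;
}
```

A test case whose computed size fails size_fits_int() would be skipped rather than run with an overflowed length.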
Link: https://lore.kernel.org/20260611034102.1030738-3-aboorvad@linux.ibm.com
Fixes: 24c2c5b8ffbd ("selftests/mm/hmm-tests: partial unmap, mremap and anon_write tests")
Fixes: 271a7b2e3c13 ("selftests/mm/hmm-tests: new throughput tests including THP")
Signed-off-by: Sayali Patil <sayalip@linux.ibm.com>
Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Acked-by: Balbir Singh <balbirs@nvidia.com>
Cc: Alex Sierra <alex.sierra@amd.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Sayali Patil [Thu, 11 Jun 2026 03:41:00 +0000 (09:11 +0530)]
selftests/mm: allow PUD-level entries in compound testcase of hmm tests
Patch series "selftests/mm: assorted fixes for hmm-tests", v3.
This series fixes a few issues in hmm-tests that show up when page-size
and huge-page configuration differ from the hardcoded assumptions the
tests were written for (PMD/THP sizing, default hugepage size, and related
cases).
It also includes a fix to exclusive_cow: the test ignored the return value
of fork(), so both parent and child ran the same teardown path.
This patch (of 3):
The HMM compound testcase currently assumes only PMD-level mappings and
fails on systems where default_hugepagesz=1G is set, because the region is
then reported by the device at PUD level.
Determine the mapping level (PMD or PUD) the device reports for the first
page of the range and require every page to match that level exactly via
ASSERT_EQ(). This accepts PUD-level mappings while preserving the
expected/observed protection values printed on failure, and rejects a
fragmented mapping that mixes PMD- and PUD-level entries within the same
range (which a per-page OR check would have let pass).
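A rough sketch of the "take the level from the first page, then require an exact match" check follows. The flag values and the uniform_huge_level() helper are hypothetical stand-ins for the per-page level bits the device reports; the real test compares full protection values with ASSERT_EQ() rather than returning a boolean.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits standing in for the reported mapping level. */
#define LEVEL_PMD	0x10u
#define LEVEL_PUD	0x20u
#define LEVEL_MASK	(LEVEL_PMD | LEVEL_PUD)

/*
 * Return true only if the first page reports PMD or PUD level and every
 * other page in the range reports exactly the same level.
 */
static bool uniform_huge_level(const unsigned char *prot, size_t npages)
{
	unsigned int level;
	size_t i;

	if (npages == 0)
		return false;

	level = prot[0] & LEVEL_MASK;
	if (level != LEVEL_PMD && level != LEVEL_PUD)
		return false;

	for (i = 1; i < npages; i++)
		if ((prot[i] & LEVEL_MASK) != level)
			return false;
	return true;
}
```

A check written as "each page is PMD or PUD" would accept a range whose pages alternate between the two levels; comparing against the level of the first page does not.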
Link: https://lore.kernel.org/20260611034102.1030738-1-aboorvad@linux.ibm.com Link: https://lore.kernel.org/20260611034102.1030738-2-aboorvad@linux.ibm.com Fixes: e478425bec93 ("mm/hmm: add tests for hmm_pfn_to_map_order()") Signed-off-by: Sayali Patil <sayalip@linux.ibm.com> Co-developed-by: Aboorva Devarajan <aboorvad@linux.ibm.com> Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com> Cc: Alex Sierra <alex.sierra@amd.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Balbir Singh <balbirs@nvidia.com> Cc: David Hildenbrand <david@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam R. Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Samuel Moelius [Tue, 9 Jun 2026 00:48:15 +0000 (00:48 +0000)]
mm/gup_test: reject wrapped user ranges
gup_test accepts an address and size from the debugfs ioctl and repeatedly
compares against addr + size. If that addition wraps, the loop can be
skipped and the ioctl returns success with size rewritten to zero.
Compute the end address once with overflow checking and use that checked
end for the loop bounds.
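As a userspace illustration of the pattern (not the gup_test code itself), the end address can be computed once with an overflow check and then reused as the loop bound. range_end() and count_chunks() are hypothetical names, and __builtin_add_overflow() is the GCC/Clang builtin that plays the role of the kernel's check_add_overflow() here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Compute addr + size into *end; return false if the addition wraps. */
static bool range_end(uint64_t addr, uint64_t size, uint64_t *end)
{
	return !__builtin_add_overflow(addr, size, end);
}

/*
 * Walk [addr, addr + size) in chunks of at most "step" bytes and return
 * the number of chunks visited, or -1 for a wrapped range or a zero step.
 */
static long count_chunks(uint64_t addr, uint64_t size, uint64_t step)
{
	uint64_t end, cur;
	long chunks = 0;

	if (step == 0 || !range_end(addr, size, &end))
		return -1;

	for (cur = addr; cur < end; chunks++) {
		uint64_t left = end - cur;

		/* Advance by the remaining bytes or one step, never past end. */
		cur += left < step ? left : step;
	}
	return chunks;
}
```

Because the loop only ever compares against the checked end and advances by at most the remaining length, a wrapped range is reported as an error instead of silently producing an empty loop.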
Assisted-by: Codex:gpt-5.5-cyber-preview Link: https://lore.kernel.org/20260609004814.1240586.6294d614ac80.gup-test-range-end-wrap@trailofbits.com Signed-off-by: Samuel Moelius <sam.moelius@trailofbits.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Samuel Moelius [Fri, 5 Jun 2026 18:41:52 +0000 (18:41 +0000)]
mm/page_frag: reject invalid CPUs in page_frag_test
The page_frag selftest module accepts test_push_cpu and test_pop_cpu as
signed module parameters, then validates them by passing them directly to
cpu_active().
That validation is itself unsafe for negative or out-of-range CPU numbers.
For example, test_push_cpu=-1 is converted to a very large unsigned CPU
number before cpu_active() reaches cpumask_test_cpu(), which trips the
cpumask range check with CONFIG_DEBUG_PER_CPU_MAPS enabled.
Reject CPU values outside [0, nr_cpu_ids) before asking whether the CPU is
active.
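The range check amounts to something like the following sketch. cpu_in_range() is a hypothetical helper, and nr_cpu_ids is passed in as a parameter here only so that the example is self-contained outside the kernel.

```c
#include <stdbool.h>

/*
 * Return true if "cpu" lies in [0, nr_cpu_ids). The sign is tested before
 * the cast, so -1 is rejected rather than turned into a huge unsigned value.
 */
static bool cpu_in_range(int cpu, unsigned int nr_cpu_ids)
{
	return cpu >= 0 && (unsigned int)cpu < nr_cpu_ids;
}
```

Only values that pass this check would then be handed on to an "is this CPU active" query.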
Assisted-by: Codex:gpt-5.5-cyber-preview Link: https://lore.kernel.org/20260605184157.2490353-1-sam.moelius@trailofbits.com Signed-off-by: Samuel Moelius <sam.moelius@trailofbits.com> Cc: David Hildenbrand <david@kernel.org> Cc: Liam R. Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SeongJae Park [Fri, 5 Jun 2026 01:38:48 +0000 (18:38 -0700)]
mm/damon/core: always put unsuccessfully committed target pids
damon_commit_target() puts and gets the destination and the source target
pids. It puts the destination target pid because it will be overwritten
by the source target pid. It gets the source pid because the caller is
supposed to eventually put the pids. In more detail, the caller will call
damon_destroy_ctx() after damon_commit_ctx() to destroy the entire source
context. In this case, the [f]vaddr operation set's cleanup_target()
callback will put the pids.
The commit operation is made at the context level. It can fail in
multiple places, including in the middle of and after the target commit
operations. On any such failure, the error is immediately returned to
the damon_commit_ctx() caller. If some or all of the source target pids
were committed to the destination during the unsuccessful context commit
attempt, those pids should be put twice.
The source context will do the put operations using the routine explained
above. However, let's suppose the destination context was not
originally using the [f]vaddr operation set and the commit failed before
the ops of the source context are committed. The destination does not
have the cleanup_target() ops callback, so it cannot put the pids via
damon_destroy_ctx().
As a result, the pids are leaked. The issue should not be very common in
the real world. The commit feature is for changing parameters of a running
DAMON context while inheriting internal status like the monitoring
results. The monitoring results of a physical address range have nothing
that is beneficial to inherit into virtual address range monitoring, so
the problem-causing DAMON control should be rare in practice. That said,
it is a supported feature. And damon_commit_target() failure due to memory
allocation is relatively realistic [1] if there are a huge number of
target regions.
Fix this by putting the pids in the commit operation when it fails.
Kaitao Cheng [Tue, 2 Jun 2026 13:07:55 +0000 (21:07 +0800)]
mm: page_isolation: avoid unsafe folio reads while scanning compound pages
page_is_unmovable() can inspect compound pages without holding a folio
reference or any lock. The folio can therefore be freed, split or reused
while the scanner is still looking at it.
The existing HugeTLB handling already avoids folio_hstate() for this
reason, but it still derives the hstate from folio_size() and later
derives the scan step from folio_nr_pages() and folio_page_idx(). These
helpers rely on the folio still being a valid folio head. If the folio
changed concurrently, the scanner can read inconsistent folio metadata and
compute a wrong step. In the worst case, folio_nr_pages() can return 1
for what used to be a tail page and the subtraction from folio_page_idx()
can underflow.
There is a similar issue for non-HugeTLB compound pages: folio_test_lru()
expects a valid folio. If the previously observed head page has been
reused as a tail page of another compound page, the folio flag checks can
trigger VM_BUG_ON_PGFLAGS().
Read the compound order once with compound_order(), reject obviously bogus
orders, and derive the hstate and scan step from that order instead of
querying folio size information again. Also use PageLRU(page), which is
safe for the page being scanned, instead of folio_test_lru() on a
potentially stale folio pointer.
Treat an unknown HugeTLB hstate as unmovable so the scanner does not try
to skip over an unstable HugeTLB folio.
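The idea of deriving the scan step from a single order read can be sketched as follows. This is not the page_isolation code: scan_step() and MAX_PLAUSIBLE_ORDER are hypothetical, the order is assumed to have been read once by the caller, and the arithmetic assumes that a compound page starts at a pfn aligned to its size.

```c
#include <stdbool.h>

/* Hypothetical upper bound used to reject obviously bogus orders. */
#define MAX_PLAUSIBLE_ORDER 20u

/*
 * Given the pfn being scanned and a compound order that was read exactly
 * once, compute how many pages to advance to reach the end of the compound
 * page. Returns false for an order that cannot describe a compound page.
 */
static bool scan_step(unsigned long pfn, unsigned int order,
		      unsigned long *step)
{
	unsigned long nr_pages;

	if (order == 0 || order > MAX_PLAUSIBLE_ORDER)
		return false;

	nr_pages = 1UL << order;
	/* The masked offset is always below nr_pages, so this cannot underflow. */
	*step = nr_pages - (pfn & (nr_pages - 1));
	return true;
}
```

Since both the page count and the offset come from the same order value, a concurrent change to the folio cannot make the two disagree, which is what allowed the subtraction to underflow before.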
Link: https://lore.kernel.org/20260602130755.38794-1-kaitao.cheng@linux.dev Fixes: a0a9f2180b90 ("mm: page_isolation: avoid calling folio_hstate() without hugetlb_lock") Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn> Reviewed-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Acked-by: Oscar Salvador (SUSE) <osalvador@kernel.org> Cc: Brendan Jackman <jackmanb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Liu Shixin <liushixin2@huawei.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Shakeel Butt [Wed, 10 Jun 2026 23:20:48 +0000 (16:20 -0700)]
mm/shrinker: do not hold RCU lock in shrinker_debugfs_count_show()
Reading the debugfs "count" file of a memcg-aware shrinker can sleep
inside an RCU read-side critical section:
BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:421
RCU nest depth: 1, expected: 0
css_rstat_flush
mem_cgroup_flush_stats
zswap_shrinker_count
shrinker_debugfs_count_show
shrinker_debugfs_count_show() invokes the ->count_objects() callback under
rcu_read_lock(). The zswap callback flushes memcg stats via
css_rstat_flush(), which may sleep, so it must not run under RCU.
The RCU lock is not needed here. mem_cgroup_iter() takes RCU internally
and returns a memcg holding a css reference (dropped on the next iteration
or by mem_cgroup_iter_break()), so the memcg stays alive without it. The
shrinker is kept alive by the open debugfs file: shrinker_free() removes
the debugfs entries via debugfs_remove_recursive(), which waits for
in-flight readers to drain, before call_rcu(..., shrinker_free_rcu_cb).
The sibling "scan" handler already invokes the sleeping ->scan_objects()
callback with no RCU section.
The droppable test currently relies on creating memory pressure in a child
process to trigger dropping the droppable pages.
That not only takes a long time on some machines (allocating and filling
all that memory); on large machines it does not work at all, as we
hardcode the area size to 134217728 bytes (128 MiB).
Further, we rely on timeouts to detect that memory was not dropped,
which is really suboptimal.
Instead, let's just use MADV_PAGEOUT on a 2 MiB region. MADV_PAGEOUT
works with droppable memory even without swap.
There is a low chance of MADV_PAGEOUT failing to drop a page because of
speculative references. We'll wait 1s and retry 10 times to rule that
unlikely case out as best as we can.
On a machine without swap:
$ ./droppable
TAP version 13
1..1
ok 1 madvise(MADV_PAGEOUT) behavior
# Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
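The bounded retry described above follows a common pattern that can be sketched generically. Nothing here is taken from the droppable test: retry_attempts() and countdown_attempt() are hypothetical, and in the real test the attempt would be "call madvise(MADV_PAGEOUT) and check whether the pages are gone".

```c
#include <stdbool.h>
#include <unistd.h>

typedef bool (*attempt_fn)(void *ctx);

/*
 * Call attempt() up to max_tries times, sleeping delay_sec seconds between
 * failed tries. Returns the 1-based number of the try that succeeded, or 0
 * if every try failed.
 */
static int retry_attempts(attempt_fn attempt, void *ctx, int max_tries,
			  unsigned int delay_sec)
{
	int i;

	for (i = 1; i <= max_tries; i++) {
		if (attempt(ctx))
			return i;
		if (i < max_tries)
			sleep(delay_sec);
	}
	return 0;
}

/* Stand-in attempt for demonstration: succeeds once the counter hits zero. */
static bool countdown_attempt(void *ctx)
{
	int *remaining = ctx;

	return --(*remaining) <= 0;
}
```

Calling retry_attempts(attempt, ctx, 10, 1) would correspond to the "wait 1s and retry 10 times" policy; the result is decided by the attempts themselves rather than by an external timeout.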
Link: https://lore.kernel.org/20260611-droppable_test-v1-1-b6a73d99f658@kernel.org Fixes: 9651fcedf7b9 ("mm: add MAP_DROPPABLE for designating always lazily freeable mappings") Signed-off-by: David Hildenbrand (Arm) <david@kernel.org> Reported-by: Aishwarya TCV <Aishwarya.TCV@arm.com> Tested-by: Sarthak Sharma <sarthak.sharma@arm.com> Tested-by: Lance Yang <lance.yang@linux.dev> Reviewed-by: Dev Jain <dev.jain@arm.com> Reviewed-by: SeongJae Park <sj@kernel.org> Tested-by: Lorenzo Stoakes <ljs@kernel.org> Reviewed-by: Lorenzo Stoakes <ljs@kernel.org> Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Anthony Yznaga <anthony.yznaga@oracle.com> Cc: Liam R. Howlett <liam@infradead.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
writeout() is only called from pageout(), in a straight-line flow at its
end, so merge the two functions.
Link: https://lore.kernel.org/20260601113449.3464734-3-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Baoquan He <baoquan.he@linux.dev> Reviewed-by: Nhat Pham <nphamcs@gmail.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Cc: Chris Li <chrisl@kernel.org> Cc: Kairui Song <kasong@tencent.com> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Hao Ge [Tue, 26 May 2026 09:26:41 +0000 (17:26 +0800)]
MAINTAINERS: add Hao Ge as reviewer for codetag and alloc_tag
I've been contributing to the codetag and alloc_tag subsystems since 2024.
Memory allocation profiling is a great tool, and I'd like to stay
involved as a reviewer to keep up with ongoing development and not miss
any of the details. I'm happy to help review patches and contribute to
this subsystem.
Link: https://lore.kernel.org/20260526092641.299399-1-hao.ge@linux.dev Signed-off-by: Hao Ge <hao.ge@linux.dev> Acked-by: Suren Baghdasaryan <surenb@google.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: "David Hildenbrand (Arm)" <david@kernel.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Sayali Patil [Thu, 21 May 2026 06:47:53 +0000 (12:17 +0530)]
selftests/mm: clarify alternate unmapping in compaction_test
Add a comment explaining that every other entry in the list is unmapped to
intentionally create fragmentation with locked pages before invoking
check_compaction().
Link: https://lore.kernel.org/da5e0a8d5152e54152c0d2f456aac2fac35af291.1779296493.git.sayalip@linux.ibm.com Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory") Signed-off-by: Sayali Patil <sayalip@linux.ibm.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: David Hildenbrand (Arm) <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Sayali Patil [Thu, 21 May 2026 06:47:52 +0000 (12:17 +0530)]
selftests/mm: move hwpoison setup into run_test() and silence modprobe output for memory-failure category
run_vmtests.sh contains special handling to ensure the hwpoison_inject
module is available for the memory-failure tests. This logic was
implemented outside of run_test(), making the setup category-specific but
managed globally.
Move the hwpoison_inject handling into run_test() and restrict it
to the memory-failure category so that:
1. the module is checked and loaded only when memory-failure tests run,
2. the test is skipped if the module or the debugfs interface
(/sys/kernel/debug/hwpoison/) is not available,
3. the module is unloaded after the test if it was loaded by the script.
This localizes category-specific setup and makes the test flow
consistent with other per-category preparations.
While updating this logic, fix the module availability check.
The script previously used:
modprobe -R hwpoison_inject
The -R option prints the resolved module name to stdout, causing every
run to print:
hwpoison_inject
in the test output, even when no action is required, introducing
unnecessary noise.
Replace this with:
modprobe -n hwpoison_inject
which verifies that the module is loadable without producing output,
keeping the selftest logs clean and consistent.
Also, ensure that skipped tests do not override a previously recorded
failure. A skipped test currently sets exitcode to ksft_skip even if a
prior test has failed, which can mask failures in the final exit status.
Update the logic to only set exitcode to ksft_skip when no failure has
been recorded.
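The exit-status rule can be modelled in a few lines. The script itself is shell; the sketch below only restates the rule in C for illustration, with a hypothetical update_exitcode() helper, kselftest's skip code of 4, and any other non-zero result treated as a failure.

```c
/* Result codes; 4 is the kselftest skip code. */
enum { KSFT_PASS = 0, KSFT_FAIL = 1, KSFT_SKIP = 4 };

/*
 * Fold one test result into the running exit code. A skip is recorded only
 * while no failure has been seen, so it can never hide an earlier failure.
 */
static int update_exitcode(int exitcode, int result)
{
	if (result == KSFT_PASS)
		return exitcode;
	if (result == KSFT_SKIP)
		return exitcode == KSFT_PASS ? KSFT_SKIP : exitcode;
	return KSFT_FAIL;
}
```

With this ordering, a run of "fail, skip" ends with exit code 1, whereas overwriting unconditionally would have reported 4.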
Link: https://lore.kernel.org/93441f34f7ef5add47d1a130d03daa79e21b5050.1779296493.git.sayalip@linux.ibm.com Fixes: ff4ef2fbd101 ("selftests/mm: add memory failure anonymous page test") Signed-off-by: Sayali Patil <sayalip@linux.ibm.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: David Hildenbrand (Arm) <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>