Uros Bizjak [Mon, 8 Dec 2025 16:33:34 +0000 (17:33 +0100)]
x86/bpf: Avoid emitting LOCK prefix for XCHG atomic ops
The x86 XCHG instruction is implicitly locked when one of the
operands is a memory location, making an explicit LOCK prefix
unnecessary.
Stop emitting the LOCK prefix for BPF_XCHG in the JIT atomic
read-modify-write helpers. This avoids redundant instruction
prefixes while preserving correct atomic semantics.
====================
bpf: Optimize recursion detection on arm64
V2: https://lore.kernel.org/all/20251217233608.2374187-1-puranjay@kernel.org/
Changes in v2->v3:
- Added acked by Yonghong
- Patch 2:
- Change alignment of active from 8 to 4
- Use le32_to_cpu in place of get_unaligned_le32()
V1: https://lore.kernel.org/all/20251217162830.2597286-1-puranjay@kernel.org/
Changes in V1->V2:
- Patch 2:
- Put preempt_enable()/disable() around RMW accesses to mitigate
race conditions. Because on CONFIG_PREEMPT_RCU and sleepable
bpf programs, preemption can cause no bpf prog to execute in
case of recursion.
BPF programs detect recursion using a per-CPU 'active' flag in struct
bpf_prog. The trampoline currently sets/clears this flag with atomic
operations.
On some arm64 platforms (e.g., Neoverse V2 with LSE), per-CPU atomic
operations are relatively slow. Unlike x86_64 - where per-CPU updates
can avoid cross-core atomicity, arm64 LSE atomics are always atomic
across all cores, which is unnecessary overhead for strictly per-CPU
state.
This patch removes atomics from the recursion detection path on arm64.
It was discovered in [1] that per-CPU atomics that don't return a value
were extremely slow on some arm64 platforms, Catalin added a fix in
commit 535fdfc5a228 ("arm64: Use load LSE atomics for the non-return
per-CPU atomic operations") to solve this issue, but it seems to have
caused a regression on the fentry benchmark.
Using the fentry benchmark from the bpf selftests shows the following:
./tools/testing/selftests/bpf/bench trig-fentry
+---------------------------------------------+------------------------+
| Configuration | Total Operations (M/s) |
+---------------------------------------------+------------------------+
| bpf-next/master with Catalin’s fix reverted | 51.770 |
|---------------------------------------------|------------------------|
| bpf-next/master | 43.271 |
| bpf-next/master with this change | 43.271 |
+---------------------------------------------+------------------------+
All benchmarks were run on a KVM based vm with Neoverse-V2 and 8 cpus.
This patch yields a 25% improvement in this benchmark compared to
bpf-next. Notably, reverting Catalin's fix also results in a performance
gain for this benchmark, which is interesting but expected.
For completeness, this benchmark was also run with the change enabled on
x86-64, which resulted in a 30% regression in the fentry benchmark. So,
it is only enabled on arm64.
P.S. - Here is more data with other program types:
Puranjay Mohan [Fri, 19 Dec 2025 18:44:18 +0000 (10:44 -0800)]
bpf: arm64: Optimize recursion detection by not using atomics
BPF programs detect recursion using a per-CPU 'active' flag in struct
bpf_prog. The trampoline currently sets/clears this flag with atomic
operations.
On some arm64 platforms (e.g., Neoverse V2 with LSE), per-CPU atomic
operations are relatively slow. Unlike x86_64 - where per-CPU updates
can avoid cross-core atomicity, arm64 LSE atomics are always atomic
across all cores, which is unnecessary overhead for strictly per-CPU
state.
This patch removes atomics from the recursion detection path on arm64 by
changing 'active' to a per-CPU array of four u8 counters, one per
context: {NMI, hard-irq, soft-irq, normal}. The running context uses a
non-atomic increment/decrement on its element. After increment,
recursion is detected by reading the array as a u32 and verifying that
only the expected element changed; any change in another element
indicates inter-context recursion, and a value > 1 in the same element
indicates same-context recursion.
For example, starting from {0,0,0,0}, a normal-context trigger changes
the array to {0,0,0,1}. If an NMI arrives on the same CPU and triggers
the program, the array becomes {1,0,0,1}. When the NMI context checks
the u32 against the expected mask for normal (0x00000001), it observes
0x01000001 and correctly reports recursion. Same-context recursion is
detected analogously.
Puranjay Mohan [Fri, 19 Dec 2025 18:44:17 +0000 (10:44 -0800)]
bpf: move recursion detection logic to helpers
BPF programs detect recursion by doing atomic inc/dec on a per-cpu
active counter from the trampoline. Create two helpers for operations on
this active counter, this makes it easy to changes the recursion
detection logic in future.
====================
resolve_btfids: Support for BTF modifications
This series changes resolve_btfids and kernel build scripts to enable
BTF transformations in resolve_btfids. Main motivation for enhancing
resolve_btfids is to reduce dependency of the kernel build on pahole
capabilities [1] and enable BTF features and optimizations [2][3]
particular to the kernel.
Patches #1-#4 in the series are non-functional changes in
resolve_btfids.
Patch #5 makes kernel build notice pahole version changes between
builds.
Patch #6 changes minimum version of pahole required for
CONFIG_DEBUG_INFO_BTF to v1.22
Patch #7 makes a small prep change in selftests/bpf build.
The last patch (#8) makes significant changes in resolve_btfids and
introduces scripts/gen-btf.sh. See implementation details in the patch
description.
Successful BPF CI run: https://github.com/kernel-patches/bpf/actions/runs/20378061470
v2->v3:
- add patch #4 bumping minimum pahole version (Andrii, Alan)
- add patch #5 pre-fixing resolve_btfids test (Donglin)
- add GEN_BTF var and assemble RESOLVE_BTFIDS_FLAGS in Makefile.btf (Alan)
- implement --distill_base flag in resolve_btfids, set it depending
on KBUILD_EXTMOD in Makefile.btf (Eduard)
- various implementation nits, see the v2 thread for details (Andrii, Eduard)
v1->v2:
- gen-btf.sh and other shell script fixes (Donglin)
- update selftests build (Donglin)
- generate .BTF.base only when KBUILD_EXTMOD is set (Alan)
- proper endianness handling for cross-compilation
- change elf_begin mode from ELF_C_RDWR_MMAP to ELF_C_READ_MMAP_PRIVATE
- remove compressed_section_fix()
- nit NULL check in patch #3 (suggested by AI)
Ihor Solodrai [Fri, 19 Dec 2025 18:18:25 +0000 (10:18 -0800)]
resolve_btfids: Change in-place update with raw binary output
Currently resolve_btfids updates .BTF_ids section of an ELF file
in-place, based on the contents of provided BTF, usually within the
same input file, and optionally a BTF base.
Change resolve_btfids behavior to enable BTF transformations as part
of its main operation. To achieve this, in-place ELF write in
resolve_btfids is replaced with generation of the following binaries:
* ${1}.BTF with .BTF section data
* ${1}.BTF_ids with .BTF_ids section data if it existed in ${1}
* ${1}.BTF.base with .BTF.base section data for out-of-tree modules
The execution of resolve_btfids and consumption of its output is
orchestrated by scripts/gen-btf.sh introduced in this patch.
The motivation for emitting binary data is that it allows simplifying
resolve_btfids implementation by delegating ELF update to the $OBJCOPY
tool [1], which is already widely used across the codebase.
There are two distinct paths for BTF generation and resolve_btfids
application in the kernel build: for vmlinux and for kernel modules.
For the vmlinux binary a .BTF section is added in a roundabout way to
ensure correct linking. The patch doesn't change this approach, only
the implementation is a little different.
Before this patch it worked as follows:
* pahole consumed .tmp_vmlinux1 [2] and added .BTF section with
llvm-objcopy [3] to it
* then everything except the .BTF section was stripped from .tmp_vmlinux1
into a .tmp_vmlinux1.bpf.o object [2], later linked into vmlinux
* resolve_btfids was executed later on vmlinux.unstripped [4],
updating it in-place
After this patch gen-btf.sh implements the following:
* pahole consumes .tmp_vmlinux1 and produces a *detached* file with
raw BTF data
* resolve_btfids consumes .tmp_vmlinux1 and detached BTF to produce
(potentially modified) .BTF, and .BTF_ids sections data
* a .tmp_vmlinux1.bpf.o object is then produced with objcopy copying
BTF output of resolve_btfids
* .BTF_ids data gets embedded into vmlinux.unstripped in
link-vmlinux.sh by objcopy --update-section
For kernel modules, creating a special .bpf.o file is not necessary,
and so embedding of sections data produced by resolve_btfids is
straightforward with objcopy.
With this patch an ELF file becomes effectively read-only within
resolve_btfids, which allows deleting elf_update() call and satellite
code (like compressed_section_fix [5]).
Endianness handling of .BTF_ids data is also changed. Previously the
"flags" part of the section was bswapped in sets_patch() [6], and then
Elf_Type was modified before elf_update() to signal to libelf that
bswap may be necessary. With this patch we explicitly bswap entire
data buffer on load and on dump.
Ihor Solodrai [Fri, 19 Dec 2025 18:18:24 +0000 (10:18 -0800)]
selftests/bpf: Run resolve_btfids only for relevant .test.o objects
A selftest targeting resolve_btfids functionality relies on a resolved
.BTF_ids section to be available in the TRUNNER_BINARY. The underlying
BTF data is taken from a special BPF program (btf_data.c), and so
resolve_btfids is executed as a part of a TRUNNER_BINARY build recipe
on the final binary.
Subsequent patches in this series allow resolve_btfids to modify BTF
before resolving the symbols, which means that the test needs access
to that modified BTF [1]. Currently the test simply reads in
btf_data.bpf.o on the assumption that BTF hasn't changed.
Implement resolve_btfids call only for particular test objects (just
resolve_btfids.test.o for now). The test objects are linked into the
TRUNNER_BINARY, and so .BTF_ids section will be available there.
This will make it trivial for the resolve_btfids test to access BTF
modified by resolve_btfids.
Ihor Solodrai [Fri, 19 Dec 2025 18:18:23 +0000 (10:18 -0800)]
lib/Kconfig.debug: Set the minimum required pahole version to v1.22
Subsequent patches in the series change vmlinux linking scripts to
unconditionally pass --btf_encode_detached to pahole, which was
introduced in v1.22 [1][2].
This change allows to remove PAHOLE_HAS_SPLIT_BTF Kconfig option and
other checks of older pahole versions.
Ihor Solodrai [Fri, 19 Dec 2025 18:13:18 +0000 (10:13 -0800)]
kbuild: Sync kconfig when PAHOLE_VERSION changes
This patch implements kconfig re-sync when the pahole version changes
between builds, similar to how it happens for compiler version change
via CC_VERSION_TEXT.
Define PAHOLE_VERSION in the top-level Makefile and export it for
config builds. Set CONFIG_PAHOLE_VERSION default to the exported
variable.
Kconfig records the PAHOLE_VERSION value in
include/config/auto.conf.cmd [1].
The Makefile includes auto.conf.cmd, so if PAHOLE_VERSION changes
between builds, make detects a dependency change and triggers
syncconfig to update the kconfig [2].
For external module builds, add a warning message in the prepare
target, similar to the existing compiler version mismatch warning.
Note that if pahole is not installed or available, PAHOLE_VERSION is
set to 0 by pahole-version.sh, so the (un)installation of pahole is
treated as a version change.
Ihor Solodrai [Fri, 19 Dec 2025 18:13:15 +0000 (10:13 -0800)]
resolve_btfids: Factor out load_btf()
Increase the lifetime of parsed BTF in resolve_btfids by factoring
load_btf() routine out of symbols_resolve() and storing the base_btf
and btf pointers in the struct object.
- Fix verifier assumptions of bpf_d_path's output buffer (Shuran Liu)
- Fix warnings in libbpf when built with -Wdiscarded-qualifiers under
C23 (Mikhail Gavrilov)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: add regression test for bpf_d_path()
bpf: Fix verifier assumptions of bpf_d_path's output buffer
selftests/bpf: Add test for truncated dmabuf_iter reads
bpf: Fix truncated dmabuf iterator reads
x86/unwind/orc: Support reliable unwinding through BPF stack frames
bpf: Add bpf_has_frame_pointer()
bpf, arm64: Do not audit capability check in do_jit()
libbpf: Fix -Wdiscarded-qualifiers under C23
bpftool: Fix build warnings due to MS extensions
net: smc: SMC_HS_CTRL_BPF should depend on BPF_JIT
selftests/bpf: Add -fms-extensions to bpf build flags
Linus Torvalds [Wed, 17 Dec 2025 03:48:30 +0000 (15:48 +1200)]
Merge tag 's390-6.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Alexander Gordeev:
- clear 'Search boot program' flag when 'bootprog' sysfs file is
written to override a value set from Hardware Management Console
- fix cyclic dead-lock in zpci_zdev_put() and zpci_scan_devices()
functions when triggering PCI device recovery using sysfs
- annotate the expected lock context imbalance in zpci_release_device()
function to fix a sparse complaint
- fix the logic to fallback to the return address register value in the
topmost frame when stack tracing uses a back chain
* tag 's390-6.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/stacktrace: Do not fallback to RA register
s390/pci: Annotate lock context imbalance in zpci_release_device()
s390/pci: Fix cyclic dead-lock in zpci_zdev_put() and zpci_scan_devices()
s390/ipl: Clear SBP flag when bootprog is set
====================
libbpf: move arena variables out of the zero page
Modify libbpf to place arena globals at the end of the arena mapping
instead of the very beginning. This allows programs to leave the
"zero page" of the arena unmapped, so that NULL arena pointer
dereferences trigger a page fault and associated backtrace in BPF streams.
In contrast, the current policy of placing global data in the zero pages
means that NULL dereferences silently corrupt global data, e.g, arena
qspinlock state. This makes arena bugs more difficult to debug.
The patchset adds code to libbpf to move global arena data to the end of
the arena. At load time, libbpf adjusts each symbol's location within
the arena to point to the right location in the arena. The patchset
also adjusts the arena skeleton pointer to point to the arena globals,
now that they are not in the beginning of the arena region.
- Added Acks by Eduard
- Changed jumptable sym_off to unsigned int for consistency (AI)
- Adjusted selftests to ensure arena globals are actually mapped in (Eduard)
- (Patch 2) Adjusted selftests that were failing because they were expecting the
now removed "direct map offset" error message
- Remove unnecessary kernel bounds check in resolve_pseudo_ldimm64
(Andrii)
- Added patch to turn sym_off unsigned to prevent overflow (AI)
- Remove obsolete references to offsets from test patch description
(Andrii)
- Use size_t for arena_data_off (Andrii)
- Remove extra mutable variable from offset calculations (Andrii)
- Moved globals to the end of the mapping: (Andrii)
- Removed extra parameter for offset and parameter picking logic
- Removed padding in the skeleton
- Removed additional libbpf call
- Added Reviewed-by from Eduard on patch 1
Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
====================
Emil Tsalapatis [Tue, 16 Dec 2025 17:33:25 +0000 (12:33 -0500)]
selftests/bpf: Add tests for the arena offset of globals
Add tests for the new libbpf globals arena offset logic. The
tests cover the case of globals being as large as the arena
itself, and being smaller than the arena. In that case, the
data is placed at the end of the arena, and the beginning
of the arena is free.
Emil Tsalapatis [Tue, 16 Dec 2025 17:33:24 +0000 (12:33 -0500)]
libbpf: Move arena globals to the end of the arena
Arena globals are currently placed at the beginning of the arena
by libbpf. This is convenient, but prevents users from reserving
guard pages in the beginning of the arena to identify NULL pointer
dereferences. Adjust the load logic to place the globals at the
end of the arena instead.
Also modify bpftool to set the arena pointer in the program's BPF
skeleton to point to the globals. Users now call bpf_map__initial_value()
to find the beginning of the arena mapping and use the arena pointer
in the skeleton to determine which part of the mapping holds the
arena globals and which part is free.
Emil Tsalapatis [Tue, 16 Dec 2025 17:33:23 +0000 (12:33 -0500)]
libbpf: Turn relo_core->sym_off unsigned
The symbols' relocation offsets in BPF are stored in an int field,
but cannot actually be negative. When in the next patch libbpf relocates
globals to the end of the arena, it is also possible to have valid
offsets > 2GiB that are used to calculate the final relo offsets.
Avoid accidentally interpreting large offsets as negative by turning
the sym_off field unsigned.
Emil Tsalapatis [Tue, 16 Dec 2025 17:33:22 +0000 (12:33 -0500)]
bpf/verifier: Do not limit maximum direct offset into arena map
The verifier currently limits direct offsets into a map to 512MiB
to avoid overflow during pointer arithmetic. However, this prevents
arena maps from using direct addressing instructions to access data
at the end of > 512MiB arena maps. This is necessary when moving
arena globals to the end of the arena instead of the front.
Refactor the verifier code to remove the offset calculation during
direct value access calculations. This is possible because the only
two map types that implement .map_direct_value_addr() are arrays and
arenas, and they both do their own internal checks to ensure the
offset is within bounds.
Adjust selftests that expect the old error. These tests still fail
because the verifier identifies the access as out of bounds for the
map, so change them to expect an "invalid access to map value pointer"
error instead.
Emil Tsalapatis [Tue, 16 Dec 2025 17:33:21 +0000 (12:33 -0500)]
selftests/bpf: Explicitly account for globals in verifier_arena_large
The big_alloc1 test in verifier_arena_large assumes that the arena base
and the first page allocated by bpf_arena_alloc_pages are identical.
This is not the case, because the first page in the arena is populated
by global arena data. The test still passes because the code makes the
tacit assumption that the first page is on offset PAGE_SIZE instead of
0.
Make this distinction explicit in the code, and adjust the page offsets
requested during the test to count from the beginning of the arena
instead of using the address of the first allocated page.
Linus Torvalds [Tue, 16 Dec 2025 07:44:36 +0000 (19:44 +1200)]
Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull shmem rename fixes from Al Viro:
"A couple of shmem rename fixes - recent regression from tree-in-dcache
series and older breakage from stable directory offsets stuff"
* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
shmem: fix recovery on rename failures
shmem_whiteout(): fix regression from tree-in-dcache series
Linus Torvalds [Tue, 16 Dec 2025 07:34:09 +0000 (19:34 +1200)]
Merge tag 'v6.19-rc1-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
- Fix set xattr name validation
- Fix session refcount leak
- Minor cleanup
- smbdirect (RDMA) fixes: improve receive completion, and connect
* tag 'v6.19-rc1-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: fix buffer validation by including null terminator size in EA length
ksmbd: Fix refcount leak when invalid session is found on session lookup
ksmbd: remove redundant DACL check in smb_check_perm_dacl
ksmbd: convert comma to semicolon
smb: server: defer the initial recv completion logic to smb_direct_negotiate_recv_work()
smb: server: initialize recv_io->cqe.done = recv_done just once
smb: smbdirect: introduce smbdirect_socket.connect.{lock,work}
Linus Torvalds [Tue, 16 Dec 2025 07:28:20 +0000 (19:28 +1200)]
Merge tag 'for-6.19-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fix missing btrfs_path release after printing a relocation error
message
- fix extent changeset leak on mmap write after failure to reserve
metadata
- fix fs devices list structure freeing, it could be potentially leaked
under some circumstances
- tree log fixes:
- fix incremental directory logging where inodes for new dentries
were incorrectly skipped
- don't log conflicting inode if it's a directory moved in the
current transaction
- regression fixes:
- fix incorrect btrfs_path freeing when it's auto-cleaned
- revert commit simplifying preallocation of temporary structures
in qgroup functions, some cases were not handled properly
* tag 'for-6.19-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix changeset leak on mmap write after failure to reserve metadata
btrfs: fix memory leak of fs_devices in degraded seed device path
btrfs: fix a potential path leak in print_data_reloc_error()
Revert "btrfs: add ASSERTs on prealloc in qgroup functions"
btrfs: do not skip logging new dentries when logging a new name
btrfs: don't log conflicting inode if it's a dir moved in the current transaction
btrfs: tests: fix double btrfs_path free in remove_extent_ref()
Linus Torvalds [Tue, 16 Dec 2025 07:24:35 +0000 (19:24 +1200)]
Merge tag 'sched_ext-for-6.19-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext fixes from Tejun Heo:
- Fix memory leak when destroying helper kthread workers during
scheduler disable
- Fix bypass depth accounting on scx_enable() failure which could leave
the system permanently in bypass mode
- Fix missing preemption handling when moving tasks to local DSQs via
scx_bpf_dsq_move()
- Misc fixes including NULL check for put_prev_task(), flushing stdout
in selftests, and removing unused code
* tag 'sched_ext-for-6.19-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
sched_ext: Remove unused code in the do_pick_task_scx()
selftests/sched_ext: flush stdout before test to avoid log spam
sched_ext: Fix missing post-enqueue handling in move_local_task_to_local_dsq()
sched_ext: Factor out local_dsq_post_enq() from dispatch_enqueue()
sched_ext: Fix bypass depth leak on scx_enable() failure
sched/ext: Avoid null ptr traversal when ->put_prev_task() is called with NULL next
sched_ext: Fix the memleak for sch->helper objects
Linus Torvalds [Tue, 16 Dec 2025 07:21:17 +0000 (19:21 +1200)]
Merge tag 'cgroup-for-6.19-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fix from Tejun Heo:
- Fix a race condition in css_rstat_updated() where CMPXCHG without
LOCK prefix could cause lnode corruption when the flusher runs
concurrently on another CPU. The issue was introduced in 6.17 and
causes memcg stats to become corrupted in production.
* tag 'cgroup-for-6.19-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: rstat: use LOCK CMPXCHG in css_rstat_updated
Al Viro [Sat, 13 Dec 2025 22:50:23 +0000 (17:50 -0500)]
shmem: fix recovery on rename failures
maple_tree insertions can fail if we are seriously short on memory;
simple_offset_rename() does not recover well if it runs into that.
The same goes for simple_offset_rename_exchange().
Moreover, shmem_whiteout() expects that if it succeeds, the caller will
progress to d_move(), i.e. that shmem_rename2() won't fail past the
successful call of shmem_whiteout().
Not hard to fix, fortunately - mtree_store() can't fail if the index we
are trying to store into is already present in the tree as a singleton.
For simple_offset_rename_exchange() that's enough - we just need to be
careful about the order of operations.
For simple_offset_rename() solution is to preinsert the target into the
tree for new_dir; the rest can be done without any potentially failing
operations.
That preinsertion has to be done in shmem_rename2() rather than in
simple_offset_rename() itself - otherwise we'd need to deal with the
possibility of failure after successful shmem_whiteout().
Fixes: a2e459555c5f ("shmem: stable directory offsets") Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Namjae Jeon [Sun, 14 Dec 2025 06:06:34 +0000 (15:06 +0900)]
ksmbd: fix buffer validation by including null terminator size in EA length
The smb2_set_ea function, which handles Extended Attributes (EA),
was performing buffer validation checks that incorrectly omitted the size
of the null terminating character (+1 byte) for EA Name.
This patch fixes the issue by explicitly adding '+ 1' to EaNameLength where
the null terminator is expected to be present in the buffer, ensuring
the validation accurately reflects the total required buffer size.
Cc: stable@vger.kernel.org Reported-by: Roger <roger.andersen@protonmail.com> Reported-by: Stanislas Polu <spolu@dust.tt> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
Namjae Jeon [Sun, 14 Dec 2025 06:05:56 +0000 (15:05 +0900)]
ksmbd: Fix refcount leak when invalid session is found on session lookup
When a session is found but its state is not SMB2_SESSION_VALID, It
indicates that no valid session was found, but it is missing to decrement
the reference count acquired by the session lookup, which results in
a reference count leak. This patch fixes the issue by explicitly calling
ksmbd_user_session_put to release the reference to the session.
Cc: stable@vger.kernel.org Reported-by: Alexandre <roger.andersen@protonmail.com> Reported-by: Stanislas Polu <spolu@dust.tt> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
Chen Ni [Tue, 18 Nov 2025 01:32:29 +0000 (09:32 +0800)]
ksmbd: convert comma to semicolon
Replace comma between expressions with semicolons.
Using a ',' in place of a ';' can have unintended side effects.
Although that is not the case here, it is seems best to use ';'
unless ',' is intended.
Found by inspection.
No functional change intended.
Compile tested only.
Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
smb: server: defer the initial recv completion logic to smb_direct_negotiate_recv_work()
The previous change to relax WARN_ON_ONCE(SMBDIRECT_SOCKET_*) checks in
recv_done() and smb_direct_cm_handler() seems to work around the
problem that the order of initial recv completion and
RDMA_CM_EVENT_ESTABLISHED is random, but it's still
a bit ugly.
This implements a better solution deferring the recv completion
processing to smb_direct_negotiate_recv_work(), which is queued
only if both events arrived.
In order to avoid more basic changes to the main recv_done
callback, I introduced a smb_direct_negotiate_recv_done,
which is only used for the first pdu, this will allow
further cleanup and simplifications in recv_done
as a future patch.
smb_direct_negotiate_recv_work() is also very basic
with only basic error checking and the transition
from SMBDIRECT_SOCKET_NEGOTIATE_NEEDED to
SMBDIRECT_SOCKET_NEGOTIATE_RUNNING, which allows
smb_direct_prepare() to continue as before.
Cc: Tom Talpey <tom@talpey.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
smb: server: initialize recv_io->cqe.done = recv_done just once
smbdirect_recv_io structures are pre-allocated so we can set the
callback function just once.
This will make it easy to move smb_direct_post_recv to common code
soon.
Cc: Tom Talpey <tom@talpey.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
This will first be used by the server in order to defer
the processing of the initial recv of the negotiation
request.
But in future it will also be used by the client in order
to implement an async connect.
Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
Jens Remus [Thu, 11 Dec 2025 11:24:50 +0000 (12:24 +0100)]
s390/stacktrace: Do not fallback to RA register
The logic to fallback to the return address (RA) register value in
the topmost frame when stack tracing using back chain is broken in
multiple ways:
When assuming the RA register 14 has not been saved yet one must assume
that a new user stack frame has not been allocated either. Therefore
the back chain would not contain the stack pointer (SP) at entry, but
the caller's SP at its entry instead.
Therefore when falling back to the RA register 14 value it would also be
necessary to fallback to the SP register 15 value. Otherwise an invalid
combination of RA register 14 and caller's SP at its entry (from the
back chain) is used.
In the topmost frame the back chain contains either the caller's SP at
its entry (before having allocated a new stack frame in the prologue),
the SP at entry (after having allocated a new stack frame), or an
uninitialized value (during static/dynamic stack allocation). In both
cases where the back chain is valid either the caller or prologue must
have saved its respective RA to the respective frame. Therefore, if the
RA obtained from the frame pointed to by the back chain is invalid, this
does not indicate that the IP in the topmost frame is still early in the
prologue and the RA has not been saved.
Benjamin Block [Fri, 5 Dec 2025 15:47:17 +0000 (16:47 +0100)]
s390/pci: Fix cyclic dead-lock in zpci_zdev_put() and zpci_scan_devices()
When triggering PCI device recovery by writing into the SysFS attribute
`recover` of a Physical Function with existing child SR-IOV Virtual
Functions, lockdep is reporting a possible deadlock between three
threads:
In zpci_bus_scan_busses() the `zbus_list_lock` is taken for the whole
duration of the function, which also includes taking
`pci_rescan_remove_lock`, among other things. But `zbus_list_lock` only
really needs to protect the modification of the global registration
`zbus_list`, it can be dropped while the functions within the list
iteration run; this way we break the cycle above.
Break up zpci_bus_scan_busses() into an "iterator" zpci_bus_get_next()
that iterates over `zbus_list` element by element, and acquires and
releases `zbus_list_lock` as necessary, but never keep holding it.
References to `zpci_bus` objects are also acquired and released.
The reference counting on `zpci_bus` objects is also changed so that all
put() and get() operations are done under the protection of
`zbus_list_lock`, and if the operation results in a modification of
`zpci_bus_list`, this modification is done in the same critical section
(apart the very first initialization). This way objects are never seen
on the list that are about to be released and/or half-initialized.
Fixes: 14c87ba8123a ("s390/pci: separate zbus registration from scanning") Suggested-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Sven Schnelle [Fri, 5 Dec 2025 09:58:57 +0000 (10:58 +0100)]
s390/ipl: Clear SBP flag when bootprog is set
With z16 a new flag 'search boot program' was introduced for
list-directed IPL (SCSI, NVMe, ECKD DASD). If this flag is set,
e.g. via selecting the "Automatic" value for the "Boot program
selector" control on an HMC load panel, it is copied to the reipl
structure from the initial ipl structure. When a user now sets a
boot prog via sysfs, the flag is not cleared and the bootloader
will again automatically select the boot program, ignoring user
configuration.
To avoid that, clear the SBP flag when a bootprog sysfs file is
written.
Cc: stable@vger.kernel.org Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Linus Torvalds [Sun, 14 Dec 2025 03:35:35 +0000 (15:35 +1200)]
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"The only core fix is in doc; all the others are in drivers, with the
biggest impacts in libsas being the rollback on error handling and in
ufs coming from a couple of error handling fixes, one causing a crash
if it's activated before scanning and the other fixing W-LUN
resumption"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: qcom: Fix confusing cleanup.h syntax
scsi: libsas: Add rollback handling when an error occurs
scsi: device_handler: Return error pointer in scsi_dh_attached_handler_name()
scsi: ufs: core: Fix a deadlock in the frequency scaling code
scsi: ufs: core: Fix an error handler crash
scsi: Revert "scsi: libsas: Fix exp-attached device scan after probe failure scanned in again after probe failed"
scsi: ufs: core: Fix RPMB link error by reversing Kconfig dependencies
scsi: qla4xxx: Use time conversion macros
scsi: qla2xxx: Enable/disable IRQD_NO_BALANCING during reset
scsi: ipr: Enable/disable IRQD_NO_BALANCING during reset
scsi: imm: Fix use-after-free bug caused by unfinished delayed work
scsi: target: sbp: Remove KMSG_COMPONENT macro
scsi: core: Correct documentation for scsi_device_quiesce()
scsi: mpi3mr: Prevent duplicate SAS/SATA device entries in channel 1
scsi: target: Reset t_task_cdb pointer in error case
scsi: ufs: core: Fix EH failure after W-LUN resume error
Linus Torvalds [Sun, 14 Dec 2025 03:24:10 +0000 (15:24 +1200)]
Merge tag 'ceph-for-6.19-rc1' of https://github.com/ceph/ceph-client
Pull ceph updates from Ilya Dryomov:
"We have a patch that adds an initial set of tracepoints to the MDS
client from Max, a fix that hardens osdmap parsing code from myself
(marked for stable) and a few assorted fixups"
* tag 'ceph-for-6.19-rc1' of https://github.com/ceph/ceph-client:
rbd: stop selecting CRC32, CRYPTO, and CRYPTO_AES
ceph: stop selecting CRC32, CRYPTO, and CRYPTO_AES
libceph: make decode_pool() more resilient against corrupted osdmaps
libceph: Amend checking to fix `make W=1` build breakage
ceph: Amend checking to fix `make W=1` build breakage
ceph: add trace points to the MDS client
libceph: fix log output race condition in OSD client
T.J. Mercier [Sun, 7 Dec 2025 09:10:04 +0000 (01:10 -0800)]
bpf: Fix bpf_seq_read docs for increased buffer size
Commit af65320948b8 ("bpf: Bump iter seq size to support BTF
representation of large data structures") increased the fixed buffer
size from PAGE_SIZE to PAGE_SIZE << 3, but the docs for the function
didn't get updated at the same time. Update them.
Linus Torvalds [Sat, 13 Dec 2025 18:12:46 +0000 (06:12 +1200)]
Merge tag 'smp-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull CPU hotplug fix from Ingo Molnar:
- Fix CPU hotplug callbacks to disable interrupts on UP kernels
* tag 'smp-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
cpu: Make atomic hotplug callbacks run with interrupts disabled on UP
Linus Torvalds [Sat, 13 Dec 2025 18:10:35 +0000 (06:10 +1200)]
Merge tag 'perf-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf event fixes from Ingo Molnar:
- Fix NULL pointer dereference crash in the Intel PMU driver
- Fix missing read event generation on task exit
- Fix AMD uncore driver init error handling
- Fix whitespace noise
* tag 'perf-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Fix NULL event dereference crash in handle_pmi_common()
perf/core: Fix missing read event generation on task exit
perf/x86/amd/uncore: Fix the return value of amd_uncore_df_event_init() on error
perf/uprobes: Remove <space><Tab> whitespace noise
Linus Torvalds [Sat, 13 Dec 2025 18:07:09 +0000 (06:07 +1200)]
Merge tag 'irq-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Ingo Molnar:
- Fix error code in the irqchip/mchp-eic driver
- Fix setup_percpu_irq() affinity assumptions
- Remove the unused irq_domain_add_tree() function
* tag 'irq-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/mchp-eic: Fix error code in mchp_eic_domain_alloc()
irqdomain: Delete irq_domain_add_tree()
genirq: Allow NULL affinity for setup_percpu_irq()
Linus Torvalds [Sat, 13 Dec 2025 18:04:16 +0000 (06:04 +1200)]
Merge tag 'core-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc core fixes from Ingo Molnar:
- Improve bug reporting
- Suppress W=1 format warning
- Improve rseq scalability on Clang builds
* tag 'core-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rseq: Always inline rseq_debug_syscall_return()
bug: Hush suggest-attribute=format for __warn_printf()
bug: Let report_bug_entry() provide the correct bugaddr
Linus Torvalds [Sat, 13 Dec 2025 08:55:12 +0000 (20:55 +1200)]
Merge tag 'mm-nonmm-stable-2025-12-11-11-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc updates from Andrew Morton:
"There are no significant series in this small merge. Please see the
individual changelogs for details"
[ Editor's note: it's mainly ocfs2 and a couple of random fixes ]
* tag 'mm-nonmm-stable-2025-12-11-11-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm: memfd_luo: add CONFIG_SHMEM dependency
mm: shmem: avoid build warning for CONFIG_SHMEM=n
ocfs2: fix memory leak in ocfs2_merge_rec_left()
ocfs2: invalidate inode if i_mode is zero after block read
ocfs2: avoid -Wflex-array-member-not-at-end warning
ocfs2: convert remaining read-only checks to ocfs2_emergency_state
ocfs2: add ocfs2_emergency_state helper and apply to setattr
checkpatch: add uninitialized pointer with __free attribute check
args: fix documentation to reflect the correct numbers
ocfs2: fix kernel BUG in ocfs2_find_victim_chain
liveupdate: luo_core: fix redundant bound check in luo_ioctl()
ocfs2: validate inline xattr size and entry count in ocfs2_xattr_ibody_list
fs/fat: remove unnecessary wrapper fat_max_cache()
ocfs2: replace deprecated strcpy with strscpy
ocfs2: check tl_used after reading it from trancate log inode
liveupdate: luo_file: don't use invalid list iterator
Brown paper bag time. This is a silly oversight where I missed to drop
the error condition checking to ensure we clean up on early error
returns. I have an internal unit testset coming up for this which will
catch all such issues going forward.
Reported-by: Chris Mason <clm@fb.com> Reported-by: Jeff Layton <jlayton@kernel.org> Fixes: 011703a9acd7 ("file: add FD_{ADD,PREPARE}()") Signed-off-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 13 Dec 2025 07:57:41 +0000 (19:57 +1200)]
x86/hv: Add gitignore entry for generated header file
Commit 7bfe3b8ea6e3 ("Drivers: hv: Introduce mshv_vtl driver") added a
new generated header file for the offsets into the mshv_vtl_cpu_context
structure to be used by the low-level assembly code. But it didn't add
the .gitignore file to go with it, so 'git status' and friends will
mention it.
Let's add the gitignore file before somebody thinks that generated
header should be committed.
Linus Torvalds [Sat, 13 Dec 2025 05:39:28 +0000 (17:39 +1200)]
Merge tag 'drm-fixes-2025-12-13' of https://gitlab.freedesktop.org/drm/kernel
Pull more drm fixes from Dave Airlie:
"These are the enqueued fixes that ended up in our fixes branch,
nouveau mostly, along with some small fixes in other places.
plane:
- Handle IS_ERR vs NULL in drm_plane_create_hotspot_properties()
ttm:
- fix devcoredump for evicted bos
panel:
- Fix stack usage warning in novatek-nt35560
nouveau:
- alloc fwsec sb at boot to avoid s/r problems
- fix strcpy usage
- fix i2c encoder crash
bridge:
- Ignore spurious PLL_UNLOCK bit in ti-sn65dsi83
mgag200:
- Fix bigendian handling in mgag200
tilcdc:
- Fix probe failure in tilcdc"
* tag 'drm-fixes-2025-12-13' of https://gitlab.freedesktop.org/drm/kernel:
drm/mgag200: Fix big-endian support
drm/tilcdc: Fix removal actions in case of failed probe
drm/ttm: Avoid NULL pointer deref for evicted BOs
drm: nouveau: Replace sprintf() with sysfs_emit()
drm/nouveau: fix circular dep oops from vendored i2c encoder
drm/nouveau: refactor deprecated strcpy
drm/plane: Fix IS_ERR() vs NULL check in drm_plane_create_hotspot_properties()
drm/bridge: ti-sn65dsi83: ignore PLL_UNLOCK errors
drm/nouveau/gsp: Allocate fwsec-sb at boot
drm/panel: novatek-nt35560: avoid on-stack device structure
i915:
- Fix format string truncation warning
- FIx runtime PM reference during fbdev BO creation
panthor:
- fix UAF
renesas:
- fix sync flag handling"
* tag 'drm-next-2025-12-13' of https://gitlab.freedesktop.org/drm/kernel:
Revert "drm/amd/display: Fix pbn to kbps Conversion"
drm/amd: Fix unbind/rebind for VCN 4.0.5
drm/i915: Fix format string truncation warning
drm/i915/fbdev: Hold runtime PM ref during fbdev BO creation
drm/amd/display: Improve HDMI info retrieval
drm/amdkfd: bump minimum vgpr size for gfx1151
drm/amd/display: shrink struct members
drm/amdkfd: Export the cwsr_size and ctl_stack_size to userspace
drm/amd/display: Refactor dml_core_mode_support to reduce stack frame
drm/amdgpu: don't attach the tlb fence for SI
drm/amd/display: Use GFP_ATOMIC in dc_create_plane_state()
drm/amdkfd: Trap handler support for expert scheduling mode
drm/amdkfd: Use huge page size to check split svm range alignment
drm/rcar-du: dsi: Handle both DRM_MODE_FLAG_N.SYNC and !DRM_MODE_FLAG_P.SYNC
drm/gem-shmem: revert the 8-byte alignment constraint
drm/gem-dma: revert the 8-byte alignment constraint
drm/panthor: Prevent potential UAF in group creation
Linus Torvalds [Sat, 13 Dec 2025 05:15:16 +0000 (17:15 +1200)]
Merge tag 'i3c/for-6.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux
Pull further i3c update from Alexandre Belloni:
"We are removing a legacy API callback and having this sooner rather
than later will help ensuring no one introduces a new driver using it.
I've also added patches removing the "__free(...) = NULL" pattern
because I'm sure we won't avoid people sending those following the
mailing list discussion..."
* tag 'i3c/for-6.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux:
i3c: adi: Fix confusing cleanup.h syntax
i3c: master: Fix confusing cleanup.h syntax
i3c: master: cleanup callback .priv_xfers()
i3c: master: switch to use new callback .i3c_xfers() from .priv_xfers()
Linus Torvalds [Sat, 13 Dec 2025 05:09:06 +0000 (17:09 +1200)]
Merge tag 'rtc-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
"Subsystem:
- stop setting max_user_freq from the individual drivers as this has
not been hardware related for a while
New drivers:
- Andes ATCRTC100
- Apple SMC
- Nvidia VRS
Linus Torvalds [Sat, 13 Dec 2025 04:36:57 +0000 (16:36 +1200)]
Merge tag 'gpio-fixes-for-v6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio updates from Bartosz Golaszewski:
- fix spinlock op type after conversion to lock guards
- fix a memory leak in error path in gpio-regmap
- Kconfig fixes in GPIO drivers
- add a GPIO ACPI quirk for Dell Precision 7780
- set of fixes for shared GPIO management
* tag 'gpio-fixes-for-v6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: shared: make locking more fine-grained
gpio: shared: fix auxiliary device cleanup order
gpio: shared: check if a reference is populated before cleaning its resources
gpio: shared: fix NULL-pointer dereference in teardown path
gpio: shared: ignore disabled nodes when traversing the device-tree
gpiolib: acpi: Add quirk for Dell Precision 7780
gpio: tb10x: fix OF_GPIO dependency
gpio: qixis: select CONFIG_REGMAP_MMIO
gpio: regmap: Fix memleak in error path in gpio_regmap_register()
gpio: mmio: fix bad guard conversion
Linus Torvalds [Sat, 13 Dec 2025 04:09:10 +0000 (16:09 +1200)]
Merge tag 'sound-fix-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"The only slightly large change is the enablement of CIX HD-audio
controller, which took a bit time to be cooked up, while most of other
changes are device-specific small trivial fixes:
- Default disablement of the kconfig for decades old pre-release
alsa-lib PCM API; it's only the default config value change, so it
can't lead to any regressions for the existing setups
- Support for CIX HD-audio controller
- A few ASoC ACP fixes
- Fixes for ASoC cirrus, bcm, wcd, qcom, ak platforms
- Trivial hardening for FireWire and USB-audio
- HD-audio Intel binding fix and quirks"
* tag 'sound-fix-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (30 commits)
ALSA: hda/tas2781: Add new quirk for HP new project
ALSA: hda: cix-ipbloq: Use modern PM ops
ALSA: hda: intel-dsp-config: Prefer legacy driver as fallback
ASoC: amd: acp: update tdm channels for specific DAI
ASoC: cs35l56: Fix incorrect select SND_SOC_CS35L56_CAL_SYSFS_COMMON
ALSA: firewire-motu: add bounds check in put_user loop for DSP events
ASoC: cs35l41: Always return 0 when a subsystem ID is found
ALSA: uapi: Fix typo in asound.h comment
ALSA: Do not build obsolete API
ALSA: hda: add CIX IPBLOQ HDA controller support
ALSA: hda/core: add addr_offset field for bus address translation
ALSA: hda: dt-bindings: add CIX IPBLOQ HDA controller support
ALSA: hda/realtek: Add support for ASUS UM3406GA
ALSA: hda/realtek: Add support for HP Turbine Laptops
ALSA: usb-audio: Initialize status1 to fix uninitialized symbol errors
ALSA: firewire-motu: fix buffer overflow in hwdep read for DSP events
ALSA: hda: cs35l41: Fix NULL pointer dereference in cs35l41_hda_read_acpi()
ASoC: cros_ec_codec: Remove unnecessary selection of CRYPTO
ASoc: qcom: q6afe: fix bad guard conversion
ASoC: rockchip: Fix Wvoid-pointer-to-enum-cast warning (again)
...
Dave Airlie [Sat, 13 Dec 2025 00:54:28 +0000 (10:54 +1000)]
Merge tag 'drm-misc-fixes-2025-12-10' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
drm-misc-fixes for v6.19-rc1:
- Fix stack usage warning in novatek-nt35560.
- Fix s/r, i2c issues in nouveau and update string handling.
- Ignore spurious PLL_UNLOCK bit in ti-sn65dsi83.
- Handle IS_ERR vs NULL in drm_plane_create_hotspot_properties().
- Fix devcoredump crash on reading evicted bo's.
- Fix bigendian handling in mgag200.
- Fix probe failure in tilcdc.
Initializing automatic __free variables to NULL without need (e.g.
branches with different allocations), followed by actual allocation is
in contrary to explicit coding rules guiding cleanup.h:
"Given that the "__free(...) = NULL" pattern for variables defined at
the top of the function poses this potential interdependency problem the
recommendation is to always define and assign variables in one statement
and not group variable definitions at the top of the function when
__free() is used."
Code does not have a bug, but is less readable and uses discouraged
coding practice, so fix that by moving declaration to the place of
assignment.
Initializing automatic __free variables to NULL without need (e.g.
branches with different allocations), followed by actual allocation is
in contrary to explicit coding rules guiding cleanup.h:
"Given that the "__free(...) = NULL" pattern for variables defined at
the top of the function poses this potential interdependency problem the
recommendation is to always define and assign variables in one statement
and not group variable definitions at the top of the function when
__free() is used."
Code does not have a bug, but is less readable and uses discouraged
coding practice, so fix that by moving declaration to the place of
assignment.
Not that other existing usage of __free() in this context is a corret
exception initialized to NULL, because the actual allocation is branched
in if().
Emil Tsalapatis [Fri, 5 Dec 2025 21:23:14 +0000 (13:23 -0800)]
selftests/sched_ext: flush stdout before test to avoid log spam
The sched_ext selftests runner runs each test in the same process,
with each test possibly forking multiple times. When the main runner
has not flushed its stdout, the children inherit the buffered output
for previous tests and emit it during exit. This causes log spam.
Make sure stdout/stderr is fully flushed before each test.
Linus Torvalds [Fri, 12 Dec 2025 17:44:03 +0000 (05:44 +1200)]
Merge tag 'loongarch-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch updates from Huacai Chen:
- Add basic LoongArch32 support
Note: Build infrastructures of LoongArch32 are not enabled yet,
because we need to adjust irqchip drivers and wait for GNU toolchain
be upstream first.
- Select HAVE_ARCH_BITREVERSE in Kconfig
- Fix build and boot for CONFIG_RANDSTRUCT
- Correct the calculation logic of thread_count
- Some bug fixes and other small changes
* tag 'loongarch-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: (22 commits)
LoongArch: Adjust default config files for 32BIT/64BIT
LoongArch: Adjust VDSO/VSYSCALL for 32BIT/64BIT
LoongArch: Adjust misc routines for 32BIT/64BIT
LoongArch: Adjust user accessors for 32BIT/64BIT
LoongArch: Adjust system call for 32BIT/64BIT
LoongArch: Adjust module loader for 32BIT/64BIT
LoongArch: Adjust time routines for 32BIT/64BIT
LoongArch: Adjust process management for 32BIT/64BIT
LoongArch: Adjust memory management for 32BIT/64BIT
LoongArch: Adjust boot & setup for 32BIT/64BIT
LoongArch: Adjust common macro definitions for 32BIT/64BIT
LoongArch: Add adaptive CSR accessors for 32BIT/64BIT
LoongArch: Add atomic operations for 32BIT/64BIT
LoongArch: Add new PCI ID for pci_fixup_vgadev()
LoongArch: Add and use some macros for AVEC
LoongArch: Correct the calculation logic of thread_count
LoongArch: Use unsigned long for _end and _text
LoongArch: Use __pmd()/__pte() for swap entry conversions
LoongArch: Fix arch_dup_task_struct() for CONFIG_RANDSTRUCT
LoongArch: Fix build errors for CONFIG_RANDSTRUCT
...
Tejun Heo [Fri, 12 Dec 2025 01:45:04 +0000 (15:45 -1000)]
sched_ext: Fix missing post-enqueue handling in move_local_task_to_local_dsq()
move_local_task_to_local_dsq() is used when moving a task from a non-local
DSQ to a local DSQ on the same CPU. It directly manipulates the local DSQ
without going through dispatch_enqueue() and was missing the post-enqueue
handling that triggers preemption when SCX_ENQ_PREEMPT is set or the idle
task is running.
The function is used by move_task_between_dsqs() which backs
scx_bpf_dsq_move() and may be called while the CPU is busy.
Add local_dsq_post_enq() call to move_local_task_to_local_dsq(). As the
dispatch path doesn't need post-enqueue handling, add SCX_RQ_IN_BALANCE
early exit to keep consume_dispatch_q() behavior unchanged and avoid
triggering unnecessary resched when scx_bpf_dsq_move() is used from the
dispatch path.
Tejun Heo [Fri, 12 Dec 2025 01:45:03 +0000 (15:45 -1000)]
sched_ext: Factor out local_dsq_post_enq() from dispatch_enqueue()
Factor out local_dsq_post_enq() which performs post-enqueue handling for
local DSQs - triggering resched_curr() if SCX_ENQ_PREEMPT is specified or if
the current CPU is idle. No functional change.
This will be used by the next patch to fix move_local_task_to_local_dsq().
Cc: stable@vger.kernel.org # v6.12+ Reviewed-by: Andrea Righi <arighi@nvidia.com> Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Signed-off-by: Tejun Heo <tj@kernel.org>
Filipe Manana [Thu, 11 Dec 2025 11:51:19 +0000 (11:51 +0000)]
btrfs: fix changeset leak on mmap write after failure to reserve metadata
If the call to btrfs_delalloc_reserve_metadata() fails we jump to the
'out_noreserve' label and there we never free the extent_changeset
allocated by the previous call to btrfs_check_data_free_space() (if
qgroups are enabled). Fix this by calling extent_changeset_free() under
the 'out_noreserve' label.
Fixes: 6599716de2d6 ("btrfs: fix -ENOSPC mmap write failure on NOCOW files/extents") Reported-by: syzbot+2f8aa76e6acc9fce6638@syzkaller.appspotmail.com Link: https://lore.kernel.org/linux-btrfs/693a635a.a70a0220.33cd7b.0029.GAE@google.com/ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
btrfs: fix memory leak of fs_devices in degraded seed device path
In open_seed_devices(), when find_fsid() fails and we're in DEGRADED
mode, a new fs_devices is allocated via alloc_fs_devices() but is never
added to the seed_list before returning. This contrasts with the normal
path where fs_devices is properly added via list_add().
If any error occurs later in read_one_dev() or btrfs_read_chunk_tree(),
the cleanup code iterates seed_list to free seed devices, but this
orphaned fs_devices is never found and never freed, causing a memory
leak. Any devices allocated via add_missing_dev() and attached to this
fs_devices are also leaked.
Fix this by adding the newly allocated fs_devices to seed_list in the
degraded path, consistent with the normal path.
Fixes: 5f37583569442 ("Btrfs: move the missing device to its own fs device list") Reported-by: syzbot+eadd98df8bceb15d7fed@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=eadd98df8bceb15d7fed Tested-by: syzbot+eadd98df8bceb15d7fed@syzkaller.appspotmail.com Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
- Fix chacha-riscv64-zvkb.S to not use frame pointer for data"
* tag 'libcrypto-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
crypto: arm64/ghash - Fix incorrect output from ghash-neon
crypto/arm64: sm4/xts - Merge ksimd scopes to reduce stack bloat
crypto/arm64: aes/xts - Use single ksimd scope to reduce stack bloat
lib/crypto: blake2s: Replace manual unrolling with unrolled_full
lib/crypto: blake2b: Roll up BLAKE2b round loop on 32-bit
lib/crypto: riscv: Depend on RISCV_EFFICIENT_VECTOR_UNALIGNED_ACCESS
lib/crypto: riscv/chacha: Avoid s0/fp register
Linus Torvalds [Fri, 12 Dec 2025 10:04:18 +0000 (22:04 +1200)]
Merge tag 'block-6.19-20251211' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block fixes from Jens Axboe:
- Always initialize DMA state, fixing a potentially nasty issue on the
block side
- btrfs zoned write fix with cached zone reports
- Fix corruption issues in bcache with chained bio's, and further make
it clear that the chained IO handler is simply a marker, it's not
code meant to be executed
- Kill old code dealing with synchronous IO polling in the block layer,
that has been dead for a long time. Only async polling is supported
these days
- Fix a lockdep issue in tag_set management, moving it to RCU
- Fix an issue with ublks bio_vec iteration
- Don't unconditionally enforce blocking issue of ublk control
commands, allow some of them with non-blocking issue as they
do not block
* tag 'block-6.19-20251211' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
blk-mq-dma: always initialize dma state
blk-mq: delete task running check in blk_hctx_poll()
block: fix cached zone reports on devices with native zone append
block: Use RCU in blk_mq_[un]quiesce_tagset() instead of set->tag_list_lock
ublk: don't mutate struct bio_vec in iteration
block: prohibit calls to bio_chain_endio
bcache: fix improper use of bi_end_io
ublk: allow non-blocking ctrl cmds in IO_URING_F_NONBLOCK issue
Linus Torvalds [Fri, 12 Dec 2025 10:01:32 +0000 (22:01 +1200)]
Merge tag 'io_uring-6.19-20251211' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull io_uring fix from Jens Axboe:
"Single fix for io_uring headed to stable, fixing an issue introduced
with the min_wait support earlier this year, where SQPOLL didn't get
correctly woken if an event arrived once the event waiting has
finished the min_wait portion.
As we already have regression tests for this added and people
reporting new failures there, let's get this one flushed out
so it can bubble back down to stable as well"
* tag 'io_uring-6.19-20251211' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
io_uring: fix min_wait wakeups for SQPOLL
Linus Torvalds [Fri, 12 Dec 2025 09:59:19 +0000 (21:59 +1200)]
Merge tag 'v6.19-rc-smb3-server-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
- minor cleanup
- minor update to comment to avoid confusion about fs type
* tag 'v6.19-rc-smb3-server-fixes' of git://git.samba.org/ksmbd:
smb/server: add comment to FileSystemName of FileFsAttributeInformation
smb/server: remove unused nterr.h
smb/server: rename include guard in smb_common.h
Linus Torvalds [Fri, 12 Dec 2025 09:52:42 +0000 (21:52 +1200)]
Merge tag 'nfs-for-6.19-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
"Bugfixes:
- Fix 'nlink' attribute update races when unlinking a file
- Add missing initialisers for the directory verifier in various
places
- Don't regress the NFSv4 open state due to misordered racing replies
- Ensure the NFSv4.x callback server uses the correct transport
connection
- Fix potential use-after-free races when shutting down the NFSv4.x
callback server
- Fix a pNFS layout commit crash
- Assorted fixes to ensure correct propagation of mount options when
the client crosses a filesystem boundary and triggers the VFS
automount code
- More localio fixes
Features and cleanups:
- Add initial support for basic directory delegations
- SunRPC back channel code cleanups"
* tag 'nfs-for-6.19-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (24 commits)
NFSv4: Handle NFS4ERR_NOTSUPP errors for directory delegations
nfs/localio: remove 61 byte hole from needless ____cacheline_aligned
nfs/localio: remove alignment size checking in nfs_is_local_dio_possible
NFS: Fix up the automount fs_context to use the correct cred
NFS: Fix inheritance of the block sizes when automounting
NFS: Automounted filesystems should inherit ro,noexec,nodev,sync flags
Revert "nfs: ignore SB_RDONLY when mounting nfs"
Revert "nfs: clear SB_RDONLY before getting superblock"
Revert "nfs: ignore SB_RDONLY when remounting nfs"
NFS: Add a module option to disable directory delegations
NFS: Shortcut lookup revalidations if we have a directory delegation
NFS: Request a directory delegation during RENAME
NFS: Request a directory delegation on ACCESS, CREATE, and UNLINK
NFS: Add support for sending GDD_GETATTR
NFSv4/pNFS: Clear NFS_INO_LAYOUTCOMMIT in pnfs_mark_layout_stateid_invalid
NFSv4.1: protect destroying and nullifying bc_serv structure
SUNRPC: new helper function for stopping backchannel server
SUNRPC: cleanup common code in backchannel request
NFSv4.1: pass transport for callback shutdown
NFSv4: ensure the open stateid seqid doesn't go backwards
...
Brendan Jackman [Sun, 7 Dec 2025 03:53:18 +0000 (03:53 +0000)]
bug: Hush suggest-attribute=format for __warn_printf()
Recent additions to this function cause GCC 14.3.0 to get excited
(W=1) and suggest a missing attribute:
lib/bug.c: In function '__warn_printf':
lib/bug.c:187:25: error: function '__warn_printf' be a candidate for 'gnu_printf' format attribute [-Werror=suggest-attribute=format]
187 | vprintk(fmt, *args);
| ^~~~~~~
Disable the diagnostic locally, following the pattern used for stuff
like va_format().
Heiko Carstens [Mon, 8 Dec 2025 20:06:58 +0000 (21:06 +0100)]
bug: Let report_bug_entry() provide the correct bugaddr
report_bug_entry() always provides zero for bugaddr but could easily
extract the correct address from the provided bug_entry. Just do that to
have proper warning messages.
E.g. adding an artificial:
void foo(void) { WARN_ONCE(1, "bar"); }
function generates this warning message:
WARNING: arch/s390/kernel/setup.c:1017 at 0x0, CPU#0: swapper/0/0
^^^
With the correct bug address this changes to:
WARNING: arch/s390/kernel/setup.c:1017 at foo+0x1c/0x40, CPU#0: swapper/0/0
^^^^^^^^^^^^^
Evan Li [Fri, 12 Dec 2025 08:49:43 +0000 (16:49 +0800)]
perf/x86/intel: Fix NULL event dereference crash in handle_pmi_common()
handle_pmi_common() may observe an active bit set in cpuc->active_mask
while the corresponding cpuc->events[] entry has already been cleared,
which leads to a NULL pointer dereference.
This can happen when interrupt throttling stops all events in a group
while PEBS processing is still in progress. perf_event_overflow() can
trigger perf_event_throttle_group(), which stops the group and clears
the cpuc->events[] entry, but the active bit may still be set when
handle_pmi_common() iterates over the events.
The following recent fix:
7e772a93eb61 ("perf/x86: Fix NULL event access and potential PEBS record loss")
moved the cpuc->events[] clearing from x86_pmu_stop() to x86_pmu_del() and
relied on cpuc->active_mask/pebs_enabled checks. However,
handle_pmi_common() can still encounter a NULL cpuc->events[] entry
despite the active bit being set.
Add an explicit NULL check on the event pointer before using it,
to cover this legitimate scenario and avoid the NULL dereference crash.
Fixes: 7e772a93eb61 ("perf/x86: Fix NULL event access and potential PEBS record loss") Reported-by: kitta <kitta@linux.alibaba.com> Co-developed-by: kitta <kitta@linux.alibaba.com> Signed-off-by: Evan Li <evan.li@linux.alibaba.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://patch.msgid.link/20251212084943.2124787-1-evan.li@linux.alibaba.com Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220855
Tejun Heo [Tue, 9 Dec 2025 21:04:33 +0000 (11:04 -1000)]
sched_ext: Fix bypass depth leak on scx_enable() failure
scx_enable() calls scx_bypass(true) to initialize in bypass mode and then
scx_bypass(false) on success to exit. If scx_enable() fails during task
initialization - e.g. scx_cgroup_init() or scx_init_task() returns an error -
it jumps to err_disable while bypass is still active. scx_disable_workfn()
then calls scx_bypass(true/false) for its own bypass, leaving the bypass depth
at 1 instead of 0. This causes the system to remain permanently in bypass mode
after a failed scx_enable().
Failures after task initialization is complete - e.g. scx_tryset_enable_state()
at the end - already call scx_bypass(false) before reaching the error path and
are not affected. This only affects a subset of failure modes.
Fix it by tracking whether scx_enable() called scx_bypass(true) in a bool and
having scx_disable_workfn() call an extra scx_bypass(false) to clear it. This
is a temporary measure as the bypass depth will be moved into the sched
instance, which will make this tracking unnecessary.
Fixes: 8c2090c504e9 ("sched_ext: Initialize in bypass mode") Cc: stable@vger.kernel.org # v6.12+ Reported-by: Chris Mason <clm@meta.com> Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Link: https://lore.kernel.org/stable/286e6f7787a81239e1ce2989b52391ce%40kernel.org Signed-off-by: Tejun Heo <tj@kernel.org>
When building without CONFIG_PM_SLEEP, there are several warnings (or
errors with CONFIG_WERROR=y / W=e) from the cix-ipbloq driver:
sound/hda/controllers/cix-ipbloq.c:378:12: error: 'cix_ipbloq_hda_runtime_resume' defined but not used [-Werror=unused-function]
378 | static int cix_ipbloq_hda_runtime_resume(struct device *dev)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
sound/hda/controllers/cix-ipbloq.c:362:12: error: 'cix_ipbloq_hda_runtime_suspend' defined but not used [-Werror=unused-function]
362 | static int cix_ipbloq_hda_runtime_suspend(struct device *dev)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
sound/hda/controllers/cix-ipbloq.c:349:12: error: 'cix_ipbloq_hda_resume' defined but not used [-Werror=unused-function]
349 | static int cix_ipbloq_hda_resume(struct device *dev)
| ^~~~~~~~~~~~~~~~~~~~~
sound/hda/controllers/cix-ipbloq.c:336:12: error: 'cix_ipbloq_hda_suspend' defined but not used [-Werror=unused-function]
336 | static int cix_ipbloq_hda_suspend(struct device *dev)
| ^~~~~~~~~~~~~~~~~~~~~~
When CONFIG_PM and CONFIG_PM_SLEEP are unset, SET_SYSTEM_SLEEP_PM_OPS()
and SET_RUNTIME_PM_OPS() evaluate to nothing, so these functions appear
unused to the compiler in this configuration.
Use the modern SYSTEM_SLEEP_PM_OPS and RUNTIME_PM_OPS macros to resolve
these warnings, which is what they are intended to do. Additionally,
wrap &cix_ipbloq_hda_pm in pm_ptr() to ensure the compiler can drop the
entire structure when CONFIG_PM is unset.
Linus Torvalds [Thu, 11 Dec 2025 03:13:29 +0000 (12:13 +0900)]
Merge tag 'for-6.19/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mikulas Patocka:
- convert crypto_shash users to direct crypto library use with simpler
and faster code and reduced stack usage (Eric Biggers):
- the dm-verity SHA-256 conversion also teaches it to do two-way
interleaved hashing for added performance
- dm-crypt MD5 conversion (used for Loop-AES compatibility)
- added document for for takeover/reshape raid1 -> raid5 examples (Heinz Mauelshagen)
- fix dm-vdo kerneldoc warnings (Matthew Sakai)
- various random fixes and cleanups
* tag 'for-6.19/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (29 commits)
dm pcache: fix segment info indexing
dm pcache: fix cache info indexing
dm-pcache: advance slot index before writing slot
dm raid: add documentation for takeover/reshape raid1 -> raid5 table line examples
dm log-writes: Add missing set_freezable() for freezable kthread
dm-raid: fix possible NULL dereference with undefined raid type
dm-snapshot: fix 'scheduling while atomic' on real-time kernels
dm: ignore discard return value
MAINTAINERS: add Benjamin Marzinski as a device mapper maintainer
dm-mpath: Simplify the setup_scsi_dh code
dm vdo: fix kerneldoc warnings
dm-bufio: align write boundary on physical block size
dm-crypt: enable DM_TARGET_ATOMIC_WRITES
dm: test for REQ_ATOMIC in dm_accept_partial_bio()
dm-verity: remove useless mempool
dm-verity: disable recursive forward error correction
dm-ebs: Mark full buffer dirty even on partial write
dm mpath: enable DM_TARGET_ATOMIC_WRITES
dm verity fec: Expose corrected block count via status
dm: Don't warn if IMA_DISABLE_HTABLE is not enabled
...
Linus Torvalds [Thu, 11 Dec 2025 00:57:08 +0000 (09:57 +0900)]
Merge tag 'spi-fix-v6.19-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A few small fixes for SPI that came in during the merge window,
nothing too exciting here"
* tag 'spi-fix-v6.19-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: microchip-core: Fix an error handling path in mchp_corespi_probe()
spi: cadence-qspi: Fix runtime PM imbalance in probe
Linus Torvalds [Thu, 11 Dec 2025 00:54:59 +0000 (09:54 +0900)]
Merge tag 'regulator-fix-v6.19-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"A few fixes that came in during the merge window, nothing too
exciting - the one core fix improves error propagation from gpiolib
which hopefully shouldn't actually happen but is safer"
* tag 'regulator-fix-v6.19-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: spacemit: Align input supply name with the DT binding
regulator: fixed: Rely on the core freeing the enable GPIO
regulator: check the return value of gpiod_set_value_cansleep()
Arnd Bergmann [Thu, 4 Dec 2025 10:01:58 +0000 (11:01 +0100)]
mm: memfd_luo: add CONFIG_SHMEM dependency
The new memfd code fails to link without SHMEM:
aarch64-linux-ld: mm/memfd_luo.o: in function `memfd_luo_retrieve_folios':
memfd_luo.c:(.text.memfd_luo_retrieve_folios+0xdc): undefined reference to `shmem_add_to_page_cache'
memfd_luo.c:(.text.memfd_luo_retrieve_folios+0x11c): undefined reference to `shmem_inode_acct_blocks'
memfd_luo.c:(.text.memfd_luo_retrieve_folios+0x134): undefined reference to `shmem_recalc_inode'
Add a Kconfig dependency to disallow that configuration.
Arnd Bergmann [Thu, 4 Dec 2025 10:28:59 +0000 (11:28 +0100)]
mm: shmem: avoid build warning for CONFIG_SHMEM=n
The newly added 'flags' variable is unused and causes a warning if
CONFIG_SHMEM is disabled, since the shmem_acct_size() macro it is passed
into does nothing:
mm/shmem.c: In function '__shmem_file_setup':
mm/shmem.c:5816:23: error: unused variable 'flags' [-Werror=unused-variable]
5816 | unsigned long flags = (vm_flags & VM_NORESERVE) ? SHMEM_F_NORESERVE : 0;
| ^~~~~
Replace the two macros with equivalent inline functions to get the
argument checking.
Link: https://lkml.kernel.org/r/20251204102905.1048000-1-arnd@kernel.org Fixes: 6ff1610ced56 ("mm: shmem: use SHMEM_F_* flags instead of VM_* flags") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Christian Brauner <brauner@kernel.org> Cc: guoweikang <guoweikang.kernel@gmail.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kairui Song <kasong@tencent.com> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ocfs2: invalidate inode if i_mode is zero after block read
A panic occurs in ocfs2_unlink due to WARN_ON(inode->i_nlink == 0) when
handling a corrupted inode with i_mode=0 and i_nlink=0 in memory.
This "zombie" inode is created because ocfs2_read_locked_inode proceeds
even after ocfs2_validate_inode_block successfully validates a block that
structurally looks okay (passes checksum, signature etc.) but contains
semantically invalid data (specifically i_mode=0). The current validation
function doesn't check for i_mode being zero.
This results in an in-memory inode with i_mode=0 being added to the VFS
cache, which later triggers the panic during unlink.
Prevent this by adding an explicit check for (i_mode == 0, i_nlink == 0,
non-orphan) within ocfs2_validate_inode_block. If the check is true,
return -EFSCORRUPTED to signal corruption. This causes the caller
(ocfs2_read_locked_inode) to invoke make_bad_inode(), correctly preventing
the zombie inode from entering the cache.
Link: https://lkml.kernel.org/r/20251202224507.53452-2-eraykrdg1@gmail.com Co-developed-by: Albin Babu Varghese <albinbabuvarghese20@gmail.com> Signed-off-by: Albin Babu Varghese <albinbabuvarghese20@gmail.com> Signed-off-by: Ahmet Eray Karadag <eraykrdg1@gmail.com> Reported-by: syzbot+55c40ae8a0e5f3659f2b@syzkaller.appspotmail.com Fixes: https://syzkaller.appspot.com/bug?extid=55c40ae8a0e5f3659f2b Link: https://lore.kernel.org/all/20251022222752.46758-2-eraykrdg1@gmail.com/T/ Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: David Hunter <david.hunter.linux@gmail.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: Heming Zhao <heming.zhao@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>