Sourabh Jain [Tue, 18 Nov 2025 07:10:23 +0000 (12:40 +0530)]
crash: export crashkernel CMA reservation to userspace
Add a sysfs entry /sys/kernel/kexec_crash_cma_ranges to expose all CMA
crashkernel ranges.
This allows userspace tools configuring kdump to determine how much memory
is reserved for crashkernel. If CMA is used, tools can warn users when
attempting to capture user pages with CMA reservation.
The new sysfs hold the CMA ranges in below format:
The reason for not including Crash CMA Ranges in /proc/iomem is to avoid
conflicts. It has been observed that contiguous memory ranges are
sometimes shown as two separate System RAM entries in /proc/iomem. If a
CMA range overlaps two System RAM ranges, adding crashk_res to /proc/iomem
can create a conflict. Reference [1] describes one such instance on the
PowerPC architecture.
Link: https://lkml.kernel.org/r/20251118071023.1673329-1-sourabhjain@linux.ibm.com Link: https://lore.kernel.org/all/20251016142831.144515-1-sourabhjain@linux.ibm.com/ Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Dave Young <dyoung@redhat.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Jiri Bohac <jbohac@suse.cz> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mahesh J Salgaonkar <mahesh@linux.ibm.com> Cc: Pingfan Liu <piliu@redhat.com> Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Cc: Shivang Upadhyay <shivangu@linux.ibm.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Sourabh Jain [Mon, 17 Nov 2025 03:51:53 +0000 (09:21 +0530)]
Documentation/ABI: add kexec and kdump sysfs interface
Add an ABI document for following kexec and kdump sysfs interface:
- /sys/kernel/kexec_loaded
- /sys/kernel/kexec_crash_loaded
- /sys/kernel/kexec_crash_size
- /sys/kernel/crash_elfcorehdr_size
Link: https://lkml.kernel.org/r/20251117035153.1199665-1-sourabhjain@linux.ibm.com Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Baoquan he <bhe@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Jiri Bohac <jbohac@suse.cz> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mahesh J Salgaonkar <mahesh@linux.ibm.com> Cc: Pingfan Liu <piliu@redhat.com> Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Cc: Shivang Upadhyay <shivangu@linux.ibm.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Guan-Chun Wu [Fri, 14 Nov 2025 06:02:40 +0000 (14:02 +0800)]
ceph: replace local base64 helpers with lib/base64
Remove the ceph_base64_encode() and ceph_base64_decode() functions and
replace their usage with the generic base64_encode() and base64_decode()
helpers from lib/base64.
This eliminates the custom implementation in Ceph, reduces code
duplication, and relies on the shared Base64 code in lib. The helpers
preserve RFC 3501-compliant Base64 encoding without padding, so there are
no functional changes.
This change also improves performance: encoding is about 2.7x faster and
decoding achieves 43-52x speedups compared to the previous local
implementation.
Link: https://lkml.kernel.org/r/20251114060240.89965-1-409411716@gms.tku.edu.tw Signed-off-by: Guan-Chun Wu <409411716@gms.tku.edu.tw> Reviewed-by: Kuan-Wei Chiu <visitorckw@gmail.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Cc: Keith Busch <kbusch@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Xiubo Li <xiubli@redhat.com> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Eric Biggers <ebiggers@kernel.org> Cc: "Theodore Y. Ts'o" <tytso@mit.edu> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: David Laight <david.laight.linux@gmail.com> Cc: Yu-Sheng Huang <home7438072@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Guan-Chun Wu [Fri, 14 Nov 2025 06:02:21 +0000 (14:02 +0800)]
fscrypt: replace local base64url helpers with lib/base64
Replace the base64url encoding and decoding functions in fscrypt with the
generic base64_encode() and base64_decode() helpers from lib/base64.
This removes the custom implementation in fscrypt, reduces code
duplication, and relies on the shared Base64 implementation in lib. The
helpers preserve RFC 4648-compliant URL-safe Base64 encoding without
padding, so there are no functional changes.
This change also improves performance: encoding is about 2.7x faster and
decoding achieves 43-52x speedups compared to the previous implementation.
Link: https://lkml.kernel.org/r/20251114060221.89734-1-409411716@gms.tku.edu.tw Reviewed-by: Kuan-Wei Chiu <visitorckw@gmail.com> Signed-off-by: Guan-Chun Wu <409411716@gms.tku.edu.tw> Cc: Christoph Hellwig <hch@lst.de> Cc: David Laight <david.laight.linux@gmail.com> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Keith Busch <kbusch@kernel.org> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: "Theodore Y. Ts'o" <tytso@mit.edu> Cc: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Cc: Xiubo Li <xiubli@redhat.com> Cc: Yu-Sheng Huang <home7438072@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Guan-Chun Wu [Fri, 14 Nov 2025 06:01:57 +0000 (14:01 +0800)]
lib: add KUnit tests for base64 encoding/decoding
Add a KUnit test suite to validate the base64 helpers. The tests cover
both encoding and decoding, including padded and unpadded forms as defined
by RFC 4648 (standard base64), and add negative cases for malformed inputs
and padding errors.
The test suite also validates other variants (URLSAFE, IMAP) to ensure
their correctness.
In addition to functional checks, the suite includes simple
microbenchmarks which report average encode/decode latency for small (64B)
and larger (1KB) inputs. These numbers are informational only and do not
gate the tests.
Kconfig (BASE64_KUNIT) and lib/tests/Makefile are updated accordingly.
Sample KUnit output:
KTAP version 1
# Subtest: base64
# module: base64_kunit
1..4
# base64_performance_tests: [64B] encode run : 32ns
# base64_performance_tests: [64B] decode run : 35ns
# base64_performance_tests: [1KB] encode run : 510ns
# base64_performance_tests: [1KB] decode run : 530ns
ok 1 base64_performance_tests
ok 2 base64_std_encode_tests
ok 3 base64_std_decode_tests
ok 4 base64_variant_tests
# base64: pass:4 fail:0 skip:0 total:4
# Totals: pass:4 fail:0 skip:0 total:4
Link: https://lkml.kernel.org/r/20251114060157.89507-1-409411716@gms.tku.edu.tw Signed-off-by: Guan-Chun Wu <409411716@gms.tku.edu.tw> Reviewed-by: Kuan-Wei Chiu <visitorckw@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Cc: David Laight <david.laight.linux@gmail.com> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Keith Busch <kbusch@kernel.org> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: "Theodore Y. Ts'o" <tytso@mit.edu> Cc: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Cc: Xiubo Li <xiubli@redhat.com> Cc: Yu-Sheng Huang <home7438072@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Guan-Chun Wu [Fri, 14 Nov 2025 06:01:32 +0000 (14:01 +0800)]
lib/base64: rework encode/decode for speed and stricter validation
The old base64 implementation relied on a bit-accumulator loop, which was
slow for larger inputs and too permissive in validation. It would accept
extra '=', missing '=', or even '=' appearing in the middle of the input,
allowing malformed strings to pass. This patch reworks the internals to
improve performance and enforce stricter validation.
Changes:
- Encoder:
* Process input in 3-byte blocks, mapping 24 bits into four 6-bit
symbols, avoiding bit-by-bit shifting and reducing loop iterations.
* Handle the final 1-2 leftover bytes explicitly and emit '=' only when
requested.
- Decoder:
* Based on the reverse lookup tables from the previous patch, decode
input in 4-character groups.
* Each group is looked up directly, converted into numeric values, and
combined into 3 output bytes.
* Explicitly handle padded and unpadded forms:
- With padding: input length must be a multiple of 4, and '=' is
allowed only in the last two positions. Reject stray or early '='.
- Without padding: validate tail lengths (2 or 3 chars) and require
unused low bits to be zero.
* Removed the bit-accumulator style loop to reduce loop iterations.
Kuan-Wei Chiu [Fri, 14 Nov 2025 06:01:07 +0000 (14:01 +0800)]
lib/base64: optimize base64_decode() with reverse lookup tables
Replace the use of strchr() in base64_decode() with precomputed reverse
lookup tables for each variant. This avoids repeated string scans and
improves performance. Use -1 in the tables to mark invalid characters.
Kuan-Wei Chiu [Fri, 14 Nov 2025 06:00:45 +0000 (14:00 +0800)]
lib/base64: add support for multiple variants
Patch series " lib/base64: add generic encoder/decoder, migrate users", v5.
This series introduces a generic Base64 encoder/decoder to the kernel
library, eliminating duplicated implementations and delivering significant
performance improvements.
The Base64 API has been extended to support multiple variants (Standard,
URL-safe, and IMAP) as defined in RFC 4648 and RFC 3501. The API now
takes a variant parameter and an option to control padding. As part of
this series, users are migrated to the new interface while preserving
their specific formats: fscrypt now uses BASE64_URLSAFE, Ceph uses
BASE64_IMAP, and NVMe is updated to BASE64_STD.
On the encoder side, the implementation processes input in 3-byte blocks,
mapping 24 bits directly to 4 output symbols. This avoids bit-by-bit
streaming and reduces loop overhead, achieving about a 2.7x speedup
compared to previous implementations.
On the decoder side, replace strchr() lookups with per-variant reverse
tables and process input in 4-character groups. Each group is mapped to
numeric values and combined into 3 bytes. Padded and unpadded forms are
validated explicitly, rejecting invalid '=' usage and enforcing tail
rules. This improves throughput by ~43-52x.
This patch (of 6):
Extend the base64 API to support multiple variants (standard, URL-safe,
and IMAP) as defined in RFC 4648 and RFC 3501. The API now takes a
variant parameter and an option to control padding. Update NVMe auth code
to use the new interface with BASE64_STD.
Link: https://lkml.kernel.org/r/20251114055829.87814-1-409411716@gms.tku.edu.tw Link: https://lkml.kernel.org/r/20251114060045.88792-1-409411716@gms.tku.edu.tw Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Co-developed-by: Guan-Chun Wu <409411716@gms.tku.edu.tw> Signed-off-by: Guan-Chun Wu <409411716@gms.tku.edu.tw> Reviewed-by: David Laight <david.laight.linux@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Keith Busch <kbusch@kernel.org> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: "Theodore Y. Ts'o" <tytso@mit.edu> Cc: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Cc: Xiubo Li <xiubli@redhat.com> Cc: Yu-Sheng Huang <home7438072@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Feng Tang [Thu, 13 Nov 2025 11:10:39 +0000 (19:10 +0800)]
sys_info: add a default kernel sys_info mask
Which serves as a global default sys_info mask. When users want the same
system information for many error cases (panic, hung, lockup ...), they
can chose to set this global knob only once, while not setting up each
individual sys_info knobs.
This just adds a 'lazy' option, and doesn't change existing kernel
behavior as the mask is 0 by default.
Link: https://lkml.kernel.org/r/20251113111039.22701-5-feng.tang@linux.alibaba.com Suggested-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Lance Yang <ioworker0@gmail.com> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Feng Tang [Thu, 13 Nov 2025 11:10:38 +0000 (19:10 +0800)]
watchdog: add sys_info sysctls to dump sys info on system lockup
When soft/hard lockup happens, developers may need different kinds of
system information (call-stacks, memory info, locks, etc.) to help
debugging.
Add 'softlockup_sys_info' and 'hardlockup_sys_info' sysctl knobs to take
human readable string like "tasks,mem,timers,locks,ftrace,...", and when
system lockup happens, all requested information will be printed out.
(refer kernel/sys_info.c for more details).
Link: https://lkml.kernel.org/r/20251113111039.22701-4-feng.tang@linux.alibaba.com Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Lance Yang <ioworker0@gmail.com> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Feng Tang [Thu, 13 Nov 2025 11:10:37 +0000 (19:10 +0800)]
hung_task: add hung_task_sys_info sysctl to dump sys info on task-hung
When task-hung happens, developers may need different kinds of system
information (call-stacks, memory info, locks, etc.) to help debugging.
Add 'hung_task_sys_info' sysctl knob to take human readable string like
"tasks,mem,timers,locks,ftrace,...", and when task-hung happens, all
requested information will be dumped. (refer kernel/sys_info.c for more
details).
Meanwhile, the newly introduced sys_info() call is used to unify some
existing info-dumping knobs.
[feng.tang@linux.alibaba.com: maintain consistecy established behavior, per Lance and Petr] Link: https://lkml.kernel.org/r/aRncJo1mA5Zk77Hr@U-2FWC9VHC-2323.local Link: https://lkml.kernel.org/r/20251113111039.22701-3-feng.tang@linux.alibaba.com Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com> Suggested-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Lance Yang <lance.yang@linux.dev> Cc: Jonathan Corbet <corbet@lwn.net> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Feng Tang [Thu, 13 Nov 2025 11:10:36 +0000 (19:10 +0800)]
docs: panic: correct some sys_ifo names in sysctl doc
Patch series "Enable hung_task and lockup cases to dump system info on
demand", v2.
When working on kernel stability issues: panic, task-hung and soft/hard
lockup are frequently met. And to debug them, user may need lots of
system information at that time, like task call stacks, lock info, memory
info, ftrace dump, etc.
panic case already uses sys_info() for this purpose, and has a
'panic_sys_info' sysctl(also support cmdline setup) interface to take
human readable string like "tasks,mem,timers,locks,ftrace,..." to control
what kinds of information is needed. Which is also helpful to debug
task-hung and lockup cases.
So this patchset introduces the similar sys_info sysctl interface for
task-hung and lockup cases.
his is mainly for debugging and the info dumping could be intrusive, like
dumping call stack for all tasks when system has huge number of tasks,
similarly for ftrace dump (we may add tracing_stop() and tracing_start()
around it)
Locally these have been used in our bug chasing for stability issues and
were helpful.
As Andrew suggested, add a configurable global 'kernel_sys_info' knob.
When error scenarios like panic/hung-task/lockup etc doesn't setup their
own sys_info knob and calls sys_info() with parameter "0", this global
knob will take effect. It could be used for other kernel cases like OOM,
which may not need one dedicated sys_info knob.
This patch (of 4):
Some sys_info names wered forgotten to change in patch iterations, while
the right names are defined in kernel/sys_info.c.
The introduction of WRITE_ONCE() calls for the 'prev' and 'next' variables
inside plist_check_list() was a misapplication. WRITE_ONCE() is
fundamentally a compiler barrier designed to prevent compiler
optimizations (like caching or reordering) on shared memory locations.
However, the variables 'prev' and 'next' are local, stack-allocated
pointers accessed only by the current thread's invocation of the function.
Since these pointers are thread-local and are never accessed concurrently,
applying WRITE_ONCE() to them is semantically incorrect and unnecessary.
Furthermore, the use of WRITE_ONCE() on local variables prevents the
compiler from performing standard optimizations, such as keeping these
variables cached solely in CPU registers throughout the loop, potentially
introducing performance overhead. Restore the conventional C assignment
for local loop variables, allowing the compiler to generate optimal code.
Xie Yuanbin [Sun, 9 Nov 2025 08:37:15 +0000 (16:37 +0800)]
include/linux/once_lite.h: fix judgment in WARN_ONCE with clang
For c code:
```c
extern int xx;
void test(void)
{
if (WARN_ONCE(xx, "x"))
__asm__ volatile ("nop":::);
}
```
Clang will generate the following assembly code:
```assemble
test:
movl xx(%rip), %eax // Assume xx == 0 (likely case)
testl %eax, %eax // judge once
je .LBB0_3 // jump to .LBB0_3
testb $1, test.__already_done(%rip)
je .LBB0_2
.LBB0_3:
testl %eax, %eax // judge again
je .LBB0_5 // jump to .LBB0_5
.LBB0_4:
nop
.LBB0_5:
retq
// omit
```
In the above code, `xx == 0` should be a likely case, but in this case,
xx has been judged twice.
Test info:
1. kernel source:
linux-next
commit 9c0826a5d9aa4d52206d ("Add linux-next specific files for 20251107")
2. compiler:
clang: Debian clang version 21.1.4 (8) with
Debian LLD 21.1.4 (compatible with GNU linkers)
3. config:
base on default x86_64_defconfig, and setting:
CONFIG_MITIGATION_RETHUNK=n
CONFIG_STACKPROTECTOR=n
Add unlikely to __ret_cond to help the compiler optimize correctly.
[akpm@linux-foundation.org: undo whitespace changes] Link: https://lkml.kernel.org/r/20251109083715.24495-1-qq570070308@gmail.com Signed-off-by: Xie Yuanbin <qq570070308@gmail.com> Cc: Bill Wendling <morbo@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Justin Stitt <justinstitt@google.com> Cc: Maninder Singh <maninder1.s@samsung.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ryusuke Konishi [Fri, 7 Nov 2025 15:32:49 +0000 (00:32 +0900)]
MAINTAINERS: update nilfs2 entry
Viacheslav has kindly offered to help with the maintenance of nilfs2 by
upstreaming patches, similar to the HFS/HFS+ tree. I've accepted his
offer, and will therefore add him as a co-maintainer and switch the
project's git tree for that role.
At the same time, change the outdated status field to Maintained to
reflect the current state.
zhang jiao [Thu, 6 Nov 2025 01:07:34 +0000 (09:07 +0800)]
fs/proc/page: remove unused KPMBITS
KPMBITS is never referenced in the code. Just remove it.
Link: https://lkml.kernel.org/r/20251106010735.1603-1-zhangjiao2@cmss.chinamobile.com Signed-off-by: zhang jiao <zhangjiao2@cmss.chinamobile.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: David Hildenbrand <david@redhat.com> Cc: Liu Ye <liuye@kylinos.cn> Cc: Luiz Capitulino <luizcap@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
scripts/gdb/symbols: make BPF debug info available to GDB
One can debug BPF programs with QEMU gdbstub by setting a breakpoint on
bpf_prog_kallsyms_add(), waiting for a hit with a matching aux.name, and
then setting a breakpoint on bpf_func. This is tedious, error-prone, and
also lacks line numbers.
Automate this in a way similar to the existing support for modules in
lx-symbols.
Enumerate and monitor changes to both BPF kallsyms and JITed progs. For
each ksym, generate and compile a synthetic .s file containing the name,
code, and size. In addition, if this ksym is also a prog, and not a
trampoline, add line number information.
Ensure that this is a no-op if the kernel is built without BPF support or
if "as" is missing. In theory the "as" dependency may be dropped by
generating the synthetic .o file manually, but this is too much complexity
for too little benefit.
Now one can debug BPF progs out of the box like this:
(gdb) lx-symbols -bpf
(gdb) b bpf_prog_4e612a6a881a086b_arena_list_add
Breakpoint 2 (bpf_prog_4e612a6a881a086b_arena_list_add) pending.
# ./test_progs -t arena_list
Thread 4 hit Breakpoint 2, bpf_prog_4e612a6a881a086b_arena_list_add ()
at linux/tools/testing/selftests/bpf/progs/arena_list.c:51
51 list_head = &global_head;
(gdb) n
bpf_prog_4e612a6a881a086b_arena_list_add () at linux/tools/testing/selftests/bpf/progs/arena_list.c:53
53 for (i = zero; i < cnt && can_loop; i++) {
Patch series "scripts/gdb/symbols: make BPF debug info available to GDB",
v2.
This series greatly simplifies debugging BPF progs when using QEMU gdbstub
by providing symbol names, sizes, and line numbers to GDB.
Patch 1 adds radix tree iteration, which is necessary for parsing
prog_idr. Patch 2 is the actual implementation; its description contains
some details on how to use this.
This patch (of 2):
Add a function and a command to iterate over radix tree contents.
Duplicate the C implementation in Python, but drop support for tagging.
David Laight [Wed, 5 Nov 2025 20:10:34 +0000 (20:10 +0000)]
lib: mul_u64_u64_div_u64(): optimise the divide code
Replace the bit by bit algorithm with one that generates 16 bits per
iteration on 32bit architectures and 32 bits on 64bit ones.
On my zen 5 this reduces the time for the tests (using the generic code)
from ~3350ns to ~1000ns.
Running the 32bit algorithm on 64bit x86 takes ~1500ns. It'll be slightly
slower on a real 32bit system, mostly due to register pressure.
The savings for 32bit x86 are much higher (tested in userspace). The
worst case (lots of bits in the quotient) drops from ~900 clocks to ~130
(pretty much independant of the arguments). Other 32bit architectures may
see better savings.
It is possibly to optimise for divisors that span less than
__LONG_WIDTH__/2 bits. However I suspect they don't happen that often and
it doesn't remove any slow cpu divide instructions which dominate the
result.
Typical improvements for 64bit random divides:
old new
sandy bridge: 470 150
haswell: 400 144
piledriver: 960 467 I think rdpmc is very slow.
zen5: 244 80
(Timing is 'rdpmc; mul_div(); rdpmc' with the multiply depending on the
first rdpmc and the second rdpmc depending on the quotient.)
Object code (64bit x86 test program): old 0x173 new 0x141.
Link: https://lkml.kernel.org/r/20251105201035.64043-9-david.laight.linux@gmail.com Signed-off-by: David Laight <david.laight.linux@gmail.com> Reviewed-by: Nicolas Pitre <npitre@baylibre.com> Cc: Biju Das <biju.das.jz@bp.renesas.com> Cc: Borislav Betkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Li RongQing <lirongqing@baidu.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Laight [Wed, 5 Nov 2025 20:10:33 +0000 (20:10 +0000)]
lib: mul_u64_u64_div_u64(): optimise multiply on 32bit x86
gcc generates horrid code for both ((u64)u32_a * u32_b) and (u64_a +
u32_b). As well as the extra instructions it can generate a lot of spills
to stack (including spills of constant zeros and even multiplies by
constant zero).
mul_u32_u32() already exists to optimise the multiply. Add a similar
add_u64_32() for the addition. Disable both for clang - it generates
better code without them.
Move the 64x64 => 128 multiply into a static inline helper function for
code clarity. No need for the a/b_hi/lo variables, the implicit casts on
the function calls do the work for us. Should have minimal effect on the
generated code.
Use mul_u32_u32() and add_u64_u32() in the 64x64 => 128 multiply in
mul_u64_add_u64_div_u64().
Link: https://lkml.kernel.org/r/20251105201035.64043-8-david.laight.linux@gmail.com Signed-off-by: David Laight <david.laight.linux@gmail.com> Reviewed-by: Nicolas Pitre <npitre@baylibre.com> Cc: Biju Das <biju.das.jz@bp.renesas.com> Cc: Borislav Betkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Li RongQing <lirongqing@baidu.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Laight [Wed, 5 Nov 2025 20:10:32 +0000 (20:10 +0000)]
lib: test_mul_u64_u64_div_u64(): test both generic and arch versions
Change the #if in div64.c so that test_mul_u64_u64_div_u64.c can compile
and test the generic version (including the 'long multiply') on
architectures (eg amd64) that define their own copy.
Test the kernel version and the locally compiled version on all arch.
Output the time taken (in ns) on the 'test completed' trace.
For reference, on my zen 5, the optimised version takes ~220ns and the
generic version ~3350ns. Using the native multiply saves ~200ns and
adding back the ilog2() 'optimisation' test adds ~50ms.
Link: https://lkml.kernel.org/r/20251105201035.64043-7-david.laight.linux@gmail.com Signed-off-by: David Laight <david.laight.linux@gmail.com> Reviewed-by: Nicolas Pitre <npitre@baylibre.com> Cc: Biju Das <biju.das.jz@bp.renesas.com> Cc: Borislav Betkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Li RongQing <lirongqing@baidu.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Laight [Wed, 5 Nov 2025 20:10:31 +0000 (20:10 +0000)]
lib: add tests for mul_u64_u64_div_u64_roundup()
Replicate the existing mul_u64_u64_div_u64() test cases with round up.
Update the shell script that verifies the table, remove the comment
markers so that it can be directly pasted into a shell.
Rename the divisor from 'c' to 'd' to match mul_u64_add_u64_div_u64().
It any tests fail then fail the module load with -EINVAL.
Link: https://lkml.kernel.org/r/20251105201035.64043-6-david.laight.linux@gmail.com Signed-off-by: David Laight <david.laight.linux@gmail.com> Reviewed-by: Nicolas Pitre <npitre@baylibre.com> Cc: Biju Das <biju.das.jz@bp.renesas.com> Cc: Borislav Betkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Li RongQing <lirongqing@baidu.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Laight [Wed, 5 Nov 2025 20:10:30 +0000 (20:10 +0000)]
lib: add mul_u64_add_u64_div_u64() and mul_u64_u64_div_u64_roundup()
The existing mul_u64_u64_div_u64() rounds down, a 'rounding up' variant
needs 'divisor - 1' adding in between the multiply and divide so cannot
easily be done by a caller.
Add mul_u64_add_u64_div_u64(a, b, c, d) that calculates (a * b + c)/d and
implement the 'round down' and 'round up' using it.
Update the x86-64 asm to optimise for 'c' being a constant zero.
Add kerndoc definitions for all three functions.
Link: https://lkml.kernel.org/r/20251105201035.64043-5-david.laight.linux@gmail.com Signed-off-by: David Laight <david.laight.linux@gmail.com> Reviewed-by: Nicolas Pitre <npitre@baylibre.com> Cc: Biju Das <biju.das.jz@bp.renesas.com> Cc: Borislav Betkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Li RongQing <lirongqing@baidu.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Laight [Wed, 5 Nov 2025 20:10:29 +0000 (20:10 +0000)]
lib: mul_u64_u64_div_u64(): simplify check for a 64bit product
If the product is only 64bits div64_u64() can be used for the divide.
Replace the pre-multiply check (ilog2(a) + ilog2(b) <= 62) with a simple
post-multiply check that the high 64bits are zero.
This has the advantage of being simpler, more accurate and less code. It
will always be faster when the product is larger than 64bits.
Most 64bit cpu have a native 64x64=128 bit multiply, this is needed (for
the low 64bits) even when div64_u64() is called - so the early check gains
nothing and is just extra code.
32bit cpu will need a compare (etc) to generate the 64bit ilog2() from two
32bit bit scans - so that is non-trivial. (Never mind the mess of x86's
'bsr' and any oddball cpu without fast bit-scan instructions.) Whereas the
additional instructions for the 128bit multiply result are pretty much one
multiply and two adds (typically the 'adc $0,%reg' can be run in parallel
with the instruction that follows).
The only outliers are 64bit systems without 128bit mutiply and simple in
order 32bit ones with fast bit scan but needing extra instructions to get
the high bits of the multiply result. I doubt it makes much difference to
either, the latter is definitely not mainstream.
If anyone is worried about the analysis they can look at the generated
code for x86 (especially when cmov isn't used).
Link: https://lkml.kernel.org/r/20251105201035.64043-4-david.laight.linux@gmail.com Signed-off-by: David Laight <david.laight.linux@gmail.com> Reviewed-by: Nicolas Pitre <npitre@baylibre.com> Cc: Biju Das <biju.das.jz@bp.renesas.com> Cc: Borislav Betkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Li RongQing <lirongqing@baidu.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Laight [Wed, 5 Nov 2025 20:10:28 +0000 (20:10 +0000)]
lib: mul_u64_u64_div_u64(): combine overflow and divide by zero checks
Since the overflow check always triggers when the divisor is zero
move the check for divide by zero inside the overflow check.
This means there is only one test in the normal path.
Link: https://lkml.kernel.org/r/20251105201035.64043-3-david.laight.linux@gmail.com Signed-off-by: David Laight <david.laight.linux@gmail.com> Reviewed-by: Nicolas Pitre <npitre@baylibre.com> Cc: Biju Das <biju.das.jz@bp.renesas.com> Cc: Borislav Betkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Li RongQing <lirongqing@baidu.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Laight [Wed, 5 Nov 2025 20:10:27 +0000 (20:10 +0000)]
lib: mul_u64_u64_div_u64(): rename parameter 'c' to 'd'
Patch series "Implement mul_u64_u64_div_u64_roundup()", v5.
The pwm-stm32.c code wants a 'rounding up' version of
mul_u64_u64_div_u64(). This can be done simply by adding 'divisor - 1' to
the 128bit product. Implement mul_u64_add_u64_div_u64(a, b, c, d) = (a *
b + c)/d based on the existing code. Define mul_u64_u64_div_u64(a, b, d)
as mul_u64_add_u64_div_u64(a, b, 0, d) and mul_u64_u64_div_u64_roundup(a,
b, d) as mul_u64_add_u64_div_u64(a, b, d-1, d).
Only x86-64 has an optimsed (asm) version of the function. That is
optimised to avoid the 'add c' when c is known to be zero. In all other
cases the extra code will be noise compared to the software divide code.
The test module has been updated to test mul_u64_u64_div_u64_roundup() and
also enhanced it to verify the C division code on x86-64 and the 32bit
division code on 64bit.
This patch (of 9):
Change to prototype from mul_u64_u64_div_u64(u64 a, u64 b, u64 c) to
mul_u64_u64_div_u64(u64 a, u64 b, u64 d). Using 'd' for 'divisor' makes
more sense.
An upcoming change adds a 'c' parameter to calculate (a * b + c)/d.
Link: https://lkml.kernel.org/r/20251105201035.64043-1-david.laight.linux@gmail.com Link: https://lkml.kernel.org/r/20251105201035.64043-2-david.laight.linux@gmail.com Signed-off-by: David Laight <david.laight.linux@gmail.com> Reviewed-by: Nicolas Pitre <npitre@baylibre.com> Cc: Biju Das <biju.das.jz@bp.renesas.com> Cc: Borislav Betkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Li RongQing <lirongqing@baidu.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andy Shevchenko [Tue, 4 Nov 2025 18:38:34 +0000 (19:38 +0100)]
util_macros.h: fix kernel-doc for u64_to_user_ptr()
The added documentation to u64_to_user_ptr() misspelled the function name.
Fix it.
Link: https://lkml.kernel.org/r/20251104183834.1046584-1-andriy.shevchenko@linux.intel.com Fixes: 029c896c4105 ("kernel.h: move PTR_IF() and u64_to_user_ptr() to util_macros.h") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Alexandru Ardelean <aardelean@baylibre.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Carlos López [Fri, 31 Oct 2025 11:19:09 +0000 (12:19 +0100)]
checkpatch: add IDR to the deprecated list
As of commit 85656ec193e9, the IDR interface is marked as deprecated in
the documentation, but no checks are made in that regard for new code.
Add the existing IDR initialization APIs to the deprecated list in
checkpatch, so that if new code is introduced using these APIs, a warning
is emitted.
Link: https://lkml.kernel.org/r/20251031111908.2266077-2-clopez@suse.de Signed-off-by: Carlos López <clopez@suse.de> Suggested-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Joe Perches <joe@perches.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Dwaipayan Ray <dwaipayanray1@gmail.com> Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ocfs2: validate cl_bpc in allocator inodes to prevent divide-by-zero
The chain allocator field cl_bpc (blocks per cluster) is read from disk
and used in division operations without validation. A corrupted
filesystem image with cl_bpc=0 causes a divide-by-zero crash in the
kernel:
This patch adds validation in ocfs2_validate_inode_block() to ensure
cl_bpc matches the expected value calculated from the superblock's cluster
size and block size for chain allocator inodes (identified by
OCFS2_CHAIN_FL).
Moving the validation to inode validation time (rather than allocation time)
has several benefits:
- Validates once when the inode is read, rather than on every allocation
- Protects all code paths that use cl_bpc (allocation, resize, etc.)
- Follows the existing pattern of inode validation in OCFS2
- Centralizes validation logic
The validation catches both:
- Zero values that cause divide-by-zero crashes
- Non-zero but incorrect values indicating filesystem corruption or
mismatched filesystem geometry
With this fix, mounting a corrupted filesystem produces:
[dmantipov@yandex.ru: combine into the series and tweak the message to fit the commonly used style] Link: https://lkml.kernel.org/r/20251030153003.1934585-2-dmantipov@yandex.ru Link: https://lore.kernel.org/ocfs2-devel/20251026132625.12348-1-kartikey406@gmail.com/T/#u Link: https://lore.kernel.org/all/20251027124131.10002-1-kartikey406@gmail.com/T/ Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com> Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Reported-by: syzbot+fd8af97c7227fe605d95@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=fd8af97c7227fe605d95 Tested-by: syzbot+fd8af97c7227fe605d95@syzkaller.appspotmail.com Suggested-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Heming Zhao <heming.zhao@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Jun Piao <piaojun@huawei.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mark@fasheh.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Dmitry Antipov [Thu, 30 Oct 2025 15:30:02 +0000 (18:30 +0300)]
ocfs2: add extra consistency checks for chain allocator dinodes
When validating chain allocator dinode in 'ocfs2_validate_inode_block()',
add an extra checks whether a) the maximum amount of chain records in
'struct ocfs2_chain_list' matches the value calculated based on the
filesystem block size, and b) the next free slot index is within the valid
range.
Link: https://lkml.kernel.org/r/20251030153003.1934585-1-dmantipov@yandex.ru Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Reported-by: syzbot+77026564530dbc29b854@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=77026564530dbc29b854 Reported-by: syzbot+5054473a31f78f735416@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=5054473a31f78f735416 Suggested-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Jun Piao <piaojun@huawei.com> Cc: Deepanshu Kartikey <kartikey406@gmail.com> Cc: Heming Zhao <heming.zhao@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mark@fasheh.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andy Shevchenko [Thu, 30 Oct 2025 11:44:20 +0000 (12:44 +0100)]
panic: sys_info: rewrite a fix for a compilation error (`make W=1`)
Compiler was not happy about dead variable in use:
lib/sys_info.c:52:19: error: variable 'sys_info_avail' is not needed and will not be emitted [-Werror,-Wunneeded-internal-declaration]
52 | static const char sys_info_avail[] = "tasks,mem,timers,locks,ftrace,all_bt,blocked_tasks";
| ^~~~~~~~~~~~~~
This was fixed by adding __maybe_unused attribute that just hides the
issue and didn't actually fix the root cause. Rewrite the fix by moving
the local variable from stack to a heap.
As a side effect this drops unneeded "synchronisation" of duplicative info
and also makes code ready for the further refactoring.
Andy Shevchenko [Thu, 30 Oct 2025 11:44:18 +0000 (12:44 +0100)]
panic: sys_info: align constant definition names with parameters
Align constant definition names with parameters to make it easier to map.
It's also better to maintain and extend the names while keeping their
uniqueness.
Andy Shevchenko [Thu, 30 Oct 2025 11:44:17 +0000 (12:44 +0100)]
panic: sys_info: capture si_bits_global before iterating over it
Patch series "panic: sys_info: Refactor and fix a potential issue", v3.
While targeting the compilation issue due to dangling variable, I have
noticed more opportunities for refactoring that helps to avoid above
mentioned compilation issue in a cleaner way and also fixes a potential
problem with global variable access.
This patch (of 6):
The for-loop might re-read the content of the memory the si_bits_global
points to on each iteration. Instead, just capture it for the sake of
consistency and use that instead.
Thorsten Blum [Thu, 30 Oct 2025 15:46:43 +0000 (00:46 +0900)]
nilfs2: replace vmalloc + copy_from_user with vmemdup_user
Replace vmalloc() followed by copy_from_user() with vmemdup_user() to
improve nilfs_ioctl_clean_segments() and nilfs_ioctl_set_suinfo(). Use
kvfree() to free the buffers created by vmemdup_user().
Use u64_to_user_ptr() instead of manually casting the pointers and
remove the obsolete 'out_free' label.
Oleg Nesterov [Sun, 26 Oct 2025 14:31:40 +0000 (15:31 +0100)]
release_task: kill unnecessary rcu_read_lock() around dec_rlimit_ucounts()
rcu_read_lock() was added to shut RCU-lockdep up when this code used
__task_cred()->rcu_dereference(), but after the commit 21d1c5e386bc
("Reimplement RLIMIT_NPROC on top of ucounts") it is no longer needed:
task_ucounts()->task_cred_xxx() takes rcu_read_lock() itself.
NOTE: task_ucounts() returns the pointer to another rcu-protected data,
struct ucounts. So it should either be used when task->real_cred and thus
task->real_cred->ucounts is stable (release_task, copy_process,
copy_creds), or it should be called under rcu_read_lock(). In both cases
it is pointless to take rcu_read_lock() to read the cred->ucounts pointer.
xxh32_reset() and xxh32_copy_state() are unused, and with those gone, the
xxh32_state struct is also unused.
xxh64_copy_state() is also unused.
Remove them all.
(Also fixes a comment above the xxh64_state that referred to it as
xxh32_state).
Link: https://lkml.kernel.org/r/20251024205120.454508-1-linux@treblig.org Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Suggested-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Kuan-Wei Chiu <visitorckw@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ye Bin [Sat, 25 Oct 2025 08:00:03 +0000 (16:00 +0800)]
dynamic_debug: add support for print stack
In practical problem diagnosis, especially during the boot phase, it is
often desirable to know the call sequence. However, currently, apart from
adding print statements and recompiling the kernel, there seems to be no
good alternative. If dynamic_debug supported printing the call stack, it
would be very helpful for diagnosing issues. This patch add support '+d'
for dump stack.
Dmitry Antipov [Thu, 23 Oct 2025 14:16:50 +0000 (17:16 +0300)]
ocfs2: add inline inode consistency check to ocfs2_validate_inode_block()
In 'ocfs2_validate_inode_block()', add an extra check whether an inode
with inline data (i.e. self-contained) has no clusters, thus preventing
an invalid inode from being passed to 'ocfs2_evict_inode()' and below.
Link: https://lkml.kernel.org/r/20251023141650.417129-1-dmantipov@yandex.ru Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Reported-by: syzbot+c16daba279a1161acfb0@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=c16daba279a1161acfb0 Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: Heming Zhao <heming.zhao@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Joseph Qi [Sat, 25 Oct 2025 12:32:18 +0000 (20:32 +0800)]
ocfs2: convert to host endian in ocfs2_validate_inode_block
Convert to host endian when checking OCFS2_VALID_FL to keep consistent
with other checks.
Link: https://lkml.kernel.org/r/20251025123218.3997866-2-joseph.qi@linux.alibaba.com Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Heming Zhao <heming.zhao@suse.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Joel Becker <jlbec@evilplan.org> Cc: Jun Piao <piaojun@huawei.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mark@fasheh.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Dmitry Antipov [Mon, 13 Oct 2025 06:28:26 +0000 (09:28 +0300)]
ocfs2: add boundary check to ocfs2_check_dir_entry()
In 'ocfs2_check_dir_entry()', add extra check whether at least the
smallest possible dirent may be located at the specified offset within
bh's data, thus preventing an out-of-bounds accesses below.
Link: https://lkml.kernel.org/r/20251013062826.122586-1-dmantipov@yandex.ru Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Reported-by: syzbot+b20bbf680bb0f2ecedae@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=b20bbf680bb0f2ecedae Reviewed-by: Heming Zhao <heming.zhao@suse.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
uaccess: decouple INLINE_COPY_FROM_USER and CONFIG_RUST
Commit 1f9a8286bc0c ("uaccess: always export _copy_[from|to]_user with
CONFIG_RUST") exports _copy_{from,to}_user() unconditionally, if RUST is
enabled. This pollutes exported symbols namespace, and spreads RUST
ifdefery in core files.
It's better to declare a corresponding helper under the rust/helpers,
similarly to how non-underscored copy_{from,to}_user() is handled.
[yury.norov@gmail.com: drop rust part of comment for _copy_from_user(), per Alice] Link: https://lkml.kernel.org/r/20251024154754.99768-1-yury.norov@gmail.com Link: https://lkml.kernel.org/r/20251023171607.1171534-1-yury.norov@gmail.com Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Miguel Ojeda <ojeda@kernel.org> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Tested-by: Alice Ryhl <aliceryhl@google.com> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Andreas Hindborg <a.hindborg@kernel.org> Cc: Björn Roy Baron <bjorn3_gh@protonmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Gary Guo <gary@garyguo.net> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Trevor Gross <tmgross@umich.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Douglas Anderson [Thu, 23 Oct 2025 18:33:05 +0000 (11:33 -0700)]
init/main.c: wrap long kernel cmdline when printing to logs
The kernel cmdline length is allowed to be longer than what printk can
handle. When this happens the cmdline that's printed to the kernel ring
buffer at bootup is cutoff and some kernel cmdline options are "hidden"
from the logs. This undercuts the usefulness of the log message.
Specifically, grepping for COMMAND_LINE_SIZE shows that 2048 is common and
some architectures even define it as 4096. s390 allows a CONFIG-based
maximum up to 1MB (though it's not expected that anyone will go over the
default max of 4096 [1]).
The maximum message pr_notice() seems to be able to handle (based on
experiment) is 1021 characters. This appears to be based on the current
value of PRINTKRB_RECORD_MAX as 1024 and the fact that pr_notice() spends
2 characters on the loglevel prefix and we have a '\n' at the end.
While it would be possible to increase the limits of printk() (and
therefore pr_notice()) somewhat, it doesn't appear possible to increase it
enough to fully include a 2048-character cmdline without breaking
userspace. Specifically on at least two tested userspaces (ChromeOS plus
the Debian-based distro I'm typing this message on) the `dmesg` tool reads
lines from `/dev/kmsg` in 2047-byte chunks. As per
`Documentation/ABI/testing/dev-kmsg`:
Every read() from the opened device node receives one record
of the kernel's printk buffer.
...
Messages in the record ring buffer get overwritten as whole,
there are never partial messages received by read().
We simply can't fit a 2048-byte cmdline plus the "Kernel command line:"
prefix plus info about time/log_level/etc in a 2047-byte read.
The above means that if we want to avoid the truncation we need to do some
type of wrapping of the cmdline when printing.
Add wrapping to the printout of the kernel command line. By default, the
wrapping is set to 1021 characters to avoid breaking anyone, but allow
wrapping to be set lower by a Kconfig knob
"CONFIG_CMDLINE_LOG_WRAP_IDEAL_LEN". Any tools that are correctly parsing
the cmdline today (because it is less than 1021 characters) will see no
difference in their behavior. The format of wrapped output is designed to
be matched by anyone using "grep" to search for the cmdline and also to be
easy for tools to handle. Anyone who is sure their tools (if any) handle
the wrapped format can choose a lower wrapping value and have prettier
output.
Setting CONFIG_CMDLINE_LOG_WRAP_IDEAL_LEN to 0 fully disables the wrapping
logic. This means that long command lines will be truncated again, but
this config could be set if command lines are expected to be long and
userspace is known not to handle parsing logs with the wrapping.
Wrapping is based on spaces, ignoring quotes. All lines are prefixed with
"Kernel command line: " and lines that are not the last line have a " \"
suffix added to them. The prefix and suffix count towards the line length
for wrapping purposes. The ideal length will be exceeded if no
appropriate place to wrap is found.
The wrapping function added here is fairly generic and could be made a
library function (somewhat like print_hex_dump()) if it's needed elsewhere
in the kernel. However, having printk() directly incorporate this
wrapping would be unlikely to be a good idea since it would break
printouts into more than one record without any obvious common line prefix
to tie lines together. It would also be extra overhead when, in general,
kernel log message should simply be kept smaller than 1021 bytes. For
some discussion on this topic, see responses to the v1 posting of this
patch [2].
[akpm@linux-foundation.org: make print_kernel_cmdline __init]
[dianders@chromium.org: v4] Link: https://lkml.kernel.org/r/20251027082204.v4.1.I095f1e2c6c27f9f4de0b4841f725f356c643a13f@changeid Link: https://lkml.kernel.org/r/20251023113257.v3.1.I095f1e2c6c27f9f4de0b4841f725f356c643a13f@changeid Link: https://lore.kernel.org/r/20251021131633.26700Dd6-hca@linux.ibm.com Link: https://lore.kernel.org/r/CAD=FV=VNyt1zG_8pS64wgV8VkZWiWJymnZ-XCfkrfaAhhFSKcA@mail.gmail.com Signed-off-by: Douglas Anderson <dianders@chromium.org> Tested-by: Paul E. McKenney <paulmck@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrew Chant <achant@google.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Francesco Valla <francesco@valla.it> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: guoweikang <guoweikang.kernel@gmail.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jan Hendrik Farr <kernel@jfarr.cc> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Kees Cook <kees@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Michal Koutný <mkoutny@suse.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Vlad Kulikov [Tue, 21 Oct 2025 18:13:39 +0000 (21:13 +0300)]
ipc: create_ipc_ns: drop mqueue mount on sysctl setup failure
If setup_mq_sysctls(ns) fails after mq_init_ns(ns) succeeds, the error
path skipped releasing the internal kernel mqueue mount kept in
ns->mq_mnt. That leaves the vfsmount/superblock referenced until final
namespace teardown, i.e. a resource leak on this rare failure edge.
Unwind it by calling mntput(ns->mq_mnt) before dropping user_ns and
freeing the IPC namespace. This mirrors the normal ordering used in
free_ipc_ns().
Link: https://lkml.kernel.org/r/20251021181341.670297-1-vlad_kulikov_c@pm.me Signed-off-by: Vlad Kulikov <vlad_kulikov_c@pm.me> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Christian Brauner <brauner@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Ma Wupeng <mawupeng1@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Dmitry Antipov [Mon, 13 Oct 2025 10:37:09 +0000 (13:37 +0300)]
ocfs2: add directory size check to ocfs2_find_dir_space_id()
Fix a null-pointer-deref which was detected by UBSAN:
KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
CPU: 0 UID: 0 PID: 5317 Comm: syz-executor310 Not tainted 6.15.0-syzkaller-12141-gec7714e49479 #0 PREEMPT(full)
In 'ocfs2_find_dir_space_id()', add extra check whether the directory data
block is large enough to hold at least one directory entry, and raise
'ocfs2_error()' if the former is unexpectedly small.
Link: https://lkml.kernel.org/r/20251013103709.146001-1-dmantipov@yandex.ru Reported-by: syzbot+ded9116588a7b73c34bc@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=ded9116588a7b73c34bc Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Reviewed-by: Heming Zhao <heming.zhao@suse.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Petr Pavlu [Wed, 22 Oct 2025 08:28:04 +0000 (10:28 +0200)]
taint/module: remove unnecessary taint_flag.module field
The TAINT_RANDSTRUCT and TAINT_FWCTL flags are mistakenly set in the
taint_flags table as per-module flags. While this can be trivially
corrected, the issue can be avoided altogether by removing the
taint_flag.module field.
This is possible because, since commit 7fd8329ba502 ("taint/module: Clean
up global and module taint flags handling") in 2016, the handling of
module taint flags has been fully generic. Specifically,
module_flags_taint() can print all flags, and the required output buffer
size is properly defined in terms of TAINT_FLAGS_COUNT. The actual
per-module flags are always those added to module.taints by calls to
add_taint_module().
Link: https://lkml.kernel.org/r/20251022082938.26670-1-petr.pavlu@suse.com Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Acked-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Cc: Aaron Tomlin <atomlin@atomlin.com> Cc: Luis Chamberalin <mcgrof@kernel.org> Cc: Petr Pavlu <petr.pavlu@suse.com> Cc: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Randy Dunlap [Wed, 15 Oct 2025 22:16:26 +0000 (15:16 -0700)]
taint: add reminder about updating docs and scripts
Sometimes people update taint-related pieces of the kernel without
updating the supporting documentation or scripts. Add a reminder to do
this.
Link: https://lkml.kernel.org/r/20251015221626.1126156-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Cc: David Gow <davidgow@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This warning occurs due to a conflict between crashkernel and System RAM
iomem resources.
The generic crashkernel reservation adds the crashkernel memory range to
/proc/iomem during early initialization. Later, all memblock ranges are
added to /proc/iomem as System RAM. If the crashkernel region overlaps
with any memblock range, it causes a conflict while adding those memblock
regions as iomem resources, triggering the above warning. The conflicting
memblock regions are then omitted from /proc/iomem.
For example, if the following crashkernel region is added to /proc/iomem: 20000000-11fffffff : Crash kernel
then the following memblock regions System RAM regions fail to be inserted: 00000000-7fffffff : System RAM 80000000-257fffffff : System RAM
Fix this by not adding the crashkernel memory to /proc/iomem on powerpc.
Introduce an architecture hook to let each architecture decide whether to
export the crashkernel region to /proc/iomem.
For more info checkout commit c40dd2f766440 ("powerpc: Add System RAM
to /proc/iomem") and commit bce074bdbc36 ("powerpc: insert System RAM
resource to prevent crashkernel conflict")
Note: Before switching to the generic crashkernel reservation, powerpc
never exported the crashkernel region to /proc/iomem.
Link: https://lkml.kernel.org/r/20251016142831.144515-1-sourabhjain@linux.ibm.com Fixes: e3185ee438c2 ("powerpc/crash: use generic crashkernel reservation"). Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Closes: https://lore.kernel.org/all/90937fe0-2e76-4c82-b27e-7b8a7fe3ac69@linux.ibm.com/ Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: Baoquan he <bhe@redhat.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
WangYuli [Tue, 14 Oct 2025 05:07:47 +0000 (13:07 +0800)]
.mailmap: add entry for WangYuli
Map my old, obsolete work email address to my current email address.
My current work email may not be ideal for timely communication, as
it requires a secure network environment for access due to security
policies.
Therefore, associate both my previous and current work email addresses
with an email address provided to me by AOSC Linux community. During
work hours, my commits will likely still be authored using my company
email address.
Link: https://lkml.kernel.org/r/20251014050747.527357-1-wangyuli@aosc.io Signed-off-by: WangYuli <wangyl5933@chinaunicom.cn> Signed-off-by: WangYuli <wangyuli@aosc.io> Cc: Carlos Bilbao <carlos.bilbao@kernel.org> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Shannon Nelson <sln@onemain.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Li RongQing [Wed, 15 Oct 2025 06:36:15 +0000 (14:36 +0800)]
hung_task: panic when there are more than N hung tasks at the same time
The hung_task_panic sysctl is currently a blunt instrument: it's all or
nothing.
Panicking on a single hung task can be an overreaction to a transient
glitch. A more reliable indicator of a systemic problem is when
multiple tasks hang simultaneously.
Extend hung_task_panic to accept an integer threshold, allowing the
kernel to panic only when N hung tasks are detected in a single scan.
This provides finer control to distinguish between isolated incidents
and system-wide failures.
The accepted values are:
- 0: Don't panic (unchanged)
- 1: Panic on the first hung task (unchanged)
- N > 1: Panic after N hung tasks are detected in a single scan
The original behavior is preserved for values 0 and 1, maintaining full
backward compatibility.
[lance.yang@linux.dev: new changelog] Link: https://lkml.kernel.org/r/20251015063615.2632-1-lirongqing@baidu.com Signed-off-by: Li RongQing <lirongqing@baidu.com> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Lance Yang <lance.yang@linux.dev> Tested-by: Lance Yang <lance.yang@linux.dev> Acked-by: Andrew Jeffery <andrew@codeconstruct.com.au> [aspeed_g5_defconfig] Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: David Hildenbrand <david@redhat.com> Cc: Florian Wesphal <fw@strlen.de> Cc: Jakub Kacinski <kuba@kernel.org> Cc: Jason A. Donenfeld <jason@zx2c4.com> Cc: Joel Granados <joel.granados@kernel.org> Cc: Joel Stanley <joel@jms.id.au> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kees Cook <kees@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Phil Auld <pauld@redhat.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Shuah Khan <shuah@kernel.org> Cc: Simon Horman <horms@kernel.org> Cc: Stanislav Fomichev <sdf@fomichev.me> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Dmitry Antipov [Tue, 7 Oct 2025 09:46:26 +0000 (12:46 +0300)]
ocfs2: add extra consistency check to ocfs2_dx_dir_lookup_rec()
In 'ocfs2_dx_dir_lookup_rec()', check whether an extent list length of the
directory indexing block matches the one configured via the superblock
parameters established at mount, thus preventing an out-of-bounds accesses
while iterating over the extent records below.
Link: https://lkml.kernel.org/r/20251007094626.196143-1-dmantipov@yandex.ru Reported-by: syzbot+30b53487d00b4f7f0922@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=30b53487d00b4f7f0922 Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Heming Zhao <heming.zhao@suse.com>> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Dmitry Antipov [Tue, 7 Oct 2025 12:35:26 +0000 (15:35 +0300)]
ocfs2: annotate flexible array members with __counted_by_le()
Annotate flexible array members of 'struct ocfs2_extent_list',
'struct ocfs2_chain_list', 'struct ocfs2_truncate_log',
'struct ocfs2_dx_entry_list', 'ocfs2_refcount_list' and
'struct ocfs2_xattr_header' with '__counted_by_le()'
attribute to improve array bounds checking when
CONFIG_UBSAN_BOUNDS is enabled.
[dmantipov@yandex.ru: fix __counted_by_le() usage in ocfs2_expand_inline_dx_root()] Link: https://lkml.kernel.org/r/20251014070324.130313-1-dmantipov@yandex.ru Link: https://lkml.kernel.org/r/20251007123526.213150-1-dmantipov@yandex.ru Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Heming Zhao <heming.zhao@suse.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Lukas Bulwahn [Fri, 10 Oct 2025 08:21:38 +0000 (10:21 +0200)]
treewide: drop outdated compiler version remarks in Kconfig help texts
As of writing, Documentation/Changes states the minimal versions of GNU C
being 8.1, Clang being 15.0.0 and binutils being 2.30. A few Kconfig help
texts are pointing out that specific GCC and Clang versions are needed,
but by now, those pointers to versions, such later than 4.0, later than
4.4, or clang later than 5.0, are obsolete and unlikely to be found by
users configuring their kernel builds anyway.
Drop these outdated remarks in Kconfig help texts referring to older
compiler and binutils versions. No functional change.
Link: https://lkml.kernel.org/r/20251010082138.185752-1-lukas.bulwahn@redhat.com Signed-off-by: Lukas Bulwahn <lukas.bulwahn@redhat.com> Cc: Bill Wendling <morbo@google.com> Cc: Justin Stitt <justinstitt@google.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Russel King <linux@armlinux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Zhichi Lin [Sat, 11 Oct 2025 08:22:22 +0000 (16:22 +0800)]
scs: fix a wrong parameter in __scs_magic
__scs_magic() needs a 'void *' variable, but a 'struct task_struct *' is
given. 'task_scs(tsk)' is the starting address of the task's shadow call
stack, and '__scs_magic(task_scs(tsk))' is the end address of the task's
shadow call stack. Here should be '__scs_magic(task_scs(tsk))'.
The user-visible effect of this bug is that when CONFIG_DEBUG_STACK_USAGE
is enabled, the shadow call stack usage checking function
(scs_check_usage) would scan an incorrect memory range. This could lead
to:
1. **Inaccurate stack usage reporting**: The function would calculate
wrong usage statistics for the shadow call stack, potentially showing
incorrect value in kmsg.
2. **Potential kernel crash**: If the value of __scs_magic(tsk)is
greater than that of __scs_magic(task_scs(tsk)), the for loop may
access unmapped memory, potentially causing a kernel panic. However,
this scenario is unlikely because task_struct is allocated via the slab
allocator (which typically returns lower addresses), while the shadow
call stack returned by task_scs(tsk) is allocated via vmalloc(which
typically returns higher addresses).
However, since this is purely a debugging feature
(CONFIG_DEBUG_STACK_USAGE), normal production systems should be not
unaffected. The bug only impacts developers and testers who are actively
debugging stack usage with this configuration enabled.
Link: https://lkml.kernel.org/r/20251011082222.12965-1-zhichi.lin@vivo.com Fixes: 5bbaf9d1fcb9 ("scs: Add support for stack usage debugging") Signed-off-by: Jiyuan Xie <xiejiyuan@vivo.com> Signed-off-by: Zhichi Lin <zhichi.lin@vivo.com> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Acked-by: Will Deacon <will@kernel.org> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Marco Elver <elver@google.com> Cc: Will Deacon <will@kernel.org> Cc: Yee Lee <yee.lee@mediatek.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
kexec_core: remove superfluous page offset handling in segment loading
During kexec_segment loading, when copying the content of the segment
(i.e. kexec_segment::kbuf or kexec_segment::buf) to its associated pages,
kimage_load_{cma,normal,crash}_segment handle the case where the physical
address of the segment is not page aligned, e.g. in
kimage_load_normal_segment:
(similar logic is present in kimage_load_{cma,crash}_segment).
This is actually not needed because, prior to their loading, all
kexec_segments first go through a vetting step in
`sanity_check_segment_list`, which rejects any segment that is not
page-aligned:
for (i = 0; i < nr_segments; i++) {
unsigned long mstart, mend;
mstart = image->segment[i].mem;
mend = mstart + image->segment[i].memsz;
// ...
if ((mstart & ~PAGE_MASK) || (mend & ~PAGE_MASK))
return -EADDRNOTAVAIL;
// ...
}
In case `sanity_check_segment_list` finds a non-page aligned the whole
kexec load is aborted and no segment is loaded.
This means that `kimage_load_{cma,normal,crash}_segment` never actually
have to handle non page-aligned segments and `(maddr & ~PAGE_MASK) == 0`
is always true no matter if the segment is coming from a file (i.e.
`kexec_file_load` syscall), from a user-space buffer (i.e. `kexec_load`
syscall) or created by the kernel through `kexec_add_buffer`. In the
latter case, `kexec_add_buffer` actually enforces the page alignment:
[jbouron@amazon.com: v3] Link: https://lkml.kernel.org/r/20251024155009.39502-1-jbouron@amazon.com Link: https://lkml.kernel.org/r/20250929160220.47616-1-jbouron@amazon.com Signed-off-by: Justinien Bouron <jbouron@amazon.com> Reviewed-by: Gunnar Kudrjavets <gunnarku@amazon.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Alexander Graf <graf@amazon.com> Cc: Marcos Paulo de Souza <mpdesouza@suse.com> Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Yan Zhao <yan.y.zhao@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Dmitry Antipov [Thu, 9 Oct 2025 10:23:49 +0000 (13:23 +0300)]
ocfs2: relax BUG() to ocfs2_error() in __ocfs2_move_extent()
In '__ocfs2_move_extent()', relax 'BUG()' to 'ocfs2_error()' just
to avoid crashing the whole kernel due to a filesystem corruption.
Fixes: 8f603e567aa7 ("Ocfs2/move_extents: move a range of extent.") Link: https://lkml.kernel.org/r/20251009102349.181126-2-dmantipov@yandex.ru Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Closes: https://syzkaller.appspot.com/bug?extid=727d161855d11d81e411 Reported-by: syzbot+727d161855d11d81e411@syzkaller.appspotmail.com Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Dmitry Antipov [Thu, 9 Oct 2025 10:23:48 +0000 (13:23 +0300)]
ocfs2: add extra flags check in ocfs2_ioctl_move_extents()
In 'ocfs2_ioctl_move_extents()', add extra check whether only actually
supported flags are passed via 'ioctl(..., OCFS2_IOC_MOVE_EXT, ...)',
and reject anything beyond OCFS2_MOVE_EXT_FL_AUTO_DEFRAG and
OCFS2_MOVE_EXT_FL_PART_DEFRAG with -EINVAL. In particular,
OCFS2_MOVE_EXT_FL_COMPLETE may be set by the kernel only and
should never be passed from userspace.
Link: https://lkml.kernel.org/r/20251009102349.181126-1-dmantipov@yandex.ru Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Joel Becker <jlbec@evilplan.org> Cc: Jun Piao <piaojun@huawei.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mark@fasheh.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
checkpatch: detect unhandled placeholders in cover letters
Add a new check PLACEHOLDER_USE to detect unhandled placeholders. This
prevents sending patch series with incomplete patches (mostly in cover
letters) containing auto generated subject or blurb lines.
These placeholders can be seen on mailing lists. With this change,
checkpatch will emit an error when such text is found.
Link: https://lkml.kernel.org/r/20250917173725.22547-2-work@onurozkan.dev Signed-off-by: Onur Özkan <work@onurozkan.dev> Acked-by: Joe Perches <joe@perches.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Dwaipayan Ray <dwaipayanray1@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Linus Torvalds [Sun, 9 Nov 2025 17:29:44 +0000 (09:29 -0800)]
Merge tag 'i2c-for-6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fix from Wolfram Sang:
"Two reverts merged into one commit to handle a regression caused by a
wrong cleanup because the underlying implications were unclear"
* tag 'i2c-for-6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: muxes: pca954x: Fix broken reset-gpio usage
Linus Torvalds [Sun, 9 Nov 2025 17:22:08 +0000 (09:22 -0800)]
Merge tag 'kbuild-fixes-6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux
Pull Kbuild fixes from Nathan Chancellor:
- Strip trailing padding bytes from modules.builtin.modinfo to fix
error during modules_install with certain versions of kmod
- Drop unused static inline function warning in .c files with clang
from W=1 to W=2
- Ensure kernel-doc.py invocations use the PYTHON3 make variable to
ensure user's choice of Python interpreter is always respected
* tag 'kbuild-fixes-6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux:
kbuild: Let kernel-doc.py use PYTHON3 override
compiler_types: Move unused static inline functions warning to W=2
kbuild: Strip trailing padding bytes from modules.builtin.modinfo
Jean Delvare [Fri, 7 Nov 2025 18:29:33 +0000 (19:29 +0100)]
kbuild: Let kernel-doc.py use PYTHON3 override
It is possible to force a specific version of python to be used when
building the kernel by passing PYTHON3= on the make command line.
However kernel-doc.py is currently called with python3 hard-coded and
thus ignores this setting.
Use $(PYTHON3) to run $(KERNELDOC) so that the desired version of
python is used.
Signed-off-by: Jean Delvare <jdelvare@suse.de> Reviewed-by: Nicolas Schier <nsc@kernel.org> Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Link: https://patch.msgid.link/20251107192933.2bfe9e57@endymion Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Linus Torvalds [Sat, 8 Nov 2025 23:37:03 +0000 (15:37 -0800)]
Merge tag 'drm-fixes-2025-11-09' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fix from Dave Airlie:
"Brown paper bag, the dma mask fix which I applied and actually looked
through for bad things, actually broke newer GPUs, there might be some
latent part in the boot path that is assuming 32-bit still, but we
will figure that out elsewhere.
nouveau:
- revert DMA mask change"
* tag 'drm-fixes-2025-11-09' of https://gitlab.freedesktop.org/drm/kernel:
Revert "drm/nouveau: set DMA mask before creating the flush page"
Linus Torvalds [Sat, 8 Nov 2025 23:34:23 +0000 (15:34 -0800)]
Merge tag 'rtc-6.18-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC fixes from Alexandre Belloni:
"The two reverts are for patches that I shouldn't have applied. The
rx8025 patch fixes an issue present since 2022:
Linus Torvalds [Sat, 8 Nov 2025 17:01:11 +0000 (09:01 -0800)]
Merge tag 'x86-urgent-2025-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
- Fix AMD PCI root device caching regression that triggers
on certain firmware variants
- Fix the zen5_rdseed_microcode[] array to be NULL-terminated
- Add more AMD models to microcode signature checking
* tag 'x86-urgent-2025-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode/AMD: Add more known models to entry sign checking
x86/CPU/AMD: Add missing terminator for zen5_rdseed_microcode
x86/amd_node: Fix AMD root device caching
Linus Torvalds [Sat, 8 Nov 2025 16:59:05 +0000 (08:59 -0800)]
Merge tag 'sched-urgent-2025-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Ingo Molnar:
"Fix a group-throttling bug in the fair scheduler"
* tag 'sched-urgent-2025-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Prevent cfs_rq from being unthrottled with zero runtime_remaining
Linus Torvalds [Sat, 8 Nov 2025 16:43:01 +0000 (08:43 -0800)]
Merge tag 'xfs-fixes-6.18-rc5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Carlos Maiolino:
"This contain fixes for the RT and zoned allocator, and a few fixes for
atomic writes"
* tag 'xfs-fixes-6.18-rc5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: free xfs_busy_extents structure when no RT extents are queued
xfs: fix zone selection in xfs_select_open_zone_mru
xfs: fix a rtgroup leak when xfs_init_zone fails
xfs: fix various problems in xfs_atomic_write_cow_iomap_begin
xfs: fix delalloc write failures in software-provided atomic writes
Tested the latest kernel on my GB203 and this seems to break it somehow.
Nov 09 04:16:14 bighp kernel: nouveau 0000:02:00.0: gsp: GSP-FMC boot failed (mbox: 0x0000000b)
Nov 09 04:16:14 bighp kernel: nouveau 0000:02:00.0: gsp: init failed, -5
Nov 09 04:16:14 bighp kernel: nouveau 0000:02:00.0: init failed with -5
Nov 09 04:16:14 bighp kernel: nouveau: drm:00000000:00000080: init failed with -5
Nov 09 04:16:14 bighp kernel: nouveau 0000:02:00.0: drm: Device allocation failed: -5
Nov 09 04:16:14 bighp kernel: nouveau 0000:02:00.0: probe with driver nouveau failed with error -5
Not sure why, I went over the patch and thought it should have worked, but there must be some
32-bit problem maybe in the FMC boot path.
Pavel Begunkov [Fri, 7 Nov 2025 18:41:26 +0000 (18:41 +0000)]
io_uring: fix regbuf vector size truncation
There is a report of io_estimate_bvec_size() truncating the calculated
number of segments that leads to corruption issues. Check it doesn't
overflow "int"s used later. Rough but simple, can be improved on top.
Cc: stable@vger.kernel.org Fixes: 9ef4cbbcb4ac3 ("io_uring: add infra for importing vectored reg buffers") Reported-by: Google Big Sleep <big-sleep-vuln-reports+bigsleep-458654612@google.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Günther Noack <gnoack@google.com> Tested-by: Günther Noack <gnoack@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Fri, 7 Nov 2025 22:51:11 +0000 (14:51 -0800)]
Merge tag 'drm-fixes-2025-11-08' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Back from travel, thanks to Simona for handling things. regular fixes,
seems about the right size, but spread out a bit.
amdgpu has the usual range of fixes, xe has a few fixes, and nouveau
has a couple of fixes, one for blackwell modifiers on 8/16 bit
surfaces.
Otherwise a few small fixes for mediatek, sched, imagination and
pixpaper.
xe:
- Fix missing synchronization on unbind
- Fix device shutdown when doing FLR
- Fix user fence signaling order
i915:
- Avoid lock inversion when pinning to GGTT on CHV/BXT+VTD
- Fix conversion between clock ticks and nanoseconds
mediatek:
- Disable AFBC support on Mediatek DRM driver
- Add pm_runtime support for GCE power control
imagination:
- kconfig: Fix dependencies
nouveau:
- Set DMA mask earlier
- Advertize correct modifiers for GB20x
pixpaper:
- kconfig: Fix dependencies"
* tag 'drm-fixes-2025-11-08' of https://gitlab.freedesktop.org/drm/kernel: (26 commits)
drm/xe: Enforce correct user fence signaling order using
drm/xe: Do clean shutdown also when using flr
drm/xe: Move declarations under conditional branch
drm/xe/guc: Synchronize Dead CT worker with unbind
drm/amd/display: Enable mst when it's detected but yet to be initialized
drm/amdgpu: Fix wait after reset sequence in S3
drm/amd: Fix suspend failure with secure display TA
drm/amdgpu: fix gpu page fault after hibernation on PF passthrough
drm/tiny: pixpaper: add explicit dependency on MMU
drm/nouveau: Advertise correct modifiers on GB20x
drm: define NVIDIA DRM format modifiers for GB20x
drm/nouveau: set DMA mask before creating the flush page
drm/sched: Fix deadlock in drm_sched_entity_kill_jobs_cb
drm/amd/display: Fix NULL deref in debugfs odm_combine_segments
drm/amdkfd: Don't clear PT after process killed
drm/amdgpu/smu: Handle S0ix for vangogh
drm/amdgpu: Drop PMFW RLC notifier from amdgpu_device_suspend()
drm/amd/display: Fix black screen with HDMI outputs
drm/amd/display: Don't stretch non-native images by default in eDP
drm/amd/pm: fix missing device_attr cleanup in amdgpu_pm_sysfs_init()
...
Linus Torvalds [Fri, 7 Nov 2025 21:19:18 +0000 (13:19 -0800)]
Merge tag 'parisc-for-6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fix from Helge Deller:
- fix crash triggered by unaligned access in parisc unwinder
* tag 'parisc-for-6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Avoid crash due to unaligned access in unwinder
Linus Torvalds [Fri, 7 Nov 2025 21:13:09 +0000 (13:13 -0800)]
Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd
Pull iommufd fixes from Jason Gunthorpe:
- Syzkaller found a case where maths overflows can cause divide by 0
- Typo in a compiler bug warning fix in the selftests broke the
selftests
- type1 compatability had a mismatch when unmapping an already unmapped
range, it should succeed
* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
iommufd: Make vfio_compat's unmap succeed if the range is already empty
iommufd/selftest: Fix ioctl return value in _test_cmd_trigger_vevents()
iommufd: Don't overflow during division for dirty tracking
Peter Zijlstra [Thu, 6 Nov 2025 10:50:00 +0000 (11:50 +0100)]
compiler_types: Move unused static inline functions warning to W=2
Per Nathan, clang catches unused "static inline" functions in C files
since commit 6863f5643dd7 ("kbuild: allow Clang to find unused static
inline functions for W=1 build").
Linus said:
> So I entirely ignore W=1 issues, because I think so many of the extra
> warnings are bogus.
>
> But if this one in particular is causing more problems than most -
> some teams do seem to use W=1 as part of their test builds - it's fine
> to send me a patch that just moves bad warnings to W=2.
>
> And if anybody uses W=2 for their test builds, that's THEIR problem..
Here is the change to bump the warning from W=1 to W=2.
Fixes: 6863f5643dd7 ("kbuild: allow Clang to find unused static inline functions for W=1 build") Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/20251106105000.2103276-1-andriy.shevchenko@linux.intel.com
[nathan: Adjust comment as well] Signed-off-by: Nathan Chancellor <nathan@kernel.org>