git.ipfire.org Git - thirdparty/kernel/linux.git/log

doc: watchdog: futher improvements

Make further additions and alterations to the watchdog documentation.

Link: https://lkml.kernel.org/r/acF3tXBxSr0KOP9b@pathway.suse.cz
Signed-off-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Li Huafei <lihuafei1@huawei.com>
Cc: Max Kellermann <max.kellermann@ionos.com>
Cc: Mayank Rungta <mrungta@google.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Stephane Erainan <eranian@google.com>
Cc: Wang Jinchao <wangjinchao600@gmail.com>
Cc: Yunhui Cui <cuiyunhui@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

doc: watchdog: document buddy detector

The current documentation generalizes the hardlockup detector as primarily
NMI-perf-based and lacks details on the SMP "Buddy" detector.

Update the documentation to add a detailed description of the Buddy
detector, and also restructure the "Implementation" section to explicitly
separate "Softlockup Detector", "Hardlockup Detector (NMI/Perf)", and
"Hardlockup Detector (Buddy)".

Clarify that the softlockup hrtimer acts as the heartbeat generator for
both hardlockup mechanisms and centralize the configuration details in a
"Frequency and Heartbeats" section.

Link: https://lkml.kernel.org/r/20260312-hardlockup-watchdog-fixes-v2-5-45bd8a0cc7ed@google.com
Signed-off-by: Mayank Rungta <mrungta@google.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Li Huafei <lihuafei1@huawei.com>
Cc: Max Kellermann <max.kellermann@ionos.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Stephane Erainan <eranian@google.com>
Cc: Wang Jinchao <wangjinchao600@gmail.com>
Cc: Yunhui Cui <cuiyunhui@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

watchdog/hardlockup: improve buddy system detection timeliness

Currently, the buddy system only performs checks every 3rd sample.  With a
4-second interval.  If a check window is missed, the next check occurs 12
seconds later, potentially delaying hard lockup detection for up to 24
seconds.

Modify the buddy system to perform checks at every interval (4s).
Introduce a missed-interrupt threshold to maintain the existing grace
period while reducing the detection window to 8-12 seconds.

Best and worst case detection scenarios:

Before (12s check window):
- Best case: Lockup occurs after first check but just before heartbeat
  interval. Detected in ~8s (8s till next check).
- Worst case: Lockup occurs just after a check.
  Detected in ~24s (missed check + 12s till next check + 12s logic).

After (4s check window with threshold of 3):
- Best case: Lockup occurs just before a check.
  Detected in ~8s (0s till 1st check + 4s till 2nd + 4s till 3rd).
- Worst case: Lockup occurs just after a check.
  Detected in ~12s (4s till 1st check + 4s till 2nd + 4s till 3rd).

Link: https://lkml.kernel.org/r/20260312-hardlockup-watchdog-fixes-v2-4-45bd8a0cc7ed@google.com
Signed-off-by: Mayank Rungta <mrungta@google.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Li Huafei <lihuafei1@huawei.com>
Cc: Max Kellermann <max.kellermann@ionos.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Stephane Erainan <eranian@google.com>
Cc: Wang Jinchao <wangjinchao600@gmail.com>
Cc: Yunhui Cui <cuiyunhui@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

doc: watchdog: clarify hardlockup detection timing

The current documentation implies that a hardlockup is strictly defined as
looping for "more than 10 seconds." However, the detection mechanism is
periodic (based on `watchdog_thresh`), meaning detection time varies
significantly depending on when the lockup occurs relative to the NMI perf
event.

Update the definition to remove the strict "more than 10 seconds"
constraint in the introduction and defer details to the Implementation
section.

Additionally, add a "Detection Overhead" section illustrating the Best
Case (~6s) and Worst Case (~20s) detection scenarios to provide
administrators with a clearer understanding of the watchdog's latency.

Link: https://lkml.kernel.org/r/20260312-hardlockup-watchdog-fixes-v2-3-45bd8a0cc7ed@google.com
Signed-off-by: Mayank Rungta <mrungta@google.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Li Huafei <lihuafei1@huawei.com>
Cc: Max Kellermann <max.kellermann@ionos.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Stephane Erainan <eranian@google.com>
Cc: Wang Jinchao <wangjinchao600@gmail.com>
Cc: Yunhui Cui <cuiyunhui@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

watchdog: update saved interrupts during check

Currently, arch_touch_nmi_watchdog() causes an early return that skips
updating hrtimer_interrupts_saved.  This leads to stale comparisons and
delayed lockup detection.

I found this issue because in our system the serial console is fairly
chatty.  For example, the 8250 console driver frequently calls
touch_nmi_watchdog() via console_write().  If a CPU locks up after a timer
interrupt but before next watchdog check, we see the following sequence:

  * watchdog_hardlockup_check() saves counter (e.g., 1000)
  * Timer runs and updates the counter (1001)
  * touch_nmi_watchdog() is called
  * CPU locks up
  * 10s pass: check() notices touch, returns early, skips update
  * 10s pass: check() saves counter (1001)
  * 10s pass: check() finally detects lockup

This delays detection to 30 seconds.  With this fix, we detect the lockup
in 20 seconds.

Link: https://lkml.kernel.org/r/20260312-hardlockup-watchdog-fixes-v2-2-45bd8a0cc7ed@google.com
Signed-off-by: Mayank Rungta <mrungta@google.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Li Huafei <lihuafei1@huawei.com>
Cc: Max Kellermann <max.kellermann@ionos.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Stephane Erainan <eranian@google.com>
Cc: Wang Jinchao <wangjinchao600@gmail.com>
Cc: Yunhui Cui <cuiyunhui@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

watchdog: return early in watchdog_hardlockup_check()

Patch series "watchdog/hardlockup: Improvements to hardlockup", v2.

This series addresses limitations in the hardlockup detector
implementations and updates the documentation to reflect actual behavior
and recent changes.

The changes are structured as follows:

Refactoring (Patch 1)
=====================
Patch 1 refactors watchdog_hardlockup_check() to return early if no
lockup is detected. This reduces the indentation level of the main
logic block, serving as a clean base for the subsequent changes.

Hardlockup Detection Improvements (Patches 2 & 4)
=================================================
The hardlockup detector logic relies on updating saved interrupt counts to
determine if the CPU is making progress.

Patch 1 ensures that the saved interrupt count is updated unconditionally
before checking the "touched" flag.  This prevents stale comparisons which
can delay detection.  This is a logic fix that ensures the detector
remains accurate even when the watchdog is frequently touched.

Patch 3 improves the Buddy detector's timeliness.  The current checking
interval (every 3rd sample) causes high variability in detection time (up
to 24s).  This patch changes the Buddy detector to check at every hrtimer
interval (4s) with a missed-interrupt threshold of 3, narrowing the
detection window to a consistent 8-12 second range.

Documentation Updates (Patches 3 & 5)
=====================================
The current documentation does not fully capture the variable nature of
detection latency or the details of the Buddy system.

Patch 3 removes the strict "10 seconds" definition of a hardlockup, which
was misleading given the periodic nature of the detector.  It adds a
"Detection Overhead" section to the admin guide, using "Best Case" and
"Worst Case" scenarios to illustrate that detection time can vary
significantly (e.g., ~6s to ~20s).

Patch 5 adds a dedicated section for the Buddy detector, which was
previously undocumented.  It details the mechanism, the new timing logic,
and known limitations.

This patch (of 5):

Invert the `is_hardlockup(cpu)` check in `watchdog_hardlockup_check()` to
return early when a hardlockup is not detected.  This flattens the main
logic block, reducing the indentation level and making the code easier to
read and maintain.

This refactoring serves as a preparation patch for future hardlockup
changes.

Link: https://lkml.kernel.org/r/20260312-hardlockup-watchdog-fixes-v2-0-45bd8a0cc7ed@google.com
Link: https://lkml.kernel.org/r/20260312-hardlockup-watchdog-fixes-v2-1-45bd8a0cc7ed@google.com
Signed-off-by: Mayank Rungta <mrungta@google.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Li Huafei <lihuafei1@huawei.com>
Cc: Max Kellermann <max.kellermann@ionos.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Stephane Erainan <eranian@google.com>
Cc: Wang Jinchao <wangjinchao600@gmail.com>
Cc: Yunhui Cui <cuiyunhui@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

kernel/kexec: remove inclusion of crypto/hash.h

kexec_core.c does not do any cryptographic hashing, so the header
crypto/hash.h is not needed at all.

Link: https://lkml.kernel.org/r/20260314204144.44884-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

kernel/crash: remove inclusion of crypto/sha1.h

Several files related to kernel crash dumps include crypto/sha1.h but
never use any of its functionality. Remove these includes so that these
files don't unnecessarily come up in searches for which kernel code is
still using the obsolete SHA-1 algorithm.

Link: https://lkml.kernel.org/r/20260314204243.45001-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

CREDITS: simplify the end-of-file alphabetical order comment

The existing comment references specific individuals by name which becomes
outdated as new entries are added. Simplify it to state the alphabetical
ordering rule clearly without depending on who happens to be the last
entry.

Link: https://lkml.kernel.org/r/20260312011741.846664-2-hisamshar@gmail.com
Signed-off-by: Hisam Mehboob <hisamshar@gmail.com>
Suggested-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jakub Kacinski <kuba@kernel.org>
Cc: Simon Horman <horms@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib/glob: initialize back_str to silence uninitialized variable warning

back_str is only used when back_pat is non-NULL, and both are always set
together, so it is safe in practice. Initialize back_str to NULL to make
this safety invariant explicit and silence compiler/static analysis
warnings.

Link: https://lkml.kernel.org/r/20260312215249.50165-1-objecting@objecting.org
Signed-off-by: Josh Law <objecting@objecting.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

checkpatch: add support for Assisted-by tag

The Assisted-by tag was introduced in
Documentation/process/coding-assistants.rst for attributing AI tool
contributions to kernel patches.  However, checkpatch.pl did not recognize
this tag, causing two issues:

  WARNING: Non-standard signature: Assisted-by:
  ERROR: Unrecognized email address: 'AGENT_NAME:MODEL_VERSION'

Fix this by:
1. Adding Assisted-by to the recognized $signature_tags list
2. Skipping email validation for Assisted-by lines since they use the
   AGENT_NAME:MODEL_VERSION format instead of an email address
3. Warning when the Assisted-by value doesn't match the expected format

Link: https://lkml.kernel.org/r/20260311215818.518930-1-sashal@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
Reported-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Joe Perches <joe@perches.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Dwaipayan Ray <dwaipayanray1@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

decode_stacktrace: decode caller address

Decode the caller address instead of the return address by default.  This
also introduced -R option to provide return address decoding mode.

This changes the decode_stacktrace.sh to decode the line info 1byte before
the return address which will be the call(branch) instruction address.  If
the return address is a symbol address (zero offset from it), it falls
back to decoding the return address.

This improves results especially when optimizations have changed the order
of the lines around the return address, or when the return address does
not have the actual line information.

With this change;
Call Trace:
  <TASK>
  dump_stack_lvl (lib/dump_stack.c:94 lib/dump_stack.c:120)
  lockdep_rcu_suspicious (kernel/locking/lockdep.c:6876)
  event_filter_pid_sched_process_fork (kernel/trace/trace_events.c:1057)
  kernel_clone (include/trace/events/sched.h:396 include/trace/events/sched.h:396 kernel/fork.c:2664)
  __x64_sys_clone (kernel/fork.c:2795 kernel/fork.c:2779 kernel/fork.c:2779)
  do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94)
  ? entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121)
  ? trace_irq_disable (include/trace/events/preemptirq.h:36)
  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121)

Without this (or give -R option);
Call Trace:
  <TASK>
  dump_stack_lvl (lib/dump_stack.c:122)
  lockdep_rcu_suspicious (kernel/locking/lockdep.c:6877)
  event_filter_pid_sched_process_fork (kernel/trace/trace_events.c:?)
  kernel_clone (include/trace/events/sched.h:? include/trace/events/sched.h:396 kernel/fork.c:2664)
  __x64_sys_clone (kernel/fork.c:2779)
  do_syscall_64 (arch/x86/entry/syscall_64.c:?)
  ? entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
  ? trace_irq_disable (include/trace/events/preemptirq.h:36)
  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)

[akpm@linux-foundation.org: fix spello]
Link: https://lkml.kernel.org/r/177275821652.1557019.18367881408364381866.stgit@mhiramat.tok.corp.google.com
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Tested-by: Luca Ceresoli <luca.ceresoli@bootlin.com> [arm64]
Cc: Carlos Llamas <cmllamas@google.com>
Cc: Sasha Levin (Microsoft) <sashal@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

selftests: fix ARCH normalization to handle command-line argument

Several selftests Makefiles (e.g.  prctl, breakpoints, etc) attempt to
normalize the ARCH variable by converting x86_64 and i.86 to x86.
However, it uses the conditional assignment operator '?='.

When ARCH is passed as a command-line argument (e.g., during an rpmbuild
process), the '?=' operator ignores the shell command and the sed
transformation.  This leads to an incorrect ARCH value being used, which
causes build failures

  # make -C tools/testing/selftests TARGETS=prctl ARCH=x86_64
  make: Entering directory '/build/tools/testing/selftests'
  make[1]: Entering directory '/build/tools/testing/selftests/prctl'
  make[1]: *** No targets.  Stop.
  make[1]: Leaving directory '/build/tools/testing/selftests/prctl'
  make: *** [Makefile:197: all] Error 2

Change the assignment to use 'override' and ':=' to ensure the
normalization logic is applied regardless of how the ARCH variable was
initially defined.

Link: https://lkml.kernel.org/r/20260309205145.572778-1-aleksey.oladko@virtuozzo.com
Signed-off-by: Aleksei Oladko <aleksey.oladko@virtuozzo.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Bala-Vignesh-Reddy <reddybalavignesh9979@gmail.com>
Cc: Chelsy Ratnawat <chelsyratnawat2001@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib/ts_kmp: fix integer overflow in pattern length calculation

The ts_kmp algorithm stores its prefix_tbl[] table and pattern in a single
allocation sized from the pattern length. If the prefix_tbl[] size
calculation wraps, the resulting allocation can be too small and
subsequent pattern copies can overflow it.

Fix this by rejecting zero-length patterns and by using overflow helpers
before calculating the combined allocation size.

This fixes a potential heap overflow. The pattern length calculation can
wrap during a size_t addition, leading to an undersized allocation.
Because the textsearch library is reachable from userspace via Netfilter's
xt_string module, this is a security risk that should be backported to LTS
kernels.

Link: https://lkml.kernel.org/r/20260308202028.2889285-2-objecting@objecting.org
Signed-off-by: Josh Law <objecting@objecting.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib/ts_bm: fix integer overflow in pattern length calculation

The ts_bm algorithm stores its good_shift[] table and pattern in a single
allocation sized from the pattern length. If the good_shift[] size
calculation wraps, the resulting allocation can be too small and
subsequent pattern copies can overflow it.

Fix this by rejecting zero-length patterns and by using overflow helpers
before calculating the combined allocation size.

This fixes a potential heap overflow. The pattern length calculation can
wrap during a size_t addition, leading to an undersized allocation.
Because the textsearch library is reachable from userspace via Netfilter's
xt_string module, this is a security risk that should be backported to LTS
kernels.

Link: https://lkml.kernel.org/r/20260308202028.2889285-1-objecting@objecting.org
Signed-off-by: Josh Law <objecting@objecting.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

ocfs2: remove redundant error code assignment

Remove the error assignment for variable 'ret' during correct code
execution. In subsequent execution, variable 'ret' is overwritten.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Link: https://lkml.kernel.org/r/20260307234809.88421-1-a.velichayshiy@ispras.ru
Signed-off-by: Alexey Velichayshiy <a.velichayshiy@ispras.ru>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Heming Zhao <heming.zhao@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

tools/getdelays: use the static UAPI headers from tools/include/uapi

The include directory ../../usr/include is only present if an in-tree
kernel build with CONFIG_HEADERS_INSTALL was done before.
Otherwise the system UAPI headers are used, which most likely are not
the most recent ones.

To make sure to always have access to up-to-date UAPI headers,
use the static copy in tools/include/uapi.

Link: https://lkml.kernel.org/r/20260307-accounting-taskstats-h-v1-2-0b75915c6ce5@weissschuh.net
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/r/202603062103.Z5fecwZD-lkp@intel.com/
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Jiang Kun <jiang.kun2@zte.com.cn>
Cc: Wang Yaxin <wang.yaxin@zte.com.cn>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Yang Yang <yang.yang29@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

tools headers UAPI: sync linux/taskstats.h

Patch series "tools/getdelays: use the static UAPI headers from
tools/include/uapi".

The include directory ../../usr/include is only present if an in-tree
kernel build with CONFIG_HEADERS_INSTALL was done before. Otherwise the
system UAPI headers are used, which most likely are not the most recent
ones.

To make sure to always have access to up-to-date UAPI headers, use the
static copy in tools/include/uapi.

This patch (of 2):

To give the accounting tools access to the new fields introduced in commit
503efe850c74 ("delayacct: add timestamp of delay max")

Link: https://lkml.kernel.org/r/20260307-accounting-taskstats-h-v1-0-0b75915c6ce5@weissschuh.net
Link: https://lkml.kernel.org/r/20260307-accounting-taskstats-h-v1-1-0b75915c6ce5@weissschuh.net
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Jiang Kun <jiang.kun2@zte.com.cn>
Cc: Wang Yaxin <wang.yaxin@zte.com.cn>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Yang Yang <yang.yang29@zte.com.cn>
Cc: kernel test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib: decompress_bunzip2: fix 32-bit shift undefined behavior

Fix undefined behavior caused by shifting a 32-bit integer by 32 bits
during decompression. This prevents potential kernel decompression
failures or corruption when parsing malicious or malformed bzip2 archives.

Link: https://lkml.kernel.org/r/20260308165012.2872633-1-objecting@objecting.org
Signed-off-by: Josh Law <objecting@objecting.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

ocfs2: fix possible deadlock between unlink and dio_end_io_write

ocfs2_unlink takes orphan dir inode_lock first and then ip_alloc_sem,
while in ocfs2_dio_end_io_write, it acquires these locks in reverse order.
This creates an ABBA lock ordering violation on lock classes
ocfs2_sysfile_lock_key[ORPHAN_DIR_SYSTEM_INODE] and
ocfs2_file_ip_alloc_sem_key.

Lock Chain #0 (orphan dir inode_lock -> ip_alloc_sem):
ocfs2_unlink
  ocfs2_prepare_orphan_dir
    ocfs2_lookup_lock_orphan_dir
      inode_lock(orphan_dir_inode) <- lock A
    __ocfs2_prepare_orphan_dir
      ocfs2_prepare_dir_for_insert
        ocfs2_extend_dir
  ocfs2_expand_inline_dir
    down_write(&oi->ip_alloc_sem) <- Lock B

Lock Chain #1 (ip_alloc_sem -> orphan dir inode_lock):
ocfs2_dio_end_io_write
  down_write(&oi->ip_alloc_sem) <- Lock B
  ocfs2_del_inode_from_orphan()
    inode_lock(orphan_dir_inode) <- Lock A

Deadlock Scenario:
  CPU0 (unlink)                     CPU1 (dio_end_io_write)
  ------                            ------
  inode_lock(orphan_dir_inode)
                                    down_write(ip_alloc_sem)
  down_write(ip_alloc_sem)
                                    inode_lock(orphan_dir_inode)

Since ip_alloc_sem is to protect allocation changes, which is unrelated
with operations in ocfs2_del_inode_from_orphan.  So move
ocfs2_del_inode_from_orphan out of ip_alloc_sem to fix the deadlock.

Link: https://lkml.kernel.org/r/20260306032211.1016452-1-joseph.qi@linux.alibaba.com
Reported-by: syzbot+67b90111784a3eac8c04@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=67b90111784a3eac8c04
Fixes: a86a72a4a4e0 ("ocfs2: take ip_alloc_sem in ocfs2_dio_get_block & ocfs2_dio_end_io_write")
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Heming Zhao <heming.zhao@suse.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib/bug: remove unnecessary variable initializations

Remove the unnecessary initialization of 'rcu' to false in
report_bug_entry() and report_bug(), as it is assigned by warn_rcu_enter()
before its first use.

Link: https://lkml.kernel.org/r/20260306162418.2815979-1-objecting@objecting.org
Signed-off-by: Josh Law <objecting@objecting.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib/bug: fix inconsistent capitalization in BUG message

Use lowercase "kernel BUG" consistently in pr_crit() messages. The
verbose path already uses "kernel BUG at %s:%u!" but the non-verbose
fallback uses "Kernel BUG" with an uppercase 'K'.

Link: https://lkml.kernel.org/r/20260306162327.2815553-1-objecting@objecting.org
Signed-off-by: Josh Law <objecting@objecting.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib/inflate: fix typo "This results" to "The results" in comment

Fix "This results of this trade" to "The results of this trade" in the
comment describing the lbits and dbits tuning parameters.

Link: https://lkml.kernel.org/r/20260306161732.2812132-1-objecting@objecting.org
Signed-off-by: Josh Law <objecting@objecting.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib/inflate: fix grammar in comment: "variable" to "variables"

Fix "all variable" to "all variables" in the file header comment.

Link: https://lkml.kernel.org/r/20260306161707.2812005-1-objecting@objecting.org
Signed-off-by: Josh Law <objecting@objecting.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib/inflate: fix memory leak in inflate_dynamic() on inflate_codes() failure

When inflate_codes() fails in inflate_dynamic(), the code jumps to the
'out' label which only frees 'll', leaking the Huffman tables 'tl' and
'td'. Restructure the code so that the decoding tables are always freed
before reaching the 'out' label.

Link: https://lkml.kernel.org/r/20260306161647.2811874-1-objecting@objecting.org
Signed-off-by: Josh Law <objecting@objecting.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib/inflate: fix memory leak in inflate_fixed() on inflate_codes() failure

When inflate_codes() fails in inflate_fixed(), only the length list 'l' is
freed, but the Huffman tables 'tl' and 'td' are leaked. Add the missing
huft_free() calls on the error path.

Link: https://lkml.kernel.org/r/20260306161612.2811703-1-objecting@objecting.org
Signed-off-by: Josh Law <objecting@objecting.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib/uuid: fix typo "reversion" to "revision" in comment

Fix a typo in __uuid_gen_common() where "reversion" (meaning to revert)
was used instead of "revision" when describing the UUID variant field.

Link: https://lkml.kernel.org/r/20260306161250.2811500-1-objecting@objecting.org
Signed-off-by: Josh Law <objecting@objecting.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

scripts/gdb/symbols: handle module path parameters

commit 581ee79a2547 ("scripts/gdb/symbols: make BPF debug info available
to GDB") added support to make BPF debug information available to GDB.
However, the argument handling loop was slightly broken, causing it to
fail if further modules were passed. Fix it to append these passed
modules to the instance variable after expansion.

Link: https://lkml.kernel.org/r/20260304110642.2020614-2-benjamin@sipsolutions.net
Fixes: 581ee79a2547 ("scripts/gdb/symbols: make BPF debug info available to GDB")
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Kieran Bingham <kbingham@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

hung_task: explicitly report I/O wait state in log output

Currently, the hung task reporting mechanism indiscriminately labels all
TASK_UNINTERRUPTIBLE (D) tasks as "blocked", irrespective of whether they
are awaiting I/O completion or kernel locking primitives.  This ambiguity
compels system administrators to manually inspect stack traces to discern
whether the delay stems from an I/O wait (typically indicative of hardware
or filesystem anomalies) or software contention.  Such detailed analysis
is not always immediately accessible to system administrators or support
engineers.

To address this, this patch utilises the existing in_iowait field within
struct task_struct to augment the failure report.  If the task is blocked
due to I/O (e.g., via io_schedule_prepare()), the log message is updated
to explicitly state "blocked in I/O wait".

Examples:
        - Standard Block: "INFO: task bash:123 blocked for more than 120
          seconds".

        - I/O Block: "INFO: task dd:456 blocked in I/O wait for more than
          120 seconds".

Theoretically, concurrent executions of io_schedule_finish() could result
in a race condition where the read flag does not precisely correlate with
the subsequently printed backtrace.  However, this limitation is deemed
acceptable in practice.  The entire reporting mechanism is inherently racy
by design; nevertheless, it remains highly reliable in the vast majority
of cases, particularly because it primarily captures protracted stalls.
Consequently, introducing additional synchronisation to mitigate this
minor inaccuracy would be entirely disproportionate to the situation.

Link: https://lkml.kernel.org/r/20260303221324.4106917-1-atomlin@atomlin.com
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Lance Yang <lance.yang@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

hung_task: increment the global counter immediately

A recent change allowed to reset the global counter of hung tasks using
the sysctl interface.  A potential race with the regular check has been
solved by updating the global counter only once at the end of the check.

However, the hung task check can take a significant amount of time,
particularly when task information is being dumped to slow serial
consoles.  Some users monitor this global counter to trigger immediate
migration of critical containers.  Delaying the increment until the full
check completes postpones these high-priority rescue operations.

Update the global counter as soon as a hung task is detected.  Since the
value is read asynchronously, a relaxed atomic operation is sufficient.

Link: https://lkml.kernel.org/r/20260303203031.4097316-4-atomlin@atomlin.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
Reported-by: Lance Yang <lance.yang@linux.dev>
Closes: https://lore.kernel.org/r/f239e00f-4282-408d-b172-0f9885f4b01b@linux.dev
Reviewed-by: Aaron Tomlin <atomlin@atomlin.com>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

hung_task: enable runtime reset of hung_task_detect_count

Currently, the hung_task_detect_count sysctl provides a cumulative count
of hung tasks since boot.  In long-running, high-availability
environments, this counter may lose its utility if it cannot be reset once
an incident has been resolved.  Furthermore, the previous implementation
relied upon implicit ordering, which could not strictly guarantee that
diagnostic metadata published by one CPU was visible to the panic logic on
another.

This patch introduces the capability to reset the detection count by
writing "0" to the hung_task_detect_count sysctl.  The proc_handler logic
has been updated to validate this input and atomically reset the counter.

The synchronisation of sysctl_hung_task_detect_count relies upon a
transactional model to ensure the integrity of the detection counter
against concurrent resets from userspace.  The application of
atomic_long_read_acquire() and atomic_long_cmpxchg_release() is correct
and provides the following guarantees:

    1. Prevention of Load-Store Reordering via Acquire Semantics By
       utilising atomic_long_read_acquire() to snapshot the counter
       before initiating the task traversal, we establish a strict
       memory barrier. This prevents the compiler or hardware from
       reordering the initial load to a point later in the scan. Without
       this "acquire" barrier, a delayed load could potentially read a
       "0" value resulting from a userspace reset that occurred
       mid-scan. This would lead to the subsequent cmpxchg succeeding
       erroneously, thereby overwriting the user's reset with stale
       increment data.

    2. Atomicity of the "Commit" Phase via Release Semantics The
       atomic_long_cmpxchg_release() serves as the transaction's commit
       point. The "release" barrier ensures that all diagnostic
       recordings and task-state observations made during the scan are
       globally visible before the counter is incremented.

    3. Race Condition Resolution This pairing effectively detects any
       "out-of-band" reset of the counter. If
       sysctl_hung_task_detect_count is modified via the procfs
       interface during the scan, the final cmpxchg will detect the
       discrepancy between the current value and the "acquire" snapshot.
       Consequently, the update will fail, ensuring that a reset command
       from the administrator is prioritised over a scan that may have
       been invalidated by that very reset.

Link: https://lkml.kernel.org/r/20260303203031.4097316-3-atomlin@atomlin.com
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Joel Granados <joel.granados@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Lance Yang <lance.yang@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

hung_task: refactor detection logic and atomicise detection count

Patch series "hung_task: Provide runtime reset interface for hung task
detector", v9.

This series introduces the ability to reset
/proc/sys/kernel/hung_task_detect_count.

Writing a "0" value to this file atomically resets the counter of detected
hung tasks.  This functionality provides system administrators with the
means to clear the cumulative diagnostic history following incident
resolution, thereby simplifying subsequent monitoring without
necessitating a system restart.

This patch (of 3):

The check_hung_task() function currently conflates two distinct
responsibilities: validating whether a task is hung and handling the
subsequent reporting (printing warnings, triggering panics, or
tracepoints).

This patch refactors the logic by introducing hung_task_info(), a function
dedicated solely to reporting.  The actual detection check,
task_is_hung(), is hoisted into the primary loop within
check_hung_uninterruptible_tasks().  This separation clearly decouples the
mechanism of detection from the policy of reporting.

Furthermore, to facilitate future support for concurrent hung task
detection, the global sysctl_hung_task_detect_count variable is converted
from unsigned long to atomic_long_t.  Consequently, the counting logic is
updated to accumulate the number of hung tasks locally (this_round_count)
during the iteration.  The global counter is then updated atomically via
atomic_long_cmpxchg_relaxed() once the loop concludes, rather than
incrementally during the scan.

These changes are strictly preparatory and introduce no functional change
to the system's runtime behaviour.

Link: https://lkml.kernel.org/r/20260303203031.4097316-1-atomlin@atomlin.com
Link: https://lkml.kernel.org/r/20260303203031.4097316-2-atomlin@atomlin.com
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Joel Granados <joel.granados@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mailmap: update Guru Das Srinagesh's email address

Add my current email address and map previous addresses to it.

Link: https://lkml.kernel.org/r/20260301-gds-mailmap-update-2-v1-1-5691415be73c@gurudas.dev
Signed-off-by: Guru Das Srinagesh <linux@gurudas.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib: math: polynomial: remove link to non-exist file and fix spelling

The Baikal SoC and platform support was dropped from the kernel, remove
the reference to non-exist file. While at it, fix spelling.

Link: https://lkml.kernel.org/r/20260302092831.2267785-4-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib: math: polynomial: don't use 'proxy' headers

Update header inclusions to follow IWYU (Include What You Use) principle.

Link: https://lkml.kernel.org/r/20260302092831.2267785-3-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib: polynomial: move to math/ subfolder

Patch series "lib: polynomial: Move to math/ and clean up", v2.

While removing Baikal SoC and platform code pieces I found that this code
belongs to lib/math/ rather than generic lib/. Hence the move and
followed up cleanups.

This patch (of 3):

The algorithm behind polynomial belongs to our collection of math
equations and expressions handling. Move it to math/ subfolder where
others of the kind are located.

Link: https://lkml.kernel.org/r/20260302092831.2267785-2-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

ocfs2: fix deadlock when creating quota file

syzbot detected a circular locking dependency. the scenarios:

        CPU0                    CPU1
        ----                    ----
   lock(&ocfs2_quota_ip_alloc_sem_key);
                                lock(&ocfs2_sysfile_lock_key[USER_QUOTA_SYSTEM_INODE]);
                                lock(&ocfs2_quota_ip_alloc_sem_key);
   lock(&ocfs2_sysfile_lock_key[ORPHAN_DIR_SYSTEM_INODE]);

or:
       CPU0                    CPU1
       ----                    ----
  lock(&ocfs2_quota_ip_alloc_sem_key);
                               lock(&dquot->dq_lock);
                               lock(&ocfs2_quota_ip_alloc_sem_key);
  lock(&ocfs2_sysfile_lock_key[ORPHAN_DIR_SYSTEM_INODE]);

Following are the code paths for above scenarios:

path_openat
ocfs2_create
  ocfs2_mknod
  + ocfs2_reserve_new_inode
  |  ocfs2_reserve_suballoc_bits
  |   inode_lock(alloc_inode) //C0: hold INODE_ALLOC_SYSTEM_INODE
  |    //ocfs2_free_alloc_context(inode_ac) is called at the end of
  |    //caller ocfs2_mknod to handle the release
  |
  + ocfs2_get_init_inode
     __dquot_initialize
      dqget
       ocfs2_acquire_dquot
       + ocfs2_lock_global_qf
       |  down_write(&OCFS2_I(oinfo->dqi_gqinode)->ip_alloc_sem)//A2:grabbing
       + ocfs2_create_local_dquot
          down_write(&OCFS2_I(lqinode)->ip_alloc_sem)//A3:grabbing

evict
ocfs2_evict_inode
  ocfs2_delete_inode
   ocfs2_wipe_inode
    + inode_lock(orphan_dir_inode) //B0:hold
    + ...
    + ocfs2_remove_inode
       inode_lock(inode_alloc_inode) //INODE_ALLOC_SYSTEM_INODE
        down_write(&inode->i_rwsem) //C1:grabbing

generic_file_direct_write
ocfs2_direct_IO
  __blockdev_direct_IO
   dio_complete
    ocfs2_dio_end_io
     ocfs2_dio_end_io_write
     + down_write(&oi->ip_alloc_sem) //A0:hold
     + ocfs2_del_inode_from_orphan
        inode_lock(orphan_dir_inode) //B1:grabbing

Root cause for the circular locking:

DIO completion path:
holds oi->ip_alloc_sem and is trying to acquire the orphan_dir_inode lock.

evict path:
holds the orphan_dir_inode lock and is trying to acquire the
inode_alloc_inode lock.

ocfs2_mknod path:
Holds the inode_alloc_inode lock (to allocate a new quota file) and is
blocked waiting for oi->ip_alloc_sem in ocfs2_acquire_dquot().

How to fix:

Replace down_write() with down_write_trylock() in ocfs2_acquire_dquot().
If acquiring oi->ip_alloc_sem fails, return -EBUSY to abort the file
creation routine and break the deadlock.

Link: https://lkml.kernel.org/r/20260302061707.7092-1-heming.zhao@suse.com
Signed-off-by: Heming Zhao <heming.zhao@suse.com>
Reported-by: syzbot+78359d5fbb04318c35e9@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=78359d5fbb04318c35e9
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Heming Zhao <heming.zhao@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

get_maintainer: add ** glob pattern support

Add support for the ** glob operator in MAINTAINERS F: and X: patterns,
matching any number of path components (like Python's ** glob).

The existing * to .* conversion with slash-count check is preserved. **
is converted to (?:.*), a non-capturing group used as a marker to bypass
the slash-count check in file_match_pattern(), allowing the pattern to
cross directory boundaries.

This enables patterns like F: **/*[_-]kunit*.c to match files at any depth
in the tree.

Link: https://lkml.kernel.org/r/20260302103822.77343-1-teknoraver@meta.com
Signed-off-by: Matteo Croce <teknoraver@meta.com>
Acked-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

crash_dump: use sysfs_emit in sysfs show functions

Replace sprintf() with sysfs_emit() in sysfs show functions. sysfs_emit()
is preferred for formatting sysfs output because it provides safer bounds
checking. No functional changes.

Link: https://lkml.kernel.org/r/20260301125106.911980-2-thorsten.blum@linux.dev
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib/glob: clean up "bool abuse" in pointer arithmetic

Replace the implicit 'bool' to 'int' conversion with an explicit ternary
operator. This makes the pointer arithmetic clearer and avoids relying on
boolean memory representation for logic flow.

Link: https://lkml.kernel.org/r/20260301203845.2617217-1-objecting@objecting.org
Signed-off-by: Josh Law <objecting@objecting.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib: glob: replace bitwise OR with logical operation on boolean

Using bitwise OR (|=) on a boolean variable is valid C, but replacing it
with a direct logical assignment makes the intent clearer and appeases
strict static analysis tools.

Link: https://lkml.kernel.org/r/20260301152143.2572137-2-objecting@objecting.org
Signed-off-by: Josh Law <objecting@objecting.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib: glob: add explicit include for export.h

Include <linux/export.h> explicitly instead of relying on it being
implicitly included by <linux/module.h> for the EXPORT_SYMBOL macro.

Link: https://lkml.kernel.org/r/20260301152143.2572137-1-objecting@objecting.org
Signed-off-by: Josh Law <objecting@objecting.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib: glob: fix grammar and replace non-inclusive terminology

Fix a missing article ('a') in the comment describing the glob
implementation, and replace 'blacklists' with 'denylists' to align with
the kernel's inclusive terminology guidelines.

Link: https://lkml.kernel.org/r/20260301154553.2592681-1-objecting@objecting.org
Signed-off-by: Josh Law <objecting@objecting.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

selftests/fchmodat2: use ksft_finished()

The fchmodat2 test program open codes a version of ksft_finished(), use
the standard version.

Link: https://lkml.kernel.org/r/20260226-selftests-fchmodat2-v4-2-a6419435f2e8@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Alexey Gladkov <legion@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

selftests/fchmodat2: clean up temporary files and directories

Patch series "selftests/fchmodat2: Error handling and general", v4.

I looked at the fchmodat2() tests since I've been experiencing some random
intermittent segfaults with them in my test systems, while doing so I
noticed these two issues. Unfortunately I didn't figure out the original
yet, unless I managed to fix it unwittingly.

This patch (of 2):

The fchmodat2() test program creates a temporary directory with a file and
a symlink for every test it runs but never cleans these up, resulting in
${TMPDIR} getting left with stale files after every run. Restructure the
program a bit to ensure that we clean these up, this is more invasive than
it might otherwise be due to the extensive use of ksft_exit_fail_msg() in
the program.

As a side effect this also ensures that we report a consistent test name
for the tests and always try both tests even if they are skipped.

Link: https://lkml.kernel.org/r/20260226-selftests-fchmodat2-v4-0-a6419435f2e8@kernel.org
Link: https://lkml.kernel.org/r/20260226-selftests-fchmodat2-v4-1-a6419435f2e8@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Alexey Gladkov <legion@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

lib: glob: add missing SPDX-License-Identifier

Add the missing dual MIT/GPL license identifier to glob.c.

Link: https://lkml.kernel.org/r/20260228195300.2468310-1-objecting@objecting.org
Signed-off-by: Josh Law <objecting@objecting.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

pid: document the PIDNS_ADDING checks in alloc_pid() and copy_process()

Both copy_process() and alloc_pid() do the same PIDNS_ADDING check. The
reasons for these checks, and the fact that both are necessary, are not
immediately obvious. Add the comments.

Link: https://lkml.kernel.org/r/aaGIRElc78U4Er42@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Adrian Reber <areber@redhat.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Alexander Mikhalitsyn <alexander@mihalicyn.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Kirill Tkhai <tkhai@ya.ru>
Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

pid: make sub-init creation retryable

Patch series "pid: make sub-init creation retryable".

This patch (of 2):

Currently we allow only one attempt to create init in a new namespace. If
the first fork() fails after alloc_pid() succeeds, free_pid() clears
PIDNS_ADDING and thus disables further PID allocations.

Nowadays this looks like an unnecessary limitation. The original reason
to handle "case PIDNS_ADDING" in free_pid() is gone, most probably after
commit 69879c01a0c3 ("proc: Remove the now unnecessary internal mount of
proc").

Change free_pid() to keep ns->pid_allocated == PIDNS_ADDING, and change
alloc_pid() to reset the cursor early, right after taking pidmap_lock.

Test-case:

#define _GNU_SOURCE
#include <linux/sched.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <assert.h>
#include <sched.h>
#include <errno.h>

int main(void)
{
struct clone_args args = {
.exit_signal = SIGCHLD,
.flags = CLONE_PIDFD,
.pidfd = 0,
};
unsigned long pidfd;
int pid;

assert(unshare(CLONE_NEWPID) == 0);

pid = syscall(__NR_clone3, &args, sizeof(args));
assert(pid == -1 && errno == EFAULT);

args.pidfd = (unsigned long)&pidfd;
pid = syscall(__NR_clone3, &args, sizeof(args));
if (pid)
assert(pid > 0 && wait(NULL) == pid);
else
assert(getpid() == 1);

return 0;
}

Link: https://lkml.kernel.org/r/aaGHu3ixbw9Y7kFj@redhat.com
Link: https://lkml.kernel.org/r/aaGIHa7vGdwhEc_D@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Andrei Vagin <avagin@gmail.com>
Cc: Adrian Reber <areber@redhat.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Alexander Mikhalitsyn <alexander@mihalicyn.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Kirill Tkhai <tkhai@ya.ru>
Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

crash_dump: fix typo in function name read_key_from_user_keying

The function read_key_from_user_keying() is missing an 'r' in its name.
Fix the typo by renaming it to read_key_from_user_keyring().

Link: https://lkml.kernel.org/r/20260227230422.859423-1-thorsten.blum@linux.dev
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

crash_dump: remove redundant less-than-zero check

'key_count' is an 'unsigned int' and cannot be less than zero. Remove
the redundant condition.

Link: https://lkml.kernel.org/r/20260228085136.861971-2-thorsten.blum@linux.dev
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Cc: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

fork: replace simple_strtoul with kstrtoul in coredump_filter_setup

Replace simple_strtoul() with the recommended kstrtoul() for parsing the
'coredump_filter=' boot parameter.

Check the return value of kstrtoul() and reject invalid values. This adds
error handling while preserving behavior for existing values, and removes
use of the deprecated simple_strtoul() helper. The current code silently
sets 'default_dump_filter = 0' if parsing fails, instead of leaving the
default value (MMF_DUMP_FILTER_DEFAULT) unchanged.

Rename the static variable 'default_dump_filter' to 'coredump_filter'
since it does not necessarily contain the default value and the current
name can be misleading.

Link: https://lkml.kernel.org/r/20251215142152.4082-2-thorsten.blum@linux.dev
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Ben Segall <bsegall@google.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

complete_signal: kill always-true "core_state || !SIGNAL_GROUP_EXIT" check

The "(signal->core_state || !(signal->flags & SIGNAL_GROUP_EXIT))" check
in complete_signal() is not obvious at all, and in fact it only adds
unnecessary confusion: this condition is always true.

prepare_signal() does:

if (signal->flags & SIGNAL_GROUP_EXIT) {
if (signal->core_state)
return sig == SIGKILL;
/*
* The process is in the middle of dying, drop the signal.
*/
return false;
}

This means that "!signal->core_state && (signal->flags &
SIGNAL_GROUP_EXIT)" in complete_signal() is never possible.

If SIGNAL_GROUP_EXIT is set, prepare_signal() can only return true if
signal->core_state is not NULL.

Link: https://lkml.kernel.org/r/aZsfkDhnqJ4s1oTs@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Kees Cook <kees@kernel.org>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc; Deepanshu Kartikey <kartikey406@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

exit: kill unnecessary thread_group_leader() checks in exit_notify() and do_notify_parent()

thread_group_empty(tsk) is only possible if tsk is a group leader, and
thread_group_empty() already does the thread_group_leader() check.

So it makes no sense to check "thread_group_leader() &&
thread_group_empty()"; thread_group_empty() alone is enough.

Link: https://lkml.kernel.org/r/aZsfeegKZPZZszJh@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Kees Cook <kees@kernel.org>
Cc; Deepanshu Kartikey <kartikey406@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

selftests/ipc: skip msgque test when MSG_COPY is unsupported

msgque kselftest uses msgrcv(..., MSG_COPY) to copy messages. When the
kernel is built without CONFIG_CHECKPOINT_RESTORE, prepare_copy() is
stubbed out and msgrcv() returns -ENOSYS. The test currently reports this
as a failure even though it is simply a missing feature/configuration.

Skip the test when msgrcv() fails with ENOSYS.

Link: https://lkml.kernel.org/r/20260210135359.178636-1-jouyeol8739@gmail.com
Signed-off-by: UYeol Jo <jouyeol8739@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

scripts/spelling.txt: add "exaclty" typo

Link: https://lkml.kernel.org/r/20260212144005.45052-2-pvorel@suse.cz
Signed-off-by: Petr Vorel <pvorel@suse.cz>
Cc: Jonathan Camerom <Jonathan.Cameron@huawei.com>
Cc: WangYuli <wangyuli@uniontech.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

scripts/spelling.txt: sort alphabetically

Easier to add new entries. It was sorted when added in 66b47b4a9dad0, but
later got wrong order for few entries. Sorted with en_US.UTF-8 locale.

Link: https://lkml.kernel.org/r/20260212144005.45052-1-pvorel@suse.cz
Signed-off-by: Petr Vorel <pvorel@suse.cz>
Cc: Jonathan Camerom <Jonathan.Cameron@huawei.com>
Cc: WangYuli <wangyuli@uniontech.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

kernel/panic: mark init_taint_buf as __initdata and panic instead of warning in alloc_taint_buf()

However there's a convention of assuming that __init-time allocations
cannot fail. Because if a kmalloc() were to fail at this time, the kernel
is hopelessly messed up anyway. So simply panic() if that kmalloc failed,
then make that 350-byte buffer __initdata.

Link: https://lkml.kernel.org/r/20260223035914.4033-1-rioo.tsukatsukii@gmail.com
Signed-off-by: Rio <rioo.tsukatsukii@gmail.com>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Wang Jinchao <wangjinchao600@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

kernel/panic: allocate taint string buffer dynamically

The buffer used to hold the taint string is statically allocated, which
requires updating whenever a new taint flag is added.

Instead, allocate the exact required length at boot once the allocator is
available in an init function. The allocation sums the string lengths in
taint_flags[], along with space for separators and formatting.
print_tainted() is switched to use this dynamically allocated buffer.

If allocation fails, print_tainted() warns about the failure and continues
to use the original static buffer as a fallback.

Link: https://lkml.kernel.org/r/20260222140804.22225-1-rioo.tsukatsukii@gmail.com
Signed-off-by: Rio <rioo.tsukatsukii@gmail.com>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Wang Jinchao <wangjinchao600@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

kernel/panic: increase buffer size for verbose taint logging

The verbose 'Tainted: ...' string in print_tainted_seq can total to 327
characters while the buffer defined in _print_tainted is 320 bytes.
Increase its size to 350 characters to hold all flags, along with some
headroom.

[akpm@linux-foundation.org: fix spello, add comment]
Link: https://lkml.kernel.org/r/20260220151500.13585-1-rioo.tsukatsukii@gmail.com
Signed-off-by: Rio <rioo.tsukatsukii@gmail.com>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Wang Jinchao <wangjinchao600@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

scripts/bloat-o-meter: rename file arguments to match output

The output of bloat-o-meter already uses the words 'old' and 'new' for
symbol size in the table header, so reflect that in the corresponding
argument names.

Link: https://lkml.kernel.org/r/20260212213941.3984330-1-vkoskiv@gmail.com
Signed-off-by: Valtteri Koskivuori <vkoskiv@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

unshare: fix nsproxy leak in ksys_unshare() on set_cred_ucounts() failure

When set_cred_ucounts() fails in ksys_unshare() new_nsproxy is leaked.

Let's call put_nsproxy() if that happens.

Link: https://lkml.kernel.org/r/20260213193959.2556730-1-mge@meta.com
Fixes: 905ae01c4ae2 ("Add a reference to ucounts for each cred")
Signed-off-by: Michal Grzedzicki <mge@meta.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexey Gladkov (Intel) <legion@kernel.org>
Cc: Ben Segall <bsegall@google.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Kees Cook <kees@kernel.org>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

scripts/spelling.txt: add "binded||bound"

The correct passive of "to bind" is "bound", not "binded". This is often
used in the context of the BSD socket bind(2) operation.

Link: https://lkml.kernel.org/r/20260214140854.42247-1-gnoack3000@gmail.com
Signed-off-by: Günther Noack <gnoack3000@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

proc: array: drop stale FIXME about RCU in task_sig()

task_sig() already wraps the SigQ rlimit read in an explicit RCU read-side
critical section. Drop the stale FIXME comment and keep using
task_ucounts() for the ucounts access.

No functional change.

Link: https://lkml.kernel.org/r/20260215124511.14227-1-jaime.saguillo@gmail.com
Signed-off-by: Jaime Saguillo Revilla <jaime.saguillo@gmail.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
"ARM:

   - Clear the pending exception state from a vcpu coming out of reset,
     as it could otherwise affect the first instruction executed in the
     guest

   - Fix pointer arithmetic in address translation emulation, so that
     the Hardware Access bit is set on the correct PTE instead of some
     other location

  s390:

   - Fix deadlock in new memory management

   - Properly handle kernel faults on donated memory

   - Fix bounds checking for irq routing, with selftest

   - Fix invalid machine checks and log all of them"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: arm64: Fix the descriptor address in __kvm_at_swap_desc()
  KVM: s390: vsie: Avoid injecting machine check on signal
  KVM: s390: log machine checks more aggressively
  KVM: s390: selftests: Add IRQ routing address offset tests
  KVM: s390: Limit adapter indicator access to mapped page
  s390/mm: Add missing secure storage access fixups for donated memory
  KVM: arm64: Discard PC update state on vcpu reset
  KVM: s390: Fix a deadlock

Merge tag 'cxl-fixes-7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl

Pull Compute Express Link (CXL) fixes from Dave Jiang:

- Adjust the startup priority of cxl_pmem to be higher than that of
   cxl_acpi

- Use proper endpoint validity check upon sanitize

- Avoid incorrect DVSEC fallback when HDM decoders are enabled

- Fix CXL_ACPI and CXL_PMEM Kconfig tristate mismatch

- Fix leakage in __construct_region()

- Fix use after free of parent_port in cxl_detach_ep()

* tag 'cxl-fixes-7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
  cxl: Adjust the startup priority of cxl_pmem to be higher than that of cxl_acpi
  cxl/mbox: Use proper endpoint validity check upon sanitize
  cxl/hdm: Avoid incorrect DVSEC fallback when HDM decoders are enabled
  cxl/acpi: Fix CXL_ACPI and CXL_PMEM Kconfig tristate mismatch
  cxl/region: Fix leakage in __construct_region()
  cxl/port: Fix use after free of parent_port in cxl_detach_ep()

Merge tag 'kvmarm-fixes-7.0-4' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 7.0, take #4

- Clear the pending exception state from a vcpu coming out of
  reset, as it could otherwise affect the first instruction
  executed in the guest.

- Fix the address translation emulation icode to set the Hardware
  Access bit on the correct PTE instead of some other location.

Merge tag 'kvm-s390-master-7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

KVM: s390: Fixes for 7.0

- fix deadlock in new memory management
- handle kernel faults on donated memory properly
- fix bounds checking for irq routing + selftest
- fix invalid machine checks + logging

Merge tag 'mm-hotfixes-stable-2026-03-23-17-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM fixes from Andrew Morton:
"6 hotfixes.  2 are cc:stable.  All are for MM.

  All are singletons - please see the changelogs for details"

* tag 'mm-hotfixes-stable-2026-03-23-17-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm/damon/stat: monitor all System RAM resources
  mm/zswap: add missing kunmap_local()
  mailmap: update email address for Muhammad Usama Anjum
  zram: do not slot_free() written-back slots
  mm/damon/core: avoid use of half-online-committed context
  mm/rmap: clear vma->anon_vma on error

Merge tag 'perf-tools-fixes-for-v7.0-2-2026-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools

Pull perf tools fixes from Arnaldo Carvalho de Melo:

- Fix parsing 'overwrite' in command line event definitions in
   big-endian machines by writing correct union member

- Fix finding default metric in 'perf stat'

- Fix relative paths for including headers in 'perf kvm stat'

- Sync header copies with the kernel sources: msr-index.h, kvm,
   build_bug.h

* tag 'perf-tools-fixes-for-v7.0-2-2026-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
  tools headers: Synchronize linux/build_bug.h with the kernel sources
  tools headers UAPI: Sync x86's asm/kvm.h with the kernel sources
  tools headers UAPI: Sync linux/kvm.h with the kernel sources
  tools arch x86: Sync the msr-index.h copy with the kernel sources
  perf kvm stat: Fix relative paths for including headers
  perf parse-events: Fix big-endian 'overwrite' by writing correct union member
  perf metricgroup: Fix metricgroup__has_metric_or_groups()
  tools headers: Skip arm64 cputype.h check

Merge tag 'media/v7.0-5' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media fixes from Mauro Carvalho Chehab:

- rkvdec: fix stack usage with clang and improve handling missing
   short/long term RPS

- synopsys: fix a Kconfig issue and an out-of-bounds check

- verisilicon: Fix kernel panic due to __initconst misuse

- media core: serialize REINIT and REQBUFS with req_queue_mutex

* tag 'media/v7.0-5' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  media: verisilicon: Fix kernel panic due to __initconst misuse
  media: rkvdec: reduce stack usage in rkvdec_init_v4l2_vp9_count_tbl()
  media: rkvdec: reduce excessive stack usage in assemble_hw_pps()
  media: rkvdec: Improve handling missing short/long term RPS
  media: mc, v4l2: serialize REINIT and REQBUFS with req_queue_mutex
  media: synopsys: csi2rx: add missing kconfig dependency
  media: synopsys: csi2rx: fix out-of-bounds check for formats array

Merge tag 'xsa482-7.0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:
"Restrict the xen privcmd driver in unprivileged domU to only allow
  hypercalls to target domain when using secure boot"

* tag 'xsa482-7.0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/privcmd: add boot control for restricted usage in domU
  xen/privcmd: restrict usage in unprivileged domU

mm/damon/stat: monitor all System RAM resources

DAMON_STAT usage document (Documentation/admin-guide/mm/damon/stat.rst)
says it monitors the system's entire physical memory.  But, it is
monitoring only the biggest System RAM resource of the system.  When there
are multiple System RAM resources, this results in monitoring only an
unexpectedly small fraction of the physical memory.  For example, suppose
the system has a 500 GiB System RAM, 10 MiB non-System RAM, and 500 GiB
System RAM resources in order on the physical address space.  DAMON_STAT
will monitor only the first 500 GiB System RAM.  This situation is
particularly common on NUMA systems.

Select a physical address range that covers all System RAM areas of the
system, to fix this issue and make it work as documented.

[sj@kernel.org: return error if monitoring target region is invalid]
Link: https://lkml.kernel.org/r/20260317053631.87907-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20260316235118.873-1-sj@kernel.org
Fixes: 369c415e6073 ("mm/damon: introduce DAMON_STAT module")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> [6.17+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/zswap: add missing kunmap_local()

Commit e2c3b6b21c77 ("mm: zswap: use SG list decompression APIs from
zsmalloc") updated zswap_decompress() to use the scatterwalk API to copy
data for uncompressed pages.

In doing so, it mapped kernel memory locally for 32-bit kernels using
kmap_local_folio(), however it never unmapped this memory.

This resulted in the linked syzbot report where a BUG_ON() is triggered
due to leaking the kmap slot.

This patch fixes the issue by explicitly unmapping the established kmap.

Also, add flush_dcache_folio() after the kunmap_local() call

I had assumed that a new folio here combined with the flush that is done at
the point of setting the PTE would suffice, but it doesn't seem that's
actually the case, as update_mmu_cache() will in many archtectures only
actually flush entries where a dcache flush was done on a range previously.

I had also wondered whether kunmap_local() might suffice, but it doesn't
seem to be the case.

Some arches do seem to actually dcache flush on unmap, parisc does it if
CONFIG_HIGHMEM is not set by setting ARCH_HAS_FLUSH_ON_KUNMAP and calling
kunmap_flush_on_unmap() from __kunmap_local(), otherwise non-CONFIG_HIGHMEM
callers do nothing here.

Otherwise arch_kmap_local_pre_unmap() is called which does:

* sparc - flush_cache_all()
* arm - if VIVT, __cpuc_flush_dcache_area()
* otherwise - nothing

Also arch_kmap_local_post_unmap() is called which does:

* arm - local_flush_tlb_kernel_page()
* csky - kmap_flush_tlb()
* microblaze, ppc - local_flush_tlb_page()
* mips - local_flush_tlb_one()
* sparc - flush_tlb_all() (again)
* x86 - arch_flush_lazy_mmu_mode()
* otherwise - nothing

But this is only if it's high memory, and doesn't cover all architectures,
so is presumably intended to handle other cache consistency concerns.

In any case, VIPT is problematic here whether low or high memory (in spite
of what the documentation claims, see [0] - 'the kernel did write to a page
that is in the page cache page and / or in high memory'), because dirty
cache lines may exist at the set indexed by the kernel direct mapping,
which won't exist in the set indexed by any subsequent userland mapping,
meaning userland might read stale data from L2 cache.

Even if the documentation is correct and low memory is fine not to be
flushed here, we can't be sure as to whether the memory is low or high
(kmap_local_folio() will be a no-op if low), and this call should be
harmless if it is low.

VIVT would require more work if the memory were shared and already mapped,
but this isn't the case here, and would anyway be handled by the dcache
flush call.

In any case, we definitely need this flush as far as I can tell.

And we should probably consider updating the documentation unless it turns
out there's somehow dcache synchronisation that happens for low
memory/64-bit kernels elsewhere?

[ljs@kernel.org: add flush_dcache_folio() after the kunmap_local() call]
Link: https://lkml.kernel.org/r/13e09a99-181f-45ac-a18d-057faf94bccb@lucifer.local
Link: https://lkml.kernel.org/r/20260316140122.339697-1-ljs@kernel.org
Link: https://docs.kernel.org/core-api/cachetlb.html
Fixes: e2c3b6b21c77 ("mm: zswap: use SG list decompression APIs from zsmalloc")
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Reported-by: syzbot+fe426bef95363177631d@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/69b75e2c.050a0220.12d28.015a.GAE@google.com
Acked-by: Yosry Ahmed <yosry@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: SeongJae Park <sj@kernel.org>
Acked-by: Yosry Ahmed <yosry@kernel.org>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mailmap: update email address for Muhammad Usama Anjum

Add updated email address.

Link: https://lkml.kernel.org/r/20260310171757.3970390-1-usama.anjum@arm.com
Signed-off-by: Muhammad Usama Anjum <usama.anjum@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Carlos Bilbao <carlos.bilbao@kernel.org>
Cc: Hans Verkuil <hverkuil@kernel.org>
Cc: Jakub Kacinski <kuba@kernel.org>
Cc: Martin Kepplinger <martink@posteo.de>
Cc: Shannon Nelson <sln@onemain.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Linux 7.0-rc5

tools headers: Synchronize linux/build_bug.h with the kernel sources

To pick up the changes in:

  6ffd853b0b10e1e2 ("build_bug.h: correct function parameters names in kernel-doc")

That just add some comments, addressing this perf tools build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/include/linux/build_bug.h include/linux/build_bug.h

Please take a look at tools/include/uapi/README for further info on this
synchronization process.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

tools headers UAPI: Sync x86's asm/kvm.h with the kernel sources

To pick the changes in:

  e2ffe85b6d2bb778 ("KVM: x86: Introduce KVM_X86_QUIRK_VMCS12_ALLOW_FREEZE_IN_SMM")

That just rebuilds kvm-stat.c on x86, no change in functionality.

This silences these perf build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h

Please see tools/include/uapi/README for further details.

Cc: Jim Mattson <jmattson@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

tools headers UAPI: Sync linux/kvm.h with the kernel sources

To pick the changes in:

  da142f3d373a6dda ("KVM: Remove subtle "struct kvm_stats_desc" pseudo-overlay")

That just rebuilds perf, as these patches don't add any new KVM ioctl to
be harvested for the 'perf trace' ioctl syscall argument beautifiers.

This addresses this perf build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h

Please see tools/include/uapi/README for further details.

Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

tools arch x86: Sync the msr-index.h copy with the kernel sources

To pick up the changes from these csets:

  9073428bb204d921 ("x86/sev: Allow IBPB-on-Entry feature for SNP guests")

That cause no changes to tooling as it doesn't include a new MSR to be
captured by the tools/perf/trace/beauty/tracepoints/x86_msr.sh script.

Just silences this perf build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h

Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

i2c: tegra: Don't mark devices with pins as IRQ safe

I2C devices with associated pinctrl states (DPAUX I2C controllers)
will change pinctrl state during runtime PM. This requires taking
a mutex, so these devices cannot be marked as IRQ safe.

Add PINCTRL as dependency to avoid build errors.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Reported-by: Russell King <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/all/E1vsNBv-00000009nfA-27ZK@rmk-PC.armlinux.org.uk/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull bpf fixes from Alexei Starovoitov:

- Fix how linked registers track zero extension of subregisters (Daniel
   Borkmann)

- Fix unsound scalar fork for OR instructions (Daniel Wade)

- Fix exception exit lock check for subprogs (Ihor Solodrai)

- Fix undefined behavior in interpreter for SDIV/SMOD instructions
   (Jenny Guanni Qu)

- Release module's BTF when module is unloaded (Kumar Kartikeya
   Dwivedi)

- Fix constant blinding for PROBE_MEM32 instructions (Sachin Kumar)

- Reset register ID for END instructions to prevent incorrect value
   tracking (Yazhou Tang)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: Add a test cases for sync_linked_regs regarding zext propagation
  bpf: Fix sync_linked_regs regarding BPF_ADD_CONST32 zext propagation
  selftests/bpf: Add tests for maybe_fork_scalars() OR vs AND handling
  bpf: Fix unsound scalar forking in maybe_fork_scalars() for BPF_OR
  selftests/bpf: Add tests for sdiv32/smod32 with INT_MIN dividend
  bpf: Fix undefined behavior in interpreter sdiv/smod for INT_MIN
  selftests/bpf: Add tests for bpf_throw lock leak from subprogs
  bpf: Fix exception exit lock checking for subprogs
  bpf: Release module BTF IDR before module unload
  selftests/bpf: Fix pkg-config call on static builds
  bpf: Fix constant blinding for PROBE_MEM32 stores
  selftests/bpf: Add test for BPF_END register ID reset
  bpf: Reset register ID for BPF_END value tracking

Merge tag 'trace-v7.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

- Revert "tracing: Remove pid in task_rename tracing output"

   A change was made to remove the pid field from the task_rename event
   because it was thought that it was always done for the current task
   and recording the pid would be redundant. This turned out to be
   incorrect and there are a few corner case where this is not true and
   caused some regressions in tooling.

- Fix the reading from user space for migration

   The reading of user space uses a seq lock type of logic where it uses
   a per-cpu temporary buffer and disables migration, then enables
   preemption, does the copy from user space, disables preemption,
   enables migration and checks if there was any schedule switches while
   preemption was enabled. If there was a context switch, then it is
   considered that the per-cpu buffer could be corrupted and it tries
   again. There's a protection check that tests if it takes a hundred
   tries, it issues a warning and exits out to prevent a live lock.

   This was triggered because the task was selected by the load balancer
   to be migrated to another CPU, every time preemption is enabled the
   migration task would schedule in try to migrate the task but can't
   because migration is disabled and let it run again. This caused the
   scheduler to schedule out the task every time it enabled preemption
   and made the loop never exit (until the 100 iteration test
   triggered).

   Fix this by enabling and disabling preemption and keeping migration
   enabled if the reading from user space needs to be done again. This
   will let the migration thread migrate the task and the copy from user
   space will likely pass on the next iteration.

- Fix trace_marker copy option freeing

   The "copy_trace_marker" option allows a tracing instance to get a
   copy of a write to the trace_marker file of the top level instance.
   This is managed by a link list protected by RCU. When an instance is
   removed, a check is made if the option is set, and if so
   synchronized_rcu() is called.

   The problem is that an iteration is made to reset all the flags to
   what they were when the instance was created (to perform clean ups)
   was done before the check of the copy_trace_marker option and that
   option was cleared, so the synchronize_rcu() was never called.

   Move the clearing of all the flags after the check of
   copy_trace_marker to do synchronize_rcu() so that the option is still
   set if it was before and the synchronization is performed.

- Fix entries setting when validating the persistent ring buffer

   When validating the persistent ring buffer on boot up, the number of
   events per sub-buffer is added to the sub-buffer meta page. The
   validator was updating cpu_buffer->head_page (the first sub-buffer of
   the per-cpu buffer) and not the "head_page" variable that was
   iterating the sub-buffers. This was causing the first sub-buffer to
   be assigned the entries for each sub-buffer and not the sub-buffer
   that was supposed to be updated.

- Use "hash" value to update the direct callers

   When updating the ftrace direct callers, it assigned a temporary
   callback to all the callback functions of the ftrace ops and not just
   the functions represented by the passed in hash. This causes an
   unnecessary slow down of the functions of the ftrace_ops that is not
   being modified. Only update the functions that are going to be
   modified to call the ftrace loop function so that the update can be
   made on those functions.

* tag 'trace-v7.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  ftrace: Use hash argument for tmp_ops in update_ftrace_direct_mod
  ring-buffer: Fix to update per-subbuf entries of persistent ring buffer
  tracing: Fix trace_marker copy link list updates
  tracing: Fix failure to read user space from system call trace events
  tracing: Revert "tracing: Remove pid in task_rename tracing output"

Merge tag 'i2c-for-7.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:

- fix broken I2C communication on Armada 3700 with recovery

- fix device_node reference leak in probe (fsi)

- fix NULL-deref when serial string is missing (cp2615)

* tag 'i2c-for-7.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: pxa: defer reset on Armada 3700 when recovery is used
  i2c: fsi: Fix a potential leak in fsi_i2c_probe()
  i2c: cp2615: fix serial string NULL-deref at probe

Merge tag 'x86-urgent-2026-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:

- Improve Qemu MCE-injection behavior by only using AMD SMCA MSRs if
   the feature bit is set

- Fix the relative path of gettimeofday.c inclusion in vclock_gettime.c

- Fix a boot crash on UV clusters when a socket is marked as
   'deconfigured' which are mapped to the SOCK_EMPTY node ID by
   the UV firmware, while Linux APIs expect NUMA_NO_NODE.

   The difference being (0xffff [unsigned short ~0]) vs [int -1]

* tag 'x86-urgent-2026-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/platform/uv: Handle deconfigured sockets
  x86/entry/vdso: Fix path of included gettimeofday.c
  x86/mce/amd: Check SMCA feature bit before accessing SMCA MSRs

Merge tag 'perf-urgent-2026-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar:

- Fix a PMU driver crash on AMD EPYC systems, caused by
   a race condition in x86_pmu_enable()

- Fix a possible counter-initialization bug in x86_pmu_enable()

- Fix a counter inheritance bug in inherit_event() and
   __perf_event_read()

- Fix an Intel PMU driver branch constraints handling bug
   found by UBSAN

- Fix the Intel PMU driver's new Off-Module Response (OMR)
   support code for Diamond Rapids / Nova lake, to fix a snoop
   information parsing bug

* tag 'perf-urgent-2026-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Fix OMR snoop information parsing issues
  perf/x86/intel: Add missing branch counters constraint apply
  perf: Make sure to use pmu_ctx->pmu for groups
  x86/perf: Make sure to program the counter value for stopped events on migration
  perf/x86: Move event pointer setup earlier in x86_pmu_enable()

Merge tag 'objtool-urgent-2026-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool fixes from Ingo Molnar:
"Fix three more livepatching related build environment bugs, and a
  false positive warning with Clang jump tables"

* tag 'objtool-urgent-2026-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Fix Clang jump table detection
  livepatch/klp-build: Fix inconsistent kernel version
  objtool/klp: fix mkstemp() failure with long paths
  objtool/klp: fix data alignment in __clone_symbol()

Merge tag 'locking-urgent-2026-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking fix from Ingo Molnar:
"Fix a sparse build error regression in <linux/local_lock_internal.h>
caused by the locking context-analysis changes"

* tag 'locking-urgent-2026-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
include/linux/local_lock_internal.h: Make this header file again compatible with sparse

Merge tag 'irq-urgent-2026-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fix from Ingo Molnar:
"Fix a mailbox channel leak in the riscv-rpmi-sysmsi irqchip driver"

* tag 'irq-urgent-2026-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/riscv-rpmi-sysmsi: Fix mailbox channel leak in rpmi_sysmsi_probe()

zram: do not slot_free() written-back slots

slot_free() basically completely resets the slots by clearing all of
its flags and attributes.  While zram_writeback_complete() restores
some of flags back (those that are necessary for async read
decompression) we still lose a lot of slot's metadata.  For example,
slot's ac-time, or ZRAM_INCOMPRESSIBLE.

More importantly, restoring flags/attrs requires extra attention as
some of the flags are directly affecting zram device stats.  And the
original code did not pay that attention.  Namely ZRAM_HUGE slots
handling in zram_writeback_complete().  The call to slot_free() would
decrement ->huge_pages, however when zram_writeback_complete() restored
the slot's ZRAM_HUGE flag, it would not get reflected in an incremented
->huge_pages.  So when the slot would finally get freed, slot_free()
would decrement ->huge_pages again, leading to underflow.

Fix this by open-coding the required memory free and stats updates in
zram_writeback_complete(), rather than calling the destructive
slot_free().  Since we now preserve the ZRAM_HUGE flag on written-back
slots (for the deferred decompression path), we also update slot_free()
to skip decrementing ->huge_pages if ZRAM_WB is set.

Link: https://lkml.kernel.org/r/20260320023143.2372879-1-senozhatsky@chromium.org
Link: https://lkml.kernel.org/r/20260319034912.1894770-1-senozhatsky@chromium.org
Fixes: d38fab605c667 ("zram: introduce compressed data writeback")
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Richard Chang <richardycc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/damon/core: avoid use of half-online-committed context

One major usage of damon_call() is online DAMON parameters update.  It is
done by calling damon_commit_ctx() inside the damon_call() callback
function.  damon_commit_ctx() can fail for two reasons: 1) invalid
parameters and 2) internal memory allocation failures.  In case of
failures, the damon_ctx that attempted to be updated (commit destination)
can be partially updated (or, corrupted from a perspective), and therefore
shouldn't be used anymore.  The function only ensures the damon_ctx object
can safely deallocated using damon_destroy_ctx().

The API callers are, however, calling damon_commit_ctx() only after
asserting the parameters are valid, to avoid damon_commit_ctx() fails due
to invalid input parameters.  But it can still theoretically fail if the
internal memory allocation fails.  In the case, DAMON may run with the
partially updated damon_ctx.  This can result in unexpected behaviors
including even NULL pointer dereference in case of damos_commit_dests()
failure [1].  Such allocation failure is arguably too small to fail, so
the real world impact would be rare.  But, given the bad consequence, this
needs to be fixed.

Avoid such partially-committed (maybe-corrupted) damon_ctx use by saving
the damon_commit_ctx() failure on the damon_ctx object.  For this,
introduce damon_ctx->maybe_corrupted field.  damon_commit_ctx() sets it
when it is failed.  kdamond_call() checks if the field is set after each
damon_call_control->fn() is executed.  If it is set, ignore remaining
callback requests and return.  All kdamond_call() callers including
kdamond_fn() also check the maybe_corrupted field right after
kdamond_call() invocations.  If the field is set, break the kdamond_fn()
main loop so that DAMON sill doesn't use the context that might be
corrupted.

[sj@kernel.org: let kdamond_call() with cancel regardless of maybe_corrupted]
Link: https://lkml.kernel.org/r/20260320031553.2479-1-sj@kernel.org
Link: https://sashiko.dev/#/patchset/20260319145218.86197-1-sj%40kernel.org
Link: https://lkml.kernel.org/r/20260319145218.86197-1-sj@kernel.org
Link: https://lore.kernel.org/20260319043309.97966-1-sj@kernel.org
Fixes: 3301f1861d34 ("mm/damon/sysfs: handle commit command using damon_call()")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> [6.15+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/rmap: clear vma->anon_vma on error

Commit 542eda1a8329 ("mm/rmap: improve anon_vma_clone(),
unlink_anon_vmas() comments, add asserts") alters the way errors are
handled, but overlooked one important aspect of clean up.

When a VMA encounters an error state in anon_vma_clone() (that is, on
attempted allocation of anon_vma_chain objects), it cleans up partially
established state in cleanup_partial_anon_vmas(), before returning an
error.

However, this occurs prior to anon_vma->num_active_vmas being incremented,
and it also fails to clear the VMA's vma->anon_vma field, which remains in
place.

This is immediately an inconsistent state, because
anon_vma->num_active_vmas is supposed to track the number of VMAs whose
vma->anon_vma field references that anon_vma, and now that count is
off-by-negative-1 for each VMA for which this error state has occurred.

When VMAs are unlinked from this anon_vma, unlink_anon_vmas() will
eventually underflow anon_vma->num_active_vmas, which will trigger a
warning.

This will always eventually happen, as we unlink anon_vma's at process
teardown.

It could also cause maybe_reuse_anon_vma() to incorrectly permit the reuse
of an anon_vma which has active VMAs attached, which will lead to a
persistently invalid state.

The solution is to clear the VMA's anon_vma field when we clean up partial
state, as the fact we are doing so indicates clearly that the VMA is not
correctly integrated into the anon_vma tree and thus this field is
invalid.

Link: https://lkml.kernel.org/r/20260318122632.63404-1-ljs@kernel.org
Fixes: 542eda1a8329 ("mm/rmap: improve anon_vma_clone(), unlink_anon_vmas() comments, add asserts")
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Reported-by: Sasha Levin <sashal@kernel.org>
Closes: https://lore.kernel.org/linux-mm/20260302151547.2389070-1-sashal@kernel.org/
Reported-by: Jiakai Xu <jiakaipeanut@gmail.com>
Closes: https://lore.kernel.org/linux-mm/CAFb8wJvRhatRD-9DVmr5v5pixTMPEr3UKjYBJjCd09OfH55CKg@mail.gmail.com/
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Tested-by: Jiakai Xu <jiakaipeanut@gmail.com>
Acked-by: Harry Yoo <harry.yoo@oracle.com>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Sasha Levin (Microsoft) <sashal@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Merge tag 'driver-core-7.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core

Pull driver core fixes from Danilo Krummrich:

- Generalize driver_override in the driver core, providing a common
   sysfs implementation and concurrency-safe accessors for bus
   implementations

- Do not use driver_override as IRQ name in the hwmon axi-fan driver

- Remove an unnecessary driver_override check in sh platform_early

- Migrate the platform bus to use the generic driver_override
   infrastructure, fixing a UAF condition caused by accessing the
   driver_override field without proper locking in the platform_match()
   callback

* tag 'driver-core-7.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core:
  driver core: platform: use generic driver_override infrastructure
  sh: platform_early: remove pdev->driver_override check
  hwmon: axi-fan: don't use driver_override as IRQ name
  docs: driver-model: document driver_override
  driver core: generalize driver_override in struct device

ftrace: Use hash argument for tmp_ops in update_ftrace_direct_mod

The modify logic registers temporary ftrace_ops object (tmp_ops) to trigger
the slow path for all direct callers to be able to safely modify attached
addresses.

At the moment we use ops->func_hash for tmp_ops filter, which represents all
the systems attachments. It's faster to use just the passed hash filter, which
contains only the modified sites and is always a subset of the ops->func_hash.

Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Menglong Dong <menglong8.dong@gmail.com>
Cc: Song Liu <song@kernel.org>
Link: https://patch.msgid.link/20260312123738.129926-1-jolsa@kernel.org
Fixes: e93672f770d7 ("ftrace: Add update_ftrace_direct_mod function")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

ring-buffer: Fix to update per-subbuf entries of persistent ring buffer

Since the validation loop in rb_meta_validate_events() updates the same
cpu_buffer->head_page->entries, the other subbuf entries are not updated.
Fix to use head_page to update the entries field, since it is the cursor
in this loop.

Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Ian Rogers <irogers@google.com>
Fixes: 5f3b6e839f3c ("ring-buffer: Validate boot range memory events")
Link: https://patch.msgid.link/177391153882.193994.17158784065013676533.stgit@mhiramat.tok.corp.google.com
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

tracing: Fix trace_marker copy link list updates

When the "copy_trace_marker" option is enabled for an instance, anything
written into /sys/kernel/tracing/trace_marker is also copied into that
instances buffer. When the option is set, that instance's trace_array
descriptor is added to the marker_copies link list. This list is protected
by RCU, as all iterations uses an RCU protected list traversal.

When the instance is deleted, all the flags that were enabled are cleared.
This also clears the copy_trace_marker flag and removes the trace_array
descriptor from the list.

The issue is after the flags are called, a direct call to
update_marker_trace() is performed to clear the flag. This function
returns true if the state of the flag changed and false otherwise. If it
returns true here, synchronize_rcu() is called to make sure all readers
see that its removed from the list.

But since the flag was already cleared, the state does not change and the
synchronization is never called, leaving a possible UAF bug.

Move the clearing of all flags below the updating of the copy_trace_marker
option which then makes sure the synchronization is performed.

Also use the flag for checking the state in update_marker_trace() instead
of looking at if the list is empty.

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20260318185512.1b6c7db4@gandalf.local.home
Fixes: 7b382efd5e8a ("tracing: Allow the top level trace_marker to write into another instances")
Reported-by: Sasha Levin <sashal@kernel.org>
Closes: https://lore.kernel.org/all/20260225133122.237275-1-sashal@kernel.org/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

tracing: Fix failure to read user space from system call trace events

The system call trace events call trace_user_fault_read() to read the user
space part of some system calls. This is done by grabbing a per-cpu
buffer, disabling migration, enabling preemption, calling
copy_from_user(), disabling preemption, enabling migration and checking if
the task was preempted while preemption was enabled. If it was, the buffer
is considered corrupted and it tries again.

There's a safety mechanism that will fail out of this loop if it fails 100
times (with a warning). That warning message was triggered in some
pi_futex stress tests. Enabling the sched_switch trace event and
traceoff_on_warning, showed the problem:

pi_mutex_hammer-1375    [006] d..21   138.981648: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
     migration/6-47      [006] d..2.   138.981651: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
pi_mutex_hammer-1375    [006] d..21   138.981656: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
     migration/6-47      [006] d..2.   138.981659: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
pi_mutex_hammer-1375    [006] d..21   138.981664: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
     migration/6-47      [006] d..2.   138.981667: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
pi_mutex_hammer-1375    [006] d..21   138.981671: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
     migration/6-47      [006] d..2.   138.981675: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
pi_mutex_hammer-1375    [006] d..21   138.981679: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
     migration/6-47      [006] d..2.   138.981682: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
pi_mutex_hammer-1375    [006] d..21   138.981687: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
     migration/6-47      [006] d..2.   138.981690: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
pi_mutex_hammer-1375    [006] d..21   138.981695: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
     migration/6-47      [006] d..2.   138.981698: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
pi_mutex_hammer-1375    [006] d..21   138.981703: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
     migration/6-47      [006] d..2.   138.981706: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
pi_mutex_hammer-1375    [006] d..21   138.981711: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
     migration/6-47      [006] d..2.   138.981714: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
pi_mutex_hammer-1375    [006] d..21   138.981719: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
     migration/6-47      [006] d..2.   138.981722: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
pi_mutex_hammer-1375    [006] d..21   138.981727: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
     migration/6-47      [006] d..2.   138.981730: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
pi_mutex_hammer-1375    [006] d..21   138.981735: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
     migration/6-47      [006] d..2.   138.981738: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95

What happened was the task 1375 was flagged to be migrated. When
preemption was enabled, the migration thread woke up to migrate that task,
but failed because migration for that task was disabled. This caused the
loop to fail to exit because the task scheduled out while trying to read
user space.

Every time the task enabled preemption the migration thread would schedule
in, try to migrate the task, fail and let the task continue. But because
the loop would only enable preemption with migration disabled, it would
always fail because each time it enabled preemption to read user space,
the migration thread would try to migrate it.

To solve this, when the loop fails to read user space without being
scheduled out, enabled and disable preemption with migration enabled. This
will allow the migration task to successfully migrate the task and the
next loop should succeed to read user space without being scheduled out.

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20260316130734.1858a998@gandalf.local.home
Fixes: 64cf7d058a005 ("tracing: Have trace_marker use per-cpu data to read user space")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

tracing: Revert "tracing: Remove pid in task_rename tracing output"

This reverts commit e3f6a42272e028c46695acc83fc7d7c42f2750ad.

The commit says that the tracepoint only deals with the current task,
however the following case is not current task:

comm_write() {
    p = get_proc_task(inode);
    if (!p)
        return -ESRCH;

    if (same_thread_group(current, p))
        set_task_comm(p, buffer);
}
where set_task_comm() calls __set_task_comm() which records
the update of p and not current.

So revert the patch to show pid.

Cc: <mhiramat@kernel.org>
Cc: <mathieu.desnoyers@efficios.com>
Cc: <elver@google.com>
Cc: <kees@kernel.org>
Link: https://patch.msgid.link/20260306075954.4533-1-xuewen.yan@unisoc.com
Fixes: e3f6a42272e0 ("tracing: Remove pid in task_rename tracing output")
Reported-by: Guohua Yan <guohua.yan@unisoc.com>
Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

selftests/bpf: Add a test cases for sync_linked_regs regarding zext propagation

Add multiple test cases for linked register tracking with alu32 ops:

  - Add a test that checks sync_linked_regs() regarding reg->id (the linked
    target register) for BPF_ADD_CONST32 rather than known_reg->id (the
    branch register).

  - Add a test case for linked register tracking that exposes the cross-type
    sync_linked_regs() bug. One register uses alu32 (w7 += 1, BPF_ADD_CONST32)
    and another uses alu64 (r8 += 2, BPF_ADD_CONST64), both linked to the
    same base register.

  - Add a test case that exercises regsafe() path pruning when two execution
    paths reach the same program point with linked registers carrying
    different ADD_CONST flags (BPF_ADD_CONST32 from alu32 vs BPF_ADD_CONST64
    from alu64). This particular test passes with and without the fix since
    the pruning will fail due to different ranges, but it would still be
    useful to carry this one as a regression test for the unreachable div
    by zero.

With the fix applied all the tests pass:

  # LDLIBS=-static PKG_CONFIG='pkg-config --static' ./vmtest.sh -- ./test_progs -t verifier_linked_scalars
  [...]
  ./test_progs -t verifier_linked_scalars
  #602/1   verifier_linked_scalars/scalars: find linked scalars:OK
  #602/2   verifier_linked_scalars/sync_linked_regs_preserves_id:OK
  #602/3   verifier_linked_scalars/scalars_neg:OK
  #602/4   verifier_linked_scalars/scalars_neg_sub:OK
  #602/5   verifier_linked_scalars/scalars_neg_alu32_add:OK
  #602/6   verifier_linked_scalars/scalars_neg_alu32_sub:OK
  #602/7   verifier_linked_scalars/scalars_pos:OK
  #602/8   verifier_linked_scalars/scalars_sub_neg_imm:OK
  #602/9   verifier_linked_scalars/scalars_double_add:OK
  #602/10  verifier_linked_scalars/scalars_sync_delta_overflow:OK
  #602/11  verifier_linked_scalars/scalars_sync_delta_overflow_large_range:OK
  #602/12  verifier_linked_scalars/scalars_alu32_big_offset:OK
  #602/13  verifier_linked_scalars/scalars_alu32_basic:OK
  #602/14  verifier_linked_scalars/scalars_alu32_wrap:OK
  #602/15  verifier_linked_scalars/scalars_alu32_zext_linked_reg:OK
  #602/16  verifier_linked_scalars/scalars_alu32_alu64_cross_type:OK
  #602/17  verifier_linked_scalars/scalars_alu32_alu64_regsafe_pruning:OK
  #602/18  verifier_linked_scalars/alu32_negative_offset:OK
  #602/19  verifier_linked_scalars/spurious_precision_marks:OK
  #602     verifier_linked_scalars:OK
  Summary: 1/19 PASSED, 0 SKIPPED, 0 FAILED

Co-developed-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260319211507.213816-2-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Fix sync_linked_regs regarding BPF_ADD_CONST32 zext propagation

Jenny reported that in sync_linked_regs() the BPF_ADD_CONST32 flag is
checked on known_reg (the register narrowed by a conditional branch)
instead of reg (the linked target register created by an alu32 operation).

Example case with reg:

  1. r6 = bpf_get_prandom_u32()
  2. r7 = r6 (linked, same id)
  3. w7 += 5 (alu32 -- r7 gets BPF_ADD_CONST32, zero-extended by CPU)
  4. if w6 < 0xFFFFFFFC goto safe (narrows r6 to [0xFFFFFFFC, 0xFFFFFFFF])
  5. sync_linked_regs() propagates to r7 but does NOT call zext_32_to_64()
  6. Verifier thinks r7 is [0x100000001, 0x100000004] instead of [1, 4]

Since known_reg above does not have BPF_ADD_CONST32 set above, zext_32_to_64()
is never called on alu32-derived linked registers. This causes the verifier
to track incorrect 64-bit bounds, while the CPU correctly zero-extends the
32-bit result.

The code checking known_reg->id was correct however (see scalars_alu32_wrap
selftest case), but the real fix needs to handle both directions - zext
propagation should be done when either register has BPF_ADD_CONST32, since
the linked relationship involves a 32-bit operation regardless of which
side has the flag.

Example case with known_reg (exercised also by scalars_alu32_wrap):

  1. r1 = r0; w1 += 0x100 (alu32 -- r1 gets BPF_ADD_CONST32)
  2. if r1 > 0x80 - known_reg = r1 (has BPF_ADD_CONST32), reg = r0 (doesn't)

Hence, fix it by checking for (reg->id | known_reg->id) & BPF_ADD_CONST32.

Moreover, sync_linked_regs() also has a soundness issue when two linked
registers used different ALU widths: one with BPF_ADD_CONST32 and the
other with BPF_ADD_CONST64. The delta relationship between linked registers
assumes the same arithmetic width though. When one register went through
alu32 (CPU zero-extends the 32-bit result) and the other went through
alu64 (no zero-extension), the propagation produces incorrect bounds.

Example:

  r6 = bpf_get_prandom_u32()     // fully unknown
  if r6 >= 0x100000000 goto out  // constrain r6 to [0, U32_MAX]
  r7 = r6
  w7 += 1                        // alu32: r7.id = N | BPF_ADD_CONST32
  r8 = r6
  r8 += 2                        // alu64: r8.id = N | BPF_ADD_CONST64
  if r7 < 0xFFFFFFFF goto out    // narrows r7 to [0xFFFFFFFF, 0xFFFFFFFF]

At the branch on r7, sync_linked_regs() runs with known_reg=r7
(BPF_ADD_CONST32) and reg=r8 (BPF_ADD_CONST64). The delta path
computes:

  r8 = r7 + (delta_r8 - delta_r7) = 0xFFFFFFFF + (2 - 1) = 0x100000000

Then, because known_reg->id has BPF_ADD_CONST32, zext_32_to_64(r8) is
called, truncating r8 to [0, 0]. But r8 used a 64-bit ALU op -- the
CPU does NOT zero-extend it. The actual CPU value of r8 is
0xFFFFFFFE + 2 = 0x100000000, not 0. The verifier now underestimates
r8's 64-bit bounds, which is a soundness violation.

Fix sync_linked_regs() by skipping propagation when the two registers
have mixed ALU widths (one BPF_ADD_CONST32, the other BPF_ADD_CONST64).

Lastly, fix regsafe() used for path pruning: the existing checks used
"& BPF_ADD_CONST" to test for offset linkage, which treated
BPF_ADD_CONST32 and BPF_ADD_CONST64 as equivalent.

Fixes: 7a433e519364 ("bpf: Support negative offsets, BPF_SUB, and alu32 for linked register tracking")
Reported-by: Jenny Guanni Qu <qguanni@gmail.com>
Co-developed-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260319211507.213816-1-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Merge branch 'bpf-fix-unsound-scalar-forking-for-bpf_or'

Daniel Wade says:

====================
bpf: Fix unsound scalar forking for BPF_OR

maybe_fork_scalars() unconditionally sets the pushed path dst register
to 0 for both BPF_AND and BPF_OR.  For AND this is correct (0 & K == 0),
but for OR it is wrong (0 | K == K, not 0).  This causes the verifier to
track an incorrect value on the pushed path, leading to a verifier/runtime
divergence that allows out-of-bounds map value access.

v4: Use block comment style for multi-line comments in selftests (Amery Hung)
    Add Reviewed-by/Acked-by tags
v3: Use single-line comment style in selftests (Alexei Starovoitov)
v2: Use push_stack(env, env->insn_idx, ...) to re-execute the insn
    on the pushed path (Eduard Zingerman)
====================

Link: https://patch.msgid.link/20260314021521.128361-1-danjwade95@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>