git.ipfire.org Git - thirdparty/kernel/stable.git/log

LoongArch: BPF: Sign-extend struct ops return values properly

The ns_bpf_qdisc selftest triggers a kernel panic:

  Oops[#1]:
  CPU 0 Unable to handle kernel paging request at virtual address 0000000000741d58, era == 90000000851b5ac0, ra == 90000000851b5aa4
  CPU: 0 UID: 0 PID: 449 Comm: test_progs Tainted: G           OE       6.16.0+ #3 PREEMPT(full)
  Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
  Hardware name: QEMU QEMU Virtual Machine, BIOS unknown 2/2/2022
  pc 90000000851b5ac0 ra 90000000851b5aa4 tp 90000001076b8000 sp 90000001076bb600
  a0 0000000000741ce8 a1 0000000000000001 a2 90000001076bb5c0 a3 0000000000000008
  a4 90000001004c4620 a5 9000000100741ce8 a6 0000000000000000 a7 0100000000000000
  t0 0000000000000010 t1 0000000000000000 t2 9000000104d24d30 t3 0000000000000001
  t4 4f2317da8a7e08c4 t5 fffffefffc002f00 t6 90000001004c4620 t7 ffffffffc61c5b3d
  t8 0000000000000000 u0 0000000000000001 s9 0000000000000050 s0 90000001075bc800
  s1 0000000000000040 s2 900000010597c400 s3 0000000000000008 s4 90000001075bc880
  s5 90000001075bc8f0 s6 0000000000000000 s7 0000000000741ce8 s8 0000000000000000
     ra: 90000000851b5aa4 __qdisc_run+0xac/0x8d8
    ERA: 90000000851b5ac0 __qdisc_run+0xc8/0x8d8
   CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
   PRMD: 00000004 (PPLV0 +PIE -PWE)
   EUEN: 00000007 (+FPE +SXE +ASXE -BTE)
   ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
  ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0)
   BADV: 0000000000741d58
   PRID: 0014c010 (Loongson-64bit, Loongson-3A5000)
  Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod(OE)]
  Process test_progs (pid: 449, threadinfo=000000009af02b3a, task=00000000e9ba4956)
  Stack : 0000000000000000 90000001075bc8ac 90000000869524a8 9000000100741ce8
          90000001075bc800 9000000100415300 90000001075bc8ac 0000000000000000
          900000010597c400 900000008694a000 0000000000000000 9000000105b59000
          90000001075bc800 9000000100741ce8 0000000000000050 900000008513000c
          9000000086936000 0000000100094d4c fffffff400676208 0000000000000000
          9000000105b59000 900000008694a000 9000000086bf0dc0 9000000105b59000
          9000000086bf0d68 9000000085147010 90000001075be788 0000000000000000
          9000000086bf0f98 0000000000000001 0000000000000010 9000000006015840
          0000000000000000 9000000086be6c40 0000000000000000 0000000000000000
          0000000000000000 4f2317da8a7e08c4 0000000000000101 4f2317da8a7e08c4
          ...
  Call Trace:
  [<90000000851b5ac0>] __qdisc_run+0xc8/0x8d8
  [<9000000085130008>] __dev_queue_xmit+0x578/0x10f0
  [<90000000853701c0>] ip6_finish_output2+0x2f0/0x950
  [<9000000085374bc8>] ip6_finish_output+0x2b8/0x448
  [<9000000085370b24>] ip6_xmit+0x304/0x858
  [<90000000853c4438>] inet6_csk_xmit+0x100/0x170
  [<90000000852b32f0>] __tcp_transmit_skb+0x490/0xdd0
  [<90000000852b47fc>] tcp_connect+0xbcc/0x1168
  [<90000000853b9088>] tcp_v6_connect+0x580/0x8a0
  [<90000000852e7738>] __inet_stream_connect+0x170/0x480
  [<90000000852e7a98>] inet_stream_connect+0x50/0x88
  [<90000000850f2814>] __sys_connect+0xe4/0x110
  [<90000000850f2858>] sys_connect+0x18/0x28
  [<9000000085520c94>] do_syscall+0x94/0x1a0
  [<9000000083df1fb8>] handle_syscall+0xb8/0x158

  Code: 4001ad80  2400873f  2400832d <240073cc> 001137ff  001133ff  6407b41f  001503cc  0280041d

  ---[ end trace 0000000000000000 ]---

The bpf_fifo_dequeue prog returns an skb, which is a pointer. The
epilogue treats the pointer as a 32-bit value and sign-extends it to
64 bits. This behavior is right for most BPF prog types but wrong for
struct ops, whose return values must follow the LoongArch ABI.

So let's sign extend struct ops return values according to the LoongArch
ABI ([1]) and return value spec in function model.

[1]: https://loongson.github.io/LoongArch-Documentation/LoongArch-ELF-ABI-EN.html
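
As a rough user-space illustration of the failure (not the JIT code
itself), the sketch below applies the 32-bit sign extension described
above to the value in register a5 of the oops, which looks like the
untruncated skb pointer, and reproduces the bogus value seen in a0/s7.
The helper names are hypothetical, and the handling of sizes and
signedness is a simplified reading of the ABI, not a quote from it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Generic epilogue behaviour: keep the low 32 bits, sign-extend to 64.
 * The narrowing cast relies on the usual two's-complement wrap-around. */
static uint64_t ret_as_int32(uint64_t reg)
{
	return (uint64_t)(int64_t)(int32_t)(uint32_t)reg;
}

/* Hypothetical helper: extend a return value according to the size and
 * signedness that a function model would record for it. */
static uint64_t extend_ret(uint64_t reg, unsigned int size, bool is_signed)
{
	assert(size == 1 || size == 2 || size == 4 || size == 8);

	switch (size) {
	case 1:
		return is_signed ? (uint64_t)(int64_t)(int8_t)(uint8_t)reg
				 : (uint64_t)(uint8_t)reg;
	case 2:
		return is_signed ? (uint64_t)(int64_t)(int16_t)(uint16_t)reg
				 : (uint64_t)(uint16_t)reg;
	case 4:
		/* Assumption: 32-bit values stay sign-extended in registers. */
		return ret_as_int32(reg);
	default:
		/* 8 bytes: pointers and 64-bit integers pass through. */
		return reg;
	}
}
```

With the register dump above, ret_as_int32(0x9000000100741ce8) yields
0x0000000000741ce8, and the faulting address 0x741d58 is that value
plus 0x70. Treating the return value as an 8-byte quantity leaves the
pointer intact.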

Cc: stable@vger.kernel.org
Fixes: 6abf17d690d8 ("LoongArch: BPF: Add struct ops support for trampoline")
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: BPF: Make error handling robust in arch_prepare_bpf_trampoline()

Bail out instead of trying to perform a bpf_arch_text_copy() if
__arch_prepare_bpf_trampoline() failed.
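
The control flow being described can be sketched in user space as
follows. All function names here are hypothetical stand-ins, not the
kernel functions; the only point is that the copy step is skipped when
the generate step reports an error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Counts how often the (stand-in) copy step ran. */
static int copies_done;

/* Stand-in for the generate step: fills buf or fails on request. */
static int generate_image(unsigned char *buf, size_t size, int fail)
{
	if (fail)
		return -ENOMEM;
	memset(buf, 0, size);
	return (int)size;
}

/* Stand-in for the copy step. */
static void copy_image(unsigned char *dst, const unsigned char *src, size_t size)
{
	memcpy(dst, src, size);
	copies_done++;
}

/* Returns the image size on success or a negative errno on failure. */
static int prepare_trampoline(unsigned char *dst, size_t size, int fail)
{
	unsigned char tmp[64];
	int ret;

	assert(size <= sizeof(tmp));

	ret = generate_image(tmp, size, fail);
	if (ret < 0)
		return ret;	/* bail out: tmp holds no valid image */

	copy_image(dst, tmp, (size_t)ret);
	return ret;
}
```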

Cc: stable@vger.kernel.org
Tested-by: Vincent Li <vincent.mc.li@gmail.com>
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: BPF: Make trampoline size stable

When attaching fentry/fexit BPF programs, __arch_prepare_bpf_trampoline()
is called twice, each time with a different `struct bpf_tramp_image *im`:

    bpf_trampoline_update()
        -> arch_bpf_trampoline_size()
            -> __arch_prepare_bpf_trampoline()
        -> arch_prepare_bpf_trampoline()
            -> __arch_prepare_bpf_trampoline()

Using move_imm() emits instruction sequences whose length depends on the
immediate, so the two calls can produce different sizes. Use move_addr()
instead to prevent subtle bugs.
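
To see why a value-dependent sequence makes the size unstable, consider
the toy model below. It is NOT the real LoongArch encoder: it simply
assumes one instruction per non-zero chunk of the immediate (bits 11:0,
31:12, 51:32 and 63:52) and compares that with always emitting four
instructions. The function names and the sample pointers are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Toy variable-length strategy: skip chunks that are zero. */
static int variable_len_insns(uint64_t imm)
{
	int n = 0;

	n += (imm & 0xfffULL) != 0;
	n += ((imm >> 12) & 0xfffffULL) != 0;
	n += ((imm >> 32) & 0xfffffULL) != 0;
	n += ((imm >> 52) & 0xfffULL) != 0;
	return n ? n : 1;	/* loading zero still takes one instruction */
}

/* Fixed-length strategy: always emit all four instructions. */
static int fixed_len_insns(uint64_t imm)
{
	(void)imm;
	return 4;
}

/* Size in bytes of a stub that loads one pointer (4-byte instructions). */
static int stub_size(uint64_t ptr, int (*count)(uint64_t))
{
	int insns = count(ptr);

	assert(insns >= 1 && insns <= 4);
	return insns * 4;
}
```

In this model two different image pointers give two different sizes
with the variable-length strategy, while the fixed-length strategy
reports the same size in the sizing pass and in the emitting pass.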

(I observed this while debugging other issues with printk.)

Cc: stable@vger.kernel.org
Tested-by: Vincent Li <vincent.mc.li@gmail.com>
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: BPF: Don't align trampoline size

Currently, arch_alloc_bpf_trampoline() uses bpf_prog_pack_alloc(), which
packs multiple trampolines into a huge page. So there is no need to
assume the trampoline size is page aligned.

Cc: stable@vger.kernel.org
Tested-by: Vincent Li <vincent.mc.li@gmail.com>
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: BPF: No support of struct argument in trampoline programs

The current implementation does not support struct arguments. This
causes an oops when running the BPF selftest:

  $ ./test_progs -a tracing_struct
  Oops[#1]:
  CPU -1 Unable to handle kernel paging request at virtual address 0000000000000018, era == 9000000085bef268, ra == 90000000844f3938
  rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
  rcu:     1-...0: (19 ticks this GP) idle=1094/1/0x4000000000000000 softirq=1380/1382 fqs=801
  rcu:     (detected by 0, t=5252 jiffies, g=1197, q=52 ncpus=4)
  Sending NMI from CPU 0 to CPUs 1:
  rcu: rcu_preempt kthread starved for 2495 jiffies! g1197 f0x0 RCU_GP_DOING_FQS(6) ->state=0x0 ->cpu=2
  rcu:     Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
  rcu: RCU grace-period kthread stack dump:
  task:rcu_preempt     state:I stack:0     pid:15    tgid:15    ppid:2      task_flags:0x208040 flags:0x00000800
  Stack : 9000000100423e80 0000000000000402 0000000000000010 90000001003b0680
          9000000085d88000 0000000000000000 0000000000000040 9000000087159350
          9000000085c2b9b0 0000000000000001 900000008704a000 0000000000000005
          00000000ffff355b 00000000ffff355b 0000000000000000 0000000000000004
          9000000085d90510 0000000000000000 0000000000000002 7b5d998f8281e86e
          00000000ffff355c 7b5d998f8281e86e 000000000000003f 9000000087159350
          900000008715bf98 0000000000000005 9000000087036000 900000008704a000
          9000000100407c98 90000001003aff80 900000008715c4c0 9000000085c2b9b0
          00000000ffff355b 9000000085c33d3c 00000000000000b4 0000000000000000
          9000000007002150 00000000ffff355b 9000000084615480 0000000007000002
          ...
  Call Trace:
  [<9000000085c2a868>] __schedule+0x410/0x1520
  [<9000000085c2b9ac>] schedule+0x34/0x190
  [<9000000085c33d38>] schedule_timeout+0x98/0x140
  [<90000000845e9120>] rcu_gp_fqs_loop+0x5f8/0x868
  [<90000000845ed538>] rcu_gp_kthread+0x260/0x2e0
  [<900000008454e8a4>] kthread+0x144/0x238
  [<9000000085c26b60>] ret_from_kernel_thread+0x28/0xc8
  [<90000000844f20e4>] ret_from_kernel_thread_asm+0xc/0x88

  rcu: Stack dump where RCU GP kthread last ran:
  Sending NMI from CPU 0 to CPUs 2:
  NMI backtrace for cpu 2 skipped: idling at idle_exit+0x0/0x4

Reject it for now.
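
A rejection of this kind can be sketched as a scan over the argument
flags of a function model. The structure, the flag name and the error
code below are simplified, hypothetical stand-ins chosen for a
user-space example; they are not the kernel definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_ARGS	12
#define ARG_IS_STRUCT	0x1	/* hypothetical flag: argument is a struct */

/* Hypothetical, simplified mirror of a function model. */
struct func_model {
	uint8_t nr_args;
	uint8_t arg_flags[MAX_ARGS];
};

/* Return 0 if every argument is supported, -ENOTSUP otherwise. */
static int check_args(const struct func_model *m)
{
	assert(m->nr_args <= MAX_ARGS);

	for (int i = 0; i < m->nr_args; i++) {
		if (m->arg_flags[i] & ARG_IS_STRUCT)
			return -ENOTSUP;
	}
	return 0;
}
```

Failing early with an error means the attach attempt is refused, rather
than a trampoline being generated that mishandles the argument.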

Cc: stable@vger.kernel.org
Fixes: f9b6b41f0cf3 ("LoongArch: BPF: Add basic bpf trampoline support")
Tested-by: Vincent Li <vincent.mc.li@gmail.com>
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: BPF: No text_poke() for kernel text

The current implementation of bpf_arch_text_poke() requires 5 nops at
the patch site, which is not applicable to kernel/module functions
because LoongArch reserves ONLY 2 nops at the function entry. With
CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS=y, this can be done by ftrace
instead.

See the following commits for details:
* commit b91e014f078e ("bpf: Make BPF trampoline use register_ftrace_direct() API")
* commit 9cdc3b6a299c ("LoongArch: ftrace: Add direct call support")

Cc: stable@vger.kernel.org
Tested-by: Vincent Li <vincent.mc.li@gmail.com>
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: BPF: Remove duplicated bpf_flush_icache()

The bpf_flush_icache() is called by bpf_arch_text_copy() already. So
remove it. This has been done in arm64 and riscv.

Cc: stable@vger.kernel.org
Tested-by: Vincent Li <vincent.mc.li@gmail.com>
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: BPF: Remove duplicated flags check

The check for (BPF_TRAMP_F_ORIG_STACK | BPF_TRAMP_F_SHARE_IPMODIFY) is
duplicated in __arch_prepare_bpf_trampoline(). Remove it.

While at it, make sure stack_size and nargs are initialized once.

Cc: stable@vger.kernel.org
Tested-by: Vincent Li <vincent.mc.li@gmail.com>
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: BPF: Fix uninitialized symbol 'retval_off'

In __arch_prepare_bpf_trampoline(), retval_off is meaningful only when
save_ret is not 0, so the current logic is correct. But it may cause a
build warning:

arch/loongarch/net/bpf_jit.c:1547 __arch_prepare_bpf_trampoline() error: uninitialized symbol 'retval_off'.

So initialize retval_off unconditionally to fix it.

Cc: stable@vger.kernel.org
Fixes: f9b6b41f0cf3 ("LoongArch: BPF: Add basic bpf trampoline support")
Closes: https://lore.kernel.org/r/202508191020.PBBh07cK-lkp@intel.com/
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: BPF: Optimize sign-extension mov instructions

The 8-bit and 16-bit sign-extension mov instructions can use the native
instructions ext.w.b and ext.w.h directly. There is no need to go
through the temporary t1 register, so remove the redundant operations.

Here are the test results:

  # modprobe test_bpf test_range=81,84
  # dmesg -t | tail -5
  test_bpf: #81 ALU_MOVSX | BPF_B jited:1 5 PASS
  test_bpf: #82 ALU_MOVSX | BPF_H jited:1 5 PASS
  test_bpf: #83 ALU64_MOVSX | BPF_B jited:1 5 PASS
  test_bpf: #84 ALU64_MOVSX | BPF_H jited:1 5 PASS
  test_bpf: Summary: 4 PASSED, 0 FAILED, [4/4 JIT'ed]

Acked-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: Handle new atomic instructions for probes

The atomic instructions sc.q, llacq.{w/d} and screl.{w/d} were newly
added in the LoongArch Reference Manual v1.10. They must be handled in
insns_not_supported() to avoid putting a breakpoint in the middle of an
ll/sc atomic sequence, which would otherwise loop forever for kprobes
and uprobes.

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: Try VMA lock-based page fault handling first

Attempt VMA lock-based page fault handling first, and fall back to the
existing mmap_lock-based handling if that fails.

The "ebizzy -mTRp" test on Loongson-3A6000 shows that PER_VMA_LOCK can
improve the benchmark by about 17.9% (97837.7 to 115430.8).

This is the LoongArch variant of "x86/mm: try VMA lock-based page fault
handling first".

Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: Automatically disable kaslr if boot from kexec_file

Automatically disable kaslr when the kernel loads from kexec_file.

kexec_file loads the secondary kernel image to a non-linked address,
inherently providing KASLR-like randomization.

However, on LoongArch where System RAM may be non-contiguous, enabling
KASLR for the second kernel may relocate it to an invalid memory region
and cause a boot failure. Thus, we disable KASLR when "kexec_file" is
detected in the command line.
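
A detection of this kind can be sketched as a plain string search over
the command line. The function name is hypothetical, and requiring the
match to start a word (so that a longer parameter name ending in
"kexec_file" is ignored) is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if "kexec_file" appears in cmdline at the start of a word. */
static bool cmdline_has_kexec_file(const char *cmdline)
{
	const char *p = cmdline;

	assert(cmdline != NULL);

	while ((p = strstr(p, "kexec_file")) != NULL) {
		if (p == cmdline || p[-1] == ' ')
			return true;
		p++;	/* embedded in another word, keep searching */
	}
	return false;
}
```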

To ensure compatibility with older kernels loaded via kexec_file, this
patch should be backported to stable branches.

Cc: stable@vger.kernel.org
Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: Add crash dump support for kexec_file

Enabling crash dump (kdump) includes:
- Prepare contents of ELF header of a core dump file, /proc/vmcore,
  using crash_prepare_elf64_headers().
- Add "mem=size@start" parameter to the command line and pass it to the
  capture kernel. This limits the runtime memory area of the capture
  kernel to avoid disrupting the production kernel's runtime state.
- Add "elfcorehdr=size@start" parameter to the cmdline.
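
The two command line additions can be sketched as string formatting.
Only the "size@start" shape comes from the list above; the helper name,
the hexadecimal formatting and the sample addresses used with it are
assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append " <key>=<size>@<start>" to cmdline; return 0 or -1 if it
 * does not fit in the len-byte buffer. */
static int append_region(char *cmdline, size_t len, const char *key,
			 unsigned long long size, unsigned long long start)
{
	size_t used = strlen(cmdline);
	int n;

	assert(used < len);

	n = snprintf(cmdline + used, len - used, " %s=0x%llx@0x%llx",
		     key, size, start);
	if (n < 0 || (size_t)n >= len - used) {
		cmdline[used] = '\0';	/* undo a truncated append */
		return -1;
	}
	return 0;
}
```

Calling it once with "mem" and once with "elfcorehdr" produces a command
line of the form "... mem=0x...@0x... elfcorehdr=0x...@0x...".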

The basic usage for kdump (add the cmdline parameter crashkernel=512M
to grub.cfg for production kernel):

1) Load capture kernel image (vmlinux.efi or vmlinux can both be used):
# kexec -s -p vmlinuz.efi --initrd=initrd.img --reuse-cmdline

2) Do something to crash, like:
# echo c > /proc/sysrq-trigger

Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: Add ELF binary support for kexec_file

This patch creates kexec_elf_ops to load ELF binary file for
kexec_file_load() syscall.

However, for `kbuf->memsz` and `kbuf->buf_min`, special handling is
required, and the generic `kexec_elf_load()` cannot be used directly.

$ readelf -l vmlinux
...
   Type           Offset             VirtAddr           PhysAddr
                  FileSiz            MemSiz              Flags Align
   LOAD           0x0000000000010000 0x9000000000200000 0x9000000000200000
                  0x0000000002747a00 0x000000000287a0d8  RWE 0x10000
   NOTE           0x0000000000000000 0x0000000000000000 0x0000000000000000
                  0x0000000000000000 0x0000000000000000  R      0x8

phdr->p_paddr should have been a physical address, but it is a virtual
address on the current LoongArch. This causes kexec_file to fail when
loading the kernel, so it needs to be converted to a physical address.

From the above MemSiz, it can be seen that 0x287a0d8 isn't page aligned.
Although kexec_add_buffer() will perform PAGE_SIZE alignment on
kbuf->memsz, the loaded kernel space and the initrd space still overlap,
and parsing the initrd fails when starting the second kernel.

The linker script vmlinux.lds.S contains:
    BSS_SECTION(0, SZ_64K, 8)
    . = ALIGN(PECOFF_SEGMENT_ALIGN);

The size therefore needs to be aligned to SZ_64K, so that after
alignment it is consistent with _kernel_asize.
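
The difference between the two alignments can be checked with a few
lines of arithmetic on the MemSiz value from the readelf output. The
16K page size below is an assumption for the example; the helper name
is made up.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_16K	0x4000ULL	/* assumed PAGE_SIZE for this example */
#define SZ_64K	0x10000ULL

/* Round x up to a multiple of a, where a is a power of two. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}
```

Under that assumption, align_up(0x287a0d8, SZ_16K) is 0x287c000 while
align_up(0x287a0d8, SZ_64K) is 0x2880000, so a merely page-aligned size
ends 0x4000 bytes before the 64K-aligned end of the image.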

The basic usage (vmlinux):

1) Load second kernel image:
# kexec -s -l vmlinux --initrd=initrd.img --reuse-cmdline

2) Startup second kernel:
# kexec -e

Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: Add EFI binary support for kexec_file

This patch creates kexec_efi_ops to load EFI binary file for
kexec_file_load() syscall.

The efi_kexec_load() has two parts:
- the first part loads the kernel image (vmlinuz.efi or vmlinux.efi)
- the second part loads other segments (e.g: initrd, cmdline, etc)

Currently, pez (vmlinuz.efi) and pei (vmlinux.efi) format images are
supported.

The basic usage (vmlinuz.efi or vmlinux.efi):

1) Load second kernel image:
# kexec -s -l vmlinuz.efi --initrd=initrd.img --reuse-cmdline

2) Startup second kernel:
# kexec -e

Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: Add preparatory infrastructure for kexec_file

Add some preparatory infrastructure:
- Add command line processing.
- Add support for loading other segments.
- Other minor modifications.

This initrd will be passed to the second kernel via the command line
'initrd=start,size'.

The 'kexec_file' command line parameter indicates that the kernel is
loaded via kexec_file.

Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: Add struct loongarch_image_header for kernel

Define a dedicated image header structure for LoongArch architecture to
standardize kernel loading in bootloaders (primarily for kexec_file).

This header includes critical metadata, such as PE/DOS signature, kernel
entry points, kernel image size and load address offset, etc.

Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: Allow specify SIMD width via kernel parameters

For power saving or debugging purposes, we sometimes want to limit SIMD
(LSX/LASX) usage on a feature-rich platform. So allow specifying the
SIMD width via the kernel parameter "simd=".

Allowed values of "simd=" are any integers, and recommended values are:
0: Disable all SIMD features;
128: Enable at most 128bit SIMD features;
256: Enable at most 256bit SIMD features;
-1(default): Enable as many as possible SIMD features automatically.
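
One way to read the values above is as an upper bound on the SIMD
width. The sketch below follows that reading; the type and function
names are hypothetical, and treating integers other than 0, 128, 256
and -1 as thresholds (and every negative value as "automatic") is an
assumption, not something stated in this message.

```c
#include <assert.h>
#include <stdbool.h>

struct simd_caps {
	bool lsx;	/* 128-bit SIMD */
	bool lasx;	/* 256-bit SIMD */
};

/* Limit the hardware capabilities by the requested width. */
static struct simd_caps limit_simd(struct simd_caps hw, int width)
{
	struct simd_caps out = hw;

	assert(!hw.lasx || hw.lsx);	/* assume 256-bit implies 128-bit */

	if (width < 0)
		return out;	/* -1: keep everything the hardware offers */
	if (width < 256)
		out.lasx = false;
	if (width < 128)
		out.lsx = false;
	return out;
}
```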

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: Init acpi_gbl_use_global_lock to false

Init acpi_gbl_use_global_lock to false, in order to avoid error messages
during the boot phase:

ACPI Error: Could not enable GlobalLock event (20240827/evxfevnt-182)
ACPI Error: No response from Global Lock hardware, disabling lock (20240827/evglock-59)

Fixes: 628c3bb40e9a8cefc0a6 ("LoongArch: Add boot and setup routines")
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: Fix build error for LTO with LLVM-18

Commit b15212824a01 ("LoongArch: Make LTO case independent in Makefile")
moves "KBUILD_LDFLAGS += -mllvm --loongarch-annotate-tablejump" out of
CONFIG_CC_HAS_ANNOTATE_TABLEJUMP, which breaks the build for LLVM-18, as
'--loongarch-annotate-tablejump' is unimplemented there:

ld.lld: error: -mllvm: ld.lld: Unknown command line argument '--loongarch-annotate-tablejump'.

Call ld-option to detect '--loongarch-annotate-tablejump' before use, so
as to fix the build error.

Fixes: b15212824a01 ("LoongArch: Make LTO case independent in Makefile")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org> # build
Suggested-by: WANG Rui <wangrui@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: Add cflag -fno-isolate-erroneous-paths-dereference

Currently, when compiling with GCC, there is no "break 7" instruction
for zero division due to using the option -mno-check-zero-division, but
the compiler still generates "break 0" instruction for zero division.

Here is a simple example:

  $ cat test.c
  int div(int a)
  {
          return a / 0;
  }
  $ gcc -O2 -S test.c -o test.s

GCC generates "break 0" on LoongArch and "ud2" on x86. objtool decodes
"ud2" as INSN_BUG for x86, so decoding "break 0" as INSN_BUG would fix
the objtool warnings for LoongArch, but this is not the intention.

When decoding "break 0" as INSN_TRAP in the previous commit, the aim is
to handle "break 0" as a trap. The generated "break 0" for zero division
by GCC is not proper, it should generate a break instruction with proper
bug type, so add the GCC option -fno-isolate-erroneous-paths-dereference
to avoid generating the unexpected "break 0" instruction for now.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/r/202509200413.7uihAxJ5-lkp@intel.com/
Fixes: baad7830ee9a ("objtool/LoongArch: Mark types based on break immediate code")
Suggested-by: WANG Rui <wangrui@loongson.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

Merge tag 'acpi-6.18-rc1' into loongarch-next

LoongArch architecture changes for 6.18 need acpica changes to handle
global lock initialization, so merge 'acpi-6.18-rc1' to create a base.

Merge branches 'acpi-apei', 'acpi-misc' and 'pnp'

Merge ACPI APEI updates, a miscellaneous update related to ACPI, and a
PNP update for 6.18-rc1:

- Remove redundant assignments in erst_dbg_{ioctl|write}() in the ACPI
   APEI driver (Thorsten Blum)

- Allow the ACPI APEI EINJ to handle more types of addresses than just
   MMIO (Jiaqi Yan)

- Use str_low_high() helper in two places in the ACPI code (Chelsy
   Ratnawat)

- Use str_plural() to simplify the PNP code (Xichao Zhao)

* acpi-apei:
  ACPI: APEI: EINJ: Allow more types of addresses except MMIO
  ACPI: APEI: Remove redundant assignments in erst_dbg_{ioctl|write}()

* acpi-misc:
  ACPI: Use str_low_high() helper in two places

* pnp:
  PNP: isapnp: use str_plural() to simplify the code

Merge branches 'acpi-thermal', 'acpi-fan', 'acpi-video', 'acpi-tad' and 'acpi-prm'

Merge an ACPI thermal zone driver update, an ACPI fan driver update, an
ACPI backlight (video) driver update, an ACPI TAD (time and alarm
device) driver update, and an ACPI PRM (platform runtime mechanism)
driver update for 6.18-rc1:

- Eliminate a dummy local variable from the ACPI thermal driver (Rafael
   Wysocki)

- Fold two simple functions into their only caller in the ACPI fan
   driver (Rafael Wysocki)

- Force native backlight on Lenovo 82K8 in the ACPI backlight (video)
   driver (Mario Limonciello)

- Add missing sysfs_remove_group() for ACPI_TAD_RT (Daniel Tang)

- Skip PRM handlers with NULL handler_address or NULL VA in the ACPI
   PRM driver (Shang song)

* acpi-thermal:
  ACPI: thermal: Get rid of a dummy local variable

* acpi-fan:
  ACPI: fan: Fold two simple functions into their only caller

* acpi-video:
  ACPI: video: force native for Lenovo 82K8

* acpi-tad:
  ACPI: TAD: Add missing sysfs_remove_group() for ACPI_TAD_RT

* acpi-prm:
  ACPI: PRM: Skip handlers with NULL handler_address or NULL VA

Merge branches 'acpi-property', 'acpi-resource', 'acpi-pm' and 'acpi-tables'

Merge updates of the ACPI device properties management code, ACPI
resources management code, ACPI power management, and ACPI data tables
parsing code for 6.18-rc1:

- Fix ACPI buffer properties extraction for data-only subnodes
   represented as _DSD-equivalent packages (Rafael Wysocki)

- Fix handling of ACPI data-only subnodes represented as _DSD-equivalent
   packages in the case when they are embedded in larger _DSD-equivalent
   packages and clean up acpi_nondev_subnode_extract() (Rafael Wysocki)

- Skip ACPI IRQ override on ASUS Vivobook Pro N6506CU (Sam van Kampen)

- Add power resource init function and use it for introducing an HP
   EliteBook 855 G7 WWAN modem power resource quirk (Maciej Szmigiero)

- Add support for DBG2 RISC-V SBI port subtype and Precise Baud Rate
   field to the ACPI SPCR table parser (Chen Pei)

* acpi-property:
  ACPI: property: Adjust failure handling in acpi_nondev_subnode_extract()
  ACPI: property: Do not pass NULL handles to acpi_attach_data()
  ACPI: property: Add code comments explaining what is going on
  ACPI: property: Disregard references in data-only subnode lists
  ACPI: property: Fix buffer properties extraction for subnodes

* acpi-resource:
  ACPI: resource: Skip IRQ override on ASUS Vivobook Pro N6506CU

* acpi-pm:
  ACPI: PM: Add HP EliteBook 855 G7 WWAN modem power resource quirk
  ACPI: PM: Add power resource init function

* acpi-tables:
  ACPI: SPCR: Support Precise Baud Rate field
  ACPI: SPCR: Add support for DBG2 RISC-V SBI port subtype

Merge branches 'acpi-scan', 'acpi-processor' and 'acpi-sysfs'

Merge an ACPI device enumeration update, ACPI processor driver updates,
and an ACPI sysfs-related code update for 6.18-rc1:

- Add Intel CVS ACPI HIDs to acpi_ignore_dep_ids[] so it is not
   regarded as real dependency (Hans de Goede)

- Use ACPI_FREE() for freeing an ACPI object in description_show() in
   the ACPI sysfs-related code (Kaushlendra Kumar)

- Fix memory leak in the ACPI processor idle driver registration error
   code path and optimize ACPI idle driver registration (Huisong Li,
   Rafael Wysocki)

- Add module import namespace to the ACPI processor idle driver (Rafael
   Wysocki)

- Eliminate static variable flat_state_cnt from the ACPI processor idle
   driver (Rafael Wysocki)

- Release cpufreq policy references using __free() in the ACPI
   processor thermal driver (Zihuan Zhang)

- Remove unused empty stubs of some functions and rearrange function
   declarations in a header file in the ACPI processor driver (Huisong
   Li)

- Redefine two functions as void in the ACPI processor driver (Rafael
   Wysocki)

- Do not expose global variable acpi_idle_driver in the ACPI processor
   driver (Huisong Li)

* acpi-scan:
  ACPI: scan: Add Intel CVS ACPI HIDs to acpi_ignore_dep_ids[]

* acpi-processor:
  ACPI: processor: Do not expose global variable acpi_idle_driver
  ACPI: processor: idle: Redefine two functions as void
  ACPI: processor: Update cpuidle driver check in __acpi_processor_start()
  ACPI: processor: idle: Rearrange declarations in header file
  ACPI: processor: Remove unused empty stubs of some functions
  ACPI: processor: thermal: Release policy references using __free()
  ACPI: processor: idle: Fix function defined but not used warning
  ACPI: processor: idle: Eliminate static variable flat_state_cnt
  ACPI: processor: idle: Add module import namespace
  ACPI: processor: idle: Optimize ACPI idle driver registration
  ACPI: processor: idle: Fix memory leak when register cpuidle device failed

* acpi-sysfs:
  ACPI: sysfs: Use ACPI_FREE() for freeing an ACPI object

Merge branch 'acpica'

Merge ACPICA updates (20250807 release material with a few fixes on top)
for 6.18-rc1:

- Add SoundWire File Table (SWFT) signature to ACPICA (Maciej Strozek)

- Rearrange local variable definition involving #ifdef in ACPICA to
   avoid using uninitialized variables (Zhe Qiao)

- Allow ACPICA to skip Global Lock initialization (Huacai Chen)

- Apply ACPI_NONSTRING in more places in ACPICA and fix two regressions
   related to incorrect ACPI_NONSTRING usage (Ahmed Salem)

- Fix printing CDAT table header when disassembling CDAT AML (Ahmed
   Salem)

- Use acpi_ds_clear_operands() in acpi_ds_call_control_method() in
   ACPICA (Hans de Goede)

- Update dsmethod.c in ACPICA to address unused variable warning (Saket
   Dumbre)

- Print error messages in ACPICA for too few or too many control method
   arguments (Saket Dumbre)

- Update ACPICA version to 20250807 (Saket Dumbre)

- Fix largest possible resource descriptor index in ACPICA (Dmitry
   Antipov)

- Add Back-Invalidate restriction to CXL Window for CEDT in ACPICA
   (Davidlohr Bueso).

- Add the package type to acceptable Arg3 types for _DSM in ACPICA
   because ACPI_TYPE_ANY does not cover it (Saket Dumbre)

- Fix return values in ap_is_valid_checksum() in the acpidump utility
   in ACPICA (Kaushlendra Kumar)

* acpica:
  ACPICA: acpidump: fix return values in ap_is_valid_checksum()
  ACPICA: ACPI_TYPE_ANY does not include the package type
  ACPICA: CEDT: Add Back-Invalidate restriction to CXL Window
  ACPICA: Fix largest possible resource descriptor index
  ACPICA: Update version to 20250807
  ACPICA: Print error messages for too few or too many arguments
  ACPICA: Update dsmethod.c to get rid of unused variable warning
  ACPICA: dispatcher: Use acpi_ds_clear_operands() in acpi_ds_call_control_method()
  ACPICA: Debugger: drop ACPI_NONSTRING attribute from name_seg
  ACPICA: acpidump: drop ACPI_NONSTRING attribute from file_name
  ACPICA: iASL: Fix printing CDAT table header
  ACPICA: Apply ACPI_NONSTRING
  ACPICA: Allow to skip Global Lock initialization
  ACPICA: Change the compilation conditions
  ACPICA: Remove redundant "#ifdef" definitions
  ACPICA: Modify variable definition position
  ACPICA: Add SoundWire File Table (SWFT) signature

Linux 6.17

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux

Pull ARM fix from Russell King:
"Just one fix to the module freeing function that was declared __weak
when it should not have been. Thanks to Petr Pavlu for spotting this"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux:
ARM: 9458/1: module: Ensure the override of module_arch_freeing_init()

Merge tag 'i2c-for-6.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:

- various MAINTAINERS updates

- fix an off-by-one error in riic

- fix k1 DT schema to allow validation

- rtl9300: fix faulty merge conflict resolution

* tag 'i2c-for-6.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: rtl9300: Drop unsupported I2C_FUNC_SMBUS_I2C_BLOCK
  MAINTAINERS: add entry for SpacemiT K1 I2C driver
  MAINTAINERS: Add me as maintainer of Synopsys DesignWare I2C driver
  MAINTAINERS: delete email for Tharun Kumar P
  dt-bindings: i2c: spacemit: extend and validate all properties
  i2c: riic: Allow setting frequencies lower than 50KHz
  MAINTAINERS: Remove myself as Synopsys DesignWare I2C maintainer
  MAINTAINERS: Update email address for Qualcomm's I2C GENI maintainers

Merge tag 'trace-v6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

- Fix buffer overflow in osnoise_cpu_write()

   The buffer allocated for reading from user space did not get a nul
   terminating byte after the string was copied from user space. The
   string is then parsed, and if user space did not add a nul byte, the
   read will continue beyond the string.

   Add a nul terminating byte after reading the string.

- Fix missing check for lockdown on tracing

   There's a path from kprobe events or uprobe events that can update
   the tracing system even if lockdown on tracing is active. Add a
   check in the dynamic event path.

- Add a recursion check for the function graph return path

   Now that fprobes can hook to the function graph tracer and call
   different code between the entry and the exit, the exit code may now
   call functions that are not called in entry. This means that the exit
   handler can possibly trigger recursion that is not caught and cause
   the system to crash.

   Add the same recursion checks in the function exit handler as exist
   in the entry handler path.
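
The nul-termination fix in the first item of the list above follows a
common pattern, sketched here in user space. The function name is
hypothetical and memcpy() stands in for the copy from user space; this
is not the kernel code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy count bytes of untrusted input and terminate the copy. Returns
 * a heap buffer the caller must free(), or NULL on allocation failure. */
static char *copy_terminated(const char *src, size_t count)
{
	char *buf;

	assert(src != NULL);

	buf = malloc(count + 1);	/* one extra byte for the terminator */
	if (!buf)
		return NULL;

	memcpy(buf, src, count);
	buf[count] = '\0';	/* without this, parsing may run past buf */
	return buf;
}
```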

* tag 'trace-v6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: fgraph: Protect return handler from recursion loop
  tracing: dynevent: Add a missing lockdown check on dynevent
  tracing/osnoise: Fix slab-out-of-bounds in _parse_integer_limit()

Merge tag 'spi-fix-v6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
"A few final driver specific fixes that have been sitting in -next for
  a bit.

  The OMAP issue is likely to come up very infrequently since mixed
  configuration SPI buses are rare and the Cadence issue is specific to
  SoCFPGA systems"

* tag 'spi-fix-v6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: omap2-mcspi: drive SPI_CLK on transfer_setup()
  spi: cadence-qspi: defer runtime support on socfpga if reset bit is enabled

Merge tag 'mm-hotfixes-stable-2025-09-27-22-35' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
"7 hotfixes. 4 are cc:stable and the remainder address post-6.16 issues
  or aren't considered necessary for -stable kernels. 6 of these fixes
  are for MM.

  All singletons, please see the changelogs for details"

* tag 'mm-hotfixes-stable-2025-09-27-22-35' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  include/linux/pgtable.h: convert arch_enter_lazy_mmu_mode() and friends to static inlines
  mm/damon/sysfs: do not ignore callback's return value in damon_sysfs_damon_call()
  mailmap: add entry for Bence Csókás
  fs/proc/task_mmu: check p->vec_buf for NULL
  kmsan: fix out-of-bounds access to shadow memory
  mm/hugetlb: fix copy_hugetlb_page_range() to use ->pt_share_count
  mm/hugetlb: fix folio is still mapped when deleted

i2c: rtl9300: Drop unsupported I2C_FUNC_SMBUS_I2C_BLOCK

While applying the patch for commit ede965fd555a ("i2c: rtl9300: remove
broken SMBus Quick operation support"), a conflict was incorrectly resolved
by adding the I2C_FUNC_SMBUS_I2C_BLOCK feature flag. However, the code to
handle I2C_SMBUS_I2C_BLOCK_DATA requests will only be added by a separate
commit.

Fixes: ede965fd555a ("i2c: rtl9300: remove broken SMBus Quick operation support")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>

MAINTAINERS: add entry for SpacemiT K1 I2C driver

Add a MAINTAINERS entry for the SpacemiT K1 I2C driver and its DT binding.

Signed-off-by: Troy Mitchell <troy.mitchell@linux.spacemit.com>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>

MAINTAINERS: Add me as maintainer of Synopsys DesignWare I2C driver

I volunteered as maintainer of the DesignWare I2C driver, so update my
entry from reviewer to maintainer.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>

MAINTAINERS: delete email for Tharun Kumar P

The email address bounced. I couldn't find a newer one in recent git history,
so delete this email entry.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>

Merge tag 'trace-tools-v6.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull rtla tool fixes from Steven Rostedt:

- Fix a buffer overflow in actions_parse()

   The "trigger_c" variable did not account for the nul byte when
   determining its size

- Fix a compare that had the values reversed

   actions_new() is supposed to reallocate when len is greater than
   the current size, but the compare was testing if size is greater than
   the new length
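
The nul byte issue in the first fix above can be illustrated with a
generic sketch. This is not the rtla code: copy_size() and
copy_checked() are hypothetical helpers, and the only point is that a
buffer holding a string of length n needs n + 1 bytes.

```c
#include <stddef.h>
#include <string.h>

/* Bytes needed to hold a copy of "s", including the terminating nul. */
static size_t copy_size(const char *s)
{
	return strlen(s) + 1;
}

/*
 * Copy "src" into "dst" only if it fits together with its nul byte.
 * Returns 0 on success, -1 if the destination is too small.
 */
static int copy_checked(char *dst, size_t dst_size, const char *src)
{
	if (copy_size(src) > dst_size)
		return -1;
	memcpy(dst, src, copy_size(src));
	return 0;
}
```

A 7-byte buffer is rejected for the 7-character string "trigger",
while an 8-byte buffer is accepted.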

* tag 'trace-tools-v6.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  rtla/actions: Fix condition for buffer reallocation
  rtla: Fix buffer overflow in actions_parse

tracing: fgraph: Protect return handler from recursion loop

function_graph_enter_regs() prevents itself from recursion by
ftrace_test_recursion_trylock(), but __ftrace_return_to_handler(),
which is called at the exit, does not prevent such recursion.
Therefore, while it can prevent recursive calls from
fgraph_ops::entryfunc(), it is not able to prevent recursive calls
to fgraph from fgraph_ops::retfunc(), resulting in a recursive loop.
This can lead to the unexpected recursion bug reported by Menglong.

is_endbr() is called in __ftrace_return_to_handler -> fprobe_return
-> kprobe_multi_link_exit_handler -> is_endbr.

To fix this issue, acquire ftrace_test_recursion_trylock() in
__ftrace_return_to_handler() after unwinding the shadow stack, so that
this section prevents recursive calls into fgraph from inside a
user-defined fgraph_ops::retfunc().

This is essentially a fix to commit 4346ba160409 ("fprobe: Rewrite
fprobe on function-graph tracer"), because before that fgraph was
only used from the function graph tracer. Fprobe allowed users to run
any callbacks from fgraph after that commit.
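
As a rough userspace model of the guard described above (not the
kernel code: recursion_trylock(), recursion_unlock(), return_handler()
and user_retfunc() are hypothetical stand-ins, and a single flag
replaces the kernel's per-context recursion bits), the exit handler
refuses to re-enter itself while a user callback is running:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's recursion-protection state. */
static bool in_handler;
static int retfunc_calls;
static int blocked_calls;

/* Returns true if the guard was taken, false if we are already inside. */
static bool recursion_trylock(void)
{
	if (in_handler)
		return false;
	in_handler = true;
	return true;
}

static void recursion_unlock(void)
{
	assert(in_handler);
	in_handler = false;
}

static void return_handler(void);

/* A user callback that calls a traced function, whose return is hooked too. */
static void user_retfunc(void)
{
	retfunc_calls++;
	return_handler();
}

/* Models the exit path: without the guard, this would recurse forever. */
static void return_handler(void)
{
	if (!recursion_trylock()) {
		blocked_calls++;
		return;
	}
	user_retfunc();
	recursion_unlock();
}
```

Calling return_handler() once runs the callback exactly once; the
nested call is rejected by the guard instead of looping.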

Reported-by: Menglong Dong <menglong8.dong@gmail.com>
Closes: https://lore.kernel.org/all/20250918120939.1706585-1-dongml2@chinatelecom.cn/
Fixes: 4346ba160409 ("fprobe: Rewrite fprobe on function-graph tracer")
Cc: stable@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/175852292275.307379.9040117316112640553.stgit@devnote2
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Menglong Dong <menglong8.dong@gmail.com>
Acked-by: Menglong Dong <menglong8.dong@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

rtla/actions: Fix condition for buffer reallocation

The condition to check if the actions buffer needs to be resized was
incorrect. The check `self->size >= self->len` would evaluate to
true on almost every call to `actions_new()`, causing the buffer to
be reallocated unnecessarily each time an action was added.

Fix the condition to `self->len >= self->size`, ensuring
that the buffer is only resized when it is actually full.
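
A minimal sketch of the corrected check, assuming a simplified
container (the struct layout, the doubling growth policy and the
"reallocs" counter are illustrative assumptions, not the rtla
implementation):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical simplified container; len/size follow the commit text. */
struct actions {
	int *list;
	int len;	/* slots in use */
	int size;	/* slots allocated */
	int reallocs;	/* how often the buffer was grown (for illustration) */
};

/* Returns a pointer to a fresh slot, or NULL on allocation failure. */
static int *actions_new(struct actions *self)
{
	/* Grow only when the buffer is actually full. */
	if (self->len >= self->size) {
		int new_size = self->size ? self->size * 2 : 1;
		int *p = realloc(self->list, new_size * sizeof(*self->list));

		if (!p)
			return NULL;
		self->list = p;
		self->size = new_size;
		self->reallocs++;
	}
	assert(self->len < self->size);
	return &self->list[self->len++];
}
```

With this condition, adding eight entries to an empty container grows
the buffer four times (1, 2, 4, 8 slots) rather than on every call.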

Cc: John Kacur <jkacur@redhat.com>
Cc: Luis Goncalves <lgoncalv@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Chang Yin <cyin@redhat.com>
Cc: Costa Shulyupin <costa.shul@redhat.com>
Cc: Crystal Wood <crwood@redhat.com>
Cc: Gabriele Monaco <gmonaco@redhat.com>
Link: https://lore.kernel.org/20250915181101.52513-1-wander@redhat.com
Fixes: 6ea082b171e00 ("rtla/timerlat: Add action on threshold feature")
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

rtla: Fix buffer overflow in actions_parse

Currently, tests 3 and 13-22 in tests/timerlat.t fail with error:

    *** buffer overflow detected ***: terminated
    timeout: the monitored command dumped core

The result of running `sudo make check` is

    tests/timerlat.t (Wstat: 0 Tests: 22 Failed: 11)
      Failed tests:  3, 13-22
    Files=3, Tests=34, 140 wallclock secs ( 0.07 usr  0.01 sys + 27.63 cusr
    27.96 csys = 55.67 CPU)
    Result: FAIL

Fix buffer overflow in actions_parse to avoid this error. After this
change, the test results are

    tests/hwnoise.t ... ok
    tests/osnoise.t ... ok
    tests/timerlat.t .. ok
    All tests successful.
    Files=3, Tests=34, 186 wallclock secs ( 0.06 usr  0.01 sys + 41.10 cusr
    44.38 csys = 85.55 CPU)
    Result: PASS

Link: https://lore.kernel.org/164ffc2ec8edacaf1295789dad82a07817b6263d.1757034919.git.ipravdin.official@gmail.com
Fixes: 6ea082b171e0 ("rtla/timerlat: Add action on threshold feature")
Signed-off-by: Ivan Pravdin <ipravdin.official@gmail.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

Merge tag 'riscv-for-linus-v6.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Paul Walmsley:

- A race-free implementation of pudp_huge_get_and_clear() (based on the
   x86 code)

- A MAINTAINERS update to my E-mail address

* tag 'riscv-for-linus-v6.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  MAINTAINERS: Update Paul Walmsley's E-mail address
  riscv: Use an atomic xchg in pudp_huge_get_and_clear()

Merge tag 'x86-urgent-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:
"Fix a CPU topology code regression that caused the mishandling of
  certain boot command line options, and re-enable CONFIG_PTDUMP on i386
  that was mistakenly turned off in the Kconfig"

* tag 'x86-urgent-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/topology: Implement topology_is_core_online() to address SMT regression
  x86/Kconfig: Reenable PTDUMP on i386

Merge tag 'sched-urgent-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Ingo Molnar:
"Fix two dl_server regressions: a race that can end up leaving the
  dl_server stuck, and a dl_server throttling bug causing lag to fair
  tasks"

* tag 'sched-urgent-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/deadline: Fix dl_server behaviour
  sched/deadline: Fix dl_server getting stuck

Merge tag 'locking-urgent-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking fixes from Ingo Molnar:
"Fix a PI-futexes race, and fix a copy_process() futex cleanup bug"

* tag 'locking-urgent-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  futex: Use correct exit on failure from futex_hash_allocate_default()
  futex: Prevent use-after-free during requeue-PI

Merge tag 'core-urgent-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull core fix from Ingo Molnar:
"Fix a CONFIG_CC_HAS_ASM_GOTO_OUTPUT=y bug on older Clang versions"

* tag 'core-urgent-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  kbuild: Disable CC_HAS_ASM_GOTO_OUTPUT on clang < 17

Merge tag 'v6.17rc7-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

- Fix unlink bug

- Fix potential out of bounds access in processing compound requests

* tag 'v6.17rc7-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb: client: fix wrong index reference in smb2_compound_op()
  smb: client: handle unlink(2) of files open by different clients

Merge tag 'vfs-6.17-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:

- Prevent double unlock in netfs

- Fix a NULL pointer dereference in afs_put_server()

- Fix a reference leak in netfs

* tag 'vfs-6.17-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  netfs: fix reference leak
  afs: Fix potential null pointer dereference in afs_put_server
  netfs: Prevent duplicate unlocking

Merge tag 'pmdomain-v6.17-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm

Pull pmdomain fix from Ulf Hansson:

- mediatek: Make sure MT8195 AUDIO power domain isn't left powered-on

* tag 'pmdomain-v6.17-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm:
  pmdomain: mediatek: set default off flag for MT8195 AUDIO power domain

Merge tag 'platform-drivers-x86-v6.17-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Ilpo Järvinen:
"Fixes and New HW Support

   - amd/pmc: Use 8042 quirk for Stellaris Slim Gen6 AMD

   - dell: Set USTT mode according to BIOS after reboot

   - dell-lis3lv02d: Add Latitude E6530

   - lg-laptop: Fix setting the fan mode"

* tag 'platform-drivers-x86-v6.17-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: lg-laptop: Fix WMAB call in fan_mode_store()
  platform/x86: dell-lis3lv02d: Add Latitude E6530
  platform/x86/dell: Set USTT mode according to BIOS after reboot
  platform/x86/amd/pmc: Add Stellaris Slim Gen6 AMD to spurious 8042 quirks list

Merge tag 'gpio-fixes-for-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

Pull gpio fixes from Bartosz Golaszewski:

- allow looking up GPIOs by the secondary firmware node too

- fix memory leak in gpio-regmap

* tag 'gpio-fixes-for-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpio: regmap: fix memory leak of gpio_regmap structure
  gpiolib: Extend software-node support to support secondary software-nodes

Merge tag 'block-6.17-20250925' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull block fixes from Jens Axboe:
"A regression fix for this series where an attempt to silence an EOD
  error got messed up a bit, and then a change of git trees for the
  block and io_uring trees.

  Switching the git trees to kernel.org now, as I've just about had it
  trying to battle AI bots that bring the box to its knees, continually.
  At least I don't have to maintain the kernel.org side"

* tag 'block-6.17-20250925' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  MAINTAINERS: update io_uring and block tree git trees
  block: fix EOD return for device with nr_sectors == 0

Merge tag 'drm-fixes-2025-09-26' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
"Weekly fixes, some fbcon font handling fixes, then amdgpu/xe/i915 with
  a few, and a few misc fixes for other drivers. Seems about right for
  this stage, and I don't know of anything outstanding.

  fbcon:
   - fix OOB access in font allocation
   - fix integer overflow in font handling

  amdgpu:
   - Backlight fix
   - DC preblend fix
   - DCN 3.5 fix
   - Cleanup output_tf_change

  xe:
   - Don't expose sysfs attributes not applicable for VFs
   - Fix build with CONFIG_MODULES=n
   - Don't copy pinned kernel bos twice on suspend

  i915:
   - Set O_LARGEFILE in __create_shmem()
   - Guard reg_val against a INVALID_TRANSCODER [ddi]

  ast:
   - sleeps causing cpu stall fix

  panthor:
   - scheduler race condition fix

  gma500:
   - NULL ptr deref in hdmi teardown fix"

* tag 'drm-fixes-2025-09-26' of https://gitlab.freedesktop.org/drm/kernel:
  drm/panthor: Defer scheduler entitiy destruction to queue release
  drm/amd/display: remove output_tf_change flag
  drm/amd/display: Init DCN35 clocks from pre-os HW values
  drm/amd/display: Use mpc.preblend flag to indicate preblend
  drm/amd/display: Only restore backlight after amdgpu_dm_init or dm_resume
  fbcon: Fix OOB access in font allocation
  drm/i915/ddi: Guard reg_val against a INVALID_TRANSCODER
  drm/i915: set O_LARGEFILE in __create_shmem()
  drm/xe: Don't copy pinned kernel bos twice on suspend
  drm/xe: Fix build with CONFIG_MODULES=n
  drm/xe/vf: Don't expose sysfs attributes not applicable for VFs
  fbcon: fix integer overflow in fbcon_do_set_font
  drm/gma500: Fix null dereference in hdmi teardown
  drm/ast: Use msleep instead of mdelay for edid read

smb: client: fix wrong index reference in smb2_compound_op()

In smb2_compound_op(), the loop that processes each command's response
uses wrong indices when accessing response buffers.

This incorrect indexing leads to improper handling of command results.
Also, if the incorrectly computed index is greater than or equal to
MAX_COMPOUND, it can cause out-of-bounds accesses.

Fixes: 3681c74d342d ("smb: client: handle lack of EA support in smb2_query_path_info()") # 6.14
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Sang-Heon Jeon <ekffu200098@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

netfs: fix reference leak

Commit 20d72b00ca81 ("netfs: Fix the request's work item to not
require a ref") modified netfs_alloc_request() to initialize the
reference counter to 2 instead of 1.  The rationale was that the
request's "work" would release the second reference after completion
(via netfs_{read,write}_collection_worker()).  That works most of the
time if all goes well.

However, it leaks this additional reference if the request is released
before the I/O operation has been submitted: the error code path only
decrements the reference counter once and the work item will never be
queued because there will never be a completion.

This has caused outages of our whole server cluster today because
tasks were blocked in netfs_wait_for_outstanding_io(), leading to
deadlocks in Ceph (another bug that I will address soon in another
patch).  This was caused by a netfs_pgpriv2_begin_copy_to_cache() call
which failed in fscache_begin_write_operation().  The leaked
netfs_io_request was never completed, leaving `netfs_inode.io_count`
with a positive value forever.

All of this is super-fragile code.  It is hard to see which code paths
will lead to an eventual completion and which will not:

- Some functions like netfs_create_write_req() allocate a request, but
  will never submit any I/O.

- netfs_unbuffered_read_iter_locked() calls netfs_unbuffered_read()
  and then netfs_put_request(); however, netfs_unbuffered_read() can
  also fail early before submitting the I/O request, therefore another
  netfs_put_request() call must be added there.

A rule of thumb is that functions that return a `netfs_io_request` do
not submit I/O, and all of their callers must be checked.

For my taste, the whole netfs code needs an overhaul to make reference
counting easier to understand and less fragile & obscure.  But to fix
this bug here and now and produce a patch that is adequate for a
stable backport, I tried a minimal approach that quickly frees the
request object upon early failure.

I decided against adding a second netfs_put_request() each time
because that would cause code duplication which obscures the code
further.  Instead, I added the function netfs_put_failed_request()
which frees such a failed request synchronously under the assumption
that the reference count is exactly 2 (as initially set by
netfs_alloc_request() and never touched), verified by a
WARN_ON_ONCE().  It then deinitializes the request object (without
going through the "cleanup_work" indirection) and frees the allocation
(with RCU protection to protect against concurrent access by
netfs_requests_seq_start()).

All code paths that fail early have been changed to call
netfs_put_failed_request() instead of netfs_put_request().
Additionally, I have added a netfs_put_request() call to
netfs_unbuffered_read() as explained above because the
netfs_put_failed_request() approach does not work there.
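
The reference counting scheme described above can be modelled in
userspace roughly as follows. This is only a sketch under simplifying
assumptions: struct request, alloc_request(), put_request() and
put_failed_request() are hypothetical stand-ins, a plain int replaces
the atomic refcount, assert() stands in for WARN_ON_ONCE(), and the
RCU-deferred free is omitted.

```c
#include <assert.h>
#include <stdlib.h>

struct request {
	int ref;
};

/* Number of requests allocated but not yet freed (for illustration). */
static int live_requests;

static struct request *alloc_request(void)
{
	struct request *rreq = calloc(1, sizeof(*rreq));

	if (!rreq)
		return NULL;
	rreq->ref = 2;	/* one for the caller, one for the work item */
	live_requests++;
	return rreq;
}

static void free_request(struct request *rreq)
{
	live_requests--;
	free(rreq);
}

/* Normal path: drop one reference, free on the last one. */
static void put_request(struct request *rreq)
{
	if (--rreq->ref == 0)
		free_request(rreq);
}

/*
 * Early-failure path: no I/O was submitted, so the work item will never
 * run and never drop the second reference. Free synchronously instead.
 */
static void put_failed_request(struct request *rreq)
{
	assert(rreq->ref == 2);	/* refcount must be untouched since alloc */
	free_request(rreq);
}
```

In this model a single put_request() on a request that never submitted
I/O leaves it allocated, which corresponds to the leak; calling
put_failed_request() instead frees it immediately.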

Fixes: 20d72b00ca81 ("netfs: Fix the request's work item to not require a ref")
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Paulo Alcantara <pc@manguebit.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: stable@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>

Merge tag 'drm-xe-fixes-2025-09-25' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes

- Don't expose sysfs attributes not applicable for VFs (Michal)
- Fix build with CONFIG_MODULES=n (Lucas)
- Don't copy pinned kernel bos twice on suspend (Thomas)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/aNU-FkJEcA3T4aDB@intel.com

Merge tag 'drm-misc-fixes-2025-09-25' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes

A CPU stall fix for ast, a NULL pointer dereference fix for gma500,
OOB and overflow fixes for fbcon, and a race condition fix for panthor.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <mripard@redhat.com>
Link: https://lore.kernel.org/r/20250925-smilodon-of-luxurious-genius-4ebee7@penduick

Merge tag 'drm-intel-fixes-2025-09-25' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes

- Set O_LARGEFILE in __create_shmem() (Taotao Chen)
- Guard reg_val against a INVALID_TRANSCODER [ddi] (Suraj Kandpal)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Tvrtko Ursulin <tursulin@igalia.com>
Link: https://lore.kernel.org/r/aNTxWfhsMkFZ3Q-a@linux

Merge tag 'amd-drm-fixes-6.17-2025-09-24' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.17-2025-09-24:

amdgpu:
- Backlight fix
- DC preblend fix
- DCN 3.5 fix
- Cleanup output_tf_change

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250924200632.531102-1-alexander.deucher@amd.com

include/linux/pgtable.h: convert arch_enter_lazy_mmu_mode() and friends to static inlines

commit c519c3c0a113 ("mm/kasan: avoid lazy MMU mode hazards") introduced
the use of arch_enter_lazy_mmu_mode(), which results in the compiler
complaining about "statement has no effect", when
__HAVE_ARCH_LAZY_MMU_MODE is not defined in include/linux/pgtable.h.

The exact warning/error is:

In file included from ./include/linux/kasan.h:37,
                 from mm/kasan/shadow.c:14:
mm/kasan/shadow.c: In function kasan_populate_vmalloc_pte:
./include/linux/pgtable.h:247:41: error: statement with no effect [-Werror=unused-value]
  247 | #define arch_enter_lazy_mmu_mode()      (LAZY_MMU_DEFAULT)
      |                                         ^
mm/kasan/shadow.c:322:9: note: in expansion of macro arch_enter_lazy_mmu_mode
  322 |         arch_enter_lazy_mmu_mode();
      |         ^~~~~~~~~~~~~~~~~~~~~~~~

Switching these "functions" to static inlines fixes this up.
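
A small standalone illustration of the difference, assuming a
placeholder value for LAZY_MMU_DEFAULT and a hypothetical caller
touch_ptes(); the macro is renamed with an old_ prefix here so both
variants can coexist in one file:

```c
/* Assumed placeholder value; the real definition lives in the kernel. */
#define LAZY_MMU_DEFAULT 0

/*
 * Macro stub: a call used as a statement expands to "(0);", which GCC
 * reports as "statement with no effect" under -Wunused-value.
 */
#define old_arch_enter_lazy_mmu_mode()	(LAZY_MMU_DEFAULT)

/* Static inline stubs: real (empty) function calls, so no such warning. */
static inline void arch_enter_lazy_mmu_mode(void) {}
static inline void arch_leave_lazy_mmu_mode(void) {}

/* Hypothetical caller that brackets its work with the lazy MMU hooks. */
static int touch_ptes(int n)
{
	int touched = 0;

	arch_enter_lazy_mmu_mode();
	for (int i = 0; i < n; i++)
		touched++;
	arch_leave_lazy_mmu_mode();
	return touched;
}
```

The inline stubs compile to nothing, just like the macro did, but the
call sites are ordinary function-call statements.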

Fixes: c519c3c0a113 ("mm/kasan: avoid lazy MMU mode hazards")
Reported-by: Balbir Singh <balbirs@nvidia.com>
Closes: https://lkml.kernel.org/r/20250912235515.367061-1-balbirs@nvidia.com
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/damon/sysfs: do not ignore callback's return value in damon_sysfs_damon_call()

The callback return value is ignored in damon_sysfs_damon_call(), which
means that it is not possible to detect invalid user input when writing
commands such as 'commit' to
/sys/kernel/mm/damon/admin/kdamonds/<K>/state. Fix it.

Link: https://lkml.kernel.org/r/20250920132546.5822-1-akinobu.mita@gmail.com
Fixes: f64539dcdb87 ("mm/damon/sysfs: use damon_call() for update_schemes_stats")
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> [6.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mailmap: add entry for Bence Csókás

I will be leaving Prolan this week. You can reach me by my personal email
for now.

Link: https://lkml.kernel.org/r/20250915-mailmap-v1-1-9ebdea93c6a7@prolan.hu
Signed-off-by: Bence Csókás <bence98@sch.bme.hu>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

fs/proc/task_mmu: check p->vec_buf for NULL

When a PAGEMAP_SCAN ioctl invoked with vec_len = 0 reaches
pagemap_scan_backout_range(), the kernel panics with a null-ptr-deref:

[   44.936808] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
[   44.937797] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
[   44.938391] CPU: 1 UID: 0 PID: 2480 Comm: reproducer Not tainted 6.17.0-rc6 #22 PREEMPT(none)
[   44.939062] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[   44.939935] RIP: 0010:pagemap_scan_thp_entry.isra.0+0x741/0xa80

<snip registers, unreliable trace>

[   44.946828] Call Trace:
[   44.947030]  <TASK>
[   44.949219]  pagemap_scan_pmd_entry+0xec/0xfa0
[   44.952593]  walk_pmd_range.isra.0+0x302/0x910
[   44.954069]  walk_pud_range.isra.0+0x419/0x790
[   44.954427]  walk_p4d_range+0x41e/0x620
[   44.954743]  walk_pgd_range+0x31e/0x630
[   44.955057]  __walk_page_range+0x160/0x670
[   44.956883]  walk_page_range_mm+0x408/0x980
[   44.958677]  walk_page_range+0x66/0x90
[   44.958984]  do_pagemap_scan+0x28d/0x9c0
[   44.961833]  do_pagemap_cmd+0x59/0x80
[   44.962484]  __x64_sys_ioctl+0x18d/0x210
[   44.962804]  do_syscall_64+0x5b/0x290
[   44.963111]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

vec_len = 0 in pagemap_scan_init_bounce_buffer() means no buffers are
allocated and p->vec_buf remains set to NULL.

This breaks an assumption made later in pagemap_scan_backout_range(), that
page_region is always allocated for p->vec_buf_index.

Fix it by explicitly checking p->vec_buf for NULL before dereferencing.

Other sites that might run into the same deref issue are already (directly or
transitively) protected by checking p->vec_buf.
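
A simplified userspace model of the check, for illustration only: the
structures are reduced to the fields mentioned above, scan_init() and
scan_backout_range() are hypothetical stand-ins for the kernel
functions, and the backout logic is an assumption made for the sketch.

```c
#include <stddef.h>
#include <stdlib.h>

struct page_region {
	unsigned long start;
	unsigned long end;
};

struct scan_private {
	struct page_region *vec_buf;	/* stays NULL when vec_len == 0 */
	size_t vec_buf_index;
};

/* Allocate the output buffer only if the caller asked for output. */
static int scan_init(struct scan_private *p, size_t vec_len)
{
	p->vec_buf = NULL;
	p->vec_buf_index = 0;
	if (!vec_len)
		return 0;
	p->vec_buf = calloc(vec_len, sizeof(*p->vec_buf));
	return p->vec_buf ? 0 : -1;
}

/* Trim the current region back to "addr"; a no-op without a buffer. */
static void scan_backout_range(struct scan_private *p, unsigned long addr)
{
	struct page_region *cur;

	if (!p->vec_buf)	/* the explicit NULL check */
		return;

	cur = &p->vec_buf[p->vec_buf_index];
	if (cur->start != addr)
		cur->end = addr;
	else
		cur->start = cur->end = 0;
}
```

With vec_len = 0 the backout becomes a harmless no-op; with a real
buffer it still trims the current region.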

Note:
According to the PAGEMAP_SCAN man page, vec_len = 0 appears to be valid when
no output is requested and the caller is only interested in the side
effects, hence it passes the check in pagemap_scan_get_args().

This issue was found by syzkaller.

Link: https://lkml.kernel.org/r/20250922082206.6889-1-acsjakub@amazon.de
Fixes: 52526ca7fdb9 ("fs/proc/task_mmu: implement IOCTL to get and optionally clear info about PTEs")
Signed-off-by: Jakub Acs <acsjakub@amazon.de>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Jinjiang Tu <tujinjiang@huawei.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Penglei Jiang <superman.xpt@gmail.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: "Michał Mirosław" <mirq-linux@rere.qmqm.pl>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

kmsan: fix out-of-bounds access to shadow memory

Running sha224_kunit on a KMSAN-enabled kernel results in a crash in
kmsan_internal_set_shadow_origin():

    BUG: unable to handle page fault for address: ffffbc3840291000
    #PF: supervisor read access in kernel mode
    #PF: error_code(0x0000) - not-present page
    PGD 1810067 P4D 1810067 PUD 192d067 PMD 3c17067 PTE 0
    Oops: 0000 [#1] SMP NOPTI
    CPU: 0 UID: 0 PID: 81 Comm: kunit_try_catch Tainted: G                 N  6.17.0-rc3 #10 PREEMPT(voluntary)
    Tainted: [N]=TEST
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
    RIP: 0010:kmsan_internal_set_shadow_origin+0x91/0x100
    [...]
    Call Trace:
    <TASK>
    __msan_memset+0xee/0x1a0
    sha224_final+0x9e/0x350
    test_hash_buffer_overruns+0x46f/0x5f0
    ? kmsan_get_shadow_origin_ptr+0x46/0xa0
    ? __pfx_test_hash_buffer_overruns+0x10/0x10
    kunit_try_run_case+0x198/0xa00

This occurs when memset() is called on a buffer that is not 4-byte aligned
and extends to the end of a guard page, i.e.  the next page is unmapped.

The bug is that the loop at the end of kmsan_internal_set_shadow_origin()
accesses the wrong shadow memory bytes when the address is not 4-byte
aligned.  Since each 4 bytes are associated with an origin, it rounds the
address and size so that it can access all the origins that contain the
buffer.  However, when it checks the corresponding shadow bytes for a
particular origin, it incorrectly uses the original unrounded shadow
address.  This results in reads from shadow memory beyond the end of the
buffer's shadow memory, which crashes when that memory is not mapped.

To fix this, correctly align the shadow address before accessing the 4
shadow bytes corresponding to each origin.
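
The indexing problem can be sketched in a simplified userspace model.
Everything here is an assumption for illustration, not the KMSAN code:
shadow memory is a plain byte array, offsets replace addresses, and
count_clean_origins() and read_end() are hypothetical helpers.

```c
#include <stddef.h>
#include <stdint.h>

#define ORIGIN_SIZE	4u
#define ROUND_DOWN(x)	((x) & ~(size_t)(ORIGIN_SIZE - 1))
#define ROUND_UP(x)	ROUND_DOWN((x) + ORIGIN_SIZE - 1)

/*
 * Count the origin slots overlapping [off, off + size) whose four shadow
 * bytes are all zero. Shadow bytes are read from the aligned slot start.
 */
static size_t count_clean_origins(const uint8_t *shadow, size_t off, size_t size)
{
	size_t start = ROUND_DOWN(off);
	size_t end = ROUND_UP(off + size);
	size_t clean = 0;

	for (size_t slot = start; slot < end; slot += ORIGIN_SIZE) {
		/* Index from the aligned start, not from the unaligned "off". */
		const uint8_t *s = shadow + slot;

		if (!(s[0] | s[1] | s[2] | s[3]))
			clean++;
	}
	return clean;
}

/*
 * One past the last shadow offset read when walking the rounded range
 * from either the aligned base (fixed) or the unaligned base (buggy).
 */
static size_t read_end(size_t off, size_t size, int aligned_base)
{
	size_t start = ROUND_DOWN(off);
	size_t end = ROUND_UP(off + size);
	size_t base = aligned_base ? start : off;

	return base + (end - start);
}
```

For a 4-byte buffer at offset 2, the aligned walk reads shadow offsets
0..7, while the unaligned walk would read up to offset 9, past the end
of the last origin slot covering the buffer.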

Link: https://lkml.kernel.org/r/20250911195858.394235-1-ebiggers@kernel.org
Fixes: 2ef3cec44c60 ("kmsan: do not wipe out origin when doing partial unpoisoning")
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Tested-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Marco Elver <elver@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/hugetlb: fix copy_hugetlb_page_range() to use ->pt_share_count

commit 59d9094df3d79 ("mm: hugetlb: independent PMD page table shared
count") introduced ->pt_share_count dedicated to hugetlb PMD share count
tracking, but omitted fixing copy_hugetlb_page_range(), leaving the
function relying on page_count() for tracking that no longer works.

When lazy page table copy for hugetlb is disabled, that is, with commit
bcd51a3c679d ("hugetlb: lazy page table copies in fork()") reverted,
fork()'ing with hugetlb PMD sharing quickly locks up:

[  239.446559] watchdog: BUG: soft lockup - CPU#75 stuck for 27s!
[  239.446611] RIP: 0010:native_queued_spin_lock_slowpath+0x7e/0x2e0
[  239.446631] Call Trace:
[  239.446633]  <TASK>
[  239.446636]  _raw_spin_lock+0x3f/0x60
[  239.446639]  copy_hugetlb_page_range+0x258/0xb50
[  239.446645]  copy_page_range+0x22b/0x2c0
[  239.446651]  dup_mmap+0x3e2/0x770
[  239.446654]  dup_mm.constprop.0+0x5e/0x230
[  239.446657]  copy_process+0xd17/0x1760
[  239.446660]  kernel_clone+0xc0/0x3e0
[  239.446661]  __do_sys_clone+0x65/0xa0
[  239.446664]  do_syscall_64+0x82/0x930
[  239.446668]  ? count_memcg_events+0xd2/0x190
[  239.446671]  ? syscall_trace_enter+0x14e/0x1f0
[  239.446676]  ? syscall_exit_work+0x118/0x150
[  239.446677]  ? arch_exit_to_user_mode_prepare.constprop.0+0x9/0xb0
[  239.446681]  ? clear_bhb_loop+0x30/0x80
[  239.446684]  ? clear_bhb_loop+0x30/0x80
[  239.446686]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

There are two options to resolve the potential latent issue:
  1. warn against PMD sharing in copy_hugetlb_page_range(),
  2. fix it.
This patch opts for the second option.
While at it, simplify the comment, the details are not actually relevant
anymore.

Link: https://lkml.kernel.org/r/20250916004520.1604530-1-jane.chu@oracle.com
Fixes: 59d9094df3d7 ("mm: hugetlb: independent PMD page table shared count")
Signed-off-by: Jane Chu <jane.chu@oracle.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Liu Shixin <liushixin2@huawei.com>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/hugetlb: fix folio is still mapped when deleted

Migration may race with fallocate hole punching.  remove_inode_single_folio()
will unmap the folio if the folio is still mapped.  However, it is called
without the folio lock held.  If the folio is being migrated and the mapped
pte has been converted to a migration entry, folio_mapped() returns false and
the folio is not unmapped.  Due to the extra refcount held by
remove_inode_single_folio(), migration fails and restores the migration entry
to a normal pte, so the folio is mapped again.  As a result, we triggered a
BUG in filemap_unaccount_folio().

The log is as follows:
BUG: Bad page cache in process hugetlb  pfn:156c00
page: refcount:515 mapcount:0 mapping:0000000099fef6e1 index:0x0 pfn:0x156c00
head: order:9 mapcount:1 entire_mapcount:1 nr_pages_mapped:0 pincount:0
aops:hugetlbfs_aops ino:dcc dentry name(?):"my_hugepage_file"
flags: 0x17ffffc00000c1(locked|waiters|head|node=0|zone=2|lastcpupid=0x1fffff)
page_type: f4(hugetlb)
page dumped because: still mapped when deleted
CPU: 1 UID: 0 PID: 395 Comm: hugetlb Not tainted 6.17.0-rc5-00044-g7aac71907bde-dirty #484 NONE
Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
Call Trace:
  <TASK>
  dump_stack_lvl+0x4f/0x70
  filemap_unaccount_folio+0xc4/0x1c0
  __filemap_remove_folio+0x38/0x1c0
  filemap_remove_folio+0x41/0xd0
  remove_inode_hugepages+0x142/0x250
  hugetlbfs_fallocate+0x471/0x5a0
  vfs_fallocate+0x149/0x380

Hold the folio lock before checking whether the folio is mapped to avoid a
race with migration.

Link: https://lkml.kernel.org/r/20250912074139.3575005-1-tujinjiang@huawei.com
Fixes: 4aae8d1c051e ("mm/hugetlbfs: unmap pages if page fault raced with hole punch")
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Merge tag 'net-6.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
"Including fixes from Bluetooth, IPsec and CAN.

  No known regressions at this point.

  Current release - regressions:

   - xfrm: xfrm_alloc_spi shouldn't use 0 as SPI

  Previous releases - regressions:

   - xfrm: fix offloading of cross-family tunnels

   - bluetooth: fix several races leading to UaFs

   - dsa: lantiq_gswip: fix FDB entries creation for the CPU port

   - eth:
       - tun: update napi->skb after XDP process
       - mlx: fix UAF in flow counter release

  Previous releases - always broken:

   - core: forbid FDB status change while nexthop is in a group

   - smc: fix warning in smc_rx_splice() when calling get_page()

   - can: provide missing ndo_change_mtu(), to prevent buffer overflow.

   - eth:
       - i40e: fix VF config validation
       - broadcom: fix support for PTP_EXTTS_REQUEST2 ioctl"

* tag 'net-6.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (40 commits)
  octeontx2-pf: Fix potential use after free in otx2_tc_add_flow()
  net: dsa: lantiq_gswip: suppress -EINVAL errors for bridge FDB entries added to the CPU port
  net: dsa: lantiq_gswip: move gswip_add_single_port_br() call to port_setup()
  libie: fix string names for AQ error codes
  net/mlx5e: Fix missing FEC RS stats for RS_544_514_INTERLEAVED_QUAD
  net/mlx5: HWS, ignore flow level for multi-dest table
  net/mlx5: fs, fix UAF in flow counter release
  selftests: fib_nexthops: Add test cases for FDB status change
  selftests: fib_nexthops: Fix creation of non-FDB nexthops
  nexthop: Forbid FDB status change while nexthop is in a group
  net: allow alloc_skb_with_frags() to use MAX_SKB_FRAGS
  bnxt_en: correct offset handling for IPv6 destination address
  ptp: document behavior of PTP_STRICT_FLAGS
  broadcom: fix support for PTP_EXTTS_REQUEST2 ioctl
  broadcom: fix support for PTP_PEROUT_DUTY_CYCLE
  Bluetooth: MGMT: Fix possible UAFs
  Bluetooth: hci_event: Fix UAF in hci_acl_create_conn_sync
  Bluetooth: hci_event: Fix UAF in hci_conn_tx_dequeue
  Bluetooth: hci_sync: Fix hci_resume_advertising_sync
  Bluetooth: Fix build after header cleanup
  ...

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio fixes from Michael Tsirkin:
"virtio,vhost: last minute fixes

  More small fixes. Most notably this fixes crashes and hangs in
  vhost-net"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  MAINTAINERS, mailmap: Update address for Peter Hilber
  virtio_config: clarify output parameters
  uapi: vduse: fix typo in comment
  vhost: Take a reference on the task in struct vhost_task.
  vhost-net: flush batched before enabling notifications
  Revert "vhost/net: Defer TX queue re-enable until after sendmsg"
  vhost-net: unbreak busy polling
  vhost-scsi: fix argument order in tport allocation error message

platform/x86: lg-laptop: Fix WMAB call in fan_mode_store()

When WMAB is called to set the fan mode, the new mode is read from either
bits 0-1 or bits 4-5 (depending on the value of some other EC register).
Thus when WMAB is called with bits 4-5 zeroed and called again with
bits 0-1 zeroed, the second call undoes the effect of the first call.
This causes writes to /sys/devices/platform/lg-laptop/fan_mode to have
no effect (and causes reads to always report a status of zero).

Fix this by calling WMAB once, with the mode set in bits 0,1 and 4,5.
When the fan mode is returned from WMAB it always has this form, so
there is no need to preserve the other bits. As a bonus, the driver
now supports the "Performance" fan mode seen in the LG-provided Windows
control app, which provides less aggressive CPU throttling but louder
fan noise and shorter battery life.
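
As a rough illustration of the single-call encoding described above, the
sketch below places the same 2-bit mode in bits 0-1 and bits 4-5 of one
value. The helper names fan_mode_encode() and fan_mode_decode() are
hypothetical and are not taken from the driver; the WMAB call itself is not
modelled.

```c
#include <stdint.h>

/* Hypothetical helper: replicate a 2-bit fan mode into bits 0-1 and 4-5. */
uint32_t fan_mode_encode(uint32_t mode)
{
	mode &= 0x3;                /* keep only the 2-bit mode */
	return mode | (mode << 4);  /* same mode in both bit ranges */
}

/* Hypothetical helper: read the mode back from bits 0-1. */
uint32_t fan_mode_decode(uint32_t value)
{
	return value & 0x3;
}
```

Because both bit ranges carry the same mode, it does not matter which of the
two the firmware consults.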

Also, correct the documentation to reflect that 0 corresponds to the
default mode (what the Windows app calls "Optimal") and 1 corresponds
to the silent mode.

Fixes: dbf0c5a6b1f8 ("platform/x86: Add LG Gram laptop special features driver")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=204913#c4
Signed-off-by: Daniel Lee <dany97@live.ca>
Link: https://patch.msgid.link/MN2PR06MB55989CB10E91C8DA00EE868DDC1CA@MN2PR06MB5598.namprd06.prod.outlook.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

octeontx2-pf: Fix potential use after free in otx2_tc_add_flow()

This code calls kfree_rcu(new_node, rcu) and then dereferences "new_node"
on the next line. Two lines later, we take
a mutex so I don't think this is an RCU safe region. Re-order it to do
the dereferences before queuing up the free.
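
The reordering can be pictured with a small userspace sketch. It assumes a
hypothetical struct flow_node and uses free() as a stand-in for kfree_rcu();
none of these names come from the driver.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a flow node; not the driver's structure. */
struct flow_node {
	int entry;
};

/*
 * Copy out everything that is still needed, then hand the node over for
 * freeing. free() stands in for kfree_rcu() here; after that call the
 * pointer must not be dereferenced any more.
 */
int release_flow_node(struct flow_node *node)
{
	int entry = node->entry;  /* read before the free is queued */

	free(node);               /* from here on, node is off limits */
	return entry;
}
```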

Fixes: 68fbff68dbea ("octeontx2-pf: Add police action for TC flower")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/aNKCL1jKwK8GRJHh@stanley.mountain
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

drm/panthor: Defer scheduler entity destruction to queue release

Commit de8548813824 ("drm/panthor: Add the scheduler logical block")
handled destruction of a group's queues' drm scheduler entities early
into the group destruction procedure.

However, that races with the group submit ioctl, because by the time
entities are destroyed (through the group destroy ioctl), the submission
procedure might've already obtained a group handle, and therefore the
ability to push jobs into entities. This is met with a DRM error message
within the drm scheduler core as a situation that should never occur.

Fix by deferring drm scheduler entity destruction to queue release time.

Fixes: de8548813824 ("drm/panthor: Add the scheduler logical block")
Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Link: https://lore.kernel.org/r/20250919164436.531930-1-adrian.larumbe@collabora.com

Merge branch 'lantiq_gswip-fixes'

Vladimir Oltean says:

====================
lantiq_gswip fixes

This is a small set of fixes which I believe should be backported for
the lantiq_gswip driver. Daniel Golle asked me to submit them here:
https://lore.kernel.org/netdev/aLiDfrXUbw1O5Vdi@pidgin.makrotopia.org/

As mentioned there, a merge conflict with net-next is expected, due to
the movement of the driver to the 'drivers/net/dsa/lantiq' folder there.
Good luck :-/

Patch 2/2 fixes an old regression and is the minimal fix for that, as
discussed here:
https://lore.kernel.org/netdev/aJfNMLNoi1VOsPrN@pidgin.makrotopia.org/

Patch 1/2 was identified by me through static analysis, and I consider
it to be a serious deficiency. It needs a test tag.
====================

Link: https://patch.msgid.link/20250918072142.894692-1-vladimir.oltean@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: dsa: lantiq_gswip: suppress -EINVAL errors for bridge FDB entries added to the CPU port

The blamed commit and others in that patch set started the trend
of reusing existing DSA driver API for a new purpose: calling
ds->ops->port_fdb_add() on the CPU port.

The lantiq_gswip driver was not prepared to handle that, as can be seen
from the many errors that Daniel presents in the logs:

[  174.050000] gswip 1e108000.switch: port 2 failed to add fa:aa:72:f4:8b:1e vid 1 to fdb: -22
[  174.060000] gswip 1e108000.switch lan2: entered promiscuous mode
[  174.070000] gswip 1e108000.switch: port 2 failed to add 00:01:02:03:04:02 vid 0 to fdb: -22
[  174.090000] gswip 1e108000.switch: port 2 failed to add 00:01:02:03:04:02 vid 1 to fdb: -22
[  174.090000] gswip 1e108000.switch: port 2 failed to delete fa:aa:72:f4:8b:1e vid 1 from fdb: -2

The errors are because gswip_port_fdb() wants to get a handle to the
bridge that originated these FDB events, to associate it with a FID.
Absolutely honourable purpose, however this only works for user ports.

To get the bridge that generated an FDB entry for the CPU port, one
would need to look at the db.bridge.dev argument. But this was
introduced in commit c26933639b54 ("net: dsa: request drivers to perform
FDB isolation"), which first appeared in v5.18; when the blamed commit was
introduced in v5.14, no such API existed.

So the core DSA feature was introduced way too soon for lantiq_gswip.
Not acting on these host FDB entries and suppressing any errors has no
other negative effect, and practically returns us to not supporting the
host filtering feature at all - peacefully, this time.

Fixes: 10fae4ac89ce ("net: dsa: include bridge addresses which are local in the host fdb list")
Reported-by: Daniel Golle <daniel@makrotopia.org>
Closes: https://lore.kernel.org/netdev/aJfNMLNoi1VOsPrN@pidgin.makrotopia.org/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250918072142.894692-3-vladimir.oltean@nxp.com
Tested-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: dsa: lantiq_gswip: move gswip_add_single_port_br() call to port_setup()

A port added to a "single port bridge" operates as standalone, and this
is mutually exclusive to being part of a Linux bridge. In fact,
gswip_port_bridge_join() calls gswip_add_single_port_br() with
add=false, i.e. removes the port from the "single port bridge" to enable
autonomous forwarding.

The blamed commit seems to have incorrectly thought that ds->ops->port_enable()
is called one time per port, during the setup phase of the switch.

However, it is actually called during the ndo_open() implementation of
DSA user ports, which is to say that this sequence of events:

1. ip link set swp0 down
2. ip link add br0 type bridge
3. ip link set swp0 master br0
4. ip link set swp0 up

would cause swp0 to join back the "single port bridge" which step 3 had
just removed it from.

The correct DSA hook for one-time actions per port at switch init time
is ds->ops->port_setup(). This is what seems to match the coder's
intention; also see the comment at the beginning of the file:

* At the initialization the driver allocates one bridge table entry for
~~~~~~~~~~~~~~~~~~~~~
* each switch port which is used when the port is used without an
* explicit bridge.

Fixes: 8206e0ce96b3 ("net: dsa: lantiq: Add VLAN unaware bridge offloading")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250918072142.894692-2-vladimir.oltean@nxp.com
Tested-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

sched/deadline: Fix dl_server behaviour

John reported undesirable behaviour with the dl_server since commit:
cccb45d7c4295 ("sched/deadline: Less agressive dl_server handling").

When starving fair tasks on purpose (starting spinning FIFO tasks),
his fair workload, which often goes (briefly) idle, would delay fair
invocations for a second. Running one invocation per second was both
unexpected and terribly slow.

The reason this happens is that when dl_se->server_pick_task() returns
NULL, indicating no runnable tasks, it would yield, pushing any later
jobs out a whole period (1 second).

Instead simply stop the server. This should restore behaviour in that
a later wakeup (which restarts the server) will be able to continue
running (subject to the CBS wakeup rules).

Notably, this does not re-introduce the behaviour cccb45d7c4295 set
out to solve: any start/stop cycle is naturally throttled by the timer
period (no active cancel).

Fixes: cccb45d7c4295 ("sched/deadline: Less agressive dl_server handling")
Reported-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: John Stultz <jstultz@google.com>

sched/deadline: Fix dl_server getting stuck

John found it was easy to hit lockup warnings when running locktorture
on a 2 CPU VM, which he bisected down to: commit cccb45d7c429
("sched/deadline: Less agressive dl_server handling").

While debugging it seems there is a chance where we end up with the
dl_server dequeued, with dl_se->dl_server_active. This causes
dl_server_start() to return without enqueueing the dl_server, thus it
fails to run when RT tasks starve the cpu.

When this happens, dl_server_timer() catches the
'!dl_se->server_has_tasks(dl_se)' case, which then calls
replenish_dl_entity() and dl_server_stopped() and finally returns
HRTIMER_NO_RESTART.

This ends in no new timer and also no enqueue, leaving the dl_server
'dead', allowing starvation.

What should have happened is for the bandwidth timer to start the
zero-laxity timer, which in turn would enqueue the dl_server and cause
dl_se->server_pick_task() to be called -- which will stop the
dl_server if no fair tasks are observed for a whole period.

IOW, it is totally irrelevant if there are fair tasks at the moment of
bandwidth refresh.

This removes all dl_se->server_has_tasks() users, so remove the whole
thing.

Fixes: cccb45d7c4295 ("sched/deadline: Less agressive dl_server handling")
Reported-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: John Stultz <jstultz@google.com>

afs: Fix potential null pointer dereference in afs_put_server

afs_put_server() accessed server->debug_id before the NULL check, which
could lead to a null pointer dereference. Move the debug_id assignment,
ensuring we never dereference a NULL server pointer.
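
A minimal sketch of the check-then-read ordering, using a hypothetical
struct server and put_server() rather than the real afs types:

```c
#include <stddef.h>

/* Hypothetical stand-in for the server object; only the fields used here. */
struct server {
	unsigned int debug_id;
	int ref;
};

/*
 * Drop a reference. The NULL check comes first and debug_id is read only
 * after it, so a NULL pointer is never dereferenced.
 * Returns the debug_id of the server, or 0 when server is NULL.
 */
unsigned int put_server(struct server *server)
{
	unsigned int debug_id;

	if (!server)
		return 0;

	debug_id = server->debug_id;  /* safe: server is known to be non-NULL */
	server->ref--;
	return debug_id;
}
```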

Fixes: 2757a4dc1849 ("afs: Fix access after dec in put functions")
Cc: stable@vger.kernel.org
Signed-off-by: Zhen Ni <zhen.ni@easystack.cn>
Acked-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>

Merge tag 'probes-fixes-v6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull probes fixes from Masami Hiramatsu:

- fprobe: Even if there is a memory allocation failure, try to remove
   the addresses recorded until then from the filter. Previously we just
   skipped it.

- tracing: dynevent: Add a missing lockdown check on dynevent. This
   dynevent is the interface for all probe events. Thus if there is no
   check, any probe event can be added after tracefs is locked down.

* tag 'probes-fixes-v6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: dynevent: Add a missing lockdown check on dynevent
  tracing: fprobe: Fix to remove recorded module addresses from filter

libie: fix string names for AQ error codes

The LIBIE_AQ_STR() macro introduced by commit 5feaa7a07b85 ("libie: add
adminq helper for converting err to str") is used in order to generate
strings for printing human readable error codes. Its definition is missing
the separating underscore ('_') character which makes the resulting strings
difficult to read. Additionally, the string won't match the source code,
preventing search tools from working properly.

Add the missing underscore character, fixing the error string names.
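
The effect of the missing separator can be reproduced with a small
stringification sketch. The AQ_RC prefix and the macro names below are
hypothetical and only mimic the pattern; they are not the libie definitions.

```c
#include <string.h>

/* Hypothetical macros, for illustration only. */
#define AQ_STR_BROKEN(x) "AQ_RC" #x   /* no separator:   "AQ_RCEPERM"  */
#define AQ_STR_FIXED(x)  "AQ_RC_" #x  /* with separator: "AQ_RC_EPERM" */

const char *aq_str_broken(void) { return AQ_STR_BROKEN(EPERM); }
const char *aq_str_fixed(void)  { return AQ_STR_FIXED(EPERM); }

/* Does the printed string match the identifier as written in the source? */
int matches_identifier(const char *s)
{
	return strcmp(s, "AQ_RC_EPERM") == 0;
}
```

Only the fixed form yields a string that a search for the identifier would
find.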

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Fixes: 5feaa7a07b85 ("libie: add adminq helper for converting err to str")
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20250923205657.846759-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

crypto: af_alg - Fix incorrect boolean values in af_alg_ctx

Commit 1b34cbbf4f01 ("crypto: af_alg - Disallow concurrent writes in
af_alg_sendmsg") changed some fields from bool to 1-bit bitfields of
type u32.

However, some assignments to these fields, specifically 'more' and
'merge', assign values greater than 1. These relied on C's implicit
conversion to bool, such that zero becomes false and nonzero becomes
true.

With a 1-bit bitfield of type u32, the value is instead taken mod 2,
resulting in 0 being assigned in some cases when 1 was intended.

Fix this by restoring the bool type.
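
The difference between the two types can be shown in a few lines of
userspace C. The struct and function names are hypothetical, and unsigned
int stands in for the kernel's u32.

```c
#include <stdbool.h>

struct flags_bitfield {
	unsigned int more:1;  /* keeps only the lowest bit of the value */
};

struct flags_bool {
	bool more;            /* any nonzero value converts to true */
};

/* Store v in a 1-bit bitfield and read it back. */
int store_in_bitfield(unsigned int v)
{
	struct flags_bitfield f = { 0 };

	f.more = v;           /* value is reduced modulo 2 */
	return f.more;
}

/* Store v in a bool and read it back. */
int store_in_bool(unsigned int v)
{
	struct flags_bool f = { false };

	f.more = v;           /* nonzero becomes 1 */
	return f.more;
}
```

With a flag value such as 2, the bitfield variant stores 0 while the bool
variant stores 1.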

Fixes: 1b34cbbf4f01 ("crypto: af_alg - Disallow concurrent writes in af_alg_sendmsg")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'soc-fixes-6.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull SoC fixes from Arnd Bergmann:
"There are a few minor code fixes for tegra firmware, i.MX firmware
  and the eyeq reset controller, and a MAINTAINERS update as Alyssa
  Rosenzweig moves on to non-kernel projects.

  The other changes are all for devicetree files:

   - Multiple Marvell Armada SoCs need changes to fix PCIe, audio and
     SATA

   - A socfpga board fails to probe the ethernet phy

   - The two temperature sensors on i.MX8MP are swapped

   - Allwinner devicetree files cause build-time warnings

   - Two Rockchip based boards need corrections for headphone detection
     and SPI flash"

* tag 'soc-fixes-6.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  MAINTAINERS: remove Alyssa Rosenzweig
  firmware: tegra: Do not warn on missing memory-region property
  arm64: dts: marvell: cn9132-clearfog: fix multi-lane pci x2 and x4 ports
  arm64: dts: marvell: cn9132-clearfog: disable eMMC high-speed modes
  arm64: dts: marvell: cn913x-solidrun: fix sata ports status
  ARM: dts: kirkwood: Fix sound DAI cells for OpenRD clients
  arm64: dts: imx8mp: Correct thermal sensor index
  ARM: imx: Kconfig: Adjust select after renamed config option
  firmware: imx: Add stub functions for SCMI CPU API
  firmware: imx: Add stub functions for SCMI LMM API
  firmware: imx: Add stub functions for SCMI MISC API
  riscv: dts: allwinner: rename devterm i2c-gpio node to comply with binding
  arm64: dts: rockchip: Fix the headphone detection on the orangepi 5
  arm64: dts: rockchip: Add vcc supply for SPI Flash on NanoPC-T6
  ARM: dts: socfpga: sodia: Fix mdio bus probe and PHY address
  reset: eyeq: fix OF node leak
  ARM64: dts: mcbin: fix SATA ports on Macchiatobin
  ARM: dts: armada-370-db: Fix stereo audio input routing on Armada 370
  ARM: dts: allwinner: Minor whitespace cleanup

Merge tag 'pm-6.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fix from Rafael Wysocki:
"Fix a locking issue in the cpufreq core introduced recently and caught
by lockdep (Christian Loehle)"

* tag 'pm-6.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: Initialize cpufreq-based invariance before subsys

Merge tag 'for-6.17-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fix from David Sterba:
"One more regression fix for a problem in zoned mode: mounting would
  fail if the number of open and active zones reached a common limit
  that didn't use to be checked"

* tag 'for-6.17-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: zoned: don't fail mount needlessly due to too many active zones

Merge tag '6.17-rc7-ksmbd-server-fixes' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:

- free_transport fix for disconnect races

- minor delayed work fix

* tag '6.17-rc7-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
  smb: server: use disable_work_sync in transport_rdma.c
  smb: server: don't use delayed_work for post_recv_credits_work

tracing: dynevent: Add a missing lockdown check on dynevent

Since dynamic_events interface on tracefs is compatible with
kprobe_events and uprobe_events, it should also check the lockdown
status and reject if it is set.

Link: https://lore.kernel.org/all/175824455687.45175.3734166065458520748.stgit@devnote2/
Fixes: 17911ff38aa5 ("tracing: Add locked_down checks to the open calls of files created for tracefs")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: stable@vger.kernel.org

tracing: fprobe: Fix to remove recorded module addresses from filter

Even if there is a memory allocation failure in fprobe_addr_list_add(),
there is a partial list of module addresses. So remove the recorded
addresses from the filter if any exist.
This also removes the redundant ret local variable.

Fixes: a3dc2983ca7b ("tracing: fprobe: Cleanup fprobe hash when module unloading")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: stable@vger.kernel.org
Reviewed-by: Menglong Dong <menglong8.dong@gmail.com>

kbuild: Disable CC_HAS_ASM_GOTO_OUTPUT on clang < 17

clang < 17 fails to use scope local labels with CONFIG_CC_HAS_ASM_GOTO_OUTPUT=y:

     {
        __label__ local_lbl;
        ...
        unsafe_get_user(uval, uaddr, local_lbl);
        ...
        return 0;
     local_lbl:
        return -EFAULT;
     }

when two such scopes exist in the same function:

  error: cannot jump from this asm goto statement to one of its possible targets

There are other failure scenarios. Shuffling code around slightly makes it
worse and fail even with one instance.

That issue prevents using local labels for a cleanup based user access
mechanism.

After failed attempts to provide a simple enough test case for the 'depends
on' test in Kconfig, the initial cure was to mark ASM goto broken on clang
versions < 17 to get this road block out of the way.

But Nathan pointed out that this is a known clang issue and indeed affects
clang < version 17 in combination with cleanup(). It's not even required to
use local labels for that.

The clang issue tracker has a small enough test case, which can be used as
a test in the 'depends on' section of CC_HAS_ASM_GOTO_OUTPUT:

void bar(void **);
void* baz(void);

int  foo (void) {
    {
    asm goto("jmp %l0"::::l0);
    return 0;
l0:
    return 1;
    }
    void *x __attribute__((cleanup(bar))) = baz();
    {
    asm goto("jmp %l0"::::l1);
    return 42;
l1:
    return 0xff;
    }
}

Add another dependency to config CC_HAS_ASM_GOTO_OUTPUT for it and use the
clang issue tracker test case for detection by condensing it to obfuscated
C-code contest format. This reliably catches the problem on clang < 17 and
did not show any issues on the non broken GCC versions.

That test might be sufficient to catch all issues and therefore could
replace the existing test, but keeping that around does no harm either.

Thanks to Nathan for pointing to the relevant clang issue!

Suggested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://github.com/ClangBuiltLinux/linux/issues/1886
Link: https://github.com/llvm/llvm-project/commit/f023f5cdb2e6c19026f04a15b5a935c041835d14

futex: Use correct exit on failure from futex_hash_allocate_default()

copy_process() uses the wrong error exit path from futex_hash_allocate_default().
After exiting from futex_hash_allocate_default(), neither tasklist_lock
nor siglock has been acquired. The exit label bad_fork_core_free unlocks
both of these locks which is wrong.

The next exit label, bad_fork_cancel_cgroup, is the correct exit.
sched_cgroup_fork() did not allocate any resources that need to be freed.

Use bad_fork_cancel_cgroup on error exit from futex_hash_allocate_default().
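
The general shape of such an unwind ladder is sketched below. The state
structure, the function and the short label names are hypothetical and only
mirror the ordering described above; this is not the copy_process() code.

```c
/* Hypothetical state standing in for locks and resources. */
struct fork_state {
	int cgroup_prepared;
	int locks_held;
	int unlock_calls;  /* counts how often the unlock path ran */
};

int copy_process_sketch(struct fork_state *st, int fail_hash, int fail_late)
{
	int ret = 0;

	st->cgroup_prepared = 1;

	if (fail_hash) {
		ret = -1;
		goto cancel_cgroup;  /* locks not taken yet: skip the unlock */
	}

	st->locks_held = 1;
	if (fail_late) {
		ret = -2;
		goto core_free;      /* locks are held: unlock on the way out */
	}
	st->locks_held = 0;
	return 0;

core_free:
	st->unlock_calls++;
	st->locks_held = 0;
cancel_cgroup:
	st->cgroup_prepared = 0;
	return ret;
}
```

A failure before the locks are taken jumps past the unlocking label, so the
unlock path never runs for locks that were not acquired.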

Fixes: 7c4f75a21f636 ("futex: Allow automatic allocation of process wide futex hash")
Reported-by: syzbot+80cb3cc5c14fad191a10@syzkaller.appspotmail.com
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Closes: https://lore.kernel.org/all/68cb1cbd.050a0220.2ff435.0599.GAE@google.com

MAINTAINERS: Update Paul Walmsley's E-mail address

My experiment with using corporate Gmail for Linux kernel list
interaction has come to an end. For my MAINTAINERS entries that
use that E-mail address, let's switch those to use the k.org E-mail
forwarding.

Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Signed-off-by: Paul Walmsley <pjw@kernel.org>

riscv: Use an atomic xchg in pudp_huge_get_and_clear()

Make sure we return the right pud value and not a value that could
have been overwritten in between by a different core.
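
The difference between a separate read and clear and a single exchange can
be sketched with C11 atomics. entry_t and both function names are
hypothetical; the kernel uses its own xchg() primitive rather than
<stdatomic.h>.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for a page table entry slot. */
typedef _Atomic unsigned long entry_t;

/* Racy: another core could change *slot between the load and the store. */
unsigned long get_and_clear_racy(entry_t *slot)
{
	unsigned long old = atomic_load(slot);

	atomic_store(slot, 0UL);
	return old;
}

/* Atomic: the returned value is exactly the one the exchange replaced. */
unsigned long get_and_clear_xchg(entry_t *slot)
{
	return atomic_exchange(slot, 0UL);
}
```

Single-threaded, both variants return the same value; they differ only when
another core writes the slot concurrently.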

Fixes: c3cc2a4a3a23 ("riscv: Add support for PUD THP")
Cc: stable@vger.kernel.org
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20250814-dev-alex-thp_pud_xchg-v1-1-b4704dfae206@rivosinc.com
[pjw@kernel.org: use xchg rather than atomic_long_xchg; avoid atomic op for !CONFIG_SMP like x86]
Signed-off-by: Paul Walmsley <pjw@kernel.org>

Merge branch 'mlx5-misc-fixes-2025-09-22'

Tariq Toukan says:

====================
mlx5 misc fixes 2025-09-22

This patchset provides misc bug fixes from the team to the mlx5 Eth
and core drivers.
====================

Link: https://patch.msgid.link/1758525094-816583-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Fix missing FEC RS stats for RS_544_514_INTERLEAVED_QUAD

Include MLX5E_FEC_RS_544_514_INTERLEAVED_QUAD in the FEC RS stats
handling. This addresses a gap introduced when adding support for
200G/lane link modes.

Fixes: 4e343c11efbb ("net/mlx5e: Support FEC settings for 200G per lane link modes")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Yael Chemla <ychemla@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1758525094-816583-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: HWS, ignore flow level for multi-dest table

When HWS creates multi-dest FW table and adds rules to
forward to other tables, ignore the flow level enforcement
in FW, because HWS is responsible for table levels.

This fixes the following error:

  mlx5_core 0000:08:00.0: mlx5_cmd_out_err:818:(pid 192306):
     SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed,
     status bad parameter(0x3), syndrome (0x6ae84c), err(-22)

Fixes: 504e536d9010 ("net/mlx5: HWS, added actions handling")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1758525094-816583-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: fs, fix UAF in flow counter release

Fix a kernel trace [1] caused by releasing an HWS action of a local flow
counter in mlx5_cmd_hws_delete_fte(), where the HWS action refcount and
mutex were not initialized and the counter struct could already be freed
when deleting the rule.

Fix it by adding the missing initializations and adding refcount for the
local flow counter struct.

[1] Kernel log:
Call Trace:
  <TASK>
  dump_stack_lvl+0x34/0x48
  mlx5_fs_put_hws_action.part.0.cold+0x21/0x94 [mlx5_core]
  mlx5_fc_put_hws_action+0x96/0xad [mlx5_core]
  mlx5_fs_destroy_fs_actions+0x8b/0x152 [mlx5_core]
  mlx5_cmd_hws_delete_fte+0x5a/0xa0 [mlx5_core]
  del_hw_fte+0x1ce/0x260 [mlx5_core]
  mlx5_del_flow_rules+0x12d/0x240 [mlx5_core]
  ? ttwu_queue_wakelist+0xf4/0x110
  mlx5_ib_destroy_flow+0x103/0x1b0 [mlx5_ib]
  uverbs_free_flow+0x20/0x50 [ib_uverbs]
  destroy_hw_idr_uobject+0x1b/0x50 [ib_uverbs]
  uverbs_destroy_uobject+0x34/0x1a0 [ib_uverbs]
  uobj_destroy+0x3c/0x80 [ib_uverbs]
  ib_uverbs_run_method+0x23e/0x360 [ib_uverbs]
  ? uverbs_finalize_object+0x60/0x60 [ib_uverbs]
  ib_uverbs_cmd_verbs+0x14f/0x2c0 [ib_uverbs]
  ? do_tty_write+0x1a9/0x270
  ? file_tty_write.constprop.0+0x98/0xc0
  ? new_sync_write+0xfc/0x190
  ib_uverbs_ioctl+0xd7/0x160 [ib_uverbs]
  __x64_sys_ioctl+0x87/0xc0
  do_syscall_64+0x59/0x90

Fixes: b581f4266928 ("net/mlx5: fs, manage flow counters HWS action sharing by refcount")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1758525094-816583-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'nexthop-various-fixes'

Ido Schimmel says:

====================
nexthop: Various fixes

Patch #1 fixes a NPD that was recently reported by syzbot.

Patch #2 fixes an issue in the existing FIB nexthop selftest.

Patch #3 extends the selftest with test cases for the bug that was fixed
in the first patch.
====================

Link: https://patch.msgid.link/20250921150824.149157-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: fib_nexthops: Add test cases for FDB status change

Add the following test cases for both IPv4 and IPv6:

* Can change from FDB nexthop to non-FDB nexthop and vice versa.
* Can change FDB nexthop address while in a group.
* Cannot change from FDB nexthop to non-FDB nexthop and vice versa while
  in a group.

Output without "nexthop: Forbid FDB status change while nexthop is in a
group":

# ./fib_nexthops.sh -t "ipv6_fdb_grp_fcnal ipv4_fdb_grp_fcnal"

IPv6 fdb groups functional
--------------------------
[...]
TEST: Replace FDB nexthop to non-FDB nexthop                        [ OK ]
TEST: Replace non-FDB nexthop to FDB nexthop                        [ OK ]
TEST: Replace FDB nexthop address while in a group                  [ OK ]
TEST: Replace FDB nexthop to non-FDB nexthop while in a group       [FAIL]
TEST: Replace non-FDB nexthop to FDB nexthop while in a group       [FAIL]
[...]

IPv4 fdb groups functional
--------------------------
[...]
TEST: Replace FDB nexthop to non-FDB nexthop                        [ OK ]
TEST: Replace non-FDB nexthop to FDB nexthop                        [ OK ]
TEST: Replace FDB nexthop address while in a group                  [ OK ]
TEST: Replace FDB nexthop to non-FDB nexthop while in a group       [FAIL]
TEST: Replace non-FDB nexthop to FDB nexthop while in a group       [FAIL]
[...]

Tests passed:  36
Tests failed:   4
Tests skipped:  0

Output with "nexthop: Forbid FDB status change while nexthop is in a
group":

# ./fib_nexthops.sh -t "ipv6_fdb_grp_fcnal ipv4_fdb_grp_fcnal"

IPv6 fdb groups functional
--------------------------
[...]
TEST: Replace FDB nexthop to non-FDB nexthop                        [ OK ]
TEST: Replace non-FDB nexthop to FDB nexthop                        [ OK ]
TEST: Replace FDB nexthop address while in a group                  [ OK ]
TEST: Replace FDB nexthop to non-FDB nexthop while in a group       [ OK ]
TEST: Replace non-FDB nexthop to FDB nexthop while in a group       [ OK ]
[...]

IPv4 fdb groups functional
--------------------------
[...]
TEST: Replace FDB nexthop to non-FDB nexthop                        [ OK ]
TEST: Replace non-FDB nexthop to FDB nexthop                        [ OK ]
TEST: Replace FDB nexthop address while in a group                  [ OK ]
TEST: Replace FDB nexthop to non-FDB nexthop while in a group       [ OK ]
TEST: Replace non-FDB nexthop to FDB nexthop while in a group       [ OK ]
[...]

Tests passed:  40
Tests failed:   0
Tests skipped:  0

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250921150824.149157-4-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: fib_nexthops: Fix creation of non-FDB nexthops

The test creates non-FDB nexthops without a nexthop device which leads
to the expected failure, but for the wrong reason:

# ./fib_nexthops.sh -t "ipv6_fdb_grp_fcnal ipv4_fdb_grp_fcnal" -v

IPv6 fdb groups functional
--------------------------
[...]
COMMAND: ip -netns me-nRsN3E nexthop add id 63 via 2001:db8:91::4
Error: Device attribute required for non-blackhole and non-fdb nexthops.
COMMAND: ip -netns me-nRsN3E nexthop add id 64 via 2001:db8:91::5
Error: Device attribute required for non-blackhole and non-fdb nexthops.
COMMAND: ip -netns me-nRsN3E nexthop add id 103 group 63/64 fdb
Error: Invalid nexthop id.
TEST: Fdb Nexthop group with non-fdb nexthops                       [ OK ]
[...]

IPv4 fdb groups functional
--------------------------
[...]
COMMAND: ip -netns me-nRsN3E nexthop add id 14 via 172.16.1.2
Error: Device attribute required for non-blackhole and non-fdb nexthops.
COMMAND: ip -netns me-nRsN3E nexthop add id 15 via 172.16.1.3
Error: Device attribute required for non-blackhole and non-fdb nexthops.
COMMAND: ip -netns me-nRsN3E nexthop add id 103 group 14/15 fdb
Error: Invalid nexthop id.
TEST: Fdb Nexthop group with non-fdb nexthops                       [ OK ]

COMMAND: ip -netns me-nRsN3E nexthop add id 16 via 172.16.1.2 fdb
COMMAND: ip -netns me-nRsN3E nexthop add id 17 via 172.16.1.3 fdb
COMMAND: ip -netns me-nRsN3E nexthop add id 104 group 14/15
Error: Invalid nexthop id.
TEST: Non-Fdb Nexthop group with fdb nexthops                       [ OK ]
[...]
COMMAND: ip -netns me-0dlhyd ro add 172.16.0.0/22 nhid 15
Error: Nexthop id does not exist.
TEST: Route add with fdb nexthop                                    [ OK ]

In addition, as can be seen in the above output, a couple of IPv4 test
cases used the non-FDB nexthops (14 and 15) when they intended to use
the FDB nexthops (16 and 17). These test cases only passed because
failure was expected, but they failed for the wrong reason.

Fix the test to create the non-FDB nexthops with a nexthop device and
adjust the IPv4 test cases to use the FDB nexthops instead of the
non-FDB nexthops.

Output after the fix:

# ./fib_nexthops.sh -t "ipv6_fdb_grp_fcnal ipv4_fdb_grp_fcnal" -v

IPv6 fdb groups functional
--------------------------
[...]
COMMAND: ip -netns me-lNzfHP nexthop add id 63 via 2001:db8:91::4 dev veth1
COMMAND: ip -netns me-lNzfHP nexthop add id 64 via 2001:db8:91::5 dev veth1
COMMAND: ip -netns me-lNzfHP nexthop add id 103 group 63/64 fdb
Error: FDB nexthop group can only have fdb nexthops.
TEST: Fdb Nexthop group with non-fdb nexthops                       [ OK ]
[...]

IPv4 fdb groups functional
--------------------------
[...]
COMMAND: ip -netns me-lNzfHP nexthop add id 14 via 172.16.1.2 dev veth1
COMMAND: ip -netns me-lNzfHP nexthop add id 15 via 172.16.1.3 dev veth1
COMMAND: ip -netns me-lNzfHP nexthop add id 103 group 14/15 fdb
Error: FDB nexthop group can only have fdb nexthops.
TEST: Fdb Nexthop group with non-fdb nexthops                       [ OK ]

COMMAND: ip -netns me-lNzfHP nexthop add id 16 via 172.16.1.2 fdb
COMMAND: ip -netns me-lNzfHP nexthop add id 17 via 172.16.1.3 fdb
COMMAND: ip -netns me-lNzfHP nexthop add id 104 group 16/17
Error: Non FDB nexthop group cannot have fdb nexthops.
TEST: Non-Fdb Nexthop group with fdb nexthops                       [ OK ]
[...]
COMMAND: ip -netns me-lNzfHP ro add 172.16.0.0/22 nhid 16
Error: Route cannot point to a fdb nexthop.
TEST: Route add with fdb nexthop                                    [ OK ]
[...]
Tests passed:  30
Tests failed:   0
Tests skipped:  0

Fixes: 0534c5489c11 ("selftests: net: add fdb nexthop tests")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250921150824.149157-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nexthop: Forbid FDB status change while nexthop is in a group

The kernel forbids the creation of non-FDB nexthop groups with FDB
nexthops:

# ip nexthop add id 1 via 192.0.2.1 fdb
# ip nexthop add id 2 group 1
Error: Non FDB nexthop group cannot have fdb nexthops.

And vice versa:

# ip nexthop add id 3 via 192.0.2.2 dev dummy1
# ip nexthop add id 4 group 3 fdb
Error: FDB nexthop group can only have fdb nexthops.
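
The two rejections above can be modelled in a few lines of userspace C.
This is only a sketch of the rule, not the kernel's implementation: the
struct nh_member type and the nh_group_check() helper are hypothetical
names introduced for illustration, and only the error strings are taken
from the output above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a group member; not the kernel's struct nexthop. */
struct nh_member {
	unsigned int id;
	bool is_fdb;	/* created with the "fdb" keyword */
};

/* Error strings copied from the command output above. */
static const char *const ERR_FDB_GRP =
	"FDB nexthop group can only have fdb nexthops.";
static const char *const ERR_NON_FDB_GRP =
	"Non FDB nexthop group cannot have fdb nexthops.";

/*
 * Return NULL when every member matches the group's FDB flag,
 * otherwise the error string for the first mismatching member.
 */
static const char *nh_group_check(bool group_is_fdb,
				  const struct nh_member *members, size_t n)
{
	assert(members != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (group_is_fdb && !members[i].is_fdb)
			return ERR_FDB_GRP;
		if (!group_is_fdb && members[i].is_fdb)
			return ERR_NON_FDB_GRP;
	}
	return NULL;
}
```

In this model, nexthop 1 (FDB) in a plain group and nexthop 3 (with a
device) in an FDB group are both rejected, mirroring the two examples.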

However, as long as no routes are pointing to a non-FDB nexthop group,
the kernel allows changing the type of a nexthop from FDB to non-FDB and
vice versa:

# ip nexthop add id 5 via 192.0.2.2 dev dummy1
# ip nexthop add id 6 group 5
# ip nexthop replace id 5 via 192.0.2.2 fdb
# echo $?
0

This configuration is invalid and can result in a NULL pointer
dereference (NPD) [1] since FDB nexthops are not associated with a
nexthop device:

# ip route add 198.51.100.1/32 nhid 6
# ping 198.51.100.1

Fix by preventing nexthop FDB status change while the nexthop is in a
group:

# ip nexthop add id 7 via 192.0.2.2 dev dummy1
# ip nexthop add id 8 group 7
# ip nexthop replace id 7 via 192.0.2.2 fdb
Error: Cannot change nexthop FDB status while in a group.

[1]
BUG: kernel NULL pointer dereference, address: 00000000000003c0
[...]
Oops: Oops: 0000 [#1] SMP
CPU: 6 UID: 0 PID: 367 Comm: ping Not tainted 6.17.0-rc6-virtme-gb65678cacc03 #1 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-4.fc41 04/01/2014
RIP: 0010:fib_lookup_good_nhc+0x1e/0x80
[...]
Call Trace:
<TASK>
fib_table_lookup+0x541/0x650
ip_route_output_key_hash_rcu+0x2ea/0x970
ip_route_output_key_hash+0x55/0x80
__ip4_datagram_connect+0x250/0x330
udp_connect+0x2b/0x60
__sys_connect+0x9c/0xd0
__x64_sys_connect+0x18/0x20
do_syscall_64+0xa4/0x2a0
entry_SYSCALL_64_after_hwframe+0x4b/0x53

Fixes: 38428d68719c ("nexthop: support for fdb ecmp nexthops")
Reported-by: syzbot+6596516dd2b635ba2350@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/68c9a4d2.050a0220.3c6139.0e63.GAE@google.com/
Tested-by: syzbot+6596516dd2b635ba2350@syzkaller.appspotmail.com
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250921150824.149157-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: allow alloc_skb_with_frags() to use MAX_SKB_FRAGS

Currently, alloc_skb_with_frags() will only fill (MAX_SKB_FRAGS - 1)
slots. I think it should use all MAX_SKB_FRAGS slots, as callers of
alloc_skb_with_frags() will size their allocation of frags based
on MAX_SKB_FRAGS.

This issue was discovered via a test patch that sets 'order' to 0
in alloc_skb_with_frags(), which effectively tests/simulates high
fragmentation. In this case sendmsg() on unix sockets will fail every
time for large allocations. With a 4K PAGE_SIZE, data_len requests 68K
(17 pages), but alloc_skb_with_frags() can allocate only 64K (16 pages)
in this case.
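
The arithmetic can be checked with a simplified model. This is a sketch
under stated assumptions, not the kernel function: frags_fit() is a
hypothetical helper, each frag slot is assumed to hold exactly one page
(the 'order' 0 case), and MODEL_MAX_SKB_FRAGS is set to 17 because that
value matches the 17-page request quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE     4096u	/* 4K pages, as in the message */
#define MODEL_MAX_SKB_FRAGS 17u		/* assumption: matches the 17-page request */

/*
 * Simplified model: with 'order' forced to 0, each frag slot holds
 * exactly one page. 'frag_limit' is the number of slots the allocator
 * is allowed to fill. Returns true when data_len bytes fit.
 */
static bool frags_fit(size_t data_len, unsigned int frag_limit)
{
	/* Round the request up to whole pages. */
	size_t pages = (data_len + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;

	assert(frag_limit <= MODEL_MAX_SKB_FRAGS);
	return pages <= frag_limit;
}
```

With a limit of MODEL_MAX_SKB_FRAGS - 1 the 68K request needs 17 pages
but only 16 slots are available, so it does not fit; with a limit of
MODEL_MAX_SKB_FRAGS it does.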

Fixes: 09c2c90705bb ("net: allow alloc_skb_with_frags() to allocate bigger packets")
Signed-off-by: Jason Baron <jbaron@akamai.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250922191957.2855612-1-jbaron@akamai.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>