cifs: client: fix memory leak in smb3_fs_context_parse_param
The user calls fsconfig twice, but when the program exits, free() only
frees ctx->source for the second fsconfig, not the first.
Regarding fc->source, there is no code in the fs context related to its
memory reclamation.
To fix this memory leak, release the source memory corresponding to ctx
or fc before each parsing.
Reported-by: syzbot+72afd4c236e6bc3f4bac@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=72afd4c236e6bc3f4bac Cc: stable@vger.kernel.org Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: Edward Adam Davis <eadavis@qq.com> Signed-off-by: Steve French <stfrench@microsoft.com>
smb: client: fix cifs_pick_channel when channel needs reconnect
cifs_pick_channel iterates candidate channels using cur. The
reconnect-state test mistakenly used a different variable.
This checked the wrong slot and would cause us to skip a healthy channel
and to dispatch on one that needs reconnect, occasionally failing
operations when a channel was down.
Fix by replacing for the correct variable.
Fixes: fc43a8ac396d ("cifs: cifs_pick_channel should try selecting active channels") Cc: stable@vger.kernel.org Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
Linus Torvalds [Sun, 9 Nov 2025 17:29:44 +0000 (09:29 -0800)]
Merge tag 'i2c-for-6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fix from Wolfram Sang:
"Two reverts merged into one commit to handle a regression caused by a
wrong cleanup because the underlying implications were unclear"
* tag 'i2c-for-6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: muxes: pca954x: Fix broken reset-gpio usage
Linus Torvalds [Sun, 9 Nov 2025 17:22:08 +0000 (09:22 -0800)]
Merge tag 'kbuild-fixes-6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux
Pull Kbuild fixes from Nathan Chancellor:
- Strip trailing padding bytes from modules.builtin.modinfo to fix
error during modules_install with certain versions of kmod
- Drop unused static inline function warning in .c files with clang
from W=1 to W=2
- Ensure kernel-doc.py invocations use the PYTHON3 make variable to
ensure user's choice of Python interpreter is always respected
* tag 'kbuild-fixes-6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux:
kbuild: Let kernel-doc.py use PYTHON3 override
compiler_types: Move unused static inline functions warning to W=2
kbuild: Strip trailing padding bytes from modules.builtin.modinfo
Jean Delvare [Fri, 7 Nov 2025 18:29:33 +0000 (19:29 +0100)]
kbuild: Let kernel-doc.py use PYTHON3 override
It is possible to force a specific version of python to be used when
building the kernel by passing PYTHON3= on the make command line.
However kernel-doc.py is currently called with python3 hard-coded and
thus ignores this setting.
Use $(PYTHON3) to run $(KERNELDOC) so that the desired version of
python is used.
Signed-off-by: Jean Delvare <jdelvare@suse.de> Reviewed-by: Nicolas Schier <nsc@kernel.org> Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Link: https://patch.msgid.link/20251107192933.2bfe9e57@endymion Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Linus Torvalds [Sat, 8 Nov 2025 23:37:03 +0000 (15:37 -0800)]
Merge tag 'drm-fixes-2025-11-09' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fix from Dave Airlie:
"Brown paper bag, the dma mask fix which I applied and actually looked
through for bad things, actually broke newer GPUs, there might be some
latent part in the boot path that is assuming 32-bit still, but we
will figure that out elsewhere.
nouveau:
- revert DMA mask change"
* tag 'drm-fixes-2025-11-09' of https://gitlab.freedesktop.org/drm/kernel:
Revert "drm/nouveau: set DMA mask before creating the flush page"
Linus Torvalds [Sat, 8 Nov 2025 23:34:23 +0000 (15:34 -0800)]
Merge tag 'rtc-6.18-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC fixes from Alexandre Belloni:
"The two reverts are for patches that I shouldn't have applied. The
rx8025 patch fixes an issue present since 2022:
Linus Torvalds [Sat, 8 Nov 2025 17:01:11 +0000 (09:01 -0800)]
Merge tag 'x86-urgent-2025-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
- Fix AMD PCI root device caching regression that triggers
on certain firmware variants
- Fix the zen5_rdseed_microcode[] array to be NULL-terminated
- Add more AMD models to microcode signature checking
* tag 'x86-urgent-2025-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode/AMD: Add more known models to entry sign checking
x86/CPU/AMD: Add missing terminator for zen5_rdseed_microcode
x86/amd_node: Fix AMD root device caching
Linus Torvalds [Sat, 8 Nov 2025 16:59:05 +0000 (08:59 -0800)]
Merge tag 'sched-urgent-2025-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Ingo Molnar:
"Fix a group-throttling bug in the fair scheduler"
* tag 'sched-urgent-2025-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Prevent cfs_rq from being unthrottled with zero runtime_remaining
Linus Torvalds [Sat, 8 Nov 2025 16:43:01 +0000 (08:43 -0800)]
Merge tag 'xfs-fixes-6.18-rc5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Carlos Maiolino:
"This contain fixes for the RT and zoned allocator, and a few fixes for
atomic writes"
* tag 'xfs-fixes-6.18-rc5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: free xfs_busy_extents structure when no RT extents are queued
xfs: fix zone selection in xfs_select_open_zone_mru
xfs: fix a rtgroup leak when xfs_init_zone fails
xfs: fix various problems in xfs_atomic_write_cow_iomap_begin
xfs: fix delalloc write failures in software-provided atomic writes
Tested the latest kernel on my GB203 and this seems to break it somehow.
Nov 09 04:16:14 bighp kernel: nouveau 0000:02:00.0: gsp: GSP-FMC boot failed (mbox: 0x0000000b)
Nov 09 04:16:14 bighp kernel: nouveau 0000:02:00.0: gsp: init failed, -5
Nov 09 04:16:14 bighp kernel: nouveau 0000:02:00.0: init failed with -5
Nov 09 04:16:14 bighp kernel: nouveau: drm:00000000:00000080: init failed with -5
Nov 09 04:16:14 bighp kernel: nouveau 0000:02:00.0: drm: Device allocation failed: -5
Nov 09 04:16:14 bighp kernel: nouveau 0000:02:00.0: probe with driver nouveau failed with error -5
Not sure why, I went over the patch and thought it should have worked, but there must be some
32-bit problem maybe in the FMC boot path.
Pavel Begunkov [Fri, 7 Nov 2025 18:41:26 +0000 (18:41 +0000)]
io_uring: fix regbuf vector size truncation
There is a report of io_estimate_bvec_size() truncating the calculated
number of segments that leads to corruption issues. Check it doesn't
overflow "int"s used later. Rough but simple, can be improved on top.
Cc: stable@vger.kernel.org Fixes: 9ef4cbbcb4ac3 ("io_uring: add infra for importing vectored reg buffers") Reported-by: Google Big Sleep <big-sleep-vuln-reports+bigsleep-458654612@google.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Günther Noack <gnoack@google.com> Tested-by: Günther Noack <gnoack@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Fri, 7 Nov 2025 22:51:11 +0000 (14:51 -0800)]
Merge tag 'drm-fixes-2025-11-08' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Back from travel, thanks to Simona for handling things. regular fixes,
seems about the right size, but spread out a bit.
amdgpu has the usual range of fixes, xe has a few fixes, and nouveau
has a couple of fixes, one for blackwell modifiers on 8/16 bit
surfaces.
Otherwise a few small fixes for mediatek, sched, imagination and
pixpaper.
xe:
- Fix missing synchronization on unbind
- Fix device shutdown when doing FLR
- Fix user fence signaling order
i915:
- Avoid lock inversion when pinning to GGTT on CHV/BXT+VTD
- Fix conversion between clock ticks and nanoseconds
mediatek:
- Disable AFBC support on Mediatek DRM driver
- Add pm_runtime support for GCE power control
imagination:
- kconfig: Fix dependencies
nouveau:
- Set DMA mask earlier
- Advertize correct modifiers for GB20x
pixpaper:
- kconfig: Fix dependencies"
* tag 'drm-fixes-2025-11-08' of https://gitlab.freedesktop.org/drm/kernel: (26 commits)
drm/xe: Enforce correct user fence signaling order using
drm/xe: Do clean shutdown also when using flr
drm/xe: Move declarations under conditional branch
drm/xe/guc: Synchronize Dead CT worker with unbind
drm/amd/display: Enable mst when it's detected but yet to be initialized
drm/amdgpu: Fix wait after reset sequence in S3
drm/amd: Fix suspend failure with secure display TA
drm/amdgpu: fix gpu page fault after hibernation on PF passthrough
drm/tiny: pixpaper: add explicit dependency on MMU
drm/nouveau: Advertise correct modifiers on GB20x
drm: define NVIDIA DRM format modifiers for GB20x
drm/nouveau: set DMA mask before creating the flush page
drm/sched: Fix deadlock in drm_sched_entity_kill_jobs_cb
drm/amd/display: Fix NULL deref in debugfs odm_combine_segments
drm/amdkfd: Don't clear PT after process killed
drm/amdgpu/smu: Handle S0ix for vangogh
drm/amdgpu: Drop PMFW RLC notifier from amdgpu_device_suspend()
drm/amd/display: Fix black screen with HDMI outputs
drm/amd/display: Don't stretch non-native images by default in eDP
drm/amd/pm: fix missing device_attr cleanup in amdgpu_pm_sysfs_init()
...
Linus Torvalds [Fri, 7 Nov 2025 21:19:18 +0000 (13:19 -0800)]
Merge tag 'parisc-for-6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fix from Helge Deller:
- fix crash triggered by unaligned access in parisc unwinder
* tag 'parisc-for-6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Avoid crash due to unaligned access in unwinder
Linus Torvalds [Fri, 7 Nov 2025 21:13:09 +0000 (13:13 -0800)]
Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd
Pull iommufd fixes from Jason Gunthorpe:
- Syzkaller found a case where maths overflows can cause divide by 0
- Typo in a compiler bug warning fix in the selftests broke the
selftests
- type1 compatability had a mismatch when unmapping an already unmapped
range, it should succeed
* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
iommufd: Make vfio_compat's unmap succeed if the range is already empty
iommufd/selftest: Fix ioctl return value in _test_cmd_trigger_vevents()
iommufd: Don't overflow during division for dirty tracking
Peter Zijlstra [Thu, 6 Nov 2025 10:50:00 +0000 (11:50 +0100)]
compiler_types: Move unused static inline functions warning to W=2
Per Nathan, clang catches unused "static inline" functions in C files
since commit 6863f5643dd7 ("kbuild: allow Clang to find unused static
inline functions for W=1 build").
Linus said:
> So I entirely ignore W=1 issues, because I think so many of the extra
> warnings are bogus.
>
> But if this one in particular is causing more problems than most -
> some teams do seem to use W=1 as part of their test builds - it's fine
> to send me a patch that just moves bad warnings to W=2.
>
> And if anybody uses W=2 for their test builds, that's THEIR problem..
Here is the change to bump the warning from W=1 to W=2.
Fixes: 6863f5643dd7 ("kbuild: allow Clang to find unused static inline functions for W=1 build") Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/20251106105000.2103276-1-andriy.shevchenko@linux.intel.com
[nathan: Adjust comment as well] Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Linus Torvalds [Fri, 7 Nov 2025 16:10:55 +0000 (08:10 -0800)]
Merge tag 'gpio-fixes-for-v6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- use the firmware node of the GPIO chip, not its label for software
node lookup
- fix invalid pointer access in GPIO debugfs
- drop unused functions from gpio-tb10x
- fix a regression in gpio-aggregator: restore the set_config()
callback in the driver
- correct schema $id path in ti,twl4030 DT bindings
* tag 'gpio-fixes-for-v6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: tb10x: Drop unused tb10x_set_bits() function
gpio: aggregator: restore the set_config operation
gpiolib: fix invalid pointer access in debugfs
gpio: swnode: don't use the swnode's name as the key for GPIO lookup
dt-bindings: gpio: ti,twl4030: Correct the schema $id path
Linus Torvalds [Fri, 7 Nov 2025 16:07:11 +0000 (08:07 -0800)]
Merge tag 'trace-v6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Check for reader catching up in ring_buffer_map_get_reader()
If the reader catches up to the writer in the memory mapped ring
buffer then calling rb_get_reader_page() will return NULL as there's
no pages left. But this isn't checked for before calling
rb_get_reader_page() and the return of NULL causes a warning.
If it is detected that the reader caught up to the writer, then
simply exit the routine
- Fix memory leak in histogram create_field_var()
The couple of the error paths in create_field_var() did not properly
clean up what was allocated. Make sure everything is freed properly
on error
- Fix help message of tools latency_collector
The help message incorrectly stated that "-t" was the same as
"--threads" whereas "--threads" is actually represented by "-e"
* tag 'trace-v6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/tools: Fix incorrcet short option in usage text for --threads
tracing: Fix memory leaks in create_field_var()
ring-buffer: Do not warn in ring_buffer_map_get_reader() when reader catches up
Linus Torvalds [Fri, 7 Nov 2025 15:52:45 +0000 (07:52 -0800)]
Merge tag 'io_uring-6.18-20251106' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull io_uring fixes from Jens Axboe:
- Remove the sync refill API that was added in this release, in
anticipation of doing it in a better way for the next release
- Fix type extension for calculating size off nr_pages, like we do
in other spots
* tag 'io_uring-6.18-20251106' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
io_uring: fix types for region size calulation
io_uring/zcrx: remove sync refill uapi
Linus Torvalds [Fri, 7 Nov 2025 15:47:08 +0000 (07:47 -0800)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"All fixes in the UFS driver.
The big contributor to the diffstats is the Intel controller S0ix/S3
fix which has to special case the suspend/resume patch for intel
controllers in ufshcd-pci.c"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: core: Fix invalid probe error return value
scsi: ufs: ufs-pci: Set UFSHCD_QUIRK_PERFORM_LINK_STARTUP_ONCE for Intel ADL
scsi: ufs: core: Add a quirk to suppress link_startup_again
scsi: ufs: ufs-pci: Fix S0ix/S3 for Intel controllers
scsi: ufs: core: Revert "Make HID attributes visible"
scsi: ufs: core: Reduce link startup failure logging
scsi: ufs: core: Fix a race condition related to the "hid" attribute group
scsi: ufs: ufs-qcom: Fix UFS OCP issue during UFS power down (PC=3)
Linus Torvalds [Fri, 7 Nov 2025 15:39:57 +0000 (07:39 -0800)]
Merge tag 'v6.18-rc4-smb-server-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
- More safely detect RDMA capable devices correctly
* tag 'v6.18-rc4-smb-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: detect RDMA capable netdevs include IPoIB
ksmbd: detect RDMA capable lower devices when bridge and vlan netdev is used
Zhang Chujun [Thu, 6 Nov 2025 03:10:40 +0000 (11:10 +0800)]
tracing/tools: Fix incorrcet short option in usage text for --threads
The help message incorrectly listed '-t' as the short option for
--threads, but the actual getopt_long configuration uses '-e'.
This mismatch can confuse users and lead to incorrect command-line
usage. This patch updates the usage string to correctly show:
"-e, --threads NRTHR"
to match the implementation.
Note: checkpatch.pl reports a false-positive spelling warning on
'Run', which is intentional.
Matthew Brost [Fri, 31 Oct 2025 23:40:45 +0000 (16:40 -0700)]
drm/xe: Enforce correct user fence signaling order using
Prevent application hangs caused by out-of-order fence signaling when
user fences are attached. Use drm_syncobj (via dma-fence-chain) to
guarantee that each user fence signals in order, regardless of the
signaling order of the attached fences. Ensure user fence writebacks to
user space occur in the correct sequence.
Jouni Högander [Fri, 31 Oct 2025 12:23:11 +0000 (14:23 +0200)]
drm/xe: Do clean shutdown also when using flr
Currently Xe driver is triggering flr without any clean-up on
shutdown. This is causing random warnings from pending related works as the
underlying hardware is reset in the middle of their execution.
Fix this by performing clean shutdown also when using flr.
Fixes: 501d799a47e2 ("drm/xe: Wire up device shutdown handler") Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Jouni Högander <jouni.hogander@intel.com> Reviewed-by: Maarten Lankhorst <dev@lankhorst.se> Link: https://patch.msgid.link/20251031122312.1836534-1-jouni.hogander@intel.com Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
(cherry picked from commit a4ff26b7c8ef38e4dd34f77cbcd73576fdde6dd4) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Tejas Upadhyay [Tue, 7 Oct 2025 10:02:08 +0000 (15:32 +0530)]
drm/xe: Move declarations under conditional branch
The xe_device_shutdown() function was needing a few declarations
that were only required under a specific condition. This change
moves those declarations to be within that conditional branch
to avoid unnecessary declarations.
drm/xe/guc: Synchronize Dead CT worker with unbind
Cancel and wait for any Dead CT worker to complete before continuing
with device unbinding. Else the worker will end up using resources freed
by the undind operation.
Cc: Zhanjun Dong <zhanjun.dong@intel.com> Fixes: d2c5a5a926f4 ("drm/xe/guc: Dead CT helper") Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://patch.msgid.link/20251103123144.3231829-6-balasubramani.vivekanandan@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 492671339114e376aaa38626d637a2751cdef263) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Zilin Guan [Thu, 6 Nov 2025 12:01:32 +0000 (12:01 +0000)]
tracing: Fix memory leaks in create_field_var()
The function create_field_var() allocates memory for 'val' through
create_hist_field() inside parse_atom(), and for 'var' through
create_var(), which in turn allocates var->type and var->var.name
internally. Simply calling kfree() to release these structures will
result in memory leaks.
Use destroy_hist_field() to properly free 'val', and explicitly release
the memory of var->type and var->var.name before freeing 'var' itself.
Steven Rostedt [Thu, 16 Oct 2025 17:28:48 +0000 (13:28 -0400)]
ring-buffer: Do not warn in ring_buffer_map_get_reader() when reader catches up
The function ring_buffer_map_get_reader() is a bit more strict than the
other get reader functions, and except for certain situations the
rb_get_reader_page() should not return NULL. If it does, it triggers a
warning.
This warning was triggering but after looking at why, it was because
another acceptable situation was happening and it wasn't checked for.
If the reader catches up to the writer and there's still data to be read
on the reader page, then the rb_get_reader_page() will return NULL as
there's no new page to get.
In this situation, the reader page should not be updated and no warning
should trigger.
Linus Torvalds [Fri, 7 Nov 2025 00:24:12 +0000 (16:24 -0800)]
Merge tag 'probes-fixes-v6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probe fixes from Masami Hiramatsu:
- tprobe-events: Fix to register tracepoint correctly
tprobe-events missed to set tracepoint data structure before
registering callback when enabling it. This sets it correctly.
- tprobe-events: Fix to put tracepoint_user when disable the event
tprobe-events missed to unregister tracepoint callback when the event
is disabled. This ensures to unregister it.
* tag 'probes-fixes-v6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: tprobe-events: Fix to put tracepoint_user when disable the tprobe
tracing: tprobe-events: Fix to register tracepoint correctly
* tag 'perf-tools-fixes-for-v6.18-1-2025-11-06' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
perf symbols: Handle '1' symbols in /proc/kallsyms
tools headers asm: Sync fls headers header with the kernel sources
tools headers UAPI: Sync KVM's vmx.h header with the kernel sources to handle new exit reasons
tools headers svm: Sync svm headers with the kernel sources
tools headers UAPI: Sync x86's asm/kvm.h with the kernel sources
MAINTAINERS: Add James Clark as a perf tools reviewer
tools headers UAPI: Sync linux/kvm.h with the kernel sources
tools headers UAPI: Update tools's copy of drm.h to pick DRM_IOCTL_GEM_CHANGE_HANDLE
tools headers x86 cpufeatures: Sync with the kernel sources
tools headers x86: Sync table due to introducion of uprobe syscall
tools headers: Sync uapi/linux/fcntl.h with the kernel sources
tools headers: Sync uapi/linux/prctl.h with the kernel source
tools headers uapi: Update fs.h with the kernel sources
tools arch x86: Sync msr-index.h to pick AMD64_{PERF_CNTR_GLOBAL_STATUS_SET,SAVIC_CONTROL}, IA32_L3_QOS_{ABMC,EXT}_CFG
Linus Torvalds [Thu, 6 Nov 2025 23:44:18 +0000 (15:44 -0800)]
Merge tag 'riscv-for-linus-6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Paul Walmsley:
- A fix to disable KASAN checks while walking a non-current task's
stackframe (following x86)
- A fix for a kvrealloc()-related memory leak in
module_frob_arch_sections()
- Two replacements of strcpy() with strscpy()
- A change to use the RISC-V .insn assembler directive when possible to
assemble instructions from hex opcodes
- Some low-impact fixes in the ptdump code and kprobes test code
* tag 'riscv-for-linus-6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
cpuidle: riscv-sbi: Replace deprecated strcpy in sbi_cpuidle_init_cpu
riscv: KGDB: Replace deprecated strcpy in kgdb_arch_handle_qxfer_pkt
riscv: asm: use .insn for making custom instructions
riscv: tests: Make RISCV_KPROBES_KUNIT tristate
riscv: tests: Rename kprobes_test_riscv to kprobes_riscv
riscv: Fix memory leak in module_frob_arch_sections()
riscv: ptdump: use seq_puts() in pt_dump_seq_puts() macro
riscv: stacktrace: Disable KASAN checks for non-current tasks
Linus Torvalds [Thu, 6 Nov 2025 23:40:14 +0000 (15:40 -0800)]
Merge tag 'acpi-6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These fix a coding mistake in the ACPI Smart Battery Subsystem (SBS)
driver and two documentation issues:
- Fix computation of the battery->present value in acpi_battery_read()
to work when battery->id is not zero (Dan Carpenter)
- Fix comment typo in the ACPI CPPC library (Chu Guangqing)
- Fix I2C device references in two ASL examples in the firmware guide
that were broken by a previous update (Jonas Gorski)"
* tag 'acpi-6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: SBS: Fix present test in acpi_battery_read()
ACPI: CPPC: Fix typo in a comment
Documentation: ACPI: i2c-muxes: fix I2C device references
tracing: tprobe-events: Fix to put tracepoint_user when disable the tprobe
__unregister_trace_fprobe() checks tf->tuser to put it when removing
tprobe. However, disable_trace_fprobe() does not use it and only calls
unregister_fprobe(). Thus it forgets to disable tracepoint_user.
If the trace_fprobe has tuser, put it for unregistering the tracepoint
callbacks when disabling tprobe correctly.
tracing: tprobe-events: Fix to register tracepoint correctly
Since __tracepoint_user_init() calls tracepoint_user_register() without
initializing tuser->tpoint with given tracpoint, it does not register
tracepoint stub function as callback correctly, and tprobe does not work.
Initializing tuser->tpoint correctly before tracepoint_user_register()
so that it sets up tracepoint callback.
Merge two documentation fixes for 6.18-rc5, a commet typo fix in the
ACPI CPPC library (Chu Guangqing) and fixes for two ASL examples in the
firmware guide (Jonas Gorski).
Linus Torvalds [Thu, 6 Nov 2025 20:48:18 +0000 (12:48 -0800)]
Merge tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull crypto library fixes from Eric Biggers:
"Two Curve25519 related fixes:
- Re-enable KASAN support on curve25519-hacl64.c with gcc.
- Disable the arm optimized Curve25519 code on CPU_BIG_ENDIAN
kernels. It has always been broken in that configuration"
* tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
lib/crypto: arm/curve25519: Disable on CPU_BIG_ENDIAN
lib/crypto: curve25519-hacl64: Fix older clang KASAN workaround for GCC
Linus Torvalds [Thu, 6 Nov 2025 20:45:15 +0000 (12:45 -0800)]
Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux
Pull fscrypt fix from Eric Biggers:
"Fix an UBSAN warning that started occurring when the block layer
started supporting logical_block_size > PAGE_SIZE"
* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux:
fscrypt: fix left shift underflow when inode->i_blkbits > PAGE_SHIFT
Wayne Lin [Wed, 5 Nov 2025 02:36:31 +0000 (10:36 +0800)]
drm/amd/display: Enable mst when it's detected but yet to be initialized
[Why]
drm_dp_mst_topology_queue_probe() is used under the assumption that
mst is already initialized. If we connect system with SST first
then switch to the mst branch during suspend, we will fail probing
topology by calling the wrong API since the mst manager is yet to
be initialized.
[How]
At dm_resume(), once it's detected as mst branc connected, check if
the mst is initialized already. If not, call
dm_helpers_dp_mst_start_top_mgr() instead to initialize mst
V2: Adjust the commit msg a bit
Fixes: bc068194f548 ("drm/amd/display: Don't write DP_MSTM_CTRL after LT") Cc: Fangzhi Zuo <jerry.zuo@amd.com> Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 62320fb8d91a0bddc44a228203cfa9bfbb5395bd) Cc: stable@vger.kernel.org
drm/amd: Fix suspend failure with secure display TA
commit c760bcda83571 ("drm/amd: Check whether secure display TA loaded
successfully") attempted to fix extra messages, but failed to port the
cleanup that was in commit 5c6d52ff4b61e ("drm/amd: Don't try to enable
secure display TA multiple times") to prevent multiple tries.
Add that to the failure handling path even on a quick failure.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4679 Fixes: c760bcda8357 ("drm/amd: Check whether secure display TA loaded successfully") Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 4104c0a454f6a4d1e0d14895d03c0e7bdd0c8240)
Samuel Zhang [Wed, 5 Nov 2025 03:04:08 +0000 (03:04 +0000)]
drm/amdgpu: fix gpu page fault after hibernation on PF passthrough
On PF passthrough environment, after hibernate and then resume, coralgemm
will cause gpu page fault.
Mode1 reset happens during hibernate, but partition mode is not restored
on resume, register mmCP_HYP_XCP_CTL and mmCP_PSP_XCP_CTL is not right
after resume. When CP access the MQD BO, wrong stride size is used,
this will cause out of bound access on the MQD BO, resulting page fault.
The fix is to ensure gfx_v9_4_3_switch_compute_partition() is called
when resume from a hibernation.
KFD resume is called separately during a reset recovery or resume from
suspend sequence. Hence it's not required to be called as part of
partition switch.
Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 5d1b32cfe4a676fe552416cb5ae847b215463a1a)
Linus Torvalds [Thu, 6 Nov 2025 16:52:30 +0000 (08:52 -0800)]
Merge tag 'net-6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
Including fixes from bluetooth and wireless.
Current release - new code bugs:
- ptp: expose raw cycles only for clocks with free-running counter
- bonding: fix null-deref in actor_port_prio setting
- mdio: ERR_PTR-check regmap pointer returned by
device_node_to_regmap()
- eth: libie: depend on DEBUG_FS when building LIBIE_FWLOG
Previous releases - regressions:
- virtio_net: fix perf regression due to bad alignment of
virtio_net_hdr_v1_hash
- Revert "wifi: ath10k: avoid unnecessary wait for service ready
message" caused regressions for QCA988x and QCA9984
- Revert "wifi: ath12k: Fix missing station power save configuration"
caused regressions for WCN7850
- eth: bnxt_en: shutdown FW DMA in bnxt_shutdown(), fix memory
corruptions after kexec
Previous releases - always broken:
- virtio-net: fix received packet length check for big packets
- sctp: fix races in socket diag handling
- wifi: add an hrtimer-based delayed work item to avoid low
granularity of timers set relatively far in the future, and use it
where it matters (e.g. when performing AP-scheduled channel switch)
- eth: mlx5e:
- correctly propagate error in case of module EEPROM read failure
- fix HW-GRO on systems with PAGE_SIZE == 64kB
- dsa: b53: fixes for tagging, link configuration / RMII, FDB,
multicast
- phy: lan8842: implement latest errata"
* tag 'net-6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (63 commits)
selftests/vsock: avoid false-positives when checking dmesg
net: bridge: fix MST static key usage
net: bridge: fix use-after-free due to MST port state bypass
lan966x: Fix sleeping in atomic context
bonding: fix NULL pointer dereference in actor_port_prio setting
net: dsa: microchip: Fix reserved multicast address table programming
net: wan: framer: pef2256: Switch to devm_mfd_add_devices()
net: libwx: fix device bus LAN ID
net/mlx5e: SHAMPO, Fix header formulas for higher MTUs and 64K pages
net/mlx5e: SHAMPO, Fix skb size check for 64K pages
net/mlx5e: SHAMPO, Fix header mapping for 64K pages
net: ti: icssg-prueth: Fix fdb hash size configuration
net/mlx5e: Fix return value in case of module EEPROM read error
net: gro_cells: Reduce lock scope in gro_cell_poll
libie: depend on DEBUG_FS when building LIBIE_FWLOG
wifi: mac80211_hwsim: Limit destroy_on_close radio removal to netgroup
netpoll: Fix deadlock in memory allocation under spinlock
net: ethernet: ti: netcp: Standardize knav_dma_open_channel to return NULL on error
virtio-net: fix received length check in big packets
bnxt_en: Fix warning in bnxt_dl_reload_down()
...
kbuild: Strip trailing padding bytes from modules.builtin.modinfo
After commit d50f21091358 ("kbuild: align modinfo section for Secureboot
Authenticode EDK2 compat"), running modules_install with certain
versions of kmod (such as 29.1 in Ubuntu Jammy) in certain
configurations may fail with:
depmod: ERROR: kmod_builtin_iter_next: unexpected string without modname prefix
The additional padding bytes to ensure .modinfo is aligned within
vmlinux.unstripped are unexpected by kmod, as this section has always
just been null-terminated strings.
Strip the trailing padding bytes from modules.builtin.modinfo after it
has been extracted from vmlinux.unstripped to restore the format that
kmod expects while keeping .modinfo aligned within vmlinux.unstripped to
avoid regressing the Authenticode calculation fix for EDK2.
Bobby Eshleman [Wed, 5 Nov 2025 15:59:19 +0000 (07:59 -0800)]
selftests/vsock: avoid false-positives when checking dmesg
Sometimes VMs will have some intermittent dmesg warnings that are
unrelated to vsock. Change the dmesg parsing to filter on strings
containing 'vsock' to avoid false positive failures that are unrelated
to vsock. The downside is that it is possible for some vsock related
warnings to not contain the substring 'vsock', so those will be missed.
Fixes: a4a65c6fe08b ("selftests/vsock: add initial vmtest.sh for vsock") Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20251105-vsock-vmtest-dmesg-fix-v2-1-1a042a14892c@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 6 Nov 2025 15:32:19 +0000 (07:32 -0800)]
Merge branch 'net-bridge-fix-two-mst-bugs'
Nikolay Aleksandrov says:
====================
net: bridge: fix two MST bugs
Patch 01 fixes a race condition that exists between expired fdb deletion
and port deletion when MST is enabled. Learning can happen after the
port's state has been changed to disabled which could lead to that
port's memory being used after it's been freed. The issue was reported
by syzbot, more information in patch 01. Patch 02 fixes an issue with
MST's static key which Ido spotted, we can have multiple bridges with MST
and a single bridge can erroneously disable it for all.
====================
As Ido pointed out, the static key usage in MST is buggy and should use
inc/dec instead of enable/disable because we can have multiple bridges
with MST enabled which means a single bridge can disable MST for all.
Use static_branch_inc/dec to avoid that. When destroying a bridge decrement
the key if MST was enabled.
Fixes: ec7328b59176 ("net: bridge: mst: Multiple Spanning Tree (MST) mode") Reported-by: Ido Schimmel <idosch@nvidia.com> Closes: https://lore.kernel.org/netdev/20251104120313.1306566-1-razor@blackwall.org/T/#m6888d87658f94ed1725433940f4f4ebb00b5a68b Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20251105111919.1499702-3-razor@blackwall.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: bridge: fix use-after-free due to MST port state bypass
syzbot reported[1] a use-after-free when deleting an expired fdb. It is
due to a race condition between learning still happening and a port being
deleted, after all its fdbs have been flushed. The port's state has been
toggled to disabled so no learning should happen at that time, but if we
have MST enabled, it will bypass the port's state, that together with VLAN
filtering disabled can lead to fdb learning at a time when it shouldn't
happen while the port is being deleted. VLAN filtering must be disabled
because we flush the port VLANs when it's being deleted which will stop
learning. This fix adds a check for the port's vlan group which is
initialized to NULL when the port is getting deleted, that avoids the port
state bypass. When MST is enabled there would be a minimal new overhead
in the fast-path because the port's vlan group pointer is cache-hot.
Horatiu Vultur [Wed, 5 Nov 2025 07:49:55 +0000 (08:49 +0100)]
lan966x: Fix sleeping in atomic context
The following warning was seen when we try to connect using ssh to the device.
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:575
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 104, name: dropbear
preempt_count: 1, expected: 0
INFO: lockdep is turned off.
CPU: 0 UID: 0 PID: 104 Comm: dropbear Tainted: G W 6.18.0-rc2-00399-g6f1ab1b109b9-dirty #530 NONE
Tainted: [W]=WARN
Hardware name: Generic DT based system
Call trace:
unwind_backtrace from show_stack+0x10/0x14
show_stack from dump_stack_lvl+0x7c/0xac
dump_stack_lvl from __might_resched+0x16c/0x2b0
__might_resched from __mutex_lock+0x64/0xd34
__mutex_lock from mutex_lock_nested+0x1c/0x24
mutex_lock_nested from lan966x_stats_get+0x5c/0x558
lan966x_stats_get from dev_get_stats+0x40/0x43c
dev_get_stats from dev_seq_printf_stats+0x3c/0x184
dev_seq_printf_stats from dev_seq_show+0x10/0x30
dev_seq_show from seq_read_iter+0x350/0x4ec
seq_read_iter from seq_read+0xfc/0x194
seq_read from proc_reg_read+0xac/0x100
proc_reg_read from vfs_read+0xb0/0x2b0
vfs_read from ksys_read+0x6c/0xec
ksys_read from ret_fast_syscall+0x0/0x1c
Exception stack(0xf0b11fa8 to 0xf0b11ff0)
1fa0: 000000010000100000000008be9048d80000100000000001
1fc0: 00000001000010000000000800000003be9059200000001e0000000000000001
1fe0: 0005404cbe9048c000018684b6ec2cd8
It seems that we are using a mutex in a atomic context which is wrong.
Change the mutex with a spinlock.
Fixes: 12c2d0a5b8e2 ("net: lan966x: add ethtool configuration and statistics") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20251105074955.1766792-1-horatiu.vultur@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Hangbin Liu [Wed, 5 Nov 2025 07:26:20 +0000 (07:26 +0000)]
bonding: fix NULL pointer dereference in actor_port_prio setting
Liang reported an issue where setting a slave’s actor_port_prio to
predefined values such as 0, 255, or 65535 would cause a system crash.
The problem occurs because in bond_opt_parse(), when the provided value
matches a predefined table entry, the function returns that table entry,
which does not contain slave information. Later, in
bond_option_actor_port_prio_set(), calling bond_slave_get_rtnl() leads
to a NULL pointer dereference.
Since actor_port_prio is defined as a u16 and initialized to the default
value of 255 in ad_initialize_port(), there is no need for the
bond_actor_port_prio_tbl. Using the BOND_OPTFLAG_RAWVAL flag is sufficient.
Fixes: 6b6dc81ee7e8 ("bonding: add support for per-port LACP actor priority") Reported-by: Liang Li <liali@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20251105072620.164841-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
KSZ9477/KSZ9897 and LAN937X families of switches use a reserved multicast
address table for some specific forwarding with some multicast addresses,
like the one used in STP. The hardware assumes the host port is the last
port in KSZ9897 family and port 5 in LAN937X family. Most of the time
this assumption is correct but not in other cases like KSZ9477.
Originally the function just setups the first entry, but the others still
need update, especially for one common multicast address that is used by
PTP operation.
LAN937x also uses different register bits when accessing the reserved
table.
Fixes: 457c182af597 ("net: dsa: microchip: generic access to ksz9477 static and reserved table") Signed-off-by: Tristram Ha <tristram.ha@microchip.com> Tested-by: Łukasz Majewski <lukma@nabladev.com> Link: https://patch.msgid.link/20251105033741.6455-1-Tristram.Ha@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
LiangCheng Wang [Tue, 28 Oct 2025 02:55:38 +0000 (10:55 +0800)]
drm/tiny: pixpaper: add explicit dependency on MMU
The DRM_GEM_SHMEM_HELPER helper requires MMU enabled because it uses
vmf_insert_pfn() in its mmap implementation. On NOMMU configurations
(e.g. some RISC-V randconfig builds), this symbol is unavailable and
selecting DRM_GEM_SHMEM_HELPER causes a modpost undefined reference:
Normally, Kconfig prevents this helper from being selected when
CONFIG_MMU=n. However, in some randconfig builds (such as those used by
0day CI), select statements can override unmet dependencies, triggering
the issue.
Add an explicit dependency on MMU to DRM_PIXPAPER to prevent this.
Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202510280213.0rlYA4T3-lkp@intel.com/ Fixes: 0c4932f6ddf8 ("drm/tiny: pixpaper: Fix missing dependency on DRM_GEM_SHMEM_HELPER") Acked-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: LiangCheng Wang <zaq14760@gmail.com> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patch.msgid.link/20251028-bar-v1-1-edfbd13fafff@gmail.com
Peter Zijlstra [Wed, 16 Jul 2025 14:29:46 +0000 (16:29 +0200)]
futex: Optimize per-cpu reference counting
Shrikanth noted that the per-cpu reference counter was still some 10%
slower than the old immutable option (which removes the reference
counting entirely).
Further optimize the per-cpu reference counter by:
- switching from RCU to preempt;
- using __this_cpu_*() since we now have preempt disabled;
- switching from smp_load_acquire() to READ_ONCE().
This is all safe because disabling preemption inhibits the RCU grace
period exactly like rcu_read_lock().
Having preemption disabled allows using __this_cpu_*() provided the
only access to the variable is in task context -- which is the case
here.
Furthermore, since we know changing fph->state to FR_ATOMIC demands a
full RCU grace period we can rely on the implied smp_mb() from that to
replace the acquire barrier().
This is very similar to the percpu_down_read_internal() fast-path.
The reason this is significant for PowerPC is that it uses the generic
this_cpu_*() implementation which relies on local_irq_disable() (the
x86 implementation relies on it being a single memop instruction to be
IRQ-safe). Switching to preempt_disable() and __this_cpu*() avoids
this IRQ state swizzling. Also, PowerPC needs LWSYNC for the ACQUIRE
barrier, not having to use explicit barriers safes a bunch.
Combined this reduces the performance gap by half, down to some 5%.
Fixes: 760e6f7befba ("futex: Remove support for IMMUTABLE") Reported-by: Shrikanth Hegde <sshegde@linux.ibm.com> Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20251106092929.GR4067720@noisy.programming.kicks-ass.net
Aaron Lu [Thu, 30 Oct 2025 03:27:55 +0000 (11:27 +0800)]
sched/fair: Prevent cfs_rq from being unthrottled with zero runtime_remaining
When a cfs_rq is to be throttled, its limbo list should be empty and
that's why there is a warn in tg_throttle_down() for non empty
cfs_rq->throttled_limbo_list.
When running a test with the following hierarchy:
root
/ \
A* ...
/ | \ ...
B
/ \
C*
where both A and C have quota settings, that warn on non empty limbo list
is triggered for a cfs_rq of C, let's call it cfs_rq_c(and ignore the cpu
part of the cfs_rq for the sake of simpler representation).
Debug showed it happened like this:
Task group C is created and quota is set, so in tg_set_cfs_bandwidth(),
cfs_rq_c is initialized with runtime_enabled set, runtime_remaining
equals to 0 and *unthrottled*. Before any tasks are enqueued to cfs_rq_c,
*multiple* throttled tasks can migrate to cfs_rq_c (e.g., due to task
group changes). When enqueue_task_fair(cfs_rq_c, throttled_task) is
called and cfs_rq_c is in a throttled hierarchy (e.g., A is throttled),
these throttled tasks are directly placed into cfs_rq_c's limbo list by
enqueue_throttled_task().
Later, when A is unthrottled, tg_unthrottle_up(cfs_rq_c) enqueues these
tasks. The first enqueue triggers check_enqueue_throttle(), and with zero
runtime_remaining, cfs_rq_c can be throttled in throttle_cfs_rq() if it
can't get more runtime and enters tg_throttle_down(), where the warning
is hit due to remaining tasks in the limbo list.
I think it's a chaos to trigger throttle on unthrottle path, the status
of a being unthrottled cfs_rq can be in a mixed state in the end, so fix
this by granting 1ns to cfs_rq in tg_set_cfs_bandwidth(). This ensures
cfs_rq_c has a positive runtime_remaining when initialized as unthrottled
and cannot enter tg_unthrottle_up() with zero runtime_remaining.
Also, update outdated comments in tg_throttle_down() since
unthrottle_cfs_rq() is no longer called with zero runtime_remaining.
While at it, remove a redundant assignment to se in tg_throttle_down().
Fixes: e1fad12dcb66 ("sched/fair: Switch to task based throttle model") Reviewed-By: Benjamin Segall <bsegall@google.com> Suggested-by: Benjamin Segall <bsegall@google.com> Signed-off-by: Aaron Lu <ziqianlu@bytedance.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com> Tested-by: Hao Jia <jiahao1@lixiang.com> Link: https://patch.msgid.link/20251030032755.560-1-ziqianlu@bytedance.com
xfs: free xfs_busy_extents structure when no RT extents are queued
kmemleak occasionally reports leaking xfs_busy_extents structure
from xfs_scrub calls after running xfs/528 (but attributed to following
tests), which seems to be caused by not freeing the xfs_busy_extents
structure when tr.queued is 0 and xfs_trim_rtgroup_extents breaks out
of the main loop. Free the structure in this case.
Fixes: a3315d11305f ("xfs: use rtgroup busy extent list for FITRIM") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Vlastimil Babka [Mon, 3 Nov 2025 12:24:15 +0000 (13:24 +0100)]
slab: prevent infinite loop in kmalloc_nolock() with debugging
In review of a followup work, Harry noticed a potential infinite loop.
Upon closed inspection, it already exists for kmalloc_nolock() on a
cache with debugging enabled, since commit af92793e52c3 ("slab:
Introduce kmalloc_nolock() and kfree_nolock().")
When alloc_single_from_new_slab() fails to trylock node list_lock, we
keep retrying to get partial slab or allocate a new slab. If we indeed
interrupted somebody holding the list_lock, the trylock fill fail
deterministically and we end up allocating and defer-freeing slabs
indefinitely with no progress.
To fix it, fail the allocation if spinning is not allowed. This is
acceptable in the restricted context of kmalloc_nolock(), especially
with debugging enabled.
Reported-by: Harry Yoo <harry.yoo@oracle.com> Closes: https://lore.kernel.org/all/aQLqZjjq1SPD3Fml@hyeyoo/ Fixes: af92793e52c3 ("slab: Introduce kmalloc_nolock() and kfree_nolock().") Acked-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Link: https://patch.msgid.link/20251103-fix-nolock-loop-v1-1-6e2b3e82b9da@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Jakub Kicinski [Thu, 6 Nov 2025 02:04:54 +0000 (18:04 -0800)]
Merge tag 'wireless-2025-11-05' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Johannes Berg says:
====================
Just two small fixes:
- ath12k: revert a change that caused performance regressions
- hwsim: don't ignore netns on netlink socket matching
* tag 'wireless-2025-11-05' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: mac80211_hwsim: Limit destroy_on_close radio removal to netgroup
Revert "wifi: ath12k: Fix missing station power save configuration"
====================
Haotian Zhang [Wed, 5 Nov 2025 03:47:16 +0000 (11:47 +0800)]
net: wan: framer: pef2256: Switch to devm_mfd_add_devices()
The driver calls mfd_add_devices() but fails to call mfd_remove_devices()
in error paths after successful MFD device registration and in the remove
function. This leads to resource leaks where MFD child devices are not
properly unregistered.
Replace mfd_add_devices with devm_mfd_add_devices to automatically
manage the device resources.
Fixes: c96e976d9a05 ("net: wan: framer: Add support for the Lantiq PEF2256 framer") Suggested-by: Herve Codina <herve.codina@bootlin.com> Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn> Acked-by: Herve Codina <herve.codina@bootlin.com> Link: https://patch.msgid.link/20251105034716.662-1-vulab@iscas.ac.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiawen Wu [Tue, 4 Nov 2025 06:23:21 +0000 (14:23 +0800)]
net: libwx: fix device bus LAN ID
The device bus LAN ID was obtained from PCI_FUNC(), but when a PF
port is passthrough to a virtual machine, the function number may not
match the actual port index on the device. This could cause the driver
to perform operations such as LAN reset on the wrong port.
Fix this by reading the LAN ID from port status register.
Dragos Tatulea [Tue, 4 Nov 2025 06:48:35 +0000 (08:48 +0200)]
net/mlx5e: SHAMPO, Fix header formulas for higher MTUs and 64K pages
The MLX5E_SHAMPO_WQ_HEADER_PER_PAGE and
MLX5E_SHAMPO_LOG_MAX_HEADER_ENTRY_SIZE macros are used directly in
several places under the assumption that there will always be more
headers per WQE than headers per page. However, this assumption doesn't
hold for 64K page sizes and higher MTUs (> 4K). This can be first
observed during header page allocation: ksm_entries will become 0 during
alignment to MLX5E_SHAMPO_WQ_HEADER_PER_PAGE.
This patch introduces 2 additional members to the mlx5e_shampo_hd struct
which are meant to be used instead of the macrose mentioned above.
When the number of headers per WQE goes below
MLX5E_SHAMPO_WQ_HEADER_PER_PAGE, clamp the number of headers per
page and expand the header size accordingly so that the headers
for one WQE cover a full page.
All the formulas are adapted to use these two new members.
Fixes: 945ca432bfd0 ("net/mlx5e: SHAMPO, Drop info array") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1762238915-1027590-4-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dragos Tatulea [Tue, 4 Nov 2025 06:48:34 +0000 (08:48 +0200)]
net/mlx5e: SHAMPO, Fix skb size check for 64K pages
mlx5e_hw_gro_skb_has_enough_space() uses a formula to check if there is
enough space in the skb frags to store more data. This formula is
incorrect for 64K page sizes and it triggers early GRO session
termination because the first fragment will blow up beyond
GRO_LEGACY_MAX_SIZE.
This patch adds a special case for page sizes >= GRO_LEGACY_MAX_SIZE
(64K) which uses the skb->len instead. Within this context,
the check is safe from fragment overflow because the hardware
will continuously fill the data up to the reservation size of 64K
and the driver will coalesce all data from the same page to the same
fragment. This means that the data will span one fragment or at most
two for such a large page size.
It is expected that the if statement will be optimized out as the
check is done with constants.
Dragos Tatulea [Tue, 4 Nov 2025 06:48:33 +0000 (08:48 +0200)]
net/mlx5e: SHAMPO, Fix header mapping for 64K pages
HW-GRO is broken on mlx5 for 64K page sizes. The patch in the fixes tag
didn't take into account larger page sizes when doing an align down
of max_ksm_entries. For 64K page size, max_ksm_entries is 0 which will skip
mapping header pages via WQE UMR. This breaks header-data split
and will result in the following syndrome:
Furthermore, the function that fills in WQE UMRs for the headers
(mlx5e_build_shampo_hd_umr()) only supports mapping page sizes that
fit in a single UMR WQE.
This patch goes back to the old non-aligned max_ksm_entries value and it
changes mlx5e_build_shampo_hd_umr() to support mapping a large page over
multiple UMR WQEs.
This means that mlx5e_build_shampo_hd_umr() can now leave a page only
partially mapped. The caller, mlx5e_alloc_rx_hd_mpwqe(), ensures that
there are enough UMR WQEs to cover complete pages by working on
ksm_entries that are multiples of MLX5E_SHAMPO_WQ_HEADER_PER_PAGE.
Fixes: 8a0ee54027b1 ("net/mlx5e: SHAMPO, Simplify UMR allocation for headers") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1762238915-1027590-2-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The ICSSG driver does the initial FDB configuration which
includes setting the control registers. Other run time
management like learning is managed by the PRU's. The default
FDB hash size used by the firmware is 512 slots, which is
currently missing in the current driver. Update the driver
FDB config to include FDB hash size as well.
Please refer trm [1] 6.4.14.12.17 section on how the FDB config
register gets configured. From the table 6-1404, there is a reset
field for FDB_HAS_SIZE which is 4, meaning 1024 slots. Currently
the driver is not updating this reset value from 4(1024 slots) to
3(512 slots). This patch fixes this by updating the reset value
to 512 slots.
[1]: https://www.ti.com/lit/pdf/spruim2 Fixes: abd5576b9c57f ("net: ti: icssg-prueth: Add support for ICSSG switch firmware") Signed-off-by: Meghana Malladi <m-malladi@ti.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20251104104415.3110537-1-m-malladi@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Tue, 4 Nov 2025 14:15:36 +0000 (16:15 +0200)]
net/mlx5e: Fix return value in case of module EEPROM read error
mlx5e_get_module_eeprom_by_page() has weird error handling.
First, it is treating -EINVAL as a special case, but it is unclear why.
Second, it tries to fail "gracefully" by returning the number of bytes
read even in case of an error. This results in wrongly returning
success (0 return value) if the error occurs before any bytes were
read.
Simplify the error handling by returning an error when such occurs. This
also aligns with the error handling we have in mlx5e_get_module_eeprom()
for the old API.
This fixes the following case where the query fails, but userspace
ethtool wrongly treats it as success and dumps an output:
net: gro_cells: Reduce lock scope in gro_cell_poll
One GRO-cell device's NAPI callback can nest into the GRO-cell of
another device if the underlying device is also using GRO-cell.
This is the case for IPsec over vxlan.
These two GRO-cells are separate devices. From lockdep's point of view
it is the same because each device is sharing the same lock class and so
it reports a possible deadlock assuming one device is nesting into
itself.
Hold the bh_lock only while accessing gro_cell::napi_skbs in
gro_cell_poll(). This reduces the locking scope and avoids acquiring the
same lock class multiple times.
Fixes: 25718fdcbdd2 ("net: gro_cells: Use nested-BH locking for gro_cell") Reported-by: Gal Pressman <gal@nvidia.com> Closes: https://lore.kernel.org/all/66664116-edb8-48dc-ad72-d5223696dd19@nvidia.com/ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20251104153435.ty88xDQt@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
libie: depend on DEBUG_FS when building LIBIE_FWLOG
LIBIE_FWLOG is unusable without DEBUG_FS. Mark it in Kconfig.
Fix build error on ixgbe when DEBUG_FS is not set. To not add another
layer of #if IS_ENABLED(LIBIE_FWLOG) in ixgbe fwlog code define debugfs
dentry even when DEBUG_FS isn't enabled. In this case the dummy
functions of LIBIE_FWLOG will be used, so not initialized dentry isn't a
problem.
Fixes: 641585bc978e ("ixgbe: fwlog support for e610") Reported-by: Guenter Roeck <linux@roeck-us.net> Closes: https://lore.kernel.org/lkml/f594c621-f9e1-49f2-af31-23fbcb176058@roeck-us.net/ Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20251104172333.752445-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
James Jones [Thu, 30 Oct 2025 18:11:53 +0000 (11:11 -0700)]
drm/nouveau: Advertise correct modifiers on GB20x
8 and 16 bit formats use a different layout on
GB20x than they did on prior chips. Add the
corresponding DRM format modifiers to the list of
modifiers supported by the display engine on such
chips, and filter the supported modifiers for each
format based on its bytes per pixel in
nv50_plane_format_mod_supported().
Note this logic will need to be updated when GB10
support is added, since it is a GB20x chip that
uses the pre-GB20x sector layout for all formats.
Fixes: 6cc6e08d4542 ("drm/nouveau/kms: add support for GB20x") Signed-off-by: James Jones <jajones@nvidia.com> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20251030181153.1208-3-jajones@nvidia.com
James Jones [Thu, 30 Oct 2025 18:11:52 +0000 (11:11 -0700)]
drm: define NVIDIA DRM format modifiers for GB20x
The layout of bits within the individual tiles
(referred to as sectors in the
DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D() macro)
changed for 8 and 16-bit surfaces starting in
Blackwell 2 GPUs (With the exception of GB10).
To denote the difference, extend the sector field
in the parametric format modifier definition used
to generate modifier values for NVIDIA hardware.
Without this change, it would be impossible to
differentiate the two layouts based on modifiers,
and as a result software could attempt to share
surfaces directly between pre-GB20x and GB20x
cards, resulting in corruption when the surface
was accessed on one of the GPUs after being
populated with content by the other.
Of note: This change causes the
DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D() macro to
evaluate its "s" parameter twice, with the side
effects that entails. I surveyed all usage of the
modifier in the kernel and Mesa code, and that
does not appear to be problematic in any current
usage, but I thought it was worth calling out.
Fixes: 6cc6e08d4542 ("drm/nouveau/kms: add support for GB20x") Signed-off-by: James Jones <jajones@nvidia.com> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20251030181153.1208-2-jajones@nvidia.com
Timur Tabi [Tue, 14 Oct 2025 17:45:12 +0000 (12:45 -0500)]
drm/nouveau: set DMA mask before creating the flush page
Set the DMA mask before calling nvkm_device_ctor(), so that when the
flush page is created in nvkm_fb_ctor(), the allocation will not fail
if the page is outside of DMA address space, which can easily happen if
IOMMU is disable. In such situations, you will get an error like this:
nouveau 0000:65:00.0: DMA addr 0x0000000107c56000+4096 overflow (mask ffffffff, bus limit 0).
Commit 38f5359354d4 ("rm/nouveau/pci: set streaming DMA mask early")
set the mask after calling nvkm_device_ctor(), but back then there was
no flush page being created, which might explain why the mask wasn't
set earlier.
Flush page allocation was added in commit 5728d064190e ("drm/nouveau/fb:
handle sysmem flush page from common code"). nvkm_fb_ctor() calls
alloc_page(), which can allocate a page anywhere in system memory, but
then calls dma_map_page() on that page. But since the DMA mask is still
set to 32, the map can fail if the page is allocated above 4GB. This is
easy to reproduce on systems with a lot of memory and IOMMU disabled.
An alternative approach would be to force the allocation of the flush
page to low memory, by specifying __GFP_DMA32. However, this would
always allocate the page in low memory, even though the hardware can
access high memory.
Linus Torvalds [Wed, 5 Nov 2025 19:15:36 +0000 (11:15 -0800)]
Merge tag 'rust-fixes-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux
Pull rust fixes from Miguel Ojeda:
- Fix/workaround a couple Rust 1.91.0 build issues when sanitizers are
enabled due to extra checking performed by the compiler and an
upstream issue already fixed for Rust 1.93.0
- Fix future Rust 1.93.0 builds by supporting the stabilized name for
the 'no-jump-tables' flag
- Fix a couple private/broken intra-doc links uncovered by the future
move of pin-init to 'syn'
* tag 'rust-fixes-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux:
rust: kbuild: support `-Cjump-tables=n` for Rust 1.93.0
rust: kbuild: workaround `rustdoc` doctests modifier bug
rust: kbuild: treat `build_error` and `rustdoc` as kernel objects
rust: condvar: fix broken intra-doc link
rust: devres: fix private intra-doc link
xfs: fix zone selection in xfs_select_open_zone_mru
xfs_select_open_zone_mru needs to pass XFS_ZONE_ALLOC_OK to
xfs_try_use_zone because we only want to tightly pack into zones of the
same or a compatible temperature instead of any available zone.
This got broken in commit 0301dae732a5 ("xfs: refactor hint based zone
allocation"), which failed to update this particular caller when
switching to an enum. xfs/638 sometimes, but not reliably fails due to
this change.
Fixes: 0301dae732a5 ("xfs: refactor hint based zone allocation") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Drop the rtgrop reference when xfs_init_zone fails for a conventional
device.
Fixes: 4e4d52075577 ("xfs: add the zoned space allocator") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Darrick J. Wong [Wed, 5 Nov 2025 00:15:38 +0000 (16:15 -0800)]
xfs: fix various problems in xfs_atomic_write_cow_iomap_begin
I think there are several things wrong with this function:
A) xfs_bmapi_write can return a much larger unwritten mapping than what
the caller asked for. We convert part of that range to written, but
return the entire written mapping to iomap even though that's
inaccurate.
B) The arguments to xfs_reflink_convert_cow_locked are wrong -- an
unwritten mapping could be *smaller* than the write range (or even
the hole range). In this case, we convert too much file range to
written state because we then return a smaller mapping to iomap.
C) It doesn't handle delalloc mappings. This I covered in the patch
that I already sent to the list.
D) Reassigning count_fsb to handle the hole means that if the second
cmap lookup attempt succeeds (due to racing with someone else) we
trim the mapping more than is strictly necessary. The changing
meaning of count_fsb makes this harder to notice.
E) The tracepoint is kinda wrong because @length is mutated. That makes
it harder to chase the data flows through this function because you
can't just grep on the pos/bytecount strings.
F) We don't actually check that the br_state = XFS_EXT_NORM assignment
is accurate, i.e that the cow fork actually contains a written
mapping for the range we're interested in
G) Somewhat inadequate documentation of why we need to xfs_trim_extent
so aggressively in this function.
H) Not sure why xfs_iomap_end_fsb is used here, the vfs already clamped
the write range to s_maxbytes.
Fix these issues, and then the atomic writes regressions in generic/760,
generic/617, generic/091, generic/263, and generic/521 all go away for
me.
Cc: stable@vger.kernel.org # v6.16 Fixes: bd1d2c21d5d249 ("xfs: add xfs_atomic_write_cow_iomap_begin()") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Darrick J. Wong [Wed, 5 Nov 2025 00:12:00 +0000 (16:12 -0800)]
xfs: fix delalloc write failures in software-provided atomic writes
With the 20 Oct 2025 release of fstests, generic/521 fails for me on
regular (aka non-block-atomic-writes) storage:
QA output created by 521
dowrite: write: Input/output error
LOG DUMP (8553 total operations):
1( 1 mod 256): SKIPPED (no operation)
2( 2 mod 256): WRITE 0x7e000 thru 0x8dfff (0x10000 bytes) HOLE
3( 3 mod 256): READ 0x69000 thru 0x79fff (0x11000 bytes)
4( 4 mod 256): FALLOC 0x53c38 thru 0x5e853 (0xac1b bytes) INTERIOR
5( 5 mod 256): COPY 0x55000 thru 0x59fff (0x5000 bytes) to 0x25000 thru 0x29fff
6( 6 mod 256): WRITE 0x74000 thru 0x88fff (0x15000 bytes)
7( 7 mod 256): ZERO 0xedb1 thru 0x11693 (0x28e3 bytes)
with a warning in dmesg from iomap about XFS trying to give it a
delalloc mapping for a directio write. Fix the software atomic write
iomap_begin code to convert the reservation into a written mapping.
This doesn't fix the data corruption problems reported by generic/760,
but it's a start.
Cc: stable@vger.kernel.org # v6.16 Fixes: bd1d2c21d5d249 ("xfs: add xfs_atomic_write_cow_iomap_begin()") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Martin Willi [Mon, 3 Nov 2025 08:24:36 +0000 (09:24 +0100)]
wifi: mac80211_hwsim: Limit destroy_on_close radio removal to netgroup
hwsim radios marked destroy_on_close are removed when the Netlink socket
that created them is closed. As the portid is not unique across network
namespaces, closing a socket in one namespace may remove radios in another
if it has the destroy_on_close flag set.
Instead of matching the network namespace, match the netgroup of the radio
to limit radio removal to those that have been created by the closing
Netlink socket. The netgroup of a radio identifies the network namespace
it was created in, and matching on it removes a destroy_on_close radio
even if it has been moved to another namespace.
In this example, CPU0 would be any function accessing job->dependencies
through the xa_* functions that don't disable interrupts (eg:
drm_sched_job_add_dependency(), drm_sched_entity_kill_jobs_cb()).
CPU1 is executing drm_sched_entity_kill_jobs_cb() as a fence signalling
callback so in an interrupt context. It will deadlock when trying to
grab the xa_lock which is already held by CPU0.
Replacing all xa_* usage by their xa_*_irq counterparts would fix
this issue, but Christian pointed out another issue: dma_fence_signal
takes fence.lock and so does dma_fence_add_callback.
Linus Torvalds [Wed, 5 Nov 2025 09:56:15 +0000 (18:56 +0900)]
Merge tag 'media/v6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
- honour privacy led with pdx86/int3472
- fix invalid file access on cx18 and ivtv
- forbid remove_bufs when legacy fileio is active on videbuf2
- add an heuristic to find stream entity on uvcvideo
* tag 'media/v6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: videobuf2: forbid remove_bufs when legacy fileio is active
media: uvcvideo: Use heuristic to find stream entity
media: v4l2-subdev / pdx86: int3472: Use "privacy" as con_id for the privacy LED
media: ivtv: Fix invalid access to file *
media: cx18: Fix invalid access to file *
Breno Leitao [Mon, 3 Nov 2025 16:38:17 +0000 (08:38 -0800)]
netpoll: Fix deadlock in memory allocation under spinlock
Fix a AA deadlock in refill_skbs() where memory allocation while holding
skb_pool->lock can trigger a recursive lock acquisition attempt.
The deadlock scenario occurs when the system is under severe memory
pressure:
1. refill_skbs() acquires skb_pool->lock (spinlock)
2. alloc_skb() is called while holding the lock
3. Memory allocator fails and calls slab_out_of_memory()
4. This triggers printk() for the OOM warning
5. The console output path calls netpoll_send_udp()
6. netpoll_send_udp() attempts to acquire the same skb_pool->lock
7. Deadlock: the lock is already held by the same CPU
This bug was exposed by commit 248f6571fd4c51 ("netpoll: Optimize skb
refilling on critical path") which removed refill_skbs() from the
critical path (where nested printk was being deferred), letting nested
printk being called from inside refill_skbs()
Refactor refill_skbs() to never allocate memory while holding
the spinlock.
Another possible solution to fix this problem is protecting the
refill_skbs() from nested printks, basically calling
printk_deferred_{enter,exit}() in refill_skbs(), then, any nested
pr_warn() would be deferred.
I prefer this approach, given I _think_ it might be a good idea to move
the alloc_skb() from GFP_ATOMIC to GFP_KERNEL in the future, so, having
the alloc_skb() outside of the lock will be necessary step.
There is a possible TOCTOU issue when checking for the pool length, and
queueing the new allocated skb, but, this is not an issue, given that
an extra SKB in the pool is harmless and it will be eventually used.
Nishanth Menon [Mon, 3 Nov 2025 16:28:11 +0000 (10:28 -0600)]
net: ethernet: ti: netcp: Standardize knav_dma_open_channel to return NULL on error
Make knav_dma_open_channel consistently return NULL on error instead
of ERR_PTR. Currently the header include/linux/soc/ti/knav_dma.h
returns NULL when the driver is disabled, but the driver
implementation does not even return NULL or ERR_PTR on failure,
causing inconsistency in the users. This results in a crash in
netcp_free_navigator_resources as followed (trimmed):
Unhandled fault: alignment exception (0x221) at 0xfffffff2
[fffffff2] *pgd=80000800207003, *pmd=82ffda003, *pte=00000000
Internal error: : 221 [#1] SMP ARM
Modules linked in:
CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.17.0-rc7 #1 NONE
Hardware name: Keystone
PC is at knav_dma_close_channel+0x30/0x19c
LR is at netcp_free_navigator_resources+0x2c/0x28c
[... TRIM...]
Call trace:
knav_dma_close_channel from netcp_free_navigator_resources+0x2c/0x28c
netcp_free_navigator_resources from netcp_ndo_open+0x430/0x46c
netcp_ndo_open from __dev_open+0x114/0x29c
__dev_open from __dev_change_flags+0x190/0x208
__dev_change_flags from netif_change_flags+0x1c/0x58
netif_change_flags from dev_change_flags+0x38/0xa0
dev_change_flags from ip_auto_config+0x2c4/0x11f0
ip_auto_config from do_one_initcall+0x58/0x200
do_one_initcall from kernel_init_freeable+0x1cc/0x238
kernel_init_freeable from kernel_init+0x1c/0x12c
kernel_init from ret_from_fork+0x14/0x38
[... TRIM...]
Standardize the error handling by making the function return NULL on
all error conditions. The API is used in just the netcp_core.c so the
impact is limited.
Note, this change, in effect reverts commit 5b6cb43b4d62 ("net:
ethernet: ti: netcp_core: return error while dma channel open issue"),
but provides a less error prone implementation.
Suggested-by: Simon Horman <horms@kernel.org> Suggested-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Nishanth Menon <nm@ti.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20251103162811.3730055-1-nm@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>