]> git.ipfire.org Git - thirdparty/kernel/linux.git/log
thirdparty/kernel/linux.git
9 days agoMerge branch 'next' into for-linus
Dmitry Torokhov [Mon, 20 Apr 2026 01:28:57 +0000 (18:28 -0700)] 
Merge branch 'next' into for-linus

Prepare input updates for 7.1 merge window.

9 days agoInput: charlieplex_keypad - add GPIO charlieplex keypad
Hugo Villeneuve [Sun, 19 Apr 2026 05:18:54 +0000 (22:18 -0700)] 
Input: charlieplex_keypad - add GPIO charlieplex keypad

Add support for GPIO-based charlieplex keypad, allowing to control
N^2-N keys using N GPIO lines.

Reuse matrix keypad keymap to simplify, even if there is no concept
of rows and columns in this type of keyboard.

Signed-off-by: Hugo Villeneuve <hvilleneuve@dimonoff.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Link: https://patch.msgid.link/20260312180304.3865850-5-hugo@hugovil.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
9 days agodt-bindings: input: add GPIO charlieplex keypad
Hugo Villeneuve [Sun, 19 Apr 2026 05:18:30 +0000 (22:18 -0700)] 
dt-bindings: input: add GPIO charlieplex keypad

Add DT bindings for GPIO charlieplex keypad.

Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Hugo Villeneuve <hvilleneuve@dimonoff.com>
Link: https://patch.msgid.link/20260312180304.3865850-4-hugo@hugovil.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
9 days agodt-bindings: input: add settling-time-us common property
Hugo Villeneuve [Sun, 19 Apr 2026 05:17:43 +0000 (22:17 -0700)] 
dt-bindings: input: add settling-time-us common property

Add common property that can be reused by other bindings.

Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Hugo Villeneuve <hvilleneuve@dimonoff.com>
Link: https://patch.msgid.link/20260312180304.3865850-3-hugo@hugovil.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
9 days agodt-bindings: input: add debounce-delay-ms common property
Hugo Villeneuve [Sun, 19 Apr 2026 05:16:38 +0000 (22:16 -0700)] 
dt-bindings: input: add debounce-delay-ms common property

A few bindings are already defining a debounce-delay-ms property, so
add it to the input binding to reduce redundant redefines.

Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Hugo Villeneuve <hvilleneuve@dimonoff.com>
Link: https://patch.msgid.link/20260312180304.3865850-2-hugo@hugovil.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
9 days agoInput: imx_keypad - fix spelling mistake "Colums" -> "Columns"
Ethan Carter Edwards [Sun, 19 Apr 2026 00:58:32 +0000 (20:58 -0400)] 
Input: imx_keypad - fix spelling mistake "Colums" -> "Columns"

There is a spelling mistake in two comments. Fix them.

Signed-off-by: Ethan Carter Edwards <ethan@ethancedwards.com>
Link: https://patch.msgid.link/20260418-imx-typo-v1-1-2a15e54ad4e7@ethancedwards.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
9 days agoInput: edt-ft5x06 - fix use-after-free in debugfs teardown
Dmitry Torokhov [Sat, 11 Apr 2026 04:13:43 +0000 (21:13 -0700)] 
Input: edt-ft5x06 - fix use-after-free in debugfs teardown

The commit 68743c500c6e ("Input: edt-ft5x06 - use per-client debugfs
directory") removed the manual debugfs teardown, relying on the I2C core
to handle it. However, this creates a window where debugfs files are
still accessible after edt_ft5x06_ts_teardown_debugfs() frees
tsdata->raw_buffer.

To prevent a use-after-free, protect the freeing of raw_buffer with the
device mutex and set raw_buffer to NULL. The debugfs read function
already checks if raw_buffer is NULL under the same mutex, so this
safely avoids the use-after-free.

Fixes: 68743c500c6e ("Input: edt-ft5x06 - use per-client debugfs directory")
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/adnJicDh-bTUaWXP@google.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
9 days agodrm/v3d: Reject empty multisync extension to prevent infinite loop
Ashutosh Desai [Wed, 15 Apr 2026 05:00:00 +0000 (05:00 +0000)] 
drm/v3d: Reject empty multisync extension to prevent infinite loop

v3d_get_extensions() walks a userspace-provided singly-linked list of
ioctl extensions without any bound on the chain length. A local user
can craft a self-referential extension (ext->next == &ext) with zero
in_sync_count and out_sync_count, which bypasses the existing duplicate-
extension guard:

    if (se->in_sync_count || se->out_sync_count)
            return -EINVAL;

The guard never fires because v3d_get_multisync_post_deps() returns
immediately when count is zero, leaving both fields at zero on every
iteration. The result is an infinite loop in kernel context, blocking
the calling thread and pegging a CPU core indefinitely.

Fix this by rejecting a multisync extension where both in_sync_count
and out_sync_count are zero in v3d_get_multisync_submit_deps(). An
empty multisync carries no synchronization information and serves no
useful purpose, so returning -EINVAL for such an extension is the
correct defense against this attack vector.

Fixes: e4165ae8304e ("drm/v3d: add multiple syncobjs support")
Cc: stable@vger.kernel.org
Signed-off-by: Ashutosh Desai <ashutoshdesai993@gmail.com>
Link: https://patch.msgid.link/20260415050000.3816128-1-ashutoshdesai993@gmail.com
Signed-off-by: Maíra Canal <mcanal@igalia.com>
9 days agoMAINTAINERS: add Rust I2C tree and update Igor Korotin's email
Igor Korotin [Thu, 9 Apr 2026 18:41:00 +0000 (19:41 +0100)] 
MAINTAINERS: add Rust I2C tree and update Igor Korotin's email

Add a git tree entry for Rust I2C development and update the e-mail
address. The tree will be used to collect patches and provide a basis
for integration and testing, including linux-next.

Signed-off-by: Igor Korotin <igor.korotin@linux.dev>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
9 days agoMerge tag 'i2c-host-7.1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/andi...
Wolfram Sang [Sun, 19 Apr 2026 22:03:38 +0000 (00:03 +0200)] 
Merge tag 'i2c-host-7.1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-mergewindow

i2c-host for v7.1, part 2

- cx92755: convert I2C bindings to DT schema
- mediatek: add optional bus power management during transfers
- pxa: handle early bus busy condition

9 days agoMerge tag 'mm-hotfixes-stable-2026-04-19-00-14' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Sun, 19 Apr 2026 21:45:37 +0000 (14:45 -0700)] 
Merge tag 'mm-hotfixes-stable-2026-04-19-00-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM fixes from Andrew Morton:
 "7 hotfixes. 6 are cc:stable and all are for MM. Please see the
  individual changelogs for details"

* tag 'mm-hotfixes-stable-2026-04-19-00-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm/damon/core: disallow non-power of two min_region_sz on damon_start()
  mm/vmalloc: take vmap_purge_lock in shrinker
  mm: call ->free_folio() directly in folio_unmap_invalidate()
  mm: blk-cgroup: fix use-after-free in cgwb_release_workfn()
  mm/zone_device: do not touch device folio after calling ->folio_free()
  mm/damon/core: disallow time-quota setting zero esz
  mm/mempolicy: fix weighted interleave auto sysfs name

9 days agoMerge tag 'driver-core-7.1-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 19 Apr 2026 19:58:08 +0000 (12:58 -0700)] 
Merge tag 'driver-core-7.1-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core

Pull driver core fixes from Danilo Krummrich:

 - Prevent a device from being probed before device_add() has finished
   initializing it; gate probe with a "ready_to_probe" device flag to
   avoid races with concurrent driver_register() calls

 - Fix a kernel-doc warning for DEV_FLAG_COUNT introduced by the above

 - Return -ENOTCONN from software_node_get_reference_args() when a
   referenced software node is known but not yet registered, allowing
   callers to defer probe

 - In sysfs_group_attrs_change_owner(), also check is_visible_const();
   missed when the const variant was introduced

* tag 'driver-core-7.1-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core:
  driver core: Add kernel-doc for DEV_FLAG_COUNT enum value
  sysfs: attribute_group: Respect is_visible_const() when changing owner
  software node: return -ENOTCONN when referenced swnode is not registered yet
  driver core: Don't let a device probe until it's ready

9 days agoMerge tag 'staging-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh...
Linus Torvalds [Sun, 19 Apr 2026 15:51:32 +0000 (08:51 -0700)] 
Merge tag 'staging-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging driver updates from Greg KH:
 "Here is the "big" set of staging driver changes for 7.1-rc1.

  Nothing major in here at all, just lots of little cleanups for the
  staging drivers, driven by new developers getting their feet wet in
  kernel development. "Largest" thing in here is the change of some of
  the octeon variable types into proper kernel ones.

  Full details are in the shortlog.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'staging-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (154 commits)
  staging: rtl8723bs: remove redundant & parentheses
  staging: most: dim2: replace BUG_ON() in poison_channel()
  staging: most: dim2: replace BUG_ON() in enqueue()
  staging: most: dim2: replace BUG_ON() in configure_channel()
  staging: most: dim2: replace BUG_ON() in service_done_flag()
  staging: most: dim2: replace BUG_ON() in try_start_dim_transfer()
  staging: rtl8723bs: remove unused RTL8188E antenna selection macros
  staging: rtl8723bs: remove redundant blank lines in basic_types.h
  staging: rtl8723bs: wrap complex macros with parentheses
  staging: rtl8723bs: remove unused WRITEEF/READEF byte macros
  staging: rtl8723bs: rename camelCase variable
  staging: greybus: audio: fix error message for BTN_3 button
  staging: rtl8723bs: rename variables to snake_case
  staging: rtl8723bs: fix spelling in comment
  staging: rtl8723bs: cleanup return in sdio_init()
  staging: rtl8723bs: use direct returns in sdio_dvobj_init()
  staging: rtl8723bs: remove unused arg at odm_interface.h
  greybus: raw: fix use-after-free if write is called after disconnect
  greybus: raw: fix use-after-free on cdev close
  staging: rtl8723bs: fix logical continuations in xmit_linux.c
  ...

9 days agoMerge tag 'usb-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds [Sun, 19 Apr 2026 15:47:40 +0000 (08:47 -0700)] 
Merge tag 'usb-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB / Thunderbolt updates from Greg KH:
 "Here is the big set of USB and Thunderbolt changes for 7.1-rc1.

  Lots of little things in here, nothing major, just constant
  improvements, updates, and new features. Highlights are:

   - new USB power supply driver support.

     These changes did touch outside of drivers/usb/ but got acks from
     the relevant mantainers for them.

   - dts file updates and conversions

   - string function conversions into "safer" ones

   - new device quirks

   - xhci driver updates

   - usb gadget driver minor fixes

   - typec driver additions and updates

   - small number of thunderbolt driver changes

   - dwc3 driver updates and additions of new hardware support

   - other minor driver updates

  All of these have been in the linux-next tree for a while with no
  reported issues"

* tag 'usb-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (176 commits)
  usb: dwc3: starfive: Add JHB100 USB 2.0 DRD controller
  dt-bindings: usb: dwc3: add support for StarFive JHB100
  dt-bindings: usb: atmel,at91sam9rl-udc: convert to DT schema
  dt-bindings: usb: atmel,at91rm9200-udc: convert to DT schema
  dt-bindings: usb: generic-ehci: fix schema structure and add at91sam9g45 constraints
  dt-bindings: usb: generic-ohci: add AT91RM9200 OHCI binding support
  arm: dts: at91: remove unused #address-cells/#size-cells from sam9x60 udc node
  drivers/usb/host: Fix spelling error 'seperate' -> 'separate'
  usbip: tools: add hint when no exported devices are found
  USB: serial: iuu_phoenix: fix iuutool author name
  usb: gadget: f_ncm: validate minimum block_len in ncm_unwrap_ntb()
  usb: gadget: f_phonet: fix skb frags[] overflow in pn_rx_complete()
  usb: gadget: f_hid: Add missing error code
  usb: typec: cros_ec_ucsi: Load driver from OF and ACPI definitions
  dt-bindings: chrome: Add cros-ec-ucsi compatibility to typec binding
  USB: of: Simplify with scoped for each OF child loop
  usbip: validate number_of_packets in usbip_pack_ret_submit()
  usb: gadget: renesas_usb3: validate endpoint index in standard request handlers
  usb: core: config: reverse the size check of the SSP isoc endpoint descriptor
  usb: typec: ucsi: Set usb mode on partner change
  ...

9 days agoMerge tag 'tty-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Linus Torvalds [Sun, 19 Apr 2026 15:44:41 +0000 (08:44 -0700)] 
Merge tag 'tty-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial updates from Greg KH:
 "Here is the set of tty and serial driver changes for 7.1-rc1.

  Not much here this cycle, biggest thing is the removal of an old
  driver that never got any actual hardware support (esp32), and the
  second try to moving the tty ports to their own workqueues (first try
  was in 7.0-rc1 but was reverted due to problems)

  Otherwise it's just a small set of driver updates and some vt modifier
  key enhancements.

  All have been in linux-next for a while with no reported issues"

* tag 'tty-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (35 commits)
  tty: serial: ip22zilog: Fix section mispatch warning
  hvc/xen: Check console connection flag
  serial: sh-sci: Add support for RZ/G3L RSCI
  dt-bindings: serial: renesas,rsci: Document RZ/G3L SoC
  tty: atmel_serial: update outdated reference to atmel_tasklet_func()
  serial: xilinx_uartps: Drop unused include
  serial: qcom-geni: drop stray newline format specifier
  serial: 8250: loongson: Enable building on MIPS Loongson64
  dt-bindings: serial: 8250: Add Loongson 3A4000 uart compatible
  serial: 8250_fintek: Add support for F81214E
  tty: tty_port: add workqueue to flip TTY buffer
  vt: support ITU-T T.416 color subparameters
  serial: qcom-geni: Fix RTS behavior with flow control
  tty: serial: imx: keep dma request disabled before dma transfer setup
  tty: serial: 8250: Add SystemBase Multi I/O cards
  serial: pic32_uart: allow driver to be compiled on all architectures with COMPILE_TEST
  serial: tegra: remove Kconfig dependency on APB DMA controller
  dt-bindings: serial: amlogic,meson-uart: Add compatible string for A9
  dt-bindings: serial: atmel,at91-usart: add microchip,lan9691-usart
  serial: auart: check clk_enable() return in console write
  ...

9 days agoMerge tag 'mm-stable-2026-04-18-02-14' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 19 Apr 2026 15:01:17 +0000 (08:01 -0700)] 
Merge tag 'mm-stable-2026-04-18-02-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull more MM updates from Andrew Morton:

 - "Eliminate Dying Memory Cgroup" (Qi Zheng and Muchun Song)

   Address the longstanding "dying memcg problem". A situation wherein a
   no-longer-used memory control group will hang around for an extended
   period pointlessly consuming memory

 - "fix unexpected type conversions and potential overflows" (Qi Zheng)

   Fix a couple of potential 32-bit/64-bit issues which were identified
   during review of the "Eliminate Dying Memory Cgroup" series

 - "kho: history: track previous kernel version and kexec boot count"
   (Breno Leitao)

   Use Kexec Handover (KHO) to pass the previous kernel's version string
   and the number of kexec reboots since the last cold boot to the next
   kernel, and print it at boot time

 - "liveupdate: prevent double preservation" (Pasha Tatashin)

   Teach LUO to avoid managing the same file across different active
   sessions

 - "liveupdate: Fix module unloading and unregister API" (Pasha
   Tatashin)

   Address an issue with how LUO handles module reference counting and
   unregistration during module unloading

 - "zswap pool per-CPU acomp_ctx simplifications" (Kanchana Sridhar)

   Simplify and clean up the zswap crypto compression handling and
   improve the lifecycle management of zswap pool's per-CPU acomp_ctx
   resources

 - "mm/damon/core: fix damon_call()/damos_walk() vs kdmond exit race"
   (SeongJae Park)

   Address unlikely but possible leaks and deadlocks in damon_call() and
   damon_walk()

 - "mm/damon/core: validate damos_quota_goal->nid" (SeongJae Park)

   Fix a couple of root-only wild pointer dereferences

 - "Docs/admin-guide/mm/damon: warn commit_inputs vs other params race"
   (SeongJae Park)

   Update the DAMON documentation to warn operators about potential
   races which can occur if the commit_inputs parameter is altered at
   the wrong time

 - "Minor hmm_test fixes and cleanups" (Alistair Popple)

   Bugfixes and a cleanup for the HMM kernel selftests

 - "Modify memfd_luo code" (Chenghao Duan)

   Cleanups, simplifications and speedups to the memfd_lou code

 - "mm, kvm: allow uffd support in guest_memfd" (Mike Rapoport)

   Support for userfaultfd in guest_memfd

 - "selftests/mm: skip several tests when thp is not available" (Chunyu
   Hu)

   Fix several issues in the selftests code which were causing breakage
   when the tests were run on CONFIG_THP=n kernels

 - "mm/mprotect: micro-optimization work" (Pedro Falcato)

   A couple of nice speedups for mprotect()

 - "MAINTAINERS: update KHO and LIVE UPDATE entries" (Pratyush Yadav)

   Document upcoming changes in the maintenance of KHO, LUO, memfd_luo,
   kexec, crash, kdump and probably other kexec-based things - they are
   being moved out of mm.git and into a new git tree

* tag 'mm-stable-2026-04-18-02-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (121 commits)
  MAINTAINERS: add page cache reviewer
  mm/vmscan: avoid false-positive -Wuninitialized warning
  MAINTAINERS: update Dave's kdump reviewer email address
  MAINTAINERS: drop include/linux/liveupdate from LIVE UPDATE
  MAINTAINERS: drop include/linux/kho/abi/ from KHO
  MAINTAINERS: update KHO and LIVE UPDATE maintainers
  MAINTAINERS: update kexec/kdump maintainers entries
  mm/migrate_device: remove dead migration entry check in migrate_vma_collect_huge_pmd()
  selftests: mm: skip charge_reserved_hugetlb without killall
  userfaultfd: allow registration of ranges below mmap_min_addr
  mm/vmstat: fix vmstat_shepherd double-scheduling vmstat_update
  mm/hugetlb: fix early boot crash on parameters without '=' separator
  zram: reject unrecognized type= values in recompress_store()
  docs: proc: document ProtectionKey in smaps
  mm/mprotect: special-case small folios when applying permissions
  mm/mprotect: move softleaf code out of the main function
  mm: remove '!root_reclaim' checking in should_abort_scan()
  mm/sparse: fix comment for section map alignment
  mm/page_io: use sio->len for PSWPIN accounting in sio_read_complete()
  selftests/mm: transhuge_stress: skip the test when thp not available
  ...

9 days agocifs: update internal module version number
Steve French [Fri, 17 Apr 2026 04:26:08 +0000 (23:26 -0500)] 
cifs: update internal module version number

  to 2.60

Signed-off-by: Steve French <stfrench@microsoft.com>
9 days agosmb: client: compress: fix bad encoding on last LZ77 flag
Enzo Matsumiya [Mon, 13 Apr 2026 19:07:07 +0000 (16:07 -0300)] 
smb: client: compress: fix bad encoding on last LZ77 flag

End-of-stream flag could lead to UB because of int promotion
(overwriting signed bit).

Fix it by changing operand from '1' to '1UL'.

Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
9 days agomm/damon/core: disallow non-power of two min_region_sz on damon_start()
SeongJae Park [Sat, 11 Apr 2026 21:36:36 +0000 (14:36 -0700)] 
mm/damon/core: disallow non-power of two min_region_sz on damon_start()

Commit d8f867fa0825 ("mm/damon: add damon_ctx->min_sz_region") introduced
a bug that allows unaligned DAMON region address ranges.  Commit
c80f46ac228b ("mm/damon/core: disallow non-power of two min_region_sz")
fixed it, but only for damon_commit_ctx() use case.  Still, DAMON sysfs
interface can emit non-power of two min_region_sz via damon_start().  Fix
the path by adding the is_power_of_2() check on damon_start().

The issue was discovered by sashiko [1].

Link: https://lore.kernel.org/20260411213638.77768-1-sj@kernel.org
Link: https://lore.kernel.org/20260403155530.64647-1-sj@kernel.org
Fixes: d8f867fa0825 ("mm/damon: add damon_ctx->min_sz_region")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> # 6.18.x
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
9 days agomm/vmalloc: take vmap_purge_lock in shrinker
Uladzislau Rezki (Sony) [Mon, 13 Apr 2026 19:26:46 +0000 (21:26 +0200)] 
mm/vmalloc: take vmap_purge_lock in shrinker

decay_va_pool_node() can be invoked concurrently from two paths:
__purge_vmap_area_lazy() when pools are being purged, and the shrinker via
vmap_node_shrink_scan().

However, decay_va_pool_node() is not safe to run concurrently, and the
shrinker path currently lacks serialization, leading to races and possible
leaks.

Protect decay_va_pool_node() by taking vmap_purge_lock in the shrinker
path to ensure serialization with purge users.

Link: https://lore.kernel.org/20260413192646.14683-1-urezki@gmail.com
Fixes: 7679ba6b36db ("mm: vmalloc: add a shrinker to drain vmap pools")
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Baoquan He <baoquan.he@linux.dev>
Cc: chenyichong <chenyichong@uniontech.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
9 days agomm: call ->free_folio() directly in folio_unmap_invalidate()
Matthew Wilcox (Oracle) [Mon, 13 Apr 2026 18:43:11 +0000 (19:43 +0100)] 
mm: call ->free_folio() directly in folio_unmap_invalidate()

We can only call filemap_free_folio() if we have a reference to (or hold a
lock on) the mapping.  Otherwise, we've already removed the folio from the
mapping so it no longer pins the mapping and the mapping can be removed,
causing a use-after-free when accessing mapping->a_ops.

Follow the same pattern as __remove_mapping() and load the free_folio
function pointer before dropping the lock on the mapping.  That lets us
make filemap_free_folio() static as this was the only caller outside
filemap.c.

Link: https://lore.kernel.org/20260413184314.3419945-1-willy@infradead.org
Fixes: fb7d3bc41493 ("mm/filemap: drop streaming/uncached pages when writeback completes")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reported-by: Google Big Sleep <big-sleep-vuln-reports+bigsleep-501448199@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jan Kara <jack@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
9 days agomm: blk-cgroup: fix use-after-free in cgwb_release_workfn()
Breno Leitao [Mon, 13 Apr 2026 10:09:19 +0000 (03:09 -0700)] 
mm: blk-cgroup: fix use-after-free in cgwb_release_workfn()

cgwb_release_workfn() calls css_put(wb->blkcg_css) and then later accesses
wb->blkcg_css again via blkcg_unpin_online().  If css_put() drops the last
reference, the blkcg can be freed asynchronously (css_free_rwork_fn ->
blkcg_css_free -> kfree) before blkcg_unpin_online() dereferences the
pointer to access blkcg->online_pin, resulting in a use-after-free:

  BUG: KASAN: slab-use-after-free in blkcg_unpin_online (./include/linux/instrumented.h:112 ./include/linux/atomic/atomic-instrumented.h:400 ./include/linux/refcount.h:389 ./include/linux/refcount.h:432 ./include/linux/refcount.h:450 block/blk-cgroup.c:1367)
  Write of size 4 at addr ff11000117aa6160 by task kworker/71:1/531
   Workqueue: cgwb_release cgwb_release_workfn
   Call Trace:
    <TASK>
     blkcg_unpin_online (./include/linux/instrumented.h:112 ./include/linux/atomic/atomic-instrumented.h:400 ./include/linux/refcount.h:389 ./include/linux/refcount.h:432 ./include/linux/refcount.h:450 block/blk-cgroup.c:1367)
     cgwb_release_workfn (mm/backing-dev.c:629)
     process_scheduled_works (kernel/workqueue.c:3278 kernel/workqueue.c:3385)

   Freed by task 1016:
    kfree (./include/linux/kasan.h:235 mm/slub.c:2689 mm/slub.c:6246 mm/slub.c:6561)
    css_free_rwork_fn (kernel/cgroup/cgroup.c:5542)
    process_scheduled_works (kernel/workqueue.c:3302 kernel/workqueue.c:3385)

** Stack based on commit 66672af7a095 ("Add linux-next specific files
for 20260410")

I am seeing this crash sporadically in Meta fleet across multiple kernel
versions.  A full reproducer is available at:
https://github.com/leitao/debug/blob/main/reproducers/repro_blkcg_uaf.sh

(The race window is narrow.  To make it easily reproducible, inject a
msleep(100) between css_put() and blkcg_unpin_online() in
cgwb_release_workfn().  With that delay and a KASAN-enabled kernel, the
reproducer triggers the splat reliably in less than a second.)

Fix this by moving blkcg_unpin_online() before css_put(), so the
cgwb's CSS reference keeps the blkcg alive while blkcg_unpin_online()
accesses it.

Link: https://lore.kernel.org/20260413-blkcg-v1-1-35b72622d16c@debian.org
Fixes: 59b57717fff8 ("blkcg: delay blkg destruction until after writeback has finished")
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: JP Kobryn <inwardvessel@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
9 days agomm/zone_device: do not touch device folio after calling ->folio_free()
Matthew Brost [Fri, 10 Apr 2026 23:03:46 +0000 (16:03 -0700)] 
mm/zone_device: do not touch device folio after calling ->folio_free()

The contents of a device folio can immediately change after calling
->folio_free(), as the folio may be reallocated by a driver with a
different order.  Instead of touching the folio again to extract the
pgmap, use the local stack variable when calling percpu_ref_put_many().

Link: https://lore.kernel.org/20260410230346.4009855-1-matthew.brost@intel.com
Fixes: d245f9b4ab80 ("mm/zone_device: support large zone device private folios")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Balbir Singh <balbirs@nvidia.com>
Reviewed-by: Vishal Moola <vishal.moola@gmail.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
9 days agomm/damon/core: disallow time-quota setting zero esz
SeongJae Park [Tue, 7 Apr 2026 00:31:52 +0000 (17:31 -0700)] 
mm/damon/core: disallow time-quota setting zero esz

When the throughput of a DAMOS scheme is very slow, DAMOS time quota can
make the effective size quota smaller than damon_ctx->min_region_sz.  In
the case, damos_apply_scheme() will skip applying the action, because the
action is tried at region level, which requires >=min_region_sz size.
That is, the quota is effectively exceeded for the quota charge window.

Because no action will be applied, the total_charged_sz and
total_charged_ns are also not updated.  damos_set_effective_quota() will
try to update the effective size quota before starting the next charge
window.  However, because the total_charged_sz and total_charged_ns have
not updated, the throughput and effective size quota are also not changed.
Since effective size quota can only be decreased, other effective size
quota update factors including DAMOS quota goals and size quota cannot
make any change, either.

As a result, the scheme is unexpectedly deactivated until the user notices
and mitigates the situation.  The users can mitigate this situation by
changing the time quota online or re-install the scheme.  While the
mitigation is somewhat straightforward, finding the situation would be
challenging, because DAMON is not providing good observabilities for that.
Even if such observability is provided, doing the additional monitoring
and the mitigation is somewhat cumbersome and not aligned to the intention
of the time quota.  The time quota was intended to help reduce the user's
administration overhead.

Fix the problem by setting time quota-modified effective size quota be at
least min_region_sz always.

The issue was discovered [1] by sashiko.

Link: https://lore.kernel.org/20260407003153.79589-1-sj@kernel.org
Link: https://lore.kernel.org/20260405192504.110014-1-sj@kernel.org
Fixes: 1cd243030059 ("mm/damon/schemes: implement time quota")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> # 5.16.x
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
9 days agomm/mempolicy: fix weighted interleave auto sysfs name
Joshua Hahn [Tue, 7 Apr 2026 14:14:14 +0000 (07:14 -0700)] 
mm/mempolicy: fix weighted interleave auto sysfs name

The __ATTR macro is a utility that makes defining kobj_attributes easier
by stringfying the name, verifying the mode, and setting the show/store
fields in a single initializer.  It takes a raw token as the first value,
rather than a string, so that __ATTR family macros like __ATTR_RW can
token-paste it for inferring the _show / _store function names.

Commit e341f9c3c841 ("mm/mempolicy: Weighted Interleave Auto-tuning") used
the __ATTR macro to define the "auto" sysfs for weighted interleave.  A
few months later, commit 2fb6915fa22d ("compiler_types.h: add "auto" as a
macro for "__auto_type"") introduced a #define macro which expanded auto
into __auto_type.

This led to the "auto" token passed into __ATTR to be expanded out into
__auto_type, and the sysfs entry to be displayed as __auto_type as well.

Expand out the __ATTR macro and directly pass a string "auto" instead of
the raw token 'auto' to prevent it from being expanded out.  Also bypass
the VERIFY_OCTAL_PERMISSIONS check by triple checking that 0664 is indeed
the intended permissions for this sysfs file.

Before:
$ ls /sys/kernel/mm/mempolicy/weighted_interleave
__auto_type  node0

After:
$ ls /sys/kernel/mm/mempolicy/weighted_interleave/
auto  node0

Link: https://lore.kernel.org/20260407141415.3080960-1-joshua.hahnjy@gmail.com
Fixes: 2fb6915fa22d ("compiler_types.h: add "auto" as a macro for "__auto_type"")
Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Reviewed-by: Gregory Price <gourry@gourry.net>
Reviewed-by: Rakie Kim <rakie.kim@sk.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Acked-by: Zi Yan <ziy@nvidia.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Ying Huang <ying.huang@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 days agolib/crypto: docs: Add rst documentation to Documentation/crypto/
Eric Biggers [Sat, 18 Apr 2026 19:21:38 +0000 (12:21 -0700)] 
lib/crypto: docs: Add rst documentation to Documentation/crypto/

Add a documentation file Documentation/crypto/libcrypto.rst which
provides a high-level overview of lib/crypto/.

Also add several sub-pages which include the kernel-doc for the
algorithms that have it.  This makes the existing, quite extensive
kernel-doc start being included in the HTML and PDF documentation.

Note that the intent is very much *not* that everyone has to read these
Documentation/ files.  The library is intended to be straightforward and
use familiar conventions; generally it should be possible to dive right
into the kernel-doc.  You shouldn't need to read a lot of documentation
to just call `sha256()`, for example, or to run the unit tests if you're
already familiar with KUnit.  (This differs from the traditional crypto
API which has a larger barrier to entry.)

Nevertheless, this seems worth adding.  Hopefully it is useful and makes
LWN no longer consider the library to be "meticulously undocumented".

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20260418192138.15556-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
10 days agodocs: kdoc: Expand 'at_least' when creating parameter list
Eric Biggers [Sat, 18 Apr 2026 19:21:37 +0000 (12:21 -0700)] 
docs: kdoc: Expand 'at_least' when creating parameter list

sphinx doesn't know that the kernel headers do:

    #define at_least static

Do this replacement before declarations are passed to it.

This prevents errors like the following from appearing once the
lib/crypto/ kernel-doc is wired up to the sphinx build:

   linux/Documentation/crypto/libcrypto:128: ./include/crypto/sha2.h:773: WARNING: Error in declarator or parameters
Error in declarator or parameters
Invalid C declaration: Expected ']' in end of array operator. [error at 59]
  void sha512_final (struct sha512_ctx *ctx, u8 out[at_least SHA512_DIGEST_SIZE])

Acked-by: Jonathan Corbet <corbet@lwn.net>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20260418192138.15556-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
10 days agoMerge tag 'pinctrl-v7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw...
Linus Torvalds [Sat, 18 Apr 2026 23:59:09 +0000 (16:59 -0700)] 
Merge tag 'pinctrl-v7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pin control updates from Linus Walleij:
 "Core changes:

   - Perform basic checks on pin config properties so as not to allow
     directly contradictory settings such as setting a pin to more than
     one bias or drive mode

   - Handle input-threshold-voltage-microvolt property

   - Introduce pinctrl_gpio_get_config() handling in the core for SCMI
     GPIO using pin control

  New drivers:

   - GPIO-by-pin control driver (also appearing in the GPIO pull
     request) fulfilling a promise on a comment from Grant Likely many
     years ago: "can't GPIO just be a front-end for pin control?" it
     turns out it can, if and only if you design something new from
     scratch, such as SCMI

   - Broadcom BCM7038 as a pinctrl-single delegate

   - Mobileye EyeQ6Lplus OLB pin controller

   - Qualcomm Eliza and Hawi families TLMM pin controllers

   - Qualcomm SDM670 and Milos family LPASS LPI pin controllers

   - Qualcomm IPQ5210 pin controller

   - Realtek RTD1625 pin controller support

   - Rockchip RV1103B pin controller support

   - Texas Instruments AM62L as a pinctrl-single delegate

  Improvements:

   - Set config implementation for the Spacemit K1 pin controller"

* tag 'pinctrl-v7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (84 commits)
  pinctrl: qcom: Add Hawi pinctrl driver
  dt-bindings: pinctrl: qcom: Describe Hawi TLMM block
  dt-bindings: pinctrl: pinctrl-max77620: convert to DT schema
  pinctrl: single: Add bcm7038-padconf compatible matching
  dt-bindings: pinctrl: pinctrl-single: Add brcm,bcm7038-padconf
  dt-bindings: pinctrl: apple,pinctrl: Add t8122 compatible
  pinctrl: qcom: sdm670-lpass-lpi: label variables as static
  pinctrl: sophgo: pinctrl-sg2044: Fix wrong module description
  pinctrl: sophgo: pinctrl-sg2042: Fix wrong module description
  pinctrl: qcom: add sdm670 lpi tlmm
  dt-bindings: pinctrl: qcom: Add SDM670 LPASS LPI pinctrl
  dt-bindings: qcom: lpass-lpi-common: add reserved GPIOs property
  pinctrl: qcom: Introduce IPQ5210 TLMM driver
  dt-bindings: pinctrl: qcom: add IPQ5210 pinctrl
  pinctrl: qcom: Drop redundant intr_target_reg on modern SoCs
  pinctrl: qcom: eliza: Fix interrupt target bit
  pinctrl: core: Don't use "proxy" headers
  pinctrl: amd: Support new ACPI ID AMDI0033
  pinctrl: renesas: rzg2l: Drop superfluous blank line
  pinctrl: renesas: rzg2l: Fix save/restore of {IOLH,IEN,PUPD,SMT} registers
  ...

10 days agoMerge tag 'i3c/for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux
Linus Torvalds [Sat, 18 Apr 2026 23:48:13 +0000 (16:48 -0700)] 
Merge tag 'i3c/for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux

Pull i3c updates from Alexandre Belloni:
 "Subsystem:
   - add sysfs option to rescan bus via entdaa
   - fix error code handling for send_ccc_cmd

  Drivers:
   - mipi-i3c-hci-pci: Intel Nova Lake-H I3C support"

* tag 'i3c/for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux: (22 commits)
  i3c: mipi-i3c-hci: fix IBI payload length calculation for final status
  i3c: master: adi: Fix error propagation for CCCs
  i3c: master: Fix error codes at send_ccc_cmd
  i3c: master: Move bus_init error suppression
  i3c: master: Move entdaa error suppression
  i3c: master: Move rstdaa error suppression
  i3c: dw: Simplify xfer cleanup with __free(kfree)
  i3c: dw: Fix memory leak in dw_i3c_master_i3c_xfers()
  i3c: master: renesas: Use __free(kfree) for xfer cleanup in renesas_i3c_send_ccc_cmd()
  i3c: master: renesas: Fix memory leak in renesas_i3c_i3c_xfers()
  i3c: master: dw-i3c: Balance PM runtime usage count on probe failure
  i3c: master: dw-i3c: Fix missing reset assertion in remove() callback
  i3c: mipi-i3c-hci-pci: Enable IBI while runtime suspended for Intel controllers
  i3c: mipi-i3c-hci-pci: Add optional ability to manage child runtime PM
  i3c: mipi-i3c-hci: Allow parent to manage runtime PM
  i3c: mipi-i3c-hci: Add quirk to allow IBI while runtime suspended
  i3c: mipi-i3c-hci-pci: Set d3hot_delay to 0 for Intel controllers
  i3c: fix missing newline in dev_err messages
  i3c: master: use kzalloc_flex
  i3c: mipi-i3c-hci-pci: Add support for Intel Nova Lake-H I3C
  ...

10 days agoeventfs: Hold eventfs_mutex and SRCU when remount walks events
David Carlier [Sat, 18 Apr 2026 19:17:37 +0000 (20:17 +0100)] 
eventfs: Hold eventfs_mutex and SRCU when remount walks events

Commit 340f0c7067a9 ("eventfs: Update all the eventfs_inodes from the
events descriptor") had eventfs_set_attrs() recurse through ei->children
on remount.  The walk only holds the rcu_read_lock() taken by
tracefs_apply_options() over tracefs_inodes, which is wrong:

  - list_for_each_entry over ei->children races with the list_del_rcu()
    in eventfs_remove_rec() -- LIST_POISON1 deref, same shape as
    d2603279c7d6.
  - eventfs_inodes are freed via call_srcu(&eventfs_srcu, ...).
    rcu_read_lock() does not extend an SRCU grace period, so ti->private
    can be reclaimed under the walk.
  - The writes to ei->attr race with eventfs_set_attr(), which holds
    eventfs_mutex.

Reproducer:

  while :; do mount -o remount,uid=$((RANDOM%1000)) /sys/kernel/tracing; done &
  while :; do
      echo "p:kp submit_bio" > /sys/kernel/tracing/kprobe_events
      echo > /sys/kernel/tracing/kprobe_events
  done

Wrap the events portion of tracefs_apply_options() in
eventfs_remount_lock()/_unlock() that take eventfs_mutex and
srcu_read_lock(&eventfs_srcu).  eventfs_set_attrs() doesn't sleep so the
nested rcu_read_lock() is fine; lockdep_assert_held() pins the contract.

Comment in tracefs_drop_inode() said "RCU cycle" -- it is SRCU.

Fixes: 340f0c7067a9 ("eventfs: Update all the eventfs_inodes from the events descriptor")
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260418191737.10289-1-devnexen@gmail.com
Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
10 days agoeventfs: Use list_add_tail_rcu() for SRCU-protected children list
David Carlier [Sat, 18 Apr 2026 15:22:50 +0000 (16:22 +0100)] 
eventfs: Use list_add_tail_rcu() for SRCU-protected children list

Commit d2603279c7d6 ("eventfs: Use list_del_rcu() for SRCU protected
list variable") converted the removal side to pair with the
list_for_each_entry_srcu() walker in eventfs_iterate(). The insertion
in eventfs_create_dir() was left as a plain list_add_tail(), which on
weakly-ordered architectures can expose a new entry to the SRCU reader
before its list pointers and fields are observable.

Use list_add_tail_rcu() so the publication pairs with the existing
list_del_rcu() and list_for_each_entry_srcu().

Fixes: 43aa6f97c2d0 ("eventfs: Get rid of dentry pointers without refcounts")
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260418152251.199343-1-devnexen@gmail.com
Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
10 days agof2fs: add page-order information for large folio reads in iostat
Daniel Lee [Fri, 17 Apr 2026 17:50:40 +0000 (10:50 -0700)] 
f2fs: add page-order information for large folio reads in iostat

Track read folio counts by order in F2FS iostat sysfs and tracepoints.

Signed-off-by: Daniel Lee <chullee@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
10 days agosctp: fix OOB write to userspace in sctp_getsockopt_peer_auth_chunks
Michael Bommarito [Thu, 16 Apr 2026 03:19:03 +0000 (23:19 -0400)] 
sctp: fix OOB write to userspace in sctp_getsockopt_peer_auth_chunks

sctp_getsockopt_peer_auth_chunks() checks that the caller's optval
buffer is large enough for the peer AUTH chunk list with

    if (len < num_chunks)
            return -EINVAL;

but then writes num_chunks bytes to p->gauth_chunks, which lives
at offset offsetof(struct sctp_authchunks, gauth_chunks) == 8
inside optval.  The check is missing the sizeof(struct
sctp_authchunks) = 8-byte header.  When the caller supplies
len == num_chunks (for any num_chunks > 0) the test passes but
copy_to_user() writes sizeof(struct sctp_authchunks) = 8 bytes
past the declared buffer.

The sibling function sctp_getsockopt_local_auth_chunks() at the
next line already has the correct check:

    if (len < sizeof(struct sctp_authchunks) + num_chunks)
            return -EINVAL;

Align the peer variant with its sibling.

Reproducer confirms on v7.0-13-generic: an unprivileged userspace
caller that opens a loopback SCTP association with AUTH enabled,
queries num_chunks with a short optval, then issues the real
getsockopt with len == num_chunks and sentinel bytes painted past
the buffer observes those sentinel bytes overwritten with the
peer's AUTH chunk type.  The bytes written are under the peer's
control but land in the caller's own userspace; this is not a
kernel memory corruption, but it is a kernel-side contract
violation that can silently corrupt adjacent userspace data.

Fixes: 65b07e5d0d09 ("[SCTP]: API updates to suport SCTP-AUTH extensions.")
Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Acked-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/20260416031903.1447072-1-michael.bommarito@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agonet: ks8851: Avoid excess softirq scheduling
Marek Vasut [Wed, 15 Apr 2026 23:09:45 +0000 (01:09 +0200)] 
net: ks8851: Avoid excess softirq scheduling

The code injects a packet into netif_rx() repeatedly, which will add
it to its internal NAPI and schedule a softirq, and process it. It is
more efficient to queue multiple packets and process them all at the
local_bh_enable() time.

Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Fixes: e0863634bf9f ("net: ks8851: Queue RX packets in IRQ handler instead of disabling BHs")
Cc: stable@vger.kernel.org
Signed-off-by: Marek Vasut <marex@nabladev.com>
Link: https://patch.msgid.link/20260415231020.455298-2-marex@nabladev.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agonet: ks8851: Reinstate disabling of BHs around IRQ handler
Marek Vasut [Wed, 15 Apr 2026 23:09:44 +0000 (01:09 +0200)] 
net: ks8851: Reinstate disabling of BHs around IRQ handler

If the driver executes ks8851_irq() AND a TX packet has been sent, then
the driver enables TX queue via netif_wake_queue() which schedules TX
softirq to queue packets for this device.

If CONFIG_PREEMPT_RT=y is set AND a packet has also been received by
the MAC, then ks8851_rx_pkts() calls netdev_alloc_skb_ip_align() to
allocate SKBs for the received packets. If netdev_alloc_skb_ip_align()
is called with BH enabled, then local_bh_enable() at the end of
netdev_alloc_skb_ip_align() will trigger the pending softirq processing,
which may ultimately call the .xmit callback ks8851_start_xmit_par().
The ks8851_start_xmit_par() will try to lock struct ks8851_net_par
.lock spinlock, which is already locked by ks8851_irq() from which
ks8851_start_xmit_par() was called. This leads to a deadlock, which
is reported by the kernel, including a trace listed below.

If CONFIG_PREEMPT_RT is not set, then since commit 0913ec336a6c0
("net: ks8851: Fix deadlock with the SPI chip variant") the deadlock
can also be triggered without received packet in the RX FIFO. The
pending softirqs will be processed on return from
spin_unlock_bh(&ks->statelock) in ks8851_irq(), which triggers the
deadlock as well.

Fix the problem by disabling BH around critical sections, including the
IRQ handler, thus preventing the net_tx_action() softirq from triggering
during these critical sections. The net_tx_action() softirq is triggered
once BH are re-enabled and at the end of the IRQ handler, once all the
other IRQ handler actions have been completed.

 __schedule from schedule_rtlock+0x1c/0x34
 schedule_rtlock from rtlock_slowlock_locked+0x548/0x904
 rtlock_slowlock_locked from rt_spin_lock+0x60/0x9c
 rt_spin_lock from ks8851_start_xmit_par+0x74/0x1a8
 ks8851_start_xmit_par from netdev_start_xmit+0x20/0x44
 netdev_start_xmit from dev_hard_start_xmit+0xd0/0x188
 dev_hard_start_xmit from sch_direct_xmit+0xb8/0x25c
 sch_direct_xmit from __qdisc_run+0x1f8/0x4ec
 __qdisc_run from qdisc_run+0x1c/0x28
 qdisc_run from net_tx_action+0x1f0/0x268
 net_tx_action from handle_softirqs+0x1a4/0x270
 handle_softirqs from __local_bh_enable_ip+0xcc/0xe0
 __local_bh_enable_ip from __alloc_skb+0xd8/0x128
 __alloc_skb from __netdev_alloc_skb+0x3c/0x19c
 __netdev_alloc_skb from ks8851_irq+0x388/0x4d4
 ks8851_irq from irq_thread_fn+0x24/0x64
 irq_thread_fn from irq_thread+0x178/0x28c
 irq_thread from kthread+0x12c/0x138
 kthread from ret_from_fork+0x14/0x28

Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Fixes: e0863634bf9f ("net: ks8851: Queue RX packets in IRQ handler instead of disabling BHs")
Cc: stable@vger.kernel.org
Signed-off-by: Marek Vasut <marex@nabladev.com>
Link: https://patch.msgid.link/20260415231020.455298-1-marex@nabladev.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoaf_unix: Drop all SCM attributes for SOCKMAP.
Kuniyuki Iwashima [Wed, 15 Apr 2026 18:48:29 +0000 (18:48 +0000)] 
af_unix: Drop all SCM attributes for SOCKMAP.

SOCKMAP can hide inflight fd from AF_UNIX GC.

When a socket in SOCKMAP receives skb with inflight fd,
sk_psock_verdict_data_ready() looks up the mapped socket and
enqueue skb to its psock->ingress_skb.

Since neither the old nor the new GC can inspect the psock
queue, the hidden skb leaks the inflight sockets.  Note that
this cannot be detected via kmemleak because inflight sockets
are linked to a global list.

In addition, SOCKMAP redirect breaks the Tarjan-based GC's
assumption that unix_edge.successor is always alive, which
is no longer true once skb is redirected, resulting in
use-after-free below. [0]

Moreover, SOCKMAP does not call scm_stat_del() properly,
so unix_show_fdinfo() could report an incorrect fd count.

sk_msg_recvmsg() does not support any SCM attributes in the
first place.

Let's drop all SCM attributes before passing skb to the
SOCKMAP layer.

[0]:
BUG: KASAN: slab-use-after-free in unix_del_edges (net/unix/garbage.c:118 net/unix/garbage.c:181 net/unix/garbage.c:251)
Read of size 8 at addr ffff888125362670 by task kworker/56:1/496

CPU: 56 UID: 0 PID: 496 Comm: kworker/56:1 Not tainted 7.0.0-rc7-00263-gb9d8b856689d #3 PREEMPT(lazy)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-debian-1.17.0-1 04/01/2014
Workqueue: events sk_psock_backlog
Call Trace:
 <TASK>
 dump_stack_lvl (lib/dump_stack.c:122)
 print_report (mm/kasan/report.c:379)
 kasan_report (mm/kasan/report.c:597)
 unix_del_edges (net/unix/garbage.c:118 net/unix/garbage.c:181 net/unix/garbage.c:251)
 unix_destroy_fpl (net/unix/garbage.c:317)
 unix_destruct_scm (./include/net/scm.h:80 ./include/net/scm.h:86 net/unix/af_unix.c:1976)
 sk_psock_backlog (./include/linux/skbuff.h:?)
 process_scheduled_works (kernel/workqueue.c:?)
 worker_thread (kernel/workqueue.c:?)
 kthread (kernel/kthread.c:438)
 ret_from_fork (arch/x86/kernel/process.c:164)
 ret_from_fork_asm (arch/x86/entry/entry_64.S:258)
 </TASK>

Allocated by task 955:
 kasan_save_track (mm/kasan/common.c:58 mm/kasan/common.c:78)
 __kasan_slab_alloc (mm/kasan/common.c:369)
 kmem_cache_alloc_noprof (mm/slub.c:4539)
 sk_prot_alloc (net/core/sock.c:2240)
 sk_alloc (net/core/sock.c:2301)
 unix_create1 (net/unix/af_unix.c:1099)
 unix_create (net/unix/af_unix.c:1169)
 __sock_create (net/socket.c:1606)
 __sys_socketpair (net/socket.c:1811)
 __x64_sys_socketpair (net/socket.c:1863 net/socket.c:1860 net/socket.c:1860)
 do_syscall_64 (arch/x86/entry/syscall_64.c:?)
 entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)

Freed by task 496:
 kasan_save_track (mm/kasan/common.c:58 mm/kasan/common.c:78)
 kasan_save_free_info (mm/kasan/generic.c:587)
 __kasan_slab_free (mm/kasan/common.c:287)
 kmem_cache_free (mm/slub.c:6165)
 __sk_destruct (net/core/sock.c:2282 net/core/sock.c:2384)
 sk_psock_destroy (./include/net/sock.h:?)
 process_scheduled_works (kernel/workqueue.c:?)
 worker_thread (kernel/workqueue.c:?)
 kthread (kernel/kthread.c:438)
 ret_from_fork (arch/x86/kernel/process.c:164)
 ret_from_fork_asm (arch/x86/entry/entry_64.S:258)

Fixes: c63829182c37 ("af_unix: Implement ->psock_update_sk_prot()")
Fixes: 77462de14a43 ("af_unix: Add read_sock for stream socket types")
Reported-by: Xingyu Jin <xingyuj@google.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260415184830.3988432-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agonet: stmmac: Update default_an_inband before passing value to phylink_config
KhaiWenTan [Thu, 16 Apr 2026 10:26:09 +0000 (18:26 +0800)] 
net: stmmac: Update default_an_inband before passing value to phylink_config

get_interfaces() will update both the plat->phy_interfaces and
mdio_bus_data->default_an_inband based on reading a SERDES register. As
get_interfaces() will be called after default_an_inband had already been
read, dwmac-intel regressed as a result with incorrect default_an_inband
value in phylink_config.

Therefore, we moved the priv->plat->get_interfaces() to be executed first
before assigning priv->plat->default_an_inband to config->default_an_inband
to ensure default_an_inband is in correct value.

Fixes: d3836052fe09 ("net: stmmac: intel: convert speed_mode_2500() to get_interfaces()")
Signed-off-by: KhaiWenTan <khai.wen.tan@linux.intel.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20260416102609.7953-1-khai.wen.tan@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoipv6: fix possible UAF in icmpv6_rcv()
Eric Dumazet [Thu, 16 Apr 2026 10:35:05 +0000 (10:35 +0000)] 
ipv6: fix possible UAF in icmpv6_rcv()

Caching saddr and daddr before pskb_pull() is problematic
since skb->head can change.

Remove these temporary variables:

- We only access &ipv6_hdr(skb)->saddr and &ipv6_hdr(skb)->daddr
  when net_dbg_ratelimited() is called in the slow path.

- Avoid potential future misuse after pskb_pull() call.

Fixes: 4b3418fba0fe ("ipv6: icmp: include addresses in debug messages")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de>
Reviewed-by: Joe Damato <joe@dama.to>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260416103505.2380753-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoMerge branch 'intel-wired-lan-driver-updates-2026-04-14-ice-i40e-iavf-idpf-e1000e'
Jakub Kicinski [Sat, 18 Apr 2026 19:01:41 +0000 (12:01 -0700)] 
Merge branch 'intel-wired-lan-driver-updates-2026-04-14-ice-i40e-iavf-idpf-e1000e'

Jacob Keller says:

====================
Intel Wired LAN Driver Updates 2026-04-14 (ice, i40e, iavf, e1000e)

Grzegorz updates the logic for adjusting the PTP hardware clock on E830,
fixing a bug that prevented adjustments below S32_MAX/MIN nanoseconds.

Grzegorz and Zoli update the PCS latency settings for E825 devices at 10GbE
and 25GbE, improving the accuracy of timestamps based on data from
production hardware.

Michal Schmidt fixes a double-free that could happen if a particular error
path is taken in ice_xmit_frame_ring().

Guangshuo fixes a double-free that could happen during error paths in the
ice_sf_eth_activate() function.

Paul Greenwalt fixes the PHY link configuration when the link-down-on-close
driver parameter is enabled and new media is inserted.

Paul Greenwalt fixes the ICE_AQ_LINK_SPEED_M macro for 200G, enabling 200G
link speed advertisement.

Keita Morisaki fixes a race condition in the ice Tx timestamp ring cleanup,
preventing a possible NULL pointer dereference.

Kohei Enju fixes a potential NULL pointer dereference in ice_set_ring_param().

Kohei Enju fixes i40e to stop advertising IFF_SUPP_NOFCS, when the driver
does not actually support the feature.

Petr fixes the VLAN L2TAG2 mask when the iAVF VF and a PF negotiate use of
the legacy Rx descriptor format.

Matt fixes the unrolling logic for PTP when the e1000e probe fails after
the PTP clock has been registered.

 **A note to stable backports**

  The patches [7/12] ("ice: fix race condition in TX timestamp ring
  cleanup") and [8/12] ("ice: fix potential NULL pointer deref in error
  path of ice_set_ringparam()") must be backported together. Otherwise the
  fix in patch 8 will not work properly.
====================

Link: https://patch.msgid.link/20260416-iwl-net-submission-2026-04-14-v2-0-686c33c9828d@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoe1000e: Unroll PTP in probe error handling
Matt Vollrath [Fri, 17 Apr 2026 00:53:36 +0000 (17:53 -0700)] 
e1000e: Unroll PTP in probe error handling

If probe fails after registering the PTP clock and its delayed work,
these resources must be released.

This was not an issue until a 2016 fix moved the e1000e_ptp_init() call
before the jump to err_register.

Fixes: aa524b66c5ef ("e1000e: don't modify SYSTIM registers during SIOCSHWTSTAMP ioctl")
Signed-off-by: Matt Vollrath <tactii@gmail.com>
Tested-by: Avigail Dahan <avigailx.dahan@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260416-iwl-net-submission-2026-04-14-v2-12-686c33c9828d@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoiavf: fix wrong VLAN mask for legacy Rx descriptors L2TAG2
Petr Oros [Fri, 17 Apr 2026 00:53:34 +0000 (17:53 -0700)] 
iavf: fix wrong VLAN mask for legacy Rx descriptors L2TAG2

The IAVF_RXD_LEGACY_L2TAG2_M mask was incorrectly defined as
GENMASK_ULL(63, 32), extracting 32 bits from qw2 instead of the
16-bit VLAN tag. In the legacy Rx descriptor layout, the 2nd L2TAG2
(VLAN tag) occupies bits 63:48 of qw2, not 63:32.

The oversized mask causes FIELD_GET to return a 32-bit value where the
actual VLAN tag sits in bits 31:16. When this value is passed to
iavf_receive_skb() as a u16 parameter, it gets truncated to the lower
16 bits (which contain the 1st L2TAG2, typically zero). As a result,
__vlan_hwaccel_put_tag() is never called and software VLAN interfaces
on VFs receive no traffic.

This affects VFs behind ice PF (VIRTCHNL VLAN v2) when the PF
advertises VLAN stripping into L2TAG2_2 and legacy descriptors are
used.

The flex descriptor path already uses the correct mask
(IAVF_RXD_FLEX_L2TAG2_2_M = GENMASK_ULL(63, 48)).

Reproducer:
 1. Create 2 VFs on ice PF (echo 2 > sriov_numvfs)
 2. Disable spoofchk on both VFs
 3. Move each VF into a separate network namespace
 4. On each VF: create VLAN interface (e.g. vlan 198), assign IP,
    bring up
 5. Set rx-vlan-offload OFF on both VFs
 6. Ping between VLAN interfaces -> expect PASS
    (VLAN tag stays in packet data, kernel matches in-band)
 7. Set rx-vlan-offload ON on both VFs
 8. Ping between VLAN interfaces -> expect FAIL if bug present
    (HW strips VLAN tag into descriptor L2TAG2 field, wrong mask
    extracts bits 47:32 instead of 63:48, truncated to u16 -> zero,
    __vlan_hwaccel_put_tag() never called, packet delivered to parent
    interface, not VLAN interface)

The reproducer requires legacy Rx descriptors. On modern ice + iavf
with full PTP support, flex descriptors are always negotiated and the
buggy legacy path is never reached. Flex descriptors require all of:
 - CONFIG_PTP_1588_CLOCK enabled
 - VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC granted by PF
 - PTP capabilities negotiated (VIRTCHNL_VF_CAP_PTP)
 - VIRTCHNL_1588_PTP_CAP_RX_TSTAMP supported
 - VIRTCHNL_RXDID_2_FLEX_SQ_NIC present in DDP profile

If any condition is not met, iavf_select_rx_desc_format() falls back
to legacy descriptors (RXDID=1) and the wrong L2TAG2 mask is hit.

Fixes: 2dc8e7c36d80 ("iavf: refactor iavf_clean_rx_irq to support legacy and flex descriptors")
Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260416-iwl-net-submission-2026-04-14-v2-10-686c33c9828d@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoi40e: don't advertise IFF_SUPP_NOFCS
Kohei Enju [Fri, 17 Apr 2026 00:53:33 +0000 (17:53 -0700)] 
i40e: don't advertise IFF_SUPP_NOFCS

i40e advertises IFF_SUPP_NOFCS, allowing users to use the SO_NOFCS
socket option. However, this option is silently ignored, as the driver
does not check skb->no_fcs, and always enables FCS insertion offload.

Fix this by removing the advertisement of IFF_SUPP_NOFCS.

This behavior can be reproduced with a simple AF_PACKET socket:

  import socket
  s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW)
  s.setsockopt(socket.SOL_SOCKET, 43, 1) # SO_NOFCS
  s.bind(("eth0", 0))
  s.send(b'\xff' * 64)

Previously, send() succeeds but the driver ignores SO_NOFCS.
With this change, send() fails with -EPROTONOSUPPORT, as expected.

Fixes: 41c445ff0f48 ("i40e: main driver core")
Signed-off-by: Kohei Enju <kohei@enjuk.jp>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260416-iwl-net-submission-2026-04-14-v2-9-686c33c9828d@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoice: fix potential NULL pointer deref in error path of ice_set_ringparam()
Kohei Enju [Fri, 17 Apr 2026 00:53:32 +0000 (17:53 -0700)] 
ice: fix potential NULL pointer deref in error path of ice_set_ringparam()

ice_set_ringparam nullifies tstamp_ring of temporary tx_rings, without
clearing ICE_TX_RING_FLAGS_TXTIME bit.
When ICE_TX_RING_FLAGS_TXTIME is set and the subsequent
ice_setup_tx_ring() call fails, a NULL pointer dereference could happen
in the unwinding sequence:

ice_clean_tx_ring()
-> ice_is_txtime_cfg() == true (ICE_TX_RING_FLAGS_TXTIME is set)
-> ice_free_tx_tstamp_ring()
  -> ice_free_tstamp_ring()
    -> tstamp_ring->desc (NULL deref)

Clear ICE_TX_RING_FLAGS_TXTIME bit to avoid the potential issue.

Note that this potential issue is found by manual code review.
Compile test only since unfortunately I don't have E830 devices.

Fixes: ccde82e90946 ("ice: add E830 Earliest TxTime First Offload support")
Signed-off-by: Kohei Enju <kohei@enjuk.jp>
Reviewed-by: Paul Greenwalt <paul.greenwalt@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260416-iwl-net-submission-2026-04-14-v2-8-686c33c9828d@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoice: fix race condition in TX timestamp ring cleanup
Keita Morisaki [Fri, 17 Apr 2026 00:53:31 +0000 (17:53 -0700)] 
ice: fix race condition in TX timestamp ring cleanup

Fix a race condition between ice_free_tx_tstamp_ring() and ice_tx_map()
that can cause a NULL pointer dereference.

ice_free_tx_tstamp_ring currently clears the ICE_TX_FLAGS_TXTIME flag
after NULLing the tstamp_ring. This could allow a concurrent ice_tx_map
call on another CPU to dereference the tstamp_ring, which could lead to
a NULL pointer dereference.

  CPU A:ice_free_tx_tstamp_ring() | CPU B:ice_tx_map()
  --------------------------------|---------------------------------
  tx_ring->tstamp_ring = NULL     |
                                  | ice_is_txtime_cfg() -> true
                                  | tstamp_ring = tx_ring->tstamp_ring
                                  | tstamp_ring->count  // NULL deref!
  flags &= ~ICE_TX_FLAGS_TXTIME   |

Fix by:
1. Reordering ice_free_tx_tstamp_ring() to clear the flag before
   NULLing the pointer, with smp_wmb() to ensure proper ordering.
2. Adding smp_rmb() in ice_tx_map() after the flag check to order the
   flag read before the pointer read, using READ_ONCE() for the
   pointer, and adding a NULL check as a safety net.
3. Converting tx_ring->flags from u8 to DECLARE_BITMAP() and using
   atomic bitops (set_bit(), clear_bit(), test_bit()) for all flag
   operations throughout the driver:
   - ICE_TX_RING_FLAGS_XDP
   - ICE_TX_RING_FLAGS_VLAN_L2TAG1
   - ICE_TX_RING_FLAGS_VLAN_L2TAG2
   - ICE_TX_RING_FLAGS_TXTIME

Fixes: ccde82e909467 ("ice: add E830 Earliest TxTime First Offload support")
Signed-off-by: Keita Morisaki <kmta1236@gmail.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260416-iwl-net-submission-2026-04-14-v2-7-686c33c9828d@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoice: fix ICE_AQ_LINK_SPEED_M for 200G
Paul Greenwalt [Fri, 17 Apr 2026 00:53:30 +0000 (17:53 -0700)] 
ice: fix ICE_AQ_LINK_SPEED_M for 200G

When setting PHY configuration during driver initialization, 200G link
speed is not being advertised even when the PHY is capable. This is
because the get PHY capabilities link speed response is being masked by
ICE_AQ_LINK_SPEED_M, which does not include the 200G link speed bit.

ICE_AQ_LINK_SPEED_200GB is defined as BIT(11), but the mask 0x7FF only
covers bits 0-10. Fix ICE_AQ_LINK_SPEED_M to use GENMASK(11, 0) so
that it covers all defined link speed bits including 200G.

Fixes: 24407a01e57c ("ice: Add 200G speed/phy type use")
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260416-iwl-net-submission-2026-04-14-v2-6-686c33c9828d@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoice: fix PHY config on media change with link-down-on-close
Paul Greenwalt [Fri, 17 Apr 2026 00:53:29 +0000 (17:53 -0700)] 
ice: fix PHY config on media change with link-down-on-close

Commit 1a3571b5938c ("ice: restore PHY settings on media insertion")
introduced separate flows for setting PHY configuration on media
present: ice_configure_phy() when link-down-on-close is disabled, and
ice_force_phys_link_state() when enabled. The latter incorrectly uses
the previous configuration even after module change, causing link
issues such as wrong speed or no link.

Unify PHY configuration into a single ice_phy_cfg() function with a
link_en parameter, ensuring PHY capabilities are always fetched fresh
from hardware.

Fixes: 1a3571b5938c ("ice: restore PHY settings on media insertion")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260416-iwl-net-submission-2026-04-14-v2-5-686c33c9828d@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoice: fix double-free of tx_buf skb
Michal Schmidt [Fri, 17 Apr 2026 00:53:28 +0000 (17:53 -0700)] 
ice: fix double-free of tx_buf skb

If ice_tso() or ice_tx_csum() fail, the error path in
ice_xmit_frame_ring() frees the skb, but the 'first' tx_buf still points
to it and is marked as valid (ICE_TX_BUF_SKB).
'next_to_use' remains unchanged, so the potential problem will
likely fix itself when the next packet is transmitted and the tx_buf
gets overwritten. But if there is no next packet and the interface is
brought down instead, ice_clean_tx_ring() -> ice_unmap_and_free_tx_buf()
will find the tx_buf and free the skb for the second time.

The fix is to reset the tx_buf type to ICE_TX_BUF_EMPTY in the error
path, so that ice_unmap_and_free_tx_buf().
Move the initialization of 'first' up, to ensure it's already valid in
case we hit the linearization error path.

The bug was spotted by AI while I had it looking for something else.
It also proposed an initial version of the patch.

I reproduced the bug and tested the fix by adding code to inject
failures, on a build with KASAN.

I looked for similar bugs in related Intel drivers and did not find any.

Fixes: d76a60ba7afb ("ice: Add support for VLANs and offloads")
Assisted-by: Claude:claude-4.6-opus-high Cursor
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260416-iwl-net-submission-2026-04-14-v2-4-686c33c9828d@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoice: fix double free in ice_sf_eth_activate() error path
Guangshuo Li [Fri, 17 Apr 2026 00:53:27 +0000 (17:53 -0700)] 
ice: fix double free in ice_sf_eth_activate() error path

When auxiliary_device_add() fails, ice_sf_eth_activate() jumps to
aux_dev_uninit and calls auxiliary_device_uninit(&sf_dev->adev).

The device release callback ice_sf_dev_release() frees sf_dev, but
the current error path falls through to sf_dev_free and calls
kfree(sf_dev) again, causing a double free.

Keep kfree(sf_dev) for the auxiliary_device_init() failure path, but
avoid falling through to sf_dev_free after auxiliary_device_uninit().

Fixes: 13acc5c4cdbe ("ice: subfunction activation and base devlink ops")
Cc: stable@vger.kernel.org
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260416-iwl-net-submission-2026-04-14-v2-3-686c33c9828d@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoice: update PCS latency settings for E825 10G/25Gb modes
Grzegorz Nitka [Fri, 17 Apr 2026 00:53:26 +0000 (17:53 -0700)] 
ice: update PCS latency settings for E825 10G/25Gb modes

Update MAC Rx/Tx offset registers settings (PHY_MAC_[RX|TX]_OFFSET
registers) with the data obtained with the latest research. It applies
to PCS latency settings for the following speeds/modes:
* 10Gb NO-FEC
        - TX latency changed from 71.25 ns to 73 ns
        - RX latency changed from -25.6 ns to -28 ns
* 25Gb NO-FEC
- TX latency changed from 28.17 ns to 33 ns
        - RX latency changed from -12.45 ns to -12 ns
* 25Gb RS-FEC
        - TX latency changed from 64.5 ns to 69 ns
        - RX latency changed from -3.6 ns to -3 ns

The original data came from simulation and pre-production hardware.
The new data measures the actual delays and as such is more accurate.

Fixes: 7cab44f1c35f ("ice: Introduce ETH56G PHY model for E825C products")
Co-developed-by: Zoltan Fodor <zoltan.fodor@intel.com>
Signed-off-by: Zoltan Fodor <zoltan.fodor@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260416-iwl-net-submission-2026-04-14-v2-2-686c33c9828d@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoice: fix 'adjust' timer programming for E830 devices
Grzegorz Nitka [Fri, 17 Apr 2026 00:53:25 +0000 (17:53 -0700)] 
ice: fix 'adjust' timer programming for E830 devices

Fix incorrect 'adjust the timer' programming sequence for E830 devices
series. Only shadow registers GLTSYN_SHADJ were programmed in the
current implementation. According to the specification [1], write to
command GLTSYN_CMD register is also required with CMD field set to
"Adjust the Time" value, for the timer adjustment to take the effect.

The flow was broken for the adjustment less than S32_MAX/MIN range
(around +/- 2 seconds). For bigger adjustment, non-atomic programming
flow is used, involving set timer programming. Non-atomic flow is
implemented correctly.

Testing hints:
Run command:
phc_ctl /dev/ptpX get adj 2 get
Expected result:
Returned timestamps differ at least by 2 seconds

[1] Intel® Ethernet Controller E830 Datasheet rev 1.3, chapter 9.7.5.4
https://cdrdv2.intel.com/v1/dl/getContent/787353?explicitVersion=true

Fixes: f00307522786 ("ice: Implement PTP support for E830 devices")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rinitha S <sx.rinitha@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260416-iwl-net-submission-2026-04-14-v2-1-686c33c9828d@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoMerge tag 'ovpn-net-20260417' of https://github.com/OpenVPN/ovpn-net-next
Jakub Kicinski [Sat, 18 Apr 2026 18:44:11 +0000 (11:44 -0700)] 
Merge tag 'ovpn-net-20260417' of https://github.com/OpenVPN/ovpn-net-next

Antonio Quartulli says:

====================
This batch includes only fixes to the selftest harness:
* switch to TAP test orchestration
* parse slurped notifications as returned by jq -s
* add ovpn_ prefix to helpers and global variables to avoid clashes
* fail test in case of netlink notification mismatch
* add missing kernel config dependencies
* add delay when launching multiple ynl/cli.py listeners

* tag 'ovpn-net-20260417' of https://github.com/OpenVPN/ovpn-net-next:
  selftests: ovpn: serialize YNL listener startup
  selftests: ovpn: align command flow with TAP
  selftests: ovpn: add prefix to helpers and shared variables
  selftests: ovpn: flatten slurped notification JSON before filtering
  selftests: ovpn: fail notification check on mismatch
  selftests: ovpn: add nftables config dependencies for test-mark
====================

Link: https://patch.msgid.link/20260417090305.2775723-1-antonio@openvpn.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agoMerge tag 'parisc-for-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/delle...
Linus Torvalds [Sat, 18 Apr 2026 18:37:36 +0000 (11:37 -0700)] 
Merge tag 'parisc-for-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux

Pull parisc architecture updates from Helge Deller:

 - A fix to make modules on 32-bit parisc architecture work again

 - Drop ip_fast_csum() inline assembly to avoid unaligned memory
   accesses

 - Allow to build kernel without 32-bit VDSO

 - Reference leak fix in error path in LED driver

* tag 'parisc-for-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: led: fix reference leak on failed device registration
  module.lds.S: Fix modules on 32-bit parisc architecture
  parisc: Allow to build without VDSO32
  parisc: Include 32-bit VDSO only when building for 32-bit or compat mode
  parisc: Allow to disable COMPAT mode on 64-bit kernel
  parisc: Fix default stack size when COMPAT=n
  parisc: Fix signal code to depend on CONFIG_COMPAT instead of CONFIG_64BIT
  parisc: is_compat_task() shall return false for COMPAT=n
  parisc: Avoid compat syscalls when COMPAT=n
  parisc: _llseek syscall is only available for 32-bit userspace
  parisc: Drop ip_fast_csum() inline assembly implementation
  parisc: update outdated comments for renamed ccio_alloc_consistent()

10 days agoMerge tag 'memblock-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt...
Linus Torvalds [Sat, 18 Apr 2026 18:29:14 +0000 (11:29 -0700)] 
Merge tag 'memblock-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock

Pull memblock updates from Mike Rapoport:

 - improve debuggability of reserve_mem kernel parameter handling with
   print outs in case of a failure and debugfs info showing what was
   actually reserved

 - Make memblock_free_late() and free_reserved_area() use the same core
   logic for freeing the memory to buddy and ensure it takes care of
   updating memblock arrays when ARCH_KEEP_MEMBLOCK is enabled.

* tag 'memblock-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
  x86/alternative: delay freeing of smp_locks section
  memblock: warn when freeing reserved memory before memory map is initialized
  memblock, treewide: make memblock_free() handle late freeing
  memblock: make free_reserved_area() update memblock if ARCH_KEEP_MEMBLOCK=y
  memblock: extract page freeing from free_reserved_area() into a helper
  memblock: make free_reserved_area() more robust
  mm: move free_reserved_area() to mm/memblock.c
  powerpc: opal-core: pair alloc_pages_exact() with free_pages_exact()
  powerpc: fadump: pair alloc_pages_exact() with free_pages_exact()
  memblock: reserve_mem: fix end caclulation in reserve_mem_release_by_name()
  memblock: move reserve_bootmem_range() to memblock.c and make it static
  memblock: Add reserve_mem debugfs info
  memblock: Print out errors on reserve_mem parser

10 days agoMerge branch 'tcp-take-care-of-tcp_get_timestamping_opt_stats-races'
Jakub Kicinski [Sat, 18 Apr 2026 18:10:15 +0000 (11:10 -0700)] 
Merge branch 'tcp-take-care-of-tcp_get_timestamping_opt_stats-races'

Eric Dumazet says:

====================
tcp: take care of tcp_get_timestamping_opt_stats() races

tcp_get_timestamping_opt_stats() does not own the socket lock,
this is intentional.

It calls tcp_get_info_chrono_stats() while other threads could
change chrono fields in tcp_chrono_set(). It also reads many
tcp socket fields that can be modified by other cpus/threads.

I do not think we need coherent TCP socket state snapshot
in tcp_get_timestamping_opt_stats().

Add READ_ONCE()/WRITE_ONCE() or data_race() annotations.

Note that icsk_ca_state is a bitfield, thus not covered
in this series.
====================

Link: https://patch.msgid.link/20260416200319.3608680-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agomailbox: mailbox-test: make data_ready a per-instance variable
Wolfram Sang [Fri, 17 Apr 2026 07:42:36 +0000 (09:42 +0200)] 
mailbox: mailbox-test: make data_ready a per-instance variable

While not the default case, multiple tests can be run simultaneously.
Then, data_ready being a global variable will be overwritten and the
per-instance lock will not help. Turn the global variable into a
per-instance one to avoid this problem.

Fixes: e339c80af95e ("mailbox: mailbox-test: don't rely on rx_buffer content to signal data ready")
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
10 days agotcp: annotate data-races around tp->plb_rehash
Eric Dumazet [Thu, 16 Apr 2026 20:03:19 +0000 (20:03 +0000)] 
tcp: annotate data-races around tp->plb_rehash

tcp_get_timestamping_opt_stats() intentionally runs lockless, we must
add READ_ONCE() and WRITE_ONCE() annotations to keep KCSAN happy.

Fixes: 29c1c44646ae ("tcp: add u32 counter in tcp_sock and an SNMP counter for PLB")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260416200319.3608680-15-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agomailbox: mailbox-test: initialize struct earlier
Wolfram Sang [Fri, 17 Apr 2026 07:42:35 +0000 (09:42 +0200)] 
mailbox: mailbox-test: initialize struct earlier

The waitqueue must be initialized before the debugfs files are created
because from that time, requests from userspace can already be made.
Similarily, drvdata and spinlock needs to be initialized before we
request the channel, otherwise dangling irqs might run into problems
like a NULL pointer exception.

Fixes: 8ea4484d0c2b ("mailbox: Add generic mechanism for testing Mailbox Controllers")
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
10 days agomailbox: mailbox-test: don't free the reused channel
Wolfram Sang [Fri, 17 Apr 2026 07:42:34 +0000 (09:42 +0200)] 
mailbox: mailbox-test: don't free the reused channel

The RX channel can be aliased to the TX channel if it has a different
MMIO. This special case needs to be handled when freeing the channels
otherwise a double-free occurs.

Fixes: 8ea4484d0c2b ("mailbox: Add generic mechanism for testing Mailbox Controllers")
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
10 days agomailbox: mailbox-test: handle channel errors consistently
Wolfram Sang [Fri, 17 Apr 2026 07:42:33 +0000 (09:42 +0200)] 
mailbox: mailbox-test: handle channel errors consistently

mbox_test_request_channel() returns either an ERR_PTR or NULL. The
callers, however, mostly checked for non-NULL which allows for bogus
code paths when an ERR_PTR is treated like a valid channel. A later
commit tried to fix it in one place but missed the other ones. Because
the ERR_PTR is only used for -ENOMEM once and is converted to
-EPROBE_DEFER anyhow, convert the callee to only return NULL which
simplifies handling a lot and makes it less error prone.

Fixes: 8ea4484d0c2b ("mailbox: Add generic mechanism for testing Mailbox Controllers")
Fixes: 9b63a810c6f9 ("mailbox: mailbox-test: Fix an error check in mbox_test_probe()")
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
10 days agotcp: annotate data-races around (tp->write_seq - tp->snd_nxt)
Eric Dumazet [Thu, 16 Apr 2026 20:03:18 +0000 (20:03 +0000)] 
tcp: annotate data-races around (tp->write_seq - tp->snd_nxt)

tcp_get_timestamping_opt_stats() intentionally runs lockless, we must
add READ_ONCE() annotations to keep KCSAN happy.

WRITE_ONCE() annotations are already present.

Fixes: e08ab0b377a1 ("tcp: add bytes not sent to SCM_TIMESTAMPING_OPT_STATS")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260416200319.3608680-14-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agotcp: annotate data-races around tp->timeout_rehash
Eric Dumazet [Thu, 16 Apr 2026 20:03:17 +0000 (20:03 +0000)] 
tcp: annotate data-races around tp->timeout_rehash

tcp_get_timestamping_opt_stats() intentionally runs lockless, we must
add READ_ONCE() and WRITE_ONCE() annotations to keep KCSAN happy.

Fixes: 32efcc06d2a1 ("tcp: export count for rehash attempts")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260416200319.3608680-13-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agotcp: annotate data-races around tp->srtt_us
Eric Dumazet [Thu, 16 Apr 2026 20:03:16 +0000 (20:03 +0000)] 
tcp: annotate data-races around tp->srtt_us

tcp_get_timestamping_opt_stats() intentionally runs lockless, we must
add READ_ONCE() and WRITE_ONCE() annotations to keep KCSAN happy.

Fixes: e8bd8fca6773 ("tcp: add SRTT to SCM_TIMESTAMPING_OPT_STATS")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260416200319.3608680-12-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agotcp: annotate data-races around tp->reord_seen
Eric Dumazet [Thu, 16 Apr 2026 20:03:15 +0000 (20:03 +0000)] 
tcp: annotate data-races around tp->reord_seen

tcp_get_timestamping_opt_stats() intentionally runs lockless, we must
add READ_ONCE() and WRITE_ONCE() annotations to keep KCSAN happy.

Fixes: 7ec65372ca53 ("tcp: add stat of data packet reordering events")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260416200319.3608680-11-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agotcp: annotate data-races around tp->dsack_dups
Eric Dumazet [Thu, 16 Apr 2026 20:03:14 +0000 (20:03 +0000)] 
tcp: annotate data-races around tp->dsack_dups

tcp_get_timestamping_opt_stats() intentionally runs lockless, we must
add READ_ONCE() and WRITE_ONCE() annotations to keep KCSAN happy.

Fixes: 7e10b6554ff2 ("tcp: add dsack blocks received stats")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260416200319.3608680-10-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agotcp: annotate data-races around tp->bytes_retrans
Eric Dumazet [Thu, 16 Apr 2026 20:03:13 +0000 (20:03 +0000)] 
tcp: annotate data-races around tp->bytes_retrans

tcp_get_timestamping_opt_stats() intentionally runs lockless, we must
add READ_ONCE() and WRITE_ONCE() annotations to keep KCSAN happy.

Fixes: fb31c9b9f6c8 ("tcp: add data bytes retransmitted stats")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260416200319.3608680-9-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agotcp: annotate data-races around tp->bytes_sent
Eric Dumazet [Thu, 16 Apr 2026 20:03:12 +0000 (20:03 +0000)] 
tcp: annotate data-races around tp->bytes_sent

tcp_get_timestamping_opt_stats() intentionally runs lockless, we must
add READ_ONCE() and WRITE_ONCE() annotations to keep KCSAN happy.

Fixes: ba113c3aa79a ("tcp: add data bytes sent stats")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260416200319.3608680-8-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agotcp: add data-race annotations for TCP_NLA_SNDQ_SIZE
Eric Dumazet [Thu, 16 Apr 2026 20:03:11 +0000 (20:03 +0000)] 
tcp: add data-race annotations for TCP_NLA_SNDQ_SIZE

tcp_get_timestamping_opt_stats() intentionally runs lockless, we must
add READ_ONCE() and WRITE_ONCE() annotations to keep KCSAN happy.

Fixes: 87ecc95d81d9 ("tcp: add send queue size stat in SCM_TIMESTAMPING_OPT_STATS")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260416200319.3608680-7-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agotcp: annotate data-races around tp->delivered and tp->delivered_ce
Eric Dumazet [Thu, 16 Apr 2026 20:03:10 +0000 (20:03 +0000)] 
tcp: annotate data-races around tp->delivered and tp->delivered_ce

tcp_get_timestamping_opt_stats() intentionally runs lockless, we must
add READ_ONCE() and WRITE_ONCE() annotations to keep KCSAN happy.

Fixes: feb5f2ec6464 ("tcp: export packets delivery info")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260416200319.3608680-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agotcp: annotate data-races around tp->snd_ssthresh
Eric Dumazet [Thu, 16 Apr 2026 20:03:09 +0000 (20:03 +0000)] 
tcp: annotate data-races around tp->snd_ssthresh

tcp_get_timestamping_opt_stats() intentionally runs lockless, we must
add READ_ONCE() and WRITE_ONCE() annotations to keep KCSAN happy.

Fixes: 7156d194a077 ("tcp: add snd_ssthresh stat in SCM_TIMESTAMPING_OPT_STATS")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260416200319.3608680-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agotcp: add data-races annotations around tp->reordering, tp->snd_cwnd
Eric Dumazet [Thu, 16 Apr 2026 20:03:08 +0000 (20:03 +0000)] 
tcp: add data-races annotations around tp->reordering, tp->snd_cwnd

tcp_get_timestamping_opt_stats() intentionally runs lockless, we must
add READ_ONCE(), WRITE_ONCE() data_race() annotations to keep KCSAN happy.

Fixes: bb7c19f96012 ("tcp: add related fields into SCM_TIMESTAMPING_OPT_STATS")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260416200319.3608680-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agotcp: add data-race annotations around tp->data_segs_out and tp->total_retrans
Eric Dumazet [Thu, 16 Apr 2026 20:03:07 +0000 (20:03 +0000)] 
tcp: add data-race annotations around tp->data_segs_out and tp->total_retrans

tcp_get_timestamping_opt_stats() intentionally runs lockless, we must
add READ_ONCE() and WRITE_ONCE() annotations to keep KCSAN happy.

Fixes: 7e98102f4897 ("tcp: record pkts sent and retransmistted")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260416200319.3608680-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agotcp: annotate data-races in tcp_get_info_chrono_stats()
Eric Dumazet [Thu, 16 Apr 2026 20:03:06 +0000 (20:03 +0000)] 
tcp: annotate data-races in tcp_get_info_chrono_stats()

tcp_get_timestamping_opt_stats() does not own the socket lock,
this is intentional.

It calls tcp_get_info_chrono_stats() while other threads could
change chrono fields in tcp_chrono_set().

I do not think we need coherent TCP socket state snapshot
in tcp_get_timestamping_opt_stats(), I chose to only
add annotations to keep KCSAN happy.

Fixes: 1c885808e456 ("tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260416200319.3608680-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days agomailbox: update kdoc for struct mbox_controller
Wolfram Sang [Mon, 13 Apr 2026 10:42:39 +0000 (12:42 +0200)] 
mailbox: update kdoc for struct mbox_controller

Add field for missing lock around the hrtimer. Add 'Required' where
the core checks for valid entries.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
10 days agomailbox: add sanity check for channel array
Wolfram Sang [Mon, 13 Apr 2026 10:42:38 +0000 (12:42 +0200)] 
mailbox: add sanity check for channel array

Fail gracefully if there is no channel array attached to the mailbox
controller. Otherwise the later dereference will cause an OOPS which
might not be seen because mailbox controllers might instantiate very
early. Remove the comment explaining the obvious while here.

Fixes: 2b6d83e2b8b7 ("mailbox: Introduce framework for mailbox")
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
10 days agoksmbd: fix out-of-bounds write in smb2_get_ea() EA alignment
Tristan Madani [Fri, 17 Apr 2026 19:33:17 +0000 (19:33 +0000)] 
ksmbd: fix out-of-bounds write in smb2_get_ea() EA alignment

smb2_get_ea() applies 4-byte alignment padding via memset() after
writing each EA entry. The bounds check on buf_free_len is performed
before the value memcpy, but the alignment memset fires unconditionally
afterward with no check on remaining space.

When the EA value exactly fills the remaining buffer (buf_free_len == 0
after value subtraction), the alignment memset writes 1-3 NUL bytes
past the buf_free_len boundary. In compound requests where the response
buffer is shared across commands, the first command (e.g., READ) can
consume most of the buffer, leaving a tight remainder for the QUERY_INFO
EA response. The alignment memset then overwrites past the physical
kvmalloc allocation into adjacent kernel heap memory.

Add a bounds check before the alignment memset to ensure buf_free_len
can accommodate the padding bytes.

This is the same bug pattern fixed by commit beef2634f81f ("ksmbd: fix
potencial OOB in get_file_all_info() for compound requests") and
commit fda9522ed6af ("ksmbd: fix OOB write in QUERY_INFO for compound
requests"), both of which added bounds checks before unconditional
writes in QUERY_INFO response handlers.

Cc: stable@vger.kernel.org
Fixes: e2b76ab8b5c9 ("ksmbd: add support for read compound")
Signed-off-by: Tristan Madani <tristan@talencesecurity.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
10 days agoksmbd: use check_add_overflow() to prevent u16 DACL size overflow
Tristan Madani [Fri, 17 Apr 2026 19:54:57 +0000 (19:54 +0000)] 
ksmbd: use check_add_overflow() to prevent u16 DACL size overflow

set_posix_acl_entries_dacl() and set_ntacl_dacl() accumulate ACE sizes
in u16 variables. When a file has many POSIX ACL entries, the
accumulated size can wrap past 65535, causing the pointer arithmetic
(char *)pndace + *size to land within already-written ACEs. Subsequent
writes then overwrite earlier entries, and pndacl->size gets a
truncated value.

Use check_add_overflow() at each accumulation point to detect the
wrap before it corrupts the buffer, consistent with existing
check_mul_overflow() usage elsewhere in smbacl.c.

Cc: stable@vger.kernel.org
Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3")
Signed-off-by: Tristan Madani <tristan@talencesecurity.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
10 days agoksmbd: fix use-after-free in smb2_open during durable reconnect
Akif [Fri, 17 Apr 2026 18:27:09 +0000 (23:57 +0530)] 
ksmbd: fix use-after-free in smb2_open during durable reconnect

In smb2_open, the call to ksmbd_put_durable_fd(fp) drops the reference
to the durable file descriptor early during the durable reconnect
process. If an error occurs subsequently (eg, ksmbd_iov_pin_rsp fails)
or a scavenger accesses the file, it leads to a use-after-free when
accessing fp properties (eg fp->create_time).

Move the single put to the end of the function below err_out2 so fp
stays valid until smb2_open returns.

Fixes: c8efcc786146 ("ksmbd: add support for durable handles v1/v2")
Signed-off-by: Akif <akif.sait111@gmail.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
10 days agoksmbd: validate num_aces and harden ACE walk in smb_inherit_dacl()
Michael Bommarito [Fri, 17 Apr 2026 18:45:57 +0000 (14:45 -0400)] 
ksmbd: validate num_aces and harden ACE walk in smb_inherit_dacl()

smb_inherit_dacl() trusts the on-disk num_aces value from the parent
directory's DACL xattr and uses it to size a heap allocation:

  aces_base = kmalloc(sizeof(struct smb_ace) * num_aces * 2, ...);

num_aces is a u16 read from le16_to_cpu(parent_pdacl->num_aces)
without checking that it is consistent with the declared pdacl_size.
An authenticated client whose parent directory's security.NTACL is
tampered (e.g. via offline xattr corruption or a concurrent path that
bypasses parse_dacl()) can present num_aces = 65535 with minimal
actual ACE data.  This causes a ~8 MB allocation (not kzalloc, so
uninitialized) that the subsequent loop only partially populates, and
may also overflow the three-way size_t multiply on 32-bit kernels.

Additionally, the ACE walk loop uses the weaker
offsetof(struct smb_ace, access_req) minimum size check rather than
the minimum valid on-wire ACE size, and does not reject ACEs whose
declared size is below the minimum.

Reproduced on UML + KASAN + LOCKDEP against the real ksmbd code path.
A legitimate mount.cifs client creates a parent directory over SMB
(ksmbd writes a valid security.NTACL xattr), then the NTACL blob on
the backing filesystem is rewritten to set num_aces = 0xFFFF while
keeping the posix_acl_hash bytes intact so ksmbd_vfs_get_sd_xattr()'s
hash check still passes.  A subsequent SMB2 CREATE of a child under
that parent drives smb2_open() into smb_inherit_dacl() (share has
"vfs objects = acl_xattr" set), which fails the page allocator:

  WARNING: mm/page_alloc.c:5226 at __alloc_frozen_pages_noprof+0x46c/0x9c0
  Workqueue: ksmbd-io handle_ksmbd_work
   __alloc_frozen_pages_noprof+0x46c/0x9c0
   ___kmalloc_large_node+0x68/0x130
   __kmalloc_large_node_noprof+0x24/0x70
   __kmalloc_noprof+0x4c9/0x690
   smb_inherit_dacl+0x394/0x2430
   smb2_open+0x595d/0xabe0
   handle_ksmbd_work+0x3d3/0x1140

With the patch applied the added guard rejects the tampered value
with -EINVAL before any large allocation runs, smb2_open() falls back
to smb2_create_sd_buffer(), and the child is created with a default
SD.  No warning, no splat.

Fix by:

  1. Validating num_aces against pdacl_size using the same formula
     applied in parse_dacl().

  2. Replacing the raw kmalloc(sizeof * num_aces * 2) with
     kmalloc_array(num_aces * 2, sizeof(...)) for overflow-safe
     allocation.

  3. Tightening the per-ACE loop guard to require the minimum valid
     ACE size (offsetof(smb_ace, sid) + CIFS_SID_BASE_SIZE) and
     rejecting under-sized ACEs, matching the hardening in
     smb_check_perm_dacl() and parse_dacl().

v1 -> v2:
  - Replace the synthetic test-module splat in the changelog with a
    real-path UML + KASAN reproduction driven through mount.cifs and
    SMB2 CREATE; Namjae flagged the kcifs3_test_inherit_dacl_old name
    in v1 since it does not exist in ksmbd.
  - Drop the commit-hash citation from the code comment per Namjae's
    review; keep the parse_dacl() pointer.

Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
10 days agosmb: server: fix max_connections off-by-one in tcp accept path
DaeMyung Kang [Thu, 16 Apr 2026 21:17:35 +0000 (06:17 +0900)] 
smb: server: fix max_connections off-by-one in tcp accept path

The global max_connections check in ksmbd's TCP accept path counts
the newly accepted connection with atomic_inc_return(), but then
rejects the connection when the result is greater than or equal to
server_conf.max_connections.

That makes the effective limit one smaller than configured. For
example:

- max_connections=1 rejects the first connection
- max_connections=2 allows only one connection

The per-IP limit in the same function uses <= correctly because it
counts only pre-existing connections. The global limit instead checks
the post-increment total, so it should reject only when that total
exceeds the configured maximum.

Fix this by changing the comparison from >= to >, so exactly
max_connections simultaneous connections are allowed and the next one
is rejected. This matches the documented meaning of max_connections
in fs/smb/server/ksmbd_netlink.h as the "Number of maximum simultaneous
connections".

Fixes: 0d0d4680db22 ("ksmbd: add max connections parameter")
Cc: stable@vger.kernel.org
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
10 days agoksmbd: require minimum ACE size in smb_check_perm_dacl()
Michael Bommarito [Tue, 14 Apr 2026 19:15:33 +0000 (15:15 -0400)] 
ksmbd: require minimum ACE size in smb_check_perm_dacl()

Both ACE-walk loops in smb_check_perm_dacl() only guard against an
under-sized remaining buffer, not against an ACE whose declared
`ace->size` is smaller than the struct it claims to describe:

  if (offsetof(struct smb_ace, access_req) > aces_size)
      break;
  ace_size = le16_to_cpu(ace->size);
  if (ace_size > aces_size)
      break;

The first check only requires the 4-byte ACE header to be in bounds;
it does not require access_req (4 bytes at offset 4) to be readable.
An attacker who has set a crafted DACL on a file they own can declare
ace->size == 4 with aces_size == 4, pass both checks, and then

  granted |= le32_to_cpu(ace->access_req);               /* upper loop */
  compare_sids(&sid, &ace->sid);                         /* lower loop */

reads access_req at offset 4 (OOB by up to 4 bytes) and ace->sid at
offset 8 (OOB by up to CIFS_SID_BASE_SIZE + SID_MAX_SUB_AUTHORITIES
* 4 bytes).

Tighten both loops to require

  ace_size >= offsetof(struct smb_ace, sid) + CIFS_SID_BASE_SIZE

which is the smallest valid on-wire ACE layout (4-byte header +
4-byte access_req + 8-byte sid base with zero sub-auths).  Also
reject ACEs whose sid.num_subauth exceeds SID_MAX_SUB_AUTHORITIES
before letting compare_sids() dereference sub_auth[] entries.

parse_sec_desc() already enforces an equivalent check (lines 441-448);
smb_check_perm_dacl() simply grew weaker validation over time.

Reachability: authenticated SMB client with permission to set an ACL
on a file.  On a subsequent CREATE against that file, the kernel
walks the stored DACL via smb_check_perm_dacl() and triggers the
OOB read.  Not pre-auth, and the OOB read is not reflected to the
attacker, but KASAN reports and kernel state corruption are
possible.

Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-6
Assisted-by: Codex:gpt-5-4
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
10 days agoksmbd: validate response sizes in ipc_validate_msg()
Michael Bommarito [Wed, 15 Apr 2026 11:25:00 +0000 (07:25 -0400)] 
ksmbd: validate response sizes in ipc_validate_msg()

ipc_validate_msg() computes the expected message size for each
response type by adding (or multiplying) attacker-controlled fields
from the daemon response to a fixed struct size in unsigned int
arithmetic.  Three cases can overflow:

  KSMBD_EVENT_RPC_REQUEST:
      msg_sz = sizeof(struct ksmbd_rpc_command) + resp->payload_sz;
  KSMBD_EVENT_SHARE_CONFIG_REQUEST:
      msg_sz = sizeof(struct ksmbd_share_config_response) +
               resp->payload_sz;
  KSMBD_EVENT_LOGIN_REQUEST_EXT:
      msg_sz = sizeof(struct ksmbd_login_response_ext) +
               resp->ngroups * sizeof(gid_t);

resp->payload_sz is __u32 and resp->ngroups is __s32.  Each addition
can wrap in unsigned int; the multiplication by sizeof(gid_t) mixes
signed and size_t, so a negative ngroups is converted to SIZE_MAX
before the multiply.  A wrapped value of msg_sz that happens to
equal entry->msg_sz bypasses the size check on the next line, and
downstream consumers (smb2pdu.c:6742 memcpy using rpc_resp->payload_sz,
kmemdup in ksmbd_alloc_user using resp_ext->ngroups) then trust the
unverified length.

Use check_add_overflow() on the RPC_REQUEST and SHARE_CONFIG_REQUEST
paths to detect integer overflow without constraining functional
payload size; userspace ksmbd-tools grows NDR responses in 4096-byte
chunks for calls like NetShareEnumAll, so a hard transport cap is
unworkable on the response side.  For LOGIN_REQUEST_EXT, reject
resp->ngroups outside the signed [0, NGROUPS_MAX] range up front and
report the error from ipc_validate_msg() so it fires at the IPC
boundary; with that bound the subsequent multiplication and addition
stay well below UINT_MAX.  The now-redundant ngroups check and
pr_err in ksmbd_alloc_user() are removed.

This is the response-side analogue of aab98e2dbd64 ("ksmbd: fix
integer overflows on 32 bit systems"), which hardened the request
side.

Fixes: 0626e6641f6b ("cifsd: add server handler for central processing and tranport layers")
Fixes: a77e0e02af1c ("ksmbd: add support for supplementary groups")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-6
Assisted-by: Codex:gpt-5-4
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
10 days agosmb: server: fix active_num_conn leak on transport allocation failure
Michael Bommarito [Tue, 14 Apr 2026 22:54:38 +0000 (18:54 -0400)] 
smb: server: fix active_num_conn leak on transport allocation failure

Commit 77ffbcac4e56 ("smb: server: fix leak of active_num_conn in
ksmbd_tcp_new_connection()") addressed the kthread_run() failure
path.  The earlier alloc_transport() == NULL path in the same
function has the same leak, is reachable pre-authentication via any
TCP connect to port 445, and was empirically reproduced on UML
(ARCH=um, v7.0-rc7): a small number of forced allocation failures
were sufficient to put ksmbd into a state where every subsequent
connection attempt was rejected for the remainder of the boot.

ksmbd_kthread_fn() increments active_num_conn before calling
ksmbd_tcp_new_connection() and discards the return value, so when
alloc_transport() returns NULL the socket is released and -ENOMEM
returned without decrementing the counter.  Each such failure
permanently consumes one slot from the max_connections pool; once
cumulative failures reach the cap, atomic_inc_return() hits the
threshold on every subsequent accept and every new connection is
rejected.  The counter is only reset by module reload.

An unauthenticated remote attacker can drive the server toward the
memory pressure that makes alloc_transport() fail by holding open
connections with large RFC1002 lengths up to MAX_STREAM_PROT_LEN
(0x00FFFFFF); natural transient allocation failures on a loaded
host produce the same drift more slowly.

Mirror the existing rollback pattern in ksmbd_kthread_fn(): on the
alloc_transport() failure path, decrement active_num_conn gated on
server_conf.max_connections.

Repro details: with the patch reverted, forced alloc_transport()
NULL returns leaked counter slots and subsequent connection
attempts -- including legitimate connects issued after the
forced-fail window had closed -- were all rejected with "Limit the
maximum number of connections".  With this patch applied, the same
connect sequence produces no rejections and the counter cycles
cleanly between zero and one on every accept.

Fixes: 0d0d4680db22 ("ksmbd: add max connections parameter")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-6
Assisted-by: Codex:gpt-5-4
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
10 days agoMerge tag 'i2c-for-7.1-rc1-part1' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 18 Apr 2026 16:44:22 +0000 (09:44 -0700)] 
Merge tag 'i2c-for-7.1-rc1-part1' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c updates from Wolfram Sang:
 "The biggest news in this pull request is that it will start the last
  cycle of me handling the I2C subsystem. From 7.2. on, I will pass
  maintainership to Andi Shyti who has been maintaining the I2C drivers
  for a while now and who has done a great job in doing so.

  We will use this cycle for a hopefully smooth transition. Thanks must
  go to Andi for stepping up! I will still be around for guidance.

  Updates:
   - generic cleanups in npcm7xx, qcom-cci, xiic and designware DT
     bindings
   - atr: use kzalloc_flex for alias pool allocation
   - ixp4xx: convert bindings to DT schema
   - ocores: use read_poll_timeout_atomic() for polling waits
   - qcom-geni: skip extra TX DMA TRE for single read messages
   - s3c24xx: validate SMBus block length before using it
   - spacemit: refactor xfer path and add K1 PIO support
   - tegra: identify DVC and VI with SoC data variants
   - tegra: support SoC-specific register offsets
   - xiic: switch to devres and generic fw properties
   - xiic: skip input clock setup on non-OF systems
   - various minor improvements in other drivers

  rtl9300:
   - add per-SoC callbacks and clock support for RTL9607C
   - add support for new 50 kHz and 2.5 MHz bus speeds
   - general refactoring in preparation for RTL9607C support

  New support:
   - DesignWare GOOG5000 (ACPI HID)
   - Intel Nova Lake (ACPI ID)
   - Realtek RTL9607C
   - SpacemiT K3 binding
   - Tegra410 register layout support"

* tag 'i2c-for-7.1-rc1-part1' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (40 commits)
  i2c: usbio: Add ACPI device-id for NVL platforms
  i2c: qcom-geni: Avoid extra TX DMA TRE for single read message in GPI mode
  i2c: atr: use kzalloc_flex
  i2c: spacemit: introduce pio for k1
  i2c: spacemit: move i2c_xfer_msg()
  i2c: xiic: skip input clock setup on non-OF systems
  i2c: xiic: use numbered adapter registration
  i2c: xiic: cosmetic: use resource format specifier in debug log
  i2c: xiic: cosmetic cleanup
  i2c: xiic: switch to generic device property accessors
  i2c: xiic: remove duplicate error message
  i2c: xiic: switch to devres managed APIs
  i2c: rtl9300: add RTL9607C i2c controller support
  i2c: rtl9300: introduce new function properties to driver data
  i2c: rtl9300: introduce clk struct for upcoming rtl9607 support
  dt-bindings: i2c: realtek,rtl9301-i2c: extend for clocks and RTL9607C support
  i2c: rtl9300: introduce a property for 8 bit width reg address
  i2c: rtl9300: introduce F_BUSY to the reg_fields struct
  i2c: rtl9300: introduce max length property to driver data
  i2c: rtl9300: split data_reg into read and write reg
  ...

10 days agoMerge tag 'for-linus-7.1-1' of https://github.com/cminyard/linux-ipmi
Linus Torvalds [Sat, 18 Apr 2026 16:33:54 +0000 (09:33 -0700)] 
Merge tag 'for-linus-7.1-1' of https://github.com/cminyard/linux-ipmi

Pull ipmi updates from Corey Minyard:
 "Small updates and fixes (mostly to the BMC software):

   - Fix one issue in the host side driver where a kthread can be left
     running on a specific memory allocation failre at probe time

   - Replace system_wq with system_percpu_wq so system_wq can eventually
     go away"

* tag 'for-linus-7.1-1' of https://github.com/cminyard/linux-ipmi:
  ipmi:ssif: Clean up kthread on errors
  ipmi:ssif: Remove unnecessary indention
  ipmi: ssif_bmc: Fix KUnit test link failure when KUNIT=m
  ipmi: ssif_bmc: add unit test for state machine
  ipmi: ssif_bmc: change log level to dbg in irq callback
  ipmi: ssif_bmc: fix message desynchronization after truncated response
  ipmi: ssif_bmc: fix missing check for copy_to_user() partial failure
  ipmi: ssif_bmc: cancel response timer on remove
  ipmi: Replace use of system_wq with system_percpu_wq

10 days agoMerge tag 'perf-tools-for-v7.1-2026-04-17' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Sat, 18 Apr 2026 16:24:56 +0000 (09:24 -0700)] 
Merge tag 'perf-tools-for-v7.1-2026-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools

Pull perf tools updates from Namhyung Kim:
 "perf report:

   - Add 'comm_nodigit' sort key to combine similar threads that only
     have different numbers in the comm. In the following example, the
     'comm_nodigit' will have samples from all threads starting with
     "bpfrb/" into an entry "bpfrb/<N>".

        $ perf report -s comm_nodigit,comm -H
        ...
        #
        #    Overhead  CommandNoDigit / Command
        # ...........  ........................
        #
            20.30%     swapper
               20.30%     swapper
            13.37%     chrome
               13.37%     chrome
            10.07%     bpfrb/<N>
                7.47%     bpfrb/0
                0.70%     bpfrb/1
                0.47%     bpfrb/3
                0.46%     bpfrb/2
                0.25%     bpfrb/4
                0.23%     bpfrb/5
                0.20%     bpfrb/6
                0.14%     bpfrb/10
                0.07%     bpfrb/7

   - Support flat layout for symfs. The --symfs option is to specify the
     location of debugging symbol files. The default 'hierarchy' layout
     would search the symbol file using the same path of the original
     file under the symfs root. The new 'flat' layout would search only
     in the root directory.

   - Update 'simd' sort key for ARM SIMD flags to cover ASE/SME and more
     predicate flags.

  perf stat:

   - Add --pmu-filter option to select specific PMUs. This would be
     useful when you measure metrics from multiple instance of uncore
     PMUs with similar names.

        # perf stat -M cpa_p0_avg_bw
         Performance counter stats for 'system wide':

            19,417,779,115      hisi_sicl0_cpa0/cpa_cycles/      #     0.00 cpa_p0_avg_bw
                         0      hisi_sicl0_cpa0/cpa_p0_wr_dat/
                         0      hisi_sicl0_cpa0/cpa_p0_rd_dat_64b/
                         0      hisi_sicl0_cpa0/cpa_p0_rd_dat_32b/
            19,417,751,103      hisi_sicl10_cpa0/cpa_cycles/     #     0.00 cpa_p0_avg_bw
                         0      hisi_sicl10_cpa0/cpa_p0_wr_dat/
                         0      hisi_sicl10_cpa0/cpa_p0_rd_dat_64b/
                         0      hisi_sicl10_cpa0/cpa_p0_rd_dat_32b/
            19,417,730,679      hisi_sicl2_cpa0/cpa_cycles/      #     0.31 cpa_p0_avg_bw
                75,635,749      hisi_sicl2_cpa0/cpa_p0_wr_dat/
                18,520,640      hisi_sicl2_cpa0/cpa_p0_rd_dat_64b/
                         0      hisi_sicl2_cpa0/cpa_p0_rd_dat_32b/
            19,417,674,227      hisi_sicl8_cpa0/cpa_cycles/      #     0.00 cpa_p0_avg_bw
                         0      hisi_sicl8_cpa0/cpa_p0_wr_dat/
                         0      hisi_sicl8_cpa0/cpa_p0_rd_dat_64b/
                         0      hisi_sicl8_cpa0/cpa_p0_rd_dat_32b/

              19.417734480 seconds time elapsed

     With --pmu-filter, users can select only hisi_sicl2_cpa0 PMU.

        # perf stat --pmu-filter hisi_sicl2_cpa0 -M cpa_p0_avg_bw
         Performance counter stats for 'system wide':

             6,234,093,559      cpa_cycles                       #     0.60 cpa_p0_avg_bw
                50,548,465      cpa_p0_wr_dat
                 7,552,182      cpa_p0_rd_dat_64b
                         0      cpa_p0_rd_dat_32b

               6.234139320 seconds time elapsed

  Data type profiling:

   - Quality improvements by tracking register state more precisely

   - Ensure array members to get the type

   - Handle more cases for global variables

  Vendor event/metric updates:

   - Update various Intel events and metrics

   - Add NVIDIA Tegra 410 Olympus events

  Internal changes:

   - Verify perf.data header for maliciously crafted files

   - Update perf test to cover more usages and make them robust

   - Move a couple of copied kernel headers not to annoy objtool build

   - Fix a bug in map sorting in name order

   - Remove some unused codes

  Misc:

   - Fix module symbol resolution with non-zero text address

   - Add -t/--threads option to `perf bench mem mmap`

   - Track duration of exit*() syscall by `perf trace -s`

   - Add core.addr2line-timeout and core.addr2line-disable-warn config
     items"

* tag 'perf-tools-for-v7.1-2026-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (131 commits)
  perf loongarch: Fix build failure with CONFIG_LIBDW_DWARF_UNWIND
  perf annotate: Use jump__delete when freeing LoongArch jumps
  perf test: Fixes for check branch stack sampling
  perf test: Fix inet_pton probe failure and unroll call graph
  perf build: fix "argument list too long" in second location
  perf header: Add sanity checks to HEADER_BPF_BTF processing
  perf header: Sanity check HEADER_BPF_PROG_INFO
  perf header: Sanity check HEADER_PMU_CAPS
  perf header: Sanity check HEADER_HYBRID_TOPOLOGY
  perf header: Sanity check HEADER_CACHE
  perf header: Sanity check HEADER_GROUP_DESC
  perf header: Sanity check HEADER_PMU_MAPPINGS
  perf header: Sanity check HEADER_MEM_TOPOLOGY
  perf header: Sanity check HEADER_NUMA_TOPOLOGY
  perf header: Sanity check HEADER_CPU_TOPOLOGY
  perf header: Sanity check HEADER_NRCPUS and HEADER_CPU_DOMAIN_INFO
  perf header: Bump up the max number of command line args allowed
  perf header: Validate nr_domains when reading HEADER_CPU_DOMAIN_INFO
  perf sample: Fix documentation typo
  perf arm_spe: Improve SIMD flags setting
  ...

10 days agosh: Drop CONFIG_FIRMWARE_EDID from defconfig files
Thomas Zimmermann [Wed, 1 Apr 2026 08:32:34 +0000 (10:32 +0200)] 
sh: Drop CONFIG_FIRMWARE_EDID from defconfig files

CONFIG_FIRMWARE_EDID=y depends on X86 or EFI_GENERIC_STUB. Neither
is true here, so drop the lines from the defconfig files.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
10 days agosh: Remove CONFIG_VSYSCALL reference from UAPI
Thomas Weißschuh [Tue, 24 Feb 2026 15:35:31 +0000 (16:35 +0100)] 
sh: Remove CONFIG_VSYSCALL reference from UAPI

AT_SYSINFO_EHDR defines the auxvector index representing the vDSO
entrypoint. Its value or presence does not depend on whether a vDSO
is actually provided by the kernel.

The definition of AT_SYSINFO_EHDR was gated between CONFIG_VSYSCALL to
avoid a default gate VMA to be created. However that default gate VMA
was removed entirely in commit a6c19dfe3994
("arm64,ia64,ppc,s390,sh,tile,um,x86,mm: remove default gate area").

Remove the now unnecessary conditional.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
10 days agosh: Fix typo in SPDX license ID lines
Tim Bird [Thu, 12 Feb 2026 19:28:45 +0000 (12:28 -0700)] 
sh: Fix typo in SPDX license ID lines

Both platform_early.c and platform_early.h have an extra dash in
their SPDX-License-Identifier lines. Use the correct (single-dash)
syntax for these lines.

Signed-off-by: Tim Bird <tim.bird@sony.com>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
10 days agosh: Include <linux/io.h> in dac.h
Thomas Zimmermann [Tue, 28 Oct 2025 17:07:55 +0000 (18:07 +0100)] 
sh: Include <linux/io.h> in dac.h

Include <linux/io.h> to avoid depending on <linux/backlight.h>
for including it. Declares __raw_readb() and __raw_writeb().

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202510282206.wI0HrqcK-lkp@intel.com/
Fixes: 243ce64b2b37 ("backlight: Do not include <linux/fb.h> in header file")
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Daniel Thompson (RISCstar) <danielt@kernel.org>
Cc: Simona Vetter <simona.vetter@ffwll.ch>
Cc: Lee Jones <lee@kernel.org>
Cc: Daniel Thompson <danielt@kernel.org>
Cc: Jingoo Han <jingoohan1@gmail.com>
Cc: dri-devel@lists.freedesktop.org
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Reviewed-by: Daniel Thompson (RISCstar) <danielt@kernel.org>
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
10 days agoMAINTAINERS: add page cache reviewer
Jan Kara [Wed, 15 Apr 2026 17:40:40 +0000 (19:40 +0200)] 
MAINTAINERS: add page cache reviewer

Add myself as a page cache reviewer since I tend to review changes in
these areas anyway.

[akpm@linux-foundation.org: add linux-mm@kvack.org]
Link: https://lore.kernel.org/20260415174039.13016-2-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Lorenzo Stoakes <ljs@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 days agomm/vmscan: avoid false-positive -Wuninitialized warning
Arnd Bergmann [Tue, 14 Apr 2026 06:51:58 +0000 (08:51 +0200)] 
mm/vmscan: avoid false-positive -Wuninitialized warning

When the -fsanitize=bounds sanitizer is enabled, gcc-16 sometimes runs
into a corner case in the read_ctrl_pos() pos function, where it sees
possible undefined behavior from the 'tier' index overflowing, presumably
in the case that this was called with a negative tier:

In function 'get_tier_idx',
    inlined from 'isolate_folios' at mm/vmscan.c:4671:14:
mm/vmscan.c: In function 'isolate_folios':
mm/vmscan.c:4645:29: error: 'pv.refaulted' is used uninitialized [-Werror=uninitialized]

Part of the problem seems to be that read_ctrl_pos() has unusual calling
conventions since commit 37a260870f2c ("mm/mglru: rework type selection")
where passing MAX_NR_TIERS makes it accumulate all tiers but passing a
smaller positive number makes it read a single tier instead.

Shut up the warning by adding a fake initialization to the two instances
of this variable that can run into that corner case.

Link: https://lore.kernel.org/all/CAJHvVcjtFW86o5FoQC8MMEXCHAC0FviggaQsd5EmiCHP+1fBpg@mail.gmail.com/
Link: https://lore.kernel.org/20260414065206.3236176-1-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kairui Song <kasong@tencent.com>
Cc: Koichiro Den <koichiro.den@canonical.com>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Wei Xu <weixugc@google.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 days agoMAINTAINERS: update Dave's kdump reviewer email address
Dave Young [Wed, 15 Apr 2026 03:29:26 +0000 (11:29 +0800)] 
MAINTAINERS: update Dave's kdump reviewer email address

Use my personal email address due to the Red Hat work will stop soon

Link: https://lore.kernel.org/ad8GFhh3SI1wb7IC@darkstar.users.ipa.redhat.com
Signed-off-by: Dave Young <ruirui.yang@linux.dev>
Acked-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 days agoMAINTAINERS: drop include/linux/liveupdate from LIVE UPDATE
Pratyush Yadav (Google) [Tue, 14 Apr 2026 12:17:20 +0000 (12:17 +0000)] 
MAINTAINERS: drop include/linux/liveupdate from LIVE UPDATE

The directory does not exist any more.

Link: https://lore.kernel.org/20260414121752.1912847-4-pratyush@kernel.org
Signed-off-by: Pratyush Yadav (Google) <pratyush@kernel.org>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: Alexander Graf <graf@amazon.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 days agoMAINTAINERS: drop include/linux/kho/abi/ from KHO
Pratyush Yadav (Google) [Tue, 14 Apr 2026 12:17:19 +0000 (12:17 +0000)] 
MAINTAINERS: drop include/linux/kho/abi/ from KHO

The KHO entry already includes include/linux/kho.  Listing its
subdirectory is redundant.

Link: https://lore.kernel.org/20260414121752.1912847-3-pratyush@kernel.org
Signed-off-by: Pratyush Yadav (Google) <pratyush@kernel.org>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: Alexander Graf <graf@amazon.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 days agoMAINTAINERS: update KHO and LIVE UPDATE maintainers
Pratyush Yadav (Google) [Tue, 14 Apr 2026 12:17:18 +0000 (12:17 +0000)] 
MAINTAINERS: update KHO and LIVE UPDATE maintainers

Patch series "MAINTAINERS: update KHO and LIVE UPDATE entries".

This series contains some updates for the Kexec Handover (KHO) and Live
update entries.  Patch 1 updates the maintainers list and adds the
liveupdate tree.  Patches 2 and 3 clean up stale files in the list.

This patch (of 3):

I have been helping out with reviewing and developing KHO.  I would also
like to help maintain it.  Change my entry from R to M for KHO and live
update.  Alex has been inactive for a while, so to avoid over-crowding the
KHO entry and to keep the information up-to-date, move his entry from M to
R.

We also now have a tree for KHO and live update at liveupdate/linux.git
where we plan to start maintaining those subsystems and start queuing the
patches.  List that in the entries as well.

Link: https://lore.kernel.org/20260414121752.1912847-1-pratyush@kernel.org
Link: https://lore.kernel.org/20260414121752.1912847-2-pratyush@kernel.org
Signed-off-by: Pratyush Yadav (Google) <pratyush@kernel.org>
Reviewed-by: Alexander Graf <graf@amazon.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: David Hildenbrand <david@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 days agoMAINTAINERS: update kexec/kdump maintainers entries
Pasha Tatashin [Mon, 13 Apr 2026 12:11:46 +0000 (12:11 +0000)] 
MAINTAINERS: update kexec/kdump maintainers entries

Update KEXEC and KDUMP maintainer entries by adding the live update group
maintainers.  Remove Vivek Goyal due to inactivity to keep the MAINTAINERS
file up-to-date, and add Vivek to the CREDITS file to recognize their
contributions.

Link: https://lore.kernel.org/20260413121146.49215-1-pasha.tatashin@soleen.com
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Acked-by: Pratyush Yadav <pratyush@kernel.org>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Diego Viola <diego.viola@gmail.com>
Cc: Jakub Kacinski <kuba@kernel.org>
Cc: Magnus Karlsson <magnus.karlsson@intel.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Martin Kepplinger <martink@posteo.de>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 days agomm/migrate_device: remove dead migration entry check in migrate_vma_collect_huge_pmd()
Davidlohr Bueso [Thu, 12 Feb 2026 01:46:11 +0000 (17:46 -0800)] 
mm/migrate_device: remove dead migration entry check in migrate_vma_collect_huge_pmd()

The softleaf_is_migration() check is unreachable as entries that are not
device_private are filtered out.  Similarly, the PTE-level equivalent in
migrate_vma_collect_pmd() skips migration entries.

This dead branch also contained a double spin_unlock(ptl) bug.

Link: https://lore.kernel.org/20260212014611.416695-1-dave@stgolabs.net
Fixes: a30b48bf1b244 ("mm/migrate_device: implement THP migration of zone device pages")
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Suggested-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Acked-by: Balbir Singh <balbirs@nvidia.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Ying Huang <ying.huang@linux.alibaba.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 days agoselftests: mm: skip charge_reserved_hugetlb without killall
Cao Ruichuang [Fri, 10 Apr 2026 04:41:39 +0000 (12:41 +0800)] 
selftests: mm: skip charge_reserved_hugetlb without killall

charge_reserved_hugetlb.sh tears down background writers with killall from
psmisc.  Minimal Ubuntu images do not always provide that tool, so the
selftest fails in cleanup for an environment reason rather than for the
hugetlb behavior it is trying to cover.

Skip the test when killall is unavailable, similar to the existing root
check, so these environments report the dependency clearly instead of
failing the test.

Link: https://lore.kernel.org/20260410044139.67480-1-create0818@163.com
Signed-off-by: Cao Ruichuang <create0818@163.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 days agouserfaultfd: allow registration of ranges below mmap_min_addr
Denis M. Karpov [Thu, 9 Apr 2026 10:33:45 +0000 (13:33 +0300)] 
userfaultfd: allow registration of ranges below mmap_min_addr

The current implementation of validate_range() in fs/userfaultfd.c
performs a hard check against mmap_min_addr.  This is redundant because
UFFDIO_REGISTER operates on memory ranges that must already be backed by a
VMA.

Enforcing mmap_min_addr or capability checks again in userfaultfd is
unnecessary and prevents applications like binary compilers from using
UFFD for valid memory regions mapped by application.

Remove the redundant check for mmap_min_addr.

We started using UFFD instead of the classic mprotect approach in the
binary translator to track application writes.  During development, we
encountered this bug.  The translator cannot control where the translated
application chooses to map its memory and if the app requires a
low-address area, UFFD fails, whereas mprotect would work just fine.  I
believe this is a genuine logic bug rather than an improvement, and I
would appreciate including the fix in stable.

Link: https://lore.kernel.org/20260409103345.15044-1-komlomal@gmail.com
Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization")
Signed-off-by: Denis M. Karpov <komlomal@gmail.com>
Reviewed-by: Lorenzo Stoakes <ljs@kernel.org>
Acked-by: Harry Yoo (Oracle) <harry@kernel.org>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
10 days agomm/vmstat: fix vmstat_shepherd double-scheduling vmstat_update
Breno Leitao [Thu, 9 Apr 2026 12:26:36 +0000 (05:26 -0700)] 
mm/vmstat: fix vmstat_shepherd double-scheduling vmstat_update

vmstat_shepherd uses delayed_work_pending() to check whether vmstat_update
is already scheduled for a given CPU before queuing it.  However,
delayed_work_pending() only tests WORK_STRUCT_PENDING_BIT, which is
cleared the moment a worker thread picks up the work to execute it.

This means that while vmstat_update is actively running on a CPU,
delayed_work_pending() returns false.  If need_update() also returns true
at that point (per-cpu counters not yet zeroed mid-flush), the shepherd
queues a second invocation with delay=0, causing vmstat_update to run
again immediately after finishing.

On a 72-CPU system this race is readily observable: before the fix, many
CPUs show invocation gaps well below 500 jiffies (the minimum
round_jiffies_relative() can produce), with the most extreme cases
reaching 0 jiffies—vmstat_update called twice within the same jiffy.

Fix this by replacing delayed_work_pending() with work_busy(), which
returns non-zero for both WORK_BUSY_PENDING (timer armed or work queued)
and WORK_BUSY_RUNNING (work currently executing).  The shepherd now
correctly skips a CPU in all busy states.

After the fix, all sub-jiffy and most sub-100-jiffie gaps disappear.  The
remaining early invocations have gaps in the 700–999 jiffie range,
attributable to round_jiffies_relative() aligning to a nearer
jiffie-second boundary rather than to this race.

Each spurious vmstat_update invocation has a measurable side effect:
refresh_cpu_vm_stats() calls decay_pcp_high() for every zone, which drains
idle per-CPU pages back to the buddy allocator via free_pcppages_bulk(),
taking the zone spinlock each time.  Eliminating the double-scheduling
therefore reduces zone lock contention directly.  On a 72-CPU stress-ng
workload measured with perf lock contention:

  free_pcppages_bulk contention count:  ~55% reduction
  free_pcppages_bulk total wait time:   ~57% reduction
  free_pcppages_bulk max wait time:     ~47% reduction

Note: work_busy() is inherently racy—between the check and the
subsequent queue_delayed_work_on() call, vmstat_update can finish
execution, leaving the work neither pending nor running.  In that narrow
window the shepherd can still queue a second invocation.  After the fix,
this residual race is rare and produces only occasional small gaps, a
significant improvement over the systematic double-scheduling seen with
delayed_work_pending().

Link: https://lore.kernel.org/20260409-vmstat-v2-1-e9d9a6db08ad@debian.org
Fixes: 7b8da4c7f07774 ("vmstat: get rid of the ugly cpu_stat_off variable")
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Dmitry Ilvokhin <d@ilvokhin.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>