git.ipfire.org Git - thirdparty/kernel/linux.git/log
3 weeks ago - drm/msm: mdss: Add Milos support
Luca Weiss [Fri, 1 May 2026 07:14:49 +0000 (09:14 +0200)] 
drm/msm: mdss: Add Milos support

Add support for MDSS on Milos.

Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Signed-off-by: Luca Weiss <luca.weiss@fairphone.com>
Patchwork: https://patchwork.freedesktop.org/patch/722320/
Link: https://lore.kernel.org/r/20260501-milos-mdss-v3-7-58bfc58c0e13@fairphone.com
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
3 weeks ago - drm/msm/dsi: add support for DSI-PHY on Milos
Luca Weiss [Fri, 1 May 2026 07:14:48 +0000 (09:14 +0200)] 
drm/msm/dsi: add support for DSI-PHY on Milos

Add DSI PHY support for the Milos platform.

Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Signed-off-by: Luca Weiss <luca.weiss@fairphone.com>
Patchwork: https://patchwork.freedesktop.org/patch/722319/
Link: https://lore.kernel.org/r/20260501-milos-mdss-v3-6-58bfc58c0e13@fairphone.com
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
3 weeks ago - dt-bindings: display: msm: document the Milos Mobile Display Subsystem
Luca Weiss [Fri, 1 May 2026 07:14:46 +0000 (09:14 +0200)] 
dt-bindings: display: msm: document the Milos Mobile Display Subsystem

Document the Mobile Display Subsystem (MDSS) on the Milos SoC.

Signed-off-by: Luca Weiss <luca.weiss@fairphone.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/722315/
Link: https://lore.kernel.org/r/20260501-milos-mdss-v3-4-58bfc58c0e13@fairphone.com
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
3 weeks ago - dt-bindings: display: msm: document the Milos DPU
Luca Weiss [Fri, 1 May 2026 07:14:45 +0000 (09:14 +0200)] 
dt-bindings: display: msm: document the Milos DPU

Document the DPU Display Controller on the Milos Platform.

Acked-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Luca Weiss <luca.weiss@fairphone.com>
Patchwork: https://patchwork.freedesktop.org/patch/722313/
Link: https://lore.kernel.org/r/20260501-milos-mdss-v3-3-58bfc58c0e13@fairphone.com
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
3 weeks ago - dt-bindings: display: msm-dsi-controller-main: document the Milos DSI Controller
Luca Weiss [Fri, 1 May 2026 07:14:44 +0000 (09:14 +0200)] 
dt-bindings: display: msm-dsi-controller-main: document the Milos DSI Controller

Document the DSI Controller on the Milos Platform.

Acked-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Luca Weiss <luca.weiss@fairphone.com>
Patchwork: https://patchwork.freedesktop.org/patch/722310/
Link: https://lore.kernel.org/r/20260501-milos-mdss-v3-2-58bfc58c0e13@fairphone.com
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
3 weeks ago - dt-bindings: display: msm-dsi-phy-7nm: document the Milos DSI PHY
Luca Weiss [Fri, 1 May 2026 07:14:43 +0000 (09:14 +0200)] 
dt-bindings: display: msm-dsi-phy-7nm: document the Milos DSI PHY

Document the DSI PHY on the Milos Platform.

Acked-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Luca Weiss <luca.weiss@fairphone.com>
Patchwork: https://patchwork.freedesktop.org/patch/722309/
Link: https://lore.kernel.org/r/20260501-milos-mdss-v3-1-58bfc58c0e13@fairphone.com
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
3 weeks ago - Merge tag 's390-7.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Linus Torvalds [Fri, 22 May 2026 13:40:31 +0000 (06:40 -0700)] 
Merge tag 's390-7.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fixes from Alexander Gordeev:

 - Fix PAI NNPA mismatch between counting and recording, where sampling
   reports twice the value

 - Fix loss of PAI counter increments during recording on systems with
   many CPUs under heavy load, while counting is not affected

 - On some supported machines, CHSC cannot access memory outside the DMA
   zone, causing CHSC command failures. Restore the GFP_DMA flag when
   allocating memory for CHSC control blocks

 - Align the numbering scheme for higher-level topology structures such
   as socket, book, and drawer with other hardware identifiers, e.g. in
   sysfs, procfs, and tools like lscpu

* tag 's390-7.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/topology: Use zero-based numbering for containing entities
  s390/cio: Restore GFP_DMA for CHSC allocation
  s390/pai: Fix missing PAI counter increments under heavy load
  s390/pai: Disable duplicate read of kernel PAI counter value

3 weeks ago - Merge tag 'slab-for-7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka...
Linus Torvalds [Fri, 22 May 2026 13:23:56 +0000 (06:23 -0700)] 
Merge tag 'slab-for-7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab

Pull slab fix from Vlastimil Babka:

 - Stable fix for a missing cpus_read_lock in one of the cpu sheaves
   flushing paths (Qing Wang)

* tag 'slab-for-7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
  mm/slub: hold cpus_read_lock around flush_rcu_sheaves_on_cache()

3 weeks ago - signal: clear JOBCTL_PENDING_MASK for caller in zap_other_threads()
Aleksandr Nogikh [Thu, 21 May 2026 14:22:40 +0000 (16:22 +0200)] 
signal: clear JOBCTL_PENDING_MASK for caller in zap_other_threads()

When a multi-threaded process receives a stop signal (e.g., SIGSTOP),
do_signal_stop() sets JOBCTL_STOP_PENDING and JOBCTL_STOP_CONSUME on all
threads and sets signal->group_stop_count to the number of threads. If
one of the threads concurrently calls execve(), de_thread() invokes
zap_other_threads() to kill all other threads. zap_other_threads()
aborts the pending group stop by resetting signal->group_stop_count to 0
and clears the JOBCTL_PENDING_MASK for all other threads. However, it
fails to clear the job control flags for the calling thread.

When execve() completes, the calling thread returns to user mode and
checks for pending signals. Seeing the stale JOBCTL_STOP_PENDING flag,
it calls do_signal_stop(), which invokes task_participate_group_stop().
Since JOBCTL_STOP_CONSUME is still set, it attempts to decrement the
already-zero signal->group_stop_count, triggering a warning:

sig->group_stop_count == 0
WARNING: CPU: 1 PID: 6475 at kernel/signal.c:373
task_participate_group_stop+0x215/0x2d0
Call Trace:
 <TASK>
 do_signal_stop+0x3be/0x5c0 kernel/signal.c:2619
 get_signal+0xa8c/0x1330 kernel/signal.c:2884
 arch_do_signal_or_restart+0xbc/0x840 arch/x86/kernel/signal.c:337
 exit_to_user_mode_loop+0x8c/0x4d0 kernel/entry/common.c:98
 do_syscall_64+0x33e/0xf80 arch/x86/entry/syscall_64.c:100
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
 </TASK>

Fix this race condition by clearing the JOBCTL_PENDING_MASK for the
calling thread in zap_other_threads(), ensuring it does not retain any
stale job control state after the thread group is destroyed. This aligns
with other functions that tear down a thread group and abort group
stops, such as zap_process() and complete_signal(), which correctly
clear these flags for all threads including the current one.
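A simplified userspace model of the race may help. This is a sketch, not kernel code: the bit values, the `struct task`/`struct group` layouts and the helper names are illustrative assumptions, and only the sequence (group stop begins, execve() zaps the other threads, the caller returns to user mode) follows the description above. As in the kernel's task_clear_jobctl_pending(), clearing STOP_PENDING in this model also drops STOP_CONSUME.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; the kernel's JOBCTL_* constants differ. */
#define JOBCTL_STOP_PENDING  (1u << 0)
#define JOBCTL_STOP_CONSUME  (1u << 1)
#define JOBCTL_TRAP_STOP     (1u << 2)
#define JOBCTL_PENDING_MASK  (JOBCTL_STOP_PENDING | JOBCTL_TRAP_STOP)

struct task { unsigned int jobctl; };

struct group {
	struct task threads[4];
	int nr_threads;
	int group_stop_count;
	int warnings;	/* stands in for WARN_ON_ONCE() hits */
};

/* Clearing STOP_PENDING also drops STOP_CONSUME. */
static void clear_jobctl_pending(struct task *t, unsigned int mask)
{
	if (mask & JOBCTL_STOP_PENDING)
		mask |= JOBCTL_STOP_CONSUME;
	t->jobctl &= ~mask;
}

/* A group stop starts: every thread must participate once. */
static void start_group_stop(struct group *g)
{
	for (int i = 0; i < g->nr_threads; i++)
		g->threads[i].jobctl |= JOBCTL_STOP_PENDING | JOBCTL_STOP_CONSUME;
	g->group_stop_count = g->nr_threads;
}

/* execve() in thread 'self' zaps the others and aborts the group stop. */
static void zap_other_threads(struct group *g, int self, bool fixed)
{
	g->group_stop_count = 0;
	for (int i = 0; i < g->nr_threads; i++)
		if (i != self)
			clear_jobctl_pending(&g->threads[i], JOBCTL_PENDING_MASK);
	if (fixed)	/* the fix: the caller's stale state is cleared too */
		clear_jobctl_pending(&g->threads[self], JOBCTL_PENDING_MASK);
}

/* The caller returns to user mode and handles pending job control. */
static void return_to_user(struct group *g, int self)
{
	struct task *t = &g->threads[self];
	bool consume;

	if (!(t->jobctl & JOBCTL_STOP_PENDING))
		return;
	consume = t->jobctl & JOBCTL_STOP_CONSUME;
	clear_jobctl_pending(t, JOBCTL_STOP_PENDING);
	if (!consume)
		return;
	if (g->group_stop_count == 0)
		g->warnings++;		/* "sig->group_stop_count == 0" */
	else
		g->group_stop_count--;
}
```

With fixed == false, return_to_user() reaches the zero-count path once; with fixed == true the caller carries no stale flags and returns early.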

Fixes: 39efa3ef3a37 ("signal: Use GROUP_STOP_PENDING to stop once for a single group stop")
Assisted-by: Gemini:gemini-3.1-pro-preview Gemini:gemini-3-flash-preview syzbot
Reported-by: syzbot+b109633ea805cac54a61@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=b109633ea805cac54a61
Link: https://syzkaller.appspot.com/ai_job?id=d70208cc-862b-4fe3-bf02-3031e10cd0b3
Signed-off-by: Aleksandr Nogikh <nogikh@google.com>
Link: https://patch.msgid.link/20260521142240.2973022-1-nogikh@google.com
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks ago - fuse: reject fuse_notify() pagecache ops on directories
Jann Horn [Tue, 19 May 2026 14:29:38 +0000 (16:29 +0200)] 
fuse: reject fuse_notify() pagecache ops on directories

The operations FUSE_NOTIFY_STORE and FUSE_NOTIFY_RETRIEVE allow the
FUSE daemon to actively write/read pagecache contents.

For directories with FOPEN_CACHE_DIR, the pagecache is used as
kernel-internal cache storage, and userspace is not supposed to have
direct access to this cache - in particular, fuse_parse_cache() will hit
WARN_ON() if the cache contains bogus data.

Reject FUSE_NOTIFY_STORE and FUSE_NOTIFY_RETRIEVE on anything other than
regular files with -EINVAL.

Fixes: 5d7bc7e8680c ("fuse: allow using readdir cache")
Cc: stable@vger.kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
Link: https://patch.msgid.link/20260519-fuse-dir-pagecache-v2-1-5428fa48e175@google.com
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks ago - fuse: limit FUSE_NOTIFY_RETRIEVE to uptodate folios
Jann Horn [Tue, 19 May 2026 14:40:34 +0000 (16:40 +0200)] 
fuse: limit FUSE_NOTIFY_RETRIEVE to uptodate folios

FUSE_NOTIFY_RETRIEVE must be limited to uptodate folios; !uptodate folios
can contain uninitialized data.
Since FUSE_NOTIFY_RETRIEVE is intended to only return data that is already
in the page cache and not wait for data from the FUSE daemon, treat
!uptodate folios as if they weren't present.

This only has security impact on systems that don't enable automatic
zero-initialization of all page allocations via
CONFIG_INIT_ON_ALLOC_DEFAULT_ON or init_on_alloc=1.
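A toy model of the retrieve loop, assuming (as a simplification) that retrieval walks the cache from the requested offset and stops at the first folio it cannot use; the `struct slot` type and the helper name are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy page-cache slot: only the two properties that matter here. */
struct slot {
	bool present;
	bool uptodate;
};

/* Number of leading slots a retrieve may copy out. A present but
 * !uptodate slot is treated exactly like a missing one. */
static size_t retrievable_slots(const struct slot *cache, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!cache[i].present || !cache[i].uptodate)
			break;
	return i;
}
```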

Cc: stable@kernel.org
Fixes: 2d45ba381a74 ("fuse: add retrieve request")
Signed-off-by: Jann Horn <jannh@google.com>
Link: https://patch.msgid.link/20260519-fuse-retrieve-uptodate-v1-1-a7a1912a37f9@google.com
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks ago - Merge tag 'dma-mapping-7.1-2026-05-22' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 22 May 2026 13:16:00 +0000 (06:16 -0700)] 
Merge tag 'dma-mapping-7.1-2026-05-22' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux

Pull dma-mapping fixes from Marek Szyprowski:
 "Two minor updates for the DMA-mapping code, mainly fixing some rare
  corner cases (Petr Tesarik, Jianpeng Chang)"

* tag 'dma-mapping-7.1-2026-05-22' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux:
  dma-mapping: move dma_map_resource() sanity check into debug code
  dma-direct: fix use of max_pfn

3 weeks ago - Merge tag 'trace-v7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace...
Linus Torvalds [Fri, 22 May 2026 13:09:58 +0000 (06:09 -0700)] 
Merge tag 'trace-v7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Avoid NULL return from hist_field_name()

   The return value of hist_field_name() is passed directly to strcat(),
   which does not handle a NULL pointer. Return a zero-length string when
   the name's size exceeds the limit.

   This is used only to output already-created histograms, and no field
   currently exceeds the limit, but the function should still not return
   NULL.

 - Do not call map->ops->elt_free() on allocation failure

   When elt_alloc() fails, it should not call the map->ops->elt_free()
   function if it exists, as that function may not be able to handle the
   free on allocation failures. The ->elt_free() should only be called
   when elt_alloc() succeeds.
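A minimal sketch of the first fix's contract, assuming a hypothetical `MAX_FIELD_NAME` limit and hypothetical helper names: the name lookup never returns NULL, so a caller that appends the result with strcat() stays safe even for an over-long name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FIELD_NAME 32	/* hypothetical limit */

/* Never returns NULL: an over-long or missing name becomes "". */
static const char *field_name(const char *name)
{
	if (name == NULL || strlen(name) >= MAX_FIELD_NAME)
		return "";
	return name;
}

/* Caller in the style described above: the result goes straight to strcat(). */
static void append_field(char *buf, size_t bufsz, const char *name)
{
	const char *s = field_name(name);

	if (strlen(buf) + strlen(s) < bufsz)
		strcat(buf, s);
}
```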

* tag 'trace-v7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Do not call map->ops->elt_free() if elt_alloc() fails
  tracing: Avoid NULL return from hist_field_name() on truncation

3 weeks ago - platform/x86: bitland-mifs-wmi: add CONFIG_LEDS_CLASS dependency
Arnd Bergmann [Tue, 19 May 2026 20:28:01 +0000 (22:28 +0200)] 
platform/x86: bitland-mifs-wmi: add CONFIG_LEDS_CLASS dependency

The newly added driver requires the LED classdev support
and causes a link failure when that is disabled:

x86_64-linux-ld: vmlinux.o: in function `bitland_mifs_wmi_probe':
bitland-mifs-wmi.c:(.text+0xede02a): undefined reference to `devm_led_classdev_register_ext'

Fixes: dc1ec4fa86b2 ("platform/x86: bitland-mifs-wmi: Add new Bitland MIFS WMI driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20260519202804.1339581-1-arnd@kernel.org
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
3 weeks ago - ASoC: stm: Use guard() for mutex & spin locks
Mark Brown [Fri, 22 May 2026 12:36:26 +0000 (13:36 +0100)] 
ASoC: stm: Use guard() for mutex & spin locks

phucduc.bui@gmail.com <phucduc.bui@gmail.com> says:

This series converts mutex and spinlock handling in the STM drivers
to use guard() helpers.
The changes are code cleanup only and should have no functional impact.
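For readers unfamiliar with guard(): the kernel helpers in include/linux/cleanup.h build on the compiler's cleanup attribute, so a lock is released automatically when the guard variable goes out of scope. A much-reduced userspace imitation using a pthread mutex (GUARD_MUTEX and the demo names are hypothetical, and the sketch needs GCC or Clang):

```c
#include <assert.h>
#include <pthread.h>

/* Called automatically when the guard variable leaves scope. */
static void mutex_unlocker(pthread_mutex_t **m)
{
	pthread_mutex_unlock(*m);
}

/* Lock now, unlock at end of scope. Only one per scope in this sketch,
 * because the variable name is fixed. */
#define GUARD_MUTEX(m) \
	pthread_mutex_t *guard_ __attribute__((cleanup(mutex_unlocker))) = \
		(pthread_mutex_lock(m), (m))

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static int demo_counter;

/* Both return paths release demo_lock without an explicit unlock. */
static int bump_if_below(int limit)
{
	GUARD_MUTEX(&demo_lock);

	if (demo_counter >= limit)
		return -1;
	demo_counter++;
	return demo_counter;
}
```

If either return path leaked the lock, the next call would block forever on the default (non-recursive) mutex.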

Link: https://patch.msgid.link/20260515112458.34378-1-phucduc.bui@gmail.com
3 weeks ago - ASoC: stm: stm32_spdifrx: Use guard() for spin locks
bui duc phuc [Fri, 15 May 2026 11:24:58 +0000 (18:24 +0700)] 
ASoC: stm: stm32_spdifrx: Use guard() for spin locks

Clean up the code by using guard() for spin locks.
This is pure refactoring with no behavior change.

Signed-off-by: bui duc phuc <phucduc.bui@gmail.com>
Link: https://patch.msgid.link/20260515112458.34378-5-phucduc.bui@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
3 weeks ago - ASoC: stm: stm32_sai_sub: Use guard() for mutex & spin locks
bui duc phuc [Fri, 15 May 2026 11:24:57 +0000 (18:24 +0700)] 
ASoC: stm: stm32_sai_sub: Use guard() for mutex & spin locks

Clean up the code by using guard() for mutex & spin locks.
This is pure refactoring with no behavior change.

Signed-off-by: bui duc phuc <phucduc.bui@gmail.com>
Link: https://patch.msgid.link/20260515112458.34378-4-phucduc.bui@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
3 weeks ago - ASoC: stm: stm32_i2s: Use guard() for spin locks
bui duc phuc [Fri, 15 May 2026 11:24:56 +0000 (18:24 +0700)] 
ASoC: stm: stm32_i2s: Use guard() for spin locks

Clean up the code by using guard() for spin locks.
This is pure refactoring with no behavior change.

Signed-off-by: bui duc phuc <phucduc.bui@gmail.com>
Link: https://patch.msgid.link/20260515112458.34378-3-phucduc.bui@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
3 weeks ago - ASoC: stm: stm32_adfsdm: Use guard() for mutex locks
bui duc phuc [Fri, 15 May 2026 11:24:55 +0000 (18:24 +0700)] 
ASoC: stm: stm32_adfsdm: Use guard() for mutex locks

Clean up the code by using guard() for mutex locks.
This is pure refactoring with no behavior change.

Signed-off-by: bui duc phuc <phucduc.bui@gmail.com>
Link: https://patch.msgid.link/20260515112458.34378-2-phucduc.bui@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
3 weeks ago - eventpoll: add missing kernel-doc for @ctx function parameters
Randy Dunlap [Tue, 19 May 2026 04:23:14 +0000 (21:23 -0700)] 
eventpoll: add missing kernel-doc for @ctx function parameters

Add the missing kernel-doc comments to prevent kernel-doc build
warnings while building the documentation.

WARNING: fs/eventpoll.c:1684 function parameter 'ctx' not described in 'reverse_path_check'
WARNING: fs/eventpoll.c:2349 function parameter 'ctx' not described in 'ep_loop_check_proc'

Fixes: e09c77d94003 ("eventpoll: hoist CTL_ADD scratch state into struct ep_ctl_ctx")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://patch.msgid.link/20260519042314.124041-1-rdunlap@infradead.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks ago - ASoC: mediatek: mt8189: Fix probe resource cleanup
Cássio Gabriel [Thu, 14 May 2026 13:52:35 +0000 (10:52 -0300)] 
ASoC: mediatek: mt8189: Fix probe resource cleanup

The MT8189 AFE probe assigns reserved memory with
of_reserved_mem_device_init(), but only releases that assignment from
.remove().  If probe fails after the reserved memory has been assigned,
the assignment record is left behind.

The probe path also uses pm_runtime_get_sync() without checking its
return value.  If runtime resume fails, pm_runtime_get_sync() leaves the
usage count incremented and the driver continues initialization without
the device being resumed.  Use pm_runtime_resume_and_get() so resume
errors abort probe without leaking a PM usage count.
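The difference between the two calls can be modelled with a toy usage counter. The `toy_*` names are hypothetical; they only mirror the behaviour described above, where pm_runtime_get_sync() keeps the usage count raised on failure while pm_runtime_resume_and_get() drops it again:

```c
#include <assert.h>

/* Toy device: a usage count, an injected resume error, an active flag. */
struct toy_dev {
	int usage_count;
	int resume_error;	/* 0 = resume succeeds, <0 = errno-style failure */
	int active;
};

/* Like pm_runtime_get_sync(): the count stays raised even on failure. */
static int toy_get_sync(struct toy_dev *d)
{
	d->usage_count++;
	if (d->resume_error)
		return d->resume_error;
	d->active = 1;
	return 0;
}

/* Like pm_runtime_resume_and_get(): drop the count again on failure. */
static int toy_resume_and_get(struct toy_dev *d)
{
	int ret = toy_get_sync(d);

	if (ret < 0) {
		d->usage_count--;	/* the pm_runtime_put_noidle() step */
		return ret;
	}
	return 0;
}
```

A probe path that calls toy_get_sync() and ignores the result continues with usage_count == 1 and active == 0, which is the leaked reference and un-resumed device the commit describes.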

Finally, component registration failure currently jumps to a label that
drops a runtime PM reference even though the temporary probe reference
was already released.  Return the component registration error directly,
and do not drop an unmatched PM reference from .remove().

Fixes: 7eb153585598 ("ASoC: mediatek: mt8189: add platform driver")
Signed-off-by: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Link: https://patch.msgid.link/20260514-asoc-mt8189-probe-cleanup-v1-1-ded733363281@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
3 weeks ago - crypto: atmel-sha204a - fail on hwrng registration error in probe path
Thorsten Blum [Sun, 17 May 2026 16:27:40 +0000 (18:27 +0200)] 
crypto: atmel-sha204a - fail on hwrng registration error in probe path

Commit 13909a0c8897 ("crypto: atmel-sha204a - provide the otp content")
overwrote the hwrng registration return value when creating the sysfs
group, which allowed atmel_sha204a_probe() to succeed even if
devm_hwrng_register() failed.

Return immediately when devm_hwrng_register() fails, and report both
hwrng and sysfs registration errors with dev_err(). Adjust the sysfs
error log message for consistency.
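The bug pattern and the fix can be sketched in plain C, with hypothetical stand-ins for the two registration results. This shows the shape of the problem, not the driver code:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical results of devm_hwrng_register() and the sysfs call. */
struct probe_env {
	int hwrng_ret;
	int sysfs_ret;
};

/* Buggy shape: the second assignment hides the first failure. */
static int probe_buggy(const struct probe_env *env)
{
	int ret;

	ret = env->hwrng_ret;	/* hwrng registration */
	ret = env->sysfs_ret;	/* sysfs group creation overwrites it */
	return ret;
}

/* Fixed shape: bail out as soon as hwrng registration fails. */
static int probe_fixed(const struct probe_env *env)
{
	int ret;

	ret = env->hwrng_ret;
	if (ret) {
		fprintf(stderr, "failed to register hwrng (%d)\n", ret);
		return ret;
	}
	ret = env->sysfs_ret;
	if (ret) {
		fprintf(stderr, "failed to create sysfs group (%d)\n", ret);
		return ret;
	}
	return 0;
}
```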

Fixes: 13909a0c8897 ("crypto: atmel-sha204a - provide the otp content")
Cc: stable@vger.kernel.org
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 weeks ago - crypto: atmel-sha204a - remove sysfs group before hwrng
Thorsten Blum [Sun, 17 May 2026 12:37:07 +0000 (14:37 +0200)] 
crypto: atmel-sha204a - remove sysfs group before hwrng

atmel_sha204a_probe() registers the hwrng before creating the sysfs
group. Mirror this order in atmel_sha204a_remove() by removing the sysfs
group before unregistering the hwrng.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 weeks ago - crypto: omap-des - drop of_match_ptr from OF match table
Thorsten Blum [Sun, 17 May 2026 10:36:52 +0000 (12:36 +0200)] 
crypto: omap-des - drop of_match_ptr from OF match table

Drop of_match_ptr() because OF matching is stubbed out when CONFIG_OF=n.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 weeks ago - crypto: omap-des - add COMPILE_TEST and fix CONFIG_OF=n build
Thorsten Blum [Sun, 17 May 2026 10:34:14 +0000 (12:34 +0200)] 
crypto: omap-des - add COMPILE_TEST and fix CONFIG_OF=n build

CRYPTO_DEV_OMAP_DES only depends on ARCH_OMAP2PLUS, which is ARM-only
and selects OF via ARM's USE_OF, making any non-OF code unreachable.

Add COMPILE_TEST so the driver can be built with CONFIG_OF=n, making the
non-OF code reachable.

Fix the resulting non-OF build failures:

- omap_des_irq() was defined inside a CONFIG_OF block, but is referenced
  unconditionally from omap_des_probe(). Move the CONFIG_OF guard so it
  only covers omap_des_get_of().

- The non-OF omap_des_get_of() stub took a struct device *, while
  omap_des_probe() passes a struct platform_device *. Make the stub
  prototype match the OF implementation and the caller.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 weeks ago - MIPS: Remove unused arch/mips/crypto directory
Ethan Nelson-Moore [Sun, 17 May 2026 03:20:56 +0000 (20:20 -0700)] 
MIPS: Remove unused arch/mips/crypto directory

The last MIPS crypto code was moved to lib/crypto/mips in
commit c9e5ac0ab9d1 ("lib/crypto: mips/md5: Migrate optimized code into
library"). However, arch/mips/crypto still contains stub Kconfig,
Makefile, and .gitignore files. Remove these unnecessary files.

Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 weeks ago - LoongArch: Remove unused arch/loongarch/crypto directory
Ethan Nelson-Moore [Sun, 17 May 2026 03:14:26 +0000 (20:14 -0700)] 
LoongArch: Remove unused arch/loongarch/crypto directory

All LoongArch crypto code was moved to arch/loongarch/lib in
commit 72f51a4f4b07 ("loongarch/crc32: expose CRC32 functions through
lib"). However, arch/loongarch/crypto still contains stub Kconfig and
Makefile files. Remove these unnecessary files.

Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 weeks ago - crypto: atmel-sha - use memcpy_and_pad to simplify hmac_setup
Thorsten Blum [Sat, 16 May 2026 23:42:12 +0000 (01:42 +0200)] 
crypto: atmel-sha - use memcpy_and_pad to simplify hmac_setup

Use memcpy_and_pad() instead of memcpy() followed by memset() to
simplify atmel_sha_hmac_setup().
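A userspace sketch of what memcpy_and_pad() does, re-implemented here for illustration: it copies min(count, dest_len) bytes and fills any remaining space with the pad byte. The HMAC key-block helper and its 64-byte block size are assumptions, and the helper assumes the key already fits in one block:

```c
#include <assert.h>
#include <string.h>

/* Userspace re-implementation of the kernel helper's semantics. */
static void memcpy_and_pad(void *dest, size_t dest_len, const void *src,
			   size_t count, int pad)
{
	if (dest_len > count) {
		memcpy(dest, src, count);
		memset((char *)dest + count, pad, dest_len - count);
	} else {
		memcpy(dest, src, dest_len);
	}
}

#define BLOCK_SIZE 64	/* assumed hash block size */

/* Key bytes followed by zero padding, replacing memcpy() + memset(). */
static void hmac_key_block(unsigned char block[BLOCK_SIZE],
			   const unsigned char *key, size_t keylen)
{
	memcpy_and_pad(block, BLOCK_SIZE, key, keylen, 0);
}
```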

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 weeks ago - crypto: drivers - remove of_match_ptr from OF match tables
Thorsten Blum [Sat, 16 May 2026 18:23:36 +0000 (20:23 +0200)] 
crypto: drivers - remove of_match_ptr from OF match tables

Drop of_match_ptr() because OF matching is stubbed out when CONFIG_OF=n.

Indent bcm_spu_pdriver.driver and its members while at it.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 weeks ago - crypto: eip93 - fix reset ring register definition
Aleksander Jan Bajkowski [Sat, 16 May 2026 12:26:51 +0000 (14:26 +0200)] 
crypto: eip93 - fix reset ring register definition

Fix the register definition used to reset the descriptor ring. The
incorrect definition causes a hang in the driver's unload/load sequence.

Fixes: 9739f5f93b78 ("crypto: eip93 - Add Inside Secure SafeXcel EIP-93 crypto engine support")
Suggested-by: Benjamin Larsson <benjamin.larsson@genexis.eu>
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 weeks ago - crypto: atmel-i2c - drop redundant void * callback cast in enqueue
Thorsten Blum [Fri, 15 May 2026 20:29:48 +0000 (22:29 +0200)] 
crypto: atmel-i2c - drop redundant void * callback cast in enqueue

The callback already has the correct type - remove the redundant cast.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 weeks ago - include: Remove unused crypto-ux500.h
Costa Shulyupin [Fri, 15 May 2026 19:02:14 +0000 (22:02 +0300)] 
include: Remove unused crypto-ux500.h

The UX500 crypto drivers were removed in commit 453de3eb08c4
("crypto: ux500/cryp - delete driver") and commit dd7b7972cb89
("crypto: ux500/hash - delete driver"). No file includes
this header.

Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Reviewed-by: Linus Walleij <linusw@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 weeks ago - crypto: tegra - Don't touch bo refcount in host1x bo pin/unpin
Mikko Perttunen [Fri, 15 May 2026 02:34:52 +0000 (11:34 +0900)] 
crypto: tegra - Don't touch bo refcount in host1x bo pin/unpin

Since commit "gpu: host1x: Allow entries in BO caches to be freed",
host1x_bo_pin() and host1x_bo_unpin() handle the bo's refcount
themselves. .pin/.unpin callbacks should not adjust it.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 weeks ago - crypto: riscv/aes - replace min_t with min in riscv64_aes_ctr_crypt
Thorsten Blum [Thu, 14 May 2026 16:55:10 +0000 (18:55 +0200)] 
crypto: riscv/aes - replace min_t with min in riscv64_aes_ctr_crypt

Use the simpler min() macro since the values are unsigned and
compatible.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 weeks ago - X.509: Fix validation of ASN.1 certificate header
Lukas Wunner [Thu, 14 May 2026 06:55:58 +0000 (08:55 +0200)] 
X.509: Fix validation of ASN.1 certificate header

x509_load_certificate_list() seeks to enforce that a certificate starts
with 0x30 0x82 (an ASN.1 SEQUENCE tag followed by a long-form length
encoded in two bytes, which under DER means 256 to 65535 bytes).

But it only enforces that *either* of those two byte values is present,
instead of checking for the *conjunction* of the two values.  Fix it.
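The difference between the two conditions is easiest to see side by side. A minimal sketch; the function names are hypothetical, and cert_size() simply applies the DER layout (four header bytes plus a big-endian 16-bit length) rather than quoting the kernel code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Buggy shape: rejects only when *both* bytes are wrong. */
static bool header_ok_buggy(const unsigned char *p, size_t len)
{
	if (len < 4)
		return false;
	if (p[0] != 0x30 && p[1] != 0x82)
		return false;
	return true;
}

/* Fixed shape: SEQUENCE tag (0x30) AND two-byte long-form length (0x82). */
static bool header_ok(const unsigned char *p, size_t len)
{
	if (len < 4)
		return false;
	if (p[0] != 0x30 || p[1] != 0x82)
		return false;
	return true;
}

/* Total size: 4 header bytes plus the big-endian length in p[2..3]. */
static size_t cert_size(const unsigned char *p)
{
	return 4 + (((size_t)p[2] << 8) | p[3]);
}
```

A header of 0x30 0x81 passes the buggy check because the first byte alone is correct, while the fixed check rejects it.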

Fixes: 631cc66eb9ea ("MODSIGN: Provide module signing public keys to the kernel")
Reported-by: Sashiko <sashiko-bot@kernel.org>
Closes: https://lore.kernel.org/r/20260508033917.B5873C2BCB0@smtp.kernel.org/
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org # v3.7+
Reviewed-by: Ignat Korchagin <ignat@linux.win>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 weeks ago - Documentation: qat_rl: make rate limiting wording clearer
Fiona Trahe [Wed, 13 May 2026 15:33:08 +0000 (16:33 +0100)] 
Documentation: qat_rl: make rate limiting wording clearer

The term "capability" typically refers to an ability to perform an
action, whereas "capacity" denotes a measurable amount of resources.

Since the sysfs-driver-qat_rl document describes remaining resources
available to perform work, "capacity" is the more accurate term.

Replace "capability" with "capacity" in the document.

Signed-off-by: Fiona Trahe <fiona.trahe@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 weeks ago - crypto: qat - handle sysfs-triggered reset callbacks
Ahsan Atta [Wed, 13 May 2026 15:16:59 +0000 (17:16 +0200)] 
crypto: qat - handle sysfs-triggered reset callbacks

A reset requested through /sys/bus/pci/devices/.../reset invokes the
driver reset_prepare() and reset_done() callbacks. The QAT driver does
not implement those callbacks today, so the reset proceeds without
quiescing the device or bringing it back up afterward, which leaves
the device unusable.

Hook reset_prepare() and reset_done() into adf_err_handler so the
common shutdown and recovery flow also runs for reset. Skip device
quiesce if the device is already in a down state.

Cc: stable@vger.kernel.org
Signed-off-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Damian Muszynski <damian.muszynski@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 weeks ago - crypto: qat - factor out AER reset helpers
Ahsan Atta [Wed, 13 May 2026 15:16:58 +0000 (17:16 +0200)] 
crypto: qat - factor out AER reset helpers

Move the shutdown and recovery sequences out of adf_error_detected()
and adf_slot_reset() into reset_prepare() and reset_done() helpers.

This makes the AER recovery path easier to follow and prepares the
common reset flow for reuse by additional PCI reset callbacks without
duplicating the logic.

No functional change intended.

Cc: stable@vger.kernel.org
Signed-off-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Damian Muszynski <damian.muszynski@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 weeks ago - crypto: qat - skip restart for down devices
Ahsan Atta [Wed, 13 May 2026 15:16:57 +0000 (17:16 +0200)] 
crypto: qat - skip restart for down devices

Skip the shutdown and restart flow when adf_slot_reset() is entered
for a device that is already down. In that case, leave
ADF_STATUS_RESTARTING clear and let adf_slot_reset() restore PCI
function state without calling adf_dev_up(), re-enabling SR-IOV, or
sending restarted notifications.

This is in preparation for adding reset_prepare() and reset_done()
callbacks in adf_aer.c.

Cc: stable@vger.kernel.org
Signed-off-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 weeks ago - crypto: qat - centralize bus master enable
Ahsan Atta [Wed, 13 May 2026 15:16:56 +0000 (17:16 +0200)] 
crypto: qat - centralize bus master enable

The QAT driver currently toggles PCI bus mastering in multiple places
(probe paths and reset callbacks). This makes the bus master enable
(BME) state depend on call ordering and on which PCI command bits were
captured in the saved PCI config state.

Make BME control explicit and deterministic:
- remove pci_set_master() from device-specific probe paths
- add adf_set_bme() and call it from adf_dev_init() so BME is enabled
  at one point before device bring-up
- drop redundant pci_set_master() and pci_clear_master() calls from
  adf_aer.c and rely on the unified init path for BME enablement

This is in preparation for adding reset_prepare() and reset_done()
hooks. In the PCI reset callback flow, the PCI core saves and
restores device configuration state around reset_prepare() and
reset_done(). This change ensures that the device can be properly shut
down or reinitialized after sysfs-triggered resets.

Cc: stable@vger.kernel.org
Signed-off-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Suggested-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 weeks ago - crypto: qat - notify fatal error before AER reset preparation
Ahsan Atta [Wed, 13 May 2026 15:16:55 +0000 (17:16 +0200)] 
crypto: qat - notify fatal error before AER reset preparation

Send fatal error notifications to subsystems and VFs as soon as
AER error detection starts, before entering the reset preparation
shutdown sequence.

This reduces notification latency and ensures peers are informed
immediately on fatal detection, rather than after restart-state setup
and arbitration teardown.

Cc: stable@vger.kernel.org
Signed-off-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Damian Muszynski <damian.muszynski@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 weeks ago - crypto: qat - keep VFs enabled during reset
Ahsan Atta [Wed, 13 May 2026 15:16:54 +0000 (17:16 +0200)] 
crypto: qat - keep VFs enabled during reset

When a reset is triggered via sysfs, the PCI core invokes the
reset_prepare() callback while holding pci_dev_lock(), which includes
the PCI configuration space access semaphore. If reset_prepare() calls
adf_dev_down(), the call chain adf_dev_stop() -> adf_disable_sriov()
-> pci_disable_sriov() attempts to acquire the same semaphore,
resulting in a deadlock.

Avoid this by skipping pci_disable_sriov() when ADF_STATUS_RESTARTING
is set. During reset the PCI topology is preserved, so VF devices
remain valid and enumerated across the reset. VF notification and the
quiesce handshake via adf_pf2vf_notify_restarting() are still
performed unconditionally so that VFs stop submitting work before the
PF shuts down.

Correspondingly, skip pci_enable_sriov() in adf_enable_sriov() when
VFs are already present, since their PCI devices were preserved from
before the restart.

This is in preparation for adding reset_prepare() and reset_done()
callbacks in adf_aer.c.

Cc: stable@vger.kernel.org
Signed-off-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Damian Muszynski <damian.muszynski@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 weeks ago - crypto: qat - fix VF2PF work teardown race in adf_disable_sriov()
Giovanni Cabiddu [Wed, 13 May 2026 14:47:32 +0000 (15:47 +0100)] 
crypto: qat - fix VF2PF work teardown race in adf_disable_sriov()

The VF2PF interrupt handler queues PF-side response work that stores a
raw pointer to per-VF state (struct adf_accel_vf_info). Currently,
adf_disable_sriov() destroys per-VF mutexes and frees vf_info without
stopping new VF2PF work or waiting for in-flight workers to complete. A
concurrently scheduled or already queued worker can then dereference
freed memory.

This manifests as a use-after-free when KASAN is enabled:

  BUG: KASAN: null-ptr-deref in mutex_lock+0x76/0xe0
  Write of size 8 at addr 0000000000000260 by task kworker/24:2/...
  Workqueue: qat_pf2vf_resp_wq adf_iov_send_resp [intel_qat]
  Call Trace:
    kasan_report+0x119/0x140
    mutex_lock+0x76/0xe0
    adf_gen4_pfvf_send+0xd4/0x1f0 [intel_qat]
    adf_recv_and_handle_vf2pf_msg+0x290/0x360 [intel_qat]
    adf_iov_send_resp+0x8c/0xe0 [intel_qat]
    process_one_work+0x6ac/0xfd0
    worker_thread+0x4dd/0xd30
    kthread+0x326/0x410
    ret_from_fork+0x33b/0x670

Add a PF-local flag, vf2pf_disabled, that gates work queueing, worker
processing, and interrupt re-enabling during teardown. Set this flag
atomically with the hardware interrupt mask inside
adf_disable_all_vf2pf_interrupts(). After masking, synchronize the AE
cluster MSI-X interrupt and flush the PF response workqueue before
tearing down per-VF locks and state so all in-flight work completes
before vf_info is destroyed.

Introduce adf_enable_all_vf2pf_interrupts() to clear the flag and
unmask all VF2PF interrupts under the same lock when SR-IOV is
re-enabled. This ensures the software flag and hardware state transition
atomically on both the enable and disable paths.

Cc: stable@vger.kernel.org
Fixes: ed8ccaef52fa ("crypto: qat - Add support for SRIOV")
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Anastasia Tishchenko [Wed, 13 May 2026 10:57:40 +0000 (13:57 +0300)] 
crypto: ecc - Fix carry overflow in vli multiplication

The carry flag calculation fails when r01.m_high is saturated
(0xFFFFFFFFFFFFFFFF) and addition of lower bits overflows.

The condition (r01.m_high < product.m_high) doesn't handle the case
where r01.m_high == product.m_high and an additional carry exists
from lower-bit overflow.

When commit 3c4b23901a0c ("crypto: ecdh - Add ECDH software support")
introduced crypto/ecc.c, it split the muladd() function in the
micro-ecc library into separate mul_64_64() and add_128_128() helpers.
It seems the check got lost in translation.

Add proper handling for this boundary by accounting for the carry
from the lower addition.
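Below is a minimal user-space sketch of that boundary case. It is not the crypto/ecc.c code: the u128 type and both helpers are hypothetical, and only the m_low/m_high field names come from the text. The carry out of a 128-bit accumulate has to include both the overflow of the high-word add and the carry propagated from the low word. A bare `acc.m_high < product.m_high` comparison misses the second source when the high word wraps to exactly product.m_high.

```c
#include <stdint.h>

/* Hypothetical 128-bit type; field names mirror the commit text. */
typedef struct {
	uint64_t m_low;
	uint64_t m_high;
} u128;

/* Add b into *acc and return the carry out of bit 127 (0 or 1). */
static unsigned int add_128_carry(u128 *acc, u128 b)
{
	uint64_t low = acc->m_low + b.m_low;
	unsigned int c_low = low < b.m_low;      /* carry out of the low word */
	uint64_t high = acc->m_high + b.m_high;
	unsigned int c_high = high < b.m_high;   /* overflow of the high-word add */
	uint64_t high2 = high + c_low;

	/* Propagating c_low can wrap as well (e.g. high == UINT64_MAX). */
	c_high |= high2 < high;
	acc->m_low = low;
	acc->m_high = high2;
	return c_high;
}

/* Comparison-only check: misses a wrap to exactly product.m_high. */
static unsigned int naive_carry(u128 acc_after, u128 product)
{
	return acc_after.m_high < product.m_high;
}
```

Take acc = {1, UINT64_MAX} and p = {UINT64_MAX, 5}. The low word wraps to 0, and its carry pushes the high word around to exactly 5. The 128-bit sum overflowed, but the comparison-only check reports no carry.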

Fixes: 3c4b23901a0c ("crypto: ecdh - Add ECDH software support")
Signed-off-by: Anastasia Tishchenko <sv3iry@gmail.com>
Cc: stable@vger.kernel.org # v4.8+
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Giovanni Cabiddu [Wed, 13 May 2026 08:57:45 +0000 (09:57 +0100)] 
crypto: qat - remove MODULE_VERSION

In-tree drivers do not need MODULE_VERSION as the kernel release
identifies the version of their code. The static version "0.6.0", which
the QAT drivers currently report, can be misleading as it might suggest
the drivers are outdated.

Remove MODULE_VERSION() from all QAT driver modules and the related
ADF_DRV_VERSION, ADF_MAJOR_VERSION, ADF_MINOR_VERSION and
ADF_BUILD_VERSION macros from adf_common_drv.h.

Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Thorsten Blum [Tue, 12 May 2026 14:51:24 +0000 (16:51 +0200)] 
crypto: atmel - use min3 to simplify atmel_sha_append_sg

Replace two consecutive min() calls with min3() to simplify the code.

Since count is unsigned and cannot be less than zero, also adjust the
if check and update the comment accordingly.
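Below is a user-space sketch of the simplified count computation. MIN2()/MIN3() are local stand-ins for the kernel's min()/min3() macros, and append_count() is a hypothetical helper, not the driver function.

```c
#include <stddef.h>

/* Local stand-ins for the kernel's min()/min3(); arguments are evaluated
 * more than once, so pass only side-effect-free expressions. */
#define MIN2(a, b) ((a) < (b) ? (a) : (b))
#define MIN3(a, b, c) MIN2(MIN2((a), (b)), (c))

/*
 * Hypothetical helper: bytes that can be appended from the current
 * scatterlist entry, limited by the bytes left in the entry, the total
 * bytes left, and the free space in the buffer.
 */
static size_t append_count(size_t sg_left, size_t total, size_t buf_free)
{
	/* One min3-style call replaces two consecutive min() calls. */
	return MIN3(sg_left, total, buf_free);
}
```

The result is unsigned, so it can never be negative. Testing it for zero is therefore the only meaningful check.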

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Thorsten Blum [Tue, 12 May 2026 13:34:15 +0000 (15:34 +0200)] 
crypto: cesa - use max to simplify mv_cesa_probe

Use max() to simplify mv_cesa_probe() and improve its readability.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Giovanni Cabiddu [Mon, 11 May 2026 10:04:09 +0000 (11:04 +0100)] 
crypto: qat - rename adf_ctl_drv.c to adf_module.c

Now that the character device and IOCTL interface have been removed,
adf_ctl_drv.c only contains module_init/module_exit hooks. Rename it
to adf_module.c to better reflect its purpose and rename the init/exit
functions accordingly.

Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Giovanni Cabiddu [Mon, 11 May 2026 10:04:08 +0000 (11:04 +0100)] 
crypto: qat - remove unused character device and IOCTLs

The QAT driver exposes a character device (qat_adf_ctl) with IOCTLs
for device configuration, start, stop, status query and enumeration.
These IOCTLs are not part of any public uAPI header and have no known
in-tree or out-of-tree users. Device lifecycle is already managed via
sysfs.

The ioctl interface also increases the attack surface and is the
subject of a number of bug reports.

Remove the character device, the IOCTL definitions, and the related
data structures (adf_dev_status_info, adf_user_cfg_key_val,
adf_user_cfg_section, adf_user_cfg_ctl_data). Drop the now-unused
adf_cfg_user.h header and strip adf_ctl_drv.c down to the minimal
module_init/module_exit hooks for workqueue, AER, and crypto/compression
algorithm registration.

Clean up leftover dead code that was only reachable from the removed
IOCTL paths: adf_cfg_del_all(), adf_devmgr_verify_id(),
adf_devmgr_get_num_dev(), adf_devmgr_get_dev_by_id(),
adf_get_vf_real_id() and the unused ADF_CFG macros.

Additionally, drop the entry for the QAT IOCTLs from
ioctl-number.rst.

Cc: stable@vger.kernel.org
Fixes: d8cba25d2c68 ("crypto: qat - Intel(R) QAT driver framework")
Reported-by: Zhi Wang <wangzhi@stu.xidian.edu.cn>
Reported-by: Bin Yu <byu@xidian.edu.cn>
Reported-by: MingYu Wang <w15303746062@163.com>
Closes: https://lore.kernel.org/all/61d6d499.ab89.19b9b7f3186.Coremail.wangzhi_xd@stu.xidian.edu.cn/
Link: https://lore.kernel.org/all/20260508034841.256794-1-w15303746062@163.com/
Link: https://lore.kernel.org/all/20260508023542.256299-1-w15303746062@163.com/
Link: https://lore.kernel.org/all/20260504025120.98242-1-w15303746062@163.com/
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
lizhi [Mon, 11 May 2026 00:49:27 +0000 (08:49 +0800)] 
crypto: hisilicon/sec2 - lower priority for hisilicon crypto implementations

Lower the priority of HiSilicon's crypto implementations to allow more
suitable alternatives to be selected. For example, certain kernel
use-cases do not benefit from HiSilicon's symmetric crypto algorithms.
This change ensures that more appropriate options are chosen first while
retaining HiSilicon's implementations as alternatives.
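The crypto API resolves an algorithm name to the matching registered implementation with the highest priority (cra_priority). Lowering a driver's value therefore lets other implementations win while keeping the driver available. The toy model below shows that selection; the struct, the driver names and the priority values are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical registry entry: algorithm name, driver name, priority. */
struct impl {
	const char *alg_name;
	const char *driver;
	int priority;
};

/* Return the highest-priority implementation of alg_name, or NULL. */
static const struct impl *pick_impl(const struct impl *v, size_t n,
				    const char *alg_name)
{
	const struct impl *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (strcmp(v[i].alg_name, alg_name) != 0)
			continue;
		if (!best || v[i].priority > best->priority)
			best = &v[i];
	}
	return best;
}
```

With made-up values of 400 for a hardware driver and 300 for a CPU implementation, the hardware driver is chosen. Dropping it to 100 makes the CPU implementation the default, but the hardware driver stays registered.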

Signed-off-by: lizhi <lizhi206@huawei.com>
Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com>
Reviewed-by: Longfang Liu <liulongfang@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Thomas Weißschuh [Tue, 12 May 2026 16:39:15 +0000 (18:39 +0200)] 
driver core: Constify core device attributes

To make sure these attributes are not modified by accident or by an
attacker, move them to read-only memory.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://patch.msgid.link/20260512-sysfs-const-attr-device_attr-prep-v3-5-cb7c17b34d52@weissschuh.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thomas Weißschuh [Tue, 12 May 2026 16:39:14 +0000 (18:39 +0200)] 
driver core: Allow the constification of device attributes

Allow device attributes to reside in read-only memory.
Both const and non-const attributes are handled by the utility macros
and attributes can be migrated one-by-one.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://patch.msgid.link/20260512-sysfs-const-attr-device_attr-prep-v3-4-cb7c17b34d52@weissschuh.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thomas Weißschuh [Tue, 12 May 2026 16:39:13 +0000 (18:39 +0200)] 
driver core: Stop using generic sysfs macros for device attributes

The constification of device attributes will require a transition phase,
where 'struct device_attribute' contains a classic non-const and a new
const variant of the 'show' and 'store' callbacks.

As __ATTR() and friends cannot handle this duplication, stop using them.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://patch.msgid.link/20260512-sysfs-const-attr-device_attr-prep-v3-3-cb7c17b34d52@weissschuh.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thomas Weißschuh [Tue, 12 May 2026 16:39:12 +0000 (18:39 +0200)] 
driver core: Add low-level macros for device attributes

For the upcoming constification of device attributes the generic
__ATTR() macros are insufficient.

Prepare for a split by introducing new low-level macros specific to
device attributes.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://patch.msgid.link/20260512-sysfs-const-attr-device_attr-prep-v3-2-cb7c17b34d52@weissschuh.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thomas Weißschuh [Tue, 12 May 2026 16:39:11 +0000 (18:39 +0200)] 
driver core: Delete DEVICE_ATTR_PREALLOC()

This macro is unused and would create extra work during the upcoming
constification of device attributes. Remove it.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://patch.msgid.link/20260512-sysfs-const-attr-device_attr-prep-v3-1-cb7c17b34d52@weissschuh.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Herve Codina [Mon, 11 May 2026 15:57:50 +0000 (17:57 +0200)] 
driver core: Avoid warning when removing a device while its supplier is unbinding

During driver removal, the following warning can appear:
   WARNING: CPU: 1 PID: 139 at drivers/base/core.c:1497 __device_links_no_driver+0xcc/0xfc
   ...
   Call trace:
     __device_links_no_driver+0xcc/0xfc (P)
     device_links_driver_cleanup+0xa8/0xf0
     device_release_driver_internal+0x208/0x23c
     device_links_unbind_consumers+0xe0/0x108
     device_release_driver_internal+0xec/0x23c
     device_links_unbind_consumers+0xe0/0x108
     device_release_driver_internal+0xec/0x23c
     device_links_unbind_consumers+0xe0/0x108
     device_release_driver_internal+0xec/0x23c
     driver_detach+0xa0/0x12c
     bus_remove_driver+0x6c/0xbc
     driver_unregister+0x30/0x60
     pci_unregister_driver+0x20/0x9c
     lan966x_pci_driver_exit+0x18/0xa90 [lan966x_pci]

This warning is triggered when a consumer is removed while the links
status of its supplier is not DL_DEV_DRIVER_BOUND and the link flag
DL_FLAG_SYNC_STATE_ONLY is not set.

The consumer/supplier topology involved was the following
(consumer ---> supplier):

      i2c -----------> OIC ----> PCI device
       |                ^
       |                |
       +---> pinctrl ---+

When the PCI device is removed, the OIC (interrupt controller) has to be
removed. To remove the OIC, pinctrl and i2c need to be removed first,
and to remove pinctrl, i2c needs to be removed first. The removal order is:
  1) i2c
  2) pinctrl
  3) OIC
  4) PCI device

In detail, the removal sequence is as follows (with 0000:01:00.0 being
the PCI device):
  driver_detach: call device_release_driver_internal(0000:01:00.0)...
    device_links_busy(0000:01:00.0):
      links->status = DL_DEV_UNBINDING
    device_links_unbind_consumers(0000:01:00.0):
      0000:01:00.0--oic link->status = DL_STATE_SUPPLIER_UNBIND
      call device_release_driver_internal(oic)...
        device_links_busy(oic):
          links->status = DL_DEV_UNBINDING
        device_links_unbind_consumers(oic):
          oic--pinctrl link->status = DL_STATE_SUPPLIER_UNBIND
          call device_release_driver_internal(pinctrl)...
            device_links_busy(pinctrl):
              links->status = DL_DEV_UNBINDING
            device_links_unbind_consumers(pinctrl):
              pinctrl--i2c link->status = DL_STATE_SUPPLIER_UNBIND
              call device_release_driver_internal(i2c)...
                device_links_busy(i2c): links->status = DL_DEV_UNBINDING
                __device_links_no_driver(i2c)...
                  pinctrl--i2c link->status is DL_STATE_SUPPLIER_UNBIND
                  oic--i2c link->status is DL_STATE_ACTIVE
                  oic--i2c link->supplier->links.status is DL_DEV_UNBINDING

The warning is triggered by the i2c removal because the OIC (supplier)
links status is not DL_DEV_DRIVER_BOUND. Its links status is indeed set
to DL_DEV_UNBINDING.

It is perfectly legitimate for the links status to be DL_DEV_UNBINDING
in that case: unbinding of the OIC has started, which triggered the
unbinding of its consumers, and it has not finished yet when the i2c is
unbound.

Avoid the warning when the supplier links status is set to
DL_DEV_UNBINDING and thus support this removal sequence without any
warnings.
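Below is a simplified model of the warning condition as described above. It is not the drivers/base/core.c source: the M_-prefixed enum values and the flag are hypothetical stand-ins for the DL_DEV_* states and DL_FLAG_SYNC_STATE_ONLY, and the actual patch may structure the check differently.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the supplier's links.status values. */
enum m_links_status {
	M_DEV_NO_DRIVER,
	M_DEV_PROBING,
	M_DEV_DRIVER_BOUND,
	M_DEV_UNBINDING,
};

#define M_FLAG_SYNC_STATE_ONLY 0x1u	/* stand-in for DL_FLAG_SYNC_STATE_ONLY */

/* Before: warn unless the supplier is bound or the link is sync-state-only. */
static bool warns_before(enum m_links_status supplier, unsigned int flags)
{
	return supplier != M_DEV_DRIVER_BOUND &&
	       !(flags & M_FLAG_SYNC_STATE_ONLY);
}

/* After: a supplier that is itself in the middle of unbinding is also fine. */
static bool warns_after(enum m_links_status supplier, unsigned int flags)
{
	return warns_before(supplier, flags) && supplier != M_DEV_UNBINDING;
}
```

In the i2c removal traced above, the OIC supplier is in the unbinding state. The old condition warns in that case and the new one does not. A supplier with no driver at all still triggers the warning.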

Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Reviewed-by: Rafael J. Wysocki <rafael@kernel.org>
Reviewed-by: Saravana Kannan <saravanak@google.com>
Link: https://patch.msgid.link/20260511155755.34428-4-herve.codina@bootlin.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Saravana Kannan [Mon, 11 May 2026 15:57:49 +0000 (17:57 +0200)] 
of: dynamic: Fix overlayed devices not probing because of fw_devlink

When an overlay is applied, if the target device has already probed
successfully and bound to a driver, then some of the fw_devlink logic
that ran when the device was probed needs to be rerun. This allows newly
created dangling consumers of the overlayed device tree nodes to be
moved to become consumers of the target device.

[Herve: Add the call to driver_deferred_probe_trigger()]
[Herve: Use fwnode_test_flag() to test fwnode flags value]

Fixes: 1a50d9403fb9 ("treewide: Fix probing of devices in DT overlays")
Reported-by: Herve Codina <herve.codina@bootlin.com>
Closes: https://lore.kernel.org/lkml/CAMuHMdXEnSD4rRJ-o90x4OprUacN_rJgyo8x6=9F9rZ+-KzjOg@mail.gmail.com/
Closes: https://lore.kernel.org/all/20240221095137.616d2aaa@bootlin.com/
Closes: https://lore.kernel.org/lkml/20240312151835.29ef62a0@bootlin.com/
Signed-off-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/lkml/20240411235623.1260061-3-saravanak@google.com/
[Herve: Rebase on top of recent kernel]
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Tested-by: Kalle Niemi <kaleposti@gmail.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20260511155755.34428-3-herve.codina@bootlin.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Saravana Kannan [Mon, 11 May 2026 15:57:48 +0000 (17:57 +0200)] 
Revert "treewide: Fix probing of devices in DT overlays"

This reverts commit 1a50d9403fb90cbe4dea0ec9fd0351d2ecbd8924.

While the commit fixed fw_devlink overlay handling for one case, it
broke it for another case. So revert it and redo the fix in a separate
patch.

Fixes: 1a50d9403fb9 ("treewide: Fix probing of devices in DT overlays")
Reported-by: Herve Codina <herve.codina@bootlin.com>
Closes: https://lore.kernel.org/lkml/CAMuHMdXEnSD4rRJ-o90x4OprUacN_rJgyo8x6=9F9rZ+-KzjOg@mail.gmail.com/
Closes: https://lore.kernel.org/all/20240221095137.616d2aaa@bootlin.com/
Closes: https://lore.kernel.org/lkml/20240312151835.29ef62a0@bootlin.com/
Signed-off-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/lkml/20240411235623.1260061-2-saravanak@google.com/
[Herve: Fix conflicts due to f72e77c33e4b ("device property: Make
modifications of fwnode "flags" thread safe")]

Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Acked-by: Mark Brown <broonie@kernel.org>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Acked-by: Wolfram Sang <wsa+renesas@sang-engineering.com> # for I2C
Link: https://patch.msgid.link/20260511155755.34428-2-herve.codina@bootlin.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Zhang Yuwei [Fri, 10 Apr 2026 02:44:48 +0000 (10:44 +0800)] 
driver core: Use mod_delayed_work to prevent lost deferred probe work

The deferred_probe_timeout_work may be permanently and unexpectedly
canceled when deferred_probe_extend_timeout() executes concurrently.
Starting with deferred_probe_timeout_work pending, the problem can
occur after the following sequence:

  CPU0                                 CPU1
deferred_probe_extend_timeout
  -> cancel_delayed_work() => true
                                   deferred_probe_extend_timeout
                                     -> cancel_delayed_work()
                                       -> __cancel_work()
                                         -> try_grab_pending()
  -> schedule_delayed_work()
    -> queue_delayed_work_on()
(Since the pending bit is grabbed,
 it just returns without queuing)
                                         -> set_work_pool_and_clear_pending()
                                  (This __cancel_work() returns false and
                                     the work will never be queued again)

The root cause is that the WORK_STRUCT_PENDING_BIT of the work_struct
is set temporarily in __cancel_work() (via try_grab_pending()). This
transient state prevents the work_struct from being successfully queued
by another CPU.

To fix this, replace the original non-atomic cancel and schedule
mechanism with mod_delayed_work(). This ensures the modification is
handled atomically and guarantees that the work is not lost.

Fixes: 2b28a1a84a0e ("driver core: Extend deferred probe timeout on driver registration")
Signed-off-by: Zhang Yuwei <zhangyuwei20@huawei.com>
Link: https://patch.msgid.link/20260410024448.387231-1-zhangyuwei20@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stepan Ionichev [Thu, 14 May 2026 17:14:55 +0000 (22:14 +0500)] 
device property: fix fwnode reference leak in fwnode_graph_get_endpoint_by_id()

When called with FWNODE_GRAPH_ENDPOINT_NEXT, the function walks every
endpoint under the requested port and, for any endpoint whose ID is
greater than or equal to the requested one, may store a fwnode
reference in best_ep via fwnode_handle_get(). If a later iteration
finds an exact-ID match, the function returns that endpoint directly
without dropping the reference held by best_ep, leaking it.

Drop the saved candidate before returning the exact-match endpoint.
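Below is a simplified model of the fixed lookup. A hypothetical refcounted node type replaces struct fwnode_handle, and plain get()/put() helpers replace fwnode_handle_get()/fwnode_handle_put(). The model ignores the references held by the real endpoint iterator and tracks only the saved candidate.

```c
#include <stddef.h>

/* Hypothetical stand-in for a refcounted endpoint node. */
struct node {
	int id;
	int refs;
};

static struct node *get(struct node *n) { if (n) n->refs++; return n; }
static void put(struct node *n) { if (n) n->refs--; }

/*
 * Return the endpoint with exactly 'id', else the one with the smallest
 * id greater than 'id'. The caller owns one reference on the result.
 */
static struct node *endpoint_by_id(struct node *eps, size_t n, int id)
{
	struct node *best = NULL;

	for (size_t i = 0; i < n; i++) {
		struct node *ep = &eps[i];

		if (ep->id == id) {
			put(best);	/* the fix: drop the saved candidate */
			return get(ep);
		}
		if (ep->id > id && (!best || ep->id < best->id)) {
			put(best);	/* replace the previous candidate */
			best = get(ep);
		}
	}
	return best;
}
```

If the put(best) on the exact-match path is removed, a lookup that first saves a larger ID and then hits the exact ID leaves the candidate's count at 1. That leaked reference is what the commit describes.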

This affects callers that use FWNODE_GRAPH_ENDPOINT_NEXT to ask for
the next endpoint with ID >= the requested one (used by a number of
media drivers, e.g. imx7/8, sun6i CSI, omap3isp, xilinx-csi2,
stm32-csi). Each leak retains a fwnode reference until reboot/unbind.

Fixes: 0fcc2bdc8aff ("device property: Add fwnode_graph_get_endpoint_by_id()")
Signed-off-by: Stepan Ionichev <sozdayvek@gmail.com>
Link: https://patch.msgid.link/20260514171455.27271-1-sozdayvek@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Heiner Kallweit [Mon, 16 Mar 2026 22:11:12 +0000 (23:11 +0100)] 
device core: make struct device_driver groups members constant arrays

Constify the groups arrays, allowing constant arrays to be assigned.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/42624513-923c-4970-834d-036282e24e24@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Heiner Kallweit [Mon, 16 Mar 2026 22:10:31 +0000 (23:10 +0100)] 
driver core: make struct bus_type groups members constant arrays

Constify the groups arrays, allowing constant arrays to be assigned.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/265f6584-8edd-48a0-9568-a9d584b9ec3a@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Heiner Kallweit [Mon, 16 Mar 2026 22:09:37 +0000 (23:09 +0100)] 
driver core: constify group arrays arguments in driver_add_groups and driver_remove_groups

Constify the groups array argument in driver_add_groups and
driver_remove_groups. This allows constant arrays to be passed as
arguments.
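Below is a user-space sketch of why the extra const matters. It uses a hypothetical stand-in for struct attribute_group and a hypothetical count_groups() helper; the exact kernel prototypes are not reproduced here. A parameter of type `const struct attribute_group *const *` accepts a NULL-terminated array whose pointer elements are themselves const. A `const struct attribute_group **` parameter would accept the same array only with a discarded-qualifier warning.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel type. */
struct attribute_group {
	const char *name;
};

/* Walks a NULL-terminated groups array without modifying it. */
static size_t count_groups(const struct attribute_group *const *groups)
{
	size_t n = 0;

	while (groups && groups[n])
		n++;
	return n;
}

static const struct attribute_group grp_a = { "a" };
static const struct attribute_group grp_b = { "b" };

/* Both the groups and the array of pointers can live in read-only memory. */
static const struct attribute_group *const my_groups[] = {
	&grp_a,
	&grp_b,
	NULL,
};
```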

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/21a1e5f1-c6a0-4f6f-aa86-1e6abd25f9c6@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alexey Dobriyan [Sun, 17 May 2026 16:17:47 +0000 (19:17 +0300)] 
driver core: delete useless forward declaration of "struct class"

"struct class" is defined earlier in both cases.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Link: https://patch.msgid.link/6d5937c5-9d41-4cfe-9e42-0946e12dc72d@p183
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jori Koolstra [Tue, 3 Mar 2026 16:25:04 +0000 (17:25 +0100)] 
devfreq: change devfreq_event_class to a const struct

The class_create() call has been deprecated in favor of class_register()
as the driver core now allows for a struct class to be in read-only
memory. Change devfreq_event_class to be a const struct class and drop
the class_create() call. Compile tested.

Link: https://lore.kernel.org/all/2023040244-duffel-pushpin-f738@gregkh/
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jori Koolstra <jkoolstra@xs4all.nl>
Link: https://patch.msgid.link/20260303162505.3748001-1-jkoolstra@xs4all.nl
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Heiner Kallweit [Thu, 2 Apr 2026 21:17:13 +0000 (23:17 +0200)] 
devcoredump: Remove exit call

The Kconfig symbol DEV_COREDUMP is of type bool, so devcoredump
cannot be built as a module and the exit code is a no-op.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/39a3821b-03d6-4ff0-97b7-82411a76d39a@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Conor Kotwasinski [Thu, 16 Apr 2026 13:43:15 +0000 (09:43 -0400)] 
kernfs: fix suspicious RCU usage in kernfs_put()

Commit 741c10b096bc ("kernfs: Use RCU to access kernfs_node::name.")
converted the WARN_ONCE() in kernfs_put() to read kn->name and
parent->name via rcu_dereference(), but kernfs_put() has callers that
hold neither kernfs_rwsem nor the RCU read lock. The inode eviction
path driven by memory reclaim is one such case:

  kernfs_put+0x53/0x60 fs/kernfs/dir.c:602
  evict+0x3c2/0xad0 fs/inode.c:846
  iput_final fs/inode.c:1966 [inline]
  iput.part.0+0x605/0xf50 fs/inode.c:2015
  iput+0x35/0x40 fs/inode.c:1981
  dentry_unlink_inode+0x2a1/0x490 fs/dcache.c:467
  __dentry_kill+0x1d0/0x600 fs/dcache.c:670
  shrink_dentry_list+0x180/0x5e0 fs/dcache.c:1174
  prune_dcache_sb+0xea/0x150 fs/dcache.c:1256
  super_cache_scan+0x328/0x550 fs/super.c:223
  ...
  kswapd+0x556/0xba0 mm/vmscan.c:7343

lockdep complains with "suspicious RCU usage" whenever the WARN
fires from such a context.

Wrap the rcu_dereference() calls in an RCU read-side critical section.
Gate on the active-ref check so the lock is only taken when the WARN
is about to fire.

Note that this does not address the underlying imbalance in
kn->active that triggers the WARN.

Fixes: 741c10b096bc ("kernfs: Use RCU to access kernfs_node::name.")
Reported-by: syzbot+0dfe499ea713e0a15bec@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=0dfe499ea713e0a15bec
Signed-off-by: Conor Kotwasinski <conorkotwasinski2024@u.northwestern.edu>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://patch.msgid.link/20260416134315.1474726-1-conorkotwasinski2024@u.northwestern.edu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Icenowy Zheng [Wed, 6 May 2026 17:56:10 +0000 (01:56 +0800)] 
drm: verisilicon: add support for cursor planes

Verisilicon display controllers support hardware cursors per output
port.

Add support for them as cursor planes.

Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://patch.msgid.link/20260506175610.2542888-3-zhengxingda@iscas.ac.cn
Icenowy Zheng [Wed, 6 May 2026 17:56:09 +0000 (01:56 +0800)] 
drm: verisilicon: add max cursor size to HWDB

Different display controller variants support different maximum cursor
sizes. All known DC8200 variants support both 32x32 and 64x64, but some
DC8000 variants support either only 32x32 or up to 256x256.

The minimum size is fixed at 32, and only power-of-two (PoT) square
sizes are supported.

Add the max cursor size field to HWDB and fill all entries with 64.
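Below is a sketch of the size rule described above: the cursor must be square, a power of two, at least 32, and no larger than the per-variant maximum stored in the HWDB. cursor_size_ok() and its parameters are hypothetical, not the driver's code.

```c
#include <stdbool.h>

#define CURSOR_MIN_SIZE 32u	/* "The minimum size is fixed at 32" */

/* Hypothetical check against a per-variant maximum such as 64 or 256. */
static bool cursor_size_ok(unsigned int w, unsigned int h,
			   unsigned int max_size)
{
	if (w != h)			/* only square sizes */
		return false;
	if (w < CURSOR_MIN_SIZE || w > max_size)
		return false;
	return (w & (w - 1)) == 0;	/* power of two */
}
```

With the HWDB entries filled with 64, this accepts 32x32 and 64x64 and rejects 48x48 and 128x128. A DC8000 variant with a 256 maximum would also accept 128x128 and 256x256.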

Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://patch.msgid.link/20260506175610.2542888-2-zhengxingda@iscas.ac.cn
Uwe Kleine-König (The Capable Hub) [Mon, 18 May 2026 17:05:41 +0000 (19:05 +0200)] 
spi: Use named initializers for arrays of i2c_device_data

While being less compact, named initializers make it easier to see
which members of the structs are assigned which value without having
to look up the declaration of the struct. They are also more robust
against changes to the struct definition.

The mentioned robustness is relevant for a planned change to struct
i2c_device_id that replaces .driver_data by an anonymous union.

While touching these arrays, drop a comma after the list terminator.

This patch doesn't modify the compiled arrays; only their representation
in source form changes. That the compiled arrays are unchanged was
confirmed with x86 and arm64 builds.
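Below is a user-space illustration of the change, using a hypothetical struct dev_id with the same two-member shape as an id-table entry. Positional and named (designated) initializers produce identical data. The named form, however, does not depend on member order.

```c
#include <string.h>

/* Hypothetical id-table entry with two members. */
struct dev_id {
	char name[20];
	unsigned long driver_data;
};

/* Positional initializers: the meaning depends on member order. */
static const struct dev_id ids_positional[] = {
	{ "chip-a", 1 },
	{ "chip-b", 2 },
	{ 0 }	/* list terminator */
};

/* Named initializers: each value states which member it sets. */
static const struct dev_id ids_named[] = {
	{ .name = "chip-a", .driver_data = 1 },
	{ .name = "chip-b", .driver_data = 2 },
	{ 0 }	/* list terminator, no trailing comma */
};
```

If a member were inserted between name and driver_data, the positional table would silently assign 1 and 2 to the wrong field. The named table would still set driver_data.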

Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com>
Link: https://patch.msgid.link/20260518170542.807843-2-u.kleine-koenig@baylibre.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Seppo Ingalsuo [Fri, 22 May 2026 07:56:59 +0000 (10:56 +0300)] 
ASoC: SOF: ipc4-topology: Enable deep buffer capture

This patch lets a capture PCM operate with a deep buffer by taking
dsp_max_burst_size_in_ms from the topology if it is defined.
Previously, the value from the topology was ignored for capture.

If it is not defined, the maximum burst size for capture is set to one
millisecond, as before.

As for playback, dma_buffer_size is set to the larger of
deep_buffer_dma_ms and SOF_IPC4_MIN_DMA_BUFFER_SIZE, multiplied by OBS.
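Below is a sketch of the two rules as stated above, with hypothetical helper names. It rests on three assumptions. The minimum DMA buffer size is passed in rather than hard-coding SOF_IPC4_MIN_DMA_BUFFER_SIZE. OBS is taken to be the module's output buffer size for one millisecond. A topology value of 0 means "not defined".

```c
#include <stdint.h>

/* max(deep_buffer_dma_ms, min_dma_buffer) * obs, as described above. */
static uint32_t capture_dma_buffer_size(uint32_t deep_buffer_dma_ms,
					uint32_t min_dma_buffer,
					uint32_t obs)
{
	uint32_t ms = deep_buffer_dma_ms > min_dma_buffer ?
		      deep_buffer_dma_ms : min_dma_buffer;

	return ms * obs;
}

/* Topology value if defined (assumed non-zero), else 1 ms as before. */
static uint32_t capture_max_burst_ms(uint32_t topology_ms)
{
	return topology_ms ? topology_ms : 1;
}
```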

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Link: https://patch.msgid.link/20260522075659.2645-1-peter.ujfalusi@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Mark Brown [Fri, 22 May 2026 10:55:30 +0000 (11:55 +0100)] 
spi: aspeed: Fix __iomem annotation and VLA parameter

Chin-Ting Kuo <chin-ting_kuo@aspeedtech.com> says:

This series fixes two sparse warnings reported by the kernel test robot.
The first patch fixes missing __iomem annotation on an MMIO pointer
parameter, which also caused a redundant cast at the call site.
The second patch fixes a sparse warning about a variable-length array
(VLA) function parameter.

Link: https://patch.msgid.link/20260522071621.102507-1-chin-ting_kuo@aspeedtech.com
Chin-Ting Kuo [Fri, 22 May 2026 07:16:21 +0000 (15:16 +0800)] 
spi: aspeed: Replace VLA parameter with flat pointer in calibration helper

aspeed_spi_ast2600_optimized_timing() declared its buffer argument as a
variable-length array parameter (u8 buf[rows][cols]), which causes a
sparse warning. Replace the VLA parameter with a plain u8 * and compute
the 2-D index manually. The corresponding call site is also updated.
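Below is a user-space sketch of the same transformation. cell() and count_pass_in_row() are hypothetical helpers, not the driver's calibration code. The caller passes a flat pointer to the first element plus the column count, and the callee computes row * cols + col itself instead of relying on a `buf[rows][cols]` parameter.

```c
#include <stddef.h>
#include <stdint.h>

/* Same element as buf[row][col] in the 2-D view. */
static uint8_t cell(const uint8_t *buf, size_t cols, size_t row, size_t col)
{
	return buf[row * cols + col];
}

/* Hypothetical scan: count non-zero ("passing") cells in one row. */
static size_t count_pass_in_row(const uint8_t *buf, size_t cols, size_t row)
{
	size_t n = 0;

	for (size_t c = 0; c < cols; c++)
		n += cell(buf, cols, row, c) != 0;
	return n;
}
```

A caller holding a real 2-D array passes it as `(const uint8_t *)grid` together with its column count.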

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202605180441.uD3toFRJ-lkp@intel.com/
Signed-off-by: Chin-Ting Kuo <chin-ting_kuo@aspeedtech.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Link: https://patch.msgid.link/20260522071621.102507-3-chin-ting_kuo@aspeedtech.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Chin-Ting Kuo [Fri, 22 May 2026 07:16:20 +0000 (15:16 +0800)] 
spi: aspeed: Fix missing __iomem annotation in output transfer path

The dst parameter of aspeed_spi_user_transfer_tx() is an MMIO address
obtained from chip->ahb_base, but it was typed as void * instead of
void __iomem *, which sparse reported as a warning. Fix the parameter
type to void __iomem * and drop the now-unnecessary cast at the call
site.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202605180441.uD3toFRJ-lkp@intel.com/
Signed-off-by: Chin-Ting Kuo <chin-ting_kuo@aspeedtech.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Link: https://patch.msgid.link/20260522071621.102507-2-chin-ting_kuo@aspeedtech.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Fernando Fernandez Mancera [Mon, 11 May 2026 14:37:56 +0000 (16:37 +0200)] 
netfilter: nf_tables: fix dst corruption in same register operation

For lshift and rshift, the shift operations are performed in a loop over
32-bit words. The loop calculates the shifted value and writes it to dst,
and then immediately reads from src to calculate the carry for the next
iteration. Because src and dst can point to the same memory location,
the carry is incorrectly calculated using the newly modified dst value
instead of the original src value.

Fix this by caching the original value in a temporary local variable
before writing to dst and using it for the carry calculation. In
addition, partial overlap is rejected by the control plane for all
kinds of operations, including byteorder. This was tested with the
following bytecode:

table test_table ip flags 0 use 1 handle 1
ip test_table test_chain use 3 type filter hook input prio 0 policy accept packets 0 bytes 0 flags 1
ip test_table test_chain 2
  [ immediate reg 1 0x44332211 0x88776655 ]
  [ bitwise reg 1 = ( reg 1 << 0x08000000 ) ]
  [ cmp eq reg 1 0x66443322 0x00887766 ]
  [ counter pkts 0 bytes 0 ]
ip test_table test_chain 4 3
  [ immediate reg 1 0x44332211 0x88776655 ]
  [ bitwise reg 1 = ( reg 1 << 0x08000000 ) ]
  [ cmp eq reg 1 0x55443322 0x00887766 ]
  [ counter pkts 21794 bytes 1917798 ]

Fixes: 567d746b55bc ("netfilter: bitwise: add support for shifts.")
Acked-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
3 weeks ago selftests: netfilter: add nft_fib_nexthop test
Jiayuan Chen [Wed, 20 May 2026 02:34:11 +0000 (10:34 +0800)] 
selftests: netfilter: add nft_fib_nexthop test

Functional coverage of nft_fib6_eval()'s nexthop enumeration over
three route shapes:

  1) single external nexthop (nhid)
  2) external nexthop group (nhid -> group)
  3) old-style multipath (nexthop ... nexthop ...)

Each scenario places one nexthop on the input device (veth0). For
(2) and (3) the matching nexthop is the second member, so the walk
has to traverse beyond the primary nh. Two nft counters on prerouting
verify the data path: one increments only when fib reports veth0 as
the oif, the other counts "missing" results and must stay at zero.

  ./nft_fib_nexthop.sh
  PASS: single external nexthop (nhid -> veth0)
  PASS: nexthop group (dummy0 + veth0)
  PASS: old-style multipath (sibling on veth0)

Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Signed-off-by: Florian Westphal <fw@strlen.de>
3 weeks ago netfilter: nft_fib_ipv6: handle routes via external nexthop
Jiayuan Chen [Wed, 20 May 2026 02:34:10 +0000 (10:34 +0800)] 
netfilter: nft_fib_ipv6: handle routes via external nexthop

fib6_info has a union:

    union {
        struct list_head fib6_siblings;
        struct list_head nh_list;
    };

Old-style multipath (ip -6 route add ... nexthop ... nexthop ...) uses
fib6_siblings.  External nexthop (ip -6 route add ... nhid N) uses
nh_list, linked into &nh->f6i_list.

nft_fib6_info_nh_uses_dev() blindly walks &rt->fib6_siblings, causing
an OOB read past the struct nexthop slab when rt->nh is set:

  ==================================================================
  BUG: KASAN: slab-out-of-bounds in nft_fib6_eval+0x1362/0x16c0
  Read of size 8 at addr ffff888103a099d0 by task ping/386

  CPU: 2 UID: 0 PID: 386 Comm: ping Not tainted 7.1.0-rc3+ #251 PREEMPT
  Call Trace:
   <IRQ>
   dump_stack_lvl+0x76/0xa0
   print_report+0xd1/0x5f0
   kasan_report+0xe7/0x130
   __asan_report_load8_noabort+0x14/0x30
   nft_fib6_eval+0x1362/0x16c0
   nft_do_chain+0x279/0x18c0
   nft_do_chain_ipv6+0x1a8/0x230
   nf_hook_slow+0xad/0x200
   ipv6_rcv+0x152/0x380
   __netif_receive_skb_one_core+0x118/0x1c0
  ==================================================================

Branch by route shape: when rt->nh is set, walk via
nexthop_for_each_fib6_nh() (also covers nh groups, which the original
code missed); otherwise walk fib6_siblings, guarded by READ_ONCE() of
rt->fib6_nsiblings as required by commit 31d7d67ba127 ("ipv6: annotate
data-races around rt->fib6_nsiblings").

Fixes: 1c32b24c234b ("netfilter: nft_fib_ipv6: switch to fib6_lookup")
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Signed-off-by: Florian Westphal <fw@strlen.de>
3 weeks ago netfilter: nft_fib_ipv6: walk fib6_siblings under RCU
Jiayuan Chen [Wed, 20 May 2026 02:34:09 +0000 (10:34 +0800)] 
netfilter: nft_fib_ipv6: walk fib6_siblings under RCU

nft_fib6_info_nh_uses_dev() runs from nft_fib6_eval() in softirq under
rcu_read_lock().  fib6_siblings is modified by writers that hold
tb6_lock but do not wait for RCU readers, so the sibling walk should
use list_for_each_entry_rcu(): it adds READ_ONCE() on the ->next
pointer and lets CONFIG_PROVE_RCU_LIST validate the locking.

No functional change for non-debug builds.

Fixes: 1c32b24c234b ("netfilter: nft_fib_ipv6: switch to fib6_lookup")
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Signed-off-by: Florian Westphal <fw@strlen.de>
3 weeks ago netfilter: ebtables: fix OOB read in compat_mtw_from_user
Florian Westphal [Tue, 19 May 2026 20:52:07 +0000 (22:52 +0200)] 
netfilter: ebtables: fix OOB read in compat_mtw_from_user

Luxiao Xu says:

 The function compat_mtw_from_user() converts ebtables extensions from
 32-bit user structures to kernel native structures. However, it lacks
 proper validation of the user-supplied match_size/target_size.

 When certain extensions are processed, the kernel-side translation
 logic may perform memory accesses based on the extension's expected
 size. If the user provides a size smaller than what the extension
 requires, it results in an out-of-bounds read as reported by KASAN.

 This fix introduces a check to ensure match_size is at least as large
 as the extension's required compatsize. This covers matches, watchers,
 and targets, while maintaining compatibility with standard targets.

AFAIU this is relevant for matches that need to go through
match->compat_from_user() call.  Those that use plain memcpy with the
user-provided size are ok because the caller checks that size vs the
start of the next rule entry offset (which itself is checked vs. total
size copied from userspace).

The ->compat_from_user() callbacks assume they can read compatsize bytes,
so they need this extra check.
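A minimal sketch of the extra check, assuming a hypothetical extension descriptor in which the conversion callback always reads compatsize bytes; struct ext_model, convert_ext() and the -EINVAL return value are illustrative and not the ebtables code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical extension descriptor, loosely modeled on the text above. */
struct ext_model {
	size_t compatsize;	/* bytes compat_from_user() will read */
	void (*compat_from_user)(void *dst, const void *src);
};

/* Example callback that unconditionally reads 8 bytes from src. */
static void read8_from_user(void *dst, const void *src)
{
	memcpy(dst, src, 8);
}

/*
 * Convert one extension.  Only the callback path needs the new check:
 * the memcpy path copies exactly user_size bytes, which the caller is
 * assumed to have already bounded by the next-entry offset.
 */
static int convert_ext(const struct ext_model *ext, void *dst,
		       const void *user, size_t user_size)
{
	if (ext->compat_from_user) {
		if (user_size < ext->compatsize)
			return -EINVAL;	/* callback would read past the blob */
		ext->compat_from_user(dst, user);
		return 0;
	}
	memcpy(dst, user, user_size);
	return 0;
}
```

The check sits on the callback path only, mirroring the reasoning above that the plain memcpy path is already bounded by the caller.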

Based on an earlier patch from Luxiao Xu.

Fixes: 81e675c227ec ("netfilter: ebtables: add CONFIG_COMPAT support")
Reported-by: Yuan Tan <yuantan098@gmail.com>
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Reported-by: Xin Liu <bird@lzu.edu.cn>
Signed-off-by: Luxiao Xu <rakukuip@gmail.com>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
3 weeks ago netfilter: disable payload mangling in userns
Florian Westphal [Sat, 16 May 2026 15:23:21 +0000 (23:23 +0800)] 
netfilter: disable payload mangling in userns

Several parts of the network stack rely on the iph->ihl validation
done before PRE_ROUTING.

Disable payload mangling in user namespaces for now.

tcp option handling is likely safe even for LOCAL_IN, so this
leaves tcp option mangling via nft_exthdr.c as-is.

I don't think these are the only means to alter packets, but these
appear to be relatively prominent.

This could be relaxed later.  Example:
 - allow userns for ingress hook.
 - allow userns if base is transport header.

 Also, we should revalidate or restrict generally:
 - Don't allow linklayer writes to spill into network header
 - restrict ipv4 and ipv6 to 'known safe' writes, e.g.
   saddr/daddr/check/tos

Reported-by: Qi Tang <tpluszz77@gmail.com>
Reported-by: Tong Liu <lyutoon@gmail.com>
Tested-by: Qi Tang <tpluszz77@gmail.com>
Link: https://lore.kernel.org/netfilter-devel/20260515100411.3141-1-fw@strlen.de/
Signed-off-by: Florian Westphal <fw@strlen.de>
3 weeks ago netfilter: xt_cpu: prefer raw_smp_processor_id
Florian Westphal [Tue, 19 May 2026 18:10:08 +0000 (20:10 +0200)] 
netfilter: xt_cpu: prefer raw_smp_processor_id

With PREEMPT_RCU we get a splat:

BUG: using smp_processor_id() in preemptible [..]
caller is cpu_mt+0x53/0xd0 net/netfilter/xt_cpu.c:37
CPU: 1 .. Comm: syz.3.1377 #0 PREEMPT(full)
Call Trace:
 <TASK>
 dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
 check_preemption_disabled+0xd3/0xe0 lib/smp_processor_id.c:47
 cpu_mt+0x53/0xd0 net/netfilter/xt_cpu.c:37
 [..]

Just use the raw version instead, similar to commit 14d14a5d2957
("netfilter: nft_meta: use raw_smp_processor_id()").

Fixes: 0ca743a55991 ("netfilter: nf_tables: add compatibility layer for x_tables")
Reported-by: syzbot+690d3e3ffa7335ac10eb@syzkaller.appspotmail.com
Signed-off-by: Florian Westphal <fw@strlen.de>
3 weeks ago netfilter: nf_conntrack_gre: fix gre keymap list corruption
Florian Westphal [Thu, 14 May 2026 12:21:57 +0000 (14:21 +0200)] 
netfilter: nf_conntrack_gre: fix gre keymap list corruption

Quoting reporter:
  A race between GRE keymap insertion and destruction can corrupt the
  kernel list or use a freed object. `nf_ct_gre_keymap_add()` publishes a
  new keymap pointer before the embedded `list_head` is linked, while
  `nf_ct_gre_keymap_destroy()` can concurrently delete and free that
  same object. An unprivileged user can reach this through the PPTP
  conntrack helper by racing PPTP control messages or helper teardown,
  leading to KASAN-detectable list corruption/UAF in kernel context.

 ## Root Cause Analysis
 `exp_gre()` installs GRE expectations for a PPTP control flow and then
  adds two GRE keymap entries [..]

 The add path publishes `ct_pptp_info->keymap[dir]` before linking the
 embedded list node [..]
 Concurrent teardown deletes that partially initialized object.

Make add/destroy symmetric: install both, destroy both while under lock.

Furthermore, we should refuse to publish a new mapping in case ct is going
away, else we may leak the allocation.

The "retrans" detection is strange:  existing mapping is checked for key
equality with the new mapping, then for "is on the list" via list walk.

But I can't see how an existing keymap entry could not be on the list.

Change this to only check whether we're asked to map the same tuple again:
if so, skip the re-install, else signal failure.

Finally, add a bug trap for the keymap list; it has to be empty when the
namespace is going away.

Reported-by: Leo Lin <leo@depthfirst.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
3 weeks ago netfilter: synproxy: refresh tcphdr after skb_ensure_writable
Chris Mason [Tue, 19 May 2026 19:36:14 +0000 (12:36 -0700)] 
netfilter: synproxy: refresh tcphdr after skb_ensure_writable

synproxy_tstamp_adjust() rewrites the TCP timestamp option in place
and then patches the TCP checksum via inet_proto_csum_replace4() on
the caller-supplied tcphdr pointer.  Both ipv4_synproxy_hook() and
ipv6_synproxy_hook() obtain that pointer with skb_header_pointer()
before calling in, so it may either alias skb->head directly or
point at the caller's on-stack _tcph buffer.

Between obtaining the pointer and using it, the function calls
skb_ensure_writable(skb, optend), which on a cloned or non-linear
skb invokes pskb_expand_head() and frees the old skb->head.  After
that point the cached th is stale:

    caller (ipv[46]_synproxy_hook)
      th = skb_header_pointer(skb, ..., &_tcph)
      synproxy_tstamp_adjust(skb, protoff, th, ...)
        skb_ensure_writable(skb, optend)
          pskb_expand_head()        /* kfree(old skb->head) */
        ...
        inet_proto_csum_replace4(&th->check, ...)
                                    /* writes into freed head, or
                                       into the caller's stack copy
                                       leaving the on-wire checksum
                                       stale */

The option bytes are written through skb->data and are fine; only
the checksum update goes through th and so lands in the wrong
place.  The result is either a write into freed slab memory or a
packet leaving with a checksum that does not match its payload.

Fix by re-deriving th from skb->data + protoff immediately after
skb_ensure_writable() succeeds, so the subsequent checksum update
targets the linear, writable header.
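The same hazard appears whenever a buffer may move: any pointer derived before the move is stale afterwards. Below is a minimal userspace sketch, assuming a hypothetical struct buf whose buf_ensure_writable() always reallocates (standing in for pskb_expand_head()); the checksum field is located by offset and re-derived from the new base, as the fix does with skb->data + protoff.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an skb: a heap buffer that may be
 * reallocated when it is made writable. */
struct buf {
	unsigned char *data;
	size_t len;
};

/* Models the worst case of skb_ensure_writable(): the data always moves
 * to a new allocation and the old one is freed. */
static int buf_ensure_writable(struct buf *b)
{
	unsigned char *n = malloc(b->len);

	if (!n)
		return -1;
	memcpy(n, b->data, b->len);
	free(b->data);
	b->data = n;
	return 0;
}

/* Write a new checksum at check_off after making the buffer writable. */
static int adjust_check(struct buf *b, size_t check_off, uint16_t newcheck)
{
	unsigned char *check;

	assert(check_off + sizeof(newcheck) <= b->len);
	if (buf_ensure_writable(b))
		return -1;
	/* Re-derive from the (possibly new) base; a pointer taken before
	 * buf_ensure_writable() would point into freed memory. */
	check = b->data + check_off;
	memcpy(check, &newcheck, sizeof(newcheck));
	return 0;
}
```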

Fixes: 48b1de4c110a ("netfilter: add SYNPROXY core/target")
Assisted-by: kres (claude-opus-4-7)
Signed-off-by: Chris Mason <clm@meta.com>
Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
3 weeks ago netfilter: conntrack: tcp: do not force CLOSE on invalid-seq RST without direction...
Hamza Mahfooz [Mon, 11 May 2026 14:43:14 +0000 (10:43 -0400)] 
netfilter: conntrack: tcp: do not force CLOSE on invalid-seq RST without direction check

An unintended behavior in the TCP conntrack state machine allows a
connection to be forced into the CLOSE state using an RST packet with an
invalid sequence number.

Specifically, after a SYN packet is observed, an RST with an invalid SEQ
can transition the conntrack entry to TCP_CONNTRACK_CLOSE, regardless of
whether the RST corresponds to the expected reply direction. The relevant
code path assumes the RST is a response to an outgoing SYN, but does not
validate packet direction or ensure that a matching SYN was actually sent
in the opposite direction.

As a result, a crafted packet sequence consisting of a SYN followed by an
invalid-sequence RST can prematurely terminate an active NAT entry. This
makes connection teardown easier than intended.

So, tighten the state transition logic to ensure that RST-triggered
CLOSE transitions only occur when the RST is a valid response to a
previously observed SYN in the correct direction.

Cc: stable@vger.kernel.org
Fixes: 9fb9cbb1082d ("[NETFILTER]: Add nf_conntrack subsystem.")
Signed-off-by: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
3 weeks ago device property: set fwnode->secondary to NULL in fwnode_init()
Bartosz Golaszewski [Wed, 6 May 2026 11:57:00 +0000 (13:57 +0200)] 
device property: set fwnode->secondary to NULL in fwnode_init()

If a firmware node is allocated on the stack (for instance, a temporary
software node whose lifetime we control) or on the heap using a
non-zeroing allocation function, and is initialized using fwnode_init(),
its secondary pointer will contain uninitialized memory, which will likely
be neither NULL nor IS_ERR() and so may end up being dereferenced (for
example: in dev_to_swnode()). Set fwnode->secondary to NULL on
initialization.
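A minimal sketch of the idea, assuming a hypothetical, heavily simplified node type (struct fwnode_model is not the kernel's struct fwnode_handle, and the real fwnode_init() initializes more than shown here): an init helper must set every field a consumer may test, because the storage may be neither zeroed nor freshly allocated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct fw_ops;	/* opaque in this sketch */

/* Hypothetical, simplified firmware node. */
struct fwnode_model {
	const struct fw_ops *ops;
	struct fwnode_model *secondary;
};

/* Initialize every field a consumer may test, including secondary. */
static void fwnode_init_model(struct fwnode_model *fw,
			      const struct fw_ops *ops)
{
	fw->ops = ops;
	fw->secondary = NULL;
}

/* A consumer that follows ->secondary only when it is set. */
static int has_secondary(const struct fwnode_model *fw)
{
	return fw->secondary != NULL;
}
```

Filling the node with a garbage pattern before calling the init helper, as a stack allocation effectively does, shows that has_secondary() then reports no secondary node.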

Cc: stable <stable@kernel.org>
Fixes: 01bb86b380a3 ("driver core: Add fwnode_init()")
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Link: https://patch.msgid.link/20260506115701.23035-1-bartosz.golaszewski@oss.qualcomm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 weeks ago minix: release the sb buffer_head when setting the v3 block size fails
Christoph Hellwig [Mon, 18 May 2026 13:03:30 +0000 (15:03 +0200)] 
minix: release the sb buffer_head when setting the v3 block size fails

At this point the superblock is already read, so jump to the label that
releases the buffer_head for it.

Fixes: d893fc670546 ("minix: handle set_blocksize failures")
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20260518130330.529085-1-hch@lst.de
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks ago misc: rp1: Send IACK on IRQ activate to fix kdump/kexec
Xiaolei Wang [Mon, 18 May 2026 07:34:05 +0000 (15:34 +0800)] 
misc: rp1: Send IACK on IRQ activate to fix kdump/kexec

After a kexec/kdump reboot, the macb Ethernet controller fails to
receive any packets, causing DHCP to hang indefinitely and the network
interface to be unusable despite link being up.

The root cause is that RP1's level-triggered MSI-X interrupt sources
(such as macb on hwirq 6) may have their internal state machines stuck
in the "waiting for IACK" state. This happens because the previous
kernel crashed before sending the acknowledgment for a pending level
interrupt.

In this stuck state, RP1 will not generate new MSI-X writes even though
the interrupt source remains asserted. Since no new MSI-X is sent, the
GIC never sees a new edge, the chained IRQ handler is never invoked,
and the interrupt is permanently lost.

Fix this by sending MSIX_CFG_IACK in rp1_irq_activate(). This
unconditionally resets the MSI-X state machine back to idle when a
child device requests its interrupt. If the interrupt source is still
asserted, RP1 will immediately issue a new MSI-X with the freshly
configured msg_addr/msg_data, and normal interrupt delivery resumes.

Writing IACK when the state machine is already idle (i.e., on a normal
cold boot) is harmless — it has no effect.
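The behavior described above can be modeled as a two-state machine. This is a minimal sketch under stated assumptions: struct msix_src, msix_tick() and msix_iack() are hypothetical and do not reflect RP1's register interface; they only encode the rules given in the text (a message is issued only from idle while the source is asserted, and IACK returns the machine to idle).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one level-triggered MSI-X source. */
enum msix_state { MSIX_IDLE, MSIX_WAIT_IACK };

struct msix_src {
	enum msix_state state;
	bool asserted;	/* current level of the interrupt source */
	int msgs_sent;	/* MSI-X writes issued so far */
};

/* A message is only issued from IDLE while the source is asserted. */
static void msix_tick(struct msix_src *s)
{
	if (s->state == MSIX_IDLE && s->asserted) {
		s->msgs_sent++;
		s->state = MSIX_WAIT_IACK;
	}
}

/* IACK returns the machine to IDLE; if the level is still high a fresh
 * message follows immediately.  From IDLE with the level low it changes
 * nothing. */
static void msix_iack(struct msix_src *s)
{
	s->state = MSIX_IDLE;
	msix_tick(s);
}
```

A source left in MSIX_WAIT_IACK by the crashed kernel never fires on its own; a single msix_iack() call unblocks it.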

Fixes: 49d63971f963 ("misc: rp1: RaspberryPi RP1 misc driver")
Cc: stable <stable@kernel.org>
Signed-off-by: Xiaolei Wang <xiaolei.wang@windriver.com>
Link: https://patch.msgid.link/20260518073405.2115003-1-xiaolei.wang@windriver.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 weeks ago gpib: cb7210: Fix region leak when request_irq fails
Hongling Zeng [Mon, 18 May 2026 02:29:39 +0000 (10:29 +0800)] 
gpib: cb7210: Fix region leak when request_irq fails

When request_irq() fails, the region allocated by request_region()
is not released. Fix this by adding an error handling path with
proper goto labels to release the region.
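The goto-based unwind pattern can be sketched in userspace as follows. The resource helpers are hypothetical stand-ins for request_region()/release_region() and request_irq(); this is not the cb7210 driver code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical resource state; fail_irq makes the IRQ request fail. */
struct res_state {
	bool region_held;
	bool irq_held;
	bool fail_irq;
};

static int get_region(struct res_state *s)
{
	s->region_held = true;
	return 0;
}

static void put_region(struct res_state *s)
{
	s->region_held = false;
}

static int get_irq(struct res_state *s)
{
	if (s->fail_irq)
		return -1;
	s->irq_held = true;
	return 0;
}

/* Acquire in order; on failure, release what was already taken, in
 * reverse order, via goto labels. */
static int attach(struct res_state *s)
{
	int ret;

	ret = get_region(s);
	if (ret)
		return ret;

	ret = get_irq(s);
	if (ret)
		goto err_release_region;

	return 0;

err_release_region:
	put_region(s);
	return ret;
}
```

Each later acquisition would get its own label above the previous one, so an error jumps to exactly the cleanup it needs.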

Fixes: e9dc69956d4d ("staging: gpib: Add Computer Boards GPIB driver")
Closes: https://lore.kernel.org/oe-kbuild-all/202605160620.ReBOadPX-lkp@intel.com/
Signed-off-by: Hongling Zeng <zenghongling@kylinos.cn>
Cc: stable <stable@kernel.org>
Link: https://patch.msgid.link/20260518022939.16881-1-zenghongling@kylinos.cn
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 weeks ago parport: Fix race between port and client registration
Ben Hutchings [Tue, 5 May 2026 18:45:12 +0000 (20:45 +0200)] 
parport: Fix race between port and client registration

The parport subsystem registers port devices before they are fully
initialised, resulting in a race condition where client drivers such
as lp can attach to ports that are not completely initialised or are even
being torn down.

When the port and client drivers are built as modules and loaded
around the same time during boot, this occasionally results in a
crash.  I was able to make this happen reliably in a VM with a
PC-style parallel port by patching parport_pc to fail probing:

> --- a/drivers/parport/parport_pc.c
> +++ b/drivers/parport/parport_pc.c
> @@ -2069,7 +2069,7 @@ static struct parport *__parport_pc_probe_port(unsigned long int base,
>   if (!p)
>   goto out3;
>
> - base_res = request_region(base, 3, p->name);
> + base_res = NULL;
>   if (!base_res)
>   goto out4;
>

and then running:

    while true; do
        modprobe lp & modprobe parport_pc
        wait
        rmmod lp parport_pc
    done

for a few seconds.

In the long term I think port registration should be changed to put
the call to device_add() inside parport_announce_port(), but since the
latter currently cannot fail this will require changing all port
drivers.

For now, add a flag to indicate whether a port has been "announced"
and only try to attach client drivers to ports when the flag is set.
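A minimal sketch of the "announced" flag, assuming a hypothetical port model. It publishes the flag with a C11 release store paired with an acquire load so that a client which sees the flag also sees the completed initialisation; the actual patch may rely on the subsystem's existing locking instead, and the unsynchronised clients++ assumes a single client for simplicity.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical port model; not the parport API. */
struct port_model {
	int base;		/* stands in for state set up during init */
	atomic_bool announced;	/* true only after init is complete */
	int clients;
};

/* Registration makes the port visible but not yet usable. */
static void port_register(struct port_model *p)
{
	p->base = 0;
	p->clients = 0;
	atomic_init(&p->announced, false);
}

static void port_announce(struct port_model *p, int base)
{
	p->base = base;	/* finish initialisation first ... */
	/* ... then publish; pairs with the acquire load below */
	atomic_store_explicit(&p->announced, true, memory_order_release);
}

/* Clients skip ports that are registered but not yet announced. */
static bool client_try_attach(struct port_model *p)
{
	if (!atomic_load_explicit(&p->announced, memory_order_acquire))
		return false;
	p->clients++;
	return true;
}
```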

Fixes: 6fa45a226897 ("parport: add device-model to parport subsystem")
Closes: https://bugs.debian.org/1130365
Closes: https://lore.kernel.org/all/6ba903ad-9897-42bb-8c2d-337385cc3746@molgen.mpg.de/
Cc: stable <stable@kernel.org>
Signed-off-by: Ben Hutchings <benh@debian.org>
Acked-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Link: https://patch.msgid.link/afo6uBv68GDevbMD@decadent.org.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 weeks ago uio: uio_pci_generic_sva: fix double free of devm_kzalloc() memory
Guangshuo Li [Tue, 5 May 2026 15:02:56 +0000 (23:02 +0800)] 
uio: uio_pci_generic_sva: fix double free of devm_kzalloc() memory

uio_pci_sva allocates struct uio_pci_sva_dev with devm_kzalloc() in
probe(), but then calls kfree(udev) both on the probe() error path
(label out_free) and again in remove().

Because devm_kzalloc() allocations are devres-managed and are freed
automatically when the device is detached (including after a failing
probe() and during driver unbind), the explicit kfree() can lead to a
double free.

If probe() fails after devm_kzalloc(), the error path frees udev and
devres cleanup will free it again when the core unwinds the partially
bound device. On normal driver removal, remove() frees udev and devres
will free it again when the device is detached.

This issue was identified by a static analysis tool I developed and
confirmed by manual review. Fix by removing the manual kfree() calls
and dropping the now-unused label.

Fixes: 3397c3cd859a2 ("uio: Add SVA support for PCI devices via uio_pci_generic_sva.c")
Cc: stable <stable@kernel.org>
Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com>
Link: https://patch.msgid.link/20260505150256.614071-1-lgs201920130244@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 weeks ago arm64: tlb: Flush walk cache when unsharing PMD tables
Zeng Heng [Thu, 21 May 2026 07:30:11 +0000 (15:30 +0800)] 
arm64: tlb: Flush walk cache when unsharing PMD tables

When huge_pmd_unshare() is called to unshare a PMD table, the
tlb_unshare_pmd_ptdesc() function sets tlb->unshared_tables=true
but the aarch64 tlb_flush() only checked tlb->freed_tables to
determine whether to use TLBF_NONE (vae1is, invalidates walk
cache) or TLBF_NOWALKCACHE (vale1is, leaf-only).

This caused the stale PMD page table entry to remain in the walk cache
after unshare, potentially leading to incorrect page table walks.

Fix by including unshared_tables in the check, so that when
unsharing tables, TLBF_NONE is used and the walk cache is properly
invalidated.

Here is the detailed distinction between vae1is and vale1is:

| Instruction Combination  | Actual Invalidation Scope                         |
| ------------------------ | --------------------------------------------------|
| `VAE1IS`  + TTL=`0`      | All entries at all levels (full invalidation)     |
| `VAE1IS`  + TTL=`2` (L2) | Non-leaf at Level 0/1 + leaf at Level 2           |
| `VALE1IS` + TTL=`0`      | Leaf entries at all levels (non-leaf not cleared) |
| `VALE1IS` + TTL=`2` (L2) | Leaf entry at Level 2 only                        |
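The corrected decision reduces to a single condition. This is a minimal sketch using a hypothetical struct gather_model with the two flags named in the text and a hypothetical enum for the two flush kinds; it is not the arm64 tlb_flush() implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified mmu_gather state. */
struct gather_model {
	bool freed_tables;
	bool unshared_tables;
};

enum flush_kind {
	FLUSH_WITH_WALK_CACHE,	/* TLBF_NONE / vae1is: non-leaf entries too */
	FLUSH_LEAF_ONLY,	/* TLBF_NOWALKCACHE / vale1is: leaf only */
};

static enum flush_kind pick_flush(const struct gather_model *g)
{
	/* A table that this mm stops using, whether freed or unshared,
	 * may still be cached as a non-leaf walk entry. */
	if (g->freed_tables || g->unshared_tables)
		return FLUSH_WITH_WALK_CACHE;
	return FLUSH_LEAF_ONLY;
}
```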

Signed-off-by: Zeng Heng <zengheng4@huawei.com>
Fixes: 8ce720d5bd91 ("mm/hugetlb: fix excessive IPI broadcasts when unsharing PMD tables using mmu_gather")
Cc: <stable@vger.kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 weeks ago init: do_mounts: use kmalloc() for allocations of temporary buffers
Mike Rapoport (Microsoft) [Wed, 20 May 2026 08:16:51 +0000 (11:16 +0300)] 
init: do_mounts: use kmalloc() for allocations of temporary buffers

Several places in init/do_mounts.c allocate temporary buffers for
filesystem names or options using __get_free_page() or alloc_page().

Usage of alloc_page() APIs is not required there and only creates
unnecessary noise with casts or conversions from struct page to void *.

kmalloc() is a better API for these uses and it also provides better
scalability and more debugging possibilities.

Replace use of __get_free_page() and alloc_page() with kmalloc().

While at it, add a check for the -ENOMEM condition in mount_root_generic().

Link: https://lore.kernel.org/all/635405e4-9423-4a25-a6e7-e03c8ea0bcbe@redhat.com
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Link: https://patch.msgid.link/20260520-init-v1-1-aaf2ebac5ad9@kernel.org
Reviewed-by: David Disseldorp <ddiss@suse.de>
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
3 weeks ago Merge patch series "writeback: fix race between cgroup_writeback_umount() and inode_s...
Christian Brauner [Fri, 22 May 2026 10:06:42 +0000 (12:06 +0200)] 
Merge patch series "writeback: fix race between cgroup_writeback_umount() and inode_switch_wbs()"

Baokun Li <libaokun@linux.alibaba.com> says:

When a container exits, a race between cgroup_writeback_umount() and
inode_switch_wbs() / cleanup_offline_cgwb() can trigger
"VFS: Busy inodes after unmount" followed by a use-after-free on
percpu counters.

There is a window between inode_prepare_wbs_switch() returning true
(having passed the SB_ACTIVE check and grabbed the inode) and the
subsequent wb_queue_isw() call.  If cgroup_writeback_umount() observes
the global isw_nr_in_flight counter as non-zero but flush_workqueue()
finds nothing queued, it returns early -- leaving a held inode
reference that blocks evict_inodes() and a later iput() that hits
freed percpu counters.

Patch 1 closes the race by extending the RCU read-side critical
section to cover the window from inode_prepare_wbs_switch() through
wb_queue_isw(), and adding synchronize_rcu() in the umount path so
that all in-flight switchers complete queueing before
flush_workqueue() runs.  rcu_barrier() is intentionally retained so
the same hunk applies cleanly to stable trees that still queue
switches via queue_rcu_work().

Patch 2 removes the now-dead rcu_barrier() that was left over from
the queue_rcu_work() era (replaced by plain queue_work() in commit
e1b849cfa6b6 "writeback: Avoid contention on wb->list_lock when
switching inodes").  This is mainline-only.

Patch 3 replaces the global synchronize_rcu()/flush_workqueue() pair
with a per-sb counter (s_isw_nr_in_flight) plus three small helpers
(cgroup_writeback_pin / cgroup_writeback_unpin /
cgroup_writeback_drain), eliminating the global serialization
penalty.  This also reverts the RCU extension from patch 1 since the
per-sb counter makes it unnecessary.

Performance
-----------

Measured on a 16 vCPU QEMU guest, all kernels share the same .config.
Background load: 4 ext4 superblocks each running

  while :; do
      mkdir /sys/fs/cgroup/<tag>-tmp$N
      ( echo $BASHPID > <tag>-tmp$N/cgroup.procs
        dd if=/dev/zero of=$mp/burner bs=4k count=256 conv=notrunc \
       oflag=sync)
      rmdir /sys/fs/cgroup/<tag>-tmp$N
  done

This drives both inode_switch_wbs() (different cgroups writing the
same inode) and cleanup_offline_cgwb() (dying memcgs), keeping the
global isw_nr_in_flight non-zero throughout the run.  Latencies are
wall-clock around umount(8) on a separate target sb; only the target
sb's umount is measured.

Four kernels are compared at each step of the series:

  base       pre-fix mainline
  +race      base + patch 1 (race fix, keeps rcu_barrier)
  +rmbarrier +race + patch 2 (drop rcu_barrier)
  +persb     +rmbarrier + patch 3 (per-sb counter)

Target sb runs its own cgwb churn:

                p50      p95      p99      max
  base         99.7 ms 112.9 ms 112.9 ms 127.2 ms
  +race       110.2 ms 153.8 ms 153.8 ms 160.4 ms
  +rmbarrier   67.6 ms  88.3 ms  88.3 ms  96.8 ms
  +persb        7.9 ms  10.0 ms  10.0 ms  10.1 ms

Idle target umount under cross-sb cgwb-switch pressure:

                p50      p95      p99      max
  base         92.0 ms 123.5 ms 136.5 ms 141.3 ms
  +race       118.8 ms 154.6 ms 164.7 ms 165.3 ms
  +rmbarrier   62.7 ms  95.4 ms 108.1 ms 108.6 ms
  +persb        5.3 ms   6.9 ms   7.4 ms   7.4 ms

8 concurrent umounts of idle sbs under the same pressure:

                p50      p95      p99      max
  base        137.5 ms 166.9 ms 166.9 ms 171.3 ms
  +race       162.2 ms 183.9 ms 183.9 ms 217.0 ms
  +rmbarrier   61.3 ms  99.5 ms  99.5 ms 113.7 ms
  +persb        8.1 ms   9.1 ms   9.1 ms   9.5 ms

A no-pressure baseline run (no background load) measures ~5 ms p50
across all four kernels, validating that the methodology has no
systematic bias.

In-kernel cgroup_writeback_umount() cumulative cost across the same
run (bpftrace, ~340 calls covering all four scenarios):

                                cgroup_writeback_umount() time
  base                          21240 ms total  (~62 ms / call)
  +race      (rcu_barrier+sync) 24966 ms total  (~73 ms / call)
  +rmbarrier (synchronize_rcu)  12371 ms total  (~36 ms / call)
  +persb     (per-sb counter)    1.37 ms total  ( ~4 us / call)

Under +persb the wait_var_event() condition is true on entry
whenever the target sb has nothing in flight, so synchronize_rcu()
and flush_workqueue() are never called on this path.

Notes:

  - Patch 1 adds ~10-27 ms p50 over base by introducing
    synchronize_rcu().  This is the cost of closing the race
    correctly and is paid by stable backports as well.
  - Patch 2 ("drop rcu_barrier()") was expected to be a pure cleanup
    on mainline, but actually removes a real wait: rcu_barrier()
    drains call_rcu() callbacks from *all* subsystems, and the
    cgroup teardown path keeps that pipeline busy under this
    workload.  Removing it cuts ~43-101 ms p50 on top of patch 1.
  - Patch 3 (per-sb counter) replaces the global wait entirely; the
    target sb no longer waits for activity on unrelated sbs,
    recovering near-baseline latency in all three scenarios.

* patches from https://patch.msgid.link/20260521095016.2791354-1-libaokun@linux.alibaba.com:
  writeback: use a per-sb counter to drain inode wb switches at umount
  writeback: drop now-unnecessary rcu_barrier() in cgroup_writeback_umount()
  writeback: fix race between cgroup_writeback_umount() and inode_switch_wbs()

Link: https://patch.msgid.link/20260521095016.2791354-1-libaokun@linux.alibaba.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
3 weeks ago writeback: use a per-sb counter to drain inode wb switches at umount
Baokun Li [Thu, 21 May 2026 09:50:16 +0000 (17:50 +0800)] 
writeback: use a per-sb counter to drain inode wb switches at umount

Tracking in-flight inode wb switches with a single global counter
(isw_nr_in_flight) plus a synchronize_rcu() based wait in
cgroup_writeback_umount() forces every umount to take a global hit
whenever any other superblock on the system has wb switches in flight,
even if the superblock being unmounted has none of its own.

Replace the global synchronize_rcu()/flush_workqueue() pair with a
per-sb counter, s_isw_nr_in_flight, plus three small helpers:

  - cgroup_writeback_pin(sb)   - increment counter
  - cgroup_writeback_unpin(sb) - decrement and wake drainer if last
  - cgroup_writeback_drain(sb) - wait for counter to reach zero

The wiring is:

  - inode_prepare_wbs_switch() pins before checking SB_ACTIVE and
    grabbing the inode; failure paths unpin before returning.  A
    lockless SB_ACTIVE check at the top of the function lets us skip
    the atomic_inc/smp_mb dance once SB_ACTIVE has been cleared (it
    is monotonic and never set back).
  - process_inode_switch_wbs() unpins after the matching iput().
  - cgroup_writeback_umount() drains the per-sb counter via
    wait_var_event().

The smp_mb() pair between inode_prepare_wbs_switch() and
cgroup_writeback_umount() keeps the SB_ACTIVE / counter ordering:
either the umounter sees a non-zero counter and waits, or the
switcher sees SB_ACTIVE cleared and aborts before grabbing the
inode.
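A userspace sketch of the pin/unpin/drain scheme and its ordering, under stated assumptions: the names mirror the helpers above but the code is hypothetical, wait_var_event()/wake_up_var() are modeled with a pthread mutex and condition variable, and the smp_mb() pairing is modeled with sequentially consistent C11 atomics. In this store-buffering pattern at least one side must observe the other's write, so either the umounter sees a non-zero counter or the switcher sees SB_ACTIVE cleared. Build with -pthread where required.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-superblock model; not the kernel implementation. */
struct sb_model {
	atomic_bool active;		/* stands in for SB_ACTIVE */
	atomic_int isw_nr_in_flight;	/* per-sb in-flight switches */
	pthread_mutex_t lock;
	pthread_cond_t drained;
};

static void sb_model_init(struct sb_model *sb)
{
	atomic_init(&sb->active, true);
	atomic_init(&sb->isw_nr_in_flight, 0);
	pthread_mutex_init(&sb->lock, NULL);
	pthread_cond_init(&sb->drained, NULL);
}

static void wb_pin(struct sb_model *sb)
{
	atomic_fetch_add(&sb->isw_nr_in_flight, 1);	/* seq_cst */
}

/* Decrement; the last unpinner wakes the drainer under the lock so the
 * wakeup cannot slip between the drainer's check and its wait. */
static void wb_unpin(struct sb_model *sb)
{
	if (atomic_fetch_sub(&sb->isw_nr_in_flight, 1) == 1) {
		pthread_mutex_lock(&sb->lock);
		pthread_cond_broadcast(&sb->drained);
		pthread_mutex_unlock(&sb->lock);
	}
}

static void wb_drain(struct sb_model *sb)
{
	pthread_mutex_lock(&sb->lock);
	while (atomic_load(&sb->isw_nr_in_flight) != 0)
		pthread_cond_wait(&sb->drained, &sb->lock);
	pthread_mutex_unlock(&sb->lock);
}

/* Switcher: lockless early exit, then pin before re-checking active. */
static bool prepare_switch(struct sb_model *sb)
{
	if (!atomic_load(&sb->active))
		return false;
	wb_pin(sb);
	if (!atomic_load(&sb->active)) {
		wb_unpin(sb);
		return false;
	}
	return true;	/* caller must wb_unpin() when the switch is done */
}

/* Umounter: clear active first, then wait for pinned switchers. */
static void umount_model(struct sb_model *sb)
{
	atomic_store(&sb->active, false);
	wb_drain(sb);
}
```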

The global isw_nr_in_flight is left in place, since it is still used
to throttle in-flight switches via WB_FRN_MAX_IN_FLIGHT.

The rcu_read_lock() extension in inode_switch_wbs() and
cleanup_offline_cgwb() that the race fix added is no longer needed
and is reverted; the synchronize_rcu() that the race fix added to
cgroup_writeback_umount() is dropped as well.

The following numbers were measured on a 16 vCPU QEMU guest with 4
background superblocks each churning "create memcg -> write 1 MiB ->
rmdir memcg" to keep the global isw_nr_in_flight non-zero.  Latencies
are wall-clock around umount(8); only the target sb's umount is
measured.

Target sb runs its own cgwb churn:

                              p50      p95      p99      max
  global synchronize_rcu()   67.6 ms  88.3 ms  88.3 ms  96.8 ms
  per-sb counter (this)       7.9 ms  10.0 ms  10.0 ms  10.1 ms

Idle target umount latency under cross-sb cgwb-switch pressure:

                              p50      p95      p99      max
  global synchronize_rcu()   62.7 ms  95.4 ms 108.1 ms 108.6 ms
  per-sb counter (this)       5.3 ms   6.9 ms   7.4 ms   7.4 ms
  no-pressure baseline        4.9 ms   5.9 ms   6.3 ms   6.7 ms

8 concurrent umounts of idle sbs under the same pressure:

                              p50      p95      max
  global synchronize_rcu()   61.3 ms  99.5 ms 113.7 ms
  per-sb counter (this)       8.1 ms   9.1 ms   9.5 ms

In-kernel cgroup_writeback_umount() time across the same run
(bpftrace, ~340 calls covering all scenarios):

  global synchronize_rcu()    12371 ms total (~36 ms / call)
  per-sb counter (this)        1.37 ms total ( ~4 us / call)

Suggested-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/r/177910456953.488929.2169908940676707307.b4-review@b4
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Baokun Li <libaokun@linux.alibaba.com>
Link: https://patch.msgid.link/20260521095016.2791354-4-libaokun@linux.alibaba.com
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
writeback: drop now-unnecessary rcu_barrier() in cgroup_writeback_umount()
Baokun Li [Thu, 21 May 2026 09:50:15 +0000 (17:50 +0800)]

Commit e1b849cfa6b6 ("writeback: Avoid contention on wb->list_lock when
switching inodes") replaced the queue_rcu_work() based scheduling of
inode wb switches with a plain queue_work().  Since then no switcher
goes through call_rcu(), so rcu_barrier() in cgroup_writeback_umount()
has no callbacks of its own to wait for.  It still drains unrelated
call_rcu() callbacks from other subsystems on busy systems, which
incidentally slows umount down; drop it.

Fixes: e1b849cfa6b6 ("writeback: Avoid contention on wb->list_lock when switching inodes")
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Baokun Li <libaokun@linux.alibaba.com>
Link: https://patch.msgid.link/20260521095016.2791354-3-libaokun@linux.alibaba.com
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
writeback: fix race between cgroup_writeback_umount() and inode_switch_wbs()
Baokun Li [Thu, 21 May 2026 09:50:14 +0000 (17:50 +0800)]

When a container exits, the following BUG_ON() is occasionally triggered:

==================================================================
 VFS: Busy inodes after unmount of sdb (ext4)
 ------------[ cut here ]------------
 kernel BUG at fs/super.c:695!
 CPU: 3 PID: 6 Comm: containerd-shim Tainted: G OE K 6.6 #1
 pstate: 63400009 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
 pc : generic_shutdown_super+0xf0/0x100
 lr : generic_shutdown_super+0xf0/0x100
 Call trace:
  generic_shutdown_super+0xf0/0x100
  kill_block_super+0x20/0x48
  ext4_kill_sb+0x28/0x60
  deactivate_locked_super+0x54/0x130
  deactivate_super+0x84/0xa0
  cleanup_mnt+0xa4/0x140
  __cleanup_mnt+0x18/0x28
  task_work_run+0x78/0xe0
  do_notify_resume+0x204/0x240
==================================================================

The root cause is a race between cgroup_writeback_umount() and
inode_switch_wbs()/cleanup_offline_cgwb(). There is a window between
inode_prepare_wbs_switch() returning true and the subsequent
wb_queue_isw() call. Following is the process that triggers the issue:

      CPU A (umount)           |          CPU B (writeback)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                 inode_switch_wbs/cleanup_offline_cgwb
                                  atomic_inc(&isw_nr_in_flight)
                                  inode_prepare_wbs_switch
                                   -> passes SB_ACTIVE check
                                   __iget(inode)
 generic_shutdown_super
  sb->s_flags &= ~SB_ACTIVE
  cgroup_writeback_umount(sb)
   smp_mb()
   atomic_read(&isw_nr_in_flight)
   rcu_barrier()
    -> no pending RCU callbacks
   flush_workqueue(isw_wq)
    -> nothing queued, returns
  evict_inodes(sb)
   -> Inode skipped as isw still holds a ref.
  sop->put_super(sb)
   /* destroys percpu counters */
  -> VFS: Busy inodes after unmount!
                                  wb_queue_isw()
                                   queue_work(isw_wq, ...)
                                  /* later in work function */
                                  inode_switch_wbs_work_fn
                                   process_inode_switch_wbs
                                    iput() -> evict
                                     percpu_counter_dec() // UAF!

Fix this by extending the RCU read-side critical section in
inode_switch_wbs() and cleanup_offline_cgwb() to cover from
inode_prepare_wbs_switch() through wb_queue_isw().  Since there is
no sleep in this window, rcu_read_lock() can be used.  Then add a
synchronize_rcu() in cgroup_writeback_umount() before the existing
rcu_barrier(), so that all in-flight switchers that have passed the
SB_ACTIVE check have completed queue_work() before flush_workqueue()
is called.

The existing rcu_barrier() is intentionally retained so this fix can
be backported unchanged to stable kernels (5.10.y, 6.6.y, ...) that
still queue switches via queue_rcu_work(). It is a no-op on current
mainline (since commit e1b849cfa6b6 ("writeback: Avoid contention on
wb->list_lock when switching inodes")) and is removed in a follow-up
patch.

Fixes: a1a0e23e4903 ("writeback: flush inode cgroup wb switches instead of pinning super_block")
Cc: stable@vger.kernel.org
Suggested-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/all/mxnjq2l6guusfchvauxr3v7c4bwjasybxlleqbbh4efloeqspz@iqylk76ohufz
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Baokun Li <libaokun@linux.alibaba.com>
Link: https://patch.msgid.link/20260521095016.2791354-2-libaokun@linux.alibaba.com
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
rust_binder: Avoid holding lock when dropping delivered_death
Matthew Maurer [Fri, 3 Apr 2026 18:18:58 +0000 (18:18 +0000)]

In commit 6c37bebd8c92 ("rust_binder: avoid mem::take on
delivered_deaths"), we switched to looping over the list and dropping
each individual node, ostensibly without the lock held in the loop body.

If the kernel were using Rust Edition 2024, the comment would be
accurate, and the lock would not be held across the drop. However, the
kernel is currently using 2021, so tail expression lifetime extension
results in the lock being held across the drop. Explicitly binding the
expression result to a variable makes the lock guard no longer part of a
tail expression, causing the lock to be dropped before entering the loop
body.

This was detected via `CONFIG_PROVE_LOCKING` identifying an invalid wait
context at the drop site.
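The Rust temporary-scope rule has no C counterpart, but the loop shape
the fix restores, popping one entry under the lock and dropping it only
after the lock is released, can be sketched in C as an analogue.
struct death_list, drain_deaths(), drop_death() and fill_and_drain()
are hypothetical names, not the binder code; the destructor uses
pthread_mutex_trylock() to record whether it ran with the list lock
held.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct death {
	struct death *next;
};

struct death_list {
	pthread_mutex_t lock;
	struct death *head;
	int dropped;		/* entries freed so far */
	int dropped_locked;	/* entries freed while the lock was held */
};

/* Destructor stand-in: notes whether the list lock was held while it ran.
 * trylock on a default mutex fails with EBUSY if any thread holds it. */
static void drop_death(struct death_list *l, struct death *d)
{
	if (pthread_mutex_trylock(&l->lock) == 0)
		pthread_mutex_unlock(&l->lock);
	else
		l->dropped_locked++;
	l->dropped++;
	free(d);
}

/* Pop one entry under the lock, release the lock, then drop the entry. */
static void drain_deaths(struct death_list *l)
{
	for (;;) {
		pthread_mutex_lock(&l->lock);
		struct death *d = l->head;
		if (d)
			l->head = d->next;
		pthread_mutex_unlock(&l->lock);

		if (!d)
			break;
		drop_death(l, d);	/* lock is not held here */
	}
}

/* Builds a list of n entries and drains it. */
static void fill_and_drain(struct death_list *l, int n)
{
	assert(n >= 0);
	for (int i = 0; i < n; i++) {
		struct death *d = malloc(sizeof(*d));

		if (!d)
			abort();
		d->next = l->head;
		l->head = d;
	}
	drain_deaths(l);
}
```

Calling drop_death() between pthread_mutex_lock() and
pthread_mutex_unlock() instead would bump dropped_locked, which is the
C analogue of the bug being fixed.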

Reported-by: David Stevens <stevensd@google.com>
Signed-off-by: Matthew Maurer <mmaurer@google.com>
Cc: stable <stable@kernel.org>
Fixes: 6c37bebd8c92 ("rust_binder: avoid mem::take on delivered_deaths")
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Acked-by: Carlos Llamas <cmllamas@google.com>
Link: https://patch.msgid.link/20260403-lockhold-v1-1-c332b56cd8ae@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
rust_binder: avoid calling pending_oneway_finished() on TF_UPDATE_TXN
Alice Ryhl [Tue, 14 Apr 2026 12:02:34 +0000 (12:02 +0000)]

When an outdated transaction is removed from `oneway_todo` due to
`TF_UPDATE_TXN`, its `Allocation` is dropped. The current implementation
of `Allocation::drop` calls `pending_oneway_finished()`, assuming the
transaction was executed. This leads to premature execution of the next
queued one-way transaction.

Fix this by taking the `oneway_node` from the `Allocation` of the
outdated transaction before it is dropped. This prevents
`Allocation::drop` from signaling completion.

We do not call `take_oneway_node()` from `Transaction::cancel` because
it's actually correct to call `pending_oneway_finished()` on cancel if
the transaction did not come from `oneway_todo`. This ensures that if
`BINDER_THREAD_EXIT` is invoked and cancels a oneway transaction, then
the next transaction is taken from `oneway_todo`.
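The take-before-drop idea can be sketched in C, with a nullable pointer
playing the role of Rust's Option. All names here (struct allocation,
take_oneway_node(), allocation_drop(), drop_outdated(), and a counter
standing in for pending_oneway_finished()) are hypothetical and only
mirror the description above; this is not the Rust Binder code.

```c
#include <assert.h>
#include <stdlib.h>

struct oneway_node {
	int id;
};

struct allocation {
	struct oneway_node *oneway_node;	/* NULL once taken */
};

static int finished_calls;	/* times the next oneway txn was released */

/* Stand-in for pending_oneway_finished(). */
static void pending_oneway_finished(struct oneway_node *node)
{
	(void)node;
	finished_calls++;
}

/* Move the node out and leave NULL behind, like Option::take(). */
static struct oneway_node *take_oneway_node(struct allocation *a)
{
	struct oneway_node *node = a->oneway_node;

	a->oneway_node = NULL;
	return node;
}

/* Destructor: signals completion only if it still owns the node. */
static void allocation_drop(struct allocation *a)
{
	if (a->oneway_node) {
		pending_oneway_finished(a->oneway_node);
		free(a->oneway_node);
	}
	free(a);
}

/* TF_UPDATE_TXN path: discard the outdated txn without signalling. */
static void drop_outdated(struct allocation *a)
{
	free(take_oneway_node(a));
	allocation_drop(a);
}

static struct allocation *new_oneway_allocation(int id)
{
	struct allocation *a = malloc(sizeof(*a));
	struct oneway_node *node = malloc(sizeof(*node));

	assert(a && node);	/* sketch: no real error handling */
	node->id = id;
	a->oneway_node = node;
	return a;
}
```

The completion and cancel paths keep calling allocation_drop() directly,
so they still release the next queued oneway transaction.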

This bug does not lead to any issues in the kernel, but may lead to
Binder delivering transactions to userspace earlier than userspace
expected to receive them.

Cc: stable <stable@kernel.org>
Fixes: eafedbc7c050 ("rust_binder: add Rust Binder driver")
Assisted-by: Antigravity:gemini
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Acked-by: Carlos Llamas <cmllamas@google.com>
Link: https://patch.msgid.link/20260414-tf-update-txn-fix-v1-1-d2b83303acc9@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
serial: dz: Enable modular build
Maciej W. Rozycki [Wed, 6 May 2026 22:42:56 +0000 (23:42 +0100)]

Enable modular build since the driver now has a proper module_exit()
handler.  There's nothing specific to DZ hardware to prevent driver
unloading and reloading from working.

Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Link: https://patch.msgid.link/alpine.DEB.2.21.2605062331420.46195@angie.orcam.me.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
serial: zs: Convert to use a platform device
Maciej W. Rozycki [Wed, 6 May 2026 22:42:52 +0000 (23:42 +0100)]

Prevent a crash from happening as the first serial port is initialised:

  Console: switching to mono frame buffer device 160x64
  fb0: PMAG-AA frame buffer device at tc0
  DECstation Z85C30 serial driver version 0.10
  CPU 0 Unable to handle kernel paging request at virtual address 0000002c, epc == 803ab00c, ra == 803aafe0
  Oops[#1]:
  CPU: 0 PID: 1 Comm: swapper Not tainted 6.4.0-rc3-00031-g84a9582fd203-dirty #57
  $ 0   : 00000000 10012c00 803aaeb0 00000000
  $ 4   : 80e12f60 80e12f50 80e12f58 81000030
  $ 8   : 00000000 805ff37c 00000000 33433538
  $12   : 65732030 00000006 80c2915d 6c616972
  $16   : 80e12f00 807b7630 00000000 00000000
  $20   : 00000004 00000348 000001a0 807623b8
  $24   : 00000018 00000000
  $28   : 80c24000 80c25d60 8078b148 803aafe0
  Hi    : 00000000
  Lo    : 00000000
  epc   : 803ab00c serial_base_ctrl_add+0x78/0xf4
  ra    : 803aafe0 serial_base_ctrl_add+0x4c/0xf4
  Status: 10012c03 KERNEL EXL IE
  Cause : 00000008 (ExcCode 02)
  BadVA : 0000002c
  PrId  : 00000440 (R4400SC)
  Modules linked in:
  Process swapper (pid: 1, threadinfo=(ptrval), task=(ptrval), tls=00000000)
  Stack : 80760000 00000cc0 00400044 00400040 803aa02c 80d61ab8 00000000 807b7630
          80760000 807623b8 807b7628 803aa644 80386998 00000000 80e17780 80220f68
          80e17780 80d61ab8 80c17d80 80e17780 80e17780 8063c798 80e17780 80383fa0
          00000010 80e17780 00000000 80386998 807a0000 00000000 00400040 8038f848
          807623b8 80d61ab8 00000004 80e17780 00000000 803a68e4 80c25e2c 803bb884
          ...
  Call Trace:
  [<803ab00c>] serial_base_ctrl_add+0x78/0xf4
  [<803aa644>] serial_core_register_port+0x174/0x69c
  [<8077e9ac>] zs_init+0xc8/0xfc
  [<800404d4>] do_one_initcall+0x40/0x2ac
  [<8076cecc>] kernel_init_freeable+0x1e4/0x270
  [<80605bec>] kernel_init+0x20/0x108
  [<800431e8>] ret_from_kernel_thread+0x14/0x1c

  Code: 2442aeb0  ae120024  ae0200d0 <8c67002c> 50e00001  8c670000  3c06806e  3c05806e  afb30010

  ---[ end trace 0000000000000000 ]---

(report taken at the offending commit), where a pointer that has been
derived from a NULL pointer to the port's parent device is dereferenced.

Since no device is available with legacy probing, which is no longer the
preferred way to discover devices anyway, switch the driver to using a
platform device and use it as the port's parent device.  Update resource
handling accordingly and only request the actual span of addresses used
within the slot, whose resource will already have been requested by
generic platform device code.

Use platform_driver_probe() not just because SCC devices are soldered
onto the board and not straightforward to remove, but foremost because
the associated TTY's major device number is the same as used by the dz
driver and the first driver to claim it will prevent the other one from
using it.  Either one DZ device or some SCC devices will be present in a
given system but never both at a time, and therefore we want the major
device number to be claimed by the first driver to actually successfully
bind to its device and platform_driver_probe() is a way to fulfil that.

An unfortunate consequence of the switch to a platform device is that we
now hand the console over from the bootconsole much later in the
bootstrap.  The firmware console handler appears to be good enough to
work this late, though, and in particular with interrupts enabled.

Since there is now only one remaining way to reach zs_reset(), remove
the port initialisation marker as it is no longer needed and go through
the channel reset unconditionally.

Fixes: 84a9582fd203 ("serial: core: Start managing serial controllers to enable runtime PM")
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Cc: stable@vger.kernel.org # needs to use .remove_new for <= 6.10
Link: https://patch.msgid.link/alpine.DEB.2.21.2605062328480.46195@angie.orcam.me.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>