Linus Torvalds [Fri, 22 May 2026 13:40:31 +0000 (06:40 -0700)]
Merge tag 's390-7.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Alexander Gordeev:
- Fix PAI NNPA mismatch between counting and recording, where sampling
reports twice the value
- Fix loss of PAI counter increments during recording on systems with
many CPUs under heavy load, while counting is not affected
- On some supported machines, CHSC cannot access memory outside the DMA
zone, causing CHSC command failures. Restore GFP_DMA flag when
allocating memory for CHSC control blocks
- Align the numbering scheme for higher-level topology structures like
socket, book, drawer with other hardware identifiers e.g. in sysfs,
procfs and tools like lscpu
* tag 's390-7.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/topology: Use zero-based numbering for containing entities
s390/cio: Restore GFP_DMA for CHSC allocation
s390/pai: Fix missing PAI counter increments under heavy load
s390/pai: Disable duplicate read of kernel PAI counter value
Linus Torvalds [Fri, 22 May 2026 13:23:56 +0000 (06:23 -0700)]
Merge tag 'slab-for-7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab fix from Vlastimil Babka:
- Stable fix for a missing cpus_read_lock in one of the cpu sheaves
flushing paths (Qing Wang)
* tag 'slab-for-7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
mm/slub: hold cpus_read_lock around flush_rcu_sheaves_on_cache()
Aleksandr Nogikh [Thu, 21 May 2026 14:22:40 +0000 (16:22 +0200)]
signal: clear JOBCTL_PENDING_MASK for caller in zap_other_threads()
When a multi-threaded process receives a stop signal (e.g., SIGSTOP),
do_signal_stop() sets JOBCTL_STOP_PENDING and JOBCTL_STOP_CONSUME on all
threads and sets signal->group_stop_count to the number of threads. If
one of the threads concurrently calls execve(), de_thread() invokes
zap_other_threads() to kill all other threads. zap_other_threads()
aborts the pending group stop by resetting signal->group_stop_count to 0
and clears the JOBCTL_PENDING_MASK for all other threads. However, it
fails to clear the job control flags for the calling thread.
When execve() completes, the calling thread returns to user mode and
checks for pending signals. Seeing the stale JOBCTL_STOP_PENDING flag,
it calls do_signal_stop(), which invokes task_participate_group_stop().
Since JOBCTL_STOP_CONSUME is still set, it attempts to decrement the
already-zero signal->group_stop_count, triggering a warning:
Fix this race condition by clearing the JOBCTL_PENDING_MASK for the
calling thread in zap_other_threads(), ensuring it does not retain any
stale job control state after the thread group is destroyed. This aligns
with other functions that tear down a thread group and abort group
stops, such as zap_process() and complete_signal(), which correctly
clear these flags for all threads including the current one.
Fixes: 39efa3ef3a37 ("signal: Use GROUP_STOP_PENDING to stop once for a single group stop") Assisted-by: Gemini:gemini-3.1-pro-preview Gemini:gemini-3-flash-preview syzbot Reported-by: syzbot+b109633ea805cac54a61@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=b109633ea805cac54a61 Link: https://syzkaller.appspot.com/ai_job?id=d70208cc-862b-4fe3-bf02-3031e10cd0b3 Signed-off-by: Aleksandr Nogikh <nogikh@google.com> Link: https://patch.msgid.link/20260521142240.2973022-1-nogikh@google.com Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
Jann Horn [Tue, 19 May 2026 14:29:38 +0000 (16:29 +0200)]
fuse: reject fuse_notify() pagecache ops on directories
The operations FUSE_NOTIFY_STORE and FUSE_NOTIFY_RETRIEVE allow the
FUSE daemon to actively write/read pagecache contents.
For directories with FOPEN_CACHE_DIR, the pagecache is used as
kernel-internal cache storage, and userspace is not supposed to have
direct access to this cache - in particular, fuse_parse_cache() will hit
WARN_ON() if the cache contains bogus data.
Reject FUSE_NOTIFY_STORE and FUSE_NOTIFY_RETRIEVE on anything other than
regular files with -EINVAL.
Jann Horn [Tue, 19 May 2026 14:40:34 +0000 (16:40 +0200)]
fuse: limit FUSE_NOTIFY_RETRIEVE to uptodate folios
FUSE_NOTIFY_RETRIEVE must be limited to uptodate folios; !uptodate folios
can contain uninitialized data.
Since FUSE_NOTIFY_RETRIEVE is intended to only return data that is already
in the page cache and not wait for data from the FUSE daemon, treat
!uptodate folios as if they weren't present.
This only has security impact on systems that don't enable automatic
zero-initialization of all page allocations via
CONFIG_INIT_ON_ALLOC_DEFAULT_ON or init_on_alloc=1.
Linus Torvalds [Fri, 22 May 2026 13:16:00 +0000 (06:16 -0700)]
Merge tag 'dma-mapping-7.1-2026-05-22' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux
Pull dma-mapping fixes from Marek Szyprowski:
"Two minor updates for the DMA-mapping code, mainly fixing some rare
corner cases (Petr Tesarik, Jianpeng Chang)"
* tag 'dma-mapping-7.1-2026-05-22' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux:
dma-mapping: move dma_map_resource() sanity check into debug code
dma-direct: fix use of max_pfn
Linus Torvalds [Fri, 22 May 2026 13:09:58 +0000 (06:09 -0700)]
Merge tag 'trace-v7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Avoid NULL return from hist_field_name()
The function hist_field_name() is directly passed to a strcat() which
does not handle "NULL" characters. Return a zero length string when
size is greater than the limit.
This is used only to output already created histograms and no field
currently is greater than the limit. But it should still not return
NULL.
- Do not call map->ops->elt_free() on allocation failure
When elt_alloc() fails, it should not call the map->ops->elt_free()
function if it exists, as that function may not be able to handle the
free on allocation failures. The ->elt_free() should only be called
when elt_alloc() succeeds.
* tag 'trace-v7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Do not call map->ops->elt_free() if elt_alloc() fails
tracing: Avoid NULL return from hist_field_name() on truncation
The newly added driver requires the LED classdev support
and causes a link failure when that is disabled:
x86_64-linux-ld: vmlinux.o: in function `bitland_mifs_wmi_probe':
bitland-mifs-wmi.c:(.text+0xede02a): undefined reference to `devm_led_classdev_register_ext'
Fixes: dc1ec4fa86b2 ("platform/x86: bitland-mifs-wmi: Add new Bitland MIFS WMI driver") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://patch.msgid.link/20260519202804.1339581-1-arnd@kernel.org Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
This series converts mutex and spinlock handling in the STM drivers
to use guard() helpers.
The changes are code cleanup only and should have no functional impact.
Randy Dunlap [Tue, 19 May 2026 04:23:14 +0000 (21:23 -0700)]
eventpoll: add missing kernel-doc for @ctx function parameters
Add the missing kernel-doc comments to prevent kernel-doc build
warnings while building the documentation.
WARNING: fs/eventpoll.c:1684 function parameter 'ctx' not described in 'reverse_path_check'
WARNING: fs/eventpoll.c:2349 function parameter 'ctx' not described in 'ep_loop_check_proc'
The MT8189 AFE probe assigns reserved memory with
of_reserved_mem_device_init(), but only releases that assignment from
.remove(). If probe fails after the reserved memory has been assigned,
the assignment record is left behind.
The probe path also uses pm_runtime_get_sync() without checking its
return value. If runtime resume fails, pm_runtime_get_sync() leaves the
usage count incremented and the driver continues initialization without
the device being resumed. Use pm_runtime_resume_and_get() so resume
errors abort probe without leaking a PM usage count.
Finally, component registration failure currently jumps to a label that
drops a runtime PM reference even though the temporary probe reference
was already released. Return the component registration error directly,
and do not drop an unmatched PM reference from .remove().
Thorsten Blum [Sun, 17 May 2026 16:27:40 +0000 (18:27 +0200)]
crypto: atmel-sha204a - fail on hwrng registration error in probe path
Commit 13909a0c8897 ("crypto: atmel-sha204a - provide the otp content")
overwrote the hwrng registration return value when creating the sysfs
group, which allowed atmel_sha204a_probe() to succeed even if
devm_hwrng_register() failed.
Return immediately when devm_hwrng_register() fails, and report both
hwrng and sysfs registration errors with dev_err(). Adjust the sysfs
error log message for consistency.
Fixes: 13909a0c8897 ("crypto: atmel-sha204a - provide the otp content") Cc: stable@vger.kernel.org Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Thorsten Blum [Sun, 17 May 2026 12:37:07 +0000 (14:37 +0200)]
crypto: atmel-sha204a - remove sysfs group before hwrng
atmel_sha204a_probe() registers the hwrng before creating the sysfs
group. Mirror this order in atmel_sha204a_remove() by removing the sysfs
group before unregistering the hwrng.
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Thorsten Blum [Sun, 17 May 2026 10:34:14 +0000 (12:34 +0200)]
crypto: omap-des - add COMPILE_TEST and fix CONFIG_OF=n build
CRYPTO_DEV_OMAP_DES only depends on ARCH_OMAP2PLUS, which is ARM-only
and selects OF via ARM's USE_OF, making any non-OF code unreachable.
Add COMPILE_TEST so the driver can be built with CONFIG_OF=n, making the
non-OF code reachable.
Fix the resulting non-OF build failures:
- omap_des_irq() was defined inside a CONFIG_OF block, but is referenced
unconditionally from omap_des_probe(). Move the CONFIG_OF guard so it
only covers omap_des_get_of().
- The non-OF omap_des_get_of() stub took a struct device *, while
omap_des_probe() passes a struct platform_device *. Make the stub
prototype match the OF implementation and the caller.
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The last MIPS crypto code was moved to lib/crypto/mips in
commit c9e5ac0ab9d1 ("lib/crypto: mips/md5: Migrate optimized code into
library"). However, arch/mips/crypto still contains stub Kconfig,
Makefile, and .gitignore files. Remove these unnecessary files.
Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
All LoongArch crypto code was moved to arch/loongarch/lib in
commit 72f51a4f4b07 ("loongarch/crc32: expose CRC32 functions through
lib"). However, arch/loongarch/crypto still contains stub Kconfig and
Makefile files. Remove these unnecessary files.
Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Costa Shulyupin [Fri, 15 May 2026 19:02:14 +0000 (22:02 +0300)]
include: Remove unused crypto-ux500.h
The UX500 crypto drivers were removed in commit 453de3eb08c4
("crypto: ux500/cryp - delete driver") and commit dd7b7972cb89
("crypto: ux500/hash - delete driver"). No file includes
this header.
Assisted-by: Claude:claude-opus-4-6 Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Reviewed-by: Linus Walleij <linusw@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Mikko Perttunen [Fri, 15 May 2026 02:34:52 +0000 (11:34 +0900)]
crypto: tegra - Don't touch bo refcount in host1x bo pin/unpin
Since commit "gpu: host1x: Allow entries in BO caches to be freed",
host1x_bo_pin() and host1x_bo_unpin() handle the bo's refcount
themselves. .pin/.unpin callbacks should not adjust it.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Lukas Wunner [Thu, 14 May 2026 06:55:58 +0000 (08:55 +0200)]
X.509: Fix validation of ASN.1 certificate header
x509_load_certificate_list() seeks to enforce that a certificate starts
with 0x30 0x82 (ASN.1 SEQUENCE tag followed by a length of more than 256
and less than 65535 bytes).
But it only enforces that *either* of those two byte values are present,
instead of checking for the *conjunction* of the two values. Fix it.
Fixes: 631cc66eb9ea ("MODSIGN: Provide module signing public keys to the kernel") Reported-by: Sashiko <sashiko-bot@kernel.org> Closes: https://lore.kernel.org/r/20260508033917.B5873C2BCB0@smtp.kernel.org/ Signed-off-by: Lukas Wunner <lukas@wunner.de> Cc: stable@vger.kernel.org # v3.7+ Reviewed-by: Ignat Korchagin <ignat@linux.win> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
A reset requested through /sys/bus/pci/devices/.../reset invokes the
driver reset_prepare() and reset_done() callbacks. The QAT driver does
not implement those callbacks today, so the reset proceeds without
quiescing the device or bringing it back up afterward, which leaves
the device unusable.
Hook reset_prepare() and reset_done() into adf_err_handler so the
common shutdown and recovery flow also runs for reset. Skip device
quiesce if the device is already in a down state.
Cc: stable@vger.kernel.org Signed-off-by: Ahsan Atta <ahsan.atta@intel.com> Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by: Damian Muszynski <damian.muszynski@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Ahsan Atta [Wed, 13 May 2026 15:16:58 +0000 (17:16 +0200)]
crypto: qat - factor out AER reset helpers
Move the shutdown and recovery sequences out of adf_error_detected()
and adf_slot_reset() into reset_prepare() and reset_done() helpers.
This makes the AER recovery path easier to follow and prepares the
common reset flow for reuse by additional PCI reset callbacks without
duplicating the logic.
No functional change intended.
Cc: stable@vger.kernel.org Signed-off-by: Ahsan Atta <ahsan.atta@intel.com> Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by: Damian Muszynski <damian.muszynski@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Ahsan Atta [Wed, 13 May 2026 15:16:57 +0000 (17:16 +0200)]
crypto: qat - skip restart for down devices
Skip the shutdown and restart flow when adf_slot_reset() is entered
for a device that is already down. In that case, leave
ADF_STATUS_RESTARTING clear and let adf_slot_reset() restore PCI
function state without calling adf_dev_up(), re-enabling SR-IOV, or
sending restarted notifications.
This is in preparation for adding reset_prepare() and reset_done()
callbacks in adf_aer.c.
Cc: stable@vger.kernel.org Signed-off-by: Ahsan Atta <ahsan.atta@intel.com> Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Ahsan Atta [Wed, 13 May 2026 15:16:56 +0000 (17:16 +0200)]
crypto: qat - centralize bus master enable
QAT driver currently toggles PCI bus mastering in multiple places
(probe paths, and reset callbacks). This makes BME state depend on
call ordering and on what PCI command bits were captured in saved PCI
config state.
Make BME control explicit and deterministic:
- remove pci_set_master() from device-specific probe paths
- add adf_set_bme() and call it from adf_dev_init() so BME is enabled
at one point before device bring-up
- drop redundant pci_set_master() and pci_clear_master from adf_aer.c
and rely on the unified init path for BME enablement
This is in preparation for adding reset_prepare() and reset_done()
hooks. In the PCI reset callback flow, the PCI core saves and
restores device configuration state around reset_prepare() and
reset_done(). This change is needed to ensure that we are able to
properly shutdown or reinitialize the device post sysfs triggered
resets.
Cc: stable@vger.kernel.org Signed-off-by: Ahsan Atta <ahsan.atta@intel.com> Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Suggested-by: Andy Shevchenko <andriy.shevchenko@intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Ahsan Atta [Wed, 13 May 2026 15:16:55 +0000 (17:16 +0200)]
crypto: qat - notify fatal error before AER reset preparation
Send fatal error notifications to subsystems and VFs as soon as
AER error detection starts, before entering the reset preparation
shutdown sequence.
This reduces notification latency and ensures peers are informed
immediately on fatal detection, rather than after restart-state setup
and arbitration teardown.
Cc: stable@vger.kernel.org Signed-off-by: Ahsan Atta <ahsan.atta@intel.com> Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by: Damian Muszynski <damian.muszynski@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Ahsan Atta [Wed, 13 May 2026 15:16:54 +0000 (17:16 +0200)]
crypto: qat - keep VFs enabled during reset
When a reset is triggered via sysfs, the PCI core invokes the
reset_prepare() callback while holding pci_dev_lock(), which includes
the PCI configuration space access semaphore. If reset_prepare() calls
adf_dev_down(), the call chain adf_dev_stop() -> adf_disable_sriov()
-> pci_disable_sriov() attempts to acquire the same semaphore,
resulting in a deadlock.
Avoid this by skipping pci_disable_sriov() when ADF_STATUS_RESTARTING
is set. During reset the PCI topology is preserved, so VF devices
remain valid and enumerated across the reset. VF notification and the
quiesce handshake via adf_pf2vf_notify_restarting() are still
performed unconditionally so that VFs stop submitting work before the
PF shuts down.
Correspondingly, skip pci_enable_sriov() in adf_enable_sriov() when
VFs are already present, since their PCI devices were preserved from
before the restart.
This is in preparation for adding reset_prepare() and reset_done()
callbacks in adf_aer.c.
Cc: stable@vger.kernel.org Signed-off-by: Ahsan Atta <ahsan.atta@intel.com> Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by: Damian Muszynski <damian.muszynski@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Giovanni Cabiddu [Wed, 13 May 2026 14:47:32 +0000 (15:47 +0100)]
crypto: qat - fix VF2PF work teardown race in adf_disable_sriov()
The VF2PF interrupt handler queues PF-side response work that stores a
raw pointer to per-VF state (struct adf_accel_vf_info). Currently,
adf_disable_sriov() destroys per-VF mutexes and frees vf_info without
stopping new VF2PF work or waiting for in-flight workers to complete. A
concurrently scheduled or already queued worker can then dereference
freed memory.
This manifests as a use-after-free when KASAN is enabled:
BUG: KASAN: null-ptr-deref in mutex_lock+0x76/0xe0
Write of size 8 at addr 0000000000000260 by task kworker/24:2/...
Workqueue: qat_pf2vf_resp_wq adf_iov_send_resp [intel_qat]
Call Trace:
kasan_report+0x119/0x140
mutex_lock+0x76/0xe0
adf_gen4_pfvf_send+0xd4/0x1f0 [intel_qat]
adf_recv_and_handle_vf2pf_msg+0x290/0x360 [intel_qat]
adf_iov_send_resp+0x8c/0xe0 [intel_qat]
process_one_work+0x6ac/0xfd0
worker_thread+0x4dd/0xd30
kthread+0x326/0x410
ret_from_fork+0x33b/0x670
Add a PF-local flag, vf2pf_disabled, that gates work queueing, worker
processing, and interrupt re-enabling during teardown. Set this flag
atomically with the hardware interrupt mask inside
adf_disable_all_vf2pf_interrupts(). After masking, synchronize the AE
cluster MSI-X interrupt and flush the PF response workqueue before
tearing down per-VF locks and state so all in-flight work completes
before vf_info is destroyed.
Introduce adf_enable_all_vf2pf_interrupts() to clear the flag and
unmask all VF2PF interrupts under the same lock when SR-IOV is
re-enabled. This ensures the software flag and hardware state transition
atomically on both the enable and disable paths.
Cc: stable@vger.kernel.org Fixes: ed8ccaef52fa ("crypto: qat - Add support for SRIOV") Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by: Ahsan Atta <ahsan.atta@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
crypto: ecc - Fix carry overflow in vli multiplication
The carry flag calculation fails when r01.m_high is saturated
(0xFFFFFFFFFFFFFFFF) and addition of lower bits overflows.
The condition (r01.m_high < product.m_high) doesn't handle the case
where r01.m_high == product.m_high and an additional carry exists
from lower-bit overflow.
When commit 3c4b23901a0c ("crypto: ecdh - Add ECDH software support")
introduced crypto/ecc.c, it split the muladd() function in the
micro-ecc library into separate mul_64_64() and add_128_128() helpers.
It seems the check got lost in translation.
Add proper handling for this boundary by accounting for the carry
from the lower addition.
Giovanni Cabiddu [Wed, 13 May 2026 08:57:45 +0000 (09:57 +0100)]
crypto: qat - remove MODULE_VERSION
In-tree drivers do not need MODULE_VERSION as the kernel release
identifies the version of their code. The static version "0.6.0", which
the QAT drivers currently report, can be misleading as it might suggest
the drivers are outdated.
Remove MODULE_VERSION() from all QAT driver modules and the related
ADF_DRV_VERSION, ADF_MAJOR_VERSION, ADF_MINOR_VERSION and
ADF_BUILD_VERSION macros from adf_common_drv.h.
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by: Ahsan Atta <ahsan.atta@intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Giovanni Cabiddu [Mon, 11 May 2026 10:04:09 +0000 (11:04 +0100)]
crypto: qat - rename adf_ctl_drv.c to adf_module.c
Now that the character device and IOCTL interface have been removed,
adf_ctl_drv.c only contains module_init/module_exit hooks. Rename it
to adf_module.c to better reflect its purpose and rename the init/exit
functions accordingly.
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by: Ahsan Atta <ahsan.atta@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Giovanni Cabiddu [Mon, 11 May 2026 10:04:08 +0000 (11:04 +0100)]
crypto: qat - remove unused character device and IOCTLs
The QAT driver exposes a character device (qat_adf_ctl) with IOCTLs
for device configuration, start, stop, status query and enumeration.
These IOCTLs are not part of any public uAPI header and have no known
in-tree or out-of-tree users. Device lifecycle is already managed via
sysfs.
The ioctl interface also increases the attack surface and is the
subject of a number of bug reports.
Remove the character device, the IOCTL definitions, and the related
data structures (adf_dev_status_info, adf_user_cfg_key_val,
adf_user_cfg_section, adf_user_cfg_ctl_data). Drop the now-unused
adf_cfg_user.h header and strip adf_ctl_drv.c down to the minimal
module_init/module_exit hooks for workqueue, AER, and crypto/compression
algorithm registration.
Clean up leftover dead code that was only reachable from the removed
IOCTL paths: adf_cfg_del_all(), adf_devmgr_verify_id(),
adf_devmgr_get_num_dev(), adf_devmgr_get_dev_by_id(),
adf_get_vf_real_id() and the unused ADF_CFG macros.
Additionally, drop the entry associated to QAT IOCTLs in
ioctl-number.rst.
lizhi [Mon, 11 May 2026 00:49:27 +0000 (08:49 +0800)]
crypto: hisilicon/sec2 - lower priority for hisilicon crypto implementations
Lower the priority of HiSilicon's crypto implementations to allow more
suitable alternatives to be selected. For example, certain kernel
use-cases do not benefit from HiSilicon's symmetric crypto algorithms.
This change ensures that more appropriate options are chosen first while
retaining HiSilicon's implementations as alternatives.
Signed-off-by: lizhi <lizhi206@huawei.com> Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com> Reviewed-by: Longfang Liu <liulongfang@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Thomas Weißschuh [Tue, 12 May 2026 16:39:14 +0000 (18:39 +0200)]
driver core: Allow the constification of device attributes
Allow device attribute to reside in read-only memory.
Both const and non-const attributes are handled by the utility macros
and attributes can be migrated one-by-one.
Thomas Weißschuh [Tue, 12 May 2026 16:39:13 +0000 (18:39 +0200)]
driver core: Stop using generic sysfs macros for device attributes
The constification of device attributes will require a transition phase,
where 'struct device_attribute' contains a classic non-const and a new
const variant of the 'show' and 'store' callbacks.
As __ATTR() and friends can not handle this duplication stop using them.
Herve Codina [Mon, 11 May 2026 15:57:50 +0000 (17:57 +0200)]
driver core: Avoid warning when removing a device while its supplier is unbinding
During driver removal, the following warning can appear:
WARNING: CPU: 1 PID: 139 at drivers/base/core.c:1497 __device_links_no_driver+0xcc/0xfc
...
Call trace:
__device_links_no_driver+0xcc/0xfc (P)
device_links_driver_cleanup+0xa8/0xf0
device_release_driver_internal+0x208/0x23c
device_links_unbind_consumers+0xe0/0x108
device_release_driver_internal+0xec/0x23c
device_links_unbind_consumers+0xe0/0x108
device_release_driver_internal+0xec/0x23c
device_links_unbind_consumers+0xe0/0x108
device_release_driver_internal+0xec/0x23c
driver_detach+0xa0/0x12c
bus_remove_driver+0x6c/0xbc
driver_unregister+0x30/0x60
pci_unregister_driver+0x20/0x9c
lan966x_pci_driver_exit+0x18/0xa90 [lan966x_pci]
This warning is triggered when a consumer is removed because the links
status of its supplier is not DL_DEV_DRIVER_BOUND and the link flag
DL_FLAG_SYNC_STATE_ONLY is not set.
The topology in terms of consumers/suppliers used was the following
(consumer ---> supplier):
When the PCI device is removed, the OIC (interrupt controller) has to be
removed. In order to remove the OIC, pinctrl and i2c need to be removed
and to remove pinctrl, i2c need to be removed. The removal order is:
1) i2c
2) pinctrl
3) OIC
4) PCI device
In details, the removal sequence is the following (with 0000:01:00.0 the
PCI device):
driver_detach: call device_release_driver_internal(0000:01:00.0)...
device_links_busy(0000:01:00.0):
links->status = DL_DEV_UNBINDING
device_links_unbind_consumers(0000:01:00.0):
0000:01:00.0--oic link->status = DL_STATE_SUPPLIER_UNBIND
call device_release_driver_internal(oic)...
device_links_busy(oic):
links->status = DL_DEV_UNBINDING
device_links_unbind_consumers(oic):
oic--pinctrl link->status = DL_STATE_SUPPLIER_UNBIND
call device_release_driver_internal(pinctrl)...
device_links_busy(pinctrl):
links->status = DL_DEV_UNBINDING
device_links_unbind_consumers(pinctrl):
pinctrl--i2c link->status = DL_STATE_SUPPLIER_UNBIND
call device_release_driver_internal(i2c)...
device_links_busy(i2c): links->status = DL_DEV_UNBINDING
__device_links_no_driver(i2c)...
pinctrl--i2c link->status is DL_STATE_SUPPLIER_UNBIND
oic--i2c link->status is DL_STATE_ACTIVE
oic--i2c link->supplier->links.status is DL_DEV_UNBINDING
The warning is triggered by the i2c removal because the OIC (supplier)
links status is not DL_DEV_DRIVER_BOUND. Its links status is indeed set
to DL_DEV_UNBINDING.
It is perfectly legit to have the links status set to DL_DEV_UNBINDING
in that case. Indeed we had started to unbind the OIC which triggered
the consumer unbinding and didn't finish yet when the i2c is unbound.
Avoid the warning when the supplier links status is set to
DL_DEV_UNBINDING and thus support this removal sequence without any
warnings.
Signed-off-by: Herve Codina <herve.codina@bootlin.com> Reviewed-by: Rafael J. Wysocki <rafael@kernel.org> Reviewed-by: Saravana Kannan <saravanak@google.com> Link: https://patch.msgid.link/20260511155755.34428-4-herve.codina@bootlin.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Saravana Kannan [Mon, 11 May 2026 15:57:49 +0000 (17:57 +0200)]
of: dynamic: Fix overlayed devices not probing because of fw_devlink
When an overlay is applied, if the target device has already probed
successfully and bound to a device, then some of the fw_devlink logic
that ran when the device was probed needs to be rerun. This allows newly
created dangling consumers of the overlayed device tree nodes to be
moved to become consumers of the target device.
[Herve: Add the call to driver_deferred_probe_trigger()]
[Herve: Use fwnode_test_flag() to test fwnode flags value]
Fixes: 1a50d9403fb9 ("treewide: Fix probing of devices in DT overlays") Reported-by: Herve Codina <herve.codina@bootlin.com> Closes: https://lore.kernel.org/lkml/CAMuHMdXEnSD4rRJ-o90x4OprUacN_rJgyo8x6=9F9rZ+-KzjOg@mail.gmail.com/ Closes: https://lore.kernel.org/all/20240221095137.616d2aaa@bootlin.com/ Closes: https://lore.kernel.org/lkml/20240312151835.29ef62a0@bootlin.com/ Signed-off-by: Saravana Kannan <saravanak@google.com> Link: https://lore.kernel.org/lkml/20240411235623.1260061-3-saravanak@google.com/
[Herve: Rebase on top of recent kernel] Signed-off-by: Herve Codina <herve.codina@bootlin.com> Tested-by: Kalle Niemi <kaleposti@gmail.com> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20260511155755.34428-3-herve.codina@bootlin.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
driver core: Use mod_delayed_work to prevent lost deferred probe work
The deferred_probe_timeout_work may be permanently and unexpectedly
canceled when deferred_probe_extend_timeout() executes concurrently.
Starting with deferred_probe_timeout_work pending, the problem can
occur after the following sequence:
CPU0 CPU1
deferred_probe_extend_timeout
-> cancel_delayed_work() => true
deferred_probe_extend_timeout
-> cancel_delayed_work()
-> __cancel_work()
-> try_grab_pending()
-> schedule_delayed_work()
-> queue_delayed_work_on()
(Since the pending bit is grabbed,
it just returns without queuing)
-> set_work_pool_and_clear_pending()
(This __cancel_work() returns false and
the work will never be queued again)
The root cause is that the WORK_STRUCT_PENDING_BIT of the work_struct
is set temporarily in __cancel_work() (via try_grab_pending()). This
transient state prevents the work_struct from being successfully queued
by another CPU.
To fix this, replace the original non-atomic cancel and schedule
mechanism with mod_delayed_work(). This ensures the modification is
handled atomically and guarantees that the work is not lost.
Stepan Ionichev [Thu, 14 May 2026 17:14:55 +0000 (22:14 +0500)]
device property: fix fwnode reference leak in fwnode_graph_get_endpoint_by_id()
When called with FWNODE_GRAPH_ENDPOINT_NEXT, the function walks every
endpoint under the requested port and, for any endpoint whose ID is
greater than or equal to the requested one, may store a fwnode
reference in best_ep via fwnode_handle_get(). If a later iteration
finds an exact-ID match, the function returns that endpoint directly
without dropping the reference held by best_ep, leaking it.
Drop the saved candidate before returning the exact-match endpoint.
This affects callers that use FWNODE_GRAPH_ENDPOINT_NEXT to ask for
the next endpoint with ID >= the requested one (used by a number of
media drivers, e.g. imx7/8, sun6i CSI, omap3isp, xilinx-csi2,
stm32-csi). Each leak retains a fwnode reference until reboot/unbind.
Jori Koolstra [Tue, 3 Mar 2026 16:25:04 +0000 (17:25 +0100)]
devfreq: change devfreq_event_class to a const struct
The class_create() call has been deprecated in favor of class_register()
as the driver core now allows for a struct class to be in read-only
memory. Change devfreq_event_class to be a const struct class and drop
the class_create() call. Compile tested.
Commit 741c10b096bc ("kernfs: Use RCU to access kernfs_node::name.")
converted the WARN_ONCE() in kernfs_put() to read kn->name and
parent->name via rcu_dereference(), but kernfs_put() has callers that
hold neither kernfs_rwsem nor the RCU read lock. The inode eviction
path driven by memory reclaim is one such case:
lockdep complains with "suspicious RCU usage" whenever the WARN
fires from such a context.
Wrap the rcu_dereference() calls in an RCU read-side critical section.
Gate on the active-ref check so the lock is only taken when the WARN
is about to fire.
Note that this does not address the underlying imbalance in
kn->active that triggers the WARN.
Icenowy Zheng [Wed, 6 May 2026 17:56:09 +0000 (01:56 +0800)]
drm: verisilicon: add max cursor size to HWDB
Different display controller variants support different maximum cursor
size. All known DC8200 variants support both 32x32 and 64x64, but some
DC8000 variants support either only 32x32 or up to 256x256.
The minimum size is fixed at 32 and only PoT square sizes are supported.
Add the max cursor size field to HWDB and fill all entries with 64.
spi: Use named initializers for arrays of i2c_device_data
While being less compact, using named initializers allows to more easily
see which members of the structs are assigned which value without having
to lookup the declaration of the struct. And it's also more robust
against changes to the struct definition.
The mentioned robustness is relevant for a planned change to struct
i2c_device_id that replaces .driver_data by an anonymous union.
While touching these arrays, drop a comma after the list terminator.
This patch doesn't modify the compiled arrays, only their representation
in source form benefits. The former was confirmed with x86 and arm64
builds.
Seppo Ingalsuo [Fri, 22 May 2026 07:56:59 +0000 (10:56 +0300)]
ASoC: SOF: ipc4-topology: Enable deep buffer capture
This patch lets a capture PCM to operate with deep buffer
by taking dsp_max_burst_size_in_ms from topology if it is
defined. Earlier the value from topology was omitted for
capture.
If not defined, the maximum burst size for capture is set
similarly as before to one millisecond.
The dma_buffer_size is set similarly as for playback from
largest of deep_buffer_dma_ms or SOF_IPC4_MIN_DMA_BUFFER_SIZE
times OBS.
Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com> Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com> Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com> Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com> Link: https://patch.msgid.link/20260522075659.2645-1-peter.ujfalusi@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
Mark Brown [Fri, 22 May 2026 10:55:30 +0000 (11:55 +0100)]
spi: aspeed: Fix __iomem annotation and VLA parameter
Chin-Ting Kuo <chin-ting_kuo@aspeedtech.com> says:
This series fixes two sparse warnings reported by the kernel test robot.
The first patch fixes missing __iomem annotation on an MMIO pointer
parameter, which also caused a redundant cast at the call site.
A VLA function parameter warning is also fixed in this patch series.
Chin-Ting Kuo [Fri, 22 May 2026 07:16:21 +0000 (15:16 +0800)]
spi: aspeed: Replace VLA parameter with flat pointer in calibration helper
aspeed_spi_ast2600_optimized_timing() declared its buffer argument as a
variable-length array parameter (u8 buf[rows][cols]), which causes a
sparse warning. Replace the VLA parameter with a plain u8 * and compute
the 2-D index manually. The corresponding call site is also updated.
Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202605180441.uD3toFRJ-lkp@intel.com/ Signed-off-by: Chin-Ting Kuo <chin-ting_kuo@aspeedtech.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Link: https://patch.msgid.link/20260522071621.102507-3-chin-ting_kuo@aspeedtech.com Signed-off-by: Mark Brown <broonie@kernel.org>
Chin-Ting Kuo [Fri, 22 May 2026 07:16:20 +0000 (15:16 +0800)]
spi: aspeed: Fix missing __iomem annotation in output transfer path
The dst parameter of aspeed_spi_user_transfer_tx() is an MMIO address
obtained from chip->ahb_base, but it was typed as void * instead of
void __iomem *. This caused a sparse warning report. Fix the
parameter type to void __iomem * and drop the now-unnecessary
cast at the call site.
Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202605180441.uD3toFRJ-lkp@intel.com/ Signed-off-by: Chin-Ting Kuo <chin-ting_kuo@aspeedtech.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Link: https://patch.msgid.link/20260522071621.102507-2-chin-ting_kuo@aspeedtech.com Signed-off-by: Mark Brown <broonie@kernel.org>
netfilter: nf_tables: fix dst corruption in same register operation
For lshift and rshift, the shift operations are performed in a loop over
32-bit words. The loop calculates the shifted value and write it to dst,
and then immediately reads from src to calculate the carry for the next
iteration. Because src and dst could point to the same memory location,
the carry is incorrectly calculated using the newly modified dst value
instead of the original src value.
Adding a temporary local variable to cache the original value before
writing to dst and using it for the carry calculation solves the
problem. In addition, partial overlap is rejected from control plane for
all kind of operations including byteorder. This was tested with the
following bytecode:
Jiayuan Chen [Wed, 20 May 2026 02:34:11 +0000 (10:34 +0800)]
selftests: netfilter: add nft_fib_nexthop test
Functional coverage of nft_fib6_eval()'s nexthop enumeration over
three route shapes:
1) single external nexthop (nhid)
2) external nexthop group (nhid -> group)
3) old-style multipath (nexthop ... nexthop ...)
Each scenario places one nexthop on the input device (veth0). For
(2) and (3) the matching nexthop is the second member, so the walk
has to traverse beyond the primary nh. Two nft counters on prerouting
verify the data path: one increments only when fib reports veth0 as
the oif, the other counts "missing" results and must stay at zero.
./nft_fib_nexthop.sh
PASS: single external nexthop (nhid -> veth0)
PASS: nexthop group (dummy0 + veth0)
PASS: old-style multipath (sibling on veth0)
nft_fib6_info_nh_uses_dev() blindly walks &rt->fib6_siblings, causing
an OOB read past the struct nexthop slab when rt->nh is set:
==================================================================
BUG: KASAN: slab-out-of-bounds in nft_fib6_eval+0x1362/0x16c0
Read of size 8 at addr ffff888103a099d0 by task ping/386
Branch by route shape: when rt->nh is set, walk via
nexthop_for_each_fib6_nh() (also covers nh groups, which the original
code missed); otherwise walk fib6_siblings, guarded by READ_ONCE() of
rt->fib6_nsiblings as required by commit 31d7d67ba127 ("ipv6: annotate
data-races around rt->fib6_nsiblings").
Jiayuan Chen [Wed, 20 May 2026 02:34:09 +0000 (10:34 +0800)]
netfilter: nft_fib_ipv6: walk fib6_siblings under RCU
nft_fib6_info_nh_uses_dev() runs from nft_fib6_eval() in softirq under
rcu_read_lock(). fib6_siblings is modified by writers that hold
tb6_lock but do not wait for RCU readers, so the sibling walk should
use list_for_each_entry_rcu(): it adds READ_ONCE() on the ->next
pointer and lets CONFIG_PROVE_RCU_LIST validate the locking.
Florian Westphal [Tue, 19 May 2026 20:52:07 +0000 (22:52 +0200)]
netfilter: ebtables: fix OOB read in compat_mtw_from_user
Luxiao Xu says:
The function compat_mtw_from_user() converts ebtables extensions from
32-bit user structures to kernel native structures. However, it lacks
proper validation of the user-supplied match_size/target_size.
When certain extensions are processed, the kernel-side translation
logic may perform memory accesses based on the extension's expected
size. If the user provides a size smaller than what the extension
requires, it results in an out-of-bounds read as reported by KASAN.
This fix introduces a check to ensure match_size is at least as large
as the extension's required compatsize. This covers matches, watchers,
and targets, while maintaining compatibility with standard targets.
AFAIU this is relevant for matches that need to go though
match->compat_from_user() call. Those that use plain memcpy with the
user-provided size are ok because the caller checks that size vs the
start of the next rule entry offset (which itself is checked vs. total
size copied from userspace).
The ->compat_from_user() callbacks assume they can read compatsize bytes,
so they need this extra check.
Based on an earlier patch from Luxiao Xu.
Fixes: 81e675c227ec ("netfilter: ebtables: add CONFIG_COMPAT support") Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Luxiao Xu <rakukuip@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Florian Westphal <fw@strlen.de>
Florian Westphal [Sat, 16 May 2026 15:23:21 +0000 (23:23 +0800)]
netfilter: disable payload mangling in userns
Several parts of network stack rely on iph->ihl validation
done by network stack before PRE_ROUTING.
Disable this feature for user namespaces for now.
tcp option handling is likely safe even for LOCAL_IN, so this
this leaves tcp option mangling via nft_exthdr.c as-is.
I don't think these are the only means to alter packets, but these
appear to be relatively prominent.
This could be relaxed later. Example:
- allow userns for ingress hook.
- allow userns if base is transport header.
Also, we should revalidate or restrict generally:
- Don't allow linklayer writes to spill into network header
- restrict ipv4 and ipv6 to 'known safe' writes, e.g.
saddr/daddr/check/tos
Florian Westphal [Thu, 14 May 2026 12:21:57 +0000 (14:21 +0200)]
netfilter: nf_conntrack_gre: fix gre keymap list corruption
Quoting reporter:
A race between GRE keymap insertion and destruction can corrupt the
kernel list or use a freed object. `nf_ct_gre_keymap_add()` publishes a
new keymap pointer before the embedded `list_head` is linked, while
`nf_ct_gre_keymap_destroy()` can concurrently delete and free that
same object. An unprivileged user can reach this through the PPTP
conntrack helper by racing PPTP control messages or helper teardown,
leading to KASAN-detectable list corruption/UAF in kernel context.
## Root Cause Analysis
`exp_gre()` installs GRE expectations for a PPTP control flow and then
adds two GRE keymap entries [..]
The add path publishes `ct_pptp_info->keymap[dir]` before linking the
embedded list node [..]
Concurrent teardown deletes that partially initialized object.
Make add/destroy symmetric: install both, destroy both while under lock.
Furthermore, we should refuse to publish a new mapping in case ct is going
away, else we may leak the allocation.
The "retrans" detection is strange: existing mapping is checked for key
equality with the new mapping, then for "is on the list" via list walk.
But I can't see how an existing keymap entry can be NOT on list.
Change this to only check if we're asked to map same tuple again -- if so,
skip re-install, else signal failure.
Last, add a bug trap for the keymap list; it has to be empty when namespace
is going away.
Reported-by: Leo Lin <leo@depthfirst.com> Signed-off-by: Florian Westphal <fw@strlen.de>
Chris Mason [Tue, 19 May 2026 19:36:14 +0000 (12:36 -0700)]
netfilter: synproxy: refresh tcphdr after skb_ensure_writable
synproxy_tstamp_adjust() rewrites the TCP timestamp option in place
and then patches the TCP checksum via inet_proto_csum_replace4() on
the caller-supplied tcphdr pointer. Both ipv4_synproxy_hook() and
ipv6_synproxy_hook() obtain that pointer with skb_header_pointer()
before calling in, so it may either alias skb->head directly or
point at the caller's on-stack _tcph buffer.
Between obtaining the pointer and using it, the function calls
skb_ensure_writable(skb, optend), which on a cloned or non-linear
skb invokes pskb_expand_head() and frees the old skb->head. After
that point the cached th is stale:
caller (ipv[46]_synproxy_hook)
th = skb_header_pointer(skb, ..., &_tcph)
synproxy_tstamp_adjust(skb, protoff, th, ...)
skb_ensure_writable(skb, optend)
pskb_expand_head() /* kfree(old skb->head) */
...
inet_proto_csum_replace4(&th->check, ...)
/* writes into freed head, or
into the caller's stack copy
leaving the on-wire checksum
stale */
The option bytes are written through skb->data and are fine; only
the checksum update goes through th and so lands in the wrong
place. The result is either a write into freed slab memory or a
packet leaving with a checksum that does not match its payload.
Fix by re-deriving th from skb->data + protoff immediately after
skb_ensure_writable() succeeds, so the subsequent checksum update
targets the linear, writable header.
Hamza Mahfooz [Mon, 11 May 2026 14:43:14 +0000 (10:43 -0400)]
netfilter: conntrack: tcp: do not force CLOSE on invalid-seq RST without direction check
An unintended behavior in the TCP conntrack state machine allows a
connection to be forced into the CLOSE state using an RST packet with an
invalid sequence number.
Specifically, after a SYN packet is observed, an RST with an invalid SEQ
can transition the conntrack entry to TCP_CONNTRACK_CLOSE, regardless of
whether the RST corresponds to the expected reply direction. The relevant
code path assumes the RST is a response to an outgoing SYN, but does not
validate packet direction or ensure that a matching SYN was actually sent
in the opposite direction.
As a result, a crafted packet sequence consisting of a SYN followed by an
invalid-sequence RST can prematurely terminate an active NAT entry. This
makes connection teardown easier than intended.
So, tighten the state transition logic to ensure that RST-triggered
CLOSE transitions only occur when the RST is a valid response to a
previously observed SYN in the correct direction.
device property: set fwnode->secondary to NULL in fwnode_init()
If a firmware node is allocated on the stack (for instance: temporary
software node whose life-time we control) or on the heap - but using a
non-zeroing allocation function - and initialized using fwnode_init(),
its secondary pointer will contain uninitalized memory which likely will
be neither NULL nor IS_ERR() and so may end up being dereferenced (for
example: in dev_to_swnode()). Set fwnode->secondary to NULL on
initialization.
Cc: stable <stable@kernel.org> Fixes: 01bb86b380a3 ("driver core: Add fwnode_init()") Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com> Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com> Link: https://patch.msgid.link/20260506115701.23035-1-bartosz.golaszewski@oss.qualcomm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Xiaolei Wang [Mon, 18 May 2026 07:34:05 +0000 (15:34 +0800)]
misc: rp1: Send IACK on IRQ activate to fix kdump/kexec
After a kexec/kdump reboot, the macb Ethernet controller fails to
receive any packets, causing DHCP to hang indefinitely and the network
interface to be unusable despite link being up.
The root cause is that RP1's level-triggered MSI-X interrupt sources
(such as macb on hwirq 6) may have their internal state machines stuck
in the "waiting for IACK" state. This happens because the previous
kernel crashed before sending the acknowledgment for a pending level
interrupt.
In this stuck state, RP1 will not generate new MSI-X writes even though
the interrupt source remains asserted. Since no new MSI-X is sent, the
GIC never sees a new edge, the chained IRQ handler is never invoked,
and the interrupt is permanently lost.
Fix this by sending MSIX_CFG_IACK in rp1_irq_activate(). This
unconditionally resets the MSI-X state machine back to idle when a
child device requests its interrupt. If the interrupt source is still
asserted, RP1 will immediately issue a new MSI-X with the freshly
configured msg_addr/msg_data, and normal interrupt delivery resumes.
Writing IACK when the state machine is already idle (i.e., on a normal
cold boot) is harmless — it has no effect.
Hongling Zeng [Mon, 18 May 2026 02:29:39 +0000 (10:29 +0800)]
gpib: cb7210: Fix region leak when request_irq fails
When request_irq() fails, the region allocated by request_region()
is not released. Fix this by adding an error handling path with
proper goto labels to release the region.
Ben Hutchings [Tue, 5 May 2026 18:45:12 +0000 (20:45 +0200)]
parport: Fix race between port and client registration
The parport subsystem registers port devices before they are fully
initialised, resulting in a race condition where client drivers such
as lp can attach to ports that are not completely initialised or even
being torn down.
When the port and client drivers are built as modules and loaded
around the same time during boot, this occasionally results in a
crash. I was able to make this happen reliably in a VM with a
PC-style parallel port by patching parport_pc to fail probing:
while true; do
modprobe lp & modprobe parport_pc
wait
rmmod lp parport_pc
done
for a few seconds.
In the long term I think port registration should be changed to put
the call to device_add() inside parport_announce_port(), but since the
latter currently cannot fail this will require changing all port
drivers.
For now, add a flag to indicate whether a port has been "announced"
and only try to attach client drivers to ports when the flag is set.
Guangshuo Li [Tue, 5 May 2026 15:02:56 +0000 (23:02 +0800)]
uio: uio_pci_generic_sva: fix double free of devm_kzalloc() memory
uio_pci_sva allocates struct uio_pci_sva_dev with devm_kzalloc() in
probe(), but then calls kfree(udev) both on the probe() error path
(label out_free) and again in remove().
Because devm_kzalloc() allocations are devres-managed and are freed
automatically when the device is detached (including after a failing
probe() and during driver unbind), the explicit kfree() can lead to a
double free.
If probe() fails after devm_kzalloc(), the error path frees udev and
devres cleanup will free it again when the core unwinds the partially
bound device. On normal driver removal, remove() frees udev and devres
will free it again when the device is detached.
This issue was identified by a static analysis tool I developed and
confirmed by manual review. Fix by removing the manual kfree() calls
and dropping the now-unused label.
Fixes: 3397c3cd859a2 ("uio: Add SVA support for PCI devices via uio_pci_generic_sva.c") Cc: stable <stable@kernel.org> Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com> Link: https://patch.msgid.link/20260505150256.614071-1-lgs201920130244@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Zeng Heng [Thu, 21 May 2026 07:30:11 +0000 (15:30 +0800)]
arm64: tlb: Flush walk cache when unsharing PMD tables
When huge_pmd_unshare() is called to unshare a PMD table, the
tlb_unshare_pmd_ptdesc() function sets tlb->unshared_tables=true
but the aarch64 tlb_flush() only checked tlb->freed_tables to
determine whether to use TLBF_NONE (vae1is, invalidates walk
cache) or TLBF_NOWALKCACHE (vale1is, leaf-only).
This caused the stale PMD page table entry to remain in the walk cache
after unshare, potentially leading to incorrect page table walks.
Fix by including unshared_tables in the check, so that when
unsharing tables, TLBF_NONE is used and the walk cache is properly
invalidated.
Here is the detailed distinction between vae1is and vale1is:
| Instruction Combination | Actual Invalidation Scope |
| ------------------------ | --------------------------------------------------|
| `VAE1IS` + TTL=`0` | All entries at all levels (full invalidation) |
| `VAE1IS` + TTL=`2` (L2) | Non-leaf at Level 0/1 + leaf at Level 2 |
| `VALE1IS` + TTL=`0` | Leaf entries at all levels (non-leaf not cleared) |
| `VALE1IS` + TTL=`2` (L2) | Leaf entry at Level 2 only |
Merge patch series "writeback: fix race between cgroup_writeback_umount() and inode_switch_wbs()"
Baokun Li <libaokun@linux.alibaba.com> says:
When a container exits, a race between cgroup_writeback_umount() and
inode_switch_wbs() / cleanup_offline_cgwb() can trigger
"VFS: Busy inodes after unmount" followed by a use-after-free on
percpu counters.
There is a window between inode_prepare_wbs_switch() returning true
(having passed the SB_ACTIVE check and grabbed the inode) and the
subsequent wb_queue_isw() call. If cgroup_writeback_umount() observes
the global isw_nr_in_flight counter as non-zero but flush_workqueue()
finds nothing queued, it returns early -- leaving a held inode
reference that blocks evict_inodes() and a later iput() that hits
freed percpu counters.
Patch 1 closes the race by extending the RCU read-side critical
section to cover the window from inode_prepare_wbs_switch() through
wb_queue_isw(), and adding synchronize_rcu() in the umount path so
that all in-flight switchers complete queueing before
flush_workqueue() runs. rcu_barrier() is intentionally retained so
the same hunk applies cleanly to stable trees that still queue
switches via queue_rcu_work().
Patch 2 removes the now-dead rcu_barrier() that was left over from
the queue_rcu_work() era (replaced by plain queue_work() in commit e1b849cfa6b6 "writeback: Avoid contention on wb->list_lock when
switching inodes"). This is mainline-only.
Patch 3 replaces the global synchronize_rcu()/flush_workqueue() pair
with a per-sb counter (s_isw_nr_in_flight) plus three small helpers
(cgroup_writeback_pin / cgroup_writeback_unpin /
cgroup_writeback_drain), eliminating the global serialization
penalty. This also reverts the RCU extension from patch 1 since the
per-sb counter makes it unnecessary.
Performance
-----------
Measured on a 16 vCPU QEMU guest, all kernels share the same .config.
Background load: 4 ext4 superblocks each running
This drives both inode_switch_wbs() (different cgroups writing the
same inode) and cleanup_offline_cgwb() (dying memcgs), keeping the
global isw_nr_in_flight non-zero throughout the run. Latencies are
wall-clock around umount(8) on a separate target sb; only the target
sb's umount is measured.
Four kernels are compared at each step of the series:
p50 p95 p99 max
base 99.7 ms 112.9 ms 112.9 ms 127.2 ms
+race 110.2 ms 153.8 ms 153.8 ms 160.4 ms
+rmbarrier 67.6 ms 88.3 ms 88.3 ms 96.8 ms
+persb 7.9 ms 10.0 ms 10.0 ms 10.1 ms
Idle target umount under cross-sb cgwb-switch pressure:
p50 p95 p99 max
base 92.0 ms 123.5 ms 136.5 ms 141.3 ms
+race 118.8 ms 154.6 ms 164.7 ms 165.3 ms
+rmbarrier 62.7 ms 95.4 ms 108.1 ms 108.6 ms
+persb 5.3 ms 6.9 ms 7.4 ms 7.4 ms
8 concurrent umounts of idle sbs under the same pressure:
p50 p95 p99 max
base 137.5 ms 166.9 ms 166.9 ms 171.3 ms
+race 162.2 ms 183.9 ms 183.9 ms 217.0 ms
+rmbarrier 61.3 ms 99.5 ms 99.5 ms 113.7 ms
+persb 8.1 ms 9.1 ms 9.1 ms 9.5 ms
A no-pressure baseline run (no background load) measures ~5 ms p50
across all four kernels, validating that the methodology has no
systematic bias.
In-kernel cgroup_writeback_umount() cumulative cost across the same
run (bpftrace, ~340 calls covering all four scenarios):
cgroup_writeback_umount() time
base 21240 ms total (~62 ms / call)
+race (rcu_barrier+sync) 24966 ms total (~73 ms / call)
+rmbarrier (synchronize_rcu) 12371 ms total (~36 ms / call)
+persb (per-sb counter) 1.37 ms total ( ~4 us / call)
Under +persb the wait_var_event() condition is true on entry
whenever the target sb has nothing in flight, so synchronize_rcu()
and flush_workqueue() are never called on this path.
Notes:
- Patch 1 adds ~10-27 ms p50 over base by introducing
synchronize_rcu(). This is the cost of closing the race
correctly and is paid by stable backports as well.
- Patch 2 ("drop rcu_barrier()") was expected to be a pure cleanup
on mainline, but actually removes a real wait: rcu_barrier()
drains call_rcu() callbacks from *all* subsystems, and the
cgroup teardown path keeps that pipeline busy under this
workload. Removing it cuts ~43-101 ms p50 on top of patch 1.
- Patch 3 (per-sb counter) replaces the global wait entirely; the
target sb no longer waits for activity on unrelated sbs,
recovering near-baseline latency in all three scenarios.
* patches from https://patch.msgid.link/20260521095016.2791354-1-libaokun@linux.alibaba.com:
writeback: use a per-sb counter to drain inode wb switches at umount
writeback: drop now-unnecessary rcu_barrier() in cgroup_writeback_umount()
writeback: fix race between cgroup_writeback_umount() and inode_switch_wbs()
Baokun Li [Thu, 21 May 2026 09:50:16 +0000 (17:50 +0800)]
writeback: use a per-sb counter to drain inode wb switches at umount
Tracking in-flight inode wb switches with a single global counter
(isw_nr_in_flight) plus a synchronize_rcu() based wait in
cgroup_writeback_umount() forces every umount to take a global hit
whenever any other superblock on the system has wb switches in flight,
even if the superblock being unmounted has none of its own.
Replace the global synchronize_rcu()/flush_workqueue() pair with a
per-sb counter, s_isw_nr_in_flight, plus three small helpers:
- cgroup_writeback_pin(sb) - increment counter
- cgroup_writeback_unpin(sb) - decrement and wake drainer if last
- cgroup_writeback_drain(sb) - wait for counter to reach zero
The wiring is:
- inode_prepare_wbs_switch() pins before checking SB_ACTIVE and
grabbing the inode; failure paths unpin before returning. A
lockless SB_ACTIVE check at the top of the function lets us skip
the atomic_inc/smp_mb dance once SB_ACTIVE has been cleared (it
is monotonic and never set back).
- process_inode_switch_wbs() unpins after the matching iput().
- cgroup_writeback_umount() drains the per-sb counter via
wait_var_event().
The smp_mb() pair between inode_prepare_wbs_switch() and
cgroup_writeback_umount() keeps the SB_ACTIVE / counter ordering:
either the umounter sees a non-zero counter and waits, or the
switcher sees SB_ACTIVE cleared and aborts before grabbing the
inode.
The global isw_nr_in_flight is left in place, since it is still used
to throttle in-flight switches via WB_FRN_MAX_IN_FLIGHT.
The rcu_read_lock() extension in inode_switch_wbs() and
cleanup_offline_cgwb() that the race fix added is no longer needed
and is reverted; the synchronize_rcu() that the race fix added to
cgroup_writeback_umount() is dropped as well.
The following numbers were measured on a 16 vCPU QEMU guest with 4
background superblocks each churning "create memcg -> write 1 MiB ->
rmdir memcg" to keep the global isw_nr_in_flight non-zero. Latencies
are wall-clock around umount(8); only the target sb's umount is
measured.
Target sb runs its own cgwb churn:
p50 p95 p99 max
global synchronize_rcu() 67.6 ms 88.3 ms 88.3 ms 96.8 ms
per-sb counter (this) 7.9 ms 10.0 ms 10.0 ms 10.1 ms
Idle target umount latency under cross-sb cgwb-switch pressure:
p50 p95 p99 max
global synchronize_rcu() 62.7 ms 95.4 ms 108.1 ms 108.6 ms
per-sb counter (this) 5.3 ms 6.9 ms 7.4 ms 7.4 ms
no-pressure baseline 4.9 ms 5.9 ms 6.3 ms 6.7 ms
8 concurrent umounts of idle sbs under the same pressure:
p50 p95 max
global synchronize_rcu() 61.3 ms 99.5 ms 113.7 ms
per-sb counter (this) 8.1 ms 9.1 ms 9.5 ms
In-kernel cgroup_writeback_umount() time across the same run
(bpftrace, ~340 calls covering all scenarios):
global synchronize_rcu() 12371 ms total (~36 ms / call)
per-sb counter (this) 1.37 ms total ( ~4 us / call)
Baokun Li [Thu, 21 May 2026 09:50:15 +0000 (17:50 +0800)]
writeback: drop now-unnecessary rcu_barrier() in cgroup_writeback_umount()
Commit e1b849cfa6b6 ("writeback: Avoid contention on wb->list_lock when
switching inodes") replaced the queue_rcu_work() based scheduling of
inode wb switches with a plain queue_work(). Since then no switcher
goes through call_rcu(), so rcu_barrier() in cgroup_writeback_umount()
has no callbacks of its own to wait for. It still drains unrelated
call_rcu() callbacks from other subsystems on busy systems, which
incidentally slows umount down; drop it.
Fixes: e1b849cfa6b6 ("writeback: Avoid contention on wb->list_lock when switching inodes") Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Baokun Li <libaokun@linux.alibaba.com> Link: https://patch.msgid.link/20260521095016.2791354-3-libaokun@linux.alibaba.com Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
Baokun Li [Thu, 21 May 2026 09:50:14 +0000 (17:50 +0800)]
writeback: fix race between cgroup_writeback_umount() and inode_switch_wbs()
When a container exits, the following BUG_ON() is occasionally triggered:
==================================================================
VFS: Busy inodes after unmount of sdb (ext4)
------------[ cut here ]------------
kernel BUG at fs/super.c:695!
CPU: 3 PID: 6 Comm: containerd-shim Tainted: G OE K 6.6 #1
pstate: 63400009 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
pc : generic_shutdown_super+0xf0/0x100
lr : generic_shutdown_super+0xf0/0x100
Call trace:
generic_shutdown_super+0xf0/0x100
kill_block_super+0x20/0x48
ext4_kill_sb+0x28/0x60
deactivate_locked_super+0x54/0x130
deactivate_super+0x84/0xa0
cleanup_mnt+0xa4/0x140
__cleanup_mnt+0x18/0x28
task_work_run+0x78/0xe0
do_notify_resume+0x204/0x240
==================================================================
The root cause is a race between cgroup_writeback_umount() and
inode_switch_wbs()/cleanup_offline_cgwb(). There is a window between
inode_prepare_wbs_switch() returning true and the subsequent
wb_queue_isw() call. Following is the process that triggers the issue:
CPU A (umount) | CPU B (writeback)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
inode_switch_wbs/cleanup_offline_cgwb
atomic_inc(&isw_nr_in_flight)
inode_prepare_wbs_switch
-> passes SB_ACTIVE check
__iget(inode)
generic_shutdown_super
sb->s_flags &= ~SB_ACTIVE
cgroup_writeback_umount(sb)
smp_mb()
atomic_read(&isw_nr_in_flight)
rcu_barrier()
-> no pending RCU callbacks
flush_workqueue(isw_wq)
-> nothing queued, returns
evict_inodes(sb)
-> Inode skipped as isw still holds a ref.
sop->put_super(sb)
/* destroys percpu counters */
-> VFS: Busy inodes after unmount!
wb_queue_isw()
queue_work(isw_wq, ...)
/* later in work function */
inode_switch_wbs_work_fn
process_inode_switch_wbs
iput() -> evict
percpu_counter_dec() // UAF!
Fix this by extending the RCU read-side critical section in
inode_switch_wbs() and cleanup_offline_cgwb() to cover from
inode_prepare_wbs_switch() through wb_queue_isw(). Since there is
no sleep in this window, rcu_read_lock() can be used. Then add a
synchronize_rcu() in cgroup_writeback_umount() before the existing
rcu_barrier(), so that all in-flight switchers that have passed the
SB_ACTIVE check have completed queue_work() before flush_workqueue()
is called.
The existing rcu_barrier() is intentionally retained so this fix can
be backported unchanged to stable kernels (5.10.y, 6.6.y, ...) that
still queue switches via queue_rcu_work(). It is a no-op on current
mainline (since commit e1b849cfa6b6 ("writeback: Avoid contention on
wb->list_lock when switching inodes")) and is removed in a follow-up
patch.
Matthew Maurer [Fri, 3 Apr 2026 18:18:58 +0000 (18:18 +0000)]
rust_binder: Avoid holding lock when dropping delivered_death
In 6c37bebd8c926, we switched to looping over the list and dropping each
individual node, ostensibly without the lock held in the loop body.
If the kernel were using Rust Edition 2024, the comment would be
accurate, and the lock would not be held across the drop. However, the
kernel is currently using 2021, so tail expression lifetime extension
results in the lock being held across the drop. Explicitly binding the
expression result to a variable makes the lockguard no longer part of a
tail expression, causing the lock to be dropped before entering the loop
body.
This was detected via `CONFIG_PROVE_LOCKING` identifying an invalid wait
context at the drop site.
Reported-by: David Stevens <stevensd@google.com> Signed-off-by: Matthew Maurer <mmaurer@google.com> Cc: stable <stable@kernel.org> Fixes: 6c37bebd8c92 ("rust_binder: avoid mem::take on delivered_deaths") Reviewed-by: Alice Ryhl <aliceryhl@google.com> Acked-by: Carlos Llamas <cmllamas@google.com> Link: https://patch.msgid.link/20260403-lockhold-v1-1-c332b56cd8ae@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alice Ryhl [Tue, 14 Apr 2026 12:02:34 +0000 (12:02 +0000)]
rust_binder: avoid calling pending_oneway_finished() on TF_UPDATE_TXN
When an outdated transaction is removed from `oneway_todo` due to
`TF_UPDATE_TXN`, its `Allocation` is dropped. The current implementation
of `Allocation::drop` calls `pending_oneway_finished()`, assuming the
transaction was executed. This leads to premature execution of the next
queued one-way transaction.
Fix this by taking the `oneway_node` from the `Allocation` of the
outdated transaction before it is dropped. This prevents
`Allocation::drop` from signaling completion.
We do not call `take_oneway_node()` from `Transaction::cancel` because
it's actually correct to call `pending_oneway_finished()` on cancel if
the transaction did not come from `oneway_todo`. This ensures that if
`BINDER_THREAD_EXIT` is invoked and cancels a oneway transaction, then
the next transaction is taken from `oneway_todo`.
This bug does not lead to any issues in the kernel, but may lead to
Binder delivering transactions to userspace earlier than userspace
expected to receive them.
Enable modular build since the driver now has a proper module_exit()
handler. There's nothing specific to DZ hardware to prevent driver
unloading and reloading from working.
(report at the offending commit) -- where a pointer is dereferenced that
has been derived from a null pointer to the port's parent device.
Since no device is available with legacy probing and it's not anymore a
preferable way to discover devices anyway, switch the driver to using a
platform device and use it as the port's parent device. Update resource
handling accordingly and only request the actual span of addresses used
within the slot, which will have had its resource already requested by
generic platform device code.
Use platform_driver_probe() not just because SCC devices are fixed with
solder on board and not straightforward to remove, but foremost because
the associated TTY's major device number is the same as used by the dz
driver and the first driver to claim it will prevent the other one from
using it. Either one DZ device or some SCC devices will be present in a
given system but never both at a time, and therefore we want the major
device number to be claimed by the first driver to actually successfully
bind to its device and platform_driver_probe() is a way to fulfil that.
An unfortunate consequence of the switch to a platform device is we now
hand the console over from the bootconsole much later in the bootstrap.
The firmware console handler appears good enough though to work so late
and in particular with interrupts enabled.
Since there is one way only remaining to reach zs_reset() now, remove
the port initialisation marker as no longer needed and go through the
channel reset unconditionally.
Fixes: 84a9582fd203 ("serial: core: Start managing serial controllers to enable runtime PM") Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Cc: stable@vger.kernel.org # needs to use .remove_new for <= 6.10 Link: https://patch.msgid.link/alpine.DEB.2.21.2605062328480.46195@angie.orcam.me.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>