Linus Torvalds [Fri, 22 May 2026 20:19:41 +0000 (13:19 -0700)]
Merge tag 'spi-fix-v7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"Another batch of driver fixes from Johan fixing error handling paths,
plus another from Felix. We also have a new device ID added in the DT
bindings for SpacemiT K3"
* tag 'spi-fix-v7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: dt-bindings: fsl-qspi: support SpacemiT K3
spi: ti-qspi: fix use-after-free after DMA setup failure
spi: sprd: fix error pointer deref after DMA setup failure
spi: qup: fix error pointer deref after DMA setup failure
spi: mtk-snfi: Fix resource leak in mtk_snand_read_page_cache()
spi: ep93xx: fix error pointer deref after DMA setup failure
Linus Torvalds [Fri, 22 May 2026 20:17:29 +0000 (13:17 -0700)]
Merge tag 'regulator-fix-v7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"A couple of fixes here, one very minor Kconfig fix and a fix for a
nasty issue with error reporting in the tps65219 driver"
* tag 'regulator-fix-v7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: tps65219: fix irq_data.rdev not being assigned
regulator: Kconfig: fix a typo in help
Linus Torvalds [Fri, 22 May 2026 19:28:47 +0000 (12:28 -0700)]
Merge tag 'gpio-fixes-for-v7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- propagate the error code from regulator_enable() in resume path in
gpio-pca953x
- take the device lock when calling device_is_bound() in virtual GPIO
drivers
- fix software node leak in remove path in gpio-aggregator
- fix a potential use-after-free in gpio-aggregator
- harden the GPIO character device uAPI: check that line config
attributes are correctly zeroed
* tag 'gpio-fixes-for-v7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: virtuser: lock device when calling device_is_bound()
gpio: aggregator: lock device when calling device_is_bound()
gpio: sim: lock device when calling device_is_bound()
gpio: aggregator: remove the software node when deactivating the aggregator
gpio: aggregator: fix a potential use-after-free
gpio: cdev: check if uAPI v2 config attributes are correctly zeroed
gpio: pca953x: propagate regulator_enable() error from resume
Linus Torvalds [Fri, 22 May 2026 19:22:22 +0000 (12:22 -0700)]
Merge tag 'sound-7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"As expected, we still continue receiving lots of small fixes.
One major change is about HD-audio pending IRQ handling, but this
would influence only on odd machines or slow VMs. There are a few
other fixes for the core part, but most of them are not-too-serious
UAF fixes, while the rest are mostly device-specific fixes and quirks.
ALSA Core:
- Fix for PCM silencing with bogus iov_iter
- Fixes for past-the-end iterators in timer and seq
- Serialization of UMP output teardown
- Rate-limit ELD parsing errors
HD-audio:
- Fixes for IRQ work handling and SSID matching
- Various Realtek quirks for HP and ASUS laptops, including LED fixes
ASoC:
- Intel: ACPI match table updates for PTL, NVL, and ARL platforms
- Cirrus Logic: Fixes for cs-amp-lib and cs35l56 codecs
- Various platform fixes for AMD, FSL SAI, TI OMAP, and Qualcomm
- DT-binding fix for MediaTek
Others:
- USB ua101: Reject too-short USB descriptors
- Scarlett2: Fix for flash writes
- ASIHPI: Fix for potential OOB access
- AMD SPI: Fix for bus number in ACPI probe
MAINTAINERS:
- Updates for SOF and TI maintainers"
* tag 'sound-7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (47 commits)
ASoC: codecs: pcm512x: fix null-ptr dereference in pcm512x_overclock_xxx_put()
ASoC: Intel: soc-acpi-intel-ptl-match: Remove unnecessary cs42l43 match
ASoC: soc-acpi-intel-ptl-match: Make Chrome matches conditional
ASoC: Intel: soc-acpi: Add entry for sof_es8336 in NVL match table.
ASoC: Intel: sof_sdw: Add support for nvlrvp in NVL platform
ASoC: cs-amp-lib: Fix typo in error message: write -> read
ASoC: cs-amp-lib: Fix missing dput() after debugfs_lookup()
ASoC: cs-amp-lib: Fix wrong sizeof() in _cs_amp_set_efi_calibration_data()
ASoC: cs35l56: Fix flushing of IRQ work in cs35l56_sdw_remove()
MAINTAINERS: ASoC: Intel/SOF: Remove Ranjani Sridharan as maintainer
ALSA: seq: Serialize UMP output teardown with event_input
ALSA: scarlett2: Allow flash writes ending at segment boundary
ALSA: hda/realtek: Add LED quirk for HP ProBook 430 G6
ALSA: hda/intel: Make sure to cancel irq-pending work at closing PCM stream
ALSA: hda: Move irq pending work into hda-intel stream
ASoC: soc-utils: Add missing va_end in snd_soc_ret()
ALSA: ua101: Reject too-short USB descriptors
ALSA: hda/realtek: Fix mute and mic-mute LEDs for HP 16 Piston OmniBook X
ALSA: seq: avoid past-the-end iterator in snd_seq_create_port()
ALSA: timer: avoid past-the-end iterator in snd_timer_dev_register()
...
Linus Torvalds [Fri, 22 May 2026 19:06:23 +0000 (12:06 -0700)]
Merge tag 'block-7.1-20260522' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block fixes from Jens Axboe:
- NVMe pull request via Keith:
- Fix memory leak for peer-to-peer addresses
- Fix dma map leaks on resource errors
- Another bio integrity fix, fixing a recent regression
- Fix for an issue with the request pre-allocation and caching when IO
is queued, where if a bio split occurred and ended up blocking, the
list could be corrupted
* tag 'block-7.1-20260522' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
block: avoid use-after-free in disk_free_zone_resources()
blk-mq: pop cached request if it is usable
nvme-pci: fix dma mapping leak on data setup error
nvme-pci: fix dma_vecs leak on p2p memory
bio-integrity-fs: pass data iter to bio_integrity_verify()
Linus Torvalds [Fri, 22 May 2026 18:53:28 +0000 (11:53 -0700)]
Merge tag 'io_uring-7.1-20260522' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull io_uring fixes from Jens Axboe:
- Fix for an issue with IORING_OP_NOP and using injection results
- Fix for an issue in IORING_OP_WAITID, where the info state was
assumed cleared by the lower level syscall handler, but for some
cases it is not. Just clear the data upfront, so that non-initialized
data isn't copied back to userspace
- Fix for a lockdep reported issue, where IORING_OP_BIND enters file
create and hence hits mnt_want_write(), which creates a three part
lockdep cycle between the super lock, io_uring's uring_lock, and the
cred mutex
- Fix a regression introduced in this cycle with how linked timeouts
are deleted
- Ensure that the ->opcode nospec indexing on the opcode issue side
covers all the cases
* tag 'io_uring-7.1-20260522' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
io_uring/nop: pass all errors to userspace
io_uring/timeout: splice timed out link in timeout handler
io_uring: propagate array_index_nospec opcode into req->opcode
io_uring/waitid: clear waitid info before copying it to userspace
io_uring/net: punt IORING_OP_BIND async if it needs file create
Ziyu Zhang [Tue, 19 May 2026 16:56:36 +0000 (00:56 +0800)]
vsock: keep poll shutdown state consistent
vsock_poll() reads vsk->peer_shutdown before taking the socket lock
to set EPOLLHUP and EPOLLRDHUP, then reads it again after taking
the lock to report EOF readability. A shutdown packet can update
peer_shutdown while poll is waiting for the lock, so one poll invocation
can report EOF readability without the corresponding HUP/RDHUP bits.
For connectible sockets, take one peer_shutdown snapshot after
lock_sock() and use it for all peer-shutdown-derived poll bits. For
datagram sockets, which do not take lock_sock() in poll(), take one
lockless READ_ONCE() snapshot and pair it with WRITE_ONCE() on the
writer side.
This keeps the peer-shutdown-derived bits internally consistent for each
poll pass.
Linus Torvalds [Fri, 22 May 2026 17:52:26 +0000 (10:52 -0700)]
Merge tag 'v7.1-rc5-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- Fix missing lock
- Fix dentry in use after unmounting
- cifs.upcall security fix
- require CAP_NET_ADMIN for swn netlink
- change allocation in DUP_CTX_STR to GFP_KERNEL
- minor smbdirect debug fix
- handle_read_data() folio fix
* tag 'v7.1-rc5-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: change allocation requirements in DUP_CTX_STR macro
smb: client: require net admin for CIFS SWN netlink
smb: smbdirect: divide, not multiply, milliseconds by 1000
cifs: Fix busy dentry used after unmounting
smb: client: use data_len for SMB2 READ encrypted folioq copy
smb: client: reject userspace cifs.spnego descriptions
smb: client: protect tc_count increment in smb2_find_smb_sess_tcon_unlocked()
Weiming Shi [Thu, 21 May 2026 16:33:13 +0000 (09:33 -0700)]
tun: free page on build_skb failure in tun_xdp_one()
When build_skb() fails in tun_xdp_one(), the function sets ret to
-ENOMEM and jumps to the out label, which returns without freeing the
page that vhost_net_build_xdp() allocated for the frame. As with the
short-frame rejection path, tun_sendmsg() discards the per-buffer error
and still returns total_len, so vhost_tx_batch() takes the success path
and never frees the page. Each build_skb() failure in a batch leaks one
page-frag chunk.
Free the page before taking the error path, matching the put_page() the
other error exits of tun_xdp_one() already perform.
Fixes: 043d222f93ab ("tuntap: accept an array of XDP buffs through sendmsg()") Reported-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Weiming Shi <bestswngs@gmail.com> Reviewed-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20260521163312.1479805-2-bestswngs@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Weiming Shi [Thu, 21 May 2026 16:32:31 +0000 (09:32 -0700)]
tap: free page on error paths in tap_get_user_xdp()
tap_get_user_xdp() rejects a frame shorter than ETH_HLEN with -EINVAL,
and returns -ENOMEM when build_skb() fails. Both paths jump to the err
label without freeing the page that vhost_net_build_xdp() allocated for
the frame. tap_sendmsg() discards the per-buffer return value and always
returns 0, so vhost_tx_batch() takes the success path and never frees
the page; each rejected frame in a batch leaks one page-frag chunk.
Free the page on both error paths, before the skb is built. This is the
tap counterpart of the same leak in tun_xdp_one().
Fixes: 0efac27791ee ("tap: accept an array of XDP buffs through sendmsg()") Fixes: ed7f2afdd0e0 ("tap: add missing verification for short frame") Reported-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Weiming Shi <bestswngs@gmail.com> Reviewed-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20260521163230.1478627-2-bestswngs@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Weiming Shi [Wed, 20 May 2026 16:00:21 +0000 (09:00 -0700)]
tun: free page on short-frame rejection in tun_xdp_one()
tun_xdp_one() returns -EINVAL on a frame shorter than ETH_HLEN without
freeing the page that vhost_net_build_xdp() allocated for it.
tun_sendmsg() discards that -EINVAL and still returns total_len, so
vhost_tx_batch() takes the success path and never frees the page; each
short frame in a batch leaks one page-frag chunk.
A local process that can open /dev/net/tun and /dev/vhost-net can hit
this path: it attaches a tun/tap device as the vhost-net backend and
feeds TX descriptors whose length minus the virtio-net header is below
ETH_HLEN. Each kick leaks the page-frag chunks for that batch, and a
tight submission loop exhausts host memory and triggers an OOM panic.
Free the page before returning -EINVAL, matching the XDP-program error
path in the same function.
Fixes: 049584807f1d ("tun: add missing verification for short frame") Reported-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Weiming Shi <bestswngs@gmail.com> Reviewed-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20260520160020.375349-2-bestswngs@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Fri, 22 May 2026 14:13:13 +0000 (07:13 -0700)]
Merge tag 'pm-7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix maximum frequency computation in the intel_pstate driver for
two processor models, update its documentation and fix issues related
to the dynamic EPP support (added during the current development
cycle) in the amd-pstate driver:
- Fix maximum frequency computation in the intel_pstate driver for
Raptor Lake-E and Bartlett Lake that are SMP platforms derived from
hybrid ones (Rafael Wysocki, Henry Tseng)
- Fix the description of asymmetric packing with SMT in the
intel_pstate driver documentation (Ricardo Neri)
- Fix multiple amd-pstate driver issues related to dynamic EPP
support added recently, including making it opt-in only (K Prateek
Nayak, Mario Limonciello)"
* tag 'pm-7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq/amd-pstate: Drop Kconfig option for dynamic EPP
cpufreq: intel_pstate: Use HYBRID_SCALING_FACTOR_ADL for Bartlett Lake
cpufreq: intel_pstate: Use correct scaling factor on Raptor Lake-E
Documentation: intel_pstate: Fix description of asymmetric packing with SMT
cpufreq/amd-pstate-ut: Drop policy reference before driver switch
cpufreq/amd-pstate: Use "epp_default_dc" as default when dynamic_epp is disabled
cpufreq/amd-pstate: Reorder notifier unregistration and floor perf reset
cpufreq/amd-pstate: Allow writes to dynamic_epp when state isn't modified
cpufreq/amd-pstate: Return -ENOMEM on failure to allocate profile_name
cpufreq/amd-pstate: Grab "amd_pstate_driver_lock" when toggling dynamic_epp
Johan Hovold [Fri, 22 May 2026 10:16:21 +0000 (12:16 +0200)]
USB: serial: cypress_m8: fix memory corruption with small endpoint
Make sure that the interrupt-out endpoint max packet size is at least
eight bytes to avoid user-controlled slab corruption or NULL-pointer
dereference should a malicious device report a smaller size.
Fixes: 3416eaa1f8f8 ("USB: cypress_m8: Packet format is separate from characteristic size") Cc: stable@vger.kernel.org # 2.6.26 Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Johan Hovold <johan@kernel.org>
Linus Torvalds [Fri, 22 May 2026 14:06:21 +0000 (07:06 -0700)]
Merge tag 'acpi-7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI support fix from Rafael Wysocki:
"Unbreak system wakeup on critical battery status in the ACPI battery
driver inadvertently broken during the 7.0 development cycle"
* tag 'acpi-7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: battery: Fix system wakeup on critical battery status
Damien Le Moal [Fri, 22 May 2026 11:56:22 +0000 (20:56 +0900)]
block: avoid use-after-free in disk_free_zone_resources()
The function disk_update_zone_resources() may call
disk_free_zone_resources() in case of error, and following this,
blk_revalidate_disk_zones() will again calls disk_free_zone_resources() if
disk_update_zone_resources() failed. If a zone worker thread is being used
(which is the default for a rotational media zoned device),
disk_free_zone_resources() will try to stop the zone worker thread twice
because disk->zone_wplugs_worker is not reset to NULL when the worker
thread is stopped the first time.
In disk_free_zone_resources(), fix this by correctly clearing
disk->zone_wplugs_worker to NULL when the worker thread is stopped.
And while at it, since disk_free_zone_resources() is always called after a
failed call to disk_update_zone_resources(), remove the unnecessary call
to disk_free_zone_resources() in disk_update_zone_resources().
Fixes: 1365b6904fd0 ("block: allow submitting all zone writes from a single context") Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20260522115622.588535-1-dlemoal@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Fri, 22 May 2026 13:53:11 +0000 (06:53 -0700)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Handle probe on hinted conditional branch instructions.
BC.cond instructions can be simulated in the same way as B.cond
instructions, so extend the decode mask for B.cond to cover BC.cond
- Flush the walk cache when unsharing PMD tables. Recent changes to
huge_pmd_unshare() introduced mmu_gather::unshared_tables but the
arm64 code was still treating the TLB flushing as only targeting leaf
entries (TLBI VALE1IS).
Fix it by using non-leaf-only instructions (TLBI VAE1IS) when
tlb->unshared_tables is set
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: tlb: Flush walk cache when unsharing PMD tables
arm64: probes: Handle probes on hinted conditional branch instructions
Linus Torvalds [Fri, 22 May 2026 13:40:31 +0000 (06:40 -0700)]
Merge tag 's390-7.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Alexander Gordeev:
- Fix PAI NNPA mismatch between counting and recording, where sampling
reports twice the value
- Fix loss of PAI counter increments during recording on systems with
many CPUs under heavy load, while counting is not affected
- On some supported machines, CHSC cannot access memory outside the DMA
zone, causing CHSC command failures. Restore GFP_DMA flag when
allocating memory for CHSC control blocks
- Align the numbering scheme for higher-level topology structures like
socket, book, drawer with other hardware identifiers e.g. in sysfs,
procfs and tools like lscpu
* tag 's390-7.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/topology: Use zero-based numbering for containing entities
s390/cio: Restore GFP_DMA for CHSC allocation
s390/pai: Fix missing PAI counter increments under heavy load
s390/pai: Disable duplicate read of kernel PAI counter value
Linus Torvalds [Fri, 22 May 2026 13:23:56 +0000 (06:23 -0700)]
Merge tag 'slab-for-7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab fix from Vlastimil Babka:
- Stable fix for a missing cpus_read_lock in one of the cpu sheaves
flushing paths (Qing Wang)
* tag 'slab-for-7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
mm/slub: hold cpus_read_lock around flush_rcu_sheaves_on_cache()
Aleksandr Nogikh [Thu, 21 May 2026 14:22:40 +0000 (16:22 +0200)]
signal: clear JOBCTL_PENDING_MASK for caller in zap_other_threads()
When a multi-threaded process receives a stop signal (e.g., SIGSTOP),
do_signal_stop() sets JOBCTL_STOP_PENDING and JOBCTL_STOP_CONSUME on all
threads and sets signal->group_stop_count to the number of threads. If
one of the threads concurrently calls execve(), de_thread() invokes
zap_other_threads() to kill all other threads. zap_other_threads()
aborts the pending group stop by resetting signal->group_stop_count to 0
and clears the JOBCTL_PENDING_MASK for all other threads. However, it
fails to clear the job control flags for the calling thread.
When execve() completes, the calling thread returns to user mode and
checks for pending signals. Seeing the stale JOBCTL_STOP_PENDING flag,
it calls do_signal_stop(), which invokes task_participate_group_stop().
Since JOBCTL_STOP_CONSUME is still set, it attempts to decrement the
already-zero signal->group_stop_count, triggering a warning:
Fix this race condition by clearing the JOBCTL_PENDING_MASK for the
calling thread in zap_other_threads(), ensuring it does not retain any
stale job control state after the thread group is destroyed. This aligns
with other functions that tear down a thread group and abort group
stops, such as zap_process() and complete_signal(), which correctly
clear these flags for all threads including the current one.
Fixes: 39efa3ef3a37 ("signal: Use GROUP_STOP_PENDING to stop once for a single group stop") Assisted-by: Gemini:gemini-3.1-pro-preview Gemini:gemini-3-flash-preview syzbot Reported-by: syzbot+b109633ea805cac54a61@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=b109633ea805cac54a61 Link: https://syzkaller.appspot.com/ai_job?id=d70208cc-862b-4fe3-bf02-3031e10cd0b3 Signed-off-by: Aleksandr Nogikh <nogikh@google.com> Link: https://patch.msgid.link/20260521142240.2973022-1-nogikh@google.com Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
Jann Horn [Tue, 19 May 2026 14:29:38 +0000 (16:29 +0200)]
fuse: reject fuse_notify() pagecache ops on directories
The operations FUSE_NOTIFY_STORE and FUSE_NOTIFY_RETRIEVE allow the
FUSE daemon to actively write/read pagecache contents.
For directories with FOPEN_CACHE_DIR, the pagecache is used as
kernel-internal cache storage, and userspace is not supposed to have
direct access to this cache - in particular, fuse_parse_cache() will hit
WARN_ON() if the cache contains bogus data.
Reject FUSE_NOTIFY_STORE and FUSE_NOTIFY_RETRIEVE on anything other than
regular files with -EINVAL.
Jann Horn [Tue, 19 May 2026 14:40:34 +0000 (16:40 +0200)]
fuse: limit FUSE_NOTIFY_RETRIEVE to uptodate folios
FUSE_NOTIFY_RETRIEVE must be limited to uptodate folios; !uptodate folios
can contain uninitialized data.
Since FUSE_NOTIFY_RETRIEVE is intended to only return data that is already
in the page cache and not wait for data from the FUSE daemon, treat
!uptodate folios as if they weren't present.
This only has security impact on systems that don't enable automatic
zero-initialization of all page allocations via
CONFIG_INIT_ON_ALLOC_DEFAULT_ON or init_on_alloc=1.
Linus Torvalds [Fri, 22 May 2026 13:16:00 +0000 (06:16 -0700)]
Merge tag 'dma-mapping-7.1-2026-05-22' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux
Pull dma-mapping fixes from Marek Szyprowski:
"Two minor updates for the DMA-mapping code, mainly fixing some rare
corner cases (Petr Tesarik, Jianpeng Chang)"
* tag 'dma-mapping-7.1-2026-05-22' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux:
dma-mapping: move dma_map_resource() sanity check into debug code
dma-direct: fix use of max_pfn
Linus Torvalds [Fri, 22 May 2026 13:09:58 +0000 (06:09 -0700)]
Merge tag 'trace-v7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Avoid NULL return from hist_field_name()
The function hist_field_name() is directly passed to a strcat() which
does not handle "NULL" characters. Return a zero length string when
size is greater than the limit.
This is used only to output already created histograms and no field
currently is greater than the limit. But it should still not return
NULL.
- Do not call map->ops->elt_free() on allocation failure
When elt_alloc() fails, it should not call the map->ops->elt_free()
function if it exists, as that function may not be able to handle the
free on allocation failures. The ->elt_free() should only be called
when elt_alloc() succeeds.
* tag 'trace-v7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Do not call map->ops->elt_free() if elt_alloc() fails
tracing: Avoid NULL return from hist_field_name() on truncation
The newly added driver requires the LED classdev support
and causes a link failure when that is disabled:
x86_64-linux-ld: vmlinux.o: in function `bitland_mifs_wmi_probe':
bitland-mifs-wmi.c:(.text+0xede02a): undefined reference to `devm_led_classdev_register_ext'
netfilter: nf_tables: fix dst corruption in same register operation
For lshift and rshift, the shift operations are performed in a loop over
32-bit words. The loop calculates the shifted value and write it to dst,
and then immediately reads from src to calculate the carry for the next
iteration. Because src and dst could point to the same memory location,
the carry is incorrectly calculated using the newly modified dst value
instead of the original src value.
Adding a temporary local variable to cache the original value before
writing to dst and using it for the carry calculation solves the
problem. In addition, partial overlap is rejected from control plane for
all kind of operations including byteorder. This was tested with the
following bytecode:
Jiayuan Chen [Wed, 20 May 2026 02:34:11 +0000 (10:34 +0800)]
selftests: netfilter: add nft_fib_nexthop test
Functional coverage of nft_fib6_eval()'s nexthop enumeration over
three route shapes:
1) single external nexthop (nhid)
2) external nexthop group (nhid -> group)
3) old-style multipath (nexthop ... nexthop ...)
Each scenario places one nexthop on the input device (veth0). For
(2) and (3) the matching nexthop is the second member, so the walk
has to traverse beyond the primary nh. Two nft counters on prerouting
verify the data path: one increments only when fib reports veth0 as
the oif, the other counts "missing" results and must stay at zero.
./nft_fib_nexthop.sh
PASS: single external nexthop (nhid -> veth0)
PASS: nexthop group (dummy0 + veth0)
PASS: old-style multipath (sibling on veth0)
nft_fib6_info_nh_uses_dev() blindly walks &rt->fib6_siblings, causing
an OOB read past the struct nexthop slab when rt->nh is set:
==================================================================
BUG: KASAN: slab-out-of-bounds in nft_fib6_eval+0x1362/0x16c0
Read of size 8 at addr ffff888103a099d0 by task ping/386
Branch by route shape: when rt->nh is set, walk via
nexthop_for_each_fib6_nh() (also covers nh groups, which the original
code missed); otherwise walk fib6_siblings, guarded by READ_ONCE() of
rt->fib6_nsiblings as required by commit 31d7d67ba127 ("ipv6: annotate
data-races around rt->fib6_nsiblings").
Jiayuan Chen [Wed, 20 May 2026 02:34:09 +0000 (10:34 +0800)]
netfilter: nft_fib_ipv6: walk fib6_siblings under RCU
nft_fib6_info_nh_uses_dev() runs from nft_fib6_eval() in softirq under
rcu_read_lock(). fib6_siblings is modified by writers that hold
tb6_lock but do not wait for RCU readers, so the sibling walk should
use list_for_each_entry_rcu(): it adds READ_ONCE() on the ->next
pointer and lets CONFIG_PROVE_RCU_LIST validate the locking.
Florian Westphal [Tue, 19 May 2026 20:52:07 +0000 (22:52 +0200)]
netfilter: ebtables: fix OOB read in compat_mtw_from_user
Luxiao Xu says:
The function compat_mtw_from_user() converts ebtables extensions from
32-bit user structures to kernel native structures. However, it lacks
proper validation of the user-supplied match_size/target_size.
When certain extensions are processed, the kernel-side translation
logic may perform memory accesses based on the extension's expected
size. If the user provides a size smaller than what the extension
requires, it results in an out-of-bounds read as reported by KASAN.
This fix introduces a check to ensure match_size is at least as large
as the extension's required compatsize. This covers matches, watchers,
and targets, while maintaining compatibility with standard targets.
AFAIU this is relevant for matches that need to go though
match->compat_from_user() call. Those that use plain memcpy with the
user-provided size are ok because the caller checks that size vs the
start of the next rule entry offset (which itself is checked vs. total
size copied from userspace).
The ->compat_from_user() callbacks assume they can read compatsize bytes,
so they need this extra check.
Based on an earlier patch from Luxiao Xu.
Fixes: 81e675c227ec ("netfilter: ebtables: add CONFIG_COMPAT support") Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Luxiao Xu <rakukuip@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Florian Westphal <fw@strlen.de>
Florian Westphal [Sat, 16 May 2026 15:23:21 +0000 (23:23 +0800)]
netfilter: disable payload mangling in userns
Several parts of network stack rely on iph->ihl validation
done by network stack before PRE_ROUTING.
Disable this feature for user namespaces for now.
tcp option handling is likely safe even for LOCAL_IN, so this
this leaves tcp option mangling via nft_exthdr.c as-is.
I don't think these are the only means to alter packets, but these
appear to be relatively prominent.
This could be relaxed later. Example:
- allow userns for ingress hook.
- allow userns if base is transport header.
Also, we should revalidate or restrict generally:
- Don't allow linklayer writes to spill into network header
- restrict ipv4 and ipv6 to 'known safe' writes, e.g.
saddr/daddr/check/tos
Florian Westphal [Thu, 14 May 2026 12:21:57 +0000 (14:21 +0200)]
netfilter: nf_conntrack_gre: fix gre keymap list corruption
Quoting reporter:
A race between GRE keymap insertion and destruction can corrupt the
kernel list or use a freed object. `nf_ct_gre_keymap_add()` publishes a
new keymap pointer before the embedded `list_head` is linked, while
`nf_ct_gre_keymap_destroy()` can concurrently delete and free that
same object. An unprivileged user can reach this through the PPTP
conntrack helper by racing PPTP control messages or helper teardown,
leading to KASAN-detectable list corruption/UAF in kernel context.
## Root Cause Analysis
`exp_gre()` installs GRE expectations for a PPTP control flow and then
adds two GRE keymap entries [..]
The add path publishes `ct_pptp_info->keymap[dir]` before linking the
embedded list node [..]
Concurrent teardown deletes that partially initialized object.
Make add/destroy symmetric: install both, destroy both while under lock.
Furthermore, we should refuse to publish a new mapping in case ct is going
away, else we may leak the allocation.
The "retrans" detection is strange: existing mapping is checked for key
equality with the new mapping, then for "is on the list" via list walk.
But I can't see how an existing keymap entry can be NOT on list.
Change this to only check if we're asked to map same tuple again -- if so,
skip re-install, else signal failure.
Last, add a bug trap for the keymap list; it has to be empty when namespace
is going away.
Reported-by: Leo Lin <leo@depthfirst.com> Signed-off-by: Florian Westphal <fw@strlen.de>
Chris Mason [Tue, 19 May 2026 19:36:14 +0000 (12:36 -0700)]
netfilter: synproxy: refresh tcphdr after skb_ensure_writable
synproxy_tstamp_adjust() rewrites the TCP timestamp option in place
and then patches the TCP checksum via inet_proto_csum_replace4() on
the caller-supplied tcphdr pointer. Both ipv4_synproxy_hook() and
ipv6_synproxy_hook() obtain that pointer with skb_header_pointer()
before calling in, so it may either alias skb->head directly or
point at the caller's on-stack _tcph buffer.
Between obtaining the pointer and using it, the function calls
skb_ensure_writable(skb, optend), which on a cloned or non-linear
skb invokes pskb_expand_head() and frees the old skb->head. After
that point the cached th is stale:
caller (ipv[46]_synproxy_hook)
th = skb_header_pointer(skb, ..., &_tcph)
synproxy_tstamp_adjust(skb, protoff, th, ...)
skb_ensure_writable(skb, optend)
pskb_expand_head() /* kfree(old skb->head) */
...
inet_proto_csum_replace4(&th->check, ...)
/* writes into freed head, or
into the caller's stack copy
leaving the on-wire checksum
stale */
The option bytes are written through skb->data and are fine; only
the checksum update goes through th and so lands in the wrong
place. The result is either a write into freed slab memory or a
packet leaving with a checksum that does not match its payload.
Fix by re-deriving th from skb->data + protoff immediately after
skb_ensure_writable() succeeds, so the subsequent checksum update
targets the linear, writable header.
Hamza Mahfooz [Mon, 11 May 2026 14:43:14 +0000 (10:43 -0400)]
netfilter: conntrack: tcp: do not force CLOSE on invalid-seq RST without direction check
An unintended behavior in the TCP conntrack state machine allows a
connection to be forced into the CLOSE state using an RST packet with an
invalid sequence number.
Specifically, after a SYN packet is observed, an RST with an invalid SEQ
can transition the conntrack entry to TCP_CONNTRACK_CLOSE, regardless of
whether the RST corresponds to the expected reply direction. The relevant
code path assumes the RST is a response to an outgoing SYN, but does not
validate packet direction or ensure that a matching SYN was actually sent
in the opposite direction.
As a result, a crafted packet sequence consisting of a SYN followed by an
invalid-sequence RST can prematurely terminate an active NAT entry. This
makes connection teardown easier than intended.
So, tighten the state transition logic to ensure that RST-triggered
CLOSE transitions only occur when the RST is a valid response to a
previously observed SYN in the correct direction.
device property: set fwnode->secondary to NULL in fwnode_init()
If a firmware node is allocated on the stack (for instance: temporary
software node whose life-time we control) or on the heap - but using a
non-zeroing allocation function - and initialized using fwnode_init(),
its secondary pointer will contain uninitalized memory which likely will
be neither NULL nor IS_ERR() and so may end up being dereferenced (for
example: in dev_to_swnode()). Set fwnode->secondary to NULL on
initialization.
Cc: stable <stable@kernel.org> Fixes: 01bb86b380a3 ("driver core: Add fwnode_init()") Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com> Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com> Link: https://patch.msgid.link/20260506115701.23035-1-bartosz.golaszewski@oss.qualcomm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Xiaolei Wang [Mon, 18 May 2026 07:34:05 +0000 (15:34 +0800)]
misc: rp1: Send IACK on IRQ activate to fix kdump/kexec
After a kexec/kdump reboot, the macb Ethernet controller fails to
receive any packets, causing DHCP to hang indefinitely and the network
interface to be unusable despite link being up.
The root cause is that RP1's level-triggered MSI-X interrupt sources
(such as macb on hwirq 6) may have their internal state machines stuck
in the "waiting for IACK" state. This happens because the previous
kernel crashed before sending the acknowledgment for a pending level
interrupt.
In this stuck state, RP1 will not generate new MSI-X writes even though
the interrupt source remains asserted. Since no new MSI-X is sent, the
GIC never sees a new edge, the chained IRQ handler is never invoked,
and the interrupt is permanently lost.
Fix this by sending MSIX_CFG_IACK in rp1_irq_activate(). This
unconditionally resets the MSI-X state machine back to idle when a
child device requests its interrupt. If the interrupt source is still
asserted, RP1 will immediately issue a new MSI-X with the freshly
configured msg_addr/msg_data, and normal interrupt delivery resumes.
Writing IACK when the state machine is already idle (i.e., on a normal
cold boot) is harmless — it has no effect.
Hongling Zeng [Mon, 18 May 2026 02:29:39 +0000 (10:29 +0800)]
gpib: cb7210: Fix region leak when request_irq fails
When request_irq() fails, the region allocated by request_region()
is not released. Fix this by adding an error handling path with
proper goto labels to release the region.
Ben Hutchings [Tue, 5 May 2026 18:45:12 +0000 (20:45 +0200)]
parport: Fix race between port and client registration
The parport subsystem registers port devices before they are fully
initialised, resulting in a race condition where client drivers such
as lp can attach to ports that are not completely initialised or even
being torn down.
When the port and client drivers are built as modules and loaded
around the same time during boot, this occasionally results in a
crash. I was able to make this happen reliably in a VM with a
PC-style parallel port by patching parport_pc to fail probing:
while true; do
modprobe lp & modprobe parport_pc
wait
rmmod lp parport_pc
done
for a few seconds.
In the long term I think port registration should be changed to put
the call to device_add() inside parport_announce_port(), but since the
latter currently cannot fail this will require changing all port
drivers.
For now, add a flag to indicate whether a port has been "announced"
and only try to attach client drivers to ports when the flag is set.
Guangshuo Li [Tue, 5 May 2026 15:02:56 +0000 (23:02 +0800)]
uio: uio_pci_generic_sva: fix double free of devm_kzalloc() memory
uio_pci_sva allocates struct uio_pci_sva_dev with devm_kzalloc() in
probe(), but then calls kfree(udev) both on the probe() error path
(label out_free) and again in remove().
Because devm_kzalloc() allocations are devres-managed and are freed
automatically when the device is detached (including after a failing
probe() and during driver unbind), the explicit kfree() can lead to a
double free.
If probe() fails after devm_kzalloc(), the error path frees udev and
devres cleanup will free it again when the core unwinds the partially
bound device. On normal driver removal, remove() frees udev and devres
will free it again when the device is detached.
This issue was identified by a static analysis tool I developed and
confirmed by manual review. Fix by removing the manual kfree() calls
and dropping the now-unused label.
Fixes: 3397c3cd859a2 ("uio: Add SVA support for PCI devices via uio_pci_generic_sva.c") Cc: stable <stable@kernel.org> Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com> Link: https://patch.msgid.link/20260505150256.614071-1-lgs201920130244@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Zeng Heng [Thu, 21 May 2026 07:30:11 +0000 (15:30 +0800)]
arm64: tlb: Flush walk cache when unsharing PMD tables
When huge_pmd_unshare() is called to unshare a PMD table, the
tlb_unshare_pmd_ptdesc() function sets tlb->unshared_tables=true
but the aarch64 tlb_flush() only checked tlb->freed_tables to
determine whether to use TLBF_NONE (vae1is, invalidates walk
cache) or TLBF_NOWALKCACHE (vale1is, leaf-only).
This caused the stale PMD page table entry to remain in the walk cache
after unshare, potentially leading to incorrect page table walks.
Fix by including unshared_tables in the check, so that when
unsharing tables, TLBF_NONE is used and the walk cache is properly
invalidated.
Here is the detailed distinction between vae1is and vale1is:
| Instruction Combination | Actual Invalidation Scope |
| ------------------------ | --------------------------------------------------|
| `VAE1IS` + TTL=`0` | All entries at all levels (full invalidation) |
| `VAE1IS` + TTL=`2` (L2) | Non-leaf at Level 0/1 + leaf at Level 2 |
| `VALE1IS` + TTL=`0` | Leaf entries at all levels (non-leaf not cleared) |
| `VALE1IS` + TTL=`2` (L2) | Leaf entry at Level 2 only |
Matthew Maurer [Fri, 3 Apr 2026 18:18:58 +0000 (18:18 +0000)]
rust_binder: Avoid holding lock when dropping delivered_death
In 6c37bebd8c926, we switched to looping over the list and dropping each
individual node, ostensibly without the lock held in the loop body.
If the kernel were using Rust Edition 2024, the comment would be
accurate, and the lock would not be held across the drop. However, the
kernel is currently using 2021, so tail expression lifetime extension
results in the lock being held across the drop. Explicitly binding the
expression result to a variable makes the lockguard no longer part of a
tail expression, causing the lock to be dropped before entering the loop
body.
This was detected via `CONFIG_PROVE_LOCKING` identifying an invalid wait
context at the drop site.
Reported-by: David Stevens <stevensd@google.com> Signed-off-by: Matthew Maurer <mmaurer@google.com> Cc: stable <stable@kernel.org> Fixes: 6c37bebd8c92 ("rust_binder: avoid mem::take on delivered_deaths") Reviewed-by: Alice Ryhl <aliceryhl@google.com> Acked-by: Carlos Llamas <cmllamas@google.com> Link: https://patch.msgid.link/20260403-lockhold-v1-1-c332b56cd8ae@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alice Ryhl [Tue, 14 Apr 2026 12:02:34 +0000 (12:02 +0000)]
rust_binder: avoid calling pending_oneway_finished() on TF_UPDATE_TXN
When an outdated transaction is removed from `oneway_todo` due to
`TF_UPDATE_TXN`, its `Allocation` is dropped. The current implementation
of `Allocation::drop` calls `pending_oneway_finished()`, assuming the
transaction was executed. This leads to premature execution of the next
queued one-way transaction.
Fix this by taking the `oneway_node` from the `Allocation` of the
outdated transaction before it is dropped. This prevents
`Allocation::drop` from signaling completion.
We do not call `take_oneway_node()` from `Transaction::cancel` because
it's actually correct to call `pending_oneway_finished()` on cancel if
the transaction did not come from `oneway_todo`. This ensures that if
`BINDER_THREAD_EXIT` is invoked and cancels a oneway transaction, then
the next transaction is taken from `oneway_todo`.
This bug does not lead to any issues in the kernel, but may lead to
Binder delivering transactions to userspace earlier than userspace
expected to receive them.
Enable modular build since the driver now has a proper module_exit()
handler. There's nothing specific to DZ hardware to prevent driver
unloading and reloading from working.
(report at the offending commit) -- where a pointer is dereferenced that
has been derived from a null pointer to the port's parent device.
Since no device is available with legacy probing and it's not anymore a
preferable way to discover devices anyway, switch the driver to using a
platform device and use it as the port's parent device. Update resource
handling accordingly and only request the actual span of addresses used
within the slot, which will have had its resource already requested by
generic platform device code.
Use platform_driver_probe() not just because SCC devices are fixed with
solder on board and not straightforward to remove, but foremost because
the associated TTY's major device number is the same as used by the dz
driver and the first driver to claim it will prevent the other one from
using it. Either one DZ device or some SCC devices will be present in a
given system but never both at a time, and therefore we want the major
device number to be claimed by the first driver to actually successfully
bind to its device and platform_driver_probe() is a way to fulfil that.
An unfortunate consequence of the switch to a platform device is we now
hand the console over from the bootconsole much later in the bootstrap.
The firmware console handler appears good enough though to work so late
and in particular with interrupts enabled.
Since there is one way only remaining to reach zs_reset() now, remove
the port initialisation marker as no longer needed and go through the
channel reset unconditionally.
Fixes: 84a9582fd203 ("serial: core: Start managing serial controllers to enable runtime PM") Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Cc: stable@vger.kernel.org # needs to use .remove_new for <= 6.10 Link: https://patch.msgid.link/alpine.DEB.2.21.2605062328480.46195@angie.orcam.me.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-- where a pointer is dereferenced that has been derived from a null
pointer to the port's parent device.
Since no device is available with legacy probing and it's not anymore a
preferable way to discover devices anyway, switch the driver to using a
platform device and use it as the port's parent device. Update resource
handling accordingly and only request the actual span of addresses used
within the slot, which will have had its resource already requested by
generic platform device code.
Use platform_driver_probe() not just because the DZ device is fixed with
solder on board and not straightforward to remove, but foremost because
the associated TTY's major device number is the same as used by the zs
driver and the first driver to claim it will prevent the other one from
using it. Either one DZ device or some SCC devices will be present in a
given system but never both at a time, and therefore we want the major
device number to be claimed by the first driver to actually successfully
bind to its device and platform_driver_probe() is a way to fulfil that.
An unfortunate consequence of the switch to a platform device is we now
hand the console over from the bootconsole much later in the bootstrap.
The firmware console handler appears good enough though to work so late
and in particular with interrupts enabled.
Conversely only starting the console port so late lets the reset code
fully utilise our delay handlers, so switch from udelay() to fsleep()
for transmitter draining so as to avoid busy-waiting for an excessive
amount of time.
Fixes: 84a9582fd203 ("serial: core: Start managing serial controllers to enable runtime PM") Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Cc: stable@vger.kernel.org # needs to use .remove_new for <= 6.10 Link: https://patch.msgid.link/alpine.DEB.2.21.2605062326540.46195@angie.orcam.me.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Switch the driver to using the channel reset rather than hardware reset,
simplifying handling by removing an interference between channels that
causes the other channel to become uninitialised afterwards.
There is little difference between the two kinds of reset in terms of
register settings that result, and we initialise the whole register set
right away anyway. However this prevents a hang from happening should
the console output handler in the firmware try to access the other port
whose transmitter has been disabled and line parameters messed up.
For example this will happen if the keyboard port (port A) is chosen for
the system console, unusually but not insanely for a headless system, as
the port is wired to a standard DA-15 connector and an adapter can be
easily made. Or with the next change in place this would happen for the
regular console port (port B), since the keyboard port (port A) will be
initialised first.
Just remove the unnecessary complication then, a channel reset is good
enough. We still need the initialisation marker, now per channel rather
than per SCC, as for the console port zs_reset() will be called twice:
once early on via zs_serial_console_init() for the console setup only,
and then again via zs_config_port() as the port is associated with a TTY
device.
Calling zs_reset() in the course of setting up the serial device causes
line parameters to be reset and the transmitter disabled. We've been
lucky in that no message is usually produced to the kernel log between
this call and the later call to uart_set_options() in the course of
console setup done by zs_serial_console_init(), or the system would hang
as the console output handler in the firmware tried to access a port the
transmitter of which has been disabled and line parameters messed up.
This will change with the next change to the driver, so fix zs_reset()
such that line parameters are set for 9600n8 console operation as with
the system firmware and the transmitter re-enabled after reset. This
also means zs_pm() serves no purpose anymore, so drop it.
Calling dz_reset() in the course of setting up the serial device causes
line parameters to be reset and the transmitter disabled. We've been
lucky in that no message is usually produced to the kernel log between
this call and the later call to uart_set_options() in the course of
console setup done by dz_serial_console_init(), or the system would hang
as the console output handler in the firmware tried to access a port the
transmitter of which has been disabled and line parameters messed up.
This will change with the next change to the driver, so fix dz_reset()
such that line parameters are set for 9600n8 console operation as with
the system firmware and the transmitter re-enabled after reset. This
also means dz_pm() serves no purpose anymore, so drop it.
serial: dz: Fix bootconsole message clobbering at chip reset
In the DZ interface as implemented by the DC7085 gate array the serial
transmitters are double buffered, meaning that at the time a transmitter
is ready to accept the next character there is one in the transmit shift
register still being sent to the line. Issuing a master clear at this
time causes this character to be lost, so wait an extra amount of time
sufficient for the transmit shift register to drain at 9600bps, which is
the baud rate setting used by the firmware console.
Mind the specified 1.4us TRDY recovery time in the course and continue
using iob() as the completion barrier, since the platforms involved use
a write buffer that can delay and combine writes, and reorder them with
respect to reads regardless of the MMIO locations accessed and we still
lack a platform-independent handler for that.
When called from dz_serial_console_init() this is too early for fsleep()
to work and even before lpj has been calculated and therefore the delay
is actually not sufficient for the transmitter to drain and is merely a
placeholder now. This will be addressed in a follow-up change.
Jacques Nilo [Wed, 13 May 2026 13:30:25 +0000 (15:30 +0200)]
serial: 8250_dw: dispatch SysRq character in dw8250_handle_irq()
dw8250_handle_irq() calls serial8250_handle_irq_locked() with the port
lock held via guard(uart_port_lock_irqsave). The guard destructor is
plain uart_port_unlock_irqrestore(), so a SysRq character captured into
port->sysrq_ch by uart_prepare_sysrq_char() is dropped without ever
being dispatched to handle_sysrq().
This is the same regression pattern as in serial8250_handle_irq(),
introduced when 883c5a2bc934 ("serial: 8250_dw: Rework
dw8250_handle_irq() locking and IIR handling") moved the function to
the guard()-based locking scheme without using the sysrq-aware unlock
helper.
Switch to guard(uart_port_lock_check_sysrq_irqsave) so that captured
sysrq_ch is dispatched on scope exit, matching the fix in
serial8250_handle_irq().
Jacques Nilo [Wed, 13 May 2026 13:30:24 +0000 (15:30 +0200)]
serial: 8250: dispatch SysRq character in serial8250_handle_irq()
serial8250_handle_irq() captures a SysRq character into port->sysrq_ch
inside serial8250_handle_irq_locked() via uart_prepare_sysrq_char()
(reached from serial8250_read_char()). Dispatch of that captured
character to handle_sysrq() is expected to happen at port-unlock time,
through uart_unlock_and_check_sysrq[_irqrestore]().
After commit 8324a54f604d ("serial: 8250: Add
serial8250_handle_irq_locked()") the function was reduced to a wrapper
that takes the port lock via guard(uart_port_lock_irqsave) whose
destructor is plain uart_port_unlock_irqrestore(). The sysrq-aware
unlock helper is no longer called, so port->sysrq_ch is captured but
never dispatched: BREAK + SysRq key is consumed silently.
This was the very condition Johan Hovold's 853a9ae29e978 ("serial:
8250: fix handle_irq locking", 2021) introduced
uart_unlock_and_check_sysrq_irqrestore() to address.
Switch to the new guard(uart_port_lock_check_sysrq_irqsave), whose
destructor is the sysrq-aware unlock helper, restoring the pre-split
behaviour. Update the Context: comment on serial8250_handle_irq_locked()
so future HW-specific 8250 wrappers know to use the same guard or the
explicit sysrq-aware unlock.
Verified on RTL8196E with CONFIG_MAGIC_SYSRQ_SERIAL=y: BREAK + 'h' on
the console UART produces the SysRq help dump in dmesg and the brk
counter in /proc/tty/driver/serial increments correctly.
uart_handle_break() and uart_prepare_sysrq_char() (in
include/linux/serial_core.h) capture a SysRq character into
port->sysrq_ch while the port lock is held and rely on the unlock
helper -- uart_unlock_and_check_sysrq_irqrestore() -- to dispatch the
captured character to handle_sysrq() on scope exit.
The existing guard(uart_port_lock_irqsave) cannot be used by IRQ
handlers that process RX, because its destructor calls plain
uart_port_unlock_irqrestore() and silently drops port->sysrq_ch.
Add a dedicated guard(uart_port_lock_check_sysrq_irqsave) variant
whose destructor is the sysrq-aware unlock helper. The lock side is
identical to uart_port_lock_irqsave -- only the unlock-time behaviour
differs. Callers that may capture SysRq characters must use
guard(uart_port_lock_check_sysrq_irqsave); the existing
guard(uart_port_lock_irqsave) keeps its current plain-unlock semantics
for the many callers that do not process RX.
The new macro is placed after the CONFIG_MAGIC_SYSRQ_SERIAL block so
both definitions of uart_unlock_and_check_sysrq_irqrestore() (sysrq
enabled and disabled) are visible at expansion time. When
CONFIG_MAGIC_SYSRQ_SERIAL=n the destructor degenerates to plain
uart_port_unlock_irqrestore(), so there is no overhead.
No functional change on its own; users are converted in the following
patches.
Tudor Ambarus [Fri, 15 May 2026 12:41:21 +0000 (12:41 +0000)]
tty: serial: samsung: Remove redundant port lock acquisition in rx helpers
Sashiko identified a deadlock when the console flow is engaged [1].
When console flow control is enabled (UPF_CONS_FLOW),
s3c24xx_serial_stop_tx() calls s3c24xx_serial_rx_enable() and
s3c24xx_serial_start_tx() calls s3c24xx_serial_rx_disable().
The serial core framework invokes the .stop_tx() and .start_tx()
callbacks with the port->lock spinlock already held. Furthermore, all
internal driver paths that invoke stop_tx (such as the DMA TX
completion handler s3c24xx_serial_tx_dma_complete() or the PIO TX IRQ
handler s3c24xx_serial_tx_irq()) also acquire port->lock prior to
calling it. (Note that s3c24xx_serial_start_tx() is only invoked by the
serial core).
However, s3c24xx_serial_rx_enable() and s3c24xx_serial_rx_disable()
unconditionally attempt to acquire port->lock again using
uart_port_lock_irqsave(). Since spinlocks are not recursive, this
causes a deadlock on the same CPU when console flow control is engaged.
Remove the redundant lock acquisition from both rx helper functions.
Cc: stable <stable@kernel.org> Fixes: b497549a035c ("[ARM] S3C24XX: Split serial driver into core and per-cpu drivers") Reported-by: John Ogness <john.ogness@linutronix.de> Closes: https://sashiko.dev/#/patchset/20260506121606.5805-1-john.ogness%40linutronix.de [1] Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org> Link: https://patch.msgid.link/20260515-samsung-tty-flow-control-deadlock-v1-1-93255edbc9bc@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
altera_jtaguart_probe() maps the register window before registering the
UART port, but it ignores failures from uart_add_one_port(). If port
registration fails, probe still returns success and the mapping remains
live until a later remove path that is not part of probe failure cleanup.
Return the uart_add_one_port() error and unmap the register window on
that failure path.
This issue was identified during our ongoing static-analysis research while
reviewing kernel code.
Fixes: 5bcd601049c6 ("serial: Add driver for the Altera JTAG UART") Cc: stable <stable@kernel.org> Co-developed-by: Ijae Kim <ae878000@gmail.com> Signed-off-by: Ijae Kim <ae878000@gmail.com> Signed-off-by: Myeonghun Pak <mhun512@gmail.com> Link: https://patch.msgid.link/20260512065837.79528-1-mhun512@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Wentao Guan [Fri, 22 May 2026 09:13:58 +0000 (17:13 +0800)]
USB: cdc-acm: Fix bit overlap and move quirk definitions to header
The VENDOR_CLASS_DATA_IFACE and ALWAYS_POLL_CTRL quirk flags added in
commit f58752ebcb35 ("USB: cdc-acm: Add quirks for Yoga Book 9 14IAH10
INGENIC touchscreen") were placed inside the acm_ctrl_msg() function
rather than in the header with the other quirk flags. Then, their
values (BIT(9) and BIT(10)) collided with NO_UNION_12 which is already
BIT(9).
Move the definitions to drivers/usb/class/cdc-acm.h where they belong
and shift them to BIT(10) and BIT(11) to avoid the overlap.
Claudio Imbrenda [Tue, 19 May 2026 15:01:14 +0000 (17:01 +0200)]
KVM: s390: Properly reset zero bit in PGSTE
In case of memory pressure, it's possible that a guest page gets freed
and then almost immediately reused by the guest. If CMMA is enabled,
_essa_clear_cbrl() will discard all pages that are either unused or
zero. If a discarded page is reused before _essa_clear_cbrl() is called,
and the pgste.zero bit is not cleared, the page will be discarded
despite not being unused.
When calling _gmap_ptep_xchg(), always clear the pgste.zero bit. This
prevents the page from being accidentally discarded when not unused.
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Fixes: a2c17f9270cc ("KVM: s390: New gmap code") Reviewed-by: Steffen Eiden <seiden@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Claudio Imbrenda [Tue, 19 May 2026 15:01:13 +0000 (17:01 +0200)]
KVM: s390: vsie: Fix redundant rmap entries
The address passed to the gmap rmap was not being masked. As a
consequence several different (but functionally equivalent) rmap
entries were being created for each shadowed table.
Fix this by properly masking the address depending on the table level.
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Fixes: a2c17f9270cc ("KVM: s390: New gmap code") Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Claudio Imbrenda [Tue, 19 May 2026 15:01:12 +0000 (17:01 +0200)]
KVM: s390: vsie: Fix unshadowing logic
In some cases (i.e. under extreme memory pressure on the host),
attempting to shadow memory will result in the same memory being
unshadowed, causing a loop.
Add a PGSTE bit to distinguish between shadowed memory and shadowed DAT
tables, fix the unshadowing logic in _gmap_ptep_xchg() to prevent
unnecessary unshadowing and perform better checks.
Also fix the unshadowing logic in _gmap_crstep_xchg_atomic() which did
not unshadow properly when the large page would become unprotected.
Opportunistically add a check in gmap_protect_rmap() to make sure it
won't be called with level == TABLE_TYPE_PAGE_TABLE.
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Fixes: a2c17f9270cc ("KVM: s390: New gmap code") Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Xu Yang [Mon, 27 Apr 2026 07:57:55 +0000 (15:57 +0800)]
usb: chipidea: core: convert ci_role_switch to local variable
When a system contains multiple USB controllers, the global ci_role_switch
variable may be overwritten by subsequent driver initialization code.
This can cause issues in the following cases:
- The 2nd ci_hdrc_probe() sees ci_role_switch.fwnode as non-NULL even
though the "usb-role-switch" property is not present for the controller.
- When the ci_hdrc device is unbound and bound again, ci_role_switch
fwnode will not be reassigned, and the old value will be used instead.
Convert ci_role_switch to a local variable to fix these issues.
Fixes: 05559f10ed79 ("usb: chipidea: add role switch class support") Cc: stable <stable@kernel.org> Acked-by: Peter Chen <peter.chen@kernel.org> Reviewed-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Xu Yang <xu.yang_2@nxp.com> Link: https://patch.msgid.link/20260427075755.3611217-1-xu.yang_2@nxp.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
usb: gadget: f_fs: serialize DMABUF cancel against request completion
ffs_epfile_dmabuf_io_complete() calls usb_ep_free_request() on the
completed request but leaves priv->req, the back-pointer that
ffs_dmabuf_transfer() set on submission, pointing at the freed
memory. A later FUNCTIONFS_DMABUF_DETACH ioctl or
ffs_epfile_release() on the close path still sees priv->req
non-NULL under ffs->eps_lock:
if (priv->ep && priv->req)
usb_ep_dequeue(priv->ep, priv->req);
so usb_ep_dequeue() is called on a freed usb_request.
On dummy_hcd the dequeue path only walks a live queue and
pointer-compares, so the freed pointer reads without faulting and
KASAN requires an explicit check at the FunctionFS call site to
surface the use-after-free. On SG-capable in-tree UDCs the
dequeue path dereferences the supplied request immediately:
* chipidea's ep_dequeue() does
container_of(req, struct ci_hw_req, req) and reads
hwreq->req.status before acquiring its own lock.
* cdnsp's cdnsp_gadget_ep_dequeue() reads request->status first.
The narrower option of clearing priv->req via cmpxchg() in the
completion does not close the race: the completion runs without
eps_lock, so a cancel path holding eps_lock can still observe
priv->req non-NULL, race a concurrent completion that clears and
frees, and pass the freed pointer to usb_ep_dequeue(). A slightly
longer fix that moves the free into the cleanup work is needed.
Same class of lifetime race as the recent usbip-vudc timer fix [1].
Take eps_lock in the sole place that mutates priv->req from the
callback direction by moving usb_ep_free_request() out of the
completion into ffs_dmabuf_cleanup(), the existing work handler
scheduled by ffs_dmabuf_signal_done() on
ffs->io_completion_wq. Clear priv->req there under eps_lock
before freeing, and only clear if priv->req still names our
request (a subsequent ffs_dmabuf_transfer() on the same
attachment may have queued a new one).
This keeps the existing dummy_hcd sync-dequeue invariant: the
completion callback is still invoked by the UDC without
eps_lock held (dummy_hcd drops its own lock before calling the
callback), and the callback now takes no f_fs lock at all.
Serialization against the cancel path happens in cleanup, which
runs from the workqueue with no f_fs lock held on entry.
The priv ref count protects the containing ffs_dmabuf_priv:
ffs_dmabuf_transfer() takes a ref via ffs_dmabuf_get(), cleanup
drops it via ffs_dmabuf_put(), so priv stays live for the
cleanup even after the cancel path's list_del + ffs_dmabuf_put.
The ffs_dmabuf_transfer() error path no longer frees usb_req
inline: fence->req and fence->ep are set before usb_ep_queue(),
so ffs_dmabuf_cleanup() (scheduled by the error-path
ffs_dmabuf_signal_done()) owns the free regardless of whether
the queue succeeded.
Reproduced under KASAN on both detach and close paths against
dummy_hcd with an observability hook
(kasan_check_byte(priv->req) immediately before usb_ep_dequeue)
at the two FunctionFS cancel sites to surface the stale-pointer
access; the hook is not part of this patch. The KASAN
allocator / free stacks in the captured splats identify the
same request: alloc in dummy_alloc_request, free in
dummy_timer, fault reached from ffs_epfile_release (close) and
from the FUNCTIONFS_DMABUF_DETACH ioctl (detach). With the
patch applied, both paths are silent under the same hook.
The bug is reached from the FunctionFS device node, which in
real deployments is owned by the privileged gadget daemon
(adbd, UMS, composite gadget services, etc.); it is not
reachable from unprivileged userspace or from a USB host on the
cable. FunctionFS mounts default to GLOBAL_ROOT_UID, but the
filesystem supports uid=, gid=, and fmode= delegation to a
non-root gadget daemon, so on real deployments the attacker may
be a less-privileged service rather than root.
usb: gadget: f_fs: copy only received bytes on short ep0 read
ffs_ep0_read() allocates its control-OUT data buffer with
kmalloc() (not kzalloc) at the Length value from the Setup
packet, then copies that full len to userspace regardless of
how many bytes were actually received:
data = kmalloc(len, GFP_KERNEL);
...
ret = __ffs_ep0_queue_wait(ffs, data, len);
if ((ret > 0) && (copy_to_user(buf, data, len)))
ret = -EFAULT;
__ffs_ep0_queue_wait() returns req->actual, which on a short
control OUT transfer is strictly less than len. The
copy_to_user() call still copies len bytes, so on a short OUT
the last (len - ret) bytes of the kmalloc() buffer --
uninitialised slab residue -- are delivered to the FunctionFS
daemon.
Short ep0 OUT completions are specified USB control-transfer
behavior and are produced by in-tree UDCs:
* dwc2 continues on req->actual < req->length for ep0 DATA OUT
(short-not-ok is the only ep0-OUT stall path).
* aspeed_udc ends ep0 OUT on rx_len < ep->ep.maxpacket.
* renesas_usbf logs "ep0 short packet" and completes the
request.
* dwc3 stalls on short IN but not on short OUT.
A short ep0 OUT is therefore not evidence of a broken UDC; it is
a normal condition f_fs has to cope with. The sibling gadgetfs
implementation in drivers/usb/gadget/legacy/inode.c already does
this correctly via min(len, dev->req->actual) before
copy_to_user(). This patch brings f_fs.c to the same safe
pattern rather than trimming at a defensive layer.
The bug is reached from the FunctionFS device node, which in
real deployments is owned by the privileged gadget daemon
(adbd, UMS, composite gadget services, etc.); it is not
reachable from unprivileged userspace. Linux host stacks
normally reject short-wLength control OUTs before they reach
the gadget, so reproducing this required a build that
bypasses that host-side check. With the bypass in place, a
1-byte payload on a 64-byte Setup produces 63 bytes of
non-canary slab residue in the daemon's read buffer.
Fix by copying only ret (actually received) bytes to
userspace.
Seungjin Bae [Mon, 18 May 2026 23:43:14 +0000 (19:43 -0400)]
usb: gadget: dummy_hcd: Reject hub port requests for non-existent ports
The `dummy_hub_control()` function handles USB hub class requests
to the virtual root hub. The `GetPortStatus` case returns -EPIPE for
requests with `wIndex != 1`, since the virtual root hub has only a
single port. However, the `ClearPortFeature` and `SetPortFeature`
cases lack the same check.
Fix this by extending the `wIndex != 1` rejection to both cases,
matching the existing behavior of `GetPortStatus`.
Hang Cao [Wed, 15 Apr 2026 06:42:38 +0000 (14:42 +0800)]
dt-bindings: usb: Fix EIC7700 USB reset's issue
The EIC7700 USB requires a USB PHY reset operation; otherwise, the USB
will not work. The reason why the USB driver that was applied can work
properly is that the USB PHY has already been reset in ESWIN's U-Boot.
However, the proper functioning of the USB driver should not be dependent
on the bootloader. Therefore, it is necessary to incorporate the USB PHY
reset signal into the DT bindings.
This patch does not introduce any backward incompatibility since the dts
is not upstream yet. As array of reset operations are used in the driver,
no modifications to the USB controller driver are needed.
Fixes: c640a4239db5 ("dt-bindings: usb: Add ESWIN EIC7700 USB controller") Cc: stable <stable@kernel.org> Signed-off-by: Hang Cao <caohang@eswincomputing.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://patch.msgid.link/20260415064238.1784-1-caohang@eswincomputing.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
usbip: vudc: Fix use after free bug in vudc_remove due to race condition
This patch follows up Zheng Wang's 2023 report of a use-after-free in
vudc_remove(). The original thread stalled on Shuah Khan's request for
runtime testing of the unplug/unbind path. This patch supplies that
testing and keeps Zheng's original fix shape.
In vudc_probe(), v_init_timer() binds udc->tr_timer.timer to v_timer().
usbip_sockfd_store() starts the timer via v_start_timer()/v_kick_timer().
vudc_remove() can then free the containing struct vudc while the timer is
still pending or executing.
KASAN confirms the race on an unpatched x86_64 QEMU guest with
CONFIG_KASAN=y, CONFIG_USBIP_VUDC=y, CONFIG_USB_ZERO=y, and a tight loop
that repeatedly writes a socket fd to usbip_sockfd, closes the socket
pair, and unbinds/rebinds usbip-vudc.0:
BUG: KASAN: slab-use-after-free in __run_timer_base.part.0+0x8ba/0x8e0
Write of size 8 at addr ffff888001b80740 by task trigger_and_unb/239
Allocated by task 239:
vudc_probe+0x4d/0xaa0
Freed by task 239:
kfree+0x18f/0x520
device_release_driver_internal+0x388/0x540
unbind_store+0xd9/0x100
This lands in the timer core rather than v_timer() itself because the
embedded timer_list is being walked after its containing struct vudc has
already been freed. The underlying lifetime bug is the same one Zheng
reported.
With v_stop_timer() called from vudc_remove() and the timer deleted
synchronously, the same harness completed 5000 bind/unbind iterations
with no KASAN report.
dt-bindings: usb: ti,omap4-musb: Drop duplicate 'usb-phy' property constraints
The deprecated 'usb-phy' property is documented already in usb.yaml and
doesn't need a type definition here. It just needs constraints on how
many entries there are.
As this is a host controller, reference usb-hcd.yaml which then
references usb.yaml.
Fixes: 70fcdc82cf4a ("dt-bindings: usb: ti,omap4-musb: convert to DT schema") Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20260508182556.1759173-1-robh@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sam Burkels [Fri, 1 May 2026 13:23:46 +0000 (15:23 +0200)]
usb: storage: Add quirks for PNY Elite Portable SSD
The PNY Elite Portable SSD (USB ID 154b:f009) is a sibling of the
already-quirked PNY Pro Elite SSDs (154b:f00b and 154b:f00d). Like its
siblings, it uses a Phison-based USB-SATA bridge that exhibits
firmware bugs when bound to the uas driver.
Without quirks, the device fails to complete READ CAPACITY commands
when accessed over UAS on a SuperSpeed (USB 3) port. The device
enumerates and reports as a SCSI direct-access device, but reports
zero logical blocks and never finishes spin-up:
usb 2-3: new SuperSpeed USB device number 8 using xhci_hcd
usb 2-3: New USB device found, idVendor=154b, idProduct=f009
usb 2-3: Product: PNY ELITE PSSD
usb 2-3: Manufacturer: PNY
scsi host0: uas
scsi 0:0:0:0: Direct-Access PNY PNY ELITE PSSD 0
sd 0:0:0:0: [sda] Spinning up disk...
[...10+ seconds of polling, no progress...]
sd 0:0:0:0: [sda] Read Capacity(16) failed: hostbyte=DID_ERROR
sd 0:0:0:0: [sda] Read Capacity(10) failed: hostbyte=DID_ERROR
sd 0:0:0:0: [sda] 0 512-byte logical blocks: (0 B/0 B)
Tested each individual quirk to find the minimum that fixes this:
- US_FL_NO_ATA_1X alone: device hangs on spin-up
- US_FL_NO_REPORT_OPCODES alone: works on USB 2.0, hangs on USB 3.0
- US_FL_NO_ATA_1X | US_FL_NO_REPORT_OPCODES: works on both
With both quirks the device enumerates correctly while still using
the uas driver, and delivers full UAS throughput (~281 MB/s
sequential read on a USB 3.0 Gen 1 port).
The existing PNY Pro Elite entries (f00b, f00d) only set NO_ATA_1X,
but this device additionally chokes on REPORT OPCODES under
SuperSpeed.
Signed-off-by: Sam Burkels <sam@1a38.nl> Acked-by: Oliver Neukum <oneukum@suse.com> Cc: stable <stable@kernel.org> Link: https://patch.msgid.link/20260501132346.86572-1-sam@1a38.nl Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stephen J. Fuhry [Wed, 13 May 2026 17:14:19 +0000 (13:14 -0400)]
USB: quirks: add NO_LPM for Lenovo ThinkPad USB-C Dock Gen2 hub controllers
The Lenovo ThinkPad USB-C Dock Gen2 (17ef:a391, 17ef:a392) hub
controllers exhibit link instability when USB Link Power Management
is enabled, similar to the dock's Ethernet adapter (17ef:a387) which
already carries USB_QUIRK_NO_LPM.
When the dock reconnects after a transient disconnect, the hub
controllers enter LPM states between re-enumeration retries, causing
repeated disconnect/reconnect cycles lasting up to two minutes.
Disabling LPM for these devices restores stable enumeration.
usb: usbtmc: reject interrupt endpoints with small wMaxPacketSize
The USB488 subclass specification requires interrupt wMaxPacketSize to
be 0x02, unless the device sends vendor-specific notifications.
Endpoints that advertise less than 2 bytes for wMaxPacketSize are
unlikely to work with the current driver, as URBs will not have enough
space for interrupt headers. Considering that any notification URBs will
be ignored by the driver, reject these endpoints early during probe.
usb: usbtmc: check URB actual_length for interrupt-IN notifications
USBTMC devices can use an optional interrupt endpoint for notification
messages. These typically contain two-byte headers indicating the
payload format, but the driver does not check if these headers are
present before accessing the data buffers. In cases where the URB
actual_length is not enough to fit these headers, the driver will either
cause an out-of-bounds read, or consume stale leftover data from a
previous notification.
Fix by checking if actual_data contains enough bytes for the headers,
otherwise resubmit URB to the interrupt endpoint.
Fixes: dbf3e7f654c0 ("Implement an ioctl to support the USMTMC-USB488 READ_STATUS_BYTE operation.") Reported-by: syzbot+abbfd103085885cf16a2@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=abbfd103085885cf16a2 Cc: stable <stable@kernel.org> Suggested-by: Michal Pecio <michal.pecio@gmail.com> Signed-off-by: Heitor Alves de Siqueira <halves@igalia.com> Link: https://patch.msgid.link/20260505-usbtmc-iin-size-v3-1-a36113f62db7@igalia.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Wei-Cheng Chen [Tue, 5 May 2026 11:26:30 +0000 (19:26 +0800)]
xhci: tegra: Fix ghost USB device on dual-role port unplug
When a USB device is unplugged from the dual-role port, the device-mode
path in tegra_xhci_id_work() explicitly clears both SS and HS port power
via direct hub_control ClearPortFeature(POWER) calls. This preempts the
xHCI controller's normal disconnect processing -- PORT_CSC is never
generated, the USB core never sees the disconnect, and the device remains
in its internal tree as a ghost visible in lsusb.
Add an otg_set_port_power flag to control whether the dual-role switch
path performs explicit port power management. SoCs that need it
(Tegra124 / Tegra210 / Tegra186) set the flag; later SoCs (Tegra194 and
beyond) rely on the PHY mode change to handle disconnect naturally and
skip all port power calls.
Within the port power path, otg_reset_sspi additionally gates the SSPI
reset sequence on host-mode entry for SoCs that require it.
Flags set per SoC:
Tegra124, Tegra186 -> otg_set_port_power
Tegra210 -> otg_set_port_power, otg_reset_sspi
Tegra194 and later -> (none)
Kai Aizen [Thu, 30 Apr 2026 17:56:43 +0000 (20:56 +0300)]
usb: gadget: uvc: hold opts->lock across XU walks in uvc_function_bind
uvc_function_bind() walks &opts->extension_units twice without holding
opts->lock:
- directly, for the iExtension string-descriptor fixup loop;
- indirectly, four times via uvc_copy_descriptors() (once per speed),
where the helper iterates uvc->desc.extension_units (which aliases
&opts->extension_units) to size and emit XU descriptors.
The configfs side (uvcg_extension_make / uvcg_extension_drop, in
drivers/usb/gadget/function/uvc_configfs.c) takes opts->lock around its
list_add_tail / list_del operations. A privileged userspace process
that holds the configfs subtree open and writes the gadget UDC name
to bind the function while concurrently rmdir()'ing an extensions
subdir can race uvcg_extension_drop() against the bind-time list walks
and dereference a freed struct uvcg_extension.
Hold opts->lock from the start of the XU string-descriptor fixup
through the last uvc_copy_descriptors() call, releasing on the
descriptor-error path via a new error_unlock label that drops the
lock before falling through to the existing error label. This
matches the locking discipline of the configfs callbacks and removes
the only remaining unsynchronised reader of the XU list during bind.
Reachability: only privileged processes that can mount configfs and
write to gadget UDC files can trigger the race, so this is a
correctness fix rather than a security boundary.
Fixes: 0525210c9840 ("usb: gadget: uvc: Allow definition of XUs in configfs") Cc: stable <stable@kernel.org> Signed-off-by: Kai Aizen <kai.aizen.dev@gmail.com> Link: https://patch.msgid.link/20260430175643.67120-1-kai.aizen.dev@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Guangshuo Li [Mon, 27 Apr 2026 15:36:51 +0000 (23:36 +0800)]
usb: gadget: net2280: Fix double free in probe error path
usb_initialize_gadget() installs gadget_release() as the release
callback for the embedded gadget device. The struct net2280 instance is
therefore released through gadget_release() when the gadget device's last
reference is dropped.
The probe error path calls net2280_remove(), which tears down the
partially initialized device and drops the gadget reference with
usb_put_gadget(). Calling kfree(dev) afterwards can free the same object
again.
Drop the explicit kfree() and let the gadget device release callback
handle the final free. This issue was found by a static analysis tool
I am developing.
Guangshuo Li [Mon, 13 Apr 2026 14:21:19 +0000 (22:21 +0800)]
usb: gadget: f_hid: fix device reference leak in hidg_alloc()
hidg_alloc() initializes hidg->dev with device_initialize() before
calling dev_set_name(). If dev_set_name() fails, the function currently
jumps to err_unlock and returns without calling put_device().
This leaves the device reference unbalanced and prevents hidg_release()
from being called. Calling put_device() here is also safe, since
hidg_release() only frees resources owned by hidg.
The issue was identified by a static analysis tool I developed and
confirmed by manual review.
Route the dev_set_name() failure path through err_put_device so the
device reference is dropped properly.
Fixes: 89ff3dfac604 ("usb: gadget: f_hid: fix f_hidg lifetime vs cdev") Cc: stable <stable@kernel.org> Reviewed-by: Johan Hovold <johan@kernel.org> Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com> Reviewed-by: Johan Hovold johan@kernel.org Link: https://patch.msgid.link/20260413142119.2977716-1-lgs201920130244@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
usb: musb: omap2430: Fix use-after-free in omap2430_probe()
In omap2430_probe(), of_node_put(np) is called prematurely before the
last access to np, leading to a use-after-free if the node's reference
count drops to zero. Move the of_node_put() calls after the last use of
np in both the success and error paths.
The ESP out-of-place fast path appends the trailer in esp_output_head()
before esp_output_tail() allocates the destination page frag. The
head-side gate currently checks skb->data_len and tailen separately, but
the tail code allocates a single destination frag from the combined
post-trailer skb->data_len.
Reject the page-frag fast path when the combined aligned length exceeds a
page. Otherwise skb_page_frag_refill() may fall back to a single page while
the destination sg still spans the combined skb->data_len.
Restore this combined-length page gate for both IPv4 and IPv6.
Fixes: 5bd8baab087d ("esp: limit skb_page_frag_refill use to a single page") Cc: stable@vger.kernel.org Signed-off-by: Lin Ma <malin89@huawei.com> Signed-off-by: Chenyuan Mi <michenyuan@huawei.com> Signed-off-by: Jingguo Tan <tanjingguo@huawei.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
e521588 [Wed, 20 May 2026 07:27:17 +0000 (09:27 +0200)]
esp: fix page frag reference leak on skb_to_sgvec failure
In esp_output_tail(), when esp->inplace is false, the old skb page frags
are replaced with a new page from the xfrm page_frag cache. The source
scatterlist (sg) is built from the old frags before the replacement, and
esp_ssg_unref() is responsible for releasing the old page references
after the crypto operation completes.
However, if the second skb_to_sgvec() call (which builds the destination
scatterlist from the new page) fails, the code jumps to error_free which
only calls kfree(tmp). The old page frag references captured in the
source scatterlist are never released:
1. sg[] is built from old frags via skb_to_sgvec() (no extra get_page)
2. nr_frags is set to 1 and frag[0] is replaced with the new page
3. Second skb_to_sgvec() fails -> goto error_free
4. kfree(tmp) frees the sg[] memory but old frags are not unref'd
5. kfree_skb() only releases frag[0] (the new page), not the old ones
Fix this by adding a bool parameter to esp_ssg_unref() that, when true,
unconditionally unrefs the source scatterlist frags without checking
req->src and req->dst, since those fields are not yet initialized by
aead_request_set_crypt() at the point of the error. Existing callers
pass false to preserve the original behavior.
The same issue exists in both esp4 and esp6 as the code is identical.
Bibo Mao [Fri, 22 May 2026 07:05:12 +0000 (15:05 +0800)]
LoongArch: KVM: Move some variable declarations to paravirt.h
Some variables relative with paravirt feature are declared in the header
file asm/qspinlock.h, however this file can be included only when option
CONFIG_SMP is on. There is compiling warnings if CONFIG_SMP is off since
variables are not declared.
Move these variable declarations to header file asm/paravirt.h to avoid
compiling warnings.
Fixes: c43dce6f13fb ("LoongArch: KVM: Make vcpu_is_preempted() as a macro rather than function") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202605061313.O8Hswm2b-lkp@intel.com/ Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Tiezhu Yang [Fri, 22 May 2026 07:05:07 +0000 (15:05 +0800)]
LoongArch: kprobes: Fix handling of fatal unrecoverable recursions
KPROBE_HIT_SS and KPROBE_REENTER are two types of fatal recursions that
can not be safely recovered in kprobes.
KPROBE_HIT_SS means that a kprobe is hit during single-stepping. At
this point, the architecture-specific single-step context is already
active. Nested single-stepping would corrupt the state, as the kprobe
control block (kcb) and hardware registers cannot safely store multiple
levels of stepping state.
KPROBE_REENTER means that a third-level recursion occurs when a probe
is hit while the system is already handling a nested probe (second-
level). The kcb only provides a single slot (prev_kprobe) to backup the
state. When a third probe is hit, there is no more space to save the
state without corrupting the first-level backup.
Kprobes work by replacing instructions with breakpoints. In order to
execute the original instruction and continue, it must be moved to a
temporary "single-step" slot. Since there is no backup space left to
set up this slot safely, the CPU would be forced to return to the same
original breakpoint address, triggering an endless loop.
Currently, the code only prints a warning and returns. This leads to
an infinite re-entry loop as the CPU repeatedly hits the same trap and
a "stuck" CPU core because preemption was disabled at the start of the
handler and never re-enabled in this early return path.
Fix the logic by:
1. Merging KPROBE_HIT_SS and KPROBE_REENTER cases, as both represent
fatal recursions that cannot be safely recovered.
2. Replacing WARN_ON_ONCE() with BUG() to terminate the system. This
aligns LoongArch with other architectures (x86, arm64, riscv) and
prevents stack overflow while providing diagnostic information.
Tiezhu Yang [Fri, 22 May 2026 07:05:07 +0000 (15:05 +0800)]
LoongArch: kprobes: Use larch_insn_text_copy() to patch instructions
On SMP systems, kprobe handlers would occasionally fail to execute on
certain CPU cores. The issue is hard to reproduce and typically occurs
randomly under high system load.
The root cause is a software-side instruction hazard. According to the
LoongArch Reference Manual, while the cache coherency is maintained by
hardware, software must explicitly use the "IBAR" instruction to ensure
the instruction fetch unit (IFU) observes the effects of recent stores.
The current arch_arm_kprobe() and arch_disarm_kprobe() only execute the
"IBAR" barrier (via flush_insn_slot -> local_flush_icache_range) on the
local CPU. This leaves a vulnerable window where remote CPU cores may
continue executing stale instructions from their pipelines or prefetch
buffers, as they have not executed an "IBAR" since the code modification.
Switch to larch_insn_text_copy() to fix this:
1. Synchronization: It uses stop_machine_cpuslocked() to synchronize all
online CPUs, ensuring no CPU is executing the target code area during
modification.
2. Visibility: By passing cpu_online_mask to stop_machine_cpuslocked(),
the callback text_copy_cb() is executed on all online cores. Each CPU
core invokes local_flush_icache_range() to execute "IBAR", clearing
instruction hazards system-wide and ensuring the "break" instruction
is visible to the fetch units of all cores.
3. Robustness: It properly manages memory write permissions (ROX/RW) for
the kernel text segment during patching, ensuring compatibility with
CONFIG_STRICT_KERNEL_RWX.
Takashi Iwai [Fri, 22 May 2026 06:25:18 +0000 (08:25 +0200)]
Merge tag 'asoc-fix-v7.1-rc4' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v7.1
A bigger batch of fixes than usual due to -next not happeing last week,
this is mostly stuff for laptops - a lot of quirks and small fixes,
mainly for x86 and SoundWire. Nothing too big or exciting individually,
just two week's worth.
Byungchul Park [Fri, 15 May 2026 03:47:01 +0000 (12:47 +0900)]
Revert "mm: introduce a new page type for page pool in page type"
This reverts commit db359fccf212 ("mm: introduce a new page type for page
pool in page type") and a part of 735a309b4bfb9e ("net: add net_iov_init()
and use it to initialize ->page_type").
Netpp page_type'ed pages might be used in mapping so as to use @_mapcount.
However, since @page_type and @_mapcount are union'ed in struct page,
these two can't be used at the same time. Revert the commit introducing
page_type for Netpp for now.
The patch will be retried once @page_type and @_mapcount get allowed to be
used at the same time.
The revert also includes removal of @page_type initialization part
introduced by commit 735a309b4bfb9e ("net: add net_iov_init() and use it
to initialize ->page_type"), which will be restored on the retry.
Link: https://lore.kernel.org/20260515034701.17027-1-byungchul@sk.com Fixes: db359fccf212 ("mm: introduce a new page type for page pool in page type") Signed-off-by: Byungchul Park <byungchul@sk.com> Reported-by: Dragos Tatulea <dtatulea@nvidia.com> Closes: https://lore.kernel.org/all/982b9bc1-0a0a-4fc5-8e3a-3672db2b29a1@nvidia.com Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Acked-by: Harry Yoo (Oracle) <harry@kernel.org> Reviewed-by: Lorenzo Stoakes <ljs@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Brendan Jackman <jackmanb@google.com> Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org> Cc: Jesper Dangaard Brouer <hawk@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam R. Howlett <liam@infradead.org> Cc: Mark Bloch <mbloch@nvidia.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Pavel Begunkov <asml.silence@gmail.com> Cc: Saeed Mahameed <saeedm@nvidia.com> Cc: Simon Horman <horms@kernel.org> Cc: Stanislav Fomichev <sdf@fomichev.me> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tariq Toukan <tariqt@nvidia.com> Cc: Toke Hoiland-Jorgensen <toke@redhat.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Sunny Patel [Fri, 1 May 2026 11:51:16 +0000 (17:21 +0530)]
mm/migrate_device: fix pgtable leak in migrate_vma_insert_huge_pmd_page
When migrate_vma_insert_huge_pmd_page() jumps to unlock_abort due
to a PMD check failure, the pgtable allocated earlier via
pte_alloc_one() is never freed, causing a memory leak.
Added free_abort label to release the pgtable in error path.
Link: https://lore.kernel.org/20260501115122.23288-1-nueralspacetech@gmail.com Fixes: a30b48bf1b24 ("mm/migrate_device: implement THP migration of zone device pages") Signed-off-by: Sunny Patel <nueralspacetech@gmail.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Huang Ying <ying.huang@linux.alibaba.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Balbir Singh <balbirs@nvidia.com> Cc: Byungchul Park <byungchul@sk.com> Cc: Gregory Price <gourry@gourry.net> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Rakie Kim <rakie.kim@sk.com> Cc: Zi Yan <ziy@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
kernel/fork: validate exit_signal in kernel_clone()
When a child process exits, it sends exit_signal to its parent via
do_notify_parent(). The clone() syscall constructs exit_signal as:
(lower_32_bits(clone_flags) & CSIGNAL)
CSIGNAL is 0xff, so values in the range 65-255 are possible. However,
valid_signal() only accepts signals up to _NSIG (64 on x86_64). A
non-zero non-valid exit_signal acts the same as exit_signal == 0: the
parent process is not signaled when the child terminates.
The syzkaller reproducer triggers this by calling clone() with flags=0x80,
resulting in exit_signal = (0x80 & CSIGNAL) = 128, which exceeds _NSIG and
is not a valid signal.
The v1 of this patch added the check only in the clone() syscall handler,
which is incomplete. kernel_clone() has other callers such as
sys_ia32_clone() which would remain unprotected. Move the check to
kernel_clone() to cover all callers.
Since the valid_signal() check is now in kernel_clone() and covers all
callers including clone3(), the same check in copy_clone_args_from_user()
becomes redundant and is removed. The higher 32bits check for clone3() is
kept as it is clone3() specific.
Note that this is a user-visible change: previously, passing an invalid
exit_signal to clone() was silently accepted. The man page for clone()
does not document any defined behavior for invalid exit_signal values, so
rejecting them with -EINVAL is the correct behavior. It is unlikely that
any sane application relies on passing an invalid exit_signal.
[oleg@redhat.com: the comment above kernel_clone() should be updated] Link: https://lore.kernel.org/abwvgU17W8wuW2-J@redhat.com Link: https://lore.kernel.org/20260316151956.563558-1-kartikey406@gmail.com Fixes: 3f2c788a1314 ("fork: prevent accidental access to clone3 features") Signed-off-by: Deepanshu Kartikey <Kartikey406@gmail.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: syzbot+bbe6b99feefc3a0842de@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=bbe6b99feefc3a0842de Tested-by: syzbot+bbe6b99feefc3a0842de@syzkaller.appspotmail.com Link: https://lore.kernel.org/all/20260307064202.353405-1-kartikey406@gmail.com/T/ Link: https://lore.kernel.org/all/20260316104536.558108-1-kartikey406@gmail.com/T/ Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Ben Segall <bsegall@google.com> Cc: Christian Brauner <brauner@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Kees Cook <kees@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Rapoport <rppt@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Alexandre Ghiti [Mon, 18 May 2026 08:28:19 +0000 (10:28 +0200)]
mm: memcontrol: propagate NMI slab stats to memcg vmstats
flush_nmi_stats() drains per-node NMI slab atomics into the per-node
lruvec_stats, but does not propagate them to the memcg-level vmstats.
For non NMI case, account_slab_nmi_safe() calls mod_memcg_lruvec_state()
which updates both per-node lruvec_stats and memcg-level vmstats, so
flush_nmi_stats() needs to flush to per-node lruvec_stats as well as
memcg-level vmstats.
So fix this by flushing to the memcg-level vmstats for NMI too.
Link: https://lore.kernel.org/20260518082830.599102-1-alex@ghiti.fr Fixes: 940b01fc8dc1 ("memcg: nmi safe memcg stats for specific archs") Signed-off-by: Alexandre Ghiti <alex@ghiti.fr> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Harry Yoo (Oracle) <harry@kernel.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SeongJae Park [Mon, 18 May 2026 15:25:58 +0000 (08:25 -0700)]
mm/damon/sysfs-schemes: delete tried region in regions_rmdirs()
DAMON sysfs maintains the DAMOS tried region directory objects via a
linked list. When the user requests refresh of the directories, DAMON
sysfs removes all the region directories first, and then generate updated
regions directory on the empty space. The removal function
(damon_sysfs_scheme_regions_rm_dirs()) only puts the kobj objects.
Deletion of the container region object from the linked list is done
inside the kobj release callback function.
If somehow the callback invocation is delayed, the list will contain
regions list that gonna be freed. If the updated region directories
creation is started in this situation, the list can be corrupted and
use-after-free can happen.
Because the kobj objects are managed by only DAMON sysfs, the issue cannot
happen in normal situation. But, such delays can be made on kernels that
built with CONFIG_DEBUG_KOBJECT_RELEASE. On the kernel, the issue can
indeed be reproduced like below.
# damo start --damos_action stat
# cd /sys/kernel/mm/damon/admin/kdamonds/0/
# for i in {1..10}; do echo update_schemes_tried_regions > state; done
# dmesg | grep underflow
[ 89.296152] refcount_t: underflow; use-after-free.
Fix the issue by removing the region object from the list when
decrementing the reference count.
Also update damos_sysfs_populate_region_dir() to add the region object to
the list only after the kobject_init_and_add() is success, so that fail of
kobject_init_and_add() is not leaving the deallocated object on the list.
Dev Jain [Mon, 18 May 2026 06:36:56 +0000 (12:06 +0530)]
mm/rmap: initialize nr_pages to 1 at loop start in try_to_unmap_one
Initialize nr_pages to 1 at the start of each loop iteration, like
folio_referenced_one() does.
Without this, nr_pages computed by a previous folio_unmap_pte_batch() call
can be reused on a later iteration that does not run
folio_unmap_pte_batch() again.
mmap a 64K large folio with MAP_ANONYMOUS | MAP_DROPPABLE, then call
madvise(MADV_FREE), then make the last page device-exclusive via
HMM_DMIRROR_EXCLUSIVE.
Trigger node reclaim through sysfs. Now, in try_to_unmap_one(), we will
first clear the first 15 out of 16 entries mapping the lazyfree folio.
This will set nr_pages to 15. In the next pvmw walk, this nr_pages gets
reused on a device-exclusive pte, thus potentially corrupting folio
refcount/mapcount.
At the moment, I have a userspace program which can make the kernel spit
out a trace, but the blow up is in folio_referenced_one(), because there
are existing bugs in the interaction between device-private and rmap
(which too I am investigating). I did a one liner kernel change to avoid
going into folio_referenced_one(), and the kernel blows up at
folio_remove_rmap_ptes in try_to_unmap_one which is what I wanted.
Note that the bug is there not since file folio batching but lazyfree
folio batching, since device-exclusive only works for anonymous folios.
Userspace visible effect is simply kernel crashing somewhere due to
refcount/mapcount corruption.
Link: https://lore.kernel.org/20260518063656.3721056-1-dev.jain@arm.com Fixes: 354dffd29575 ("mm: support batched unmap for lazyfree large folios during reclamation") Signed-off-by: Dev Jain <dev.jain@arm.com> Acked-by: Barry Song <baohua@kernel.org> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Lorenzo Stoakes <ljs@kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Barry Song <baohua@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Harry Yoo <harry@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Liam R. Howlett <liam@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Richard Chang [Tue, 12 May 2026 07:49:18 +0000 (07:49 +0000)]
zram: fix use-after-free in zram_writeback_endio
A crash was observed in zram_writeback_endio due to a NULL pointer
dereference in wake_up. The root cause is a race condition between the
bio completion handler (zram_writeback_endio) and the writeback task.
In zram_writeback_endio, wake_up() is called on &wb_ctl->done_wait after
releasing wb_ctl->done_lock. This creates a race window where the
writeback task can see num_inflight become 0, return, and free wb_ctl
before zram_writeback_endio calls wake_up().
This patch fixes this race by using RCU. By protecting wb_ctl with
rcu_read_lock() in zram_writeback_endio and using kfree_rcu() to free it,
we ensure that wb_ctl remains valid during the execution of
zram_writeback_endio.
Link: https://lore.kernel.org/20260512074918.2606208-1-richardycc@google.com Fixes: f405066a1f0d ("zram: introduce writeback bio batching") Signed-off-by: Richard Chang <richardycc@google.com> Suggested-by: Sergey Senozhatsky <senozhatsky@chromium.org> Suggested-by: Minchan Kim <minchan@kernel.org> Acked-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Brian Geffon <bgeffon@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Martin Liu <liumartin@google.com> Cc: wang wei <a929244872@163.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
memfd: deny writeable mappings when implying SEAL_WRITE
When SEAL_EXEC is added, SEAL_WRITE is implied to make W^X. But the
implied seal is set after the check that makes sure the memfd can not have
any writable mappings. This means one can use SEAL_EXEC to apply
SEAL_WRITE while having writeable mappings.
This breaks the contract that SEAL_WRITE provides and can be used by an
attacker to pass a memfd that appears to be write sealed but can still be
modified arbitrarily.
Fix this by adding the implied seals before the call for
mapping_deny_writable() is done.
Linpu Yu [Sun, 10 May 2026 05:43:30 +0000 (13:43 +0800)]
ipc: limit next_id allocation to the valid ID range
The checkpoint/restore sysctl path can request the next SysV IPC id
through ids->next_id. ipc_idr_alloc() currently forwards that request to
idr_alloc() with an open-ended upper bound.
If the valid tail of the SysV IPC id space is full, the allocation can
spill beyond ipc_mni. The returned SysV IPC id still uses the normal
index encoding, so later lookup and removal can target the wrong slot.
This leaves the real IDR entry behind and breaks the IDR state for the
object.
The bug is in ipc_idr_alloc() in the checkpoint/restore path.
2. The zero upper bound makes the allocation effectively open-ended.
Once the valid SysV IPC tail is occupied, idr_alloc() can spill past
ipc_mni and allocate an entry beyond the valid IPC id range.
3. The new object id is still encoded with the narrower SysV IPC index
width:
new->id = (new->seq << ipcmni_seq_shift()) + idx
4. Later removal goes through ipc_rmid(), which uses:
ipcid_to_idx(ipcp->id)
That truncates the real IDR index. An object actually stored at a
high index can then be removed as if it lived at a low in-range
index.
5. For shared memory, shm_destroy() frees the current object anyway, but
the real high IDR slot is left behind as a dangling pointer.
6. A subsequent walk of /proc/sysvipc/shm reaches the stale IDR entry
and dereferences freed memory.
Prevent this by bounding the requested allocation to ipc_mni so the
checkpoint/restore path fails once the valid range is exhausted.
Link: https://lore.kernel.org/cover.1778336914.git.linpu5433@gmail.com Link: https://lore.kernel.org/2eebe949bfa7d1f6e13b5be6a92c64c850ce9d45.1778336914.git.linpu5433@gmail.com Fixes: 03f595668017 ("ipc: add sysctl to specify desired next object id") Signed-off-by: Linpu Yu <linpu5433@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Cc: Kees Cook <kees@kernel.org> Cc: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Lorenzo Stoakes [Tue, 12 May 2026 16:06:43 +0000 (17:06 +0100)]
Revert "mm/hugetlbfs: update hugetlbfs to use mmap_prepare"
This reverts commit ea52cb24cd3f ("mm/hugetlbfs: update hugetlbfs to use
mmap_prepare") with conflict resolution to account for changes in commit ea52cb24cd3f ("mm/hugetlbfs: update hugetlbfs to use mmap_prepare").
The patch incorrectly handled hugetlb VMA lock allocation at the
mmap_prepare stage, where a failed allocation occurring after mmap_prepare
is called might result in the lock leaking.
There is no risk of a merge causing a similar issues, as
VMA_DONTEXPAND_BIT is set for hugetlb mappings.
As a first step in addressing this issue, simply revert the change so we
can rework how we do this having corrected the underlying issues.
We maintain the VMA flags changes as best we can, accounting for the fact
that we were working with a VMA descriptor previously and propagating
like-for-like changes for this.
Note that we invoke vma_set_flags() and do not call vma_start_write() as
vm_flags_set() does. This is OK as it's being done in an .mmap hook where
the VMA is not yet linked into the tree so nobody else can be accessing
it.
Link: https://lore.kernel.org/20260512160643.266960-1-ljs@kernel.org Fixes: ea52cb24cd3f ("mm/hugetlbfs: update hugetlbfs to use mmap_prepare") Signed-off-by: Lorenzo Stoakes <ljs@kernel.org> Reported-by: Mingyu Wang <25181214217@stu.xidian.edu.cn> Closes: https://lore.kernel.org/linux-mm/20260425070700.562229-1-25181214217@stu.xidian.edu.cn/ Acked-by: Muchun Song <muchun.song@linux.dev> Acked-by: Oscar Salvador <osalvador@suse.de> Cc: David Hildenbrand <david@kernel.org> Cc: Liam R. Howlett <liam@infradead.org> Cc: Pedro Falcato <pfalcato@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ian Ray [Wed, 6 May 2026 06:33:35 +0000 (09:33 +0300)]
MAINTAINERS: .mailmap: update after GEHC spin-off
Update my email address from @ge.com to @gehealthcare.com after GE
HealthCare was spun-off from GE.
Link: https://lore.kernel.org/20260506063335.3-1-ian.ray@gehealthcare.com Signed-off-by: Ian Ray <ian.ray@gehealthcare.com> Reviewed-by: Luca Ceresoli <luca.ceresoli@bootlin.com> Cc: Neil Armstrong <neil.armstrong@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>